Maths! Tensors! Sciences! #44

Open
wants to merge 11 commits into
base: master

lusbenjamin
Owner

@lusbenjamin lusbenjamin commented Sep 21, 2017

  • Adds all the maths and sciences that the cool kids use to do Deep Learning these days.
  • Adds tensor utilities in anticipation of deep learning:
    • pymoji.tensors.one_hot for evaluating Softmax outputs
    • pymoji.tensors.random_mini_batches for mini-batch gradient descent
  • Utilities to read/write large datasets: pymoji.save_dataset and pymoji.load_dataset
  • Helper functions and CLI command to harvest all of the heads from a directory of previous runs!
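For readers who have not met one-hot encoding, here is a minimal NumPy sketch of what a helper like one_hot might do. The signature, the column-per-example layout, and the predicted_classes helper are assumptions for illustration; the actual pymoji.tensors.one_hot is not shown in this thread.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a (num_classes, m) matrix with a single 1.0 per column.

    Hypothetical signature; each column encodes one example's class index.
    """
    labels = np.asarray(labels, dtype=int).reshape(-1)
    encoded = np.zeros((num_classes, labels.size))
    # Row index = class, column index = example.
    encoded[labels, np.arange(labels.size)] = 1.0
    return encoded


def predicted_classes(softmax_outputs):
    """Pick the most probable class for each column of softmax outputs."""
    return np.argmax(softmax_outputs, axis=0)
```

Comparing predicted_classes(outputs) with the original integer labels is one simple way to compute accuracy from Softmax outputs.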

Head Hunting!

  • swallows errors file-by-file
  • processes all "prior runs" in a directory
  • for now, a prior run is an image file that has a corresponding JSON metadata file
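The three points above can be sketched roughly as follows. This is not the code from the PR: the function names, the set of image extensions, and the "same file stem plus .json" pairing rule are all assumptions made for illustration.

```python
import json
import logging
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def find_prior_runs(directory):
    """Yield (image_path, json_path) for images that have a sibling JSON file."""
    for image_path in sorted(Path(directory).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        json_path = image_path.with_suffix(".json")  # assumed naming convention
        if json_path.is_file():
            yield image_path, json_path


def harvest(directory, process_run):
    """Call process_run(image_path, metadata) per prior run; return failed paths."""
    failures = []
    for image_path, json_path in find_prior_runs(directory):
        try:
            with json_path.open() as stream:
                metadata = json.load(stream)
            process_run(image_path, metadata)
        except Exception:  # deliberately broad: one bad file must not stop the batch
            logging.exception("skipping %s", image_path)
            failures.append(image_path)
    return failures
```

Returning the list of failed paths keeps the "swallow errors" behaviour while still letting the CLI report what was skipped.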

@@ -170,6 +171,36 @@ def load_json(json_stream):
return result.data


def json_to_object(name, json_node):
"""Quick-and-dirty recursive conversion from JSON dictionary data to
an object built out of namedtuples.
Owner Author

Mebbe about a 5 kyu on Codewars? I had to write this to be able to reuse the logic in pymoji.emoji, which relies on face annotation objects with attributes, e.g. face.bounding_box. I was kinda annoyed to learn there was no way to get Marshmallow to give us back an object instead of a dictionary. Is there a better way to do this?
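Since the diff only shows the first lines of the docstring, here is one way such a recursive conversion could be written. It is a sketch under assumptions, not the PR's implementation: the handling of lists and the fallback type name "Node" are choices made for this example.

```python
import keyword
from collections import namedtuple


def json_to_object(name, json_node):
    """Recursively convert parsed JSON into nested namedtuples."""
    if isinstance(json_node, dict):
        fields = {key: json_to_object(key, value) for key, value in json_node.items()}
        # The type name must be a valid identifier; fall back to a generic one.
        valid = name.isidentifier() and not keyword.iskeyword(name)
        # rename=True replaces invalid field names with positional ones (_0, _1, ...).
        node_type = namedtuple(name if valid else "Node", list(fields), rename=True)
        return node_type(*fields.values())
    if isinstance(json_node, list):
        return [json_to_object(name, item) for item in json_node]
    return json_node  # scalars (str, int, float, bool, None) pass through
```

With this, a dictionary such as {"bounding_box": {"vertices": [{"x": 1, "y": 2}]}} can be read as face.bounding_box.vertices[0].x.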

labels = []

def load_heads(input_path):
"""Helper for iteratively loading input feature data."""
Owner Author

@melodylu this relates to your earlier tussle with nonlocal. This helper also "gets away with it" because it mutates the existing features and labels lists instead of rebinding those names.
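A small standalone illustration of the difference, using made-up function names rather than the code in this PR: mutating an object from an enclosing scope works without any declaration, while rebinding the name needs nonlocal.

```python
def collect_by_mutation(values):
    """Append to a list owned by the enclosing scope; no nonlocal needed."""
    features = []

    def add(value):
        features.append(value)  # mutates the existing list object

    for value in values:
        add(value)
    return features


def count_by_rebinding(values):
    """Rebinding the enclosing name requires an explicit nonlocal declaration."""
    count = 0

    def add(_value):
        nonlocal count  # without this line, `count += 1` raises UnboundLocalError
        count += 1

    for value in values:
        add(value)
    return count
```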


for face in faces:
# compute label Y
code = get_emoji_code(None, face, use_big_guns=False)
Collaborator

update use_big_guns plz 🔫

Collaborator

@melodylu melodylu left a comment

@lusbenjamin
Thanks for the explanation of all the Tensors! Sorry I can't give more helpful ML feedback.

@dnewburger ?

Collaborator

@dnewburger dnewburger left a comment

Cool stuff! Can't wait to see how your training goes!

One note is that TF has great tools for image manipulation, and these tools help prevent mistakes when converting from images to tensor representations. I remember having a couple of bugs when feeding tensors into my models because I was doing the normalization and flattening myself when creating the datasets, then using TF image tools when training the models. I recommend that when you start working on the model, you either keep using PIL or convert all the image processing methods to TF.
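Whichever library is chosen, one way to avoid that class of bug is to route every image through a single preprocessing function, both when building the dataset and when feeding the model. A minimal NumPy sketch, assuming uint8 pixel data and a column-per-example layout (both assumptions, as are the function names):

```python
import numpy as np


def preprocess(pixels):
    """Scale uint8 pixel values to [0, 1] and flatten to a 1-D float32 vector."""
    array = np.asarray(pixels, dtype=np.float32) / 255.0
    return array.reshape(-1)


def build_dataset(images):
    """Stack preprocessed images as columns: shape (n_features, m)."""
    return np.stack([preprocess(image) for image in images], axis=1)
```

At prediction time the same preprocess function would be applied to the incoming image, so the scaling and the flattening order cannot drift apart.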

# pylint: disable=invalid-name


def head_to_ndarray(input_stream, size=HEAD_SIZE):
Collaborator

TensorFlow has some nice image encoding, decoding, and manipulation methods that could save you some headaches. I found that manipulating the images myself led to unexpected bugs, and when it comes to adding more advanced training techniques like changing the image orientation or saturation, you'll probably want to use the TF libraries anyway.

https://www.tensorflow.org/api_guides/python/image

return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig


def random_mini_batches(X, Y, mini_batch_size=MINI_BATCH_SIZE):
Collaborator

You may want to migrate to TensorFlow's queuing system later on, which is one of the worst and best parts of TensorFlow (at least when I was using it). Best because of the convenience, worst because the way it works under the hood is kind of inscrutable.

https://www.tensorflow.org/programmers_guide/threading_and_queues

Advantages include an easy framework for training in parallel and convenience methods for shuffling, combining, and weighting different datasets.
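For context on what a TensorFlow input pipeline would be replacing, here is a rough NumPy sketch of the shuffle-and-slice behaviour a helper like random_mini_batches usually provides. The body of the PR's function is not shown here, so the column-per-example layout, the default batch size, and the seed parameter are assumptions.

```python
import numpy as np

MINI_BATCH_SIZE = 64  # assumed default; the PR's actual constant is not shown


def random_mini_batches(X, Y, mini_batch_size=MINI_BATCH_SIZE, seed=None):
    """Shuffle the examples (columns) of X and Y together, then split into batches.

    X has shape (n_features, m) and Y has shape (n_classes, m).
    The seed parameter is an addition for reproducibility in this sketch.
    """
    m = X.shape[1]
    permutation = np.random.default_rng(seed).permutation(m)
    # Apply the same permutation to both arrays so each example keeps its label.
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation]
    return [
        (shuffled_X[:, start:start + mini_batch_size],
         shuffled_Y[:, start:start + mini_batch_size])
        for start in range(0, m, mini_batch_size)
    ]
```

The last batch is simply smaller when m is not a multiple of mini_batch_size.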

@lusbenjamin
Owner Author

Going to leave this open as a reference for a few days, but will most likely close it in favor of the functionality built into the TensorFlow libraries.

3 participants