Type checking data through axis semantics #13

Open
bartvm opened this issue Feb 23, 2015 · 0 comments

Following an offline discussion with @lamblin and @vdumoulin, we agreed that the most important kind of type checking to perform in the data processing pipeline is probably checking the semantics of the data's axes. Data streams should provide information about what each axis of their input and output represents, e.g.:

  • An image: (channel, height, width)
  • A batch of images: (batch, channel, height, width)
  • A sentence (a sequence of indices): (features) (maybe with a labels role?) or, before going into Blocks: (time, batch, features)
  • A set of n-grams from a sentence: (batch, features)

The behaviour of data streams regarding these labels should be configurable, so that they can ignore, warn, or raise an error if the input data is not what they expected.
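To make the ignore/warn/raise idea concrete, here is a minimal sketch of what such a check could look like. None of this is an existing API: `check_axes`, `AxisMismatchError` and the `on_mismatch` argument are hypothetical names chosen for illustration, and axis labels are assumed to be plain strings.

```python
import warnings


class AxisMismatchError(ValueError):
    """Hypothetical error raised when axis labels do not match."""


def check_axes(shape, axis_labels, expected_labels, on_mismatch="raise"):
    """Compare the axis labels of some data with what a stream expects.

    `shape` is the shape of the data, `axis_labels` the labels that come
    with it, and `expected_labels` what the stream wants to receive.
    Returns True if everything matches and False otherwise (unless
    `on_mismatch` is "raise", in which case a mismatch raises).
    """
    if on_mismatch not in ("ignore", "warn", "raise"):
        raise ValueError("unknown on_mismatch mode: %r" % (on_mismatch,))

    problems = []
    if len(axis_labels) != len(shape):
        problems.append("got %d labels for %d axes"
                        % (len(axis_labels), len(shape)))
    if tuple(axis_labels) != tuple(expected_labels):
        problems.append("expected axes %r, got %r"
                        % (tuple(expected_labels), tuple(axis_labels)))

    if not problems:
        return True
    if on_mismatch == "warn":
        warnings.warn("; ".join(problems))
    elif on_mismatch == "raise":
        raise AxisMismatchError("; ".join(problems))
    # "ignore" and "warn" both let the data through.
    return False
```

A stream that expects a batch of images would then call something like `check_axes(batch.shape, labels, ("batch", "channel", "height", "width"), on_mismatch="warn")` before processing each batch.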

Some things that need to be thought about:

  • Do we just use strings, or do we use singletons (allowing us to create a class hierarchy)?
  • Do we want to add dimensionality, i.e. does each axis have a fixed dimensionality (or can it be variable)? This could be useful to check that e.g. an image has exactly 3 colour channels.
    • Longer term, this would also allow for the kind of checking that Pylearn2 performs (e.g. make sure that the data dimension is the same as the input layer/brick).
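As a rough illustration of the two open questions above, the sketch below uses classes as singleton-like axis labels, so that a hierarchy is possible, and pairs each expected axis with an optional size. All names (`Axis`, `Spatial`, `AxisSpec`, `shape_matches`, ...) are made up for this example, and the choice of classes rather than singleton instances is just one assumption about how this could be done.

```python
from collections import namedtuple


class Axis(object):
    """Base class for axis labels; the classes themselves act as labels."""


class Batch(Axis):
    pass


class Channel(Axis):
    pass


class Spatial(Axis):
    pass


class Height(Spatial):
    pass


class Width(Spatial):
    pass


# What a stream expects for one axis: a kind of axis and a size.
# A size of None means the axis can have variable length.
AxisSpec = namedtuple("AxisSpec", ["kind", "size"])


def axis_matches(expected, actual_kind, actual_size):
    """Check one axis against its spec, allowing more specific labels."""
    if not issubclass(actual_kind, expected.kind):
        return False
    return expected.size is None or expected.size == actual_size


def shape_matches(expected_specs, actual_kinds, shape):
    """Check every axis of the data against the expected specs."""
    if not (len(expected_specs) == len(actual_kinds) == len(shape)):
        return False
    return all(axis_matches(spec, kind, size)
               for spec, kind, size in zip(expected_specs, actual_kinds, shape))


# An image with exactly 3 colour channels and any spatial extent.
rgb_image = (AxisSpec(Channel, 3), AxisSpec(Spatial, None),
             AxisSpec(Spatial, None))
```

With this, `shape_matches(rgb_image, (Channel, Height, Width), (3, 32, 32))` holds, while a single-channel image of shape `(1, 32, 32)` is rejected. Plain strings would be simpler, but they could not express that `Height` is a kind of `Spatial` axis without extra bookkeeping.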

This is closely related to mila-iqia/blocks#30
