
Added dtypes to spaces. #512

Merged
155 commits merged into lisa-lab:master on Mar 1, 2014

Conversation

@SuperElectric
Contributor

SuperElectric commented Jan 18, 2014

Summary:

Spaces now (optionally) specify data type in addition to data geometry. The behavior of existing code is unaffected.

Motivation:

All Spaces used to force a cast to float whenever space.np_format_as or space.format_as was called, because Theano did not support other dtypes on the GPU. However, this implicit cast can surprise callers who do not expect it, particularly when the batch is not intended to be processed on the GPU. Moreover, Theano seems likely to support non-float dtypes in the future; Pylearn2 should prepare for that early, before additional code makes the change more difficult.

Methods affected:

  • format_as(batch, target_space)
    If target_space has a dtype, it will cast the formatted batch to that dtype.
  • validate(batch), np_validate(batch)
    If self has a dtype, these throw an exception if the batch's dtype cannot be trivially cast to self.dtype (e.g. if the batch is complex and the space is not).
  • get_origin_batch(batch_size, dtype), make_theano_batch(...), make_shared_batch(...)
    As before, if dtype is specified, these return a batch of that dtype.
    If dtype is omitted (None), they return a batch of self.dtype. If self.dtype is also None, they throw an exception.
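The dtype-casting rule for format_as can be sketched as follows. This is a minimal toy model of the behavior described above, not the actual pylearn2 API: ToySpace and its simplified np_format_as signature are hypothetical stand-ins.

```python
import numpy as np

# Toy model of the rule: a target space with a concrete dtype casts the
# formatted batch to it; a target space with dtype=None leaves it alone.
class ToySpace:
    def __init__(self, dim, dtype=None):
        self.dim = dim
        self.dtype = dtype  # None means "no dtype requirement"

    def np_format_as(self, batch, target_space):
        assert batch.shape[1] == self.dim, "batch geometry must match the space"
        if target_space.dtype is not None:
            return batch.astype(target_space.dtype)
        return batch  # no cast when the target space has no dtype

int_space = ToySpace(3, dtype='int32')
float_space = ToySpace(3, dtype='float32')
untyped_space = ToySpace(3)  # dtype=None: preserves the batch's dtype

batch = np.arange(6, dtype='int32').reshape(2, 3)
print(int_space.np_format_as(batch, float_space).dtype)    # float32
print(int_space.np_format_as(batch, untyped_space).dtype)  # int32
```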

Other changes:

There was a lot of code duplication between format_as() and np_format_as(). The logic of what they do is the same, with only a few differences here and there to accommodate API differences between symbolic and numeric batches. Therefore, these functions now only check whether the batch is symbolic or numeric, before passing it on to their shared implementation method _format_as(). This method only branches on batch type for those few lines that need to, thus eliminating code duplication. Subclasses of Space now only need to override _format_as_impl().
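The wrapper/impl split described above can be sketched like this. The names mirror the PR's description (_format_as, _format_as_impl, is_numeric), but the signatures are simplified and the EchoSpace subclass is a hypothetical illustration; the real pylearn2 methods carry more arguments and do real validation.

```python
# Sketch of the shared-implementation pattern: the public symbolic and
# numeric entry points both funnel into _format_as, and subclasses
# override only _format_as_impl.
class Space:
    def format_as(self, batch, target_space):
        # public symbolic entry point
        return self._format_as(batch, target_space, is_numeric=False)

    def np_format_as(self, batch, target_space):
        # public numeric entry point
        return self._format_as(batch, target_space, is_numeric=True)

    def _format_as(self, batch, target_space, is_numeric):
        # shared logic lives here; only the few lines that must
        # distinguish symbolic from numeric batches branch on is_numeric
        return self._format_as_impl(batch, target_space, is_numeric)

    def _format_as_impl(self, batch, target_space, is_numeric):
        # the only method subclasses need to override
        raise NotImplementedError()

class EchoSpace(Space):
    # toy subclass: tags the batch with the path that was taken
    def _format_as_impl(self, batch, target_space, is_numeric):
        return (batch, 'numeric' if is_numeric else 'symbolic')

space = EchoSpace()
print(space.np_format_as([1, 2], None))  # ([1, 2], 'numeric')
print(space.format_as([1, 2], None))     # ([1, 2], 'symbolic')
```

The same shape applies to the batch_size and validate pairs: one public wrapper per calling convention, one overridable impl method.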

For the same reason, batch_size()/np_batch_size() and validate()/np_validate() now wrap implementation methods _batch_size() and _validate(). Subclasses implement _batch_size_impl() and _validate_impl().

For CompositeSpaces, the dtype can either be simple or composite. A composite dtype is a nested tuple of dtypes. A simple dtype is interpreted as a nested tuple of dtypes where all the leaves are the same.
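The "simple dtype broadcasts over the composite structure" rule can be illustrated with a small helper. expand_dtype is a hypothetical function written for this sketch, not a pylearn2 API; nested tuples stand in for nested CompositeSpaces.

```python
# Illustrative sketch: a simple dtype string is interpreted as a nested
# tuple of dtypes matching the structure, with the same dtype at every
# leaf; a composite (tuple) dtype must match the structure element-wise.
def expand_dtype(dtype, structure):
    """Broadcast a simple dtype over a nested tuple structure."""
    if isinstance(structure, tuple):
        if isinstance(dtype, tuple):
            assert len(dtype) == len(structure), "composite dtype must match structure"
            return tuple(expand_dtype(d, s) for d, s in zip(dtype, structure))
        return tuple(expand_dtype(dtype, s) for s in structure)
    return dtype  # leaf space: a single dtype

# structure: a composite of (vector, (vector, vector)); 'v' marks a leaf
structure = ('v', ('v', 'v'))
print(expand_dtype('float32', structure))
# ('float32', ('float32', 'float32'))
```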

As in Theano, 'floatX' is a legal dtype string. It is internally converted to theano.config.floatX.
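The 'floatX' alias resolution amounts to a one-line substitution. In this sketch theano.config.floatX is stubbed with a constant so the example stands alone; resolve_dtype is a hypothetical helper, not a pylearn2 function.

```python
# Sketch of the 'floatX' resolution rule described above.
FLOATX = 'float64'  # stand-in for theano.config.floatX

def resolve_dtype(dtype):
    """Map the 'floatX' alias to the configured float dtype."""
    return FLOATX if dtype == 'floatX' else dtype

print(resolve_dtype('floatX'))  # float64
print(resolve_dtype('int32'))   # int32
```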

Backwards compatibility:

All existing code should behave as before, because all spaces default to theano.config.floatX as their dtype. Unless a space is constructed with a different dtype argument, it casts all batches to floatX, as before.

SuperElectric added some commits Dec 16, 2013

Committing before maybe changing *format_as() methods to support converting to/from sparse and non-sparse. This will also require changes to how those functions call np_validate() to sanity-check the input batch.
commit before removing format_as / np_format_as dichotomy. If people object to the API change, we can always trivially add it back in. But for now, the two functions are often implemented by duplicating long stretches of code. Better to have a single function with a single chain of logic, that internally switches on the type of the batch (theano symbol, numpy ndarray, or scipy.sparse array) selectively, only on lines that necessitate making such distinctions.
(almost) merged with master. Will be completely merged when I clean up format_as/np_format_as, validate/np_validate. Doing that next.

SuperElectric added some commits Jan 19, 2014

added extra 'is_numeric' argument to _is_validate() _format_as() to disambiguate whether the empty tuple '()' is to be treated as a numeric or symbolic variable
Committing before maybe changing *format_as() methods to support converting to/from sparse and non-sparse. This will also require changes to how those functions call np_validate() to sanity-check the input batch.
fixed bug where space1._format_as(batch, space2) wouldn't cast to space2.dtype if space1 == space2. batch.dtype is allowed to differ from S.dtype, so this was a bug. Also removed code that caused FixedDatasetIterator to cast data to theano.config.floatX (that's now the job of Spaces).
Minor fixes suggested by vdmoulin's first review pass. Also added the files generated by utils/setup.py to the .gitignore list
In pylearn2/training_algorithms/tests/test_sgd.py:

    update_callbacks=None,
    set_batch_size = False)
    algorithm = SGD(learning_rate, cost,
                    batch_size=2,  # TODO: fix test failure when batch_size=1


@SuperElectric

SuperElectric Mar 1, 2014

Contributor

When batch_size is set to 1 (its original value), the test fails. An assert downstream expects the batch to be a Theano matrix, but setting batch_size=1 here causes the batch to be a Theano vector. I will look into this tomorrow, but any suggestions as to what might be happening or where I should look would be very much appreciated.


@bartvm

bartvm Mar 1, 2014

Contributor

Have a look at https://groups.google.com/forum/#!topic/pylearn-dev/tsLzb7LMRp8 and at PRs #632 and #635, which I think fixed this a few days ago.


@SuperElectric

SuperElectric Mar 1, 2014

Contributor

Fantastic; that was exactly the problem. I had unwittingly undone that bugfix during my rebase merges. Thanks for the tip; it would've taken me a while to find that on my own.


@SuperElectric

SuperElectric Mar 1, 2014

Contributor

Looks like Travis may have stalled out. Could someone please restart it?

Thanks,

  • Matt



@bartvm

bartvm Mar 1, 2014

Contributor

Travis seems okay to me? It passes right now: https://travis-ci.org/lisa-lab/pylearn2/builds/19864504

@vdumoulin

Member

vdumoulin commented Mar 1, 2014

Things look good and tests are passing, I'm merging. Thanks a lot!

vdumoulin added a commit that referenced this pull request Mar 1, 2014

@vdumoulin vdumoulin merged commit 9f35091 into lisa-lab:master Mar 1, 2014

1 check passed: The Travis CI build passed
@SuperElectric

Contributor

SuperElectric commented Mar 1, 2014

Awesome!
