Added dtypes to spaces.#512
Conversation
…d utils/__init__.py are now pep8-clean.
…ylint error in utils/__init__.py
…erting to/from sparse and non-sparse. This will also require changes to how those functions call np_validate() to sanity-check the input batch.
…able, composite_space).
…object to the API change, we can always trivially add it back in. But for now, the two functions are often implemented by duplicating long stretches of code. Better to have a single function with a single chain of logic, that internally switches on the type of the batch (theano symbol, numpy ndarray, or scipy.sparse array) selectively, only on lines that necessitate making such distinctions.
…ng them into exceptions, as described here: http://docs.python.org/2/library/warnings.html#testing-warnings
…p format_as/np_format_as, validate/np_validate. Doing that next.
…isambiguate whether the empty tuple '()' is to be treated as a numeric or symbolic variable
…ylint error in utils/__init__.py
…erting to/from sparse and non-sparse. This will also require changes to how those functions call np_validate() to sanity-check the input batch.
…ce2.dtype if space1 == space2. batch.dtype is allowed to differ from S.dtype, so this was a bug. Also removed code that caused FixedDatasetIterator to cast data to theano.config.floatX (that's now the job of Spaces).
…ger forces a cast to floatX. Not done yet.
…fix in FiniteDatasetIterator
… files generated by utils/setup.py to the .gitignore list
…t merge into Master until fixed.
There was a problem hiding this comment.
When batch_size is set to 1 (it's original value), the test fails. An assert downstream expects the batch to be a Theano matrix, but setting batch_size=1 here causes the batch to be a Theano Vector. I will look into this tomorrow, but any suggestions as to what might be happening or where I should look would be very much appreciated.
There was a problem hiding this comment.
Have a look at https://groups.google.com/forum/#!topic/pylearn-dev/tsLzb7LMRp8 and at PRs #632 and #635, which I think fixed this a few days ago.
There was a problem hiding this comment.
Fantastic; that was exactly the problem. I had unwittingly undone that bugfix during my rebase merges. Thanks for the tip; it would've taken me a while to find that on my own.
There was a problem hiding this comment.
Looks like Travis may have stalled out. Could someone please restart it?
Thanks,
- Matt
On Saturday, March 1, 2014, Bart
<notifications@github.comjavascript:_e(%7B%7D,'cvml','notifications@github.com');>
wrote:
In pylearn2/training_algorithms/tests/test_sgd.py:
@@ -1193,11 +1201,13 @@ def test_batch_size_specialization():
cost = DummyCost()
- algorithm = SGD(learning_rate, cost, batch_size=1,
monitoring_batches=1, monitoring_dataset=dataset,termination_criterion=EpochCounter(max_epochs=1),update_callbacks=None,set_batch_size = False)- algorithm = SGD(learning_rate, cost,
batch_size=2, # TODO: fix test failure when batch_size=1Have a look at
https://groups.google.com/forum/#!topic/pylearn-dev/tsLzb7LMRp8 and at
PRs #632 #632 and #635#635,
which I think fixed this a few days ago.—
Reply to this email directly or view it on GitHubhttps://github.com//pull/512/files#r10190707
.
There was a problem hiding this comment.
Travis seems okay to me? It passes right now: https://travis-ci.org/lisa-lab/pylearn2/builds/19864504
|
Things look good and tests are passing, I'm merging. Thanks a lot! |
|
Awesome! On Saturday, March 1, 2014, vdumoulin
|
Summary:
Spaces now (optionally) specify data type in addition to data geometry. The behavior of existing code is unaffected.
Motivation:
All Spaces used to force a cast to float whenever
space.np_format_asorspace.format_aswas called. This was done because Theano didn't support other datatypes on the GPU. However, to those not expecting this feature, it can be a problem, particularly if the batch is not intended to be processed on the GPU. In addition, it seems likely that Theano will support non-float datatypes in the future, for which Pylearn2 should prepare early, before additional code makes doing so more difficult.Methods affected:
format_as(batch, target_space)If target_space has a dtype, it will cast the formatted batch to that dtype.
validate(batch), np_validate(batch):If self has a dtype, this will throw an exception if batch's datatype cannot be trivially cast to self.dtype (e.g. if batch is complex and the space isn't).
get_origin_batch(batch_size, dtype),make_theano_batch(...),make_shared_batch(...)As before, if dtype is specified, this will return a batch of that dtype.
If dtype is omitted (None), these will return a batch of self.dtype. If self.dtype is also None, this will throw an exception.
Other changes:
There was a lot of code duplication between
format_as()andnp_format_as(). The logic of what they do is the same, with only a few differences here and there to accommodate API differences between symbolic and numeric batches. Therefore, these functions now only check whether the batch is symbolic or numeric, before passing it on to their shared implementation method_format_as(). This method only branches on batch type for those few lines that need to, thus eliminating code duplication. Subclasses of Space now only need to override_format_as_impl().For the same reason,
batch_size()/np_batch_size()andvalidate()/np_valiate()now wrap implementation methods_batch_size()and_validate(). Subclasses implement_batch_size_impl()and_validate_impl().For CompositeSpaces, the dtype can either be simple or composite. A composite dtype is a nested tuple of dtypes. A simple dtype is interpreted as a nested tuple of dtypes where all the leaves are the same.
As in Theano, '
floatX' is a legal dtype string. It is internally converted totheano.config.floatX.Backwards compatibility:
All existing code should behave as before. This is because all spaces have
theano.config.floatXas their default dtype. Unless they are constructed with a different dtype argument, they should cast all batches to floatX as before.