chunksize is ignored in TICA #846

fabian-paul · 2016-06-29T12:30:11Z

Apparently setting the chunk size parameter doesn't have any effect in TICA anymore.
I have a long (microsecond) trajectory that I want to use to evaluate a larger set of features. Setting the chunk_size doesn't have any effect on the feature reader as can be seen from the output below.

import pyemma

ft = pyemma.coordinates.featurizer('PMI-amber.pdb')
ft.add_backbone_torsions()
source = pyemma.coordinates.source(['28.xtc'], features=ft, chunk_size=10)
tica = pyemma.coordinates.tica(source, lag=5000)

I added the following print statement here:

class FeatureReaderIterator(DataSourceIterator):
    ...
    def _next_chunk(self):
        ...
        chunk = next(self._mditer)
        shape = chunk.xyz.shape
        print 'chunk shape', chunk.xyz.shape

This is the output I get.

chunk shape (100000, 195, 3)
chunk shape (95000, 195, 3)

I'm not sure how this should be fixed. As a quick fix, I changed https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/coordinates/transform/tica.py#L264
to

it = iterable.iterator(lag=self.lag, return_trajindex=False, chunk=iterable.chunksize)

However this works only, if I instantiate the TICA object directly. When I use the tica() API function, the chunksize is set back to zero.

IMHO this looks very broken to me. There are some lines such as https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/coordinates/api.py#L837 that don't make any sense to me. _param_stage only gets called with its chunk_size parameter left at its default value = 0, so this can only work if the chunk size parameters of the source is also zero.

What's the recommanded way to pass the chunk_size parameter between different transformers? If an iterator is created in _estimate() where should the chunk_size parameter be taken from?

The text was updated successfully, but these errors were encountered:

fabian-paul · 2016-06-29T12:38:35Z

With .get_output() it's the same problem. The ckunksize parameter of the transformer and it's source are not respected.

marscher · 2016-06-29T16:12:10Z

the self.chunk_size property is delegated through the pipeline, so it should be taken from there. Your approach is the correct fix, but do not take the property from the iterable but self instead.

I can not see why the chunksize in Iterable.get_output should not be respected, since it is passed correctly:
https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/coordinates/data/_base/iterable.py#L129

marscher · 2016-06-29T16:25:12Z

And you are right, the handling in param_stage is wrong, since the set chunksize of the reader/datasource is overridden with 0.

Only set it in case the user has overriden it, otherwise take default chunksize from Iterable. Fixes markovmodel#846

marscher · 2016-06-29T16:52:11Z

@fabian-paul can you please confirm that the behaviour correct with the suggested PR?

fabian-paul · 2016-06-30T12:41:48Z

@marscher yes, this solved the problems.

marscher · 2016-06-30T12:46:01Z

Thank you for reporting and testing!

marscher · 2016-06-30T13:35:33Z

@clonker with these changes the test_fragmented_reader_xtc fails with wrong coordinates. Do you have any clue what could be broken now?

marscher · 2016-06-30T13:41:06Z

I'd guess that the chunked mode is somehow broken, since the tested behaviour used always chunksize=0

marscher · 2016-06-30T14:53:33Z

I've found the culprit: in patches.iterload there was no proper setter for chunksize...

cc @clonker

marscher added a commit to marscher/PyEMMA that referenced this issue Jun 29, 2016

[coordinates/api] set chunksize in param_stage for input stage

a415c29

Only set it in case the user has overriden it, otherwise take default chunksize from Iterable. Fixes markovmodel#846

marscher closed this as completed in 582e923 Jun 30, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chunksize is ignored in TICA #846

chunksize is ignored in TICA #846

fabian-paul commented Jun 29, 2016

fabian-paul commented Jun 29, 2016

marscher commented Jun 29, 2016

marscher commented Jun 29, 2016

marscher commented Jun 29, 2016

fabian-paul commented Jun 30, 2016

marscher commented Jun 30, 2016 via email

marscher commented Jun 30, 2016

marscher commented Jun 30, 2016

marscher commented Jun 30, 2016

chunksize is ignored in TICA #846

chunksize is ignored in TICA #846

Comments

fabian-paul commented Jun 29, 2016

fabian-paul commented Jun 29, 2016

marscher commented Jun 29, 2016

marscher commented Jun 29, 2016

marscher commented Jun 29, 2016

fabian-paul commented Jun 30, 2016

marscher commented Jun 30, 2016 via email

marscher commented Jun 30, 2016

marscher commented Jun 30, 2016

marscher commented Jun 30, 2016