This repository has been archived by the owner on Sep 11, 2023. It is now read-only.

[coordinates] handle default_chunksize gracefully. #1251

Merged: marscher merged 15 commits into markovmodel:devel from the chunksize_fixes branch on Feb 16, 2018

marscher
Member

PCA, TICA and VAMP compute their output dimension dynamically, so we need
to handle this when calling the dimension method to compute the desired
number of time steps per chunk.
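
A minimal sketch of the idea described above, not PyEMMA's actual code: the number of time steps per chunk is derived from a byte budget and the per-frame dimension, where the dimension may be a plain integer or a method that is not usable yet. All names here (`frames_per_chunk`, `budget_bytes`, `itemsize`, `fallback_dim`) are hypothetical and chosen only for illustration.

```python
def frames_per_chunk(dim, budget_bytes, itemsize=4, fallback_dim=None):
    """Hypothetical helper: how many frames fit into budget_bytes.

    dim is either an int or a zero-argument callable (a "dynamic"
    dimension) that may raise while the value is not yet available.
    itemsize is the assumed size in bytes of one stored value.
    """
    if callable(dim):
        try:
            n_dim = dim()
        except Exception:
            # Assumption: use a caller-supplied dimension when the
            # dynamic one cannot be evaluated; otherwise re-raise.
            if fallback_dim is None:
                raise
            n_dim = fallback_dim
    else:
        n_dim = dim

    bytes_per_frame = max(1, int(n_dim)) * itemsize
    # Always return at least one frame so iteration can make progress.
    return max(1, budget_bytes // bytes_per_frame)
```

Returning at least one frame is a design choice of this sketch: a budget smaller than a single frame still yields a usable (if slow) chunk size instead of zero.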

The chunksize argument of the API functions is now also passed on when no data is given.

For the FeatureReader we need to cap the passed chunksize, because we use the low-level reading routines of mdtraj.
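
The capping can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: `cap_chunksize`, `chunk_ranges` and `max_frames` are hypothetical names, and treating a missing or non-positive request as "use the cap" is an assumption of the sketch.

```python
def cap_chunksize(requested, max_frames):
    """Hypothetical helper: limit a requested chunk size to max_frames."""
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    if requested is None or requested <= 0:
        # Assumption: no explicit request means "as large as allowed".
        return max_frames
    return min(requested, max_frames)


def chunk_ranges(n_frames, chunksize):
    """Yield (start, stop) frame ranges that cover n_frames in order."""
    for start in range(0, n_frames, chunksize):
        yield start, min(start + chunksize, n_frames)
```

With a cap in place, a very large user-supplied chunksize only changes how many reads are issued, not how much memory a single read can request.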

@euhruska I'd be happy if you could run a test with your massive data set. I have tested it locally on a huge data set, but the more tests we run, the more problems we can detect. Thank you in advance!

pip install git+https://github.com/marscher/pyemma@chunksize_fixes

Fixes #1250

Passing too large a chunksize to mdtraj's read_as_traj function would overflow memory, so we truncate it.

codecov bot commented Feb 16, 2018

Codecov Report

Merging #1251 into devel will increase coverage by <.01%.
The diff coverage is 96.26%.

@@            Coverage Diff            @@
##            devel   #1251      +/-   ##
=========================================
+ Coverage   91.19%   91.2%   +<.01%     
=========================================
  Files         219     219              
  Lines       23646   23709      +63     
=========================================
+ Hits        21565   21624      +59     
- Misses       2081    2085       +4

| Impacted Files | Coverage Δ |
|---|---|
| pyemma/coordinates/transform/vamp.py | 91.15% <ø> (+0.76%) ⬆️ |
| pyemma/coordinates/tests/test_tica.py | 99.32% <100%> (ø) ⬆️ |
| ...mma/coordinates/tests/test_random_access_stride.py | 96.97% <100%> (+0.03%) ⬆️ |
| pyemma/_base/progress/reporter/__init__.py | 88.46% <100%> (+0.18%) ⬆️ |
| pyemma/coordinates/data/sources_merger.py | 95.08% <100%> (+0.08%) ⬆️ |
| pyemma/coordinates/data/feature_reader.py | 95.04% <100%> (-1.27%) ⬇️ |
| pyemma/coordinates/tests/test_vamp.py | 98.34% <100%> (+0.02%) ⬆️ |
| pyemma/coordinates/data/_base/datasource.py | 91.91% <100%> (+0.33%) ⬆️ |
| pyemma/coordinates/api.py | 87.91% <100%> (+0.41%) ⬆️ |
| ...mma/coordinates/tests/test_coordinates_iterator.py | 99.08% <100%> (+0.03%) ⬆️ |
| ... and 16 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 38fbb6e...d1f4b31.

@marscher marscher merged commit 78dae90 into markovmodel:devel Feb 16, 2018
@marscher marscher deleted the chunksize_fixes branch February 16, 2018 14:59
Successfully merging this pull request may close these issues.

chunk was smaller than time-lagged chunk