[coor/save_traj]: building of an index file? #788
Comments
We have to find a suitable solution soon.
In the meantime try reverting this change on your local branch:
git revert 9363956
|
I agree. We should build the index lazily (i.e. just remember what you Is this in a release or just devel? On 28/04/16 at 18:34, Guillermo Pérez-Hernández wrote:
Prof. Dr. Frank Noe Phone: (+49) (0)30 838 75354 Mail: Arnimallee 6, 14195 Berlin, Germany |
Actually, I can prepare a PR that gives the option to choose between frames_from_files and frames_from_file here: PyEMMA/pyemma/coordinates/api.py Line 643 in b5d9d6a |
That would cure it. Some optarg of save_traj along the lines of "no_cache=False". |
I think defaults should behave well. Having bad behavior that you can On 28/04/16 at 18:47, Guillermo Pérez-Hernández wrote:
|
Agreed, but the thing is we don't know a priori what's "good" or "bad". In my case it is disastrous, but in others it may not be. I don't have a strong opinion here... it's really hard to know a priori, I think |
I think it's clear. We are working with mass data, so no proactive On 28/04/16 at 18:54, Guillermo Pérez-Hernández wrote:
|
Got it! What about sth like this?
|
This issue really has nothing to do with this particular function. It is On 28/04/16 at 18:57, Guillermo Pérez-Hernández wrote:
|
Well, then that is a much bigger issue; I guess you have to ping @clonker and talk design. I wrote the original save_traj to be able to work with lists of files in case the user has not built (or doesn't want to build) readers... but that got changed along the way |
@marscher, I am not getting the revert to work...am I missing something?
|
plus, my proposed solution (#788 (comment)) would not be of any use, actually... |
What if we only index the files from which frames are actually going to be extracted?
We would just filter the given indices to drop the files that are not needed.
This would then only trigger pre-indexing of the needed files, not of the whole list.
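To make the filtering idea concrete, here is a minimal sketch. It assumes the indices are (file_index, frame_index) pairs; the helper name `restrict_to_needed_files` and the file names are made up for illustration and are not part of PyEMMA.

```python
def restrict_to_needed_files(filenames, indexes):
    """Drop files that no index refers to and renumber the file column.

    filenames: list of trajectory file names (hypothetical here).
    indexes: iterable of (file_index, frame_index) pairs.
    """
    # file ids that are actually referenced, in ascending order
    needed = sorted({file_idx for file_idx, _ in indexes})
    # map each old file id to its position in the reduced file list
    new_position = {old: new for new, old in enumerate(needed)}
    needed_files = [filenames[i] for i in needed]
    new_indexes = [(new_position[f], frame) for f, frame in indexes]
    return needed_files, new_indexes


files = ["a.xtc", "b.xtc", "c.xtc", "d.xtc"]
frames = [(3, 10), (1, 5), (3, 2)]
needed_files, new_indexes = restrict_to_needed_files(files, frames)
print(needed_files)  # ['b.xtc', 'd.xtc']
print(new_indexes)   # [(1, 10), (0, 5), (1, 2)]
```

Only the two files in `needed_files` would then be handed to a reader, so only those would have to be indexed; the order of the requested frames is preserved.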
|
I suppose that makes sense. For xtc you'd anyway have to index the file Can you try it out and @gph82 checks how this behaves on his data? On 29/04/16 at 13:26, Martin K. Scherer wrote:
|
Yeah I'll try it out. Still think that it might be a bit overkill. I don't think there's a I can think of scenarios where one would still choose not to build an On 04/29/2016 01:29 PM, Frank Noe wrote:
Dr. Guillermo Pérez-Hernández http://userpage.fu-berlin.de/gph82/ |
If files have been indexed, there is no reason not to use the On 29/04/16 at 13:41, Guillermo Pérez-Hernández wrote:
|
What's the situation with this issue? |
I'm resolving the last problem on my bug fixing branch.
|
ok! On 09/05/16 at 11:51, Martin K. Scherer wrote:
|
Still some hassle to solve with the FragmentedReader. It seems it collects the frames in the wrong order. I've implemented a special case for the FeatureReader and also for the FragmentedReader to return mdtraj.Trajectory objects (to have unit cell information in place). I think @clonker and I can sort this out tomorrow. |
ok, thanks On 12/05/16 at 18:57, Martin K. Scherer wrote:
|
If a reader has to be constructed within coordinates.save_traj(s), we only pass the file names actually needed to extract frames from to the reader. This fixes #788 (index unneeded files).
Involved changes:
* [featurereader] added option to return plain mdtraj.Trajectory objects
* [fragmentedreader] added reader_by_filename property
* [coor/api] handle chunk_size argument correctly (0 evaluates to False).
* default chunksize 1000
* [patches] added setitem method for Trajectory objs
* [reader_utils] def chunksize=1000, added setitem method for empty_traj created objs.
* [fragreader] use setitem method for traj objects collected by underlying featurereaders
* [datasource] handle too large RA indices for stride
* added create_traj in test.util.
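The "0 evaluates to False" item in the change list refers to a common Python pitfall. The commit message does not show the actual code, so the following is only a sketch of the general pattern; the function names are hypothetical, and only the default of 1000 is taken from the change list.

```python
DEFAULT_CHUNKSIZE = 1000  # default value named in the change list


def resolve_chunksize_buggy(chunk_size=None):
    # 0 is falsy, so an explicit chunk_size=0 is silently replaced by the default
    return chunk_size or DEFAULT_CHUNKSIZE


def resolve_chunksize(chunk_size=None):
    # fall back to the default only when the caller passed nothing at all
    if chunk_size is None:
        return DEFAULT_CHUNKSIZE
    return chunk_size


print(resolve_chunksize_buggy(0))  # 1000
print(resolve_chunksize(0))        # 0
```

Testing against `None` keeps an explicit 0 intact, whatever 0 is meant to signal to the reader.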
Hi, I don't know where this was introduced, but this was unknown to me:
save_traj relies upon indexing xtcs before carrying out its frame-of-file extraction.
If the index has not been constructed before, this is definitely NOT the moment to do it, imho. Notice that I have to wait 20h+ to extract one single frame.
I think that at least the possibility should be offered to skip reader construction (PyEMMA/pyemma/coordinates/data/util/frames_from_file.py, Line 35 in b5d9d6a).
This is a REAL blocker for me now...