[WIP] Reference by UUIDs #443

Closed
wants to merge 51 commits into from

jhprinz
Contributor

@jhprinz jhprinz commented Mar 8, 2016

This implements the option to use UUIDs to reference objects in storage. Normally, when objects are pickled, they are referenced in a storage by the type of the object and its index in the store.

The user can now either keep the old (faster and simpler) way of referencing or switch to UUIDs. Because searching for UUIDs adds significant overhead, this does not make sense for ToyEngine on a local machine. For large systems, the overhead is negligible.
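To make the two referencing schemes concrete, here is a minimal sketch. The `Store` class and its method names are hypothetical stand-ins for illustration and are not the netcdfplus API; the sketch only contrasts a (store name, index) reference with a UUID reference.

```python
import uuid


class Store:
    """Toy in-memory store (hypothetical, not the netcdfplus API)."""

    def __init__(self, name):
        self.name = name
        self.objects = []   # position in this list is the integer index
        self.by_uuid = {}   # UUID -> object, used for UUID-based lookup

    def save(self, obj):
        """Store obj and return both kinds of reference to it."""
        obj_uuid = uuid.uuid4()
        self.objects.append(obj)
        self.by_uuid[obj_uuid] = obj
        index_ref = (self.name, len(self.objects) - 1)
        return index_ref, obj_uuid

    def load_by_index(self, ref):
        """Index reference: only meaningful inside this one store."""
        name, idx = ref
        if name != self.name:
            raise ValueError("reference belongs to a different store")
        return self.objects[idx]

    def load_by_uuid(self, obj_uuid):
        """UUID reference: independent of the object's position."""
        return self.by_uuid[obj_uuid]


store = Store("snapshots")
index_ref, uuid_ref = store.save({"coordinates": [0.0, 1.0]})
```

The index reference is short and cheap to resolve but depends on the object's position in one particular store, whereas the UUID reference stays valid wherever the object ends up.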

The benefit is that you can load, save, and analyze objects independently of the storage. I will add the possibility of a split storage, so you can also run jobs independently and later join the data into one big dataset while maintaining the connections between objects.
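As a rough illustration of why UUIDs make joining possible, the sketch below merges two plain dictionaries keyed by UUID. The `join_stores` helper and the dictionary layout are assumptions made for this example, not functions from this PR.

```python
import uuid


def join_stores(*stores):
    """Merge UUID-keyed stores into one dict (hypothetical helper).

    An object that appears in several stores has the same UUID in each,
    so it is kept only once.
    """
    joined = {}
    for store in stores:
        for key, obj in store.items():
            joined.setdefault(key, obj)
    return joined


# Two stores written independently, e.g. by separate jobs.
snap_id = uuid.uuid4()
traj_id = uuid.uuid4()
job_a = {snap_id: {"kind": "snapshot", "coordinates": [0.0, 1.0]}}
job_b = {traj_id: {"kind": "trajectory", "snapshots": [snap_id]}}

joined = join_stores(job_a, job_b)

# The trajectory's reference to its snapshot still resolves after joining.
first_snapshot = joined[joined[traj_id]["snapshots"][0]]
```

With integer indices, the same merge would require renumbering every reference, because index 0 in one file and index 0 in another point to different objects.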

Missing

Currently, stored and cached CVs cannot be saved in a file that contains no snapshots. The split utility function will hence only work for systems where snapshots use kinetics and statics for storage. The split into two files works such that one file contains all trajectories, snapshots, statics, and kinetics, while the other contains everything except statics and kinetics.

There is a utility function to split Storages and a function to join them.

  • update RemoteKernel for an example
  • implement DistributedStorage
  • implement file merger
  • implement file split
  • example where we run independent simulations and analyze the joint results?

Move to other PRs

@jhprinz jhprinz changed the title from "UUIDs" to "[WIP] Reference by UUIDs" Mar 8, 2016
@jhprinz
Contributor Author

jhprinz commented Mar 8, 2016

First step towards an adaptive sampling scheme.

@dwhswenson
Member

> First step towards an adaptive sampling scheme.

How is this related to adaptive sampling?

@jhprinz
Contributor Author

jhprinz commented Mar 9, 2016

Because I might use it later for adaptive sampling :) It has more to do with what I need in order to run adaptive sampling on a cluster. So it is a step toward making OPS more usable for the MSM_TIS adaptive sampling...

@jhprinz
Contributor Author

jhprinz commented Mar 9, 2016

Hmmm, this was more work than expected. I now have something working where you can select how netcdfplus handles internal references: either by UUID or by integer reference. UUID referencing is disabled by default, so you can switch to multifile support if you want to. Using UUIDs makes objects unique across multiple files but adds some overhead, although the tests run in almost the same time. The mstis_analysis takes 1.5 additional seconds for 500 MC steps. I guess the benefit for large systems outweighs this, but for small and test systems the current implementation (single file, short references) is much better.

An additional UUID per object is also required. For a million objects this (currently) adds about 36 MB. I can get this down to about 16 MB by storing the UUID as bytes rather than as a string. The implementation can also be improved, but in general this works.
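The storage figures follow from the two standard UUID encodings: the canonical string form is 36 characters, while the raw form is 16 bytes. A small sketch using Python's standard `uuid` module (variable names are illustrative only, and any per-entry overhead of the file format is ignored):

```python
import uuid

n_objects = 1_000_000
u = uuid.uuid4()

# Canonical string form: 32 hex digits plus 4 hyphens.
string_size = len(str(u))
# Raw form: the 128-bit value packed into bytes.
bytes_size = len(u.bytes)

string_total = n_objects * string_size  # bytes needed for string UUIDs
bytes_total = n_objects * bytes_size    # bytes needed for raw UUIDs

# The raw form loses no information; the UUID can be rebuilt from it.
restored = uuid.UUID(bytes=u.bytes)
```

Storing the raw bytes therefore saves 20 bytes per object without changing which objects the references identify.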

@jhprinz
Contributor Author

jhprinz commented Mar 13, 2016

Trying single trajectory files now. Seems to work very nicely and should make the distributed trajectory generation simpler.

@dwhswenson dwhswenson added this to the 1.0 milestone Jun 2, 2016
@jhprinz
Contributor Author

jhprinz commented Jun 23, 2016

Closes #98

@jhprinz jhprinz mentioned this pull request Jul 12, 2016
36 tasks
@jhprinz jhprinz deleted the uuid branch October 6, 2016 12:05
2 participants