
Allow output types #28

Merged
merged 24 commits into from
Mar 27, 2017

Conversation

jhprinz
Contributor

@jhprinz jhprinz commented Mar 22, 2017

This implements all of the discussion in #23 .

You can now create arbitrary combinations of selection/stride outputs for your engine, and run, extend, etc.

The only missing point is that PyEMMA will need either a reduced PDB or a selection string to work with subsets of atoms rather than the full atom set (my PyEMMA script computed backbone angles, which is difficult without a topology).

This is pretty neat now. There is also intelligent handling of frame numbers, etc.

More tomorrow...

New features

  • output types: An engine has output_types that you can add. These contain information about striding and selections (atom subsets). You can have an arbitrary set of output_types; typically you would have a master output with the full selection and some stride, plus a subset such as protein at the native stride:

    engine.add_output_type('master', 'master.dcd', stride=10)
    engine.add_output_type('protein', 'protein.dcd', stride=1, selection='protein')
    
  • Trajectory objects now require an engine property. I first thought to set this when you actually run a trajectory, but it makes sense to set it upon creation. The engine contains information about the topology and the output types, so the trajectory is useless without this information. It also means that you can create the task directly from the trajectory; there are .run and .extend methods for that now.

    task = project.new_trajectory(pdb_file, 100, engine).run()
    
  • The engine now has two commands, .run() and .extend(), instead of the long names for generating tasks. These are the same for all engines.

  • pyemma feature support: This is tricky. I added a way to express PyEMMA features: you convert the calls of featurizer.add_[something](arg1, arg2, ...) into a dict like {'add_[something]': [arg1, arg2]}, where the args can again be calls to the featurizer object. This allows basic featurizer construction. If you really need something fancy you have to write your own Analysis class.
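To illustrate the idea, here is a toy recorder that turns featurizer-style calls into that dict form (an illustrative sketch only, not the AdaptiveMD implementation; here one dict per call, collected in a list):

```python
class FeatureSpec(object):
    """Toy recorder: intercepts add_* calls and stores them as dicts."""

    def __init__(self):
        self.spec = []

    def __getattr__(self, name):
        # only featurizer-style add_* methods are recorded
        if not name.startswith('add_'):
            raise AttributeError(name)

        def record(*args):
            # store the call as {'add_something': [arg1, arg2, ...]}
            self.spec.append({name: list(args)})
            return self

        return record


f = FeatureSpec()
f.add_backbone_torsions()
f.add_distances([[0, 10], [2, 20]])

print(f.spec)
# [{'add_backbone_torsions': []}, {'add_distances': [[[0, 10], [2, 20]]]}]
```

Such a plain-dict specification can be serialized to the database and replayed against a real featurizer on the worker side.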

@jhprinz
Contributor Author

jhprinz commented Mar 22, 2017

I realized that some PR closed #23. So let's continue here.

@jhprinz
Contributor Author

jhprinz commented Mar 22, 2017

I still need to update the examples. But after that we are good to go and have all the features that we wanted.

@franknoe
Collaborator

franknoe commented Mar 22, 2017 via email

@jhprinz jhprinz changed the title [WIP] Allow output types Allow output types Mar 22, 2017
@jhprinz
Contributor Author

jhprinz commented Mar 22, 2017

So, examples are up. Please have a look! @nsplattner @thempel @franknoe

I think this is much more powerful now. I will update the docs some more and work on a decent webpage.

@franknoe
Collaborator

franknoe commented Mar 22, 2017 via email

@jhprinz jhprinz mentioned this pull request Mar 22, 2017
@thempel
Member

thempel commented Mar 23, 2017

I just tested this PR as described in the tutorial updated in #34. The following happens when I add the engine to the project generators. Did I miss something?

>>> project.generators.add(engine)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-71-d101ab5a5a33> in <module>()
----> 1 project.generators.add(engine)

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/bundle.pyc in add(self, item)
    321         if self._set is not None and item not in self._set:
    322             logger.info('Added file of type `%s`' % item.__class__.__name__)
--> 323             self._set.save(item)
    324 
    325     @property

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/object.pyc in save(self, obj)
    703 
    704         try:
--> 705             self._save(obj)
    706             self.cache[uuid] = obj
    707 

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/object.pyc in _save(self, obj)
    488 
    489     def _save(self, obj):
--> 490         dct = self.storage.simplifier.to_simple_dict(obj)
    491         self._document.insert(dct)
    492         obj.__store__ = self

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in to_simple_dict(self, obj, base_type)
    524             '_cls': obj.__class__.__name__,
    525             '_obj_uuid': str(UUID(int=obj.__uuid__)),
--> 526             '_dict': self.simplify(obj.to_dict(), base_type),
    527             '_id': str(UUID(int=obj.__uuid__)),
    528             '_time': int(obj.__time__)}

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    557                         '_store': store.name}
    558 
--> 559         return super(UUIDObjectJSON, self).simplify(obj, base_type)
    560 
    561     def build(self, obj):

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    166             else:
    167                 result = {
--> 168                     key: self.simplify(o) for key, o in obj.iteritems()
    169                     if key not in self.excluded_keys
    170                 }

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in <dictcomp>((key, o))
    167                 result = {
    168                     key: self.simplify(o) for key, o in obj.iteritems()
--> 169                     if key not in self.excluded_keys
    170                 }
    171 

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    557                         '_store': store.name}
    558 
--> 559         return super(UUIDObjectJSON, self).simplify(obj, base_type)
    560 
    561     def build(self, obj):

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    145                 return None
    146         elif type(obj) is list:
--> 147             return [self.simplify(o, base_type) for o in obj]
    148         elif type(obj) is tuple:
    149             return {'_tuple': [self.simplify(o, base_type) for o in obj]}

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    557                         '_store': store.name}
    558 
--> 559         return super(UUIDObjectJSON, self).simplify(obj, base_type)
    560 
    561     def build(self, obj):

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    125                         '_cls': obj.__class__.__name__,
    126                         '_obj_uuid': str(UUID(int=obj.__uuid__)),
--> 127                         '_dict': self.simplify(obj.to_dict(), base_type)}
    128                 else:
    129                     return {

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    557                         '_store': store.name}
    558 
--> 559         return super(UUIDObjectJSON, self).simplify(obj, base_type)
    560 
    561     def build(self, obj):

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    166             else:
    167                 result = {
--> 168                     key: self.simplify(o) for key, o in obj.iteritems()
    169                     if key not in self.excluded_keys
    170                 }

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in <dictcomp>((key, o))
    167                 result = {
    168                     key: self.simplify(o) for key, o in obj.iteritems()
--> 169                     if key not in self.excluded_keys
    170                 }
    171 

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in simplify(self, obj, base_type)
    552                 if not obj._ignore:
    553                     store = self.storage._obj_store[obj.__class__]
--> 554                     store.save(obj)
    555                     return {
    556                         '_hex_uuid': hex(obj.__uuid__),

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/object.pyc in save(self, obj)
    703 
    704         try:
--> 705             self._save(obj)
    706             self.cache[uuid] = obj
    707 

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/object.pyc in _save(self, obj)
    488 
    489     def _save(self, obj):
--> 490         dct = self.storage.simplifier.to_simple_dict(obj)
    491         self._document.insert(dct)
    492         obj.__store__ = self

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/dictify.pyc in to_simple_dict(self, obj, base_type)
    524             '_cls': obj.__class__.__name__,
    525             '_obj_uuid': str(UUID(int=obj.__uuid__)),
--> 526             '_dict': self.simplify(obj.to_dict(), base_type),
    527             '_id': str(UUID(int=obj.__uuid__)),
    528             '_time': int(obj.__time__)}

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/file.pyc in to_dict(self)
    368     def to_dict(self):
    369         ret = super(File, self).to_dict()
--> 370         if self._file:
    371             ret['_file_'] = base64.b64encode(self._file)
    372 

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/syncvar.pyc in __get__(self, instance, owner)
     38             if instance.__store__ is not None:
     39                 idx = self._idx(instance)
---> 40                 value = self._update(instance.__store__, idx)
     41                 self.values[instance] = value
     42                 return value

/storage/mi/thempel/adaptive_MD_test/adaptivemd/adaptivemd/mongodb/syncvar.pyc in _update(self, store, idx)
     23         if store is not None:
     24             return store._document.find_one(
---> 25                 {'_id': idx}).get(self.name)
     26 
     27         return None

AttributeError: 'NoneType' object has no attribute 'get'

@jhprinz
Contributor Author

jhprinz commented Mar 23, 2017

It looks like you did not delete the project before restarting. Some of the internals have changed. Try

Project.delete(proj_name)

If that does not help could you post your engine definition?

What the message says is that you are trying to access an attribute of an object that is marked as stored but does not exist in the DB. That can happen if you reuse a PDB file, e.g. after deleting the project.

@thempel
Member

thempel commented Mar 23, 2017

Ahh, thanks, I tried this but probably mixed something up. Now this error is resolved, but followed by another one:
DocumentTooLarge: BSON document too large (20205194 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.
I'm using the same system as before and never had problems loading it into the DB. It looks like it is loading everything twice.

engine.items()

[('pdb_file_stage', 'init_adaptive.pdb'),
 ('integrator_file', 'integrator.xml'),
 ('_executable_file', 'openmmrun.py'),
 ('system_file_stage', 'system.xml'),
 ('pdb_file', 'init_adaptive.pdb'),
 ('integrator_file_stage', 'integrator.xml'),
 ('_executable_file_stage', 'openmmrun.py'),
 ('system_file', 'system.xml')]

@jhprinz
Contributor Author

jhprinz commented Mar 23, 2017

well, strange... let me see...

@jhprinz
Contributor Author

jhprinz commented Mar 23, 2017

All files were stored twice before, but only the ones without _stage have content. Could you check that?

for k, v in engine.items():
    print len(v._file) if v._file is not None else 0

This works fine for me. So, I suspect that there is something else getting large.

@jhprinz
Contributor Author

jhprinz commented Mar 23, 2017

Can you compare the file sizes with the original files? Just to make sure there is no overhead?

@thempel
Member

thempel commented Mar 23, 2017

This seems to work and also the files on disc show the same number of characters. They have a total size of 11.5 M on disc, so it should be fine.

>>> for k, v in engine.items():
>>>    print v.short, len(v._file) if v._file is not None else 0

staging:///init_adaptive.pdb 0
file://{}/integrator.xml 117
file://{}/openmmrun.py 8828
staging:///system.xml 0
file://{}/init_adaptive.pdb 2204265
staging:///integrator.xml 0
staging:///openmmrun.py 0
file://{}/system.xml 8659243

@thempel
Member

thempel commented Mar 23, 2017

Just scrolled through the above files in my notebook; their content seems fine. Is there anything else being copied?

@thempel thempel mentioned this pull request Mar 23, 2017
@jhprinz
Contributor Author

jhprinz commented Mar 23, 2017

That is the question. When exactly did this error happen? I assume you ran the setup from the top with the PDB, system.xml, etc., and then got the error when storing the engine? So it cannot have been caused by some other files, right? There are no other files present.

@jhprinz jhprinz mentioned this pull request Mar 23, 2017
@jhprinz
Contributor Author

jhprinz commented Mar 23, 2017

Found the bug/storage inefficiency. The file is really stored twice, which is definitely not intended. I will issue a quick fix. Still, we should make it use the new storage option.

@jhprinz
Contributor Author

jhprinz commented Mar 23, 2017

Wow, this was a real tough one. It involves the fact that weakref.WeakKeyDictionary uses hashing, which depends on the pymongo _id, which in my implementation is set only after object creation. Due to the change of the hash you cannot find the same object in the WeakKeyDictionary anymore... I should give the next seminar on that one...

No idea, how I found this one. That was probably the most hidden error so far...
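The gist of the bug can be reproduced in a few lines of plain Python. The Doc class below is a toy stand-in for the stored objects (not the actual AdaptiveMD code): its hash depends on an _id that is only assigned after the object has already been cached.

```python
import weakref


class Doc(object):
    """Toy stand-in for a stored object whose hash depends on its DB id."""

    def __init__(self):
        self._id = None  # assigned only once the object is "saved"

    def __hash__(self):
        # the hash changes as soon as _id is set after creation
        return hash(self._id)

    def __eq__(self, other):
        return self is other


cache = weakref.WeakKeyDictionary()

doc = Doc()
cache[doc] = 'cached value'  # inserted under hash(None)

doc._id = 42                 # "saving" assigns the id and changes the hash

print(doc in cache)          # False: the lookup now probes under hash(42)
print(len(cache))            # 1: the entry is still there, just unreachable
```

The entry is never garbage-collected while doc is alive, but lookups by key silently fail, which is why the object appeared "missing" and got stored a second time.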

Unfortunately, #35 contains the fix and also allows storing arbitrarily large files now.

The problem is that when I merge #35, this one will be merged as well... So let's at least finish this discussion. The additions from #35 are extra features, while this PR changes the general concept of trajectories...

@franknoe
Collaborator

I like the description of this task very much. The only point that concerns me a little bit is the last point (how to featurize), because it creates a relatively hard dependency on PyEMMA and our current naming conventions. There are two issues with this: (1) If you always depend on PyEMMA, this makes the dependencies very heavy (e.g. you also depend on things like matplotlib which are clearly irrelevant for this package) and many dependencies also means there are many ways for the package to break down if dependencies change. (2) Although we don't have a concrete plan for that, it is not impossible that the look+feel of PyEMMA featurization will change at some point. I know there are some deficiencies with the current one.

To address that, please check where you actually depend on PyEMMA and if possible find a way to make that dependency optional to your package, i.e. if the user doesn't need a certain functionality (e.g. writes their own analysis class), it shouldn't automatically install PyEMMA.
For the second point: since you basically have to look up the PyEMMA API in order to write this pseudocode anyway, why not just use the PyEMMA function names directly (with the 'add_')? In any case this needs to be clearly documented, i.e. add a link to the PyEMMA featurizer in the present API docs.

Looking at the examples now...

@jhprinz
Contributor Author

jhprinz commented Mar 27, 2017

merging this

@jhprinz jhprinz merged commit 226e17e into markovmodel:master Mar 27, 2017