Remark/comments field #32
Comments
For many use cases I agree with the possibility of a generic string field. Sounds light-weight and generic enough. For quantities attached to residues and atoms on the other hand (e.g. model quality numbers), it might be nicer to have a standardized way to attach a list of numbers into the mmtf file so that any viewer could color the structure according to one of those quantities... |
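As a rough illustration of the coloring use case, here is a minimal Python sketch that maps a per-atom (or per-residue) list of numbers onto a blue-to-red ramp. The helper name `values_to_colors` and the `0xRRGGBB` integer encoding are assumptions made for this example; they are not part of MMTF or of any viewer's API.

```python
def values_to_colors(values):
    """Map numbers linearly onto a blue (low) to red (high) ramp.

    Returns one 0xRRGGBB integer per input value.
    Hypothetical helper, not part of any MMTF library.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    span = hi - lo
    colors = []
    for v in values:
        # A constant list has no range to spread over; use the ramp midpoint.
        t = 0.5 if span == 0 else (v - lo) / span
        red = round(255 * t)
        blue = 255 - red
        colors.append((red << 16) | blue)
    return colors
```

A viewer would only need to know which list to read and how many entries to expect, which is the argument for a standardized place to store such lists.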
That would be nice too... I guess 3 quick ideas:
option 3 gets a little complicated with statically typed languages, but is probably the better option. Some keys could be standardized keys like |
+1 for option 3. A convention for non-standard keys would also be useful; this could prevent name clashes with future standard keys. E.g. if standard keys never use underscores, then an |
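One way such a convention could be checked in code is sketched below. It assumes that standard keys never contain an underscore and that custom keys take the form `<prefix>_<name>`, as the `pymol_colorList`-style keys later in this thread do. The helper name `split_custom_key` is hypothetical.

```python
def split_custom_key(key):
    """Return (prefix, name) for a custom key, or None for a standard key.

    Assumes the convention that standard keys contain no underscore and
    custom keys are written as '<prefix>_<name>'.
    """
    prefix, sep, name = key.partition("_")
    if not sep:
        return None  # no underscore: treated as a standard key
    if not prefix or not name:
        raise ValueError(f"malformed custom key: {key!r}")
    return prefix, name
```

With this rule a reader can tell `xCoordList` (standard) from `pymol_colorList` (custom, owned by `pymol`) without a registry of key names.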
speaking of custom keys: PyMOL 2.1 exports MMTF files with two custom keys: |
Perfect, now I know someone else would use this :p
I guess the only thing to watch out for is that we might have |
This is sort of a more formal proposal for a comments field: It seems that other developers and I are eager to append application-specific information to our mmtf files, so having this become part of the standard would be very helpful, and would save a lot of rewriting once/if it does eventually become part of the standard. Does anyone have any objections to this sort of implementation? An example implementation for C++ is available at rcsb/mmtf-cpp#15
extraData
This is a field to store any extra mmtf-associated data. It is packed as a msgpack object and therefore could contain anything; it is up to you (the developer) how you would like to store / pack / read data. It is sort of the equivalent of the pdb
However, we would recommend that you use the format
We do request that, when using the MAP format described above, you adhere to the following standardized
more to be decided?
|
Regarding the key, did you imply a convention regarding the prefix, e.g., structureKey (len of 1) |
I wasn't really meaning to, but we could if other people like that! definitely makes sense to me! |
How about an explicit convention by specifying data (or properties?) for
structure, model, chain, group, atom, and bond-level information that must
have a matching number of records.
- structureProperties (1)
- modelProperties (len numModels)
- chainProperties (len numChains)
- groupProperties (len numGroups)
- atomProperties (len numAtoms)
- bondProperties (len numBonds)
Data (properties) that don't fit into the categories above, would go into
extraProperties.
-Peter
|
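The "matching number of records" rule above can be sketched as a small check. This assumes each "...Properties" field is a map from a key to a list, which is one possible reading of the proposal; the names `COUNT_FIELD` and `find_length_mismatches` are made up for this example.

```python
# Count field that each proposed "...Properties" map would have to match.
# structureProperties holds single values, so it has no count field here.
COUNT_FIELD = {
    "modelProperties": "numModels",
    "chainProperties": "numChains",
    "groupProperties": "numGroups",
    "atomProperties": "numAtoms",
    "bondProperties": "numBonds",
}


def find_length_mismatches(data):
    """Return (level, key, actual_len, expected_len) for every bad list."""
    problems = []
    for level, count_field in COUNT_FIELD.items():
        props = data.get(level)
        if not props:
            continue  # the properties maps are optional
        expected = data[count_field]
        for key, values in props.items():
            if len(values) != expected:
                problems.append((level, key, len(values), expected))
    return problems
```

An empty result means every list has the expected number of records; anything in extraProperties is deliberately left unchecked.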
A "best practice" naming convention sounds reasonable. @pwrose do you mean that each of those "...Properties" fields would itself contain a msgpack-map with key, value pairs? Doesn't sound too bad actually. Would make it very easy to have generic parsers of it for visualizations or so (could even work in strongly-typed languages like C++). In that case though I would propose to get rid of "extraData" and have those "...Properties" as optional fields at the top-level of the MMTF hierarchy. Otherwise we introduce an extra level of complexity (also there is currently no case of optional fields outside of the top-level of the MMTF hierarchy). |
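To illustrate why optional top-level fields keep generic parsers simple, here is a minimal Python sketch that walks whichever "...Properties" maps are present in an already decoded MMTF map. The field names follow the proposal in this thread; `PROPERTY_FIELDS` and `iter_properties` are hypothetical names, and the input is assumed to be a plain dict.

```python
PROPERTY_FIELDS = (
    "structureProperties",
    "modelProperties",
    "chainProperties",
    "groupProperties",
    "atomProperties",
    "bondProperties",
    "extraProperties",
)


def iter_properties(data):
    """Yield (field, key, value) for every entry in the optional maps."""
    for field in PROPERTY_FIELDS:
        # Missing fields are simply skipped, since all of them are optional.
        for key, value in data.get(field, {}).items():
            yield field, key, value
```

Because the fields sit at the top level, the reader needs one lookup per field and no knowledge of an intermediate container.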
@pwrose and @gtauriello - if I followed you correctly, example data could look like this:
data = {
    "mmtfVersion": "1.1",
    "numAtoms": 999,
    "numModels": 2,
    "numChains": 4,
    ...
    "xCoordList": [1.2, 3.4, ...],
    "yCoordList": [5.6, 7.8, ...],
    "zCoordList": [9.0, 1.2, ...],
    ...
    "structureProperties": {
        "foo_id": "ABC",
    },
    "modelProperties": {
        # lists have len numModels=2
        "foo_rmsdList": [0.5, 0.8],
        "foo_scoreList": [1.2, 3.4],
    },
    "chainProperties": {
        # lists have len numChains=4
        "foo_uniprotIdList": ["HBB_HUMAN", "HBA_HUMAN", "HBB_HUMAN", "HBA_HUMAN"],
        "foo_chainColorList": [0xFF0000, 0x00FF00, 0xFF0000, 0x00FF00],
    },
    "groupProperties": {
        # lists have len numGroups
        "stride_secStructList": [7, 7, 7, ...],
        "sst_secStructList": [7, 7, 7, ...],
    },
    "atomProperties": {
        # lists have len numAtoms=999
        "pymol_colorList": [1, 2, 3, ...],
        "pymol_repsList": [1, 1, 1, ...],
        "apbs_chargeList": [0.1, -0.4, 0.7, ...],
        "apbs_radiusList": [1.2, 1.8, 1.5, ...],
    },
    "bondProperties": {
        # lists have len numBonds
        "pymol_bondTypeList": [1, 1, 1, 4, 4, 4, 4, 4, 4, 1, ...],
    },
    "extraProperties": {
        "pymol_bondTypes": {0: "metal", 1: "single", 2: "double", 3: "triple", 4: "aromatic"}
    },
}
|
Yes, that's a good example of what I had in mind.
|
I like it! |
@danpf The entries contained in the map can still be generic msgpack objects. So it doesn't really simplify parsing in statically typed languages apart from being able to get the keys (which is good I guess). Either way a bit of structure might be good and it's not a big restriction to prescribe that we expect key (string) / value (any object) pairs for extra properties. |
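The key (string) / value (any object) restriction is cheap to verify. The sketch below checks only the top-level keys of one properties map and leaves the values alone, so a nested map with integer keys (like the `pymol_bondTypes` value in the example above) is still allowed. The function name `non_string_keys` is an assumption for illustration.

```python
def non_string_keys(properties):
    """Return the top-level keys of a properties map that are not strings.

    Values are intentionally not inspected: they may be any object.
    """
    return [key for key in properties if not isinstance(key, str)]
```

A statically typed reader could rely on the same guarantee to expose the map as string keys paired with opaque msgpack objects.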
resolved by #36 |
When working on modeling/prediction/design problems I know a lot of people add comments/remarks of various things to their PDB files.
In the case of structures from the PDB, I think it would be best if this field is always empty.
Possible use cases:
It would be very useful to add a field dedicated to this, probably extras or comments, and it would just be a string field.
The alternative is just to use title or structureId for this kind of stuff, since in most modeling they don't exist. I'm not against that either, but the spec documentation should just note which one applications should use so it's standardized.
~Dan