Remark/comments field #32

danpf · 2018-03-22T03:58:57Z

When working on modeling/prediction/design problems, I know a lot of people add comments/remarks about various things to their PDB files.
For structures from the PDB, I think it would be best if this field were always empty.

Possible use cases:

protein design scores/parameters
application runtime flags/commands
model quality numbers
rmsd to native for benchmarking

It would be very useful to add a field dedicated to this, probably called extras or comments, that would just be a string field.

The alternative is to just use title or structureId for this kind of information, since in most modeling work they don't exist. I'm not against that either, but the spec documentation should note which one applications should use so that it's standardized.
~Dan

gtauriello · 2018-03-22T16:59:16Z

For many use cases I agree with the possibility of a generic string field. Sounds light-weight and generic enough.

For quantities attached to residues and atoms, on the other hand (e.g. model quality numbers), it might be nicer to have a standardized way to attach a list of numbers to the MMTF file so that any viewer could color the structure according to one of those quantities...
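To illustrate what a viewer could do with such a standardized list, here is a minimal sketch that turns one number per atom (or per residue) into a color per atom. The function name and the blue-to-red gradient are hypothetical choices for illustration, not part of MMTF or of any viewer's API; the output uses "#RRGGBB" strings only as an assumed color format.

```python
# Hypothetical helper (not part of any MMTF library): map per-atom or
# per-residue numbers onto a blue-to-red gradient of "#RRGGBB" strings.
from typing import List, Sequence


def values_to_hex_colors(values: Sequence[float]) -> List[str]:
    """Linearly map each value to a color between blue (min) and red (max)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    span = hi - lo
    colors = []
    for v in values:
        # Normalize to [0, 1]; a constant list maps everything to the midpoint.
        t = (v - lo) / span if span else 0.5
        red = round(255 * t)
        blue = 255 - red
        colors.append(f"#{red:02X}00{blue:02X}")
    return colors
```

For example, `values_to_hex_colors([0.0, 1.0])` gives `["#0000FF", "#FF0000"]`. The point is that once the numbers live in a known place with a known length, the coloring logic does not need to know which application produced them.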

danpf · 2018-03-22T17:53:51Z

That would be nice too...

Three quick ideas:

1. Pack as a raw JSON string; let the application handle JSON parsing.
2. Pack as a dictionary of strings; let the application handle converting from string to int/double.
3. Pack via msgpack; let the user handle decoding the msgpack object.

Option 3 gets a little complicated in statically typed languages, but is probably the best option.
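A rough sketch of what options 1 and 2 would mean for a reader, using only Python's standard `json` module; all function names here are hypothetical. Option 3 would replace the JSON step with a msgpack encoder (for example the `packb`/`unpackb` functions of the msgpack Python package), so values keep their numeric types inside the same msgpack stream as the rest of the MMTF file instead of needing a second parse.

```python
import json
from typing import Dict


# Option 1: the whole payload is one JSON string; readers must json.loads it.
def pack_as_json_string(extras: dict) -> str:
    return json.dumps(extras, sort_keys=True)


def unpack_json_string(raw: str) -> dict:
    return json.loads(raw)


# Option 2: a flat map of strings; every value loses its type when packed.
def pack_as_string_map(extras: Dict[str, object]) -> Dict[str, str]:
    return {key: str(value) for key, value in extras.items()}


# ...so each reader has to know how to convert each key back.
def read_float(string_map: Dict[str, str], key: str) -> float:
    return float(string_map[key])
```

With `extras = {"rmsd": 1.25, "flags": "--example"}`, option 1 round-trips the dict unchanged, while option 2 stores `"1.25"` and relies on the reader calling `read_float` for the right keys.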

Some keys could be standardized, such as color, atom_color, or residue_color for molecular viewers. We should probably ask a few molecular-viewer developers what they think about that.

speleo3 · 2018-03-22T18:14:27Z

+1 for option 3
+1 for standardized keys like atomColorList - also chargeList (or partialChargeList) and radiusList to replace formats like PQR

A convention for non-standard keys would also be useful, this could prevent name clashes with future standard keys. E.g. if standard keys never use underscores, then an <appname>_ or <organization>_ prefix for custom keys could never lead to a naming conflict.
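Under the rule suggested here (standard keys never contain an underscore), a reader can tell standard and custom keys apart without any registry. A minimal sketch, assuming that rule holds; both helper names are hypothetical:

```python
# Assumed convention: standard MMTF keys are camelCase without underscores,
# so any key of the form "<prefix>_<name>" is application-specific.
from typing import Optional, Tuple


def make_custom_key(prefix: str, name: str) -> str:
    """Build a custom key such as 'pymol_colorList' from an app/org prefix."""
    if not prefix or "_" in prefix:
        raise ValueError("prefix must be non-empty and contain no underscore")
    return f"{prefix}_{name}"


def split_key(key: str) -> Tuple[Optional[str], str]:
    """Return (prefix, name) for custom keys, or (None, key) for standard ones."""
    prefix, sep, name = key.partition("_")
    if not sep or not prefix:
        return None, key
    return prefix, name
```

Because the split happens at the first underscore, the part after the prefix may itself contain underscores (e.g. `pymol_bond_type` splits into `pymol` and `bond_type`).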

speleo3 · 2018-03-22T18:22:21Z

Speaking of custom keys: PyMOL 2.1 exports MMTF files with two custom keys: pymolRepsList (encoded with strategy type 7) and pymolColorList (plain msgpack array).

danpf · 2018-03-22T18:32:31Z

speaking of custom keys: PyMOL 2.1 exports MMTF files with two custom keys: pymolRepsList (encoded with strategy type 7) and pymolColorList (plain msgpack array).

Perfect, now I know someone else would use this :p

A convention for non-standard keys would also be useful, this could prevent name clashes with future standard keys. E.g. if standard keys never use underscores, then an <appname>_ or <organization>_ prefix for custom keys could never lead to a naming conflict.

I guess the only thing to watch out for is that we might end up with pymolColorList, chimeraColorList, and nglColorList... But I think pymol::ColorList or pymol::color_list would be best if we were to standardize it; PyMOL people love their underscores, and I'd feel bad taking them away, hah.

danpf · 2018-07-16T10:21:43Z

@arose @pwrose

This is sort of a more formal proposal for a comments field:

It seems that other developers and I are eager to append application-specific information to our MMTF files, so making this part of the standard would be very helpful and would save a lot of rewriting if and when it does become part of the standard.

Does anyone have any objections to this sort of implementation?
The alternative, as @speleo3 mentioned above, is to pack any extraData directly into the top-level dictionary of the packed MMTF file.

An example implementation for c++ is available at rcsb/mmtf-cpp#15

extraData

This is a field to store any extra MMTF-associated data. It is packed as a msgpack object and can therefore contain anything; it is up to you (the developer) how you store, pack, and read the data. It is roughly the equivalent of PDB REMARK lines.

However, we recommend using the format MAP<string, msgpack object>: it allows standardized reading between applications and is easy to understand and extend across languages.

When using the MAP format described above, we request that you adhere to the following standardized key/value pairs:

key	value description	encoding
groupColorList	list[hex code strings (len of numGroups)]	None
atomColorList	list[hex code strings (len of numAtoms)]	None
etc	etc	etc

More to be decided?
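A sketch of how a reader might check the two standardized keys in the table above against an already-decoded MMTF dict (msgpack decoding is not shown). The proposal only says "hex code strings", so accepting six hex digits with an optional leading `#` is an assumption, as is the function name `check_extra_data`.

```python
import re
from typing import Any, Dict, List

# Assumption: a "hex code string" is six hex digits with an optional "#".
_HEX = re.compile(r"#?[0-9A-Fa-f]{6}")

# Standardized extraData keys from the proposal and the MMTF count each
# list must match.
STANDARD_KEYS = {"groupColorList": "numGroups", "atomColorList": "numAtoms"}


def check_extra_data(mmtf: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the standardized extraData keys."""
    extra = mmtf.get("extraData")
    if not isinstance(extra, dict):
        return []  # extraData may be any msgpack object; only maps are checked
    problems = []
    for key, count_field in STANDARD_KEYS.items():
        if key not in extra:
            continue
        colors = extra[key]
        expected = mmtf[count_field]
        if len(colors) != expected:
            problems.append(f"{key}: expected {expected} entries, got {len(colors)}")
        bad = [c for c in colors if not (isinstance(c, str) and _HEX.fullmatch(c))]
        if bad:
            problems.append(f"{key}: {len(bad)} invalid hex code(s)")
    return problems
```

Keys outside the standardized set are deliberately ignored, so applications can keep adding their own entries to the same map.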

pwrose · 2018-07-17T19:37:36Z

Regarding the key, did you imply a convention regarding the prefix, e.g.,

structureKey (len of 1)
modelKey (len of numModels)
chainKey (len of numChains)
groupKey (len of numGroups)
atomKey (len of numAtoms)
bondKey (len of numBonds)

danpf · 2018-07-17T19:52:14Z

I wasn't really meaning to, but we could if other people like that! Definitely makes sense to me!

pwrose · 2018-07-17T20:23:06Z

How about an explicit convention by specifying data (or properties?) for structure-, model-, chain-, group-, atom-, and bond-level information that must have a matching number of records:

- structureProperties (1)
- modelProperties (len numModels)
- chainProperties (len numChains)
- groupProperties (len numGroups)
- atomProperties (len numAtoms)
- bondProperties (len numBonds)

Data (properties) that don't fit into the categories above would go into extraProperties.

-Peter
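The "matching number of records" rule above is easy to check mechanically. A minimal sketch working on an already-decoded MMTF dict; the function name is hypothetical, every container is treated as optional, and structureProperties and extraProperties are left unchecked because their values are not per-record lists.

```python
from typing import Any, Dict, List

# Each per-level container from the proposal, paired with the MMTF count its
# lists must match.
LEVELS = {
    "modelProperties": "numModels",
    "chainProperties": "numChains",
    "groupProperties": "numGroups",
    "atomProperties": "numAtoms",
    "bondProperties": "numBonds",
}


def check_property_lengths(mmtf: Dict[str, Any]) -> List[str]:
    """Report every per-level list whose length differs from its record count."""
    problems = []
    for container, count_field in LEVELS.items():
        expected = mmtf.get(count_field)
        for key, values in mmtf.get(container, {}).items():
            if expected is None:
                problems.append(f"{container}.{key}: {count_field} is missing")
            elif len(values) != expected:
                problems.append(
                    f"{container}.{key}: expected {expected} values, got {len(values)}"
                )
    return problems
```

An empty result means every list can be indexed in lockstep with the corresponding models, chains, groups, atoms, or bonds.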

gtauriello · 2018-07-18T09:55:42Z

A "best practice" naming convention sounds reasonable.

@pwrose do you mean that each of those "...Properties" fields would itself contain a msgpack-map with key, value pairs? Doesn't sound too bad actually. Would make it very easy to have generic parsers of it for visualizations or so (could even work in strongly-typed languages like C++). In that case though I would propose to get rid of "extraData" and have those "...Properties" as optional fields at the top-level of the MMTF hierarchy. Otherwise we introduce an extra level of complexity (also there is currently no case of optional fields outside of the top-level of the MMTF hierarchy).

speleo3 · 2018-07-18T16:11:10Z

@pwrose and @gtauriello - if I followed you correctly, example data could look like this:

data = {
  "mmtfVersion": "1.1",
  "numAtoms": 999,
  "numModels": 2,
  "numChains": 4,
  ...
  "xCoordList": [1.2, 3.4, ...],
  "yCoordList": [5.6, 7.8, ...],
  "zCoordList": [9.0, 1.2, ...],
  ...
  "structureProperties": {
    "foo_id": "ABC",
  },
  "modelProperties": {
    # lists have len numModels=2
    "foo_rmsdList": [0.5, 0.8],
    "foo_scoreList": [1.2, 3.4],
  },
  "chainProperties": {
    # lists have len numChains=4
    "foo_uniprotIdList": ["HBB_HUMAN", "HBA_HUMAN", "HBB_HUMAN", "HBA_HUMAN"],
    "foo_chainColorList": [0xFF0000, 0x00FF00, 0xFF0000, 0x00FF00],
  },
  "groupProperties": {
    # lists have len numGroups
    "stride_secStructList": [7, 7, 7, ...],
    "sst_secStructList": [7, 7, 7, ...],
  },
  "atomProperties": {
    # lists have len numAtoms=999
    "pymol_colorList": [1, 2, 3, ...],
    "pymol_repsList": [1, 1, 1, ...],
    "apbs_chargeList": [0.1, -0.4, 0.7, ...],
    "apbs_radiusList": [1.2, 1.8, 1.5, ...],
  },
  "bondProperties": {
    # lists have len numBonds
    "pymol_bondTypeList": [1, 1, 1, 4, 4, 4, 4, 4, 4, 1, ...],
  },
  "extraProperties": {
    "pymol_bondTypes": {0: "metal", 1: "single", 2: "double", 3: "triple", 4: "aromatic"}
  },
}
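With this layout, a generic tool can discover what a file carries without knowing any application in advance. A sketch, assuming the prefix-before-the-first-underscore convention discussed earlier in the thread; `keys_by_app` is a hypothetical name, and keys without an underscore are grouped under the empty string.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Containers from the example above; all are treated as optional.
CONTAINERS = (
    "structureProperties", "modelProperties", "chainProperties",
    "groupProperties", "atomProperties", "bondProperties", "extraProperties",
)


def keys_by_app(mmtf: Dict[str, Any]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (container, key) pairs by the prefix before the first underscore."""
    found = defaultdict(list)
    for container in CONTAINERS:
        for key in mmtf.get(container, {}):
            prefix, sep, _ = key.partition("_")
            found[prefix if sep else ""].append((container, key))
    return dict(found)
```

Applied to the example, this would list the `pymol` entries in atomProperties, bondProperties, and extraProperties together, which is what a viewer would need in order to offer "color by ..." choices per application.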

pwrose · 2018-07-18T17:25:49Z

Yes, that's a good example of what I had in mind.

danpf · 2018-07-18T20:49:31Z

I like it!
Regarding extraProperties: this matters mostly for statically typed languages (like C++).
I wrote extraData so that it didn't have to be a map<string, msgpack::object>; rather, it could be anything (a simple list, a number, a custom serialized object, etc.). Do you think that's useless, and that extraProperties should just always be a map<string, msgpack::object>?

gtauriello · 2018-07-19T13:41:38Z

@danpf The entries contained in the map can still be generic msgpack objects, so it doesn't really simplify parsing in statically typed languages, apart from being able to get the keys (which is good, I guess). Either way, a bit of structure might be good, and it's not a big restriction to prescribe that we expect key (string) / value (any object) pairs for extra properties.
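The restriction being discussed only constrains the outer shape: a map with string keys, while each value stays an arbitrary msgpack object. A minimal Python sketch of that check on an already-decoded dict; `read_extra_properties` is a hypothetical name, and a C++ reader would analogously decode to a map of string to `msgpack::object` and defer value conversion.

```python
from typing import Any, Dict


def read_extra_properties(mmtf: Dict[str, Any]) -> Dict[str, Any]:
    """Return extraProperties, enforcing only the 'string key -> any value' shape."""
    extra = mmtf.get("extraProperties", {})
    if not isinstance(extra, dict):
        raise TypeError("extraProperties must be a map")
    bad_keys = [k for k in extra if not isinstance(k, str)]
    if bad_keys:
        raise TypeError(f"extraProperties keys must be strings, got {bad_keys!r}")
    # Values are returned untouched: lists, numbers, nested maps, etc.
    return extra
```

Compared with an unconstrained extraData, the only thing a reader gains is the ability to enumerate keys, which matches the trade-off described above.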

danpf · 2018-10-22T16:47:46Z

Resolved by #36.

This was referenced Jul 13, 2018:

Adds 'extraData' field rcsb/mmtf-cpp#14 (Closed)
Support for extraData field + test rcsb/mmtf-cpp#15 (Merged)

gtauriello mentioned this issue Jul 18, 2018:

Type guessing in MapDecoder? rcsb/mmtf-cpp#16 (Open)

danpf mentioned this issue Aug 31, 2018:

Add extra fields #36 (Merged)

danpf closed this as completed Oct 22, 2018
