Remark/comments field #32

Closed
danpf opened this issue Mar 22, 2018 · 15 comments

Comments

@danpf
Contributor

danpf commented Mar 22, 2018

When working on modeling/prediction/design problems, I know a lot of people add comments/remarks about various things to their PDB files.
In the case of structures from the PDB, I think it would be best if this field is always empty.

Possible use cases:

  • protein design scores/parameters
  • application runtime flags/commands
  • model quality numbers
  • RMSD to native for benchmarking

It would be very useful to add a field dedicated to this, probably named extras or comments; it would just be a string field.

The alternative is to just use title or structureId for this kind of thing, since in most modeling they don't exist. I'm not against that either, but the spec documentation should note which one applications should use so it's standardized.
~Dan

@gtauriello

For many use cases I agree with the possibility of a generic string field. Sounds light-weight and generic enough.

For quantities attached to residues and atoms on the other hand (e.g. model quality numbers), it might be nicer to have a standardized way to attach a list of numbers into the mmtf file so that any viewer could color the structure according to one of those quantities...

@danpf
Contributor Author

danpf commented Mar 22, 2018

That would be nice too...

I guess 3 quick ideas:

  1. Pack as a raw JSON string; let the application handle JSON parsing.
  2. Pack as a dictionary of strings; let the application handle going from string to int/double.
  3. Pack via msgpack; let the user handle decoding the msgpack object.

Option 3 gets a little complicated with statically typed languages, but is probably the better option.
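
To make options 1 and 2 concrete, here is a minimal sketch using only Python's standard library. The field contents (`rmsd`, `flags`) are made up for illustration, and the actual MessagePack packing step is left out; under option 3 the nested object would be handed to a MessagePack library as-is instead of being flattened to strings.

```python
import json

# Hypothetical application data to attach to a structure.
extra = {"rmsd": 1.25, "flags": ["-relax", "-nstruct 10"]}

# Option 1: one raw JSON string; the reading application parses the JSON.
as_json_string = json.dumps(extra)
decoded_1 = json.loads(as_json_string)

# Option 2: a dictionary of strings; the reader converts each value itself
# and has to know in advance which type every key is supposed to have.
as_string_dict = {"rmsd": "1.25", "flags": "-relax -nstruct 10"}
decoded_2 = {
    "rmsd": float(as_string_dict["rmsd"]),
    "flags": as_string_dict["flags"],
}
```

Option 1 keeps nested structure but hides it inside a string; option 2 stays flat and pushes all type knowledge onto the reader.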

Some keys could be standardized, like color or atom_color or residue_color for molecular viewers? We should probably ask a few mol-viewer people their thoughts on that.

@speleo3
Contributor

speleo3 commented Mar 22, 2018

+1 for option 3
+1 for standardized keys like atomColorList - also chargeList (or partialChargeList) and radiusList to replace formats like PQR

A convention for non-standard keys would also be useful; this could prevent name clashes with future standard keys. E.g. if standard keys never use underscores, then an <appname>_ or <organization>_ prefix for custom keys could never lead to a naming conflict.
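
A reader could apply that prefix convention roughly as follows. This is only a sketch: the helper name `split_custom_key` is hypothetical, and it assumes exactly the rule suggested above (standard keys contain no underscore, custom keys look like `<appname>_<name>`).

```python
def split_custom_key(key):
    """Return (prefix, name) for a custom key, or (None, key) for a standard one.

    Assumes standard keys never contain an underscore and custom keys
    have the form "<appname>_<name>" or "<organization>_<name>".
    """
    prefix, sep, name = key.partition("_")
    if sep and prefix and name:
        return prefix, name
    # No underscore, or an empty prefix/name: treat the key as standard.
    return None, key
```

For example, `split_custom_key("pymol_colorList")` gives `("pymol", "colorList")`, while `split_custom_key("atomColorList")` gives `(None, "atomColorList")`.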

@speleo3
Contributor

speleo3 commented Mar 22, 2018

speaking of custom keys: PyMOL 2.1 exports MMTF files with two custom keys: pymolRepsList (encoded with strategy type 7) and pymolColorList (plain msgpack array).

@danpf
Contributor Author

danpf commented Mar 22, 2018

> speaking of custom keys: PyMOL 2.1 exports MMTF files with two custom keys: pymolRepsList (encoded with strategy type 7) and pymolColorList (plain msgpack array).

Perfect, now I know someone else would use this :p

> A convention for non-standard keys would also be useful, this could prevent name clashes with future standard keys. E.g. if standard keys never use underscores, then an <appname>_ or <organization>_ prefix for custom keys could never lead to a naming conflict.

I guess the only thing to watch out for is that we might have pymolColorList and chimeraColorList and nglColorList... But I think pymol::ColorList or pymol::color_list would be best if we were to standardize it; PyMOL people love their underscores. I'd feel bad taking them away from them, hah.

@danpf
Contributor Author

danpf commented Jul 16, 2018

@arose @pwrose

This is sort of a more formal proposal for a comments field:

It seems that other developers and I are eager to add application-specific information to our MMTF files, so having this become part of the standard would be very helpful, and would save a lot of rewriting if and when it does eventually become part of the standard.

Does anyone have any objections to this sort of implementation?
The alternative, as @speleo3 mentioned above, is to pack any extraData directly into the base dictionary of the packed MMTF file.

An example implementation for C++ is available at rcsb/mmtf-cpp#15


extraData

This is a field to store any extra MMTF-associated data. It is packed as a msgpack object and can therefore contain anything; it is up to you (the developer) how you would like to store, pack and read the data. It is roughly the equivalent of the PDB REMARK lines.

However, we recommend that you use the format MAP< string, msgpack object >. This allows standardized reading between applications and is easily understandable and extensible across languages.

We do request that, when using the MAP format described above, you adhere to the following standardized key/value pairs:

| key | value description | encoding |
|---|---|---|
| groupColorList | list[hex code strings (len of numGroups)] | None |
| atomColorList | list[hex code strings (len of numAtoms)] | None |
| etc | etc | etc |

more to be decided?
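
A consumer of groupColorList or atomColorList might sanity-check the list before using it. The sketch below is an assumption-laden illustration: the proposal only says "hex code strings", so the accepted format here (six hex digits with an optional leading `#`) and the function name `is_valid_color_list` are guesses, not part of the proposal.

```python
import re

# Assumed format: "RRGGBB" with an optional leading "#". The proposal
# does not pin down the exact spelling of a hex code string.
HEX_COLOR = re.compile(r"#?[0-9A-Fa-f]{6}")

def is_valid_color_list(colors, expected_len):
    """True if `colors` has `expected_len` entries that all look like hex colors."""
    if len(colors) != expected_len:
        return False
    return all(
        isinstance(c, str) and HEX_COLOR.fullmatch(c) is not None
        for c in colors
    )
```

Here `expected_len` would be numGroups for groupColorList and numAtoms for atomColorList.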

@pwrose
Collaborator

pwrose commented Jul 17, 2018

Regarding the key, did you imply a convention regarding the prefix, e.g.,

structureKey (len of 1)
modelKey (len of numModels)
chainKey (len of numChains)
groupKey (len of numGroups)
atomKey (len of numAtoms)
bondKey (len of numBonds)

@danpf
Contributor Author

danpf commented Jul 17, 2018

I wasn't really meaning to, but we could if other people like that! It definitely makes sense to me!

@pwrose
Collaborator

pwrose commented Jul 17, 2018 via email

@gtauriello

A "best practice" naming convention sounds reasonable.

@pwrose do you mean that each of those "...Properties" fields would itself contain a msgpack-map with key, value pairs? Doesn't sound too bad actually. Would make it very easy to have generic parsers of it for visualizations or so (could even work in strongly-typed languages like C++). In that case though I would propose to get rid of "extraData" and have those "...Properties" as optional fields at the top-level of the MMTF hierarchy. Otherwise we introduce an extra level of complexity (also there is currently no case of optional fields outside of the top-level of the MMTF hierarchy).

@speleo3
Contributor

speleo3 commented Jul 18, 2018

@pwrose and @gtauriello - if I followed you correctly, example data could look like this:

data = {
  "mmtfVersion": "1.1",
  "numAtoms": 999,
  "numModels": 2,
  "numChains": 4,
  ...
  "xCoordList": [1.2, 3.4, ...],
  "yCoordList": [5.6, 7.8, ...],
  "zCoordList": [9.0, 1.2, ...],
  ...
  "structureProperties": {
    "foo_id": "ABC",
  },
  "modelProperties": {
    # lists have len numModels=2
    "foo_rmsdList": [0.5, 0.8],
    "foo_scoreList": [1.2, 3.4],
  },
  "chainProperties": {
    # lists have len numChains=4
    "foo_uniprotIdList": ["HBB_HUMAN", "HBA_HUMAN", "HBB_HUMAN", "HBA_HUMAN"],
    "foo_chainColorList": [0xFF0000, 0x00FF00, 0xFF0000, 0x00FF00],
  },
  "groupProperties": {
    # lists have len numGroups
    "stride_secStructList": [7, 7, 7, ...],
    "sst_secStructList": [7, 7, 7, ...],
  },
  "atomProperties": {
    # lists have len numAtoms=999
    "pymol_colorList": [1, 2, 3, ...],
    "pymol_repsList": [1, 1, 1, ...],
    "apbs_chargeList": [0.1, -0.4, 0.7, ...],
    "apbs_radiusList": [1.2, 1.8, 1.5, ...],
  },
  "bondProperties": {
    # lists have len numBonds
    "pymol_bondTypeList": [1, 1, 1, 4, 4, 4, 4, 4, 4, 1, ...],
  },
  "extraProperties": {
    "pymol_bondTypes": {0: "metal", 1: "single", 2: "double", 3: "triple", 4: "aromatic"}
  },
}
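
One benefit of a layout like this is that a generic tool can check list lengths without knowing what any key means. A rough sketch, assuming the file has already been decoded into plain Python dicts and lists (the function name `find_length_mismatches` is hypothetical, and structureProperties and extraProperties are skipped because they carry no per-item lists):

```python
# Which count field governs list lengths in each properties map.
LEVEL_COUNTS = {
    "modelProperties": "numModels",
    "chainProperties": "numChains",
    "groupProperties": "numGroups",
    "atomProperties": "numAtoms",
    "bondProperties": "numBonds",
}

def find_length_mismatches(data):
    """Return (section, key, actual_len, expected_len) for lists of the wrong length."""
    problems = []
    for section, count_field in LEVEL_COUNTS.items():
        expected = data.get(count_field)
        if expected is None:
            continue  # count not present, nothing to compare against
        for key, values in data.get(section, {}).items():
            if isinstance(values, list) and len(values) != expected:
                problems.append((section, key, len(values), expected))
    return problems
```

An empty result means every list found in those maps matches the corresponding count.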

@pwrose
Collaborator

pwrose commented Jul 18, 2018 via email

@danpf
Contributor Author

danpf commented Jul 18, 2018

I like it!
Re: extraProperties
This is more of a concern for statically typed languages (like C++).
I wrote extraData so that it didn't have to be a map<string, msgpack::object>, but could be anything (a simple list, a number, a custom serialized object, etc.). Do you think that's useless, and that extraProperties should just always be a map<string, msgpack::object>?

@gtauriello

@danpf The entries contained in the map can still be generic msgpack objects. So it doesn't really simplify parsing in statically typed languages apart from being able to get the keys (which is good I guess). Either way a bit of structure might be good and it's not a big restriction to prescribe that we expect key (string) / value (any object) pairs for extra properties.
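
In practice, a map with string keys and arbitrary values means a reader can always enumerate and look up keys, but still has to check each value's type itself. A small sketch of that pattern, with a hypothetical helper `get_property` operating on an already-decoded dict:

```python
def get_property(props, key, expected_type, default=None):
    """Look up `key` in a string-keyed map and check the value's type.

    Returns `default` when the key is absent; raises TypeError when the
    key is present but holds a value of an unexpected type.
    """
    if key not in props:
        return default
    value = props[key]
    if not isinstance(value, expected_type):
        raise TypeError(
            f"{key!r} should be {expected_type.__name__}, "
            f"got {type(value).__name__}"
        )
    return value
```

For instance, `get_property({"foo_id": "ABC"}, "foo_id", str)` returns `"ABC"`, and asking for a missing key returns the default rather than failing.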

@danpf danpf mentioned this issue Aug 31, 2018
@danpf
Contributor Author

danpf commented Oct 22, 2018

resolved by #36

@danpf danpf closed this as completed Oct 22, 2018