-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add aromaticity field + clear up some discrepancies #35
Conversation
spec.md
Outdated
@@ -1057,6 +1091,7 @@ The `singleLetterCode` is the IUPAC single letter code for [protein](https://dx. | |||
"formalChargeList": [ 0, 0, 0, 0 ], | |||
"bondAtomList": [ 1, 0, 2, 1, 3, 2 ], | |||
"bondOrderList": [ 1, 1, 2 ], | |||
"bondAromaticity": [ 0, 0, 1 ], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing List
suffix?
spec.md
Outdated
@@ -388,6 +399,7 @@ The following table lists all top level fields, including their [type](#types) a | |||
| [groupList](#grouplist) | [Array](#types) | Y | | |||
| [bondAtomList](#bondatomlist) | [Binary](#types) | | | |||
| [bondOrderList](#bondorderlist) | [Binary](#types) | | | |||
| [bondAromaticityrList](#bondaromaticitylist)| [Binary](#types) | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo (extra r
)
spec.md
Outdated
@@ -337,7 +346,8 @@ First create a `Array` to hold values that are referable by indices. In the foll | |||
"elementList": [ "N", "C", "C", "O", "C", "C", "O", "O" ], | |||
"formalChargeList": [ 0, 0, 0, 0, 0, 0, 0, 0 ], | |||
"bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4, 6, 5, 7, 5 ], | |||
"bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ] | |||
"bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ], | |||
"bondAromaticity": [ 0, 0, 1, 0, 0, 1, 1 ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing List
suffix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should change the example to reflect the phenylring in PHE, so the aromaticity example makes sense.
spec.md
Outdated
@@ -347,7 +357,8 @@ First create a `Array` to hold values that are referable by indices. In the foll | |||
"elementList": [ "N", "C", "C", "O", "C", "O" ], | |||
"formalChargeList": [ 0, 0, 0, 0, 0, 0 ], | |||
"bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4 ], | |||
"bondOrderList": [ 1, 1, 2, 1, 1 ] | |||
"bondOrderList": [ 1, 1, 2, 1, 1 ], | |||
"bondAromaticity": [ 0, 0, 1, 0, 0 ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing List
suffix?
spec.md
Outdated
@@ -1057,6 +1091,7 @@ The `singleLetterCode` is the IUPAC single letter code for [protein](https://dx. | |||
"formalChargeList": [ 0, 0, 0, 0 ], | |||
"bondAtomList": [ 1, 0, 2, 1, 3, 2 ], | |||
"bondOrderList": [ 1, 1, 2 ], | |||
"bondAromaticity": [ 0, 0, 1 ], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing List
suffix?
spec.md
Outdated
@@ -1066,7 +1101,8 @@ The `singleLetterCode` is the IUPAC single letter code for [protein](https://dx. | |||
"elementList": [ "N", "C", "C", "O", "C", "C", "O", "O" ], | |||
"formalChargeList": [ 0, 0, 0, 0, 0, 0, 0, 0 ], | |||
"bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4, 6, 5, 7, 5 ], | |||
"bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ] | |||
"bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ], | |||
"bondAromaticity": [ 0, 0, 1, 0, 0, 1, 1 ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing List
suffix?
spec.md
Outdated
@@ -1076,7 +1112,8 @@ The `singleLetterCode` is the IUPAC single letter code for [protein](https://dx. | |||
"elementList": [ "N", "C", "C", "O", "C", "O" ], | |||
"formalChargeList": [ 0, 0, 0, 0, 0, 0 ], | |||
"bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4 ], | |||
"bondOrderList": [ 1, 1, 2, 1, 1 ] | |||
"bondOrderList": [ 1, 1, 2, 1, 1 ], | |||
"bondAromaticity": [ 0, 0, 1, 0, 0 ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing List
suffix?
spec.md
Outdated
|
||
*Type*: [Binary](#types) data that decodes into an array of 8-bit signed integers. | ||
|
||
*Description*: Array of bond orders for bonds in `bondAtomList`. Must be values between 1 and 4, defining single, double, triple, and quadruple bonds. | ||
*Description*: Array of bond orders for bonds in `bondAtomList`. Must be values between 0 and 4, defining unknown, single, double, triple, and quadruple bonds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you feel about -1
being the unknown
value? We also use -1
for undefined secondary structure.
Motivation: One day we might discuss adding zero-order bonds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm all for consistency, so a -1 would be a good option.
If I understand you correctly, all 3 bond...List fields must be given if bonds are present? In that case we would need to increment the version to 2.0 since 1.0-compliant files would not be valid anymore. Alternatively, a default behavior for the case of missing |
@speleo3 Thank you for the comments! I think I fixed all of them. Sorry I missed so much in my first pass. @pwrose & @speleo3 I updated @gtauriello I suppose that's what's up for discussion. I tried be as specific as possible for this initial draft, and if everyone else thinks differently then we can change it! In my opinion it will be easier for everyone involved if all 3 fields exist when bonds are present. Otherwise I can see applications having io code that has a lot of branch points that need to be built to adhere to the spec. But that's just an opinion, and maybe some people that have deeper integration with mmtf in their applications could speak about some pros/cons from their ends. I guess this is V2 of the PR now, I made some updates with some more notes/examples. The most important part to me is to make sure that applications don't have to have to account for a bunch of different corner cases when doing io. But you can let me know if I overreached. Additionally: |
spec.md
Outdated
@@ -1062,7 +1062,7 @@ The mmCIF format allows for so-called micro-heterogeneity on the group-level. Fo | |||
| elementList | [Array](#types) | Array of elements, 0 to 3 character [Strings](#types) | Y | | |||
| bondAtomList | [Array](#types) | Array of bonded atom indices, [Integers](#types) | | | |||
| bondOrderList | [Array](#types) | Array of bond orders as [Integers](#types) between 1 and 4 | | | |||
| bondAromaticityList | [Array](#types) | Array of bond aromaticity as [Integers](#types) between 0 and 4 | | | |||
| bondAromaticityList | [Array](#types) | Array of bond aromaticity as [Integers](#types) -1, 1,2,3 and 3 | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this read: Array of bond aromaticity as Integers 0 or 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed! Thanks!
@danpf Sounds good to me. Thanks for the update. I agree that application developers shouldn't have to do many if-branches. For the decoders, on the other hand, we may still need/want(?) to support reading of 1.0-compliant files where we then need some default behavior to deal with missing aromaticities flags. Once the file is decoded, the user won't have to do any branching anymore, since the decoder can guarantee that all 3 fields have full data. But unless we drop backwards compatibility in our decoders, this implies having a default behavior for decoders to deal with the field missing from the file... |
-1 for requiring |
Agree with @speleo3 . I'd keep both order and aromaticity optional. There can be applications that have no idea about the bond types at all. |
I am unsure if the run-length encoding of bondAromaticityList is worth it. In the case where nothing is aromatic, sure, but then you can leave it out altogether. For the case of proteins the "runs" can be quite short, i.e. from one aromatic side chain to another. For secondary structure data run-length encoding was generally not worth it and bondAromaticityList seems similar. @danpf did you calculate if it is worth it? |
@arose
which yeilds: I don't really mind either way, but if we're not going to encode but I would expect the vast majority of bonds in |
Also, I was just thinking... Maybe we shouldn't call it |
I made some changes to fit what was discussed. let me know your thoughts! |
right, ok then |
This is a first pass at adding the aromaticity field.
Please post any suggestions you might have!
This is meant to clear up:
#34
#33