Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial C++ implementation #1

Merged
merged 8 commits into from
Mar 16, 2017
Merged

Initial C++ implementation #1

merged 8 commits into from
Mar 16, 2017

Conversation

gtauriello
Copy link
Collaborator

This is an initial implementation for a C++ header only decoder for MMTF. The signature for the encoder was added as a preview. Please let us know how you like the usage and layout of the code and we will continue with the encoder (probably not before April though).

The README includes all information on how to use it and there are example and test codes. To test the decoder, I wrote a simple json-dumper for the previous C decoder and kept the results. The C++ decoder can generate json files which are formatted identically and I use a simple diff to ensure that we do the same job. The test can run automatically as described in the README. For the future we should aim for a cleaner unit testing approach though (using Boost or CppUnit?).
Limitations of the current testing:

  • I can only check fields and encodings actually contained in the test MMTFs. Some nastier cases with missing fields might exist and I cannot guarantee that they are handled well. Also, not all binary encodings are used (I have only observed strategies 2, 4, 5, 6, 8, 9, 10)
  • The C decoder was lacking the ncsOperatorList field and I have therefore excluded it for the testing. Also, the field is not contained in any of the test MMTFs and so I couldn't test it anyways.

Another point we might consider is to include msgpack-c headers in the repository too. I have seen msgpack-c including boost headers somehow automatically via github. Seems cool to avoid duplication of files but I would have to dig into how they do that...

@speleo3
Copy link
Collaborator

speleo3 commented Mar 3, 2017

I vote against checking in the generated doxygen documentation. I've discussed with Yana that I prefer to also remove it from the mmtf-c repo.

Also updated README and Doxygen file to ensure that anyone can cleanly generate
it from scratch.
@gtauriello
Copy link
Collaborator Author

Ok. Yeah I agree with you on that. I had just pushed it in there to be consistent with the C version. I removed it from the repository and added some text to the README on how to generate it from scratch.

@speleo3
Copy link
Collaborator

speleo3 commented Mar 3, 2017

the mmtf-c repo is not particularly clean, no need to be consistent on that level :)

@josemduarte
Copy link
Member

Regarding ncsOperatorList, we should definitely have that in mmtf-c and here. I'll open a new issue for mmtf-c. A good example of file containing data in that field would be 1auy (http://mmtf.rcsb.org/v1.0/full/1AUY).

As a note, we started some time ago a test suite at https://github.com/rcsb/mmtf/tree/master/test-suite where we want to compile a list of structures to test. We don't have the structure data in json format there yet. But at least we do have the list of test structures in https://github.com/rcsb/mmtf/blob/master/test-suite/file-list.json. I've just submitted a PR to add 1auy there.

@gtauriello
Copy link
Collaborator Author

Thanks for the example. I managed to read it and compared it with the mmCIF file. Not sure I understand the field completely, but it seems to have parsed it correctly.
If any of you wants to double-check, I get the following 14 matrices (partial output from "./traverse.exe 1AUY.mmtf txt"):

ncsOperatorList[0]: [0.5, -0.809017, -0.309017, 128.875, 0.809017, 0.309017, 0.5, -208.524, -0.309017, -0.5, 0.809017, 79.6491, 0, 0, 0, 1]
ncsOperatorList[1]: [-0.309017, -0.5, -0.809017, 337.399, 0.5, -0.809017, 0.309017, -128.875, -0.809017, -0.309017, 0.5, 208.524, 0, 0, 0, 1]
ncsOperatorList[2]: [-0.309017, 0.5, -0.809017, 337.399, -0.5, -0.809017, -0.309017, 128.875, -0.809017, 0.309017, 0.5, 208.524, 0, 0, 0, 1]
ncsOperatorList[3]: [0.5, 0.809017, -0.309017, 128.875, -0.809017, 0.309017, -0.5, 208.524, -0.309017, 0.5, 0.809017, 79.6491, 0, 0, 0, 1]
ncsOperatorList[4]: [-0.809017, -0.309017, -0.5, 466.274, -0.309017, -0.5, 0.809017, 79.6491, -0.5, 0.809017, 0.309017, 128.875, 0, 0, 0, 1]
ncsOperatorList[5]: [-0.5, 0.809017, -0.309017, 386.625, -0.809017, -0.309017, 0.5, 208.524, 0.309017, 0.5, 0.809017, -79.6491, 0, 0, 0, 1]
ncsOperatorList[6]: [0.5, 0.809017, 0.309017, 128.875, -0.809017, 0.309017, 0.5, 208.524, 0.309017, -0.5, 0.809017, -79.6491, 0, 0, 0, 1]
ncsOperatorList[7]: [0.809017, -0.309017, 0.5, 49.2259, -0.309017, 0.5, 0.809017, 79.6491, -0.5, -0.809017, 0.309017, 128.875, 0, 0, 0, 1]
ncsOperatorList[8]: [0, -1, 0, 257.75, 0, 0, 1, 0, -1, 0, 0, 257.75, 0, 0, 0, 1]
ncsOperatorList[9]: [0.809017, -0.309017, -0.5, 49.2259, 0.309017, -0.5, 0.809017, -79.6491, -0.5, -0.809017, -0.309017, 128.875, 0, 0, 0, 1]
ncsOperatorList[10]: [0.309017, -0.5, -0.809017, 178.101, -0.5, -0.809017, 0.309017, 128.875, -0.809017, 0.309017, -0.5, 208.524, 0, 0, 0, 1]
ncsOperatorList[11]: [0, 0, -1, 257.75, -1, 0, 0, 257.75, 0, 1, 0, 0, 0, 0, 0, 1]
ncsOperatorList[12]: [0.309017, 0.5, -0.809017, 178.101, -0.5, 0.809017, 0.309017, 128.875, 0.809017, 0.309017, 0.5, -208.524, 0, 0, 0, 1]
ncsOperatorList[13]: [0.809017, 0.309017, -0.5, 49.2259, 0.309017, 0.5, 0.809017, -79.6491, 0.5, -0.809017, 0.309017, -128.875, 0, 0, 0, 1]

@josemduarte
Copy link
Member

@gtauriello awesome thanks! For context, the field is used in some PDB entries that need extra operators to generate the full asymmetric unit. This is usually the case for highly symmetrical viral capsid proteins. In those cases you need the extra information in this field to be able to reconstruct the full unit cell. See more info under "Example of a viral capsid" section in here

Copy link
Member

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fantastic. I'd only add that it'd be good to have some unit tests. But that can be done in separate PRs.

@speleo3
Copy link
Collaborator

speleo3 commented Mar 3, 2017

This looks really great, excellent job!

One thought regarding the test files: Currently there is 68KB of code, and 45MB of example MMTF files. Are those really necessary here? I really like that there is a separate repo for test files (rcsb/mmtf/test-suite), if those files are really needed here for unit tests, maybe it could be added as a git submodule?

@josemduarte
Copy link
Member

I'd support the idea of @speleo3 to move the data files to the test suite in the mmtf repo.

@gtauriello
Copy link
Collaborator Author

I agree on both.
For unit tests I already mentioned above that we should aim for a proper unit testing approach (would it be ok to use Boost there?).
The mmtf files are exactly the same as in the mmtf/test-suite I believe. Only difference is my custom json-style output (out_json_ref.tar.gz) that I compare with. But the latter is also just an artifact of not having decent unit tests yet.
But I guess we can do both those changes at a later point in time.
Also the testing will be easier with an encoder in place...

@pwrose
Copy link
Collaborator

pwrose commented Mar 4, 2017

Outstanding job!

Can you please remove the copyright line from the files:
// Copyright [2017] [RCSB]

@gtauriello
Copy link
Collaborator Author

Ok. I removed those lines. That was one more thing I blindly copied from the C library. Also the Apache License is had copied from there. If that one is not suitable here, please let me know.

@speleo3
Copy link
Collaborator

speleo3 commented Mar 5, 2017

Regarding the Apache license: Did the RCSB have specific reasons to pick the Apache license over e.g. the BSD-3-clause or MIT license? I understand that for practical reasoning, they are basically the same. But the Apache license seems to be a bit higher maintenance, given the requirement to explicitly list all changes in derived work, and the much longer and "heavily lawyered" license text.

@pwrose
Copy link
Collaborator

pwrose commented Mar 6, 2017 via email

@gtauriello
Copy link
Collaborator Author

Ok I switched to the MIT license. Let me know if you need anything else from me at the moment. Otherwise, I expect to be working on encoder and more testing in April.

Also for a separate PR: if anyone of you has experience with git submodules, it would be cool to add the MMTF test suite as @speleo3 suggested instead of having a separate copy in here. One could probably even do the same with the msgpack-c headers to avoid having external dependencies...

@josemduarte
Copy link
Member

Regarding test data files, I think the best solution is to have them all in the test suite. I'd propose to add the json files that @gtauriello produced to the rcsb/mmtf/test-suite and also keep there the test mmtf files. I'll submit a PR for that in the mmtf project.

@gtauriello
Copy link
Collaborator Author

Ok then I should probably add new json files there for the the new test files?
I would have to check though whether we handle the edge cases correctly. Also I opened an issue there since I am unsure on whether sequenceIndexList is a required field or not...

@josemduarte
Copy link
Member

The rcsb/mmtf/test-suite is updated with json files and latest mmtf files. Please feel free to add any missing file.

@josemduarte
Copy link
Member

Anything against merging? I think re-pointing tests to the files in test-suite can be done later. I'll merge this tomorrow if there's no comments.

@pwrose
Copy link
Collaborator

pwrose commented Mar 15, 2017 via email

@gtauriello
Copy link
Collaborator Author

Yes I agree that we can merge this and updating the test files can be done later. I just quickly pushed a fix that ensures that the traverse example works fine for the new test files with optional fields (the parser was fine anyways).
I didn't get to update the test cases yet (I will also need to update the json files to include the new test files with optional fields), but I hope to find time to do it in a separate pull request soon...

@josemduarte josemduarte merged commit 69239c7 into rcsb:master Mar 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants