Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CIF #51

Merged
merged 81 commits into from May 9, 2021
Merged

Support CIF #51

merged 81 commits into from May 9, 2021

Conversation

mittinatten
Copy link
Owner

@mittinatten mittinatten commented Oct 24, 2020

Related to #48. Using gemmi (https://gemmi.readthedocs.io/en/latest/index.html).

  • Move CLI to be in C++11.
  • Make PoC.
  • Make autotools treat C++ properly
  • Investigate using Gemmi as PDB parser for regular PDBs as well (means the library also needs to be in C++, which in turn could affect Python bindings). - save for later
  • Investigate cost of/implement support for .gz-variants too. - save for later
  • Handle insertion codes
  • Handle (ignore) alternative conformations
  • Consider using Gemmi's structure class to avoid handling alternative conformations explicitly - save for later
  • Add changes to documentation

Support all CLI options also for CIF

  • --hetatm
  • --unkown
  • --separate-models
  • --join-models
  • --separate-chains
  • --chain-groups

Testing

  • Test on large set of PDB codes and make sure the calculation results are the same with .pdb files and .cif files, and that the code doesn't crash, etc.

@mittinatten
Copy link
Owner Author

History is a bit messy because I moved some changes that were due to changing editor setup to the dev branch in order to keep the diff here only related to the new feature.

* Convert CLI main to C++
* Add GEMMI as submodule
…s as in regular PDB files.

Handle insertion codes in CIF files.
@mittinatten
Copy link
Owner Author

Did some rebasing to make the commits clearer

mittinatten and others added 22 commits January 5, 2021 19:10
Addressed AltLoc CIF bug in structure_from_doc
* Add C++ lib to travis packages.

* Change travis language to C++.

* Test all branches in travis.

* Add -lsubunit linking for unit tests.

* Comment out unstable test.
… lacking freesasa_structure building implementation.
…ication; unable to remove duplication of func structure_from_doc with std::is_same, need to try something else.
@danny305
Copy link
Contributor

Super excited! Let me know if you encounter any bugs you need me to address!

src/cif.cc Outdated
Comment on lines 353 to 357
friend std::ostream &operator<<(std::ostream &os, const freesasa_MCRA &mcra)
{
os << "Atom(" << mcra._model << " " << mcra._chain << " " << mcra._res_num << " [" << mcra._residue << "] " << mcra._atom << ")";
return os;
}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this was only used for debugging?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially for debugging but allows for a nice output of the atom. You can keep it or drop it. I don't think its currently being used anywhere.

@mittinatten
Copy link
Owner Author

So I'm going through everything one last time, and have committed a few minor things. A few things are stylistic. And I found a better solution for handling output. There is an option --output to specify another file than stdout. So I pass a filename instead of a file pointer to the CIF output function, see d609021.

I noticed we are missing docs for CIF output, so I'll fix that too.

@danny305
Copy link
Contributor

danny305 commented May 5, 2021

Awesome! yeah I originally wrote it to output to a file so I am glad you added that additional capability back.

@mittinatten
Copy link
Owner Author

I made one other minor change. RSA-values are output regardless of FREESASA_OUTPUT_SKIP_REL, in the case where there is no reference available the output will then look like this, which I think is still useful.

loop_
_freeSASA_rsa.asym_id
_freeSASA_rsa.seq_id
_freeSASA_rsa.comp_id
_freeSASA_rsa.abs_total
_freeSASA_rsa.rel_total
_freeSASA_rsa.abs_side_chain
_freeSASA_rsa.rel_side_chain
_freeSASA_rsa.abs_main_chain
_freeSASA_rsa.rel_main_chain
_freeSASA_rsa.abs_apolar
_freeSASA_rsa.rel_apolar
_freeSASA_rsa.abs_polar
_freeSASA_rsa.rel_polar
A 1 MET 54.393508 ? 19.085046 ? 35.308462 ? 28.483151 ? 25.910357 ?
A 2 GLN 74.212106 ? 70.132368 ? 4.079738 ? 11.657433 ? 62.554673 ?
...

@danny305
Copy link
Contributor

danny305 commented May 5, 2021

Yeah I made it output RSA regardless bc why not. Its a separate block if you don't need it then ignore it. If values are missing then they will be either ? or .. Which is standard for cif.

I don't see the change you made?

@mittinatten
Copy link
Owner Author

See c204ce7

Copy link
Contributor

@danny305 danny305 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one thing that is missing from all this is where you specify the unit of the freeSASA value. In your original output there is a unit and I did not know where to place that in the output? Maybe it should go in the docs?

@mittinatten
Copy link
Owner Author

Yeah I thought about that too. I think the docs probably makes the most sense, otherwise we would need to add it to all the fields, I assume: _freeSASA_rsa.abs_side_chain_AA2 or _freeSASA_rsa.abs_side_chain_Angstrom^2, or something, which looks kind of strange.

@danny305
Copy link
Contributor

danny305 commented May 5, 2021

Yeah it looks super clunky. I think the docs is a better place.

@mittinatten
Copy link
Owner Author

I'm done for the day, but think we're really close now!

@mittinatten
Copy link
Owner Author

One more thing, if you can think of any CLI tests that we can use to check that everything works and catch regressions later that would be great. At least we should have one simple case where we compare with a stored CIF output so that we get an error if future modifications changes the format by accident.

@danny305
Copy link
Contributor

danny305 commented May 5, 2021

Im not sure I understand? Gemmi writes out the format and enforces it? if we did not follow the cif format then gemmi would error out? Could you elaborate/clarify exactly the kind of test you want?

@mittinatten
Copy link
Owner Author

I was simply thinking of one or more test that runs through the different branches of the CIF output code, so that if something breaks we catch it automatically. For instance if the freesasa_node interface changes somehow in the future, the test suite should trigger an error also for CIF output, to avoid regressions. Also we should check that illegal CLI options are caught with an error message, i.e. --format=cif 1234.pdb and --format=pdb --cif, etc.

Add more regression tests for CIF-related CLI options.
@mittinatten
Copy link
Owner Author

I added a few, that's probably enough for now.

@mittinatten
Copy link
Owner Author

I've been through the code enough times now to be pretty sure this is ok to publish. Thanks so much for your effort, really nice to collaborate!

I will merge it to the dev-branch and add some version information, etc, before merging to master. As mentioned I will release it as 2.1-beta to allow it to mature a little bit more. That means anyone who clones the repository will get the new features, but they won't be in the published docs yet, and we are still free to make minor changes to for example CLI options, etc, if we realize something is inconsistent in the interface.

@mittinatten mittinatten merged commit 2c4a002 into dev May 9, 2021
@mittinatten mittinatten deleted the feature/cif branch October 28, 2021 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants