Add structural bioinformatics #831

Merged
merged 7 commits into from
Feb 18, 2022

Conversation

gtauriello
Contributor

Added page for Structural Bioinformatics domain as discussed in email exchange with Munazah Andrabi (@smza).

Updated table with tool and resource list at https://docs.google.com/spreadsheets/d/16RESor_qQ_ygI0lQYHR23kbZJUobOWZUbOwhJbLptDE/edit#gid=268211668 using "struct bioinfo" as page ID. From the existing entries, the text of ModelArchive (maintained by my team) was updated as well and new entries were added at the end of the table.

@smza
Collaborator

smza commented Feb 11, 2022

@gtauriello is this the updated version of your text or do you need to add more content?

Co-authored-by: Bert Droesbeke <44875756+bedroesb@users.noreply.github.com>
@gtauriello
Contributor Author

@smza Yes this is the updated text. No more content to be added here.

@smza
Collaborator

smza commented Feb 11, 2022

> @smza Yes this is the updated text. No more content to be added here.

Thanks. I'll start reviewing it then and come back to you with comments. Cheers

Collaborator

@smza smza left a comment

Hi @gtauriello

The content of this section looks good. However, as I pointed out, this only addresses the deposition of models. There is no mention of data management while the models are being created. Is that something you're planning to add?

## Introduction

<!--- In this section you should provide a brief overview of the domain from the data management perspective, mentioning and putting into context the challenges that are particular to the domain, which will be the object of sections below. --->

Structural bioinformatics provides scientific methods to analyse, predict, and validate the three-dimensional structure of biological macromolecules such as proteins, RNA, DNA, or carbohydrates including small molecules bound to them. It also provides an important link with the genomics and structural biology communities. One objective of structural bioinformatics is the creation of new methods of analysis and manipulation of biological macromolecular data in order to predict their structures, function and interactions. This document describes guidelines to deposit these predictions together with relevant metadata according to FAIR principles.
Collaborator

How about data management while handling the data being used for predictions and analysis? Shouldn't that be addressed before deposition?

Contributor Author

I am not 100% sure I understand the comment. The data that needs to be kept while models are being generated is the same data that we want to have during deposition. Indeed, model predictors will need to keep track of those things while they do the predictions, since otherwise they may simply not have the data required for the deposition. Should I add a sentence in the intro section to make this more explicit?
E.g. "Note that predictors should collect the relevant metadata already while doing the predictions to make sure the data is available during deposition."?

Collaborator

Indeed, that is what I meant. A bit more detail about the metadata that needs to be generated while the models are being created would help.

Collaborator

@smza smza Feb 14, 2022

Aside from this, I'm also concerned about other aspects of Structural Bioinformatics that are not included here, e.g. binding site predictions using 3D models. These can be covered in a separate subsection.

Contributor Author

@smza I updated the text to clarify the need to collect data already while doing the predictions. From conversations with Sameer, it seems that this is in line with the "data harvesting" process done for experimental structures. He also commented that function predictions could be a subsection added later by someone in his team. Would that be ok?

Collaborator

Thanks @gtauriello. What Sameer has suggested is reasonable.


### Considerations <!--- (optional) --->
<!--- Direct and concise considerations, structured in bullet points and typically framed as questions RDMkit reader should ask themselves in order to arrive at the best solution among those listed below. One level of nesting of bullet points within considerations is fine, but more levels should be avoided. --->
Researchers in the field should be able to find predictions of macromolecular structures, access their coordinates, understand how and why they were produced, and have estimates of model quality to assess the applicability of the model for specific applications. The considerations and solutions described below are written from the perspective of protein structure predictions but they also apply to other types of macromolecular structures.
Collaborator

This, I believe, is the description for the sections to follow, right?

Contributor Author

Yes, this is the description, and it is accordingly part of the "Description" subsection in the file.

* Make models available using a dedicated web service for large-scale modelling efforts that are updated on a regular basis using automated prediction methods. Unified access to such services can be provided with the [3D-Beacons network](https://3d-beacons.org), which is being developed by the [ELIXIR 3D-BioInfo Community](https://elixir-europe.org/communities/3d-bioinfo). The data providers currently connected in the network are listed in [the 3D-Beacons documentation](https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/docs#partners). An appropriate licence must be associated with the models (check the [RDMkit licensing page](https://rdmkit.elixir-europe.org/licensing) for guidance on this) and must be compatible with CC-BY 4.0 if the models are to be distributed in the 3D-Beacons network.
* Model coordinates are preferably stored in the standard PDB archive format [PDBx/mmCIF](https://mmcif.wwpdb.org/). While, for many purposes, the legacy PDB format may suffice to store model coordinates and is still widely used, the format is no longer being modified or extended.
* Model quality estimates can be computed globally, per-residue, and per-residue-pair. The estimates should be computed using a relatively recent and well-benchmarked tool or by the structure prediction method itself. Please check [CAMEO](https://cameo3d.org), [CASP](https://predictioncenter.org), and [CAPRI](https://www.ebi.ac.uk/pdbe/complex-pred/capri/) to find suitable quality estimators. The [3D-BioInfo Community](https://elixir-europe.org/communities/3d-bioinfo) is also currently working to further improve benchmarking for protein complexes, protein-ligand interactions, and nucleic acid structures. By convention, the main per-residue quality estimates are stored in place of B-factors in model coordinate files. In mmCIF files any number of quality estimates can be properly described and stored in the ma_qa_metric category of the PDBx/mmCIF ModelCIF Extension Dictionary described below.
* Metadata for theoretical models of macromolecular structures can be provided during the deposition in ModelArchive but should preferably be stored using the [PDBx/mmCIF ModelCIF Extension Dictionary](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Index) prior to deposition. The extension is being developed by the [ModelCIF working group](https://wwpdb.org/task/modelcif) with input from the community. Feedback and change requests are welcome and can be given on [GitHub](https://github.com/ihmwg/ModelCIF). ModelArchive includes [documentation](https://modelarchive.org/help) on how to provide metadata and minimal requirements for it. Generally, the metadata must include:
Collaborator

Move this section up after point 1 since this is description of how to deposit to ModelArchive.

Contributor Author

The description here relates to the ModelCIF format in general. The format is not only used by ModelArchive but is generally the preferred way to store the listed metadata. That being said, we can reorder the bullet points as desired, although I am not sure where "point 1" would be in this context. Can you elaborate on this?

Collaborator

By Point 1 I meant the first point in this section.
The way this section is written gives the impression that the formats for ModelArchive are being addressed in this section. That is why I suggested this to be in continuation of the first point describing ModelArchive. If you think this is better placed where it is then maybe adding a line describing the usability of ModelCIF format will be helpful.

Contributor Author

Indeed, that text was too ModelArchive-centric. I rephrased it to clarify the general use of ModelCIF.

Collaborator

Great. Thanks
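
For readers of this thread, the convention mentioned in the quality-estimate bullet (the main per-residue estimate stored in place of the B-factor) can be illustrated with a minimal sketch. It is not code from this pull request: it assumes legacy PDB-format ATOM records, where the B-factor occupies columns 61-66, and the helper names `format_atom` and `per_residue_quality`, the coordinates, and the score values are all hypothetical.

```python
def format_atom(serial, name, res_name, chain, res_seq, xyz, quality):
    """Build one fixed-width legacy PDB ATOM record (columns 1-66).

    Hypothetical helper: the quality estimate is written into the
    B-factor field (columns 61-66), following the convention above.
    Atom names are assumed to be at most three characters long.
    """
    x, y, z = xyz
    occupancy = 1.0
    return (
        f"ATOM  {serial:>5}  {name:<3} {res_name:>3} {chain}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{quality:6.2f}"
    )


# Invented two-residue fragment; the scores are illustrative values only.
SAMPLE_MODEL = "\n".join([
    format_atom(1, "N", "MET", "A", 1, (27.340, 24.430, 2.614), 91.25),
    format_atom(2, "CA", "MET", "A", 1, (26.266, 25.413, 2.842), 91.25),
    format_atom(3, "N", "LYS", "A", 2, (26.913, 26.639, 3.531), 63.10),
    format_atom(4, "CA", "LYS", "A", 2, (27.886, 26.463, 4.263), 63.10),
])


def per_residue_quality(pdb_text):
    """Return {(chain, residue number, residue name): quality estimate}.

    Reads the B-factor field of each ATOM record and keeps the first
    value seen per residue, assuming all atoms of a residue carry the
    same estimate.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain = line[21]                  # column 22
        res_seq = int(line[22:26])        # columns 23-26
        res_name = line[17:20].strip()    # columns 18-20
        quality = float(line[60:66])      # columns 61-66 (B-factor field)
        scores.setdefault((chain, res_seq, res_name), quality)
    return scores


if __name__ == "__main__":
    for residue, quality in per_residue_quality(SAMPLE_MODEL).items():
        print(residue, quality)
```

The B-factor field holds a single number per atom, which is why the bullet points to the ma_qa_metric category for files that need to carry several quality estimates with proper descriptions.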

@bedroesb bedroesb linked an issue Feb 14, 2022 that may be closed by this pull request
Restricting guidelines to structure predictions until a section on functional predictions is added.
gtauriello and others added 2 commits February 17, 2022 12:18
Added coordinator

Co-authored-by: Bert Droesbeke <44875756+bedroesb@users.noreply.github.com>
@bedroesb
Member

@gtauriello one last thing on my side:

Feel free to add some metadata of the contributors to the CONTRIBUTORS file. Contributors will get listed anyway.

@smza smza merged commit cce8eef into elixir-europe:master Feb 18, 2022
@gtauriello gtauriello deleted the add_structural_bioinformatics branch February 18, 2022 17:10
Development

Successfully merging this pull request may close these issues.

New page - Structural Bioinformatics
3 participants