Add structural bioinformatics #831
@gtauriello is this the updated version of your text or do you need to add more content?
Co-authored-by: Bert Droesbeke <44875756+bedroesb@users.noreply.github.com>
@smza Yes this is the updated text. No more content to be added here.
Thanks. I'll start reviewing it then and come back to you with comments. Cheers
Hi @gtauriello
The content of this section looks good. However, as I pointed out, it only addresses the deposition of models. There is no mention of data management while the models are being created. Is that something you're planning to cover?
| <!--- In this section you should provide a brief overview of the domain from the data management perspective, mentioning and putting into context the challenges that are particular to the domain, which will be the object of sections below. ---> | ||
| Structural bioinformatics provides scientific methods to analyse, predict, and validate the three-dimensional structure of biological macromolecules such as proteins, RNA, DNA, or carbohydrates including small molecules bound to them. It also provides an important link with the genomics and structural biology communities. One objective of structural bioinformatics is the creation of new methods of analysis and manipulation of biological macromolecular data in order to predict their structures, function and interactions. This document describes guidelines to deposit these predictions together with relevant metadata according to FAIR principles. |
How about data management while handling the data being used for predictions and analysis? Shouldn't that be addressed before deposition?
I am not 100% sure I understand the comment. The data that needs to be kept while models are being generated is the same data that we want to have at deposition. Indeed, model predictors need to keep track of it while they do the predictions, since otherwise they may simply not have the data required for the deposition. Should I add a sentence in the intro section to make this more explicit?
E.g. "Note that predictors should collect the relevant metadata already while doing the predictions to make sure the data is available during deposition."?
Indeed, that is what I meant. A bit more detail about the metadata that needs to be generated while the models are being created would help.
Aside from this, I'm also concerned about other aspects of structural bioinformatics that are not included here, e.g. binding site predictions using 3D models. These could be covered in a separate subsection.
@smza I updated the text to clarify the need to collect data already while doing the predictions. From conversations with Sameer, it seems that this is in line with the "data harvesting" process done for experimental structures. He also commented on function predictions as being a sub section that could be added later by someone in his team. Would that be ok?
Thanks @gtauriello. What Sameer has suggested is reasonable.
| ### Considerations <!--- (optional) ---> | ||
| <!--- Direct and concise considerations, structured in bullet points and typically framed as questions RDMkit reader should ask themselves in order to arrive at the best solution among those listed below. One level of nesting of bullet points within considerations is fine, but more levels should be avoided. ---> | ||
| Researchers in the field should be able to find predictions of macromolecular structures, access their coordinates, understand how and why they were produced, and have estimates of model quality to assess the applicability of the model for specific applications. The considerations and solutions described below are written from the perspective of protein structure predictions but they also apply to other types of macromolecular structures. |
This I believe is the description for the sections to follow, right?
Yes this is the description and is accordingly part of the "Description" subsection in the file.
| * Make available using a dedicated web service for large-scale modelling efforts which are updated on a regular basis using automated prediction methods. Unified access to such services can be provided with the [3D-Beacons network](https://3d-beacons.org) which is being developed by the [ELIXIR 3D-BioInfo Community](https://elixir-europe.org/communities/3d-bioinfo). The data providers currently connected in the network are listed in [the 3D-Beacons documentation](https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/docs#partners). An appropriate licence must be associated with the models (check the [RDMkit licensing page](https://rdmkit.elixir-europe.org/licensing) for guidance on this) and must be compatible with CC-BY 4.0 if the models are to be distributed in the 3D-Beacons network. | ||
| * Model coordinates are preferably stored in the standard PDB archive format [PDBx/mmCIF](https://mmcif.wwpdb.org/). While, for many purposes, the legacy PDB format may suffice to store model coordinates and is still widely used, the format is no longer being modified or extended. | ||
| * Model quality estimates can be computed globally, per-residue, and per-residue-pair. The estimates should be computed using a relatively recent and well-benchmarked tool or by the structure prediction method itself. Please check [CAMEO](https://cameo3d.org), [CASP](https://predictioncenter.org), and [CAPRI](https://www.ebi.ac.uk/pdbe/complex-pred/capri/) to find suitable quality estimators. The [3D-BioInfo Community](https://elixir-europe.org/communities/3d-bioinfo) is also currently working to further improve benchmarking for protein complexes, protein-ligand interactions, and nucleic acid structures. By convention, the main per-residue quality estimates are stored in place of B-factors in model coordinate files. In mmCIF files any number of quality estimates can be properly described and stored in the ma_qa_metric category of the PDBx/mmCIF ModelCIF Extension Dictionary described below. | ||
| * Metadata for theoretical models of macromolecular structures can be provided during the deposition in ModelArchive but should preferably be stored using the [PDBx/mmCIF ModelCIF Extension Dictionary](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Index) prior to deposition. The extension is being developed by the [ModelCIF working group](https://wwpdb.org/task/modelcif) with input from the community. Feedback and change requests are welcome and can be given on [github](https://github.com/ihmwg/ModelCIF). ModelArchive includes [documentation](https://modelarchive.org/help) on how to provide metadata and minimal requirements for it. Generally, the metadata must include: |
Move this section up after point 1 since this is description of how to deposit to ModelArchive.
The description here relates to the ModelCIF format in general. The format is not only used by ModelArchive but is generally the preferred way to store the listed metadata. That being said, we can reorder the bullet points as desired, although I am not sure where "point 1" would be in this context. Can you elaborate on this?
By Point 1 I meant the first point in this section.
The way this section is written gives the impression that it addresses the formats for ModelArchive. That is why I suggested placing it directly after the first point describing ModelArchive. If you think it is better placed where it is, then adding a line describing the general usability of the ModelCIF format would be helpful.
Indeed, that text was too ModelArchive-centric. I rephrased it to clarify the general use of ModelCIF.
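For readers of this thread, the convention mentioned in the quality-estimates bullet (main per-residue estimates stored in place of B-factors) can be illustrated with a minimal Python sketch. It assumes a legacy PDB-format file with standard fixed-width `ATOM` records, where the B-factor field occupies columns 61-66. The function names `make_atom_line` and `per_residue_scores` and the example scores are hypothetical and only serve the illustration; the sketch does not cover mmCIF files or the `ma_qa_metric` category.

```python
def make_atom_line(serial, atom, res_name, chain_id, res_num, score):
    """Build one fixed-width PDB ATOM record (hypothetical helper).

    Coordinates are dummy zeros and occupancy is 1.00; the quality
    estimate is written into the B-factor field (columns 61-66).
    """
    return (
        f"ATOM  {serial:>5} {atom:<4} {res_name:>3} {chain_id}{res_num:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{score:6.2f}"
    )


def per_residue_scores(pdb_text):
    """Return {(chain_id, residue_number): score} read from the B-factor column.

    Assumes every atom of a residue carries the same estimate, so the
    value of the first atom seen for each residue is kept.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK and other records
        chain_id = line[21]             # column 22
        res_num = int(line[22:26])      # columns 23-26
        score = float(line[60:66])      # columns 61-66 (B-factor field)
        scores.setdefault((chain_id, res_num), score)
    return scores


# Two residues with two atoms each; the scores are made-up example values.
example = "\n".join([
    make_atom_line(1, "N", "MET", "A", 1, 91.5),
    make_atom_line(2, "CA", "MET", "A", 1, 91.5),
    make_atom_line(3, "N", "LYS", "A", 2, 42.25),
    make_atom_line(4, "CA", "LYS", "A", 2, 42.25),
])
scores = per_residue_scores(example)
print(scores)
```

The scale and meaning of the stored values depend on the quality estimator used, which is one reason why the ModelCIF route, where each estimate is described explicitly, is preferable when several estimates are available.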
Restricting guidelines to structure predictions until a section on functional predictions is added.
Added coordinator Co-authored-by: Bert Droesbeke <44875756+bedroesb@users.noreply.github.com>
@gtauriello one last thing on my side: Feel free to add some metadata of the contributors to the CONTRIBUTORS file. Contributors will get listed anyway.
Added page for Structural Bioinformatics domain as discussed in email exchange with Munazah Andrabi (@smza).
Updated table with tool and resource list at https://docs.google.com/spreadsheets/d/16RESor_qQ_ygI0lQYHR23kbZJUobOWZUbOwhJbLptDE/edit#gid=268211668 using "struct bioinfo" as page ID. From the existing entries, the text of ModelArchive (maintained by my team) was updated as well and new entries were added at the end of the table.