
marcellszi/rna3db


RNA3DB

A dataset of non-redundant RNA structures from the PDB. RNA3DB contains:

  • All RNA chains in the PDB, labelled with non-coding RNA families
  • Non-redundant clustering of the above chains, suitable for training and benchmarking deep learning models
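One way to sanity-check a cluster-level split is to confirm that no cluster contributes chains to both the training and test sets. The Python sketch below assumes you have already built a mapping from chain ID to cluster ID. The function name `shared_clusters` and the chain and cluster IDs in the example are hypothetical and are not part of RNA3DB's API.

```python
from collections.abc import Iterable, Mapping


def shared_clusters(
    cluster_of: Mapping[str, str],
    train_chains: Iterable[str],
    test_chains: Iterable[str],
) -> set[str]:
    """Return the clusters that have chains in both the training and test sets.

    An empty result means no cluster is split across the two sets, so no test
    chain has a same-cluster (redundant) counterpart in the training set.
    """
    train_clusters = {cluster_of[chain] for chain in train_chains}
    test_clusters = {cluster_of[chain] for chain in test_chains}
    return train_clusters & test_clusters
```

For example, with the hypothetical mapping `{"1abc_A": "c1", "2def_B": "c1", "3ghi_C": "c2"}`, putting `1abc_A` in training and `2def_B` in testing returns `{"c1"}`, which flags a leak.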

Getting started

We provide periodically updated versions of RNA3DB in JSON format, along with the outputs of several intermediate steps used to generate these files.
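As a minimal sketch of reading one of these JSON files in Python, the code below assumes a nested layout of set, then component, then cluster, then chain (for example, a top-level `"train_set"` key). That layout, the file name `split.json` and the helper names `load_split` and `iter_chains` are assumptions for illustration only; check the actual files or RNA3DB's Wiki for the real schema.

```python
import json
from collections.abc import Iterator
from typing import Any


def load_split(path: str) -> dict[str, Any]:
    """Load an RNA3DB JSON file into a Python dict."""
    with open(path) as fh:
        return json.load(fh)


def iter_chains(split: dict[str, Any], set_name: str) -> Iterator[tuple[str, str, str]]:
    """Yield (component, cluster, chain_id) for every chain in one set.

    ASSUMPTION: split[set_name] has the shape
    {component: {cluster: {chain_id: metadata, ...}, ...}, ...}.
    """
    for component, clusters in split[set_name].items():
        for cluster, chains in clusters.items():
            for chain_id in chains:
                yield component, cluster, chain_id
```

Usage, with the hypothetical file name: `split = load_split("split.json")`, then `for component, cluster, chain in iter_chains(split, "train_set"): ...`.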

Additionally, as part of RNA3DB, we release the results of an Infernal homology search on all RNA chains found in the PDB. For a short demonstration of how RNA3DB can be used to parse these files, see tabular_demo.
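For readers who want to inspect the raw files without RNA3DB's own tooling, here is a minimal standalone parser sketch. It assumes the files use Infernal's default `cmscan --tblout` layout (format 1: 18 whitespace-separated columns, the last being a free-text description); files written with `--fmt 2` have a different column layout. The `Hit` class and `parse_tblout` function are hypothetical helpers, not RNA3DB's API; tabular_demo shows the supported route.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass
class Hit:
    family: str     # CM (model) name, e.g. an Rfam family name
    accession: str  # CM accession, e.g. an Rfam ID such as RF00001
    query: str      # name of the searched sequence (here, a PDB chain)
    seq_from: int
    seq_to: int
    strand: str
    score: float    # bit score
    evalue: float
    inc: bool       # True if the hit passed the inclusion threshold ("!")


def parse_tblout(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse cmscan --tblout output, ASSUMING Infernal's default format 1."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header/footer
        cols = line.split(maxsplit=17)  # column 18 (description) may contain spaces
        yield Hit(
            family=cols[0],
            accession=cols[1],
            query=cols[2],
            seq_from=int(cols[7]),
            seq_to=int(cols[8]),
            strand=cols[9],
            score=float(cols[14]),
            evalue=float(cols[15]),
            inc=cols[16] == "!",
        )
```

Because `parse_tblout` accepts any iterable of lines, an open file handle can be passed directly, e.g. `with open(path) as fh: hits = list(parse_tblout(fh))`.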

For more general help getting started, see RNA3DB's Wiki.

Download

The latest version of RNA3DB can be found under releases.

We provide the following files:

  • rna3db-cmscans.tar.gz [Download]
    • Results of a two-step Infernal homology search on all RNA chains in the PDB
    • See tabular_demo...
  • rna3db-jsons.tar.gz [Download]
    • All JSON files generated by RNA3DB
  • rna3db-mmcifs.tar.xz [Download]
    • Hierarchical folders of the training/testing sets containing single-chain PDBx/mmCIF files
    • Most convenient for getting started with training and testing using RNA3DB
    • This format is currently experimental. If you find any problems, please submit an issue.
    • Note: rna3db-mmcifs.v2.tar.xz was compressed using LZMA. Most installations of GNU tar can decompress it without issue; if yours cannot, you may need to install XZ Utils.
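If you prefer to unpack the archives from Python rather than GNU tar, the standard-library `tarfile` module can read both the `.tar.gz` and `.tar.xz` files; its `"r:*"` mode detects gzip or xz (LZMA) compression automatically. The helpers below (`extract_archive`, `list_cif_files`) are a hypothetical sketch, not part of RNA3DB, and assume nothing about the folder hierarchy beyond the structure files ending in `.cif`.

```python
from __future__ import annotations

import tarfile
from pathlib import Path


def extract_archive(archive: str | Path, dest: str | Path) -> Path:
    """Unpack a .tar.gz or .tar.xz archive into dest and return dest."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile choose gzip or xz (LZMA) decompression itself.
    with tarfile.open(archive, mode="r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter blocks absolute paths and
            # members that would escape dest.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


def list_cif_files(root: str | Path) -> list[Path]:
    """Return every *.cif file anywhere below root, sorted by path."""
    return sorted(Path(root).rglob("*.cif"))
```

For example, `list_cif_files(extract_archive("rna3db-mmcifs.tar.xz", "rna3db-mmcifs"))` would unpack the mmCIF release into a local folder (the destination name is arbitrary) and list every single-chain structure file in it.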

Generating the dataset from scratch

If you wish to build your own dataset from scratch, please see Building RNA3DB from scratch.