Skip to content

MDAnalysis/MDAnalysisData

Repository files navigation

MDAnalysisData

Build Status codecov docs PRs welcome Anaconda-Server Badge DOI

Access to data for workshops and extended tests of MDAnalysis.

Data sets are stored at external stable URLs (e.g., on figshare, zenodo, or DataDryad) and this package provides a simple interface to download, cache, and access data sets.

Installation

To use, install the package

pip install --upgrade MDAnalysisData

or install with conda

conda install --channel conda-forge mdanalysisdata

Accessing data sets

Import the datasets and access your data set of choice:

from MDAnalysisData import datasets

adk = datasets.fetch_adk_equilibrium()

The returned object contains attributes with the paths to topology and trajectory files so that you can use it directly with, for instance, MDAnalysis:

import MDAnalysis as mda
u = mda.Universe(adk.topology, adk.trajectory)

The metadata object also contains a DESCR attribute with a description of the data set, including relevant citations:

print(adk.DESCR)

Managing data

Data are locally stored in the data directory ~/MDAnalysis_data (i.e., in the user's home directory). This location can be changed by setting the environment variable MDANALYSIS_DATA, for instance

export MDANALYSIS_DATA=/tmp/MDAnalysis_data

The location of the data directory can be obtained with

MDAnalysisData.base.get_data_home()

If the data directory is removed then data are downloaded again. Data file integrity is checked with a SHA256 checksum when the file is downloaded.

The data directory can we wiped with the function

MDAnalysisData.base.clear_data_home()

Contributing new datasets

Please add new datasets to MDAnalysisData. See Contributing new datasets for details, but in short:

  1. raise an issue in the issue tracker describing what you want to add; this issue will become the focal point for discussions where the developers can easily give advice
  2. deposit data in an archive under an Open Data compatible license (CC0 or CC-BY preferred)
  3. write accessor code in MDAnalysisData

Credits

This package is modelled after sklearn.datasets. It uses code from sklearn.datasets (under the BSD 3-clause license).

No data are included; please see the DESCR attribute for each data set for authorship, citation, and license information for the data.