Metadata management for the National Microbiome Data Collaborative

Warning

This repository is deprecated. Please file issues in the following repositories instead:

NMDC Schema for schema changes/issues.
NMDC Ontology for issues related to producing terms or term subsets.
NMDC Runtime for ETL-related issues.


The purpose of this repository is to manage metadata for the National Microbiome Data Collaborative (NMDC). The NMDC is a multi-organizational effort to integrate microbiome data across diverse areas in medicine, agriculture, bioenergy, and the environment. This integrated platform facilitates comprehensive discovery of and access to multidisciplinary microbiome data in order to unlock new possibilities with microbiome data science.

Tasks managed by the repository are:

Generating the schema
Deploying the documentation
Integrating metadata from multiple environmental data repositories

Background

The NMDC Introduction to metadata and ontologies primer describes the context for this project.

Schema

See the slides describing the schema

The NMDC schema is used during the translation process to specify how metadata elements are related.
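To illustrate what "how metadata elements are related" can mean in practice, here is a minimal sketch that assumes a drastically simplified model: a Study class, a Biosample class, and a part_of slot on Biosample holding Study ids. These names and the ex: ids are hypothetical stand-ins, not the actual NMDC schema definitions. The sketch shows one check a relationship enables: finding biosamples that point to studies that do not exist.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical, heavily simplified classes; the real NMDC schema defines
# many more classes and slots than shown here.
@dataclass
class Study:
    id: str
    name: str


@dataclass
class Biosample:
    id: str
    name: str
    # Hypothetical relationship slot: ids of the Study records this sample belongs to.
    part_of: list[str] = field(default_factory=list)


def dangling_references(studies, biosamples):
    """Return (biosample id, study id) pairs whose study id is not among the known studies."""
    known = {s.id for s in studies}
    return [
        (b.id, ref)
        for b in biosamples
        for ref in b.part_of
        if ref not in known
    ]
```

For example, a biosample whose part_of lists "ex:study9" would be reported if no study with that id has been loaded. Expressing relationships as explicit slots is what makes such referential checks possible during translation.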

The schema is also available as:

Documentation

Documentation for the NMDC schema can be browsed here:

https://microbiomedata.github.io/nmdc-metadata/

NMDC data

A zipped file of the NMDC data (JSON format) can be downloaded here.
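A minimal sketch of loading such a download with only the Python standard library. It assumes the archive contains one or more members ending in .json; the archive's actual file names and internal layout are not described here, so the function simply parses every JSON member and keys the results by member name.

```python
import json
import zipfile


def load_zipped_json(path):
    """Parse every .json member of a zip archive into a dict keyed by member name.

    `path` may be a filesystem path or a binary file-like object.
    Non-JSON members (e.g. a README) are skipped.
    """
    records = {}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.endswith(".json"):
                with archive.open(name) as fh:
                    # json.load accepts the binary stream returned by ZipFile.open.
                    records[name] = json.load(fh)
    return records
```

Usage, with a hypothetical local file name: `data = load_zipped_json("nmdc_data.zip")`. For very large members, a streaming JSON parser would be a better choice than loading everything into memory.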

Mapping resource

We use SSSOM to map fields in primary data sources to standard terms. The mapping between the GOLD data and MIxS terms is provided in this SSSOM file.
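An SSSOM file in its TSV serialization is a tab-separated table (columns such as subject_id, predicate_id, object_id and mapping_justification), optionally preceded by a metadata block whose lines start with "#". The sketch below reads such text and extracts exact matches as a lookup table. The gold:/mixs: identifiers in the usage note are placeholders, not entries from the actual mapping file.

```python
import csv


def read_sssom_tsv(text):
    """Return mapping rows from SSSOM/TSV text as dicts.

    Lines starting with '#' belong to the embedded metadata block and are skipped;
    the first remaining line is treated as the column header.
    """
    body = [line for line in text.splitlines() if not line.startswith("#")]
    return list(csv.DictReader(body, delimiter="\t"))


def exact_matches(rows):
    """Map subject_id -> object_id for rows that assert skos:exactMatch."""
    return {
        row["subject_id"]: row["object_id"]
        for row in rows
        if row["predicate_id"] == "skos:exactMatch"
    }
```

With a row such as `gold:field_a  skos:exactMatch  mixs:term_a`, `exact_matches(read_sssom_tsv(text))` yields `{"gold:field_a": "mixs:term_a"}`. Filtering on the predicate keeps looser mappings (for example skos:closeMatch) out of automatic field renaming.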

Standardization of characteristics

Entities in the schema are annotated with characteristics. When possible, we use standard terminologies and ontologies to define these characteristics. These standards include:

We are actively involved in updating the MIxS standards (mixs-ng) and creating an RDF version of MIxS (mixs-rdf).

See also our analysis of MIxS descriptors
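One practical consequence of using ontologies for characteristics is that term-valued fields can be checked mechanically. The sketch below assumes characteristic values are written as CURIEs (prefix:local_id) and that each term-valued characteristic should draw from one ontology. The EXPECTED_PREFIX table is a hypothetical illustration built on the MIxS environmental fields env_broad_scale, env_local_scale and env_medium, which are commonly filled with ENVO terms; the ids in the usage note are placeholders.

```python
import re

# Hypothetical table for this sketch: characteristic name -> expected ontology prefix.
EXPECTED_PREFIX = {
    "env_broad_scale": "ENVO",
    "env_local_scale": "ENVO",
    "env_medium": "ENVO",
}

# A simple CURIE pattern: a prefix, a colon, then a non-empty local id without spaces.
CURIE = re.compile(r"^([A-Za-z][A-Za-z0-9_.-]*):(\S+)$")


def check_characteristics(annotations):
    """Return a list of problems found in a {characteristic: value} dict."""
    problems = []
    for slot, value in annotations.items():
        expected = EXPECTED_PREFIX.get(slot)
        if expected is None:
            continue  # not a term-valued characteristic in this sketch
        match = CURIE.match(value)
        if match is None:
            problems.append(f"{slot}: {value!r} is not a CURIE")
        elif match.group(1) != expected:
            problems.append(f"{slot}: expected prefix {expected}, got {value}")
    return problems
```

For example, `check_characteristics({"env_medium": "soil"})` reports that free text was used where an ontology term was expected, while a value such as "ENVO:00000000" passes the prefix check. A fuller validator would also confirm that the term actually exists in the ontology.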

Metadata sources

At present, we ingest metadata from the Joint Genome Institute (JGI) and the Environmental Molecular Sciences Lab (EMSL).

The NMDC schema and translation process will be modified as more metadata sources become available.

Metadata integration

We use Jupyter notebooks to integrate the metadata sources. This allows us to iterate quickly in a transparent and interactive manner as new metadata sources become available.

Development of a more comprehensive ETL pipeline will progress as the metadata sources and schema become more concrete.
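The notebooks themselves are not reproduced here, but the core integration step can be sketched as mapping-driven translation: each source gets a field map from its own field names to common slot names, and every record is renamed accordingly. The source names follow the text (JGI, EMSL); the field names and maps are hypothetical. Unmapped fields are reported rather than silently dropped, which makes gaps visible while iterating on a new source.

```python
def translate(record, field_map):
    """Rename mapped fields; return (translated record, sorted list of unmapped fields)."""
    translated = {field_map[k]: v for k, v in record.items() if k in field_map}
    unmapped = sorted(k for k in record if k not in field_map)
    return translated, unmapped


def integrate(sources):
    """Translate records from several sources into one list.

    `sources` maps a source name to a (field_map, records) pair. Each output
    record is tagged with its source and, if needed, its unmapped fields.
    """
    out = []
    for source_name, (field_map, records) in sources.items():
        for rec in records:
            translated, unmapped = translate(rec, field_map)
            translated["source"] = source_name
            if unmapped:
                translated["unmapped_fields"] = unmapped
            out.append(translated)
    return out
```

Usage with hypothetical maps: `integrate({"JGI": ({"sample_name": "name"}, jgi_records), "EMSL": ({"label": "name"}, emsl_records)})`. Adding a new source then means adding a field map, not changing the translation code.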

Identifiers

See identifiers documentation

Folders and files

.github/workflows
docs
examples
identify
scratch
scripts
.gitignore
Installation.md
LICENSE.md
Makefile
Pipfile
README.md
__init__.py
_config.yml
mkdocs.yml
pybuild.sh
requirements-dev.txt
requirements.txt
update-docs.md
