MESH

Python package for working with the MESH (Medical Subject Headings) dataset. The package currently focuses on the Chemicals and Drugs category of MESH and integrates the SMILES strings and InChI keys available in the associated PubChem database.

Installation

The package is not yet available on PyPI. To install it, clone the repository and install it with pip:

git clone https://github.com/LucaCappelletti94/mesh.git
cd mesh
pip install .

Usage

The package provides two main functionalities: downloading a pre-built MESH dataset and generating a custom MESH dataset. Once you have the dataset, you can use the Dataset class to work with it.

Downloading a pre-built MESH dataset

While this package lets you build a custom MESH dataset, building one requires considerable resources, so we also provide pre-built datasets hosted on Zenodo. Every hosted tarball has the following structure:

mesh_chemistry_2024.tar.gz
├── chemicals.csv
├── descriptors.csv
├── chemicals_to_descriptors.csv
├── mesh_dag.csv
└── metadata.json

The files contain the following (example rows for each are shown in the Resulting CSVs section):

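chemicals.csv contains the chemicals and drugs (MESH supplementary concept records, with identifiers starting with C), together with their PubChem identifiers, SMILES, InChI and InChI keys where available.
descriptors.csv contains the MESH descriptors (identifiers starting with D), together with their PubChem identifiers, SMILES and InChI keys where available.
chemicals_to_descriptors.csv maps each chemical to the descriptors it is linked to.
mesh_dag.csv contains the edges of the Directed Acyclic Graph (DAG) of the MESH hierarchy.
metadata.json records the settings and source files used to build the dataset.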
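As a rough sketch of reading these files without the package, the helper below loads each CSV into a list of dictionaries with Python's standard csv module. It assumes the files are comma-separated with a header row and sit together in one extracted directory; load_tables is a hypothetical name, not part of the mesh API.

```python
import csv
from pathlib import Path

# The four CSV files listed above (metadata.json is JSON and is read separately).
CSV_FILES = ("chemicals", "descriptors", "chemicals_to_descriptors", "mesh_dag")


def load_tables(directory: str) -> dict[str, list[dict[str, str]]]:
    """Read each CSV of an extracted dataset into a list of row dictionaries.

    Hypothetical helper: assumes comma-separated files with a header row.
    Empty cells come back as empty strings (missing trailing fields as None).
    """
    tables: dict[str, list[dict[str, str]]] = {}
    for name in CSV_FILES:
        path = Path(directory) / f"{name}.csv"
        with path.open(newline="", encoding="utf-8") as handle:
            tables[name] = list(csv.DictReader(handle))
    return tables


# Example usage, assuming the dataset was extracted to ./mesh_chemistry_2024:
# tables = load_tables("mesh_chemistry_2024")
# print(len(tables["chemicals"]), "chemicals")
```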

To download a pre-built dataset, you can use the following code:

from mesh import Dataset

dataset = Dataset.load("mesh_chemistry_2024")

Find the available rasterized datasets on Zenodo.
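If you download a tarball from Zenodo by hand rather than through Dataset.load, the standard library is enough to unpack it. The sketch below assumes a gzip-compressed archive such as mesh_chemistry_2024.tar.gz; extract_mesh_tarball is a hypothetical helper, not part of the package.

```python
import tarfile
from pathlib import Path


def extract_mesh_tarball(tarball: str, destination: str) -> list[str]:
    """Extract a gzip-compressed MESH tarball and return the file names it contained.

    Hypothetical helper, not part of the mesh package.
    """
    destination_path = Path(destination)
    destination_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as archive:
        # Record regular files only; directory entries are skipped.
        names = [member.name for member in archive.getmembers() if member.isfile()]
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that reject unsafe paths.
            archive.extractall(destination_path, filter="data")
        else:
            archive.extractall(destination_path)
    return sorted(names)


# Example usage (assumes the tarball was downloaded to the working directory):
# print(extract_mesh_tarball("mesh_chemistry_2024.tar.gz", "mesh_chemistry_2024"))
```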

Here are some statistics for the rasterized MESH datasets, all built with the settings described in the next section:

Version name	Number of nodes	Number of edges	Number of chemicals	Number of descriptors
MESH 2024	334220	367694	323679	10542
MESH 2023	332999	365801	322591	10409
MESH 2022	330106	364653	319739	10367
MESH 2021	328884	363505	318391	10325
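The table also shows how much each release grows. A small worked example with the figures copied from the table above (yearly_growth is only an illustration, not part of the package):

```python
# Statistics copied from the table above:
# version -> (nodes, edges, chemicals, descriptors)
STATS = {
    2021: (328884, 363505, 318391, 10325),
    2022: (330106, 364653, 319739, 10367),
    2023: (332999, 365801, 322591, 10409),
    2024: (334220, 367694, 323679, 10542),
}
COLUMNS = ("nodes", "edges", "chemicals", "descriptors")


def yearly_growth(stats: dict[int, tuple[int, int, int, int]]) -> dict[int, dict[str, int]]:
    """For each version after the first, return the change from the previous version."""
    versions = sorted(stats)
    growth = {}
    for previous, current in zip(versions, versions[1:]):
        growth[current] = {
            column: stats[current][i] - stats[previous][i]
            for i, column in enumerate(COLUMNS)
        }
    return growth


for version, delta in yearly_growth(STATS).items():
    print(version, delta)
```

For example, between 2023 and 2024 the chemicals count grows by 1,088 and the descriptors count by 133.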

Generating a custom MESH dataset

To build a custom dataset, describe what it should contain with a DatasetSettings object and pass it to Dataset.build. DatasetSettings selects the MESH version and the categories to include, while ChemicalsAndDrugsSettings selects which parts of the Chemicals and Drugs category to include.

Particularly helpful is the ability to include SMILES and InChI keys for chemicals and drugs, which you enable by calling the include_smiles and include_inchi_keys methods of the ChemicalsAndDrugsSettings class.

from mesh.settings import DatasetSettings, ChemicalsAndDrugsSettings
from mesh import Dataset


def build_mesh_chemistry_2024() -> Dataset:
    """Build MESH 2024 dataset."""
    # First, we need to define the settings for the dataset.
    cad: ChemicalsAndDrugsSettings = (
        ChemicalsAndDrugsSettings()
        # In this case, we include all submodules of the
        # Chemicals and Drugs category.
        .include_all_submodules()
        # We also want to include SMILES, which we obtain from the
        # PubChem database.
        .include_smiles()
        # Analogously, we want to include InChI keys, which we obtain
        # from the PubChem database.
        .include_inchi_keys()
    )
    settings = (
        # We are using the MESH 2024 version.
        DatasetSettings(version=2024)
        # We want to retrieve data only regarding chemicals and drugs.
        .include_chemicals_and_drugs(cad)
        # And we want to print the progress of the dataset retrieval.
        .set_verbose(True)
    )
    # Now, we build the dataset. This will download the necessary files
    # and rasterize the dataset.
    dataset = Dataset.build(settings)
    return dataset


if __name__ == "__main__":
    # We build the MESH 2024 dataset.
    mesh_chemistry_2024: Dataset = build_mesh_chemistry_2024()
    # And we save it to disk.
    mesh_chemistry_2024.save("mesh_chemistry_2024", tarball=False)
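The example passes tarball=False, so the files stay in a plain directory. If you later want a single archive shaped like the hosted ones, a standard-library sketch follows. It assumes the files should sit at the top level of the archive, and pack_directory is a hypothetical helper; presumably calling save with tarball=True is the simpler route.

```python
import tarfile
from pathlib import Path


def pack_directory(directory: str, tarball: str) -> list[str]:
    """Pack the files of a saved dataset directory into a gzip-compressed tarball.

    Hypothetical helper: files are stored at the top level of the archive,
    without an enclosing folder. Returns the archived file names.
    """
    files = sorted(path for path in Path(directory).iterdir() if path.is_file())
    with tarfile.open(tarball, "w:gz") as archive:
        for path in files:
            archive.add(path, arcname=path.name)
    return [path.name for path in files]


# Example usage, after saving with tarball=False as above:
# pack_directory("mesh_chemistry_2024", "mesh_chemistry_2024.tar.gz")
```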

Resulting CSVs

Because the example above calls save with tarball=False, the files are written to the mesh_chemistry_2024 directory. It contains the following files (a few example rows are shown for each):

`chemicals.csv`

unique_identifier	name	compound_id	substance_id	smiles	inchi	inchikey
C000002	bevonium	31800.0	500762995.0	C[N+]1(CCCCC1COC(=O)C(C2=CC=CC=C2)(C3=CC=CC=C3)O)C	InChI=1S/C22H28NO3/c1-23(2)16-10-9-15-20(23)17-26-21(24)22(25,18-11-5-3-6-12-18)19-13-7-4-8-14-19/h3-8,11-14,20,25H,9-10,15-17H2,1-2H3/q+1	UHUMRJKDOOEQIG-UHFFFAOYSA-N
C000009	N-acetylglucosaminylasparagine	123826.0	500203198.0	CC(=O)N[C@@H]1C@HO	InChI=1S/C12H21N3O8/c1-4(17)14-8-10(20)9(19)6(3-16)23-11(8)15-7(18)2-5(13)12(21)22/h5-6,8-11,16,19-20H,2-3,13H2,1H3,(H,14,17)(H,15,18)(H,21,22)/t5-,6+,8+,9+,10+,11+/m0/s1	YTTRPBWEMMPYSW-HRRFRDKFSA-N
C000011	5-(n-acetaminophenylazo)-8-oxyquinoline	114081.0	484035752.0	CC(=O)NC1=CC=C(C=C1)N=NC2=C3C=CC=NC3=C(C=C2)O	InChI=1S/C17H14N4O2/c1-11(22)19-12-4-6-13(7-5-12)20-21-15-8-9-16(23)17-14(15)3-2-10-18-17/h2-10,23H,1H3,(H,19,22)	DKRPSSOODLBKPQ-UHFFFAOYSA-N
C000015	N-acetyl-L-arginine	67427.0	500710457.0	CC(=O)NC@@HC(=O)O	InChI=1S/C8H16N4O3/c1-5(13)12-6(7(14)15)3-2-4-11-8(9)10/h6H,2-4H2,1H3,(H,12,13)(H,14,15)(H4,9,10,11)/t6-/m0/s1	SNEIUMQYRCDYCH-LURJTMIESA-N
C000020	N-acetylneuraminoyllactose		489852514.0
C000021	acetylnovadral
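In the sample above, compound_id and substance_id (presumably the PubChem compound and substance identifiers) are written as floats such as 31800.0 and left empty when unavailable, likely because those columns contain missing values. A small sketch that converts them back to integers and keeps only chemicals with a SMILES string, using the column names shown above; parse_identifier and chemicals_with_smiles are hypothetical helpers.

```python
import csv
from typing import Optional, TextIO


def parse_identifier(value: Optional[str]) -> Optional[int]:
    """Turn a cell such as '31800.0' into 31800; return None for empty or missing cells."""
    if value is None or not value.strip():
        return None
    return int(float(value))


def chemicals_with_smiles(handle: TextIO) -> list[dict]:
    """Return the rows of chemicals.csv that have a SMILES string, with integer IDs."""
    rows = []
    for row in csv.DictReader(handle):
        # Rows without a SMILES string (e.g. C000021 in the sample) are skipped.
        if not row.get("smiles"):
            continue
        rows.append(
            {
                "unique_identifier": row["unique_identifier"],
                "name": row["name"],
                "compound_id": parse_identifier(row.get("compound_id")),
                "substance_id": parse_identifier(row.get("substance_id")),
                "smiles": row["smiles"],
            }
        )
    return rows


# Example usage, assuming the directory saved above:
# with open("mesh_chemistry_2024/chemicals.csv", newline="", encoding="utf-8") as handle:
#     rows = chemicals_with_smiles(handle)
```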

`descriptors.csv`

unique_identifier	name	compound_id	substance_id	smiles	inchikey
D000001	Calcimycin	139593372.0	500766157.0	C[C@@H]1CCC2(C@HC)O[C@@H]1CC4=NC5=C(O4)C=CC(=C5C(=O)O)NC	HIYAVKIYRIFSCZ-LGHBZWQHSA-N
D000002	Temefos	5392.0	500974612.0	COP(=S)(OC)OC1=CC=C(C=C1)SC2=CC=C(C=C2)OP(=S)(OC)OC	WWJZWCUNLNYYAU-UHFFFAOYSA-N
D000017	ABO Blood-Group System
D000019	Abortifacient Agents
D000020	Abortifacient Agents, Nonsteroidal
D000021	Abortifacient Agents, Steroidal
D000036	Abrin		486451862.0
D000040	Abscisic Acid	5702609.0	500195639.0	CC1=CC(=O)CC([C@]1(/C=C/C(=C/C(=O)O)/C)O)(C)C	JLIDBLDQVAYHNE-IBPUIESWSA-N

`chemicals_to_descriptors.csv`

chemical	descriptor
C000002	D001561
C000006	D061389
C000009	D000117
C000011	D015125
C000015	D001120
C000020	D007785
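Because each row links one chemical to one descriptor, finding every chemical filed under a descriptor amounts to inverting the mapping. A minimal sketch with the standard csv module, assuming the chemical and descriptor column names shown above (chemicals_by_descriptor is a hypothetical helper):

```python
import csv
from collections import defaultdict
from typing import TextIO


def chemicals_by_descriptor(handle: TextIO) -> dict[str, list[str]]:
    """Invert chemicals_to_descriptors.csv into descriptor -> list of chemicals."""
    index: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(handle):
        index[row["descriptor"]].append(row["chemical"])
    # Return a plain dict so unknown descriptors raise KeyError instead of being created.
    return dict(index)


# Example usage, assuming the directory saved above:
# with open("mesh_chemistry_2024/chemicals_to_descriptors.csv", newline="", encoding="utf-8") as handle:
#     index = chemicals_by_descriptor(handle)
```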

`mesh_dag.csv`

parent	child
D000001	D000095662
D000001	D001583
D000002	D063086
D000017	D001789
D000019	D012102
D000020	D000019
D000021	D000019
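Since mesh_dag.csv is a plain edge list, the standard library can already check that it is acyclic and walk it. Note that in the sample rows the parent column appears to hold the more specific term (D000020, "Abortifacient Agents, Nonsteroidal", points to D000019, "Abortifacient Agents"), so the sketch below follows edges from the first column to the second without assuming which end is broader; verify the direction on your own data before reading the results as ancestors or descendants. The helper names are hypothetical, and graphlib requires Python 3.9 or later.

```python
import csv
from collections import defaultdict
from graphlib import CycleError, TopologicalSorter
from typing import Iterable, TextIO


def read_edges(handle: TextIO) -> list[tuple[str, str]]:
    """Read mesh_dag.csv as (first column, second column) pairs."""
    return [(row["parent"], row["child"]) for row in csv.DictReader(handle)]


def is_acyclic(edges: Iterable[tuple[str, str]]) -> bool:
    """Check that the edges form a DAG using the standard library's topological sorter."""
    sorter = TopologicalSorter()
    for source, target in edges:
        # add(node, *predecessors): target must come after source.
        sorter.add(target, source)
    try:
        sorter.prepare()  # raises CycleError if the graph has a cycle
    except CycleError:
        return False
    return True


def reachable_from(edges: Iterable[tuple[str, str]], start: str) -> set[str]:
    """Return every node reachable from start by following edges source -> target."""
    adjacency: dict[str, list[str]] = defaultdict(list)
    for source, target in edges:
        adjacency[source].append(target)
    seen: set[str] = set()
    stack = [start]
    while stack:  # iterative depth-first search
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen
```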

`metadata.json`

{
    "version": {
        "version": 2024,
        "descriptors": "https://nlmpubs.nlm.nih.gov/projects/mesh/2024/asciimesh/20240101/d2024.bin",
        "chemicals": "https://nlmpubs.nlm.nih.gov/projects/mesh/2024/asciimesh/20240101/c2024.bin"
    },
    "roots": [
        {
            "root": "Chemicals and Drugs",
            "included_codes": [
                "D01",
                "D02",
                "D03",
                "D04",
                "D05",
                "D06",
                "D08",
                "D09",
                "D10",
                "D12",
                "D13",
                "D20",
                "D23",
                "D25",
                "D26",
                "D27"
            ],
            "include_smiles": true
        }
    ],
    "downloads_directory": "downloads"
}
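metadata.json makes a built dataset self-describing: it records the MESH version, the NLM source files, and which Chemicals and Drugs codes were included. A short sketch that reads it back with the json module, assuming the key names from the example above (summarize_metadata is a hypothetical helper):

```python
import json
from typing import Any


def summarize_metadata(text: str) -> dict[str, Any]:
    """Pull the MESH version and per-root settings out of a metadata.json document."""
    metadata = json.loads(text)
    roots = {}
    for root in metadata.get("roots", []):
        roots[root["root"]] = {
            # .get() keeps this working if a key is absent in other versions.
            "included_codes": root.get("included_codes", []),
            "include_smiles": root.get("include_smiles", False),
        }
    return {"version": metadata["version"]["version"], "roots": roots}


# Example usage, assuming the directory saved above:
# with open("mesh_chemistry_2024/metadata.json", encoding="utf-8") as handle:
#     print(summarize_metadata(handle.read()))
```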

To NetworkX

Since the MESH dataset is a Directed Acyclic Graph (DAG), you can convert it to a NetworkX graph. This is done by calling the to_networkx method of the Dataset class.

import networkx as nx

# We convert the MESH dataset to a NetworkX graph.
graph: nx.DiGraph = mesh_chemistry_2024.to_networkx()

# Now, we can use the NetworkX graph as we would any other NetworkX graph.
# nx.info was removed in NetworkX 3.0; printing the graph gives the same summary.
print(graph)

In this case, the output will be:

DiGraph with 334220 nodes and 367694 edges

License

This project is licensed under the MIT License - see the LICENSE file for details.

LucaCappelletti94/mesh

Folders and files

Latest commit

History

Repository files navigation

MESH

Installation

Usage

Downloading a pre-built MESH dataset

Generating a custom MESH dataset

Resulting CSVs

chemicals.csv

descriptors.csv

chemicals_to_descriptors.csv

mesh_dag.csv

metadata.json

To NetworkX

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

`chemicals.csv`

`descriptors.csv`

`chemicals_to_descriptors.csv`

`mesh_dag.csv`

`metadata.json`

Packages