# Building a MatGraphDB Example with MPNearHull Data

In this notebook, we demonstrate how to build a materials graph database using the
[MatGraphDB](https://github.com/your/matgraphdb) framework with the MPNearHull dataset.

The steps include:
1. Importing required libraries and setting up configuration paths.
2. Downloading and extracting the dataset (and raw materials data if needed).
3. Creating a MatGraphDB instance.
4. Initializing node generators.
5. Initializing edge generators.
6. Verifying the database setup.

Follow along and run each cell to see how the database is constructed.

## Setup

In [1]:
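import os
from pathlib import Path

FILE_DIR = Path(".")
DATA_DIR = FILE_DIR / "data"

# Paths used by later cells (assumed layout: the material store and the
# graph database sit side by side under DATA_DIR).
MATERIALS_PATH = DATA_DIR / "material"
MATGRAPHDB_PATH = DATA_DIR / "MatGraphDB"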

### Download and Load the MPNearHull Dataset

In [2]:
from matgraphdb.datasets import MPNearHull

DB_PATH = DATA_DIR / "MPNearHull"
mpdb = MPNearHull(storage_path=DB_PATH, initialize_from_scratch=False)

[INFO] 2025-05-11 10:21:41 - parquetdb.utils.config[37][load_config] - Config file: C:\Users\lllang\AppData\Local\parquetdb\parquetdb\config.yml
[INFO] 2025-05-11 10:21:41 - parquetdb.utils.config[41][load_config] - Setting data_dir to C:\Users\lllang\Desktop\Current_Projects\MatGraphDB\data


ModuleNotFoundError: No module named 'matgraphdb.materials'

## Initialization

### Initialize a Materials Store

In [4]:
from matgraphdb import MaterialStore

materials_store = MaterialStore(storage_path=MATERIALS_PATH)
print(materials_store)

NODE STORE SUMMARY
Node type: material
• Number of nodes: 80643
• Number of features: 136
Storage path: ..\..\data\examples\01\material


############################################################
METADATA
############################################################
• class: MaterialStore
• class_module: matgraphdb.materials.nodes.materials
• node_type: material
• name_column: id

############################################################
NODE DETAILS
############################################################



### Initialize a MatGraphDB Instance

In [5]:
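import shutil

from matgraphdb import MatGraphDB

# Start from a clean directory: remove any graph database left over from a previous run.
if os.path.exists(MATGRAPHDB_PATH):
    shutil.rmtree(MATGRAPHDB_PATH)
mdb = MatGraphDB(storage_path=MATGRAPHDB_PATH, materials_store=materials_store)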

print(mdb)

GRAPH DATABASE SUMMARY
Name: MatGraphDB
Storage path: ..\..\data\examples\01\MatGraphDB
└── Repository structure:
    ├── nodes/                 (..\..\data\examples\01\MatGraphDB\nodes)
    ├── edges/                 (..\..\data\examples\01\MatGraphDB\edges)
    ├── edge_generators/       (..\..\data\examples\01\MatGraphDB\edge_generators)
    ├── node_generators/       (..\..\data\examples\01\MatGraphDB\node_generators)
    └── graph/                 (..\..\data\examples\01\MatGraphDB\graph)

############################################################
NODE DETAILS
############################################################
Total node types: 1
------------------------------------------------------------
• Node type: material
  - Number of nodes: 80643
  - Number of features: 136
  - db_path: ..\..\data\examples\01\MatGraphDB\nodes\material
------------------------------------------------------------

############################################################
EDGE DETAILS
#########
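
The initialization cell is meant to start from an empty directory so that each run rebuilds the graph from scratch. That step can be wrapped in a small reusable helper. `remove_if_exists` below is a hypothetical name, not part of MatGraphDB; it uses only `shutil` and `pathlib` from the standard library and leaves creating the new directory to the next constructor call.

```python
import shutil
import tempfile
from pathlib import Path


def remove_if_exists(path) -> bool:
    """Delete the directory at `path` if present; return True if something was removed.

    Hypothetical helper, not part of MatGraphDB. Creating the fresh directory is
    left to whatever is constructed next (in this notebook, MatGraphDB).
    """
    path = Path(path)
    if path.is_dir():
        shutil.rmtree(path)  # remove the previous run's files
        return True
    return False


# Usage on a throwaway directory rather than the real data folder.
with tempfile.TemporaryDirectory() as tmp:
    db_dir = Path(tmp) / "MatGraphDB"
    (db_dir / "nodes").mkdir(parents=True)  # simulate leftovers from an earlier run
    first = remove_if_exists(db_dir)   # True: old directory removed
    second = remove_if_exists(db_dir)  # False: nothing left to remove
    print(first, second, db_dir.exists())  # prints: True False False
```

Returning a boolean makes it easy to log whether a stale database was actually cleared.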

## Adding Nodes

In this section, we add nodes to the MatGraphDB instance using several of its built-in node generators.

In [6]:
from matgraphdb import generators

# Define each generator function and, where needed, its arguments.
# For instance, the material site and lattice generators need the material store.
node_generators = [
    {"generator_func": generators.element},
    {"generator_func": generators.chemenv},
    {"generator_func": generators.crystal_system},
    {"generator_func": generators.magnetic_state},
    {"generator_func": generators.oxidation_state},
    {"generator_func": generators.space_group},
    {"generator_func": generators.wyckoff},
    {
        "generator_func": generators.material_site,
        "generator_args": {"material_store": mdb.node_stores["material"]},
    },
    {
        "generator_func": generators.material_lattice,
        "generator_args": {"material_store": mdb.node_stores["material"]},
    },
]


Now we add the node generators to the MatGraphDB instance. Each generator runs as soon as it is added and writes its nodes to the database.
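
To make the "adding a generator runs it" behavior concrete, here is a toy, in-memory stand-in. `ToyGraphDB` and its method are hypothetical and are not MatGraphDB's implementation. The sketch assumes only what the names in this notebook suggest: each node type is keyed by its generator function's name, and `generator_args` is optional.

```python
from typing import Any, Callable, Dict, List, Optional

Node = Dict[str, Any]


class ToyGraphDB:
    """Hypothetical in-memory stand-in: node generators run as soon as they are added."""

    def __init__(self) -> None:
        self.node_stores: Dict[str, List[Node]] = {}

    def add_node_generator(
        self,
        generator_func: Callable[..., List[Node]],
        generator_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        # A missing generator_args entry means "call with no arguments".
        kwargs = generator_args or {}
        # Run immediately and store the nodes under the function's name
        # (the notebook's node store keys match its generator names).
        self.node_stores[generator_func.__name__] = generator_func(**kwargs)


def crystal_system() -> List[Node]:
    # Toy generator with no dependencies: one node per crystal system.
    names = ["triclinic", "monoclinic", "orthorhombic", "tetragonal",
             "trigonal", "hexagonal", "cubic"]
    return [{"name": name} for name in names]


def material_lattice(material_store: List[Node]) -> List[Node]:
    # Toy generator that depends on another store, supplied through generator_args.
    return [{"material_id": material["id"]} for material in material_store]


db = ToyGraphDB()
db.node_stores["material"] = [{"id": "m1"}, {"id": "m2"}]  # toy materials

toy_generators = [
    {"generator_func": crystal_system},
    {"generator_func": material_lattice,
     "generator_args": {"material_store": db.node_stores["material"]}},
]
for generator in toy_generators:
    db.add_node_generator(
        generator_func=generator.get("generator_func"),
        generator_args=generator.get("generator_args", None),
    )

print({name: len(nodes) for name, nodes in db.node_stores.items()})
# prints: {'material': 2, 'crystal_system': 7, 'material_lattice': 2}
```

Because each generator executes on insertion, order matters: a generator that takes another store as an argument must come after the generator that creates that store.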

In [7]:
# Add each node generator to the database.
for generator in node_generators:
    generator_func = generator.get("generator_func")
    generator_args = generator.get("generator_args", None)
    print(f"Adding node generator: {generator_func.__name__}")
    mdb.add_node_generator(generator_func=generator_func, generator_args=generator_args)

print("Node generators have been initialized.")

print(mdb)


Adding node generator: element
Adding node generator: chemenv
Adding node generator: crystal_system
Adding node generator: magnetic_state
Adding node generator: oxidation_state
Adding node generator: space_group
Adding node generator: wyckoff
Adding node generator: material_site
Adding node generator: material_lattice
Node generators have been initialized.
GRAPH DATABASE SUMMARY
Name: MatGraphDB
Storage path: ..\..\data\examples\01\MatGraphDB
└── Repository structure:
    ├── nodes/                 (..\..\data\examples\01\MatGraphDB\nodes)
    ├── edges/                 (..\..\data\examples\01\MatGraphDB\edges)
    ├── edge_generators/       (..\..\data\examples\01\MatGraphDB\edge_generators)
    ├── node_generators/       (..\..\data\examples\01\MatGraphDB\node_generators)
    └── graph/                 (..\..\data\examples\01\MatGraphDB\graph)

############################################################
NODE DETAILS
############################################################
Total 

## Adding Edges

In this section, we add edges to the MatGraphDB instance using several of its built-in edge generators. Each edge generator receives the node stores it connects, so the corresponding node types must already exist.
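
The edge generator names in the next cell appear to follow a `source_target_relation` pattern, with camelCase inside each part (for example `material_crystalSystem_has`). A small helper can split such a name for logging or filtering. `parse_edge_name` is hypothetical, and the convention is inferred from the names in this notebook rather than from MatGraphDB documentation.

```python
from typing import NamedTuple


class EdgeName(NamedTuple):
    source: str
    target: str
    relation: str


def parse_edge_name(name: str) -> EdgeName:
    """Split an edge generator name of the (assumed) form source_target_relation."""
    parts = name.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected 'source_target_relation', got {name!r}")
    return EdgeName(*parts)


for name in ["element_oxiState_canOccur", "material_crystalSystem_has", "spg_crystalSystem_isApart"]:
    print(parse_edge_name(name))
# first line printed: EdgeName(source='element', target='oxiState', relation='canOccur')
```

The parts are abbreviations (`spg`, `oxiState`) rather than the node store keys (`space_group`, `oxidation_state`), so mapping them back to stores still needs an explicit lookup table.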

In [8]:

# List of edge generator configurations.
edge_generators = [
    {
        "generator_func": generators.element_element_neighborsByGroupPeriod,
        "generator_args": {"element_store": mdb.node_stores["element"]},
    },
    {
        "generator_func": generators.element_oxiState_canOccur,
        "generator_args": {
            "element_store": mdb.node_stores["element"],
            "oxiState_store": mdb.node_stores["oxidation_state"],
        },
    },
    {
        "generator_func": generators.material_chemenv_containsSite,
        "generator_args": {
            "material_store": mdb.node_stores["material"],
            "chemenv_store": mdb.node_stores["chemenv"],
        },
    },
    {
        "generator_func": generators.material_crystalSystem_has,
        "generator_args": {
            "material_store": mdb.node_stores["material"],
            "crystal_system_store": mdb.node_stores["crystal_system"],
        },
    },
    {
        "generator_func": generators.material_element_has,
        "generator_args": {
            "material_store": mdb.node_stores["material"],
            "element_store": mdb.node_stores["element"],
        },
    },
    {
        "generator_func": generators.material_lattice_has,
        "generator_args": {
            "material_store": mdb.node_stores["material"],
            "lattice_store": mdb.node_stores["material_lattice"],
        },
    },
    {
        "generator_func": generators.material_spg_has,
        "generator_args": {
            "material_store": mdb.node_stores["material"],
            "spg_store": mdb.node_stores["space_group"],
        },
    },
    {
        "generator_func": generators.element_chemenv_canOccur,
        "generator_args": {
            "element_store": mdb.node_stores["element"],
            "chemenv_store": mdb.node_stores["chemenv"],
            "material_store": mdb.node_stores["material"],
        },
    },
    {
        "generator_func": generators.spg_crystalSystem_isApart,
        "generator_args": {
            "spg_store": mdb.node_stores["space_group"],
            "crystal_system_store": mdb.node_stores["crystal_system"],
        },
    },
    {
        "generator_func": generators.element_element_bonds,
        "generator_args": {
            "element_store": mdb.node_stores["element"],
            "material_store": mdb.node_stores["material"],
        },
    },
]


# Add each edge generator to the database and run them immediately.
for generator in edge_generators:
    generator_func = generator.get("generator_func")
    generator_args = generator.get("generator_args", None)
    print(f"Adding edge generator: {generator_func.__name__}")
    mdb.add_edge_generator(generator_func=generator_func, generator_args=generator_args, run_immediately=True)

print("Edge generators have been initialized.")
print(mdb)

Adding edge generator: element_element_neighborsByGroupPeriod
Adding edge generator: element_oxiState_canOccur
Adding edge generator: material_chemenv_containsSite
Adding edge generator: material_crystalSystem_has
Adding edge generator: material_element_has
Adding edge generator: material_lattice_has
Adding edge generator: material_spg_has
Adding edge generator: element_chemenv_canOccur
Adding edge generator: spg_crystalSystem_isApart
Adding edge generator: element_element_bonds
Edge generators have been initialized.
GRAPH DATABASE SUMMARY
Name: MatGraphDB
Storage path: ..\..\data\examples\01\MatGraphDB
└── Repository structure:
    ├── nodes/                 (..\..\data\examples\01\MatGraphDB\nodes)
    ├── edges/                 (..\..\data\examples\01\MatGraphDB\edges)
    ├── edge_generators/       (..\..\data\examples\01\MatGraphDB\edge_generators)
    ├── node_generators/       (..\..\data\examples\01\MatGraphDB\node_generators)
    └── graph/                 (..\..\data\examples
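
Most edge configurations above repeat the pattern `"<arg>_store": mdb.node_stores["<node type>"]`, and a missing node type only surfaces as a lookup error while the list is being built. A hypothetical helper, `store_args`, can build these argument dicts from a short mapping and report every missing node type at once. It assumes only that `node_stores` supports `in` and `[]` lookup by name, as a plain `dict` does; whether MatGraphDB's `node_stores` supports `in` is an assumption.

```python
from typing import Any, Dict, Mapping


def store_args(node_stores: Mapping[str, Any], **arg_to_node_type: str) -> Dict[str, Any]:
    """Map generator argument names to node stores, e.g. element_store="element".

    Hypothetical helper; raises a single KeyError listing every missing node type.
    """
    missing = sorted({t for t in arg_to_node_type.values() if t not in node_stores})
    if missing:
        raise KeyError(f"node stores not found: {missing}; run their node generators first")
    return {arg: node_stores[node_type] for arg, node_type in arg_to_node_type.items()}


# Usage with placeholder strings standing in for real node stores.
node_stores = {"material": "<material store>", "space_group": "<space_group store>"}
args = store_args(node_stores, material_store="material", spg_store="space_group")
print(args)
# prints: {'material_store': '<material store>', 'spg_store': '<space_group store>'}

try:
    store_args(node_stores, element_store="element", chemenv_store="chemenv")
except KeyError as err:
    print(err)  # reports both 'chemenv' and 'element' as missing
```

With the notebook's database, an entry could then read `"generator_args": store_args(mdb.node_stores, material_store="material", spg_store="space_group")`, under the same assumption about `node_stores`.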

## Verifying the Database
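
Printing `mdb` gives a human-readable summary. For a check that can fail loudly, for instance in a script, the expected node types can be compared against the stored ones. The sketch below assumes only that the node store names can be iterated like dictionary keys (an assumption about `mdb.node_stores`); the expected names are the ten node types used in this notebook. `missing_node_types` is a hypothetical helper.

```python
from typing import Iterable, Set

# The material store plus the nine node generators added above.
EXPECTED_NODE_TYPES = {
    "material", "element", "chemenv", "crystal_system", "magnetic_state",
    "oxidation_state", "space_group", "wyckoff", "material_site", "material_lattice",
}


def missing_node_types(stored: Iterable[str], expected: Set[str] = EXPECTED_NODE_TYPES) -> Set[str]:
    """Return the expected node types that are not present in `stored`."""
    return set(expected) - set(stored)


# With the notebook's database this could be missing_node_types(mdb.node_stores),
# assuming iteration yields node type names. Here, a partial toy list:
partial = ["material", "element", "chemenv"]
print(sorted(missing_node_types(partial)))
# prints: ['crystal_system', 'magnetic_state', 'material_lattice', 'material_site', 'oxidation_state', 'space_group', 'wyckoff']
```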


In [9]:
print(mdb)

GRAPH DATABASE SUMMARY
Name: MatGraphDB
Storage path: ..\..\data\examples\01\MatGraphDB
└── Repository structure:
    ├── nodes/                 (..\..\data\examples\01\MatGraphDB\nodes)
    ├── edges/                 (..\..\data\examples\01\MatGraphDB\edges)
    ├── edge_generators/       (..\..\data\examples\01\MatGraphDB\edge_generators)
    ├── node_generators/       (..\..\data\examples\01\MatGraphDB\node_generators)
    └── graph/                 (..\..\data\examples\01\MatGraphDB\graph)

############################################################
NODE DETAILS
############################################################
Total node types: 10
------------------------------------------------------------
• Node type: material
  - Number of nodes: 80643
  - Number of features: 136
  - db_path: ..\..\data\examples\01\MatGraphDB\nodes\material
------------------------------------------------------------
• Node type: element
  - Number of nodes: 118
  - Number of features: 99
  - db_pat