In [1]:
!pip install matgraphdb
!pip install ipykernel



# Example 2 - Managing Graphs in MatGraphDB

In this notebook, we'll learn how to:

1. Add new node types and nodes.
2. Read and update nodes through the node store.
3. Add new edge types and edges.
4. Update edges by id or by their source and target nodes.

We'll use the `MatGraphDB` class from `matgraphdb` to demonstrate these features. If you haven't already installed `matgraphdb`, run the previous cell.


Next, we initialize `MatGraphDB` with a storage path. Materials data can be loaded on initialization by providing a `MaterialStore` instance to the `materials_store` argument; here we omit it, so the `materials` node type starts empty.

In [2]:
import os
import shutil

from matgraphdb import MatGraphDB

storage_path = "MatGraphDB"
if os.path.exists(storage_path):
    shutil.rmtree(storage_path)

mdb = MatGraphDB(storage_path=storage_path)

print(mdb.summary())

GRAPH DATABASE SUMMARY
Name: MatGraphDB
Storage path: MatGraphDB
└── Repository structure:
    ├── nodes/                 (MatGraphDB\nodes)
    ├── edges/                 (MatGraphDB\edges)
    ├── edge_generators/       (MatGraphDB\edge_generators)
    ├── node_generators/       (MatGraphDB\node_generators)
    └── graph/                 (MatGraphDB\graph)

############################################################
NODE DETAILS
############################################################
Total node types: 1
------------------------------------------------------------
• Node type: materials
  - Number of nodes: 0
  - Number of features: 1
  - db_path: MatGraphDB\nodes\materials
------------------------------------------------------------

############################################################
EDGE DETAILS
############################################################
Total edge types: 0
------------------------------------------------------------

###############################

As the summary shows, the `MatGraphDB` instance has been initialized with a single `materials` node type.

Currently, there are 0 `materials` nodes, and the node type has 1 feature, because no materials data was loaded.


> Note: Once materials are loaded, some column values may be null for some materials.


## 1. New Nodes

By default, no custom node types (besides any internal ones MatGraphDB might create) exist in a fresh `MatGraphDB`. You can add your own node types via `add_node_type(...)`. This creates an empty `NodeStore` for that type.

In [3]:
# Add a custom node type called 'custom_node_1'
custom_node_type = "custom_node_1"

mdb.add_node_type(custom_node_type)

# These nodes will be stored in MatGraphDB/nodes/custom_node_1/custom_node_1.parquet
print("Current node_stores:", list(mdb.node_stores.keys()))
print(mdb.summary())

Current node_stores: ['materials', 'custom_node_1']
GRAPH DATABASE SUMMARY
Name: MatGraphDB
Storage path: MatGraphDB
└── Repository structure:
    ├── nodes/                 (MatGraphDB\nodes)
    ├── edges/                 (MatGraphDB\edges)
    ├── edge_generators/       (MatGraphDB\edge_generators)
    ├── node_generators/       (MatGraphDB\node_generators)
    └── graph/                 (MatGraphDB\graph)

############################################################
NODE DETAILS
############################################################
Total node types: 2
------------------------------------------------------------
• Node type: materials
  - Number of nodes: 0
  - Number of features: 1
  - db_path: MatGraphDB\nodes\materials
------------------------------------------------------------
• Node type: custom_node_1
  - Number of nodes: 0
  - Number of features: 1
  - db_path: MatGraphDB\nodes\custom_node_1
------------------------------------------------------------

###############

### Adding Nodes

Once a node type is registered, you can add nodes to it using the `add_nodes(node_type, data)` method. The `data` argument is a list of dictionaries, where each dictionary represents a node.

> Note: Calling `add_nodes` with a node type that is not yet registered also registers that node type automatically.

In [4]:
# Define some user and computer nodes
users = [
    {"name": "Jimmy"},
    {"name": "John"},
]

computers = [
    {
        "name": "Computer1",
        "specs": {"cpu": "AMD Ryzen 9", "ram": "32GB", "storage": "1TB"},
    },
    {
        "name": "Computer2",
        "specs": {"cpu": "Intel i7", "ram": "16GB", "storage": "512GB"},
    },
]

users_node_type = "users"
computers_node_type = "computers"

mdb.add_nodes(node_type=users_node_type, data=users)
mdb.add_nodes(node_type=computers_node_type, data=computers)
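
The `computers` records above contain a nested `specs` dictionary, while the node store summary later in this notebook lists dotted columns such as `specs.cpu`. The following is a minimal sketch of how nested dictionaries can be flattened into dotted column names in plain Python. It is only an illustration of the idea, not MatGraphDB's actual implementation, and `flatten_record` is a hypothetical helper name.

```python
def flatten_record(record, parent_key="", sep="."):
    """Flatten nested dictionaries into one level with dotted keys.

    Hypothetical helper for illustration; not part of MatGraphDB.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the dotted prefix.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


computer = {
    "name": "Computer1",
    "specs": {"cpu": "AMD Ryzen 9", "ram": "32GB", "storage": "1TB"},
}
flat_computer = flatten_record(computer)
print(sorted(flat_computer))
```

Under this assumption, each nested key becomes its own column, which is why filters and updates can address fields like `specs.ram` individually.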

### Managing the node store

Once the data is registered, you can access it through the corresponding node store. You can get the node store either through the `node_stores` attribute or the `get_node_store(node_type)` method.


In [5]:
computers_node_store = mdb.get_node_store(computers_node_type)
print(type(computers_node_store))
print(computers_node_store)


users_node_store = mdb.node_stores[users_node_type]
print(type(users_node_store))

print(users_node_store)

<class 'matgraphdb.core.nodes.NodeStore'>
NODE STORE SUMMARY
Node type: computers
• Number of nodes: 2
• Number of features: 5
Storage path: MatGraphDB\nodes\computers


############################################################
METADATA
############################################################
• class: NodeStore
• class_module: matgraphdb.core.nodes
• node_type: computers
• name_column: id

############################################################
NODE DETAILS
############################################################
• Columns:
    - id
    - name
    - specs.cpu
    - specs.ram
    - specs.storage

<class 'matgraphdb.core.nodes.NodeStore'>
NODE STORE SUMMARY
Node type: users
• Number of nodes: 2
• Number of features: 2
Storage path: MatGraphDB\nodes\users


############################################################
METADATA
############################################################
• class: NodeStore
• class_module: matgraphdb.core.nodes
• node_type: users
• name_colum

### Reading from the node store

There are multiple ways to read from the node store:

- the `read_nodes` method of the `MatGraphDB` instance,
- the `read_nodes` method of the `NodeStore` instance, or
- the `read` method of the `NodeStore` instance.

These read methods behave much like the read features introduced in the previous notebook. For example, you can select specific columns and apply filters.

In [6]:
import pyarrow.compute as pc

df = mdb.read_nodes(node_type=users_node_type).to_pandas()
print(df)

# Note: when you rebuild the nested struct, nested data is accessed differently,
# e.g. pc.field("specs", "cpu") instead of pc.field("specs.cpu")
df = computers_node_store.read_nodes(
    columns=["name", "id", "specs"],
    filters=[pc.field("specs", "cpu") == "AMD Ryzen 9"],
    rebuild_nested_struct=True,
).to_pandas()
print(df)

df = computers_node_store.read(
    filters=[pc.field("specs.cpu") == "Intel i7"]
).to_pandas()
print(df)

   id   name
0   0  Jimmy
1   1   John
        name  id                                              specs
0  Computer1   0  {'cpu': 'AMD Ryzen 9', 'ram': '32GB', 'storage...
   id       name specs.cpu specs.ram specs.storage
0   1  Computer2  Intel i7      16GB         512GB
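
The two `computers` reads above return the same data in two shapes: a single nested `specs` column, or flat `specs.cpu`, `specs.ram`, and `specs.storage` columns. As a rough sketch of how the flat form maps back to the nested one, the plain-Python function below rebuilds nested dictionaries from dotted keys. `rebuild_nested` is a hypothetical name, and this is not how `rebuild_nested_struct` is implemented in the library.

```python
def rebuild_nested(flat_record, sep="."):
    """Rebuild nested dictionaries from dotted keys.

    Hypothetical helper for illustration; not part of MatGraphDB.
    """
    nested = {}
    for dotted_key, value in flat_record.items():
        parts = dotted_key.split(sep)
        target = nested
        # Walk (and create) intermediate dictionaries for every prefix part.
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return nested


flat_row = {
    "id": 1,
    "name": "Computer2",
    "specs.cpu": "Intel i7",
    "specs.ram": "16GB",
    "specs.storage": "512GB",
}
nested_row = rebuild_nested(flat_row)
print(nested_row["specs"]["cpu"])
```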


### Updating the node store

You can update the node store by using the `update_nodes` method from the `MatGraphDB` instance, or the `update_nodes` method from the `NodeStore` instance.

In [7]:
computer_update_data = [
    {"name": "Computer1", "specs": {"ram": "128GB", "storage": "1TB"}},
    {"name": "Computer2", "specs": {"ram": "256GB", "storage": "2TB"}},
]

mdb.update_nodes(
    node_type=computers_node_type, data=computer_update_data, update_keys=["name"]
)

df = mdb.read_nodes(node_type=computers_node_type).to_pandas()
print(df)

   id       name    specs.cpu specs.ram specs.storage
0   0  Computer1  AMD Ryzen 9     128GB           1TB
1   1  Computer2     Intel i7     256GB           2TB
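
In the output above, `specs.ram` and `specs.storage` changed while `specs.cpu` kept its original value. That is consistent with matching rows on the `update_keys` columns and overwriting only the columns supplied in the update. The sketch below shows that idea on plain Python dictionaries. It is an assumption about the semantics rather than the library's code, and `update_records` is a hypothetical helper.

```python
def update_records(records, updates, update_keys):
    """Apply partial updates in place to records whose key columns match.

    Hypothetical helper for illustration; not part of MatGraphDB.
    """
    # Index existing records by the tuple of their key-column values.
    index = {tuple(rec[k] for k in update_keys): rec for rec in records}
    for update in updates:
        key = tuple(update[k] for k in update_keys)
        target = index.get(key)
        if target is None:
            continue  # No matching record; this sketch simply skips it.
        for column, value in update.items():
            if column not in update_keys:
                target[column] = value  # Columns not in the update stay untouched.
    return records


stored = [
    {"id": 0, "name": "Computer1", "specs.cpu": "AMD Ryzen 9",
     "specs.ram": "32GB", "specs.storage": "1TB"},
    {"id": 1, "name": "Computer2", "specs.cpu": "Intel i7",
     "specs.ram": "16GB", "specs.storage": "512GB"},
]
changes = [
    {"name": "Computer1", "specs.ram": "128GB", "specs.storage": "1TB"},
    {"name": "Computer2", "specs.ram": "256GB", "specs.storage": "2TB"},
]
update_records(stored, changes, update_keys=["name"])
print(stored[0]["specs.ram"], stored[0]["specs.cpu"])
```

Because the key is built as a tuple, the same sketch also covers composite keys, for example several columns passed together in `update_keys`.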


## 2. Adding New Edges

Edges are managed in the same way as nodes, but they are stored in `EdgeStore` instances. An `EdgeStore` differs from a `NodeStore` in that it must also store the source and target node ids and types, as well as the edge type. These must be specified when adding an edge.

You can create a new edge type using `add_edge_type(edge_type)`. Then, you can add edges by calling `add_edges(edge_type, data)`. Each edge in `data` must include:
- `source_id` and `source_type`
- `target_id` and `target_type`


The ids and types must match the ids and node types of nodes that exist in `MatGraphDB`.
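
Since edge endpoints have to refer to existing nodes, it can be useful to check edge records before adding them. The sketch below does this in plain Python against a mapping of known node ids per node type. Whether MatGraphDB performs such a check itself is not stated here, and `find_invalid_edges` and `node_ids_by_type` are hypothetical names used only for illustration.

```python
def find_invalid_edges(edges, node_ids_by_type):
    """Return (position, side, node_type, node_id) for each unknown endpoint.

    Hypothetical helper for illustration; not part of MatGraphDB.
    """
    problems = []
    for position, edge in enumerate(edges):
        for side in ("source", "target"):
            node_type = edge[f"{side}_type"]
            node_id = edge[f"{side}_id"]
            # An endpoint is invalid if its type is unknown or its id is missing.
            if node_id not in node_ids_by_type.get(node_type, set()):
                problems.append((position, side, node_type, node_id))
    return problems


# Assumed node ids: two users and two computers, with ids 0 and 1.
node_ids_by_type = {"users": {0, 1}, "computers": {0, 1}}
candidate_edges = [
    {"source_id": 0, "source_type": "users",
     "target_id": 1, "target_type": "computers"},
    {"source_id": 2, "source_type": "users",
     "target_id": 0, "target_type": "computers"},
]
problems = find_invalid_edges(candidate_edges, node_ids_by_type)
print(problems)
```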

In [9]:
# Name of the edge type
edge_type_test = "user_access"

# We'll connect the 'users' nodes to the 'computers' nodes
edge_data = [
    {
        "source_id": 0,  # This is the id of the user node
        "source_type": users_node_type,
        "target_id": 0,  # This is the id of the computer node
        "target_type": computers_node_type,
        "edge_type": edge_type_test,
        "name": "Jimmy has access to Computer1",
    },
    {
        "source_id": 0,  # This is the id of the user node
        "source_type": users_node_type,
        "target_id": 1,  # This is the id of the computer node
        "target_type": computers_node_type,
        "edge_type": edge_type_test,
        "name": "Jimmy has access to Computer2",
    },
    {
        "source_id": 1,
        "source_type": users_node_type,
        "target_id": 1,
        "target_type": computers_node_type,
        "edge_type": edge_type_test,
        "name": "John has access to Computer2",
    },
    {
        "source_id": 0,
        "source_type": computers_node_type,
        "target_id": 1,
        "target_type": computers_node_type,
        "edge_type": edge_type_test,
        "name": "Computer1 has access to Computer2",
    },
    {
        "source_id": 1,
        "source_type": computers_node_type,
        "target_id": 0,
        "target_type": computers_node_type,
        "edge_type": edge_type_test,
        "name": "Computer2 has access to Computer1",
    },
    {
        "source_id": 0,
        "source_type": computers_node_type,
        "target_id": 0,
        "target_type": computers_node_type,
        "edge_type": edge_type_test,
        "name": "Computer1 has access to Computer1",
        "extra_detail": "This is the main computer",
    },
]

mdb.add_edges(edge_type=edge_type_test, data=edge_data)

edges = mdb.read_edges(edge_type=edge_type_test)
print(f"Number of edges of type '{edge_type_test}':", len(edges))
df_edges = edges.to_pandas()
print(df_edges)

Number of edges of type 'user_access': 6
     edge_type               extra_detail  id  \
0  user_access                       None   0   
1  user_access                       None   1   
2  user_access                       None   2   
3  user_access                       None   3   
4  user_access                       None   4   
5  user_access  This is the main computer   5   

                                name  source_id source_type  target_id  \
0      Jimmy has access to Computer1          0       users          0   
1      Jimmy has access to Computer2          0       users          1   
2       John has access to Computer2          1       users          1   
3  Computer1 has access to Computer2          0   computers          1   
4  Computer2 has access to Computer1          1   computers          0   
5  Computer1 has access to Computer1          0   computers          0   

  target_type  
0   computers  
1   computers  
2   computers  
3   computers  
4   computers  
5 

In this example we have defined the computer access edges between users and computers. Note that we can specify self-edges and directionality of the edges by choosing which node is the source and which is the target.

We are also free to add additional columns (features) to the edges, such as `extra_detail` in this case.
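
To make the role of direction and self-edges concrete, the sketch below builds a directed adjacency mapping from edge records in plain Python. Nodes are identified by a `(node_type, node_id)` pair because ids are only unique within a node type. `build_adjacency` is a hypothetical helper, not a MatGraphDB function.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each source node to the list of target nodes it points to.

    Hypothetical helper for illustration; not part of MatGraphDB.
    """
    adjacency = defaultdict(list)
    for edge in edges:
        source = (edge["source_type"], edge["source_id"])
        target = (edge["target_type"], edge["target_id"])
        adjacency[source].append(target)  # Direction: source -> target only.
    return dict(adjacency)


edges = [
    {"source_type": "users", "source_id": 0,
     "target_type": "computers", "target_id": 0},
    {"source_type": "computers", "source_id": 0,
     "target_type": "computers", "target_id": 1},
    {"source_type": "computers", "source_id": 0,
     "target_type": "computers", "target_id": 0},  # Self-edge.
]
adjacency = build_adjacency(edges)
print(adjacency[("computers", 0)])
```

In this subset, `("computers", 1)` appears only as a target, so it has no entry of its own in the mapping, while the self-edge makes `("computers", 0)` one of its own neighbors.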

### Updating the edges

You can update the edges by using the `update_edges` method from the `MatGraphDB` instance, or the `update_edges` method from the `EdgeStore` instance.


In [10]:
update_data = [
    {"id": 0, "weight": 1.0},
    {"id": 1, "weight": 1.0},
]

mdb.update_edges(edge_type=edge_type_test, data=update_data)

edges = mdb.read_edges(
    edge_type=edge_type_test, columns=["id", "source_id", "target_id", "weight", "name"]
).to_pandas()
print(f"Number of edges of type '{edge_type_test}':", len(edges))
print(edges)

Number of edges of type 'user_access': 6
   id  source_id  target_id  weight                               name
0   0          0          0     1.0      Jimmy has access to Computer1
1   1          0          1     1.0      Jimmy has access to Computer2
2   2          1          1     NaN       John has access to Computer2
3   3          0          1     NaN  Computer1 has access to Computer2
4   4          1          0     NaN  Computer2 has access to Computer1
5   5          0          0     NaN  Computer1 has access to Computer1


We can also update by specifying the source and target ids and types. To do this we need to specify `source_id`, `target_id`, `source_type`, and `target_type` in the `update_keys` argument.


In [11]:
update_data = [
    {
        "source_id": 0,
        "source_type": users_node_type,
        "target_id": 0,
        "target_type": computers_node_type,
        "weight": 0.5,
    },
]

mdb.update_edges(
    edge_type=edge_type_test,
    data=update_data,
    update_keys=["source_id", "target_id", "source_type", "target_type"],
)


edges = mdb.read_edges(
    edge_type=edge_type_test, columns=["id", "source_id", "target_id", "weight", "name"]
).to_pandas()
print(f"Number of edges of type '{edge_type_test}':", len(edges))
print(edges)

Number of edges of type 'user_access': 6
   id  source_id  target_id  weight                               name
0   0          0          0     0.5      Jimmy has access to Computer1
1   1          0          1     1.0      Jimmy has access to Computer2
2   2          1          1     NaN       John has access to Computer2
3   3          0          1     NaN  Computer1 has access to Computer2
4   4          1          0     NaN  Computer2 has access to Computer1
5   5          0          0     NaN  Computer1 has access to Computer1


In [12]:
print(mdb)

GRAPH DATABASE SUMMARY
Name: MatGraphDB
Storage path: MatGraphDB
└── Repository structure:
    ├── nodes/                 (MatGraphDB\nodes)
    ├── edges/                 (MatGraphDB\edges)
    ├── edge_generators/       (MatGraphDB\edge_generators)
    ├── node_generators/       (MatGraphDB\node_generators)
    └── graph/                 (MatGraphDB\graph)

############################################################
NODE DETAILS
############################################################
Total node types: 4
------------------------------------------------------------
• Node type: materials
  - Number of nodes: 0
  - Number of features: 1
  - Columns:
       - id
  - db_path: MatGraphDB\nodes\materials
------------------------------------------------------------
• Node type: custom_node_1
  - Number of nodes: 0
  - Number of features: 1
  - Columns:
       - id
  - db_path: MatGraphDB\nodes\custom_node_1
------------------------------------------------------------
• Node type: users

## Conclusion

In this notebook, we explored the process of managing graphs using MatGraphDB. Specifically, we:

- Added new node types and registered nodes within those types.
- Learned how to create and manage edge types, including adding and updating edges.
- Explored the functionality of reading and updating data from both node and edge stores.

These capabilities form the foundation for representing and manipulating complex graph-based data efficiently. 

### What's Next?

In the next notebook, we will cover node and edge generators. Generators create nodes and edges dynamically from predefined functions, which allows MatGraphDB to propagate updates to dependent nodes and edges when the parent nodes or edges change.
