# 3. Graph analytics and machine learning
This notebook walks through how to apply common graph analytics algorithms to your graph data using
NetworkX. An upcoming version of Kùzu (0.5.0 and beyond) will include an algorithms package
that allows you to run some of these algorithms from within Kùzu, without having to resort to NetworkX.

However, because NetworkX is a very mature library, it has an extensive list of graph algorithms that
may not be available in Kùzu, so it's still useful to know how to use it alongside Kùzu.

## Create a database and start a connection
We start by opening the Kùzu database at `db` and a connection to it. The queries below assume it already holds the wine graph built in the earlier notebooks.

In [2]:
import kuzu
import pandas as pd
import networkx as nx

db = kuzu.Database("db")
conn = kuzu.Connection(db)

In [3]:
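# Fetch the subgraph that links customers to tasters through the wines they purchased and tasted
res = conn.execute('MATCH (c:Customer)-[p:Purchased]->(w:Wine)<-[t:Tasted]-(ts:Taster) RETURN *')
# Convert the query result into an undirected NetworkX graph
G = res.get_as_networkx(directed=False)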

In [4]:
# Compute a PageRank score for every node in the graph
pageranks = nx.pagerank(G)

In [5]:
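# Build a DataFrame of PageRank scores, indexed by node label
pagerank_df = pd.DataFrame.from_dict(pageranks, orient="index", columns=["pagerank"])
# Keep only the Taster nodes and strip the "Taster_" prefix to recover the taster ID
taster_df = pagerank_df[pagerank_df.index.str.contains("Taster")]
taster_df.index = taster_df.index.str.replace("Taster_", "")
taster_df = taster_df.reset_index(names=["id"])
taster_df.head()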

,id,pagerank
0,jim_gordon,0.016046
1,kerin_o_keefe,0.042609
2,virginie_boone,0.016046
3,paul_gregutt,0.024901
4,matt_kettmann,0.033755


In [6]:
try:
    # Alter the Taster node table schema to add a pagerank column
    conn.execute("ALTER TABLE Taster ADD pagerank DOUBLE DEFAULT 0.0;")
except RuntimeError:
    # If the column already exists, do nothing
    pass

In [7]:
# Copy pagerank values to Taster nodes
x = conn.execute(
  """
  LOAD FROM taster_df
  MERGE (ts:Taster {taster_id: id})
  ON MATCH SET ts.pagerank = pagerank
  RETURN ts.taster_id AS taster_id, ts.pagerank AS pagerank
  ORDER BY ts.pagerank DESC
  """
)
x.get_as_df()


,taster_id,pagerank
0,roger_voss,0.095734
1,kerin_o_keefe,0.042609
2,matt_kettmann,0.033755
3,paul_gregutt,0.024901
4,jim_gordon,0.016046
5,virginie_boone,0.016046


As can be seen, Roger Voss is the most influential taster in the network, with the highest PageRank score. This makes sense intuitively: as we saw in the first notebook, he has tasted more than 25,000 wines, so he has the most indirect connections to customers who bought wines he tasted.
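
To build some intuition for why a well-connected taster ends up with a higher score, here is a minimal sketch of PageRank as a plain power iteration on a tiny undirected graph. The node names (`Taster_a`, `Wine_1`, `Customer_1`, and so on) are hypothetical and not taken from the dataset, and the `pagerank` function below is an illustration only: `nx.pagerank` uses the same default damping factor of 0.85 but differs in details such as its convergence check and its handling of edge weights and dangling nodes.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph given as (u, v) pairs."""
    # Build an undirected adjacency list from the edge pairs
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    n = len(neighbors)
    # Start from a uniform distribution over all nodes
    rank = {node: 1.0 / n for node in neighbors}
    for _ in range(iterations):
        new_rank = {}
        for node in neighbors:
            # Each neighbor splits its current rank evenly among its own neighbors
            incoming = sum(rank[nb] / len(neighbors[nb]) for nb in neighbors[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Hypothetical toy graph: Taster_a tasted three purchased wines, Taster_b only one
edges = [
    ("Taster_a", "Wine_1"), ("Taster_a", "Wine_2"), ("Taster_a", "Wine_3"),
    ("Taster_b", "Wine_4"),
    ("Customer_1", "Wine_1"), ("Customer_2", "Wine_2"),
    ("Customer_3", "Wine_3"), ("Customer_4", "Wine_4"),
]

ranks = pagerank(edges)
for node in ("Taster_a", "Taster_b"):
    print(node, round(ranks[node], 4))
```

In this toy graph, `Taster_a` is reachable from three customers through the wines they bought, while `Taster_b` is reachable from only one, so `Taster_a` receives the higher score.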

## Graph machine learning

Next, we'll look at how to use Kùzu as the backend for PyTorch Geometric, a popular library for graph machine learning. We'll extract the graph data from Kùzu and organize it into a `feature_store` object and a `graph_store` object, each of which provides its data to PyTorch Geometric for downstream tasks. Once you have these two objects, you can proceed with your PyTorch Geometric workflow as you would with any other dataset.

In [8]:
feature_store, graph_store = db.get_torch_geometric_remote_backend()

The feature store stores the node properties, while the graph store stores the edges and their properties.

In [9]:
feature_store.get_all_tensor_attrs()

[TensorAttr(group_name='Taster', attr_name='pagerank', index=<_FieldStatus.UNSET: None>),
 TensorAttr(group_name='Wine', attr_name='id', index=<_FieldStatus.UNSET: None>),
 TensorAttr(group_name='Wine', attr_name='points', index=<_FieldStatus.UNSET: None>),
 TensorAttr(group_name='Wine', attr_name='price', index=<_FieldStatus.UNSET: None>),
 TensorAttr(group_name='Customer', attr_name='customer_id', index=<_FieldStatus.UNSET: None>),
 TensorAttr(group_name='Customer', attr_name='age', index=<_FieldStatus.UNSET: None>)]

You can access the properties within each node as tensors, as follows.

In [10]:
feature_store.get_tensor("Customer", "age", None)

tensor([66, 34, 51, 30, 34, 38, 60, 61, 24, 49, 49, 27, 56, 62, 21, 38, 62, 36,
        26, 49, 62, 29, 42, 43, 22])

To see the edge indices that are stored in the graph store, you can inspect the edge attributes as follows.

In [11]:
graph_store.get_all_edge_attrs()

[EdgeAttr(edge_type=('Wine', 'IsFrom', 'Country'), layout=<EdgeLayout.COO: 'coo'>, is_sorted=True, size=(129971, 43)),
 EdgeAttr(edge_type=('Customer', 'Follows', 'Taster'), layout=<EdgeLayout.COO: 'coo'>, is_sorted=True, size=(25, 19)),
 EdgeAttr(edge_type=('Customer', 'LivesIn', 'Country'), layout=<EdgeLayout.COO: 'coo'>, is_sorted=True, size=(25, 43)),
 EdgeAttr(edge_type=('Customer', 'Purchased', 'Wine'), layout=<EdgeLayout.COO: 'coo'>, is_sorted=True, size=(25, 129971)),
 EdgeAttr(edge_type=('Taster', 'Tasted', 'Wine'), layout=<EdgeLayout.COO: 'coo'>, is_sorted=True, size=(19, 129971))]

In [12]:
graph_store.get_edge_index(edge_type=('Customer', 'Purchased', 'Wine'), layout='coo')

tensor([[    11,      2,     18,     24,     14,     15,     21,     10,     16,
             22,      1,      4,     12,      9,      5,      0,     20,      7,
             17,     23,      8,      6,      3,     19,     13],
        [  4645,  12006,  14136,  36952,  48395,  57375,  59880,  67504,  69208,
          78730,  79659,  80978,  82711,  83155,  87219,  89391,  89515,  97885,
         103430, 104272, 109386, 110352, 122593, 128779, 128939]])
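
The tensor above is in COO (coordinate) layout: the first row holds the source node index of each edge and the second row holds the destination node index, so column `i` describes edge `i`. The sketch below illustrates this with plain Python lists rather than tensors, using the first three columns of the output above. The variable names are illustrative, and the bounds check assumes that `size` in the `EdgeAttr` means `(number of source nodes, number of destination nodes)`.

```python
# First three (customer_index, wine_index) pairs from the edge index above
purchases = [(11, 4645), (2, 12006), (18, 14136)]

# COO layout: row 0 holds the source indices, row 1 the destination indices
edge_index = [
    [src for src, _ in purchases],
    [dst for _, dst in purchases],
]

# Assumed meaning of size=(25, 129971): 25 Customer nodes, 129971 Wine nodes
num_customers, num_wines = 25, 129971
assert all(0 <= src < num_customers for src in edge_index[0])
assert all(0 <= dst < num_wines for dst in edge_index[1])

# Reading the two rows column by column recovers the individual edges
edges = list(zip(*edge_index))
for src, dst in edges:
    print(f"Customer {src} -> Wine {dst}")
```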

Once you have the `feature_store` and `graph_store` objects available, you can use them as datasets in PyTorch Geometric. In a few lines of code, Kùzu can function as your go-to backend for graph machine learning tasks!

## Further reading and examples

See our [documentation page](https://docs.kuzudb.com/tutorials/#python) for tutorials and notebooks on using Kùzu with PyTorch Geometric for node/link prediction, and on how to set up a model train/test workflow.