# Part 2: Graph construction
This demo covers how to construct a Kùzu graph from data spread across multiple files. For
simplicity, we'll use Parquet files, but the data could also come from external sources such as
DuckDB or PostgreSQL tables. See the documentation on [Kùzu extensions](https://docs.kuzudb.com/extensions/)
for how to integrate with external databases.

We can start by creating an empty Kùzu database and opening a connection to it.

In [1]:
import shutil
from pathlib import Path

import kuzu

Path("db").mkdir(exist_ok=True)
shutil.rmtree("db/kuzudb", ignore_errors=True)
db = kuzu.Database("db/kuzudb")
kuzu_conn = kuzu.Connection(db)

## Data modeling

The following raw data files are available in the `data/final/` directory. They contain the original
wine reviews from the previous section, along with information about customers: which wines from the
reviews dataset they purchased, which reviewers they follow, and which countries they live in.

```
.
├── final
    ├── customers.parquet
    ├── follows.parquet
    ├── lives_in.parquet
    ├── purchases.parquet
    ├── tasted.parquet
    ├── tasters.parquet
    └── winemag-reviews.parquet
```

Some of these are structured as node files, with each column representing one of the node's properties.
Others are structured as edge files, with the first and second columns representing the source (FROM)
and target (TO) nodes, respectively. The files are shown here in Parquet format, but they could
just as well live in a relational database or a data lake.

Our goal is to use this data to construct a graph with the following nodes and relationships:

<img src="./assets/graph_schema_wines.png" height=300/>

We first define the graph schema, starting with the node tables and their associated properties.
The relationship tables are created after the node data is loaded.

In [2]:
# Create customer node table
def create_customer_node_table(conn: kuzu.Connection) -> None:
    conn.execute(
        """
        CREATE NODE TABLE
            Customer(
                customer_id INT64,
                name STRING,
                age INT64,
                PRIMARY KEY (customer_id)
            )
        """
    )

# Create taster node table
def create_taster_node_table(conn: kuzu.Connection) -> None:
    conn.execute(
        """
        CREATE NODE TABLE
            Taster(
                taster_twitter_handle STRING,
                taster_name STRING,
                taster_id STRING,
                PRIMARY KEY (taster_id)
            )
        """
    )

# Create wine node table
def create_wine_node_table(conn: kuzu.Connection) -> None:
    conn.execute(
        """
        CREATE NODE TABLE
            Wine(
                id INT64,
                title STRING,
                country STRING,
                description STRING,
                variety STRING,
                points INT64,
                price DOUBLE,
                state STRING,
                taster_name STRING,
                taster_twitter_handle STRING,
                PRIMARY KEY (id)
            )
        """
    )

# Create country node table
def create_country_node_table(conn: kuzu.Connection) -> None:
    conn.execute(
        """
        CREATE NODE TABLE
            Country(
                country STRING,
                PRIMARY KEY (country)
            )
        """
    )

In [3]:
# Run node table creation
create_customer_node_table(kuzu_conn)
create_wine_node_table(kuzu_conn)
create_taster_node_table(kuzu_conn)
create_country_node_table(kuzu_conn)
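
The four functions above differ only in the table name, the columns, and the primary key, so the
same DDL could also be generated from a small specification. The following is a minimal sketch and
not part of the original notebook: `node_table_ddl` is a hypothetical helper that only builds the
statement string, which could then be passed to `kuzu_conn.execute(...)`.

```python
def node_table_ddl(name: str, columns: dict[str, str], primary_key: str) -> str:
    """Build a CREATE NODE TABLE statement (hypothetical helper, for illustration)."""
    if primary_key not in columns:
        raise ValueError(f"Primary key {primary_key!r} is not a column of {name}")
    # Render each column as "<name> <type>", in the order given
    column_defs = ", ".join(f"{col} {dtype}" for col, dtype in columns.items())
    return f"CREATE NODE TABLE {name}({column_defs}, PRIMARY KEY ({primary_key}))"


# Same definition as create_taster_node_table above, expressed as data
taster_ddl = node_table_ddl(
    "Taster",
    {"taster_twitter_handle": "STRING", "taster_name": "STRING", "taster_id": "STRING"},
    primary_key="taster_id",
)
print(taster_ddl)
```

Keeping the schema as data makes it easier to review all tables in one place, at the cost of
being less explicit than one function per table.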

## Insert data into the graph
Once the tables are created, it's time to insert the data into the node and relationship tables.
Rather than looping over rows in Python, we use the `COPY` command in Cypher, which is
the fastest way to bulk-insert data into a node or relationship table.

In [4]:
# Insert nodes into graph
kuzu_conn.execute("COPY Customer FROM 'data/final/customers.parquet'");
kuzu_conn.execute("COPY Wine FROM 'data/final/winemag-reviews.parquet'");
kuzu_conn.execute("COPY Taster FROM 'data/final/tasters.parquet'");
kuzu_conn.execute("COPY Country FROM (LOAD FROM 'data/final/winemag-reviews.parquet' WHERE country IS NOT NULL RETURN DISTINCT country)");

In [5]:
# Check number of wine nodes
kuzu_conn.execute("MATCH (w:Wine) RETURN count(w) AS num_wines").get_as_pl()

num_wines
i64
129971


In [6]:
# Check number of customer nodes
kuzu_conn.execute("MATCH (c:Customer) RETURN count(c) AS num_customers").get_as_pl()

num_customers
i64
25
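
The same check can be repeated for every node table in one call. The sketch below assumes `conn`
behaves like a `kuzu.Connection` whose `execute` returns a result object with a `get_next()` method.
`node_count_query` and `count_nodes` are hypothetical helper names, not part of Kùzu.

```python
from typing import Any


def node_count_query(label: str) -> str:
    """Return a Cypher query that counts the nodes with the given label."""
    return f"MATCH (n:{label}) RETURN count(n) AS num_nodes"


def count_nodes(conn: Any, labels: list[str]) -> dict[str, int]:
    """Run one count query per label and collect the results in a dict.

    Assumes `conn` is a kuzu.Connection (or anything with a compatible
    `execute` method) and that each count query returns exactly one row.
    """
    counts = {}
    for label in labels:
        result = conn.execute(node_count_query(label))
        counts[label] = result.get_next()[0]  # first column of the single row
    return counts


# Usage with the connection created earlier (not executed here):
# count_nodes(kuzu_conn, ["Customer", "Wine", "Taster", "Country"])
```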


In a similar way, we can create relationship tables and insert the necessary data into them.
Note that for the final relationship table, `IsFrom`, no separate edge file is needed: a `LOAD FROM`
subquery reads `winemag-reviews.parquet` directly, filters out rows with a missing country, and
returns the `id` and `country` columns as the FROM and TO keys.

In [7]:
# Create relationship tables
kuzu_conn.execute("CREATE REL TABLE LivesIn(FROM Customer TO Country)");
kuzu_conn.execute("CREATE REL TABLE Purchased(FROM Customer TO Wine)");
kuzu_conn.execute("CREATE REL TABLE Follows(FROM Customer TO Taster)");
kuzu_conn.execute("CREATE REL TABLE Tasted(FROM Taster TO Wine)");
kuzu_conn.execute("CREATE REL TABLE IsFrom(FROM Wine TO Country)");

# Insert relationships into graph
kuzu_conn.execute("COPY LivesIn FROM 'data/final/lives_in.parquet'");
kuzu_conn.execute("COPY Purchased FROM 'data/final/purchases.parquet'");
kuzu_conn.execute("COPY Follows FROM 'data/final/follows.parquet'");
kuzu_conn.execute("COPY Tasted FROM 'data/final/tasted.parquet'");
kuzu_conn.execute("COPY IsFrom FROM (LOAD FROM 'data/final/winemag-reviews.parquet' WHERE country IS NOT NULL RETURN id, country)");
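
Before copying from a `LOAD FROM` subquery, it can help to run the scan on its own and look at a
few rows. The following is a minimal sketch, assuming that `LOAD FROM` accepts a trailing `LIMIT`
in the Kùzu version in use. `preview_query` is a hypothetical helper that only builds the query string.

```python
def preview_query(path: str, columns: list[str], not_null: str, limit: int = 5) -> str:
    """Build a standalone LOAD FROM scan that mirrors the subquery used in COPY."""
    return (
        f"LOAD FROM '{path}' "
        f"WHERE {not_null} IS NOT NULL "
        f"RETURN {', '.join(columns)} "
        f"LIMIT {limit}"
    )


# Mirrors the IsFrom subquery above, restricted to the first few rows
is_from_preview = preview_query(
    "data/final/winemag-reviews.parquet", ["id", "country"], not_null="country"
)
print(is_from_preview)

# With the notebook's connection (not executed here):
# kuzu_conn.execute(is_from_preview).get_as_pl()
```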

## Query graph
We can now run some queries that ask questions of the connected data.

In [8]:
# Purchases by customers of wines reviewed by Roger Voss (count(*) counts matched paths, not distinct customers)
kuzu_conn.execute(
    """
    MATCH (c:Customer)-[p:Purchased]->(w:Wine)<-[t:Tasted]-(r:Taster)
    WHERE r.taster_name = "Roger Voss"
    RETURN count(*) AS num_customers
    """
).get_as_pl()

num_customers
i64
10
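
Because `count(*)` counts matched paths, a customer who bought several wines reviewed by the same
taster contributes more than once. If the goal is the number of distinct customers, a variant
that counts distinct `customer_id` values can be used. The sketch below also passes the taster name
as a query parameter (`$name`) instead of hard-coding it. It assumes that `Connection.execute`
accepts a `parameters` dict, as in recent Kùzu Python releases; `count_distinct_customers` is a
hypothetical helper name.

```python
from typing import Any

# Counts each customer once, however many matching wines they purchased
DISTINCT_CUSTOMERS_QUERY = """
    MATCH (c:Customer)-[:Purchased]->(:Wine)<-[:Tasted]-(r:Taster)
    WHERE r.taster_name = $name
    RETURN count(DISTINCT c.customer_id) AS num_customers
"""


def count_distinct_customers(conn: Any, taster_name: str) -> int:
    """Return the number of distinct customers who bought wines tasted by `taster_name`.

    Assumes `conn` is a kuzu.Connection (or an object with a compatible
    `execute(query, parameters=...)` method).
    """
    result = conn.execute(DISTINCT_CUSTOMERS_QUERY, parameters={"name": taster_name})
    return result.get_next()[0]  # single row, single column


# Usage with the notebook's connection (not executed here):
# count_distinct_customers(kuzu_conn, "Roger Voss")
```

The result may differ from the path count above whenever a customer purchased more than one wine
reviewed by the same taster.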


## Visualize the graph schema
We can also inspect the graph visually using [Kùzu Explorer](https://docs.kuzudb.com/visualization/)
and run more complex queries to answer questions on customer-taster-wine relationships.

Use the provided compose file to start Kùzu Explorer in Docker and connect to the database.

```bash
docker compose up
```