# Kuzu

Welcome to the Kuzu docs!

Kuzu is an embedded graph database built for query speed and scalability. It is optimized for handling complex join-heavy analytical workloads on very large graphs and has the following core features:

- Property Graph data model and Cypher query language
- Embedded (in-process) integration with applications
- Columnar disk-based storage
- Columnar and compressed sparse row-based (CSR) adjacency list and join indices
- Vectorized and factorized query processing
- Novel and efficient join algorithms
- Multi-core query parallelism
- Serializable ACID transactions
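
A CSR adjacency list stores every node's neighbors in one contiguous array and keeps a separate offset array that points into it. The toy Python sketch below builds such a structure for the `Follows` edges that appear later in the quickstart. It shows the general technique only, not Kuzu's on-disk format. The dense node IDs and the names `build_csr` and `out_neighbors` are assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def build_csr(num_nodes: int, edges: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build CSR arrays (offsets, neighbors) for a directed graph.

    offsets has num_nodes + 1 entries; the out-neighbors of node v are
    neighbors[offsets[v]:offsets[v + 1]].
    """
    # Count the out-degree of every source node.
    degree = [0] * num_nodes
    for src, _ in edges:
        degree[src] += 1
    # Prefix sums of the degrees give each node's start offset.
    offsets = [0] * (num_nodes + 1)
    for v in range(num_nodes):
        offsets[v + 1] = offsets[v] + degree[v]
    # Scatter each destination into its source's slice (the slice below is a copy).
    cursor = offsets[:-1]
    neighbors = [0] * len(edges)
    for src, dst in edges:
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors


# Hypothetical dense IDs for the demo users; edges taken from the quickstart output.
users = ["Adam", "Karissa", "Zhang", "Noura"]
uid: Dict[str, int] = {name: i for i, name in enumerate(users)}
follows = [("Adam", "Karissa"), ("Adam", "Zhang"), ("Karissa", "Zhang"), ("Zhang", "Noura")]

offsets, neighbors = build_csr(len(users), [(uid[a], uid[b]) for a, b in follows])


def out_neighbors(name: str) -> List[str]:
    """Return the users that `name` follows, via one slice of the CSR arrays."""
    v = uid[name]
    return [users[n] for n in neighbors[offsets[v]:offsets[v + 1]]]


print(offsets)                 # [0, 2, 3, 4, 4]
print(out_neighbors("Adam"))   # ['Karissa', 'Zhang']
```

After the arrays are built, finding a node's neighbors takes a single slice. This is why CSR layouts work well for join-heavy traversals.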

## Why Kuzu?

Although there are many graph database management systems (GDBMSs) in the market today, Kuzu stands apart because its design and implementation address specific trade-offs that make it a compelling choice for analytical query workloads on large graphs. Below, we list some of the key reasons why you should consider using Kuzu.

- Performance and scalability
- Usability
- Interoperability
- Structured property graph model
- Open source

## [Install Kuzu](https://kuzudb.github.io/docs/installation/)

### Command Line (Shell)

```bash
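# Linux: download the archive that matches your CPU architecture
# x86_64
curl -L -O https://github.com/kuzudb/kuzu/releases/download/v0.11.3/kuzu_cli-linux-x86_64.tar.gz
# aarch64 (use this instead of the line above)
# curl -L -O https://github.com/kuzudb/kuzu/releases/download/v0.11.3/kuzu_cli-linux-aarch64.tar.gz

tar xzf kuzu_cli-linux-*.tar.gz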

# You can now run Kuzu from the command line.
./kuzu

# macOS
brew install kuzu

# You can now run Kuzu from the command line.
kuzu
```

### Python
```bash
# uv
uv add kuzu

# pip
pip install kuzu
```

### Node.js
```bash
npm install kuzu
```

### Java
```xml
<dependency>
  <groupId>com.kuzudb</groupId>
  <artifactId>kuzu</artifactId>
  <version>0.11.3</version>
</dependency>
```

### Rust
```bash
cargo add kuzu
```

### Go
```bash
go get github.com/kuzudb/go-kuzu@v0.11.3
```

...

## Kuzu Explorer

Kuzu Explorer is a web-based GUI for Kuzu. It allows you to explore and query your Kuzu database using a web browser. Refer to the Kuzu Explorer [GitHub repo](https://github.com/kuzudb/explorer) for more details.

## Kuzu MCP Server

Our Model Context Protocol server allows you to expose your Kuzu database as a tool that can be used by LLMs and agents. Refer to the Kuzu-MCP [GitHub repo](https://github.com/kuzudb/kuzu-mcp-server) for more details.

# Create your first graph

Kuzu implements a **structured property graph model** and requires a pre-defined schema.

- Schema definition involves node and relationship tables and their associated properties.
- Each property key is strongly typed and these types must be explicitly declared.
- For node tables, a primary key must be defined.
- For relationship tables, no primary key is required.
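
These rules correspond directly to the `CREATE NODE TABLE` and `CREATE REL TABLE` statements used in the quickstart below. In the following sketch, two hypothetical helpers build those DDL strings from plain Python data and reject a node table that has no primary key. They only produce text and never call Kuzu. The helper names and their signatures are assumptions made for illustration.

```python
from typing import Dict, Optional


def node_table_ddl(name: str, props: Dict[str, str], primary_key: Optional[str]) -> str:
    """Build a CREATE NODE TABLE statement (hypothetical helper)."""
    # A node table must declare a primary key, and it must be one of its typed properties.
    if primary_key is None or primary_key not in props:
        raise ValueError(f"node table {name!r} needs a primary key among its properties")
    cols = []
    for prop, typ in props.items():
        suffix = " PRIMARY KEY" if prop == primary_key else ""
        cols.append(f"{prop} {typ}{suffix}")
    return f"CREATE NODE TABLE {name}({', '.join(cols)})"


def rel_table_ddl(name: str, src: str, dst: str, props: Optional[Dict[str, str]] = None) -> str:
    """Build a CREATE REL TABLE statement (hypothetical helper); no primary key needed."""
    parts = [f"FROM {src} TO {dst}"]
    parts += [f"{prop} {typ}" for prop, typ in (props or {}).items()]
    return f"CREATE REL TABLE {name}({', '.join(parts)})"


print(node_table_ddl("User", {"name": "STRING", "age": "INT64"}, primary_key="name"))
print(rel_table_ddl("Follows", "User", "User", {"since": "INT64"}))
```

Every property receives an explicit type, and only node tables are checked for a primary key. This mirrors the two rules listed above.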

## Persistence

Kuzu supports both **on-disk** and **in-memory** modes of operation. The mode is determined at the time of creating the database, as explained below.

### On-disk database

If you specify a database path when initializing a database, such as `example.kuzu`, Kuzu will operate in the **on-disk** mode. In this mode, Kuzu persists all data to disk at the given path. All transactions are logged to a Write-Ahead Log (WAL) and updates are periodically merged into the database files during checkpoints.
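
Pairing a write-ahead log with periodic checkpoints is a standard durability technique. The toy key-value store below sketches the idea in plain Python. Each update is appended to a log file and flushed to disk before it is applied in memory. `checkpoint()` writes the merged state to the main data file and then truncates the log. Kuzu's actual WAL format, checkpoint trigger, and file layout are different. The class name `ToyStore` and its JSON files are assumptions made only to illustrate the concept.

```python
import json
import os
import tempfile
from typing import Dict


class ToyStore:
    """Toy key-value store with a write-ahead log (not Kuzu's implementation)."""

    def __init__(self, directory: str) -> None:
        self.data_path = os.path.join(directory, "data.json")
        self.wal_path = os.path.join(directory, "wal.log")
        self.state: Dict[str, int] = {}
        # Recovery: load the last checkpointed state, then replay the logged updates.
        if os.path.exists(self.data_path):
            with open(self.data_path) as f:
                self.state = json.load(f)
        if os.path.exists(self.wal_path):
            with open(self.wal_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.state[key] = value

    def put(self, key: str, value: int) -> None:
        # Log first and force it to disk, then apply the update in memory.
        with open(self.wal_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.state[key] = value

    def checkpoint(self) -> None:
        # Merge logged updates into the main file, then truncate the log.
        tmp_path = self.data_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp_path, self.data_path)
        open(self.wal_path, "w").close()


with tempfile.TemporaryDirectory() as d:
    store = ToyStore(d)
    store.put("Adam", 30)
    store.put("Karissa", 40)
    # Simulate a restart before any checkpoint: a fresh instance recovers from the WAL.
    recovered = ToyStore(d).state
    store.checkpoint()
    wal_size_after_checkpoint = os.path.getsize(store.wal_path)
    reopened = ToyStore(d).state
```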

### In-memory database

If you omit the database path or set it to `""` or `:memory:`, Kuzu operates in the **in-memory** mode. In this mode, nothing is written to the WAL and no data is persisted to disk; all data is lost when the process exits.

# Quickstart

Install `kuzu` (the inspection steps below also use `pandas`): `uv pip install kuzu pandas`


Download the demo dataset:

```bash
mkdir -p ./data
curl -L -o ./data/city.csv https://raw.githubusercontent.com/kuzudb/kuzu/refs/heads/master/dataset/demo-db/csv/city.csv
curl -L -o ./data/user.csv https://raw.githubusercontent.com/kuzudb/kuzu/refs/heads/master/dataset/demo-db/csv/user.csv
curl -L -o ./data/follows.csv https://raw.githubusercontent.com/kuzudb/kuzu/refs/heads/master/dataset/demo-db/csv/follows.csv
curl -L -o ./data/lives-in.csv https://raw.githubusercontent.com/kuzudb/kuzu/refs/heads/master/dataset/demo-db/csv/lives-in.csv
```

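```python
# Inspect the data. The demo CSV files have no header row, so tell pandas
# not to treat the first line as column names.
import pandas as pd

df_user = pd.read_csv("./data/user.csv", header=None, names=["name", "age"])
df_user.head(2)
```

|   | name    | age |
|---|---------|-----|
| 0 | Adam    | 30  |
| 1 | Karissa | 40  |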

```python
df_city = pd.read_csv("./data/city.csv", header=None, names=["name", "population"])
df_city.head(2)
```

|   | name      | population |
|---|-----------|------------|
| 0 | Waterloo  | 150000     |
| 1 | Kitchener | 200000     |

```python
df_follows = pd.read_csv("./data/follows.csv", header=None, names=["from", "to", "since"])
df_follows.head(2)
```

|   | from | to      | since |
|---|------|---------|-------|
| 0 | Adam | Karissa | 2020  |
| 1 | Adam | Zhang   | 2020  |

```python
df_livesin = pd.read_csv("./data/lives-in.csv", header=None, names=["from", "to"])
df_livesin.head(2)
```

|   | from    | to       |
|---|---------|----------|
| 0 | Adam    | Waterloo |
| 1 | Karissa | Waterloo |
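
Before loading, you can confirm that every relationship row refers to an existing node. A `Follows` or `LivesIn` row can only become an edge if its FROM and TO names already exist as primary keys in `User` or `City`. The minimal pre-flight check below uses only the standard library. The helper names `primary_keys` and `find_dangling_refs` are hypothetical. The inline CSV text is a trimmed copy of the demo rows plus one deliberately broken row (`Bob`), standing in for the downloaded files.

```python
import csv
import io
from typing import List, Set


def primary_keys(csv_text: str, column: int = 0) -> Set[str]:
    """Collect primary-key values from a headerless CSV (first column by default)."""
    return {row[column] for row in csv.reader(io.StringIO(csv_text)) if row}


def find_dangling_refs(rel_csv: str, src_keys: Set[str], dst_keys: Set[str]) -> List[List[str]]:
    """Return relationship rows whose FROM or TO key is missing (hypothetical helper)."""
    bad = []
    for row in csv.reader(io.StringIO(rel_csv)):
        if row and (row[0] not in src_keys or row[1] not in dst_keys):
            bad.append(row)
    return bad


# Stand-in data in the same headerless layout; "Bob" is deliberately undefined.
user_csv = "Adam,30\nKarissa,40\nZhang,50\n"
follows_csv = "Adam,Karissa,2020\nAdam,Zhang,2020\nKarissa,Zhang,2021\nKarissa,Bob,2023\n"

users = primary_keys(user_csv)
dangling = find_dangling_refs(follows_csv, users, users)
print(dangling)  # [['Karissa', 'Bob', '2023']]
```

For `LivesIn`, pass the user keys as `src_keys` and the keys read from `city.csv` as `dst_keys`.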


```python
import kuzu

# Create (or reopen, if the path already exists) an on-disk database and connect to it
db = kuzu.Database("example.kuzu")
conn = kuzu.Connection(db)
```

```python
# Create the schema: two node tables and two relationship tables
conn.execute("CREATE NODE TABLE User(name STRING PRIMARY KEY, age INT64)")
conn.execute("CREATE NODE TABLE City(name STRING PRIMARY KEY, population INT64)")
conn.execute("CREATE REL TABLE Follows(FROM User TO User, since INT64)")
conn.execute("CREATE REL TABLE LivesIn(FROM User TO City)")
```

```python
# Bulk-load the CSV files into the tables
conn.execute('COPY User FROM "./data/user.csv"')
conn.execute('COPY City FROM "./data/city.csv"')
conn.execute('COPY Follows FROM "./data/follows.csv"')
conn.execute('COPY LivesIn FROM "./data/lives-in.csv"')
```

```python
import textwrap

# Execute a Cypher query and iterate over the result rows
response = conn.execute(
    textwrap.dedent(
        """
        MATCH (a:User)-[f:Follows]->(b:User)
        RETURN a.name, b.name, f.since;
        """
    )
)

for row in response:
    print(row)
```

```
['Adam', 'Karissa', 2020]
['Adam', 'Zhang', 2020]
['Karissa', 'Zhang', 2021]
['Zhang', 'Noura', 2022]
```


```python
# Output each row as a dictionary keyed by the RETURN expressions
response = conn.execute(
    """
    MATCH (a:User)-[f:Follows]->(b:User)
    RETURN a.name, b.name, f.since;
    """
)
for row in response.rows_as_dict():
    print(row)
```

```
{'a.name': 'Adam', 'b.name': 'Karissa', 'f.since': 2020}
{'a.name': 'Adam', 'b.name': 'Zhang', 'f.since': 2020}
{'a.name': 'Karissa', 'b.name': 'Zhang', 'f.since': 2021}
{'a.name': 'Zhang', 'b.name': 'Noura', 'f.since': 2022}
```
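
Each row from `rows_as_dict()` is a plain dictionary keyed by the `RETURN` expressions, so ordinary Python post-processing works on it. The sketch below groups the four rows printed above into a follower-to-followees mapping. The `rows` list is copied from that output so the snippet runs without a database. In practice, you would pass `response.rows_as_dict()` instead. The helper name `group_follows` is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping


def group_follows(rows: Iterable[Mapping[str, Any]]) -> Dict[str, List[str]]:
    """Map each follower to the users they follow (hypothetical helper)."""
    following: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        following[row["a.name"]].append(row["b.name"])
    return dict(following)


# Rows copied from the rows_as_dict() output above.
rows = [
    {"a.name": "Adam", "b.name": "Karissa", "f.since": 2020},
    {"a.name": "Adam", "b.name": "Zhang", "f.since": 2020},
    {"a.name": "Karissa", "b.name": "Zhang", "f.since": 2021},
    {"a.name": "Zhang", "b.name": "Noura", "f.since": 2022},
]

print(group_follows(rows))
# {'Adam': ['Karissa', 'Zhang'], 'Karissa': ['Zhang'], 'Zhang': ['Noura']}
```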


```python
# pip install pandas
response = conn.execute(
    """
    MATCH (a:User)-[f:Follows]->(b:User)
    RETURN a.name, b.name, f.since;
    """
)
print(response.get_as_df())
```

```
    a.name   b.name  f.since
0     Adam  Karissa     2020
1     Adam    Zhang     2020
2  Karissa    Zhang     2021
3    Zhang    Noura     2022
```


```bash
uv pip install polars pyarrow
```

```python
# pip install polars
response = conn.execute(
    """
    MATCH (a:User)-[f:Follows]->(b:User)
    RETURN a.name, b.name, f.since;
    """
)
print(response.get_as_pl())
```

```
shape: (4, 3)
┌─────────┬─────────┬─────────┐
│ a.name  ┆ b.name  ┆ f.since │
│ ---     ┆ ---     ┆ ---     │
│ str     ┆ str     ┆ i64     │
╞═════════╪═════════╪═════════╡
│ Adam    ┆ Karissa ┆ 2020    │
│ Adam    ┆ Zhang   ┆ 2020    │
│ Karissa ┆ Zhang   ┆ 2021    │
│ Zhang   ┆ Noura   ┆ 2022    │
└─────────┴─────────┴─────────┘
```


```python
# pip install pyarrow
response = conn.execute(
    """
    MATCH (a:User)-[f:Follows]->(b:User)
    RETURN a.name, b.name, f.since;
    """
)
print(response.get_as_arrow())
```

```
pyarrow.Table
a.name: string
b.name: string
f.since: int64
----
a.name: [["Adam","Adam","Karissa","Zhang"]]
b.name: [["Karissa","Zhang","Zhang","Noura"]]
f.since: [[2020,2020,2021,2022]]
```
