# Databento data catalog

This tutorial walks through how to set up a Nautilus Parquet data catalog with Databento order book data.

We use the MBP-10 schema (market by price, aggregated to the top 10 price levels of the book) so that the data is smaller and easier to work with for this example.
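To make "aggregated by price" concrete, here is a toy sketch of the idea: individual resting orders at the same price collapse into one level holding the total size and the order count, and only the best 10 levels per side are kept. The order tuples and the `aggregate_levels` helper are hypothetical illustrations, not Databento's implementation.

```python
from collections import defaultdict

# Hypothetical resting order: (side, price, size), where "B" = bid and "A" = ask
Order = tuple[str, float, int]


def aggregate_levels(orders: list[Order], depth: int = 10) -> dict[str, list[tuple[float, int, int]]]:
    """Aggregate orders into price levels of (price, total size, order count)."""
    size_at: dict[str, dict[float, int]] = {"B": defaultdict(int), "A": defaultdict(int)}
    count_at: dict[str, dict[float, int]] = {"B": defaultdict(int), "A": defaultdict(int)}
    for side, price, size in orders:
        size_at[side][price] += size
        count_at[side][price] += 1

    # Best bids are the highest prices; best asks are the lowest prices
    bid_prices = sorted(size_at["B"], reverse=True)[:depth]
    ask_prices = sorted(size_at["A"])[:depth]
    return {
        "bids": [(p, size_at["B"][p], count_at["B"][p]) for p in bid_prices],
        "asks": [(p, size_at["A"][p], count_at["A"][p]) for p in ask_prices],
    }


book = aggregate_levels([("B", 4600.00, 5), ("B", 4600.00, 3), ("A", 4600.25, 2)])
print(book["bids"])  # [(4600.0, 8, 2)]
```

Two bid orders at 4600.00 become a single level with size 8 and count 2, which is the kind of information an MBP level carries in place of the individual orders.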

```python
import databento as db

client = db.Historical()  # This will use the DATABENTO_API_KEY environment variable (recommended best practice)
```
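Because the client relies on the `DATABENTO_API_KEY` environment variable, it can help to fail early with a clear message when it is missing. A minimal sketch; `require_api_key` is a hypothetical helper, not part of the databento package.

```python
import os


def require_api_key(var: str = "DATABENTO_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error if unset."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set the {var} environment variable before creating the client")
    return key
```

Calling `require_api_key()` before `db.Historical()` surfaces a missing key immediately; the returned value could also be passed explicitly as `db.Historical(key=require_api_key())`.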

## Request data

Use the historical API to request the front-month ES futures contract (ranked by open interest) for 6 December 2023, between 14:30 and 20:30 UTC.
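The symbol `ES.n.0` uses Databento's continuous contract symbology, `ROOT.RULE.RANK`: the roll rule `n` ranks contracts by open interest (`c` is calendar, `v` is volume), and rank `0` selects the top-ranked contract. The parser below is a hypothetical illustration of that format, not a function from the databento package.

```python
from typing import NamedTuple

# Roll rules in Databento's continuous contract symbology
ROLL_RULES = {"c": "calendar", "n": "open interest", "v": "volume"}


class ContinuousSymbol(NamedTuple):
    root: str
    roll_rule: str
    rank: int


def parse_continuous(symbol: str) -> ContinuousSymbol:
    """Split a continuous symbol such as "ES.n.0" into root, roll rule and rank."""
    root, rule, rank = symbol.split(".")
    if rule not in ROLL_RULES:
        raise ValueError(f"unknown roll rule {rule!r} in {symbol!r}")
    return ContinuousSymbol(root, ROLL_RULES[rule], int(rank))


print(parse_continuous("ES.n.0"))
# ContinuousSymbol(root='ES', roll_rule='open interest', rank=0)
```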

**CAUTION: Every request incurs a cost, so run the request cell only once.**
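Since the request is persisted to disk, one way to avoid paying twice is to fetch only when the file does not exist yet and otherwise load it locally. A minimal sketch of that caching pattern; `load_or_fetch` is a hypothetical helper that takes the fetch and load steps as callables.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_fetch(path: Path, fetch: Callable[[Path], T], load: Callable[[Path], T]) -> T:
    """Load cached data from `path` if it exists; otherwise call the (billable) `fetch`."""
    if path.exists():
        return load(path)
    return fetch(path)
```

In this notebook, `fetch` could wrap the `client.timeseries.get_range(..., path=p)` call below and `load` could be `db.DBNStore.from_file`, so rerunning the cell reads the saved `.dbn.zst` file instead of issuing a new request.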

```python
# Path we'll use for persisting this request to disk
path = "es-front-glbx-mbp10.dbn.zst"

# Request the lead contract by open interest (ES.n.0); timestamps without a time zone are UTC
data = client.timeseries.get_range(
    dataset="GLBX.MDP3",
    symbols=["ES.n.0"],
    stype_in="continuous",
    schema="mbp-10",
    start="2023-12-06T14:30:00",
    end="2023-12-06T20:30:00",
    path=path,
)
```
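Assuming the naive `start` and `end` strings are interpreted as UTC, the window maps to US Eastern time as shown below. The sketch uses a fixed UTC-5 offset because daylight saving time is not in effect in early December; `to_eastern` is a hypothetical helper and is not correct for dates in the daylight saving period.

```python
from datetime import datetime, timedelta, timezone

# US Eastern Standard Time is UTC-5 (valid for December, when DST is not in effect)
EST = timezone(timedelta(hours=-5), "EST")


def to_eastern(utc_iso: str) -> datetime:
    """Interpret a naive ISO timestamp as UTC and convert it to US Eastern Standard Time."""
    return datetime.fromisoformat(utc_iso).replace(tzinfo=timezone.utc).astimezone(EST)


print(to_eastern("2023-12-06T14:30:00"))  # 2023-12-06 09:30:00-05:00
print(to_eastern("2023-12-06T20:30:00"))  # 2023-12-06 15:30:00-05:00
```

So the request covers 09:30 to 15:30 Eastern on 6 December 2023.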

```python
df = data.to_df()
df
```
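With the records in a DataFrame, top-of-book quantities can be derived from the level-0 columns. A minimal sketch, assuming the best bid and ask prices are in columns named `bid_px_00` and `ask_px_00` and are already scaled to decimal floats; `add_top_of_book` is a hypothetical helper.

```python
import pandas as pd


def add_top_of_book(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of an MBP-10 DataFrame with mid-price and spread columns added.

    Assumes level-0 prices are in the `bid_px_00` and `ask_px_00` columns.
    """
    out = df.copy()
    out["mid"] = (out["bid_px_00"] + out["ask_px_00"]) / 2
    out["spread"] = out["ask_px_00"] - out["bid_px_00"]
    return out
```

Calling `add_top_of_book(df)` leaves the original `df` unchanged and returns a new frame, which keeps the raw data available for the catalog steps that follow.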

## Write to data catalog

```python
import shutil
from pathlib import Path

from nautilus_trader.adapters.databento.loaders import DatabentoDataLoader
from nautilus_trader.model.identifiers import InstrumentId
from nautilus_trader.persistence.catalog import ParquetDataCatalog
```

```python
instrument_id = InstrumentId.from_str("ES.n.0")  # Placeholder: this should be the raw symbol (update)
loader = DatabentoDataLoader()
depth10 = loader.from_dbn_file(
    path=path,
    instrument_id=instrument_id,  # Optional: skips symbology mapping, which makes loading faster
    as_legacy_cython=False,  # Load Rust PyO3 objects for writing to the catalog
)
```
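Passing `instrument_id` presumably lets the loader assign one ID to every record instead of resolving each record's numeric instrument ID to a symbol. The sketch below illustrates the kind of date-interval lookup that step involves; the `SymbologyMap` structure, the placeholder ID and the `"SYM-A"` symbol are hypothetical and are not the Nautilus or DBN API.

```python
from datetime import date

# Hypothetical symbology map: numeric instrument ID -> (start, end, symbol) intervals.
# `end` is exclusive, so a symbol applies on days where start <= day < end.
SymbologyMap = dict[int, list[tuple[date, date, str]]]


def resolve_symbol(mappings: SymbologyMap, instrument_id: int, day: date) -> str:
    """Find the symbol that an instrument ID mapped to on a given day."""
    for start, end, symbol in mappings.get(instrument_id, []):
        if start <= day < end:
            return symbol
    raise KeyError(f"no symbol for instrument {instrument_id} on {day}")


# Placeholder data: ID 1 maps to "SYM-A" from 1 December up to (not including) 8 December
mappings: SymbologyMap = {1: [(date(2023, 12, 1), date(2023, 12, 8), "SYM-A")]}
```

Doing a lookup like this per record adds work that a single fixed ID avoids, which is consistent with the note that passing `instrument_id` speeds up loading.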

```python
CATALOG_PATH = Path.cwd() / "catalog"

# Clear if it already exists, then create fresh
if CATALOG_PATH.exists():
    shutil.rmtree(CATALOG_PATH)
CATALOG_PATH.mkdir()

# Create a catalog instance
catalog = ParquetDataCatalog(CATALOG_PATH)
```

```python
# Write the depth data to the catalog (this can take around 20 seconds)
catalog.write_data(depth10)
```

```python
import pyarrow.parquet as pq
```

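```python
# Build the path from CATALOG_PATH so it does not depend on the current working directory
depth10_parquet_path = CATALOG_PATH / "data" / "order_book_depth10" / "ES.n.0" / "part-0.parquet"
```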
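The file name `part-0.parquet` assumes a single file for the instrument. If the catalog holds more than one file, or the name differs, the files can be listed from the directory layout instead. A minimal sketch, assuming the `<catalog>/data/<data type>/<identifier>/` layout used in the path above; `find_parquet_files` is a hypothetical helper.

```python
from pathlib import Path


def find_parquet_files(catalog_path: Path, data_type: str, identifier: str) -> list[Path]:
    """List Parquet files under <catalog>/data/<data_type>/<identifier>, sorted by name."""
    directory = catalog_path / "data" / data_type / identifier
    return sorted(directory.glob("*.parquet"))
```

For this notebook the call would be `find_parquet_files(CATALOG_PATH, "order_book_depth10", "ES.n.0")`, and each returned path can be passed to `pq.read_table`.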

```python
table = pq.read_table(depth10_parquet_path)
table.schema
```