# Pneuma: Quick Start

In this notebook, we show how to use each of Pneuma's features, from registering a dataset to querying the index.

## Offline Stage

In the offline stage, we set up Pneuma: initializing the database, registering datasets and metadata, generating summaries, and building both a vector index and a keyword index.

To use Pneuma, we import the Pneuma class from the pneuma module and configure the following:
- CUBLAS_WORKSPACE_CONFIG is set to a fixed workspace configuration (`:4096:8` in this demo) to make cuBLAS operations more deterministic.
- CUDA_VISIBLE_DEVICES selects which GPU to use.
- out_path determines where the dataset and indexes are stored. If it is not set, it defaults to ~/.local/share/Pneuma/out on Linux or /Documents/Pneuma/out on Windows.
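Both GPU variables only take effect if they are in the environment before the CUDA runtime and cuBLAS are initialized in the process, which in practice means before the first model is loaded. The sketch below wraps that step in a helper that keeps any values already exported in the shell. `configure_gpu_env` is a hypothetical name, not part of Pneuma.

```python
import os
import sys


def configure_gpu_env(device: str = "0", workspace: str = ":4096:8") -> dict[str, str]:
    """Hypothetical helper: set GPU-related environment variables for this process.

    setdefault() keeps values that are already present, so a shell export wins.
    """
    if "torch" in sys.modules:
        # An imported torch does not mean CUDA is initialized yet, but it is a
        # hint that these variables might be set too late to have an effect.
        print("warning: torch is already imported; set GPU variables earlier if possible")
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", workspace)  # deterministic cuBLAS workspace
    os.environ.setdefault("CUDA_VISIBLE_DEVICES", device)  # expose only the chosen GPU
    return {
        "CUBLAS_WORKSPACE_CONFIG": os.environ["CUBLAS_WORKSPACE_CONFIG"],
        "CUDA_VISIBLE_DEVICES": os.environ["CUDA_VISIBLE_DEVICES"],
    }
```

Calling `configure_gpu_env()` at the top of the notebook has the same effect as the two assignments used here, except that it does not overwrite values set outside the notebook.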

```python
import json
import os

from pneuma import Pneuma
```

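```python
# Set these before any GPU work (e.g., model loading) so CUDA and cuBLAS pick them up.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # Ensure deterministic cuBLAS behavior
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Use only GPU 0

out_path = "out_demo/storage"  # Where registered data and indexes are stored
```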

We initialize the Pneuma object with out_path and the paths to the LLM (llm_path) and the embedding model (embed_path), then call the setup() function to initialize the database.

```python
pneuma = Pneuma(
    out_path=out_path,
    llm_path="experiments/models/qwen",  # E.g., Qwen/Qwen2.5-7B-Instruct
    embed_path="experiments/models/bge-base",  # E.g., BAAI/bge-base-en-v1.5
)
pneuma.setup()
```

### Registration

To register a dataset, we call the add_tables function with the path to a directory of tables (CSV files in this demo) and the name of the data creator.

```python
data_path = "data_src/sample_data/csv"

response = pneuma.add_tables(path=data_path, creator="demo_user")
response = json.loads(response)
print(response)
```
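Before registering a directory, it can help to check which files it actually contains. The standard-library sketch below lists the CSV files directly inside a directory such as `data_path`. It assumes the tables are plain `.csv` files at the top level; it does not reflect how add_tables itself discovers files, and `list_csv_files` is a hypothetical helper.

```python
from pathlib import Path


def list_csv_files(directory: str) -> list[str]:
    """Hypothetical helper: return the sorted names of .csv files directly inside `directory`."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"{directory!r} is not a directory")
    # Top level only; root.rglob("*.csv") would also search subdirectories.
    return sorted(p.name for p in root.glob("*.csv") if p.is_file())
```

For example, `print(list_csv_files(data_path))` shows the candidate tables before calling add_tables.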

To register context or summaries for the dataset, we call the add_metadata function with the path to a metadata file.

```python
metadata_path = "data_src/sample_data/metadata.csv"

response = pneuma.add_metadata(metadata_path=metadata_path)
response = json.loads(response)
print(response)
```
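Every Pneuma call in this notebook returns a JSON string that we decode with json.loads. A small helper can fold that step together with extracting the payload. This is a sketch that assumes only what the querying step at the end of this notebook relies on, namely that the decoded response is an object with a `data` key; `load_data` is a hypothetical name, and responses from other calls may be shaped differently.

```python
import json
from typing import Any


def load_data(raw: str) -> Any:
    """Hypothetical helper: decode a JSON response string and return its "data" field.

    Raises ValueError if the string is malformed JSON (json.JSONDecodeError is a
    ValueError subclass) or is not a JSON object with a "data" key.
    """
    decoded = json.loads(raw)
    if not isinstance(decoded, dict) or "data" not in decoded:
        raise ValueError(f"unexpected response shape: {raw[:200]!r}")
    return decoded["data"]
```

With it, the query step could read `data = load_data(response)` and then use `data["query"]` and `data["response"]`.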

### Summarization
By default, calling the summarize function will create summaries for all unsummarized tables.

```python
response = pneuma.summarize()
response = json.loads(response)
print(response)
```

### Index Generation
To generate both the vector index and the keyword index, we call the generate_index function with a name for the index. By default, this function indexes all registered tables.

```python
response = pneuma.generate_index(index_name="demo_index")
response = json.loads(response)
print(response)
```
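To make the difference between the two index types concrete, here is a toy, self-contained sketch over invented table descriptions; it is not Pneuma's implementation. The keyword index is an inverted index from terms to tables, and the "vector index" stores one unit-length vector per table. As a stand-in for the learned embeddings a real system would use, it uses bag-of-words counts. All function names and data are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_keyword_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: term -> set of table ids whose text contains that term."""
    index: dict[str, set[str]] = defaultdict(set)
    for table_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(table_id)
    return dict(index)


def build_vector_index(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Toy vector index: one L2-normalized bag-of-words vector per table."""
    vectors = {}
    for table_id, text in docs.items():
        counts = Counter(tokenize(text))
        norm = math.sqrt(sum(c * c for c in counts.values()))
        vectors[table_id] = {term: c / norm for term, c in counts.items()}
    return vectors


def cosine(query: str, vec: dict[str, float]) -> float:
    """Cosine similarity between the query's bag-of-words vector and a stored unit vector."""
    q = Counter(tokenize(query))
    qnorm = math.sqrt(sum(c * c for c in q.values()))
    if qnorm == 0:
        return 0.0
    return sum((c / qnorm) * vec.get(term, 0.0) for term, c in q.items())


# Invented table descriptions, for illustration only.
docs = {
    "climate.csv": "Monthly temperature and rainfall records related to climate issues",
    "sales.csv": "Quarterly sales revenue by region",
}
keyword_index = build_keyword_index(docs)
vector_index = build_vector_index(docs)
print(sorted(keyword_index["climate"]))  # ['climate.csv']
print({t: round(cosine("climate issues", v), 3) for t, v in vector_index.items()})
```

The keyword index answers "which tables mention this exact term", while the vector index scores every table by similarity, which is what lets embedding-based search match related wording rather than only exact terms.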

## Online Stage (Querying)
To retrieve a ranked list of tables, we use the query_index function.

```python
response = pneuma.query_index(
    index_name="demo_index",
    query="Which dataset contains climate issues?",
    k=1,
    n=5,
    alpha=0.5,
)
response = json.loads(response)
query = response["data"]["query"]
retrieved_tables = response["data"]["response"]

print(f"Query: {query}")
print("Retrieved tables:")
for table in retrieved_tables:
    print(table)
```
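A common way for hybrid retrievers to combine keyword and vector results is a weighted sum of normalized scores. The sketch below shows that technique under an assumption this notebook does not confirm: that alpha weights the vector score against the keyword score and k is the number of tables returned. The exact semantics of k, n, and alpha in query_index may differ. `fuse_scores`, `min_max`, and the scores are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (s - lo) / (hi - lo) for key, s in scores.items()}


def fuse_scores(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    alpha: float = 0.5,
    k: int = 1,
) -> list[tuple[str, float]]:
    """Hypothetical hybrid ranking: alpha * vector + (1 - alpha) * keyword.

    A table missing from one result list gets 0 for that component.
    Returns the top-k (table, score) pairs, highest score first.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    vec, kw = min_max(vector_scores), min_max(keyword_scores)
    fused = {
        table: alpha * vec.get(table, 0.0) + (1 - alpha) * kw.get(table, 0.0)
        for table in vec.keys() | kw.keys()
    }
    # Sort by score descending, breaking ties by table name for determinism.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


vector_scores = {"climate.csv": 0.82, "weather.csv": 0.75, "sales.csv": 0.10}  # made-up scores
keyword_scores = {"climate.csv": 7.1, "sales.csv": 0.5}  # made-up scores
print(fuse_scores(vector_scores, keyword_scores, alpha=0.5, k=1))  # [('climate.csv', 1.0)]
```

Normalizing first matters because keyword scores (such as BM25) and cosine similarities live on different scales; without it, one component would dominate regardless of alpha.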