# PostgreSQL Quickstart: Getting Started

This notebook shows how to use **agent-data-toolkit** to connect to a local PostgreSQL instance started via Docker, run a quick smoke test, and query a demo table into a DataFrame.

**What you'll do:**
- Read `PG_DSN` from your environment (set via `.env`)
- Create a `PostgresClient`
- Run a smoke test (`SELECT 1`)
- Query the demo table into a DataFrame


## Imports
We import the PostgreSQL client from `agent_data_toolkit.postgresql`, plus a few standard-library helpers. If your shell doesn't export the variables in `.env`, the cell below loads the file with `python-dotenv` when that package is installed.

In [None]:
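import os
from datetime import datetime, timezone

from agent_data_toolkit.postgresql import PostgresClient

# python-dotenv is optional: load .env when it is installed, otherwise rely on
# variables already exported by the shell that started Jupyter.
try:
    from dotenv import load_dotenv
except ImportError:
    print("python-dotenv not installed; skipping .env auto-load")
else:
    load_dotenv()

# Timestamp the run in UTC (ISO 8601 with a trailing "Z")
utc_now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")  # noqa: UP017
print("Datetime:", utc_now)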
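
If `python-dotenv` is not installed, a small standard-library fallback can cover simple `.env` files. The sketch below is an illustration only: `load_env_file` and the `ADT_DEMO_DSN` key are hypothetical names, not part of agent-data-toolkit, and the parser handles only plain `KEY=VALUE` lines (no multi-line values or variable expansion).

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path=".env", override=False):
    """Load simple KEY=VALUE lines into os.environ; return the keys that were set.

    Hypothetical helper: a much smaller feature set than python-dotenv.
    """
    env_path = Path(path)
    loaded = []
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values such as DSN query strings survive
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        if override or key not in os.environ:
            os.environ[key] = value
            loaded.append(key)
    return loaded


# Demo against a throwaway file, so a real .env is never touched
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / ".env"
    demo.write_text(
        "# demo file\nADT_DEMO_DSN='postgresql://localhost/demo?sslmode=disable'\n",
        encoding="utf-8",
    )
    loaded = load_env_file(demo)

print(loaded, os.environ["ADT_DEMO_DSN"])
```

Like `load_dotenv()`, this sketch leaves variables that are already set untouched unless `override=True` is passed.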


## Configure DSN
The toolkit reads connection info from environment variables inside the kernel. This quickstart expects `PG_DSN` to be present. For the provided Docker quickstart, it typically looks like:

```
PG_DSN=postgresql://postgres:postgres@127.0.0.1:5432/demo?sslmode=disable
```
If you launched Jupyter from a shell that loaded `.env`, you should be good to go.

In [None]:
dsn = os.environ.get("PG_DSN")
if not dsn:
    raise RuntimeError(
        "PG_DSN is not set in the kernel environment.\n"
        "Tip: In your terminal, run `set -a && source .env && set +a` before starting Jupyter, "
        "or install python-dotenv and keep this notebook's optional auto-load.")
# Avoid echoing the DSN itself: it contains the database password
print("PG_DSN present ✔")
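
A DSN in URL form embeds the password, so printing it verbatim in a shared notebook exposes credentials. If you still want to see which host and database you are pointing at, one option is to mask the password first. The sketch below uses only `urllib.parse` from the standard library; `mask_dsn` is a hypothetical helper name, not a toolkit function, and it assumes a URL-style DSN like the example above.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_dsn(dsn: str) -> str:
    """Return the DSN with any password replaced by '***'."""
    parts = urlsplit(dsn)
    if parts.password is None:
        return dsn  # nothing to hide
    user = parts.username or ""
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # restore brackets around IPv6 literals
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{user}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


masked = mask_dsn("postgresql://postgres:postgres@127.0.0.1:5432/demo?sslmode=disable")
print(masked)  # postgresql://postgres:***@127.0.0.1:5432/demo?sslmode=disable
```

In the notebook, `print(mask_dsn(dsn))` would show the target without revealing the secret.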

## Create a client
We'll create a `PostgresClient` using the DSN above. The client manages connections and gives you ergonomic helpers for queries and schema inspection.

In [None]:
pg = PostgresClient.from_dsn(dsn)
print("Client ready ✔")

## Smoke test
Run a quick `SELECT 1` to confirm the connection works. The result is a one-row DataFrame with a single column, `ok`.

In [None]:
pg.query_df("SELECT 1 AS ok")

## Query the demo table
The Docker seeding scripts create `analytics.users` with a few rows. Let's fetch them.

In [None]:
df = pg.query_df(
    "SELECT id, email, full_name, created_at FROM analytics.users ORDER BY id LIMIT 10"
)
df

## (Optional) Export results to Parquet
You can stream larger queries directly to Parquet using a server-side cursor.

In [None]:
out_path = pg.stream_to_parquet(
    "SELECT * FROM analytics.users ORDER BY id",
    "data/users.parquet",
)
print("Wrote:", out_path)
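
The toolkit's internals are not shown in this notebook, but the general idea behind streaming a large result is to pull rows in fixed-size batches rather than loading everything into memory at once. The sketch below illustrates that pattern with the DB-API `fetchmany()` call. It uses an in-memory SQLite database purely as a stand-in so that it runs anywhere; `iter_batches` and the table contents are hypothetical, and with PostgreSQL a server-side cursor would take the place of `cur`.

```python
import sqlite3
from collections.abc import Iterator


def iter_batches(cursor, batch_size: int = 1000) -> Iterator[list]:
    """Yield lists of up to batch_size rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Stand-in database: in-memory SQLite, NOT PostgreSQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(i, f"user{i}@example.com") for i in range(1, 8)],
)

cur = conn.execute("SELECT id, email FROM users ORDER BY id")
# Each batch could be written out as one chunk of the output file
batch_sizes = [len(batch) for batch in iter_batches(cur, batch_size=3)]
print(batch_sizes)  # 7 rows in batches of 3 -> [3, 3, 1]
conn.close()
```

Because only one batch is held in memory at a time, peak memory use depends on `batch_size` rather than on the size of the full result.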

### Remove the exported file
The cell below deletes the Parquet file written above. It then removes the parent `data` folder, but only if that folder sits inside the current working directory and is empty, so unrelated files are never deleted.

In [None]:
from pathlib import Path

p = Path(out_path).resolve()
cwd = Path.cwd().resolve()

# 1) Delete the file (only if it's inside the current working directory)
if p.is_file() and cwd in p.parents:
    p.unlink(missing_ok=True)
    print(f"Deleted file: {p}")
else:
    print(f"Skip deleting file (not found or outside project): {p}")

# 2) Remove the parent folder only if it is named "data", lives inside the
#    working directory, and is now empty
data_dir = p.parent
if (
    data_dir.name == "data"
    and cwd in data_dir.parents
    and data_dir.is_dir()
    and not any(data_dir.iterdir())
):
    data_dir.rmdir()
    print(f"Deleted folder: {data_dir}")
else:
    print(f"Skip deleting folder (not an empty 'data' folder inside the project): {data_dir}")


## Cleanup (optional)
If you're done, you can close the client to release resources.

In [None]:
pg.close()
print("Closed ✔")
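
In scripts, as opposed to a notebook where cells run one at a time, it is easy to skip the `close()` call when a query raises. The standard library's `contextlib.closing` wraps any object that has a `close()` method and calls it on exit, even after an exception. The sketch below demonstrates this with `DemoClient`, a hypothetical stand-in class; whether `PostgresClient` also supports the `with` statement directly is not covered here.

```python
from contextlib import closing


class DemoClient:
    """Hypothetical stand-in for any object that exposes close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


client = DemoClient()
try:
    with closing(client):
        # Simulate a query that fails partway through
        raise ValueError("simulated query failure")
except ValueError:
    pass

print(client.closed)  # True: close() ran despite the exception
```

Since this notebook already calls `pg.close()`, the same pattern would read `with closing(PostgresClient.from_dsn(dsn)) as pg: ...`.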