Merged
52 changes: 52 additions & 0 deletions AGENTS.md
@@ -451,6 +451,58 @@ if supports_where(obj):
result = obj.where("condition")
```

### MyPyC-Compatible Metadata Class Pattern

When defining data-holding classes for core modules (`sqlspec/core/`, `sqlspec/driver/`) that will be compiled with MyPyC, use regular classes with `__slots__` and explicit implementations of `__init__`, `__repr__`, `__eq__`, and `__hash__`. This keeps the classes fast and MyPyC-compatible, since `dataclasses` are not directly supported by MyPyC compilation.

**Key Principles:**

- **`__slots__`**: Reduces memory footprint and speeds up attribute access.
- **Explicit `__init__`**: Defines the constructor for the class.
- **Explicit `__repr__`**: Provides a clear string representation for debugging.
- **Explicit `__eq__`**: Enables correct equality comparisons.
- **Explicit `__hash__`**: Makes instances hashable, allowing them to be used in sets or as dictionary keys. The hash implementation should be based on all fields that define the object's identity.

**Example Implementation:**

```python
class MyMetadata:
__slots__ = ("field1", "field2", "optional_field")

def __init__(self, field1: str, field2: int, optional_field: str | None = None) -> None:
self.field1 = field1
self.field2 = field2
self.optional_field = optional_field

def __repr__(self) -> str:
return f"MyMetadata(field1={self.field1!r}, field2={self.field2!r}, optional_field={self.optional_field!r})"

def __eq__(self, other: object) -> bool:
if not isinstance(other, MyMetadata):
return NotImplemented
return (
self.field1 == other.field1
and self.field2 == other.field2
and self.optional_field == other.optional_field
)

def __hash__(self) -> int:
return hash((self.field1, self.field2, self.optional_field))
```
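
Because `__eq__` and `__hash__` are implemented consistently over the same fields, instances compare by value and deduplicate correctly in sets and as dict keys. A minimal sketch, using a trimmed two-field version of the class above:

```python
class MyMetadata:
    __slots__ = ("field1", "field2")

    def __init__(self, field1: str, field2: int) -> None:
        self.field1 = field1
        self.field2 = field2

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, MyMetadata):
            return NotImplemented
        return self.field1 == other.field1 and self.field2 == other.field2

    def __hash__(self) -> int:
        # Hash over the same fields used by __eq__ so equal objects hash equally.
        return hash((self.field1, self.field2))


a = MyMetadata("users", 1)
b = MyMetadata("users", 1)
assert a == b            # value equality, not identity
assert len({a, b}) == 1  # equal instances collapse in a set
```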

**When to Use:**

- For all new data-holding classes in performance-critical paths (e.g., `sqlspec/driver/_common.py`).
- When MyPyC compilation is enabled for the module containing the class.

**Anti-Patterns to Avoid:**

- Using `@dataclass` decorators for classes intended for MyPyC compilation.
- Omitting `__slots__` when defining performance-critical data structures.
- Relying on default `__eq__` or `__hash__` behavior for complex objects, especially for equality comparisons in collections.

---

### Performance Patterns (MANDATORY)

**PERF401 - List Operations**:
73 changes: 50 additions & 23 deletions docs/examples/usage/usage_drivers_and_querying_10.py
@@ -12,30 +12,57 @@

def test_example_10_duckdb_config(tmp_path: Path) -> None:
# start-example
import tempfile

from sqlspec import SQLSpec
from sqlspec.adapters.duckdb import DuckDBConfig

spec = SQLSpec()
# In-memory
config = DuckDBConfig()

# Persistent
database_file = tmp_path / "analytics.duckdb"
config = DuckDBConfig(pool_config={"database": database_file.name, "read_only": False})

with spec.provide_session(config) as session:
# Create table from Parquet
session.execute(f"""
CREATE TABLE if not exists users AS
SELECT * FROM read_parquet('{Path(__file__).parent.parent / "queries/users.parquet"}')
""")

# Analytical query
session.execute("""
SELECT date_trunc('day', created_at) as day,
count(*) as user_count
FROM users
GROUP BY day
ORDER BY day
""")
# Use a temporary directory for the DuckDB database for test isolation
with tempfile.TemporaryDirectory() as tmpdir:
db_path = Path(tmpdir) / "analytics.duckdb"

spec = SQLSpec()
# In-memory
in_memory_db = spec.add_config(DuckDBConfig())
persistent_db = spec.add_config(DuckDBConfig(pool_config={"database": str(db_path)}))

try:
# Test with in-memory config
with spec.provide_session(in_memory_db) as session:
# Create table from Parquet
session.execute(f"""
CREATE TABLE if not exists users AS
SELECT * FROM read_parquet('{Path(__file__).parent.parent / "queries/users.parquet"}')
""")

# Analytical query
session.execute("""
SELECT date_trunc('day', created_at) as day,
count(*) as user_count
FROM users
GROUP BY day
ORDER BY day
""")

# Test with persistent config
with spec.provide_session(persistent_db) as session:
# Create table from Parquet
session.execute(f"""
CREATE TABLE if not exists users AS
SELECT * FROM read_parquet('{Path(__file__).parent.parent / "queries/users.parquet"}')
""")

# Analytical query
session.execute("""
SELECT date_trunc('day', created_at) as day,
count(*) as user_count
FROM users
GROUP BY day
ORDER BY day
""")
finally:
        # Close the pools for both configs
spec.get_config(in_memory_db).close_pool()
spec.get_config(persistent_db).close_pool()
# The TemporaryDirectory context manager handles directory cleanup automatically
# end-example
53 changes: 32 additions & 21 deletions docs/examples/usage/usage_drivers_and_querying_6.py
@@ -1,6 +1,7 @@
# Test module converted from docs example - code-block 6
"""Minimal smoke test for drivers_and_querying example 6."""

import tempfile
from pathlib import Path

from sqlspec import SQLSpec
@@ -12,24 +12,34 @@ def test_example_6_sqlite_config(tmp_path: Path) -> None:
# start-example
from sqlspec.adapters.sqlite import SqliteConfig

spec = SQLSpec()

database_file = tmp_path / "myapp.db"
config = SqliteConfig(pool_config={"database": database_file.name, "timeout": 5.0, "check_same_thread": False})

with spec.provide_session(config) as session:
# Create table
session.execute("""
CREATE TABLE IF NOT EXISTS usage6_users (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL
)
""")

# Insert with parameters
session.execute("INSERT INTO usage6_users (name) VALUES (?)", "Alice")

# Query
result = session.execute("SELECT * FROM usage6_users")
result.all()
# end-example
# Use a temporary file for the SQLite database for test isolation
with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp_db_file:
db_path = tmp_db_file.name

spec = SQLSpec()

db = spec.add_config(
SqliteConfig(pool_config={"database": db_path, "timeout": 5.0, "check_same_thread": False})
)

try:
with spec.provide_session(db) as session:
# Create table
session.execute("""
CREATE TABLE IF NOT EXISTS usage6_users (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL
)
""")

# Insert with parameters
session.execute("INSERT INTO usage6_users (name) VALUES (?)", "Alice")

# Query
result = session.execute("SELECT * FROM usage6_users")
result.all()
finally:
# Clean up the temporary database file
spec.get_config(db).close_pool()
Path(db_path).unlink()
# end-example
2 changes: 2 additions & 0 deletions docs/extensions/aiosql/api.rst
@@ -24,6 +24,7 @@ AiosqlAsyncAdapter
:members:
:undoc-members:
:show-inheritance:
:no-index:

AiosqlSyncAdapter
-----------------
@@ -32,6 +33,7 @@ AiosqlSyncAdapter
:members:
:undoc-members:
:show-inheritance:
:no-index:

Query Operators
===============
1 change: 1 addition & 0 deletions docs/extensions/litestar/api.rst
@@ -11,6 +11,7 @@ SQLSpecPlugin
:members:
:undoc-members:
:show-inheritance:
:no-index:

Configuration
=============
83 changes: 83 additions & 0 deletions docs/guides/architecture/data-dictionary.md
@@ -0,0 +1,83 @@
# Data Dictionary & Introspection

SQLSpec provides a unified Data Dictionary API to introspect database schemas across all supported adapters. This allows you to retrieve table metadata, columns, indexes, and foreign keys in a consistent format, regardless of the underlying database engine.

## Core Concepts

The `DataDictionary` is accessed via the `driver.data_dictionary` property. It provides methods to query the database catalog.

### Introspection Capabilities

- **Tables**: List tables in a schema.
- **Columns**: Get column details (name, type, nullable, default).
- **Indexes**: Get index definitions (columns, uniqueness).
- **Foreign Keys**: Get foreign key constraints and relationships.
- **Topological Sorting**: Get tables sorted by dependency order (useful for cleanups or migrations).

## Usage

### Basic Introspection

```python
async with config.provide_session() as session:
# Get all tables in the default schema
tables = await session.data_dictionary.get_tables(session)
print(f"Tables: {tables}")

# Get columns for a specific table
columns = await session.data_dictionary.get_columns(session, "users")
for col in columns:
print(f"{col['column_name']}: {col['data_type']}")
```

### Topological Sort (Dependency Ordering)

`get_tables` now returns table names sorted so that parent tables appear before the child tables whose foreign keys reference them.

This is essential for:

- **Data Loading**: Insert into parents first.
- **Cleanup**: Delete in reverse order to avoid foreign key violations.

```python
async with config.provide_session() as session:
# Get tables sorted parent -> child
sorted_tables = await session.data_dictionary.get_tables(session)

print("Insertion Order:", sorted_tables)
print("Deletion Order:", list(reversed(sorted_tables)))
```

**Implementation Details**:

- **Postgres / SQLite / MySQL 8+**: Uses efficient Recursive CTEs in SQL.
- **Oracle**: Uses `CONNECT BY` queries.
- **Others (BigQuery, MySQL 5.7)**: Falls back to a Python-based topological sort using `graphlib`.
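
The Python fallback can be sketched with the standard library's `graphlib`; the table names and foreign-key map below are hypothetical:

```python
from graphlib import TopologicalSorter

# Hypothetical FK graph: each table maps to the set of parent tables it
# references. TopologicalSorter emits predecessors (parents) first.
dependencies = {
    "users": set(),
    "products": set(),
    "orders": {"users", "products"},
    "order_items": {"orders", "products"},
}

sorted_tables = list(TopologicalSorter(dependencies).static_order())
# Parents precede children, so this order is safe for inserts;
# reverse it for deletes.
```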

### Metadata Types

SQLSpec uses regular classes with `__slots__` for metadata results to ensure MyPyC compatibility and memory efficiency.

```python
from sqlspec.driver import ForeignKeyMetadata

async with config.provide_session() as session:
fks: list[ForeignKeyMetadata] = await session.data_dictionary.get_foreign_keys(session, "orders")

for fk in fks:
print(f"FK: {fk.column_name} -> {fk.referenced_table}.{fk.referenced_column}")
```

## Adapter Support Matrix

| Feature | Postgres | SQLite | Oracle | MySQL | DuckDB | BigQuery |
|---------|----------|--------|--------|-------|--------|----------|
| Tables | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Columns | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Indexes | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Foreign Keys | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Topological Sort | ✅ (CTE) | ✅ (CTE) | ✅ (Connect By) | ✅ (CTE/Python) | ✅ (CTE) | ✅ (Python) |

## API Reference

For a complete API reference of the Data Dictionary components, including `DataDictionaryMixin`, `AsyncDataDictionaryBase`, `SyncDataDictionaryBase`, and the metadata classes (`ForeignKeyMetadata`, `ColumnMetadata`, `IndexMetadata`), see the driver API reference at {doc}`/reference/driver`.
37 changes: 37 additions & 0 deletions docs/reference/driver.rst
@@ -103,6 +103,43 @@ Connection Pooling
:undoc-members:
:show-inheritance:

Data Dictionary
===============

The Data Dictionary API provides standardized introspection capabilities across all supported databases.

.. currentmodule:: sqlspec.driver

.. autoclass:: DataDictionaryMixin
:members:
:undoc-members:
:show-inheritance:

.. autoclass:: AsyncDataDictionaryBase
:members:
:undoc-members:
:show-inheritance:

.. autoclass:: SyncDataDictionaryBase
:members:
:undoc-members:
:show-inheritance:

.. autoclass:: ForeignKeyMetadata
:members:
:undoc-members:
:show-inheritance:

.. autoclass:: ColumnMetadata
:members:
:undoc-members:
:show-inheritance:

.. autoclass:: IndexMetadata
:members:
:undoc-members:
:show-inheritance:

Driver Protocols
================

2 changes: 1 addition & 1 deletion docs/usage/drivers_and_querying.rst
@@ -471,7 +471,7 @@ Performance Tips
:start-after: # start-example
:end-before: # end-example
:caption: ``asyncpg connection pooling``
:dedent: 4
:dedent: 2

**2. Batch Operations**
