## 1. Table/Column metadata in SQLAlchemy

### 🧩 What SQLAlchemy is

SQLAlchemy is a database toolkit for Python.
It has two main layers:
1) Core: a low-level, explicit SQL abstraction. You build queries from Python objects that compile to SQL.
2) ORM (Object Relational Mapper): a higher-level layer that maps your Python classes to database tables.

You can use one, the other, or both; that flexibility is a big reason developers like it.
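To make "Python objects that compile to SQL" concrete, here is a minimal sketch of the Core layer. The `notes` table and the `demo_metadata` name are hypothetical and exist only for this illustration; no database connection is needed to render the statement.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, select

# Hypothetical table, defined only to have something to query.
demo_metadata = MetaData()
notes = Table(
    "notes", demo_metadata,
    Column("id", Integer, primary_key=True),
    Column("title", String),
)

# A Core statement is an ordinary Python object...
stmt = select(notes.c.title).where(notes.c.id == 1)

# ...that renders to SQL text with a bound parameter in place of the literal 1.
sql_text = str(stmt)
print(sql_text)
```

The literal value is kept out of the SQL string and sent as a bound parameter, which is also what protects Core queries against SQL injection.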

### ⚙️ Typical Features
| Feature | Description |
|----------|--------------|
| **Cross-DB support** | Works with SQLite, PostgreSQL, MySQL, SQL Server, Oracle, etc. |
| **Connection pooling** | Built-in connection management and pooling. |
| **Transactions** | Explicit `session.commit()` / `session.rollback()`, or automatic commit-or-rollback inside `with session.begin():`. |
| **Migrations** | Via **Alembic** (official companion tool). |
| **Relationships** | Define one-to-many, many-to-many mappings via `relationship()`. |
| **Hybrid mode** | You can freely mix ORM objects with Core queries. |

#### 🧰 When to use which layer
| Use case | Recommended layer |
|-----------|-------------------|
| Fast prototyping | ORM |
| Simple queries, small project | ORM |
| Heavy analytics, complex joins, raw SQL tuning | Core |
| Performance-critical backend | Mix both |

#### 🚀 Common connection URIs
| Database | Example URI |
|-----------|-------------------|
| SQLite | sqlite:///data.db |
| PostgreSQL | postgresql+psycopg2://user:pass@localhost/dbname |
| MySQL | mysql+pymysql://user:pass@localhost/dbname |
| MSSQL | mssql+pyodbc://user:pass@dsn |
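The URIs above follow the pattern `dialect+driver://user:password@host/database`. As a small sketch, `make_url` can parse one without connecting or importing the driver, and `URL.create` builds one from parts, which helps when a password contains characters such as `@` or `/`. The password used below is made up for the example.

```python
from sqlalchemy.engine import URL, make_url

# Parse an existing URI into its components (no connection is opened).
url = make_url("postgresql+psycopg2://user:pass@localhost/dbname")
print(url.get_backend_name())  # dialect part
print(url.get_driver_name())   # driver part
print(url.host, url.database)

# Build a URL from parts; special characters in the password are escaped on render.
built = URL.create(
    "postgresql+psycopg2",
    username="user",
    password="p@ss/word",  # hypothetical password with awkward characters
    host="localhost",
    database="dbname",
)
rendered = built.render_as_string(hide_password=False)
print(rendered)
```

Either a string URI or a `URL` object can be passed to `create_engine()`.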

#### 🧩 Integration
You'll see SQLAlchemy everywhere:
- FastAPI / Flask backends
- LangChain and LlamaIndex use it to store metadata or cache results
- Alembic handles schema migrations
- Works well with pandas (pd.read_sql, df.to_sql)

#### 🧩 When the ORM shines
✅ Use the ORM when you:
- Build an application (API, web backend, admin dashboard, etc.).
- Need to store and retrieve objects in a consistent, type-safe way.
- Want to let developers focus on logic, not SQL details.
- Have moderate query complexity, not huge analytics workloads.
- Want database independence (SQLite locally, Postgres in prod).

Example: FastAPI app, Flask app, or LangChain tool that stores user data.

In [2]:
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
import json

In [3]:
# Connect to SQLite (you can use PostgreSQL, MySQL, etc.)
# The data/ directory must already exist; SQLite creates the file but not the folder.
engine = create_engine("sqlite:///data/sqlalchemy.db")
metadata = MetaData()

In [4]:
# Create the users and products tables
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True, comment="Surrogate key"),
    Column("name", String, comment="Full display name of the user"),
    Column("email", String, comment="Primary email address"),
    comment="End-user accounts table (one row per person).",
    info={
        "purpose": "Holds user identities for authentication & notifications",
        "owner": "product-platform",
        "pii": True,
        "llm_notes": "Names/emails are personal data; avoid dumping raw values."
    },
)


products = Table(
    "products", metadata,
    Column("id", Integer, primary_key=True, comment="Surrogate key"),
    Column("name", String, comment="Name of the product"),
    Column("price", Integer, comment="Price of the product"),
    info={
        "purpose": "Holds products for sale",
        "owner": "product-platform",
        "pii": False,
        "llm_notes": "Products are not personal data; dump raw values."
    },
)

# create_all() issues CREATE TABLE for any missing tables and returns None.
result = metadata.create_all(engine)
print(result)


None


In [5]:
# Serialize the metadata into a JSON-friendly dict so it can be pretty-printed
def serialize_metadata(metadata):
    result = {}
    for table_name, table in metadata.tables.items():
        result[table_name] = {
            "comment": table.comment,
            "info": table.info,
            "columns": [
                {
                    "name": col.name,
                    "type": str(col.type),
                    "primary_key": col.primary_key,
                    "nullable": col.nullable,
                    "comment": col.comment,
                    "info": col.info,
                }
                for col in table.columns
            ],
            "constraints": [str(c) for c in table.constraints],
        }
    return result

# Example usage:
print(json.dumps(serialize_metadata(metadata), indent=2))

{
  "users": {
    "comment": "End-user accounts table (one row per person).",
    "info": {
      "purpose": "Holds user identities for authentication & notifications",
      "owner": "product-platform",
      "pii": true,
      "llm_notes": "Names/emails are personal data; avoid dumping raw values."
    },
    "columns": [
      {
        "name": "id",
        "type": "INTEGER",
        "primary_key": true,
        "nullable": false,
        "comment": "Surrogate key",
        "info": {}
      },
      {
        "name": "name",
        "type": "VARCHAR",
        "primary_key": false,
        "nullable": true,
        "comment": "Full display name of the user",
        "info": {}
      },
      {
        "name": "email",
        "type": "VARCHAR",
        "primary_key": false,
        "nullable": true,
        "comment": "Primary email address",
        "info": {}
      }
    ],
    "constraints": [
      "PrimaryKeyConstraint(Column('id', Integer(), table=<users>, primary_key=True, n

In [6]:
# Insert data
with engine.connect() as conn:
    conn.execute(users.insert().values(name="Jan", email="jan@sirus.ai"))
    conn.execute(products.insert().values(name="Product 1", price=100))
    conn.execute(products.insert().values(name="Product 2", price=200))
    conn.commit()

    # Query data
    result_users = conn.execute(users.select())
    for row in result_users:
        print(row)
    result_products = conn.execute(products.select())
    for row in result_products:
        print(row)

(1, 'Jan', 'jan@sirus.ai')
(1, 'Product 1', 100)
(2, 'Product 2', 200)
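A possible next step after `users.select()` is a filtered query built with `select(...).where(...)`, plus `engine.begin()` as an alternative to calling `conn.commit()` by hand. The sketch below is self-contained and uses its own in-memory database and `demo_`-prefixed names (assumptions for this example) so it does not touch the tables created above.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

demo_engine = create_engine("sqlite:///:memory:")
demo_metadata = MetaData()
demo_products = Table(
    "products", demo_metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("price", Integer),
)
demo_metadata.create_all(demo_engine)

# engine.begin() commits when the block succeeds and rolls back if it raises.
with demo_engine.begin() as conn:
    conn.execute(
        demo_products.insert(),
        [
            {"name": "Product 1", "price": 100},
            {"name": "Product 2", "price": 200},
        ],
    )

# Select only some columns and filter on price; 150 is sent as a bound parameter.
stmt = (
    select(demo_products.c.name, demo_products.c.price)
    .where(demo_products.c.price > 150)
)
with demo_engine.connect() as conn:
    expensive = [tuple(row) for row in conn.execute(stmt)]

print(expensive)
```

Passing a list of dictionaries to `conn.execute()` runs the insert once per dictionary in a single call.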


## 2. ORM (Declarative)

In [7]:
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker, DeclarativeBase, Mapped, mapped_column

In [8]:
class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    __table_args__ = {
        "comment": "End-user accounts table (one row per person).",
        "info": {
            "purpose": "Holds user identities for authentication & notifications",
            "owner": "product-platform",
            "pii": True,
            "llm_notes": "Names/emails are personal data; avoid dumping raw values."
        },
    }

    id: Mapped[int] = mapped_column(primary_key=True, comment="Surrogate key")
    name: Mapped[str] = mapped_column(String, comment="Full display name")
    email: Mapped[str] = mapped_column(String, comment="Primary email")

In [9]:
engine = create_engine("sqlite:///data/sqlalchemy2.db")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

# Create and query
new_user = User(name="Jan", email="jan@sirus.ai")
session.add(new_user)
session.commit()

for user in session.query(User).all():
    print(user.name, user.email)

Jan jan@sirus.ai


In [10]:
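# Caution: "users" is already registered in `metadata` (cell 4), so Table() returns
# that existing in-process object and autoload_with triggers no reflection here.
users_reflected = Table("users", metadata, autoload_with=engine)

print(users_reflected.comment)          # Table comment from the cell 4 definition
print(users_reflected.info)             # Python dict, in-process only; never stored in the DB
print(users_reflected.c.email.comment)  # Column comment from the cell 4 definition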

End-user accounts table (one row per person).
{'purpose': 'Holds user identities for authentication & notifications', 'owner': 'product-platform', 'pii': True, 'llm_notes': 'Names/emails are personal data; avoid dumping raw values.'}
Primary email address
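The output above matches the in-process Core definition (the email comment reads "Primary email address", not the ORM model's "Primary email") because the table name was already present in `metadata`. To see what the database itself returns, reflection has to go into a fresh `MetaData`. The sketch below does that against a throwaway in-memory SQLite database; all names in it are assumptions for this example.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

reflect_engine = create_engine("sqlite:///:memory:")

# Define and create a table with a comment and an info dict.
source_metadata = MetaData()
Table(
    "users", source_metadata,
    Column("id", Integer, primary_key=True),
    Column("email", String, comment="Primary email address"),
    info={"pii": True},
)
source_metadata.create_all(reflect_engine)

# Reflect into a fresh MetaData so the structure is read back from the database.
fresh_metadata = MetaData()
reflected = Table("users", fresh_metadata, autoload_with=reflect_engine)

column_names = [col.name for col in reflected.columns]
print(column_names)
print(reflected.info)             # info lives only in Python, so it is empty after reflection
print(reflected.c.email.comment)  # depends on the backend; SQLite is not expected to keep comments
```

If the hints in `info` need to survive a process restart, they have to be stored somewhere explicit (for example in code or a separate table), since reflection cannot recover them.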


In [11]:

def serialize_for_llm(metadata):
    out = {}
    for name, table in metadata.tables.items():
        out[name] = {
            "comment": table.comment,
            "info": table.info,  # your rich hints
            "columns": [
                {
                    "name": col.name,
                    "type": str(col.type),
                    "nullable": col.nullable,
                    "primary_key": col.primary_key,
                    "comment": col.comment,
                    "info": col.info,  # also available on Column
                }
                for col in table.columns
            ],
            "foreign_keys": [
                {
                    "column": fk.parent.name,
                    "references": f"{fk.column.table.name}.{fk.column.name}",
                }
                for fk in table.foreign_keys
            ],
        }
    return out

print(json.dumps(serialize_for_llm(metadata), indent=2))

{
  "users": {
    "comment": "End-user accounts table (one row per person).",
    "info": {
      "purpose": "Holds user identities for authentication & notifications",
      "owner": "product-platform",
      "pii": true,
      "llm_notes": "Names/emails are personal data; avoid dumping raw values."
    },
    "columns": [
      {
        "name": "id",
        "type": "INTEGER",
        "nullable": false,
        "primary_key": true,
        "comment": "Surrogate key",
        "info": {}
      },
      {
        "name": "name",
        "type": "VARCHAR",
        "nullable": true,
        "primary_key": false,
        "comment": "Full display name of the user",
        "info": {}
      },
      {
        "name": "email",
        "type": "VARCHAR",
        "nullable": true,
        "primary_key": false,
        "comment": "Primary email address",
        "info": {}
      }
    ],
    "foreign_keys": []
  },
  "products": {
    "comment": null,
    "info": {
      "purpose": "Holds prod