-
Notifications
You must be signed in to change notification settings - Fork 0
Database Schema
Database schema for a PostgreSQL database.
The database table schema can be defined using SQL statements or using SqlAlchemy. Both SqlAlchemy Core and ORM components can create database tables from definitions given in Python code.
Each approach has its strengths and weaknesses
| Aspect | SqlAlchemy | Direct SQL |
|---|---|---|
| Integration | Models and logic defined in Python | Tables in SQL, logic in Python |
| DRY? | Yes - models define table structure | No - models duplicate table structure |
| Migrations | Automatic using Alembic | Manual, but maybe can automate w/ Alembic |
| Programmatic Reuse | Yes. Can regenerate tables from code. | Somewhat tied to database. Manual database recreation |
| Database agnostic | Yes | No |
| Type Safety | IDE support for type checking & hinting | Yes if you define models for tables |
| Model evolution | Schema automatically updated as models change | Manually revise schema. |
| Data validation | In code | Done by database |
| Precision schema | Less precise raw SQL, limited access to Postgres features | Full access to database capabilities |
| Docker Integration | Harder - need Python code to create database in db container | Easy & reproducable using init scripts in db container |
The old style syntax for defining a class for a Postgres ORM table model is something like this:
from sqlalchemy.orm import declarative_base
from sqlalchemy import Column, ForeignKey, func
# These "convenience" classes are aliases for classes in module sqlalchemy.sql.sqltypes
from sqlalchemy import String, Integer
Base = declarative_base()
class User(Base):
__tablename__ = "users"
id = Column(Integer, primary_key=True, nullable=False)
username = Column(String(50), unique=True, nullable=False)
class UserPassword(Base):
hashed_password = Column(String, nullable=False)
user_id = Column(Integer,
ForeignKey("users.id", ondelete="CASCADE"),
nullable=False) The Official SqlAlchemy 2.0 style for this is:
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from sqlalchemy import Integer, String
from sqlalchemy.ext.async import AsyncAttrs
class Base(AsyncAttrs, DeclarativeBase):
pass
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
username: Mapped[str] = mapped_column(String(50), nullable=False)
class UserPassword(Base):
hashed_password: Mapped[str] = mapped_column(String, nullable=False)
user_id: Mapped[int] = mapped_column(ForeignKey("users.id", ondelete="CASCADE"), nullable=False)Reasons for the change:
- Type hints to enable type checking tools
- "Clarity" (ha!)
Explanation of Syntax
-
Mapped[int]is a generic marker meaning "this is an ORM-mapped attribute" -
mapped_column(...)creates a Column object under the hood, but designed to work with Python typing. - in
mapped_columnit is not necessary to specify the datatype if it can be inferred from the type hint - specify the datatype if needed to add detail (
String(50)) or disambiguate:created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True))
| Layer | Approach | Rationale |
|---|---|---|
| Initialization | Use sql in Docker | Quick and reproducable using init.sql in container volume |
| Application | Use SqlAlchemy | Clean Pythonic access to models at expense of duplication |
| Migrations | Use Alembic | Optional - use if schema evolves over time |
- install and add to requirements.txt if used in container
pip install alembic
- Initialize
alembic init alembic
- Configure
alembic.ini
sqlalchemy.url = postgresql+asyncpg://postgres:fat_chance@localhost:5432/homelog
In alembic/env.py import the models:
# import models
from app.models import user
target_metadata = user.Base.metadata- Generate a migration
alembic revision --autogenerate -m "add more fields to user"
alembic upgrade head
Note: Deepseek does not align schema elements in columns as shown below, which is much easier to read. Next time I would ask it to do this. ChatGPT prints column-aligned schema code as shown here.
Deepseek, modified by team to
- store password hashes in separate table
CREATE TABLE users (
user_id UUID PRIMARY KEY,
username VARCHAR(50) UNIQUE,
email VARCHAR(100) UNIQUE,
created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
last_login TIMESTAMPTZ, -- optional
is_active BOOLEAN DEFAULT TRUE
);Not all users will have passwords and be authenticated locally. A better Schema is to store passwords in a separate table:
CREATE TABLE user_passwords (
user_id UUID PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
password_hash TEXT NOT NULL,
salt TEXT NOT NULL, -- not needed for Argon2, which generates its own salt
created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP -- most recent update
);Copilot: Consider using BYTEA instead of TEXT for password_hash and salt.
-
Argon2 and script hashes are binary strings, so
BYTEAis preferred. -
Wrong: The output of Argon2id's
hash()function is a readable string. - If you encode them using base64 or hex, then
TEXTis fine. -
bcrypt produces a printable ASCII string (e.g.,
$2b$12$...) so the recommended type isTEXT. - See Password Security for details of password hashing.
ChatGPT
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
display_name VARCHAR(100),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);Deepseek:
(I had to remind Deepseek of authentication requirements to get it to create this table)
-- Separate table for OAuth identities
CREATE TABLE user_oauth_identities (
identity_id UUID PRIMARY KEY,
user_id UUID REFERENCES users(user_id) ON DELETE CASCADE,
provider VARCHAR(20) NOT NULL CHECK (provider IN ('line', 'google')),
provider_user_id VARCHAR(100) NOT NULL, -- LINE or Google's unique ID
email VARCHAR(100), -- From provider
name VARCHAR(100), -- From provider
profile_image TEXT, -- URL to avatar
access_token TEXT, -- Encrypted
refresh_token TEXT, -- Encrypted
token_expiry TIMESTAMPTZ,
UNIQUE(provider, provider_user_id) -- One identity per provider
);Note:
Storing access and refresh tokens allows the application to:
- Access user data on their behalf: Some APIs (like LINE, Google) require these tokens to make authorized requests for user data or actions.
- Maintain user sessions: Refresh tokens let you obtain new access tokens without requiring the user to log in again, providing a seamless experience.
- Support background tasks: Your app can perform actions (like syncing data) even when the user is not actively using the app.
Security:
- Only store tokens if your app needs to interact with the provider’s API after login.
- Always encrypt sensitive tokens at rest and follow security best practices.
ChatGPT
-- Link to external auth providers (LINE, Google, etc.)
CREATE TABLE auth_identities (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
provider VARCHAR(50) NOT NULL, -- e.g. 'line', 'google'
provider_user_id VARCHAR(255) NOT NULL, -- external user ID
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (provider, provider_user_id)
);Me: A better uniqueness constraint would be a user can have only 1 OAuth identity per provider:
UNIQUE (user_id, provider)- Homes (Locations)
Deekseek:
CREATE TABLE homes (
home_id UUID PRIMARY KEY,
name VARCHAR(100) NOT NULL,
address TEXT,
timezone VARCHAR(50) DEFAULT 'UTC',
created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
created_by UUID REFERENCES users(user_id)
);Copilot: To preserve referential integrity, add ON DELETE SET NULL or ON DELETE CASCADE to created_by.
ChatGPT: No table suggested
Deepseek:
CREATE TABLE data_sources (
source_id UUID PRIMARY KEY,
name VARCHAR(100) NOT NULL,
description TEXT,
source_type VARCHAR(50) NOT NULL, -- 'electric', 'water', 'gas', etc.
unit VARCHAR(20) NOT NULL, -- 'kWh', 'gallons', 'm³', etc.
home_id UUID REFERENCES homes(home_id) ON DELETE SET NULL, -- Optional association
is_public BOOLEAN DEFAULT FALSE,
created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
created_by UUID REFERENCES users(user_id)
);Copilot: To preserve referential integrity, add ON DELETE SET NULL or ON DELETE CASCADE to created_by.
ChatGPT:
-- A “meter” or other data‐source
CREATE TABLE data_sources (
id SERIAL PRIMARY KEY,
owner_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
name VARCHAR(100) NOT NULL,
description TEXT,
place_name VARCHAR(200),
address TEXT,
meter_number VARCHAR(100),
thumbnail_url TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);Deepseek:
CREATE TABLE readings (
reading_id UUID PRIMARY KEY,
source_id UUID REFERENCES data_sources(source_id) ON DELETE CASCADE,
value DOUBLE PRECISION NOT NULL,
timestamp TIMESTAMPTZ NOT NULL,
recorded_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
recorded_by UUID REFERENCES users(user_id),
notes TEXT,
quality_flag VARCHAR(20) CHECK (quality_flag IN ('good', 'questionable', 'bad', 'estimated')),
metadata JSONB -- For additional context like weather conditions
);
-- Index for efficient time-series queries
CREATE INDEX idx_readings_source_timestamp ON readings(source_id, timestamp DESC);Copilot: To preserve referential integrity, add ON DELETE SET NULL or ON DELETE CASCADE to recorded_by.
ChatGPT:
-- Time-stamped readings for each source
CREATE TABLE data_entries (
id SERIAL PRIMARY KEY,
source_id INTEGER NOT NULL REFERENCES data_sources(id) ON DELETE CASCADE,
value NUMERIC NOT NULL, -- integer or decimal
recorded_at TIMESTAMPTZ NOT NULL, -- when the reading was taken
created_by INTEGER NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ
);
CREATE INDEX ON data_entries(source_id, recorded_at);Copilot: To preserve referential integrity, add ON DELETE SET NULL or ON DELETE CASCADE to created_by.
A Many-to-Many Relationship between Users and Data Sources
Deepseek:
CREATE TABLE user_source_access (
access_id UUID PRIMARY KEY,
user_id UUID REFERENCES users(user_id) ON DELETE CASCADE,
source_id UUID REFERENCES data_sources(source_id) ON DELETE CASCADE,
access_level VARCHAR(20) NOT NULL CHECK (access_level IN ('read', 'write', 'admin')),
granted_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
granted_by UUID REFERENCES users(user_id),
UNIQUE(user_id, source_id) -- Ensures one access record per user-source combination
);ChatGPT:
-- Grant view/edit rights on a data_source to other users
CREATE TABLE source_permissions (
user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
source_id INTEGER NOT NULL REFERENCES data_sources(id) ON DELETE CASCADE,
can_edit BOOLEAN NOT NULL DEFAULT FALSE,
granted_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (user_id, source_id)
);Only ChatGPT suggested this.
Actual data validation may be more complex, such as requiring non-decreasing values.
-- e.g. enforce min/max or pattern checks before inserting entries
CREATE TABLE data_validators (
id SERIAL PRIMARY KEY,
source_id INTEGER NOT NULL REFERENCES data_sources(id) ON DELETE CASCADE,
min_value NUMERIC,
max_value NUMERIC,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);Only ChatGPT suggested this table.
-- Log all access, creations, and edits in an append-only table
CREATE TABLE audit_logs (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
source_id INTEGER REFERENCES data_sources(id),
entry_id INTEGER REFERENCES data_entries(id),
event_type VARCHAR(50) NOT NULL, -- e.g. 'view_source', 'create_entry', 'update_entry'
event_time TIMESTAMPTZ NOT NULL DEFAULT NOW(),
details JSONB NOT NULL -- e.g. { "old_value": 123, "new_value": 125 }
);
CREATE INDEX ON audit_logs(event_time);Deepseek did not offer rationale for its schema. See Rationale Q-and-A below for Deepseek answers to specific questions.
-
Timezones: All timestamps use
TIMESTAMPTZto preserve exact moments. - Foreign-key cascades: Deleting a user or source automatically removes related records.
-
Indexes: Add additional indexes as needed, e.g.
on audit_logs(user_id)ordata_entries(created_by). -
Triggers: You can implement triggers on data_entries to enforce validator rules using
data_validatorsand to write audit records automatically.
This schema is designed to satisfy the core and aspirational requirements—tracking multiple sources per user, sharing access, rich metadata, and immutable audit logs—while remaining straightforward to implement and extend.
-
Token Storage
- Encrypt access/refresh tokens in database
- Use HTTP-only, Secure-Flag cookies for web JWTs
- Android: Store in
EncryptedSharedPreferences - Rationale: Prevents token theft via XSS attacks or database breaches |
-
Token Validation
- Verify LINE/Google ID tokens on the backend
- Set JWT expiration, e.g., 1 hour for access, 7 days for refresh
- Rationale: Mitigates replay attacks and limits exposure |
-
Data Access
- Row-Level Security (RLS) in PostgreSQL:
CREATE POLICY user_source_access ON readings USING (source_id IN ( SELECT source_id FROM user_source_access WHERE user_id = current_user_id() ) ); - **Rationale**: Enforces "users can only access sources they have permissions for" at DB level
-
API Security
- Rate-limit, e.g., 100 requests/minute
- CORS restricted to your domains
- HSTS headers enforced
- Rationale: Prevents brute-force attacks and unauthorized cross-origin access |
-
OAuth Hardening
- Store client secrets in environment variables
- Validate
redirect_uristrictly - Use PKCE for mobile apps
- Rationale: Blocks OAuth injection and redirect hijacking | |
-
LINE-Specific
- Verify the
nonceparameter to prevent CSRF - Check
client_idmatches your LINE Channel ID
- Verify the
-
Google-Specific
- Validate
hd(hosted domain) parameter if restricting to organizational emails - Use
prompt=select_accountto prevent accidental session reuse
- Validate
-
Unified JWT Claims
Include both authentication methods in your JWT:{ "sub": "user_123", "auth_provider": "line|google", "auth_time": 1625097600, "scope": "read:data write:data" }
Q: Why use UUID for ID fields instead of SERIAL?
Deepseek:
UUID Pros:
- Global Uniqueness: safe for distributed systems, e.g. mobile apps with offline data sync
- Avoid lock contention when multiple users create records simultaneously
- Obfuscation: harder to scan/guess, e.g.
/users/a1b2c3d4-xxxvs/users/102 - Merge friendly: no id collisions if merge databases
UUID Cons:
- More storage: 16 bytes for UUID versus 4 bytes for SERIAL
- Index fragmentation. Use
UUIDv7(time ordered) to improve index locality. Example for PostgreSQL 14+:reading_id UUID DEFAULT uuid_generate_v7() PRIMARY KEY, - Readability - hard for humans to read. For this app, ids would not normally be displayed, but may make log files harder to read.
SERIAL or modern equivalent GENERATED AS IDENTITY Pros:
- Simple: easy to understand and debug
- Human friendly. More legible in logs.
- Slightly faster to index using B-trees. Indexes are more compact.
- Integer indices yield better query performance, esp. for large tables.
-
Monolithic Growth: Newer rows always have higher keys, which helps keep inserts localized to the right end of the index and reduces page splits.
- Migitation: you can use time-ordered UUIDs, e.g. UUIDv7 or ULID, but that adds complexity.
When to prefer SERIAL:
- Small, single-server app
- Tables with less than 100K records
- Where integer joins are critical for performance
If using UUID, then to optimize for analytics use:
-- Add a SERIAL column for internal reporting
ALTER TABLE readings ADD COLUMN reading_number SERIAL;Deepseek Recommendation for HomeLog: Stick with UUIDs because:
- Your
user_source_accesstable benefits from collision-safe distributed writes - Mobile clients may generate IDs offline (readings)
- Future-proof for scaling (sharding, multi-region)
ChatGPT Recommendation for HomeLog:
Given HomeLog’s scope — only a handful of users, centralized deployment on Supabase/Railway, and no need for offline key generation, you’ll benefit most from using integer-based primary keys:
- Performance: Smaller indexes and faster joins on time-series data (your readings table will grow quickly).
- Simplicity: Easier schema introspection, debugging, and meaningful URLs.
- Operational overhead: No additional complexity for key generation or migration.
If in the future HomeLog evolves into a massively distributed system with cross-region writes or must merge data across silos, you can migrate to UUIDs (or ULIDs) at that time.
For now, stick with SERIAL (or better yet, GENERATED AS IDENTITY) for clean, performant, and maintainable design.
Deepseek Recommendation if Off-line Writes are Not Required
If you don't require offline writes and your system operates purely in a centralized, online-only mode, the recommendation shifts toward traditional sequential IDs. Here's the updated guidance:
Use GENERATED ALWAYS AS IDENTITY (PostgreSQL 10+):
CREATE TABLE readings (
reading_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY, -- Auto-incrementing integer
source_id INTEGER NOT NULL REFERENCES data_sources(source_id),
value DOUBLE PRECISION NOT NULL,
timestamp TIMESTAMPTZ NOT NULL
);Why?
- Better performance (4-byte integers vs 16-byte UUIDs)
- Natural sorting (no fragmentation)
- Simpler joins (integer indexes are smaller/faster)
- Built-in sequence tracking (no risk of duplicates)
Add a separate public_id UUID column if you need obscurity:
CREATE TABLE users (
user_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
public_id UUID DEFAULT gen_random_uuid() UNIQUE, -- For URLs/APIs
email TEXT UNIQUE NOT NULL
);Usage:
-
Internal joins: Use
user_id(integer) -
Public APIs: Expose only
public_id(UUID)
| Factor | With Offline Writes | Online-Only (New) |
|---|---|---|
| Primary Key | UUID | GENERATED AS IDENTITY |
| Sorting | Requires UUIDv7 | Natural integer order |
| Storage | 16 bytes per ID | 4 bytes per ID |
| Joins | Slightly slower | Optimized |
| Distributed Writes | Safe | Not applicable |
- If you anticipate future sharding/multi-region deployment
- For any tables that might sync with mobile devices later
- 20-30% faster joins on integer keys (based on pgBench tests)
- ~15% storage reduction for large tables
For HomeLog's scale, this is likely overkill—but it future-proofs your schema if analytics become critical.