Database Schema

Database schema for a PostgreSQL database.

Schema Suggested by AI Tools

Note: Deepseek does not align schema elements in columns as shown below, which is much easier to read. Next time I would ask it to do this. ChatGPT prints column-aligned schema code as shown here.

1. Users

Deepseek:

CREATE TABLE users (
    user_id       UUID         PRIMARY KEY,
    username      VARCHAR(50)  UNIQUE,
    email         VARCHAR(100) UNIQUE,
    password_hash VARCHAR(255), -- Only for local authentication (optional)
    created_at    TIMESTAMP    WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    last_login    TIMESTAMP    WITH TIME ZONE,
    is_active     BOOLEAN      DEFAULT TRUE
);

ChatGPT

CREATE TABLE users (
  id             SERIAL        PRIMARY KEY,
  email          VARCHAR(255)  UNIQUE NOT NULL,
  display_name   VARCHAR(100),
  created_at     TIMESTAMPTZ   NOT NULL DEFAULT NOW()
);

2. User Identities

Deepseek:
(I had to remind Deepseek of authentication requirements to get it to create this table)

-- Separate table for OAuth identities
CREATE TABLE user_oauth_identities (
    identity_id       UUID          PRIMARY KEY,
    user_id           UUID          REFERENCES users(user_id) ON DELETE CASCADE,
    provider          VARCHAR(20)   NOT NULL CHECK (provider IN ('line', 'google')),
    provider_user_id  VARCHAR(100)  NOT NULL, -- LINE or Google's unique ID
    email             VARCHAR(100), -- From provider
    name              VARCHAR(100), -- From provider
    profile_image     TEXT,         -- URL to avatar
    access_token      TEXT,         -- Encrypted
    refresh_token     TEXT,         -- Encrypted
    token_expiry      TIMESTAMP     WITH TIME ZONE,
    UNIQUE(provider, provider_user_id) -- One identity per provider
);

ChatGPT

-- Link to external auth providers (LINE, Google, etc.)
CREATE TABLE auth_identities (
  id                  SERIAL       PRIMARY KEY,
  user_id             INTEGER      NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  provider            VARCHAR(50)  NOT NULL,        -- e.g. 'line', 'google'
  provider_user_id    VARCHAR(255) NOT NULL,        -- external user ID
  created_at          TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
  UNIQUE (provider, provider_user_id)
);

Homes (Locations)

Deekseek:

CREATE TABLE homes (
    home_id           UUID          PRIMARY KEY,
    name              VARCHAR(100)  NOT NULL,
    address           TEXT,
    timezone          VARCHAR(50) DEFAULT 'UTC',
    created_at        TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    created_by        UUID REFERENCES users(user_id)
);

ChatGPT: No table suggested

3. Data Sources

Deepseek:

CREATE TABLE data_sources (
    source_id      UUID          PRIMARY KEY,
    name           VARCHAR(100)  NOT NULL,
    description    TEXT,
    source_type    VARCHAR(50)   NOT NULL,  -- 'electric', 'water', 'gas', etc.
    unit           VARCHAR(20)   NOT NULL,         -- 'kWh', 'gallons', 'm³', etc.
    home_id        UUID          REFERENCES homes(home_id) ON DELETE SET NULL,  -- Optional association
    is_public      BOOLEAN       DEFAULT FALSE,
    created_at     TIMESTAMP     WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    created_by     UUID          REFERENCES users(user_id)
);

ChatGPT:

-- A “meter” or other data‐source
CREATE TABLE data_sources (
  id             SERIAL        PRIMARY KEY,
  owner_id       INTEGER       NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  name           VARCHAR(100)  NOT NULL,
  description    TEXT,
  place_name     VARCHAR(200),
  address        TEXT,
  meter_number   VARCHAR(100),
  thumbnail_url  TEXT,
  created_at     TIMESTAMPTZ   NOT NULL DEFAULT NOW()
);

4. Readings or Data Entries

Deepseek:

CREATE TABLE readings (
    reading_id     UUID         PRIMARY KEY,
    source_id      UUID         REFERENCES data_sources(source_id) ON DELETE CASCADE,
    value          DOUBLE PRECISION NOT NULL,
    timestamp      TIMESTAMP    WITH TIME ZONE NOT NULL,
    recorded_at    TIMESTAMP    WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    recorded_by    UUID         REFERENCES users(user_id),
    notes          TEXT,
    quality_flag   VARCHAR(20)  CHECK (quality_flag IN ('good', 'questionable', 'bad', 'estimated')),
    metadata       JSONB        -- For additional context like weather conditions
);

-- Index for efficient time-series queries
CREATE INDEX idx_readings_source_timestamp ON readings(source_id, timestamp DESC);

ChatGPT:

-- Time-stamped readings for each source
CREATE TABLE data_entries (
  id             SERIAL       PRIMARY KEY,
  source_id      INTEGER      NOT NULL REFERENCES data_sources(id) ON DELETE CASCADE,
  recorded_at    TIMESTAMPTZ  NOT NULL,             -- when the reading was taken
  value          NUMERIC      NOT NULL,             -- integer or decimal
  created_by     INTEGER      NOT NULL REFERENCES users(id),
  created_at     TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
  updated_at     TIMESTAMPTZ
);
CREATE INDEX ON data_entries(source_id, recorded_at);

5. Permissions

A Many-to-Many Relationship between Users and Data Sources

Deepseek:

CREATE TABLE user_source_access (
    access_id     UUID        PRIMARY KEY,
    user_id       UUID        REFERENCES users(user_id)          ON DELETE CASCADE,
    source_id     UUID        REFERENCES data_sources(source_id) ON DELETE CASCADE,
    access_level  VARCHAR(20) NOT NULL CHECK (access_level IN ('read', 'write', 'admin')),
    granted_at    TIMESTAMP   WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    granted_by    UUID        REFERENCES users(user_id),
    UNIQUE(user_id, source_id)  -- Ensures one access record per user-source combination
);

ChatGPT:

-- Grant view/edit rights on a data_source to other users
CREATE TABLE source_permissions (
  user_id      INTEGER     NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  source_id    INTEGER     NOT NULL REFERENCES data_sources(id) ON DELETE CASCADE,
  can_edit     BOOLEAN     NOT NULL DEFAULT FALSE,
  granted_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (user_id, source_id)
);

6. Optional Data Validators

Only ChatGPT suggested this.

Actual data validation may be more complex, such as requiring non-decreasing values.

-- e.g. enforce min/max or pattern checks before inserting entries
CREATE TABLE data_validators (
  id             SERIAL      PRIMARY KEY,
  source_id      INTEGER     NOT NULL REFERENCES data_sources(id) ON DELETE CASCADE,
  min_value      NUMERIC,
  max_value      NUMERIC,
  created_at     TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

7. Audit Logging

Only ChatGPT suggested this table.

-- Log all access, creations, and edits in an append-only table
CREATE TABLE audit_logs (
  id             SERIAL       PRIMARY KEY,
  user_id        INTEGER      REFERENCES users(id),
  source_id      INTEGER      REFERENCES data_sources(id),
  entry_id       INTEGER      REFERENCES data_entries(id),
  event_type     VARCHAR(50)  NOT NULL,        -- e.g. 'view_source', 'create_entry', 'update_entry'
  event_time     TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
  details        JSONB        NOT NULL        -- e.g. { "old_value": 123, "new_value": 125 }
);
CREATE INDEX ON audit_logs(event_time);

Rationale

Deepseek

Deepseek did not offer rationale for its schema. See Rationale Q-and-A below for Deepseek answers to specific questions.

ChatGPT's Rationale

Timezones: All timestamps use TIMESTAMPTZ to preserve exact moments.
Foreign-key cascades: Deleting a user or source automatically removes related records.
Indexes: Add additional indexes as needed (e.g. on audit_logs(user_id) or data_entries(created_by)).
Triggers: You can implement triggers on data_entries to enforce validator rules (using data_validators) and to write audit records automatically.

This schema is designed to satisfy the core and aspirational requirements—tracking multiple sources per user, sharing access, rich metadata, and immutable audit logs—while remaining straightforward to implement and extend.

Security Considerations

Token Storage
- Encrypt access/refresh tokens in database
- Use HTTP-only, Secure-Flag cookies for web JWTs
- Android: Store in EncryptedSharedPreferences
- Rationale: Prevents token theft via XSS attacks or database breaches |
Token Validation
- Verify LINE/Google ID tokens on backend
- Set JWT expiration (e.g., 1 hour for access, 7 days for refresh)
- Rationale: Mitigates replay attacks and limits exposure |

Data Access

Row-Level Security (RLS) in PostgreSQL:

CREATE POLICY user_source_access
ON readings USING (source_id IN (
     SELECT source_id FROM user_source_access WHERE user_id = current_user_id()
                  )   ); 
- **Rationale**: Enforces "users can only access sources they have permissions for" at DB level

API Security
- Rate-limiting (e.g., 100 requests/minute)
- CORS restricted to your domains
- HSTS headers enforced
- Rationale: Prevents brute-force attacks and unauthorized cross-origin access |
OAuth Hardening
- Store client secrets in environment variables
- Validate redirect_uri strictly
- Use PKCE for mobile apps
- Rationale: Blocks OAuth injection and redirect hijacking | |

Key Additions for LINE/Google Auth

LINE-Specific
- Verify the nonce parameter to prevent CSRF
- Check client_id matches your LINE Channel ID
Google-Specific
- Validate hd (hosted domain) parameter if restricting to organizational emails
- Use prompt=select_account to prevent accidental session reuse

Unified JWT Claims
Include both authentication methods in your JWT:

{
  "sub": "user_123",
  "auth_provider": "line|google",
  "auth_time": 1625097600,
  "scope": "read:data write:data"
}

Rationale Q-and-A

Q: Why use UUID for ID fields instead of SERIAL?

Deepseek:

UUID Pros:

Global Uniqueness: safe for distributed systems, e.g. mobile apps with offline data sync
Avoid lock contention when multiple users create records simultaneously
Obfuscation: harder to scan/guess, e.g. /users/a1b2c3d4-xxx vs /users/102
Merge friendly: no id collisions if merge databases

UUID Cons:

More storage: 16 bytes for UUID versus 4 bytes for SERIAL
Index fragmentation. Use UUIDv7 (time ordered) to improve index locality. Example for PostgreSQL 14+:
```
reading_id UUID  DEFAULT uuid_generate_v7() PRIMARY KEY,
```
Readability - hard for humans to read. For this app, ids would not normally be displayed, but may make log files harder to read.

SERIAL (or modern equivalent GENERATED AS IDENTITY) Pros:

Simple: easy to understand and debug
Human friendly. More legible in logs.
Slightly faster to index using B-trees. Indexes are more compact.
Integer indices yield better query performance, esp. for large tables.
Monolithic Growth: Newer rows always have higher keys, which helps keep inserts locaized to the right end of the index and reduces page splits.
- Migitation: you can use time-ordered UUIDs, e.g. UUIDv1 or ULID, but that adds complexity. When to prefer SERIAL:
Small, single-server app
Tables with less than 100K records
Where integer joins are critical for performance

If using UUID, then to optimize for analytics use:

   -- Add a SERIAL column for internal reporting
   ALTER TABLE readings ADD COLUMN reading_number SERIAL;

Deepseek Recommendation for HomeLog: Stick with UUIDs because:

Your user_source_access table benefits from collision-safe distributed writes
Mobile clients may generate IDs offline (readings)
Future-proof for scaling (sharding, multi-region)

ChatGPT Recommendation for HomeLog:
Given HomeLog’s scope — only a handful of users, centralized deployment on Supabase/Railway, and no need for offline key generation, you’ll benefit most from using integer-based primary keys:

Performance: Smaller indexes and faster joins on time-series data (your readings table will grow quickly).
Simplicity: Easier schema introspection, debugging, and meaningful URLs.
Operational overhead: No additional complexity for key generation or migration.

If in the future HomeLog evolves into a massively distributed system with cross-region writes or must merge data across silos, you can migrate to UUIDs (or ULIDs) at that time.

For now, stick with SERIAL (or better yet, GENERATED AS IDENTITY) for clean, performant, and maintainable design.

Deepseek Recommendation if Off-line Writes are Not Required

If you don't require offline writes and your system operates purely in a centralized, online-only mode, the recommendation shifts toward traditional sequential IDs. Here's the updated guidance:

1. For Most Tables use GENERATED AS IDENTITY

Use GENERATED ALWAYS AS IDENTITY (PostgreSQL 10+):

CREATE TABLE readings (
    reading_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,  -- Auto-incrementing integer
    source_id INTEGER NOT NULL REFERENCES data_sources(source_id),
    value DOUBLE PRECISION NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL
);

Why?

Better performance (4-byte integers vs 16-byte UUIDs)
Natural sorting (no fragmentation)
Simpler joins (integer indexes are smaller/faster)
Built-in sequence tracking (no risk of duplicates)

2. For Tables Needing External IDs

Add a separate public_id UUID column if you need obscurity:

CREATE TABLE users (
    user_id     INTEGER      PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    public_id   UUID         DEFAULT gen_random_uuid() UNIQUE,  -- For URLs/APIs
    email       TEXT         UNIQUE NOT NULL
);

Usage:

Internal joins: Use user_id (integer)
Public APIs: Expose only public_id (UUID)

Key Changes from Original Recommendation

Factor	With Offline Writes	Online-Only (New)
Primary Key	UUID	`GENERATED AS IDENTITY`
Sorting	Requires UUIDv7	Natural integer order
Storage	16 bytes per ID	4 bytes per ID
Joins	Slightly slower	Optimized
Distributed Writes	Safe	Not applicable

When to Still Use UUIDs

If you anticipate future sharding/multi-region deployment
For any tables that might sync with mobile devices later

Performance Impact

20-30% faster joins on integer keys (based on pgBench tests)
~15% storage reduction for large tables

For HomeLog's scale, this is likely overkill—but it future-proofs your schema if analytics become critical.

Database Schema

Schema Suggested by AI Tools

1. Users

2. User Identities

3. Data Sources

4. Readings or Data Entries

5. Permissions

6. Optional Data Validators

7. Audit Logging

Rationale

Deepseek

ChatGPT's Rationale

Security Considerations

Key Additions for LINE/Google Auth

Rationale Q-and-A

1. For Most Tables use GENERATED AS IDENTITY

2. For Tables Needing External IDs

Key Changes from Original Recommendation

When to Still Use UUIDs

Performance Impact

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally