# Repositories Smoke Testing (Stage 3)

## Objective
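Validate the repository contracts (`CandidateRepository`, `ResumeRepository`, `ResumeSectionRepository`): idempotent get-or-create by deterministic identity key with field merge, resume lookup roundtrips by `source_file` and `content_hash`, and section creation with diagnostics metadata.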

## Prerequisites
1. The local database must be reachable via `DATABASE_URL`.
2. Migrations must already be applied.
3. This notebook writes test records tagged with `nb_test_` (in the candidate identity key, the resume source file name, and the content hash).
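Before opening a session, it can help to confirm that `DATABASE_URL` is actually set and points at the database you expect. The sketch below is one way to do that with the standard library only. The helper name `describe_database_url` is hypothetical and not part of this repository, and it makes no assumption about how `get_session` reads the variable.

```python
import os
from urllib.parse import urlsplit


def describe_database_url(env=None):
    """Return a password-free summary of DATABASE_URL, or raise if it is unset.

    Hypothetical helper for this notebook; not part of src.storage.
    """
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set; export it before running this notebook")
    parts = urlsplit(url)
    # Only non-secret components are returned, so the summary is safe to print.
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "database": parts.path.lstrip("/"),
    }


# Example with an explicit mapping instead of the real environment:
example = describe_database_url(
    {"DATABASE_URL": "postgresql://user:secret@localhost:5432/resumes"}
)
print(example)
```

Calling `describe_database_url()` with no argument reads the real environment. This only checks that the variable is present and parseable; it does not prove the server is reachable or that migrations have run.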


In [1]:
from datetime import datetime, timezone

from notebooks._utils import print_checkpoint
from src.storage.db import get_session
from src.storage.repositories import CandidateRepository, ResumeRepository, ResumeSectionRepository

RUN_DB_SMOKE = True  # set False to skip DB writes in this notebook

session = get_session()
print("Session created")
print_checkpoint("setup complete")


Session created
[checkpoint] setup complete


## 1) Candidate identity-key upsert + field merge


In [2]:
if not RUN_DB_SMOKE:
    print("Skipping DB smoke tests (RUN_DB_SMOKE=False)")
else:
    c_repo = CandidateRepository(session)
    identity_key = f"candidate:v1:nb_test_{datetime.now(timezone.utc).strftime('%Y%m%d%H%M%S')}"

    c1, created1 = c_repo.get_or_create_by_identity_key(
        identity_key=identity_key,
        name="NB Test Candidate",
        email="nb_test@example.com",
    )
    c2, created2 = c_repo.get_or_create_by_identity_key(
        identity_key=identity_key,
        phone="+14155550100",
    )

    assert created1 is True, "first get_or_create should create"
    assert created2 is False, "second get_or_create should reuse"
    assert c1.id == c2.id, "idempotent lookup returned different records"
    assert c2.phone == "+14155550100", "identity merge should fill missing phone"

    print("candidate_id:", c1.id)
    print("identity_key:", identity_key)
    print_checkpoint("candidate identity-key assertions passed")


candidate_id: 2
identity_key: candidate:v1:nb_test_20260223195653
[checkpoint] candidate identity-key assertions passed
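The contract exercised above can be illustrated without a database. The following is a minimal in-memory sketch, not the real `CandidateRepository`: the classes `FakeCandidate` and `InMemoryCandidateRepo` are hypothetical, and the merge rule (fill only fields that are currently empty, never overwrite) is an assumption inferred from the assertion message "identity merge should fill missing phone".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeCandidate:
    """Hypothetical stand-in for the candidate model."""
    id: int
    identity_key: str
    name: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None


class InMemoryCandidateRepo:
    """Hypothetical in-memory repo that mimics get-or-create with field merge."""

    def __init__(self):
        self._by_key = {}

    def get_or_create_by_identity_key(self, identity_key, **fields):
        existing = self._by_key.get(identity_key)
        if existing is None:
            record = FakeCandidate(
                id=len(self._by_key) + 1, identity_key=identity_key, **fields
            )
            self._by_key[identity_key] = record
            return record, True
        # Assumed merge rule: fill empty fields only, never overwrite known values.
        for name, value in fields.items():
            if value is not None and getattr(existing, name) is None:
                setattr(existing, name, value)
        return existing, False


repo = InMemoryCandidateRepo()
first, created_first = repo.get_or_create_by_identity_key(
    "candidate:v1:demo", name="Demo", email="demo@example.com"
)
second, created_second = repo.get_or_create_by_identity_key(
    "candidate:v1:demo", phone="+14155550100"
)
```

After these two calls, `first` and `second` are the same record, `created_second` is `False`, and the phone number has been filled in. If the real repository overwrites existing values instead, the notebook assertions above would still pass, so that behavior needs a separate check.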


## 2) Resume create + source/content hash roundtrip


In [3]:
if RUN_DB_SMOKE:
    r_repo = ResumeRepository(session)
    ts = datetime.now(timezone.utc).strftime('%Y%m%d%H%M%S')
    source_file = f"nb_test_resume_{ts}.pdf"
    content_hash = f"nb_test_content_hash_{ts}"

    resume = r_repo.create(
        candidate_id=c1.id,
        source_file=source_file,
        content_hash=content_hash,
        raw_text="nb_test raw text",
        parsed_json={"clean_text": "nb_test clean", "section_names": ["summary"]},
        language="en",
    )

    fetched_by_source = r_repo.get_by_source_file(source_file)
    fetched_by_hash = r_repo.get_by_content_hash(content_hash)

    assert fetched_by_source is not None, "resume source roundtrip failed"
    assert fetched_by_hash is not None, "resume content hash roundtrip failed"
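    assert fetched_by_source.id == fetched_by_hash.id, "source and hash lookups returned different resumes"
    assert fetched_by_source.id == resume.id, "lookup did not return the resume just created"
    assert fetched_by_source.language == "en", "language did not roundtrip"
    assert fetched_by_source.content_hash == content_hash, "content_hash did not roundtrip"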

    print("resume_id:", resume.id)
    print_checkpoint("resume roundtrip assertions passed")


resume_id: 1
[checkpoint] resume roundtrip assertions passed
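The cell above uses a timestamped placeholder string as `content_hash`, which is enough to test the lookup. For context, a content hash is typically a digest of the file bytes, so identical files map to the same key. The sketch below uses SHA-256 from the standard library; the function name `compute_content_hash` is hypothetical, and the hashing scheme the ingestion pipeline actually uses is not stated in this notebook.

```python
import hashlib


def compute_content_hash(data: bytes) -> str:
    """Return a hex SHA-256 digest of the given bytes.

    Hypothetical helper: the project's real hashing scheme may differ.
    """
    return hashlib.sha256(data).hexdigest()


same_a = compute_content_hash(b"nb_test raw text")
same_b = compute_content_hash(b"nb_test raw text")
different = compute_content_hash(b"nb_test raw text, edited")
```

Here `same_a` equals `same_b`, while `different` does not match either, which is the property `get_by_content_hash` relies on to detect duplicate uploads.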


## 3) Resume section persistence with diagnostics metadata


In [4]:
if RUN_DB_SMOKE:
    s_repo = ResumeSectionRepository(session)
    section = s_repo.create(
        resume_id=resume.id,
        section_type="summary",
        content="nb_test section content",
        metadata_json={
            "origin": "notebook_smoke",
            "section_confidence": 0.9,
            "diagnostic_flags": ["short_content"],
            "recategorization_candidate": None,
        },
        tokens=4,
    )

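    # These checks read the object returned by create(); they do not re-fetch it from the database.
    assert section.id is not None, "section was not assigned an id"
    assert section.section_type == "summary", "section_type mismatch"
    assert section.tokens == 4, "tokens mismatch"
    assert section.metadata_json["section_confidence"] == 0.9, "metadata confidence mismatch"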

    print("section_id:", section.id)
    print_checkpoint("section persistence assertions passed")


section_id: 1
[checkpoint] section persistence assertions passed
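If `metadata_json` is stored in a JSON column (an assumption; the column type is not shown in this notebook), the values written above should survive serialization unchanged. The sketch below checks that with the standard `json` module, independently of the database. The helper name `roundtrip_json` is hypothetical.

```python
import json


def roundtrip_json(value):
    """Serialize to JSON text and parse it back, as a JSON column would."""
    return json.loads(json.dumps(value))


metadata = {
    "origin": "notebook_smoke",
    "section_confidence": 0.9,
    "diagnostic_flags": ["short_content"],
    "recategorization_candidate": None,
}
restored = roundtrip_json(metadata)
```

`restored` compares equal to `metadata`: the float is preserved exactly, the list stays a list, and `None` maps to JSON `null` and back. Types without a JSON equivalent, such as tuples or `datetime` objects, would not roundtrip this cleanly.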


## 4) Commit and teardown guidance

1. Commit only notebook test records you intentionally inserted.
2. Test records carry the `nb_test_` tag for easy cleanup.
3. While iterating, call `session.rollback()` instead of `session.commit()` to leave the database unchanged.
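Cleanup by tag has one pitfall worth showing: `_` is a single-character wildcard in SQL `LIKE`, so an unescaped pattern `nb_test_%` also matches names such as `nbxtestx...`. The sketch below demonstrates an escaped delete against an in-memory SQLite database. The table `resumes` and column `source_file` here are a hypothetical stand-in for the real schema, and the functions `escape_like` and `delete_test_resumes` are not part of this repository.

```python
import sqlite3


def escape_like(text: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards (% and _) so the text matches literally."""
    return (
        text.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def delete_test_resumes(conn, prefix: str = "nb_test_") -> int:
    """Delete rows whose source_file starts with the literal prefix."""
    pattern = escape_like(prefix) + "%"
    cur = conn.execute(
        "DELETE FROM resumes WHERE source_file LIKE ? ESCAPE '\\'",
        (pattern,),
    )
    return cur.rowcount


# Demo on a throwaway in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resumes (id INTEGER PRIMARY KEY, source_file TEXT)")
conn.executemany(
    "INSERT INTO resumes (source_file) VALUES (?)",
    [("nb_test_resume_1.pdf",), ("nbxtestxreal.pdf",), ("real_resume.pdf",)],
)
deleted = delete_test_resumes(conn)
remaining = [row[0] for row in conn.execute("SELECT source_file FROM resumes ORDER BY id")]
```

Only the row that literally starts with `nb_test_` is removed. Against the real database, related sections and candidates would likely need to be deleted in an order that respects foreign keys, which this sketch does not cover.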


In [None]:
if RUN_DB_SMOKE:
    session.commit()
    print("Committed notebook smoke records")
else:
    session.rollback()

session.close()
print_checkpoint("session closed")


## Summary (fill after run)

- Candidate identity-key idempotency: pass/fail
- Resume source/content-hash roundtrip: pass/fail
- Section diagnostics metadata persistence: pass/fail
- DB connectivity issues observed: yes/no


## Next actions

1. If any assertion fails, reproduce with a focused unit test in `tests/`.
2. Keep notebook-only records identifiable via the `nb_test_` tag.
3. If schema issues appear, verify migrations and model alignment first.
