# Level 2 - Week 2 - 03 Ingest and Query Workflow

**Estimated time:** 60-90 minutes

## Learning Objectives

- Define the ingest and query flow
- Log the inputs and outputs of each step
- Print results in a debuggable way


## Overview

Your Week 2 deliverable is an end-to-end debugging loop:

- ingest
- query
- inspect what came back
- adjust chunking/metadata

### Underlying theory: reproducibility comes from stable inputs + stable outputs

To compare runs, you want:

- deterministic ordering of ingested files
- stable `chunk_id`s (idempotent upserts)
- consistent printing/logging of retrieval outputs
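
The first two points above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the helper names `list_files_sorted` and `make_chunk_id`, the `*.txt` pattern, and the `doc_id:index:hash` ID layout are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def list_files_sorted(root: str, pattern: str = "*.txt") -> list[Path]:
    """Return matching files under root in sorted (deterministic) order."""
    # rglob yields files in filesystem order, which can vary; sorting fixes that.
    return sorted(Path(root).rglob(pattern))


def make_chunk_id(doc_id: str, chunk_index: int, chunk_text: str) -> str:
    """Build a chunk ID from the document ID, position, and a content hash.

    The same input always produces the same ID, so re-ingesting unchanged
    text writes to the same key instead of creating a duplicate.
    """
    digest = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()[:12]
    return f"{doc_id}:{chunk_index}:{digest}"


print(make_chunk_id("guide", 0, "Chunking splits documents into pieces."))
```

Including the content hash means an edited chunk gets a new ID. If you would rather overwrite the old chunk in place, one option is to drop the hash and key on `doc_id` and index alone.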

## Practice Steps

- Sketch the ingest and query functions (signatures and return shapes only).
- Print ranked results with `rank`, `score`, `doc_id`, `chunk_id`, and a text preview.
- Make the output format stable so you can diff it across runs.
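
One way to check that the output is diffable is to render each hit as a single line and compare two runs with the standard library's `difflib`. The sketch below assumes every hit carries `score`, `doc_id`, and `chunk_id` keys; the helper names `format_hits` and `diff_runs` and the sample data are hypothetical.

```python
import difflib


def format_hits(hits: list[dict]) -> list[str]:
    """Render each hit as one line with a fixed score precision."""
    lines = []
    for rank, hit in enumerate(hits, start=1):
        lines.append(
            f"rank={rank} score={hit['score']:.4f} "
            f"doc_id={hit['doc_id']} chunk_id={hit['chunk_id']}"
        )
    return lines


def diff_runs(before: list[dict], after: list[dict]) -> str:
    """Return a unified diff of two runs' formatted hits ('' if identical)."""
    diff = difflib.unified_diff(
        format_hits(before),
        format_hits(after),
        fromfile="run_a",
        tofile="run_b",
        lineterm="",
    )
    return "\n".join(diff)


# Made-up results from two runs of the same query.
run_a = [
    {"score": 0.91234, "doc_id": "guide", "chunk_id": "guide:0"},
    {"score": 0.80111, "doc_id": "faq", "chunk_id": "faq:3"},
]
run_b = [
    {"score": 0.91234, "doc_id": "guide", "chunk_id": "guide:0"},
    {"score": 0.79000, "doc_id": "faq", "chunk_id": "faq:4"},
]
print(diff_runs(run_a, run_b))
```

An empty diff means the two runs returned the same ranking. Any changed line points at the rank where a chunking or metadata change had an effect.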

### Sample code

Stub ingest and query functions with clear outputs.


In [None]:
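def ingest_document(doc_id: str, text: str) -> dict:
    """Ingest one document and return a summary of what was indexed."""
    # TODO: parse, chunk, embed, upsert
    return {'doc_id': doc_id, 'chunks_indexed': 0}


def query_index(query: str, top_k: int = 5) -> list[dict]:
    """Return up to top_k hits, each a dict with score, doc_id, chunk_id, and text."""
    # TODO: replace with vector DB query
    return []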


### Student fill-in

Print ranked hits with score and preview.


In [None]:
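def print_hits(hits: list[dict], preview_chars: int = 120) -> None:
    if not hits:
        # Make empty retrieval visible instead of printing nothing.
        print("(no hits)")
        return

    for rank, hit in enumerate(hits, start=1):
        # Collapse newlines and repeated whitespace before truncating the preview.
        text = " ".join((hit.get("text") or "").split())
        preview = (text[:preview_chars] + "...") if len(text) > preview_chars else text

        score = hit.get("score")
        # Fixed precision keeps the output diffable; tolerate a missing score.
        score_str = "n/a" if score is None else f"{float(score):.4f}"
        doc_id = hit.get("doc_id")
        chunk_id = hit.get("chunk_id")

        print(f"rank={rank} score={score_str} doc_id={doc_id} chunk_id={chunk_id}\n{preview}\n")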


# TODO: call print_hits on query results

## Self-check

- Can you rerun ingest/query and get consistent output formatting?
- Do you log the query text and `top_k`?
- Does every hit include enough metadata to trace it back to a source (`doc_id`, optional path/url/section)?
- If retrieval is empty, do you have enough info to debug ingestion vs query settings?
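
The second and fourth checks can be covered by a thin logging wrapper around the query call. The sketch below is one possible shape, not a required design: `logged_query`, its `index_size` parameter, and the logger name `retrieval` are hypothetical, and it assumes you can cheaply get the number of indexed chunks from your backend.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("retrieval")


def logged_query(
    query_fn: Callable[[str, int], list[dict]],
    query: str,
    top_k: int,
    index_size: int,
) -> list[dict]:
    """Run query_fn, logging the inputs, the hit count, and a hint on empty results."""
    logger.info("query=%r top_k=%d index_size=%d", query, top_k, index_size)
    hits = query_fn(query, top_k)
    logger.info("returned %d hit(s)", len(hits))
    if not hits:
        if index_size == 0:
            # Nothing was indexed, so the problem is on the ingestion side.
            logger.warning("index is empty: check ingestion first")
        else:
            # Data exists, so look at the query side instead.
            logger.warning("index has data but no hits: check query text, filters, top_k")
    return hits


# Usage with a placeholder query function that always returns nothing.
hits = logged_query(lambda q, k: [], "what is chunking?", top_k=5, index_size=0)
```

Logging the index size next to the query is what separates the two failure modes: an empty index points at ingestion, while a populated index with no hits points at the query settings.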