ChatShop

Conversational shopping assistant powered by a RAG pipeline. User describes what they want → semantic search retrieves relevant products → LLM generates a recommendation.

Architecture

INGESTION (one-shot: scripts/ingest.py)
  data/raw/amazon_products.csv
    → data.loader      (chunked pd.read_csv)
    → data.cleaner     (normalize, strip HTML, dedupe → list[Product])
    → embeddings.embedder  (batch encode title+description → vectors)
    → vectorstore.chroma   (upsert to chroma_db/, persisted)

QUERY (per user message)
  Gradio chat input
    → rag.chain.stream(query)
        → rag.retriever.retrieve(query)   (embed → chroma.query → top-5 Products)
        → rag.prompt.build_user_message() (format products into context block)
        → litellm.completion(stream=True) (system prompt + context + query → LLM)
    → Gradio streams tokens back to UI

Setup

Requirements: Python 3.12, uv

# 1. Install dependencies
uv sync

# 2. Configure environment
cp .env.example .env
# Edit .env — set LITELLM_API_KEY (OpenAI key) or switch to Ollama

# 3. Download dataset
#    Kaggle: Amazon Products Dataset 2023 → Electronics category
#    Place CSV at: data/raw/amazon_products.csv

# 4. Run ingestion (one-time, ~10-30 min on CPU)
uv run python scripts/ingest.py

# 5. Launch the app
uv run python main.py

Development

# Install all extras
uv sync --extra dev --extra test --extra lint

# Run tests
uv run pytest --cov=chatshop tests/

# Inspect ChromaDB
uv run python scripts/inspect_chroma.py --query "wireless headphones" --top-k 5

Config

All settings are read from .env (see .env.example):

Key	Default	Description
`LITELLM_MODEL`	`gpt-4o-mini`	LLM model (any LiteLLM-supported string)
`LITELLM_API_KEY`	—	API key for the LLM provider
`EMBEDDING_MODEL`	`all-MiniLM-L6-v2`	Sentence-transformers model (local)
`CHROMA_PERSIST_DIR`	`chroma_db`	ChromaDB storage path
`TOP_K_RESULTS`	`5`	Products retrieved per query

To use a local Ollama model instead of OpenAI, set:

LITELLM_MODEL=ollama/llama3.2
LITELLM_API_KEY=

Project Structure

src/chatshop/
  config.py          ← pydantic-settings, all env vars
  data/
    models.py        ← Product pydantic model
    loader.py        ← chunked CSV reader
    cleaner.py       ← normalize, filter, dedupe
  embeddings/
    embedder.py      ← sentence-transformers wrapper
  vectorstore/
    chroma.py        ← ChromaDB upsert + query
  rag/
    retriever.py     ← embed query → chroma → Products
    prompt.py        ← system prompt + context builder
    chain.py         ← RAGChain: run() + stream()
  ui/
    gradio_app.py    ← Gradio Blocks, streaming chat

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
.claude		.claude
documentation		documentation
notebooks		notebooks
scripts		scripts
src/chatshop		src/chatshop
tests		tests
.coverage		.coverage
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
CLAUDE.md		CLAUDE.md
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ChatShop

Architecture

Setup

Development

Config

Project Structure

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ChatShop

Architecture

Setup

Development

Config

Project Structure

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages