# Lab 02 — GenAI Application Architecture Design

**Course:** A Practical Guide to Building a GenAI Application

**Duration:** 90–120 minutes

**Objectives:**
- Understand the components of a production GenAI application
- Design a modular, scalable architecture
- Learn patterns for RAG systems, agents, and hybrid architectures
- Create sequence diagrams, architecture blocks, and data flows
- Complete hands-on architecture design exercises


## 1 — Why Architecture Matters in GenAI

A well-designed GenAI architecture supports:

- Scalability
- Modular development
- Lower latency
- Cheaper inference
- Separation of concerns
- Easier debugging and logging
- Better retrieval accuracy (in RAG apps)

Your GenAI app is only as good as its architecture.


## 2 — Core Components

Below is a typical high-level breakdown of the architecture:

```
┌──────────────────────────────┐
│           Frontend           │
│  (Next.js, React, Flutter)   │
└───────────────┬──────────────┘
                │
┌───────────────▼──────────────┐
│         Backend API          │
│       (FastAPI, Flask)       │
└───────────────┬──────────────┘
                │
┌───────────────▼──────────────┐
│      LLM Orchestration       │
│   (LangChain, LlamaIndex)    │
└───────────────┬──────────────┘
                │
┌───────────────▼──────────────┐
│       Knowledge Layer        │
│     (Vector DB, Document     │
│      loaders, chunking)      │
└───────────────┬──────────────┘
                │
┌───────────────▼──────────────┐
│        Storage Layer         │
│     (Postgres, S3, Logs)     │
└──────────────────────────────┘
```


### Brief descriptions

- **Frontend:** Chat UI, document upload, streaming.
- **Backend API:** Ingestion, query endpoints, auth, telemetry.
- **Orchestration:** Chains, agents, reranking logic.
- **Knowledge layer:** Chunking, embeddings, vector DB.
- **Storage:** Long-term storage for documents, logs and metadata.
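
One way to keep these layers separable is to give each an explicit interface. The sketch below is a minimal illustration, not a prescribed design: `Retriever`, `Generator`, `Orchestrator`, `KeywordRetriever`, and `EchoGenerator` are hypothetical names, and the toy classes stand in for a real vector DB and LLM so the example runs with the standard library only.

```python
from dataclasses import dataclass
from typing import Protocol


class Retriever(Protocol):
    """Knowledge layer: returns text chunks relevant to a query."""
    def retrieve(self, query: str, k: int) -> list[str]: ...


class Generator(Protocol):
    """LLM wrapper: turns a prompt into an answer."""
    def generate(self, prompt: str) -> str: ...


@dataclass
class Orchestrator:
    """Orchestration layer: wires retrieval and generation together."""
    retriever: Retriever
    generator: Generator

    def answer(self, question: str, k: int = 2) -> str:
        chunks = self.retriever.retrieve(question, k)
        prompt = "Context:\n" + "\n".join(chunks) + f"\n\nQuestion: {question}"
        return self.generator.generate(prompt)


# Toy stand-ins (assumptions) so the sketch runs without a vector DB or an LLM.
class KeywordRetriever:
    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    def retrieve(self, query: str, k: int) -> list[str]:
        words = set(query.lower().split())
        # Rank documents by how many query words they share.
        ranked = sorted(self.docs, key=lambda d: -len(words & set(d.lower().split())))
        return ranked[:k]


class EchoGenerator:
    def generate(self, prompt: str) -> str:
        # Echo the last prompt line instead of calling a model.
        return "stub: " + prompt.splitlines()[-1]


app = Orchestrator(KeywordRetriever(["RAG retrieves context", "Agents call tools"]), EchoGenerator())
print(app.answer("What does RAG do?"))
```

Because `Orchestrator` depends only on the two protocols, either stand-in can be swapped for a real implementation without touching the others.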


## 3 — RAG Architecture (Standard Flow)

```
User Question
      │
      ▼
┌─────────────┐
│  Backend    │
│  (FastAPI)  │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Query Router│
│ + Rewriter  │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Vector Store│
│ Retrieval   │
└──────┬──────┘
       │
       ▼
┌─────────────────┐
│ Reranker (opt.) │
└──────┬──────────┘
       │
       ▼
┌────────────────────────┐
│ LLM: Answer Generation │
└────────────────────────┘
```
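
The flow above can be sketched in a few functions. This is a toy version under stated assumptions: `embed` is a bag-of-words counter rather than a real embedding model, the optional reranker is skipped, and the function names (`rewrite`, `retrieve`, `build_prompt`) are illustrative. The prompt it builds is what would be handed to the LLM in the last step.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rewrite(query: str) -> str:
    """Query rewriter stage: here just whitespace and trailing '?' cleanup."""
    return " ".join(query.strip().rstrip("?").split())


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Vector store retrieval stage: top-k chunks by cosine similarity."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Last stage before the LLM call: number the chunks so answers can cite them."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return f"Answer using only the context below.\n{numbered}\nQuestion: {query}"


chunks = [
    "chunking splits documents into passages",
    "embeddings map text to vectors",
    "the reranker reorders retrieved passages",
]
query = rewrite("  How do embeddings map text?  ")
top = retrieve(query, chunks)
print(build_prompt(query, top))
```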


## 4 — Sequence Diagram for a Query

```
User → Frontend: submit question
Frontend → Backend: POST /query
Backend → Orchestrator: prepare_query()
Orchestrator → VectorDB: similarity_search()
VectorDB → Orchestrator: return chunks
Orchestrator → LLM: generate answer with context
LLM → Orchestrator: answer
Orchestrator → Backend: result
Backend → Frontend: stream answer to user
```
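
To make the call order concrete, the sequence can be mimicked with plain functions that log each hop. `prepare_query` and `similarity_search` take their names from the diagram; `generate`, `handle_query`, and the `trace` list are hypothetical, and the returned chunks and answer are placeholders rather than real retrieval or model output.

```python
from typing import Iterator

trace: list[str] = []  # records the order of calls, mirroring the diagram


def prepare_query(question: str) -> str:
    trace.append("Backend -> Orchestrator: prepare_query")
    return question.strip()


def similarity_search(query: str) -> list[str]:
    trace.append("Orchestrator -> VectorDB: similarity_search")
    return ["chunk A", "chunk B"]  # placeholder chunks


def generate(query: str, chunks: list[str]) -> str:
    trace.append("Orchestrator -> LLM: generate")
    return f"Answer to '{query}' grounded in {len(chunks)} chunks"


def handle_query(question: str) -> Iterator[str]:
    """Stand-in for the POST /query handler; yields the answer word by word."""
    query = prepare_query(question)
    chunks = similarity_search(query)
    answer = generate(query, chunks)
    trace.append("Backend -> Frontend: stream")
    yield from answer.split()


tokens = list(handle_query(" What is RAG? "))
print(" ".join(tokens))
print(trace)
```

Using a generator for `handle_query` mirrors the streaming step: the caller can forward each token as soon as it is produced instead of waiting for the full answer.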


## 5 — Lab Exercise 1: Draw Your Own Architecture

**Task:** Design an architecture for a GenAI application that:
- Accepts PDF uploads
- Performs chunking
- Stores embeddings
- Answers user questions
- Tracks feedback

**Deliverable:** Use ASCII, pen & paper, or draw.io. Paste your ASCII diagram into the cell below.

**Hint:** Start with Frontend → Backend → Ingestion → Vector DB → Orchestrator → LLM → Frontend


```python
# Paste your ASCII diagram as a triple-quoted string here
my_diagram = '''
Frontend -> Backend (FastAPI)
  Backend -> Ingestion Service (extract, chunk, embed)
  Ingestion -> VectorDB (Weaviate)
  User Query -> Backend -> Orchestrator -> VectorDB -> Reranker -> LLM -> Backend -> Frontend
  User Feedback -> Backend -> Feedback Store (Postgres)
'''
print(my_diagram)
```



## 6 — Lab Exercise 2: Identify Bottlenecks

Given this flawed architecture:

```
User → Backend → OpenAI → Pinecone → Backend → User
```

**Questions:**
1. What are the missing steps?
2. What risks exist?
3. What component should be added?
4. How do you improve latency?

Write your answers in the code cell below as a Python dict (for easy grading).


```python
answers_ex2 = {
    'missing_steps': ['chunking', 'embeddings', 'retrieval', 'context_building', 'reranking'],
    'risks': ['hallucination', 'no_source_grounding', 'slow_performance', 'higher_costs'],
    'components_to_add': ['chunker', 'embedding_service', 'vector_db', 'reranker', 'orchestrator'],
    'latency_improvements': ['caching', 'smaller_models', 'batching', 'pre-warm', 'reduce_context_size']
}
answers_ex2
```

## 7 — Lab Exercise 3: Build Architecture From Requirements (NaijaLawBot)

**Scenario:** NaijaLawBot must:
- Accept Nigerian laws (PDF/DOCX)
- Extract text and chunk it
- Store embeddings in a vector DB
- Cite sources in every LLM answer
- Show highlighted source text in the frontend
- Support offline local Llama models with fallback
- Provide an admin dashboard (ingestion logs, usage, hallucination reports)

**Task:** Describe the full system architecture. Paste a free-text description + ASCII diagram in the cell below.


```python
naijalawbot_arch = '''
User -> Frontend (Next.js)
Frontend -> API Gateway (FastAPI)
API -> Ingestion Service: (PDF/DOCX -> text extraction -> clean -> chunk)
Chunks -> Embedding Service (OpenAI or local SBERT)
Embeddings -> VectorDB (Weaviate) + Metadata in Postgres
Query Path: User Query -> Query Router -> Retriever (hybrid) -> Reranker -> LLM (OpenAI primary, Local Llama fallback)
LLM -> Citation Generator -> Response Builder -> Frontend (highlights + links to source chunks)
Admin: Logs -> Postgres / ELK, Monitoring -> Prometheus+Grafana, Dashboard -> Admin UI
'''
print(naijalawbot_arch)
```

## 8 — Bonus: Multi-Agent Architecture

**Design an agent workflow for market research report generation.**

Example flow:
```
User Query → Controller Agent → Research Agent → Summarizer Agent → Compliance Agent → Finalizer Agent → User
```

Describe the responsibilities of each agent in the cell below.


```python
agents = {
    'Controller': 'Orchestrates agents, splits tasks, aggregates outputs',
    'Research': 'Fetches documents, queries the vector DB, runs external web research',
    'Summarizer': 'Condenses findings into structured sections',
    'Compliance': 'Runs regulatory and ethical checks, flags risky claims',
    'Finalizer': 'Formats the report, creates citations, polishes language'
}
agents
```
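
The example flow can be prototyped as a chain of functions passing a shared state dict. This is only a sketch under assumptions: each "agent" is a plain function with canned behavior rather than an LLM-backed agent, the findings are placeholders, and the "guaranteed" keyword rule in `compliance` is a toy check invented for illustration.

```python
from typing import Callable

Agent = Callable[[dict], dict]  # each agent reads and extends a shared state dict


def research(state: dict) -> dict:
    # Placeholder findings; a real agent would query a vector DB or the web.
    findings = [f"finding about {state['query']}", "unverified claim: guaranteed growth"]
    return {**state, "findings": findings}


def summarize(state: dict) -> dict:
    return {**state, "summary": "; ".join(state["findings"])}


def compliance(state: dict) -> dict:
    # Flag findings containing a risky phrase (toy rule, an assumption for the sketch).
    flagged = [f for f in state["findings"] if "guaranteed" in f]
    return {**state, "flags": flagged}


def finalize(state: dict) -> dict:
    report = f"Report: {state['summary']} (flags: {len(state['flags'])})"
    return {**state, "report": report}


def controller(query: str, pipeline: list[Agent]) -> dict:
    """Controller agent: runs the other agents in order and aggregates state."""
    state: dict = {"query": query}
    for agent in pipeline:
        state = agent(state)
    return state


result = controller("solar market", [research, summarize, compliance, finalize])
print(result["report"])
```

Keeping every agent's output in one state dict makes each step inspectable, which helps when debugging which agent introduced a flagged claim.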

## 9 — Instructor Answers (Model Key)

Answers are provided below for instructors and self-checking. Students should attempt all exercises before reading.


### Exercise 1 — Example Architecture

```
Frontend (Next.js)
    ↓
Backend API (FastAPI)
    ↓
Document Processor
    - Text extractor
    - Chunker
    - Embeddings
    ↓
Vector DB (Weaviate or Pinecone)
    ↓
RAG Orchestrator (LangChain)
    ↓
LLM (OpenAI or Local Ollama)
    ↓
Feedback Store (Postgres)
```
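
The Document Processor and Feedback Store steps can be prototyped in memory before picking real services. In the sketch below, `chunk_text` and `InMemoryStore` are hypothetical names, chunking is by overlapping word windows (one simple strategy among several), and text extraction and embedding are deliberately left out.

```python
from dataclasses import dataclass, field


def chunk_text(text: str, size: int = 5, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window.
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


@dataclass
class InMemoryStore:
    """Stand-in for the vector DB plus the feedback table."""
    chunks: list[dict] = field(default_factory=list)
    feedback: list[dict] = field(default_factory=list)

    def add_document(self, doc_id: str, text: str) -> int:
        pieces = chunk_text(text)
        for n, piece in enumerate(pieces):
            self.chunks.append({"doc_id": doc_id, "chunk_no": n, "text": piece})
        return len(pieces)

    def record_feedback(self, question: str, helpful: bool) -> None:
        self.feedback.append({"question": question, "helpful": helpful})


store = InMemoryStore()
n = store.add_document("doc-1", "one two three four five six seven eight nine ten")
store.record_feedback("what comes after five?", helpful=True)
print(n, [c["text"] for c in store.chunks])
```

Storing `doc_id` and `chunk_no` with each chunk is what later allows answers to point back to their source.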


### Exercise 2 — Bottlenecks & Fixes

**Missing steps:** chunking, embeddings, retrieval, reranking, context building.

**Risks:** hallucinations, no source grounding, incorrect answers, slow performance.

**Add components:** chunker, embedding service, vector DB, reranker, orchestrator.

**Latency improvements:** caching, using smaller/faster models, batching, pre-warming, reducing context size, async processing.
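
As an illustration of the caching item, the sketch below memoizes answers for repeated questions with `functools.lru_cache`. `slow_llm` is a placeholder that sleeps instead of calling a model, and `ask` and `cached_answer` are hypothetical names. Exact-match caching like this only helps when the same normalized question recurs, and cached answers can go stale when the underlying documents change.

```python
import time
from functools import lru_cache

calls = 0  # counts how often the slow path actually runs


def slow_llm(prompt: str) -> str:
    """Placeholder for a model call; the sleep simulates network and inference time."""
    global calls
    calls += 1
    time.sleep(0.05)
    return f"answer({prompt})"


@lru_cache(maxsize=1024)
def cached_answer(normalized_question: str) -> str:
    return slow_llm(normalized_question)


def ask(question: str) -> str:
    # Normalize so trivially different phrasings hit the same cache entry.
    return cached_answer(" ".join(question.lower().split()))


ask("What is RAG?")
ask("  what is   rag? ")  # served from the cache, no second model call
print(calls, cached_answer.cache_info().hits)
```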


### Exercise 3 — NaijaLawBot (Suggested Architecture)

```
User (Web/Mobile)
   ↓
Frontend (Next.js)
   ↓
API Gateway / FastAPI
   ↓
Ingestion Service
   ↓  (PDF, DOCX)
Text Extraction
   ↓
Semantic Chunking
   ↓
Embeddings (OpenAI or Local)
   ↓
Vector DB (Weaviate)
———————— Query Path ——————————
User Query → Query Router → Retriever (Hybrid) → Reranker → LLM (OpenAI + Local fallback) → Citation Generator → Response Builder → Frontend
———————— Admin Tools ——————————
Logs → PostgreSQL
Monitoring → Prometheus/Grafana
Dashboard → Admin Panel
```
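
Two pieces of the query path, the model fallback and the citation generator, are sketched below. Everything here is illustrative: `hosted_model` simulates an unreachable hosted API, `local_llama` returns a canned reply instead of running a model, and the file name and chunk IDs in `sources` are made up. The sketch assumes the LLM is prompted to mark sources as `[1]`, `[2]`, and so on.

```python
from typing import Callable

LLM = Callable[[str], str]


def generate_with_fallback(prompt: str, primary: LLM, fallback: LLM) -> tuple[str, str]:
    """Try the primary model; on a connection or timeout error use the local one."""
    try:
        return primary(prompt), "primary"
    except (ConnectionError, TimeoutError):
        return fallback(prompt), "fallback"


def hosted_model(prompt: str) -> str:
    raise ConnectionError("offline")  # simulate no network


def local_llama(prompt: str) -> str:
    return "Section 1 applies [1]."  # canned reply standing in for a local model


def attach_citations(answer: str, sources: list[dict]) -> dict:
    """Citation generator: keep only the sources whose marker appears in the answer."""
    cited = [s for i, s in enumerate(sources, start=1) if f"[{i}]" in answer]
    return {"answer": answer, "citations": cited}


# Hypothetical retrieved chunks, in the order they were numbered in the prompt.
sources = [
    {"doc": "example_act.pdf", "chunk_id": 12},
    {"doc": "example_act.pdf", "chunk_id": 40},
]
text, used = generate_with_fallback("...", hosted_model, local_llama)
response = attach_citations(text, sources)
print(used, response)
```

Returning which model answered alongside the text gives the admin dashboard a simple signal for how often the fallback is used.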


## 10 — Next Steps & Resources

- Convert your ASCII diagrams to visual diagrams (draw.io, Miro)
- Implement a minimal FastAPI + vector DB integration (Lab 05 & Lab 03)
- Read: LangChain/RAG docs, Weaviate guides, LlamaIndex examples



