## Week 9 Lab Manual
### Foundations of Deep Learning & AI Functionality

**Instructor Note**: This lab manual provides the aim, code, and explanation for each practical task. Focus on the architectural patterns and the transition from theoretical concepts to functional AI implementations.

---

# Week 9: Model Specialization, Production & Capstone Project
## From Notebooks to Production-Ready AI Systems

###  Weekly Table of Contents
1. [Specialized Dataset Lab](#-Lab-9.1:-Specialized-Dataset-Lab)
2. [Production Streaming API](#-Lab-9.2:-Production-Streaming-API)
3. [Capstone Project: The Autonomous Multi-Modal Researcher](#-Lab-9.3:-Capstone-Project--The-Autonomous-Multi-Modal-Researcher)

Key topics:
- Synthetic Data Generation for Fine-Tuning
- FastAPI Integration with Server-Sent Events (SSE)
- Real-time Performance Optimization

###  Learning Objectives
This final week focuses on transitioning from a "working prototype" to a "production system." You will cover:
1.  **Advanced Model Tuning**: Deep dive into PEFT, LoRA, and the logic of Synthetic Data Generation.
2.  **Streaming & UI Optimization**: Implementing real-time feedback loops for better user experience.
3.  **Production Architectures**: Deploying LLM logic via **FastAPI** with background tasks and streaming.
4.  **Capstone Integration**: Building a unified "Autonomous Researcher" agent.

---

###  9.1 Theory: Fine-Tuning vs. RAG (The Final Verdict)
By now, you've seen RAG (Week 5) and Prompt Engineering (Week 4). Why bother with Fine-Tuning?

| Feature | Prompting / RAG | Fine-Tuning |
| :--- | :--- | :--- |
| **Knowledge Update** | Easy (Update Vector DB) | Hard (Requires retraining) |
| **New Task Adaptation** | Good | Excellent |
| **Inference Cost & Latency** | Higher (long context sent with every request) | Lower (shorter prompts) |
| **Steerability** | Variable | Very High (Consistent tone/format) |

**Modern Convergence**: Most production systems use a **Hybrid Approach**:
- Fine-tune for **style, format, and reasoning**.
- RAG for **up-to-date facts and data**.
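
As a rough illustration of the hybrid split, the sketch below keeps the prompt short (on the assumption that a fine-tuned model already carries the style and format) and injects only retrieved facts. The retriever is a toy word-overlap ranker standing in for a vector database; `tokenize`, `retrieve`, `build_hybrid_prompt`, and the sample clauses are hypothetical and not part of the course code.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for embedding similarity."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents that share the most words with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_hybrid_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Facts come from retrieval; tone and format are left to the fine-tuned model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Made-up sample data for the demonstration
CLAUSES = [
    "The notice period is 30 days.",
    "Payment is due in 60 days.",
    "The office is in Berlin.",
]

print(build_hybrid_prompt("What is the notice period?", CLAUSES, k=1))
```

Updating knowledge here means editing `CLAUSES` (or, in a real system, the vector store), while the model weights stay untouched.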

##  Lab 9.1: Specialized Dataset Lab
**Aim**: To generate a high-quality, synthetic training dataset using a "Teacher" model (Gemini 1.5 Flash) to prepare a smaller "Student" model (Gemma 2) for specialized tasks through fine-tuning.

**Explanation**:
This lab demonstrates the first step in the fine-tuning pipeline:
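1.  **Structured Generation**: We describe the training pairs with a `Pydantic` schema and pass it to `JsonOutputParser`, which injects format instructions into the prompt and parses the teacher model's reply as JSON. The parser does not validate the result against the schema, so malformed examples are still possible and should be checked before training.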
2.  **Specialization**: By providing industry-specific instructions (e.g., Legal Contract Analysis), we generate data that captures professional jargon and complex risk assessment patterns.
3.  **JSONL Export**: The output is saved in `.jsonl` format (one JSON object per line), a common input format for fine-tuning pipelines, including LoRA-based ones.

In [None]:
# 📦 WEEK 9 INITIALIZATION
import os
import json
import asyncio
from typing import List
from pydantic import BaseModel, Field
from dotenv import load_dotenv

# Load environment variables
load_dotenv(override=True)

# Standardized Model Definitions
MODEL = "gemini-1.5-flash"
LOCAL_MODEL = "gemma2:2b"

# API Clients
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate

# Initialize Teacher Model
llm = ChatGoogleGenerativeAI(model=MODEL, temperature=0.7)

# --- Lab 9.1: Specialized Dataset Lab ---

# Define the structure of our training data
class TrainingExample(BaseModel):
    instruction: str = Field(description="The user query or prompt")
    response: str = Field(description="The ideal expert assistant response")

class Dataset(BaseModel):
    examples: List[TrainingExample]

parser = JsonOutputParser(pydantic_object=Dataset)

# Generation Prompt
prompt = PromptTemplate(
    template="Generate 5 diverse and high-quality training examples for a Legal Contract Analyzer.\n{format_instructions}\nEnsure the language is professional, contains legal terminology, and focuses on risk identification.",
    input_variables=[],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | llm | parser

print("🚀 Generating synthetic legal dataset...")
# Note: Limit to 5 for the lab demonstration speed
result = chain.invoke({})

# Export to JSONL for Fine-Tuning
output_filename = "legal_training_data.jsonl"
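with open(output_filename, "w", encoding="utf-8") as f:
    for ex in result["examples"]:
        # Alpaca-style pair, one JSON object per line. Note: the original Alpaca
        # schema uses "instruction"/"input"/"output"; rename keys if your trainer expects them.
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")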

print(f"✅ Successfully created {len(result['examples'])} training pairs in {output_filename}")
print("\n--- Preview of Example 1 ---")
print(f"Q: {result['examples'][0]['instruction'][:80]}...")
print(f"A: {result['examples'][0]['response'][:80]}...")
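
Because a teacher model's output is not guaranteed to match the requested shape, it can help to filter the examples before exporting them. The following is a minimal, standard-library-only sketch of such a check plus a JSONL round trip. `clean_examples`, `write_jsonl`, `read_jsonl`, and the sample records are hypothetical helpers and data, not part of the lab code.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("instruction", "response")

def clean_examples(raw: list) -> list[dict]:
    """Keep well-formed, non-empty, de-duplicated instruction/response pairs."""
    seen, cleaned = set(), []
    for item in raw:
        if not isinstance(item, dict):
            continue
        values = [item.get(k) for k in REQUIRED_KEYS]
        if not all(isinstance(v, str) and v.strip() for v in values):
            continue  # missing key, wrong type, or blank text
        dedup_key = values[0].strip().lower()
        if dedup_key in seen:
            continue  # same instruction already kept
        seen.add(dedup_key)
        cleaned.append({k: v.strip() for k, v in zip(REQUIRED_KEYS, values)})
    return cleaned

def write_jsonl(path: str, rows: list[dict]) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

def read_jsonl(path: str) -> list[dict]:
    """Read a JSONL file back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Made-up records: one valid, one duplicate, two malformed
raw = [
    {"instruction": "Identify the risk in clause 4.", "response": "Uncapped liability."},
    {"instruction": "identify the risk in clause 4.", "response": "Duplicate."},
    {"instruction": "Missing response"},
    {"instruction": "", "response": "Empty instruction"},
]
rows = clean_examples(raw)
path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")
write_jsonl(path, rows)
print(f"kept {len(rows)} of {len(raw)} examples")
```

In the lab, the same filter could be applied to `result["examples"]` before the export loop; de-duplicating on the lowercased instruction is only one possible policy.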


##  Lab 9.2: Production Streaming API
**Aim**: To build a production-ready, asynchronous API service using FastAPI that supports token-by-token streaming for real-time user experiences.

**Explanation**:
This lab focuses on the engineering requirements of deployment:
1.  **FastAPI Integration**: We create a RESTful endpoint that accepts user prompts and returns a `StreamingResponse`.
2.  **SSE (Server-Sent Events)**: The backend uses Python async generators to yield tokens as they arrive from the LLM, framing each one as a `data:` event and keeping the HTTP connection open until the response is complete.
3.  **Latency Optimization**: Streaming does not make generation faster, but the user sees output as soon as the first token arrives, so perceived latency drops from the total generation time to the "Time To First Token" (TTFT).
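
To make the wire format concrete, here is a small sketch of SSE framing and the matching client-side decoding. In the SSE format, events are separated by a blank line and a multi-line payload is sent as repeated `data:` fields. The helper names `sse_event` and `parse_sse_events` are hypothetical, and the `[DONE]` sentinel is a convention of this lab rather than part of SSE itself.

```python
def sse_event(text: str) -> str:
    """Encode text as one SSE event: one 'data:' field per line, then a blank line."""
    return "".join(f"data: {line}\n" for line in text.split("\n")) + "\n"

def parse_sse_events(wire: str) -> list[str]:
    """Decode a complete SSE byte stream (already decoded to str) into payloads."""
    events = []
    for block in wire.split("\n\n"):
        data_lines = [
            # Drop the field name and at most one leading space
            line[len("data:"):].removeprefix(" ")
            for line in block.split("\n")
            if line.startswith("data:")
        ]
        if data_lines:
            events.append("\n".join(data_lines))
    return events

tokens = ["Hello", " world", "line1\nline2", "[DONE]"]
wire = "".join(sse_event(t) for t in tokens)
print(repr(wire))
print(parse_sse_events(wire))
```

A real client would parse incrementally as bytes arrive instead of splitting a finished string; the framing rules are the same.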

In [None]:
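# --- Lab 9.2: Production Streaming API ---
from dotenv import load_dotenv
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
import ollama
import uvicorn
from langchain_google_genai import ChatGoogleGenerativeAI

# Repeated here so the code also runs standalone when saved as main.py
load_dotenv(override=True)
MODEL = "gemini-1.5-flash"
LOCAL_MODEL = "gemma2:2b"

app = FastAPI()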

class ChatRequest(BaseModel):
    prompt: str
    stream: bool = True
    provider: str = "cloud" # "cloud" or "local"

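def sse_event(text: str) -> str:
    # SSE events end with a blank line; a payload containing newlines must be
    # sent as one "data:" field per line, otherwise the framing breaks.
    return "".join(f"data: {line}\n" for line in text.split("\n")) + "\n"

async def generate_gemini_stream(prompt):
    # Using the standardized Gemini 1.5 Flash
    llm = ChatGoogleGenerativeAI(model=MODEL)
    async for chunk in llm.astream(prompt):
        if chunk.content:  # skip empty chunks
            yield sse_event(str(chunk.content))
    yield sse_event("[DONE]")

async def generate_ollama_stream(prompt):
    # Using the standardized local model Gemma 2.
    # AsyncClient keeps the event loop free; the synchronous ollama.generate()
    # would block all other requests while tokens are being produced.
    stream = await ollama.AsyncClient().generate(model=LOCAL_MODEL, prompt=prompt, stream=True)
    async for chunk in stream:
        if chunk["response"]:
            yield sse_event(chunk["response"])
    yield sse_event("[DONE]")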

@app.post("/chat/stream")
async def chat_stream(request: ChatRequest):
    """
    Production-ready endpoint for real-time LLM interaction
    """
    if request.provider == "cloud":
        return StreamingResponse(generate_gemini_stream(request.prompt), media_type="text/event-stream")
    else:
        return StreamingResponse(generate_ollama_stream(request.prompt), media_type="text/event-stream")

print("🔌 Streaming API Service Defined.")
print(f"Cloud Model: {MODEL}")
print(f"Local Model: {LOCAL_MODEL}")
print("\nInstructions to run this service:")
print("1. Save this code to a file named 'main.py'")
print("2. Run 'uvicorn main:app --reload' in your terminal")
print("3. Test with: curl -N -X POST http://localhost:8000/chat/stream -H 'Content-Type: application/json' -d '{\"prompt\": \"Hello\"}'")
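
To check the latency claim from the explanation, one can time the first chunk separately from the full response. The sketch below does this against a simulated token stream, so it runs without the API or any model. `fake_token_stream` and `measure_stream` are hypothetical names, and the 10 ms delay is an arbitrary stand-in for real generation time.

```python
import asyncio
import time

async def fake_token_stream(tokens, delay=0.01):
    """Simulated LLM stream: yields one token after each delay."""
    for tok in tokens:
        await asyncio.sleep(delay)
        yield tok

async def measure_stream(stream):
    """Return (time_to_first_token, total_time, text) for any async token iterator."""
    start = time.perf_counter()
    ttft = None
    parts = []
    async for tok in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # first token has arrived
        parts.append(tok)
    total = time.perf_counter() - start
    return ttft, total, "".join(parts)

ttft, total, text = asyncio.run(
    measure_stream(fake_token_stream(["Hel", "lo", "!"] * 5))
)
print(f"TTFT: {ttft:.3f}s, total: {total:.3f}s, chars: {len(text)}")
```

Inside a notebook, where an event loop is already running, replace `asyncio.run(...)` with `await measure_stream(...)`. The same `measure_stream` function should work with any async iterator of text chunks, such as one built on a streaming HTTP client.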


##  Lab 9.3: Capstone Project: The Autonomous Multi-Modal Researcher

**Aim**: To architect and deploy a production-ready, agentic system that integrates RAG, LangGraph-based reasoning, and multi-modal fallback capabilities.

**Explanation**:
This capstone is the culmination of the curriculum, requiring the integration of:
1. **Dynamic Ingestion**: Automated monitoring and parsing of document folders.
2. **Agentic Reasoning**: A LangGraph state machine that decides between retrieval and web research.
3. **Production APIs**: Deployment via FastAPI and Gradio with real-time thought visualization.
4. **Automated Evaluation**: A separate model acting as a judge to verify the accuracy and completeness of generated reports.
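
The routing decision in item 2 can be sketched as a tiny state machine. This is a plain-Python stand-in, not LangGraph's API: LangGraph expresses the same idea with nodes that update a shared state and conditional edges that pick the next node. All names below (`ResearchState`, `LOCAL_DOCS`, `run_agent`, and the node functions) are hypothetical, and the web step is only a placeholder.

```python
from typing import Optional, TypedDict

class ResearchState(TypedDict):
    question: str
    route: str
    evidence: list[str]
    report: str

# Made-up local knowledge base keyed by topic keyword
LOCAL_DOCS = {
    "lora": "LoRA trains small low-rank adapter matrices instead of all weights.",
}

def route(state: ResearchState) -> ResearchState:
    """Decide whether local documents can answer the question."""
    hit = any(key in state["question"].lower() for key in LOCAL_DOCS)
    return {**state, "route": "retrieve" if hit else "web"}

def retrieve(state: ResearchState) -> ResearchState:
    """Collect matching local documents as evidence."""
    q = state["question"].lower()
    return {**state, "evidence": [t for k, t in LOCAL_DOCS.items() if k in q]}

def web_research(state: ResearchState) -> ResearchState:
    """Placeholder: a real agent would call a search tool here."""
    return {**state, "evidence": [f"(web search placeholder for: {state['question']})"]}

def write_report(state: ResearchState) -> ResearchState:
    """Turn the collected evidence into the final report text."""
    return {**state, "report": " ".join(state["evidence"])}

NODES = {"route": route, "retrieve": retrieve, "web": web_research, "report": write_report}

def next_node(current: str, state: ResearchState) -> Optional[str]:
    """Edges of the graph: the router branches, both branches join at the report."""
    if current == "route":
        return state["route"]
    if current in ("retrieve", "web"):
        return "report"
    return None  # "report" is the terminal node

def run_agent(question: str) -> ResearchState:
    state: ResearchState = {"question": question, "route": "", "evidence": [], "report": ""}
    node: Optional[str] = "route"
    while node is not None:
        state = NODES[node](state)
        node = next_node(node, state)
    return state

print(run_agent("How does LoRA work?")["report"])
print(run_agent("Latest GPU prices")["route"])
```

Keeping every node a pure function from state to state makes each step easy to test in isolation, which also helps when a judge model is later added as one more node.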

---

##  Instructor's Evaluation & Lab Summary

###  Assessment Criteria
1. **Technical Implementation**: Adherence to the lab objectives and code functionality.
2. **Logic & Reasoning**: Clarity in the explanation of the underlying AI principles.
3. **Best Practices**: Use of secure environment variables and structured prompts.

**Lab Completion Status: Verified**
**Focus Area**: Language Modelling & Deep Learning Systems.