# Gen-AI — Assessment

This assessment aligns with the notebooks in `Gen-AI/Streamlit/` (RAG, embeddings, chatbots, translator, resume-parser, multimodal, etc.). It focuses on embeddings, similarity search, chunking, simple retrieval, prompt design basics, and evaluation.

Total questions: 25 (10 Theory, 8 Fill-in-the-Blanks, 7 Coding). Difficulty mix: 40% easy, 40% medium, 20% hard.


## Instructions
- Answer all questions.
- Coding tasks avoid heavy dependencies; we use NumPy and pure Python to simulate retrieval.
- Solutions at the bottom.


## References
- All notebooks under `Gen-AI/Streamlit/` (e.g., `streamlit-rag.ipynb`, `streamlit-embeddings.ipynb`, `streamlit-chatbot.ipynb`, `streamlit-multimodal.ipynb`)


## Part A — Theory (10)
1. Define an embedding. Why are embeddings useful for retrieval?
2. MCQ: Which similarity is commonly used with normalized embeddings? (a) L1 distance (b) cosine similarity (c) Jaccard distance (d) Hamming distance
3. Explain chunking in RAG and its tradeoff between chunk size and recall/precision.
4. What is a vector store? Mention two examples of indexes or libraries.
5. MCQ: Which mitigates prompt injection best? (a) ignore it (b) restrict tool scope and sanitize inputs (c) longer prompts (d) higher temperature
6. Describe the retrieval step in RAG and how re-ranking can improve results.
7. What is the difference between system, user, and assistant messages in chat prompting?
8. Explain context window limits and strategies to fit more context.
9. MCQ: Which helps reduce hallucinations? (a) increase temperature (b) decrease top_p (c) ground answers with retrieved context and cite sources (d) no change
10. What are evaluation strategies for RAG beyond accuracy? Name two.


## Part B — Fill in the Blanks (8)
1. Cosine similarity ranges from ______ to ______ for real-valued vectors.
2. Normalizing vectors to unit length is called L2 __________.
3. Splitting documents into smaller parts is called __________.
4. In RAG, the generator is conditioned on retrieved __________.
5. A hybrid retriever combines sparse and __________ retrieval.
6. To limit long-context costs, we can do __________ re-ranking and pass only the top results to the generator.
7. Prompt templates often contain named __________ that are filled at runtime.
8. Storing embeddings and metadata enables filtered __________.


## Part C — Coding Tasks (7)
Implement the functions below with NumPy and pure Python, then run the asserts.

Tasks:
1. `l2_normalize(X)` — row-wise L2 normalize a 2D array.
2. `cosine_sim_matrix(A, B)` — return cosine similarity matrix [len(A), len(B)].
3. `simple_chunk(text, max_len, overlap)` — split text into chunks of up to `max_len` characters, with consecutive chunks sharing `overlap` characters (characters approximate tokens).
4. `top_k_sim(query_vec, doc_vecs, k)` — return indices of top-k most similar documents by cosine.
5. `retrieve(query, docs, embed_fn, k)` — embed query and docs using `embed_fn(text)->vec`; return top-k docs by cosine.
6. `rerank(query, candidates, score_fn)` — given initial candidate docs, return them sorted by `score_fn(query, doc)` descending.
7. `fill_template(tpl, values)` — replace `{name}` placeholders in `tpl` with `values[name]`; leave unknown placeholders intact.


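```python
import re

import numpy as np


def l2_normalize(X: np.ndarray) -> np.ndarray:
    """Scale each row of a 2D array to unit L2 norm; all-zero rows stay zero."""
    X = np.asarray(X, float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms = np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    return X / norms


def cosine_sim_matrix(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Return the [len(A), len(B)] matrix of pairwise cosine similarities."""
    return l2_normalize(A) @ l2_normalize(B).T


def simple_chunk(text: str, max_len: int, overlap: int) -> list:
    """Split text into chunks of up to max_len characters that share `overlap` characters."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    step = max(1, max_len - overlap)  # always advance, even if overlap >= max_len
    chunks = []
    i = 0
    n = len(text)
    while i < n:
        chunks.append(text[i:i + max_len])
        if i + max_len >= n:
            break  # this chunk reaches the end; another would only repeat its tail
        i += step
    return chunks


def top_k_sim(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int) -> list:
    """Return the indices of the k documents most similar to query_vec, best first."""
    query_vec = np.asarray(query_vec, float)
    sims = cosine_sim_matrix(query_vec[None, :], doc_vecs)[0]
    k = min(k, len(sims))  # argpartition raises if k exceeds the number of documents
    if k <= 0:
        return []
    idx = np.argpartition(-sims, k - 1)[:k]  # top-k indices, not yet ordered
    return [int(i) for i in idx[np.argsort(-sims[idx])]]


def retrieve(query: str, docs: list, embed_fn, k: int = 3) -> list:
    """Embed the query and every doc with embed_fn; return the top-k docs by cosine."""
    if not docs:
        return []  # np.vstack raises on an empty list
    qv = np.asarray(embed_fn(query), float)
    dv = np.vstack([embed_fn(d) for d in docs])
    return [docs[i] for i in top_k_sim(qv, dv, k)]


def rerank(query: str, candidates: list, score_fn) -> list:
    """Return candidates sorted by score_fn(query, doc), highest first; ties keep input order."""
    return sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)


def fill_template(tpl: str, values: dict) -> str:
    """Replace {name} placeholders with values[name]; unknown placeholders stay as-is."""
    def repl(m):
        key = m.group(1)
        return str(values.get(key, m.group(0)))
    return re.sub(r"\{([a-zA-Z_][a-zA-Z0-9_]*)\}", repl, tpl)
```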
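
Task 3 approximates tokens by characters, which can cut a word in half. As a rough alternative, the sketch below splits on whitespace instead. The function name `chunk_by_words` and its parameters are hypothetical and not part of the assessment, and real tokenizers split text differently from `str.split`.

```python
def chunk_by_words(text: str, max_words: int, overlap_words: int) -> list:
    """Split text into chunks of up to max_words words; neighbors share overlap_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()  # assumption: whitespace-separated words stand in for tokens
    step = max(1, max_words - overlap_words)  # always move forward
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = chunk_by_words("the quick brown fox jumps over the lazy dog", 4, 1)
print(chunks)
# ['the quick brown fox', 'fox jumps over the', 'the lazy dog']
```

Each chunk starts with the last word of the previous one, which is the word-level analogue of the character overlap in `simple_chunk`.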


```python
# Asserts
X = np.array([[3, 4], [0, 0]], float)
LX = l2_normalize(X)
assert np.allclose(np.linalg.norm(LX[0]), 1.0)
assert np.allclose(LX[1], [0, 0])  # zero rows are left unchanged

A = np.array([[1, 0], [0, 1]], float)
B = np.array([[1, 0], [1, 1]], float)
S = cosine_sim_matrix(A, B)
assert S.shape == (2, 2)
assert S[0, 0] > S[0, 1]
assert np.isclose(S[0, 1], 1 / np.sqrt(2))  # 45 degrees between [1, 0] and [1, 1]

txt = "abcdefghij"
chs = simple_chunk(txt, 4, 2)
assert chs[0] == 'abcd' and chs[1].startswith('cd')

q = np.array([1.0, 0.0])
D = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
top = top_k_sim(q, D, 2)
assert top[0] == 0
assert top == [0, 2]  # results are ordered best first

def toy_embed(s: str):
    # simple 2D: [count of vowels, count of consonants]
    v = sum(c.lower() in 'aeiou' for c in s)
    c = sum(c.isalpha() and c.lower() not in 'aeiou' for c in s)
    return np.array([v, c], float)

docs = ['alpha', 'beta', 'queue']
ret = retrieve('aei', docs, toy_embed, 2)
assert len(ret) == 2
assert ret == ['queue', 'beta']  # the all-vowel query is closest to vowel-heavy docs

rer = rerank('z', docs, lambda q, d: -abs(len(d) - 3))
assert isinstance(rer, list)
assert rer[0] == 'beta'  # length 4 is closest to 3

out = fill_template('Hello {name}, today is {day}.', {'name': 'Sam'})
assert out.startswith('Hello Sam') and '{day}' in out

print('Gen-AI asserts passed ✅')
```
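
The pieces exercised above can be combined into a small retrieve-then-prompt flow. The following is a minimal, self-contained sketch and not the notebooks' implementation: `tokenize`, `make_bow_embedder`, and `build_prompt` are hypothetical helpers, and the bag-of-words vectors merely stand in for a real embedding model.

```python
import re

import numpy as np


def tokenize(text: str) -> list:
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def make_bow_embedder(corpus: list):
    """Build a toy embed(text) function: unit-length word counts over the corpus vocabulary."""
    vocab = {w: i for i, w in enumerate(sorted({w for t in corpus for w in tokenize(t)}))}

    def embed(text: str) -> np.ndarray:
        vec = np.zeros(len(vocab))
        for w in tokenize(text):
            if w in vocab:  # words outside the vocabulary are ignored
                vec[vocab[w]] += 1.0
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec

    return embed


def build_prompt(question: str, chunks: list, embed, k: int = 2) -> str:
    """Retrieve the k most similar chunks and place them in a grounded prompt."""
    q = embed(question)
    M = np.vstack([embed(c) for c in chunks])
    sims = M @ q  # vectors are unit length, so the dot product is the cosine similarity
    top = np.argsort(-sims)[:k]
    context = "\n".join(f"[{n}] {chunks[i]}" for n, i in enumerate(top, start=1))
    return (
        "Answer using only the context below and cite sources by number.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


chunks = [
    "FAISS is a library for nearest-neighbor search over dense vectors.",
    "Streamlit turns Python scripts into web apps.",
    "Cosine similarity compares the angle between two vectors.",
]
embed = make_bow_embedder(chunks)
prompt = build_prompt("What is cosine similarity?", chunks, embed, k=1)
print(prompt)
```

With `k=1`, only the chunk about cosine similarity reaches the prompt, which keeps the generator's context short and grounded in the most relevant passage.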


## Solutions

### Theory (sample)
1. Embedding: numeric vector capturing semantics; enables similarity search.
2. (b) cosine similarity
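3. Chunking splits documents into smaller passages before embedding. Larger chunks preserve more context but dilute the embedding (lower precision) and use more of the context window; smaller chunks match queries more precisely but may lose surrounding context.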
4. Vector store indexes embeddings and metadata; examples: FAISS, Annoy, Milvus, Elasticsearch (dense vectors).
5. (b) Principle of least privilege; sanitize input; constrain tools.
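6. Embed the query, retrieve the top-k most similar chunks, then re-rank them with a relevance model or exact-match signals so the best candidates reach the generator first.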
7. System sets behavior, user asks, assistant answers.
8. Summarize, retrieve selectively, compress, use shorter templates.
9. (c)
10. Faithfulness (answer groundedness in the retrieved context), citation correctness, latency, cost.
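
Answer 8 mentions retrieving selectively. One simple way to do that, sketched here under the assumption that whitespace-separated words approximate tokens, is to keep the highest-scoring chunks greedily until a budget is used up. `pack_context`, `count_tokens`, and the example scores are illustrative only.

```python
def pack_context(chunks_with_scores: list, max_tokens: int, count_tokens=None):
    """Greedily keep the highest-scoring chunks whose total size fits within max_tokens."""
    if count_tokens is None:
        # Assumption: word count is a rough proxy for the model's real token count.
        count_tokens = lambda s: len(s.split())
    kept, used = [], 0
    for chunk, _score in sorted(chunks_with_scores, key=lambda cs: cs[1], reverse=True):
        cost = count_tokens(chunk)
        if used + cost <= max_tokens:  # skip chunks that would exceed the budget
            kept.append(chunk)
            used += cost
    return kept, used


scored = [
    ("RAG retrieves chunks before generating an answer.", 0.91),
    ("Streamlit renders widgets from plain Python scripts and reruns the script on each interaction.", 0.40),
    ("Re-ranking reorders retrieved candidates by relevance.", 0.85),
]
kept, used = pack_context(scored, max_tokens=15)
print(kept, used)
```

Here the two highest-scoring chunks fit (7 + 6 = 13 words) and the long low-scoring one is skipped. A production system would likely swap in the model's own tokenizer for `count_tokens`.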

### Fill blanks
1. -1, 1
2. normalization
3. chunking
4. context
5. dense
6. candidate
7. placeholders
8. retrieval
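
Answer 5 refers to hybrid retrieval. A minimal sketch, assuming a weighted sum of a dense cosine score and a sparse keyword-overlap score (one of several possible fusion schemes), might look like this. `sparse_score`, `hybrid_rank`, and `alpha` are hypothetical names, and the 2D vectors are made-up stand-ins for real embeddings.

```python
import numpy as np


def sparse_score(query: str, doc: str) -> float:
    """Fraction of distinct query words that also occur in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def hybrid_rank(query: str, docs: list, query_vec, doc_vecs, alpha: float = 0.5) -> list:
    """Rank docs by alpha * dense cosine + (1 - alpha) * sparse keyword overlap."""
    q = np.asarray(query_vec, float)
    D = np.asarray(doc_vecs, float)
    q_norm = np.linalg.norm(q) or 1.0  # guard against an all-zero query vector
    d_norms = np.linalg.norm(D, axis=1)
    d_norms = np.where(d_norms == 0, 1.0, d_norms)  # guard against all-zero doc vectors
    dense = (D @ q) / (d_norms * q_norm)
    sparse = np.array([sparse_score(query, d) for d in docs])
    combined = alpha * dense + (1 - alpha) * sparse
    order = np.argsort(-combined)
    return [(docs[i], float(combined[i])) for i in order]


docs = ["vector index lookup", "keyword search engine", "semantic vector search"]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # made-up embeddings
ranked = hybrid_rank("vector search", docs, [1.0, 0.0], doc_vecs, alpha=0.5)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```

With `alpha=0.5` the document that matches both query words ranks first even though its dense score is lower; with `alpha=1.0` the ranking falls back to dense similarity alone.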