# AI Tutor App – Full Pipeline Notebook

This notebook demonstrates a full prototype workflow for the **AI-Tutor-App** project:

1. Clean text (general and math-aware)
2. Generate embeddings (sentence-transformers) with a deterministic fallback
3. Build a simple file-backed vector store
4. Perform semantic retrieval
5. Produce a placeholder generated explanation (RAG-style)

**Notes for reviewers:** this notebook uses deterministic fallbacks if `sentence-transformers` is unavailable so the retrieval flow can be reviewed without heavy dependencies.

## Setup

Before running cells that call the embedding model, install dependencies locally:

```
pip install -r ../requirements.txt
```

If `sentence-transformers` is not available, the notebook will fall back to deterministic vectors.

In [None]:
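import os,json,sys
import numpy as np

# Assumption: the project packages live in the repo root, one level above this
# notebook (as ../requirements.txt suggests), so put that directory on sys.path.
sys.path.insert(0,os.path.abspath(".."))
from preprocessing.text_cleaning import clean_text
from preprocessing.math_cleaning import normalize_math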

def print_retrieved(docs):
    for i,(txt,score) in enumerate(docs,1):
        print(f"{i}. (score={score:.3f}) {txt[:300]}")

## Sample data (raw exam-style questions)

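The five questions below simulate a small question bank. Each one is passed through `clean_text` and `normalize_math`, and the cleaned versions are embedded in the next section.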

In [None]:
raw_texts=[
"Solve: 2×(3 + 4) = ?",
"Find the LCM of 12 and 18.",
"Differentiate: d/dx x^2.",
"What is the area of a circle with radius 7 cm?",
"Simplify: (x^2 - 1)/(x - 1)"
]
cleaned_texts=[clean_text(t) for t in raw_texts]
math_normalized_texts=[normalize_math(t) for t in raw_texts]
print("Cleaned samples:")
for ct in cleaned_texts:
    print('-',ct)
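
The implementation of `normalize_math` is not shown in this notebook. As a rough illustration only, math-aware cleaning might map Unicode operators to ASCII and tidy whitespace. The function name `normalize_math_sketch` and its replacement table are assumptions, not the project's actual rules.

```python
import re

# Hypothetical replacement table: Unicode math symbols -> ASCII equivalents.
_SYMBOLS = {"×": "*", "÷": "/", "−": "-", "≤": "<=", "≥": ">="}

def normalize_math_sketch(text: str) -> str:
    """Illustrative math-aware cleanup; NOT the project's normalize_math."""
    for src, dst in _SYMBOLS.items():
        text = text.replace(src, dst)
    # Collapse runs of whitespace into single spaces and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

print(normalize_math_sketch("Solve: 2×(3 + 4) = ?"))
```

Normalizing symbols such as `×` before embedding means the same question is less likely to be split across visually identical but differently encoded strings.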

## Embedding generation

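Use `sentence-transformers` if available; otherwise fall back to deterministic pseudo-random vectors. The fallback vectors keep the pipeline runnable but carry no semantic meaning, so retrieval rankings in fallback mode are arbitrary.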

In [None]:
def try_load_sentence_transformer(model_name="sentence-transformers/all-MiniLM-L6-v2"):
    try:
        from sentence_transformers import SentenceTransformer
        return SentenceTransformer(model_name)
    except Exception as e:
        print("Warning: sentence-transformers not available:",e)
        return None

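def embed_texts_safe(texts,model=None,dim_fallback=384,seed=42):
    """Embed texts with the model, or fall back to deterministic pseudo-random vectors."""
    if model is not None:
        try:
            return np.array(model.encode(texts,show_progress_bar=False))
        except Exception as e:
            print("Model failed, falling back:",e)
    # Fallback: seed each vector from the text itself. A single fixed seed would
    # give every call the same rows, so a query would always match document 0.
    import hashlib
    rows=[]
    for t in texts:
        digest=hashlib.sha256(t.encode("utf-8")).digest()
        rng=np.random.default_rng([seed,int.from_bytes(digest[:8],"big")])
        rows.append(rng.standard_normal(dim_fallback))
    return np.array(rows).reshape(len(texts),dim_fallback)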

_model=try_load_sentence_transformer()
embeddings=embed_texts_safe(cleaned_texts,model=_model)
print("Embeddings shape:",embeddings.shape)

## Save embeddings + vector store

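Vectors are saved to `<base_path>.npy` and the matching texts to `<base_path>_texts.json`. Row `i` of the array corresponds to entry `i` of the text list.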

In [None]:
def save_vector_store(vectors,texts,base_path="./data/metadata/vector_store"):
    # dirname is "" for a bare filename; fall back to the current directory.
    os.makedirs(os.path.dirname(base_path) or ".",exist_ok=True)
    np.save(base_path+".npy",np.array(vectors))
    with open(base_path+"_texts.json","w",encoding="utf-8") as f:
        json.dump(list(texts),f,indent=2)

def load_vector_store(base_path="./data/metadata/vector_store"):
    vectors=np.load(base_path+".npy")
    with open(base_path+"_texts.json","r",encoding="utf-8") as f:
        texts=json.load(f)
    return np.array(vectors),texts

store_path="./data/metadata/vector_store"
save_vector_store(embeddings,cleaned_texts,store_path)
vectors,texts=load_vector_store(store_path)
print("Loaded vectors:",vectors.shape,"Loaded texts:",len(texts))

## Retriever (Cosine Similarity)

In [None]:
def cosine_similarity_vec(a,b):
    a=np.asarray(a);b=np.asarray(b)
    denom=(np.linalg.norm(a)*np.linalg.norm(b))+1e-12
    return float(np.dot(a,b)/denom)

class SimpleRetriever:
    def __init__(self,vectors,texts):
        self.vectors=np.asarray(vectors)
        self.texts=list(texts)
    def retrieve(self,query_vec,top_k=5):
        # Epsilon guards against division by zero for all-zero rows or queries.
        norms=np.linalg.norm(self.vectors,axis=1)*np.linalg.norm(query_vec)
        sims=np.dot(self.vectors,query_vec)/(norms+1e-12)
        idxs=np.argsort(-sims)[:top_k]
        return [(self.texts[int(i)],float(sims[int(i)])) for i in idxs]

retriever=SimpleRetriever(vectors,texts)
query="What is the least common multiple of 12 and 18?"
q_emb=embed_texts_safe([query],model=_model)[0]
docs=retriever.retrieve(q_emb,top_k=3)
print("Top retrieved:")
print_retrieved(docs)
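
Because the sample store is tiny, it can help to sanity-check cosine ranking on vectors whose answer is obvious. The sketch below is self-contained: `top_k_cosine` is a hypothetical helper that mirrors what `SimpleRetriever.retrieve` computes, and the toy vectors are made up for illustration.

```python
import numpy as np

def top_k_cosine(vectors, query_vec, top_k=2, eps=1e-12):
    """Return (index, similarity) pairs for the top_k rows most similar to query_vec."""
    vectors = np.asarray(vectors, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float)
    # Cosine similarity = dot product divided by the product of the norms.
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query_vec)
    sims = vectors @ query_vec / (norms + eps)
    order = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in order]

# Made-up 2-D vectors: row 0 points the same way as the query.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = top_k_cosine(toy_vectors, [2.0, 0.0], top_k=2)
print(results)
```

Cosine similarity ignores vector length, so the query `[2, 0]` matches row 0 (`[1, 0]`) with a similarity of about 1, followed by row 2 at roughly 0.707.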

## Generation (placeholder)

The placeholder generator below concatenates the retrieved contexts and appends a fixed placeholder note; it does not yet use the query. Replace it with an LLM call for production.

In [None]:
def generate_placeholder_explanation(query,retrieved_docs):
    context="\n\n".join([f"- {d[:300]}" for d,_ in retrieved_docs])
    explanation=f"Context:\n{context}\n\nExplanation:\n(This is a placeholder. Replace with LLM output for production.)"
    return explanation

print(generate_placeholder_explanation(query,docs))
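
When the placeholder is replaced by an LLM call, the retrieved contexts typically become part of the prompt. The following is a minimal sketch of that step, assuming the same `(text, score)` tuples returned by the retriever. `build_rag_prompt` and the prompt wording are hypothetical, and no model is actually called.

```python
def build_rag_prompt(query, retrieved_docs, max_chars=300):
    """Assemble a prompt string from a query and (text, score) pairs."""
    # Number each context so an answer could cite it, truncating long texts.
    context_lines = [
        f"[{i}] {text[:max_chars]}"
        for i, (text, _score) in enumerate(retrieved_docs, 1)
    ]
    context = "\n".join(context_lines)
    return (
        "Use the context below to explain the answer step by step.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Explanation:"
    )

# Made-up scores for illustration only.
example_docs = [("Find the LCM of 12 and 18.", 0.91), ("Solve: 2*(3 + 4) = ?", 0.12)]
prompt = build_rag_prompt("What is the least common multiple of 12 and 18?", example_docs)
print(prompt)
```

Keeping prompt assembly in its own function makes it possible to inspect exactly what the model would see before wiring in any API.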

## Evaluation notes

For retrieval, report Recall@k and mean reciprocal rank (MRR). For generation quality, use human evaluation, or BLEU/ROUGE when reference answers exist.

In [None]:
try:
    from evaluation.metrics import recall_at_k,retrieval_mrr
    ex_true=[1,0,2]
    ex_rank=[[1,2,0],[0,2,1],[2,0,1]]
    print("Recall@2 demo:",recall_at_k(ex_true,ex_rank,k=2))
    print("MRR demo:",retrieval_mrr(ex_true,ex_rank))
except Exception as e:
    print("Evaluation import failed:",e)
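
The signatures of `recall_at_k` and `retrieval_mrr` in `evaluation.metrics` are not shown here. The following is a minimal sketch of the two metrics, assuming one relevant document index per query and one ranked list of indices per query. The `_sketch` names are hypothetical.

```python
def recall_at_k_sketch(true_ids, rankings, k):
    """Fraction of queries whose relevant id appears in the top k results."""
    hits = sum(1 for t, ranked in zip(true_ids, rankings) if t in ranked[:k])
    return hits / len(true_ids)

def mrr_sketch(true_ids, rankings):
    """Mean of 1/rank of the relevant id (contributes 0 when it is missing)."""
    total = 0.0
    for t, ranked in zip(true_ids, rankings):
        if t in ranked:
            total += 1.0 / (ranked.index(t) + 1)
    return total / len(true_ids)

true_ids = [1, 0, 2]
rankings = [[1, 2, 0], [2, 0, 1], [0, 1, 2]]  # relevant id at ranks 1, 2, 3
print(recall_at_k_sketch(true_ids, rankings, k=2))
print(mrr_sketch(true_ids, rankings))
```

With one relevant document per query, Recall@k is the hit rate within the top k, while MRR also rewards placing the hit higher in the list.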

## Command-line demo script note

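The cell below writes `run_demo.py` to the repo root (`../run_demo.py` relative to this notebook). Run it from the repo root to embed two sample questions and save them to the vector store.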

In [None]:
demo_code='''from embeddings.generate_embeddings import load_model, embed_texts
from embeddings.vector_store import save_vectors, load_vectors
from rag.retriever import Retriever
from rag.generator import generate_explanation
from preprocessing.text_cleaning import clean_text
from preprocessing.math_cleaning import normalize_math

texts=["Solve: 2x + 5 = 11.","Find the LCM of 12 and 18."]
model=None
try:
    model=load_model()
    embs=embed_texts([clean_text(t) for t in texts],model)
except Exception:
    import numpy as _np
    rng=_np.random.default_rng(42)
    embs=rng.standard_normal((len(texts),384))

save_vectors(embs,[clean_text(t) for t in texts],'./data/metadata/vector_store')
print("Run demo complete.")'''
with open('../run_demo.py','w',encoding='utf-8') as f:
    f.write(demo_code)
print("Wrote ../run_demo.py")

## Closing notes & citation

```bibtex
@misc{jimogbolo2025aitutor,
  title={AI-Tutor-App: An Intelligent Tutoring System for WAEC/NECO/JAMB Exams},
  author={Nitoni Jim-Ogbolo},
  year={2025},
  url={https://github.com/nitoni-jim/ai-tutor-app}
}
```