# Exploring the Capabilities of LLMs

In this notebook, I aim to evaluate and compare the capabilities of two large language models (LLMs):

1. **Codestral22B**
   A 22-billion-parameter model from Mistral AI built primarily for code generation and completion.

2. **Llama 3.1-8B**
   A compact 8-billion-parameter model from Meta aimed at general-purpose language tasks.

The goal is to analyze their performance across various tasks, including but not limited to:

- Code generation and completion
- Natural language understanding
- Contextual reasoning
- Problem-solving capabilities

This comparison will help identify the strengths and weaknesses of each model and provide insights into their practical applications.

# AI-Powered Programming Tutor with RAG

This project focuses on building an AI-powered programming tutor designed to assist students in understanding code and solving problems. The tutor leverages **Retrieval-Augmented Generation (RAG)** to provide accurate and personalized explanations grounded in real university materials, such as:

- Past assignments
- Lecture notes
- Tutorials

The system integrates two large language models (LLMs), **Codestral22B** and **Llama 3.1-8B**, to evaluate their performance in generating solutions and explanations for programming-related queries. The goal is to determine which model provides better support for students in a university setting.
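
As a rough illustration of the retrieve-then-generate flow described above, the sketch below ranks a few course snippets by word overlap with the question and pastes the best ones into a prompt. The snippet texts, the `tokenize`, `retrieve`, and `build_prompt` helpers, and the overlap score are all hypothetical stand-ins; the notebook itself uses sentence embeddings and a FAISS index in the later steps.

```python
import re

# Hypothetical course snippets standing in for assignments, notes, and tutorials.
MATERIALS = [
    "Lecture 3: binary search halves the search interval of a sorted list.",
    "Tutorial 5: recursion needs a base case to stop.",
    "Assignment 2: implement a stack using a Python list.",
]

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, materials, k=2):
    """Return the k snippets sharing the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(materials, key=lambda m: len(q_words & tokenize(m)), reverse=True)
    return ranked[:k]

def build_prompt(question, context):
    """Place the retrieved snippets above the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Use the course material below to answer.\n{joined}\n\nQuestion: {question}"

question = "How does binary search work on a sorted list?"
top = retrieve(question, MATERIALS)
print(build_prompt(question, top))
```

The resulting prompt is what would be sent to either model, so both are compared on the same retrieved context.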

---

## Key Features

- **Personalized Explanations**: Tailored responses based on retrieved university materials.
- **Code Understanding**: Helps students debug and understand code snippets.
- **Problem Solving**: Provides step-by-step solutions to programming problems.
- **Model Comparison**: Evaluates the performance of Codestral22B and Llama 3.1-8B.

---



### Implementing and Testing Codestral22B and Llama 3.1-8B

In [1]:
# from datetime import datetime
# import json
# from datetime import datetime
# from pprint import pprint
# from os.path import exists
#
# import requests
#
# #API endpoint exposed in LM Studio
# url = "http://localhost:1234/v1/chat/completions"
#
# #model ID
# model_id = "meta-llama-3.1-8b-instruct"
#
# headers={
#     "Content-Type" : "application/json",
#     "Authorization" :"Bearer lm-studio" #Dummy API key
# }
#
# # messages: [ #keep conversation history
# #                 {"role":"user", #what you type, only sends current prompt
# #                  "content":user_input}
# #             ]
#
# #Keep the message history
#
# #History file path, to keep conversation
# history_file = "chat-history.json"
# def save_history(messages):
#     # Load existing history if the file exists
#     if exists(history_file):
#         with open(history_file, "r", encoding="utf-8") as f:
#             full_history = json.load(f)
#             if isinstance(full_history, list):
#                 pass
#             else:
#                 full_history = [full_history]
#     else:
#         full_history = []
#
#     # Add this session with timestamp
#     full_history.append({
#         "timestamp": datetime.now().isoformat(),
#         "conversation": messages
#     })
#
#     # Save the full conversation list
#     with open(history_file, "w", encoding="utf-8") as f:
#         json.dump(full_history, f, indent=4, ensure_ascii=False)
#
#
# #Prompt loop
# def chat():
#     print(" Talk to LLaMA 3.1 (type 'exit' to quit)\n")
#     messages = [
#     {"role": "system", #Sets the initial behavior, the text below
#      "content": "You are a helpful programming tutor."}
# ] #Messages reset each time
#
#     while True:
#         user_input = input(" You: ")
#         if user_input.lower() == "exit":
#             #save chat history
#             save_history(messages)
#             print(f"\n Conversation saved to {history_file}")
#             break
#
#         # Add user message
#         messages.append({"role": "user",
#                          "content": user_input})
#
#         payload = {
#             "model": model_id,#id of model
#             "messages": messages,#chat history to preserve context
#             "temperature": 0.7 #control creativity
#         }
#         print("Your question is: ")
#         print(user_input)
#         print("\n")
#
#         try:#send request to lm api
#             response = requests.post(url, headers=headers, json=payload, timeout=60)
#
#             if response.status_code == 200:
#                 data = response.json()
#                 answer = data['choices'][0]['message']['content'].strip()
#
#                 # Add assistant message
#                 messages.append({"role": "assistant", "content": answer})
#
#                 print("\n LLaMA:", flush=True)
#                 print(answer, flush=True)
#                 print("-" * 60 + "\n")
#
#             else:
#                 print(f" Error {response.status_code}: {response.text}\n")
#
#         except requests.exceptions.RequestException as e:
#             print(" Connection error:", e)
#             break


In [2]:
# chat()

# Step 1: Load the Datasets
## 1. OpenMathInstruct-1 (from Hugging Face)
- This dataset contains 1.8 million math problem-solution pairs, making it ideal for enhancing mathematical reasoning in LLMs.

In [3]:
# from datasets import load_dataset
# from IPython import get_ipython
# from IPython.display import display
#
# #Load training split
# dataset = load_dataset("nvidia/OpenMathInstruct-1", split="train")
#
# first_element = next(iter(dataset))
#
# print(first_element)

# Gave up on this one: way too much data in the dataset

# Small & Clean Math Datasets
## 1. GSM8K
- Size: ~8.5K problems

- Focus: Grade school math word problems

- Good for: step-by-step reasoning, small LLM finetuning

In [4]:
from datasets import load_dataset
dataset = load_dataset("gsm8k", "main", split="train")
first_element = next(iter(dataset))

print(first_element)

{'question': 'Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?', 'answer': 'Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72'}
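
The sample above shows two conventions in GSM8K answers: inline calculator annotations such as `<<48/2=24>>`, and the final answer after a `####` marker. A minimal sketch of separating the two, assuming every answer follows the pattern of this one example (`split_gsm8k_answer` is a hypothetical helper, not part of the dataset tooling):

```python
import re

def split_gsm8k_answer(answer):
    """Split a GSM8K answer into (reasoning without <<...>> annotations, final answer)."""
    reasoning, _, final = answer.partition("####")
    # Drop the inline calculator annotations such as <<48/2=24>>.
    reasoning = re.sub(r"<<.*?>>", "", reasoning).strip()
    return reasoning, final.strip()

# The answer string from the first training example printed above.
sample = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72"
)
steps, final = split_gsm8k_answer(sample)
print(steps)
print("Final answer:", final)
```

Keeping the final answer separate makes it possible to score a model's output by exact match while still showing the reasoning steps to students.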


# 2. Computer Science Theory QA Dataset (from Kaggle)
- This dataset offers a comprehensive collection of theoretical computer science questions, suitable for training chatbots and QA systems.

In [5]:
import pandas as pd
import json

with open("intents.json", "r") as f:
    intents_data = json.load(f)

# Convert to DataFrame if needed
df = pd.json_normalize(intents_data["intents"])
print(df.head())

             tag                                           patterns  \
0    abstraction  [Explain data abstraction., What is data abstr...   
1          error  [What is a syntax error, Explain syntax error,...   
2  documentation  [Explain program documentation. Why is it impo...   
3        testing                        [What is software testing?]   
4  datastructure             [How do you explain a data structure?]   

                                           responses  
0  [Data abstraction is a technique used in compu...  
1  [A syntax error is an error in the structure o...  
2  [Program documentation is written information ...  
3  [Software testing is the process of evaluating...  
4  [A data structure is a way of organizing and s...  
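
Each intent in the table above holds several question patterns and a list of responses. The sketch below flattens that layout into one question-answer row per pattern, using the first response. The sample data is an abridged, hypothetical stand-in that only mirrors the `tag` / `patterns` / `responses` structure, and `flatten_intents` is an illustrative helper.

```python
# Hypothetical sample mirroring the tag / patterns / responses layout shown above.
intents_data = {
    "intents": [
        {
            "tag": "abstraction",
            "patterns": ["Explain data abstraction.", "What is data abstraction?"],
            "responses": ["(response text for abstraction)"],
        },
        {
            "tag": "testing",
            "patterns": ["What is software testing?"],
            "responses": ["(response text for testing)"],
        },
    ]
}

def flatten_intents(data):
    """Yield one (tag, question, answer) row per pattern, using the first response."""
    for intent in data["intents"]:
        if not intent["responses"]:
            continue  # skip intents that have no answer to pair with
        answer = intent["responses"][0]
        for pattern in intent["patterns"]:
            yield intent["tag"], pattern, answer

rows = list(flatten_intents(intents_data))
for tag, question, answer in rows:
    print(f"[{tag}] {question} -> {answer}")
```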


In [6]:
from datasets import load_dataset

ds = load_dataset("google-research-datasets/mbpp", "sanitized")

# 1. Install & Import Dependencies
We’ll need:

- Transformers & Datasets
- LangChain & an embedding backend (here `HuggingFaceEmbeddings`)
- FAISS for the vector index
- Accelerate + PEFT if you plan to fine-tune your generator

In [7]:
# !pip install \
#  transformers datasets faiss-cpu \
#  langchain sentence-transformers \
#  accelerate peft evaluate

In [8]:
#!pip install --upgrade peft

In [9]:
import os
import torch
import numpy as np

from datasets import load_dataset, concatenate_datasets
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    RagTokenizer,
    RagRetriever,
    RagSequenceForGeneration,
    DataCollatorForSeq2Seq,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain import HuggingFacePipeline


# 2. Configuration
Centralize all paths, model names, and hyperparameters.

In [10]:
# ── Paths & names ─────────────────────────────────────────────────────────────
OUTPUT_DIR         = "results/rag-llama"
FAISS_INDEX_PATH   = os.path.join(OUTPUT_DIR, "faiss_index")
DOCS_PATH          = os.path.join(OUTPUT_DIR, "docs.jsonl")

# ── Hugging Face models ───────────────────────────────────────────────────────
GEN_MODEL_NAME     = "meta-llama/Llama-3.1-8B"
EMBED_MODEL_NAME   = "sentence-transformers/all-MiniLM-L6-v2"

# ── Datasets ─────────────────────────────────────────────────────────────────
MBPP_ID            = "google-research-datasets/mbpp"
MBPP_CFG           = "sanitized"
GSM8K_ID           = "gsm8k"
GSM8K_SPLIT        = "train"

# ── RAG / Retrieval params ────────────────────────────────────────────────────
CHUNK_SIZE         = 1000
CHUNK_OVERLAP      = 200

# ── LoRA fine-tuning (optional) ───────────────────────────────────────────────
LORA_R             = 16
LORA_ALPHA         = 32
LORA_DROPOUT       = 0.05

# ── Trainer hyperparameters (for fine-tuning generator) ──────────────────────
NUM_EPOCHS         = 3
TRAIN_BS           = 2
EVAL_BS            = 2
GRAD_ACCUM_STEPS   = 8
LEARNING_RATE      = 2e-4
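
Two derived quantities follow from these settings: the effective batch size under gradient accumulation and the LoRA scaling factor. A short sketch of the arithmetic, repeating the relevant constants so it runs on its own; the single-device setup (`NUM_DEVICES = 1`) is an assumption, not something the notebook states.

```python
# Values copied from the configuration cell above.
TRAIN_BS = 2
GRAD_ACCUM_STEPS = 8
LORA_R = 16
LORA_ALPHA = 32
NUM_DEVICES = 1  # assumption: training on a single device

# Number of examples that contribute to each optimizer step.
effective_batch_size = TRAIN_BS * GRAD_ACCUM_STEPS * NUM_DEVICES

# In standard LoRA the low-rank update is scaled by alpha / r.
lora_scaling = LORA_ALPHA / LORA_R

print("Effective batch size:", effective_batch_size)  # 16
print("LoRA scaling factor:", lora_scaling)            # 2.0
```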


# 3. Load & Merge Datasets
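Load the local Q&A data (saved chat history and `intents.json`), plus MBPP (the `validation` and `prompt` splits of the sanitized config) and the GSM8K train split. Then standardize every source to the same two columns, `input_text` and `target_text`, and concatenate them into one training set.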

In [11]:
from datasets import load_dataset, concatenate_datasets

# 1) Chat‐history (no built‐in validation split here, only “train”):
raw_chat = load_dataset(
    "json",
    data_files={"train": "chat-history.json"}
)
def format_chat_batch(batch):
    inps, tgts = [], []
    for conv in batch["conversation"]:
        # conv is a list of {role,content} dicts
        user = [t["content"] for t in conv if t["role"]=="user"]
        asst = [t["content"] for t in conv if t["role"]=="assistant"]
        inps.append(" ".join(user))
        tgts.append(" ".join(asst))
    return {"input_text": inps, "target_text": tgts}

chat_ds = raw_chat["train"].map(
    format_chat_batch,
    batched=True,
    remove_columns=["timestamp","conversation"]
)

# 2) Intents.json
raw_intents = load_dataset(
    "json",
    data_files={"train": "intents.json"}
)
def format_intents_batch(batch):
    # assume batch["intents"] is a list-of-lists of intent dicts
    inps, tgts = [], []
    for intents_list in batch["intents"]:
        for intent in intents_list:
            for pat in intent["patterns"]:
                inps.append(pat)
                tgts.append(intent["responses"][0])
    return {"input_text": inps, "target_text": tgts}

intents_ds = raw_intents["train"].map(
    format_intents_batch,
    batched=True,
    remove_columns=["intents"]
)

# 3) MBPP “sanitized” (splits: validation & prompt)
mbpp = load_dataset("google-research-datasets/mbpp", "sanitized")
def format_mbpp_batch(batch):
    inps, tgts = [], []
    for p, c in zip(batch["prompt"], batch["code"]):
        inps.append(p)
        tgts.append(f"```python\n{c}\n```")
    return {"input_text": inps, "target_text": tgts}

# concatenate both splits
mbpp_ds = concatenate_datasets([
    mbpp["validation"].map(format_mbpp_batch, batched=True, remove_columns=mbpp["validation"].column_names),
    mbpp["prompt"].map(format_mbpp_batch, batched=True, remove_columns=mbpp["prompt"].column_names),
])

# 4) GSM8K “main” train
gsm = load_dataset("gsm8k", "main", split="train")
def format_gsm_batch(batch):
    inps = ["Problem:\n"+q for q in batch["question"]]
    tgts = ["Answer:\n"+a   for a in batch["answer"]]
    return {"input_text": inps, "target_text": tgts}

gsm_ds = gsm.map(
    format_gsm_batch,
    batched=True,
    remove_columns=gsm.column_names
)

# 5) Combine all training sets
train_ds = concatenate_datasets([chat_ds, intents_ds, mbpp_ds, gsm_ds])
print("Total training examples:", len(train_ds))


Total training examples: 7889
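
Note that `format_chat_batch` joins all user turns of a conversation into one string and all assistant turns into another, so a multi-turn conversation no longer records which answer belongs to which question. One possible alternative, sketched here under the assumption that each saved conversation alternates user and assistant messages after the system message, is to emit one example per question-answer pair. `pair_turns` is a hypothetical helper and the conversation is made up for illustration.

```python
def pair_turns(conversation):
    """Return (user, assistant) pairs from a list of {"role", "content"} messages."""
    pairs = []
    pending_user = None
    for message in conversation:
        if message["role"] == "user":
            pending_user = message["content"]
        elif message["role"] == "assistant" and pending_user is not None:
            pairs.append((pending_user, message["content"]))
            pending_user = None  # wait for the next user question
    return pairs

# Hypothetical conversation in the {"role", "content"} format stored in chat-history.json.
conversation = [
    {"role": "system", "content": "You are a helpful programming tutor."},
    {"role": "user", "content": "What is a list?"},
    {"role": "assistant", "content": "An ordered, mutable collection."},
    {"role": "user", "content": "And a tuple?"},
    {"role": "assistant", "content": "An ordered, immutable collection."},
]

for question, answer in pair_turns(conversation):
    print(f"Q: {question}\nA: {answer}\n")
```

User messages that never received a reply (for example after a connection error) are simply dropped by this sketch.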


# 4. Build & Chunk the Retrieval Corpus
We’ll treat each training example as a “document” by concatenating input_text + target_text and splitting into overlapping chunks.

In [12]:
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

# 4.1 Concatenate input+target into a list of raw docs
raw_texts = [
    ex["input_text"] + "\n\n" + ex["target_text"]
    for ex in train_ds
]
metadatas = [
    {"source": f"doc-{i}"}
    for i in range(len(raw_texts))
]

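# 4.2 Chunk long docs into 1,000-character windows with 200-character overlap
#     (RecursiveCharacterTextSplitter measures chunk_size in characters by default, not tokens)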
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

docs = []
for text, meta in zip(raw_texts, metadatas):
    for chunk in splitter.split_text(text):
        docs.append(Document(page_content=chunk, metadata=meta))

print(f"▶ Created {len(docs)} chunks from {len(raw_texts)} documents.")


▶ Created 8153 chunks from 7889 documents.
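
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-width sliding window measured in characters. It is only an approximation of the idea: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph and line breaks, which this sketch does not. `sliding_chunks` is a hypothetical helper.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-width character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

text = "x" * 2500
chunks = sliding_chunks(text)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

A text shorter than one window yields a single chunk, which would be consistent with the counts above (8153 chunks from 7889 documents) if most examples are under 1,000 characters.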


# 5. Embed & Build a FAISS Vector Index
Use a Sentence-Transformer to embed each chunk, then store in FAISS for fast nearest-neighbour lookup.

In [13]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# 5.1 Initialize your embedding model
EMBED_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
embedder = HuggingFaceEmbeddings(model_name=EMBED_MODEL_NAME)

# 5.2 Create FAISS index from Document objects
vectorstore = FAISS.from_documents(docs, embedder)

# 5.3 (Optional) persist to disk for later reuse
INDEX_PATH = "results/faiss_index"
vectorstore.save_local(INDEX_PATH)
print(f"✔ FAISS index saved to '{INDEX_PATH}'.")


✔ FAISS index saved to 'results/faiss_index'.
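
Conceptually, the index answers one question: which stored vectors are closest to the query vector? The brute-force sketch below does this with Euclidean distance over toy 3-dimensional vectors. It is an illustration only: real embeddings from `all-MiniLM-L6-v2` have 384 dimensions, FAISS uses optimized index structures, and the metric actually used depends on how the index was built. `l2_distance` and `nearest` are hypothetical helpers.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, vectors, k=2):
    """Return (index, distance) for the k stored vectors closest to the query."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Toy 3-dimensional "embeddings"; real ones come from the embedding model.
stored = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]

for idx, dist in nearest(query, stored):
    print(f"chunk {idx}: distance {dist:.3f}")
```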


# 6. Wire Up a LangChain RetrievalQA Pipeline
We now plug your FAISS store and the Meta-Llama-3.1-8b generator into a single retrieval-augmented chain.
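
The `"stuff"` chain type used later in this section places all retrieved chunks into a single prompt together with the question. The sketch below shows that idea with a simple character budget. It is not LangChain's actual prompt template: `stuff_prompt`, its wording, and the `max_chars` limit are assumptions made for illustration.

```python
def stuff_prompt(question, chunks, max_chars=4000):
    """Concatenate chunks into one context block, stopping before max_chars is exceeded."""
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break  # adding this chunk would exceed the character budget
        kept.append(chunk)
        used += len(chunk)
    context = "\n\n".join(kept)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical retrieved chunks (the notebook's retriever returns k=4).
chunks = ["chunk one " * 10, "chunk two " * 10, "chunk three " * 10, "chunk four " * 10]
prompt = stuff_prompt("How would you implement binary search in Python?", chunks, max_chars=250)
print(prompt)
```

With a small budget only the first chunks fit, which is the main limitation of stuffing: long or numerous chunks can exceed the model's context window.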

In [14]:
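# Read the Hugging Face token at runtime instead of hardcoding it in the notebook
import os
from getpass import getpass

if not os.environ.get("HUGGINGFACE_HUB_TOKEN"):
    os.environ["HUGGINGFACE_HUB_TOKEN"] = getpass("Hugging Face token: ")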


In [15]:
# ── Cell: Load FAISS index & build retriever ────────────────────────────────────

from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings

# Recreate your embedder exactly as when you built the index
embedder = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Load your on-disk FAISS index (you trust its provenance)
vectorstore = FAISS.load_local(
    "results/faiss_index",
    embedder,
    allow_dangerous_deserialization=True
)

# Wrap as a retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})


In [16]:
import os

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# ── 0) Grab your token from env ────────────────────────────────────────────────
hf_token = os.environ.get("HUGGINGFACE_HUB_TOKEN")
if not hf_token:
    raise ValueError("Please set HUGGINGFACE_HUB_TOKEN in your environment before running this cell.")

# ── 1) Reload FAISS index ──
embedder = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.load_local(
    "results/faiss_index",
    embedder,
    allow_dangerous_deserialization=True
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# ── 2) Connect to local LM Studio API ──────────────────────────────────────────
llm = ChatOpenAI(
    model_name="meta-llama-3.1-8b-instruct",  # Just for tracking, not actually used to load model
    openai_api_key="lm-studio",               # Dummy API key as used in your chat() function
    openai_api_base="http://localhost:1234/v1", # Your LM Studio API endpoint
    temperature=0.7,
    max_tokens=512
)

# ── 3) Build & run RetrievalQA ─────────────────────────────────────────────────
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",       # or "map_reduce" / "refine"
    retriever=retriever,
    return_source_documents=True,
)

# ── 4) Test query ──────────────────────────────────────────────────────────────
query = "How would you implement binary search in Python?"
result = qa_chain(query)
print("Answer:\n", result["result"])
print("\nSources:")
for doc in result["source_documents"]:
    print("-", doc.metadata["source"])




Answer:
 Binary Search is a fast searching algorithm with an average and worst-case time complexity of O(log n). Here's how you can implement it in Python:

```python
def binary_search(arr, target):
    """
    Searches for the target element in the given sorted array using Binary Search.

    Args:
        arr (list): A sorted list of elements.
        target: The element to be searched.

    Returns:
        int: The index of the target element if found. -1 otherwise.
    """

    # Initialize the low and high pointers
    low = 0
    high = len(arr) - 1

    while low <= high:
        # Calculate the mid index
        mid = (low + high) // 2

        # If the target is found at the mid index, return it
        if arr[mid] == target:
            return mid

        # If the target is less than the mid element, move the high pointer to mid - 1
        elif arr[mid] > target:
            high = mid - 1

        # If the target is greater than the mid element, move the low pointer to mi
```

In [17]:
import json
from datetime import datetime
from os.path import exists

import requests

# API endpoint exposed by LM Studio
url = "http://localhost:1234/v1/chat/completions"

# Model ID
model_id = "meta-llama-3.1-8b-instruct"

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer lm-studio",  # Dummy API key
}

# messages: [ #keep conversation history
#                 {"role":"user", #what you type, only sends current prompt
#                  "content":user_input}
#             ]

#Keep the message history

#History file path, to keep conversation
history_file = "chat-history.json"
def save_history(messages):
    # Load existing history if the file exists
    if exists(history_file):
        with open(history_file, "r", encoding="utf-8") as f:
            full_history = json.load(f)
            # Wrap a single saved session in a list so sessions can be appended
            if not isinstance(full_history, list):
                full_history = [full_history]
    else:
        full_history = []

    # Add this session with timestamp
    full_history.append({
        "timestamp": datetime.now().isoformat(),
        "conversation": messages
    })

    # Save the full conversation list
    with open(history_file, "w", encoding="utf-8") as f:
        json.dump(full_history, f, indent=4, ensure_ascii=False)


#Prompt loop
def chat():
    print(" Talk to LLaMA 3.1 (type 'exit' to quit)\n")
    # The system message sets the initial behavior; the history resets on each call
    messages = [
        {"role": "system",
         "content": "You are a helpful programming tutor."}
    ]

    while True:
        user_input = input(" You: ")
        if user_input.lower() == "exit":
            #save chat history
            save_history(messages)
            print(f"\n Conversation saved to {history_file}")
            break

        # Add user message
        messages.append({"role": "user",
                         "content": user_input})

        payload = {
            "model": model_id,      # ID of the model
            "messages": messages,   # chat history to preserve context
            "temperature": 0.7      # controls creativity
        }
        print("Your question is: ")
        print(user_input)
        print("\n")

        try:  # send the request to the LM Studio API
            response = requests.post(url, headers=headers, json=payload, timeout=1000)

            if response.status_code == 200:
                data = response.json()
                answer = data['choices'][0]['message']['content'].strip()

                # Add assistant message
                messages.append({"role": "assistant", "content": answer})

                print("\n LLaMA:", flush=True)
                print(answer, flush=True)
                print("-" * 60 + "\n")

            else:
                print(f" Error {response.status_code}: {response.text}\n")

        except requests.exceptions.RequestException as e:
            print(" Connection error:", e)
            break


In [18]:
chat()

 Talk to LLaMA 3.1 (type 'exit' to quit)


 Conversation saved to chat-history.json


In [19]:
# # Search for Pythagorean theorem related questions in GSM8K
# pythagorean_questions = []
#
# # Load the dataset if not already loaded
# gsm = load_dataset("gsm8k", "main", split="train")
# #
# # Search for relevant keywords
# keywords = ["pythagora", "pythagorean", "right triangle", "hypotenuse", "a^2 + b^2"]
#
# for i, example in enumerate(gsm):
#     question = example["question"].lower()
#     for keyword in keywords:
#         if keyword.lower() in question:
#             pythagorean_questions.append({
#                 "index": i,
#                 "question": example["question"],
#                 "answer": example["answer"]
#             })
#             break
#
# # Print the number of matching questions
# print(f"Found {len(pythagorean_questions)} questions related to the Pythagorean theorem")
#
# # Display the first few matches if any exist
# for i, q in enumerate(pythagorean_questions[:3]):
#     print(f"\nQuestion {i+1}:")
#     print(q["question"])
#     print("\nAnswer:")
#     print(q["answer"])

In [20]:
# import faiss
# import numpy as np
#
# # Load the FAISS index file
# index = faiss.read_index("results/faiss_index/index.faiss")
#
# # Print basic info
# print("Index type:", type(index).__name__)
# print("Dimension:", index.d)
# # print("Is trained:", index.is_trained)
# # print("Total vectors stored:", index.ntotal)
#
# # Example: retrieve all stored vectors (if they fit in memory)
# try:
#     xb = index.reconstruct_n(0, index.ntotal)  # returns all vectors
#     print("Sample vector (first one):", xb[0])
# except Exception as e:
#     print("Cannot reconstruct vectors:", e)
