# Chatbot Comparison: API-Based vs. Local Models

### This notebook demonstrates how to interact with both API-based and local models using a simple ConversationManager class.
### Students will compare responses from OpenAI GPT-4 and a local model (using HuggingFace Transformers).

In [11]:
from openai import OpenAI
import os
from transformers import pipeline

from introdl.utils import config_paths_keys
paths = config_paths_keys()

MODELS_PATH=C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\models
DATA_PATH=C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\data
TORCH_HOME=C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\downloads
HF_HOME=C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\downloads
HF_HUB_CACHE=C:\Users\bagge\My Drive\Python_Projects\DS776_Develop_Project\downloads
Successfully logged in to Hugging Face Hub.


In [12]:
class ConversationManager:
    def __init__(self, system_prompt: str = "You are a helpful assistant."):
        self.history = []
        self.system_prompt = system_prompt
        self.reset()

    def reset(self):
        """Resets the conversation history."""
        self.history = [{"role": "system", "content": self.system_prompt}]

    def add_user_message(self, message: str):
        """Adds a user message to the conversation history."""
        self.history.append({"role": "user", "content": message})

    def add_assistant_message(self, message: str):
        """Adds an assistant message to the conversation history."""
        self.history.append({"role": "assistant", "content": message})

    def get_history(self):
        """Returns the conversation history formatted for an API or LLM call."""
        return self.history

    def get_formatted_history(self):
        """Returns the conversation history as a formatted string for local models."""
        return "\n".join([f"{msg['role'].capitalize()}: {msg['content']}" for msg in self.history])

In [13]:
# Initialize the conversation manager
conversation = ConversationManager(system_prompt="You are a helpful tutor for a deep learning class.")

### API-Based Model Interaction 

In [20]:
# Initialize the Gemini API client
client = OpenAI(
    api_key=os.getenv("GEMINI_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

def interact_with_api(conversation: ConversationManager, **kwargs):
    response = client.chat.completions.create(
        model="gemini-2.0-flash-lite",
        messages=conversation.get_history(), **kwargs
    )
    message = response.choices[0].message.content
    conversation.add_assistant_message(message)
    return message

### Local Model Interaction (HuggingFace GPT-2) with Configurable Decoding Parameters

In [15]:
# Load the model and tokenizer separately for better control over generation
from transformers import AutoModelForCausalLM, AutoTokenizer

local_model_name = "unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(local_model_name)
model = AutoModelForCausalLM.from_pretrained(local_model_name, device_map="auto")
model.eval();  # Set to evaluation mode

In [27]:
def interact_with_local_model(conversation: ConversationManager, split_str='Assistant:', **kwargs):
    if hasattr(tokenizer, "apply_chat_template") and getattr(tokenizer, "chat_template", None) is not None:
        inputs = tokenizer.apply_chat_template(
            conversation.get_history(),
            return_tensors="pt",
            padding=True,
            truncation=True
        ).to(model.device)
    else:
        prompt = conversation.get_formatted_history() + 'Assistant:'
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(inputs, **kwargs)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    message = response.split(split_str)[-1].strip() if split_str in response else response
    conversation.add_assistant_message(message)
    return message

### Example Interaction (API-based)



In [21]:
conversation.add_user_message("What is overfitting in deep learning?")
api_response = interact_with_api(conversation)
print("\nAPI Response:\n", api_response)


API Response:
 Okay, let's break down overfitting in the context of deep learning. Imagine you're trying to learn a new skill, like riding a bike.

**Overfitting: Learning the Noise, Not the Signal**

Overfitting is a common problem in machine learning, including deep learning. It happens when a model learns the training data *too well*, including the noise and random fluctuations present in that data.  Think of it like memorizing specific answers to practice questions instead of understanding the underlying concepts.

Here's a more detailed explanation:

*   **Training Data:** This is the dataset the model uses to learn.
*   **Noise:**  Real-world data often has errors, inconsistencies, and random variations. This is noise. It might be due to measurement errors, irrelevant features, or simply random chance.
*   **Signal:** The underlying patterns, relationships, and true trends that the data represents.  This is what we *want* the model to learn.

**What Happens During Overfitting?**

### Example Interaction (Local Model)



In [28]:
conversation.reset()
conversation.add_user_message("What is overfitting in deep learning?")
local_response = interact_with_local_model(conversation)
print("\nLocal Model Response:\n", local_response)


Local Model Response:
 system

Cutting Knowledge Date: December 2023
Today Date: 05 Apr 2025

You are a helpful tutor for a deep learning class.user

What is overfitting in deep learning?assistant

Overfitting is a common problem in deep learning where a model becomes too specialized to the training data and fails to generalize well to new, unseen data. This occurs when the model is too complex and has too many parameters, causing it to fit the noise and patterns in the training data too closely.

In other words, overfitting happens when a model is too good at fitting the training data, but poorly at fitting the underlying data distribution. This results in poor performance on new, unseen data, which can be similar to the training data but not identical.

There are several causes of overfitting:

1. **Model complexity**: Models with too many parameters or layers can fit the training data too closely, leading to overfitting.
2. **Insufficient training data**: If the training data is to

### Students will try different prompts and compare the responses from both models.
### They will also examine how the conversation history affects the responses.



In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from nltk.translate.bleu_score import sentence_bleu
from rouge_score import rouge_scorer


# Load model and tokenizer
model_name = "unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
)

# Set model to evaluation mode
model.eval()

# Example prompt
prompt = "Explain the difference between supervised and unsupervised learning."

# Encode the prompt
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

max_length = 200  # Maximum length of the generated text

# Generate text using different decoding strategies
with torch.no_grad():
    greedy_output = model.generate(**inputs, max_length=max_length, do_sample=False)
    top_k_output = model.generate(**inputs, max_length=max_length, top_k=50)
    top_p_output = model.generate(**inputs, max_length=max_length, top_p=0.9)
    beam_output = model.generate(**inputs, max_length=max_length, num_beams=5)

# Decode the outputs
generated_texts = {
    "Greedy": tokenizer.decode(greedy_output[0], skip_special_tokens=True),
    "Top-K": tokenizer.decode(top_k_output[0], skip_special_tokens=True),
    "Top-P": tokenizer.decode(top_p_output[0], skip_special_tokens=True),
    "Beam Search": tokenizer.decode(beam_output[0], skip_special_tokens=True)
}

# Reference text (ground truth)
reference = [
    "Supervised learning uses labeled data to learn a mapping from inputs to outputs, while unsupervised learning tries to find patterns or groupings within unlabeled data."
]

# Calculate BLEU and ROUGE scores
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
results = {}

for strategy, text in generated_texts.items():
    bleu_score = sentence_bleu([reference[0].split()], text.split())
    rouge_scores = scorer.score(reference[0], text)
    results[strategy] = {
        "Generated Text": text,
        "BLEU Score": bleu_score,
        "ROUGE-1": rouge_scores['rouge1'].fmeasure,
        "ROUGE-2": rouge_scores['rouge2'].fmeasure,
        "ROUGE-L": rouge_scores['rougeL'].fmeasure,
    }

# Display results
import pandas as pd

df = pd.DataFrame.from_dict(results, orient='index')
from IPython.display import display, HTML
display(HTML(df.to_html(float_format="%.4f", justify="center", index=True, border=0, classes='dataframe')))




Unnamed: 0,Generated Text,BLEU Score,ROUGE-1,ROUGE-2,ROUGE-L
Greedy,"Explain the difference between supervised and unsupervised learning. In the context of machine learning, supervised learning is used to predict outcomes based on labeled data, while unsupervised learning is used to identify patterns and relationships in unlabeled data.\nIn the context of machine learning, supervised learning is used to predict outcomes based on labeled data, while unsupervised learning is used to identify patterns and relationships in unlabeled data. The key difference between the two is the type of data used to train the model.\n\n**Supervised Learning:**\n\nIn supervised learning, the model is trained on labeled data, where each example is associated with a target output. The goal is to learn a mapping between inputs and outputs, so the model can make predictions on new, unseen data. The model is trained to minimize the difference between its predictions and the actual outputs.\n\n**Unsupervised Learning:**\n\nIn unsupervised learning, the model is trained on unlabeled data, where there is no target output. The",0.0264,0.2054,0.0874,0.173
Top-K,"Explain the difference between supervised and unsupervised learning. \nSupervised learning is a type of machine learning where the data is labeled or classified into predefined categories. In this type of learning, the algorithm is trained on labeled data to learn the relationships between the input and output variables. The goal of supervised learning is to make predictions on new, unseen data.\n\nUnsupervised learning, on the other hand, is a type of machine learning where the data is not labeled or classified into predefined categories. In this type of learning, the algorithm is trained on unlabeled data to identify patterns, relationships, or groupings within the data. The goal of unsupervised learning is to discover hidden structures or patterns in the data that may not be immediately apparent.\n\nHere is an example to illustrate the difference:\n\nSupervised learning:\n\n* A company wants to predict whether a customer will buy a product based on their age, income, and other demographic information.\n* The company has labeled data with the outcome (",0.0261,0.2,0.0957,0.1684
Top-P,"Explain the difference between supervised and unsupervised learning. In the context of the dataset provided in the problem you'll be working on, the dataset has 3 features: x1, x2, and x3. The target variable y is the number of days it takes to complete a task. The goal is to predict the number of days to complete a task based on the input features x1, x2, and x3.\n\n## Step 1: Understand the Basics of Supervised and Unsupervised Learning\nSupervised learning involves training a model on labeled data, where the correct output is already known. The goal is to learn a mapping between inputs and outputs, so the model can make predictions on new, unseen data. Unsupervised learning, on the other hand, involves training a model on unlabeled data, where the correct output is not known. The goal is to discover patterns or structure in the data.\n\n## Step 2: Apply Supervised Learning to the",0.0232,0.2011,0.0904,0.1788
Beam Search,"Explain the difference between supervised and unsupervised learning. \n\n**Supervised Learning**\n======================\n\nIn supervised learning, the algorithm is trained on labeled data, where each example is accompanied by a target or response variable. The goal is to learn a mapping between input data and output labels, so the algorithm can make predictions on new, unseen data.\n\n**Key Characteristics:**\n\n* The algorithm is trained on labeled data.\n* The algorithm learns a mapping between input data and output labels.\n* The goal is to make predictions on new, unseen data.\n\n**Example Use Cases:**\n\n* Image classification (e.g., classifying images as ""dog"" or ""cat"")\n* Sentiment analysis (e.g., classifying text as ""positive"" or ""negative"")\n* Regression (e.g., predicting continuous values, such as house prices)\n\n**Unsupervised Learning**\n=====================\n\nIn unsupervised learning, the algorithm is trained on unlabeled data,",0.0215,0.2208,0.0921,0.1948


In [2]:
# Complete Notebook Demonstration: Decoding Strategy Evaluation with Text Cleaning

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from nltk.translate.bleu_score import sentence_bleu
from rouge_score import rouge_scorer
import pandas as pd
import re

# Load model and tokenizer
model_name = "unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
)

# Set model to evaluation mode
model.eval()

def clean_output(text, prompt=""):
    """
    Cleans the model-generated text by removing the prompt, formatting, and common prefixes/suffixes.
    
    Args:
        text (str): The generated text from the model.
        prompt (str): The prompt used for generating the text. If present at the beginning, it will be removed.
        
    Returns:
        str: Cleaned text ready for evaluation.
    """
    # Remove the prompt if it exists at the beginning of the text
    if text.startswith(prompt):
        text = text[len(prompt):].strip()
    
    # Remove markdown headers, formatting, and common prefixes
    unwanted_prefixes = [
        "The answer is:", "Here is the explanation:", 
        "In conclusion,", "To summarize,", "As follows:"
    ]
    for prefix in unwanted_prefixes:
        if text.startswith(prefix):
            text = text[len(prefix):].strip()
    
    # Remove markdown-like formatting (titles, headers, etc.)
    text = re.sub(r"\*\*.*?\*\*|=+", "", text)
    
    # Remove bullet points, numbers, dashes, and unwanted newlines
    text = re.sub(r"(\*|-|•|\d+\.)\s", "", text)
    text = re.sub(r"\n+", " ", text)
    
    # Remove anything that's not alphanumeric, standard punctuation, or whitespace
    text = re.sub(r"[^a-zA-Z0-9.,!?;:\-()\'\"\s]", "", text)
    
    # Remove extra whitespace
    text = ' '.join(text.split())
    
    return text

# Example prompt
prompt = "Explain the difference between supervised and unsupervised learning."

# Encode the prompt
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate text using different decoding strategies
with torch.no_grad():
    greedy_output = model.generate(**inputs, max_length=100)
    top_k_output = model.generate(**inputs, max_length=100, top_k=50)
    top_p_output = model.generate(**inputs, max_length=100, top_p=0.9)
    beam_output = model.generate(**inputs, max_length=100, num_beams=5)

# Decode the outputs
generated_texts = {
    "Greedy": tokenizer.decode(greedy_output[0], skip_special_tokens=True),
    "Top-K": tokenizer.decode(top_k_output[0], skip_special_tokens=True),
    "Top-P": tokenizer.decode(top_p_output[0], skip_special_tokens=True),
    "Beam Search": tokenizer.decode(beam_output[0], skip_special_tokens=True)
}

# Clean all generated texts
cleaned_texts = {strategy: clean_output(text, prompt) for strategy, text in generated_texts.items()}

# Reference text (ground truth)
reference = [
    "Supervised learning uses labeled data to learn a mapping from inputs to outputs, while unsupervised learning tries to find patterns or groupings within unlabeled data."
]

# Initialize ROUGE scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

# Calculate BLEU and ROUGE scores
results = {}
for strategy, text in cleaned_texts.items():
    bleu_score = sentence_bleu([reference[0].split()], text.split())
    rouge_scores = scorer.score(reference[0], text)
    results[strategy] = {
        "Generated Text": text,
        "BLEU Score": bleu_score,
        "ROUGE-1": rouge_scores['rouge1'].fmeasure,
        "ROUGE-2": rouge_scores['rouge2'].fmeasure,
        "ROUGE-L": rouge_scores['rougeL'].fmeasure,
    }

# Display results
dataframe=pd.DataFrame.from_dict(results, orient='index')
from IPython.display import display, HTML
display(HTML(dataframe.to_html(float_format="%.4f", justify="center", index=True, border=0, classes='dataframe')))


Unnamed: 0,Generated Text,BLEU Score,ROUGE-1,ROUGE-2,ROUGE-L
Greedy,"In supervised learning, the algorithm is trained on labeled data, where each example is associated with a target output. The goal is to learn a mapping between inputs and outputs, so the algorithm can make predictions on new, unseen data. Here's an example: : A picture of a cat : A label indicating whether the picture is of a cat (yes or no",0.0441,0.2759,0.1176,0.2529
Top-K,"In machine learning, supervised learning is used to predict the output of a target variable, whereas unsupervised learning is used to identify patterns and relationships in data without a target variable. In supervised learning, the algorithm is trained on labeled data, where each example is associated with a target variable. The goal is to learn a mapping between input features and target variables. The algorithm learns to predict the output of the target variable based on the input",0.0409,0.3301,0.1188,0.233
Top-P,"Supervised learning involves training a model on labeled data, where the model is given a set of input examples, each labeled with a target output. The goal is to learn a mapping between inputs and outputs, so the model can make predictions on new, unseen data. Unsupervised learning, on the other hand, involves training a model on unlabeled data, where the goal is to identify patterns or structure in the data.",0.0495,0.3542,0.1702,0.3333
Beam Search,"In supervised learning, the algorithm is trained on labeled data, where each example is accompanied by a target or response variable. The goal is to learn a mapping between input data and output labels, so the algorithm can make predictions on new, unseen data. In unsupervised learning, the algorithm is trained on unlabeled data, and the goal is",0.0493,0.3614,0.1728,0.3373


In [6]:
!pip install ../Course_Tools/introdl

Processing c:\users\bagge\my drive\python_projects\ds776_develop_project\ds776\lessons\course_tools\introdl
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting bert_score (from introdl==1.0)
  Downloading bert_score-0.3.13-py3-none-any.whl.metadata (15 kB)
Downloading bert_score-0.3.13-py3-none-any.whl (61 kB)
Building wheels for collected packages: introdl
  Building wheel for introdl (pyproject.toml): started
  Building wheel for introdl (pyproject.toml): finished with status 'done'
  Created wheel for introdl: filename=introdl-1.0-py3-none-any.whl size=46579 sha256=aff3cbdb4b46b3db49cfeea6faafd5deaaae3a6c954977ba51a9616ed63d0040
  Stored in directory: C:\Users\bagge\AppData\Local\Temp\pip-ephem-whee

In [8]:
# Complete Notebook Demonstration: Decoding Strategy Evaluation with Text Cleaning

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from bert_score import score
import pandas as pd
import re

# Load model and tokenizer
model_name = "unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
)

# Set model to evaluation mode
model.eval()

def clean_output(text, prompt=""):
    """
    Cleans the model-generated text by removing the prompt, formatting, and common prefixes/suffixes.
    
    Args:
        text (str): The generated text from the model.
        prompt (str): The prompt used for generating the text. If present at the beginning, it will be removed.
        
    Returns:
        str: Cleaned text ready for evaluation.
    """
    # Remove the prompt if it exists at the beginning of the text
    if text.startswith(prompt):
        text = text[len(prompt):].strip()
    
    # Remove markdown headers, formatting, and common prefixes
    unwanted_prefixes = [
        "The answer is:", "Here is the explanation:", 
        "In conclusion,", "To summarize,", "As follows:"
    ]
    for prefix in unwanted_prefixes:
        if text.startswith(prefix):
            text = text[len(prefix):].strip()
    
    # Remove markdown-like formatting (titles, headers, etc.)
    text = re.sub(r"\*\*.*?\*\*|=+", "", text)
    
    # Remove bullet points, numbers, dashes, and unwanted newlines
    text = re.sub(r"(\*|-|•|\d+\.)\s", "", text)
    text = re.sub(r"\n+", " ", text)
    
    # Remove anything that's not alphanumeric, standard punctuation, or whitespace
    text = re.sub(r"[^a-zA-Z0-9.,!?;:\-()\'\"\s]", "", text)
    
    # Remove extra whitespace
    text = ' '.join(text.split())
    
    return text

# Example prompt
prompt = "Explain the difference between supervised and unsupervised learning."

# Encode the prompt
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate text using different decoding strategies
with torch.no_grad():
    greedy_output = model.generate(**inputs, max_length=100)
    top_k_output = model.generate(**inputs, max_length=100, top_k=50)
    top_p_output = model.generate(**inputs, max_length=100, top_p=0.9)
    beam_output = model.generate(**inputs, max_length=100, num_beams=5)

# Decode the outputs
generated_texts = {
    "Greedy": tokenizer.decode(greedy_output[0], skip_special_tokens=True),
    "Top-K": tokenizer.decode(top_k_output[0], skip_special_tokens=True),
    "Top-P": tokenizer.decode(top_p_output[0], skip_special_tokens=True),
    "Beam Search": tokenizer.decode(beam_output[0], skip_special_tokens=True)
}

# Clean all generated texts
cleaned_texts = {strategy: clean_output(text, prompt) for strategy, text in generated_texts.items()}

# Reference text (ground truth)
reference = [
    "Supervised learning uses labeled data to learn a mapping from inputs to outputs, while unsupervised learning tries to find patterns or groupings within unlabeled data."
]

# Calculate BERTScore for each strategy
results = {}
for strategy, text in cleaned_texts.items():
    P, R, F1 = score([text], reference, lang="en", model_type="microsoft/deberta-xlarge-mnli")
    results[strategy] = {
        "Generated Text": text,
        "BERTScore P": P.mean().item(),
        "BERTScore R": R.mean().item(),
        "BERTScore F1": F1.mean().item(),
    }

# Display results
dataframe=pd.DataFrame.from_dict(results, orient='index')
from IPython.display import display, HTML
display(HTML(dataframe.to_html(float_format="%.4f", justify="center", index=True, border=0, classes='dataframe')))


Unnamed: 0,Generated Text,BERTScore P,BERTScore R,BERTScore F1
Greedy,"In the context of a classification problem. In the classification problem, the data is split into two classes: 0 and Class 0 represents a normal state, while Class 1 represents a faulty state. The goal is to train a model to predict the class of a new, unseen input. In supervised learning, the model is trained on labeled data, where each example is associated with a target",0.5546,0.6315,0.5906
Top-K,"Supervised learning involves training a model on labeled data, where the model learns to predict the target variable based on the input features. In contrast, unsupervised learning involves training a model on unlabeled data, where the model learns to identify patterns or relationships in the data without a target variable. In the context of natural language processing, supervised learning is often used for tasks such as text classification, sentiment analysis, and machine translation. These",0.6513,0.8146,0.7238
Top-P,"In the context of machine learning, supervised learning is used to predict outcomes, while unsupervised learning is used to identify patterns. Step 1: Define Supervised Learning Supervised learning is a type of machine learning where the algorithm is trained on labeled data. This means that the data is already classified or labeled with the correct output, and the goal is to learn a mapping between inputs and outputs. Step 2:",0.6209,0.7607,0.6837
Beam Search,"In supervised learning, the algorithm is trained on labeled data, where each example is accompanied by a target or response variable. The goal is to learn a mapping between input data and output labels, so the algorithm can make predictions on new, unseen data. In unsupervised learning, the algorithm is trained on unlabeled data, and the goal is",0.6837,0.7898,0.733
