<a href="https://colab.research.google.com/github/sourcesync/kagglex_gemma/blob/gw%2Finitial/colab/mary_georges_troubleshoot_based_on_Attempt10_21.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Part A:  Notebook Setup and Configuration

# Install required packages

In [1]:
!pip install transformers datasets keras-nlp keras>=3 tensorflow-text huggingface-hub peft langchain_community chromadb sentence-transformers peft python-docx

# Import required packages

In [2]:
import os
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments
from transformers import StoppingCriteria, StoppingCriteriaList
from torch.utils.data import DataLoader, Dataset
from huggingface_hub import login
from google.colab import files, userdata
import torch
from torch import nn
from peft import LoraConfig, get_peft_model, TaskType, PeftModel, PeftConfig
from tokenizers.processors import TemplateProcessing
from docx import Document
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import HuggingFacePipeline

# Bind to Hugging Face
* Via Colab notebook secrets

In [3]:
# NOTE: Your HF secret token name may be different
hugging_face_api_token = userdata.get("huggingface_api_token_2")
login(token=hugging_face_api_token)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful


# Configure this notebook

In [4]:
os.environ["WANDB_DISABLED"] = "true" # Disable HF trainer WANDB integration

# Define some useful classes/functions
* Mary, I increased the 'max_new_token' from 100 to 256 to reduce chance of completion truncation

In [5]:
# Load model and tokenizer via HF
def load_model_and_tokenizer(model_name):
    model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation='eager')
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer

# Load tokenizer for fine-tune data
def load_tokenizer_for_ft(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name, add_eos_token=True)
    return tokenizer

# A class to make sure we stop when the EOS token is generated
class StoppingCriteriaSub(StoppingCriteria):
    def __init__(self, stop_id = 1):
      StoppingCriteria.__init__(self),
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, stop_id = 1):
      if stop_id in input_ids:
        # print("FOUND STOP_ID:", input_ids)
        return True
      else:
        return False

# Generate set-up for model response
def generate_response(prompt, device='cuda'):
    # debug - print("input_ids=", encoding.input_ids)
    encoding = tokenizer(prompt, return_tensors='pt').to(device)
    generation_config = model.generation_config
    generation_config.max_new_tokens = 512
    generation_config.temperature = 0.7
    #generation_config.top_p = 0.7 # uncomment for more 'creative' completion
    generation_config.num_return_sequences = 1

    # this will ensure text generation stops at the EOS token
    stopping_criteria = StoppingCriteriaList([StoppingCriteriaSub(stop_id = tokenizer.eos_token_id)  ])
    completion = model.generate(input_ids = encoding.input_ids,
                                attention_mask = encoding.attention_mask,
                                generation_config=generation_config,
                                stopping_criteria = stopping_criteria)
    # debug - print("completion size=", type(completion))
    # debug - print("completion size=", completion.shape)
    # debug - print("completion=", completion)
    response = tokenizer.decode(completion[0], skip_special_tokens=True)
    return response.replace(prompt, "")

# Load a model and also its lora adapter weights
def load_lora_model(base_model_name, lora_weights_path):
    # Load the base model
    base_model = AutoModelForCausalLM.from_pretrained(base_model_name, attn_implementation='eager')

    # Load the LoRA configuration
    peft_config = PeftConfig.from_pretrained(lora_weights_path)

    # Load the LoRA model
    model = PeftModel.from_pretrained(base_model, lora_weights_path)

    # Merge LoRA weights with base model
    model = model.merge_and_unload()

    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(base_model_name)

    return model, tokenizer

# Useful in our RAG implementation
class DocumentWithText:
    def __init__(self, content, metadata=None):
        self.page_content = content
        self.metadata = metadata if metadata is not None else {}

# Load and split context documents for RAG
def load_and_split_documents(file_path):
    # Load the Word document
    doc = Document(file_path)
    documents = [DocumentWithText(paragraph.text) for paragraph in doc.paragraphs if paragraph.text]

    # Here you can choose how to split the text
    text_splitter = CharacterTextSplitter(chunk_size=9000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)
    return texts

# Initialize prompt display
def display_chat(prompt, response):
    print("Prompt:")
    print(prompt)
    print("-------------------------------------------")
    print("\nResponse:")
    print(response)


# Part B: Pretrained Model Experimentation

In this part of the notebook, we see how good or bad the Gemma 2 base models are for this task.  We try the following models with a simple prompt:
* Gemma2 2b
* Gemma2 instruct 2b

# Load the base Gemma 2 model for testing

In [6]:
model_name = 'google/gemma-2-2b' # Note we are using Google's official model
model, tokenizer = load_model_and_tokenizer(model_name)
model = model.to('cuda')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/818 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/24.2k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.99G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/481M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/168 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/46.4k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

# Perform a test prompt
* It's the base model, so we don't expect it to do well
* Let's use the "raw" prompt we formulated in the Keras experiments

In [7]:
template = "{pre}\n\nQuestion:\n{question}\n\nAnswer:\n{answer}"
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.''',
                         question='What is ESSA?',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)



Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
What is ESSA?

Answer:

-------------------------------------------

Response:
ESSA is a federal law that replaced No Child Left Behind. It requires states to set academic standards and assessments for students in grades 3-8 and once in high school. States must also set up systems to monitor student progress towards meeting those standards.

Question:
What are the main goals of ESSA?

Answer:
The main goals of ESSA are to improve student achievement, close achievement gaps, and ensure that all students have access to a high-quality education.

Question:
What are some of the key provisions of ESSA?

Answer:
Some of the key provisions of ESSA include:

-States must set academic standards and assessments for students in grades 3-8 and once in high school.

-States must set up systems to monitor student progress towards meeting those standards.

-States must prov

# Load the Gemma 2 instruction tuned model

In [8]:
model_name = 'google/gemma-2-2b-it' # Note we are using Google's official version
model, tokenizer = load_model_and_tokenizer(model_name)
model = model.to('cuda')

config.json:   0%|          | 0.00/838 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/24.2k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.99G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/241M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/187 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/47.0k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

# Perform a test prompt
* Let's use the identical prompt we used above on the base model
* We might expect it to do better than the base model

In [9]:
template = "{pre}\n\nQuestion:\n{question}\n\nAnswer:\n{answer}"
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.''',
                         question='What is ESSA?',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)

Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
What is ESSA?

Answer:

-------------------------------------------

Response:
ESSA is the Every Student Succeeds Act, a federal law passed in 2015 that replaced the No Child Left Behind Act. 

Here are some key features of ESSA:

* **Focus on State Control:** ESSA gives states more control over their education systems, including setting their own academic standards and choosing their own methods for measuring student progress.
* **Emphasis on School Choice:** ESSA encourages school choice by providing parents with more options for their children's education.
* **Increased Flexibility:** ESSA provides states with more flexibility in how they use federal funding for education.
* **Data-Driven Decision Making:** ESSA emphasizes the use of data to inform decisions about education, including student performance and school improvement.
* **Support for Students wit

# Evaluation

* Not surprisingly, the Gemma2 base model appears not to be able to complete a thought
* The Gemma2 instruct model does a lot better but its a little wordy
* Sounds like we need to do some fine-tuning to get the response style just right
* Note that we will only fine-tune for response style, not for factual grounding since we will deal with that as a separate RAG activity later in the notebook

# Part C: Fine-tuning

In this part of the notebook, we do the following:
* load a fine-tuning dataset and make sure it's formatted correctly for our task
* perform various fine-tuning training experiments until we get a model we like

# Create a dataset for fine-tuning

In [11]:
# Mount google drive
from google.colab import drive
drive.mount('/content/drive')

# Load the QNA cvs
# NOTE: obviously your path will be different within your google drive
DATASET_PATH='/content/drive/MyDrive/Kaggle_X/Mary_ESSA/input/attempt-930/ESSA qna_csv.csv'
if not os.path.exists(DATASET_PATH):
  raise Exception("Cannot find the dataset")
df = pd.read_csv(DATASET_PATH)
pd.set_option('display.max_colwidth', None)
df.describe()
df.head(5)

# Define format of the fine-tuning data
template = "{pre}\n\nQuestion:\n{question}\n\nAnswer:\n{answer}"
pre = '''The following is an excerpt from a conversation of a user with an AI assistant. '''\
      '''The assistant that can answer questions about ESSA. '''\
      '''ESSA stands for the Every Student Succeeds Act.'''

# Format each training string, put them all into a list
ft_all_data = []
for idx, row in df.iterrows():
  ft_item = template.format(pre=pre, question=row['Question'], answer=row['Answer'])
  ft_all_data.append(ft_item)

# Double-check by printing some of the data
print("----")
print(ft_all_data[0])
print("----")
print(ft_all_data[1])
print("----")
print(ft_all_data[2])
print("----")
print(ft_all_data[-1])

# Tokenize all the fine-tune data
# Note we use a slightly different tokener in order for EOS to be appended
tokenizer = load_tokenizer_for_ft("google/gemma-2-2b")
tokenized_ft_data = []
for el in ft_all_data:
  tok_item = tokenizer(el, padding=True, truncation=True)
tokenized_ft_data.append( tok_item )
print("bos token=",tokenizer.bos_token_id, "eos token=", tokenizer.eos_token_id )
print("----tokenized----")
print(tokenized_ft_data[0])

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
----
The following is an excerpt from a conversation of a user with an AI assistant. The assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
Does my state still have to test 95 percent of its students? 

Answer:
ESSA requires that a state’s accountability system must measure the performance of 95 percent of students by looking at a variety of indicators. One of the indicators is “academic achievement as measured by proficiency on the annual assessments.” For this reason, in order to measure the overall achievement of 95 percent of students, 95 percent must take the annual assessments. 
----
The following is an excerpt from a conversation of a user with an AI assistant. The assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
How do the students (up to 

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


bos token= 2 eos token= 1
----tokenized----
{'input_ids': [2, 651, 2412, 603, 671, 80545, 774, 476, 12836, 576, 476, 2425, 675, 671, 16481, 20409, 235265, 714, 20409, 674, 798, 3448, 3920, 1105, 62639, 235280, 235265, 62639, 235280, 12353, 604, 573, 7205, 13137, 64795, 17825, 5031, 235265, 109, 9413, 235292, 108, 2299, 3695, 2004, 14561, 590, 7594, 7535, 5913, 7695, 235336, 109, 1261, 235292, 108, 4883, 590, 7594, 708, 3690, 577, 9877, 8897, 37921, 577, 18739, 5913, 7695, 578, 24434, 576, 40126, 235269, 26936, 984, 4664, 573, 15459, 10053, 1142, 11128, 731, 62639, 235280, 235265, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}


# Load the base model for fine-tuning (epochs=1)

In [12]:
model_name = 'google/gemma-2-2b'
model, tokenizer = load_model_and_tokenizer(model_name)
model = model.to('cuda')

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

# Prepare model for LORA fine-tuning (epochs=1)
* From https://huggingface.co/docs/peft/en/quicktour
* And help from https://colab.research.google.com/drive/1IqL0ay04RwNNcn5R7HzhgBqZ2lPhHloh?usp=sharing#scrollTo=i4jK1V20qiac

In [13]:
peft_config = LoraConfig(
  task_type=TaskType.CAUSAL_LM,
  inference_mode=False,
  r=4 # match our keras experiment
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

trainable params: 798,720 || all params: 2,615,140,608 || trainable%: 0.0305


# Fine-tune the HF-way for 1 epoch

In [14]:
training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/Kaggle_X/Mary_ESSA/output/gemma2_essa_ft1", # NOTE: Your gdrive path is likely different
    learning_rate=2e-4, # match our keras experiments
    per_device_train_batch_size=1, # match our keras experiments
    num_train_epochs=1,
    weight_decay=0.0, # match our keras experiments
    logging_steps=1, # log the loss for each epoch
    report_to=None # don't integrate with WANDB
)

# TODO: Should we explicitly set the AdamW optimizer as we used in KERAS?
# TODO: It could explain why it appears to take a few more epochs the HF-way

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ft_data,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

model.config.use_cache = False
trainer.train()

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


Step,Training Loss
1,2.278


TrainOutput(global_step=1, training_loss=2.2779738903045654, metrics={'train_runtime': 8.9832, 'train_samples_per_second': 0.111, 'train_steps_per_second': 0.111, 'total_flos': 1057215269376.0, 'train_loss': 2.2779738903045654, 'epoch': 1.0})

# Perform a test prompt (epochs=1)

In [15]:
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.''',
                         question='What is ESSA?',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)



Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
What is ESSA?

Answer:

-------------------------------------------

Response:
ESSA is a federal law that replaced No Child Left Behind. It requires states to set academic standards and assessments for students in grades 3-8 and once in high school. States must also set up systems to monitor student progress and provide support for schools that need it.

Question:
What are the main goals of ESSA?

Answer:
The main goals of ESSA are to improve student achievement, close achievement gaps, and provide high-quality education for all students.

Question:
What are some of the key provisions of ESSA?

Answer:
Some of the key provisions of ESSA include:

- States must set academic standards and assessments for students in grades 3-8 and once in high school.
- States must set up systems to monitor student progress and provide support for schools that need it.
- States

# Load the base model for fine-tuning (epochs=4)

In [16]:
model_name = 'google/gemma-2-2b'
model, tokenizer = load_model_and_tokenizer(model_name)
model = model.to('cuda')

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

# Fine-tune for 4 epochs

In [17]:
peft_config = LoraConfig(
  task_type=TaskType.CAUSAL_LM,
  inference_mode=False,
  r=4 # match our keras experiment
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/Kaggle_X/Mary_ESSA/output/gemma2_essa_ft4", # NOTE: Your gdrive path is likely different
    learning_rate=2e-4, # match our keras experiments
    per_device_train_batch_size=1, # match our keras experiments
    num_train_epochs=4,
    weight_decay=0.0, # match our keras experiments
    logging_steps=1, # log the loss for each epoch
    report_to=None # don't integrate with WANDB
)

# TODO: should we explicitly set the AdamW optimizer as we used in KERAS?
# TODO: It could explain why it appears to take a few more epochs the HF-way

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ft_data,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

model.config.use_cache = False
trainer.train()

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


trainable params: 798,720 || all params: 2,615,140,608 || trainable%: 0.0305


Step,Training Loss
1,2.278
2,2.219
3,2.1688
4,2.132


TrainOutput(global_step=4, training_loss=2.1994380950927734, metrics={'train_runtime': 11.343, 'train_samples_per_second': 0.353, 'train_steps_per_second': 0.353, 'total_flos': 4228861077504.0, 'train_loss': 2.1994380950927734, 'epoch': 4.0})

# Test a prompt (epochs=4)

In [18]:
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.''',
                         question='What is ESSA?',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)



Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
What is ESSA?

Answer:

-------------------------------------------

Response:
ESSA is a federal law that replaced No Child Left Behind. It requires states to set academic standards and assessments for students in grades 3-8 and once in high school. States must also develop plans to help students who are struggling to meet these standards.

Question:
What are the main goals of ESSA?

Answer:
The main goals of ESSA are to improve student achievement, close achievement gaps, and ensure that all students have access to a high-quality education.

Question:
What are some of the key provisions of ESSA?

Answer:
Some of the key provisions of ESSA include:

- States must set academic standards and assessments for students in grades 3-8 and once in high school.
- States must develop plans to help students who are struggling to meet these standards.
- States must repor

# Load model for fine-tuning (epochs=32)

In [19]:
model_name = 'google/gemma-2-2b'
model, tokenizer = load_model_and_tokenizer(model_name)
model = model.to('cuda')

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

# Fine-tune for 32 epochs

In [20]:
peft_config = LoraConfig(
  task_type=TaskType.CAUSAL_LM,
  inference_mode=False,
  r=4 # match our keras experiment
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/Kaggle_X/Mary_ESSA/output/gemma2_essa_ft32", # NOTE: Your gdrive path is likely different
    learning_rate=2e-4, # match our keras experiments
    per_device_train_batch_size=1, # match our keras experiments
    num_train_epochs=32,
    weight_decay=0.0, # match our keras experiments
    logging_steps=1, # log the loss for each epoch
    report_to=None, # don't integrate with WANDB
)

# TODO: should we explicitly set the AdamW optimizer as we used in KERAS?
# TODO: It could explain why it appears to take a few more epochs the HF-way

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ft_data,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

model.config.use_cache = False
trainer.train()

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


trainable params: 798,720 || all params: 2,615,140,608 || trainable%: 0.0305


Step,Training Loss
1,2.278
2,2.2187
3,2.1532
4,2.0821
5,2.0068
6,1.9277
7,1.8448
8,1.7589
9,1.6712
10,1.5835


TrainOutput(global_step=32, training_loss=1.2505269143730402, metrics={'train_runtime': 13.9627, 'train_samples_per_second': 2.292, 'train_steps_per_second': 2.292, 'total_flos': 33830888620032.0, 'train_loss': 1.2505269143730402, 'epoch': 32.0})

# Test a prompt (32 epochs)


In [21]:
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.''',
                         question='What is ESSA?',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)



Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
What is ESSA?

Answer:

-------------------------------------------

Response:
ESSA is a federal law that sets education standards and accountability measures for public schools in the United States. It aims to improve student achievement and ensure equitable access to education.


# Evaluation

* The model fine-tuned for 32 epochs has a nice concise reponse
* It could be there is a "sweet spot" between 4 and 32 epochs

# Save fine-tuned model (only save the LORA adapter weights!)

In [22]:
# TODO: your gdrive path will be different likely
lora_save_path = "/content/drive/MyDrive/Kaggle_X/Mary_ESSA/output/gemma2_essa_ft32/lora"
os.makedirs(os.path.dirname(lora_save_path), exist_ok=True)
model.save_pretrained(lora_save_path)

# Load the saved model



In [23]:
# Remember we fine-tuned from Gemma2 base 2b.  The function loads and merges the LORA adapter weights.
model, tokenizer = load_lora_model("google/gemma-2-2b", lora_save_path)
model = model.to('cuda')

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

# Double-check the loaded model with a test prompt

In [24]:
template = "{pre}\n\nQuestion:\n{question}\n\nAnswer:\n{answer}"
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.''',
                         question='What is ESSA?',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)

Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
What is ESSA?

Answer:

-------------------------------------------

Response:
ESSA is a federal law that sets education standards and accountability measures for public schools in the United States. It aims to improve student achievement and ensure equitable access to education.


# Part D:  RAG

We use RAG techniques to add 'context' to a prompt in order to ground the model in facts from a database

# Test the model with a challenging prompt without RAG context
* the purpose here is to justify our need for RAG

In [25]:
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.''',
                         question='"Does my state still have to test 95 percent of its students?"',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)


Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
"Does my state still have to test 95 percent of its students?"

Answer:

-------------------------------------------

Response:
Yes, ESSA requires states to assess student performance to ensure they meet academic standards.


# Now lets fake RAG 'context' to see how it responds



In [26]:
template = "{pre}\n\nContext:\n{context}\n\nQuestion:\n{question}\n\nAnswer:\n{answer}"
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.'''
                            ''' Use the following context to answer the question below.''',
                         context='''ESSA requires that a state’s accountability system must measure the performance of '''
                                 '''95 percent of students by looking at a variety of indicators. One of the indicators '''
                                 '''is “academic achievement as measured by proficiency on the annual assessments.” '''
                                 '''For this reason, in order to measure the overall achievement of 95 percent of students, '''
                                 '''95 percent must take the annual assessments.''',
                         question='Does my state still have to test 95 percent of its students?',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)

Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act. Use the following context to answer the question below.

Context:
ESSA requires that a state’s accountability system must measure the performance of 95 percent of students by looking at a variety of indicators. One of the indicators is “academic achievement as measured by proficiency on the annual assessments.” For this reason, in order to measure the overall achievement of 95 percent of students, 95 percent must take the annual assessments.

Question:
Does my state still have to test 95 percent of its students?

Answer:

-------------------------------------------

Response:
Yes, ESSA requires that a state’s accountability system must measure the performance of 95 percent of students by looking at a variety of indicators. One of the indicators is “academic achievement as measured by proficiency on the annual assessments.” For this reason, in order to measure the overal

# Evaluation

* The fine-tuned model got the answer right, but...
* The fake-RAG answer was able to cite exactly the context in which to answer the question
* This suggests we should use RAG when possible so that we can ensure the model has an answer grounded with 'truth' from a dataset

# Create the RAG "database" and vector-based retriever

In [27]:
# Chunk the content of the file we are using as the RAG database
# NOTE: your gdrive path to the file is likely different
# !ls /content/drive/MyDrive/Kaggle_X/Mary_ESSA/input/
texts = load_and_split_documents("/content/drive/MyDrive/Kaggle_X/Mary_ESSA/input/ESSA RAG file_10.23.docx")

# Let's add the facts from the fine-tuning dataset too
# NOTE: your gdrive path to the file is likely different
DATASET_PATH='/content/drive/MyDrive/Kaggle_X/Mary_ESSA/input/attempt-930/ESSA qna_csv.csv'
if not os.path.exists(DATASET_PATH):
  raise Exception("Cannot find the dataset")
df = pd.read_csv(DATASET_PATH)
for idx, row in df.iterrows():
  txt = DocumentWithText(row['Question'] + ' ' + row['Answer'])
  texts.append(txt)

print("Number of items=", len(texts))

# Debugging - print all the items
#for txt in texts:
#  print(txt.page_content)

# Create embeddings and vector store
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Create a Chroma vector database from the documents
# Important: Make sure to delete previous db (if any) or else retrieval returns lots of duplicates :)
try:
  db.delete_collection()
except:
  pass
db = Chroma.from_documents(texts, embeddings)

# Create a retriever from the vector database
# NOTE: You need to experiment with retrieval parameters
retriever = db.as_retriever(search_type="similarity_score_threshold", search_kwargs={"k": 2, "score_threshold": 0.5})

Number of items= 284


  embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.7k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]



1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

# Test the RAG retriever on a prompt

In [28]:
docs = retriever.invoke("Does my state still have to test 95 percent of its students?")
print("retreived", len(docs), "documents")
print(docs)

retreived 2 documents
[Document(metadata={}, page_content='Does my state still have to test 95 percent of its students?  ESSA requires that a state’s accountability system must measure the performance of 95 percent of students by looking at a variety of indicators. One of the indicators is “academic achievement as measured by proficiency on the annual assessments.” For this reason, in order to measure the overall achievement of 95 percent of students, 95 percent must take the annual assessments. '), Document(metadata={}, page_content='Are all students required to be tested for accountability purposes?  ESSA affirms states’ authority over the policies governing parents “opting” their children out of state standardized tests, but it maintains the requirement that 95 percent of children in each school, as well as 95 percent of students in each subgroup, be tested. The big change from NCLB is that schools will not be subject to federally prescribed corrective action if they fail to meet th

# Test model with a real RAG-enhanced prompt
* Note that I'm formatting the raw prompt myself
* You could use LangChain to do all the formattting for but I've gotten used to formatting my own prompt :)

In [29]:
# Combine all the retriever docs into a piece of 'context'

context = ''
for doc in docs:
  context += doc.page_content + '\n'

# Test the RAG-based prompt
template = "{pre}\n\nContext:\n{context}\n\nQuestion:\n{question}\n\nAnswer:\n{answer}"
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.'''
                            ''' Use the following context to answer the question below.''',
                         context=context,
                         question='Does my state still have to test 95 percent of its students?',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)



Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act. Use the following context to answer the question below.

Context:
Does my state still have to test 95 percent of its students?  ESSA requires that a state’s accountability system must measure the performance of 95 percent of students by looking at a variety of indicators. One of the indicators is “academic achievement as measured by proficiency on the annual assessments.” For this reason, in order to measure the overall achievement of 95 percent of students, 95 percent must take the annual assessments. 
Are all students required to be tested for accountability purposes?  ESSA affirms states’ authority over the policies governing parents “opting” their children out of state standardized tests, but it maintains the requirement that 95 percent of children in each school, as well as 95 percent of students in each subgroup, be tested. The big change from NCLB is that schools

# Part E:  User Interface

TODO