<a href="https://colab.research.google.com/github/sourcesync/kagglex_gemma/blob/gw%2Finitial/colab/mary_georges_troubleshoot_based_on_Attempt10_21.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install required packages

In [1]:
# Installs
!pip install transformers datasets keras-nlp keras>=3 tensorflow-text huggingface-hub peft langchain_community chromadb sentence-transformers peft


# Import required packages

In [54]:
# Libraries
import os
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments
from transformers import StoppingCriteria, StoppingCriteriaList
from torch.utils.data import DataLoader, Dataset
from huggingface_hub import login
from google.colab import files, userdata
import torch
from torch import nn
from peft import LoraConfig, get_peft_model, TaskType
from tokenizers.processors import TemplateProcessing

# Bind to Hugging Face
* via Colab notebook secrets

In [2]:
hugging_face_api_token = userdata.get("huggingface_api_token_2") # Your HF secret token name may be different
login(token=hugging_face_api_token)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful


# Configure this notebook

In [3]:
os.environ["WANDB_DISABLED"] = "true" # Disable HB trainer WANDB integration

# Define some useful classes/functions
* Mary, I increased the 'max_new_token' from 100 to 256 to reduce chance of completion truncation

In [140]:
# Load model and tokenizer via HF
def load_model_and_tokenizer(model_name):
    model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation='eager')
    tokenizer = AutoTokenizer.from_pretrained(model_name) #, add_eos_token=True)
    return model, tokenizer

# Load tokenizer for fine-tune data
def load_tokenizer_for_ft(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name, add_eos_token=True)
    return tokenizer

# A class to make sure we stop when the EOS token is generated
class StoppingCriteriaSub(StoppingCriteria):
    def __init__(self, stop_id = 1):
      StoppingCriteria.__init__(self),
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, stop_id = 1):
      if stop_id in input_ids:
        # print("FOUND STOP_ID:", input_ids)
        return True
      else:
        return False

# Generate set-up for model response
def generate_response(prompt, device='cuda'):
    # debug - print("input_ids=", encoding.input_ids)
    encoding = tokenizer(prompt, return_tensors='pt').to(device)
    generation_config = model.generation_config
    generation_config.max_new_tokens = 512
    generation_config.temperature = 0.7
    #generation_config.top_p = 0.7
    generation_config.num_return_sequences = 1

    # this will ensure text generation stops at the EOS token
    stopping_criteria = StoppingCriteriaList([StoppingCriteriaSub(stop_id = tokenizer.eos_token_id)  ])
    completion = model.generate(input_ids = encoding.input_ids,
                                attention_mask = encoding.attention_mask,
                                generation_config=generation_config,
                                stopping_criteria = stopping_criteria)
    # debug - print("completion size=", type(completion))
    # debug - print("completion size=", completion.shape)
    # debug - print("completion=", completion)
    response = tokenizer.decode(completion[0], skip_special_tokens=True)
    return response.replace(prompt, "")

# Initialize prompt display
def display_chat(prompt, response):
    print("Prompt:")
    print(prompt)
    print("-------------------------------------------")
    print("\nResponse:")
    print(response)


# Load the base Gemma 2 model for testing

In [141]:
model_name = 'google/gemma-2-2b' # Note we are using Google's official model
model, tokenizer = load_model_and_tokenizer(model_name)
model = model.to('cuda')

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

# Perform a test prompt
* It's the base model, so we don't expect it to do well
* Let's use the prompt we formulated in the Keras experiments (not sure it matters)

In [142]:
template = "{pre}\n\nQuestion:\n{question}\n\nAnswer:\n{answer}"
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.''',
                         question='What is ESSA?',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)



Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
What is ESSA?

Answer:

-------------------------------------------

Response:
ESSA is a federal law that replaced No Child Left Behind. It requires states to set academic standards and assessments for students in grades 3-8 and once in high school. States must also set up systems to monitor student progress towards meeting those standards.

Question:
What are the main goals of ESSA?

Answer:
The main goals of ESSA are to improve student achievement, close achievement gaps, and ensure that all students have access to a high-quality education.

Question:
What are some of the key provisions of ESSA?

Answer:
Some of the key provisions of ESSA include:

-States must set academic standards and assessments for students in grades 3-8 and once in high school.

-States must set up systems to monitor student progress towards meeting those standards.

-States must prov

# Load the Gemma 2 instruction tuned model

In [143]:
model_name = 'google/gemma-2-2b-it' # Note we are using Google's official version
model, tokenizer = load_model_and_tokenizer(model_name)
model = model.to('cuda')

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

# Perform a test prompt
* Let's use the identical prompt we used above on the base model
* We might expect it to do better than the base model

In [144]:
template = "{pre}\n\nQuestion:\n{question}\n\nAnswer:\n{answer}"
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.''',
                         question='What is ESSA?',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)

Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
What is ESSA?

Answer:

-------------------------------------------

Response:
ESSA is the Every Student Succeeds Act, a federal law passed in 2015 that replaced the No Child Left Behind Act. 

Here are some key features of ESSA:

* **Focus on State Control:** ESSA gives states more control over their education systems, including setting their own academic standards and choosing their own methods for measuring student progress.
* **Emphasis on School Choice:** ESSA encourages school choice by providing parents with more options for their children's education.
* **Increased Flexibility:** ESSA provides states with more flexibility in how they use federal funding for education.
* **Data-Driven Decision Making:** ESSA emphasizes the use of data to inform decisions about education, including student performance and school improvement.
* **Support for Students wit

# Create a dataset for fine-tuning

In [145]:
from google.colab import drive
drive.mount('/content/drive')

# Obviously your path will be different
DATASET_PATH='/content/drive/MyDrive/Kaggle_X/Mary_ESSA/input/attempt-930/ESSA qna_csv.csv'
if not os.path.exists(DATASET_PATH):
  raise Exception("Cannot find the dataset")
df = pd.read_csv(DATASET_PATH)
pd.set_option('display.max_colwidth', None)
df.describe()
df.head(5)

# define format of the fine-tuning data
template = "{pre}\n\nQuestion:\n{question}\n\nAnswer:\n{answer}"
pre = '''The following is an excerpt from a conversation of a user with an AI assistant. '''\
      '''The assistant that can answer questions about ESSA. '''\
      '''ESSA stands for the Every Student Succeeds Act.'''

# format each training string, put them all into a list
ft_all_data = []
for idx, row in df.iterrows():
  ft_item = template.format(pre=pre, question=row['Question'], answer=row['Answer'])
  ft_all_data.append(ft_item)

# double-check
print("----")
print(ft_all_data[0])
print("----")
print(ft_all_data[1])
print("----")
print(ft_all_data[2])
print("----")
print(ft_all_data[-1])

# tokenize all the data
tokenizer = load_tokenizer_for_ft("google/gemma-2-2b")
tokenized_ft_data = []
for el in ft_all_data:
  tok_item = tokenizer(el, padding=True, truncation=True)
tokenized_ft_data.append( tok_item )
print("bos=",tokenizer.bos_token_id, "eos=", tokenizer.eos_token_id )
print("----tokenized----")
print(tokenized_ft_data[0])

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
----
The following is an excerpt from a conversation of a user with an AI assistant. The assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
Does my state still have to test 95 percent of its students? 

Answer:
ESSA requires that a state’s accountability system must measure the performance of 95 percent of students by looking at a variety of indicators. One of the indicators is “academic achievement as measured by proficiency on the annual assessments.” For this reason, in order to measure the overall achievement of 95 percent of students, 95 percent must take the annual assessments. 
----
The following is an excerpt from a conversation of a user with an AI assistant. The assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
How do the students (up to 

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


bos= 2 eos= 1
----tokenized----
{'input_ids': [2, 651, 2412, 603, 671, 80545, 774, 476, 12836, 576, 476, 2425, 675, 671, 16481, 20409, 235265, 714, 20409, 674, 798, 3448, 3920, 1105, 62639, 235280, 235265, 62639, 235280, 12353, 604, 573, 7205, 13137, 64795, 17825, 5031, 235265, 109, 9413, 235292, 108, 2299, 3695, 2004, 14561, 590, 7594, 7535, 5913, 7695, 235336, 109, 1261, 235292, 108, 4883, 590, 7594, 708, 3690, 577, 9877, 8897, 37921, 577, 18739, 5913, 7695, 578, 24434, 576, 40126, 235269, 26936, 984, 4664, 573, 15459, 10053, 1142, 11128, 731, 62639, 235280, 235265, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}


# Load the base model for fine-tuning (epochs=1)

In [146]:
model_name = 'google/gemma-2-2b'
model, tokenizer = load_model_and_tokenizer(model_name)
model = model.to('cuda')

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

# Prepare model for LORA fine-tuning (epochs=1)
* from https://huggingface.co/docs/peft/en/quicktour
* and help from https://colab.research.google.com/drive/1IqL0ay04RwNNcn5R7HzhgBqZ2lPhHloh?usp=sharing#scrollTo=i4jK1V20qiac

In [147]:
peft_config = LoraConfig(
  task_type=TaskType.CAUSAL_LM,
  inference_mode=False,
  r=4 # match our keras experiment
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

trainable params: 798,720 || all params: 2,615,140,608 || trainable%: 0.0305


# Fine-tune the HF-way: 1 epoch

In [148]:
training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/Kaggle_X/Mary_ESSA/output/gemma2_essa_ft1",
    learning_rate=2e-4, # match our keras experiments
    per_device_train_batch_size=1, # match our keras experiments
    num_train_epochs=1,
    weight_decay=0.0, # match our keras experiments
    logging_steps=1, # log the loss for each epoch
    report_to=None # don't integrate with WANDB
)

# TODO: should we explicitly set the AdamW optimizer as we used in KERAS?

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ft_data,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

trainer.train()

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


Step,Training Loss
1,2.278


TrainOutput(global_step=1, training_loss=2.2779738903045654, metrics={'train_runtime': 0.8395, 'train_samples_per_second': 1.191, 'train_steps_per_second': 1.191, 'total_flos': 1057215269376.0, 'train_loss': 2.2779738903045654, 'epoch': 1.0})

# Perform a test prompt (epochs=1)

In [149]:
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.''',
                         question='What is ESSA?',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)



Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
What is ESSA?

Answer:

-------------------------------------------

Response:
ESSA is a federal law that replaced No Child Left Behind. It requires states to set academic standards and assessments for students in grades 3-8 and once in high school. States must also set up systems to monitor student progress and provide support for schools that need it.

Question:
What are the main goals of ESSA?

Answer:
The main goals of ESSA are to improve student achievement, close achievement gaps, and provide high-quality education for all students.

Question:
What are some of the key provisions of ESSA?

Answer:
Some of the key provisions of ESSA include:

- States must set academic standards and assessments for students in grades 3-8 and once in high school.
- States must set up systems to monitor student progress and provide support for schools that need it.
- States

# Load the base model for fine-tuning (epochs=4)

In [150]:
model_name = 'google/gemma-2-2b'
model, tokenizer = load_model_and_tokenizer(model_name)
model = model.to('cuda')

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

# Fine-tune for 4 epochs

In [151]:
peft_config = LoraConfig(
  task_type=TaskType.CAUSAL_LM,
  inference_mode=False,
  r=4 # match our keras experiment
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/Kaggle_X/Mary_ESSA/output/gemma2_essa_ft4",
    learning_rate=2e-4, # match our keras experiments
    per_device_train_batch_size=1, # match our keras experiments
    num_train_epochs=4,
    weight_decay=0.0, # match our keras experiments
    logging_steps=1, # log the loss for each epoch
    report_to=None # don't integrate with WANDB
)

# TODO: should we explicitly set the AdamW optimizer as we used in KERAS?

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ft_data,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False
trainer.train()

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


trainable params: 798,720 || all params: 2,615,140,608 || trainable%: 0.0305


Step,Training Loss
1,2.278
2,2.219
3,2.1688
4,2.132


TrainOutput(global_step=4, training_loss=2.1994380950927734, metrics={'train_runtime': 1.4222, 'train_samples_per_second': 2.813, 'train_steps_per_second': 2.813, 'total_flos': 4228861077504.0, 'train_loss': 2.1994380950927734, 'epoch': 4.0})

# Test a prompt (epochs=4)

In [152]:
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.''',
                         question='What is ESSA?',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)



Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
What is ESSA?

Answer:

-------------------------------------------

Response:
ESSA is a federal law that replaced No Child Left Behind. It requires states to set academic standards and assessments for students in grades 3-8 and once in high school. States must also develop plans to help students who are struggling to meet these standards.

Question:
What are the main goals of ESSA?

Answer:
The main goals of ESSA are to improve student achievement, close achievement gaps, and ensure that all students have access to a high-quality education.

Question:
What are some of the key provisions of ESSA?

Answer:
Some of the key provisions of ESSA include:

- States must set academic standards and assessments for students in grades 3-8 and once in high school.
- States must develop plans to help students who are struggling to meet these standards.
- States must repor

# Load model for fine-tuning (epochs=32)

In [153]:
model_name = 'google/gemma-2-2b'
model, tokenizer = load_model_and_tokenizer(model_name)
model = model.to('cuda')

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

# Fine-tune for 32 epochs

In [154]:
peft_config = LoraConfig(
  task_type=TaskType.CAUSAL_LM,
  inference_mode=False,
  r=4 # match our keras experiment
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/Kaggle_X/Mary_ESSA/output/gemma2_essa_ft4",
    learning_rate=2e-4, # match our keras experiments
    per_device_train_batch_size=1, # match our keras experiments
    num_train_epochs=32,
    weight_decay=0.0, # match our keras experiments
    logging_steps=1, # log the loss for each epoch
    report_to=None, # don't integrate with WANDB
)

# TODO: should we explicitly set the AdamW optimizer as we used in KERAS?

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ft_data,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

model.config.use_cache = False
trainer.train()

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


trainable params: 798,720 || all params: 2,615,140,608 || trainable%: 0.0305


Step,Training Loss
1,2.278
2,2.2187
3,2.1532
4,2.0821
5,2.0068
6,1.9277
7,1.8448
8,1.7589
9,1.6712
10,1.5835


TrainOutput(global_step=32, training_loss=1.2505269143730402, metrics={'train_runtime': 5.9287, 'train_samples_per_second': 5.397, 'train_steps_per_second': 5.397, 'total_flos': 33830888620032.0, 'train_loss': 1.2505269143730402, 'epoch': 32.0})

# Test a prompt (32 epochs)


In [155]:
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.''',
                         question='What is ESSA?',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)



Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
What is ESSA?

Answer:

-------------------------------------------

Response:
ESSA is a federal law that sets education standards and accountability measures for public schools in the United States. It aims to improve student achievement and ensure equitable access to education.


# Load model for fine-tuning ( epochs=128 )

In [156]:
model_name = 'google/gemma-2-2b'
model, tokenizer = load_model_and_tokenizer(model_name)
model = model.to('cuda')

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

# Fine-tume for 128 epochs

In [157]:
peft_config = LoraConfig(
  task_type=TaskType.CAUSAL_LM,
  inference_mode=False,
  r=4 # match our keras experiment
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/Kaggle_X/Mary_ESSA/output/gemma2_essa_ft4",
    learning_rate=2e-4, # match our keras experiments
    per_device_train_batch_size=1, # match our keras experiments
    num_train_epochs=128,
    weight_decay=0.0, # match our keras experiments
    logging_steps=1, # log the loss for each epoch
    report_to=None, # don't integrate with WANDB
#    evaluation_strategy= "epoch",
#    save_strategy= "epoch",
#    load_best_model_at_end= True,
#    metric_for_best_model= "eval_loss",
)

# TODO: should we explicitly set the AdamW optimizer as we used in KERAS?

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ft_data,
#    eval_dataset=tokenized_ft_data,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

trainer.train()

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


trainable params: 798,720 || all params: 2,615,140,608 || trainable%: 0.0305


Step,Training Loss
1,2.278
2,2.2186
3,2.1508
4,2.0749
5,1.9921
6,1.9028
7,1.8064
8,1.7032
9,1.5948
10,1.4838


TrainOutput(global_step=128, training_loss=0.2601157678072923, metrics={'train_runtime': 21.3752, 'train_samples_per_second': 5.988, 'train_steps_per_second': 5.988, 'total_flos': 135323554480128.0, 'train_loss': 0.2601157678072923, 'epoch': 128.0})

# Test a prompt (epochs=128)

In [158]:
prompt = template.format(pre='''You are an AI assistant that can answer questions about ESSA.'''
                            ''' ESSA stands for the Every Student Succeeds Act.''',
                         question='What is ESSA?',
                         answer='')
response = generate_response(prompt)
display_chat(prompt, response)



Prompt:
You are an AI assistant that can answer questions about ESSA. ESSA stands for the Every Student Succeeds Act.

Question:
What is ESSA?

Answer:

-------------------------------------------

Response:
ESSA is a federal law that sets high academic standards and assessments for students, ensuring they receive a quality education and meet the necessary benchmarks for success.



# DONT PROCEED WITH REST OF THE NOTEBOOK - STILL TODO


In [None]:
# Initialize prompt display
def display_chat(prompt, response):
    print("Prompt:")
    print(prompt)
    print("\nResponse:")
    print(response)

# Generate set-up for model response
def generate_response(prompt):
    input_ids = tokenizer.encode(prompt, return_tensors='pt')
    completion = model.generate(
        input_ids,
        max_new_tokens=200,  # Increase if necessary
        temperature=0.3,     # Adjust to introduce variability
        top_k=50,            # Optional: control diversity
        top_p=0.9            # Optional: control diversity
    )
    response = tokenizer.decode(completion[0], skip_special_tokens=True)
    return response.replace(prompt, "")

# Second Prompt Example
prompt = "What is ESSA?"
response = generate_response(prompt)
display_chat(prompt, response)



Prompt:
What is ESSA?

Response:

Answer: Title I schools in the Title I schools in the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the


In [None]:
# Load the fine-tuned model for inference
def load_finetuned_model_and_tokenizer(model_path):
    model = AutoModelForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    return model, tokenizer

model_path = './models/gemma_train1'
model, tokenizer = load_finetuned_model_and_tokenizer(model_path)

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [None]:
from peft import get_peft_model, LoraConfig

In [None]:
# Reload dataset
df = pd.read_csv('ESSA qna_csv.csv')
dataset = CustomDataset(df.to_dict(orient='records'), tokenizer)
dataloader = DataLoader(dataset, batch_size=1, shuffle=True)

In [None]:
# Set up LoRA configuration
lora_config = LoraConfig(r=4, lora_alpha=16, lora_dropout=0.1, bias="none")
model = get_peft_model(model, lora_config)

# Define an optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

In [None]:
# Training loop with validation
def train_model(model, dataloader, optimizer, epochs=2):
    model.train()
    for epoch in range(epochs):
        total_loss = 0
        for batch in dataloader:
            input_ids, labels = batch
            optimizer.zero_grad()
            outputs = model(input_ids, labels=labels)
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            total_loss += loss.item()

        print(f"Epoch: {epoch + 1}, Loss: {total_loss / len(dataloader)}")

In [None]:
# Train the model
train_model(model, dataloader, optimizer, epochs=3)

It is strongly recommended to train Gemma2 models with the `eager` attention implementation instead of `sdpa`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.


Epoch: 1, Loss: 3.387894919603178
Epoch: 2, Loss: 3.1342530970526212
Epoch: 3, Loss: 2.968976615679146


In [None]:
# Save model
def save_model_and_tokenizer(model, tokenizer, path='./models/gemma_LoRAfinetuned1'):
    model.save_pretrained(path)
    tokenizer.save_pretrained(path)

save_model_and_tokenizer(model, tokenizer)

In [None]:
  # Initialize prompt display
def display_chat(prompt, response):
    print("Prompt:")
    print(prompt)
    print("\nResponse:")
    print(response)

# Generate set-up for model response
def generate_response(prompt):
    input_ids = tokenizer.encode(prompt, return_tensors='pt')
    completion = model.generate(input_ids, max_new_tokens=100, temperature=0.0)
    response = tokenizer.decode(completion[0], skip_special_tokens=True)
    return response.replace(prompt, "")

# Third Prompt Example
prompt = "What is ESSA?"
response = generate_response(prompt)
display_chat(prompt, response)



Prompt:
What is ESSA?

Response:

Answer: ESSA requires states to ensure accountability and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency,


In [None]:
# Initialize prompt display
def display_chat(prompt, response):
    print("Prompt:")
    print(prompt)
    print("\nResponse:")
    print(response)

# Generate set-up for model response
def generate_response(prompt):
    input_ids = tokenizer.encode(prompt, return_tensors='pt')
    completion = model.generate(
        input_ids,
        max_new_tokens=200,  # Increase if necessary
        temperature=0.7,     # Adjust to introduce variability
        top_k=50,            # Optional: control diversity
        top_p=0.9            # Optional: control diversity
    )
    response = tokenizer.decode(completion[0], skip_special_tokens=True)
    return response.replace(prompt, "")

# Fourth Prompt Example
prompt = "What is ESSA?"
response = generate_response(prompt)
display_chat(prompt, response)



Prompt:
What is ESSA?

Response:

Answer: ESSA requires states to ensure accountability and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational ag

In [None]:
# RAG Implementation
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import HuggingFacePipeline

In [None]:
uploaded = files.upload()

Saving ESSA RAG file_10.21.docx to ESSA RAG file_10.21.docx


In [None]:
class DocumentWithText:
    def __init__(self, content, metadata=None):
        self.page_content = content
        self.metadata = metadata if metadata is not None else {}

In [None]:
!pip install python-docx
from docx import Document

# Load and split context documents
def load_and_split_documents(file_path):
    # Load the Word document
    doc = Document(file_path)
    documents = [DocumentWithText(paragraph.text) for paragraph in doc.paragraphs if paragraph.text]

    # Here you can choose how to split the text
    text_splitter = CharacterTextSplitter(chunk_size=9000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)
    return texts

# Load the Word document
texts = load_and_split_documents("ESSA RAG file_10.21.docx")



In [None]:
# Create embeddings and vector store
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = Chroma.from_documents(texts, embeddings)
retriever = db.as_retriever()

  embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.7k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]



1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [None]:
from langchain.prompts import PromptTemplate
# Create a prompt template for RAG
template = """Use the following information to answer the question:
{context}
Question: {question}
Answer:"""
prompt_template = PromptTemplate(template=template, input_variables=["context", "question"])


In [None]:
from transformers import pipeline, GenerationConfig
# Create the RetrievalQA chain
llm = HuggingFacePipeline(pipeline=pipeline("text2text-generation", model=model, tokenizer=tokenizer,
                                           generation_config=GenerationConfig(max_new_tokens=256))) # Added generation_config
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt_template}
)

The model 'PeftModel' is not supported for text2text-generation. Supported models are ['BartForConditionalGeneration', 'BigBirdPegasusForConditionalGeneration', 'BlenderbotForConditionalGeneration', 'BlenderbotSmallForConditionalGeneration', 'EncoderDecoderModel', 'FSMTForConditionalGeneration', 'GPTSanJapaneseForConditionalGeneration', 'LEDForConditionalGeneration', 'LongT5ForConditionalGeneration', 'M2M100ForConditionalGeneration', 'MarianMTModel', 'MBartForConditionalGeneration', 'MT5ForConditionalGeneration', 'MvpForConditionalGeneration', 'NllbMoeForConditionalGeneration', 'PegasusForConditionalGeneration', 'PegasusXForConditionalGeneration', 'PLBartForConditionalGeneration', 'ProphetNetForConditionalGeneration', 'SeamlessM4TForTextToText', 'SeamlessM4Tv2ForTextToText', 'SwitchTransformersForConditionalGeneration', 'T5ForConditionalGeneration', 'UMT5ForConditionalGeneration', 'XLMProphetNetForConditionalGeneration'].
  llm = HuggingFacePipeline(pipeline=pipeline("text2text-generat

In [None]:
# Example RAG query
query = "What is ESSA?"
result = qa_chain({"query": query})

# Print the entire result for debugging
print(result)

# Access the 'result' key directly
retrieved_text = result['result'] if 'result' in result else "No context available."

# Create the prompt using the retrieved text
prompt = f"Use the following information to answer the question:\n{retrieved_text}\nQuestion: {query}\nAnswer:"

# Generate the response using the prompt
response = generate_response(prompt)

# Display the question and answer
print("Question:", query)
print("Answer:", response)

  result = qa_chain({"query": query})


{'query': 'What is ESSA?', 'result': "Use the following information to answer the question:\nLanguage Instruction for English learners:  An LEA using ESSA  funds to provide a language instruction educational program, not later than 30 days after the beginning of the school year, inform parents of an English learner identified for participation or participating in such a program.  For a child who has not been identified as an English learner prior to the beginning of the school year but is identified as an English learner during such school year, an LEA must notify the child's parents during the first two weeks of the child being placed in a language instruction educational program.\n\nESSA covers the following grants below. Our K-12 schools are probably most familiar with the Consolidation Application grants for Title I, II, and IV; EL grant or Title III. Other ESSA grants that are a little less common include RLIS, 21st CCLC, N&D, Migrant, and McKinney Vento.:\n\nParental participatio



Question: What is ESSA?
Answer:  The federal government remains to ensure accountability systems. The state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency. The federal government government remains committed to ensure accountability and the state educational agency. The federal government remains committed to ensure accountability and the state educational agency. The federal government remains committed to ensure accountability and the state educational agency. The federal gover

In [None]:
# RAG query 2
query = "What are State responsibilites for developing academic standards?"
result = qa_chain({"query": query})

# Print the entire result for debugging
print(result)

# Access the 'result' key directly
retrieved_text = result['result'] if 'result' in result else "No context available."

# Create the prompt using the retrieved text
prompt = f"Use the following information to answer the question:\n{retrieved_text}\nQuestion: {query}\nAnswer:"

# Generate the response using the prompt
response = generate_response(prompt)

# Display the question and answer
print("Question:", query)
print("Answer:", response)

{'query': 'What are State responsibilites for developing academic standards?', 'result': 'Use the following information to answer the question:\n§\u2009200.1 State responsibilities for developing challenging academic standards.\n\n(a)\xa0Academic standards in general.\xa0 A State must adopt challenging academic content standards and aligned academic achievement standards that will be used by the State, its local educational agencies (LEAs), and its schools to carry out this subpart. These academic standards must be the same state academic content standards and aligned academic achievement standards that the State applies to all public schools and public school students in the State, including the public schools and public school students served under this subpart\n\nEach State, in consultation with its LEAs, must implement a system of high-quality, yearly student academic assessments that include, at a minimum, academic assessments in mathematics, reading/language arts, and science.\n\



Question: What are State responsibilites for developing academic standards?
Answer:  The federal government remains committed to ensure accountability and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency, and the state educational agency. The federal government remains committed to ensure accountability systems. The state educational agency, and the state educational agency, and the state educational agen