# Overview

---


This project implements a question answering (QA) system built on Retrieval-Augmented Generation (RAG) language models. It combines document retrieval with text generation so that answers can draw on context from a knowledge source. The notebook below prepares a dataset of question-answer pairs from NVIDIA documentation, fine-tunes `google/flan-t5-base` on it, evaluates the result on a held-out test set, and runs sample inference.



In [None]:
!pip install opendatasets datasets accelerate rouge_score --quiet
import opendatasets as od
od.download("https://www.kaggle.com/datasets/gondimalladeepesh/nvidia-documentation-question-and-answer-pairs")

In [None]:
import pandas as pd
from datasets import Dataset
import datasets
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import re

# Use the GPU when one is available, otherwise fall back to the CPU.
# On a Mac, change 'cuda' to 'mps'.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

**Data Acquisition:**
The dataset of question-answer pairs drawn from NVIDIA documentation is hosted on Kaggle and was downloaded with `opendatasets`. Only the `question` and `answer` columns are loaded from the CSV file.


In [None]:
data = pd.read_csv("/content/nvidia-documentation-question-and-answer-pairs/NvidiaDocumentationQandApairs.csv")[["question", "answer"]]
print(data.shape)
data.head()

(7108, 2)


| | question | answer |
|---|---|---|
| 0 | What is Hybridizer? | Hybridizer is a compiler from Altimesh that en... |
| 1 | How does Hybridizer generate optimized code? | Hybridizer uses decorated symbols to express p... |
| 2 | What are some parallelization patterns mention... | The text mentions using parallelization patter... |
| 3 | How can you benefit from accelerators without ... | You can benefit from accelerators' compute hor... |
| 4 | What is an example of using Hybridizer? | An example in the text demonstrates using Para... |


**Data Cleaning and Preprocessing:**
Implemented a cleaning function that lowercases the text and replaces every run of non-alphanumeric characters with a single space, then applied it to both questions and answers. The cleaned dataset is split into training, validation, and test sets (70/15/15) for model training and evaluation.

In [None]:
def clean_text(text):
  # Lowercase, then replace each run of non-alphanumeric characters with one space
  text = text.lower()
  text = re.sub(r'[^A-Za-z0-9]+', ' ', text)
  return text

data['question'] = data['question'].apply(clean_text)
data['answer'] = data['answer'].apply(clean_text)
data.head()

| | question | answer |
|---|---|---|
| 0 | what is hybridizer | hybridizer is a compiler from altimesh that en... |
| 1 | how does hybridizer generate optimized code | hybridizer uses decorated symbols to express p... |
| 2 | what are some parallelization patterns mention... | the text mentions using parallelization patter... |
| 3 | how can you benefit from accelerators without ... | you can benefit from accelerators compute hors... |
| 4 | what is an example of using hybridizer | an example in the text demonstrates using para... |
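To make the effect of the cleaning step concrete, here is a small standalone sketch that applies the same regular expression to made-up strings. The sample inputs and the `clean_text_stripped` variant are illustrative assumptions, not part of the notebook. It shows that punctuation inside tokens (for example version numbers) becomes a space and that trailing punctuation leaves a trailing space.

```python
import re

def clean_text(text):
    """Same rule as the notebook: lowercase, then collapse non-alphanumeric runs to one space."""
    text = text.lower()
    return re.sub(r'[^A-Za-z0-9]+', ' ', text)

def clean_text_stripped(text):
    """Hypothetical variant that also trims the space left behind by edge punctuation."""
    return clean_text(text).strip()

# Made-up inputs chosen to show what information the cleaning step discards.
samples = ["What is CUDA 11.2?", "C++ and CUDA-X"]
for sample in samples:
    print(repr(clean_text(sample)), repr(clean_text_stripped(sample)))
```

Whether the lost punctuation matters depends on the questions asked; for technical text, tokens such as version numbers or `C++` no longer survive intact.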


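**Model Training:**
Split the cleaned data, loaded `google/flan-t5-base` with its tokenizer, and tokenized each split. Defined the training parameters (`TrainingArguments`), including per-device batch size, learning rate, and number of epochs, and used `Trainer` from Hugging Face Transformers to fine-tune the model on the tokenized datasets.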

In [None]:
train = data.sample(frac=0.7, random_state=7)  # Training set: a random 70% of the data
test = data.drop(train.index)  # Remaining 30% of the data

val = test.sample(frac=0.5, random_state=7)  # Validation set: half of the remainder (15% of the data)
test = test.drop(val.index)  # Test set: the other half of the remainder (15% of the data)

In [None]:
print("Training Shape: ", train.shape)
print("Validation Shape: ", val.shape)
print("Testing Shape: ", test.shape)

Training Shape:  (4976, 2)
Validation Shape:  (1066, 2)
Testing Shape:  (1066, 2)
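The split sizes printed above can be reproduced with plain arithmetic. This is a minimal sketch under the assumption that `DataFrame.sample(frac=...)` draws `round(frac * len)` rows; the function name `split_sizes` is hypothetical.

```python
def split_sizes(n_rows, train_frac=0.7, val_frac_of_rest=0.5):
    """Return (train, val, test) row counts for a two-stage sample/drop split."""
    # Assumption: pandas rounds frac * len to the nearest integer when sampling.
    n_train = round(train_frac * n_rows)
    rest = n_rows - n_train
    n_val = round(val_frac_of_rest * rest)
    n_test = rest - n_val
    return n_train, n_val, n_test

print(split_sizes(7108))  # (4976, 1066, 1066)
```

The result is a 70/15/15 split of the 7,108 rows, matching the shapes reported by the notebook.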


In [None]:
def get_model_tokenizer(model_name = "google/flan-t5-base"):
  original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.float32)
  tokenizer = AutoTokenizer.from_pretrained(model_name)

  return original_model, tokenizer

In [None]:
model, tokenizer = get_model_tokenizer()

In [None]:
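# Load the ROUGE metric. Note: `datasets.load_metric` is deprecated in favor of the
# separate `evaluate` library, and this metric is not passed to the Trainer below.
rouge_metric = datasets.load_metric("rouge")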

In [None]:
def tokenize_function(example):
    # Wrap each question in the prompt template used for training
    start_prompt = '\n\n'
    end_prompt = '\n\nAnswer: '
    prompt = [start_prompt + question + end_prompt for question in example["question"]]
    # Pad or truncate both inputs and targets to a fixed length of 200 tokens
    example['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt", max_length=200).input_ids
    example['labels'] = tokenizer(example["answer"], padding="max_length", truncation=True, return_tensors="pt", max_length=200).input_ids

    return example
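One detail worth noting about the labels: they are padded to 200 tokens, and a common practice with Hugging Face seq2seq models is to replace the padding positions in the labels with -100 so that the loss ignores them. The notebook does not do this. The sketch below only illustrates the idea on plain Python lists; the helper name `mask_padding` is hypothetical, the token ids are made up, and the pad id of 0 is an assumption that should be checked against `tokenizer.pad_token_id`.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def mask_padding(label_ids, pad_token_id=0):
    """Replace padding ids in each label sequence with IGNORE_INDEX."""
    return [
        [IGNORE_INDEX if token == pad_token_id else token for token in sequence]
        for sequence in label_ids
    ]

# Toy batch: two label sequences padded with 0 to length 6 (ids are made up).
labels = [[71, 1525, 1, 0, 0, 0], [363, 19, 3, 9, 1, 0]]
print(mask_padding(labels))
```

With masking in place, the reported loss reflects only real answer tokens, so it is not directly comparable to a loss computed over padded positions.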

In [None]:
train_data = Dataset.from_pandas(train)
train_tokenized_datasets = train_data.map(tokenize_function, batched=True)
train_tokenized_datasets = train_tokenized_datasets.remove_columns(['question', 'answer',])


val_data = Dataset.from_pandas(val)
val_tokenized_datasets = val_data.map(tokenize_function, batched=True)
val_tokenized_datasets = val_tokenized_datasets.remove_columns(['question', 'answer',])


test_data = Dataset.from_pandas(test)
test_tokenized_datasets = test_data.map(tokenize_function, batched=True)
test_tokenized_datasets = test_tokenized_datasets.remove_columns(['question', 'answer',])

In [None]:
EPOCHS = 1
LR = 1e-4
BATCH_SIZE = 2

training_path = "./training_path_nvidia_chatbot"

training_args = TrainingArguments(
    output_dir = training_path,
    overwrite_output_dir = True,
    per_device_train_batch_size = BATCH_SIZE,
    per_device_eval_batch_size = BATCH_SIZE,
    learning_rate = LR,
    num_train_epochs = EPOCHS,
    evaluation_strategy = "epoch",  # validate at the end of each epoch (newer transformers releases name this `eval_strategy`)
    save_total_limit = 2
    )

trainer= Trainer(
    model = model,
    args = training_args,
    train_dataset = train_tokenized_datasets,
    eval_dataset = val_tokenized_datasets,
)

trainer.train()

model_path = "./nvidia-chatbot-final-model"

trainer.model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 0.4863 | 0.43934 |


('./nvidia-chatbot-final-model/tokenizer_config.json',
 './nvidia-chatbot-final-model/special_tokens_map.json',
 './nvidia-chatbot-final-model/spiece.model',
 './nvidia-chatbot-final-model/added_tokens.json',
 './nvidia-chatbot-final-model/tokenizer.json')
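For a sense of scale, the number of batches per pass follows directly from the dataset sizes and the batch size of 2. This is a minimal sketch assuming a single device and no gradient accumulation; `steps_per_epoch` is a hypothetical helper, not a Transformers function.

```python
import math

def steps_per_epoch(n_samples, per_device_batch_size, n_devices=1):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / (per_device_batch_size * n_devices))

print(steps_per_epoch(4976, 2))  # training set: 2488
print(steps_per_epoch(1066, 2))  # validation or test set: 533
```

A larger batch size would cut the step count proportionally, at the cost of more memory per step.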

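**Evaluation and Metrics:**
Evaluated the fine-tuned model on the held-out test set with `trainer.evaluate`, which reports the evaluation loss. The ROUGE metric is loaded, but no `compute_metrics` function is passed to the `Trainer`, so the results contain no ROUGE scores. Predictions for a sample question are generated in the inference section to check the model's answers.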

In [None]:
rouge_metric = datasets.load_metric("rouge")
eval_results = trainer.evaluate(eval_dataset=test_tokenized_datasets)


In [None]:
print(eval_results)

{'eval_loss': 0.43840309977531433, 'eval_runtime': 45.0175, 'eval_samples_per_second': 23.68, 'eval_steps_per_second': 11.84, 'epoch': 1.0}
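The results above contain only the evaluation loss. To illustrate what a ROUGE-1 score measures, here is a simplified, dependency-free sketch of unigram-overlap precision, recall, and F1. It is not the `rouge_score` implementation (it uses whitespace tokenization and no stemming), and the example strings are made up.

```python
from collections import Counter

def rouge1(prediction, reference):
    """Simplified ROUGE-1: clipped unigram overlap between prediction and reference."""
    pred_counts = Counter(prediction.split())
    ref_counts = Counter(reference.split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

# Made-up prediction/reference pair in the notebook's cleaned, lowercase style.
scores = rouge1("cuda nsight is a profiling tool",
                "nsight is a tool for profiling cuda code")
print(scores)
```

Here all six predicted words appear in the eight-word reference, giving precision 1.0 and recall 0.75. Scoring the test set this way would require generating an answer for each test question and comparing it with the reference answer.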


**Deployment and Inference:**
Saved the trained model and tokenizer for future deployment. Implemented inference code that takes a question as input and returns the model's answer.

In [None]:
test_text = "what is cuda nsight?"
# Reload the fine-tuned model and tokenizer from disk
trained_model, tokenizer = get_model_tokenizer(model_path)

tokenized_test_text = tokenizer(test_text, return_tensors='pt')
# num_beams=1 selects greedy decoding; generate at most 200 new tokens
model_output = trained_model.generate(
    tokenized_test_text.input_ids,
    generation_config=GenerationConfig(max_new_tokens=200, num_beams=1),
)[0]
final_output = tokenizer.decode(model_output, skip_special_tokens=True)
print(final_output)

cuda nsight is a gpu based nsight toolkit that provides a comprehensive overview of the cuda gpu architecture and provides a comprehensive overview of the various features and capabilities of cuda nsight 


In [None]:
test_text = "what is cuda nsight?"

# Reuse the in-memory model fine-tuned by the Trainer; inputs must be on the same device as the model
tokenized_test_text = tokenizer(test_text, return_tensors='pt').to(device)
model_output = model.generate(
    tokenized_test_text.input_ids,
    generation_config=GenerationConfig(max_new_tokens=200, num_beams=1),
)[0]
final_output = tokenizer.decode(model_output, skip_special_tokens=True)
print(final_output)

cuda nsight is a gpu based nsight toolkit that provides a comprehensive overview of the cuda gpu architecture and provides a comprehensive overview of the various features and capabilities of cuda nsight 
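The inference cells pass the raw question to the tokenizer, whereas training inputs were cleaned and wrapped in a prompt template. A minimal sketch of formatting an inference question the same way is shown below; `build_inference_prompt` is a hypothetical helper, and whether it changes the answers has not been verified here.

```python
import re

START_PROMPT = '\n\n'
END_PROMPT = '\n\nAnswer: '

def clean_text(text):
    """Same cleaning rule as the training data: lowercase, collapse non-alphanumeric runs."""
    text = text.lower()
    return re.sub(r'[^A-Za-z0-9]+', ' ', text)

def build_inference_prompt(question):
    """Format a question the way training inputs were built in tokenize_function."""
    return START_PROMPT + clean_text(question) + END_PROMPT

prompt = build_inference_prompt("what is cuda nsight?")
print(repr(prompt))
```

The resulting `prompt` string could be passed to the tokenizer in place of `test_text` in either inference cell.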
