<a href="https://colab.research.google.com/github/sayid-alt/eleutherai-finetuned-nvidia-faq-llm/blob/main/Training_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project: Portfolio - Final Project

**Instructions for Students:**

Please carefully follow these steps to complete and submit your assignment:

1. **Completing the Assignment**: You are required to work on and complete all tasks in the provided assignment. Be disciplined and ensure that you thoroughly engage with each task.
   
2. **Creating a Google Drive Folder**: If you don't previously have a folder for collecting assignments, you must create a new folder in your Google Drive. This will be a repository for all your completed assignment files, helping you keep your work organized and easy to access.
   
3. **Uploading Completed Assignment**: Upon completion of your assignment, make sure to upload all necessary files, involving codes, reports, and related documents into the created Google Drive folder. Save this link in the 'Student Identity' section and also provide it as the last parameter in the `submit` function that has been provided.
   
4. **Sharing Folder Link**: You're required to share the link to your assignment Google Drive folder. This is crucial for the submission and evaluation of your assignment.
   
5. **Setting Permission toPublic**: Please make sure your **Google Drive folder is set to public**. This allows your instructor to access your solutions and assess your work correctly.

Adhering to these procedures will facilitate a smooth assignment process for you and the reviewers.

**Description:**

Welcome to your final portfolio project assignment for AI Bootcamp. This is your chance to put all the skills and knowledge you've learned throughout the bootcamp into action by creating real-world AI application.

You have the freedom to create any application or model, be it text-based or image-based or even voice-based or multimodal.

To get you started, here are some ideas:

1. **Sentiment Analysis Application:** Develop an application that can determine sentiment (positive, negative, neutral) from text data like reviews or social media posts. You can use Natural Language Processing (NLP) libraries like NLTK or TextBlob, or more advanced pre-trained models from transformers library by Hugging Face, for your sentiment analysis model.

2. **Chatbot:** Design a chatbot serving a specific purpose such as customer service for a certain industry, a personal fitness coach, or a study helper. Libraries like ChatterBot or Dialogflow can assist in designing conversational agents.

3. **Predictive Text Application:** Develop a model that suggests the next word or sentence similar to predictive text on smartphone keyboards. You could use the transformers library by Hugging Face, which includes pre-trained models like GPT-2.

4. **Image Classification Application:** Create a model to distinguish between different types of flowers or fruits. For this type of image classification task, pre-trained models like ResNet or VGG from PyTorch or TensorFlow can be utilized.

5. **News Article Classifier:** Develop a text classification model that categorizes news articles into predefined categories. NLTK, SpaCy, and sklearn are valuable libraries for text pre-processing, feature extraction, and building classification models.

6. **Recommendation System:** Create a simplified recommendation system. For instance, a book or movie recommender based on user preferences. Python's Surprise library can assist in building effective recommendation systems.

7. **Plant Disease Detection:** Develop a model to identify diseases in plants using leaf images. This project requires a good understanding of convolutional neural networks (CNNs) and image processing. PyTorch, TensorFlow, and OpenCV are all great tools to use.

8. **Facial Expression Recognition:** Develop a model to classify human facial expressions. This involves complex feature extraction and classification algorithms. You might want to leverage deep learning libraries like TensorFlow or PyTorch, along with OpenCV for processing facial images.

9. **Chest X-Ray Interpretation:** Develop a model to detect abnormalities in chest X-ray images. This task may require understanding of specific features in such images. Again, TensorFlow and PyTorch for deep learning, and libraries like SciKit-Image or PIL for image processing, could be of use.

10. **Food Classification:** Develop a model to classify a variety of foods such as local Indonesian food. Pre-trained models like ResNet or VGG from PyTorch or TensorFlow can be a good starting point.

11. **Traffic Sign Recognition:** Design a model to recognize different traffic signs. This project has real-world applicability in self-driving car technology. Once more, you might utilize PyTorch or TensorFlow for the deep learning aspect, and OpenCV for image processing tasks.

**Submission:**

Please upload both your model and application to Huggingface or your own Github account for submission.

**Presentation:**

You are required to create a presentation to showcase your project, including the following details:

- The objective of your model.
- A comprehensive description of your model.
- The specific metrics used to measure your model's effectiveness.
- A brief overview of the dataset used, including its source, pre-processing steps, and any insights.
- An explanation of the methodology used in developing the model.
- A discussion on challenges faced, how they were handled, and your learnings from those.
- Suggestions for potential future improvements to the model.
- A functioning link to a demo of your model in action.

**Grading:**

Submissions will be manually graded, with a select few given the opportunity to present their projects in front of a panel of judges. This will provide valuable feedback, further enhancing your project and expanding your knowledge base.

Remember, consistent practice is the key to mastering these concepts. Apply your knowledge, ask questions when in doubt, and above all, enjoy the process. Best of luck to you all!


In [1]:
# @title #### Student Identity
student_id = "" # @param {type:"string"}
name = "" # @param {type:"string"}
drive_link = "https://drive.google.com/drive/folders/14JDulM3A92jwGTqiJ_chnXmrsfhZd8Lt?usp=sharing"  # @param {type:"string"}
assignment_id = "00_portfolio_project"

## Installation and Import `rggrader` Package

In [None]:
%pip install rggrader
from rggrader import submit_image
from rggrader import submit

Collecting rggrader
  Downloading rggrader-0.1.6-py3-none-any.whl.metadata (485 bytes)
Downloading rggrader-0.1.6-py3-none-any.whl (2.5 kB)
Installing collected packages: rggrader
Successfully installed rggrader-0.1.6


## Working Space

In [3]:
!pip install transformers datasets accelerate -q

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/491.2 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m491.2/491.2 kB[0m [31m17.1 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/116.3 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m116.3/116.3 kB[0m [31m8.0 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/183.9 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m183.9/183.9 kB[0m [31m13.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m143.5/143.5 kB[0m [31m9.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m363.4/363.4 MB[0m [31m1.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [6]:
import pandas as pd
import requests
import zipfile
import os
import torch
import tensorflow as tf

from pprint import pprint
from datasets import Dataset, load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
# from transformers import GPTNeoXForCausalLM, AutoTokenizer

from datasets import load_dataset
import logging
import torch
import wandb
from transformers import TrainingArguments, Trainer
from transformers import AutoTokenizer, AutoModelForCausalLM



In [7]:

#@title **Load Pretrained Model**
pretrained_model = 'EleutherAI/pythia-1b'
tokenizer = AutoTokenizer.from_pretrained(pretrained_model)
base_model = AutoModelForCausalLM.from_pretrained(pretrained_model)

dataset_hf_name = f"nvidia-faq-{pretrained_model.split('/')[0].lower()}-fine-tuned"

# @title Setup Training
model_finetuned_name = f"{pretrained_model.split('/')[0]}-{pretrained_model.split('/')[1]}-finetuned-nvidia-faq"
output_dir = model_finetuned_name

print(f'Finetuned Model Name: {model_finetuned_name}')
print(f'dataset hf Name: {dataset_hf_name}')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/396 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.11M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/99.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/569 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.09G [00:00<?, ?B/s]

Finetuned Model Name: EleutherAI-pythia-1b-finetuned-nvidia-faq
dataset hf Name: nvidia-faq-eleutherai-fine-tuned


In [None]:
# @title **Logging To Hugging Face**
!pip install huggingface_hub

from huggingface_hub import notebook_login

# login to hugging face
notebook_login()



VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [9]:
# @title **Load Data**
def load_nvidia_faq_data(url, zip_path='nvidia_faq.zip', extract_dir='nvidia_faq'):
    # Download the ZIP file from the URL
    response = requests.get(url)
    with open(zip_path, 'wb') as f:
        f.write(response.content)
    print(f"Downloaded {zip_path}")

    # Unzip the file
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(extract_dir)
    print(f"Extracted to {extract_dir}")

    # Find the CSV file inside the extracted folder
    csv_files = [f for f in os.listdir(extract_dir) if f.endswith('.csv')]
    if not csv_files:
        raise FileNotFoundError("No CSV file found in the extracted content.")

    # Load the first CSV file found
    csv_path = os.path.join(extract_dir, csv_files[0])
    data = pd.read_csv(csv_path)
    print(f"Loaded data from {csv_path}")

    return data

# URL to the Nvidia FAQ zip file (replace with the actual URL)
url = 'https://github.com/sayid-alt/eleutherai-finetuned-nvidia-faq-llm/raw/main/datasets/NvidiaDocumentationQandApairs.zip'

dataset = load_nvidia_faq_data(url)
dataset = dataset[['question', 'answer']]
display(dataset)

Downloaded nvidia_faq.zip
Extracted to nvidia_faq
Loaded data from nvidia_faq/NvidiaDocumentationQandApairs.csv


Unnamed: 0,question,answer
0,What is Hybridizer?,Hybridizer is a compiler from Altimesh that en...
1,How does Hybridizer generate optimized code?,Hybridizer uses decorated symbols to express p...
2,What are some parallelization patterns mention...,The text mentions using parallelization patter...
3,How can you benefit from accelerators without ...,You can benefit from accelerators' compute hor...
4,What is an example of using Hybridizer?,An example in the text demonstrates using Para...
...,...,...
7103,What is the focus of the GTC event in 2015?,The focus of the GTC event in 2015 is GPU code...
7104,How were the main changes made to the code for...,"The main changes included merging kernels, reg..."
7105,What are some key fields in the cudaDeviceProp...,"Some key fields include name, memoryClockRate,..."
7106,What did changing the kernel approach achieve ...,Changing the kernel approach reduced the itera...


### **Data Preparation**

In [None]:
# @title **Preparing Finetuning Dataset**
# prompt template
prompt_template = """### Question:
{question}

### Answer:"""

# array for storing question answer data
finetuning_dataset = []
for i in range(len(dataset)):
  question = dataset.iloc[i]['question']
  answer = dataset.iloc[i]['answer']
  text_with_prompt_template = prompt_template.format(question=question)
  finetuning_dataset.append({
      "question": text_with_prompt_template,
      "answer": answer
  })

finetuning_dataset = Dataset.from_list(finetuning_dataset)
finetuning_dataset

Dataset({
    features: ['question', 'answer'],
    num_rows: 7108
})

In [None]:
sample_text = finetuning_dataset['question'][0] + finetuning_dataset['answer'][0]
sample_tokenized = tokenizer(sample_text, return_tensors='pt')
sample_tokenized['input_ids'][0]

tensor([ 4118, 19782,    27,   187,  1276,   310, 42813,  6081,    32,   187,
          187,  4118, 37741,    27, 17151,  7807,  6081,   310,   247, 17963,
          432,  1219,  3181,    73,   326, 13276, 10717, 24720,    84,   285,
        17308,  2392,   970,   330,     4,  2127,   390,   964, 11502, 14184,
           15])

In [None]:
# @title Tokenize Dataset

def tokenize_function(examples):
  text = examples["question"][0] + examples["answer"][0]

  tokenizer.pad_token = tokenizer.eos_token
  tokenizer.truncation_side = 'left'
  tokenized_input = tokenizer(
      text,
      padding='max_length',
      truncation=True,
      max_length=512,
      return_tensors='pt'
  )

  return tokenized_input


# tokenize dataset
tokenized_dataset = finetuning_dataset.map(
    lambda x: tokenize_function(x),
    batched=True,
    batch_size=1,
    drop_last_batch=True,
    # remove_columns=['question', 'answer']
)

Map:   0%|          | 0/7108 [00:00<?, ? examples/s]

In [None]:
tokenized_dataset = tokenized_dataset.add_column("labels", tokenized_dataset["input_ids"])

In [None]:
# @title Split Dataset
split_dataset = tokenized_dataset.train_test_split(test_size=0.2, seed=25)
train_dataset = split_dataset['train']
test_dataset = split_dataset['test']

train_dataset, test_dataset

(Dataset({
     features: ['question', 'answer', 'input_ids', 'attention_mask', 'labels'],
     num_rows: 5686
 }),
 Dataset({
     features: ['question', 'answer', 'input_ids', 'attention_mask', 'labels'],
     num_rows: 1422
 }))

In [None]:
# check if all size inputs are the same length
len(train_dataset['input_ids'][5]) == len(train_dataset['input_ids'][10])

True

In [None]:
example_encoded = train_dataset['input_ids'][0]
example_decoded = tokenizer.decode(example_encoded, skip_special_tokens=True)

print(example_encoded, '\n', example_decoded)

[4118, 19782, 27, 187, 7371, 10717, 11515, 403, 4516, 407, 10672, 267, 11804, 32, 187, 187, 4118, 37741, 27, 49886, 267, 11804, 8525, 247, 4618, 2491, 273, 10717, 11515, 13, 1690, 8595, 13, 16872, 13, 13814, 13, 21521, 13, 416, 13, 285, 625, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

In [None]:
# pushing
split_dataset.push_to_hub(dataset_hf_name)

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/6 [00:00<?, ?ba/s]

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/2 [00:00<?, ?ba/s]

CommitInfo(commit_url='https://huggingface.co/datasets/paacamo/nvidia-faq-eleutherai-fine-tuned/commit/c9e51b89911f622a5edffd1d0fdcc19c340d401a', commit_message='Upload dataset', commit_description='', oid='c9e51b89911f622a5edffd1d0fdcc19c340d401a', pr_url=None, repo_url=RepoUrl('https://huggingface.co/datasets/paacamo/nvidia-faq-eleutherai-fine-tuned', endpoint='https://huggingface.co', repo_type='dataset', repo_id='paacamo/nvidia-faq-eleutherai-fine-tuned'), pr_revision=None, pr_num=None)

### Training Data

In [None]:
!pip install wandb -q

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


In [None]:
# # !wandb login

# # Login to wandb using kaggle notebook
# from kaggle_secrets import UserSecretsClient
# user_secrets = UserSecretsClient()
# secret_value_0 = user_secrets.get_secret("wandb_api_key")

In [None]:
%env WANDB_PROJECT=eleutherai-nvidia-faq-fine-tuned
%env WANDB_WATCH=true
%env WANDB_LOG_MODEL=end

env: WANDB_PROJECT=eleutherai-nvidia-faq-fine-tuned
env: WANDB_WATCH=true
env: WANDB_LOG_MODEL=end


In [None]:
# @title Load Dataset
dataset_path_hf = f'paacamo/{dataset_hf_name}'
dataset = load_dataset(dataset_path_hf)

train_dataset = dataset['train'].map(remove_columns=(['question', 'answer'])) #use this for deleted some columns
test_dataset = dataset['test'].map(remove_columns=(['question', 'answer']))

train_dataset, test_dataset

README.md:   0%|          | 0.00/556 [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/1.98M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/506k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/5686 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1422 [00:00<?, ? examples/s]

Map:   0%|          | 0/5686 [00:00<?, ? examples/s]

Map:   0%|          | 0/1422 [00:00<?, ? examples/s]

(Dataset({
     features: ['input_ids', 'attention_mask', 'labels'],
     num_rows: 5686
 }),
 Dataset({
     features: ['input_ids', 'attention_mask', 'labels'],
     num_rows: 1422
 }))

In [None]:
!nvidia-smi

Sat Apr  5 22:59:02 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla P100-PCIE-16GB           Off |   00000000:00:04.0 Off |                    0 |
| N/A   34C    P0             26W /  250W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [None]:
device_count = torch.cuda.device_count()
print(device_count)
if device_count > 0:
  device = torch.device("cuda")
else:
  device = torch.device("cpu")

base_model.to(device)

1


GPTNeoXForCausalLM(
  (gpt_neox): GPTNeoXModel(
    (embed_in): Embedding(50304, 2048)
    (emb_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-15): 16 x GPTNeoXLayer(
        (input_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (post_attention_dropout): Dropout(p=0.0, inplace=False)
        (post_mlp_dropout): Dropout(p=0.0, inplace=False)
        (attention): GPTNeoXSdpaAttention(
          (rotary_emb): GPTNeoXRotaryEmbedding()
          (query_key_value): Linear(in_features=2048, out_features=6144, bias=True)
          (dense): Linear(in_features=2048, out_features=2048, bias=True)
          (attention_dropout): Dropout(p=0.0, inplace=False)
        )
        (mlp): GPTNeoXMLP(
          (dense_h_to_4h): Linear(in_features=2048, out_features=8192, bias=True)
          (dense_4h_to_h): Linear(in_features=8192, out_features=2048, bias=True

In [None]:
training_args = TrainingArguments(
    # Learning Rate
    learning_rate=2e-5,

    remove_unused_columns=False,

    # Epochs
    num_train_epochs=2,

    # Batch Trainig Size
    per_device_train_batch_size=8,

    output_dir=output_dir,

    # max_steps=10,

    # Other arguments
    overwrite_output_dir=False, # Overwrite the content of the output directory
    disable_tqdm=False, # Disable progress bars
    eval_steps=100, # Number of update steps between two evaluations
    save_steps=100, # After # steps model is saved
    warmup_steps=1, # Number of warmup steps for learning rate scheduler
    per_device_eval_batch_size=8, # Batch size for evaluation
    save_strategy='steps',
    eval_strategy="steps",
    logging_strategy="steps",
    logging_steps=1,
    optim="adafactor",
    gradient_accumulation_steps = 1,
    gradient_checkpointing=False,

    # Parameters for early stopping
    load_best_model_at_end=True,
    save_total_limit=1,
    metric_for_best_model="eval_loss",
    greater_is_better=False,

    push_to_hub=True,
    report_to='wandb',
    run_name=model_finetuned_name
)

In [None]:
!export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

In [None]:
# @title Trainer
from transformers import DataCollatorWithPadding

tokenizer.pad_token = tokenizer.eos_token
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, padding=True)
trainer = Trainer(
    model=base_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    processing_class=tokenizer,
    data_collator=data_collator
)


In [None]:
trainer.train()

Step,Training Loss,Validation Loss
100,0.256,0.21158
200,0.2217,0.199945
300,0.1473,0.188164
400,0.1512,0.181644
500,0.1611,0.17632
600,0.1395,0.170005
700,0.1552,0.164761
800,0.0864,0.171515
900,0.1144,0.168943
1000,0.1396,0.164494


TrainOutput(global_step=1422, training_loss=0.13799777907828908, metrics={'train_runtime': 8369.0955, 'train_samples_per_second': 1.359, 'train_steps_per_second': 0.17, 'total_flos': 3.174730077044736e+16, 'train_loss': 0.13799777907828908, 'epoch': 2.0})

In [None]:
# @title Save model
save_dir = f'{output_dir}/final'
trainer.save_model(save_dir)
print(f'model saved to {save_dir}')

In [None]:
trainer.evaluate()

## **Inference & Evaluation**

In [None]:
# @title load Fine-Tuned Model
device = "cuda" if torch.cuda.is_available() else "cpu"
finetuned_model_name = f'paacamo/{output_dir}'

finetuned_model = AutoModelForCausalLM.from_pretrained(finetuned_model_name)
tokenizer = AutoTokenizer.from_pretrained(finetuned_model_name)
tokenizer.pad_token = tokenizer.eos_token
finetuned_model.to(device)

config.json:   0%|          | 0.00/800 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/4.05G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/4.88k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/3.56M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/473 [00:00<?, ?B/s]

GPTNeoXForCausalLM(
  (gpt_neox): GPTNeoXModel(
    (embed_in): Embedding(50304, 2048)
    (emb_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-15): 16 x GPTNeoXLayer(
        (input_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (post_attention_dropout): Dropout(p=0.0, inplace=False)
        (post_mlp_dropout): Dropout(p=0.0, inplace=False)
        (attention): GPTNeoXAttention(
          (query_key_value): Linear(in_features=2048, out_features=6144, bias=True)
          (dense): Linear(in_features=2048, out_features=2048, bias=True)
        )
        (mlp): GPTNeoXMLP(
          (dense_h_to_4h): Linear(in_features=2048, out_features=8192, bias=True)
          (dense_4h_to_h): Linear(in_features=8192, out_features=2048, bias=True)
          (act): GELUActivation()
        )
      )
    )
    (final_layer_norm): LayerNorm((2048,), eps=1e-05, 

In [None]:
def inference(text, model, tokenizer, max_input_token=1000, max_output_token=500):
  # Tokenize
  tokenizer.truncation_side = 'left'
  input_ids = tokenizer.encode(
      text,
      return_tensors='pt',
      padding=True,
      truncation=True,
      max_length=max_input_token
  )

  # generate
  device = model.device
  output_ids = finetuned_model.generate(
      input_ids=input_ids.to(device),
      max_length=max_output_token
  )

  # decode
  decoded_output = tokenizer.decode(output_ids[0], skip_special_tokens=True)

   # Strip the prompt
  generated_text_answer = decoded_output[len(text):]
  return generated_text_answer



In [None]:
from tqdm import tqdm
from pprint import pprint

dataset_infer = load_dataset("paacamo/"+dataset_hf_name, split='test')

text = dataset_infer['question'][90]
answer = dataset_infer['answer'][90]

print(f'question: {text}')
predictions = {
    'answer': answer,
    'prediction': inference(text, finetuned_model, tokenizer)
}

pprint(predictions)


README.md:   0%|          | 0.00/556 [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/1.98M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/506k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/5686 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1422 [00:00<?, ? examples/s]

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


question: ### Question:
What is the purpose of the Thrust library in CUDA 7?

### Answer:
{'answer': 'The Thrust library in CUDA 7 provides efficient and composable '
           'parallel algorithms that operate on vector containers. It brings a '
           'familiar abstraction layer similar to the C++ Standard Template '
           'Library to the realm of parallel computing.',
 'prediction': 'The Thrust library in CUDA 7 provides a collection of parallel '
               'algorithms and data structures that operate on sequences and '
               'containers, enabling efficient parallel processing.'}


In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model=finetuned_model_name)
pipe("what is the purpose of using CUDA rather than cpu?")

Device set to use cpu


[{'generated_text': 'what is the purpose of using CUDA rather than cpu?\n: Answer:CUDA is used to accelerate applications on GPUs, and the CUDA runtime'}]

## Interface

In [None]:
!pip install gradio langchain langchain-core langchain langchain_huggingface langchain-community langchain_google_genai python-dotenv -q

In [None]:
# Write your code here
# Feel free to add new code block as needed

import gradio as gr
from transformers import pipeline


def chatbot(question):
    """
    This function takes a question as input and returns the chatbot's response.
    """
    pipe = pipeline('text-generation', model=finetuned_model_name)
    response = pipe(question)[0]['generated_text']
    return response


# Create the Gradio interface
iface = gr.Interface(
    fn=chatbot,
    inputs="text",
    outputs="text",
    title="Simple Chatbot with Langchain and Gradio",
    description="Ask me anything!",
)

# Launch the Gradio interface
iface.launch(debug=True)

## Submit Notebook

In [None]:
portfolio_link = ""
presentation_link = ""

question_id = "01_portfolio_link"
submit(student_id, name, assignment_id, str(portfolio_link), question_id, drive_link)

question_id = "02_presentation_link"
submit(student_id, name, assignment_id, str(presentation_link), question_id, drive_link)

# FIN