# Making the Most of your Colab Subscription



## Faster GPUs

Users who have purchased one of Colab's paid plans have access to premium GPUs. To upgrade your notebook's GPU, open `Runtime > Change runtime type` in the menu and select a premium accelerator. Subject to availability, a premium GPU may be an Nvidia V100 or A100.

The free-of-charge version of Colab grants access to Nvidia T4 GPUs, subject to quota restrictions and availability.

You can see which GPU you've been assigned at any time by executing the following cell. If the cell prints "Not connected to a GPU", go to `Runtime > Change runtime type` in the menu, enable a GPU accelerator, and then re-execute the cell.


In [None]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

In order to use a GPU with your notebook, select the `Runtime > Change runtime type` menu, and then set the hardware accelerator dropdown to GPU.
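
As an alternative to scanning the full `nvidia-smi` report, the check can be wrapped in a small helper. The sketch below is only an illustration: the function name `list_gpus` is hypothetical, and it assumes `nvidia-smi` is on the `PATH` whenever a GPU runtime is attached.

```python
import shutil
import subprocess


def list_gpus():
    """Return one 'name, memory' string per visible GPU, or an empty list."""
    # Without the nvidia-smi binary there is no Nvidia driver to query.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code usually means the driver could not reach a GPU.
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print("\n".join(gpus) if gpus else "Not connected to a GPU")
```

Returning a list keeps the result easy to reuse, for example to branch on whether any GPU is present before loading a model.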

## More memory

Users who have purchased one of Colab's paid plans have access to high-memory VMs when they are available.



You can see how much memory you have at any time by running the following code cell. If the cell prints "Not using a high-RAM runtime", you can enable a high-RAM runtime via `Runtime > Change runtime type` in the menu: select High-RAM in the Runtime shape dropdown, then re-execute the code cell.


In [None]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of total RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')
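
The cell above reports the total memory of the VM. A small sketch that also shows how much of it is currently free follows; `describe_ram` is a hypothetical helper, and the 20 GB threshold is carried over from the cell above.

```python
def describe_ram(total_bytes, available_bytes, high_ram_threshold_gb=20):
    """Summarize total and currently free memory in gigabytes."""
    total_gb = total_bytes / 1e9
    available_gb = available_bytes / 1e9
    kind = "high-RAM" if total_gb >= high_ram_threshold_gb else "standard"
    return f"{kind} runtime: {available_gb:.1f} GB free of {total_gb:.1f} GB"


try:
    from psutil import virtual_memory

    vm = virtual_memory()
    print(describe_ram(vm.total, vm.available))
except ImportError:
    # psutil ships with Colab; this branch only covers other environments.
    print("psutil is not installed")
```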

## Longer runtimes

All Colab runtimes are reset after some period of time (sooner if the runtime isn't executing code). Colab Pro and Pro+ users have access to longer runtimes than those who use Colab free of charge.

## Background execution

Colab Pro+ users have access to background execution, where notebooks will continue executing even after you've closed a browser tab. This is always enabled in Pro+ runtimes as long as you have compute units available.



## Relaxing resource limits in Colab Pro

Your resources are not unlimited in Colab. To make the most of Colab, avoid using resources when you don't need them. For example, only use a GPU when required and close Colab tabs when finished.



If you encounter limitations, you can relax those limitations by purchasing more compute units via Pay As You Go. Anyone can purchase compute units via [Pay As You Go](https://colab.research.google.com/signup); no subscription is required.

## Send us feedback!

If you have any feedback for us, please let us know. The best way to send feedback is by using the Help > 'Send feedback...' menu. If you encounter usage limits in Colab Pro, consider subscribing to Pro+.

If you encounter errors or other issues with billing (payments) for Colab Pro, Pro+, or Pay As You Go, please email [colab-billing@google.com](mailto:colab-billing@google.com).

## More Resources

### Working with Notebooks in Colab
- [Overview of Colab](/notebooks/basic_features_overview.ipynb)
- [Guide to Markdown](/notebooks/markdown_guide.ipynb)
- [Importing libraries and installing dependencies](/notebooks/snippets/importing_libraries.ipynb)
- [Saving and loading notebooks in GitHub](https://colab.research.google.com/github/googlecolab/colabtools/blob/main/notebooks/colab-github-demo.ipynb)
- [Interactive forms](/notebooks/forms.ipynb)
- [Interactive widgets](/notebooks/widgets.ipynb)

<a name="working-with-data"></a>
### Working with Data
- [Loading data: Drive, Sheets, and Google Cloud Storage](/notebooks/io.ipynb)
- [Charts: visualizing data](/notebooks/charts.ipynb)
- [Getting started with BigQuery](/notebooks/bigquery.ipynb)

### Machine Learning Crash Course
These are a few of the notebooks from Google's online Machine Learning course. See the [full course website](https://developers.google.com/machine-learning/crash-course/) for more.
- [Intro to Pandas DataFrame](https://colab.research.google.com/github/google/eng-edu/blob/main/ml/cc/exercises/pandas_dataframe_ultraquick_tutorial.ipynb)
- [Linear regression with tf.keras using synthetic data](https://colab.research.google.com/github/google/eng-edu/blob/main/ml/cc/exercises/linear_regression_with_synthetic_data.ipynb)


<a name="using-accelerated-hardware"></a>
### Using Accelerated Hardware
- [TensorFlow with GPUs](/notebooks/gpu.ipynb)
- [TensorFlow with TPUs](/notebooks/tpu.ipynb)

<a name="machine-learning-examples"></a>

## Machine Learning Examples

To see end-to-end examples of the interactive machine learning analyses that Colab makes possible, check out these tutorials using models from [TensorFlow Hub](https://tfhub.dev).

A few featured examples:

- [Retraining an Image Classifier](https://tensorflow.org/hub/tutorials/tf2_image_retraining): Build a Keras model on top of a pre-trained image classifier to distinguish flowers.
- [Text Classification](https://tensorflow.org/hub/tutorials/tf2_text_classification): Classify IMDB movie reviews as either *positive* or *negative*.
- [Style Transfer](https://tensorflow.org/hub/tutorials/tf2_arbitrary_image_stylization): Use deep learning to transfer style between images.
- [Multilingual Universal Sentence Encoder Q&A](https://tensorflow.org/hub/tutorials/retrieval_with_tf_hub_universal_encoder_qa): Use a machine learning model to answer questions from the SQuAD dataset.
- [Video Interpolation](https://tensorflow.org/hub/tutorials/tweening_conv3d): Predict what happened in a video between the first and the last frame.


In [None]:
# Training step 1
!pip install transformers torch peft datasets accelerate constants prompting huggingface_hub
!pip install --upgrade urllib3 trl
#!pip install bitsandbytes

In [None]:
# Testing step 1
!pip install transformers torch peft constants prompting

In [None]:
# Training and testing step 2

# Create a datasets directory on colab and upload the train and test csv files into it
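
Step 2 can also be checked from code. The following is a minimal sketch, assuming the file naming used by the later cells (`car_text_train_top_<size>.csv` and `car_text_test_wo_<size>.csv` under `./datasets`); the helper name `missing_dataset_files` is hypothetical.

```python
import os


def missing_dataset_files(train_size, dataset_dir="./datasets"):
    """Create dataset_dir if needed and return the expected files not yet uploaded."""
    os.makedirs(dataset_dir, exist_ok=True)
    expected = [
        f"car_text_train_top_{train_size}.csv",  # read by the training cell
        f"car_text_test_wo_{train_size}.csv",    # read by the testing cell
    ]
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(dataset_dir, name))
    ]


missing = missing_dataset_files(16)
print("Missing files:", missing if missing else "none")
```

Running this before training gives an early, readable message instead of a `FileNotFoundError` later in the notebook.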

In [None]:
# Training and testing step 3

# set train dataset size for both training and testing
train_size = 16

# set epoch size for training based on train size
NUM_EPOCHS = 10
if train_size == 16:
  NUM_EPOCHS = 20
elif train_size == 512:
  NUM_EPOCHS = 3
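
The epoch count determines how many optimizer steps the run takes, and therefore which checkpoint directories appear. The sketch below works through that arithmetic under stated assumptions: the training file holds exactly `train_size` rows, the run uses a single device, and the batch size and gradient accumulation are both 1, as in the training cell. The helper names are hypothetical.

```python
import math

# Same mapping as the if/elif chain above, expressed as a lookup table.
EPOCHS_BY_TRAIN_SIZE = {16: 20, 512: 3}
DEFAULT_EPOCHS = 10


def epochs_for(train_size):
    """Return the epoch count for a given training-set size."""
    return EPOCHS_BY_TRAIN_SIZE.get(train_size, DEFAULT_EPOCHS)


def total_steps(num_rows, epochs, batch_size=1, grad_accum=1):
    """Estimate the number of optimizer steps for a single-device run."""
    batches_per_epoch = math.ceil(num_rows / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


print(total_steps(16, epochs_for(16)))  # 320
```

With `train_size = 16` this gives 320 steps, which is consistent with the `checkpoint-320` directory loaded in testing step 6.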

In [None]:
# Training and testing step 4

# load section to run before both training and testing
import torch
from peft import PeftModel
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer
from constants import *
from prompting import *

def load_base_model(model_id, api_key):
    base_model = AutoModelForCausalLM.from_pretrained(
        model_id,
        use_auth_token = api_key,
        torch_dtype = torch.bfloat16,
        device_map = "auto",
        trust_remote_code = True
    )
    return base_model

def load_local_model(model_id, api_key, weight_path=None):
    model = load_base_model(model_id, api_key)
    checkpoint_weights = f"{weight_path}/"
    model = PeftModel.from_pretrained(
        model,
        checkpoint_weights,
    ).merge_and_unload()

    model.config.use_cache = True
    model.eval()

    return model

def load_tokenizer(model_id, api_key):
    tokenizer = AutoTokenizer.from_pretrained(
        model_id,
        use_auth_token = api_key,
        padding_side = "right",
        add_eos_token = True,
        add_bos_token = True,
        trust_remote_code = True
    )
    return tokenizer
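
The loaders above take the Hugging Face token as `api_key`. A minimal sketch of reading it from the environment instead of pasting it into the notebook follows; the variable name `HF_TOKEN` and the helper `get_hf_token` are assumptions for illustration. In Colab, the Secrets panel (read with `google.colab.userdata.get`) is another option.

```python
import os


def get_hf_token(env=None, name="HF_TOKEN"):
    """Return the token stored under `name`, or raise if it is not set."""
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable before loading the model")
    return token


# Example with an explicit mapping so the sketch runs anywhere.
token = get_hf_token({"HF_TOKEN": "example-token"})
print("token loaded:", bool(token))
```

Keeping the token out of the notebook means the file can be shared or committed without exposing credentials.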

In [None]:
# Training step 5

# Fine-tune section to run for training
import torch
from datasets import Dataset, load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model
from constants import *
import pandas as pd

# Constants for training arguments
WARMUP_STEPS = 100
BATCH_SIZE = 1
GRADIENT_ACCUMULATION = 1
OPTIMIZER = "adamw_torch"
LEARNING_RATE = 1e-4

# Logging and saving arguments
SAVE_STEPS = 10
LOGGING_STEPS = 10
LOG_DIR = "./logs"

# Sequence length
MAX_SEQUENCE_LENGTH = 150

# PEFT settings
RANK = 6
LORA_ALPHA = 16
LORA_DROPOUT = 0.1
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]  # Gemma names its attention output projection "o_proj"

device = 'cpu'
if torch.cuda.is_available():
    device = torch.cuda.current_device()
elif torch.backends.mps.is_available():
    device = 'mps'

verbose = True
dataset_path = './datasets'
save_weights_to = "./saved_weights"
model_id = 'google/gemma-7b-it'
import os
api_key = os.environ.get("HF_TOKEN")  # read the Hugging Face token from the environment; never hard-code it in the notebook

class FineTune():

    def __init__(
        self,
        device,
        save_weights_to,
        verbose=True
    ):
        self.device = device
        self.save_weights_to = save_weights_to
        self.verbose = verbose
        self.model = None
        self.tokenizer = None

    """
    This is the dataset that we'll use to fine-tune the model.
    Assumes a tab-separated .csv file without a header row, whose two columns
    are read as "user" (the prompt) and "model" (the expected answer).
    """
    def get_dataset(self, path):
        train_file = 'car_text_train_top_' + str(train_size) + '.csv'
        df = pd.read_csv(f"{path}/{train_file}", delimiter='\t', header=None, names=['user', 'model'])
        df['text'] = df.apply(lambda row: f"User: {row['user']} Model: {row['model']}", axis=1)
        train_dataset = Dataset.from_pandas(df[['text']])

        if self.verbose: print("number of datapoints in train: ", len(train_dataset))
        return train_dataset

    """
    This loads the model, tokenizer, and peft framework
    """
    def prepare_model(self, model_id, api_key):
        # loading model
        base_model = load_base_model(model_id, api_key)
        base_model.config.use_cache = False

        # turn on if training gives OOM errors
        base_model.gradient_checkpointing_enable()

        peft_config = LoraConfig(
            r=RANK,
            lora_alpha=LORA_ALPHA,
            lora_dropout=LORA_DROPOUT,
            bias="none",
            task_type="CAUSAL_LM",
            target_modules=TARGET_MODULES
        )
        model = get_peft_model(base_model, peft_config)
        self.model = model

        # loading tokenizer
        tokenizer = load_tokenizer(model_id, api_key)
        tokenizer.pad_token = tokenizer.eos_token
        self.tokenizer = tokenizer

        return peft_config

    """
    Creates the TrainingArguments and runs SFTTrainer.
    """
    def train(self, train_dataset, peft_config):
        import os  # local import so this method does not depend on other cells

        training_args = TrainingArguments(
            output_dir=self.save_weights_to,
            warmup_steps=WARMUP_STEPS,
            num_train_epochs=NUM_EPOCHS,
            per_device_train_batch_size=BATCH_SIZE,
            gradient_accumulation_steps=GRADIENT_ACCUMULATION,
            gradient_checkpointing=True,
            optim=OPTIMIZER,
            learning_rate=LEARNING_RATE,
            save_steps=SAVE_STEPS,
            save_strategy="steps",
            logging_dir=LOG_DIR,
            logging_steps=LOGGING_STEPS,
            push_to_hub=True,
            hub_model_id="sukara13/gemma7bcars" + str(train_size),
            # The token needs write access to push; read it from the environment instead of hard-coding it
            hub_token=os.environ.get("HF_TOKEN")
        )

        trainer = SFTTrainer(
            model=self.model,
            train_dataset=train_dataset,
            peft_config=peft_config,
            max_seq_length=MAX_SEQUENCE_LENGTH,
            tokenizer=self.tokenizer,
            args=training_args,
            dataset_text_field="text"
        )

        # start commenting out when OOM
        if self.verbose: print("=> START TRAINING!")
        trainer.train()

        if self.verbose: print("=> SAVING WEIGHTS")
        try:
            trainer.save_model(self.save_weights_to)
            if self.verbose: print(f"Weights saved to {self.save_weights_to}")
        except Exception as e:
            if self.verbose: print(f"Failed to save weights: {e}")

        # Push to hub
        trainer.push_to_hub("End of training")

        """
        self.model.config.use_cache = True
        self.model.eval()
        del self.model, trainer
        if self.device == "cuda":
            torch.cuda.empty_cache()
        """

# Setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device: ", device)

# Fine-tune
FT = FineTune(
    device = device,
    save_weights_to = save_weights_to,
    verbose = True
)
train_dataset = FT.get_dataset(dataset_path)
peft_configs = FT.prepare_model(model_id, api_key)
FT.train(train_dataset, peft_configs)
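
`get_dataset` expects a headerless, tab-separated file and joins each row into one training string. A standard-library sketch of that transformation follows, using two made-up rows (the sample text is illustrative, not taken from the real dataset); `rows_to_text` is a hypothetical helper.

```python
import csv
import io

# Two illustrative rows: prompt, a tab, then the expected label.
sample = (
    "Buying Price is low, Safety Score is high.\tacceptable\n"
    "Buying Price is high, Safety Score is low.\tunacceptable\n"
)


def rows_to_text(handle):
    """Turn tab-separated (user, model) rows into 'User: ... Model: ...' strings."""
    reader = csv.reader(handle, delimiter="\t")
    return [f"User: {user} Model: {model}" for user, model in reader]


for text in rows_to_text(io.StringIO(sample)):
    print(text)
```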

In [None]:
# Training step 6

# Download the lowest loss checkpoint directory onto local machine
# Upload those files onto HF for remote testing if necessary
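
One way to identify the lowest-loss checkpoint is to read the `log_history` list that the Hugging Face `Trainer` writes to `trainer_state.json` inside each checkpoint directory. The sketch below assumes that file layout and that logging and saving use the same step interval (both are 10 in the training cell). `lowest_loss_step` and `best_checkpoint` are hypothetical helpers, and training loss on a small dataset is noisy, so treat the result as a starting point.

```python
import json
import os


def lowest_loss_step(log_history):
    """Return the (step, loss) pair with the smallest logged training loss."""
    logged = [(entry["step"], entry["loss"]) for entry in log_history if "loss" in entry]
    if not logged:
        raise ValueError("no training loss entries found")
    return min(logged, key=lambda pair: pair[1])


def best_checkpoint(last_checkpoint_dir):
    """Read trainer_state.json from the latest checkpoint and name the best one."""
    with open(os.path.join(last_checkpoint_dir, "trainer_state.json")) as f:
        state = json.load(f)
    step, loss = lowest_loss_step(state["log_history"])
    return f"checkpoint-{step}", loss


# Illustrative history; the last entry mimics the end-of-run summary, which has no "loss" key.
history = [
    {"step": 10, "loss": 2.1},
    {"step": 20, "loss": 0.9},
    {"step": 30, "loss": 1.2},
    {"step": 30, "train_runtime": 12.0},
]
print(lowest_loss_step(history))  # (20, 0.9)
```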

In [None]:
# END OF TRAINING

In [None]:
# Testing step 5

# Create a saved_weights directory and upload the model files from the checkpoint directory on local machine

In [None]:
# Testing step 6

# Load the weights from local machine for inference
model_id = 'google/gemma-7b-it'
import os
api_key = os.environ.get("HF_TOKEN")  # read the Hugging Face token from the environment; never hard-code it in the notebook
save_weights_to = "./saved_weights"
model = load_local_model(model_id, api_key, weight_path=save_weights_to + '/checkpoint-320')
#model = load_local_model(model_id, api_key, weight_path=save_weights_to)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=api_key)

In [None]:
# Testing step 7

# Run inference on a single test item to check that tests run fast with locally stored weights
# Note that each inference with the fine-tuned model files uploaded to Colab completes in 1-2 seconds
# The same inference takes 20 seconds when the fine-tuned model files are on HF
from datetime import datetime

chat = [
    { "role": "user", "content": "You are an expert car evaluator. Here are the attributes of a car in text format: Buying Price is low, Maintenance Cost is high, Doors are four, Persons are more than four, Trunk Size is big, Safety Score is high. Give your recommendation to buy this car as 'unacceptable', 'acceptable', 'good', or 'very good' without providing any explanation." },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
print('encode: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=256)
print('generate: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
output = tokenizer.decode(outputs[0]).lower().replace('"', '')
print(output)
model_start = output.find("<start_of_turn>model") + len("<start_of_turn>model")
model_end = output.find("\n", model_start + 2)
model_response = output[model_start:model_end].replace('\n', '').replace('.', '')
print('response: ' + model_response)
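
The string slicing above can be packaged as a function so that the single-item check and the full test loop share one parser. This is a sketch rather than the notebook's exact logic: it keeps the same marker and label order, but takes the first non-empty line after the marker and returns `None` when no label is found. `extract_label` is a hypothetical name.

```python
MARKER = "<start_of_turn>model"
# Longer labels come first so "unacceptable" is not read as "acceptable"
# and "very good" is not read as "good".
LABELS = ("unacceptable", "acceptable", "very good", "good")


def extract_label(decoded_output):
    """Return the first known label in the model's reply, or None."""
    text = decoded_output.lower().replace('"', "")
    start = text.find(MARKER)
    if start < 0:
        return None
    reply = text[start + len(MARKER):].strip()
    first_line = reply.splitlines()[0] if reply else ""
    for label in LABELS:
        if label in first_line:
            return label
    return None


sample = (
    "<bos><start_of_turn>user\n"
    "Give your recommendation as 'acceptable' or 'good'.<end_of_turn>\n"
    "<start_of_turn>model\n"
    "Very good.<end_of_turn>"
)
print(extract_label(sample))  # very good
```

Only the text after the model marker is searched, so label words that appear in the prompt itself do not affect the result.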

In [None]:
# Testing step 8

# Test section to load local model from saved_weights
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
from datetime import datetime
import transformers
import torch
import pandas as pd

model_id = 'google/gemma-7b-it'
import os
api_key = os.environ.get("HF_TOKEN")  # read the Hugging Face token from the environment; never hard-code it in the notebook
save_weights_to = "./saved_weights"
filePath = './datasets/'
fineTunedModelBase = "sukara13/gemma7bcars"

def inferenceCar(format):
    # Set counters
    count = 0
    success = 0
    fail = 0

    # Read the car test data into a pandas data frame
    testFileBase = 'car_text_test_wo_'
    if format != 'text':
        testFileBase = testFileBase.replace('text', format)
    testFile = filePath + testFileBase + str(train_size) + '.csv'
    dfTest = pd.read_csv(testFile, names=['user', 'model'], delimiter='\t')
    print(len(dfTest))
    print(dfTest.head())  # call head() to print the first rows rather than the bound method

    modelName = fineTunedModelBase + str(train_size)
    fileName = filePath + modelName.replace('/', '_') + '-' + format + '.csv'

    #print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    #model = load_local_model(model_id, api_key, weight_path=save_weights_to)
    #tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=api_key)
    #print('tokenizer: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    #base_model = AutoModelForCausalLM.from_pretrained(model_id, use_auth_token=api_key)
    #print('base: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    # Load the fine-tuned model with PEFT
    #model = PeftModel.from_pretrained(base_model, model_id, use_auth_token=api_key)
    #print('peft: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    with open(fileName, 'w') as file:
        for index, row in dfTest.iterrows():
            print(str(index) + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
            user_str = row['user']
            if format == 'table':
              user_str = user_str.replace('|', '\n')
            chat = [ { "role": "user", "content": user_str } ]
            prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
            print('prompt: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
            print(prompt)
            inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
            print('encode: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
            outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=120)
            print('generate: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
            output = tokenizer.decode(outputs[0]).lower().replace('"', '')
            print('decode: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
            print(output)

            model_start = output.find("<start_of_turn>model") + len("<start_of_turn>model")
            #model_end = output.find(".", model_start + 1)
            #llm_eval = output[model_start:model_end].replace('\n', '')
            model_end = output.find("\n", model_start + 2)
            llm_eval = output[model_start:model_end].replace('\n', '').replace('.', '')
            print('llm_eval: ' + llm_eval)

            if 'unacceptable' in llm_eval:
                llm_eval = 'unacceptable'
            elif 'acceptable' in llm_eval:
                llm_eval = 'acceptable'
            elif 'very good' in llm_eval:
                llm_eval = 'very good'
            elif 'good' in llm_eval:
                llm_eval = 'good'
            else:
                llm_eval = 'fail: ' + row['user'] + ' - ' + llm_eval
                print(llm_eval)
                fail += 1
            ground_truth = row['model']
            if llm_eval == ground_truth:
                success += 1
            count += 1
            file.write(ground_truth + ',' + llm_eval + '\n')
            print(str(count) + ": " + ground_truth + ',' + llm_eval)

    print(fileName + ': ' + str(success) + '/' + str(count) + ' = ' + "{:.2%}".format(success/count))
    print(fail)

inferenceCar('text')
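
The results file written by `inferenceCar` holds one `ground_truth,prediction` pair per line. A minimal sketch of a per-label accuracy summary over such lines follows; `summarize` is a hypothetical helper, and it splits on the first comma only because a failed prediction embeds the original prompt, which may itself contain commas.

```python
from collections import defaultdict


def summarize(lines):
    """Return {label: (correct, total)} for 'ground_truth,prediction' lines."""
    counts = defaultdict(lambda: [0, 0])
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        truth, prediction = line.split(",", 1)
        counts[truth][1] += 1
        if prediction == truth:
            counts[truth][0] += 1
    return {label: tuple(pair) for label, pair in counts.items()}


# Illustrative lines in the same shape as the results file.
sample = [
    "good,good\n",
    "good,acceptable\n",
    "unacceptable,fail: Buying Price is low, Doors are four - maybe\n",
]
for label, (correct, total) in summarize(sample).items():
    print(f"{label}: {correct}/{total}")
```

To run it on a real results file, pass the open file object, for example `summarize(open(path))`.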

In [None]:
# Testing step 9

from google.colab import files

# Specify the path to the file you want to download
file_path = './datasets/sukara13_gemma7bcars' + str(train_size) + '-text.csv'

# Download the file
files.download(file_path)

In [None]:
# Testing step 10

# Make sure the test results csv file is downloaded onto your local machine

In [None]:
# END OF TESTING

In [None]:
# Everything below here is for backup purposes
# Some of the code is to run inference by pointing to the fine-tuned model on HF remotely but was very slow
# Some others with a label OLD on top are used for quantization but didn't work properly

In [None]:
# inference for a single item from HF
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import transformers
import torch

import os
api_key = os.environ.get("HF_TOKEN")  # read the Hugging Face token from the environment; never hard-code it in the notebook
model_id = "sukara13/gemma7bcars" + str(train_size)

tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=api_key)
base_model = AutoModelForCausalLM.from_pretrained(model_id, use_auth_token=api_key)

# Load the fine-tuned model with PEFT
model = PeftModel.from_pretrained(base_model, model_id, use_auth_token=api_key)

chat = [
    { "role": "user", "content": "You are an expert car evaluator. Here are the attributes of a car in text format: Buying Price is medium, Maintenance Cost is low, Doors are three, Persons are more than four, Trunk Size is small, Safety Score is low. Give your recommendation to buy this car as 'unacceptable', 'acceptable', 'good', or 'very good' without providing any explanation." },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=256)
output = tokenizer.decode(outputs[0]).lower().replace('"', '')
print(output)
model_start = output.find("<start_of_turn>model") + len("<start_of_turn>model")
model_end = output.find(".", model_start + 1)
model_response = output[model_start:model_end].replace('\n', '')
print('response: ' + model_response)


In [None]:
# Load model from HF
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
from datetime import datetime
import transformers
import torch
import pandas as pd

import os
api_key = os.environ.get("HF_TOKEN")  # read the Hugging Face token from the environment; never hard-code it in the notebook
fineTunedModelBase = "sukara13/gemma7bcars"
filePath = './datasets/'
testFileBase = 'car_text_test_wo_'

def inferenceCar(format, trainSize):
    # Set counters
    count = 0
    success = 0
    fail = 0

    # Read the car test data into a pandas data frame
    testFile = filePath + testFileBase + str(trainSize) + '.csv'
    dfTest = pd.read_csv(testFile, names=['user', 'model'], delimiter='\t')
    print(len(dfTest))
    print(dfTest.head())  # call head() to print the first rows rather than the bound method

    model_id = fineTunedModelBase + str(trainSize)
    fileName = filePath + model_id.replace('/', '_') + '-' + format + '.csv'

    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=api_key)
    print('tokenizer: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    base_model = AutoModelForCausalLM.from_pretrained(model_id, use_auth_token=api_key)
    print('base: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    # Load the fine-tuned model with PEFT
    model = PeftModel.from_pretrained(base_model, model_id, use_auth_token=api_key)
    print('peft: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    """
    # Ensure the model is in evaluation mode
    model.eval()

    print(model.device.type)
    # Enable mixed precision if supported
    if model.device.type == "cuda":
        from torch.cuda.amp import autocast
        print('cuda')
    """

    with open(fileName, 'w') as file:
        for index, row in dfTest.iterrows():
            print(str(index) + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
            chat = [ { "role": "user", "content": row['user'] } ]
            prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
            print('prompt: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
            print(prompt)
            inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
            print('encode: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
            outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=120)
            print('generate: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
            output = tokenizer.decode(outputs[0]).lower().replace('"', '')
            print('decode: ' + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
            print(output)

            model_start = output.find("<start_of_turn>model") + len("<start_of_turn>model")
            #model_end = output.find(".", model_start + 1)
            #llm_eval = output[model_start:model_end].replace('\n', '')
            model_end = output.find("\n", model_start + 2)
            llm_eval = output[model_start:model_end].replace('\n', '').replace('.', '')
            print('llm_eval: ' + llm_eval)

            if 'unacceptable' in llm_eval:
                llm_eval = 'unacceptable'
            elif 'acceptable' in llm_eval:
                llm_eval = 'acceptable'
            elif 'very good' in llm_eval:
                llm_eval = 'very good'
            elif 'good' in llm_eval:
                llm_eval = 'good'
            else:
                llm_eval = 'fail: ' + row['user'] + ' - ' + llm_eval
                print(llm_eval)
                fail += 1
            ground_truth = row['model']
            if llm_eval == ground_truth:
                success += 1
            count += 1
            file.write(ground_truth + ',' + llm_eval + '\n')
            print(str(count) + ": " + ground_truth + ',' + llm_eval)

    print(fileName + ': ' + str(success) + '/' + str(count) + ' = ' + "{:.2%}".format(success/count))
    print(fail)

inferenceCar('text', 16)

In [None]:
# OLD

import torch
from peft import PeftModel
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer
from constants import *
#from inference.utils import *
from prompting import *
import bitsandbytes as bnb
import accelerate

def load_base_model(model_id, api_key):
    """
    This assumes model supports quantization
    """
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,  # Consider your requirement
        bnb_4bit_quant_type= "nf4",
        bnb_4bit_compute_dtype= torch.bfloat16,
    )

    # TODO: how do these different AutoModel classes compare? Should I be calling different ones per task?
    base_model = AutoModelForCausalLM.from_pretrained(
        model_id,
        use_auth_token=api_key,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True,
    )
    return base_model

def load_model(model_id, api_key, weight_path=None, checkpoint=0, inference=True):
    """
    By default, assumes there is no finetuned weights and returns base model.
    If there is finetuning, assumes weights were finetuned with Peft.

    # TODO: change to loading from HF. Especially with inference endpoints
    """
    model = load_base_model(model_id, api_key)
    if checkpoint > 0:
        checkpoint_weights = f"{weight_path}/checkpoint-{checkpoint}/"
        model = PeftModel.from_pretrained(
            model,
            checkpoint_weights,
        ).merge_and_unload()
    else:
        checkpoint_weights = f"{weight_path}/"
        model = PeftModel.from_pretrained(
            model,
            checkpoint_weights,
        ).merge_and_unload()

    if inference:
        model.config.use_cache = True
        model.eval()

    return model

def load_tokenizer(model_id, api_key):
    """
    # TODO: change to loading from HF. Especially with inference endpoints
    """
    tokenizer = AutoTokenizer.from_pretrained(
        model_id,
        use_auth_token = api_key,
        padding_side="right",
        add_eos_token=True,
        add_bos_token=True,
        trust_remote_code=True
    )
    return tokenizer

In [None]:
# OLD

import torch
#import argparse
from datasets import Dataset, load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model
#from loading import load_base_model, load_tokenizer
#from inference.utils import *
from constants import *
import pandas as pd

# Constants for training arguments
WARMUP_STEPS = 100
NUM_EPOCHS = 10
BATCH_SIZE = 1
GRADIENT_ACCUMULATION = 1
OPTIMIZER = "adamw_torch"
LEARNING_RATE = 1e-4

# Logging and saving arguments
SAVE_STEPS = 10
LOGGING_STEPS = 10
LOG_DIR = "./logs"

# Sequence length
MAX_SEQUENCE_LENGTH = 150

# PEFT settings
RANK = 6
LORA_ALPHA = 16
LORA_DROPOUT = 0.1
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]  # Gemma names its attention output projection "o_proj"

device = 'cpu'
if torch.cuda.is_available():
    device = torch.cuda.current_device()
elif torch.backends.mps.is_available():
    device = 'mps'

verbose = True
dataset_path = './datasets'
save_weights_to = "./saved_weights"
model_id = 'google/gemma-7b-it'
import os
api_key = os.environ.get("HF_TOKEN")  # read the Hugging Face token from the environment; never hard-code it in the notebook

class FineTune():

    def __init__(
        self,
        device,
        save_weights_to,
        verbose=False,
    ):
        self.device = device
        self.save_weights_to = save_weights_to
        self.verbose = verbose

        # these two attributes get set with prepare_model method
        self.model = None
        self.tokenizer = None

    """
    This is the dataset that we'll use to fine-tune the model.
    Assumes a tab-separated .csv file without a header row, whose two columns
    are read as "user" (the prompt) and "model" (the expected answer).
    """
    def get_dataset(self, path):
        df = pd.read_csv(f"{path}/car_text_train_top_16.csv", delimiter='\t', header=None, names=['user', 'model'])
        df['text'] = df.apply(lambda row: f"User: {row['user']} Model: {row['model']}", axis=1)
        train_dataset = Dataset.from_pandas(df[['text']])
        if self.verbose: print("=> num datapoints: ", len(train_dataset))
        return train_dataset

    """
    This loads the model, tokenizer, and peft framework
    The exact framework is as follows: base -> quantize -> peft
    """
    def prepare_model(self, model_id, api_key):
        # loading model
        base_model = load_base_model(model_id, api_key)
        base_model.config.use_cache = False
        # base_model.gradient_checkpointing_enable()  # NOTE: turn on if training gives OOM errors

        quantized_model = prepare_model_for_kbit_training(base_model)
        peft_config = LoraConfig(
            r=RANK,
            lora_alpha=LORA_ALPHA,
            lora_dropout=LORA_DROPOUT,
            bias="none",
            task_type="CAUSAL_LM",
            target_modules=TARGET_MODULES,   # query, key, value, output projections
        )
        model = get_peft_model(quantized_model, peft_config)
        self.model = model.to(device) #comment out to(device)

        # loading tokenizer
        tokenizer = load_tokenizer(model_id, api_key)
        tokenizer.pad_token = tokenizer.eos_token
        self.tokenizer = tokenizer

        return peft_config

    """
    Creates the TrainingArguments and runs SFTTrainer.
    """
    def train(self, train_dataset, peft_config):
        import os  # local import so this method does not depend on other cells

        training_args = TrainingArguments(
            output_dir=self.save_weights_to,
            warmup_steps=WARMUP_STEPS,
            num_train_epochs=NUM_EPOCHS,
            per_device_train_batch_size=BATCH_SIZE,
            gradient_accumulation_steps=GRADIENT_ACCUMULATION,
            gradient_checkpointing=True,
            optim=OPTIMIZER,
            learning_rate=LEARNING_RATE,
            save_steps=SAVE_STEPS,
            save_strategy="steps",
            logging_dir=LOG_DIR,
            logging_steps=LOGGING_STEPS,
            push_to_hub=True,
            hub_model_id="sukara13/gemma7bcars16",
            # The token needs write access to push; read it from the environment instead of hard-coding it
            hub_token=os.environ.get("HF_TOKEN")
        )
        trainer = SFTTrainer(
            model=self.model,
            train_dataset=train_dataset,
            peft_config=peft_config,
            max_seq_length=MAX_SEQUENCE_LENGTH,
            tokenizer=self.tokenizer,
            args=training_args,
            dataset_text_field="text"
        )

        # Train and save model locally
        if self.verbose: print("=> START TRAINING!")
        trainer.train()

        # change this path, otherwise will overwrite
        if self.verbose: print("=> SAVING WEIGHTS")
        trainer.model.save_pretrained(self.save_weights_to)

        self.model.config.use_cache = True
        self.model.eval()
        del self.model, trainer
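        # self.device may be a torch.device, so compare its string form rather than the object itself
        if str(self.device).startswith("cuda"):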
            torch.cuda.empty_cache()

# Setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device: ", device)

# Set up class and finetune
FT = FineTune(
    device = device,
    save_weights_to = save_weights_to,
    verbose = True
)
train_dataset = FT.get_dataset(dataset_path)
peft_configs = FT.prepare_model(model_id, api_key)
FT.train(train_dataset, peft_configs)