### Imports

In [7]:
# !pip install -e ../.

In [8]:
# Install the torch build that matches your system; see https://pytorch.org/
# !pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

In [1]:
import pandas as pd
import torch
from looptune import prep_config_combinations, single_run, clean_memory

### Prepare dataset with two columns: 'text' and 'label'

Example data source: https://www.kaggle.com/datasets/ankurzing/sentiment-analysis-for-financial-news


In [2]:
df = pd.read_csv('example_data/SentimentAnalysisforFinancialNews.csv', encoding="ISO-8859-1", header=None)
df.columns = ['label', 'text']
df.head(5)

,label,text
0,neutral,"According to Gran , the company has no plans t..."
1,neutral,Technopolis plans to develop in stages an area...
2,negative,The international electronic industry company ...
3,positive,With the new production plant the company woul...
4,positive,According to the company 's updated strategy f...


### Prepare run configurations

In [5]:
run_config = {   # -----------------------
                 'model_name': 'meta-llama/Meta-Llama-3-8B', # Pre-trained model names from the Hugging Face hub used for fine-tuning
                 # --------------------------
                 'split': (0.7, 0.3), # Proportions for the training, testing, and (optionally) validation sets. Values are relative: (0.7, 0.3) and (70, 30) both give a 70/30 train/test split.
                 'binary': False, # Whether the task is binary (two classes) or multi-class classification.
                 'balanced': (('train',), ('test',)),
                 # --------------------------
                 'training_arguments': {
                     'num_train_epochs': 1, # Number of times the model sees the entire training dataset.
                     'per_device_train_batch_size': 4, # Number of samples processed in each training step (in my experience 8 or 16 works best; 16 is faster, but you may see a linear drop in inference speed during fine-tuning).
                     'per_device_eval_batch_size': 4, # Number of samples processed in each evaluation step.
                     # 'gradient_accumulation_steps': 4,
                     'gradient_checkpointing': True,
                     #-----------------------------
                     'save_total_limit': 2,
                     'load_best_model_at_end': True,
                     'save_strategy': 'steps', # Controls when to save model checkpoints ('steps', 'epoch' or 'no').
                     'metric_for_best_model': 'f1-score',
                     #-----------------------------
                     'evaluation_strategy': "steps",
                     'logging_steps': 100,
                     'fp16': False,
                     # 'use_cpu': False,
                     #-----------------------------
                     'learning_rate': 2e-5,
                     'lr_scheduler_type': "linear",
                     'warmup_ratio': 0.1,
                     'max_grad_norm': 0.3,
                     'weight_decay': 0.001,
                 },
                 #-----------------------------
                 'bnb_config': [
                                # False,
                                {'bnb_4bit_compute_dtype': torch.bfloat16, 'load_in_4bit': True, 'bnb_4bit_quant_type': "nf4", 'bnb_4bit_use_double_quant': True, 'load_in_8bit': False}
                                 ],
                 'peft_config': [
                                # False,
                                {'r': 8, 'lora_alpha': 32, 'lora_dropout': 0.05, 'bias': "none",
                                'task_type': "SEQ_CLS", 
                                # 'target_modules': ("v_proj",),
                                'target_modules': "all-linear"
                                }
                                ],
                    }

run_params_serie = prep_config_combinations(run_config)
run_params_serie

[{'model_name': 'meta-llama/Meta-Llama-3-8B',
  'split': (0.7, 0.3),
  'binary': False,
  'balanced': (('train',), ('test',)),
  'training_arguments': {'num_train_epochs': 1,
   'per_device_train_batch_size': 4,
   'per_device_eval_batch_size': 4,
   'gradient_checkpointing': True,
   'save_total_limit': 2,
   'load_best_model_at_end': True,
   'save_strategy': 'steps',
   'metric_for_best_model': 'f1-score',
   'evaluation_strategy': 'steps',
   'logging_steps': 100,
   'fp16': False,
   'learning_rate': 2e-05,
   'lr_scheduler_type': 'linear',
   'warmup_ratio': 0.1,
   'max_grad_norm': 0.3,
   'weight_decay': 0.001},
  'bnb_config': {'bnb_4bit_compute_dtype': torch.bfloat16,
   'load_in_4bit': True,
   'bnb_4bit_quant_type': 'nf4',
   'bnb_4bit_use_double_quant': True,
   'load_in_8bit': False},
  'peft_config': {'r': 8,
   'lora_alpha': 32,
   'lora_dropout': 0.05,
   'bias': 'none',
   'task_type': 'SEQ_CLS',
   'target_modules': 'all-linear'}}]
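
In the configuration above, `bnb_config` and `peft_config` are given as one-element lists, and the printed result is a list holding a single dictionary in which each list has been replaced by its element. That is consistent with list-valued options being expanded into one run per combination. The sketch below illustrates the idea with `itertools.product`. `expand_grid` is a hypothetical helper, not looptune's actual implementation, and it assumes that only top-level list values are treated as alternatives.

```python
from itertools import product


def expand_grid(config):
    """Hypothetical helper: expand top-level list values into one dict per combination."""
    # Options given as lists are treated as alternatives; everything else is fixed.
    grid_keys = [key for key, value in config.items() if isinstance(value, list)]
    fixed = {key: value for key, value in config.items() if key not in grid_keys}

    runs = []
    for combo in product(*(config[key] for key in grid_keys)):
        run = dict(fixed)                  # copy the fixed settings
        run.update(zip(grid_keys, combo))  # fill in one choice per grid option
        runs.append(run)
    return runs


# Illustrative values only; the model name is a placeholder.
example = {
    "model_name": "some-model",
    "split": (0.7, 0.3),  # a tuple, so it stays a single fixed value
    "bnb_config": [False, {"load_in_4bit": True}],
    "peft_config": [False, {"r": 8}],
}

runs = expand_grid(example)
print(len(runs))  # 4 combinations: 2 bnb choices x 2 peft choices
```

With one-element lists, as in the cell above, the same expansion yields exactly one run, which matches the single dictionary shown in the output.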

In [6]:
for run_params in run_params_serie:
    single_run(run_params, df)

          text
label         
negative   604
neutral   2879
positive  1363
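
The counts above are far from balanced, with almost five neutral examples for every negative one. The `balanced` entry in the run configuration presumably controls class balancing for the listed splits, but how looptune does this is not shown in this notebook. One common approach is random undersampling to the size of the smallest class, sketched here in plain Python. `undersample` is a hypothetical helper and the rows are made up for illustration.

```python
import random
from collections import Counter, defaultdict


def undersample(rows, seed=0):
    """Hypothetical helper: keep the same number of rows for every label."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)

    # Every class is cut down to the size of the smallest one.
    smallest = min(len(group) for group in by_label.values())

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Made-up rows with an imbalance similar in spirit to the real data.
rows = (
    [{"text": f"negative example {i}", "label": "negative"} for i in range(2)]
    + [{"text": f"neutral example {i}", "label": "neutral"} for i in range(9)]
    + [{"text": f"positive example {i}", "label": "positive"} for i in range(4)]
)

balanced = undersample(rows)
counts = Counter(row["label"] for row in balanced)
print(counts)
```

Undersampling discards data from the larger classes, so it trades training-set size for an even label distribution.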


Casting to class labels:   0%|          | 0/4846 [00:00<?, ? examples/s]

Casting the dataset:   0%|          | 0/1269 [00:00<?, ? examples/s]

Casting the dataset:   0%|          | 0/543 [00:00<?, ? examples/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Map:   0%|          | 0/1269 [00:00<?, ? examples/s]

Map:   0%|          | 0/543 [00:00<?, ? examples/s]



Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Some weights of LlamaForSequenceClassification were not initialized from the model checkpoint at meta-llama/Meta-Llama-3-8B and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 21,016,600 || all params: 7,525,986,352 || trainable%: 0.2793
pefted
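
The trainable-parameter count reported above can be reproduced by hand. The sketch below assumes the published Llama 3 8B dimensions (hidden size 4096, MLP size 14336, 32 layers, key/value projections of width 1024) and three output classes. It also assumes that `'target_modules': "all-linear"` places a LoRA adapter on the classification head `score` while the head's own weights stay fully trainable. That last assumption is inferred from the numbers matching, not confirmed from the library.

```python
def lora_params(in_features, out_features, r=8):
    """LoRA adds two low-rank matrices: (r x in_features) and (out_features x r)."""
    return r * (in_features + out_features)


# Assumed model dimensions (not printed anywhere in this notebook).
HIDDEN, MLP, KV, LAYERS, CLASSES = 4096, 14336, 1024, 32, 3

per_layer = (
    lora_params(HIDDEN, HIDDEN)    # q_proj
    + lora_params(HIDDEN, KV)      # k_proj
    + lora_params(HIDDEN, KV)      # v_proj
    + lora_params(HIDDEN, HIDDEN)  # o_proj
    + lora_params(HIDDEN, MLP)     # gate_proj
    + lora_params(HIDDEN, MLP)     # up_proj
    + lora_params(MLP, HIDDEN)     # down_proj
)

head_adapter = lora_params(HIDDEN, CLASSES)  # assumed LoRA adapter on `score`
head_weights = HIDDEN * CLASSES              # newly initialized `score.weight`

trainable = LAYERS * per_layer + head_adapter + head_weights
all_params = 7_525_986_352                   # "all params" from the log line above

print(trainable)                              # 21016600
print(round(100 * trainable / all_params, 4)) # 0.2793
```

Both printed values agree with the `trainable params` line in the output, which suggests the assumptions are at least consistent with this run.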


Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: jakubpart (jpartyka). Use `wandb login --relogin` to force relogin


`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Step,Training Loss,Validation Loss,Precision,Recall,F1-score,Accuracy
100,1.8296,1.210119,0.546508,0.508287,0.501747,0.508287
200,0.998,0.884843,0.718395,0.686924,0.669811,0.686924
300,0.7524,0.641259,0.754789,0.751381,0.752562,0.751381
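
In every row of the table above, recall equals accuracy. That is what support-weighted averaging produces, because weighted recall reduces to the fraction of correct predictions. How looptune computes its metrics is not shown in this notebook, so the following is only a pure-Python sketch of weighted precision, recall, and F1 on made-up labels, meant to illustrate that relationship.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall, and F1, plus plain accuracy."""
    n = len(y_true)
    support = Counter(y_true)    # true examples per class
    predicted = Counter(y_pred)  # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n       # each class is weighted by its share of the data
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return precision, recall, f1, accuracy


# Made-up labels for illustration only.
y_true = ["neg", "neg", "neu", "neu", "neu", "pos", "pos", "pos"]
y_pred = ["neg", "neu", "neu", "neu", "pos", "pos", "pos", "neg"]

precision, recall, f1, accuracy = weighted_metrics(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f} accuracy={accuracy:.4f}")
```

Weighted recall and accuracy come out identical here (5 of 8 predictions are correct), mirroring the pattern in the evaluation table.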





Metric,History
eval/accuracy,▁▆█
eval/f1-score,▁▆█
eval/loss,█▄▁
eval/precision,▁▇█
eval/recall,▁▆█
eval/runtime,█▁▆
eval/samples_per_second,▁█▃
eval/steps_per_second,▁█▃
train/epoch,▁▁▄▄▇▇█
train/global_step,▁▁▄▄▇▇██

Metric,Final value
eval/accuracy,0.75138
eval/f1-score,0.75256
eval/loss,0.64126
eval/precision,0.75479
eval/recall,0.75138
eval/runtime,46.7823
eval/samples_per_second,11.607
eval/steps_per_second,2.907
total_flos,2503516367694912.0
train/epoch,1.0
