# 01 - Fine tune the Microsoft Phi3 model to create MaximusLLM

This notebook fine-tunes Microsoft's Phi-3 Mini 128k Instruct model to create a custom model named MaximusLLM.
Fine-tuning is done with the Hugging Face Transformers library, with the PEFT library providing parameter-efficient training.

## a) Import the required libraries

The following libraries are required to fine-tune the model:

- ***os***: Provides access to operating system functionality such as environment variables.
- ***sys***: Provides access to some variables used or maintained by the Python interpreter and to functions that interact strongly with the interpreter.
- ***logging***: Allows you to emit messages to a log file or to the system console.
- ***dotenv***: Loads environment variables from a `.env` file (provided by the python-dotenv package).
- ***datasets***: Provides a simple and efficient way to load, preprocess, and share datasets.
- ***huggingface_hub***: The Python client for the Hugging Face Hub, used to authenticate and to upload and download models, datasets, and other artifacts.
- ***peft***: Enables parameter-efficient fine-tuning of large language models, reducing the computational resources required for training.
- ***torch***: A popular machine learning library that provides a wide range of functionalities for building and training neural networks.
- ***wandb***: The Weights & Biases client for tracking and visualizing training runs.
- ***transformers***: A library for state-of-the-art natural language processing (NLP) models, providing easy-to-use interfaces for tasks such as text classification, translation, and question answering.
- ***trl***: Transformer Reinforcement Learning, a library for post-training language models with methods such as supervised fine-tuning (SFT) and reinforcement learning from human feedback. This notebook uses its `SFTTrainer`.


In [38]:
import os
import sys
import logging
import dotenv
import datasets
from datasets import load_dataset
from datasets import DatasetDict
from huggingface_hub import notebook_login
from peft import LoraConfig
import torch
import wandb
import transformers
from trl import SFTTrainer
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig
from peft.utils.save_and_load import get_peft_model_state_dict
from peft import PeftModel

Now load the environment variables containing the various API tokens from the local `.env` file.

In [2]:
dotenv.load_dotenv()

True
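`load_dotenv()` returns `True` when it finds and loads a `.env` file, but that does not confirm that the expected tokens are actually defined. The sketch below is one way to check. The variable names `HF_TOKEN` and `WANDB_API_KEY` are assumptions (they are the names that `huggingface_hub` and `wandb` read by default); the `.env` file used with this notebook may define different ones.

```python
import os

# Assumed variable names: HF_TOKEN and WANDB_API_KEY are the defaults read by
# huggingface_hub and wandb; the notebook's .env file may use other names.
EXPECTED_VARS = ("HF_TOKEN", "WANDB_API_KEY")


def missing_env_vars(names=EXPECTED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All expected tokens are set.")
```

Running this check right after loading the `.env` file surfaces a missing token early, rather than at login or upload time.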

Then set the `CUDA_HOME` variable so that the NVIDIA GPU libraries can be found.

In [3]:
os.environ["CUDA_HOME"]="/usr"

## b) Define the fine tuning parameters and configurations

This is a configuration dictionary for fine-tuning the Microsoft Phi3 model. It includes settings for training, evaluation, learning rate, logging, and saving checkpoints. The training uses bf16 precision, a cosine learning rate scheduler, and gradient checkpointing for memory efficiency.

In [4]:
training_config = {
    "bf16": True,
    "do_eval": False,
    "learning_rate": 5.0e-06,
    "log_level": "info",
    "logging_steps": 20,
    "logging_strategy": "steps",
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 1,
    "max_steps": -1,
    "output_dir": "./checkpoint_dir",
    "overwrite_output_dir": True,
    "per_device_eval_batch_size": 4,
    "per_device_train_batch_size": 4,
    "remove_unused_columns": True,
    "save_steps": 100,
    "save_total_limit": 1,
    "seed": 0,
    "gradient_checkpointing": True,
    "gradient_checkpointing_kwargs":{"use_reentrant": False},
    "gradient_accumulation_steps": 1,
    "warmup_ratio": 0.2,
    }
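To make the interaction between batch size, warmup ratio, and the cosine scheduler concrete, the sketch below works through the arithmetic in plain Python. It assumes a single GPU (the logged arguments show `_n_gpu=1`) and uses the training split size of 10149 examples reported later in this notebook. The `lr_at` helper is a hypothetical re-implementation of a linear-warmup, half-cosine-decay schedule written for illustration; it is not the scheduler object that the trainer builds.

```python
import math

# Values taken from training_config and from the split sizes reported later
# in this notebook; a single GPU is assumed.
NUM_TRAIN_EXAMPLES = 10149
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 1
NUM_EPOCHS = 1
PEAK_LR = 5.0e-06
WARMUP_RATIO = 0.2

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Linear warmup to PEAK_LR, then a half-cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total optimizer steps:", total_steps)
print("warmup steps:", warmup_steps)
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(f"step {step:>5}: lr = {lr_at(step):.2e}")
```

Under these assumptions the learning rate climbs linearly for the first fifth of the run, peaks at `5.0e-06`, and then decays smoothly to zero by the last step.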

This is a configuration dictionary for fine-tuning a Microsoft Phi3 model using Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA). It specifies the rank (r) of the update matrices, the scaling factor (lora_alpha), dropout rate, bias handling, task type, target modules for adaptation, and modules to save during the fine-tuning process.

In [5]:
peft_config = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": "all-linear",
    "modules_to_save": None,
}
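The effect of `r` and `lora_alpha` can be estimated with a little arithmetic. For a linear layer with weight shape `d_out x d_in`, LoRA trains two small matrices, `A` (`r x d_in`) and `B` (`d_out x r`), and scales their product by `lora_alpha / r` in the standard formulation. The sketch below is illustrative only: the layer width of 3072 is an assumed example value, not read from the model in this notebook.

```python
def lora_extra_params(d_in, d_out, r):
    """Trainable parameters added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update B @ A before it is added to W."""
    return lora_alpha / r


# Hypothetical square projection of width 3072, used for illustration only
d_in = d_out = 3072
r, lora_alpha = 16, 32

full_params = d_in * d_out
extra_params = lora_extra_params(d_in, d_out, r)

print("frozen weights in the layer:", full_params)
print("trainable LoRA weights:     ", extra_params)
print(f"trainable fraction:          {extra_params / full_params:.2%}")
print("scaling factor:              ", lora_scaling(lora_alpha, r))
```

With `target_modules` set to `"all-linear"`, an adapter of this kind is attached to every linear layer, so the same ratio applies layer by layer while the original weights stay frozen.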

The code instantiates the `TrainingArguments` (Transformers) and `LoraConfig` (PEFT) objects from the two dictionaries defined above.

In [6]:
train_conf = TrainingArguments(**training_config)
peft_conf = LoraConfig(**peft_config)

## c) Set up logging for the fine-tuning process

This logger is used throughout the notebook to emit messages at the standard levels (debug, info, warning, error, and critical). Its handler writes to standard output so that messages appear in the notebook.

In [7]:
logger = logging.getLogger(__name__)

In [8]:
logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    handlers=[logging.StreamHandler(sys.stdout)],
)

The code snippet sets the logging level for various libraries (datasets, transformers) and the logger to the same level specified in the training configuration. This ensures consistent and appropriate verbosity across the different components of the software, making it easier to debug and understand the execution flow.

In [9]:
log_level = train_conf.get_process_log_level()
logger.setLevel(log_level)
datasets.utils.logging.set_verbosity(log_level)
transformers.utils.logging.set_verbosity(log_level)
transformers.utils.logging.enable_default_handler()
transformers.utils.logging.enable_explicit_format()
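The combined effect of the format string and the log level can be seen with the standard library alone. The sketch below builds an isolated logger (the name `demo` and the helper `make_demo_logger` are hypothetical) that uses the same format as the notebook and writes to an in-memory buffer, so it does not interfere with the configuration above.

```python
import io
import logging

LOG_FORMAT = "%(asctime)s - %(levelname)s - %(name)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_demo_logger(stream, name="demo"):
    """Build an isolated logger that writes to `stream` in the notebook's format."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
    demo_logger = logging.getLogger(name)
    demo_logger.handlers = [handler]
    demo_logger.propagate = False  # keep the demo out of the root logger
    demo_logger.setLevel(logging.INFO)
    return demo_logger


buffer = io.StringIO()
demo = make_demo_logger(buffer)
demo.debug("hidden: below the INFO threshold")
demo.info("visible: at the INFO threshold")
print(buffer.getvalue(), end="")
```

Only the second message is written, because `DEBUG` is below the configured `INFO` level. The same filtering applies to the `datasets` and `transformers` loggers once their verbosity is set.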

The code snippet logs a summary of the run: the process rank, device, number of GPUs, whether distributed training is active, and the value of the `fp16` flag (labelled "16-bits training"; it is `False` here because this run uses bf16 instead). It also logs the training/evaluation parameters and the PEFT parameters for reference.

In [10]:
logger.warning(
    f"Process rank: {train_conf.local_rank}, device: {train_conf.device}, n_gpu: {train_conf.n_gpu}"
    + f" distributed training: {bool(train_conf.local_rank != -1)}, 16-bits training: {train_conf.fp16}"
)
logger.info(f"Training/evaluation parameters {train_conf}")
logger.info(f"PEFT parameters {peft_conf}")

2024-07-08 20:46:55 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,
eval_stra

## d) Load the Phi3 model and tokenizer

Phi-3 is a family of small language models developed by Microsoft, following the earlier Phi-1 and Phi-2 models. `Phi-3-mini-128k-instruct` is the 3.8-billion-parameter "mini" variant. The `128k` in the name refers to the context window (up to roughly 128,000 tokens), not to the size of the training data. The `-instruct` suffix indicates that the model has been tuned to follow instructions, which makes it a suitable base for chat-style tasks such as text generation and question answering.

In [11]:
hd_model_name = "microsoft/Phi-3-mini-128k-instruct"

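The code sets up the keyword arguments used to load the model. The key/value cache is disabled (`use_cache=False`), since it is not useful during training and is incompatible with gradient checkpointing. Remote code from the model repository is trusted, weights are loaded in bfloat16 to reduce memory use, and no device map is set. The attention implementation is set to `eager`; the `flash_attention_2` option is left commented out and can be enabled when the flash-attention package and a compatible GPU are available.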

In [12]:
model_kwargs = dict(
    use_cache=False,
    trust_remote_code=True,
    #attn_implementation="flash_attention_2",  # loading the model with flash-attention support
    attn_implementation="eager",
    torch_dtype=torch.bfloat16,
    device_map=None
)
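Rather than commenting one `attn_implementation` line in and out, the choice can be made at run time. The sketch below is one possible approach: the helper `pick_attn_implementation` is hypothetical, and it only checks whether the `flash_attn` package is importable. It does not verify that the GPU supports flash attention, so treat the result as a hint rather than a guarantee.

```python
import importlib.util


def pick_attn_implementation(prefer_flash=True):
    """Return "flash_attention_2" if the flash_attn package is installed, else "eager"."""
    if prefer_flash and importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "eager"


attn_implementation = pick_attn_implementation()
print("attention implementation:", attn_implementation)
```

The returned string could then be passed as the `attn_implementation` entry of `model_kwargs`.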

The code snippet loads the pre-trained causal language model and its tokenizer by model name from the Hugging Face Hub (or from the local cache when the files are already downloaded, as the log below shows). This is the usual starting point for fine-tuning a model on a specific task.

In [13]:
model = AutoModelForCausalLM.from_pretrained(hd_model_name, **model_kwargs)
tokenizer = AutoTokenizer.from_pretrained(hd_model_name)

[INFO|configuration_utils.py:733] 2024-07-08 20:46:55,947 >> loading configuration file config.json from cache at /home/franck/.cache/huggingface/hub/models--microsoft--Phi-3-mini-128k-instruct/snapshots/d548c233192db00165d842bf8edff054bb3212f8/config.json
[INFO|configuration_utils.py:733] 2024-07-08 20:46:56,073 >> loading configuration file config.json from cache at /home/franck/.cache/huggingface/hub/models--microsoft--Phi-3-mini-128k-instruct/snapshots/d548c233192db00165d842bf8edff054bb3212f8/config.json
[INFO|configuration_utils.py:800] 2024-07-08 20:46:56,075 >> Model config Phi3Config {
  "_name_or_path": "microsoft/Phi-3-mini-128k-instruct",
  "architectures": [
    "Phi3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "microsoft/Phi-3-mini-128k-instruct--configuration_phi3.Phi3Config",
    "AutoModelForCausalLM": "microsoft/Phi-3-mini-128k-instruct--modeling_phi3.Phi3ForCausalLM"
  },
  "bos_token_id": 1,
  "embd_pdrop"



[INFO|modeling_utils.py:3556] 2024-07-08 20:46:56,382 >> loading weights file model.safetensors from cache at /home/franck/.cache/huggingface/hub/models--microsoft--Phi-3-mini-128k-instruct/snapshots/d548c233192db00165d842bf8edff054bb3212f8/model.safetensors.index.json
[INFO|modeling_utils.py:1531] 2024-07-08 20:46:56,384 >> Instantiating Phi3ForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1000] 2024-07-08 20:46:56,384 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 32000,
  "pad_token_id": 32000,
  "use_cache": false
}



Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

[INFO|modeling_utils.py:4364] 2024-07-08 20:46:57,559 >> All model checkpoint weights were used when initializing Phi3ForCausalLM.

[INFO|modeling_utils.py:4372] 2024-07-08 20:46:57,560 >> All the weights of Phi3ForCausalLM were initialized from the model checkpoint at microsoft/Phi-3-mini-128k-instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Phi3ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:955] 2024-07-08 20:46:57,674 >> loading configuration file generation_config.json from cache at /home/franck/.cache/huggingface/hub/models--microsoft--Phi-3-mini-128k-instruct/snapshots/d548c233192db00165d842bf8edff054bb3212f8/generation_config.json
[INFO|configuration_utils.py:1000] 2024-07-08 20:46:57,674 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": [
    32000,
    32001,
    32007
  ],
  "pad_token_id": 32000
}

[INFO|tokenization_utils_base.py:2161] 2024-07-08 20:46

This code caps the tokenizer's maximum sequence length at 2048 tokens, uses the unknown token (rather than the end-of-sequence token) as the padding token so that the model still learns to emit end-of-sequence and does not generate endlessly, sets the matching padding token ID, and pads on the right.

In [14]:
tokenizer.model_max_length = 2048
tokenizer.pad_token = tokenizer.unk_token  # use unk rather than eos token to prevent endless generation
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
tokenizer.padding_side = 'right'
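Right padding means that shorter sequences in a batch are extended at the end with the padding token, and an attention mask marks which positions are real. The sketch below illustrates the idea in plain Python. The helper `pad_right`, the token IDs, and `pad_id=0` are made-up placeholders for illustration; they are not values read from the Phi-3 tokenizer.

```python
def pad_right(sequences, pad_id, max_length):
    """Truncate to max_length, then right-pad every sequence to the longest one."""
    truncated = [seq[:max_length] for seq in sequences]
    width = max(len(seq) for seq in truncated)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in truncated]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in truncated]
    return input_ids, attention_mask


# Made-up token IDs; pad_id=0 is a placeholder, not necessarily the real unk ID
ids, mask = pad_right([[5, 6, 7], [8]], pad_id=0, max_length=2048)
print(ids)
print(mask)
```

Positions with a mask value of 0 are ignored by attention, which is why the choice of padding token mainly matters for how the loss treats the end-of-sequence token.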

## e) Prepare the training and validation datasets

This function takes a conversation-style example and a tokenizer as input, formats the messages in the example using the chat template provided by the tokenizer, and adds the formatted string to the example dictionary under the key "text".


In [15]:
def apply_chat_template(example,tokenizer):
    """
    This function takes a conversation-style example and a tokenizer as input,
    formats the messages in the example using the chat template provided by the tokenizer,
    and adds the formatted string to the example dictionary under the key "text".

    Parameters:
    example (dict): A dictionary containing a key "messages" with a list of messages.
    tokenizer (object): An object with a method apply_chat_template that formats a list of messages into a single string.

    Returns:
    dict: The modified example dictionary with a new key "text" containing the formatted string.
    """
    messages = example["messages"]
    example["text"] = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False)
    return example
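To see what the function does to a single record without loading the real tokenizer, the sketch below runs it against a stub. `StubTokenizer` is a hypothetical stand-in whose template is only loosely modeled on role tags such as `<|user|>` and `<|end|>`; the template shipped with the actual Phi-3 tokenizer may differ in its details. The sample messages are invented.

```python
class StubTokenizer:
    """Hypothetical stand-in for the real tokenizer; NOT the actual Phi-3 template."""

    def apply_chat_template(self, messages, tokenize=False, add_generation_prompt=False):
        text = "".join(f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages)
        if add_generation_prompt:
            text += "<|assistant|>\n"
        return text


def apply_chat_template(example, tokenizer):
    """Same logic as the notebook function: add a formatted "text" field."""
    example["text"] = tokenizer.apply_chat_template(
        example["messages"], tokenize=False, add_generation_prompt=False)
    return example


# Invented sample record with the "messages" structure the function expects
sample = {
    "messages": [
        {"role": "user", "content": "What is a work order?"},
        {"role": "assistant", "content": "A work order describes a maintenance task."},
    ]
}
result = apply_chat_template(sample, StubTokenizer())
print(result["text"])
```

Because `add_generation_prompt` is `False`, the formatted text ends with the assistant's answer, which is what supervised fine-tuning needs: the model sees the full exchange, including the target response.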

The code loads the `awels/maximo_admin_dataset` dataset from the Hugging Face Hub and displays the number of samples in its `train` split.

In [16]:
raw_dataset = load_dataset("awels/maximo_admin_dataset")
len(raw_dataset['train'])

Overwrite dataset info from restored data version if exists.


2024-07-08 20:47:00 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.


Loading Dataset info from /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65


2024-07-08 20:47:00 - INFO - datasets.info - Loading Dataset info from /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65


Found cached dataset maximo_admin_dataset (/home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65)


2024-07-08 20:47:00 - INFO - datasets.builder - Found cached dataset maximo_admin_dataset (/home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65)


Loading Dataset info from /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65


2024-07-08 20:47:00 - INFO - datasets.info - Loading Dataset info from /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65


12687

Now prepare the training and validation datasets.

In [17]:
# Split the training dataset into two splits: train and test
train_test_split = raw_dataset["train"].train_test_split(test_size=0.2)  # 20% for testing, 80% for training

# Create a DatasetDict with the new splits
new_dataset = DatasetDict({
    'train': train_test_split['train'],
    'test': train_test_split['test']
})

train_dataset = new_dataset["train"]
test_dataset = new_dataset["test"]

print ("Train set : " + str(len(train_dataset)))
print ("Test set : " + str(len(test_dataset)))

Caching indices mapping at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-303a2a11d701da26.arrow


2024-07-08 20:47:00 - INFO - datasets.arrow_dataset - Caching indices mapping at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-303a2a11d701da26.arrow


Caching indices mapping at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-c9ef3ef2eecaae0d.arrow


2024-07-08 20:47:00 - INFO - datasets.arrow_dataset - Caching indices mapping at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-c9ef3ef2eecaae0d.arrow
Train set : 10149
Test set : 2538
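The reported sizes follow from rounding the test fraction up: 20% of 12687 is 2537.4, which becomes 2538 test examples and leaves 10149 for training. The call above does not pass a seed, so the membership of each split can change between runs; `train_test_split` accepts a `seed` argument for reproducibility. The sketch below illustrates a seeded shuffle-and-split in plain Python. The helper `split_indices` is hypothetical and is not the `datasets` implementation.

```python
import math
import random


def split_indices(n_samples, test_size, seed):
    """Shuffle indices with a fixed seed and cut off the test fraction (rounded up)."""
    n_test = math.ceil(n_samples * test_size)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(12687, test_size=0.2, seed=0)
print("Train set :", len(train_idx))
print("Test set :", len(test_idx))
```

Calling the helper again with the same seed returns the same two index lists, which makes evaluation results comparable across runs.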


In [18]:
train_dataset

Dataset({
    features: ['prompt', 'prompt_id', 'messages'],
    num_rows: 10149
})

The code applies the chat template to the training dataset with `map`, using 10 worker processes for speed, and removes the original columns so that only the new `text` column remains. The test dataset is processed the same way.

In [19]:
column_names = list(train_dataset.features)

processed_train_dataset = train_dataset.map(
    apply_chat_template,
    fn_kwargs={"tokenizer": tokenizer},
    num_proc=10,
    remove_columns=column_names,
    desc="Applying chat template to train",
)

processed_test_dataset = test_dataset.map(
    apply_chat_template,
    fn_kwargs={"tokenizer": tokenizer},
    num_proc=10,
    remove_columns=column_names,
    desc="Applying chat template to test",
)

Process #0 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00000_of_00010.arrow


2024-07-08 20:47:01 - INFO - datasets.arrow_dataset - Process #0 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00000_of_00010.arrow


Process #1 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00001_of_00010.arrow


2024-07-08 20:47:01 - INFO - datasets.arrow_dataset - Process #1 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00001_of_00010.arrow


Process #2 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00002_of_00010.arrow


2024-07-08 20:47:01 - INFO - datasets.arrow_dataset - Process #2 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00002_of_00010.arrow


Process #3 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00003_of_00010.arrow


2024-07-08 20:47:01 - INFO - datasets.arrow_dataset - Process #3 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00003_of_00010.arrow


Process #4 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00004_of_00010.arrow


2024-07-08 20:47:01 - INFO - datasets.arrow_dataset - Process #4 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00004_of_00010.arrow


Process #5 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00005_of_00010.arrow


2024-07-08 20:47:01 - INFO - datasets.arrow_dataset - Process #5 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00005_of_00010.arrow


Process #6 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00006_of_00010.arrow


2024-07-08 20:47:01 - INFO - datasets.arrow_dataset - Process #6 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00006_of_00010.arrow


Process #7 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00007_of_00010.arrow


2024-07-08 20:47:01 - INFO - datasets.arrow_dataset - Process #7 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00007_of_00010.arrow


Process #8 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00008_of_00010.arrow


2024-07-08 20:47:01 - INFO - datasets.arrow_dataset - Process #8 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00008_of_00010.arrow


Process #9 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00009_of_00010.arrow


2024-07-08 20:47:01 - INFO - datasets.arrow_dataset - Process #9 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00009_of_00010.arrow


Spawning 10 processes


2024-07-08 20:47:01 - INFO - datasets.arrow_dataset - Spawning 10 processes


Applying chat template to train (num_proc=10):   0%|          | 0/10149 [00:00<?, ? examples/s]

Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00000_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00000_of_00010.arrow


Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00001_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00001_of_00010.arrow


Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00003_of_00010.arrow
Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00002_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00003_of_00010.arrow
2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00002_of_00010.arrow


Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00004_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00004_of_00010.arrow


Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00005_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00005_of_00010.arrow


Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00006_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00006_of_00010.arrow


Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00007_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00007_of_00010.arrow


Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00008_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00008_of_00010.arrow


Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00009_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-01908981a16589d6_00009_of_00010.arrow


Concatenating 10 shards


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Concatenating 10 shards


Process #0 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00000_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Process #0 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00000_of_00010.arrow


Process #1 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00001_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Process #1 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00001_of_00010.arrow


Process #2 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00002_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Process #2 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00002_of_00010.arrow


Process #3 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00003_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Process #3 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00003_of_00010.arrow


Process #4 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00004_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Process #4 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00004_of_00010.arrow


Process #5 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00005_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Process #5 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00005_of_00010.arrow


Process #6 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00006_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Process #6 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00006_of_00010.arrow


Process #7 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00007_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Process #7 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00007_of_00010.arrow


Process #8 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00008_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Process #8 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00008_of_00010.arrow


Process #9 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00009_of_00010.arrow


2024-07-08 20:47:02 - INFO - datasets.arrow_dataset - Process #9 will write at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00009_of_00010.arrow


Spawning 10 processes


2024-07-08 20:47:03 - INFO - datasets.arrow_dataset - Spawning 10 processes


Applying chat template to test (num_proc=10):   0%|          | 0/2538 [00:00<?, ? examples/s]

Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00000_of_00010.arrow


2024-07-08 20:47:03 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00000_of_00010.arrow


Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00001_of_00010.arrow


2024-07-08 20:47:03 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00001_of_00010.arrow


Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00002_of_00010.arrow


2024-07-08 20:47:03 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00002_of_00010.arrow


Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00003_of_00010.arrow


2024-07-08 20:47:03 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00003_of_00010.arrow


2024-07-08 20:47:03 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00004_of_00010.arrow


2024-07-08 20:47:03 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00005_of_00010.arrow


2024-07-08 20:47:03 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00006_of_00010.arrow


2024-07-08 20:47:03 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00007_of_00010.arrow


2024-07-08 20:47:03 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00008_of_00010.arrow


2024-07-08 20:47:03 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/franck/.cache/huggingface/datasets/awels___maximo_admin_dataset/default/0.0.0/91b3b084749da7d748f7750b5d350c582dd7ac65/cache-e72d48380e23bb03_00009_of_00010.arrow


2024-07-08 20:47:04 - INFO - datasets.arrow_dataset - Concatenating 10 shards


## f) Execute the training of the MaximusLLM model

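This cell uses SFTTrainer (Supervised Fine-Tuning Trainer) from the trl library to fine-tune the Microsoft Phi3 model. The trainer receives the model, the training configuration, the PEFT configuration, and the processed training and evaluation datasets. The maximum sequence length is set to 64 tokens, and packing is enabled so that several short examples are concatenated into each fixed-length sequence instead of being padded individually.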
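
To make the effect of `packing=True` with `max_seq_length=64` more concrete, here is a minimal pure-Python sketch of the general idea: tokenized examples are joined with an end-of-sequence token and cut into fixed-length blocks. The function name `pack_sequences` and the toy token ids are assumptions made for illustration; this is not the code that trl runs internally.

```python
from typing import Iterable, List


def pack_sequences(tokenized: Iterable[List[int]], seq_length: int, eos_id: int) -> List[List[int]]:
    """Concatenate token lists (each followed by eos_id) and cut fixed-size blocks.

    Illustrative sketch only: the incomplete block at the end is dropped.
    """
    buffer: List[int] = []
    for ids in tokenized:
        buffer.extend(ids)
        buffer.append(eos_id)  # mark the boundary between two examples
    # Step through the buffer in strides of seq_length, keeping full blocks only.
    return [
        buffer[start:start + seq_length]
        for start in range(0, len(buffer) - seq_length + 1, seq_length)
    ]


# Toy data: three short "tokenized" examples and a block size of 4 instead of 64.
examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
blocks = pack_sequences(examples, seq_length=4, eos_id=0)
print(blocks)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

Every block has the same length, so no padding tokens are needed. This is also why the number of training examples reported by the trainer can differ from the number of rows in the original dataset.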

In [20]:
trainer = SFTTrainer(
    model=model,
    args=train_conf,
    peft_config=peft_conf,
    train_dataset=processed_train_dataset,
    eval_dataset=processed_test_dataset,
    max_seq_length=64,
    dataset_text_field="text",
    tokenizer=tokenizer,
    packing=True
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
[INFO|training_args.py:2048] 2024-07-08 20:47:04,171 >> PyTorch: setting up devices
2024-07-08 20:47:04 - INFO - datasets.builder - Using custom data configuration default-2a050f9e3c68d40b


2024-07-08 20:47:04 - INFO - datasets.info - Loading Dataset Infos from /home/franck/Applications/miniconda3/envs/MaximusLLM/lib/python3.11/site-packages/datasets/packaged_modules/generator


2024-07-08 20:47:04 - INFO - datasets.builder - Generating dataset generator (/home/franck/.cache/huggingface/datasets/generator/default-2a050f9e3c68d40b/0.0.0)


2024-07-08 20:47:04 - INFO - datasets.builder - Downloading and preparing dataset generator/default to /home/franck/.cache/huggingface/datasets/generator/default-2a050f9e3c68d40b/0.0.0...


2024-07-08 20:47:04 - INFO - datasets.builder - Generating train split


Generating train split: 0 examples [00:00, ? examples/s]

2024-07-08 20:47:05 - INFO - datasets.utils.info_utils - Unable to verify splits sizes.


2024-07-08 20:47:05 - INFO - datasets.builder - Dataset generator downloaded and prepared to /home/franck/.cache/huggingface/datasets/generator/default-2a050f9e3c68d40b/0.0.0. Subsequent calls will reuse this data.


2024-07-08 20:47:05 - INFO - datasets.builder - Using custom data configuration default-b9b796c42d54b53c


2024-07-08 20:47:05 - INFO - datasets.info - Loading Dataset Infos from /home/franck/Applications/miniconda3/envs/MaximusLLM/lib/python3.11/site-packages/datasets/packaged_modules/generator


2024-07-08 20:47:05 - INFO - datasets.builder - Generating dataset generator (/home/franck/.cache/huggingface/datasets/generator/default-b9b796c42d54b53c/0.0.0)


2024-07-08 20:47:05 - INFO - datasets.builder - Downloading and preparing dataset generator/default to /home/franck/.cache/huggingface/datasets/generator/default-b9b796c42d54b53c/0.0.0...


2024-07-08 20:47:05 - INFO - datasets.builder - Generating train split


Generating train split: 0 examples [00:00, ? examples/s]

2024-07-08 20:47:06 - INFO - datasets.utils.info_utils - Unable to verify splits sizes.


2024-07-08 20:47:06 - INFO - datasets.builder - Dataset generator downloaded and prepared to /home/franck/.cache/huggingface/datasets/generator/default-b9b796c42d54b53c/0.0.0. Subsequent calls will reuse this data.


[INFO|trainer.py:642] 2024-07-08 20:47:07,436 >> Using auto half precision backend


This code replaces the state_dict method of the PyTorch model with a wrapper around the get_peft_model_state_dict function from the PEFT library. The intent is that calls to state_dict return the adapter weights trained during fine-tuning rather than the full set of Phi3 weights. If the installed Torch version is 2 or higher and the platform is not Windows, the model is then compiled with torch.compile for better performance.

In [21]:
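# Keep a reference to the original bound method before replacing it.
original_state_dict = model.state_dict

def custom_state_dict(self, *args, **kwargs):
    # Pass the full state dict through PEFT so that only the adapter weights remain.
    return get_peft_model_state_dict(self, original_state_dict(*args, **kwargs))

# Bind the function to this model instance so it behaves like a regular method.
model.state_dict = custom_state_dict.__get__(model, type(model))

# Compare the major version as an integer: comparing version strings is
# lexicographic, so "10.0" >= "2" would evaluate to False.
if int(torch.__version__.split(".")[0]) >= 2 and sys.platform != "win32":
    model = torch.compile(model)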
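
The `__get__` call used above is plain Python rather than anything specific to PyTorch or PEFT. The following sketch shows the same binding technique on a toy class. `TinyModel`, its weight names, and the `"lora_"` filter are all hypothetical stand-ins chosen for illustration; the real filtering is done by `get_peft_model_state_dict`.

```python
class TinyModel:
    """Hypothetical stand-in for a model holding base and adapter weights."""

    def __init__(self):
        self.weights = {"base.weight": 1.0, "lora_A.weight": 0.1, "lora_B.weight": 0.2}

    def state_dict(self):
        return dict(self.weights)


model = TinyModel()
original_state_dict = model.state_dict  # bound method of this one instance


def adapter_only_state_dict(self, *args, **kwargs):
    # Call the original method, then keep only the adapter entries.
    full = original_state_dict(*args, **kwargs)
    return {name: value for name, value in full.items() if "lora_" in name}


# Functions are descriptors: __get__ returns a method bound to `model`.
model.state_dict = adapter_only_state_dict.__get__(model, type(model))

print(sorted(model.state_dict()))
```

Because the replacement is stored on the instance, other instances of the same class keep the original method.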

Now let's train the model. Please note that logging in to Weights & Biases (wandb) is required.
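
Since the login reads the API key from the environment, it can help to fail early with a clear message when the variable is missing. The helper below is a small sketch of that check; `require_env` is a hypothetical name introduced here and is not part of wandb or of this notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a readable error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Define it (for example in your .env file) before running this cell."
        )
    return value
```

With this helper, the login call could be written as `wandb.login(key=require_env("WANDB_API_KEY"))`, which replaces a bare `KeyError` with an explanation of what is missing.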

In [22]:
wandb.login(key=os.environ['WANDB_API_KEY'])

2024-07-08 20:47:08 - ERROR - wandb.jupyter - Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


[34m[1mwandb[0m: Currently logged in as: [33mfbeawels[0m ([33mfbeawels-awels-engineering[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/franck/.netrc


True

In [23]:
train_result = trainer.train()

[INFO|trainer.py:2128] 2024-07-08 20:47:09,463 >> ***** Running training *****
[INFO|trainer.py:2129] 2024-07-08 20:47:09,463 >>   Num examples = 12,499
[INFO|trainer.py:2130] 2024-07-08 20:47:09,463 >>   Num Epochs = 1
[INFO|trainer.py:2131] 2024-07-08 20:47:09,464 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:2134] 2024-07-08 20:47:09,464 >>   Total train batch size (w. parallel, distributed & accumulation) = 4
[INFO|trainer.py:2135] 2024-07-08 20:47:09,464 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:2136] 2024-07-08 20:47:09,464 >>   Total optimization steps = 3,125
[INFO|trainer.py:2137] 2024-07-08 20:47:09,466 >>   Number of trainable parameters = 25,165,824
[INFO|integration_utils.py:750] 2024-07-08 20:47:09,471 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
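
The figures in this log header can be cross-checked with a few lines of arithmetic. The sketch below assumes a single training device, which is inferred from the total train batch size being equal to the per-device batch size; the variable names are illustrative.

```python
import math

# Values copied from the "Running training" log header.
num_examples = 12_499
per_device_batch_size = 4
gradient_accumulation_steps = 1
num_epochs = 1
num_devices = 1  # assumption: inferred from total batch size == per-device batch size

total_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
# The last, partially filled batch still counts as a step, hence the ceiling.
steps_per_epoch = math.ceil(num_examples / total_batch_size)
total_steps = steps_per_epoch * num_epochs

print(total_batch_size, total_steps)  # 4 3125
```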


  0%|          | 0/3125 [00:00<?, ?it/s]

{'loss': 6.7887, 'grad_norm': 10.1875, 'learning_rate': 1.6e-07, 'epoch': 0.01}
{'loss': 6.5376, 'grad_norm': 6.0, 'learning_rate': 3.2e-07, 'epoch': 0.01}
{'loss': 6.2477, 'grad_norm': 12.875, 'learning_rate': 4.800000000000001e-07, 'epoch': 0.02}
{'loss': 6.5146, 'grad_norm': 8.0625, 'learning_rate': 6.4e-07, 'epoch': 0.03}
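
The `learning_rate` values reported during training are consistent with a linear warmup followed by a cosine decay. The sketch below reproduces that shape in plain Python. The peak rate of 5e-6 and the 625 warmup steps are assumptions inferred from the logged values, not read from the training configuration, and the logged entries are assumed to be 20 optimization steps apart.

```python
import math

PEAK_LR = 5e-6        # assumption: inferred from the logged learning rates
WARMUP_STEPS = 625    # assumption: 20% of the 3,125 optimization steps
TOTAL_STEPS = 3125    # from the "Total optimization steps" log line


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps (linear warmup, cosine decay)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (20, 100, 640, 1600):
    print(step, lr_at(step))
```

Under these assumptions the function gives 1.6e-07 at step 20 and 8e-07 at step 100, which agrees with the first logged entries and with the value logged next to checkpoint-100.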


[INFO|trainer.py:3478] 2024-07-08 20:47:31,827 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-100


{'loss': 6.179, 'grad_norm': 4.8125, 'learning_rate': 8.000000000000001e-07, 'epoch': 0.03}


[INFO|configuration_utils.py:733] 2024-07-08 20:47:32,054 >> loading configuration file config.json from cache at /home/franck/.cache/huggingface/hub/models--microsoft--Phi-3-mini-128k-instruct/snapshots/d548c233192db00165d842bf8edff054bb3212f8/config.json
[INFO|configuration_utils.py:800] 2024-07-08 20:47:32,055 >> Model config Phi3Config {
  "_name_or_path": "Phi-3-mini-128k-instruct",
  "architectures": [
    "Phi3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "microsoft/Phi-3-mini-128k-instruct--configuration_phi3.Phi3Config",
    "AutoModelForCausalLM": "microsoft/Phi-3-mini-128k-instruct--modeling_phi3.Phi3ForCausalLM"
  },
  "bos_token_id": 1,
  "embd_pdrop": 0.0,
  "eos_token_id": 32000,
  "hidden_act": "silu",
  "hidden_size": 3072,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "model_type": "phi3",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_va

{'loss': 6.0984, 'grad_norm': 8.875, 'learning_rate': 9.600000000000001e-07, 'epoch': 0.04}
{'loss': 6.2706, 'grad_norm': 8.5, 'learning_rate': 1.12e-06, 'epoch': 0.04}
{'loss': 6.2542, 'grad_norm': 3.15625, 'learning_rate': 1.28e-06, 'epoch': 0.05}
{'loss': 6.5909, 'grad_norm': 5.9375, 'learning_rate': 1.44e-06, 'epoch': 0.06}


[INFO|trainer.py:3478] 2024-07-08 20:47:51,313 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-200


{'loss': 5.9163, 'grad_norm': 7.0625, 'learning_rate': 1.6000000000000001e-06, 'epoch': 0.06}


{'loss': 5.7982, 'grad_norm': 4.625, 'learning_rate': 1.76e-06, 'epoch': 0.07}
{'loss': 6.391, 'grad_norm': 6.25, 'learning_rate': 1.9200000000000003e-06, 'epoch': 0.08}
{'loss': 6.2254, 'grad_norm': 9.0625, 'learning_rate': 2.08e-06, 'epoch': 0.08}
{'loss': 5.7959, 'grad_norm': 7.9375, 'learning_rate': 2.24e-06, 'epoch': 0.09}


[INFO|trainer.py:3478] 2024-07-08 20:48:10,145 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-300


{'loss': 5.2485, 'grad_norm': 4.09375, 'learning_rate': 2.4000000000000003e-06, 'epoch': 0.1}


{'loss': 5.682, 'grad_norm': 4.6875, 'learning_rate': 2.56e-06, 'epoch': 0.1}
{'loss': 5.4417, 'grad_norm': 5.5625, 'learning_rate': 2.7200000000000002e-06, 'epoch': 0.11}
{'loss': 5.1421, 'grad_norm': 5.1875, 'learning_rate': 2.88e-06, 'epoch': 0.12}
{'loss': 5.0777, 'grad_norm': 5.1875, 'learning_rate': 3.04e-06, 'epoch': 0.12}


[INFO|trainer.py:3478] 2024-07-08 20:48:29,160 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-400


{'loss': 4.6715, 'grad_norm': 3.8125, 'learning_rate': 3.2000000000000003e-06, 'epoch': 0.13}


{'loss': 4.5419, 'grad_norm': 7.34375, 'learning_rate': 3.3600000000000004e-06, 'epoch': 0.13}
{'loss': 4.4581, 'grad_norm': 8.125, 'learning_rate': 3.52e-06, 'epoch': 0.14}
{'loss': 4.2968, 'grad_norm': 3.453125, 'learning_rate': 3.6800000000000003e-06, 'epoch': 0.15}
{'loss': 4.2394, 'grad_norm': 5.84375, 'learning_rate': 3.8400000000000005e-06, 'epoch': 0.15}


[INFO|trainer.py:3478] 2024-07-08 20:48:47,539 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-500


{'loss': 3.771, 'grad_norm': 3.859375, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.16}


{'loss': 3.7784, 'grad_norm': 7.65625, 'learning_rate': 4.16e-06, 'epoch': 0.17}
{'loss': 3.8309, 'grad_norm': 6.0625, 'learning_rate': 4.32e-06, 'epoch': 0.17}
{'loss': 3.5355, 'grad_norm': 6.84375, 'learning_rate': 4.48e-06, 'epoch': 0.18}
{'loss': 3.4408, 'grad_norm': 4.15625, 'learning_rate': 4.6400000000000005e-06, 'epoch': 0.19}


[INFO|trainer.py:3478] 2024-07-08 20:49:06,556 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-600


{'loss': 3.5117, 'grad_norm': 7.78125, 'learning_rate': 4.800000000000001e-06, 'epoch': 0.19}


{'loss': 3.5405, 'grad_norm': 5.625, 'learning_rate': 4.960000000000001e-06, 'epoch': 0.2}
{'loss': 3.1879, 'grad_norm': 6.1875, 'learning_rate': 4.999555880952023e-06, 'epoch': 0.2}
{'loss': 3.2989, 'grad_norm': 4.84375, 'learning_rate': 4.997582336695312e-06, 'epoch': 0.21}
{'loss': 3.1321, 'grad_norm': 2.796875, 'learning_rate': 4.9940312659030635e-06, 'epoch': 0.22}


[INFO|trainer.py:3478] 2024-07-08 20:49:25,756 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-700


{'loss': 3.1179, 'grad_norm': 3.734375, 'learning_rate': 4.9889049115077e-06, 'epoch': 0.22}


{'loss': 3.0694, 'grad_norm': 2.484375, 'learning_rate': 4.9822065114245345e-06, 'epoch': 0.23}
{'loss': 3.1717, 'grad_norm': 2.8125, 'learning_rate': 4.973940296506628e-06, 'epoch': 0.24}
{'loss': 2.8742, 'grad_norm': 3.421875, 'learning_rate': 4.964111487872496e-06, 'epoch': 0.24}
{'loss': 2.7917, 'grad_norm': 3.78125, 'learning_rate': 4.952726293608335e-06, 'epoch': 0.25}


[INFO|trainer.py:3478] 2024-07-08 20:49:44,137 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-800


{'loss': 2.9103, 'grad_norm': 2.65625, 'learning_rate': 4.939791904846869e-06, 'epoch': 0.26}


{'loss': 2.8098, 'grad_norm': 5.34375, 'learning_rate': 4.925316491225265e-06, 'epoch': 0.26}
{'loss': 2.8852, 'grad_norm': 4.875, 'learning_rate': 4.909309195725025e-06, 'epoch': 0.27}
{'loss': 2.8348, 'grad_norm': 3.421875, 'learning_rate': 4.891780128897077e-06, 'epoch': 0.28}
{'loss': 2.6624, 'grad_norm': 2.828125, 'learning_rate': 4.8727403624757365e-06, 'epoch': 0.28}


[INFO|trainer.py:3478] 2024-07-08 20:50:03,056 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-900


{'loss': 2.6674, 'grad_norm': 1.9140625, 'learning_rate': 4.852201922385564e-06, 'epoch': 0.29}


{'loss': 2.7929, 'grad_norm': 2.03125, 'learning_rate': 4.830177781145528e-06, 'epoch': 0.29}
{'loss': 2.7226, 'grad_norm': 3.0625, 'learning_rate': 4.8066818496752875e-06, 'epoch': 0.3}
{'loss': 2.6794, 'grad_norm': 2.46875, 'learning_rate': 4.781728968508757e-06, 'epoch': 0.31}
{'loss': 2.54, 'grad_norm': 1.4765625, 'learning_rate': 4.755334898420507e-06, 'epoch': 0.31}


[INFO|trainer.py:3478] 2024-07-08 20:50:22,124 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-1000


{'loss': 2.7562, 'grad_norm': 1.8046875, 'learning_rate': 4.72751631047092e-06, 'epoch': 0.32}


{'loss': 2.66, 'grad_norm': 1.9453125, 'learning_rate': 4.6982907754763905e-06, 'epoch': 0.33}
{'loss': 2.5699, 'grad_norm': 3.03125, 'learning_rate': 4.667676752911225e-06, 'epoch': 0.33}
{'loss': 2.7018, 'grad_norm': 1.8984375, 'learning_rate': 4.635693579248238e-06, 'epoch': 0.34}
{'loss': 2.7083, 'grad_norm': 2.453125, 'learning_rate': 4.6023614557454235e-06, 'epoch': 0.35}


[INFO|trainer.py:3478] 2024-07-08 20:50:41,299 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-1100


{'loss': 2.6462, 'grad_norm': 4.03125, 'learning_rate': 4.567701435686405e-06, 'epoch': 0.35}


{'loss': 2.5091, 'grad_norm': 1.90625, 'learning_rate': 4.531735411082735e-06, 'epoch': 0.36}
{'loss': 2.677, 'grad_norm': 2.046875, 'learning_rate': 4.494486098846428e-06, 'epoch': 0.36}
{'loss': 2.4764, 'grad_norm': 1.4140625, 'learning_rate': 4.455977026441471e-06, 'epoch': 0.37}
{'loss': 2.5442, 'grad_norm': 2.40625, 'learning_rate': 4.416232517023375e-06, 'epoch': 0.38}


[INFO|trainer.py:3478] 2024-07-08 20:51:00,339 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-1200


{'loss': 2.4671, 'grad_norm': 1.328125, 'learning_rate': 4.3752776740761495e-06, 'epoch': 0.38}


{'loss': 2.4465, 'grad_norm': 2.515625, 'learning_rate': 4.333138365556401e-06, 'epoch': 0.39}
{'loss': 2.5087, 'grad_norm': 1.859375, 'learning_rate': 4.289841207554578e-06, 'epoch': 0.4}
{'loss': 2.4803, 'grad_norm': 2.015625, 'learning_rate': 4.245413547483682e-06, 'epoch': 0.4}
{'loss': 2.5494, 'grad_norm': 1.4453125, 'learning_rate': 4.199883446806048e-06, 'epoch': 0.41}


[INFO|trainer.py:3478] 2024-07-08 20:51:19,285 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-1300


{'loss': 2.485, 'grad_norm': 2.296875, 'learning_rate': 4.15327966330913e-06, 'epoch': 0.42}


{'loss': 2.4529, 'grad_norm': 1.6640625, 'learning_rate': 4.1056316329414616e-06, 'epoch': 0.42}
{'loss': 2.4682, 'grad_norm': 2.4375, 'learning_rate': 4.056969451220282e-06, 'epoch': 0.43}
{'loss': 2.6337, 'grad_norm': 1.25, 'learning_rate': 4.007323854222562e-06, 'epoch': 0.44}
{'loss': 2.4067, 'grad_norm': 2.96875, 'learning_rate': 3.956726199171441e-06, 'epoch': 0.44}


[INFO|trainer.py:3478] 2024-07-08 20:51:37,591 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-1400


{'loss': 2.3615, 'grad_norm': 2.9375, 'learning_rate': 3.905208444630326e-06, 'epoch': 0.45}


{'loss': 2.3905, 'grad_norm': 3.4375, 'learning_rate': 3.85280313031719e-06, 'epoch': 0.45}
{'loss': 2.3238, 'grad_norm': 1.7265625, 'learning_rate': 3.7995433565517737e-06, 'epoch': 0.46}
{'loss': 2.4784, 'grad_norm': 1.953125, 'learning_rate': 3.7454627633487274e-06, 'epoch': 0.47}
{'loss': 2.4399, 'grad_norm': 2.078125, 'learning_rate': 3.6905955091698483e-06, 'epoch': 0.47}


[INFO|trainer.py:3478] 2024-07-08 20:51:56,473 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-1500


{'loss': 2.4648, 'grad_norm': 1.7734375, 'learning_rate': 3.634976249348867e-06, 'epoch': 0.48}



{'loss': 2.4148, 'grad_norm': 1.6796875, 'learning_rate': 3.578640114202398e-06, 'epoch': 0.49}
{'loss': 2.4486, 'grad_norm': 3.0625, 'learning_rate': 3.521622686840873e-06, 'epoch': 0.49}
{'loss': 2.3974, 'grad_norm': 1.71875, 'learning_rate': 3.463959980693492e-06, 'epoch': 0.5}
{'loss': 2.459, 'grad_norm': 2.078125, 'learning_rate': 3.4056884167613646e-06, 'epoch': 0.51}


[INFO|trainer.py:3478] 2024-07-08 20:52:14,775 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-1600


{'loss': 2.3272, 'grad_norm': 3.625, 'learning_rate': 3.346844800613229e-06, 'epoch': 0.51}



{'loss': 2.5175, 'grad_norm': 2.578125, 'learning_rate': 3.287466299138262e-06, 'epoch': 0.52}
{'loss': 2.4285, 'grad_norm': 1.6875, 'learning_rate': 3.2275904170706795e-06, 'epoch': 0.52}
{'loss': 2.3712, 'grad_norm': 1.9375, 'learning_rate': 3.1672549733009396e-06, 'epoch': 0.53}
{'loss': 2.3639, 'grad_norm': 1.59375, 'learning_rate': 3.106498076988519e-06, 'epoch': 0.54}


[INFO|trainer.py:3478] 2024-07-08 20:52:33,371 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-1700


{'loss': 2.3723, 'grad_norm': 1.6875, 'learning_rate': 3.045358103491357e-06, 'epoch': 0.54}



{'loss': 2.521, 'grad_norm': 3.265625, 'learning_rate': 2.9838736701271514e-06, 'epoch': 0.55}
{'loss': 2.4792, 'grad_norm': 2.640625, 'learning_rate': 2.9220836117818346e-06, 'epoch': 0.56}
{'loss': 2.3851, 'grad_norm': 1.8671875, 'learning_rate': 2.8600269563806304e-06, 'epoch': 0.56}
{'loss': 2.4396, 'grad_norm': 3.265625, 'learning_rate': 2.797742900237175e-06, 'epoch': 0.57}


[INFO|trainer.py:3478] 2024-07-08 20:52:51,582 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-1800


{'loss': 2.3736, 'grad_norm': 1.890625, 'learning_rate': 2.7352707832962865e-06, 'epoch': 0.58}



{'loss': 2.3766, 'grad_norm': 1.984375, 'learning_rate': 2.6726500642860155e-06, 'epoch': 0.58}
{'loss': 2.5268, 'grad_norm': 2.265625, 'learning_rate': 2.6099202957946624e-06, 'epoch': 0.59}
{'loss': 2.4639, 'grad_norm': 2.515625, 'learning_rate': 2.5471210992885207e-06, 'epoch': 0.6}
{'loss': 2.4178, 'grad_norm': 1.609375, 'learning_rate': 2.484292140086103e-06, 'epoch': 0.6}


[INFO|trainer.py:3478] 2024-07-08 20:53:09,174 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-1900


{'loss': 2.5282, 'grad_norm': 2.75, 'learning_rate': 2.4214731023046795e-06, 'epoch': 0.61}



{'loss': 2.4304, 'grad_norm': 1.109375, 'learning_rate': 2.358703663794939e-06, 'epoch': 0.61}
{'loss': 2.3878, 'grad_norm': 1.8515625, 'learning_rate': 2.2960234710796065e-06, 'epoch': 0.62}
{'loss': 2.4993, 'grad_norm': 1.96875, 'learning_rate': 2.2334721143118506e-06, 'epoch': 0.63}
{'loss': 2.3776, 'grad_norm': 1.765625, 'learning_rate': 2.171089102269294e-06, 'epoch': 0.63}


[INFO|trainer.py:3478] 2024-07-08 20:53:26,886 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-2000


{'loss': 2.2795, 'grad_norm': 1.5546875, 'learning_rate': 2.1089138373994226e-06, 'epoch': 0.64}



{'loss': 2.4203, 'grad_norm': 1.90625, 'learning_rate': 2.0469855909321565e-06, 'epoch': 0.65}
{'loss': 2.2409, 'grad_norm': 2.109375, 'learning_rate': 1.9853434780752977e-06, 'epoch': 0.65}
{'loss': 2.3798, 'grad_norm': 2.0625, 'learning_rate': 1.9240264333085247e-06, 'epoch': 0.66}
{'loss': 2.3994, 'grad_norm': 2.125, 'learning_rate': 1.8630731857915451e-06, 'epoch': 0.67}


[INFO|trainer.py:3478] 2024-07-08 20:53:44,912 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-2100


{'loss': 2.2968, 'grad_norm': 2.25, 'learning_rate': 1.8025222349019273e-06, 'epoch': 0.67}



{'loss': 2.3884, 'grad_norm': 3.328125, 'learning_rate': 1.7424118259180656e-06, 'epoch': 0.68}
{'loss': 2.4467, 'grad_norm': 2.71875, 'learning_rate': 1.6827799258626443e-06, 'epoch': 0.68}
{'loss': 2.4846, 'grad_norm': 4.59375, 'learning_rate': 1.623664199521853e-06, 'epoch': 0.69}
{'loss': 2.4619, 'grad_norm': 1.7734375, 'learning_rate': 1.5651019856554995e-06, 'epoch': 0.7}


[INFO|trainer.py:3478] 2024-07-08 20:54:02,660 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-2200


{'loss': 2.4477, 'grad_norm': 2.0, 'learning_rate': 1.5071302734130488e-06, 'epoch': 0.7}



{'loss': 2.6064, 'grad_norm': 2.421875, 'learning_rate': 1.4497856789704844e-06, 'epoch': 0.71}
{'loss': 2.5605, 'grad_norm': 2.453125, 'learning_rate': 1.3931044224027468e-06, 'epoch': 0.72}
{'loss': 2.4607, 'grad_norm': 4.125, 'learning_rate': 1.3371223048063543e-06, 'epoch': 0.72}
{'loss': 2.3538, 'grad_norm': 2.0625, 'learning_rate': 1.2818746856866688e-06, 'epoch': 0.73}


[INFO|trainer.py:3478] 2024-07-08 20:54:20,409 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-2300


{'loss': 2.5046, 'grad_norm': 1.8046875, 'learning_rate': 1.2273964606240718e-06, 'epoch': 0.74}



{'loss': 2.5178, 'grad_norm': 2.359375, 'learning_rate': 1.1737220392331643e-06, 'epoch': 0.74}
{'loss': 2.502, 'grad_norm': 2.46875, 'learning_rate': 1.1208853234289247e-06, 'epoch': 0.75}
{'loss': 2.4659, 'grad_norm': 1.578125, 'learning_rate': 1.0689196860135234e-06, 'epoch': 0.76}
{'loss': 2.4089, 'grad_norm': 1.8515625, 'learning_rate': 1.017857949597352e-06, 'epoch': 0.76}


[INFO|trainer.py:3478] 2024-07-08 20:54:38,809 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-2400


{'loss': 2.4489, 'grad_norm': 2.015625, 'learning_rate': 9.677323658675594e-07, 'epoch': 0.77}



{'loss': 2.4048, 'grad_norm': 2.875, 'learning_rate': 9.18574595217189e-07, 'epoch': 0.77}
{'loss': 2.4408, 'grad_norm': 1.9296875, 'learning_rate': 8.704156867478037e-07, 'epoch': 0.78}
{'loss': 2.2163, 'grad_norm': 2.71875, 'learning_rate': 8.232860586582e-07, 'epoch': 0.79}
{'loss': 2.4113, 'grad_norm': 1.8515625, 'learning_rate': 7.772154790316295e-07, 'epoch': 0.79}


[INFO|trainer.py:3478] 2024-07-08 20:54:56,534 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-2500


{'loss': 2.4757, 'grad_norm': 2.671875, 'learning_rate': 7.322330470336314e-07, 'epoch': 0.8}



{'loss': 2.4691, 'grad_norm': 2.5, 'learning_rate': 6.883671745323834e-07, 'epoch': 0.81}
{'loss': 2.3801, 'grad_norm': 1.453125, 'learning_rate': 6.456455681531524e-07, 'epoch': 0.81}
{'loss': 2.4441, 'grad_norm': 2.359375, 'learning_rate': 6.040952117781954e-07, 'epoch': 0.82}
{'loss': 2.3974, 'grad_norm': 2.609375, 'learning_rate': 5.637423495031657e-07, 'epoch': 0.83}


[INFO|trainer.py:3478] 2024-07-08 20:55:14,590 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-2600


{'loss': 2.2226, 'grad_norm': 1.59375, 'learning_rate': 5.24612469060774e-07, 'epoch': 0.83}



{'loss': 2.4938, 'grad_norm': 2.265625, 'learning_rate': 4.867302857221953e-07, 'epoch': 0.84}
{'loss': 2.4004, 'grad_norm': 2.484375, 'learning_rate': 4.501197266863691e-07, 'epoch': 0.84}
{'loss': 2.4434, 'grad_norm': 1.703125, 'learning_rate': 4.148039159670722e-07, 'epoch': 0.85}
{'loss': 2.3764, 'grad_norm': 1.4921875, 'learning_rate': 3.808051597872925e-07, 'epoch': 0.86}


[INFO|trainer.py:3478] 2024-07-08 20:55:32,118 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-2700


{'loss': 2.3822, 'grad_norm': 1.640625, 'learning_rate': 3.481449324901412e-07, 'epoch': 0.86}



{'loss': 2.4647, 'grad_norm': 1.75, 'learning_rate': 3.168438629752002e-07, 'epoch': 0.87}
{'loss': 2.4091, 'grad_norm': 1.8203125, 'learning_rate': 2.869217216688622e-07, 'epoch': 0.88}
{'loss': 2.3333, 'grad_norm': 1.6484375, 'learning_rate': 2.583974080369103e-07, 'epoch': 0.88}
{'loss': 2.4302, 'grad_norm': 2.515625, 'learning_rate': 2.312889386472078e-07, 'epoch': 0.89}


[INFO|trainer.py:3478] 2024-07-08 20:55:50,132 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-2800


{'loss': 2.4022, 'grad_norm': 2.03125, 'learning_rate': 2.0561343579004716e-07, 'epoch': 0.9}



{'loss': 2.4839, 'grad_norm': 3.21875, 'learning_rate': 1.8138711666334684e-07, 'epoch': 0.9}
{'loss': 2.3147, 'grad_norm': 2.390625, 'learning_rate': 1.586252831295193e-07, 'epoch': 0.91}
{'loss': 2.4312, 'grad_norm': 3.171875, 'learning_rate': 1.3734231205048825e-07, 'epoch': 0.92}
{'loss': 2.3268, 'grad_norm': 2.515625, 'learning_rate': 1.1755164620695314e-07, 'epoch': 0.92}


[INFO|trainer.py:3478] 2024-07-08 20:56:08,362 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-2900


{'loss': 2.3033, 'grad_norm': 2.734375, 'learning_rate': 9.926578580764234e-08, 'epoch': 0.93}



{'loss': 2.4841, 'grad_norm': 3.078125, 'learning_rate': 8.249628059391251e-08, 'epoch': 0.93}
{'loss': 2.2977, 'grad_norm': 3.09375, 'learning_rate': 6.725372254468344e-08, 'epoch': 0.94}
{'loss': 2.5331, 'grad_norm': 3.234375, 'learning_rate': 5.3547739186319836e-08, 'epoch': 0.95}
{'loss': 2.5446, 'grad_norm': 2.1875, 'learning_rate': 4.138698751167597e-08, 'epoch': 0.95}


[INFO|trainer.py:3478] 2024-07-08 20:56:26,053 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-3000


{'loss': 2.4795, 'grad_norm': 1.6640625, 'learning_rate': 3.077914851215585e-08, 'epoch': 0.96}



{'loss': 2.3037, 'grad_norm': 3.796875, 'learning_rate': 2.1730922326233806e-08, 'epoch': 0.97}
{'loss': 2.5372, 'grad_norm': 1.765625, 'learning_rate': 1.4248024007502693e-08, 'epoch': 0.97}
{'loss': 2.3542, 'grad_norm': 1.609375, 'learning_rate': 8.335179914925329e-09, 'epoch': 0.98}
{'loss': 2.4343, 'grad_norm': 2.234375, 'learning_rate': 3.9961247275624446e-09, 'epoch': 0.99}


[INFO|trainer.py:3478] 2024-07-08 20:56:43,895 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-3100


{'loss': 2.3465, 'grad_norm': 1.921875, 'learning_rate': 1.2335990856710001e-09, 'epoch': 0.99}



{'loss': 2.2627, 'grad_norm': 2.109375, 'learning_rate': 4.934785965721167e-11, 'epoch': 1.0}


[INFO|trainer.py:3478] 2024-07-08 20:56:48,946 >> Saving model checkpoint to ./checkpoint_dir/checkpoint-3125

{'train_runtime': 580.1929, 'train_samples_per_second': 21.543, 'train_steps_per_second': 5.386, 'train_loss': 3.0464051943969728, 'epoch': 1.0}
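
The learning_rate values in the log above appear to follow a linear warmup followed by a cosine decay to zero. The sketch below reproduces three logged values under assumptions inferred from the log itself, not read from the notebook's training configuration: a peak learning rate of 5e-6, 625 warmup steps, 3125 total steps, and the guess that the entry logged next to checkpoint-N corresponds to step N.

```python
import math

# Assumed values, inferred from the log rather than taken from the
# notebook's TrainingArguments.
PEAK_LR = 5e-6
WARMUP_STEPS = 625
TOTAL_STEPS = 3125


def cosine_lr(step: int) -> float:
    """Linear warmup followed by a half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


# Learning rates copied from the log; the step numbers are assumptions.
logged = {
    1400: 3.905208444630326e-06,
    2500: 7.322330470336314e-07,
    3120: 4.934785965721167e-11,
}
for step, lr in logged.items():
    print(step, cosine_lr(step), lr)
```

If the computed and logged columns agree, the schedule assumptions are at least consistent with this run. It also shows why the last few hundred steps barely move the weights: the learning rate there is several orders of magnitude below its peak.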


This cell retrieves the metrics returned by the training run, logs them, and saves them to disk. Keeping these metrics makes it easier to compare runs later.

In [24]:
metrics = train_result.metrics
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)

***** train metrics *****
  epoch                    =        1.0
  total_flos               = 16752384GF
  train_loss               =     3.0464
  train_runtime            = 0:09:40.19
  train_samples_per_second =     21.543
  train_steps_per_second   =      5.386
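
As a quick sanity check, the throughput figures above can be turned back into step and sample counts. The sketch below uses only the numbers printed in the summary; reading the result as "samples per optimizer step" is an assumption, since the batch size and gradient accumulation settings are not shown in this part of the notebook.

```python
# Values copied from the train metrics above.
train_runtime_s = 580.1929
samples_per_second = 21.543
steps_per_second = 5.386

total_steps = round(steps_per_second * train_runtime_s)      # optimizer steps
total_samples = round(samples_per_second * train_runtime_s)  # samples seen
samples_per_step = total_samples / total_steps

print(total_steps, total_samples, round(samples_per_step, 2))
```

The step count matches the final checkpoint name (checkpoint-3125), and the ratio works out to about 4 samples per step. Note also that train_loss (3.0464) is an average over the whole run, which is why it sits above the values of roughly 2.3 to 2.5 logged late in the epoch.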


In [25]:
# Save the state of the trainer
trainer.save_state() 

## g) Evaluate the new model

This cell sets the tokenizer's padding side to 'left', runs trainer.evaluate() on the evaluation set, and then logs and saves the resulting metrics. The eval_samples entry is filled in by hand from len(processed_test_dataset), so it (2538) can differ from the number of examples the trainer reports having evaluated (3042 in the log below).

In [27]:
tokenizer.padding_side = 'left'
metrics = trainer.evaluate()
metrics["eval_samples"] = len(processed_test_dataset)
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)

[INFO|trainer.py:3788] 2024-07-08 21:16:31,274 >> 
***** Running Evaluation *****
[INFO|trainer.py:3790] 2024-07-08 21:16:31,274 >>   Num examples = 3042
[INFO|trainer.py:3793] 2024-07-08 21:16:31,274 >>   Batch size = 4


  0%|          | 0/761 [00:00<?, ?it/s]

***** eval metrics *****
  epoch                   =        1.0
  eval_loss               =     2.4107
  eval_runtime            = 0:00:31.07
  eval_samples            =       2538
  eval_samples_per_second =     97.883
  eval_steps_per_second   =     24.487
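
The evaluation loss is easier to interpret as a perplexity. A minimal sketch, assuming eval_loss is the mean per-token cross-entropy in nats (the usual convention for causal language models in Transformers):

```python
import math

eval_loss = 2.4107  # copied from the eval metrics above
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.2f}")
```

Under that assumption the model's perplexity on the evaluation set is about 11.14.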


Now save the model locally before preparing it for upload to the Hugging Face Hub.

In [28]:
trainer.save_model(train_conf.output_dir)

[INFO|trainer.py:3478] 2024-07-08 21:18:15,955 >> Saving model checkpoint to ./checkpoint_dir

## h) Transform the model to GGUF 8-bit quantized format and upload it to the Hugging Face Hub

In [29]:
!cd ../modules/ && git clone https://github.com/ggerganov/llama.cpp.git

Cloning into 'llama.cpp'...
remote: Enumerating objects: 29214, done.
remote: Counting objects: 100% (8439/8439), done.
remote: Compressing objects: 100% (577/577), done.
remote: Total 29214 (delta 8177), reused 7869 (delta 7862), pack-reused 20775
Receiving objects: 100% (29214/29214), 50.89 MiB | 14.83 MiB/s, done.
Resolving deltas: 100% (21008/21008), done.


In [30]:
!pip install -r ../modules/llama.cpp/requirements.txt

Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cpu, https://download.pytorch.org/whl/cpu
Collecting sentencepiece~=0.2.0 (from -r ../modules/llama.cpp/./requirements/requirements-convert_legacy_llama.txt (line 2))
  Using cached sentencepiece-0.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Collecting torch~=2.2.1 (from -r ../modules/llama.cpp/./requirements/requirements-convert_hf_to_gguf.txt (line 3))
  Downloading https://download.pytorch.org/whl/cpu/torch-2.2.2%2Bcpu-cp311-cp311-linux_x86_64.whl (186.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 186.8/186.8 MB 8.3 MB/s eta 0:00:00
Using cached sentencepiece-0.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Installing collected packages: sentencepiece, torch
  Attempting uninstall: sentencepiece
    Found existing installation: sentencepiece 0.1.99
    Uninstalling sentencepiece-0.1.

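The following cells do not train the model any further. They load the base Phi-3 model, attach the fine-tuned LoRA adapter saved in `./checkpoint_dir`, and merge the adapter weights into the base weights. The result is a standalone model without the PEFT wrapper, which can then be saved to disk and converted to GGUF.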

In [50]:
base_model_name = "microsoft/Phi-3-mini-128k-instruct"
adapter_model_name = "./checkpoint_dir"

model_kwargs = dict(
    use_cache=False,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",  # load the model with FlashAttention-2 support
    torch_dtype=torch.bfloat16,
    device_map=None
)
model = AutoModelForCausalLM.from_pretrained(base_model_name, **model_kwargs)

model = PeftModel.from_pretrained(model, adapter_model_name)

tokenizer = AutoTokenizer.from_pretrained(base_model_name)

[INFO|configuration_utils.py:733] 2024-07-08 21:46:12,253 >> loading configuration file config.json from cache at /home/franck/.cache/huggingface/hub/models--microsoft--Phi-3-mini-128k-instruct/snapshots/d548c233192db00165d842bf8edff054bb3212f8/config.json
[INFO|configuration_utils.py:733] 2024-07-08 21:46:12,459 >> loading configuration file config.json from cache at /home/franck/.cache/huggingface/hub/models--microsoft--Phi-3-mini-128k-instruct/snapshots/d548c233192db00165d842bf8edff054bb3212f8/config.json
[INFO|configuration_utils.py:800] 2024-07-08 21:46:12,460 >> Model config Phi3Config {
  "_name_or_path": "microsoft/Phi-3-mini-128k-instruct",
  "architectures": [
    "Phi3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "microsoft/Phi-3-mini-128k-instruct--configuration_phi3.Phi3Config",
    "AutoModelForCausalLM": "microsoft/Phi-3-mini-128k-instruct--modeling_phi3.Phi3ForCausalLM"
  },
  "bos_token_id": 1,
  "embd_pdrop"



[INFO|modeling_utils.py:3556] 2024-07-08 21:46:12,584 >> loading weights file model.safetensors from cache at /home/franck/.cache/huggingface/hub/models--microsoft--Phi-3-mini-128k-instruct/snapshots/d548c233192db00165d842bf8edff054bb3212f8/model.safetensors.index.json
[INFO|modeling_utils.py:1531] 2024-07-08 21:46:12,586 >> Instantiating Phi3ForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1000] 2024-07-08 21:46:12,591 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 32000,
  "pad_token_id": 32000,
  "use_cache": false
}



Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

[INFO|modeling_utils.py:4364] 2024-07-08 21:46:13,912 >> All model checkpoint weights were used when initializing Phi3ForCausalLM.

[INFO|modeling_utils.py:4372] 2024-07-08 21:46:13,913 >> All the weights of Phi3ForCausalLM were initialized from the model checkpoint at microsoft/Phi-3-mini-128k-instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Phi3ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:955] 2024-07-08 21:46:14,022 >> loading configuration file generation_config.json from cache at /home/franck/.cache/huggingface/hub/models--microsoft--Phi-3-mini-128k-instruct/snapshots/d548c233192db00165d842bf8edff054bb3212f8/generation_config.json
[INFO|configuration_utils.py:1000] 2024-07-08 21:46:14,023 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": [
    32000,
    32001,
    32007
  ],
  "pad_token_id": 32000
}

[INFO|tokenization_utils_base.py:2161] 2024-07-08 21:46

In [46]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Phi3ForCausalLM(
      (model): Phi3Model(
        (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
        (embed_dropout): Dropout(p=0.0, inplace=False)
        (layers): ModuleList(
          (0-31): 32 x Phi3DecoderLayer(
            (self_attn): Phi3FlashAttention2(
              (o_proj): lora.Linear(
                (base_layer): Linear(in_features=3072, out_features=3072, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3072, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=3072, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              

In [47]:
new_model = model.merge_and_unload()
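
To make the merge step concrete, here is a minimal sketch of what merging a LoRA adapter into a linear layer amounts to numerically: the low-rank update `B @ A`, scaled by `alpha / rank`, is added to the frozen base weight. The toy matrices, the `alpha` value, and the helper names `matmul` and `merge_lora` are assumptions made for illustration. This is not PEFT's actual implementation, and the real adapter above uses rank 16 on 3072-dimensional layers.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out, r) @ (r, in) -> (out, in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy layer: 3 inputs, 2 outputs, rank-1 adapter (hypothetical values).
weight = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # base weight, shape (out, in)
lora_a = [[1.0, 2.0, 3.0]]                   # shape (r, in)
lora_b = [[0.5], [-1.0]]                     # shape (out, r)
alpha, rank = 2, 1                           # assumed scaling hyperparameters

merged = merge_lora(weight, lora_a, lora_b, alpha, rank)

# The merged layer should give the same output as base + adapter applied separately.
x = [[1.0], [1.0], [1.0]]                    # input as a column vector
base_out = matmul(weight, x)
adapter_out = matmul(lora_b, matmul(lora_a, x))
y_adapter = [[b[0] + (alpha / rank) * a[0]] for b, a in zip(base_out, adapter_out)]
y_merged = matmul(merged, x)

print(y_adapter == y_merged)
```

Because the update is folded into the weight itself, the merged model no longer needs the LoRA layers at inference time, which is why it can be saved and converted like an ordinary checkpoint.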


In [51]:
new_model.save_pretrained("./phi3-128k-3b-v0.1")
tokenizer.save_pretrained("./phi3-128k-3b-v0.1")

[INFO|configuration_utils.py:472] 2024-07-08 21:46:37,438 >> Configuration saved in ./phi3-128k-3b-v0.1/config.json
[INFO|configuration_utils.py:769] 2024-07-08 21:46:37,439 >> Configuration saved in ./phi3-128k-3b-v0.1/generation_config.json
[INFO|modeling_utils.py:2698] 2024-07-08 21:47:07,422 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./phi3-128k-3b-v0.1/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2574] 2024-07-08 21:47:07,424 >> tokenizer config file saved in ./phi3-128k-3b-v0.1/tokenizer_config.json
[INFO|tokenization_utils_base.py:2583] 2024-07-08 21:47:07,424 >> Special tokens file saved in ./phi3-128k-3b-v0.1/special_tokens_map.json


('./phi3-128k-3b-v0.1/tokenizer_config.json',
 './phi3-128k-3b-v0.1/special_tokens_map.json',
 './phi3-128k-3b-v0.1/tokenizer.model',
 './phi3-128k-3b-v0.1/added_tokens.json',
 './phi3-128k-3b-v0.1/tokenizer.json')

In [52]:
!python ../modules/llama.cpp/convert_hf_to_gguf.py phi3-128k-3b-v0.1 \
  --outfile ../models/maximusLLM-phi3-128k-3b-v0.1.gguf \
  --outtype f16

INFO:hf-to-gguf:Loading model: phi3-128k-3b-v0.1
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 32000
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 32000
INFO:gguf.vocab:Setting add_bos_token to False
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% for message in messages %}{% if message['role'] == 'system' %}{{'<|system|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'user' %}{{'<|user|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'assistant' %}{{'<|assistant|>
' + message['content'] + '<|end|>
'}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
' }}{% else %}{{ eos_token }}{% endif %}
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf

Now let's finish by uploading the model to the Hugging Face Hub.

In [54]:
from huggingface_hub import HfApi
api = HfApi(token=os.environ['HF_TOKEN'])
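
Reading the token with `os.environ['HF_TOKEN']` raises a bare `KeyError` when the variable is missing. As a small optional sketch, the lookup can be wrapped so the failure explains itself; the helper name `get_hf_token` is hypothetical and not part of `huggingface_hub`.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Return the Hub token from the environment or fail with a clear message."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; "
            "export it or load it from a .env file before uploading."
        )
    return token


# Usage (assuming the token is set in the environment):
# api = HfApi(token=get_hf_token())
```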

In [56]:
model_id = "awels/maximusLLM-3b-128k-gguf"
api.create_repo(model_id, exist_ok=True, repo_type="model")

RepoUrl('https://huggingface.co/awels/maximusLLM-3b-128k-gguf', endpoint='https://huggingface.co', repo_type='model', repo_id='awels/maximusLLM-3b-128k-gguf')

In [61]:
api.upload_file(
    path_or_fileobj="../models/maximusLLM-phi3-128k-3b-v0.1.gguf",
    path_in_repo="maximusLLM-phi3-128k-3b-v0.1.gguf",
    repo_id=model_id,
)

maximusLLM-phi3-128k-3b-v0.1.gguf:   0%|          | 0.00/7.64G [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/awels/maximusLLM-3b-128k-gguf/commit/f3ec0e77b755d375849a3465632bcffb13d23f2c', commit_message='Upload maximusLLM-phi3-128k-3b-v0.1.gguf with huggingface_hub', commit_description='', oid='f3ec0e77b755d375849a3465632bcffb13d23f2c', pr_url=None, pr_revision=None, pr_num=None)
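
The 7.64G reported by the upload progress bar can be sanity-checked against the model dimensions. The sketch below counts parameters from the values visible in the config and model printouts (vocabulary 32064, hidden size 3072, intermediate size 8192, 32 decoder layers) and multiplies by 2 bytes per f16 weight. The layer layout used here (fused query/key/value projection, fused gate/up projection, untied input and output embeddings, no biases) is an assumption about the Phi-3 architecture, and the estimate ignores GGUF metadata and any tensors the converter stores in a different precision.

```python
def phi3_mini_param_count(vocab=32064, hidden=3072, intermediate=8192, layers=32):
    """Approximate parameter count under the assumed Phi-3 mini layer layout."""
    attention = 3 * hidden * hidden + hidden * hidden       # q, k, v projections + o_proj
    mlp = hidden * 2 * intermediate + intermediate * hidden  # gate/up projection + down projection
    norms = 2 * hidden                                       # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * vocab * hidden                          # input embedding + output head
    final_norm = hidden
    return layers * per_layer + embeddings + final_norm


def estimated_size_gb(n_params, bytes_per_weight=2):
    """File size in decimal gigabytes for a given storage width per weight."""
    return n_params * bytes_per_weight / 1e9


n_params = phi3_mini_param_count()
size_gb = estimated_size_gb(n_params)
print(f"{n_params:,} parameters, about {size_gb:.2f} GB at f16")
```

Under these assumptions the estimate lands on roughly 7.64 GB, which agrees with the size of the uploaded file and confirms that the GGUF was written at 16-bit precision (`--outtype f16`).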