## Describe your model -> fine-tuned microsoft/phi-2
By Matt Shumer (https://twitter.com/mattshumer_)

The goal of this notebook is to experiment with a new way to make it very easy to build a task-specific model for your use-case.

First, use the best GPU available (go to Runtime -> change runtime type)

To create your model, just go to the first code cell, and describe the model you want to build in the prompt. Be descriptive and clear.

Select a temperature (high=creative, low=precise), and the number of training examples to generate to train the model. From there, just run all the cells.

You can change the model you want to fine-tune by changing `model_name` in the `Define Hyperparameters` cell.

#Data generation step

Write your prompt here. Make it as descriptive as possible!

Then, choose the temperature (between 0 and 1) to use when generating data. Lower values are great for precise tasks, like writing code, whereas larger values are better for creative tasks, like writing stories.

Finally, choose how many examples you want to generate. The more you generate, a) the longer it takes and b) the more expensive data generation will be. But generally, more examples will lead to a higher-quality model. 100 is usually the minimum to start.

Run this to generate the dataset.

Now let's put our examples into a dataframe and turn them into a final pair of datasets.

Split into train and test sets.

# Install necessary libraries

In [2]:
!pip install -q accelerate peft bitsandbytes transformers trl einops

import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel #, prepare_model_for_int8_training
from trl import SFTTrainer

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
%ls drive/MyDrive/ERA_V1/S27/data

oasst1_test.jsonl  oasst1_train.jsonl


# Define Hyperparameters

In [5]:
model_name = "microsoft/phi-2" # use this if you have access to the official LLaMA 2 model "meta-llama/Llama-2-7b-chat-hf", though keep in mind you'll need to pass a Hugging Face key argument
dataset_name = "drive/MyDrive/ERA_V1/S27/data/oasst1_train.jsonl"
new_model = "microsoft-phi-2-custom" #"llama-2-7b-custom"
lora_r = 64
lora_alpha = 16
lora_dropout = 0.1
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False
output_dir = "drive/MyDrive/ERA_V1/S27/results"
num_train_epochs = 1
fp16 = False
bf16 = False
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
gradient_accumulation_steps = 1
gradient_checkpointing = False
max_grad_norm = 0.3
learning_rate = 2e-4
weight_decay = 0.001
optim = "paged_adamw_32bit"
lr_scheduler_type = "constant"
max_steps = -1
warmup_ratio = 0.03
group_by_length = True
save_steps = 200
logging_steps = 5
max_seq_length = 256 #None
packing = False
device_map = {"": 0}

#Load Datasets and Train

In [6]:
# Load datasets
train_dataset = load_dataset('json', data_files='drive/MyDrive/ERA_V1/S27/data/oasst1_train.jsonl', split="train")
valid_dataset = load_dataset('json', data_files='drive/MyDrive/ERA_V1/S27/data/oasst1_test.jsonl', split="train")

# # Preprocess datasets
# train_dataset_mapped = train_dataset.map(lambda examples: {'text': [f'[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n' + prompt + ' [/INST] ' + response for prompt, response in zip(examples['prompt'], examples['response'])]}, batched=True)
# valid_dataset_mapped = valid_dataset.map(lambda examples: {'text': [f'[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n' + prompt + ' [/INST] ' + response for prompt, response in zip(examples['prompt'], examples['response'])]}, batched=True)


Generating train split: 0 examples [00:00, ? examples/s]

Generating train split: 0 examples [00:00, ? examples/s]

In [7]:
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1
# # model.use_gradient_checkpointing = False
# # print(model.)
# # model.config.use_gradient_checkpointing = False
# # model = prepare_model_for_int8_training(model, use_gradient_checkpointing=gradient_checkpointing)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/735 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/35.7k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/564M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

In [8]:
model

PhiForCausalLM(
  (model): PhiModel(
    (embed_tokens): Embedding(51200, 2560)
    (layers): ModuleList(
      (0-31): 32 x PhiDecoderLayer(
        (self_attn): PhiAttention(
          (q_proj): Linear4bit(in_features=2560, out_features=2560, bias=True)
          (k_proj): Linear4bit(in_features=2560, out_features=2560, bias=True)
          (v_proj): Linear4bit(in_features=2560, out_features=2560, bias=True)
          (dense): Linear4bit(in_features=2560, out_features=2560, bias=True)
        )
        (mlp): PhiMLP(
          (activation_fn): NewGELUActivation()
          (fc1): Linear4bit(in_features=2560, out_features=10240, bias=True)
          (fc2): Linear4bit(in_features=10240, out_features=2560, bias=True)
        )
        (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (resid_dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (rotary_emb): PhiRotaryEmbedding()
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (final_layernorm): 

In [9]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["Wqkv","fc1","fc2"]
)

tokenizer_config.json:   0%|          | 0.00/7.34k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.11M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/1.08k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/99.0 [00:00<?, ?B/s]

In [None]:

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="all",
    evaluation_strategy="steps",
    eval_steps= 20 #5,  # Evaluate every 20 steps
    # gradient_checkpointing=gradient_checkpointing
)
# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,  # Pass validation dataset here
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
trainer.train()
trainer.model.save_pretrained(new_model)



Map:   0%|          | 0/7856 [00:00<?, ? examples/s]

Map:   0%|          | 0/418 [00:00<?, ? examples/s]

You are using 8-bit optimizers with a version of `bitsandbytes` < 0.41.1. It is recommended to update your version as a major bug has been fixed in 8-bit optimizers.
You're using a CodeGenTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss,Validation Loss
20,1.3636,1.70162
40,1.8437,1.677984
60,1.2647,1.680656
80,1.7658,1.663594
100,1.9717,1.672795
120,1.2962,1.659232
140,1.8471,1.64825
160,1.4165,1.651985
180,1.6688,1.64336
200,2.1377,1.657212


In [None]:
trainer.model.save_pretrained("drive/MyDrive/ERA_V1/S27/models/"+new_model)

In [None]:
# Cell 4: Test the model
logging.set_verbosity(logging.CRITICAL)
prompt = f"[INST]What is monospony[/INST]" # replace the command here with something relevant to your task
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(prompt)
print(result[0]['generated_text'])



[INST]What is monospony[/INST]Monospony is a term used in the field of linguistics to describe a language that has only one grammatical gender. In other words, all nouns in the language are either masculine or feminine, and there is no third gender. This is in contrast to polygendered languages, which have multiple grammatical genders.

For example, in Spanish, all nouns are either masculine or feminine, and there is no third gender. In contrast, in many languages spoken in the Caucasus region, such as Georgian and Abkhaz, there are three grammatical genders: masculine, feminine, and neuter.

Monospony is a linguistic phenomenon that can be observed in a variety of languages, and it is often associated with the historical development of a language. For example, many Indo-European languages, such as English and French, have evolved from earlier languages that had multiple grammatical genders, and have gradually lost one or


#Run Inference

In [None]:
from transformers import pipeline

prompt = f"[INST]What is monospony in economics?[/INST]" # replace the command here with something relevant to your task
num_new_tokens = 100  # change to the number of new tokens you want to generate

# Count the number of tokens in the prompt
num_prompt_tokens = len(tokenizer(prompt)['input_ids'])

# Calculate the maximum length for the generation
max_length = num_prompt_tokens + num_new_tokens

gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=max_length)
result = gen(prompt)
print(result[0]['generated_text'].replace(prompt, ''))

Monospony is a concept in economics that refers to a situation where a single factor of production, such as labor or capital, is the only variable that affects the output of a firm or industry. In other words, the output of a firm or industry is determined solely by the amount of the single factor of production that is used.

For example, if a firm produces widgets and the only factor of production that affects the output is the number of workers, then the firm is said to be


#Merge the model and store in Google Drive

In [None]:
# # Merge and save the fine-tuned model
# from google.colab import drive
# drive.mount('/content/drive')

model_path = "drive/MyDrive/ERA_V1/S27/results/microsoft-phi2-custom"  # change to your preferred path

# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Save the merged model
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

The repository for microsoft/phi-2 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/microsoft/phi-2.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

('drive/MyDrive/ERA_V1/S27/results/microsoft-phi2-custom/tokenizer_config.json',
 'drive/MyDrive/ERA_V1/S27/results/microsoft-phi2-custom/special_tokens_map.json',
 'drive/MyDrive/ERA_V1/S27/results/microsoft-phi2-custom/vocab.json',
 'drive/MyDrive/ERA_V1/S27/results/microsoft-phi2-custom/merges.txt',
 'drive/MyDrive/ERA_V1/S27/results/microsoft-phi2-custom/added_tokens.json',
 'drive/MyDrive/ERA_V1/S27/results/microsoft-phi2-custom/tokenizer.json')

# Load a fine-tuned model from Drive and run inference

In [None]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers trl einops

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m244.2/244.2 kB[0m [31m3.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m72.9/72.9 kB[0m [31m6.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m92.5/92.5 MB[0m [31m9.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m141.1/141.1 kB[0m [31m13.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.6/44.6 kB[0m [31m4.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m507.1/507.1 kB[0m [31m27.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m78.9/78.9 kB[0m [31m4.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m115.3/115.3 kB[0m [31m12.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━

In [None]:
from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%ls drive/MyDrive/Shared

[0m[01;34m'Anji Farewell '[0m/
 [01;36mAssignment3_FaceAging_CAAE.ipynb[0m@
 [01;34mBoltzmannMachine[0m/
 Book1.xlsx
 [01;34mBooks[0m/
 [01;36mCAAE[0m@
[01;34m'Colab Notebooks'[0m/
'Copy of custom_depth_mask.zip'
 [01;34mCroppedYale[0m/
 [01;34mCroppedYaleGeneratedOutput[0m/
 dataset.csv
[01;34m'DL A-Z'[0m/
 [01;34mEND2[0m/
 [01;36mEthicDayGM[0m@
 [01;36mEthnicDayNov2019[0m@
 [01;34mEVA4[0m/
 FA__CAAE.ipynb
 [01;34mFaceAging[0m/
 football-events.zip
 GARIMA_MAHATO_GeneratingIndianFaces.ipynb
 Garima_Mahato_Resume.docx
 GARIMA_MAHATO_Session7.ipynb
 [01;34mGarima_TeamLunch_31Dec2021[0m/
'GitHub - llSourcell Learn_Computer_Vision: This is the curriculum for "Learn Computer Vision" by Siraj Raval on Youtube'
 [01;34mGradCam[0m/
'https:  www.lin.txt'
 i2.PNG
 [01;34mImageNet[0m/
 IMG-20180616-WA0000.jpg
 IMG-20180616-WA0001.jpg
 IMG-20180617-WA0000.jpg
 IMG-20180617-WA0001.jpg
 IMG-20180617-WA0002.jpg
 IMG-20180617-WA0003.jpg
 IMG-20180617-WA0004.jpg
 IMG

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "drive/MyDrive/ERA_V1/S27/results/microsoft-phi2-custom"   # change to the path where your model is saved

phi_model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
phi_tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

MessageError: Error: credential propagation was unsuccessful

In [None]:
from transformers import pipeline

prompt = "What is monopoly in economics?"  # change to your desired prompt
gen = pipeline('text-generation', model=phi_model, tokenizer=phi_tokenizer)
result = gen(prompt)
print(result[0]['generated_text'])

NameError: name 'phi_model' is not defined