## Setup and Selecting Model
Configure the base model and a few other variables that we'll use later.

In [None]:
model = '3B'  # Pick your poison


if model == '7B':
    model_name = ("togethercomputer/RedPajama-INCITE-Base-7B-v0.1","togethercomputer/RedPajama-INCITE-Base-7B-v0.1")
    run_name = 'redpj7B-lora-int8-alpaca'
    dataset = 'johnrobinsn/alpaca-cleaned'
    peft_name = 'redpj7B-lora-int8-alpaca'
    output_dir = 'redpj7B-lora-int8-alpaca-results'
else: #3B
    model_name = ("togethercomputer/RedPajama-INCITE-Base-3B-v1","togethercomputer/RedPajama-INCITE-Base-3B-v1")
    run_name = 'redpj3B-lora-int8-alpaca'
    dataset = 'johnrobinsn/alpaca-cleaned'
    peft_name = 'redpj3B-lora-int8-alpaca'
    output_dir = 'redpj3B-lora-int8-alpaca-results'

model_name[1],dataset,peft_name,run_name

('togethercomputer/RedPajama-INCITE-Base-3B-v1',
 'johnrobinsn/alpaca-cleaned',
 'redpj3B-lora-int8-alpaca',
 'redpj3B-lora-int8-alpaca')

Install the required dependencies.

In [None]:
def install_dependencies():
    !pip install -Uqq  git+https://github.com/huggingface/peft.git
    !pip install -Uqq transformers datasets accelerate bitsandbytes
    # !pip install -Uqq wandb

# comment out the following line if the dependencies are already installed
install_dependencies()



## Tokenizer
The tokenizer converts text into a list/tensor of token ids so that the model can process it.  Each language model is trained with a specific tokenizer, so we use the AutoTokenizer class to create an instance of the correct tokenizer from the model name.

In [None]:
from transformers import AutoTokenizer

print("Loading tokenizer for model: ", model_name[1])
tokenizer = AutoTokenizer.from_pretrained(model_name[1],add_eos_token=True)
tokenizer.pad_token_id = 0

Loading tokenizer for model:  togethercomputer/RedPajama-INCITE-Base-3B-v1



One problem I've found with many of the finetuning scripts and notebooks online is that "end-of-stream" handling is not done correctly, so in many cases the finetuned models don't know when to stop emitting tokens and tend to "blabber" on.  The fix here is to explicitly add a new token, `<eos>`, to represent end-of-stream and to use it during training to teach the model when it should stop. Then during inference, we can use that token to recognize when the model is done responding.

In [None]:
tokenizer.add_special_tokens({'eos_token':'<eos>'})
print('eos_token_id:',tokenizer.eos_token_id)

eos_token_id: 50277


In [None]:
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data in the HIV dataset

def tokenize(prompt, tokenizer,add_eos_token=True):
    result = tokenizer(
        prompt+"<eos>",  # add the end-of-stream token
        truncation=True,
        max_length=CUTOFF_LEN,
        padding="max_length",
    )
    return {
        "input_ids": result["input_ids"],
        "attention_mask": result["attention_mask"],
    }
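
The comment on `CUTOFF_LEN` says 256 tokens cover about 96% of the data. The notebook does not show how that figure was obtained; a minimal sketch of one way to check such a number is below. The helper name `cutoff_coverage` and the sample lengths are hypothetical and are not measured from the dataset.

```python
def cutoff_coverage(lengths, cutoff):
    """Return the fraction of examples whose token count fits within `cutoff`."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n <= cutoff) / len(lengths)

# Hypothetical token counts, for illustration only (not measured from the dataset).
example_lengths = [120, 180, 200, 240, 256, 300, 90, 410, 150, 220]
print(f"{cutoff_coverage(example_lengths, 256):.0%} of examples fit in 256 tokens")
```

With the real data, the lengths could be collected by tokenizing each templated prompt without truncation and taking `len(...)` of its `input_ids`.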


Let's give it a quick try and note the <eos> token id at the end of the sequence.

In [None]:
tokenizer('hi there<eos>')

{'input_ids': [5801, 627, 50277], 'attention_mask': [1, 1, 1]}
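
One consequence of combining `truncation=True` with an appended `<eos>` is worth keeping in mind: if a prompt is longer than the cutoff, the end of the sequence, including the `<eos>` id, is cut off. The toy sketch below mimics that behavior on plain lists. `pad_or_truncate` is a hypothetical helper, and right-side padding and truncation are assumptions here, not a statement about how the tokenizer is configured.

```python
def pad_or_truncate(ids, max_length, pad_id=0):
    """Mimic truncation=True with padding="max_length" on a list of token ids.

    Assumes right-side truncation and right-side padding.
    """
    ids = ids[:max_length]
    mask = [1] * len(ids)
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad

EOS_ID = 50277  # the <eos> id printed earlier in this notebook

# A short sequence keeps its <eos> and gets padded.
short_ids, short_mask = pad_or_truncate([5801, 627, EOS_ID], max_length=5)
# A long sequence (made-up ids) is truncated before its <eos>.
long_ids, long_mask = pad_or_truncate([11, 12, 13, 14, 15, 16, EOS_ID], max_length=5)

print(short_ids, short_mask)
print(long_ids, long_mask)
print("eos survived truncation:", EOS_ID in long_ids)
```

Under these assumptions, examples longer than `CUTOFF_LEN` would be trained without a stop signal, which is one reason to pick a cutoff that covers most of the data.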

## Dataset

Here we import the data from a JSON file and then convert it into the `datasets` format.

In [None]:
import json

# Load the cleaned data: a JSON list of records with 'instruction' and 'output' keys.
# The `with` block closes the file automatically.
with open('/content/clean_Data.json') as f:
    data = json.load(f)


In [None]:
from datasets import Dataset, DatasetDict

# Your original list


# Convert the list to a Dataset
dataset = Dataset.from_dict({"instruction": [entry['instruction'] for entry in data],
                             "output": [entry['output'] for entry in data]})

# Create a DatasetDict with a key and the dataset
Hiv_Data = DatasetDict({"my_dataset": dataset})

# print(dataset_dict.keys())
print(Hiv_Data)


DatasetDict({
    my_dataset: Dataset({
        features: ['instruction', 'output'],
        num_rows: 787
    })
})
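
The conversion above turns a list of records (one dict per example) into a dict of columns (one list per feature), which is the shape `Dataset.from_dict` expects. A minimal sketch of that reshaping in plain Python follows; `records_to_columns` and the sample records are hypothetical stand-ins for the contents of the JSON file.

```python
def records_to_columns(records, keys):
    """Turn a list of dicts into a dict of lists, one list per key."""
    return {key: [record[key] for record in records] for key in keys}

# Hypothetical records standing in for the contents of the JSON file.
records = [
    {"instruction": "question one", "output": "answer one"},
    {"instruction": "question two", "output": "answer two"},
]
columns = records_to_columns(records, ["instruction", "output"])
print(columns)
```

Recent versions of `datasets` also offer `Dataset.from_list`, which may make the manual reshaping unnecessary if your installed version includes it.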


We can see that the dataset consists of 787 rows with the features ['instruction', 'output'].  Let's take a look at one.

In [None]:
Hiv_Data['my_dataset'][4]

{'instruction': 'I have posted before not sure if it went through or not. So I am 20 Female and super anxious im a nursing student but I drew blood from a patient and without taking off or changing the gloves , I openedpicked up the cleaning wipe to wipe the chair down and my thumb started to burn super bad, I panicked and didnt look at the glove and just took it off and started squeezing my finger there was no blood and no visible mark at the time, showed my instructor she said I probably scraped it opening the wipes container and that she didnt see anything so not to worry and that I wouldve felt it if I got stuck by a dirty needle with the patient but I cannot stop thinking about it. And when I got home I put my thumb on a flash light and you can see a deeper line that could be needle sized, that looks like a scrape. It only shows up when my thumb is pressed into the light. What are your thoughts .  I even took a clean needle and tried to replicate it on a different finger because I

We can see an item that includes an 'instruction' to direct our model and the expected 'output' for the model.  This dataset has no separate 'input' field for additional context.


But we can't directly use this JSON object to train our model.  Our model can only process an ordered sequence of tokens that represent words.  So we use a "prompt template" to convert each of these JSON objects in our dataset into a sequence of words.  The prompt template follows a consistent pattern.

In [None]:
def generate_prompt(data_point):
  return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{data_point["instruction"]}

### Response:
{data_point["output"]}"""


Let's see what our example looks like when "templatized".

In [None]:
print(generate_prompt(Hiv_Data['my_dataset'][5]))

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Sex Male Age 28 Height 6 ft Weight 196 Lbs no medical conditions and no medication Hi Docs, so a month ago I was vacationing in Cuba, on my last day I got stung by a sea urchin in my right foot. During my flight back in the plane, I started getting strong intense muscle pains in my right arm starting from the wrist to shoulder, a pain so intense it felt like my arm was going to explode. This pain went on for 2 days waking me up screaming and crying at night, so I went to the ER, the doctor says I have a shoulder inflammation and gave me naproxen. That did not help. Then I went to my family physician and she said it is from my neck and recommended physiotherapy. That did nothing. Fast forward a few days I started getting chills and feeling very unwell, my CRP blood test came back at 69, doctor said I have an infection somewhere but she was on vacation. No fucks giv

The exact wording of the template is somewhat arbitrary.  It's more of a consistent pattern that, after training, will drive the model into responding similarly when exposed to a similar prompt.  You should be able to pick out the "instruction" and the "output" from the example.

It is important that the output from the dataset is at the end of templatized prompt, since at inference time we will only provide the prompt up to **but not including the output**.  We'll expect our model to respond to our instruction on its own.
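
To make the relationship between the two prompts concrete, here is a small sketch showing that the training prompt is the inference prompt plus the expected output. `TEMPLATE_HEAD`, `training_prompt`, and `inference_prompt` are hypothetical names used only for this illustration; the wording mirrors the template above.

```python
TEMPLATE_HEAD = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

def training_prompt(point):
    """Full prompt used for finetuning: template head followed by the expected output."""
    return TEMPLATE_HEAD.format(instruction=point["instruction"]) + "\n" + point["output"]

def inference_prompt(point):
    """Prompt used at inference time: identical head, with the output left for the model."""
    return TEMPLATE_HEAD.format(instruction=point["instruction"])

# Hypothetical data point, for illustration only.
point = {"instruction": "Say hello.", "output": "Hello!"}
print(training_prompt(point).startswith(inference_prompt(point)))
```

Because the inference prompt is a strict prefix of what the model saw during training, the model's most likely continuation is a response in the style of the training outputs.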

We now split out a validation dataset from our training dataset so that we can track how well the finetuning process is learning to generalize to unseen prompts, and so that we can load the checkpoint with the best validation loss at the end of training.

In [None]:
VAL_SET_SIZE = 100
train_val = Hiv_Data["my_dataset"].train_test_split(
    test_size=VAL_SET_SIZE,shuffle=True, seed=42
)
train_data = train_val["train"]
val_data = train_val["test"]

In [None]:
train_data.shape

(687, 2)

We prepare the training dataset and the validation dataset by running the data through the prompt templating process and then by tokenizing the prompts.

In [None]:
train_data = train_data.shuffle().map(lambda x: tokenize(generate_prompt(x), tokenizer))
val_data = val_data.shuffle().map(lambda x: tokenize(generate_prompt(x), tokenizer))


## Load and Configure the Model for Training

Load the specified RedPajama base model from the HuggingFace hub.

_Note: Llama, RedPajama, and other decoder-only models are supported by the AutoModelForCausalLM class._

In [None]:
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training, TaskType

# Define LoRA Config
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

# Load the language model using Hugging Face Transformers
# model_name = ("togethercomputer/RedPajama-INCITE-Base-3B-v1", "togethercomputer/RedPajama-INCITE-Base-3B-v1")
model = AutoModelForCausalLM.from_pretrained(
    model_name[0],
    device_map="auto",
    offload_folder="offload",
    offload_state_dict=True,
    torch_dtype=torch.float16
)

# Prepare int-8 model for training
model = prepare_model_for_int8_training(model)

# Add LoRA adaptor
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()





trainable params: 5,242,880 || all params: 2,781,107,200 || trainable%: 0.18851772416395887


The cell above prepares our model for LoRA int-8 training using the HF peft library.

_Note: After installing the LoRA adapters into the model, notice the significant reduction in the number of trainable parameters: only about 0.19% of all parameters are trained._
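
The trainable parameter count can be reproduced by hand. For each adapted linear layer, LoRA adds two low-rank matrices of shapes `r x in_features` and `out_features x r`. The sketch below assumes the layer dimensions shown in the model printout later in this notebook (32 layers, `query_key_value` with 2560 inputs and 7680 outputs); `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(r, in_features, out_features, n_layers):
    """Parameters added by LoRA: A is (r x in_features), B is (out_features x r), per adapted layer."""
    return n_layers * r * (in_features + out_features)

# r comes from the LoraConfig; the layer shapes are taken from the model printout.
trainable = lora_param_count(r=16, in_features=2560, out_features=7680, n_layers=32)
all_params = 2_781_107_200  # "all params" reported by print_trainable_parameters()

print(trainable)                                # 5242880
print(f"{100 * trainable / all_params:.4f}%")   # 0.1885%
```

Doubling `r` would double the number of trainable parameters, which is the main knob for trading adapter capacity against memory.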

We'll leverage the training loop from the transformers library since it does a pretty good job with handling the details.

In [None]:
import transformers
eval_steps = 200
save_steps = 200
logging_steps = 20

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=transformers.TrainingArguments(
        num_train_epochs=3,
        learning_rate=3e-4,
        logging_steps=logging_steps,
        evaluation_strategy="steps",
        save_strategy="steps",
        eval_steps=eval_steps,
        save_steps=save_steps,
        output_dir=output_dir,
        # report_to=report_to if report_to else "none",
        save_total_limit=3,
        load_best_model_at_end=True,
        push_to_hub=False,
        auto_find_batch_size=True
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

model.config.use_cache = False  # silence the warnings. Please re-enable for inference!

## Train
Run the training loop.

In [None]:
trainer.train()

You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 200  | 2.4425        | 2.544533        |
| 400  | 2.1851        | 2.481153        |
| 600  | 1.8986        | 2.425902        |
| 800  | 1.5601        | 2.389363        |
| 1000 | 1.6116        | 2.349982        |


TrainOutput(global_step=1032, training_loss=2.0765993761461834, metrics={'train_runtime': 1774.5313, 'train_samples_per_second': 1.161, 'train_steps_per_second': 0.582, 'total_flos': 8395429844090880.0, 'train_loss': 2.0765993761461834, 'epoch': 3.0})
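
The numbers in the training output can be cross-checked with a little arithmetic. The batch size is not printed because `auto_find_batch_size=True` picks it at run time, so the sketch below infers it from the step count. This inference assumes a single device and no gradient accumulation. It also picks the evaluation step with the lowest validation loss, which is the criterion `load_best_model_at_end=True` uses by default.

```python
import math

n_train, n_epochs = 687, 3
global_step = 1032  # reported in TrainOutput

# Assumption: one device, no gradient accumulation, so
# steps per epoch = ceil(n_train / batch_size).
steps_per_epoch = global_step // n_epochs
candidates = [b for b in (8, 4, 2, 1) if math.ceil(n_train / b) == steps_per_epoch]
inferred_batch_size = candidates[0] if candidates else None

# Validation losses copied from the training log.
eval_log = {200: 2.544533, 400: 2.481153, 600: 2.425902, 800: 2.389363, 1000: 2.349982}
best_step = min(eval_log, key=eval_log.get)

print("steps per epoch:", steps_per_epoch)
print("inferred batch size:", inferred_batch_size)
print("best evaluated step:", best_step)
```

The inferred batch size is also consistent with the reported throughput, since 1.161 samples per second divided by 0.582 steps per second is roughly 2.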

## Save the Trained Adapter Model to Disk

Now that we've trained the model we'll want to save our weights.  First I demonstrate how to save them to disk.

In [None]:
# Save our LoRA model & tokenizer results
trainer.model.save_pretrained(peft_name)
tokenizer.save_pretrained(peft_name)

('redpj3B-lora-int8-alpaca/tokenizer_config.json',
 'redpj3B-lora-int8-alpaca/special_tokens_map.json',
 'redpj3B-lora-int8-alpaca/tokenizer.json')

In [None]:
def install_dependencies():
    !pip install -Uqq  git+https://github.com/huggingface/peft.git
    !pip install -Uqq transformers datasets accelerate bitsandbytes
    # !pip install -Uqq wandb

# comment out the following line if the dependencies are already installed
install_dependencies()
model = '3B'  # Pick your poison


if model == '7B':
    model_name = ("togethercomputer/RedPajama-INCITE-Base-7B-v0.1","togethercomputer/RedPajama-INCITE-Base-7B-v0.1")
    run_name = 'redpj7B-lora-int8-alpaca'
    dataset = 'johnrobinsn/alpaca-cleaned'
    peft_name = 'redpj7B-lora-int8-alpaca'
    output_dir = 'redpj7B-lora-int8-alpaca-results'
else: #3B
    model_name = ("togethercomputer/RedPajama-INCITE-Base-3B-v1","togethercomputer/RedPajama-INCITE-Base-3B-v1")
    run_name = 'redpj3B-lora-int8-alpaca'
    dataset = 'johnrobinsn/alpaca-cleaned'
    peft_name = 'redpj3B-lora-int8-alpaca'
    output_dir = 'redpj3B-lora-int8-alpaca-results'

model_name[1],dataset,peft_name,run_name


('togethercomputer/RedPajama-INCITE-Base-3B-v1',
 'johnrobinsn/alpaca-cleaned',
 'redpj3B-lora-int8-alpaca',
 'redpj3B-lora-int8-alpaca')

In [None]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

In [None]:
# load base LLM model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name[0],
    load_in_8bit=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name[1])
tokenizer.pad_token_id = 0
tokenizer.add_special_tokens({'eos_token':'<eos>'})

model.eval()


GPTNeoXForCausalLM(
  (gpt_neox): GPTNeoXModel(
    (embed_in): Embedding(50432, 2560)
    (emb_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x GPTNeoXLayer(
        (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (post_attention_dropout): Dropout(p=0.0, inplace=False)
        (post_mlp_dropout): Dropout(p=0.0, inplace=False)
        (attention): GPTNeoXAttention(
          (rotary_emb): GPTNeoXRotaryEmbedding()
          (query_key_value): Linear8bitLt(in_features=2560, out_features=7680, bias=True)
          (dense): Linear8bitLt(in_features=2560, out_features=2560, bias=True)
          (attention_dropout): Dropout(p=0.0, inplace=False)
        )
        (mlp): GPTNeoXMLP(
          (dense_h_to_4h): Linear8bitLt(in_features=2560, out_features=10240, bias=True)
          (dense_4h_to_h): Linear8bitLt(in_features=10240, out_fe

Here is the prompt template we'll use for inference.

_Note: It's important that it's identical to the one we used for training above, except that it omits the "output/response", since our model will generate that for us._

In [None]:
def generate_prompt(data_point):
  return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{data_point["instruction"]}

### Response:"""


Here is a small utility function that lets us easily prompt our model with an instruction.  (It also accepts an optional input, which this prompt template does not use.)  It handles templating the prompt, tokenizing the templatized prompt, generating, decoding the result, and finally stripping the prompt from the decoded text so that only the model's response is printed.

In [None]:
def generate(instruction,input=None,maxTokens=256):
    prompt = generate_prompt({'instruction':instruction,'input':input})
    input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
    outputs = model.generate(input_ids=input_ids, max_new_tokens=maxTokens,
                             do_sample=True, top_p=0.9,pad_token_id=tokenizer.eos_token_id,
                             forced_eos_token_id=tokenizer.eos_token_id)
    outputs = outputs[0].tolist()
    # Stop decoding when hitting the EOS token
    if tokenizer.eos_token_id in outputs:
        eos_index = outputs.index(tokenizer.eos_token_id)
        decoded = tokenizer.decode(outputs[:eos_index])
        # Don't show the prompt template
        sentinel = "### Response:"
        sentinelLoc = decoded.find(sentinel)
        if sentinelLoc >= 0:
            print(decoded[sentinelLoc+len(sentinel):])
        else:
            print('Warning: Expected prompt template to be emitted.  Ignoring output.')
    else:
        print('Warning: no <eos> detected ignoring output')
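
The function above prints its result and returns nothing. If you want the response as a string, for example to store or compare it, the post-processing step can be factored out into a function that returns the text instead. The sketch below is one possible variant and not part of the original notebook. `extract_response` is a hypothetical helper, and the toy vocabulary stands in for `tokenizer.decode` so the example runs without a model. Unlike the original, it also strips surrounding whitespace.

```python
def extract_response(token_ids, eos_token_id, decode, sentinel="### Response:"):
    """Return the text after `sentinel`, cut at the first EOS token.

    Returns None if no EOS token or no sentinel is found.
    """
    if eos_token_id not in token_ids:
        return None
    decoded = decode(token_ids[:token_ids.index(eos_token_id)])
    start = decoded.find(sentinel)
    if start < 0:
        return None
    return decoded[start + len(sentinel):].strip()

# Toy stand-in for tokenizer.decode: maps made-up ids to strings.
vocab = {1: "### Response:", 2: " hello", 3: " world", 9: "<eos>"}

def toy_decode(ids):
    return "".join(vocab[i] for i in ids)

print(extract_response([1, 2, 3, 9, 2], 9, toy_decode))
```

With the real tokenizer, `decode` would be `tokenizer.decode` and `token_ids` would be `outputs[0].tolist()` from `model.generate`.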

### Generating using the Base Model

This demonstrates the behavior of the RedPajama model with no finetuning applied.

**BEFORE FINETUNING**

In [None]:
torch.manual_seed(42)
generate('i had sex with a guy who is HIV positive, is there any chance i also get also HIV?',maxTokens=300)


1. According to the study, it is very possible to contract HIV from non-infected sexual partner and the risk of transmission increases if there are other infected persons in your social environment.
2. There is a high possibility of HIV transmission if there is no condom. Therefore, if you are sexually active, use a condom. If you do not want to use a condom, then you should use a barrier method.
3. HIV can be transmitted through various body fluids like blood, semen, vaginal fluid, and saliva.
4. In case you are in a monogamous relationship and you have not had sex, you should avoid vaginal sex with the partner as much as possible. If you have a vaginal sex, then use a condom.

## Reading Passage 10

## The Power of Language

The power of language is one of the most underrated tools in the world. We see its effects in everyday life as well as in scientific research. Scientists use the power of language to help them describe their subjects of study. For example, in the field of medici

### Load the LoRA Adapter

As we can see, the generated text isn't very responsive to the prompt and drifts into unrelated text.  Now let's load the trained LoRA adapter and see what happens.

_Note: You can either load a pretrained LoRA adapter from the Hugging Face hub or, if you trained your own adapter above, load it from disk.  The cell below loads locally saved weights; the hub option is left commented out._

In [None]:
# peft_model_id = f'johnrobinsn/{peft_name}'  # Alternative: pretrained adapter weights from the Hugging Face hub
peft_model_id = '/content/drive/MyDrive/llama(hiv)'  # Locally saved adapter weights

# Load the LoRA model
model = PeftModel.from_pretrained(model, peft_model_id, device_map={"":0})
model.eval()

print("Peft model adapter loaded")

Peft model adapter loaded


Now let's prompt the model again with the adapter loaded.

**AFTER FINETUNING**

In [None]:
x='what are the main symptoms of HIV?'

In [None]:
torch.manual_seed(42)
generate(x,maxTokens=300)  # generate() prints the response and returns None, so don't reassign x


the main symptoms of HIV include tiredness, flu-like illness (feeling generally unwell with high temperature or shivering), sore throat, muscle and joint pain, flu-like illness, headache, mouth ulcers or sores, mouth ulcers and skin changes that may be fluid-filled, especially around the nose or in the genital area (vulva or penis).


As you can see, this response is much more responsive to the provided instruction.

### A Few More Prompts