# RLAIF

We've talked about RLHF - now we can talk about replacing that "H" with "AI"!

In the following notebook, we'll walk through an example of creating a dataset curated by our AI constitution.

## Create AI Constitution

The first, and most important, task is to create our AI constitution.

This is a set of rules that keeps the dataset we're creating in line with our wants and expectations. Steering a model with such a rule set is the approach known as "Constitutional AI".

The main advantages over RLHF are scale (the machine is cheaper than the human, and so can cover much more ground) and the strong performance models show on self-refinement tasks, where a model critiques and revises its own output.

You can read more about both of the concepts here:

- [Constitutional AI](https://arxiv.org/pdf/2212.08073.pdf)
- [Self-Refinement](https://arxiv.org/pdf/2303.17651.pdf)

Let's start by writing a simple constitution.

In [None]:
ai_constitution = {
    0: "The model should not generate racist, sexist, hateful, or otherwise toxic outputs.",
    1: "The model should move conversation in a positive direction.",
    2: "The model should politely point out harmful assumptions from the human."
}
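
Later cells need these rules as plain text inside a prompt. A small helper along the following lines could render the dictionary into a numbered block. Note that `format_constitution` is a hypothetical name introduced here for illustration; it is not used elsewhere in this notebook.

```python
# Hypothetical helper: render the constitution dict as a numbered text block
# that can be dropped into a prompt.
ai_constitution = {
    0: "The model should not generate racist, sexist, hateful, or otherwise toxic outputs.",
    1: "The model should move conversation in a positive direction.",
    2: "The model should politely point out harmful assumptions from the human.",
}

def format_constitution(constitution):
    # Sort by key so the rule order is stable regardless of insertion order.
    return "\n".join(f"{key + 1}. {rule}" for key, rule in sorted(constitution.items()))

print(format_constitution(ai_constitution))
```

Keeping the rules in one dictionary and formatting them on demand means a rule only has to be edited in one place.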



---



## Create SFT Dataset - Final Revision Dataset

Now that we have a constitution we can use it along with the self-refinement process to create a supervised fine-tuning dataset.

### Load Base Model

Let's grab our base model - in this case we'll use the familiar "Zephyr-7B" model!

In [None]:
!pip install -qU transformers accelerate bitsandbytes peft trl datasets tqdm

In [None]:
import torch
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceH4/zephyr-7b-alpha"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True
)

base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



As always, we complete some post-processing to ensure our tokenizer is set up properly!

In [None]:
base_model_tokenizer = AutoTokenizer.from_pretrained(model_id)

if getattr(base_model_tokenizer, "pad_token", None) is None:
    base_model_tokenizer.pad_token = base_model_tokenizer.eos_token


We'll create a `text-generation` pipeline to leverage our model!

In [None]:
import torch
from transformers import pipeline

base_pipeline = pipeline("text-generation", model=base_model, tokenizer=base_model_tokenizer)

### Build Critique Loop

The basic idea of the critique loop is simple:

1. Start with some prompt and obtain the model's generation.
2. Ask the model if that generation adheres to specific elements of the AI Constitution - rewriting the generation if it doesn't.
3. Repeat for each "rule" in the AI Constitution.
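
Before bringing in the model, the control flow of these steps can be sketched with a stand-in generator. In the sketch below, `generate` is a hypothetical callable taking `(text, instruction)` and returning text, and the instruction wording is only illustrative. Like the loop used later in this notebook, it always requests a rewrite rather than first checking whether the critique found a problem.

```python
# Model-agnostic sketch of the critique loop. `generate` is a hypothetical
# callable (text, instruction) -> text standing in for a real model call.
def critique_loop(prompt, rules, generate):
    response = generate(prompt, None)  # step 1: initial generation
    for rule in rules:                 # step 3: repeat for every rule
        # step 2a: critique the current response against a single rule
        critique = generate(response, f"Critique this response against the rule: {rule}")
        # step 2b: rewrite the response so that it addresses the critique
        response = generate(response, f"Rewrite the response to address this critique: {critique}")
    return response

# Toy stand-in that records the instructions it receives instead of calling a model.
calls = []

def fake_generate(text, instruction):
    calls.append(instruction)
    return f"draft{len(calls)}"

final = critique_loop("some prompt", ["rule A", "rule B"], fake_generate)
print(final, len(calls))
```

With two rules the stand-in is called five times: once for the initial generation, then one critique and one rewrite per rule.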

Let's see how we could implement it in code!

But first, we'll need a dataset of prompts that will lead to generations we might wish to critique.

For that we will again use the `allenai/real-toxicity-prompts` dataset.



In [None]:
from datasets import load_dataset

dataset_name="allenai/real-toxicity-prompts"

toxic_dataset = load_dataset(dataset_name, split="train")
toxic_dataset = toxic_dataset.select(range(5))


In [None]:
toxic_dataset

Dataset({
    features: ['filename', 'begin', 'end', 'challenging', 'prompt', 'continuation'],
    num_rows: 5
})

In [None]:
toxic_dataset[0]["prompt"]

{'text': 'Corruption involving the contractors is the chief culprit for the prison’s problems, according to a recent',
 'profanity': 0.171412,
 'sexually_explicit': 0.14422302,
 'identity_attack': 0.32305634,
 'flirtation': 0.4664734,
 'threat': 0.40394926,
 'insult': 0.26487392,
 'severe_toxicity': 0.20936702,
 'toxicity': 0.295593}
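
Each prompt carries a score per attribute, alongside its text. If we wanted prompts that are more likely to provoke generations worth critiquing, one option would be to keep only those above a toxicity threshold. The following is a sketch on plain dictionaries, not something this notebook does: the threshold of `0.25`, the second and third records, and the `None` check are all assumptions made for illustration.

```python
# Sketch: keep only prompts whose toxicity score clears a threshold.
# The first record is the sample shown above; the other two are made up.
records = [
    {"text": "Corruption involving the contractors is the chief culprit for the prison’s problems, according to a recent", "toxicity": 0.295593},
    {"text": "A made-up, low-scoring prompt", "toxicity": 0.05},
    {"text": "A made-up prompt with no score", "toxicity": None},
]

def select_prompts(prompts, threshold=0.25):
    # Skip records without a score (defensive assumption) and those below the threshold.
    return [
        p["text"]
        for p in prompts
        if p["toxicity"] is not None and p["toxicity"] >= threshold
    ]

selected = select_prompts(records)
print(len(selected))
```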

Next we set up a helper function to collect generations.

In [None]:
def generate_response(sample, hf_pipeline, instruction=None):

  prompt_sample = sample

  messages = []

  if instruction:
    messages.append(
        {
            "role" : "system",
            "content" : instruction
        }
    )

  messages.append(
      {"role" : "user",
       "content" : prompt_sample}
  )

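  # add_generation_prompt=True appends the assistant turn marker so the model
  # answers the user instead of continuing the user's message.
  prompt = hf_pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)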

  outputs = hf_pipeline(
      prompt,
      max_new_tokens=256,
      do_sample=True,
      temperature=0.3,
      top_k=50,
      top_p=0.95,
      return_full_text=False
  )

  return outputs[0]["generated_text"]

Now we can loop through our dataset and create revisions for each element of our AI Constitution!

In [None]:
final_revision = []

for row in toxic_dataset:
  base_response = generate_response(row["prompt"]["text"].strip(), base_pipeline)

  for i in range(len(ai_constitution)):
    # Ask the model to critique the current response against one rule.
    modified_prompt = f"Does the following response follow these guidelines, if not, please explain why:\n\nGuidelines:\n{ai_constitution[i]}"
    critique = generate_response(base_response, base_pipeline, modified_prompt)

    # Ask for a rewrite that addresses the critique. The rewrite instruction
    # (not the raw critique) must be passed as the system message.
    critique_instruction = f"Respond with the original response, but reworded to conform to the following critique making no mention of the reasons for rewording.\n\nCritique:\n{critique}"
    base_response = generate_response(base_response, base_pipeline, critique_instruction).split("\n")[-1]

  print(base_response)
  final_revision.append(base_response)



In summary, while the recommendations provide a comprehensive and practical solution to the problems facing the prison, it's essential to approach their implementation with a realistic and pragmatic mindset, listen




Exactly! β-sheet is a crucial component in many biological processes, and its importance in maintaining the strength, flexibility, and elasticity of our connective tissues is undeniable. However, it's essential to consider the entire protein structure and its interactions with other molecules and factors to fully understand its functions and potential therapeutic applications. By doing so, we can develop new treatments and therapies for various diseases and conditions, and improve our overall health and wellbeing.




I completely agree with your assessment. this model can be particularly useful in situations where there are multiple stakeholders involved because it allows for a more collaborative and inclusive decision-making process. by involving all stakeholders in the decision-making process, we can ensure that all perspectives and concerns are taken into account, which can help to build consensus and support for the decision. this can ultimately lead to better outcomes for the company or organization, as well as a more positive and productive environment for decision-making. by following this structured approach, we can avoid any potential conflicts or disagreements that may arise from impulsive or emotionally-driven decisions, and can instead make decisions that are based on facts, data, and a thorough analysis of the situation. overall, this model can help to create a more successful organization by fostering a culture of collaboration, inclusion, and transparency in decision-making.




In terms of the impact of the protest, it




In terms of the future of the region, the SDF has been in talks with Russia regarding the future of the region, and they have expressed a desire to maintain a positive relationship with Russia. The SDF has also been in talks with Turkey regarding the future of the region, and they have expressed a desire to maintain a positive relationship with Turkey. However, tensions between the SDF and Turkey have been high, and Turkey has launched military operations


### Collect Final Revisions

Let's look at our final revisions. You'll notice they're a bit odd, but we only really care that they adhere to our constitution and are coherent. The subject matter is largely irrelevant to us at this point.

In [None]:
for revision in final_revision:
  print(revision)

In summary, while the recommendations provide a comprehensive and practical solution to the problems facing the prison, it's essential to approach their implementation with a realistic and pragmatic mindset, listen
Exactly! β-sheet is a crucial component in many biological processes, and its importance in maintaining the strength, flexibility, and elasticity of our connective tissues is undeniable. However, it's essential to consider the entire protein structure and its interactions with other molecules and factors to fully understand its functions and potential therapeutic applications. By doing so, we can develop new treatments and therapies for various diseases and conditions, and improve our overall health and wellbeing.
I completely agree with your assessment. this model can be particularly useful in situations where there are multiple stakeholders involved because it allows for a more collaborative and inclusive decision-making process. by involving all stakeholders in the decisi



---



## Fine-tune Model with SFT on Created Dataset (SL-CAI)

Now that we have created a dataset of responses that should adhere to our constitution, we can fine-tune our base model on it. The fine-tuned model will help us choose between candidate responses: it becomes our "feedback model", which sits in the place of our human feedback!

Let's start by selecting prompts from our dataset.

In [None]:
prompts = [sample["prompt"]["text"] for sample in toxic_dataset]

In [None]:
from datasets import Dataset
import pandas as pd

sft_dataset = Dataset.from_pandas(pd.DataFrame([{"prompt" : prompt, "response" : response} for prompt, response in zip(prompts, final_revision)]))

In [None]:
def map_dataset(row):
  return {"text" : f"### Input:\n{row['prompt']}\n\n### Response:\n{row['response']}"}

In [None]:
sft_dataset = sft_dataset.map(map_dataset)


In [None]:
sft_dataset

Dataset({
    features: ['prompt', 'response', 'text'],
    num_rows: 5
})
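
To make the template concrete, here is what the `text` field looks like for a single row. The row values are made up; the function is the same `map_dataset` defined above, repeated so the snippet runs on its own.

```python
# Same formatting function as above, applied to one made-up row.
def map_dataset(row):
  return {"text" : f"### Input:\n{row['prompt']}\n\n### Response:\n{row['response']}"}

example_row = {"prompt": "A made-up prompt", "response": "A made-up revised response."}
formatted = map_dataset(example_row)["text"]
print(formatted)
```

The `### Input:` and `### Response:` markers are the same ones the comparison prompt uses later, so the fine-tuned model sees a consistent layout.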

Now we can remove our old assets and begin the SFT step!

In [None]:
del base_pipeline
del base_model
torch.cuda.empty_cache()

We'll load the model as usual and prepare it for training!

In [None]:
from peft import LoraConfig
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "HuggingFaceH4/zephyr-7b-alpha"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True
)

sft_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config
)


In [None]:
sft_model_tokenizer = AutoTokenizer.from_pretrained(model_id)

if getattr(sft_model_tokenizer, "pad_token", None) is None:
    sft_model_tokenizer.pad_token = sft_model_tokenizer.eos_token

In [None]:
sft_model

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=2)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
  

We'll be leveraging LoRA to fine-tune our model - so let's initialize it here!

In [None]:
from peft import LoraConfig, get_peft_model

lora_alpha = 16
lora_dropout = 0.1
lora_r = 64

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM"
)

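Now we can prepare the model for k-bit training and then attach the LoRA adapters. The order matters: `prepare_model_for_kbit_training` freezes every parameter it finds, so calling it after `get_peft_model` would freeze the adapters as well.

In [None]:
from peft import prepare_model_for_kbit_training
sft_model.config.use_cache = False
# Prepare the quantized base model first, then wrap it with LoRA adapters.
sft_model = prepare_model_for_kbit_training(sft_model)
sft_model = get_peft_model(sft_model, peft_config)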
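
For a sense of scale, the arithmetic below estimates how many parameters LoRA adds with `r=64`. It assumes adapters are attached only to `q_proj` and `v_proj`, which we believe is what PEFT selects for Mistral-style models when `target_modules` is not set (treat this as an assumption). The layer shapes come from the model printout above. Each adapted linear layer gains two small matrices, of shapes `r x d_in` and `d_out x r`.

```python
# Rough count of LoRA parameters under the assumptions stated above.
r, alpha, num_layers = 64, 16, 32

# (in_features, out_features) taken from the printed model architecture.
# Assumption: only q_proj and v_proj receive adapters.
target_shapes = {"q_proj": (4096, 4096), "v_proj": (4096, 1024)}

# Each adapted layer adds r * d_in + d_out * r parameters.
per_layer = sum(r * (d_in + d_out) for d_in, d_out in target_shapes.values())
total = per_layer * num_layers

# The adapter update is scaled by lora_alpha / r.
scaling = alpha / r

print(total, scaling)
```

Under these assumptions the adapters add roughly 27 million parameters, a small fraction of the 7B base model.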

We'll use the standard hyper-parameters as usual.

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

args = TrainingArguments(
  output_dir = "sft_zephyr",
  num_train_epochs=5,
  save_strategy="epoch",
  learning_rate=2e-4,
  bf16=True,
  lr_scheduler_type='constant',
)

max_seq_length = 2048

trainer = SFTTrainer(
    sft_model,
    tokenizer=sft_model_tokenizer,
    max_seq_length=max_seq_length,
    train_dataset=sft_dataset,
    args=args,
    dataset_text_field="text",
)




And we can finally train!

In [None]:
trainer.train()







TrainOutput(global_step=5, training_loss=2.097603988647461, metrics={'train_runtime': 5.3656, 'train_samples_per_second': 4.659, 'train_steps_per_second': 0.932, 'total_flos': 220561822924800.0, 'train_loss': 2.097603988647461, 'epoch': 5.0})
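
The `global_step=5` in this output follows from the dataset size. With 5 examples and the `TrainingArguments` defaults (`per_device_train_batch_size=8`, `gradient_accumulation_steps=1`), every epoch fits in a single optimizer step. The check below assumes a single device.

```python
import math

num_examples = 5
per_device_batch_size = 8   # TrainingArguments default
grad_accum_steps = 1        # TrainingArguments default
num_epochs = 5

# One partial batch per epoch, and at least one optimizer step per epoch.
batches_per_epoch = math.ceil(num_examples / per_device_batch_size)
steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
total_steps = steps_per_epoch * num_epochs

print(total_steps)
```

Five optimizer steps on five examples is enough to exercise the pipeline end to end, but far too little to expect a meaningful change in model behaviour.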

## Generate Harmlessness Dataset

Now that we have a model that is better aligned to our interests - we can have it substitute in for human feedback when creating a feedback dataset that will be used to train a reward model!

The basic idea is this:

1. Generate two responses to the same prompt.
2. Have the feedback model select which response is better.
3. Compile a dataset from that feedback.

In the end, you will have a dataset in a similar format to the [`hh-rlhf`](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset with "chosen" and "rejected" columns.


### Feedback Model

In [None]:
sft_model = sft_model.merge_and_unload()



In [None]:
import torch
from transformers import pipeline

# Use the tokenizer that was loaded alongside the fine-tuned model.
sft_pipeline = pipeline("text-generation", model=sft_model, tokenizer=sft_model_tokenizer, temperature=0.9, do_sample=True, max_length=150)

In [None]:
compare_prompt = \
"""
Given two responses, please select which response you think better adheres to the provided constitution by selecting the corresponding response's number.

### Input:
{input}

### Constitution:
The model should not generate racist, sexist, hateful, or otherwise toxic outputs.
The model should move conversation in a positive direction.
The model should politely point out harmful assumptions from the human.

### Response 0:
{response_0}

### Response 1:
{response_1}

### Preferred Response:
"""

> NOTE: We could use the model's logits to determine preference, but here we simply parse the generated text.
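
A sketch of that logit-based alternative: rather than sampling text and searching it for a digit, compare the scores the model assigns to the tokens `0` and `1` at the first position after `### Preferred Response:`. The two logit values below are made up, and `preference_from_logits` is a hypothetical helper. Obtaining real logits (one forward pass, then reading the last position's scores for the two token ids) is left out.

```python
import math

def preference_from_logits(logit_0, logit_1):
    # Two-way softmax over the candidate labels; subtracting the max keeps exp() stable.
    m = max(logit_0, logit_1)
    e0, e1 = math.exp(logit_0 - m), math.exp(logit_1 - m)
    p1 = e1 / (e0 + e1)
    # Ties go to response 0 (an arbitrary choice).
    return (1 if p1 > 0.5 else 0), p1

# Made-up logits for the tokens "0" and "1".
choice, p1 = preference_from_logits(logit_0=1.2, logit_1=2.7)
print(choice, round(p1, 3))
```

Besides avoiding string parsing, this yields a probability that could be used as a soft label or to discard low-confidence comparisons.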

In [None]:
def map_dataset(row):
  # Sample two candidate continuations for the same prompt.
  response_0 = sft_pipeline(row["prompt"], return_full_text=False)
  response_1 = sft_pipeline(row["prompt"], return_full_text=False)

  comparison_prompt = compare_prompt.format(input=row["prompt"], response_0=response_0[0]["generated_text"], response_1=response_1[0]["generated_text"])

  decision = sft_pipeline(comparison_prompt, return_full_text=False, max_new_tokens=50)
  decision_text = decision[0]["generated_text"]

  # Position of the first "0" and the first "1" in the decision (-1 if absent).
  idx_0 = decision_text.find("0")
  idx_1 = decision_text.find("1")

  # Prefer response 1 only if "1" appears and comes before any "0".
  # If neither digit appears we fall back to response 0; this default is
  # arbitrary, and dropping such rows would be a reasonable alternative.
  if idx_1 != -1 and (idx_0 == -1 or idx_1 < idx_0):
    preferred_response = response_1
    rejected_response = response_0
  else:
    preferred_response = response_0
    rejected_response = response_1

  return {
      "input" : row["prompt"],
      "preferred_response" : preferred_response,
      "rejected_response" : rejected_response}

In [None]:
constitutional_dataset = sft_dataset.map(map_dataset)


In [None]:
constitutional_dataset

Dataset({
    features: ['prompt', 'response', 'text', 'input', 'preferred_response', 'rejected_response'],
    num_rows: 5
})

In [None]:
constitutional_dataset.to_pandas()[["input", "preferred_response", "rejected_response"]]

| | input | preferred_response | rejected_response |
|---|---|---|---|
| 0 | Corruption involving the contractors is the ch... | [{'generated_text': ' investigation by The Bos... | [{'generated_text': ' investigative report by ... |
| 1 | 5][6][7] The term was coined by the author Maki | [{'generated_text': ' Gurhan for her novel Sal... | [{'generated_text': ' Rollinart in his short s... |
| 2 | As for the internals, the people with the most... | [{'generated_text': ' usually the executives. ... | [{'generated_text': ' usually those who are ap... |
| 3 | On March 2, 2017, roughly 100 of our | [{'generated_text': ' students, led by Student... | [{'generated_text': ' friends and neighbors ga... |
| 4 | Syrian Democratic Forces’ spokesman, Talal Sil... | [{'generated_text': ' and fled to another regi... | [{'generated_text': ' and has been charged wit... |
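
Because `map_dataset` stores the raw pipeline output, each preference cell is a list holding one `{'generated_text': ...}` dict. To get closer to a "chosen"/"rejected" layout, the text could be pulled out as sketched below. The sample row is shortened and its continuations are made up, and `to_preference_record` is a hypothetical helper rather than part of the notebook.

```python
# Hypothetical helper: flatten one row into prompt / chosen / rejected strings.
def to_preference_record(row):
    return {
        "prompt": row["input"],
        "chosen": row["preferred_response"][0]["generated_text"],
        "rejected": row["rejected_response"][0]["generated_text"],
    }

# Made-up continuations in the same shape as the pipeline output above.
row = {
    "input": "On March 2, 2017, roughly 100 of our",
    "preferred_response": [{"generated_text": " students gathered (made-up continuation)"}],
    "rejected_response": [{"generated_text": " friends gathered (made-up continuation)"}],
}

record = to_preference_record(row)
print(record["chosen"])
```

A function like this could be applied with `constitutional_dataset.map(...)` before handing the data to a reward-model trainer.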
