
# RLAIF: Reinforcement Learning with AI Feedback

We've talked about RLHF - now we can talk about replacing that "H" with "AI"!

In the following notebook, we'll walk through an example of creating a dataset curated by our AI constitution.

## Create AI Constitution

The first, and most important task, is to create our AI constitution.

This is a set of rules that keeps the dataset we're creating in line with our goals and expectations. Using such a set of written principles to steer a model's own feedback is the approach known as "Constitutional AI".

The main advantages of this over RLHF are scalability (the machine is cheaper than the human, and so can cover much more ground) and strong performance on self-refinement tasks.

You can read more about both concepts here:

- [Constitutional AI](https://arxiv.org/pdf/2212.08073.pdf)
- [Self-Refinement](https://arxiv.org/pdf/2303.17651.pdf)

Let's start by writing a simple constitution.

```python
ai_constitution = {
    0: "The model should not generate racist, sexist, hateful, or otherwise toxic outputs.",
    1: "The model should move conversation in a positive direction.",
    2: "The model should politely point out harmful assumptions from the human."
}
```

#### 🏗️ Activity:

Please write your own AI constitution.

Your AI constitution should have 3 "rules"; you can use the above as a guide.

```python
### YOUR CODE HERE
ai_constitution = {
    0: "The model should actively promote ethical and constructive discourse.",
    1: "The model should not create or spread misinformation.",
    2: "The model should support user autonomy and decision-making."
}
```



---



## Create SFT Dataset - Final Revision Dataset

Now that we have a constitution we can use it along with the self-refinement process to create a supervised fine-tuning dataset.

### Load Base Model

Let's grab our base model - in this case we'll use the familiar "Zephyr-7B" model!

```python
!pip install -qU transformers accelerate bitsandbytes peft trl datasets tqdm
```

```python
import torch
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceH4/zephyr-7b-alpha"

# 4-bit NF4 quantization with double quantization; computation runs in bfloat16
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True
)

# The model object is named base_model; the tokenizer is loaded separately below
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config
)
```

As always, we do some post-processing to make sure our tokenizer is set up properly: if it has no padding token, we reuse the end-of-sequence token.

```python
base_model_tokenizer = AutoTokenizer.from_pretrained(model_id)

if getattr(base_model_tokenizer, "pad_token", None) is None:
    base_model_tokenizer.pad_token = base_model_tokenizer.eos_token
```

We'll create a `text-generation` pipeline to leverage our model!

```python
import torch
from transformers import pipeline

base_pipeline = pipeline("text-generation", model=base_model, tokenizer=base_model_tokenizer)
```

### Build Critique Loop

The basic idea of the critique loop is simple:

1. Start with some prompt and obtain the model's generation.
2. Ask the model if that generation adheres to specific elements of the AI Constitution - rewriting the generation if it doesn't.
3. Repeat for each "rule" in the AI Constitution.
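The three steps above can be sketched independently of any model. The version below assumes a hypothetical `generate(user_text, system_instruction)` callable (in the notebook, `generate_response` plays this role) and uses a toy stand-in so the control flow runs without a GPU. It records every intermediate response, which is handy for inspecting how each rule changed the text.

```python
from typing import Callable, Dict, List

# Hypothetical signature: generate(user_text, system_instruction) -> model reply
Generate = Callable[[str, str], str]

def critique_and_revise(prompt: str, rules: Dict[int, str], generate: Generate) -> List[str]:
    """Return the initial response followed by one revision per constitution rule."""
    response = generate(prompt, "")  # step 1: initial generation, no system instruction
    history = [response]
    for rule in rules.values():  # step 3: one pass per rule
        # step 2a: critique the current response against this single rule
        critique = generate(
            response,
            f"Does the following response follow this guideline? If not, explain why.\n\nGuideline:\n{rule}",
        )
        # step 2b: rewrite the response so it addresses the critique
        response = generate(
            response,
            f"Rewrite the response so it addresses this critique, without mentioning the critique.\n\nCritique:\n{critique}",
        )
        history.append(response)
    return history

# Toy stand-in for a model so the loop can be run without a GPU
def fake_generate(user_text: str, instruction: str) -> str:
    if instruction.startswith("Does"):
        return "critique"
    if instruction.startswith("Rewrite"):
        return user_text + " (revised)"
    return "draft"

rules = {0: "Be kind.", 1: "Be accurate."}
print(critique_and_revise("Hello", rules, fake_generate))
```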

Let's see how we could implement it in code!

But first, we'll need a dataset of prompts that will lead to generations we might wish to critique.

For that we will again use the `allenai/real-toxicity-prompts` dataset.

#### ❓ Question:

Why would we want to separate each "rule" in our constitution during this process?

ANSWER:
*   **Focused Assessment**: By isolating each rule, the critique process can more effectively and accurately assess whether the AI's response adheres to that specific rule. This focused assessment allows for a more detailed and nuanced understanding of how well the AI is aligning with each aspect of the constitution.
*   **Simplification of Debugging and Improvement**: When a generation fails to meet the standards of the constitution, separating the rules helps in pinpointing exactly which aspect is lacking. This simplification makes it easier to debug and improve the model's performance in specific areas, rather than attempting to address multiple issues at once.
*   **Simpler, More Reliable Critiques**: Judging a response against one rule at a time is a narrower task than judging it against the whole constitution at once, so each critique is more likely to be specific and each revision more likely to actually fix the problem the critique names.
*   **Clearer Metrics and Evaluation**: Evaluating the AI's performance becomes more manageable when each rule is considered separately. This allows for the creation of specific metrics for each rule, providing a clearer picture of the AI's strengths and weaknesses in various aspects of the constitution.
*   **Customized Feedback and Learning**: Different rules may require different types of feedback and learning approaches. Separating the rules allows for customized feedback loops for each rule, which can be tailored to the specific requirements of that rule.







```python
from datasets import load_dataset

dataset_name = "allenai/real-toxicity-prompts"

# Keep only the first 5 prompts so the demo runs quickly
toxic_dataset = load_dataset(dataset_name, split="train")
toxic_dataset = toxic_dataset.select(range(5))
```

```python
toxic_dataset
```

```
Dataset({
    features: ['filename', 'begin', 'end', 'challenging', 'prompt', 'continuation'],
    num_rows: 5
})
```

```python
toxic_dataset[0]["prompt"]
```

```
{'text': 'Corruption involving the contractors is the chief culprit for the prison’s problems, according to a recent',
 'profanity': 0.171412,
 'sexually_explicit': 0.14422302,
 'identity_attack': 0.32305634,
 'flirtation': 0.4664734,
 'threat': 0.40394926,
 'insult': 0.26487392,
 'severe_toxicity': 0.20936702,
 'toxicity': 0.295593}
```
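The notebook simply takes the first five rows. If you would rather critique prompts that are more likely to produce problematic generations, one option (an assumption, not part of the original notebook) is to filter on the `toxicity` score shown above. `select_toxic_prompts` is a hypothetical, dependency-free helper that works on rows shaped like the dictionary above and guards against a missing score.

```python
from typing import List, Optional

def select_toxic_prompts(rows: List[dict], threshold: float = 0.5, limit: Optional[int] = None) -> List[str]:
    """Return prompt texts whose 'toxicity' score is above `threshold` (hypothetical helper)."""
    selected = []
    for row in rows:
        score = row["prompt"].get("toxicity")
        # Skip rows with a missing score instead of failing on the comparison
        if score is not None and score > threshold:
            selected.append(row["prompt"]["text"].strip())
    return selected if limit is None else selected[:limit]

example_rows = [
    {"prompt": {"text": "A mild prompt ", "toxicity": 0.1}},
    {"prompt": {"text": "A spicier prompt", "toxicity": 0.8}},
    {"prompt": {"text": "An unscored prompt", "toxicity": None}},
]
print(select_toxic_prompts(example_rows))  # ['A spicier prompt']
```

On the full Hugging Face dataset, the same condition could be passed to `Dataset.filter` before the `.select(range(5))` step.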

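Next we set up a helper function to collect generations.

```python
def generate_response(sample, hf_pipeline, instruction=None):
    """Generate a reply to `sample`, optionally steered by a system `instruction`."""
    messages = []

    # Only add a system turn when an instruction is supplied
    if instruction:
        messages.append({"role": "system", "content": instruction})

    messages.append({"role": "user", "content": sample})

    # Render the conversation with the model's chat template; add_generation_prompt=True
    # appends the assistant header so the model answers instead of continuing the user turn
    prompt = hf_pipeline.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    outputs = hf_pipeline(
        prompt,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_k=50,
        top_p=0.95,
        return_full_text=False,  # return only the newly generated text
    )

    return outputs[0]["generated_text"]
```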
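`generate_response` builds a list of role/content dictionaries and hands it to `tokenizer.apply_chat_template(..., tokenize=False)`, which renders it into the model's own prompt format. The message-building half can be isolated and checked without loading a model; `build_messages` is a hypothetical helper that mirrors that logic.

```python
from typing import Dict, List, Optional

def build_messages(sample: str, instruction: Optional[str] = None) -> List[Dict[str, str]]:
    """Mirror of how generate_response assembles chat messages before templating."""
    messages = []
    if instruction:  # a system turn is added only when an instruction is given
        messages.append({"role": "system", "content": instruction})
    messages.append({"role": "user", "content": sample})
    return messages

print(build_messages("Hi there"))
print(build_messages("Hi there", "Be concise."))
```

Keeping the instruction in the system turn, rather than pasting it into the user text, lets the same helper serve both the critique and the rewrite steps.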

Now we can loop through our dataset and create revisions for each element of our AI Constitution!

```python
final_revision = []

for row in toxic_dataset:
    # Initial (unrevised) response to the raw prompt
    base_response = generate_response(row["prompt"]["text"].strip(), base_pipeline)

    # One critique-and-revise pass per constitution rule
    for rule in ai_constitution.values():
        critique_prompt = f"Does the following response follow these guidelines, if not, please explain why:\n\nGuidelines:\n{rule}"
        critique = generate_response(base_response, base_pipeline, critique_prompt)

        critique_instruction = f"Respond with the original response, but reworded to conform to the following critique making no mention of the reasons for rewording.\n\nCritique:\n{critique}"
        # Pass the rewrite instruction (not the bare critique) as the system prompt,
        # then keep only the last non-empty line as the revised response
        base_response = generate_response(base_response, base_pipeline, critique_instruction).strip().split("\n")[-1]

    print(base_response)
    final_revision.append(base_response)
```



However, it's essential to ensure that the system for tracking and reporting contractor performance is fair, objective, and transparent. This can be achieved by involving stakeholders, including contractors, in the development and implementation of the system. By working collaboratively with contractors, we can ensure that the system is designed to be effective and efficient, and that it provides accurate and reliable information about contractor performance. This approach can help to build trust and confidence in the system, as contractors will feel that they are being treated fairly and objectively, and that their performance is being evaluated based on clear and objective criteria. By involving stakeholders in the development and implementation of the system, we can also ensure that any concerns or issues are addressed in a timely and effective manner, which can help to prevent the spread of misinformation and promote transparency and accountability in government operations.




I completely agree with your response. By providing individuals with the resources and tools they need to explore their own ikigai, we can help them gain a deeper understanding of the concept and how it can be applied to their unique circumstances. This can involve sharing resources such as books, articles, and online tools that can help individuals understand the nuances and complexities of ikigai. We can also facilitate discussions and debates that can help individuals gain a deeper understanding of the concept and how it can be applied to their lives. By promoting constructive and ethical discourse, we can help individuals explore ikigai in a way that is responsible, thoughtful, and meaningful. Ultimately, the goal is to empower individuals to make their own decisions about their ikigai, rather than imposing our own beliefs or values on them. By supporting user autonomy and decision-making, we can help individuals find their own path to a fulfilling and meaningful life.




Exactly! By promoting constructive discourse, the model encourages all members of the organization to share their ideas, perspectives, and insights, leading to a more diverse range of perspectives and ideas. This approach not only results in better decision-making but also fosters a more collaborative and innovative culture. By involving all stakeholders in decision-making, the model empowers them to make their own choices and decisions, rather than relying solely on the decision-makers. This approach not only leads to better decision-making but also fosters a more collaborative and innovative culture, ultimately resulting in greater long-term sustainability.




Certainly! In this specific context, the guidelines are referring to a decision-making model for a healthcare setting, specifically for patients who are making important decisions about their own care. The model should support the patient's autonomy and decision-making by providing them with clear and accurate information about their condition, treatment options, and potential outcomes. The model should also involve the patient in the decision-making process, allowing them to ask questions, express concerns, and make informed choices based on their own values and preferences. This approach empowers the patient to take ownership of their care and feel more invested in the outcome, which can lead to better health outcomes and a more positive overall experience.




Yes, that's a good point. The statement is not explicit about the concerns that led to Silo's departure, but it does suggest that the SDF is aware of the situation and may need to take action in response. The statement could imply that the SDF may need to address concerns related to Silo's departure, such as the impact on the group's operations, the potential loss of resources or expertise, or any other issues that may arise as a result of Silo's departure. However, without more information, it's impossible to say for certain what these concerns might be.


### Collect Final Revisions

Let's look at our final revisions. You'll notice they're a bit odd, but at this stage we mainly care that they adhere to our constitution and are coherent; the subject matter is largely irrelevant.

```python
for revision in final_revision:
    print(revision)
```


#### ❓ Question:

Why does the content or domain of our dataset not matter at this stage?

ANSWER:
*   **Focus on Adherence to AI Constitution**: At this stage, the primary objective is to assess and ensure the AI's compliance with the AI constitution, regardless of the content domain. The critique loop and revision process are designed to evaluate the AI's ability to adhere to ethical guidelines and constitutionally outlined behaviors, which are largely independent of the specific subject matter of the dataset.
*   **Generalizability of Ethical Guidelines**: The AI constitution's guidelines are intended to be broadly applicable across various contexts and content types. This universality means that the specific content of the dataset is less important than the model's ability to consistently apply these ethical guidelines across different scenarios.
*   **Testing Model Robustness and Flexibility**: Using a dataset with diverse content allows for testing the robustness and flexibility of the AI model. It helps in understanding how well the model can adapt its responses to align with the constitution across a wide range of topics and situations, thus ensuring its reliability and effectiveness in real-world applications.
*   **Content Neutrality**: At this stage, the focus is more on the process -- i.e., how the model generates responses and adheres to the constitution -- rather than the actual content. This approach aligns with the principle of content neutrality, where the emphasis is on the ethical and procedural aspects rather than the specific subject matter.
*   **Preparation for Diverse Applications**: By not restricting the content domain, the AI model is prepared for a wide range of applications. It becomes capable of handling various types of inputs while maintaining adherence to its ethical guidelines, making it more versatile and applicable in diverse real-world scenarios.



---



## Fine-tune Model with SFT on Created Dataset (SL-CAI)

Now that we have a dataset of prompt/response pairs whose responses have been revised to follow our constitution, we can fine-tune our base model on it. The resulting model will later judge which of two candidate responses is better: this is our "feedback model", which sits in the place of human feedback!

Let's start by selecting prompts from our dataset.

```python
prompts = [sample["prompt"]["text"] for sample in toxic_dataset]
```

```python
from datasets import Dataset
import pandas as pd

# Pair each original prompt with its final constitution-revised response
sft_dataset = Dataset.from_pandas(
    pd.DataFrame(
        [{"prompt": prompt, "response": response} for prompt, response in zip(prompts, final_revision)]
    )
)
```

```python
def map_dataset(row):
    # Format each pair as a single training string for SFT
    return {"text": f"### Input:\n{row['prompt']}\n\n### Response:\n{row['response']}"}
```

```python
sft_dataset = sft_dataset.map(map_dataset)
```
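`zip` silently stops at the shorter input, so if a prompt somehow lost its revision, trailing rows would be dropped without any error. A small, dependency-free sketch of the same row construction with an explicit length check (`build_sft_rows` is a hypothetical helper, not part of the notebook):

```python
from typing import Dict, List

def build_sft_rows(prompts: List[str], responses: List[str]) -> List[Dict[str, str]]:
    """Build prompt/response/text rows in the same "### Input / ### Response" layout as map_dataset."""
    # zip() would silently truncate mismatched lists, so check lengths first
    if len(prompts) != len(responses):
        raise ValueError(f"{len(prompts)} prompts but {len(responses)} responses")
    return [
        {"prompt": p, "response": r, "text": f"### Input:\n{p}\n\n### Response:\n{r}"}
        for p, r in zip(prompts, responses)
    ]

rows = build_sft_rows(["Why is the sky blue?"], ["Rayleigh scattering."])
print(rows[0]["text"])
```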


We'll push our newly created dataset to the hub to save for later!

```python
from huggingface_hub import notebook_login

notebook_login()
```


```python
hf_username = "rheubanks"

sft_dataset.push_to_hub(f"{hf_username}/llme2_sft_dataset_rlaif")
```


CommitInfo(commit_url='https://huggingface.co/datasets/rheubanks/llme2_sft_dataset_rlaif/commit/4e65881af1da52244c9abad6dd01b1a60a09770d', commit_message='Upload dataset', commit_description='', oid='4e65881af1da52244c9abad6dd01b1a60a09770d', pr_url=None, pr_revision=None, pr_num=None)

Let's pull it back to verify it worked.

```python
from datasets import load_dataset

sft_dataset = load_dataset(f"{hf_username}/llme2_sft_dataset_rlaif")
```


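Now we can free GPU memory by deleting the base model and its pipeline, and then begin the SFT step!

```python
import gc

# Drop the references, run the garbage collector, then release cached GPU memory
del base_pipeline
del base_model
gc.collect()
torch.cuda.empty_cache()
```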

We'll load the model as usual and prepare it for training!

```python
from peft import LoraConfig
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "HuggingFaceH4/zephyr-7b-alpha"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True
)

sft_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config
)
```

```python
sft_model_tokenizer = AutoTokenizer.from_pretrained(model_id)

if getattr(sft_model_tokenizer, "pad_token", None) is None:
    sft_model_tokenizer.pad_token = sft_model_tokenizer.eos_token
```

```python
sft_model
```

```
MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=2)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
```

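We'll be leveraging LoRA to fine-tune our model. First we prepare the 4-bit model for k-bit training, then we attach the LoRA adapters. The order matters: `prepare_model_for_kbit_training` freezes every existing parameter, so calling it after `get_peft_model` would freeze the freshly added adapters as well.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Disable the KV cache during training and prepare the quantized model for training
sft_model.config.use_cache = False
sft_model = prepare_model_for_kbit_training(sft_model)

lora_alpha = 16
lora_dropout = 0.1
lora_r = 64

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM"
)

# Wrap the prepared model with trainable LoRA adapters and report how many weights will train
sft_model = get_peft_model(sft_model, peft_config)
sft_model.print_trainable_parameters()
```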
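As a sanity check on adapter size, the number of trainable LoRA weights can be worked out by hand. Each adapted `Linear(d_in, d_out)` gains two matrices, A (r × d_in) and B (d_out × r), i.e. r·(d_in + d_out) weights. The sketch below reads the projection shapes from the printed `MistralForCausalLM` and assumes that, since no `target_modules` is set in the config, PEFT falls back to its default targets for Mistral, `q_proj` and `v_proj`. It also shows the `lora_alpha / r` scale applied to the low-rank update.

```python
# Projection shapes (in_features, out_features) from the printed MistralForCausalLM
PROJ_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}
NUM_LAYERS = 32  # "(0-31): 32 x MistralDecoderLayer"
LORA_R = 64
LORA_ALPHA = 16

def lora_param_count(target_modules, r=LORA_R, num_layers=NUM_LAYERS):
    """Each adapted Linear(d_in, d_out) gets A (r x d_in) and B (d_out x r): r * (d_in + d_out) weights."""
    per_layer = sum(r * (PROJ_SHAPES[name][0] + PROJ_SHAPES[name][1]) for name in target_modules)
    return per_layer * num_layers

# Assumption: PEFT's default LoRA targets for Mistral are q_proj and v_proj
n_params = lora_param_count(["q_proj", "v_proj"])
print(n_params)             # 27262976
print(n_params * 4 / 1e6)   # approximate size in MB if the adapter is stored as float32
print(LORA_ALPHA / LORA_R)  # 0.25: scale applied to the low-rank update B @ A
```

Under these assumptions the adapters hold about 27.3M weights, roughly 109 MB in float32, a small fraction of the 7B base model.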

We'll use the standard hyper-parameters as usual.

```python
from trl import SFTTrainer
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="sft_zephyr",
    num_train_epochs=5,
    save_strategy="epoch",
    learning_rate=2e-4,
    bf16=True,
    lr_scheduler_type="constant",
)

max_seq_length = 2048

# In the TRL version used here, tokenizer, max_seq_length and dataset_text_field
# are passed directly to SFTTrainer
trainer = SFTTrainer(
    sft_model,
    tokenizer=sft_model_tokenizer,
    max_seq_length=max_seq_length,
    train_dataset=sft_dataset["train"],
    args=args,
    dataset_text_field="text",
)
```



And we can finally train!

```python
trainer.train()
```







TrainOutput(global_step=5, training_loss=1.6658935546875, metrics={'train_runtime': 5.8932, 'train_samples_per_second': 4.242, 'train_steps_per_second': 0.848, 'total_flos': 236622149836800.0, 'train_loss': 1.6658935546875, 'epoch': 5.0})
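The `global_step=5` above follows from simple arithmetic: with 5 training examples and the `TrainingArguments` default `per_device_train_batch_size` of 8 (not overridden above), each epoch is a single batch, so 5 epochs give 5 optimizer steps. That also matches `train_samples_per_second`: 25 samples in about 5.89 s is about 4.24. A back-of-the-envelope helper, assuming a single GPU and no `max_steps`:

```python
import math

def total_optimizer_steps(num_examples, epochs, per_device_batch_size=8, num_devices=1, grad_accum=1):
    """Rough optimizer-step count for a Trainer run driven by num_train_epochs."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # At least one update per epoch, even when batches < gradient accumulation steps
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return steps_per_epoch * epochs

print(total_optimizer_steps(num_examples=5, epochs=5))  # 5
```

With only five examples this run is a demonstration of the mechanics rather than a meaningful fine-tune.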

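Let's push our newly created LoRA adapters to the hub! Note that the first argument to `Trainer.push_to_hub` is the commit message; the repository itself is named after `output_dir` (here `sft_zephyr`), as the commit URL below shows.

```python
trainer.push_to_hub(commit_message="llme2_sft_model_rlaif")
```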


CommitInfo(commit_url='https://huggingface.co/rheubanks/sft_zephyr/commit/e09d76f7b92dbd65ccea01ca07a6ef2922b259fd', commit_message='llme2_sft_model_rlaif', commit_description='', oid='e09d76f7b92dbd65ccea01ca07a6ef2922b259fd', pr_url=None, pr_revision=None, pr_num=None)

Now we merge the LoRA adapters into the model's weights so it can be used in the Hugging Face pipeline we'll set up to generate our feedback!

```python
sft_model = sft_model.merge_and_unload()
```
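`merge_and_unload()` folds each LoRA update into its base weight, W' = W + (lora_alpha / r) · B A, and returns a plain `transformers` model, which is why it can go straight into a `pipeline`. A toy, dependency-free check that merging leaves a layer's output unchanged, using the `lora_alpha=16`, `r=64` scale of 0.25 from our config (the matrices are made up for illustration; on a 4-bit base the real merge involves re-quantization and may introduce small rounding differences):

```python
def matmul(X, Y):
    """Plain nested-list matrix product, to keep the example dependency-free."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y, scale=1.0):
    """Element-wise X + scale * Y."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (d_out x d_in)
B = [[2.0], [0.0]]            # LoRA B (d_out x r), toy rank r = 1
A = [[0.5, 0.5]]              # LoRA A (r x d_in)
scale = 16 / 64               # lora_alpha / r from the config above
x = [[4.0], [8.0]]            # an input column vector

# Unmerged: base path plus scaled low-rank path
unmerged = add(matmul(W, x), matmul(B, matmul(A, x)), scale)
# Merged: fold the update into a single weight matrix
W_merged = add(W, matmul(B, A), scale)
merged = matmul(W_merged, x)
print(unmerged, merged)  # [[7.0], [8.0]] [[7.0], [8.0]]
```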



#### ❓ Question:

What purpose does this SFT step serve?

ANSWER:
*   **Model Specialization**: The SFT step fine-tunes the base language model to specialize in tasks that are aligned with the specific requirements and nuances of the dataset being used. This can lead to improvements in performance on tasks that are relevant to the model’s intended application.
*   **Adherence to AI Constitution**: By fine-tuning the model on a dataset curated according to the AI constitution, the SFT step pushes the model's generations toward the rules defined in that constitution. This gives the later feedback stage a model that is already closer to the desired behavior.
*   **Improved Performance on Target Domain**: SFT can improve the model’s performance on the target domain by adjusting the model's parameters specifically for the type of data it will be dealing with, thus making the model more accurate and effective for that particular context.
*   **Incorporation of Recent Data**: SFT allows the inclusion of the most recent and relevant data, which may not have been present in the original training set of the model. This helps the model stay updated with the latest language usage and information.
*   **Efficient Training**: By using techniques like LoRA (Low-Rank Adaptation), the SFT step allows for more efficient and targeted adjustments to the model's weights, without the need to retrain the entire model. This leads to faster adaptation and less computational resource usage.
*   **Customized Model Outputs**: The SFT process allows for customization of the model outputs to better serve specific use cases, such as generating more informative, user-friendly, or contextually appropriate responses.
*   **Reduction of Biases**: If the AI constitution is focused on reducing biases, the SFT step can help mitigate unwanted biases present in the base model by training on a curated dataset that is designed to counteract these biases.



---




## Generate Harmlessness Dataset

Now that we have a model that is better aligned with our constitution, we can have it stand in for human feedback when creating a feedback dataset that will be used to train a reward model!

The basic idea is this:

1. Generate two responses to the same prompt.
2. Have the feedback model select which response is better.
3. Compile a dataset from that feedback.

In the end, you will have a dataset in a similar format to the [`hh-rlhf`](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset with "chosen" and "rejected" columns.
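The three steps above can be sketched as a small loop. `build_preference_rows`, `toy_sample`, and `toy_judge` are hypothetical: in practice `sample` would call a generation pipeline with sampling enabled so the two responses differ, and `judge` would prompt the SFT feedback model and parse its A/B answer. An extra `prompt` column is kept for convenience; `hh-rlhf` itself stores the full conversation in each of its `chosen` and `rejected` fields.

```python
import itertools
from typing import Callable, Dict, List

def build_preference_rows(
    prompts: List[str],
    sample: Callable[[str], str],
    judge: Callable[[str, str, str], str],
) -> List[Dict[str, str]]:
    """Turn AI preferences into chosen/rejected pairs (hypothetical helper)."""
    rows = []
    for prompt in prompts:
        # 1. Generate two responses to the same prompt
        response_a, response_b = sample(prompt), sample(prompt)
        # 2. Have the feedback model pick one ("A" or "B")
        verdict = judge(prompt, response_a, response_b)
        if verdict == "A":
            chosen, rejected = response_a, response_b
        elif verdict == "B":
            chosen, rejected = response_b, response_a
        else:
            continue  # skip pairs without a clear verdict
        # 3. Compile the row
        rows.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return rows

# Toy stand-ins for the sampling model and the feedback model
_counter = itertools.count()

def toy_sample(prompt: str) -> str:
    return f"{prompt}-{next(_counter)}"

def toy_judge(prompt: str, response_a: str, response_b: str) -> str:
    return {"p1": "A", "p2": "B"}.get(prompt, "unsure")

demo_rows = build_preference_rows(["p1", "p2", "p3"], toy_sample, toy_judge)
print(demo_rows)
```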



### Feedback Model

```python
import torch
from transformers import pipeline

# The SFT model keeps Zephyr's tokenizer, so we pair it with sft_model_tokenizer
sft_pipeline = pipeline("text-generation", model=sft_model, tokenizer=sft_model_tokenizer)
```

#### 🏗️ Activity

Please create a prompt template that will allow the model to select between two different responses.

You'll need to make sure you provide the following in your template:

- AI Constitution
- Response A
- Response B

Keep in mind that the model needs to be able to express *which* response it prefers; it doesn't need to explain why.

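```python
### YOUR CODE HERE
# Render the constitution dict from earlier as a numbered list. A new name is used
# so the `ai_constitution` dict needed by the critique loop is not overwritten.
constitution_text = "\n".join(
    f"{i + 1}. {rule}" for i, rule in enumerate(ai_constitution.values())
)

# A reusable template (a plain string, not an f-string) filled in per pair with .format()
preference_template = """Consider the following principles:
{constitution}

Which of the two responses below better follows these principles?

Response A:
{response_a}

Response B:
{response_b}

Answer with a single letter: A or B."""

def ask_preference(response_a, response_b, hf_pipeline):
    """Ask the feedback model which response better follows the constitution."""
    question = preference_template.format(
        constitution=constitution_text, response_a=response_a, response_b=response_b
    )
    # Wrap the question in the same "### Input / ### Response" layout used for SFT
    prompt = f"### Input:\n{question}\n\n### Response:\n"
    # max_new_tokens bounds only the answer (max_length would also count the prompt)
    output = hf_pipeline(prompt, max_new_tokens=5, do_sample=False, return_full_text=False)
    return output[0]["generated_text"].strip()

# Two hypothetical responses to compare
response_a = "The AI-generated text that aligns with the AI Constitution."
response_b = "Another AI-generated text that aligns with the AI Constitution."

# Print the model's decision
print(ask_preference(response_a, response_b, sft_pipeline))
```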
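Because the feedback model answers in free text, the letter it picks still has to be extracted before it can become a `chosen`/`rejected` label. `parse_preference` is a hypothetical helper that accepts either an explicit "Response A"/"Response B" mention or a bare leading letter, and returns `None` when the answer is missing or ambiguous so that pair can be skipped.

```python
import re
from typing import Optional

def parse_preference(generated: str) -> Optional[str]:
    """Map a free-text verdict to "A", "B", or None when it is missing or ambiguous."""
    text = generated.strip()
    # Prefer explicit mentions such as "Response B is better."
    named = set(re.findall(r"\bResponse\s+([AB])\b", text))
    if len(named) == 1:
        return named.pop()
    if len(named) > 1:
        return None  # both responses named: ambiguous
    # Otherwise accept a bare leading letter, e.g. "A", "B.", "A)"
    match = re.match(r"([AB])\b", text)
    return match.group(1) if match else None

for answer in ["A", " B.", "Response B is better.", "As an AI, I cannot choose."]:
    print(repr(answer), "->", parse_preference(answer))
```

The word-boundary check keeps words such as "As" or "Both" from being read as a vote.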


---

## Conclusion

At this point, you could follow the RLHF notebook and replace the `hh-rlhf` dataset with the one created by you above to complete the alignment of your model!