<a href="https://colab.research.google.com/github/peremartra/Large-Language-Model-Notebooks-Course/blob/main/3-LangChain/HF_LLAMA2_LangChain_Moderation_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


<div align="center">
<h1><a href="https://github.com/peremartra/Large-Language-Model-Notebooks-Course">LLM Hands On Course</a></h1>
    <h3>Understand And Apply Large Language Models</h3>
    <h2>Create a Moderation system with LLAMA-2 LangChain and HF.</h2>
    <p>by <b>Pere Martra</b></p>
</div>

<br>

<div align="center">
    &nbsp;
    <a target="_blank" href="https://www.linkedin.com/in/pere-martra/"><img src="https://img.shields.io/badge/style--5eba00.svg?label=LinkedIn&logo=linkedin&style=social"></a>
    
</div>

<br>
<hr>

# How To Create a Moderation System Using LangChain & Hugging Face.

We are going to create a moderation system based on two models. The first model reads the user's comments and answers them.

The second model receives the first model's answer, looks for any kind of negativity, and rewrites the comment if necessary.

The goal is to prevent a user's text from provoking a negative or out-of-tone response from the comment system.

In [None]:
#Install LangChain, transformers, accelerate and xformers.
%pip install langchain
%pip install transformers
%pip install accelerate
%pip install xformers

Collecting langchain
  Downloading langchain-0.0.276-py3-none-any.whl (1.6 MB)
Collecting dataclasses-json<0.6.0,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.5.14-py3-none-any.whl (26 kB)
Collecting langsmith<0.1.0,>=0.0.21 (from langchain)
  Downloading langsmith-0.0.27-py3-none-any.whl (34 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from data

In [None]:
%pip install huggingface_hub



In [None]:
hf_key = "YOUR-HF-KEY-HERE"

In [None]:
!huggingface-cli login --token $hf_key

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
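
Hard-coding the token in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the token is stored in an environment variable; the variable name `HF_TOKEN` and the helper `load_hf_key` are illustrative choices, not part of the original notebook:

```python
import os

def load_hf_key(env_var="HF_TOKEN", placeholder="YOUR-HF-KEY-HERE"):
    """Return the token stored in env_var, or the placeholder if it is unset."""
    # os.environ.get returns the fallback value when the variable is missing.
    return os.environ.get(env_var, placeholder)

# Assumed usage: replaces the hard-coded assignment of hf_key.
hf_key = load_hf_key()
```

With this in place the rest of the notebook can keep using `hf_key` unchanged.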


## Importing LangChain Libraries.
* PromptTemplate: provides functionality to create prompts with parameters.
* HuggingFacePipeline: wraps a Hugging Face `transformers` pipeline so that LangChain can use it as a model.
* LLMChain: creates chains, where the prompts or the results can pass from one step to another inside the chain.

In [None]:
from langchain import PromptTemplate
from langchain.chains import LLMChain

from langchain.llms import HuggingFacePipeline
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM

In [None]:
import torch
import os
import numpy as np
from torch import cuda, bfloat16

In [None]:
#On a Mac with Apple Silicon the device must be 'mps'.
# device = torch.device('mps') #to use with Apple Silicon
device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

In [None]:
device

'cuda:0'

## Load the Model.

In [None]:
#You can try with any llama model, but you will need more GPU and memory as you increase the size of the model.
model_id = "meta-llama/Llama-2-7b-chat-hf"
#model_id = "meta-llama/Llama-2-7b-hf"
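
To get a feeling for how memory grows with model size, the space taken by the weights alone can be estimated as the number of parameters times the bytes per parameter. This is a rough sketch: the parameter counts are the nominal sizes in the model names, 16-bit weights are assumed, and the function name `weight_memory_gb` is hypothetical. Activations and the generation cache need additional memory on top of this.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed just to hold the weights, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Nominal parameter counts taken from the model names (approximate values).
for name, params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    # 2 bytes per parameter corresponds to 16-bit weights.
    print(name, weight_memory_gb(params, 2), "GB of weights in 16-bit")
```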


In [None]:
# begin initializing HF items, need auth token for these
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_key
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    device_map='auto',
    use_auth_token=hf_key
)
model.eval()
print(f"Model loaded on {device}")




Downloading (…)lve/main/config.json:   0%|          | 0.00/614 [00:00<?, ?B/s]



Downloading (…)fetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

Downloading (…)of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



Downloading (…)neration_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

Model loaded on cuda:0


In [None]:
# The keyword is use_auth_token, the same one used when loading the model.
tokenizer = AutoTokenizer.from_pretrained(model_id,
                                          use_auth_token=hf_key)


Downloading (…)okenizer_config.json:   0%|          | 0.00/776 [00:00<?, ?B/s]

Downloading tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

In [None]:
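# Text-generation pipeline shared by the assistant and the moderator.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,      # cap the length of each generated comment
    temperature=0.3,
    #trust_remote_code=True,
    repetition_penalty=1.1,  # mildly discourage repeated tokens
    return_full_text=True,   # return the prompt together with the generated text
    device_map='auto'
)

# Wrap the pipeline so that LangChain can use it as an LLM.
assistant_llm = HuggingFacePipeline(pipeline=pipe)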

## Create the template for the first model called assistant.

The prompt receives two variables: `sentiment` and `customer_request`, which is the customer's comment.

I included the sentiment to make it easier to provoke rude or incorrect answers.

In [None]:
# Instructions telling the LLM how to respond to the comments.
assistant_template = """
You are {sentiment} social media post commenter, you will respond to the following post
Post: "{customer_request}"
Comment:
"""

In [None]:
#Create the prompt template to use in the Chain for the first Model.
assistant_prompt_template = PromptTemplate(
    input_variables=["sentiment", "customer_request"],
    template=assistant_template
)
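
To see the text the model will actually receive, the template can be filled in by hand. This sketch uses plain `str.format`, which should produce the same text as `PromptTemplate` for simple `{name}` placeholders like these. The template is repeated so the snippet runs on its own, and the example post is made up for illustration.

```python
# Same template as above, repeated so the snippet is self-contained.
assistant_template = """
You are {sentiment} social media post commenter, you will respond to the following post
Post: "{customer_request}"
Comment:
"""

# Fill both placeholders, as the chain does before calling the model.
filled_prompt = assistant_template.format(
    sentiment="nice",
    customer_request="I love the new release!",  # hypothetical post
)
print(filled_prompt)
```

Printing the filled prompt is a quick way to check the wording before spending GPU time on generation.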

Now we create the first chain by linking `assistant_prompt_template` with the model. The model will receive the prompt generated by the prompt template.

In [None]:
assistant_chain = LLMChain(
    llm=assistant_llm,
    prompt=assistant_prompt_template,
    output_key="assistant_response",
    verbose=False
)
#the output of the formatted prompt will pass directly to the LLM.

To execute the chain, call its `.run` method and pass the necessary variables.

In our case: `customer_request` and `sentiment`.

In [None]:
#Support function to obtain a response to a user comment.
def create_dialog(customer_request, sentiment):
    #Calling the .run method of the chain created above.
    assistant_response = assistant_chain.run(
        {"customer_request": customer_request,
        "sentiment": sentiment}
    )
    return assistant_response

## Obtain answers from our first Model Unmoderated.

The customer post is really rude. We are looking for a rude answer from our model, and to obtain it we change the sentiment.

In [None]:
# This is the customer request, or customer comment, in the forum moderated by the agent.
# Feel free to update it.
customer_request = """Your product is a piece of shit. I want my money back!"""

In [None]:
# Our assistant working in 'nice' mode.
assistant_response=create_dialog(customer_request, "nice")
print(f"assistant response: {assistant_response}")

assistant response: "Sorry to hear that you're not satisfied with our product! Can you tell me more about what you don't like? Maybe we can help resolve the issue or provide a refund. Your feedback is important to us."


In [None]:
#Our assistant running in rude mode.
assistant_response = create_dialog(customer_request, "rude")
print(f"assistant response: {assistant_response}")

assistant response: "Sorry to hear that you're not satisfied with our product! Can you tell us more about what you don't like? We value your feedback and would be happy to make it right. Please DM us for a refund or to discuss further."


Both answers are similar. LLAMA-2 is a really polite model, and I can't force it to be impolite, rude, or give a bad answer just by changing the sentiment. It is surely possible to obtain something like that with prompt hacking.

## Moderator
Let's create the second model, the moderator. It will receive the message generated previously and rewrite it if necessary.

In [None]:
#The moderator prompt template
moderator_template = """
You are the moderator of an online forum, you are strict and will not tolerate any negative comments.
You will look at this next comment and, if it is negative, you will transform it to positive. Avoid any negative words.
If it is nice, you will let it remain as is and repeat it word for word.
###
Original comment: {comment_to_moderate}
###
Edited comment:"""

# We use the PromptTemplate class to create an instance of our template that will use the prompt from above and store variables we will need to input when we make the prompt.
moderator_prompt_template = PromptTemplate(
    input_variables=["comment_to_moderate"],
    template=moderator_template
)

In [None]:
moderator_llm = assistant_llm

In [None]:
#We build the chain for the moderator.
moderator_chain = LLMChain(
    llm=moderator_llm, prompt=moderator_prompt_template, verbose=False
)  # the output of the prompt will pass to the LLM.

In [None]:
# To run our chain we use the .run() command
moderator_says = moderator_chain.run({"comment_to_moderate": assistant_response})

print(f"moderator_says: {moderator_says}")

moderator_says:  "Thank you for sharing your thoughts on our product! We appreciate your feedback and are always looking for ways to improve. Your input is invaluable to us. If you could provide more details about what you liked or disliked, we would greatly appreciate it. Thank you again!"


This answer is more polite than the one produced by the ***"rude" assistant***.

## LangChain System
Now it is time to put both models in the same chain so that they act as if they were a single model.

We have both models and both prompt templates; we only need to create a new chain and see how it works.

First we create two chains, one for each pair of prompt and model.

In [None]:
#The output key of the first chain must match one of the parameters of the second chain.
#That parameter is defined in the moderator's prompt template.
assistant_chain = LLMChain(
    llm=assistant_llm,
    prompt=assistant_prompt_template,
    output_key="comment_to_moderate",
    verbose=False,
)

#verbose True because we want to see the intermediate messages.
moderator_chain = LLMChain(
    llm=moderator_llm,
    prompt=moderator_prompt_template,
    verbose=True
)

**SequentialChain** is used to link different chains and parameters.

It's necessary to indicate the chains and the parameters that we should pass in the **.run** method.

In [None]:
from langchain.chains import SequentialChain

# Creating the SequentialChain class indicating chains and parameters.
assistant_moderated_chain = SequentialChain(
    chains=[assistant_chain, moderator_chain],
    input_variables=["sentiment", "customer_request"],
    verbose=True,
)
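
The way values travel between the two chains can be pictured in plain Python. This is only a conceptual sketch, not LangChain's implementation: `run_sequential`, `fake_assistant` and `fake_moderator` are hypothetical stand-ins, and no model is involved. The key `"text"` is used for the last step because that is the default output key of `LLMChain`.

```python
def run_sequential(steps, inputs):
    """Run (function, output_key) steps in order over a shared dict of variables."""
    variables = dict(inputs)
    for step, output_key in steps:
        # Each step sees every variable produced so far and adds one more.
        variables[output_key] = step(variables)
    return variables

# Stand-ins for the two LLM calls (hypothetical, for illustration only).
def fake_assistant(v):
    return f"[{v['sentiment']}] reply to: {v['customer_request']}"

def fake_moderator(v):
    return "moderated: " + v["comment_to_moderate"]

result = run_sequential(
    [(fake_assistant, "comment_to_moderate"), (fake_moderator, "text")],
    {"sentiment": "rude", "customer_request": "I want my money back!"},
)
print(result["text"])
```

The first step stores its result under `comment_to_moderate`, which is exactly the name the second step looks up. If the two names did not match, the second step would not find its input.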

Let's use our moderating system!

In [None]:
# We can now run the chain.
assistant_moderated_chain.run({"sentiment": "rude", "customer_request": customer_request})



> Entering new SequentialChain chain...


> Entering new LLMChain chain...
Prompt after formatting:

You are the moderator of an online forum, you are strict and will not tolerate any negative comments.
You will look at this next comment and, if it is negative, you will transform it to positive. Avoid any negative words.
If it is nice, you will let it remain as is and repeat it word for word.
###
Original comment: "Sorry to hear that you're not satisfied with our product! Can you tell us more about what you don't like? Maybe we can help resolve the issue or provide a refund. No need to be rude, let's work together to find a solution."
###
Edited comment:

> Finished chain.

> Finished chain.


' "Thank you for sharing your thoughts on our product! We appreciate your feedback and will do our best to address any concerns you may have. Your input helps us improve and provide better products in the future. Let\'s work together to find a resolution!"'
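
The string returned above starts with a space and is wrapped in double quotes. Before showing it to a user it may be worth tidying up. A minimal sketch, where `clean_comment` is a hypothetical helper and `raw` is a shortened copy of the output above:

```python
def clean_comment(raw):
    """Strip surrounding whitespace and one pair of wrapping double quotes."""
    text = raw.strip()
    # Only remove the quotes when they wrap the whole comment.
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        text = text[1:-1]
    return text

raw = ' "Thank you for sharing your thoughts on our product!"'
print(clean_comment(raw))
```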

## Conclusions
As you can see, the moderator changes the answer of our assistant. Both are polite, but the one produced by the moderator is more formal.