<a href="https://colab.research.google.com/github/ki-ki13/chatbot-sentimentAnalysis/blob/main/Chatbot_Bounty.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Customer Support Chatbot based on Sentiment Analysis

This notebook was made for StackUp's "Llama Chatbot with Sentiment Analysis Integration" bounty.
<br>
rmph13's notebook

## Brief
Develop a chatbot that detects user frustration or satisfaction and adjusts its responses to provide better customer service. For example, if the sentiment is negative, the chatbot could apologize and offer assistance. If the sentiment is positive, it could thank the user for their feedback and ask whether they would like to leave a review or share their experience.
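The policy in the brief can be expressed as a small mapping from sentiment label to reply framing. The sketch below is illustrative only: `sentiment_framing` and `frame_reply` are hypothetical helpers, the wording is an assumption, and the labels follow the POSITIVE/NEGATIVE convention of the sentiment model used later. Unlike a single prefix, it appends the review invitation after the answer, so the user gets their answer first.

```python
from typing import Tuple


def sentiment_framing(label: str) -> Tuple[str, str]:
    """Return (prefix, suffix) for a reply, following the brief (hypothetical wording)."""
    if label == "NEGATIVE":
        # Frustrated user: apologize and offer assistance before the answer
        return ("I'm sorry for the trouble. Let me help you with that. ", "")
    if label == "POSITIVE":
        # Satisfied user: thank them, then invite a review after the answer
        return ("Thank you for your feedback! ",
                " Would you like to leave a review or share your experience?")
    # Any other label gets no extra framing
    return ("", "")


def frame_reply(label: str, answer: str) -> str:
    """Wrap a generated answer with the sentiment-dependent prefix and suffix."""
    prefix, suffix = sentiment_framing(label)
    return prefix + answer + suffix


print(frame_reply("POSITIVE", "Glad it worked."))
```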

## Steps
* Installing the necessary dependencies and setting up the environment.
* Importing the required libraries and logging in to Hugging Face.
* Setting up the sentiment analysis pipeline for real-time sentiment detection.
* Creating and managing a QA dataset of frequently asked questions.
* Initializing the Llama 2 model and tokenizer for text generation.
* Implementing the question-answering and sentiment detection functions.
* Testing the sentiment analysis and question-answering functions.
* Creating a user-friendly Gradio interface for interacting with the chatbot.

## Step 1: Installing the dependencies

The first step is to install the dependencies:
* accelerate, which optimizes model loading and is required for `device_map="auto"`
* protobuf, a serialization library used for data exchange
* sentencepiece, the tokenizer/detokenizer library behind the Llama 2 tokenizer
* huggingface_hub, which provides access to the Hugging Face model hub
* transformers (installed here from the GitHub main branch), for loading and using pre-trained models such as Llama 2 and the sentiment analysis model from Hugging Face
* gradio, for the user interface
* torch (PyTorch), because the models run on PyTorch
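Because `transformers` comes from the GitHub main branch and `gradio` is unpinned, the available APIs can differ between runs (for example, some `gr.ChatInterface` arguments changed between Gradio 4 and 5). After the install cell has run, a small helper can record which versions were actually installed. `installed_versions` is a hypothetical helper built on the standard library's `importlib.metadata`; the package list uses distribution names (`huggingface-hub` rather than the import name `huggingface_hub`).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names of the packages installed by the pip cell
PACKAGES = ["accelerate", "protobuf", "sentencepiece", "torch",
            "transformers", "huggingface-hub", "gradio"]


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```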


In [1]:
!pip install -q accelerate protobuf sentencepiece torch git+https://github.com/huggingface/transformers huggingface_hub gradio


## Step 2: Importing libraries and logging in to Hugging Face


Import the libraries that will be needed:
* pandas, for managing the QA dataset
* os, for interacting with the operating system, here to check whether the QA dataset file already exists
* AutoModelForCausalLM, which loads the pre-trained Llama 2 model for text generation
* AutoTokenizer, which loads the tokenizer that prepares text for each model
* AutoModelForSequenceClassification, which loads the sentiment classification model
* pipeline, which combines a model and its tokenizer into a task-specific object that handles inference
* login from huggingface_hub, to authenticate with Hugging Face
* torch, the core PyTorch library used to run the models

In [4]:
import pandas as pd
import os
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoModelForSequenceClassification, pipeline
from huggingface_hub import login
import torch

Log in to Hugging Face using an access token created beforehand in the Hugging Face account settings.

In [3]:
# Read the token from an environment variable instead of hard-coding it (if unset, login() prompts for it)
login(token=os.environ.get("HF_TOKEN"))

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
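Hard-coding the token in a notebook exposes it to anyone who can read the notebook, and a token that has been published should be revoked in the Hugging Face settings. A minimal sketch of a safer lookup, assuming the token is stored in an environment variable named `HF_TOKEN`: `get_hf_token` is a hypothetical helper that falls back to a hidden prompt via the standard library's `getpass`.

```python
import os
from getpass import getpass


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token from an environment variable, else prompt for it."""
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    # getpass does not echo the typed token, so it never appears in the cell output
    return getpass("Hugging Face token: ")
```

Usage would be `login(token=get_hf_token())`. In Colab, `from google.colab import userdata` followed by `userdata.get("HF_TOKEN")` reads the notebook secret of that name, which is the mechanism the Colab warning printed by the next cell refers to.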


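## Step 3: Setting up the sentiment analysis pipeline

Load a sentiment analysis model: `distilbert-base-uncased-finetuned-sst-2-english`, a DistilBERT model fine-tuned on the SST-2 dataset that labels text as either POSITIVE or NEGATIVE.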

In [5]:
sentiment_tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
sentiment_model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
sentiment_analyzer = pipeline('sentiment-analysis', model=sentiment_model, tokenizer=sentiment_tokenizer)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
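The warning above means the sentiment model runs on CPU; passing `device=0` to `pipeline(...)` would place it on the first GPU instead. Either way, the pipeline returns a list with one dictionary per input, each holding a `label` and a `score`. Because loading the model is slow, a stand-in with the same output shape makes it easy to test the parsing logic in isolation. In this sketch, `detect_sentiment_with` and `keyword_stub_analyzer` are hypothetical: the stub is a crude keyword check, not a sentiment model.

```python
from typing import Callable, Dict, List, Tuple

# Same shape as the transformers sentiment pipeline output: one dict per input text
SentimentResult = List[Dict[str, object]]


def detect_sentiment_with(analyzer: Callable[[str], SentimentResult],
                          user_input: str) -> Tuple[str, float]:
    """Run any analyzer with the pipeline's output shape and return (label, score)."""
    result = analyzer(user_input)[0]
    return str(result["label"]), float(result["score"])


def keyword_stub_analyzer(text: str) -> SentimentResult:
    """Hypothetical stand-in for sentiment_analyzer; flags a few negative keywords."""
    negative_words = {"unhappy", "broken", "angry", "terrible"}
    is_negative = any(word in text.lower() for word in negative_words)
    return [{"label": "NEGATIVE" if is_negative else "POSITIVE", "score": 0.9}]


print(detect_sentiment_with(keyword_stub_analyzer,
                            "I'm really unhappy with the service I received."))
```

With the real model, `detect_sentiment_with(sentiment_analyzer, text)` behaves like the notebook's `detect_sentiment(text)`.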


## Step 4: Creating and managing the QA dataset

The CSV file `qa_dataset.csv` serves as a knowledge base that the chatbot checks before generating an answer. If the file already exists, the code loads the existing data instead of creating a new file. Because every newly generated answer is appended to this file, the chatbot builds on previously answered questions across sessions.

In [22]:
csv_file = "qa_dataset.csv"

if not os.path.exists(csv_file):
    # Seed the knowledge base with two frequently asked questions
    qa_data = {
        'question': [
            "How do I reset my password?",
            "What is the return policy for your products?"
        ],
        'answer': [
            "To reset your password, go to the login page, click on 'Forgot Password', and follow the instructions.",
            "Our return policy allows returns within 30 days of purchase. Please ensure the product is in its original condition."
        ]
    }
    qa_df = pd.DataFrame(qa_data)
    qa_df.to_csv(csv_file, index=False)
else:
    # Reuse the questions and answers saved in earlier sessions
    qa_df = pd.read_csv(csv_file)

In [23]:
# see the dataset
print(qa_df)

                                       question  \
0                   How do I reset my password?   
1  What is the return policy for your products?   

                                              answer  
0  To reset your password, go to the login page, ...  
1  Our return policy allows returns within 30 day...  


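## Step 5: Initializing the Llama 2 model and tokenizer

Load the Llama 2 7B chat model (`NousResearch/Llama-2-7b-chat-hf`) in float16 and wrap it in a text-generation pipeline for answering customer support questions.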
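Loading in `torch.float16` halves the weight memory compared with float32. A rough estimate is number of parameters × bytes per parameter. The sketch below assumes Llama 2 7B has about 6.74 billion parameters, which agrees with the checkpoint being downloaded as two safetensors shards of about 9.98 GB and 3.50 GB (13.48 GB in total). It counts weights only; activations and the KV cache need memory on top of this. `weight_size_gb` is a hypothetical helper.

```python
# Assumed parameter count of Llama 2 7B (about 6.74 billion)
LLAMA2_7B_PARAMS = 6_738_415_616

# Storage size of one parameter for common dtypes
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_size_gb(num_params: int, dtype: str) -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_size_gb(LLAMA2_7B_PARAMS, dtype):.2f} GB")
# float32: 26.95 GB
# float16: 13.48 GB
# int8: 6.74 GB
```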

In [8]:
# Initialize the Llama 2 model and tokenizer
model_id = "NousResearch/Llama-2-7b-chat-hf"
# float16 halves memory use; device_map="auto" (needs accelerate) places the weights on the available devices
llama_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
llama_tokenizer = AutoTokenizer.from_pretrained(model_id)
llama_tokenizer.use_default_system_prompt = False

# Wrap the already-loaded model in a text-generation pipeline.
# dtype and device placement were set by from_pretrained above, so they are not repeated here.
# max_length is overridden whenever max_new_tokens is passed at call time.
llama_pipeline = pipeline(
    "text-generation",  # LLM task
    model=llama_model,
    tokenizer=llama_tokenizer,
    max_length=1024,
)


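## Step 6: Implementing the question-answering and sentiment detection functions

* `detect_sentiment` runs the sentiment pipeline and returns the predicted label and its confidence score.
* `generate_response` first looks the question up in the QA dataset (case-insensitive exact match) and returns the stored answer if one exists.
* Otherwise, it detects the sentiment, picks a response prefix that adjusts the chatbot's tone, generates an answer with Llama 2, and saves the new question-answer pair to the dataset.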
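In `generate_response`, the sentiment only selects a fixed prefix; Llama 2 itself never sees it. One possible alternative is to pass a tone hint into the system prompt so the generated text adapts as well. The sketch below builds the same Llama 2 chat prompt format that `generate_response` uses; `build_llama2_prompt` and the `TONE_HINTS` wording are hypothetical. With no sentiment it reproduces the notebook's original prompt exactly.

```python
DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful customer support assistant. "
    "Provide concise and accurate responses."
)

# Hypothetical tone hints keyed by the labels the SST-2 sentiment model returns
TONE_HINTS = {
    "NEGATIVE": "The customer seems frustrated: apologize briefly and focus on resolving the issue.",
    "POSITIVE": "The customer seems satisfied: thank them and invite them to leave a review.",
}


def build_llama2_prompt(user_input: str, sentiment: str = "",
                        system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Build a single-turn Llama 2 chat prompt, optionally with a sentiment-based tone hint."""
    hint = TONE_HINTS.get(sentiment, "")
    system = f"{system_prompt} {hint}".strip()
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user_input} [/INST]"


print(build_llama2_prompt("Can I exchange an item?", sentiment="NEGATIVE"))
```

Inside `generate_response`, the prompt line would become `prompt = build_llama2_prompt(user_input, sentiment)`, and the fixed prefix could then be dropped or kept.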

In [24]:
def detect_sentiment(user_input):
    sentiment = sentiment_analyzer(user_input)[0]
    return sentiment['label'], sentiment['score']

def generate_response(user_input):
    global qa_df

    # Check if the question exists in the dataset
    existing_answer = qa_df[qa_df['question'].str.lower() == user_input.lower()]['answer']
    if not existing_answer.empty:
        return existing_answer.values[0]

    # Detect sentiment
    sentiment, score = detect_sentiment(user_input)
    response_prefix = ""

    if sentiment == "NEGATIVE":
        response_prefix = "I understand that you're frustrated. Let me help you with that. "
    elif sentiment == "POSITIVE":
        response_prefix = "I'm glad you're satisfied! "

    # Format the prompt for Llama 2
    prompt = f"""<s>[INST] <<SYS>>
You are a helpful customer support assistant. Provide concise and accurate responses.
<</SYS>>

{user_input} [/INST]"""

    # Generate the response using the Llama pipeline
    response = llama_pipeline(prompt, max_new_tokens=100, do_sample=True, temperature=0.7)[0]['generated_text']

    # Extract the assistant's response
    assistant_response = response.split('[/INST]')[-1].strip()

    # Combine the response prefix with the generated response
    final_response = response_prefix + assistant_response

    # Save the new question-answer pair to the dataset
    new_entry = pd.DataFrame({'question': [user_input], 'answer': [final_response]})
    qa_df = pd.concat([qa_df, new_entry], ignore_index=True)
    qa_df.to_csv(csv_file, index=False)

    return final_response
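The dataset lookup in `generate_response` only lowercases both sides, so "How do I reset my password" (no question mark) or a message with extra spaces falls through to Llama 2 and gets saved as a new entry. A minimal sketch of a more forgiving match, assuming light normalization is enough: `normalize_question` and `lookup_answer` are hypothetical helpers, shown with a plain dictionary to stay self-contained.

```python
import re
import string
from typing import Dict, Optional


def normalize_question(text: str) -> str:
    """Lowercase, trim, collapse internal whitespace and drop trailing punctuation."""
    text = re.sub(r"\s+", " ", text.strip().lower())
    return text.rstrip(string.punctuation + " ")


def lookup_answer(question: str, knowledge_base: Dict[str, str]) -> Optional[str]:
    """Return the stored answer whose normalized question matches, else None."""
    wanted = normalize_question(question)
    for stored_question, answer in knowledge_base.items():
        if normalize_question(stored_question) == wanted:
            return answer
    return None


kb = {
    "How do I reset my password?":
        "To reset your password, go to the login page, click on 'Forgot Password', "
        "and follow the instructions.",
    "What is the return policy for your products?":
        "Our return policy allows returns within 30 days of purchase. "
        "Please ensure the product is in its original condition.",
}
print(lookup_answer("  how do I   reset my password ", kb))
```

With pandas, the same idea is `qa_df[qa_df['question'].map(normalize_question) == normalize_question(user_input)]['answer']`. Normalization does not catch paraphrases: "What is your return policy?" still does not match "What is the return policy for your products?", which would need fuzzy or embedding-based matching.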


## Step 7: Testing the sentiment analysis and QA functions

Test the chatbot for both sentiment detection and proper response generation.

In [25]:
user_input = "I'm really unhappy with the service I received."
response = generate_response(user_input)
print(response)

Both `max_new_tokens` (=100) and `max_length`(=1024) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


I understand that you're frustrated. Let me help you with that. I apologize to hear that you're unhappy with the service you received. Can you please provide more details about the issue you're experiencing, such as the product or service you received, the problem you encountered, and any steps you've taken so far to resolve the issue? This will help me better understand your concern and provide you with the most effective solution.


In [26]:
user_input = "How do I reset my password?"
response = generate_response(user_input)
print(response)

To reset your password, go to the login page, click on 'Forgot Password', and follow the instructions.


In [27]:
user_input = "Can I exchange an item?"
response = generate_response(user_input)
print(response)

Both `max_new_tokens` (=100) and `max_length`(=1024) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


I understand that you're frustrated. Let me help you with that. Of course! I'd be happy to help you with that. Can you please provide me with some more details about the item you would like to exchange? For example, what is the item's name, what is the reason for the exchange, and what would you like to receive in exchange?
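The neutral question "Can I exchange an item?" received the "frustrated" prefix. The SST-2 model has only two labels, so every message is forced into POSITIVE or NEGATIVE. One mitigation is to treat low-confidence predictions as neutral. The threshold below is an assumption to tune on real messages, and SST-2 models are often confident even on neutral text, so this will not catch every case. `three_way_label` is a hypothetical helper. Another option is a three-class sentiment model such as `cardiffnlp/twitter-roberta-base-sentiment-latest`, after checking its label names against the comparisons in `generate_response`.

```python
# Hypothetical confidence cut-off; tune it on real customer messages
NEUTRAL_THRESHOLD = 0.95


def three_way_label(label: str, score: float, threshold: float = NEUTRAL_THRESHOLD) -> str:
    """Keep confident binary predictions and map low-confidence ones to NEUTRAL."""
    return label if score >= threshold else "NEUTRAL"


print(three_way_label("NEGATIVE", 0.62))  # NEUTRAL
print(three_way_label("NEGATIVE", 0.99))  # NEGATIVE
```

In `generate_response`, adding `sentiment = three_way_label(sentiment, score)` after `detect_sentiment` leaves `response_prefix` empty for NEUTRAL, because neither the NEGATIVE nor the POSITIVE branch matches.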


In [28]:
# see the updated dataset
print(qa_df)

                                          question  \
0                      How do I reset my password?   
1     What is the return policy for your products?   
2  I'm really unhappy with the service I received.   
3                          Can I exchange an item?   

                                              answer  
0  To reset your password, go to the login page, ...  
1  Our return policy allows returns within 30 day...  
2  I understand that you're frustrated. Let me he...  
3  I understand that you're frustrated. Let me he...  


## Step 8: Creating the interface with Gradio


In [29]:
import gradio as gr

In [30]:
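def chatbot(message, history):
    # gr.ChatInterface also passes the chat history; it is unused because each answer depends only on the current message
    response = generate_response(message)
    return response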

def load_qa_dataset():
    global qa_df
    csv_file = "qa_dataset.csv"
    if not os.path.exists(csv_file):
        qa_data = {
            'question': [
                "How do I reset my password?",
                "What is the return policy for your products?"
            ],
            'answer': [
                "To reset your password, go to the login page, click on 'Forgot Password', and follow the instructions.",
                "Our return policy allows returns within 30 days of purchase. Please ensure the product is in its original condition."
            ]
        }
        qa_df = pd.DataFrame(qa_data)
        qa_df.to_csv(csv_file, index=False)
    else:
        qa_df = pd.read_csv(csv_file)


In [31]:
# Load the dataset
load_qa_dataset()

In [33]:
# Create the Gradio interface
# retry_btn, undo_btn and clear_btn were removed from gr.ChatInterface in Gradio 5, so they are
# omitted here; Gradio 4 already shows retry, undo and clear buttons by default.
iface = gr.ChatInterface(
    chatbot,
    title="Customer Support Chatbot",
    description="Ask any customer support related questions!",
    theme="soft",
    examples=[
        "How do I reset my password?",
        "What is your return policy?",
        "I'm having trouble with my order.",
        "Can you tell me about your shipping options?"
    ],
)

# Launch the interface
iface.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://e971269b066ac2a8bb.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


