# Customer Support Question Answering Chatbot

Author: Mohammed Nashaat

Date: June 14, 2025

---
This notebook demonstrates how to build a customer support chatbot using LangChain, DeepLake, and Mistral-7B. The chatbot retrieves answers from a knowledge base of support articles and generates responses using an LLM.

## Install Required Libraries

In [None]:
!pip install unstructured
!pip install selenium
!pip install llama-cpp-python
!pip install langchain
!pip install langchain_community
!pip install langchain-text-splitters
!pip install deeplake==3.9.27
!pip install tiktoken
!pip install transformers==4.30.0
!pip install sentence-transformers

## Load and Preprocess Support Articles
Define the URLs of the support articles and load them with Selenium:

In [3]:
from langchain_community.document_loaders import SeleniumURLLoader
from langchain_text_splitters import CharacterTextSplitter

# we'll use information from the following articles
urls = ['https://beebom.com/what-sos-mean-iphone/',
        'https://beebom.com/how-delete-spotify-account/',
        'https://beebom.com/how-download-gif-twitter/',
        'https://beebom.com/how-delete-apple-id/',
        'https://beebom.com/how-delete-spotify-account/',
        'https://beebom.com/how-replace-airtag-battery/',
        'https://beebom.com/how-sync-iphone-ipad/',
        'https://beebom.com/how-check-disk-usage-linux/']

# use the selenium scraper to load the documents
loader = SeleniumURLLoader(urls=urls)
docs_without_split = loader.load()

# split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs_without_split)

**Why Chunking?**

- Large documents are split into chunks of about 1,000 characters so that each embedding covers one focused passage, which improves retrieval accuracy.
- Smaller chunks also keep the retrieved context within the LLM's context window.
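
To make the chunking idea concrete, here is a minimal pure-Python sketch that greedily packs paragraphs into chunks. It is a simplified illustration, not LangChain's `CharacterTextSplitter` implementation; the function name `split_into_chunks` and the sample text are made up for this example.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> list[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # The piece fits, or it starts a new chunk. A single piece that is
            # longer than chunk_size is kept whole rather than cut mid-paragraph.
            current = candidate
        else:
            # Adding the piece would overflow: close the chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample: five paragraphs of a few hundred characters each.
sample = "\n\n".join(f"Paragraph {i}: " + "x" * 300 for i in range(5))
chunks = split_into_chunks(sample, chunk_size=1000)
print([len(c) for c in chunks])
```

Packing whole paragraphs keeps related sentences together, at the cost of chunks that vary in length.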



## Initialize Sentence Embedding Model
I use `sentence-transformers/all-MiniLM-L6-v2` to generate embeddings for semantic search:


In [6]:
from langchain_community.embeddings import HuggingFaceEmbeddings

# Initialize the proper embedding model
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

**Why This Model?**

- Efficient for semantic similarity tasks.
- Balances speed and accuracy for retrieval-augmented generation (RAG).
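
As a rough illustration of what "semantic similarity" means for embeddings, the sketch below ranks two documents against a query using cosine similarity. The three-dimensional vectors and document names are invented for readability (the real model produces 384-dimensional vectors), and the vector store used later may rank by a different distance metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, not produced by any real model.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "spotify_privacy": [0.8, 0.2, 0.1],
    "linux_disk_usage": [0.0, 0.1, 0.9],
}

# Rank document names from most to least similar to the query.
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
print(ranked)
```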

## Setting Up DeepLake Vector Store
Store the document embeddings in DeepLake for fast retrieval:

In [8]:
from langchain_community.vectorstores import DeepLake
from google.colab import userdata
import os

# set your ActiveLoop token as an environment variable
os.environ["ACTIVELOOP_TOKEN"] = userdata.get('ACTIVELOOP_TOKEN')

my_activeloop_org_id = "mohammednashaat29"
my_activeloop_dataset_name = "langchain_customer_support_chatbot"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
db = DeepLake(dataset_path=dataset_path, embedding_function=embedding_model)

# add documents to our Deep Lake dataset
db.add_documents(docs)



Your Deep Lake dataset has been successfully created!


Creating 97 embeddings in 1 batches of size 97:: 100%|██████████| 1/1 [00:28<00:00, 28.82s/it]

Dataset(path='hub://mohammednashaat29/langchain_customer_support_chatbot', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
   text       text      (97, 1)     str     None   
 metadata     json      (97, 1)     str     None   
 embedding  embedding  (97, 384)  float32   None   
    id        text      (97, 1)     str     None   





['a8b255fa-48ec-11f0-87dd-0242ac1c000c',
 'a8b25780-48ec-11f0-87dd-0242ac1c000c',
 'a8b258ac-48ec-11f0-87dd-0242ac1c000c',
 'a8b2592e-48ec-11f0-87dd-0242ac1c000c',
 'a8b259a6-48ec-11f0-87dd-0242ac1c000c',
 'a8b25a0a-48ec-11f0-87dd-0242ac1c000c',
 'a8b25a8c-48ec-11f0-87dd-0242ac1c000c',
 'a8b25b18-48ec-11f0-87dd-0242ac1c000c',
 'a8b25b86-48ec-11f0-87dd-0242ac1c000c',
 'a8b25be0-48ec-11f0-87dd-0242ac1c000c',
 'a8b25c44-48ec-11f0-87dd-0242ac1c000c',
 'a8b25c9e-48ec-11f0-87dd-0242ac1c000c',
 'a8b25cf8-48ec-11f0-87dd-0242ac1c000c',
 'a8b25d52-48ec-11f0-87dd-0242ac1c000c',
 'a8b25dac-48ec-11f0-87dd-0242ac1c000c',
 'a8b25e06-48ec-11f0-87dd-0242ac1c000c',
 'a8b25e60-48ec-11f0-87dd-0242ac1c000c',
 'a8b25eb0-48ec-11f0-87dd-0242ac1c000c',
 'a8b25f0a-48ec-11f0-87dd-0242ac1c000c',
 'a8b25f64-48ec-11f0-87dd-0242ac1c000c',
 'a8b25fbe-48ec-11f0-87dd-0242ac1c000c',
 'a8b2600e-48ec-11f0-87dd-0242ac1c000c',
 'a8b2605e-48ec-11f0-87dd-0242ac1c000c',
 'a8b260b8-48ec-11f0-87dd-0242ac1c000c',
 'a8b26112-48ec-

**Key Steps:**

- Authenticate with ActiveLoop using an API token.
- Create a vector store and populate it with document embeddings.
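
Conceptually, a vector store keeps each text next to its embedding and returns the texts whose embeddings are closest to the query. The toy class below sketches that idea in memory. It is not the DeepLake API: `ToyVectorStore`, `toy_embed`, the word-count "embedding", and the squared Euclidean distance are all assumptions chosen to keep the example small.

```python
from collections import Counter


def toy_embed(text: str, vocabulary: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding model: count vocabulary words."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


class ToyVectorStore:
    """In-memory illustration of a vector store."""

    def __init__(self, vocabulary: list[str]) -> None:
        self.vocabulary = vocabulary
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    def add_texts(self, texts: list[str]) -> None:
        # Store every text together with its embedding.
        for text in texts:
            self.texts.append(text)
            self.vectors.append(toy_embed(text, self.vocabulary))

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        # Embed the query, then return the k texts with the smallest distance.
        q = toy_embed(query, self.vocabulary)

        def squared_distance(vector: list[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(q, vector))

        order = sorted(range(len(self.texts)), key=lambda i: squared_distance(self.vectors[i]))
        return [self.texts[i] for i in order[:k]]


store = ToyVectorStore(vocabulary=["spotify", "account", "airtag", "battery"])
store.add_texts([
    "delete your spotify account",
    "replace the airtag battery",
    "sync iphone and ipad",
])
print(store.similarity_search("how do I delete my spotify account", k=1))
```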

## Test Document Retrieval
Verify that the system retrieves relevant documents for a query:

In [10]:
# let's see the top relevant documents to a specific query
query = "What type of data do we share with Spotify?"
similar_docs = db.similarity_search(query)
print(similar_docs[0].page_content)

Spotify collects a plethora of information about you. During signup, you share your email address, phone number, and geolocation with Spotify. After that, Spotify continues to collect data like your playlists, search queries, your followers and following, and much more. To read in detail about the type of data that Spotify collects about you, read this article right away. To read more about the save visit the Spotify Privacy Policy page

What type of data will Spotify have access to after I permanently delete my account?

Even if you delete your Spotify account permanently, Spotify will still retain some of your data for tax, accounting, and regulation purposes. The company can use the retained data to resolve disputes related to your account or any sort of situation that requires fraud and grievance redressals.

I permanently deleted my Spotify account but my data is not completely removed from Spotify yet. What to do?


## Define the Prompt Template
Craft a structured prompt so that the LLM answers only from the retrieved context:

In [13]:
from langchain_core.prompts import PromptTemplate

# Prompt for a customer support chatbot that answers questions
# using information retrieved from the vector database

template = """<s>[INST]You are an exceptional customer support chatbot that gently answer questions.

You know the following context information.
{chunks_formatted}

Answer to the following question from a customer. Use only information from the previous context information. Do not invent stuff.
Question: {query}

Answer:[/INST]"""

prompt = PromptTemplate(
    input_variables=["chunks_formatted", "query"],
    template=template
)

**Prompt Design:**

- Instructs the LLM to answer from the retrieved context only.
- Reduces hallucinations by telling the model not to draw on outside knowledge. This lowers the risk but does not eliminate it.
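
To show what the model actually receives, the sketch below fills a shortened version of the template with plain `str.format`. The template text, the `build_prompt` helper, and the sample chunk are simplified stand-ins invented for this example; the notebook itself uses `PromptTemplate`.

```python
# Shortened, hypothetical version of the notebook's template.
TEMPLATE = (
    "[INST]You know the following context information.\n"
    "{chunks_formatted}\n\n"
    "Use only the context above to answer.\n"
    "Question: {query}\n\n"
    "Answer:[/INST]"
)


def build_prompt(chunks: list[str], query: str) -> str:
    """Join retrieved chunks with blank lines and substitute them into the template."""
    chunks_formatted = "\n\n".join(chunks)
    return TEMPLATE.format(chunks_formatted=chunks_formatted, query=query)


example_prompt = build_prompt(
    ["SOS means the phone can only place emergency calls."],  # made-up chunk text
    "What does SOS mean?",
)
print(example_prompt)
```

Because the chunks are passed as format arguments rather than pasted into the template string, any curly braces inside a scraped chunk are left untouched.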

## Load Mistral-7B Model
Load the quantized Mistral-7B model for response generation:

In [11]:
# Load the Model from Drive

from langchain_community.llms import LlamaCpp
from google.colab import drive
drive.mount('/content/drive')

model_path = "/content/drive/MyDrive/Models/mistral-7b-instruct-v0.1.Q6_K.gguf"

Mistral = LlamaCpp(
    model_path=model_path,
    n_gpu_layers=40,
    n_ctx=2048,
    temperature=0 # Deterministic responses
)

Mounted at /content/drive


llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /content/drive/MyDrive/Models/mistral-7b-instruct-v0.1.Q6_K.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.1
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7

**Model Configuration:**

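- `temperature=0` makes decoding greedy, so the same prompt should yield the same response. It does not by itself guarantee factual accuracy; that depends on the retrieved context and the prompt.
- `n_gpu_layers=40` requests GPU offloading for faster inference, which only takes effect when `llama-cpp-python` is built with GPU support. The timings logged in the pipeline run below (about 1 token per second) suggest this run was mostly on CPU.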
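
The effect of `temperature` can be illustrated with a small sketch of how token scores turn into probabilities. This is a simplified model of sampling, not llama.cpp's actual sampler chain; the function name and the toy scores are assumptions, and treating a temperature of 0 as "always pick the top token" follows the common convention.

```python
import math


def next_token_distribution(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw token scores into probabilities for a given temperature."""
    if temperature <= 0:
        # Greedy decoding: all probability mass goes to the highest-scoring token.
        best = max(logits, key=logits.get)
        return {token: 1.0 if token == best else 0.0 for token in logits}
    # Softmax over temperature-scaled scores (shifted by the max for numerical stability).
    scaled = {token: score / temperature for token, score in logits.items()}
    peak = max(scaled.values())
    exps = {token: math.exp(value - peak) for token, value in scaled.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


toy_logits = {"emergency": 2.0, "satellite": 1.0, "signal": 0.5}  # hypothetical scores
greedy = next_token_distribution(toy_logits, temperature=0)
sampled = next_token_distribution(toy_logits, temperature=1.0)
print(greedy)
print({token: round(p, 3) for token, p in sampled.items()})
```

With a positive temperature, lower-scoring tokens keep some probability, which is where run-to-run variation comes from.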

## Run the Full Pipeline
Combine retrieval and generation to answer a user query:

In [14]:
# User question
query = "What SOS mean in iphone?"

# Retrieve relevant chunks
docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

# Format the prompt
chunks_formatted = "\n\n".join(retrieved_chunks)
prompt_formatted = prompt.format(chunks_formatted=chunks_formatted, query=query)

# Generate answer
answer = Mistral.invoke(prompt_formatted)

llama_perf_context_print:        load time =  589052.88 ms
llama_perf_context_print: prompt eval time =  589052.54 ms /   962 tokens (  612.32 ms per token,     1.63 tokens per second)
llama_perf_context_print:        eval time =  100556.56 ms /   101 runs   (  995.61 ms per token,     1.00 tokens per second)
llama_perf_context_print:       total time =  689744.07 ms /  1063 tokens


 SOS on an iPhone means that your device is out of your network carrier's range to make/receive calls, text messages, or access the internet. It indicates that you can only make emergency calls to numbers like 112 (India & Europe), 911 (United States), 999 (UK), etc. This is to ensure that you're safe and can reach out to emergency services even if your iPhone cannot connect to the cellular network.


In [15]:
print(answer)

 SOS on an iPhone means that your device is out of your network carrier's range to make/receive calls, text messages, or access the internet. It indicates that you can only make emergency calls to numbers like 112 (India & Europe), 911 (United States), 999 (UK), etc. This is to ensure that you're safe and can reach out to emergency services even if your iPhone cannot connect to the cellular network.
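
The retrieve, format, and generate steps can also be wrapped in one reusable function. The sketch below is one possible shape, assuming the retriever and the LLM are passed in as plain callables so that the flow can be tried with stubs; `answer_question`, `PROMPT_TEMPLATE`, and both stub functions are hypothetical names, and the template is a simplified stand-in for the one defined earlier.

```python
from typing import Callable

# Simplified stand-in for the notebook's prompt template.
PROMPT_TEMPLATE = "Context:\n{chunks_formatted}\n\nQuestion: {query}\n\nAnswer:"


def answer_question(
    query: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve context chunks, build the prompt, and return the stripped model output."""
    chunks = retrieve(query)
    prompt_text = PROMPT_TEMPLATE.format(
        chunks_formatted="\n\n".join(chunks),
        query=query,
    )
    # Strip surrounding whitespace, such as the leading space in the output above.
    return generate(prompt_text).strip()


def fake_retrieve(query: str) -> list[str]:
    # Stub retriever returning a made-up chunk.
    return ["SOS means only emergency calls are available."]


def fake_generate(prompt_text: str) -> str:
    # Stub LLM that only reports how much text it received.
    return f" (stub answer based on {len(prompt_text)} characters of prompt)"


print(answer_question("What does SOS mean?", fake_retrieve, fake_generate))
```

In this notebook, `retrieve` could be `lambda q: [d.page_content for d in db.similarity_search(q)]` and `generate` could be `Mistral.invoke`.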


## Conclusion
This notebook demonstrates:

✅ Web scraping support articles

✅ Chunking and embedding documents

✅ Retrieval-augmented generation (RAG) with Mistral-7B

✅ A working customer support chatbot