# Customer Support Question Answering Chatbot

Author: Mohammed Nashaat

Date: June 14, 2025

---
This notebook demonstrates how to build a customer support chatbot using LangChain, DeepLake, and Mistral-7B. The chatbot retrieves answers from a knowledge base of support articles and generates responses using an LLM.

## Install Required Libraries

In [1]:
!pip install unstructured
!pip install selenium
!pip install llama-cpp-python
!pip install langchain
!pip install langchain_community
!pip install langchain-text-splitters
!pip install deeplake==3.9.27
!pip install tiktoken
!pip install transformers==4.30.0
!pip install sentence-transformers

## Load and Preprocess Support Articles
define the URLs of support articles and load them using Selenium:

In [2]:
from langchain.document_loaders import SeleniumURLLoader
from langchain.text_splitter import CharacterTextSplitter

# we'll use information from the following articles
urls = ['https://beebom.com/what-sos-mean-iphone/',
        'https://beebom.com/how-delete-spotify-account/',
        'https://beebom.com/how-download-gif-twitter/',
        'https://beebom.com/how-delete-apple-id/',
        'https://beebom.com/how-delete-spotify-account/',
        'https://beebom.com/how-replace-airtag-battery/',
        'https://beebom.com/how-sync-iphone-ipad/',
        'https://beebom.com/how-check-disk-usage-linux/']

# use the selenium scraper to load the documents
loader = SeleniumURLLoader(urls=urls)
docs_without_split = loader.load()

# split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs_without_split)

**Why Chunking?**

- Large documents are split into smaller chunks (1000 characters) to improve retrieval accuracy.
- Ensures the LLM processes manageable text segments.



## Initialize Sentence Embedding Model
i'm using `sentence-transformers/all-MiniLM-L6-v2` to generate embeddings for semantic search:


In [3]:
from langchain_community.embeddings import HuggingFaceEmbeddings

# Initialize the proper embedding model
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

  embedding_model = HuggingFaceEmbeddings(


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.5k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

**Why This Model?**

- Efficient for semantic similarity tasks.
- Balances speed and accuracy for retrieval-augmented generation (RAG).

## Setting Up DeepLake Vector Store
store document embeddings in DeepLake for fast retrieval:

In [4]:
from langchain.vectorstores import DeepLake
from google.colab import userdata
import os

# set your ActiveLoop token as an environment variable
os.environ["ACTIVELOOP_TOKEN"] = userdata.get('ACTIVELOOP_TOKEN')

my_activeloop_org_id = "mohammednashaat29"
my_activeloop_dataset_name = "langchain_customer_support_chatbot"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
db = DeepLake(dataset_path=dataset_path, embedding_function=embedding_model, overwrite=True)

# add documents to our Deep Lake dataset
db.add_documents(docs)

  db = DeepLake(dataset_path=dataset_path, embedding_function=embedding_model)


Deep Lake Dataset in hub://mohammednashaat29/langchain_customer_support_chatbot already exists, loading from the storage


Creating 97 embeddings in 1 batches of size 97:: 100%|██████████| 1/1 [00:16<00:00, 16.60s/it]

Dataset(path='hub://mohammednashaat29/langchain_customer_support_chatbot', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype      shape      dtype  compression
  -------    -------    -------    -------  ------- 
 embedding  embedding  (194, 384)  float32   None   
    id        text      (194, 1)     str     None   
 metadata     json      (194, 1)     str     None   
   text       text      (194, 1)     str     None   





['fa302386-494a-11f0-8532-0242ac1c000c',
 'fa3024ee-494a-11f0-8532-0242ac1c000c',
 'fa30258e-494a-11f0-8532-0242ac1c000c',
 'fa302606-494a-11f0-8532-0242ac1c000c',
 'fa30267e-494a-11f0-8532-0242ac1c000c',
 'fa3026ec-494a-11f0-8532-0242ac1c000c',
 'fa3027aa-494a-11f0-8532-0242ac1c000c',
 'fa30282c-494a-11f0-8532-0242ac1c000c',
 'fa302890-494a-11f0-8532-0242ac1c000c',
 'fa3028f4-494a-11f0-8532-0242ac1c000c',
 'fa302962-494a-11f0-8532-0242ac1c000c',
 'fa3029bc-494a-11f0-8532-0242ac1c000c',
 'fa302a16-494a-11f0-8532-0242ac1c000c',
 'fa302a70-494a-11f0-8532-0242ac1c000c',
 'fa302aca-494a-11f0-8532-0242ac1c000c',
 'fa302b1a-494a-11f0-8532-0242ac1c000c',
 'fa302b74-494a-11f0-8532-0242ac1c000c',
 'fa302bce-494a-11f0-8532-0242ac1c000c',
 'fa302c28-494a-11f0-8532-0242ac1c000c',
 'fa302c78-494a-11f0-8532-0242ac1c000c',
 'fa302cd2-494a-11f0-8532-0242ac1c000c',
 'fa302d2c-494a-11f0-8532-0242ac1c000c',
 'fa302d86-494a-11f0-8532-0242ac1c000c',
 'fa302de0-494a-11f0-8532-0242ac1c000c',
 'fa302e3a-494a-

**Key Steps:**

- Authenticate with ActiveLoop using an API token.
- Create a vector store and populate it with document embeddings.

## Test Document Retrieval
verify that the system retrieves relevant documents for a query:

In [5]:
# let's see the top relevant documents to a specific query
query = "What type of data do we share with Spotify?"
similar_docs = db.similarity_search(query)
print(similar_docs[0].page_content)

Spotify collects a plethora of information about you. During signup, you share your email address, phone number, and geolocation with Spotify. After that, Spotify continues to collect data like your playlists, search queries, your followers and following, and much more. To read in detail about the type of data that Spotify collects about you, read this article right away. To read more about the save visit the Spotify Privacy Policy page

What type of data will Spotify have access to after I permanently delete my account?

Even if you delete your Spotify account permanently, Spotify will still retain some of your data for tax, accounting, and regulation purposes. The company can use the retained data to resolve disputes related to your account or any sort of situation that requires fraud and grievance redressals.

I permanently deleted my Spotify account but my data is not completely removed from Spotify yet. What to do?


## Define the Prompt Template
craft a structured prompt for the LLM to generate accurate responses:

In [6]:
from langchain import PromptTemplate

# i'll write a prompt for a customer support chatbot that
# answer questions using information extracted from my vector db

template = """<s>[INST]You are an exceptional customer support chatbot that gently answer questions.

You know the following context information.
{chunks_formatted}

Answer to the following question from a customer. Use only information from the previous context information. Do not invent stuff.
Question: {query}

Answer:[/INST]"""

prompt = PromptTemplate(
    input_variables=["chunks_formatted", "query"],
    template=template
)

**Prompt Design:**

- Instructs the LLM to answer based on retrieved context only.
- Prevents hallucinations by restricting external knowledge.

## Load Mistral-7B Model
load the quantized Mistral-7B model for response generation:

In [7]:
# Load the Model from Drive

from langchain_community.llms import LlamaCpp
from google.colab import drive
drive.mount('/content/drive')

model_path = "/content/drive/MyDrive/Models/mistral-7b-instruct-v0.1.Q6_K.gguf"

Mistral = LlamaCpp(
    model_path=model_path,
    n_gpu_layers=40,
    n_ctx=2048,
    temperature=0 # Deterministic responses
)

Mounted at /content/drive


llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /content/drive/MyDrive/Models/mistral-7b-instruct-v0.1.Q6_K.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.1
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7

**Model Configuration:**

- `temperature=0` ensures factual, deterministic responses.
- Runs on GPU for faster inference.

## Run the Full Pipeline
combine retrieval and generation to answer a user query:

In [8]:
# User question
query = "What SOS mean in iphone?"

# Retrieve relevant chunks
docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

# Format the prompt
chunks_formatted = "\n\n".join(retrieved_chunks)
prompt_formatted = prompt.format(chunks_formatted=chunks_formatted, query=query)

# Generate answer
answer = Mistral(prompt_formatted)

  answer = Mistral(prompt_formatted)
llama_perf_context_print:        load time =  522982.77 ms
llama_perf_context_print: prompt eval time =  522982.58 ms /   843 tokens (  620.38 ms per token,     1.61 tokens per second)
llama_perf_context_print:        eval time =   79965.84 ms /    76 runs   ( 1052.18 ms per token,     0.95 tokens per second)
llama_perf_context_print:       total time =  603051.79 ms /   919 tokens


In [9]:
print(answer)

 SOS on an iPhone means that the device is experiencing network connectivity issues. It can occur when you're traveling to a new area or in a remote location where there is limited or no cellular coverage. The SOS icon appears in the top-left corner of your screen, and it prevents you from accessing the internet, sending messages, or making phone calls.


## Conclusion
This notebook demonstrates:

✅ Web scraping support articles

✅ Chunking and embedding documents

✅ Retrieval-augmented generation (RAG) with Mistral-7B

✅ A working customer support chatbot