# Conversational Chatbot With Local Data

### Content (Processes)

* PDF data processing: Extracts text and splits it into manageable chunks.
* Query handling: Processes the input questions.
* Combining vector database and LLMs: We use LangChain to link vector database indexing with a local LLM loaded through llama.cpp (Mistral-7B-OpenOrca in this notebook), enabling a conversational experience with memory and retrieval.
* Hallucination check: Flags answers that are not supported by the retrieved source passages.

# Prerequisites

* Install all the libraries listed in the requirements.txt file with the "pip install -r requirements.txt" command.
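
After installing, a quick check that the packages can be imported may save time before running the notebook. The sketch below is only an illustration: the module names are assumed from the imports and classes used in this notebook, and the actual requirements.txt may list different or additional packages.

```python
import importlib.util

# Assumed module names, inferred from the imports used in this notebook.
# The real requirements.txt may differ.
REQUIRED_MODULES = [
    "PyPDF2", "langchain", "langchain_community",
    "sentence_transformers", "faiss", "llama_cpp", "tiktoken",
]

def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable")
```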

In [1]:
import PyPDF2

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.llms import LlamaCpp
#from langchain.llms import LlamaCpp


from langchain.embeddings import HuggingFaceEmbeddings, GPT4AllEmbeddings # import hf embedding
from langchain.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain


from langchain.prompts import PromptTemplate
from sentence_transformers import SentenceTransformer, util
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Step 1: Preparing PDF content and metadata

In [2]:
pdf_files=["C:/Users/Mrinal Kalita/Python Projects/AIML Capstone Project - CV - Pneumonia Detection-1.pdf"]

In [3]:
def process_pdf(pdf_files):
    """Extract the text of every page, plus a matching metadata entry, from each PDF."""
    content = []
    metadata = []

    for pdf_path in pdf_files:
        pdf_read = PyPDF2.PdfReader(pdf_path)
        for ind, page in enumerate(pdf_read.pages):
            # One entry per page; the title records the source file and page number.
            # Fall back to an empty string if a page yields no text.
            content.append(page.extract_text() or "")
            metadata.append({"title": f"{pdf_path} page {ind + 1}"})
    print("Content and metadata are extracted from the documents")
    return content, metadata

In [4]:
content, metadata = process_pdf(pdf_files)

Content and metadata are extracted from the documents


# Step 2: Split the content into smaller portions

The split_content function takes the text content and metadata as inputs and splits the content into smaller, overlapping passages (at most 512 tokens each, with a 256-token overlap between neighbours).

In [5]:
def split_content(content, metadata):
    splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=512,chunk_overlap=256)
    smaller_docs = splitter.create_documents(content, metadatas=metadata)
    print(f"Docs are split into {len(smaller_docs)} passages")
    return smaller_docs

In [6]:
smaller_docs=split_content(content, metadata)

Docs are split into 7 passages
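
To see what chunk_size and chunk_overlap mean, the sketch below splits a list of tokens with a plain sliding window. This is a simplified illustration, not what RecursiveCharacterTextSplitter does internally: the real splitter first breaks text on separators and then merges the pieces, so its chunk boundaries will not fall on fixed strides. The function name chunk_with_overlap is hypothetical.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size sharing chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks

# Toy example: 10 "tokens", windows of 4 that overlap by 2.
chunks = chunk_with_overlap(list(range(10)), chunk_size=4, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

With chunk_size=512 and chunk_overlap=256, as used above, each passage shares roughly half of its tokens with the next one, which lowers the chance that a sentence relevant to a question is cut in two.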


# Step 3: Ingest into Vector Database locally

The ingest_into_vectordb function embeds the passages, indexes them in a vector database using FAISS (Facebook AI Similarity Search) for efficient similarity search, and saves the index locally.
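
Conceptually, a similarity search over such an index returns the stored vectors closest to the query vector. The sketch below shows a brute-force version with Euclidean (L2) distance on toy 2-D vectors, assuming a flat, exhaustive index; FAISS performs the same kind of search far more efficiently. The helper names l2_distance and top_k are hypothetical.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, vectors, k):
    """Return the indices of the k vectors closest to query, nearest first."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]

# Toy "embeddings" for four passages and one query.
passage_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
query_vector = [0.9, 0.1]

nearest = top_k(query_vector, passage_vectors, k=2)
print(nearest)
```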

In [7]:
def ingest_into_vectordb(smaller_docs):
    emb = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2', model_kwargs={'device': 'cpu'})
    #emb = GPT4AllEmbeddings(model_name='all-MiniLM-L6-v2.gguf2.f16.gguf', gpt4all_kwargs={'allow_download': 'True'})
    db = FAISS.from_documents(smaller_docs, emb)

    DB_FAISS_PATH = 'vectorstore/db_faiss'
    db.save_local(DB_FAISS_PATH)
    return db

In [8]:
vector =ingest_into_vectordb(smaller_docs)


# Step 4: LLM prompt and conversation chain

The conversation_func function creates and configures a conversational retrieval chain: a local model loaded through LlamaCpp (here Mistral-7B-OpenOrca) answers questions using passages retrieved from the vector database, while a buffer memory holds the chat history.

In [9]:
template = """[INST]
As an AI expert, based on the provided document, please provide accurate, important and relevant information. Your responses should follow the following guidelines:
- Answer the question based on the provided documents.
- Be direct, factual and precise while answering, limited to 50 words and 2-3 sentences. Begin your response without using introductory phrases like yes, no etc.
- Maintain an ethical, unbiased and neutral tone, avoiding harmful or offensive content.
- If the document does not contain relevant information, state "The document doesn't have any relevant information available."
- Do not include questions in your responses.
- Answer the questions directly. Do not ask me questions.
{question}
[/INST]
"""

#template = """Given the document and the current conversation between a user and an agent, your task is as follows: Answer any user query by using information from the document. The response should be detailed."""
# Stream generated tokens to stdout as they are produced.
callback = CallbackManager([StreamingStdOutCallbackHandler()])

def conversation_func(vector):
    # Local GGUF model loaded through llama.cpp.
    llama_llm = LlamaCpp(
        model_path="C:/Users/Mrinal Kalita/langchain-notes/mistral-7b-openorca.gguf2.Q4_0.gguf",
        temperature=0.75,
        max_tokens=200,
        top_p=1,
        callback_manager=callback,
        n_ctx=3000)

    retriever = vector.as_retriever()
    # Note: this prompt is built but not passed to the chain below (the
    # condense_question_prompt argument is commented out), so the chain
    # runs with LangChain's default prompts.
    CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(template)

    # Keep the running chat history. output_key tells the memory which of the
    # chain's outputs (answer or source documents) to store.
    memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True, output_key='answer')

    conversation_chat = ConversationalRetrievalChain.from_llm(
        llm=llama_llm,
        retriever=retriever,
        #condense_question_prompt=CONDENSE_QUESTION_PROMPT,
        memory=memory,
        return_source_documents=True)
    print("Conversation function created for the LLM using the vector store")
    return conversation_chat
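
A conversational retrieval chain typically runs in two stages: when there is chat history, the follow-up question is first rewritten into a standalone question, and that question is then used to retrieve passages and generate the answer. The sketch below mimics this flow with stub functions in place of the LLM and the retriever. All names here (answer_with_history, fake_condense, fake_retrieve, fake_answer) are hypothetical and are not part of LangChain.

```python
def answer_with_history(question, chat_history, condense, retrieve, answer):
    """Hypothetical two-stage flow: condense the question, then retrieve and answer."""
    # Stage 1: with prior turns, rewrite the follow-up as a standalone question.
    standalone = condense(question, chat_history) if chat_history else question
    # Stage 2: fetch passages for the standalone question and answer from them.
    docs = retrieve(standalone)
    reply = answer(standalone, docs)
    chat_history.append((question, reply))
    return {"standalone_question": standalone,
            "answer": reply,
            "source_documents": docs}

# Stubs standing in for the LLM calls and the vector-store retriever.
def fake_condense(question, history):
    return f"(standalone) {question}"

def fake_retrieve(question):
    return [f"passage about: {question}"]

def fake_answer(question, docs):
    return f"answer to '{question}' from {len(docs)} passage(s)"

history = []
first = answer_with_history("What is the objective?", history,
                            fake_condense, fake_retrieve, fake_answer)
second = answer_with_history("How many steps?", history,
                             fake_condense, fake_retrieve, fake_answer)
print(first["standalone_question"])
print(second["standalone_question"])
```

This would be consistent with the outputs for questions 2 and 3 further down, where a rephrased question is generated before the actual answer, while question 1 (asked with an empty history) produces a single generation.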

In [10]:
con = conversation_func(vector)


Conversation function created for the LLM using the vector store



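# Step 5: Detect Hallucination in the LLM's Response

The validate_answer_against_sources function evaluates the reliability of a response by comparing its embedding with the embeddings of the retrieved source documents. The answer is accepted if its cosine similarity with at least one source exceeds 0.5.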

In [11]:
def validate_answer_against_sources(response_answer, source_documents):
    """Return True if the answer is semantically close to at least one source document."""
    model = SentenceTransformer('all-MiniLM-L6-v2')
    similarity_threshold = 0.5
    source_texts = [doc.page_content for doc in source_documents]

    # Embed the answer and every retrieved source passage.
    answer_embedding = model.encode(response_answer, convert_to_tensor=True)
    source_embeddings = model.encode(source_texts, convert_to_tensor=True)

    # cosine_scores has shape (1, number of sources).
    cosine_scores = util.pytorch_cos_sim(answer_embedding, source_embeddings)

    # The answer counts as grounded if any source exceeds the threshold.
    return any(score.item() > similarity_threshold for score in cosine_scores[0])
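
The decision rule inside this check can be illustrated without any model. The sketch below computes cosine similarity on toy vectors and applies the same 0.5 threshold; the vectors are made up for illustration, and the helper names cosine_similarity and passes_threshold are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def passes_threshold(answer_vec, source_vecs, threshold=0.5):
    """True if the answer vector is similar enough to at least one source vector."""
    return any(cosine_similarity(answer_vec, s) > threshold for s in source_vecs)

# Toy vectors: the first answer points in a direction close to one source,
# the second is unrelated to every source.
grounded = passes_threshold([1.0, 0.0], [[1.0, 1.0], [0.0, 1.0]])
ungrounded = passes_threshold([1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
print(grounded, ungrounded)
```

Note that a similarity threshold only measures topical closeness. An answer that is on topic but factually wrong could still pass, so the value 0.5 is a tunable trade-off rather than a guarantee.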

# Asking Questions to the Chatbot

# Question 1

In [12]:
user_question = "what is the objective of this project?"
response=con({"question": user_question})
print("Q: ",user_question)
print("A: ",response['answer'])


 The objective of this project is to design a DL based algorithm for detecting pneumonia.


llama_print_timings:        load time =    2894.31 ms
llama_print_timings:      sample time =       5.94 ms /    20 runs   (    0.30 ms per token,  3369.27 tokens per second)
llama_print_timings: prompt eval time =  369859.38 ms /   992 tokens (  372.84 ms per token,     2.68 tokens per second)
llama_print_timings:        eval time =    7729.95 ms /    20 runs   (  386.50 ms per token,     2.59 tokens per second)
llama_print_timings:       total time =  377675.37 ms /  1012 tokens


Q:  what is the objective of this project?
A:   The objective of this project is to design a DL based algorithm for detecting pneumonia.


In [13]:
if response['source_documents']:
    response_answer = response['answer']
    source_docs = response['source_documents']

    # Post-processing step to validate the answer against the source documents
    is_valid_answer = validate_answer_against_sources(response_answer, source_docs)
    if not is_valid_answer:
        response['answer'] = "Sorry I can not answer the question based on the given documents"
else:
    response['answer'] ="Sorry, I cannot answer the question based on the given documents"

print("Q: ",user_question)
print("A: ",response['answer'])

Q:  what is the objective of this project?
A:   The objective of this project is to design a DL based algorithm for detecting pneumonia.
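
The same post-processing step is repeated after every question, so it could be wrapped in a small helper. The sketch below is one possible shape, assuming the response is a dict with 'answer' and 'source_documents' keys as in this notebook. The names guard_answer and FALLBACK_ANSWER are hypothetical, and the validator is passed in so the helper can be tried with a stub.

```python
FALLBACK_ANSWER = "Sorry, I cannot answer the question based on the given documents"

def guard_answer(response, validator):
    """Return the answer if the validator accepts it against the sources, else a fallback."""
    sources = response.get("source_documents")
    if sources and validator(response["answer"], sources):
        return response["answer"]
    return FALLBACK_ANSWER

# Stub validator for illustration: accepts an answer only if a source contains it verbatim.
def stub_validator(answer, sources):
    return any(answer in source for source in sources)

accepted = guard_answer(
    {"answer": "pneumonia", "source_documents": ["detecting pneumonia"]},
    stub_validator)
rejected = guard_answer(
    {"answer": "relativity", "source_documents": ["detecting pneumonia"]},
    stub_validator)
no_sources = guard_answer({"answer": "anything", "source_documents": []},
                          stub_validator)
print(accepted, "|", rejected, "|", no_sources)
```

In the notebook, the call would look like guard_answer(response, validate_answer_against_sources).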


# Question 2

In [14]:
user_question = "How many steps are there in the project"
response=con({"question": user_question})
print("Q: ",user_question)
print("A: ",response['answer'])

Llama.generate: prefix-match hit


 Can you please provide an overview of the steps involved in the project?


llama_print_timings:        load time =    2894.31 ms
llama_print_timings:      sample time =       4.16 ms /    15 runs   (    0.28 ms per token,  3607.50 tokens per second)
llama_print_timings: prompt eval time =   27064.44 ms /    88 tokens (  307.55 ms per token,     3.25 tokens per second)
llama_print_timings:        eval time =    6551.60 ms /    14 runs   (  467.97 ms per token,     2.14 tokens per second)
llama_print_timings:       total time =   33661.22 ms /   102 tokens
Llama.generate: prefix-match hit


 The project involves three milestones. Milestone 1 is worth 40 points and focuses on importing data, mapping training and testing images to their classes and annotations, preprocessing and visualization, displaying images with bounding boxes, and designing, training, and testing basic CNN models for classification. Milestone 2, worth 60 points, involves fine-tuning trained CNN models, applying transfer learning models, designing and testing RCNN and its hybrids based object detection models to impose bounding boxes or masks, pickling the model for future prediction, and final report submission. Milestone 3 is optional and involves creating a clickable UI-based interface that allows users to input images, output classes, and bounding boxes or masks.


llama_print_timings:        load time =    2894.31 ms
llama_print_timings:      sample time =      46.83 ms /   159 runs   (    0.29 ms per token,  3395.62 tokens per second)
llama_print_timings: prompt eval time =  422851.76 ms /  1347 tokens (  313.92 ms per token,     3.19 tokens per second)
llama_print_timings:        eval time =   70556.20 ms /   158 runs   (  446.56 ms per token,     2.24 tokens per second)
llama_print_timings:       total time =  493956.45 ms /  1505 tokens


Q:  How many steps are there in the project
A:   The project involves three milestones. Milestone 1 is worth 40 points and focuses on importing data, mapping training and testing images to their classes and annotations, preprocessing and visualization, displaying images with bounding boxes, and designing, training, and testing basic CNN models for classification. Milestone 2, worth 60 points, involves fine-tuning trained CNN models, applying transfer learning models, designing and testing RCNN and its hybrids based object detection models to impose bounding boxes or masks, pickling the model for future prediction, and final report submission. Milestone 3 is optional and involves creating a clickable UI-based interface that allows users to input images, output classes, and bounding boxes or masks.


In [15]:
if response['source_documents']:
    response_answer = response['answer']
    source_docs = response['source_documents']

    # Post-processing step to validate the answer against the source documents
    is_valid_answer = validate_answer_against_sources(response_answer, source_docs)
    if not is_valid_answer:
        response['answer'] = "Sorry I can not answer the question based on the given documents"
else:
    response['answer'] ="Sorry, I cannot answer the question based on the given documents"

print("Q: ",user_question)
print("A: ",response['answer'])

Q:  How many steps are there in the project
A:   The project involves three milestones. Milestone 1 is worth 40 points and focuses on importing data, mapping training and testing images to their classes and annotations, preprocessing and visualization, displaying images with bounding boxes, and designing, training, and testing basic CNN models for classification. Milestone 2, worth 60 points, involves fine-tuning trained CNN models, applying transfer learning models, designing and testing RCNN and its hybrids based object detection models to impose bounding boxes or masks, pickling the model for future prediction, and final report submission. Milestone 3 is optional and involves creating a clickable UI-based interface that allows users to input images, output classes, and bounding boxes or masks.


# Question 3

In [16]:
user_question = "Who is Elber Einstein"
response=con({"question": user_question})
print("Q: ",user_question)
print("A: ",response['answer'])

Llama.generate: prefix-match hit


 "Who was Albert Einstein?"


llama_print_timings:        load time =    2894.31 ms
llama_print_timings:      sample time =       1.95 ms /     7 runs   (    0.28 ms per token,  3587.90 tokens per second)
llama_print_timings: prompt eval time =   67622.65 ms /   260 tokens (  260.09 ms per token,     3.84 tokens per second)
llama_print_timings:        eval time =    2575.24 ms /     6 runs   (  429.21 ms per token,     2.33 tokens per second)
llama_print_timings:       total time =   70221.83 ms /   266 tokens
Llama.generate: prefix-match hit


 Albert Einstein was a German-born physicist who is best known for his theory of relativity and his famous equation E=mc². He made significant contributions to the field of physics and is considered one of the most influential scientists in history.


llama_print_timings:        load time =    2894.31 ms
llama_print_timings:      sample time =      14.18 ms /    50 runs   (    0.28 ms per token,  3526.09 tokens per second)
llama_print_timings: prompt eval time =  450300.61 ms /  1255 tokens (  358.81 ms per token,     2.79 tokens per second)
llama_print_timings:        eval time =   23405.70 ms /    49 runs   (  477.67 ms per token,     2.09 tokens per second)
llama_print_timings:       total time =  473882.47 ms /  1304 tokens


Q:  Who is Elber Einstein
A:   Albert Einstein was a German-born physicist who is best known for his theory of relativity and his famous equation E=mc². He made significant contributions to the field of physics and is considered one of the most influential scientists in history.


In [17]:
if response['source_documents']:
    response_answer = response['answer']
    source_docs = response['source_documents']

    # Post-processing step to validate the answer against the source documents
    is_valid_answer = validate_answer_against_sources(response_answer, source_docs)
    if not is_valid_answer:
        response['answer'] = "Sorry I can not answer the question based on the given documents"
else:
    response['answer'] ="Sorry, I cannot answer the question based on the given documents"

print("Q: ",user_question)
print("A: ",response['answer'])

Q:  Who is Elber Einstein
A:  Sorry I can not answer the question based on the given documents


The chatbot behaves as intended. Question 3 is outside the scope of the document, so the hallucination check replaced the model's answer with the fallback message.

# Running The Streamlit App

To run the Streamlit app on your local system, open a command prompt and run "streamlit run app.py".

# Challenges Encountered

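* Limited compute resources: in the runs above, prompt evaluation ran at roughly 3 tokens per second, so each answer took several minutes.
* Some LLMs are available only as paid services.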