**Install the required dependencies**

1. langchain
2. chromadb
3. openai
4. gradio
5. pypdf
6. langchain-openai

In [11]:
!pip install --quiet langchain chromadb openai gradio pypdf langchain-openai


**Chroma DB did not work with the system SQLite 3.31 (Chroma requires SQLite 3.35.0 or newer), so SQLite 3.41.2 was built from source.**

In [12]:
# Download SQLite source
!wget https://www.sqlite.org/2023/sqlite-autoconf-3410200.tar.gz

# Extract
!tar -xzf sqlite-autoconf-3410200.tar.gz

# Build and install locally in ~/sqlite3
!cd sqlite-autoconf-3410200 && ./configure --prefix=$HOME/sqlite3 && make && make install


--2025-09-24 18:36:03--  https://www.sqlite.org/2023/sqlite-autoconf-3410200.tar.gz
Resolving www.sqlite.org (www.sqlite.org)... 194.195.208.62, 2600:3c02::f03c:95ff:fe07:695
Connecting to www.sqlite.org (www.sqlite.org)|194.195.208.62|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3125545 (3.0M) [application/x-gzip]
Saving to: ‘sqlite-autoconf-3410200.tar.gz.9’


2025-09-24 18:36:04 (6.82 MB/s) - ‘sqlite-autoconf-3410200.tar.gz.9’ saved [3125545/3125545]

checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a race-free mkdir -p... /usr/bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports the include directive... yes (GNU style)
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out


In [13]:
!pip install --quiet pysqlite3-binary
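Building SQLite under `~/sqlite3` does not by itself change the SQLite library that Python's built-in `sqlite3` module is linked against, which is why `pysqlite3-binary` (a wheel that bundles its own, newer SQLite) is installed here. Below is a minimal sketch of a version check, assuming Chroma's minimum of SQLite 3.35.0. `ensure_modern_sqlite` is a hypothetical helper, and the module swap only helps if it runs before `chromadb` is first imported:

```python
import sqlite3
import sys

# Assumed minimum SQLite version required by Chroma.
MIN_SQLITE = (3, 35, 0)


def sqlite_is_new_enough(version_info=None, minimum=MIN_SQLITE):
    """Return True if the given (or the currently linked) SQLite version meets the minimum."""
    if version_info is None:
        version_info = sqlite3.sqlite_version_info
    return tuple(version_info) >= tuple(minimum)


def ensure_modern_sqlite():
    """Hypothetical helper: swap in pysqlite3 when the stdlib SQLite is too old.

    Must run before `import chromadb`, since Chroma checks the version when it is imported.
    """
    if sqlite_is_new_enough():
        return sqlite3.sqlite_version
    import pysqlite3  # provided by the pysqlite3-binary package

    sys.modules["sqlite3"] = pysqlite3
    return pysqlite3.sqlite_version


print("Linked SQLite version:", sqlite3.sqlite_version)
```

On the machine described above, the stdlib check would fail for 3.31 and the helper would fall back to `pysqlite3`.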

**Import all necessary modules.**

In [19]:
# Imports
# Swap in pysqlite3 (bundles a newer SQLite) before anything imports chromadb,
# because the system sqlite3 (3.31) is too old for Chroma.
import sys
import pysqlite3
sys.modules['sqlite3'] = pysqlite3

import os
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate

import gradio as gr


**Set the OpenAI API key**

In [20]:
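# OpenAIEmbeddings and ChatOpenAI read the key from the OPENAI_API_KEY environment variable.
# To load it from a local .env file instead, install python-dotenv and uncomment:
#from dotenv import load_dotenv, find_dotenv
#_ = load_dotenv(find_dotenv())  # read local .env file into os.environ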
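When no key is passed explicitly, the `langchain_openai` classes fall back to the `OPENAI_API_KEY` environment variable. A small fail-fast check makes a missing key obvious before the vector store is built, rather than partway through embedding. `get_openai_key` is a hypothetical helper, not part of LangChain:

```python
import os


def get_openai_key(env=None):
    """Return the OpenAI key from the environment, raising early if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it or add it to a .env file "
            "before creating OpenAIEmbeddings or ChatOpenAI."
        )
    return key
```

Calling `get_openai_key()` with no arguments checks the real process environment.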

**1. Load the PDF file with a document loader. 2. Split the document text into chunks.**


In [21]:
# Load Nestlé HR policy PDF
loader = PyPDFLoader("Nestle-Hr-Policy-Sample.pdf")
documents = loader.load()

# Split into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(documents)
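# PyPDFLoader returns one Document per page; print a summary instead of every chunk
print(f"Loaded {len(documents)} pages, split into {len(docs)} chunks")
print(docs[0].page_content[:500])  # preview the first chunk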
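With `chunk_size=1000` and `chunk_overlap=200`, each chunk holds roughly 1,000 characters and neighbouring chunks share roughly 200, so a short passage cut at one boundary is likely to appear whole in the adjacent chunk. `CharacterTextSplitter` itself splits on a separator (by default `"\n\n"`) and then merges pieces up to `chunk_size`, so its boundaries follow the text, and a single piece longer than `chunk_size` is kept as an oversized chunk. The sketch below is a simplified fixed-window splitter that only illustrates the size and overlap arithmetic:

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts 800 chars after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(chr(ord("a") + i % 26) for i in range(2500))
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

For a 2,500-character text, windows start at 0, 800, and 1600, and the last 200 characters of each chunk reappear at the start of the next.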



In [22]:
embedding = OpenAIEmbeddings()
vector_store = Chroma.from_documents(docs, embedding, collection_name="nestle_hr_policy")
print("Chroma vector store created successfully.")


Chroma vector store created successfully.
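`Chroma.from_documents` embeds every chunk with `OpenAIEmbeddings` and stores the vectors. Because no `persist_directory` is given, the collection lives in memory and is rebuilt each time the notebook runs. At query time the retriever embeds the question and returns the nearest chunks. The toy sketch below shows exact top-k nearest-neighbour search with cosine similarity. Chroma itself uses an approximate HNSW index with L2 distance by default, which ranks unit-length vectors (such as OpenAI embeddings) in the same order as cosine similarity. The vectors and chunk names here are made up:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=4):
    """Return the k (chunk_id, score) pairs most similar to query_vec, best first."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real OpenAI embeddings have far more dimensions.
toy_store = {
    "leave-policy": [0.9, 0.1, 0.0],
    "dress-code": [0.1, 0.9, 0.1],
    "overtime": [0.7, 0.3, 0.2],
    "travel": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], toy_store, k=2))
```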


In [33]:
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
import gradio as gr

# Assuming vector_store is already created and loaded
# e.g., vector_store = Chroma.from_documents(...)

# 1. Define prompt template with 'context' and 'input' (the keys the retrieval chain fills in)
prompt_template = """
You are a helpful HR assistant for Nestlé. Use only the context below to answer the question.
If the answer is not found in the context, say: "I'm sorry, I couldn't find that information in the policy."

Context:
{context}

Question:
{input}

Helpful Answer:
"""

prompt = PromptTemplate.from_template(prompt_template)

# 2. Setup LLM and chains
chat_model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
document_chain = create_stuff_documents_chain(llm=chat_model, prompt=prompt)
retriever = vector_store.as_retriever(search_kwargs={"k": 4})
qa_chain = create_retrieval_chain(retriever, document_chain)
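`create_stuff_documents_chain` "stuffs" all retrieved chunks (here `k=4`) into the single `{context}` slot, and `create_retrieval_chain` passes the user's question through under the `input` key, which is why the template uses `{input}` rather than `{question}`. By default the chunks' `page_content` values are joined with a blank line. A plain-Python sketch of that formatting step, with placeholder chunk texts and a shortened template:

```python
def stuff_documents(page_contents, separator="\n\n"):
    """Join chunk texts into one context string, as a 'stuff' chain does by default."""
    return separator.join(page_contents)


def build_prompt(template, page_contents, question):
    """Fill {context} and {input} the way the retrieval chain does before calling the LLM."""
    return template.format(context=stuff_documents(page_contents), input=question)


TEMPLATE = "Context:\n{context}\n\nQuestion:\n{input}\n\nHelpful Answer:"
example_chunks = ["<text of chunk 1>", "<text of chunk 2>"]
print(build_prompt(TEMPLATE, example_chunks, "How many days of annual leave do I get?"))
```

Because every retrieved chunk goes into one prompt, `k` and `chunk_size` together bound how much text the model sees per question.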

In [34]:
# gradio was already imported above; restart the kernel for the upgraded version to take effect
!pip install --upgrade gradio


Defaulting to user installation because normal site-packages is not writeable


In [36]:
# 3. Define Gradio interface function
def chatbot_interface(query):
    result = qa_chain.invoke({"input": query})  # create_retrieval_chain expects the 'input' key
    return result["answer"]

# 4. Launch Gradio app with bigger output box
interface = gr.Interface(
    fn=chatbot_interface,
    inputs=gr.Textbox(lines=2, placeholder="Ask about Nestlé HR policies..."),
    outputs=gr.Textbox(lines=10),   # <-- bigger output box with 10 lines
    title="Nestlé HR Assistant",
    description="Ask any question about Nestlé’s HR policy. The assistant will provide accurate information based on internal documents."
)

interface.launch(share=True)  # share=True also creates a temporary public URL that anyone with the link can use


* Running on local URL:  http://127.0.0.1:7864
* Running on public URL: https://fe1cc4f89df17c91d2.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
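The chain built by `create_retrieval_chain` takes a dict with an `input` key and returns a dict containing `input`, `context` (the retrieved documents), and `answer`; the Gradio function only needs `answer`. The sketch below is a variant of the interface function with a guard against empty input. It is exercised against a hypothetical `FakeRetrievalChain` stub so it runs without an OpenAI key, and passing `chain` as a parameter is a design choice for testability, not how the notebook wires it:

```python
class FakeRetrievalChain:
    """Hypothetical stand-in for qa_chain that mimics create_retrieval_chain's output keys."""

    def invoke(self, inputs):
        question = inputs["input"]  # a dict keyed by "question" would raise KeyError here
        return {"input": question, "context": [], "answer": f"(answer to: {question})"}


def safe_chatbot_interface(query, chain):
    """Reject blank queries, run the chain, and return only the generated answer text."""
    query = (query or "").strip()
    if not query:
        return "Please enter a question about the HR policy."
    result = chain.invoke({"input": query})
    return result["answer"]


print(safe_chatbot_interface("What is the notice period?", FakeRetrievalChain()))
```

To use it with the real chain, `fn=lambda q: safe_chatbot_interface(q, qa_chain)` could be passed to `gr.Interface`.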


