# AI-Powered HR Assistant Project

This notebook demonstrates how to build a question-answering chatbot over Nestlé's HR policy document (a PDF). The project uses Python, OpenAI's GPT model, LangChain, FAISS and (optionally) Gradio. It was developed and tested on a Mac using JupyterLab launched from Terminal.

## 1. Environment Setup

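Install the required packages with pip. Run the command below from Terminal, or in a notebook code cell with a `%pip` prefix so the packages go into the kernel's environment. Packages that are already installed are reported as "Requirement already satisfied" and skipped. `langchain-community` and `tiktoken` are included because recent LangChain releases load the document loader, embeddings and FAISS wrapper from `langchain-community`, and its `OpenAIEmbeddings` class uses `tiktoken`:

```bash
pip install openai faiss-cpu langchain langchain-community tiktoken pypdf python-dotenv gradio
```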

After installation, import the necessary libraries and load your OpenAI API key from a `.env` file. The `.env` file should be placed in the project directory and contain a line like `OPENAI_API_KEY=your-key-here`. Loading the key via `dotenv` avoids hard-coding sensitive information in the notebook.

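```python
from dotenv import load_dotenv
import os

# Load environment variables from a .env file in the working directory
load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')

if api_key is None:
    raise ValueError('Please set OPENAI_API_KEY in a .env file before running this notebook.')

# Import the remaining libraries.
# In newer LangChain releases these legacy import paths re-export classes from
# the langchain-community package; if an import fails, import the same class
# from langchain_community (or langchain_openai) instead.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
import openai

# LangChain's OpenAI classes read OPENAI_API_KEY from the environment;
# this line also covers any direct calls to the openai package.
openai.api_key = api_key
```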
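To make the `.env` mechanism concrete, here is a simplified, hypothetical stand-in for what `load_dotenv()` does: read `KEY=VALUE` lines, skip blanks and comments, strip one layer of quotes, and copy the values into `os.environ` without overwriting variables that are already set (python-dotenv's default). The helper names `parse_env_lines` and `load_env_simple` are illustrative only; the real library also handles `export` prefixes, multi-line values and variable interpolation, so keep using `load_dotenv()` in the notebook.

```python
import os

def parse_env_lines(lines):
    """Parse KEY=VALUE lines into a dict (simplified, hypothetical helper)."""
    values = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith('#'):
            continue
        key, sep, value = line.partition('=')
        if not sep:
            continue  # ignore lines without an '=' sign
        # Trim whitespace and remove one layer of matching quotes
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ('"', "'"):
            value = value[1:-1]
        values[key.strip()] = value
    return values

def load_env_simple(lines, override=False):
    """Copy parsed values into os.environ; existing variables win unless override=True."""
    for key, value in parse_env_lines(lines).items():
        if override or key not in os.environ:
            os.environ[key] = value

sample = ['# credentials', 'OPENAI_API_KEY="sk-example"', '', 'MODE=dev']
print(parse_env_lines(sample))  # {'OPENAI_API_KEY': 'sk-example', 'MODE': 'dev'}
```

Not overwriting existing variables means a key exported in your shell takes precedence over the one in `.env`, which is usually what you want when switching between keys.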


## 2. Load and Split the PDF Document

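Use `PyPDFLoader` to read the HR policy PDF, then split the document into smaller chunks with `RecursiveCharacterTextSplitter`. Both `chunk_size` and `chunk_overlap` are measured in characters by default: larger chunks give the model more context per match, while smaller chunks make retrieval more precise. The overlap keeps sentences that straddle a chunk boundary from being cut off in both neighbours.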

```python
# Path to the HR policy PDF (update if necessary)
policy_path = '1728286846_the_nestle_hr_policy_pdf_2012.pdf'

# Load the PDF; PyPDFLoader returns one Document per page
loader = PyPDFLoader(policy_path)
documents = loader.load()

# Split into smaller chunks for embedding (sizes are in characters)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = text_splitter.split_documents(documents)
print(f'Total chunks created: {len(docs)}')
```
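The effect of `chunk_size` and `chunk_overlap` is easiest to see with a plain fixed-width splitter. The sketch below is a simplified illustration, not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to break on paragraph, line and word boundaries before falling back to single characters, whereas this hypothetical `sliding_window_chunks` helper cuts at exact character offsets. Each window advances by `chunk_size - chunk_overlap`, so neighbouring chunks share `chunk_overlap` characters.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=100):
    """Split text into fixed-size character windows sharing chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError('chunk_overlap must be smaller than chunk_size')
    step = chunk_size - chunk_overlap  # distance each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

demo = 'abcdefghij'  # 10 characters
print(sliding_window_chunks(demo, chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

With the notebook's settings (1000 and 100), each chunk repeats the last 100 characters of the previous one, which costs roughly 10% extra embedding work in exchange for fewer answers split across a boundary.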


## 3. Create Embeddings and Vector Store

Generate embeddings for each chunk using the OpenAI embeddings API. Store these vectors in a FAISS index for efficient similarity search.

```python
# Create the embedding model (reads OPENAI_API_KEY from the environment)
embeddings = OpenAIEmbeddings()

# Embed every chunk and build the FAISS vector store
vector_store = FAISS.from_documents(docs, embeddings)

print('Vector store created successfully')
```
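Conceptually, similarity search over the index is a nearest-neighbour lookup: embed the question, measure its distance to every stored chunk vector, and return the closest ones. The sketch below does this by brute force with squared Euclidean distance, which is what a flat L2 FAISS index computes exactly (FAISS does it far faster in optimised native code). It assumes LangChain's default flat L2 index; the 3-dimensional vectors are hypothetical toy values, whereas real OpenAI embeddings have hundreds or thousands of dimensions.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(query_vec, index_vecs, k=4):
    """Return (position, distance) pairs for the k closest vectors, nearest first."""
    scored = [(i, squared_l2(query_vec, v)) for i, v in enumerate(index_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]

# Hypothetical toy "embeddings" standing in for real chunk vectors
chunk_vectors = [
    [0.9, 0.1, 0.0],  # e.g. a chunk about leave
    [0.1, 0.8, 0.1],  # e.g. a chunk about compensation
    [0.8, 0.2, 0.1],  # e.g. another chunk about leave
]
query = [1.0, 0.0, 0.0]
print([i for i, _ in top_k(query, chunk_vectors, k=2)])  # [0, 2]
```

Because the score is a distance, lower values mean a closer match, which is also how to read the scores returned by `similarity_search_with_score` with the default index.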


## 4. Define the Question-Answering Function

To answer questions, retrieve the most relevant document chunks from the vector store and pass them along with the query to GPT-3.5 via LangChain's QA chain. Limiting the context to the retrieved documents helps keep answers grounded in the source material.

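```python
from langchain.chat_models import ChatOpenAI

# Chat model that generates the answers; temperature=0 favours consistent, factual output
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)

# The 'stuff' chain inserts all retrieved chunks into a single prompt
qa_chain = load_qa_chain(llm, chain_type='stuff')

def answer_query(query: str, k: int = 4) -> str:
    """Retrieve the k most relevant chunks and generate an answer with GPT-3.5."""
    # Each result is a (Document, score) pair; with LangChain's default FAISS index
    # the score is an L2 distance, so lower means more similar
    docs_and_scores = vector_store.similarity_search_with_score(query, k=k)
    # Keep only the documents
    retrieved_docs = [doc for doc, _score in docs_and_scores]
    # Run the chain on the retrieved context
    result = qa_chain.run(input_documents=retrieved_docs, question=query)
    return result

# Example usage (uncomment to test once your API key is set)
# print(answer_query('What is the probation period for new employees?'))
```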
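The `'stuff'` chain type is the simplest way to combine retrieved chunks: it concatenates all of them into one context block and sends that, together with the question, to the model in a single call. The sketch below shows the idea with a hypothetical `build_stuff_prompt` helper; the instruction wording is an assumption and not LangChain's built-in prompt, though joining chunks with blank lines mirrors how the chain separates documents.

```python
def build_stuff_prompt(chunks, question):
    """Join retrieved chunk texts into one prompt (simplified 'stuff' strategy).

    The instruction wording below is hypothetical, not LangChain's own template.
    """
    context = '\n\n'.join(chunks)  # every retrieved chunk goes into the same prompt
    return (
        'Answer the question using only the context below. '
        "If the answer is not in the context, say you don't know.\n\n"
        f'{context}\n\nQuestion: {question}\nAnswer:'
    )

prompt = build_stuff_prompt(
    ['Chunk A text.', 'Chunk B text.'],
    'What is the probation period for new employees?',
)
print(prompt)
```

Since everything is stuffed into one prompt, the total length of the `k` chunks must fit in the model's context window; with 1000-character chunks and `k=4` that is comfortably the case, but raising either value trades cost and context limits for recall.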


## 5. (Optional) Build a Gradio Interface

You can wrap the `answer_query` function in a Gradio interface to create a simple web-based chatbot. Refer to the provided Gradio documentation for more details. If you choose to skip this step, leave a comment when submitting your project.

```python
import gradio as gr

# Define the Gradio callback
def gradio_answer(question: str) -> str:
    return answer_query(question)

# Create the interface
interface = gr.Interface(
    fn=gradio_answer,
    inputs=gr.components.Textbox(lines=2, placeholder='Enter your question here...'),
    outputs=gr.components.Textbox(lines=5),
    title='Nestlé HR Policy Chatbot',
    description='Ask a question about the HR policy document and get an answer.'
)

# Launch the interface (add share=True to obtain a temporary public link)
# interface.launch(debug=True)
```
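Gradio simply calls the function passed as `fn` with the textbox contents and displays whatever string it returns, so input checks and error handling can live in an ordinary Python wrapper. The hypothetical `make_safe_callback` helper below returns a friendly message for empty input and turns exceptions (for example network or API errors) into readable text instead of a failed request. It is a sketch, not part of Gradio's API; the stub function stands in for `answer_query` so the example runs without an API key.

```python
def make_safe_callback(answer_fn, empty_message='Please enter a question.'):
    """Wrap a QA function so the UI always receives a readable string."""
    def callback(question):
        question = (question or '').strip()
        if not question:
            return empty_message
        try:
            return answer_fn(question)
        except Exception as exc:  # e.g. network or API errors raised by the chain
            return f'Sorry, something went wrong: {exc}'
    return callback

# Hypothetical stub in place of answer_query
def stub_answer(q):
    return f'Echo: {q}'

safe = make_safe_callback(stub_answer)
print(safe('  What is the leave policy?  '))  # Echo: What is the leave policy?
print(safe(''))  # Please enter a question.
```

In the notebook you would pass `fn=make_safe_callback(answer_query)` to `gr.Interface` instead of `gradio_answer`.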


## 6. Conclusion

This notebook outlines the process of building an AI-powered HR assistant using OpenAI's language model and vector similarity search. By loading and chunking a policy PDF, generating embeddings, building a FAISS index, and querying GPT-3.5 through LangChain, we can answer questions grounded in the original document. Optionally, the solution can be wrapped in a Gradio interface for an accessible end-user experience. When running the notebook on your own machine, set your API key in the `.env` file and update the PDF path as needed.