This project implements an HR Chatbot using LangChain, MongoDB, OpenAI's language models, and Google APIs. It includes synthetic data generation, embedding creation, and a chatbot interface for querying HR-related information and interacting with Google services.
- Prerequisites
- Installation
- Configuration
- Google API Setup
- Synthetic Data Generation
- Data Ingestion and Embedding Generation
- Running the Chatbot
- Project Structure
- Contributing
- License
Prerequisites

- Python 3.8+
- MongoDB
- OpenAI API key
- Google Cloud Platform account
- Git (for version control)
Installation

- Clone the repository and move into it: `git clone https://github.com/your-username/hr-agentic-chatbot.git`, then `cd hr-agentic-chatbot`.
- Create and activate a virtual environment: `python -m venv venv`, then `source venv/bin/activate` (on Windows, use `venv\Scripts\activate`).
- Install the required packages: `pip install -r requirements.txt`.
Configuration

- Create a `.env` file in the project root and add the following variables, one per line: `MONGO_URI=your_mongodb_connection_string` and `OPENAI_API_KEY=your_openai_api_key`.
- Replace `your_mongodb_connection_string` with your actual MongoDB connection string and `your_openai_api_key` with your OpenAI API key.
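The README doesn't show how `config.py` reads these values (many projects use python-dotenv's `load_dotenv()` for this). As a hedged, standard-library-only sketch, the hypothetical helpers below load `.env` into the environment and report which required variables are missing, which can catch a typo before you run ingestion or the chatbot.

```python
import os

# The two variables this README asks for; the project may read others in config.py.
REQUIRED_VARS = ("MONGO_URI", "OPENAI_API_KEY")


def load_env_file(path=".env"):
    """Minimal .env reader: copy KEY=VALUE lines into os.environ without overriding existing values."""
    with open(path) as env_file:
        for raw_line in env_file:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blank lines, comments, and malformed lines
            # Split on the first "=" only, so URIs containing "=" stay intact.
            key, value = line.split("=", 1)
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def missing_settings(required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]


if __name__ == "__main__":
    load_env_file()
    missing = missing_settings()
    if missing:
        raise SystemExit(f"Missing settings in .env: {', '.join(missing)}")
    print("MONGO_URI and OPENAI_API_KEY are set.")
```

Values already present in the shell environment take precedence over `.env`, which matches the default behavior of `load_dotenv()`.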
Google API Setup

- Go to the Google Cloud Console.
- Create a new project or select an existing one.
- Enable the following APIs for your project:
  - Google Drive API
  - Google Docs API
  - Gmail API
- Create credentials (OAuth 2.0 Client ID) for a desktop application:
  - Go to "Credentials" in the left sidebar.
  - Click "Create Credentials" and select "OAuth client ID".
  - Choose "Desktop app" as the application type.
- Download the client configuration file and rename it to `credentials.json`.
- Place `credentials.json` in the root directory of the project.
- The first time you run the application, it will prompt you to authorize access:
  - A browser window will open asking you to log in to your Google account.
  - Grant the requested permissions.
  - The application will then create a `token.json` file in the project root.

Note: Keep `credentials.json` and `token.json` secure and do not share them publicly.
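The README doesn't show how the project obtains Google credentials. A common pattern, taken from Google's Python quickstarts, uses `InstalledAppFlow` from `google-auth-oauthlib`; the sketch below assumes that approach and reproduces the first-run behavior described above (browser consent, then a cached `token.json`). The `SCOPES` list is an assumption, not the project's actual list, and `get_credentials` is a hypothetical name.

```python
import os

# Assumed scopes for Drive, Docs, and sending Gmail; the project's real scopes may differ.
SCOPES = [
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/documents",
    "https://www.googleapis.com/auth/gmail.send",
]


def get_credentials(credentials_path="credentials.json", token_path="token.json"):
    """Return authorized Google credentials, running the browser consent flow only when needed."""
    # Imported lazily so this module loads even where the Google libraries are not installed.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    if os.path.exists(token_path):
        # Reuse the token saved by a previous run.
        creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            # Refresh an expired access token without prompting the user.
            creds.refresh(Request())
        else:
            # First run: open a browser window so the user can grant access.
            flow = InstalledAppFlow.from_client_secrets_file(credentials_path, SCOPES)
            creds = flow.run_local_server(port=0)
        # Cache the credentials so later runs skip the browser step.
        with open(token_path, "w") as token_file:
            token_file.write(creds.to_json())
    return creds
```

With `google-api-python-client` installed, the result can be passed to `googleapiclient.discovery.build`, for example `build("drive", "v3", credentials=get_credentials())`. If you change the scopes, delete `token.json` so the consent flow runs again.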
Synthetic Data Generation

To generate synthetic data for companies, workforce, and employees:

- Navigate to the `data` directory: `cd data`
- Run the synthetic data generation script: `python synthetic_data_generation.py`

This will create JSON files (`companies.json`, `workforce.json`, `employees.json`) in the `data` directory.
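Before ingesting, it can help to confirm that all three files were written. The hypothetical helper below is meant to be run from the project root and assumes each file holds a top-level JSON array; the README doesn't describe the schema, so adjust it if the generator writes something else.

```python
import json
from pathlib import Path

EXPECTED_FILES = ("companies.json", "workforce.json", "employees.json")


def summarize_generated_data(data_dir="data"):
    """Return {filename: record_count}, assuming each file holds a top-level JSON array."""
    summary = {}
    for name in EXPECTED_FILES:
        path = Path(data_dir) / name
        if not path.exists():
            raise FileNotFoundError(f"{path} is missing; run synthetic_data_generation.py first")
        with path.open() as f:
            records = json.load(f)
        # If a file holds a single object instead of a list, count it as one record.
        summary[name] = len(records) if isinstance(records, list) else 1
    return summary


if __name__ == "__main__":
    for name, count in summarize_generated_data().items():
        print(f"{name}: {count} records")
```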
Data Ingestion and Embedding Generation

To ingest the synthetic data into MongoDB and generate embeddings for employees:

- Make sure you are in the project root directory (run `cd ..` if you are still in `data` from the previous step).
- Run the data ingestion script: `python data/ingestion.py`

This script will:

- Read the JSON files from the `data` directory
- Generate embeddings for employee data using OpenAI's API
- Insert the data (including embeddings) into MongoDB
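The steps above can be sketched for the employee file as follows. This is a hedged illustration, not the contents of `ingestion.py`: the text format, embedding model (`text-embedding-3-small`), database name, and collection name are all assumptions, and `MONGO_URI` and `OPENAI_API_KEY` are assumed to already be in the environment. The embedding function is passed in as a parameter so the batching logic can be exercised without network access.

```python
import json
import os


def employee_to_text(employee):
    """Flatten one employee record into "key: value" pairs for embedding (hypothetical format)."""
    return " | ".join(f"{key}: {value}" for key, value in employee.items() if key != "embedding")


def attach_embeddings(records, embed_batch, batch_size=100):
    """Return copies of `records` with an `embedding` field, calling `embed_batch` once per batch."""
    enriched = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        vectors = embed_batch([employee_to_text(record) for record in batch])
        for record, vector in zip(batch, vectors):
            enriched.append({**record, "embedding": vector})  # originals are left unchanged
    return enriched


def openai_embed_batch(texts, model="text-embedding-3-small"):
    """Embed a batch of strings with the OpenAI API (the model name is an assumption)."""
    from openai import OpenAI  # imported lazily so the helpers above work without it

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]


def ingest_employees(path="data/employees.json", db_name="hr_database", collection="employees"):
    """Load employees, embed them, and insert them into MongoDB (db/collection names are hypothetical)."""
    from pymongo import MongoClient

    with open(path) as f:
        employees = json.load(f)
    docs = attach_embeddings(employees, openai_embed_batch)
    if docs:  # insert_many rejects an empty list
        MongoClient(os.environ["MONGO_URI"])[db_name][collection].insert_many(docs)
    return len(docs)
```

Batching keeps the number of API calls proportional to `len(records) / batch_size` rather than one request per employee.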
Running the Chatbot

To start the HR Chatbot:

- Make sure you are in the project root directory.
- Start the Chainlit app: `chainlit run app.py`
- Open your web browser and navigate to the URL printed in the terminal (usually `http://localhost:8000`).
- Interact with the chatbot through the web interface.
Project Structure

HR_AGENTIC_CHATBOT/
│
├── .chainlit/
├── .files/
├── data/
│ ├── __init__.py
│ ├── companies.json
│ ├── employees.json
│ ├── ingestion.py
│ ├── synthetic_data_generation.py
│ └── workforce.json
│
├── mongodb/
│ ├── __init__.py
│ ├── checkpointer.py
│ └── connect.py
│
├── tools/
│ ├── google_tools.py
│ └── mongodb_tools.py
│
├── .env
├── .gitignore
├── agent.py
├── app.py
├── chainlit.md
├── config.py
├── credentials.json # Google OAuth 2.0 credentials
├── db_utils.py
├── graph.py
├── README.md
├── requirements.txt
├── temp.py
├── token.json # Generated after first Google auth
└── utilities.py
Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.