## Problem Statement: Automated Data Query and Retrieval System Using Offline (Free & Open-Source) Large Language Models With CSV, MongoDB, LlamaIndex, and LangChain

## Task 1

## 1. CSV Data Management:
1. You will be provided with a CSV file containing various columns of data.
2. Your first task is to write a Python script to load this data into a MongoDB collection.
3. Each row of the CSV should be stored as a separate document in the MongoDB database.

In [1]:
!pip install pymongo

Defaulting to user installation because normal site-packages is not writeable


In [2]:
# Import Required Libraries
import pandas as pd
from pymongo import MongoClient

In [3]:
# Connect to MongoDB
# Make sure MongoDB is running before executing this line
client = MongoClient("mongodb://localhost:27017/")  # Update if using a remote server
db = client["shri"]  # Create or use existing database
collection = db["demo"]  # Create or use existing collection

In [4]:
# Load CSV
csv_file = "sample_data.csv"  # Replace with your actual file
df = pd.read_csv(csv_file)

In [5]:
# Convert to dictionary and insert into MongoDB
data = df.to_dict(orient="records")
collection.insert_many(data)

print("CSV data successfully loaded into MongoDB!")

CSV data successfully loaded into MongoDB!


## Task 2

## 2. Dynamic Query Generation using LLM: 
1. The next step involves building a Python-based interface where the user can input the name of a CSV column header. 
2. Based on the user's input, you will use an LLM to generate a MongoDB query that can retrieve relevant data from the database. 
3. Ensure that the generated query is both syntactically correct and logically sound for the given input.
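One way to approach the "syntactically correct and logically sound" requirement is to validate the LLM output before it reaches the database. The following is a rough sketch under stated assumptions: the LLM is asked to return the filter as JSON, the documents are flat (as they are when loaded from a CSV), and the function name `validate_filter` and the blocklist `FORBIDDEN_OPERATORS` are hypothetical names chosen for illustration.

```python
import json

# Operators that run server-side JavaScript; blocked here as a precaution
FORBIDDEN_OPERATORS = {"$where", "$function", "$accumulator"}


def validate_filter(raw_query, allowed_fields):
    """Parse a JSON filter and check its field names against known columns.

    Returns the filter as a dict, or raises ValueError with a reason.
    """
    try:
        query = json.loads(raw_query)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(query, dict):
        raise ValueError("a MongoDB filter must be a JSON object")

    def walk(node):
        # Keys starting with "$" are operators; all other keys must be columns
        if isinstance(node, dict):
            for key, value in node.items():
                if key in FORBIDDEN_OPERATORS:
                    raise ValueError(f"operator {key} is not allowed")
                if not key.startswith("$") and key not in allowed_fields:
                    raise ValueError(f"unknown field: {key}")
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(query)
    return query


# Hypothetical column set; in the notebook this would come from df.columns
columns = {"name", "age", "city"}
print(validate_filter('{"age": {"$gt": 30}}', columns))
```

A check like this catches malformed JSON and misspelled column names, but it does not prove that the query answers the user's question, so reviewing the generated query by hand is still worthwhile.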

In [6]:
!pip install langchain_community

Defaulting to user installation because normal site-packages is not writeable


In [7]:
pip install gpt4all

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [8]:
pip install llama-index transformers sentence-transformers

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.



In [11]:
import pandas as pd
import tkinter as tk
from tkinter import filedialog, ttk

# Function to load CSV and extract column names
def load_csv():
    global df
    file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")])
    if file_path:
        df = pd.read_csv(file_path)  # Read the file chosen in the dialog
        column_dropdown["values"] = list(df.columns)  # Update dropdown with column names
        column_var.set("")  # Reset dropdown selection
        result_label.config(text="CSV Loaded Successfully! Select a column.")

# Function to show unique values of the selected column
def show_unique_values():
    column_name = column_var.get()
    # df only exists after a CSV has been loaded
    if "df" in globals() and column_name in df.columns:
        unique_values = df[column_name].dropna().unique()
        preview = ", ".join(map(str, unique_values[:10]))
        suffix = "..." if len(unique_values) > 10 else ""  # Mark truncated output
        result_label.config(text=f"Unique values:\n{preview}{suffix}")
    else:
        result_label.config(text="Please select a valid column.")

# GUI Setup
root = tk.Tk()
root.title("CSV Column Explorer")
root.geometry("500x300")

# Load CSV Button
load_button = tk.Button(root, text="Load CSV", command=load_csv)
load_button.pack(pady=10)

# Column Selection Dropdown
column_var = tk.StringVar()
column_dropdown = ttk.Combobox(root, textvariable=column_var)
column_dropdown.pack(pady=5)

# Show Values Button
show_button = tk.Button(root, text="Show Unique Values", command=show_unique_values)
show_button.pack(pady=10)

# Result Label
result_label = tk.Label(root, text="Load a CSV file to begin.", wraplength=400, justify="left")
result_label.pack(pady=10)

# Run the Tkinter Loop
root.mainloop()


## 3. Data Retrieval and Presentation:
1. Execute the MongoDB query generated by the LLM to fetch the required data from the database.
2. Once the data is retrieved, you have two options for presenting it:

In [None]:
# Generate a MongoDB query using the LLM

In [None]:
from gpt4all import GPT4All

# Load the local LLM; with allow_download=True, GPT4All downloads the model
# file if it doesn't exist locally
# Replace 'mistral-7b-instruct.gguf' with the correct model name if needed
llm = GPT4All("mistral-7b-instruct.gguf", allow_download=True)

# Define user input
user_question = "Get all users older than 30 from the users collection"

# Generate MongoDB query
prompt = f"Convert this into a MongoDB query: {user_question}"
mongo_query = llm.generate(prompt)

print("Generated Query:", mongo_query)  # Review the generated query
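The text returned by the model is free-form and may wrap the query in prose or in a shell-style call such as `db.users.find(...)`. A possible way to pull out the filter is sketched below. It assumes the model emits the filter as valid JSON (quoted keys); the function name `extract_json_object` and the sample reply are hypothetical.

```python
import json


def extract_json_object(text):
    """Return the first parseable JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value and ignores trailing text
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        return obj
    return None


# Hypothetical model reply, used here instead of a live llm.generate call
reply = 'Here is the query:\ndb.users.find({"age": {"$gt": 30}})'
query_filter = extract_json_object(reply)
print(query_filter)

# With a live connection, the filter could then be executed with:
# results = list(collection.find(query_filter))
```

If the model answers with unquoted keys or no JSON at all, the function returns `None`, which is a signal to re-prompt or to ask the user to rephrase.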