
**This code imports `sqlite3`, NumPy, and gensim's downloader API, then loads a pre-trained FastText model. The model maps each word to a 300-dimensional embedding vector that can be used for text processing and other natural language tasks.**

In [None]:
import sqlite3
import numpy as np
import gensim.downloader as api

# Load a valid pre-trained FastText model
# Check the available models at https://github.com/RaRe-Technologies/gensim-data
model = api.load('fasttext-wiki-news-subwords-300')
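
**A minimal sketch of how word vectors like these can be pooled into one embedding for a phrase, by averaging the vectors of the known words. The `toy_vectors` dictionary and the `embed_text` helper are hypothetical stand-ins so the example runs without downloading the model; they are not part of gensim.**

```python
import numpy as np

# Hypothetical 3-dimensional vectors standing in for the 300-dimensional FastText ones
toy_vectors = {
    "python": np.array([1.0, 0.0, 0.0]),
    "react": np.array([0.0, 1.0, 0.0]),
    "sql": np.array([0.0, 0.0, 1.0]),
}

def embed_text(text, vectors, dim=3):
    """Average the vectors of the known words in `text`; return zeros if none are known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)

# "developer" is not in the toy vocabulary, so only two words contribute
phrase_vec = embed_text("Python React developer", toy_vectors)
print(phrase_vec.tolist())  # → [0.5, 0.5, 0.0]
```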



**This code block defines a list of example engineers. Each engineer is a dictionary of attributes (name, company, skills, full-time status, budget) plus a placeholder for an embedding. The block then installs the `faiss-cpu` package, imports `faiss`, and collects the embeddings into a NumPy array for further processing.**

In [None]:
# Example engineer data
engineer_data = [
    {"name": "John Doe", "company": "Google", "skills": ["Python", "React"], "full_time": True, "budget": 10000, "embedding": None},  # Placeholder, filled in below
    {"name": "Jane Smith", "company": "Microsoft", "skills": ["AWS", "SQL"], "full_time": True, "budget": 12000, "embedding": None},
    {"name": "Bob Johnson", "company": "Facebook", "skills": ["Python", "Node.js"], "full_time": False, "budget": 8000, "embedding": None},
]

!pip install faiss-cpu
import faiss

# Assumption: embed each engineer as the mean FastText vector of their skills,
# skipping any skill that is missing from the model's vocabulary
for eng in engineer_data:
    vectors = [model[skill] for skill in eng["skills"] if skill in model]
    eng["embedding"] = np.mean(vectors, axis=0) if vectors else np.zeros(model.vector_size)

# Extract embeddings into a float32 NumPy array (the dtype FAISS expects)
embeddings = np.array([eng["embedding"] for eng in engineer_data], dtype=np.float32)
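
**To make the upcoming nearest-neighbor step concrete: a flat L2 index such as `faiss.IndexFlatL2` compares the query against every stored row and reports squared L2 distances. The sketch below reproduces that exhaustive search in plain NumPy. The `flat_l2_search` helper and the 2-dimensional vectors are hypothetical and only illustrate the computation.**

```python
import numpy as np

def flat_l2_search(database, queries, k):
    """Return (squared distances, indices) of the k nearest database rows per query."""
    # Pairwise squared L2 distances, shape (n_queries, n_database)
    diffs = queries[:, None, :] - database[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Indices of the k smallest distances for each query, closest first
    order = np.argsort(sq_dists, axis=1)[:, :k]
    return np.take_along_axis(sq_dists, order, axis=1), order

# Hypothetical 2-dimensional embeddings, one row per engineer
database = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]], dtype=np.float32)
queries = np.array([[0.9, 0.1]], dtype=np.float32)

D, I = flat_l2_search(database, queries, k=2)
print(I.tolist())  # → [[1, 0]]
```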

**This code defines a generator function, `process_query`, which takes a user query, the engineer embeddings, a FAISS index, a tokenizer, a pre-trained model, and a database connection. It embeds the query, searches the FAISS index for the nearest neighbors, filters the matches on scalar requirements such as full-time status and budget, looks the matches up in SQL, and yields the results. Once the results are exhausted it prints a follow-up question for the user. The cell ends with an example call.**

In [None]:
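def process_query(query, embeddings, index, tokenizer, model, conn,
                  full_time=None, max_budget=None):
    # Preprocess the user query, keeping only tokens the FastText model knows
    tokens = [t for t in tokenizer(query) if t in model]
    if not tokens:
        return

    # Generate the query embedding as the mean of its word vectors;
    # FAISS expects a float32 array of shape (n_queries, dim)
    query_embedding = np.mean([model[t] for t in tokens], axis=0)
    query_embedding = query_embedding.astype(np.float32).reshape(1, -1)

    # Search for the nearest neighbors (never request more than are indexed)
    k = min(20, len(embeddings))
    D, I = index.search(query_embedding, k)

    # Look up each neighbor in SQL and filter on the scalar requirements
    sql = "SELECT name, company, full_time, budget FROM engineers WHERE id = ?"
    with conn:
        for dist, idx in zip(D[0], I[0]):
            # sqlite3 needs a plain int, not a NumPy integer
            row = conn.execute(sql, (int(idx),)).fetchone()
            if row is None:
                continue
            name, company, is_full_time, budget = row
            if full_time is not None and bool(is_full_time) != full_time:
                continue
            if max_budget is not None and budget > max_budget:
                continue
            yield f"{name} at {company} (L2 distance {dist:.3f})"

    # Follow up with the user
    print("Would you prefer a full-time or part-time worker?")


# Example usage
query = "I want to hire someone with experience in Python and Node.js. My budget is $10000 a month."
tokenizer = str.split  # Assumption: whitespace tokenization is enough for this demo

# Store the scalar attributes in SQLite, keyed by each engineer's position in the index
conn = sqlite3.connect('engineers.db')
with conn:
    conn.execute("CREATE TABLE IF NOT EXISTS engineers "
                 "(id INTEGER PRIMARY KEY, name TEXT, company TEXT, full_time INTEGER, budget REAL)")
    conn.executemany(
        "INSERT OR REPLACE INTO engineers VALUES (?, ?, ?, ?, ?)",
        [(i, e["name"], e["company"], int(e["full_time"]), e["budget"])
         for i, e in enumerate(engineer_data)],
    )

# Build the index: one float32 row per engineer, dimension taken from the columns
embeddings = np.ascontiguousarray(embeddings, dtype=np.float32)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
response = list(process_query(query, embeddings, index, tokenizer, model, conn, max_budget=10000))
print(response)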
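
**The scalar filtering step can also be pushed entirely into SQL, so that only rows meeting the requirements come back. The sketch below assumes a hypothetical `engineers` schema and a hypothetical `filter_candidates` helper, and uses an in-memory database so it runs on its own. The candidate ids stand in for the positions returned by the nearest-neighbor search.**

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE engineers (id INTEGER PRIMARY KEY, name TEXT, full_time INTEGER, budget REAL)"
)
conn.executemany(
    "INSERT INTO engineers VALUES (?, ?, ?, ?)",
    [(0, "John Doe", 1, 10000), (1, "Jane Smith", 1, 12000), (2, "Bob Johnson", 0, 8000)],
)

def filter_candidates(conn, candidate_ids, full_time, max_budget):
    """Keep only the candidate ids that satisfy both scalar requirements."""
    if not candidate_ids:
        return []
    # One "?" per candidate id keeps the query fully parameterized
    placeholders = ",".join("?" for _ in candidate_ids)
    sql = (
        f"SELECT id, name FROM engineers WHERE id IN ({placeholders}) "
        "AND full_time = ? AND budget <= ? ORDER BY id"
    )
    # Convert ids to plain int so NumPy integers from a search result bind cleanly
    params = [int(i) for i in candidate_ids] + [int(full_time), max_budget]
    return conn.execute(sql, params).fetchall()

matches = filter_candidates(conn, [2, 0, 1], full_time=True, max_budget=10000)
print(matches)  # → [(0, 'John Doe')]
```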