# Exit Interview App notebook
The Exit Interview & Onboarding Assistant notebook is designed to streamline knowledge transfer between departing employees and incoming new hires. The core objective of this application is to preserve the valuable insights and experiences of employees as they leave the organization and pass this knowledge to new hires during their onboarding process. This helps ensure that crucial knowledge is not lost, making the transition smoother for new employees.

We use essential components from `LangChain` and other libraries to build and run the language model that generates the exit and onboarding interview insights. Summarization runs through the Hugging Face Hub (`HuggingFaceHub` in LangChain), which is cost-efficient and fast.



## You’ll need several libraries for this:

* `langchain` for workflow orchestration.
* `sentence-transformers` for embeddings.
* `faiss` for vector database operations.
* `openai` for using an LLM.
* `pandas` to structure interview data.
* `groq` to build a custom LLM flow.


In [35]:
! pip install langchain sentence-transformers faiss-cpu openai pandas
! pip install langchain_community
! pip install groq

Collecting groq
  Downloading groq-0.18.0-py3-none-any.whl.metadata (14 kB)
Downloading groq-0.18.0-py3-none-any.whl (121 kB)
Installing collected packages: groq
Successfully installed groq-0.18.0


In [36]:
# Import required libraries
import openai
import faiss
import requests
from groq import Groq
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer
from langchain.llms import BaseLLM
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.agents import initialize_agent, Tool, AgentType
from langchain.llms import OpenAI
from langchain.llms import HuggingFaceHub


In [81]:
# Set the Hugging Face Hub API token (replace the placeholder with your actual token)
huggingfacehub_api_token = "<your-huggingfacehub-api-token>"
model_name = "llama-3.3-70b-versatile"  # Groq model name; only used if you switch to the GroqLLM class

In [9]:
# Load the Sentence Transformer model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [10]:
# Create a FAISS index to store the embeddings (for fast search)
dimension = 384  # Dimensionality of embeddings from the 'all-MiniLM-L6-v2'
index = faiss.IndexFlatL2(dimension)  # Index to perform fast similarity search
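
The index above uses L2 distance, while the later cells normalize every embedding to unit length before adding or searching. A minimal sketch, using plain NumPy and random vectors as hypothetical stand-ins for real embeddings, of why that combination ranks results the same way cosine similarity would: for unit vectors, the squared L2 distance equals `2 - 2 * cosine`.

```python
import numpy as np

def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (faiss.normalize_L2 does this in place)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / norms

# Two random 384-dimensional vectors standing in for sentence embeddings
rng = np.random.default_rng(0)
a, b = l2_normalize(rng.normal(size=(2, 384)).astype(np.float32))

cosine = float(a @ b)
squared_l2 = float(np.sum((a - b) ** 2))  # IndexFlatL2 reports squared distances

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(squared_l2, 2 - 2 * cosine)
```

A smaller distance therefore always corresponds to a higher cosine similarity, so no separate inner-product index is needed for this use case.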


In [11]:
# Sample exit interview questions based on the employee's designation
def generate_exit_interview_questions(designation):
    if designation == "Software Engineer":
        return [
            "What technologies did you work with on a daily basis?",
            "How did you manage project deadlines?",
            "Can you explain the most challenging bug or issue you resolved?",
            "What improvements would you suggest for the development process?"
        ]
    elif designation == "Sales Manager":
        return [
            "What strategies did you use to engage clients?",
            "Can you describe a recent successful sales pitch?",
            "How do you manage and track sales performance?",
            "What suggestions do you have for improving the sales pipeline?"
        ]
    # Add more designations as needed
    else:
        return ["Please describe your key responsibilities in your role."]


In [12]:
# Define the function to collect exit interview responses
def collect_exit_interview_responses(designation):
    questions = generate_exit_interview_questions(designation)
    responses = []
    for question in questions:
        print(f"Question: {question}")
        response = input("Answer: ")  # This is where you would collect responses from the employee
        responses.append(response)
    return questions, responses

Storing answers in a vector database: once the exit interview answers are collected, they can be stored in a vector database (such as FAISS or Pinecone) so that they can be retrieved by similarity to a new employee's question. This lets the system return the stored answer that is closest in meaning to what the new hire asks.
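
One detail worth illustrating is that the vector store and the table of answers have to stay row-aligned: the position of a vector in the index is the only link back to its text. The sketch below is a toy stand-in, not the notebook's implementation. `AnswerStore` is a hypothetical helper, a NumPy array replaces the FAISS index, a plain list replaces the DataFrame, and the two-dimensional vectors are made up for illustration.

```python
import numpy as np

class AnswerStore:
    """Toy store that keeps vectors and question-answer records row-aligned."""

    def __init__(self, dimension: int):
        self.vectors = np.empty((0, dimension), dtype=np.float32)
        self.records = []  # list of (question, answer) tuples, one per vector row

    def add(self, questions, answers, embeddings):
        embeddings = np.asarray(embeddings, dtype=np.float32)
        if not (len(questions) == len(answers) == len(embeddings)):
            raise ValueError("questions, answers and embeddings must have equal length")
        # Append vectors and records together so row i always matches record i
        self.vectors = np.vstack([self.vectors, embeddings])
        self.records.extend(zip(questions, answers))

    def nearest(self, query_embedding):
        query = np.asarray(query_embedding, dtype=np.float32)
        distances = np.sum((self.vectors - query) ** 2, axis=1)  # squared L2
        row = int(np.argmin(distances))
        return self.records[row][1]

store = AnswerStore(dimension=2)
store.add(["Q1"], ["Python and PyTorch"], [[1.0, 0.0]])
store.add(["Q2"], ["Two-week sprints"], [[0.0, 1.0]])  # second interview, same store
best = store.nearest([0.1, 0.9])
print(best)
```

Appending to a single record list matters once more than one interview is stored: the index keeps growing across calls, so the lookup table has to grow with it.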

In [13]:

# Store responses in the FAISS database
def store_responses_in_faiss(questions, responses):
    # FAISS expects float32 arrays, so convert before normalizing in place
    embeddings = np.asarray(embedding_model.encode(responses), dtype=np.float32)
    faiss.normalize_L2(embeddings)  # Normalize embeddings before adding to the index
    index.add(embeddings)  # Add to the FAISS index

    # Save the question-answer pairs in a DataFrame for later use
    df = pd.DataFrame({"Question": questions, "Answer": responses})
    return df

In [14]:
# Retrieve answers from FAISS for a new joinee's question
def retrieve_answer_from_faiss(question, k=1):
    # FAISS expects float32 arrays, so convert before normalizing in place
    question_embedding = np.asarray(embedding_model.encode([question]), dtype=np.float32)
    faiss.normalize_L2(question_embedding)

    D, I = index.search(question_embedding, k)  # Retrieve the top k similar answers

    # Return distances (D) and row indices (I); the caller maps I back to DataFrame rows
    return D, I
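
The retrieval function returns raw distances and row indices, so something still has to turn them into answer texts. A minimal sketch of that step follows, assuming the same `(D, I)` shapes that `index.search` returns. `lookup_answers` and its `max_distance` cutoff are hypothetical additions, and the arrays below are hand-written rather than produced by FAISS.

```python
import numpy as np

def lookup_answers(distances, indices, answers, max_distance=1.0):
    """Map one query's FAISS-style search results back to answer texts.

    max_distance is a hypothetical cutoff on the squared L2 distance;
    it would need tuning on real interview data.
    """
    matches = []
    for dist, idx in zip(distances[0], indices[0]):
        if idx == -1:  # FAISS pads with -1 when the index holds fewer than k vectors
            continue
        if dist > max_distance:  # too far away to count as a relevant answer
            continue
        matches.append((answers[int(idx)], float(dist)))
    return matches

answers = ["I used Python daily.", "We ran two-week sprints."]
D = np.array([[0.35, 1.60, 3.4e38]], dtype=np.float32)  # hand-written distances
I = np.array([[1, 0, -1]], dtype=np.int64)              # hand-written row indices
result = lookup_answers(D, I, answers)
print(result)
```

A cutoff like this gives the application a way to say "no relevant answer found" instead of always returning the nearest stored response, however unrelated it is.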

In [62]:
from typing import Any, List, Optional
from langchain.llms.base import LLM

class GroqLLM(LLM):
    """Minimal LangChain wrapper around the Groq chat completions API."""
    # Declared as fields because LangChain LLM classes are pydantic models
    api_key: str
    model_name: str

    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs: Any) -> str:
        """Send the prompt to the Groq model API and return the response text."""
        client = Groq(api_key=self.api_key)
        chat_completion = client.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            model=self.model_name
        )
        return chat_completion.choices[0].message.content

    @property
    def _llm_type(self) -> str:
        """Return the type of the model."""
        return "groq"

In [78]:
# Define a simple LLM-based summarizer to enhance responses
def summarize_answer_using_llm(answer):
    # Keep {answer} as a template placeholder so PromptTemplate fills it in at run time
    template = "Please summarize the following answer in a concise manner:\n{answer}"
    # You can also use Groq instead:
    # model = GroqLLM(api_key="<groq_api_key>", model_name=model_name)
    model = HuggingFaceHub(repo_id="EleutherAI/gpt-neox-20b", huggingfacehub_api_token=huggingfacehub_api_token, model_kwargs={"temperature": 0.7})
    chain = LLMChain(llm=model, prompt=PromptTemplate(input_variables=["answer"], template=template))
    summarized_answer = chain.run(answer=answer)
    return summarized_answer

# Main application function for onboarding new employee
def onboard_new_employee(question, df):
    D, I = retrieve_answer_from_faiss(question)
    # Retrieve the most relevant answer
    relevant_answer = df.iloc[I[0][0]]["Answer"]
    # Summarize the answer
    summarized_answer = summarize_answer_using_llm(relevant_answer)
    return summarized_answer
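
In the outputs recorded in this notebook, the text returned by the summarizer appears to begin with the prompt itself, which can happen when a text-generation model returns the prompt followed by its continuation. A minimal sketch of a post-processing step that removes such an echo is shown below. `strip_prompt_echo` is a hypothetical helper, not part of LangChain, and the strings are invented examples.

```python
def strip_prompt_echo(prompt: str, completion: str) -> str:
    """Return the completion without a leading copy of the prompt, if present."""
    if completion.startswith(prompt):
        return completion[len(prompt):].lstrip()
    return completion.strip()

# Invented example: the model output repeats the prompt before the summary
prompt = "Please summarize the following answer in a concise manner:\nWe ran two-week sprints."
echoed = prompt + "\n\nThe team worked in two-week sprints."
cleaned = strip_prompt_echo(prompt, echoed)
print(cleaned)
```

To use it, the prompt would have to be formatted once, passed to the model, and then passed to the helper together with the model output.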


## Collect the knowledge base for future hires

In [16]:
designation = input("Enter the employee's designation (e.g., Software Engineer, Sales Manager): ")

Enter the employee's designation (e.g., Software Engineer, Sales Manager): Software Engineer


In [17]:
questions, responses = collect_exit_interview_responses(designation)

Question: What technologies did you work with on a daily basis?
Answer: As an AI Software Engineer, I worked primarily with the following technologies on a daily basis:  Python: The primary language I used for building AI models, machine learning pipelines, and automation scripts. TensorFlow / PyTorch: These deep learning frameworks were essential for building and training AI models, especially for tasks like NLP, image classification, and reinforcement learning. Langchain: For building conversational AI applications and integrating LLMs (Large Language Models) into various workflows. scikit-learn: For traditional machine learning models like decision trees, SVMs, and clustering algorithms. SQL / NoSQL databases: I regularly interacted with databases like PostgreSQL, MongoDB, and Elasticsearch for storing and querying large datasets used in training and inference. Docker & Kubernetes: For containerizing AI models and scaling them in a cloud environment. Git: Version control for collabo

# Store the answers as embeddings

In [20]:
# Step 2: Store the responses in the FAISS vector database
df = store_responses_in_faiss(questions, responses)



# **INFERENCE**

In [72]:
# Step 3: New employee asks a question (onboarding interview)
new_employee_question = input("Enter a question from the new employee: ")
summarized_answer = onboard_new_employee(new_employee_question, df)

print("\nSummarized answer for the new employee:")
print(summarized_answer)

Enter a question from the new employee: how to manage deadlines?

Summarized answer for the new employee:
Please summarize the following answer in a concise manner:
To manage project deadlines, I implemented the following strategies:  Agile Methodology: I adopted Scrum and worked in 2-week sprints with daily standups, task breakdowns, and regular backlog grooming. This ensured that progress was constantly tracked, and we could adapt to changing requirements or delays quickly. Task Prioritization: I used tools like Jira or Trello to track tasks and prioritize them based on their importance and complexity. I would break down larger tasks into smaller milestones to keep the team on track. Time Management Tools: I leveraged tools like Pomodoro Technique for focused work periods and Time Blocking for allocating specific times to tasks, ensuring I focused on critical work and avoided distractions. Effective Communication: Regular check-ins with the project manager and team members helped ide



In [73]:
new_employee_question = input("Enter a question from the new employee: ")
summarized_answer = onboard_new_employee(new_employee_question, df)

print("\nSummarized answer for the new employee:")
print(summarized_answer)

Enter a question from the new employee: What strategies or techniques do you recommend for managing large datasets efficiently?

Summarized answer for the new employee:
Please summarize the following answer in a concise manner:
As an AI Software Engineer, I worked primarily with the following technologies on a daily basis:  Python: The primary language I used for building AI models, machine learning pipelines, and automation scripts. TensorFlow / PyTorch: These deep learning frameworks were essential for building and training AI models, especially for tasks like NLP, image classification, and reinforcement learning. Langchain: For building conversational AI applications and integrating LLMs (Large Language Models) into various workflows. scikit-learn: For traditional machine learning models like decision trees, SVMs, and clustering algorithms. SQL / NoSQL databases: I regularly interacted with databases like PostgreSQL, MongoDB, and Elasticsearch for storing and querying large datasets



In [31]:
new_employee_question = input("Enter a question from the new employee: ")
summarized_answer = onboard_new_employee(new_employee_question, df)

print("\nSummarized answer for the new employee:")
print(summarized_answer)

Enter a question from the new employee: what will be the new challenges for me?





Summarized answer for the new employee:
Please summarize the following answer in a concise manner:
Here are some key improvements I would suggest for the development process:  More Robust Testing: Implement automated tests for the entire machine learning pipeline, from data preprocessing to model training and evaluation. Unit tests for data cleaning, feature engineering, and custom functions can help catch bugs early. Model Monitoring: Implement a proper monitoring system for deployed models. Tracking model drift and ensuring that models are still accurate over time with new data is essential for AI systems. Tools like MLflow or TensorBoard can provide insights into model performance. Improved Collaboration: Foster greater collaboration between data scientists, AI engineers, and other software engineers. Better communication between teams on deployment, integration, and scaling will make sure models are more easily integrated into production systems. CI/CD for AI Models: Implement con