<a href="https://colab.research.google.com/github/sharmaraja/AI-Tools/blob/main/AI_Interview.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Install the packages this notebook actually imports (uncomment to run):
# !pip install mistralai langchain-community langchain-text-splitters faiss-cpu pypdf sentence-transformers

In [2]:
from google.colab import userdata  # Colab secrets store
from mistralai import Mistral
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

In [15]:
# 'mistral_02' is the name of the Colab secret that stores the Mistral API key
mistral_key = userdata.get('mistral_02')
if mistral_key:
    print("Mistral API key fetched")

Mistral API key fetched


In [17]:
client = Mistral(api_key=mistral_key)
MODEL = "mistral-small-latest"

In [5]:
# --------- Load PDFs ----------
def load_pdf(path):
    loader = PyPDFLoader(path)
    return loader.load()
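
Because the file paths are typed in by hand later in the notebook, it may help to check them before handing them to `load_pdf`. The following is a minimal sketch, not part of the original notebook: `validate_pdf_path` is a hypothetical helper, and the two checks (a `.pdf` extension and an existing file) are assumptions about what counts as valid input here.

```python
from pathlib import Path


def validate_pdf_path(raw_path: str) -> Path:
    """Return a cleaned Path if it looks like an existing PDF, else raise."""
    path = Path(raw_path.strip()).expanduser()
    # Check the extension first so obviously wrong inputs fail fast.
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {path.name!r}")
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return path
```

A call such as `load_pdf(str(validate_pdf_path(resume_path)))` would then fail with a readable message instead of an error from inside the PDF loader.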

In [6]:
def rag_impl(resume_docs, jd_docs):
    """Index the resume and job description, score the match, and generate questions."""

    documents = resume_docs + jd_docs

    # --------- Chunking ----------
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,
        chunk_overlap=100
    )
    chunks = splitter.split_documents(documents)

    # --------- Embeddings ----------
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )

    # --------- Vector store and retriever (top 6 chunks per query) ----------
    vectorstore = FAISS.from_documents(chunks, embeddings)
    retriever = vectorstore.as_retriever(search_kwargs={"k": 6})

    match_pct = get_match_percentage(retriever)
    print(f"\nResume–JD Match: {match_pct}%")

    # Only candidates scoring at least 60 proceed to question generation
    if match_pct < 60:
        print("❌ Match below 60%. Candidate rejected.")
    else:
        print("✅ Match above 60%. Generating interview questions...\n")
        questions = generate_questions(retriever)
        print(questions)
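
The `chunk_size=500` and `chunk_overlap=100` settings can be illustrated with a simplified fixed-window splitter. This is only a sketch of the overlap idea: `RecursiveCharacterTextSplitter` additionally tries to break on natural separators such as paragraphs and lines, and the helper name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.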

In [7]:
# --------- Match Percentage Prompt ----------
MATCH_PROMPT = """
You are an ATS system.

Given the CONTEXT below (resume + job description):
1. Calculate percentage match between resume and JD.
2. Consider skills, experience, tools, projects.
3. Output ONLY a number between 0 and 100.

CONTEXT:
{context}
"""

In [8]:
def get_match_percentage(retriever):
    docs = retriever.invoke("resume job description match")
    context = "\n".join([d.page_content for d in docs])

    response = client.chat.complete(
        model=MODEL,
        messages=[
            {"role": "user", "content": MATCH_PROMPT.format(context=context)}
        ]
    )
    return float(response.choices[0].message.content.strip())
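
The `float(...)` call above assumes the model replies with a bare number. A reply such as `75%` or `Match: 75` would raise a `ValueError`. A more tolerant parser might look like the sketch below; `parse_match_score` is a hypothetical helper, and taking the first number in the reply and capping it at 100 are assumptions, not behavior of the Mistral client.

```python
import re


def parse_match_score(reply: str) -> float:
    """Extract the first number from a model reply and cap it at 100."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"No numeric score found in model reply: {reply!r}")
    return min(float(match.group()), 100.0)
```

With this helper, the last line of `get_match_percentage` could become `return parse_match_score(response.choices[0].message.content)`.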

In [9]:
# --------- Question Generation Prompt ----------
QUESTION_PROMPT = """
You are a technical interviewer.

Using the CONTEXT:
- Job description requirements
- Skills mentioned in resume
- Projects done by candidate

Generate:
1. 5 technical questions on the job description
2. 3 project-based questions on the projects done by candidate
3. 2 skill-based questions on the skills mentioned in resume
4. 2 scenario-based questions based on the job description

CONTEXT:
{context}
"""

In [10]:
def generate_questions(retriever):
    docs = retriever.invoke("skills projects requirements")
    context = "\n".join([d.page_content for d in docs])

    response = client.chat.complete(
        model=MODEL,
        messages=[
            {"role": "user", "content": QUESTION_PROMPT.format(context=context)}
        ]
    )
    return response.choices[0].message.content
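
Both `get_match_percentage` and `generate_questions` rely on `retriever.invoke(...)` to return the `k` chunks whose embeddings lie closest to the query embedding. The toy sketch below shows that idea with hand-written 2-D vectors. The function name `top_k_chunks`, the sample vectors, and the use of Euclidean distance are assumptions for illustration; the real retriever searches high-dimensional sentence embeddings through FAISS.

```python
import math


def top_k_chunks(query_vec, indexed_chunks, k=6):
    """Return the texts of the k chunks nearest to query_vec.

    indexed_chunks is a list of (text, vector) pairs.
    """
    ranked = sorted(indexed_chunks, key=lambda item: math.dist(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


toy_index = [
    ("python skills", [1.0, 0.0]),
    ("hobbies", [5.0, 5.0]),
    ("ml projects", [0.0, 1.5]),
]
print(top_k_chunks([0.0, 0.0], toy_index, k=2))
# ['python skills', 'ml projects']
```

Since the resume and the job description share one index, a single query can return chunks mostly from one document, which is worth keeping in mind when reading the match score.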

In [18]:
# --------- Pipeline ----------
if __name__ == "__main__":

    # Get file paths from the user
    # Example inputs: Rajat__Sharma_AI_ML.pdf, Yabble Machine Learning Engineer Job Description.pdf
    resume_path = input("Enter resume PDF path (e.g. Rajat__Sharma_AI_ML.pdf): ").strip()
    jd_path = input("Enter Job Description PDF path: ").strip()

    # Load both PDFs
    resume_docs = load_pdf(resume_path)
    jd_docs = load_pdf(jd_path)

    print('Documents fetched!')
    print("\nProcessing...\n")

    # Build the index, score the match, and generate questions if the match is high enough
    rag_impl(resume_docs, jd_docs)

Enter resume PDF path (e.g. Rajat__Sharma_AI_ML.pdf): Rajat__Sharma_AI_ML.pdf
Enter Job Description PDF path: Yabble Machine Learning Engineer Job Description.pdf
Documents fetched!

Processing...


Resume–JD Match: 75.0%
✅ Match above 60%. Generating interview questions...

Here are the tailored questions based on the provided context:

---

### **1. 5 Technical Questions (Job Description Focused)**
1. **NLP/NLU/NLG**: Can you explain the difference between NLP, NLU, and NLG? How would you approach designing a system that combines all three?
2. **AI/ML Model Development**: Walk us through your process of designing, developing, and training an ML model from scratch. What frameworks (e.g., PyTorch, Keras) have you used, and why?
3. **Large Language Models (LLMs)**: How would you evaluate the performance of an LLM for a specific use case? What metrics would you prioritize?
4. **Data Engineering**: How do you handle data extraction, cleaning, and feature engineering for an AI/ML project? What SQL techniques do you use for data manipulation?
5. **Generative AI (RAGs, Agents)**: How would you implement a Retrieval-Augmented Generation (RAG) system? What challenges might arise, and how