🎯 Objective:
To hire a passionate and experienced science educator with at least 5 years of teaching experience, who can teach Physics, Chemistry, or Mathematics to high school students, align lessons with NEP/CBSE/ICSE/National Curriculum standards, and contribute to STEM excellence.

✅ Key Responsibilities:
Design and deliver engaging science lessons using real-world examples and experiments.

Teach concepts across Physics, Mathematics, and General Science, integrating tools like digital labs, smartboards, and online simulations.

Prepare students for competitive exams and project-based learning.

Assess student progress and tailor lesson plans based on performance data.

Collaborate with faculty on cross-disciplinary projects.

Foster scientific thinking, inquiry, and curiosity among students.

🛠️ Required Skills:
In-depth subject knowledge in Physics, Chemistry, and/or Mathematics.

Hands-on experience with tools such as:

LabView, Arduino, MATLAB (preferred)

Microsoft Teams, Google Classroom, LMS platforms

Python basics or coding exposure for STEM enrichment is a plus

Strong communication and classroom management.

Ability to develop and execute science fairs, olympiads, or STEM clubs.

Creative use of teaching aids, multimedia content, and technology.

📘 Minimum Qualifications:
B.Ed. with M.Sc./M.Tech/B.Tech in Physics, Mathematics, or Chemistry.

Minimum 5 years of experience in a recognized school or coaching institute.

Exposure to NEP 2020 and competency-based teaching frameworks is advantageous.

🌟 Preferred Attributes:
Research background or published papers in educational innovation.

Participation in teacher training programs or curriculum design.

Prior experience teaching grades IX to XII or equivalent competitive exam preparation.

💼 Real-World Impact:
By hiring the right science educator, we aim to:

Elevate the school’s performance in STEM subjects.

Drive student engagement through inquiry-based learning.

Prepare students for future careers in science, technology, and engineering.

In [3]:
import pandas as pd

resume_dataset= pd.read_csv(r"C:\Users\sandi\Desktop\My Working Git\NLP Resume Parser\data\Resume\Resume.csv")

resume_dataset.sample(10)

Unnamed: 0,ID,Resume_str,Resume_html,Category
24,87968870,HR GENERALIST Summary Ener...,"<div class=""fontsize fontface vmargins hmargin...",HR
391,17481570,ASSISTANT TEACHER Summary A...,"<div class=""fontsize fontface vmargins hmargin...",TEACHER
613,23246831,BILLING ACCOUNTANT Summary ...,"<div class=""fontsize fontface vmargins hmargin...",ACCOUNTANT
512,95350373,CONSULTANT Professional Ove...,"<div class=""fontsize fontface vmargins hmargin...",CONSULTANT
760,20992320,MANAGEMENT CONSULTANT Skills ...,"<div class=""fontsize fontface vmargins hmargin...",BANKING
673,79041971,BANKING Summary High-ene...,"<div class=""fontsize fontface vmargins hmargin...",BANKING
261,24083609,INFORMATION TECHNOLOGY SPECIALIST (IN...,"<div class=""fontsize fontface vmargins hmargin...",INFORMATION-TECHNOLOGY
80,25724495,REGIONAL HR MANAGER Summary ...,"<div class=""fontsize fontface vmargins hmargin...",HR
700,27120528,MORTGAGE BANKING DEFAULT OPERATIONS S...,"<div class=""fontsize fontface vmargins hmargin...",BANKING
501,95429627,CONSULTANT Highlights ...,"<div class=""fontsize fontface vmargins hmargin...",CONSULTANT


First, let us see which unique job categories the resumes cover.

In [4]:
resume_dataset['Category'].unique()

array(['HR', 'DESIGNER', 'INFORMATION-TECHNOLOGY', 'TEACHER',
       'CONSULTANT', 'ACCOUNTANT', 'BANKING'], dtype=object)

We will shortlist the top 5 Science Teacher candidates for the position, so we first need to filter the dataset down to the TEACHER category.

In [5]:
teacher_resume = resume_dataset[resume_dataset['Category'] == 'TEACHER']

teacher_resume.head()


Unnamed: 0,ID,Resume_str,Resume_html,Category
337,12467531,TEACHER Professional Summary ...,"<div class=""LCA skn-cbg1 fontsize fontface vma...",TEACHER
338,19918523,TEACHER Summary I taught 5th...,"<div class=""fontsize fontface vmargins hmargin...",TEACHER
339,62184086,TEACHER Skills chart...,"<div class=""fontsize fontface vmargins hmargin...",TEACHER
340,28063132,TEACHER Summary Obtain a pos...,"<div class=""fontsize fontface vmargins hmargin...",TEACHER
341,29797594,TEACHER Skills E ducato...,"<div class=""fontsize fontface vmargins hmargin...",TEACHER


In [6]:
print(f"The total number of resumes for Teacher profiles is: {len(teacher_resume)}")

The total number of resumes for Teacher profiles is: 102


In [7]:
unique_resumes = teacher_resume['Resume_str'].drop_duplicates()
print(f"Number of unique resumes: {len(unique_resumes)}")

Number of unique resumes: 102


In [8]:
# Define keywords to search for
keywords = ["science"]  # one entry is enough: the match below uses case=False

# Combine keywords into a regex pattern
pattern = '|'.join(keywords)

# Filter resumes that contain any of the keywords (case-insensitive)
matches = teacher_resume[teacher_resume['Resume_str'].str.contains(pattern, case=False, na=False)]

# Show the matched resumes
matches[['ID', 'Resume_str', 'Category']]

Unnamed: 0,ID,Resume_str,Category
339,62184086,TEACHER Skills chart...,TEACHER
342,22408666,TEACHER Summary Kind an...,TEACHER
343,13087952,TEACHER Farrah M. Bauman ...,TEACHER
346,33704389,TEACHER Summary My applied...,TEACHER
347,16210888,TEACHER Core Accomplishment...,TEACHER
...,...,...,...
434,20478831,HOMEBOUND TEACHER Career Focus ...,TEACHER
435,79663360,SUBSTITUTE TEACHER Professional...,TEACHER
436,76196367,CLASSROOM TEACHER Summary ...,TEACHER
437,27524018,ASSISTANT TEACHER Career Focus ...,TEACHER


In [9]:
print(f"The total number of resumes for the Science Teacher profile is: {len(matches)}")

The total number of resumes for the Science Teacher profile is: 63
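
The filter above is a plain substring match, so it also counts words that merely contain "science". A minimal sketch of a whole-word variant follows; `mentions_keyword` and the sample strings are hypothetical and not part of the notebook.

```python
import re

# Hypothetical helper: whole-word, case-insensitive keyword match.
def mentions_keyword(text, keywords=("science",)):
    # \b anchors keep "conscience" or "sciences" from matching "science".
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

# Made-up resume snippets for illustration only.
samples = [
    "TEACHER Summary Taught Science to grade 6",
    "TEACHER Summary A matter of conscience and care",
    "TEACHER Skills Social sciences, art",
]
flags = [mentions_keyword(s) for s in samples]
print(flags)
```

With whole-word matching the plural "sciences" is excluded too, so add it to `keywords` if it should count.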


In [10]:
science_teacher_jd = """
We are looking for a passionate and experienced science educator for grade 5th to 7th with at least 5 years of experience in teaching Science. 
The candidate should be skilled in modern teaching techniques, classroom technology, have exposure to STEM-based learning. 
Hands-on experience with labs, simulations, and digital tools is preferred. The candidate must be well-versed in syllabus like CBSE, ICSE, and NEP 2020.
Skills include classroom delivery, curriculum design, tech-enabled instruction, student-centric Science Teaching with hands-on demonstrations.
"""

In [11]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Ensure we work on a copy to avoid SettingWithCopyWarning
teacher_resume = teacher_resume.copy()

# Fill NaN values in resume text
teacher_resume['Resume_str'] = teacher_resume['Resume_str'].fillna("")

# Combine job description and resumes
documents = [science_teacher_jd] + teacher_resume['Resume_str'].tolist()

# TF-IDF vectorization
vectorizer = TfidfVectorizer(stop_words='english', max_features=1000)
vectors = vectorizer.fit_transform(documents)

# Calculate cosine similarity
similarities = cosine_similarity(vectors[0:1], vectors[1:]).flatten()

# Assign similarity scores to dataframe
teacher_resume['match_score'] = similarities

# Sort top candidates
top_candidates = teacher_resume.sort_values(by='match_score', ascending=False).head(5)

# Display top candidates with ID, Resume_str, Resume_html, and match score
top_candidates[['ID', 'Resume_str', 'Resume_html', 'match_score']].reset_index(drop=True)

Unnamed: 0,ID,Resume_str,Resume_html,match_score
0,33704389,TEACHER Summary My applied...,"<div class=""fontsize fontface vmargins hmargin...",0.248734
1,69532425,PRE-SERVICE TEACHER Summary ...,"<div class=""fontsize fontface vmargins hmargin...",0.247312
2,28086303,TEACHER Summary Kind and com...,"<div class=""fontsize fontface vmargins hmargin...",0.239457
3,90363254,TEACHER Summary Highly e...,"<div class=""fontsize fontface vmargins hmargin...",0.188227
4,36206485,TEACHER Summary An elementar...,"<div class=""fontsize fontface vmargins hmargin...",0.176158
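
To make the match scores above easier to read, here is a rough pure-Python sketch of cosine similarity between two texts. It uses raw word counts rather than TF-IDF weights, so it is a simplification of the cell above, and the sample texts are invented.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two texts using raw word counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

# Made-up texts for illustration only.
jd = "science teacher with lab experience"
close_resume = "science teacher lab experience"
far_resume = "bank teller cash handling"

print(round(cosine(jd, close_resume), 3))
print(cosine(jd, far_resume))
```

A score is 1.0 only when two texts use the same words in the same proportions. A long resume compared with a four-line job description shares only a small fraction of its vocabulary, which is one likely reason the top scores above stay below 0.25.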


In [12]:
import re

def extract_fields_from_jd(jd_text):
    skills_keywords = re.findall(r"\b(?:skills?|proficient in|experienced in)\b.*?[.]", jd_text, re.IGNORECASE | re.DOTALL)
    experience_match = re.search(r"(\d+)\s*(?:\+)?\s*(?:years?|yrs?)\s+of\s+experience", jd_text, re.IGNORECASE)

    jd_skills = " ".join(skills_keywords)
    jd_experience = int(experience_match.group(1)) if experience_match else 0

    return jd_skills, jd_experience

jd_skills_text, jd_required_exp = extract_fields_from_jd(science_teacher_jd)

In [13]:
print(jd_skills_text)

Skills include classroom delivery, curriculum design, tech-enabled instruction, student-centric Science Teaching with hands-on demonstrations.


In [14]:
print(jd_required_exp)

5
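
The experience pattern in `extract_fields_from_jd` only fires on digits followed by "years of experience". The sketch below reuses that same regex on a few invented phrasings to show what it does and does not catch; `required_years` is a hypothetical wrapper.

```python
import re

# Same pattern as in extract_fields_from_jd above.
EXP_RE = re.compile(
    r"(\d+)\s*(?:\+)?\s*(?:years?|yrs?)\s+of\s+experience", re.IGNORECASE
)

def required_years(text):
    """Return the first 'N years of experience' figure, or 0 if none is found."""
    m = EXP_RE.search(text)
    return int(m.group(1)) if m else 0

print(required_years("at least 5 years of experience in teaching"))
print(required_years("3+ yrs of experience"))
print(required_years("five years of experience"))  # spelled-out number
print(required_years("5 years experience"))        # missing "of"
```

The last two phrasings fall back to 0, so a job description written that way would silently drop the experience requirement.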


In [15]:
from sentence_transformers import SentenceTransformer, util

# Load embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")  # Free and lightweight model

# Extract resume texts
resume_texts = top_candidates['Resume_str'].tolist()

# Compute embeddings
jd_embedding = model.encode(jd_skills_text, convert_to_tensor=True)
resume_embeddings = model.encode(resume_texts, convert_to_tensor=True)

# Compute cosine similarity scores
semantic_scores = util.cos_sim(jd_embedding, resume_embeddings).squeeze().tolist()

# Add to DataFrame
top_candidates['semantic_score'] = semantic_scores



In [16]:
import re

# A>B>C in terms of experience

def classify_experience(text, required_exp):
    if not isinstance(text, str):
        return "C"  # If the resume text is missing or invalid
    
    exp_match = re.search(r"(\d+)\s*(?:\+)?\s*(?:years?|yrs?)", text, re.IGNORECASE)
    if exp_match:
        years = int(exp_match.group(1))
    else:
        years = 0

    if years >= required_exp:
        return "A"
    elif years >= 1:
        return "B"
    else:
        return "C"

# Required experience in years; this matches the value extracted from the JD above (5)
jd_required_exp = 5

# Apply to each resume text
exp_classes = [classify_experience(resume, jd_required_exp) for resume in top_candidates['Resume_str']]

# Add to the DataFrame
top_candidates['experience_class'] = exp_classes
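
`classify_experience` uses the first "N years" it finds, which may be a short early stint rather than the candidate's longest one. A possible alternative, sketched under that assumption, takes the largest figure mentioned; `max_years_mentioned` and `classify` are hypothetical names.

```python
import re

YEARS_RE = re.compile(r"(\d+)\s*\+?\s*(?:years?|yrs?)\b", re.IGNORECASE)

def max_years_mentioned(text):
    """Largest 'N years' figure in the text, or 0 if none (or text is not a string)."""
    if not isinstance(text, str):
        return 0
    return max((int(n) for n in YEARS_RE.findall(text)), default=0)

def classify(text, required_exp):
    """Same A > B > C scheme as classify_experience, based on the largest figure."""
    years = max_years_mentioned(text)
    if years >= required_exp:
        return "A"
    if years >= 1:
        return "B"
    return "C"

# Made-up resume text for illustration only.
resume = "Tutored for 2 years, then 8 years as a science teacher"
print(classify(resume, 5))
```

This is still a heuristic: phrases such as "5 year old students" would be counted as experience, so results should be spot-checked against the resumes.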

In [17]:
from sentence_transformers import SentenceTransformer, util

# Step 1: Extract top candidate fields
top_resumes = top_candidates['Resume_str'].fillna("").tolist()
top_ids = top_candidates['ID'].tolist()
top_htmls = top_candidates['Resume_html'].tolist()

# Step 2: Compute semantic similarity scores
resume_embeddings = model.encode(top_resumes, convert_to_tensor=True)
semantic_scores = util.cos_sim(jd_embedding, resume_embeddings).squeeze().tolist()

# Step 3: Classify experience
exp_classes = [classify_experience(resume, jd_required_exp) for resume in top_resumes]

# Step 4: Build final DataFrame
scored_df = pd.DataFrame({
    "ID": top_ids,
    "Resume": top_resumes,
    "Resume_HTML": top_htmls,
    "Semantic_Score": semantic_scores,
    "Exp_Class": exp_classes
})

# Step 5: Rank and sort
exp_class_order = {"A": 1, "B": 2, "C": 3}
scored_df["Class_Rank"] = scored_df["Exp_Class"].map(exp_class_order)
# Highest semantic score first; on ties, the better experience class (lower rank) first
scored_df = scored_df.sort_values(by=["Semantic_Score", "Class_Rank"], ascending=[False, True])

# Step 6: Display final top candidates
scored_df[["ID", "Semantic_Score", "Exp_Class", "Resume_HTML"]].reset_index(drop=True).head(5)



Unnamed: 0,ID,Semantic_Score,Exp_Class,Resume_HTML
0,28086303,0.524466,C,"<div class=""fontsize fontface vmargins hmargin..."
1,36206485,0.440137,C,"<div class=""fontsize fontface vmargins hmargin..."
2,33704389,0.433565,C,"<div class=""fontsize fontface vmargins hmargin..."
3,69532425,0.400058,C,"<div class=""fontsize fontface vmargins hmargin..."
4,90363254,0.321361,A,"<div class=""fontsize fontface vmargins hmargin..."
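
Because the sort puts `Semantic_Score` first, the experience class only breaks exact ties, and the one class-A candidate ends up last. If experience should dominate instead, one option is to sort by class first. The sketch below uses the IDs, scores, and classes from the table above with plain tuples.

```python
# (ID, Semantic_Score, Exp_Class) copied from the output table above.
rows = [
    (28086303, 0.524466, "C"),
    (36206485, 0.440137, "C"),
    (33704389, 0.433565, "C"),
    (69532425, 0.400058, "C"),
    (90363254, 0.321361, "A"),
]
class_rank = {"A": 1, "B": 2, "C": 3}

# Best experience class first, then highest semantic score within each class.
by_class_then_score = sorted(rows, key=lambda r: (class_rank[r[2]], -r[1]))
ids = [r[0] for r in by_class_then_score]
print(ids)
```

Which ordering is appropriate depends on how firm the 5-year requirement is; this is a design choice rather than a correction.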


In [18]:
from IPython.display import display, HTML

# Show first resume in original format
display(HTML(scored_df.iloc[0]['Resume_HTML']))


In [None]:
import re
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from openai import OpenAI
import os
from dotenv import load_dotenv

# --- Assume data and JD are already defined ---
# science_teacher_jd: str (defined earlier)
# teacher_resume: pd.DataFrame (loaded from CSV, filtered on Category == 'TEACHER')
# Make sure 'Resume_str' and 'Resume_html' are present and non-null
teacher_resume['Resume_str'] = teacher_resume['Resume_str'].fillna("")

# --- Initialize embedding model ---
embed_model = SentenceTransformer('all-MiniLM-L6-v2')

# --- Embed Job Description ---
jd_embedding = embed_model.encode(science_teacher_jd, convert_to_numpy=True)

# --- Embed all resumes ---
resume_embeddings = embed_model.encode(teacher_resume['Resume_str'].tolist(), convert_to_numpy=True)

# --- Semantic similarity matching ---
similarities = cosine_similarity([jd_embedding], resume_embeddings).flatten()
teacher_resume['match_score'] = similarities

# --- Top N candidates based on semantic match ---
top_k = 5
top_candidates = teacher_resume.sort_values(by="match_score", ascending=False).head(top_k).copy()

# --- Initialize OpenRouter client (OpenAI-compatible API) ---

load_dotenv(dotenv_path=os.path.join(os.path.dirname(os.getcwd()), '.env'))
api_key = os.getenv("OPENROUTER_API_KEY")

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=api_key
)

REASONING_MODEL = "mistralai/mistral-small-3.2-24b-instruct:free"

# --- Reasoning-based reranking ---
ranked_results = []

for _, row in top_candidates.iterrows():
    prompt = f"""
You are a highly analytical recruiter specializing in hiring science teachers. Evaluate the following candidate's resume against the given job description.

---

### 🧾 Job Description:
{science_teacher_jd}

Focus particularly on:
1. **Skills** (most important): classroom delivery, curriculum design, tech-enabled instruction, student-centric science teaching, hands-on experiments/demonstrations.
2. **Experience** (secondary): total years and relevance to the job.

---

### 📄 Candidate Resume:
{row['Resume_str']}

---

### ✅ Evaluation Instructions:
- First, **match the candidate's skills** against the required ones.
- Second, consider **relevant experience** (but only if core skills are present).
- Be **objective** and **brief**.
- Assign a **final score out of 100**, and justify it clearly.

---

### 🧠 Output Format:
**Score:** XX/100  
**Strengths:**  
- (List strong matches with skills or experience)

**Concerns:**  
- (Mention mismatches, missing criteria, or unclear parts)
"""

    try:
        response = client.chat.completions.create(
            model=REASONING_MODEL,
            messages=[
                {"role": "system", "content": "You are a highly accurate, concise hiring assistant. Your job is to score candidates based on skill and experience match with scientific clarity."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.7
        )
        reasoning = response.choices[0].message.content
    except Exception as e:
        reasoning = f"Error in reasoning: {e}"

    ranked_results.append({
        "ID": row["ID"],
        "match_score": row["match_score"],
        "Resume_str": row["Resume_str"],
        "Resume_HTML": row.get("Resume_html", ""),  # Safely get HTML if present
        "LLM_reasoning": reasoning
    })

# --- Convert to final dataframe ---
final_df = pd.DataFrame(ranked_results)

# --- Extract numeric LLM score ---
def extract_score(text):
    match = re.search(r'(\d+(\.\d+)?)/100', text)
    return float(match.group(1)) if match else None

final_df['llm_score'] = final_df['LLM_reasoning'].apply(extract_score)

# --- Show result ---
final_df[["ID", "match_score", "llm_score", "LLM_reasoning", "Resume_HTML"]].head(top_k)



Unnamed: 0,ID,match_score,llm_score,LLM_reasoning,Resume_HTML
0,28086303,0.54768,,Error in reasoning: Error code: 401 - {'error'...,"<div class=""fontsize fontface vmargins hmargin..."
1,29930479,0.528443,,Error in reasoning: Error code: 401 - {'error'...,"<div class=""fontsize fontface vmargins hmargin..."
2,10504237,0.524966,,Error in reasoning: Error code: 401 - {'error'...,"<div class=""fontsize fontface vmargins hmargin..."
3,13087952,0.518985,,Error in reasoning: Error code: 401 - {'error'...,"<div class=""fontsize fontface vmargins hmargin..."
4,33704389,0.513293,,Error in reasoning: Error code: 401 - {'error'...,"<div class=""fontsize fontface vmargins hmargin..."
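
Every row above failed with a 401, which normally signals an authentication problem. One plausible cause, though not confirmed by the output, is that `OPENROUTER_API_KEY` was never loaded from the `.env` file. A small guard like the following, run before the loop, would surface that early; `require_api_key` is a hypothetical helper.

```python
import os

def require_api_key(name="OPENROUTER_API_KEY", env=None):
    """Return the key from the environment, or fail fast with a clear message."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; check the .env path passed to load_dotenv"
        )
    return key

# Illustration with an explicit mapping instead of the real environment.
print(require_api_key(env={"OPENROUTER_API_KEY": " abc "}))
```

If the key is present and the 401 persists, the key itself may be invalid or revoked, which this check cannot detect.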


In [19]:
from IPython.display import display, HTML

# Show first resume in original format
display(HTML(final_df.iloc[0]['Resume_HTML']))


In [20]:
from IPython.display import display, Markdown

# Show the LLM's reasoning for the first candidate (the prompt asks for Markdown output)
display(Markdown(final_df.iloc[0]['LLM_reasoning']))


In [21]:
# to_csv returns None when given a path, so there is nothing to assign
final_df.to_csv(r"C:\Users\sandi\Desktop\My Working Git\NLP Resume Parser\data\Resume\top_candidates.csv", index=False)