<a href="https://colab.research.google.com/github/prayag2301/ACP_proto/blob/shivam/Sentiment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [27]:
'''Model Used:
Utilizes distilbert-base-uncased-finetuned-sst-2-english, a DistilBERT model fine-tuned for binary sentiment classification (POSITIVE/NEGATIVE).

Device Compatibility:
Loads the tokenizer and model, moves the model to the available device (GPU if CUDA is available, otherwise CPU), and moves tokenized inputs to the same device.

Text Sections Analyzed:
From CV: summary and experience descriptions
From JD: summary and responsibilities

Function - analyze_sentiment:
Tokenizes input text (with padding and truncation to 512-token limit).
Runs tokenized inputs through the model to get logits.
Applies softmax to convert logits into probabilities.
Returns sentiment label (POSITIVE/NEGATIVE) and confidence score.
Returns NEUTRAL with score 0.5 if input is empty or an error occurs.

Storing Results:
CV results stored in sentiment_scores
JD results stored in jd_sentiment_scores

Function - compare_sentiments:
Compares CV and JD sentiment per section.

Computes:
Label alignment: 1 if sentiment labels match, 0 otherwise.
Score difference: absolute difference between confidence scores.
Alignment score: max(0, 1 - score difference) * label alignment, so mismatched labels always score 0.

Purpose of Comparison:
Quantifies tonal alignment between candidate and job (e.g., summary alignment score = 0.9985 → strong match).

Application in Enrichment:
Cultural Fit Indicators: e.g., “Strong cultural fit with a dynamic, growth-oriented environment”
JD Enrichment: e.g., “Candidate likely prefers roles involving strategic leadership”

Goal:
Improve candidate-job matching by aligning sentiment tone, indicating cultural fit and implied preferences.
'''

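The scoring steps described in the notes above can be sketched without loading the model. This is a minimal illustration, not the notebook's implementation: `label_and_confidence` and `alignment_score` are hypothetical helper names, the logits and confidence values are made up for the example, and the label order (index 0 = NEGATIVE, index 1 = POSITIVE) is assumed to match the SST-2 checkpoint.

```python
import math

LABELS = ("NEGATIVE", "POSITIVE")  # assumed class order: index 0, index 1


def label_and_confidence(logits):
    """Turn raw logits into a sentiment label and its softmax probability."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = probs.index(max(probs))
    return {"label": LABELS[best], "score": probs[best]}


def alignment_score(cv, jd):
    """Combine label agreement and confidence gap into one score in [0, 1]."""
    label_alignment = 1 if cv["label"] == jd["label"] else 0
    score_difference = abs(cv["score"] - jd["score"])
    return {
        "label_alignment": label_alignment,
        "score_difference": score_difference,
        "alignment_score": max(0.0, 1.0 - score_difference) * label_alignment,
    }


# Illustrative values only, not taken from a real run
example = label_and_confidence([-2.0, 3.0])
same_tone = alignment_score(
    {"label": "POSITIVE", "score": 0.99}, {"label": "POSITIVE", "score": 0.95}
)
opposite_tone = alignment_score(
    {"label": "POSITIVE", "score": 0.99}, {"label": "NEGATIVE", "score": 0.99}
)
print(example)
print(same_tone["alignment_score"], opposite_tone["alignment_score"])
```

With matching labels the score is one minus the confidence gap (about 0.96 here); with opposite labels it is 0 no matter how close the two confidences are.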

In [30]:
import json
import os

import pdfplumber
import torch
from openai import OpenAI
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the sentiment analysis model
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased-finetuned-sst-2-english")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# STEP 1: Load PDF and extract text (for both CV and JD)
def extract_text_from_pdf(file_path):
    with pdfplumber.open(file_path) as pdf:
        # Extract each page once and skip pages with no extractable text
        page_texts = (page.extract_text() for page in pdf.pages)
        return "\n".join(text for text in page_texts if text)

# STEP 2: Perform sentiment analysis on extracted text using the loaded DistilBERT model
def analyze_sentiment(text_sections: dict) -> dict:
    """
    Analyzes sentiment of various text sections (e.g., summary, experience descriptions) using DistilBERT.

    Args:
        text_sections (dict): Dictionary with section names and their text content.

    Returns:
        dict: Sentiment result for each section (label: POSITIVE/NEGATIVE, or NEUTRAL as a fallback; score: 0–1).
    """
    sentiment_scores = {}
    for section, text in text_sections.items():
        if text and isinstance(text, str):  # Ensure text is a non-empty string
            try:
                # Tokenize the text
                inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
                inputs = {key: value.to(device) for key, value in inputs.items()}  # Move to device (CPU/GPU)

                # Perform inference
                with torch.no_grad():
                    outputs = model(**inputs)
                    logits = outputs.logits
                    probabilities = torch.softmax(logits, dim=1).cpu().numpy()[0]  # Convert logits to probabilities

                # DistilBERT SST-2 has two labels: 0 (NEGATIVE), 1 (POSITIVE)
                label_id = probabilities.argmax()
                label = "POSITIVE" if label_id == 1 else "NEGATIVE"
                score = float(probabilities[label_id])  # Confidence score for the predicted label

                sentiment_scores[section] = {
                    "label": label,
                    "score": score
                }
            except Exception as e:
                print(f"Error analyzing sentiment for {section}: {e}")
                sentiment_scores[section] = {"label": "NEUTRAL", "score": 0.5}  # Default on error
        else:
            sentiment_scores[section] = {"label": "NEUTRAL", "score": 0.5}  # Default for empty text
    return sentiment_scores

# STEP 3: Define JSON schema and prompt for CV parsing
def build_cv_prompt(resume_text):
    json_schema = {
        "name": "",
        "email": "",
        "phone": "",
        "country": "",
        "city": "",
        "summary": "",
        "skills": [
            {
                "specialized skill": "",
                "common skill": ""
            }
        ],
        "experience": [
            {
                "job_title": "",
                "company": "",
                "start_date": "",
                "end_date": "",
                "description": ""
            }
        ],
        "education": [
            {
                "degree": "",
                "institution": "",
                "start_year": "",
                "end_year": ""
            }
        ],
        "enrichment parameters": [
            {
                "Employment Pattern & Progression": "",
                "Company Type & Sector": "",
                "Education Quality & Ranking": "",
                "Skill Demand & Market Relevance": "",
                "Leadership Experience": "",
                "Budget & Project Management": "",
                "International Experience & Mobility": "",
                "Soft Skills from Sales Calls": "",
                "Personality & Behavioral Traits": "",
                "Future Career Goals (Sales-Inferred)": "",
                "Salary Expectations (Sales-Inferred)": "",
                "JD Enrichment with Implied Preferences": "",
                "Cultural Fit Indicators": ""
            }
        ],
        "sentiment_scores": {},  # CV sentiment scores
        "jd_sentiment_scores": {},  # JD sentiment scores
        "sentiment_comparison": {}  # Comparison between CV and JD sentiments
    }

    prompt = f"""
You are an expert resume parser. Convert the resume text below into this JSON format. Fill in all the relevant fields. Leave the "enrichment parameters", sentiment_scores, jd_sentiment_scores, and sentiment_comparison fields empty (they will be filled later).
The JSON schema is as follows:

{json.dumps(json_schema, indent=2)}

Resume:
\"\"\"
{resume_text}
\"\"\"
"""
    return prompt

# STEP 4: Define JSON schema and prompt for JD parsing
def build_jd_prompt(jd_text):
    json_schema = {
        "job_title": "",
        "company": "",
        "summary": "",
        "responsibilities": [""],
        "requirements": [""]
    }

    prompt = f"""
You are an expert job description parser. Convert the job description text below into this JSON format. Fill in all the relevant fields.
The JSON schema is as follows:

{json.dumps(json_schema, indent=2)}

Job Description:
\"\"\"
{jd_text}
\"\"\"
"""
    return prompt

# STEP 5: Call OpenAI API
def call_openai(prompt):
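    # Read the API key from the environment instead of hardcoding a secret in the notebook
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable before calling call_openai().")
    client = OpenAI(api_key=api_key)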
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )
    return response.choices[0].message.content

# STEP 6: Compare sentiments between CV and JD
def compare_sentiments(cv_scores: dict, jd_scores: dict) -> dict:
    """
    Compares sentiment scores between CV and JD.

    Args:
        cv_scores (dict): Sentiment scores from the CV.
        jd_scores (dict): Sentiment scores from the JD.

    Returns:
        dict: Comparison results with alignment scores and notes.
    """
    comparison = {}

    # Compare summary sentiments
    cv_summary = cv_scores.get("summary", {"label": "NEUTRAL", "score": 0.5})
    jd_summary = jd_scores.get("summary", {"label": "NEUTRAL", "score": 0.5})

    # Label alignment (1 if labels match, 0 if they don't)
    summary_label_alignment = 1 if cv_summary["label"] == jd_summary["label"] else 0
    # Score difference (absolute difference between scores)
    summary_score_diff = abs(cv_summary["score"] - jd_summary["score"])
    # Alignment score (higher is better, max 1.0)
    summary_alignment = max(0.0, 1.0 - summary_score_diff) * summary_label_alignment

    comparison["summary"] = {
        "cv_label": cv_summary["label"],
        "cv_score": cv_summary["score"],
        "jd_label": jd_summary["label"],
        "jd_score": jd_summary["score"],
        "label_alignment": summary_label_alignment,
        "score_difference": summary_score_diff,
        "alignment_score": summary_alignment,
        "note": "High alignment indicates similar tone in CV and JD summaries, suggesting cultural fit."
    }

    # Compare experience vs responsibilities
    cv_exp = cv_scores.get("experience_descriptions", {"label": "NEUTRAL", "score": 0.5})
    jd_resp = jd_scores.get("responsibilities", {"label": "NEUTRAL", "score": 0.5})

    exp_label_alignment = 1 if cv_exp["label"] == jd_resp["label"] else 0
    exp_score_diff = abs(cv_exp["score"] - jd_resp["score"])
    exp_alignment = max(0.0, 1.0 - exp_score_diff) * exp_label_alignment

    comparison["experience_vs_responsibilities"] = {
        "cv_label": cv_exp["label"],
        "cv_score": cv_exp["score"],
        "jd_label": jd_resp["label"],
        "jd_score": jd_resp["score"],
        "label_alignment": exp_label_alignment,
        "score_difference": exp_score_diff,
        "alignment_score": exp_alignment,
        "note": "High alignment suggests the candidate's experience matches the tone of the job responsibilities."
    }

    return comparison

# STEP 7: Main function
def main():
    # Paths for CV and JD
    cv_path = "/content/John_Doe_CV.pdf"
    jd_path = "/content/Finance_Executive_JD.pdf"

    # Extract text from CV and JD
    cv_text = extract_text_from_pdf(cv_path)
    jd_text = extract_text_from_pdf(jd_path)

    # Parse CV
    cv_prompt = build_cv_prompt(cv_text)
    parsed_cv_str = call_openai(cv_prompt)

    parsed_cv_str = parsed_cv_str.strip()
    if parsed_cv_str.startswith("```json"):
        parsed_cv_str = parsed_cv_str[len("```json"):]
    if parsed_cv_str.endswith("```"):
        parsed_cv_str = parsed_cv_str[:-len("```")]
    parsed_cv_str = parsed_cv_str.strip()

    try:
        parsed_cv = json.loads(parsed_cv_str)
    except json.JSONDecodeError as e:
        print(f"Error parsing CV OpenAI response: {e}")
        print("Raw CV response:", parsed_cv_str)
        return

    # Parse JD
    jd_prompt = build_jd_prompt(jd_text)
    parsed_jd_str = call_openai(jd_prompt)

    # Strip a surrounding Markdown code fence from the JD response, if present
    parsed_jd_str = parsed_jd_str.strip()
    if parsed_jd_str.startswith("```json"):
        parsed_jd_str = parsed_jd_str[len("```json"):]
    if parsed_jd_str.endswith("```"):
        parsed_jd_str = parsed_jd_str[:-len("```")]
    parsed_jd_str = parsed_jd_str.strip()

    try:
        parsed_jd = json.loads(parsed_jd_str)
    except json.JSONDecodeError as e:
        print(f"Error parsing JD OpenAI response: {e}")
        print("Raw JD response:", parsed_jd_str)
        return

    # Extract text sections for sentiment analysis (CV)
    cv_text_sections = {
        "summary": parsed_cv.get("summary", ""),
        "experience_descriptions": " ".join(
            [exp.get("description", "") for exp in parsed_cv.get("experience", []) if exp.get("description")]
        )
    }

    # Extract text sections for sentiment analysis (JD)
    jd_text_sections = {
        "summary": parsed_jd.get("summary", ""),
        "responsibilities": " ".join(parsed_jd.get("responsibilities", []))
    }

    # Perform sentiment analysis for CV and JD
    cv_sentiment_scores = analyze_sentiment(cv_text_sections)
    jd_sentiment_scores = analyze_sentiment(jd_text_sections)

    # Compare sentiments
    sentiment_comparison = compare_sentiments(cv_sentiment_scores, jd_sentiment_scores)

    # Add sentiment scores and comparison to the parsed CV
    parsed_cv["sentiment_scores"] = cv_sentiment_scores
    parsed_cv["jd_sentiment_scores"] = jd_sentiment_scores
    parsed_cv["sentiment_comparison"] = sentiment_comparison

    print("Parsed CV with Sentiment Comparison:")
    print(json.dumps(parsed_cv, indent=2))

    output_dir = "./gpt_parsed_CVs"
    os.makedirs(output_dir, exist_ok=True)

    output_path = os.path.join(output_dir, "John_Doe_parsed_cv_with_sentiment_comparison.json")
    with open(output_path, "w") as f:
        json.dump(parsed_cv, f, indent=2)
    print(f"Parsed CV with sentiment comparison saved successfully to {output_path}.")

if __name__ == "__main__":
    main()



Parsed CV with Sentiment Comparison:
{
  "name": "John Doe",
  "email": "johndoe@email.com",
  "phone": "+1-234-567-8901",
  "country": "",
  "city": "",
  "summary": "Seasoned finance executive with 18+ years of experience in financial strategy, risk management, and capital raising. Expertise in scaling startups and optimizing financial operations to drive profitability and growth.",
  "skills": [
    {
      "specialized skill": "Financial Strategy & Planning",
      "common skill": "Budgeting & Forecasting"
    },
    {
      "specialized skill": "Risk Management",
      "common skill": ""
    },
    {
      "specialized skill": "Mergers & Acquisitions",
      "common skill": ""
    },
    {
      "specialized skill": "Venture Capital & Fundraising",
      "common skill": ""
    }
  ],
  "experience": [
    {
      "job_title": "Chief Financial Officer",
      "company": "XYZ Tech Solutions",
      "start_date": "2018",
      "end_date": "Present",
      "description": "- Spearheade

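The fence-stripping logic in `main()` is written out twice, once for the CV response and once for the JD response. A small shared helper along the following lines could cover both. This is only a sketch: `strip_code_fence` is a hypothetical name that does not appear in the notebook, and it assumes the model wraps its JSON in at most one Markdown code fence.

```python
import json


def strip_code_fence(raw: str) -> str:
    """Remove a surrounding Markdown code fence, with or without a json tag."""
    text = raw.strip()
    # Try the longer prefix first so the "json" language tag is removed too
    for prefix in ("```json", "```"):
        if text.startswith(prefix):
            text = text[len(prefix):]
            break
    if text.endswith("```"):
        text = text[:-len("```")]
    return text.strip()


# Illustrative response text, not real model output
fenced = "```json\n{\"job_title\": \"Senior Finance Executive\"}\n```"
parsed = json.loads(strip_code_fence(fenced))
print(parsed["job_title"])
```

Unfenced input passes through unchanged apart from whitespace trimming, so the helper is safe to apply to every response before `json.loads`.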
In [17]:
# Ignore this code snippet
'''import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the sentiment analysis model
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased-finetuned-sst-2-english")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Function to analyze sentiment
def analyze_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512).to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1).cpu().numpy()
    positive_score = probabilities[0][1]  # Index 1 represents positive sentiment
    return positive_score

# Function to calculate OCEAN scores from sentiment
def calculate_ocean(positive_sentiment):
    return {
        'Openness': positive_sentiment * 0.8 + 0.2,
        'Conscientiousness': positive_sentiment * 0.7 + 0.3,
        'Extraversion': positive_sentiment * 0.9 + 0.1,
        'Agreeableness': positive_sentiment * 0.85 + 0.15,
        'Neuroticism': 1 - positive_sentiment
    }

# Load candidate and JD data
cv_df = pd.read_csv("CandidateCV.csv")
jd_df = pd.read_csv("JD Parse.csv")

# Extract relevant text from CVs and JDs
cv_df['text'] = cv_df[['Mandatory Skills', 'Preferred Skills', 'Nice-to-Have Skills', 'Recent Projects (Last 5 Years)']].fillna('').agg(' '.join, axis=1)
jd_df['text'] = jd_df[['Mandatory Skills', 'Preferred Skills', 'Nice-to-Have Skills', 'Recent Projects (Last 5 Years)']].fillna('').agg(' '.join, axis=1)

# Compute sentiment scores and OCEAN traits
cv_df['sentiment_score'] = cv_df['text'].apply(analyze_sentiment)
jd_df['sentiment_score'] = jd_df['text'].apply(analyze_sentiment)

cv_df['OCEAN'] = cv_df['sentiment_score'].apply(calculate_ocean)
jd_df['OCEAN'] = jd_df['sentiment_score'].apply(calculate_ocean)

# Find the best candidate for each JD with OCEAN breakdown
best_matches = []
for _, jd in jd_df.iterrows():
    jd_sentiment = jd['sentiment_score']
    jd_ocean = jd['OCEAN']

    # Compute compatibility for each candidate
    def compatibility_score(candidate):
        ocean_diff = {trait: abs(candidate['OCEAN'][trait] - jd_ocean[trait]) for trait in jd_ocean}
        ocean_avg_diff = sum(ocean_diff.values()) / len(ocean_diff)  # Average OCEAN difference
        sentiment_diff = abs(candidate['sentiment_score'] - jd_sentiment)
        return sentiment_diff + ocean_avg_diff, ocean_diff  # Combined score

    cv_df['compatibility_score'], cv_df['ocean_diff'] = zip(*cv_df.apply(compatibility_score, axis=1))

    # Find the best candidate
    best_candidate = cv_df.loc[cv_df['compatibility_score'].idxmin()]
    best_matches.append({
        'JD': jd['text'],
        'Best Candidate': best_candidate['Candidate Name / Job Title'],
        'Compatibility Score': best_candidate['compatibility_score'],
        'OCEAN Difference': best_candidate['ocean_diff']
    })

# Convert results into a DataFrame
best_match_df = pd.DataFrame(best_matches)

# Print the best matches
print(best_match_df)'''



In [29]:
# Used to generate the sample JD PDF; ignore this snippet
'''!pip install fpdf
from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
pdf.set_font("Arial", size=12)
jd_text = """Senior Finance Executive Job Description

Job Title: Senior Finance Executive
Company: InnovateFin Corp
Location: New York, NY

Summary:
InnovateFin Corp is seeking a dynamic and experienced Senior Finance Executive to lead our financial strategy and drive sustainable growth. The ideal candidate will have a proven track record in financial planning, risk management, and capital raising, with a passion for scaling businesses and optimizing operations. Join our innovative team to shape the financial future of a fast-growing fintech company.

Responsibilities:
- Develop and execute financial strategies to support company growth and profitability goals.
- Lead fundraising efforts, including securing venture capital and managing investor relations.
- Oversee mergers and acquisitions, ensuring seamless integration of new entities.
- Manage large-scale budgets and implement solutions to improve financial efficiency and reporting accuracy.
- Collaborate with cross-functional teams to drive strategic initiatives and operational excellence.

Requirements:
- 15+ years of experience in finance, with at least 5 years in a senior leadership role.
- Expertise in financial strategy, risk management, and capital raising.
- Proven experience with mergers and acquisitions.
- Strong analytical and problem-solving skills.
- MBA in Finance or related field from a top-tier institution preferred."""
for line in jd_text.split("\n"):
    pdf.cell(200, 10, txt=line, ln=True)
pdf.output("/content/Finance_Executive_JD.pdf")'''
