# Quick CV Roaster - SIMPLE VERSION

**Two Easy Options:**
1. **Drag & Drop** your PDF into the `uploaded_cvs/` folder, then enter the filename below
2. **Use existing CV** from the dataset

---

## Step 1: Setup

In [3]:
import sys
sys.path.append('..')  # Add parent directory

from cv_processor import CVProcessor
from pathlib import Path
import pandas as pd
import json

print("Imports successful!")

Imports successful!


## Step 2: Add Your API Key

The API key is stored locally in a `config.py` file, which is kept out of the project repository for security reasons.

In [4]:
# Load API key from config.py
import sys
sys.path.append('..')
from config import GEMINI_API_KEY
print("API key loaded from config.py")

API key loaded from config.py
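
If `config.py` is missing, the import in the cell above fails with a `ModuleNotFoundError`. A minimal sketch of a more forgiving loader follows. It assumes a hypothetical helper named `load_api_key` and assumes that an environment variable called `GEMINI_API_KEY` is an acceptable fallback; neither is part of the original project.

```python
import importlib
import os


def load_api_key(module_name="config", var_name="GEMINI_API_KEY"):
    """Return the API key from a local module, else from the environment.

    Hypothetical helper: the module and variable names mirror this
    notebook, but the environment-variable fallback is an assumption.
    """
    try:
        module = importlib.import_module(module_name)
        key = getattr(module, var_name, None)
    except ImportError:
        # config.py is not on the path (ModuleNotFoundError is a subclass)
        key = None

    # Fall back to an environment variable of the same name
    key = key or os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} not found in {module_name}.py or the environment"
        )
    return key
```

With this in place, `GEMINI_API_KEY = load_api_key()` could stand in for the direct import, and a missing key would produce one clear error message instead of an import traceback.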


In [5]:
# Initialize the CV Processor with API key
import google.generativeai as genai

genai.configure(api_key=GEMINI_API_KEY)
processor = CVProcessor(api_key=GEMINI_API_KEY)
print(" CV Processor initialized successfully!")

 CV Processor initialized successfully!


## Step 3: Upload Your PDF CV (Simple Method)

**Instructions:**
1. Create an `uploaded_cvs/` folder in the notebooks directory (the cell below creates it automatically if it doesn't exist)
2. Drag and drop your PDF into that folder
3. Enter the filename below and run the cell

In [6]:
# Create upload directory if it doesn't exist
upload_dir = Path('uploaded_cvs')
upload_dir.mkdir(exist_ok=True)

print(f" Upload folder: {upload_dir.absolute()}")
print("\n1. Copy your PDF CV into this folder")
print("2. Then enter the filename below and run the next cell")

 Upload folder: /Users/hannokuegler/Library/CloudStorage/OneDrive-WUWien/SBWL/Data Science/4_LLM/roast_my_cv/roast_my_cv/notebooks/uploaded_cvs

1. Copy your PDF CV into this folder
2. Then enter the filename below and run the next cell


In [22]:
# ==========================================
# ENTER YOUR PDF FILENAME HERE:
# ==========================================
PDF_FILENAME = "CVLoescher.pdf"  # Change this to your PDF filename
# ==========================================

pdf_path = upload_dir / PDF_FILENAME

if pdf_path.exists():
    print(f" Found PDF: {pdf_path}")
    
    # Extract text
    cv_text = processor.extract_text_from_pdf(str(pdf_path))
    
    print(f" Extracted {len(cv_text)} characters\n")
    print(f"Preview:\n{'-'*80}")
    print(cv_text[:500])
    print(f"{'-'*80}")
    
    use_uploaded = True
    
else:
    print(f" PDF not found: {pdf_path}")
    print(f"\nMake sure:")
    print(f"1. PDF is in folder: {upload_dir.absolute()}")
    print(f"2. Filename matches: {PDF_FILENAME}")
    print(f"\nOr use Option 3B to select from dataset instead.")
    
    cv_text = None
    use_uploaded = False

 Found PDF: uploaded_cvs/CVLoescher.pdf
 Extracted 2945 characters

Preview:
--------------------------------------------------------------------------------
Raphael Löscher
Höhenstraße 36 |Neulengbach, Austria
+43 650 2772216 | raphael.loescher@gmail.com
EDUCATION
Vienna University of Economics and Business Vienna, Austria
Bachelor of Business, Economics and Social Sciences Sep 2023 – now
• Major: Business Informatics
• Specializations: Data Science, Finance: Markets, Institutions & Instruments, Information Management & Control
• Active member of the Business Finance Club
Higher Technical Education Institute St. Pölten, Austria
Matura (equiv. A-Leve
--------------------------------------------------------------------------------


In [23]:
# ==========================================
# SELECT CV INDEX (0-9):
# ==========================================
CV_INDEX = 0  # Change this number (0-9)
# ==========================================

if not use_uploaded or cv_text is None:
    # Format CV from dataset
    def format_cv_for_llm(resume_row):
        cv_text = []
        
        if pd.notna(resume_row.get('career_objective')):
            cv_text.append(f"CAREER OBJECTIVE:\n{resume_row['career_objective']}")
        
        if pd.notna(resume_row.get('skills')):
            cv_text.append(f"\nSKILLS:\n{resume_row['skills']}")
        
        education_parts = []
        if pd.notna(resume_row.get('educational_institution_name')):
            education_parts.append(f"Institution: {resume_row['educational_institution_name']}")
        if pd.notna(resume_row.get('degree_names')):
            education_parts.append(f"Degree: {resume_row['degree_names']}")
        if pd.notna(resume_row.get('major_field_of_studies')):
            education_parts.append(f"Major: {resume_row['major_field_of_studies']}")
        
        if education_parts:
            cv_text.append(f"\nEDUCATION:\n" + "\n".join(education_parts))
        
        work_parts = []
        if pd.notna(resume_row.get('professional_company_names')):
            work_parts.append(f"Company: {resume_row['professional_company_names']}")
        if pd.notna(resume_row.get('positions')):
            work_parts.append(f"Position: {resume_row['positions']}")
        if pd.notna(resume_row.get('responsibilities')):
            work_parts.append(f"Responsibilities:\n{resume_row['responsibilities']}")
        
        if work_parts:
            cv_text.append(f"\nWORK EXPERIENCE:\n" + "\n".join(work_parts))
        
        return "\n".join(cv_text)
    
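    # `df` must already hold the resume dataset; this notebook never loads it
    if 'df' not in globals():
        raise NameError("`df` is not defined: load the resume dataset into a DataFrame named `df` first")
    cv_text = format_cv_for_llm(df.iloc[CV_INDEX])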
    
    print(f" Loaded CV #{CV_INDEX} from dataset\n")
    print(f"Preview:\n{'-'*80}")
    print(cv_text[:500])
    print(f"{'-'*80}")
else:
    print("ℹ Using uploaded PDF - skip this cell or set use_uploaded=False above to use dataset instead")

ℹ Using uploaded PDF - skip this cell or set use_uploaded=False above to use dataset instead


## Step 4: Choose Roaster Model

In [24]:
# ==========================================
# CHOOSE MODEL: 'gentle', 'medium', 'brutal', or 'all'
# ==========================================
ROASTER_MODEL = 'brutal'  # Change to: 'gentle', 'medium', 'brutal', or 'all'
# ==========================================

print(f" Selected model: {ROASTER_MODEL.upper()}")

if ROASTER_MODEL == 'gentle':
    print(" Gentle - Constructive & encouraging (Temperature: 0.4)")
elif ROASTER_MODEL == 'medium':
    print(" Medium - Direct & honest (Temperature: 0.7)")
elif ROASTER_MODEL == 'brutal':
    print(" Brutal - Savage & funny (Temperature: 0.9)")
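elif ROASTER_MODEL == 'all':
    print(" All Three - Side-by-side comparison")
else:
    # Fail early: Step 5 indexes the critiques by this name
    raise ValueError(f"Unknown ROASTER_MODEL {ROASTER_MODEL!r}; use 'gentle', 'medium', 'brutal', or 'all'")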

 Selected model: BRUTAL
 Brutal - Savage & funny (Temperature: 0.9)
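
The `if`/`elif` chain in the cell above can also be written as a lookup table, which keeps each style's description and temperature in one place and rejects an invalid choice early. This is only a sketch: it reuses the descriptions and temperatures printed above, and the names `ROASTER_STYLES` and `describe_model` are hypothetical, not part of `CVProcessor`.

```python
# Tone and temperature per style, copied from the messages printed in Step 4
ROASTER_STYLES = {
    "gentle": ("Constructive & encouraging", 0.4),
    "medium": ("Direct & honest", 0.7),
    "brutal": ("Savage & funny", 0.9),
}


def describe_model(choice):
    """Return a one-line description of the selected roaster style."""
    if choice == "all":
        return "All Three - Side-by-side comparison"
    if choice not in ROASTER_STYLES:
        valid = ", ".join(list(ROASTER_STYLES) + ["all"])
        raise ValueError(f"Unknown model {choice!r}; choose one of: {valid}")
    tone, temperature = ROASTER_STYLES[choice]
    return f"{choice.capitalize()} - {tone} (Temperature: {temperature})"
```

For example, `print(describe_model(ROASTER_MODEL))` would print the description for the current selection, and a typo such as `'brutel'` would raise a `ValueError` before any critique is requested.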


## Step 5: Generate Roast! 

In [25]:
if cv_text is None:
    print(" No CV loaded!")
else:
    print(" Generating critique(s)...\n")
    
    # Generate critiques
    critiques = processor.generate_critiques(cv_text)
    
    # Display based on selection
    if ROASTER_MODEL == 'all':
        for model_name in ['gentle', 'medium', 'brutal']:
            icon = {'gentle': '', 'medium': '', 'brutal': ''}[model_name]
            print("="*80)
            print(f"{icon} {model_name.upper()} ROASTER")
            print("="*80)
            print(critiques[model_name])
            print("\n" * 2)
    else:
        icon = {'gentle': '', 'medium': '', 'brutal': ''}[ROASTER_MODEL]
        print("="*80)
        print(f"{icon} {ROASTER_MODEL.upper()} ROASTER CRITIQUE")
        print("="*80)
        print(critiques[ROASTER_MODEL])
        print("\n")
    
    print(" Critique(s) generated!")

 Generating critique(s)...

Generating gentle critique...
Generating medium critique...
Generating brutal critique...
 BRUTAL ROASTER CRITIQUE
Alright Raphael, let's dissect this CV like a frog in a high school biology class. Prepare for some brutal honesty, seasoned with a dash of wit. Consider it tough love, Austrian style.

**OPENING ROAST:**

Höhenstraße 36, Neulengbach? Sounds quaint. I bet the Wi-Fi there is as reliable as a politician's promise. And that Gmail address? Seriously? In this day and age, you're trying to land a job in tech with an email address that screams "still uses dial-up"? You might as well be applying with a carrier pigeon.

**CAREER OBJECTIVE AUTOPSY:**

Oh wait, there ISN'T one! Good. Because career objectives are like those "Live, Laugh, Love" signs – utterly pointless and universally hated. Instead, you dove right in. I commend your lack of cliché, but maybe a *tiny* hook wouldn't hurt. Just a little something to grab my attention before I start yawning a

## Step 6: Evaluation Metrics 

**Precision, Recall, and F1 Scores:**

In [21]:
if cv_text and 'critiques' in locals():
    print(" CALCULATING METRICS...\n")
    
    # Evaluate all models
    df_results = processor.evaluate_all_models(cv_text, critiques)
    
    print("="*80)
    print("EVALUATION RESULTS - ALL MODELS")
    print("="*80)
    print("\n Performance Metrics:\n")
    print(df_results[['model', 'precision', 'recall', 'f1_score', 'coverage_rate']].to_string(index=False))
    
    # Find best model (idxmax returns the first row on ties, e.g. when every F1 is 0)
    best_idx = df_results['f1_score'].idxmax()
    best_model = df_results.loc[best_idx, 'model']
    best_f1 = df_results.loc[best_idx, 'f1_score']
    
    print(f"\n Best Model: {best_model.upper()} (F1: {best_f1:.2%})")
    
    # Detailed analysis for selected or best model
    if ROASTER_MODEL == 'all':
        analysis_model = best_model
    else:
        analysis_model = ROASTER_MODEL
    
    print(f"\n{'='*80}")
    print(f"DETAILED ANALYSIS - {analysis_model.upper()} MODEL")
    print("="*80)
    
    detection = processor.calculate_issue_detection_metrics(cv_text, critiques[analysis_model])
    coverage = processor.calculate_section_coverage(cv_text, critiques[analysis_model])
    
    print(f"\n Issue Detection Quality:")
    print(f"   Precision:  {detection['precision']:.2%}  (How accurate are the critique's claims?)")
    print(f"   Recall:     {detection['recall']:.2%}  (How many real issues were caught?)")
    print(f"   F1 Score:   {detection['f1_score']:.2%}  (Overall balance)")
    
    print(f"\n Confusion Matrix:")
    print(f"   True Positives:  {detection['true_positives']}   (Correctly identified issues)")
    print(f"   False Positives: {detection['false_positives']}    (False alarms)")
    print(f"   False Negatives: {detection['false_negatives']}   (Missed issues)")
    
    print(f"\n Issue Breakdown:")
    print(f"   Actual CV Issues (ground truth):  {detection['ground_truth_issues']}")
    print(f"   Issues mentioned in critique:     {detection['detected_issues']}")
    
    if detection['missed_issues']:
        print(f"     Issues MISSED by critique:     {detection['missed_issues']}")
    
    if detection['extra_mentions']:
        print(f"   ℹ  Extra issues mentioned:         {detection['extra_mentions']}")
    
    print(f"\n Section Coverage:")
    print(f"   Coverage Rate: {coverage['coverage_rate']:.2%}")
    print(f"   Sections Addressed: {coverage['sections_addressed_in_critique']}/{coverage['total_sections_in_cv']}")
    
    print(f"\n{'='*80}")
    
    # Interpretation
    print("\n INTERPRETATION:")
    if detection['f1_score'] >= 0.7:
        print("    GOOD: This critique has high quality (F1 ≥ 0.7)")
    elif detection['f1_score'] >= 0.5:
        print("     ACCEPTABLE: Decent quality but could be better (F1 = 0.5-0.7)")
    else:
        print("    POOR: Low quality critique (F1 < 0.5)")
    
    if detection['precision'] < 0.6:
        print("     Low precision: Critique may be making up issues")
    if detection['recall'] < 0.6:
        print("     Low recall: Critique is missing important issues")
    if coverage['coverage_rate'] >= 0.8:
        print("    Excellent coverage: Most CV sections were reviewed")
    
else:
    print(" Run Step 5 first to generate critiques")

 CALCULATING METRICS...

EVALUATION RESULTS - ALL MODELS

 Performance Metrics:

 model  precision  recall  f1_score  coverage_rate
gentle        0.0       0         0           1.00
medium        0.0       0         0           0.75
brutal        0.0       0         0           1.00

 Best Model: GENTLE (F1: 0.00%)

DETAILED ANALYSIS - BRUTAL MODEL

 Issue Detection Quality:
   Precision:  0.00%  (How accurate are the critique's claims?)
   Recall:     0.00%  (How many real issues were caught?)
   F1 Score:   0.00%  (Overall balance)

 Confusion Matrix:
   True Positives:  0   (Correctly identified issues)
   False Positives: 2    (False alarms)
   False Negatives: 0   (Missed issues)

 Issue Breakdown:
   Actual CV Issues (ground truth):  []
   Issues mentioned in critique:     ['formatting', 'relevance']
   ℹ  Extra issues mentioned:         ['relevance', 'formatting']

 Section Coverage:
   Coverage Rate: 100.00%
   Sections Addressed: 4/4


 INTERPRETATION:
    POOR: Low quality c

---

## Quick Summary

### How to Use This Notebook:

1. **Setup** (Steps 1-2): Import libraries and add API key
2. **Input CV** (Step 3):
   - **Option A**: Put PDF in `uploaded_cvs/` folder, enter filename
   - **Option B**: Load existing CV from dataset
3. **Choose Model** (Step 4): Set `ROASTER_MODEL` variable
4. **Generate** (Step 5): Get your roast!
5. **Metrics** (Step 6): See precision, recall, F1 scores


### Understanding Metrics:

| Metric | What it means | Good Score |
|--------|---------------|------------|
| **Precision** | % of mentioned issues that are real | > 0.7 |
| **Recall** | % of real issues that were caught | > 0.7 |
| **F1 Score** | Overall quality (balance of both) | ≥ 0.7 |
| **Coverage** | % of CV sections reviewed | ≥ 0.8 |
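
To make the definitions in the table above concrete, here is a minimal sketch of how precision, recall, and F1 are conventionally computed from two sets of issue labels. It is not the code behind `processor.calculate_issue_detection_metrics`, whose implementation is not shown in this notebook; the function name `issue_detection_metrics` and the convention of returning 0 when a denominator is 0 are assumptions.

```python
def issue_detection_metrics(ground_truth, detected):
    """Compute precision, recall and F1 from two collections of issue labels."""
    truth, found = set(ground_truth), set(detected)

    tp = len(truth & found)   # issues that are real and were mentioned
    fp = len(found - truth)   # issues mentioned but not in the ground truth
    fn = len(truth - found)   # real issues the critique missed

    # Assumed convention: a ratio with a zero denominator is reported as 0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }


# Inputs taken from the Step 6 output: empty ground truth, two mentioned issues
print(issue_detection_metrics([], ["formatting", "relevance"]))
```

Under these definitions, an empty ground-truth list turns every mentioned issue into a false positive, so precision, recall, and F1 all come out as 0 regardless of how good the critique is. That matches the Step 6 output (0 true positives, 2 false positives, 0 false negatives), which suggests the 0.00% scores there say more about the missing ground truth than about the roast itself.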
