# Quick CV Roaster - SIMPLE VERSION

**Two Easy Options:**
1. **Drag & Drop** your PDF into the `uploaded_cvs/` folder, then enter the filename below
2. **Use existing CV** from the dataset

---

## Step 1: Setup

In [53]:
import sys
sys.path.append('..')  # Add parent directory

from cv_processor import CVProcessor
from pathlib import Path
import pandas as pd
import json

print(" Imports successful!")

 Imports successful!


## Step 2: Add Your API Key

For security reasons, the API key is stored locally in a `config.py` file that is not included in the project repository.
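
As a minimal sketch of this setup, assuming `config.py` only defines a `GEMINI_API_KEY` string: the helper below tries the config module first and falls back to an environment variable of the same name. The helper name `load_api_key` and the environment-variable fallback are illustrative assumptions, not part of the project.

```python
import importlib
import os


def load_api_key(config_module="config", name="GEMINI_API_KEY"):
    """Return the API key from a local config module, else from the environment.

    Hypothetical helper: the notebook itself imports the key directly
    with `from config import GEMINI_API_KEY`.
    """
    try:
        module = importlib.import_module(config_module)
        key = getattr(module, name, None)
    except ImportError:
        # config.py is not tracked in the repository, so it may be missing
        key = None

    # Fall back to an environment variable with the same name (assumption)
    key = key or os.environ.get(name)
    if not key:
        raise RuntimeError(
            f"No API key found: define {name} in {config_module}.py "
            f"or set the {name} environment variable"
        )
    return key
```

Failing early with a clear message avoids a less obvious authentication error later, when the first request is sent.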

In [54]:
# Load API key from config.py
import sys
sys.path.append('..')
from config import GEMINI_API_KEY
print("API key loaded from config.py")

API key loaded from config.py


In [55]:
# Initialize the CV Processor with API key
import google.generativeai as genai

genai.configure(api_key=GEMINI_API_KEY)
processor = CVProcessor(api_key=GEMINI_API_KEY)
print(" CV Processor initialized successfully!")

 CV Processor initialized successfully!


## Step 3: Upload Your PDF CV (Simple Method)

**Instructions:**
1. Create the `uploaded_cvs/` folder in the notebooks directory if it doesn't exist (the next cell does this for you)
2. Drag and drop your PDF into that folder
3. Enter the filename below and run the cell
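
To avoid mistyping the filename in step 3, the folder contents can be listed first. This is a small sketch, not part of the notebook: `list_uploaded_pdfs` is a hypothetical helper, and it assumes the default folder name `uploaded_cvs` used above.

```python
from pathlib import Path


def list_uploaded_pdfs(folder="uploaded_cvs"):
    """Return the sorted names of all PDF files in the upload folder.

    Hypothetical helper; creates the folder if it does not exist yet.
    """
    upload_dir = Path(folder)
    upload_dir.mkdir(parents=True, exist_ok=True)
    # Match the extension case-insensitively so "CV.PDF" is found too
    return sorted(
        p.name
        for p in upload_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    for name in list_uploaded_pdfs():
        print(name)
```

Any name it prints can be copied directly into `PDF_FILENAME`.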

In [60]:
# Create upload directory if it doesn't exist
upload_dir = Path('uploaded_cvs')
upload_dir.mkdir(exist_ok=True)

print(f" Upload folder: {upload_dir.absolute()}")
print("\n1. Copy your PDF CV into this folder")
print("2. Then enter the filename below and run the next cell")

 Upload folder: /Users/hannokuegler/Library/CloudStorage/OneDrive-WUWien/SBWL/Data Science/4_LLM/roast_my_cv/roast_my_cv/notebooks/uploaded_cvs

1. Copy your PDF CV into this folder
2. Then enter the filename below and run the next cell


In [61]:
# ==========================================
# ENTER YOUR PDF FILENAME HERE:
# ==========================================
PDF_FILENAME = "hannokuegler.pdf"  # Change this to your PDF filename
# ==========================================

pdf_path = upload_dir / PDF_FILENAME

if pdf_path.exists():
    print(f" Found PDF: {pdf_path}")
    
    # Extract text
    cv_text = processor.extract_text_from_pdf(str(pdf_path))
    
    print(f" Extracted {len(cv_text)} characters\n")
    print(f"Preview:\n{'-'*80}")
    print(cv_text[:500])
    print(f"{'-'*80}")
    
    use_uploaded = True
    
else:
    print(f" PDF not found: {pdf_path}")
    print(f"\nMake sure:")
    print(f"1. PDF is in folder: {upload_dir.absolute()}")
    print(f"2. Filename matches: {PDF_FILENAME}")
    print(f"\nOr use Option B below to select a CV from the dataset instead.")
    
    cv_text = None
    use_uploaded = False

 Found PDF: uploaded_cvs/hannokuegler.pdf
 Extracted 1792 characters

Preview:
--------------------------------------------------------------------------------
Education ——————————
10/2023 – Wirtschaftsuniversität Wien
today Welthandelsplatz 1 1020 Wien (5rd Semester)
§ economic and social sciences
§ business informatics
§ specialization: data science, production
and operations management
§ top 1% overall
09/2017 – HTL Mössingerstraße
06/2022 Mössingerstraße 25, 9020 Klagenfurt
§ electronics and technical informatics.
§ final grade (Matura): 1,6
Hanno horst Kügler
§ programming, network engineering
§ creativity in developing new solutions
§ analytical 
--------------------------------------------------------------------------------


In [62]:
# Load dataset
df = pd.read_csv('../data/resume_data.csv')

print(f" Dataset has {len(df)} CVs")
print("\nFirst 10 CVs:")
for i in range(min(10, len(df))):
    obj = df.iloc[i].get('career_objective', 'No objective')
    preview = str(obj)[:60] + "..." if pd.notna(obj) and len(str(obj)) > 60 else str(obj)
    print(f"  {i}: {preview}")

 Dataset has 9544 CVs

First 10 CVs:
  0: Big data analytics working and database warehouse manager wi...
  1: Fresher looking to join as a data analyst and junior data sc...
  2: nan
  3: To obtain a position in a fast-paced business office environ...
  4: Professional accountant with an outstanding work ethic and i...
  5: To secure an IT specialist, desktop support, network adminis...
  6: nan
  7: nan
  8: Certified Data analyst with a degree in Electronics Engineer...
  9: nan


In [63]:
# ==========================================
# SELECT CV INDEX (0 to len(df) - 1; the first 10 are previewed above):
# ==========================================
CV_INDEX = 0  # Change this number (any valid row index in df)
# ==========================================

if not use_uploaded or cv_text is None:
    # Format CV from dataset
    def format_cv_for_llm(resume_row):
        cv_text = []
        
        if pd.notna(resume_row.get('career_objective')):
            cv_text.append(f"CAREER OBJECTIVE:\n{resume_row['career_objective']}")
        
        if pd.notna(resume_row.get('skills')):
            cv_text.append(f"\nSKILLS:\n{resume_row['skills']}")
        
        education_parts = []
        if pd.notna(resume_row.get('educational_institution_name')):
            education_parts.append(f"Institution: {resume_row['educational_institution_name']}")
        if pd.notna(resume_row.get('degree_names')):
            education_parts.append(f"Degree: {resume_row['degree_names']}")
        if pd.notna(resume_row.get('major_field_of_studies')):
            education_parts.append(f"Major: {resume_row['major_field_of_studies']}")
        
        if education_parts:
            cv_text.append(f"\nEDUCATION:\n" + "\n".join(education_parts))
        
        work_parts = []
        if pd.notna(resume_row.get('professional_company_names')):
            work_parts.append(f"Company: {resume_row['professional_company_names']}")
        if pd.notna(resume_row.get('positions')):
            work_parts.append(f"Position: {resume_row['positions']}")
        if pd.notna(resume_row.get('responsibilities')):
            work_parts.append(f"Responsibilities:\n{resume_row['responsibilities']}")
        
        if work_parts:
            cv_text.append(f"\nWORK EXPERIENCE:\n" + "\n".join(work_parts))
        
        return "\n".join(cv_text)
    
    cv_text = format_cv_for_llm(df.iloc[CV_INDEX])
    
    print(f" Loaded CV #{CV_INDEX} from dataset\n")
    print(f"Preview:\n{'-'*80}")
    print(cv_text[:500])
    print(f"{'-'*80}")
else:
    print("ℹ Using uploaded PDF - skip this cell or set use_uploaded=False above to use dataset instead")

ℹ Using uploaded PDF - skip this cell or set use_uploaded=False above to use dataset instead


## Step 4: Choose Roaster Model

In [64]:
# ==========================================
# CHOOSE MODEL: 'gentle', 'medium', 'brutal', or 'all'
# ==========================================
ROASTER_MODEL = 'brutal'  # Change to: 'gentle', 'medium', 'brutal', or 'all'
# ==========================================

print(f" Selected model: {ROASTER_MODEL.upper()}")

if ROASTER_MODEL == 'gentle':
    print(" Gentle - Constructive & encouraging (Temperature: 0.4)")
elif ROASTER_MODEL == 'medium':
    print(" Medium - Direct & honest (Temperature: 0.7)")
elif ROASTER_MODEL == 'brutal':
    print(" Brutal - Savage & funny (Temperature: 0.9)")
elif ROASTER_MODEL == 'all':
    print(" All Three - Side-by-side comparison")

 Selected model: BRUTAL
 Brutal - Savage & funny (Temperature: 0.9)
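
The model choice and its temperature can also be kept in one lookup table, which allows a mistyped `ROASTER_MODEL` to be rejected before any critique is generated. This is a sketch under assumptions: the temperatures are the ones printed above, while `ROASTER_TEMPERATURES` and `resolve_models` are hypothetical names and not part of `CVProcessor`.

```python
# Temperatures as printed by the Step 4 cell
ROASTER_TEMPERATURES = {"gentle": 0.4, "medium": 0.7, "brutal": 0.9}


def resolve_models(choice):
    """Map a ROASTER_MODEL value to the list of model names to display.

    Hypothetical helper: 'all' expands to every model, anything else
    must be one of the known model names.
    """
    choice = choice.strip().lower()
    if choice == "all":
        return list(ROASTER_TEMPERATURES)
    if choice not in ROASTER_TEMPERATURES:
        valid = ", ".join([*ROASTER_TEMPERATURES, "all"])
        raise ValueError(f"Unknown model {choice!r}; expected one of: {valid}")
    return [choice]


if __name__ == "__main__":
    for model in resolve_models("all"):
        print(f"{model}: temperature {ROASTER_TEMPERATURES[model]}")
```

With this check, a value such as `'spicy'` raises a `ValueError` that lists the valid options, instead of failing later with a `KeyError` during display.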


## Step 5: Generate Roast! 

In [65]:
if cv_text is None:
    print(" No CV loaded! Go back to Step 3 (Option A or B)")
else:
    print(" Generating critique(s)...\n")
    
    # Generate critiques
    critiques = processor.generate_critiques(cv_text)
    
    # Display based on selection
    if ROASTER_MODEL == 'all':
        for model_name in ['gentle', 'medium', 'brutal']:
            icon = {'gentle': '', 'medium': '', 'brutal': ''}[model_name]
            print("="*80)
            print(f"{icon} {model_name.upper()} ROASTER")
            print("="*80)
            print(critiques[model_name])
            print("\n" * 2)
    else:
        icon = {'gentle': '', 'medium': '', 'brutal': ''}[ROASTER_MODEL]
        print("="*80)
        print(f"{icon} {ROASTER_MODEL.upper()} ROASTER CRITIQUE")
        print("="*80)
        print(critiques[ROASTER_MODEL])
        print("\n")
    
    print(" Critique(s) generated!")

 Generating critique(s)...

Generating gentle critique...
Generating medium critique...
Generating brutal critique...
 BRUTAL ROASTER CRITIQUE
Alright, buckle up, buttercup. We're about to dissect this CV like a frog in a high school biology class, except this time, the frog can talk back (and hopefully learn something).

**OPENING ROAST:**

"Hanno horst Kügler"... Okay, first things first, are you sure you're not a character from a Wes Anderson film? That name is so exquisitely Austrian, it's practically yodeling at me. It's got that certain "I own a ski chalet and judge competitive strudel baking" vibe.

**CAREER OBJECTIVE AUTOPSY:**

Where is it? Oh that's right, you're one of those "I'm too cool for a career objective" types. Instead, you're hoping employers will just *intuit* your deepest professional desires from your scattershot list of skills. I'd bet my next paycheck you're hoping to use AI and Traffic analysis to take over the world. Good Luck!

**SKILLS COMEDY:**

"Creativit

## Step 6: Evaluation Metrics 

**Precision, Recall, and F1 Scores:**

In [40]:
if cv_text and 'critiques' in locals():
    print(" CALCULATING METRICS...\n")
    
    # Evaluate all models
    df_results = processor.evaluate_all_models(cv_text, critiques)
    
    print("="*80)
    print("EVALUATION RESULTS - ALL MODELS")
    print("="*80)
    print("\n Performance Metrics:\n")
    print(df_results[['model', 'precision', 'recall', 'f1_score', 'coverage_rate']].to_string(index=False))
    
    # Find best model
    best_idx = df_results['f1_score'].idxmax()
    best_model = df_results.loc[best_idx, 'model']
    best_f1 = df_results.loc[best_idx, 'f1_score']
    
    print(f"\n Best Model: {best_model.upper()} (F1: {best_f1:.2%})")
    
    # Detailed analysis for selected or best model
    if ROASTER_MODEL == 'all':
        analysis_model = best_model
    else:
        analysis_model = ROASTER_MODEL
    
    print(f"\n{'='*80}")
    print(f"DETAILED ANALYSIS - {analysis_model.upper()} MODEL")
    print("="*80)
    
    detection = processor.calculate_issue_detection_metrics(cv_text, critiques[analysis_model])
    coverage = processor.calculate_section_coverage(cv_text, critiques[analysis_model])
    
    print(f"\n Issue Detection Quality:")
    print(f"   Precision:  {detection['precision']:.2%}  (How accurate are the critique's claims?)")
    print(f"   Recall:     {detection['recall']:.2%}  (How many real issues were caught?)")
    print(f"   F1 Score:   {detection['f1_score']:.2%}  (Overall balance)")
    
    print(f"\n Confusion Matrix:")
    print(f"   True Positives:  {detection['true_positives']}   (Correctly identified issues)")
    print(f"   False Positives: {detection['false_positives']}    (False alarms)")
    print(f"   False Negatives: {detection['false_negatives']}   (Missed issues)")
    
    print(f"\n Issue Breakdown:")
    print(f"   Actual CV Issues (ground truth):  {detection['ground_truth_issues']}")
    print(f"   Issues mentioned in critique:     {detection['detected_issues']}")
    
    if detection['missed_issues']:
        print(f"     Issues MISSED by critique:     {detection['missed_issues']}")
    
    if detection['extra_mentions']:
        print(f"   ℹ  Extra issues mentioned:         {detection['extra_mentions']}")
    
    print(f"\n Section Coverage:")
    print(f"   Coverage Rate: {coverage['coverage_rate']:.2%}")
    print(f"   Sections Addressed: {coverage['sections_addressed_in_critique']}/{coverage['total_sections_in_cv']}")
    
    print(f"\n{'='*80}")
    
    # Interpretation
    print("\n INTERPRETATION:")
    if detection['f1_score'] >= 0.7:
        print("    GOOD: This critique has high quality (F1 ≥ 0.7)")
    elif detection['f1_score'] >= 0.5:
        print("     ACCEPTABLE: Decent quality but could be better (F1 = 0.5-0.7)")
    else:
        print("    POOR: Low quality critique (F1 < 0.5)")
    
    if detection['precision'] < 0.6:
        print("     Low precision: Critique may be making up issues")
    if detection['recall'] < 0.6:
        print("     Low recall: Critique is missing important issues")
    if coverage['coverage_rate'] >= 0.8:
        print("    Excellent coverage: Most CV sections were reviewed")
    
else:
    print(" Run Step 5 first to generate critiques")

 CALCULATING METRICS...

EVALUATION RESULTS - ALL MODELS

 Performance Metrics:

 model  precision  recall  f1_score  coverage_rate
gentle   0.333333     1.0  0.500000              0
medium   0.500000     1.0  0.666667              0
brutal   0.333333     0.5  0.400000              0

 Best Model: MEDIUM (F1: 66.67%)

DETAILED ANALYSIS - BRUTAL MODEL

 Issue Detection Quality:
   Precision:  33.33%  (How accurate are the critique's claims?)
   Recall:     50.00%  (How many real issues were caught?)
   F1 Score:   40.00%  (Overall balance)

 Confusion Matrix:
   True Positives:  1   (Correctly identified issues)
   False Positives: 2    (False alarms)
   False Negatives: 1   (Missed issues)

 Issue Breakdown:
   Actual CV Issues (ground truth):  ['no_metrics', 'formatting']
   Issues mentioned in critique:     ['typos', 'vague_objective', 'formatting']
     Issues MISSED by critique:     ['no_metrics']
   ℹ  Extra issues mentioned:         ['typos', 'vague_objective']

 Section Coverage

---

## Quick Summary

### How to Use This Notebook:

1. **Setup** (Steps 1-2): Import libraries and add API key
2. **Input CV** (Step 3):
   - **Option A**: Put PDF in `uploaded_cvs/` folder, enter filename
   - **Option B**: Load existing CV from dataset
3. **Choose Model** (Step 4): Set `ROASTER_MODEL` variable
4. **Generate** (Step 5): Get your roast!
5. **Metrics** (Step 6): See precision, recall, F1 scores


### Understanding Metrics:

| Metric | What it means | Good Score |
|--------|---------------|------------|
| **Precision** | % of mentioned issues that are real | ≥ 0.7 |
| **Recall** | % of real issues that were caught | ≥ 0.7 |
| **F1 Score** | Overall quality (harmonic mean of precision and recall) | ≥ 0.7 |
| **Coverage** | % of CV sections reviewed | ≥ 0.8 |
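
As a worked example of these definitions, the sketch below computes precision, recall, and F1 from two lists of issue labels. It assumes a plain set comparison between ground-truth and detected issues. How `CVProcessor` derives those lists is not shown in this notebook, and `issue_detection_metrics` is a hypothetical name. The inputs are the two lists printed in the Step 6 output for the brutal model.

```python
def issue_detection_metrics(ground_truth, detected):
    """Compute precision, recall and F1 by comparing two sets of issue labels."""
    truth, found = set(ground_truth), set(detected)
    tp = len(truth & found)   # real issues the critique mentioned
    fp = len(found - truth)   # mentioned issues that are not real
    fn = len(truth - found)   # real issues the critique missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }


metrics = issue_detection_metrics(
    ground_truth=["no_metrics", "formatting"],
    detected=["typos", "vague_objective", "formatting"],
)
print(f"Precision: {metrics['precision']:.2%}")  # 33.33%
print(f"Recall:    {metrics['recall']:.2%}")     # 50.00%
print(f"F1 Score:  {metrics['f1_score']:.2%}")   # 40.00%
```

With one true positive, two false positives, and one false negative, precision is 1/3 and recall is 1/2, so F1 = 2 · (1/3) · (1/2) / (1/3 + 1/2) = 0.4, matching the values reported for the brutal model.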
