In [1]:
import os
import sys
sys.path.append('../code')

In [298]:
from fpdf import FPDF
from dotenv import load_dotenv
from llm_connect import get_response
from context import save_job_context, clear_context
import pyrsm as rsm
import random
import pdfkit
import uuid

from weasyprint import HTML
import re
from datetime import datetime, timedelta

import pandas as pd
from faker import Faker

from markdown2 import markdown

load_dotenv()

True

In [299]:
# Reset the shared job/candidate memory file to start fresh
clear_context()

# Generate Job Description

In [300]:
# Function to generate a realistic job description using an LLM
def generate_job_description(title="Backend Software Engineer", level="Mid-Level", specialization="Backend", years=3):
    # Construct a prompt that tells the LLM what kind of job to describe
    prompt = (
        f"Write a realistic, professional job description for a {level} {title} specializing in {specialization} "
        f"with a minimum of {years} years of experience. "
        "Include the following sections:\n"
        "- About the Role\n"
        "- Responsibilities\n"
        "- Required Skills\n"
        "- Preferred Qualifications\n"
        "- Company Culture Highlights\n"
        "- Salary and Visa Requirements\n\n"
        "Model it on a typical Amazon job posting. "
        "Use clear and professional language. The description should be detailed enough for a LinkedIn or Indeed job post. "
        "Return only the job description text — no formatting like markdown or code blocks."
    )

    # Call the LLM (LLaMA or Gemini) using your custom get_response wrapper
    response = get_response(
        input=prompt,
        template=lambda x: x,   # Use identity function as no template modification is needed
        llm="llama",            # or "gemini"
        md=False,               # Disable markdown formatting in output
        temperature=1,          # Add variation to outputs for creativity
        max_tokens=1000,        # Limit maximum length of generated job description
    )
    
    # Return cleaned-up string output
    return response.strip()


In [301]:
job_description = generate_job_description(
    title="Software Engineer",
    level="Senior",
    specialization="Cloud Infrastructure",
    years=6
)

# Save the generated description to a txt file (create the folder first if it is missing)
os.makedirs("data", exist_ok=True)
with open("data/job_description_cloud_senior.txt", "w", encoding="utf-8") as f:
    f.write(job_description)



In [302]:
print(job_description)

About the Role
We are seeking an experienced Senior Software Engineer specializing in Cloud Infrastructure to join our team, driving the design, development, and operation of large-scale cloud computing systems. As a Senior Software Engineer, you will be responsible for leading the development of cloud infrastructure technologies, collaborating with cross-functional teams, and ensuring the highest levels of system reliability, performance, and security. You will have the opportunity to work on complex problems, develop innovative solutions, and contribute to the growth and evolution of our cloud infrastructure.

Responsibilities
Design, develop, and deploy scalable, secure, and efficient cloud infrastructure systems, including architecture, provisioning, and management of cloud resources. Collaborate with development teams to ensure seamless integration of cloud infrastructure with applications and services. Participate in the evaluation, selection, and implementation of new cloud tech

In [303]:
job_id = str(uuid.uuid4())
job_data = {
    "job_id": job_id,
    "title": "Senior Software Engineer",
    "specialization": "Cloud Infrastructure",
    "years_required": 6,
    "job_description": job_description
}

# Keep track of the job description to be used across the pipeline
save_job_context(job_id, job_data)

## Data Simulation Part 1 (Job Description Generation)

In this section, we use GenAI to create a job description like those typically seen on recruitment websites (e.g., Indeed, LinkedIn). The description gives the model shared context across the entire recruitment process. For example, the LLM needs to know which role it is evaluating candidates against later on, so we save the description where it can be read back at that point. The `save_job_context()` function handles this by recording the job description and its general metadata (ID, title, specialization, required years) to a `json` file.
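
The `context` module is not shown in this notebook, so the following is only a minimal sketch of how a JSON-backed store such as `save_job_context()` could work. The function name `save_job_context_sketch`, the file path, and the top-level `jobs` key are assumptions made for illustration, not the project's actual implementation.

```python
import json
import os
import tempfile

def save_job_context_sketch(job_id, job_data, path):
    """Hypothetical stand-in for save_job_context(): upsert one job into a JSON file."""
    # Load the existing store if present; otherwise start with an empty one
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            store = json.load(f)
    else:
        store = {"jobs": {}}

    # Key each job by its ID so later pipeline stages can look it up directly
    store.setdefault("jobs", {})[job_id] = job_data

    with open(path, "w", encoding="utf-8") as f:
        json.dump(store, f, indent=2)
    return store

# Usage with a throwaway file (path and values are illustrative only)
demo_path = os.path.join(tempfile.mkdtemp(), "context.json")
store = save_job_context_sketch("job-123", {"title": "Senior Software Engineer"}, demo_path)
print(store["jobs"]["job-123"]["title"])
```

Keying the store by `job_id` means a later stage only needs the ID to recover the full description.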

There is no fixed methodology for choosing which LLM generates this data. It should be noted, however, that Gemini requires more prompting and has a more limited token generation capacity. Because our prompt is very specific, the output is a consistent job description even across many runs of this script.

# Generate Resumes

In [304]:
def slugify(text):
    """Converts text to a lowercase slug: drops punctuation, then removes hyphens and whitespace."""
    text = re.sub(r'[^\w\s-]', '', text)
    return re.sub(r'[-\s]+', '', text.lower())

In [305]:
# Function to construct a resume generation prompt using a randomly selected format, tone, and section order
def resume_template(input_text):
    # List of stylistic formats the resume might follow
    formats = [
        "Skills-focused resume (Skills section immediately after Profile).",
        "Education-focused resume (Education section near the top).",
        "Project-focused resume (highlight personal software projects).",
        "Experience-heavy resume (detailed bullets for every job).",
        "Minimalistic resume (short, clean sections).",
        "Startup-style resume (dynamic and energetic tone).",
        "Corporate-style resume (formal, detailed language).",
    ]

    # List of tones to influence the LLM's writing style
    tones = [
        "Use formal corporate language.",
        "Use dynamic, action-driven tone.",
        "Use detailed technical language.",
    ]

    # Different possible section orders for resume layout
    section_orders = [
        "Start with Summary, then Skills, then Experience, then Education.",
        "Start with Skills first, then Projects, then Experience, then Education.",
        "Start with Education first, then Experience, then Skills.",
    ]

    # Randomly select a format, tone, and section structure to ensure diverse outputs
    style = random.choice(formats)
    tone = random.choice(tones)
    order = random.choice(section_orders)

    fake = Faker()
    full_name = fake.name()
    slug = slugify(full_name)
    email = f"{slug}@example.com"
    phone = fake.phone_number()
    linkedin = f"linkedin.com/in/{full_name.lower().replace(' ', '')}"
    github = f"github.com/{full_name.lower().split()[0]}{uuid.uuid4().hex[:4]}"

    person_profile = {
        "name": full_name,
        "email": email,
        "phone": phone,
        "linkedin": linkedin,
        "github": github
    }

    # Combine everything into a structured prompt to send to the LLM
    return f"""
You are an expert technical resume writer.

Your task is to generate a **realistic, well-formatted technical resume in valid Markdown** using the structured candidate data provided below. If any information is missing, use your best judgment to fill in realistic, domain-appropriate details.

==============================
CRITICAL CONSTRAINTS
==============================
- DO NOT include any section titled "HEADER", "PROFILE", "CONTACT", or similar.
- DO NOT reorder the sections — follow the structure exactly as defined below.
- DO NOT use code blocks, XML/JSON tags, or Markdown syntax inside skill lists.
- DO NOT add fake markdown like triple backticks (```).
- This resume will be converted directly to PDF. Formatting must be clean and valid.

==============================
START OF RESUME
==============================

Start with:
- **First line**: The candidate's full name, bolded
- **Second line**: Email | Phone | LinkedIn | GitHub
- DO NOT label this section. DO NOT use a heading like “Header” or “Contact”.

Then continue with:

### SUMMARY  
- 2–3 concise lines.
- Include years of experience (or “New Graduate”), technical specialization (e.g., ML, Backend, DevOps), and a key career achievement or goal.
- If missing, invent a realistic, domain-appropriate summary.

### EXPERIENCE  
- Reverse chronological order.
- For each role, use the format:  
  **Job Title**  
  *Company, Dates*  
  - Bullet points that begin with strong action verbs (e.g., Built, Designed, Led)  
  - Mention tools/technologies  
  - Include clear, measurable results (% impact, time savings, scale, etc.)
  - Use the STAR Method in one sentence (Situation, Task, Action, Result)
- If no roles exist, generate work experiences that fit the candidate's background.

### PROJECTS  
- Reverse chronological.
- For each project, use the format:  
  **Project Title**  
  *Org or Context, Dates*  
  - Describe what was built, what tools were used, and the outcome or value delivered.
  - Use the STAR Method in one sentence (Situation, Task, Action, Result)
- If no projects are provided, generate personal or academic projects based on skills.

### TECHNICAL SKILLS  
Format as shown below, using comma-separated lists:

TECHNICAL SKILLS  
Languages: Python, Java, C++  
Frameworks: Django, React, TensorFlow  
Cloud: AWS, Azure, GCP  
Tools: Git, Docker, Jenkins  
Databases: MySQL, MongoDB  
OS: Windows, Linux, macOS

    - Fill in any missing categories based on likely technical profile.

### EDUCATION  
- Format: Degree, University, Year  
- Include GPA (if provided), thesis title (if applicable), and 2–3 relevant courses.  
- Invent reasonable details if education is incomplete or missing.

==============================
FORMATTING RULES
==============================
- Use **bold** for section headers (e.g., `**SUMMARY**`)  
- Place a horizontal rule (`---`) above each section header  
- Do NOT use `#` or `###` Markdown headings
- Job titles must be bold (`**Software Engineer**`)
- Company and dates must be italicized on the next line (`*Company – 2020–2022*`)
- Bullet points must begin with `-` and be clear, concise, and tech-specific
- One blank line between sections and bullet blocks
- Use single spacing
- Ensure the Markdown renders cleanly in HTML and converts cleanly to PDF

==============================
EXAMPLE (Loose Format)
==============================
**Jane Doe**  
janedoe@example.com | 555-123-4567 | linkedin.com/in/janedoe | github.com/janedoe42

### SUMMARY  
Software engineer with 3+ years in ML and backend systems. Delivered scalable APIs and reduced latency by 40% in production systems.

### EXPERIENCE  
**Software Engineer**  
*OpenAI, 2020–2023*  
- Built scalable ML pipelines using PyTorch and Airflow  
- Reduced model inference latency by 40% through model optimization  
- Led migration to Kubernetes, improving deployment uptime to 99.9%

**Junior Developer**  
*Startup Co, 2018–2020*  
- Developed backend services in Flask and integrated with PostgreSQL  
- Collaborated with frontend team to deliver MVP in 3 months

### PROJECTS  
**LLM Evaluation Toolkit**  
*Personal Project, 2022*  
- Built benchmarking tools for transformer models  
- Visualized accuracy and latency tradeoffs for 5 open-source LLMs  

### TECHNICAL SKILLS  
Languages: Python, JavaScript, C++  
Frameworks: PyTorch, Flask, React  
Cloud: AWS, GCP  
Tools: GitHub, Docker, Jenkins  
Databases: PostgreSQL, MongoDB  
OS: Linux, macOS

### EDUCATION  
B.S. in Computer Science, UC Berkeley, 2018  
GPA: 3.8/4.0  
Relevant Courses: Distributed Systems, Machine Learning, NLP

==============================
REALISM & AUTHENTICITY REQUIREMENTS
==============================

- Every bullet point should feel like it came from real work. No generic phrases like “collaborated with the team” or “improved performance by 25%” unless they are grounded in a specific technology, product, or context.

- Use real tools, libraries, APIs, and tech stacks in specific combinations. Example: “integrated Firebase authentication with a React Native frontend,” not just “built login system.”

- Vary sentence structure across bullets. Not every line needs to follow the “Action → Tool → Result” template. Mix in bullets like:
  - Maintained legacy JavaScript dashboard and resolved data sync issues with MongoDB
  - Created technical documentation for deployment scripts using Jenkins and Bash

- Company and project names should sound plausible. Avoid placeholders like “ABC Corp” or “Startup Inc.” Instead, use names like:
  - “HealthNet AI” (for healthcare)
  - “VantaLabs” (for a cloud startup)
  - “OpenSensor Project” (for OSS)

- If including percentages or metrics, keep them realistic and inconsistent — real resumes don't all use 30%, 40%, 50%. Sometimes just say “significantly improved,” or “reduced load times.”

- Include *at least one line* that suggests a real human wrote this: mention a hackathon, an open-source repo, a tech blog, or mentoring another dev.

- Leave room for imperfection. Not every section has to be perfectly symmetric. A real resume has slightly uneven section lengths, imperfect alignment, and honest variability.


==============================
CANDIDATE PROFILE
==============================
{person_profile}

==============================
STRUCTURED INPUT
==============================
{input_text}

==============================
STYLE + TONE GUIDELINES
==============================
{style}  
{tone}  
{order}
"""






# Function to generate resume text using an LLM based on the constructed description
def generate_resume(description):
    resume_text = get_response(
        input=description,          # Candidate prompt description
        template=resume_template,   # Prompt builder that includes randomness
        llm="llama",                # or "gemini"
        md=False,                   # Disable markdown formatting in output
        temperature=1,              # Higher temperature for more variation
        max_tokens=1000,            # Max token limit for the resume output
    )
    return resume_text

In [306]:
# Convert a Markdown-formatted resume to a styled PDF and save it to disk
def save_markdown_as_pdf(markdown_text, output_path):
    html = markdown(markdown_text, extras=["fenced-code-blocks", "cuddled-lists"])

    styled_html = f"""
    <html>
    <head>
    <style>
        @page {{
            margin: 0.2in;
        }}
        body {{
            font-family: "Times New Roman", serif;
            font-size: 11pt;
            line-height: 1;
            color: #000;
        }}
        hr {{
            border: none;
            border-top: 1px solid #999;
            margin: 16px 0;
        }}
        p {{
            margin: 2px 0;
        }}
        ul {{
            margin: 0 0 0 1.2em;
            padding-left: 0;
            list-style-type: disc;
        }}
        li {{
            margin: 0;
            padding: 0;
            line-height: 1;
            margin-bottom: 2px;
            margin-top: 2px
        }}
        strong {{
            font-weight: bold;
        }}
        em {{
            font-style: italic;
        }}
    </style>
    </head>
    <body>
    {html}
    </body>
    </html>
    """


    HTML(string=styled_html).write_pdf(output_path)


In [307]:
def extract_name_email(resume_text):
    """
    Extracts the first email address found in the resume text.
    Despite the function name, only the email is returned; the fallback is
    "unknown@example.com" when no address matches.
    """
    # Use regular expression to find the first matching email in the text
    email_match = re.search(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+", resume_text)
    email = email_match.group(0) if email_match else "unknown@example.com"

    return email


In [308]:
# Define your levels, roles, and specializations
seniority_levels = ["New-Graduate", "Entry-Level", "Mid-Level", "Senior", "Staff"]
specializations = ["Backend", "Frontend", "Cloud", "AI/ML", "Security", "Mobile Apps", "Big Data"]

# Define valid experience per seniority level
experience_ranges = {
    "New-Graduate": [0],
    "Entry-Level": [0, 1, 2],
    "Mid-Level": [3, 4, 5],
    "Senior": [5, 6, 7, 8],
    "Staff": [7, 8, 9, 10],
}
education_ranges = ["Bachelor's Degree", "Master's Degree", "PhD"]

# Specialization-based highlights
highlight_map = {
    "Backend": [
        "experience with large-scale distributed systems",
        "strong grasp of microservice architecture",
        "led backend service redesign for performance gains"
    ],
    "Frontend": [
        "specialist in UI/UX optimization",
        "expertise in cross-browser compatibility",
        "built accessible, WCAG-compliant applications"
    ],
    "Cloud": [
        "strong background in cloud computing",
        "led AWS migration initiatives",
        "developed infrastructure as code (IaC) using Terraform"
    ],
    "AI/ML": [
        "published research in machine learning",
        "deployed ML models at scale",
        "experience with deep learning frameworks"
    ],
    "Security": [
        "specialist in security protocols",
        "conducted vulnerability assessments",
        "implemented threat modeling practices"
    ],
    "Mobile Apps": [
        "developer of mobile applications",
        "launched top-rated app on iOS and Android",
        "focused on performance and battery optimization"
    ],
    "Big Data": [
        "designed big data pipelines",
        "optimized Spark jobs for large datasets",
        "experience with Hadoop and distributed file systems"
    ]
}

# Seniority-based highlights
seniority_highlights = {
    "New-Graduate": [
        "open-source contributor",
        "award-winning coder in hackathons",
        "completed multiple academic projects"
    ],
    "Entry-Level": [
        "participated in agile development teams",
        "worked on internship projects"
    ],
    "Mid-Level": [
        "heavy project-oriented experience",
        "mentored junior developers"
    ],
    "Senior": [
        "led engineering teams",
        "architected system-wide backend solutions"
    ],
    "Staff": [
        "drove cross-functional engineering initiatives",
        "set technical direction for multiple teams"
    ]
}

# Pick an appropriate highlight
def select_highlight(specialization, seniority):
    spec_h = highlight_map.get(specialization, [])
    level_h = seniority_highlights.get(seniority, [])
    combined = spec_h + level_h
    return random.choice(combined) if combined else "strong technical background"

# Create output directory
output_dir = "data/resumes/"
os.makedirs(output_dir, exist_ok=True)


num_resumes = 30
applications = []
# Generate resumes (you can change to how many you need later)
for i in range(1, num_resumes + 1):
    # Randomly simulate candidate attributes
    level = random.choice(seniority_levels)
    specialization = random.choice(specializations)
    years = random.choice(experience_ranges[level])
    highlight = select_highlight(specialization, level)
    education = random.choice(education_ranges)

    # Build a natural language description of the candidate
    description = (
        f"{level} Software Engineer specializing in {specialization}, "
        f"with {years} years of experience, with {education}, and known for {highlight}."
    )

    print(f"Generating resume {i} for {description}...")

    try:
        # Generate a UUID for this candidate
        candidate_id = str(uuid.uuid4())

        # Use LLM to generate resume text, then save it as a PDF
        resume_markdown = generate_resume(description)
        save_markdown_as_pdf(resume_markdown, os.path.join(output_dir, f"{candidate_id}.pdf"))

        email = extract_name_email(resume_markdown)

        # Assemble candidate profile data
        profile = {
            'candidate_id': candidate_id,
            'resume_file': f"{candidate_id}.pdf",
            'email': email,
            "location": random.choice(["New York, NY", "San Francisco, CA", "Austin, TX", "Seattle, WA", "Chicago, IL"]),
            'education': education,
            'years_experience': years
        }

        # Combine profile with simulated application data
        application = {
            **profile,
            'job_id': job_id,
            'application_date': (datetime.today() - timedelta(days=random.randint(0, 60))).strftime('%Y-%m-%d'),
            "source": random.choice(["LinkedIn", "Referral", "Indeed", "Company Website", "Campus Recruiting"])
        }


        # Save to list of applications
        applications.append(application)

    except Exception as e:
        print(f"Failed to generate resume {i}: {e}")

print(f"\n✅ Finished generating {len(applications)} of {num_resumes} diverse, professionally formatted resumes!")

df_applications = pd.DataFrame(applications)
df_applications.to_csv('data/applications.csv', index=False)

Generating resume 1 for New-Graduate Software Engineer specializing in Cloud, with 0 years of experience, with Bachelor's Degree, and known for open-source contributor....
Generating resume 2 for Mid-Level Software Engineer specializing in Big Data, with 3 years of experience, with Master's Degree, and known for optimized Spark jobs for large datasets....
Generating resume 3 for Staff Software Engineer specializing in Security, with 9 years of experience, with Master's Degree, and known for implemented threat modeling practices....
Generating resume 4 for Senior Software Engineer specializing in Frontend, with 6 years of experience, with PhD, and known for architected system-wide backend solutions....
Generating resume 5 for New-Graduate Software Engineer specializing in Mobile Apps, with 0 years of experience, with Bachelor's Degree, and known for completed multiple academic projects....
Generating resume 6 for Mid-Level Software Engineer specializing in Backend, with 3 years of exper

## Resume Generation Using GenAI

In this section, we use a large language model (LLM) to generate diverse, professional-quality resumes tailored to simulated candidate profiles. The resume generation pipeline is designed to reflect the real-world variation in tone, format, and content structure seen across applicants in a recruiting workflow. To show the tool's practicality, it is important to generate a very diverse set of resumes, even if some are not directly aligned with the job description. This captures the "real world" mix of job applicants.

### Prompt Design

To prompt the LLM, we dynamically construct a description of a candidate based on:
- **Seniority level** (e.g., New-Graduate, Senior, Staff)
- **Technical specialization** (e.g., Backend, Cloud, AI/ML)
- **Years of experience**
- **Education level**
- **A hand-crafted highlight** (mapped from specialization and seniority)

This description is passed into a templated prompt that randomly varies:
- Resume **style** (e.g., project-focused, minimalistic, startup-style)
- Resume **tone** (e.g., corporate, technical, dynamic)
- Section **ordering** (e.g., Education first, Skills first, etc.)

These stylistic variations are designed to reflect the wide variability in candidate resume formats seen in real hiring scenarios. Moreover, the LLM must be able to read different structures and extract the correct features when it builds candidate profiles later in the process.
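
The notebook draws these choices with the module-level `random.choice`, so every run produces a different mix. If reproducible batches are ever wanted, one option is to pass in a seeded `random.Random` instance. The sketch below assumes that approach; the helper `pick_style`, the shortened option lists, and the seed value are illustrative and not part of the original code.

```python
import random

# Shortened option lists, for illustration only
FORMATS = ["Skills-focused resume", "Project-focused resume", "Minimalistic resume"]
TONES = ["Use formal corporate language.", "Use dynamic, action-driven tone."]
ORDERS = ["Summary, Skills, Experience, Education", "Education, Experience, Skills"]

def pick_style(rng):
    """Draw one (format, tone, order) combination from the given random generator."""
    return rng.choice(FORMATS), rng.choice(TONES), rng.choice(ORDERS)

# Two generators built from the same seed yield the same combination
first = pick_style(random.Random(42))
second = pick_style(random.Random(42))
print(first == second)
```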

### Implementation Flow

1. **Loop over N candidate profiles**, randomly assigning seniority, specialization, years of experience, and educational background.
2. Construct a descriptive input string from these values.
3. Use `generate_resume(description)` to invoke the LLM and return a Markdown-formatted resume.
4. Extract the candidate's **email** from the resume text using a regular expression.
5. Assign a `candidate_id` and save the resume as a uniquely named `.pdf`.
6. Construct a corresponding application record, linking the resume to a `job_id`, a generated `application_date`, and an `application source`.
7. Store each application in a structured list for later export.
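
Step 4 can be sketched in isolation. The pattern below is the one used in `extract_name_email()`; the helper name `extract_email`, the `default` parameter, and the sample text are assumptions added for this example.

```python
import re

# Same pattern as in extract_name_email(), compiled once for reuse
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")

def extract_email(resume_text, default="unknown@example.com"):
    """Return the first email-like string in the text, or a placeholder if none is found."""
    match = EMAIL_RE.search(resume_text)
    return match.group(0) if match else default

sample = "**Jane Doe**\njanedoe@example.com | 555-123-4567 | linkedin.com/in/janedoe"
print(extract_email(sample))                  # janedoe@example.com
print(extract_email("no contact line here"))  # unknown@example.com
```

Because the prompt asks for the email on the second line of every resume, taking the first match is usually enough; the placeholder keeps the record well-formed when the LLM omits the contact line.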

We also simulate "application"-level data of the kind commonly seen in systems such as Workday. This supplies the additional candidate data that the recruitment analytics pipeline needs to produce better insights. No LLM is used for these fields; each value is a random selection from a fixed set of categorical options.
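
A minimal sketch of that random selection, using only the standard library. The helper `simulate_application`, its parameters, and the seeded generator are assumptions for illustration; the notebook itself builds the record inline with `random.choice` and `uuid.uuid4()`.

```python
import random
import uuid
from datetime import date, timedelta

SOURCES = ["LinkedIn", "Referral", "Indeed", "Company Website", "Campus Recruiting"]

def simulate_application(job_id, rng, today, max_age_days=60):
    """Build one synthetic application record without calling an LLM."""
    # Pick an application date between 0 and max_age_days before `today`
    applied_on = today - timedelta(days=rng.randint(0, max_age_days))
    return {
        # Derive the UUID from the generator so a seeded run is repeatable
        "candidate_id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        "job_id": job_id,
        "application_date": applied_on.strftime("%Y-%m-%d"),
        "source": rng.choice(SOURCES),
    }

record = simulate_application("job-123", random.Random(7), today=date(2025, 1, 31))
print(record)
```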

### Output

- **Markdown-to-PDF resumes** for each simulated candidate
- A list of structured application records saved to `applications.csv`
- Contextual metadata (e.g., candidate ID, resume file name, and job linkage)

This simulation pipeline enables the creation of a synthetic but realistic candidate funnel suitable for downstream GenAI evaluation, profile scoring, and job matching tasks.


# Team Member Generation

In [None]:
seniority_level = ['Junior', 'Senior', 'Staff', 'Lead']

num_years = {
    'Junior': [0, 1, 2],
    'Senior': [3, 4, 5],
    'Staff': [6, 7, 8],
    'Lead': [9, 10]
}

education_levels = ['Bachelors', 'Masters', 'PhD']

cloud_infra_skills = [
    "Go", "Python", "Terraform", "Kubernetes", "Docker", "AWS", 
    "GCP", "Linux Systems", "CI/CD", "Monitoring", "Distributed Systems"
]

cloud_projects = [
    "Provisioning Platform", "Deployment Orchestration", "Service Mesh",
    "Monitoring Pipeline", "Kubernetes Operator Framework", "Cost Optimization System"
]

non_cloud_specialties = [
    "Programming Languages & Paradigms",
    "Frontend & Web Development",
    "Mobile & Embedded Systems",
    "Data & Analytics",
    "AI / Machine Learning",
    "Security",
    "Software Process & Collaboration",
    "Other Specialties"
]

non_cloud_skills = {
    "Programming Languages & Paradigms": [
        "Java", "C++", "TypeScript", "Ruby", "Scala", "Rust",
        "Functional Programming", "Object-Oriented Design"
    ],
    "Frontend & Web Development": [
        "HTML", "CSS", "React", "Vue.js", "Angular",
        "Web Accessibility", "Responsive Design", "Web Performance Optimization"
    ],
    "Mobile & Embedded Systems": [
        "Swift", "Kotlin", "Flutter", "React Native", "Embedded C", "RTOS"
    ],
    "Data & Analytics": [
        "SQL", "PostgreSQL", "MySQL", "Pandas", "Apache Spark",
        "Data Warehousing", "PowerBI", "Tableau", "A/B Testing", "Data Modeling"
    ],
    "AI / Machine Learning": [
        "PyTorch", "TensorFlow", "HuggingFace Transformers",
        "NLP", "spaCy", "NLTK", "Model Evaluation", "Feature Engineering"
    ],
    "Security": [
        "TLS", "AES", "OWASP", "Penetration Testing",
        "Identity & Access Management", "Threat Modeling"
    ],
    "Software Process & Collaboration": [
        "Agile", "Scrum", "TDD", "Git", "GitHub Actions", "Jenkins",
        "Code Review", "Technical Documentation"
    ],
    "Other Specialties": [
        "Unity", "Unreal Engine", "3D Rendering",
        "Solidity", "Blockchain Development", "Game Development",
        "Technical Writing", "UX Design", "Human-Computer Interaction"
    ]
}

# Generate a realistic synthetic employee profile
def create_team_member():
    seniority = random.choice(seniority_level)
    years = random.choice(num_years[seniority])
    education = random.choice(education_levels)
    cloud_skills = random.sample(cloud_infra_skills, k=3)
    cloud_project = random.sample(cloud_projects, k=2)
    other_speciality = random.choice(non_cloud_specialties)
    other_skills = random.sample(non_cloud_skills[other_speciality], k=4)

    # Return a dictionary representing a structured team member profile
    return {
        'employee_id': str(uuid.uuid4()),
        'seniority_level': seniority,
        "years_experience": years,
        "education": education,
        "skills": cloud_skills + other_skills,
        "cloud_projects": cloud_project,
        "specialization": "Cloud Infrastructure & " + other_speciality
    }

In [None]:
# Step 1: Generate 10 structured team member profiles
team_profiles = [create_team_member() for _ in range(10)]
team_output_dir = "data/resumes_team/"
os.makedirs(team_output_dir, exist_ok=True)

# Step 2: Convert a profile into a natural-language LLM-ready description
def describe_team_member(profile):
    # Every team member works in cloud infrastructure, so draw the highlight from the
    # "Cloud" pool rather than from the `specialization` global left over from the resume loop
    highlight = select_highlight("Cloud", profile['seniority_level'])

    # Return a readable profile description string to feed to resume generation
    return (
        f"A current {profile['seniority_level']} Software Engineer in Cloud Infrastructure for Company XYZ with specializations in {profile['specialization']}. "
        f"This person has {profile['years_experience']} years of experience, holding a {profile['education']}. "
        f"Skills include {', '.join(profile['skills'])}. "
        f"Projects in-house include: {', '.join(profile['cloud_projects'])}. "
        f"This person is also known for {highlight}."
    )

# Step 3: Loop over team members and generate their resumes
counter = 1
for member in team_profiles:
    description = describe_team_member(member)
    try:
        resume_text = generate_resume(description)
        candidate_id = member['employee_id']
        pdf_path = os.path.join(team_output_dir, f"{candidate_id}.pdf")

        save_markdown_as_pdf(resume_text, pdf_path)
        print(f'Generating Resume {counter}')
        counter += 1

    except Exception as e:
        print(f"Failed to generate resume for {member['employee_id']}: {e}")


Generating Resume 1
Generating Resume 2
Generating Resume 3
Generating Resume 4
Generating Resume 5
Generating Resume 6
Generating Resume 7
Generating Resume 8
Generating Resume 9
Generating Resume 10


## Team Member Resume Generation

This section programmatically generates synthetic team member resumes to simulate realistic internal employee profiles. These profiles are used in downstream evaluation workflows for contextual candidate scoring.

### Purpose

- Populate the internal `employees` context pool for comparison against job applicants
- Generate Markdown-based resumes and convert them to PDF using realistic profile descriptions
- Provide structured input for LLM analysis (e.g., team summaries, team fit scoring)

### Implementation

1. `create_team_member()` is called 10 times; each call generates one team member profile with randomized:
   - Seniority level and years of experience
   - Education level
   - Core cloud infrastructure skills and projects
   - A secondary technical specialization and its associated skills

2. `describe_team_member(profile)` converts each structured profile into a rich, natural-language resume summary.

3. `generate_resume(description)` prompts an LLM to turn the profile description into a full Markdown resume.

4. `save_markdown_as_pdf()` renders each Markdown resume into a styled PDF and saves it to the `data/resumes_team/` folder.

5. All resumes are uniquely named using UUID-based employee IDs.
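
As an illustration of step 2, one way to turn a structured profile into a description is to build complete sentences and join them with single spaces, which avoids sentences running together when the pieces are concatenated. This is a sketch under assumptions: `describe_profile` and the example profile are hypothetical and simplified relative to `describe_team_member()`.

```python
def describe_profile(profile):
    """Join complete sentences with single spaces so none run together."""
    sentences = [
        f"A current {profile['seniority_level']} Software Engineer specializing in {profile['specialization']}.",
        f"This person has {profile['years_experience']} years of experience and holds a {profile['education']}.",
        f"Skills include {', '.join(profile['skills'])}.",
        f"In-house projects include {', '.join(profile['cloud_projects'])}.",
    ]
    return " ".join(sentences)

# Example profile with made-up values in the same shape as create_team_member() returns
example = {
    "seniority_level": "Senior",
    "years_experience": 4,
    "education": "Masters",
    "skills": ["Go", "Terraform", "Kubernetes"],
    "cloud_projects": ["Service Mesh", "Monitoring Pipeline"],
    "specialization": "Cloud Infrastructure & Security",
}
text = describe_profile(example)
print(text)
```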

### Output

- 10 PDF resumes saved to `data/resumes_team/`
- Each file is used to simulate current team members in candidate evaluation pipelines
