### This notebook dives into building synthetic data for the resume editor

I scraped ~400 example resumes; however, they don't have metrics like a real resume should. So I'm going to feed each bullet into an LLM and ask it to add metrics. The bullet, paired with some general context like company name, job title, and years of experience, should let the LLM produce a good bullet and come up with a story to support it.
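
As a rough illustration of that idea, the sketch below combines one bullet with its role context into a single prompt string. The template wording and the `build_bullet_prompt` helper are hypothetical; the actual prompt is read from `../prompts/synthetic_bullet_builder.prompt` and is not reproduced here. The example values are borrowed from one of the sampled resumes printed later in this notebook.

```python
# Hypothetical prompt wording, for illustration only; not the notebook's real prompt file.
BULLET_PROMPT = (
    "You are helping improve a resume.\n"
    "Role context: {user_summary}\n"
    "Original bullet: {bullet_point}\n"
    "Rewrite the bullet so it includes a plausible, quantified metric."
)


def build_bullet_prompt(company, title, duration, bullet):
    """Combine role context and one bullet into a single prompt string."""
    user_summary = f"{title} at {company} ({duration})"
    return BULLET_PROMPT.format(user_summary=user_summary, bullet_point=bullet)


prompt = build_bullet_prompt(
    company="Grand Plaza Hotel",
    title="Chief Event Supervisor",
    duration="March 2013 to October 2016",
    bullet="Trained a team of other employees in proper hospitality etiquette.",
)
print(prompt)
```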

In [17]:
import streamlit as st
import openai
from langchain.document_loaders import PyPDFLoader
from dotenv import load_dotenv
import os
from PyPDF2 import PdfFileReader
from pypdf import PdfReader
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
import json

# Load environment variables containing API keys
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
os.environ["LANGCHAIN_API_KEY"] = str(os.getenv("LANGCHAIN_API_KEY"))
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_PROJECT"] = "elevated_ambitions"

def analyze_resume(full_text, model='gpt-3.5-turbo-1106'):
    """
    Analyze a resume text and extract structured information using a specified language model.

    Parameters:
    full_text (str): The text content of the resume.
    model (str): The language model to use for processing the text.

    Returns:
    dict: A dictionary containing structured information extracted from the resume.
    """
    # Load the prompt template and response template for resume analysis
    with open("../prompts/resume_extraction.prompt", "r") as f:
        template = f.read()
    with open("../templates/scale_profile_template.json", "r") as f:
        resume_template = f.read()

    # Format the input for the language model
    prompt_template = PromptTemplate(template=template, input_variables=['resume', 'response_template'])
    formatted_input = prompt_template.format(resume=full_text, response_template=resume_template)

    # Invoke the language model and process the resume
    chat_llm = ChatOpenAI(model=model)
    analysis_output = chat_llm.invoke(formatted_input)

    return json.loads(analysis_output.content)

def upgrade_experience_bullet(user_experience, bullet, model='gpt-3.5-turbo-1106'):
    """
    Enhance a bullet point in a user's experience section using a language model.

    Parameters:
    user_experience (dict): A dictionary containing details of a user's experience.
    bullet (str): The bullet point to be enhanced.
    model (str): The language model to use for enhancement.

    Returns:
    str: The enhanced bullet point.
    """
    # Load the bullet enhancement template
    with open("../prompts/synthetic_bullet_builder.prompt", "r") as f:
        template = f.read()

    # Format the input for the language model
    prompt_template = PromptTemplate(template=template, input_variables=['user_summary', 'bullet_point'])
    formatted_input = prompt_template.format(user_summary=user_experience, bullet_point=bullet)

    # Invoke the language model and enhance the bullet point
    chat_llm = ChatOpenAI(model=model)
    analysis_output = chat_llm.invoke(formatted_input)

    return analysis_output.content

def upgrade_resume_bullets(extracted_resume):
    """
    Iterate through the work experience in a resume and upgrade each bullet point.

    Parameters:
    extracted_resume (dict): A dictionary containing a structured resume.

    Returns:
    dict: The resume dictionary with enhanced bullet points in the work experience section.
    """
    # Enhance bullet points for each work experience entry
    for experience in extracted_resume['work_experience']:
        # Skip the experience if there are no achievements
        if not experience['achievements']:
            continue
        # Build a short context string; missing or null fields become empty strings
        experience_desc = ' '.join(
            str(experience.get(field) or '') for field in ['company', 'title', 'duration', 'description']
        )
        for i, bullet in enumerate(experience['achievements']):
            # Skip empty bullets
            if bullet == '':
                continue
            experience['achievements'][i] = upgrade_experience_bullet(experience_desc, bullet)

    return extracted_resume

def generate_questions(user_profile, model='gpt-3.5-turbo-1106'):
    """
    Generate interview questions based on a user's profile using a language model.

    Parameters:
    user_profile (dict): A dictionary containing the user's profile information.
    model (str): The language model to use for question generation.

    Returns:
    dict: A dictionary containing generated interview questions.
    """
    # Load the question generation template and response template
    with open("../prompts/question_generation.prompt", "r") as f:
        question_template = f.read()
    with open("../templates/profile_interview_template.json", "r") as f:
        profile_template = f.read()

    # Format the input for the language model
    prompt_template = PromptTemplate(template=question_template, input_variables=['user_profile', 'response_template'])
    formatted_input = prompt_template.format(user_profile=user_profile, response_template=profile_template)

    # Invoke the language model and generate questions
    chat_llm = ChatOpenAI(model=model)
    response = chat_llm.invoke(formatted_input)

    return json.loads(response.content)

def generate_synthetic_responses(user_profile, questions, model='gpt-3.5-turbo-1106'):
    """
    Generate synthetic interview responses for a set of questions based on a user's profile.

    Parameters:
    user_profile (dict): A dictionary containing the user's profile information.
    questions (dict): A dictionary containing interview questions.
    model (str): The language model to use for generating responses.

    Returns:
    dict: A dictionary containing synthetic responses to the interview questions.
    """
    # Load the interview response prompt and the response template
    with open("../prompts/synthetic_interview_responses.prompt", "r") as f:
        response_prompt = f.read()
    with open("../templates/profile_interview_template.json", "r") as f:
        interview_template = f.read()

    # Format the input for the language model
    prompt_template = PromptTemplate(template=response_prompt, input_variables=['user_profile', 'questions', 'template'])
    formatted_input = prompt_template.format(user_profile=user_profile, questions=questions, template=interview_template)

    # Invoke the language model and generate responses
    chat_llm = ChatOpenAI(model=model)
    response = chat_llm.invoke(formatted_input)

    return json.loads(response.content)
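
Several of the helpers above call `json.loads` directly on the model's reply. That raises an error if the model wraps its JSON in a Markdown code fence, which can happen. A minimal sketch of a more forgiving parser follows; `parse_llm_json` is a hypothetical helper, not part of this notebook or of LangChain, and it only handles the fenced case where the opening fence sits on its own line.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker


def parse_llm_json(text):
    """Parse JSON from a model reply, tolerating an optional Markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag such as "json")
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # Drop the closing fence if present
        if cleaned.rstrip().endswith(FENCE):
            cleaned = cleaned.rstrip()[: -len(FENCE)]
    # Still raises json.JSONDecodeError if the remaining text is not valid JSON
    return json.loads(cleaned)


plain = parse_llm_json('{"education": []}')
fenced = parse_llm_json(FENCE + 'json\n{"education": []}\n' + FENCE)
```

With something like this in place, `return json.loads(response.content)` could become `return parse_llm_json(response.content)`.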

In [2]:
import pickle

with open('resume_texts.pkl', 'rb') as f:
    resume_texts = pickle.load(f)



In [3]:
# take a random sample of 3 non-empty resumes
import random
sample_resumes = random.sample([resume for resume in resume_texts if resume != ''], 3)
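
The sample above changes on every run. If a repeatable sample is useful, for example to compare prompts on the same resumes, one option is a seeded generator. This is only a sketch: `sample_non_empty`, the demo list, and the seed value are illustrative and not part of the notebook.

```python
import random


def sample_non_empty(texts, k, seed=0):
    """Return k non-empty texts, chosen reproducibly for a given seed."""
    non_empty = [text for text in texts if text != '']
    rng = random.Random(seed)  # local generator, leaves the global random state untouched
    return rng.sample(non_empty, k)


demo_texts = ['resume a', '', 'resume b', 'resume c', '', 'resume d']
picked = sample_non_empty(demo_texts, k=3, seed=42)
```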

In [4]:
sample_resumes

['Drew CallaghanProfessional Event Coordinator936-536-8346Callaghandrew@hotmail.comLinkedin.com/in/drewcallaghan\xa0Professional Summary\xa0Meticulous event coordinator with 5 years of event management experience in the hospitality industry. Maintained a 5-star record across 2.5 years of working at Grand Plaza Hotel as Chief Event Supervisor. Established robust vendor and sponsorship deals, reducing event organization costs by 15%. Looking to join IndigoChic to expand on my existing skillset and further elevate the company’s already renowned reputation.\xa0Work Experience\xa0Chief Event SupervisorGrand Plaza Hotel, Dallas, TXMarch 2013–October 2016Ensured top quality service for over 400 events hosted by the hotel.Assisted organizers in picking the most appropriate venue, food menu, and décor, depending on number of attendees (25–600).Trained a team of other employees in proper hospitality etiquette.Assisted in preparing marketing and promotional material to generate more interest.Deve

In [18]:
# now that we have our samples, let's upgrade each of them
upgraded_profiles = []
for resume in sample_resumes:
    upgraded_profiles.append(upgrade_resume_bullets(analyze_resume(resume)))
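
In the loop above, a single bad reply (for example, output that is not valid JSON) stops the whole batch. A hedged sketch of a more tolerant loop is shown below. `upgrade_all` is a hypothetical helper, and `fake_analyze` and `fake_upgrade` are stand-ins for `analyze_resume` and `upgrade_resume_bullets` so the example runs without API calls.

```python
import json


def upgrade_all(resumes, analyze_fn, upgrade_fn):
    """Run analyze_fn then upgrade_fn on each resume, collecting failures instead of stopping."""
    upgraded, failures = [], []
    for index, resume in enumerate(resumes):
        try:
            upgraded.append(upgrade_fn(analyze_fn(resume)))
        except Exception as error:  # broad on purpose: parsing, key, and API errors all land here
            failures.append((index, repr(error)))
    return upgraded, failures


def fake_analyze(text):  # stand-in for analyze_resume
    return json.loads(text)


def fake_upgrade(profile):  # stand-in for upgrade_resume_bullets
    profile["upgraded"] = True
    return profile


upgraded, failures = upgrade_all(['{"name": "A"}', 'not json'], fake_analyze, fake_upgrade)
```

In the notebook this would be called as `upgrade_all(sample_resumes, analyze_resume, upgrade_resume_bullets)`, and `failures` shows which resumes to retry.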

Now that we have the upgraded resumes, we can generate questions for each area of the profile.

We then need a way to generate answers to those questions. We can do this by asking the LLM to answer each question.
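
The two steps can be chained per profile, roughly as sketched here. `build_interviews` is a hypothetical helper; the two stub functions stand in for `generate_questions` and `generate_synthetic_responses` so the example runs offline, and their return shapes are made up for the demo.

```python
def build_interviews(profiles, question_fn, respond_fn):
    """For each profile, generate questions and then responses to those questions."""
    interviews = []
    for profile in profiles:
        questions = question_fn(profile)
        interviews.append(respond_fn(profile, questions))
    return interviews


def stub_questions(profile):  # stand-in for generate_questions
    return {"question_1": f"Tell me about your time at {profile['company']}."}


def stub_responses(profile, questions):  # stand-in for generate_synthetic_responses
    return {"questions": questions, "response_1": f"At {profile['company']} I ..."}


demo = build_interviews([{"company": "Acme"}, {"company": "Globex"}], stub_questions, stub_responses)
```

With the real functions, the call would be `build_interviews(upgraded_profiles, generate_questions, generate_synthetic_responses)`.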

In [20]:
questions = []
for resume in upgraded_profiles:
    questions.append(generate_questions(resume))


In [24]:
output = generate_synthetic_responses(upgraded_profiles[0], questions[0], model='gpt-4')
# print(f'Profile: {upgraded_profiles[0]} \n Questions: {questions[0]}\n Output: {output}')

{
  "education": [
    {
      "interview_responses": {
        "question_1": "During your Bachelor's in Management at the University of Richmond, what were some key projects or coursework that had a significant impact on your understanding of event coordination and management?",
        "response_1": "BLUF: My studies in Management, specifically courses in Project Management and Organizational Behavior, provided a strong foundation for my career in event coordination. Situation: While pursuing my Bachelor's in Management, I had the opportunity to work on several projects that allowed me to apply theoretical knowledge in a practical setting. Task: In my Project Management course, I was assigned to organize a fundraising event for a local charity. Action: I was responsible for budgeting, vendor management, and ensuring a smooth execution of the event. Result: This experience honed my problem-solving skills, attention to detail, and ability to work under pressure, all of which have been 