## **Cover Letter Generator**

This jupyter notebook contains code to help you generate custom-made cover letters for the job you're applying to.  I've personally found cover letters to be an effective way to get you at least through the first screen. If you find yourself feeling lazy, this is just the tool for you.  Feel free to customize the prompts as needed.   

An LLM matches required skills to your job skills and finds the appropriate experiences to highlight.  

**Technical architecture** 
- LLM -- I'm using "llama3-70b" through GroqCloud for both the cover letter writing as well as in extracting the skills and related experience from your resume.
- ChromaDB -- This is where we store your skills/related experiences as vectors to be extracted during the cover letter writing
- Selenium -- We'll need a webscraper to specifically scrape for job-specific information on Linkedin for example

**What you'll need**
- Job opening URL -- I've tested this on linkedin and Selenium seems to effectively avoid Linkedin blockers.
- Your resume -- We'll use the LLM to turn your skills and related experiences into a dataframe which we'll then store in a vectorDB. Save it in same folder as the jupyter notebook.  
- Your secret API key to your LLM of choice.  I'm using one of the Llama models through GroqCloud which can provide you with an API.
- Optional: Work portfolio -- Bonus points if you have for example a github profile where you have a number of repos.  This notebook is customizable so you can have the LLM extract from your portfolio vs skills/experience from your resume

### 0. Import libraries and dependencies

In [None]:
import os
import chromadb
import time
import docx
import pandas as pd
import json
import uuid


from langchain_groq import ChatGroq
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

os.environ["TOKENIZERS_PARALLELISM"] = "false"

#### Insert your Job-URL here

In [245]:
# feel free to replace
job_url = "https://www.linkedin.com/jobs/view/4211854415/?alternateChannel=search&refId=DTQWg1nCcUJcgSpKXjwE%2Bg%3D%3D&trackingId=wb2Y3vvXFRD72Kiqdghr3g%3D%3D"
SECRET_API_KEY= "YOUR LLM API KEY HERE"

### 1. Technical Architecture

#### LLM: Llama3-70B
By now, just about everyone should know about LLMs.  Here I decided to use Llama3-70B, mostly because I wanted to try GroqCloud which has many other LLMs as options. 

In [247]:
llm = ChatGroq(
    model = "llama3-70b-8192",
    temperature = .05,  # from 0-1, from more precise to more 'creative'
    max_tokens = 1000,
    timeout=None,
    max_retries=2,
    api_key = SECRET_API_KEY  # create account on Groqlcloud or other through other LLM service providers
)
    
response = llm.invoke("Who was the first man on the moon?")
print(response.content)   

The first man to set foot on the moon was Neil Armstrong. He stepped out of the lunar module Eagle and onto the moon's surface on July 20, 1969, during the Apollo 11 mission.

Armstrong famously declared, "That's one small step for man, one giant leap for mankind," as he became the first human to set foot on another celestial body.

He was followed by fellow astronaut Edwin "Buzz" Aldrin, who also walked on the moon during the mission. Michael Collins remained in orbit around the moon in the command module Columbia.

Armstrong's historic moonwalk lasted about two and a half hours, during which he collected samples and conducted experiments. The Apollo 11 mission was a groundbreaking achievement that marked a major milestone in space exploration.


#### VactorDB: ChromaDB

Llama, care to describe this for us?

In [250]:
response = llm.invoke("what is a vectordb like chromadb?  Keep it short")
print(response.content)   

VectorDBs like ChromaDB are databases optimized for storing and querying dense vector data, such as those used in AI/ML models. They enable fast similarity searches, clustering, and nearest neighbor lookups.

Key features:

* Store and manage large collections of dense vectors
* Support efficient similarity searches (e.g., cosine similarity, Euclidean distance)
* Enable fast querying and filtering of vectors
* Scalable and optimized for performance

ChromaDB, in particular, is a vector database that allows for real-time vector search, clustering, and filtering, making it suitable for applications like computer vision, natural language processing, and recommender systems.


----
To briefly illustrate how this works, the code block below sets up a simple search system using ChromaDB. First, it creates a ChromaDB client and makes a new collection. Then it adds two short dummy documents about Manila and Hongkong to the collection, giving each one an ID. Behind the scenes, ChromaDB turns these documents into embeddings(i.e. numerical representation of the text) that capture their meaning. When you search with the query "Query is about Philippines," ChromaDB compares this query's pattern to the stored documents' patterns and returns the closest matches. Since Manila is in the Philippines, the system should rank the Manila document higher in the results.  And it does -- the 'distances' key shows the 'Manila document' has a 'distance' of 0.52 to the query vs 'HK's 1.07.

In [254]:
client=chromadb.PersistentClient("chromadb_example")
collection =client.create_collection(name = 'my_collection')
collection.add(
    documents = [
        "This document is about Manila",
        "This document is about Hongkong"
    ],
    ids = ['id1', 'id2']
)

results= collection.query(
    query_texts=['Query is about Philippines'],
    n_results=2
)
results

{'ids': [['id1', 'id2']],
 'embeddings': None,
 'documents': [['This document is about Manila',
   'This document is about Hongkong']],
 'uris': None,
 'included': ['metadatas', 'documents', 'distances'],
 'data': None,
 'metadatas': [[None, None]],
 'distances': [[0.529497504234314, 1.0765970945358276]]}

#### Webscraper: Selenium

Here we'll use Selenium to scrape the job-URL. For this one i needed help from AI to help create a function to look for and scrape just the "About the Job" section of Linkedin's job pages.  Selenium is more flexible this way  vs others like WebBasedLoader.  For other purposes, like finding more mundane information, you wont need something like this.  

In [224]:
# Define chrome_options before using it
chrome_options = webdriver.ChromeOptions()
# Install and setup ChromeDriver automatically
service = Service(ChromeDriverManager().install())

# Use the service when creating the driver
driver = webdriver.Chrome(service=service, options=chrome_options)

In [340]:
def scrape_linkedin_job(job_url):
    # Set up Chrome options
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # Run in headless mode
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--window-size=1920,1080")
    chrome_options.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36")
    
    # Initialize the driver
    driver = webdriver.Chrome(options=chrome_options)
    
    try:
        # Navigate to the job URL
        driver.get(job_url)
        
        # Wait for the page to load
        time.sleep(5)
        
        # Wait for the job description to be visible
        wait = WebDriverWait(driver, 10)
        job_description = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".show-more-less-html__markup")))
        
        # Get the HTML content
        html_content = driver.page_source
        
        # Parse with BeautifulSoup
        soup = BeautifulSoup(html_content, 'html.parser')

        # Extract job title - LinkedIn typically uses h1 with class 'top-card-layout__title'
        job_title_element = soup.select_one("h1.top-card-layout__title")
        job_title = job_title_element.get_text(strip=True) if job_title_element else "Job title not found"
        
        # Find the job description section
        # LinkedIn structure may change, so you might need to adjust these selectors
        job_description = soup.select_one(".show-more-less-html__markup")
        
        if job_description:
            # Extract text from the job description
            job_text = job_description.get_text(strip=True)
            return {"title": job_title, "description": job_text}
        else:
            # Try alternative selectors if the first one fails
            job_description = soup.select_one(".description__text")
            if job_description:
                job_text = job_description.get_text(strip=True)
                return {"title": job_title, "description": job_text}
            else:
                return {"title": job_title, "description": "Could not find job description section."}
    
    
    except Exception as e:
        return f"Error: {str(e)}"
    
    finally:
        # Close the browser
        driver.quit()



In [342]:
# Usage
job_description = scrape_linkedin_job(job_url)
print(job_description)

{'title': 'Product Strategy and Operations Lead', 'description': "Note: By applying to this position you will have an opportunity to share your preferred working location from the following:San Francisco, CA, USA; Seattle, WA, USA; Sunnyvale, CA, USA.Minimum qualifications:Bachelor's degree or equivalent practical experience.8 years of experience in management consulting, product management and strategy, or analytics in a technology company.3 years of experience in Financial Modeling and Analysis and 3 years of Cloud experience.Experience working with and analyzing data, and managing multiple projects.Preferred qualifications:Advanced degree.Experience working with product and engineering teams.Familiarity with cloud infrastructure or SaaS commercial programs and business models.Ability to be entrepreneurial to source data and setup internal mechanisms to store and track.Excellent communication skills.About The JobProduct and Business Strategy Leaders bring together teams across Google

---
Mission accomplished.  We now have the job description in unstructured format.  Onto the next stage...

### 2. Extract key job data

We're going to create a prompt to do this which in turn we'll need to set up the prompt for the creation of the cover letter.

The steps:
1. Create prompt, add to llm to form a chain (in Langchain, a chain is a way to connect different parts to do tasks)
2. Invoke chain with inputs -- in this case, the job description
3. Convert output from 2 into a readable format (JSON) so we can extract the necessary data for later use.  

#### Prompt, Chain, Invoke, Convert

In [344]:
prompt_extract = PromptTemplate.from_template(
    """
    ### SCRAPE TEXT FROM WEBSITE
    {job description}
    ### INSTRUCTIONS:
    1. The scraped text is from a web page detailing a job opportunity.
    2. Your job is to extract the job posting and return in JSON format containing the
    following keys: `company`,`role`, `experience`, `required skills`,`description`, `expected payrange`.
    3. Do not create new keys
    4. Only return the valid JSON. NO PREAMBLE

    """
)

chain_extract = prompt_extract | llm

res_extract = chain_extract.invoke(input= {'job description':job_description})
#print(res_extract.content)

# Turns out the output is intype string so we still need to convert to JSON
json_extract = JsonOutputParser()
json_res = json_extract.parse(res_extract.content)
json_res

{'company': 'Google',
 'role': 'Product Strategy and Operations Lead',
 'experience': '8 years of experience in management consulting, product management and strategy, or analytics in a technology company. 3 years of experience in Financial Modeling and Analysis and 3 years of Cloud experience.',
 'required skills': ["Bachelor's degree or equivalent practical experience",
  'Experience working with and analyzing data, and managing multiple projects'],
 'description': 'Product and Business Strategy Leaders bring together teams across Google’s functions to help products execute optimally. Our team pushes Google to scale at key points that refine our products and infrastructure by executing efficiently, bringing solid business sense and sound judgment, and working effectively across organizational lines.',
 'expected payrange': '$144,000-$211,000 + bonus + equity + benefits'}

#### Extract data

~~Observation: Still not perfect.  since i think the official role is "Product Strategy and Operations Lead".  Selenium function requires some fine-tuning.~~ Fixed it. 

In [347]:
company_name = json_res['company']
job_title = json_res['role']
job_skills = json_res['required skills']

print(f"Company:{company_name}\nJob:{job_title}\nSkills needed:{job_skills}")

Company:Google
Job:Product Strategy and Operations Lead
Skills needed:["Bachelor's degree or equivalent practical experience", 'Experience working with and analyzing data, and managing multiple projects']


### 3. Scrape relevant skills and experiences from your resume

This step mirrors the previous one.  Instead of pulling data from a website, we'll pull data from the cv on file.  From that standpoint, it's a lot more straightforward -- no need for a tool like Selenium.  Just create a prompt that intructs the LLM to pull key information and output in a data frame.  We'll use the data frame later to save onto ChromaDB.   

#### Load and read resume

In [227]:
# create user-defined function that will read the resume in word format
def read_docx(file_path):
    # Open the document
    doc = docx.Document(file_path)
    
    # Extract text from paragraphs
    full_text = []
    for para in doc.paragraphs:
        full_text.append(para.text)
    
    # Join all paragraphs with newlines
    return '\n'.join(full_text)

# Example usage
file_path = 'raymondzialcita_resume_jan25.docx'
cv = read_docx(file_path)
print(cv)

RAYMOND ZIALCITA
San Jose, CA 95118
408-425-4297 | raymondbzialcita@gmail.com

DIRECTOR OF STRATEGY AND MARKET INTELLIGENCE
Corporate Strategy| Market Research and Intelligence | Competitive Analysis | Data Analysis
An accomplished and performance-driven senior executive with a long history of developing business strategies through market and competitive intelligence research and analysis. Expertise lies leading market research projects, consulting teams find new areas of growth, and rounding out product and business strategy setting with go-to-market and tactical execution. Passionate about leveraging data-driven insights to help businesses thrive and make a lasting impact on the market.  Proficient in Data Science tools like Excel, Python, R, PowerBI and SQL.
KEY SKILLS
Strategic and Analytical Thinker | Seasoned | Confident Presenter | Primary Research Interviewer | Proficient In AI/ML/Data Science Tools Like Azure ML Studio, Data Robot, R And Python Programming Languages and PowerB

#### Prompt, Chain, Invoke, Convert 

In [156]:
# Better prompt that explicitly asks for JSON output
prompt_cv = PromptTemplate.from_template("""
    ### CV TEXT:
    {cv}
    
    ### INSTRUCTIONS:
    Extract skills and relevant work experiences from this CV.
    
    For each skill, provide a concise description.
    For experiences, combine all relevant experience into a single descriptive paragraph for each skill.
    
    Return the data as a JSON object with:
    1. "skills": A list of skill names
    2. "experiences": A list of corresponding experience paragraphs that match each skill
    
    Make sure both lists have the same length and indexes correspond (skill[0] matches with experience[0], etc.)
    
    Format your response as a valid JSON object only, with no additional text.
    """
)


# Create and run the chain
chain_cv = prompt_cv | llm

# Process the CV
def analyze_cv(cv_text):
    # Get the LLM response
    response = chain_cv.invoke(input={'cv': cv_text})
    
    try:
        # Convert the LLM's JSON response to a Python dictionary
        data = json.loads(response.content)
        
        # Create DataFrame from the extracted data
        df = pd.DataFrame({
            'skills': data['skills'],
            'relevant_experience': data['experiences']
        })
        
        return df
    
    except json.JSONDecodeError:
        # If the LLM didn't return valid JSON, print the response
        print("LLM did not return valid JSON. Raw response:")
        print(response.content)
        
        # Create a simple fallback DataFrame
        return pd.DataFrame({
            'skills': ["Parsing Error"],
            'relevant_experience': ["Could not parse LLM response into valid JSON."]
        })

# Example usage
result_df = analyze_cv(cv)
print(result_df)

                              skills  \
0   Strategic and Analytical Thinker   
1                Confident Presenter   
2       Primary Research Interviewer   
3                      Data Analysis   
4           AI/ML/Data Science Tools   
5               Competitive Analysis   
6                    Market Research   
7                  Business Strategy   
8                     Data Valuation   
9                       Market Watch   
10              Data-Driven Insights   
11                        Leadership   

                                  relevant_experience  
0   Developed business strategies through market a...  
1   Presented insights to senior management, inclu...  
2   Conducted end-to-end research from interviews ...  
3   Proficient in data analysis using tools like E...  
4   Proficient in AI/ML/Data Science tools like Az...  
5   Conducted comprehensive strategic analysis of ...  
6   Led market and competitive intelligence resear...  
7   Developed business strategi

In [None]:
# We'll need as input for the prompt for the cover letter
candidate_skills = result_df['skills'].values.tolist()

### 4. Load candidate skills and relevant experience onto ChromaDB


In [321]:
# Deleting to clean out from previous runs
#client.delete_collection(name="vectorstore")

# Initialize
client = chromadb.PersistentClient('vectorstore')
collection = client.get_or_create_collection(name='my_cv')

if not collection.count():
    for _, row in result_df.iterrows():
        collection.add(documents=row['skills'],
                       metadatas={'experiences': row['relevant_experience']},
                                   ids=[str(uuid.uuid4())])

Verify if data has been loaded

In [323]:
collection.get()

{'ids': ['bb31a3bd-4938-4579-8040-559e1114e741',
  'a58f663f-93f5-4f4e-9512-9806739990a3',
  'cee29569-6513-4b31-9527-4bc2e6ba2d7a',
  'a850e944-6d06-4edd-8ad0-852b86e08e83',
  '7558e77e-71f2-46c1-b247-5a1526449853',
  '0de72de1-d388-4322-a551-d145e0689889',
  'b79fcd95-d100-485a-afad-90fe5ee8370b',
  '7382899f-5763-4d9d-977f-b2e08e16a86a',
  'b7a89ab5-4092-4e33-bccc-d87841f668a8',
  '142ebed0-69dc-4451-84ed-e8aed7d63e25',
  '644998a3-1043-4515-9d6b-2a8994ddf2cf',
  'b0aa07a6-f120-4d3f-922b-6ce2d6958485'],
 'embeddings': None,
 'documents': ['Strategic and Analytical Thinker',
  'Confident Presenter',
  'Primary Research Interviewer',
  'Data Analysis',
  'AI/ML/Data Science Tools',
  'Competitive Analysis',
  'Market Research',
  'Business Strategy',
  'Data Valuation',
  'Market Watch',
  'Data-Driven Insights',
  'Leadership'],
 'uris': None,
 'included': ['metadatas', 'documents'],
 'data': None,
 'metadatas': [{'experiences': "Developed business strategies through market and compe

Test to see if queries are working

In [328]:
exp = collection.query(query_texts=["Experience in strategy setting"], n_results= 1).get('metadatas',[])
exp

[[{'experiences': 'Developed business strategies through market and competitive intelligence research and analysis, leveraging data-driven insights to help businesses thrive and make a lasting impact on the market. Led a team of analysts to solve specific division business problems, including reducing desktop forecast errors and desktop/server transfer costs through analytics.'}]]

### 5. Generate custom-made cover letter

You know the drill.  Prompt, add to chain, invoke.

In [353]:
prompt_cover = PromptTemplate.from_template(
    """
    
    You are an expert career coach with extensive experience writing compelling cover letters.
    
    Create a tailored cover letter for a {job_title} position at {company_name}.
    
    JOB DESCRIPTION KEY SKILLS:
    {job_skills}
    
    CANDIDATE'S RELEVANT SKILLS AND EXPERIENCE:
    {candidate_skills}
    
    
    Write a personalized cover letter that:
    1. Opens with a compelling introduction showing enthusiasm for this specific role
    2. Highlights 2-3 clear connections between the job requirements and the candidate's experience
    3. Uses concrete examples and achievements with measurable results when possible
    4. Demonstrates knowledge of the company's recent achievements or values
    5. Closes with a confident call to action
    6. Maintains a length of 3-4 paragraphs (under one page)
    7. Uses a professional but conversational tone
    8. NO PREAMBLE, just output the cover letter
    9. No need for the block containing "Hiring manager, address, etc"
    10.  Instead of an example data, create a placeholder that says "[INSERT TODAY's DATE]"
    
    FORMAT THE LETTER APPROPRIATELY WITH DATE, GREETING, AND SIGNATURE.
        
    """
)

# Create and run the chain
chain_cover = prompt_cover | llm

# Invoke with inputs
result_cover = chain_cover.invoke(
    input = {"company_name": company_name,
             "job_title": job_title,
             'job_skills': job_skills,
             'candidate_skills': candidate_skills
            }
)  

In [355]:
print(result_cover.content)

[INSERT TODAY's DATE]

Dear Hiring Team,

I'm thrilled to apply for the Product Strategy and Operations Lead role at Google, where I can leverage my analytical mindset, strategic thinking, and leadership skills to drive business growth and innovation. As a long-time admirer of Google's commitment to using technology to solve complex problems, I'm excited about the opportunity to contribute to the company's mission to organize the world's information and make it universally accessible and useful.

With my background in data analysis, market research, and business strategy, I'm confident in my ability to excel in this role. My experience as a Primary Research Interviewer has equipped me with the skills to gather and analyze data, identify patterns, and develop actionable insights. In my previous role, I successfully applied data-driven insights to inform business strategy, resulting in a 25% increase in revenue growth. I'm excited to bring this expertise to Google and drive similar resul

### You probabaly still have to wordsmith.  I've used it a few times and and it certainly speeds things up for me.  As for whether it actually helps you land the job, that isn't the purpose of this exercise really.  Good luck!  