In [14]:
%pip install pdfplumber pytesseract pdf2image

Note: you may need to restart the kernel to use updated packages.


In [15]:
import pdfplumber
import pytesseract
from pdf2image import convert_from_path
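`pytesseract` is only a wrapper around the Tesseract OCR engine, and `pdf2image` relies on Poppler's command-line tools. Neither binary is installed by `%pip`. A minimal sketch to check that they are on `PATH` before relying on the OCR fallback, assuming the default executable names `tesseract` and `pdftoppm` (`check_ocr_dependencies` is a hypothetical helper name):

```python
import shutil


def check_ocr_dependencies(executables=("tesseract", "pdftoppm")):
    """Return {executable_name: True/False} depending on whether it is on PATH.

    Assumes the default executable names; custom installs may live elsewhere.
    """
    return {name: shutil.which(name) is not None for name in executables}


status = check_ocr_dependencies()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'NOT found on PATH'}")
```

If a binary is installed but not on `PATH` (common on Windows), you can point the libraries at it explicitly instead, via `pytesseract.pytesseract.tesseract_cmd` and the `poppler_path` argument of `convert_from_path`.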

In [16]:
def extract_text_from_pdf(pdf_path):
    """Extract text from a PDF, falling back to OCR for image-based PDFs."""
    text = ""
    try:
        # Try direct text extraction first (works for text-based PDFs)
        with pdfplumber.open(pdf_path) as pdf:
            for page in pdf.pages:
                page_text = page.extract_text()
                if page_text:
                    # Newline between pages so words on adjacent pages don't merge
                    text += page_text + "\n"
        if text.strip():
            return text.strip()
    except Exception as e:
        print(f"Direct text extraction failed: {e}")

    # Fall back to OCR for image-based PDFs
    print("Falling back to OCR for image-based PDF")
    text = ""  # discard any partial output from the failed direct attempt
    try:
        images = convert_from_path(pdf_path)
        for image in images:
            page_text = pytesseract.image_to_string(image)
            text += page_text + "\n"
    except Exception as e:
        print(f"OCR failed: {e}")
    return text.strip()
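The function above switches to OCR only when the whole document yields no text. A variation, sketched here under the assumption that you want to decide page by page (for example, a PDF where only one page is a scan), takes the extraction steps as plain callables so the logic can be tested without pdfplumber or Tesseract installed. `extract_pages_with_fallback`, `min_chars`, and the callables are hypothetical names, and the 20-character threshold is an arbitrary assumption.

```python
from typing import Callable, List, Sequence


def extract_pages_with_fallback(
    pages: Sequence[object],
    direct: Callable[[object], str],
    ocr: Callable[[object], str],
    min_chars: int = 20,
) -> str:
    """Hypothetical helper: direct extraction per page, OCR when a page looks empty.

    `direct` and `ocr` are caller-supplied functions; `min_chars` is an assumed
    threshold for "this page has a usable text layer".
    """
    texts: List[str] = []
    for page in pages:
        text = (direct(page) or "").strip()
        if len(text) < min_chars:
            # Too little text: treat the page as image-based and OCR it instead.
            text = (ocr(page) or "").strip()
        texts.append(text)
    # Separate pages with a newline and skip pages that stayed empty.
    return "\n".join(t for t in texts if t)


# Demo with fake "pages": the second page has no text layer, so OCR is used.
pages = ["text page with plenty of characters", ""]
result = extract_pages_with_fallback(
    pages,
    direct=lambda p: p,
    ocr=lambda p: "OCR output",
    min_chars=5,
)
print(result)  # first page's text, then "OCR output" on the next line
```

With real objects, `direct` could wrap `page.extract_text()` and `ocr` could render the page to an image (for instance via pdfplumber's `page.to_image(...)`; check your version's documentation) before calling `pytesseract.image_to_string`.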

In [17]:
pdf_path = r"C:\Users\User\OneDrive\Desktop\HOANGKY NGUYEN\Job\DATA\Kate Nguyen - Resume.pdf"
resume_text = extract_text_from_pdf(pdf_path)
print("\nExtracted Text from PDF:")
print(resume_text)


Extracted Text from PDF:
KATE NGUYEN
Email: nghoangky44@gmail.com | LinkedIn: linkedin.com/in/ky-kate-nguyen | Phone: (214)-771-5168
GitHub: github.com/hoangky0403
Education
University of North Texas, Denton, Texas
M.S. in Data Science (Expected Summer 2027)
Coursework: Knowledge Management Tools and Technologies, Principles and Techniques for Data Science,
Fundamentals of Data Science
Florida International University, Miami, Florida
B.B.A in International Business and Finance, GPA: 3.86/4.0 (May 2019)
Coursework: Intermediate Financial Management, Financial Systems, Business Process Management
Professional Experience
Keyrus - Business Consultant (Jan 2022 – May 2025)
 Led 1 full-cycle Anaplan implementation for enterprise clients and supported 3 additional projects.
 Conducted quantitative analysis and built dashboards, improving forecasting efficiency and reducing
manual effort by 40%.
 Designed and optimized financial data models to support corporate strategy and CAPEX initiativ
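The extracted text contains artifacts such as the `` bullet glyphs. Symbol fonts often map bullets to Unicode Private Use Area code points; that this is the cause here is an assumption. A minimal cleaning sketch before sending the text to an LLM (`clean_extracted_text` is a hypothetical name):

```python
import re
import unicodedata


def clean_extracted_text(text: str, bullet: str = "•") -> str:
    """Normalize common PDF-extraction debris.

    - Private Use Area characters (Unicode category "Co"), which symbol-font
      bullets often become, are replaced with a normal bullet.
    - Runs of spaces/tabs are collapsed and each line is stripped.
    """
    text = "".join(bullet if unicodedata.category(ch) == "Co" else ch for ch in text)
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(lines)


sample = "\uf0b7  Led 1 full-cycle   implementation\n\uf0b7 Built dashboards  "
print(clean_extracted_text(sample))
```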

Set Google Generative AI API Key


In [18]:
%pip install google-generativeai python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [19]:
import google.generativeai as genai
import os
from dotenv import load_dotenv


load_dotenv()


genai.configure(api_key = os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-flash")
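`os.getenv` returns `None` when `GOOGLE_API_KEY` is missing (for example, when the `.env` file is not in the notebook's working directory), so the problem may only surface later as a less obvious error at the first API call. A small sketch that fails fast instead (`get_required_env` is a hypothetical helper):

```python
import os


def get_required_env(name: str) -> str:
    """Return an environment variable's value, or raise a clear error if unset/empty."""
    value = (os.getenv(name) or "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file (e.g. {name}=your-key) "
            "or export it before starting Jupyter."
        )
    return value


# Example (in the notebook):
# genai.configure(api_key=get_required_env("GOOGLE_API_KEY"))
```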

In [20]:
response = model.generate_content("What is the capital of USA?")


In [21]:
print(response)

response:
GenerateContentResponse(
    done=True,
    iterator=None,
    result=protos.GenerateContentResponse({
      "candidates": [
        {
          "content": {
            "parts": [
              {
                "text": "The capital of the USA is **Washington, D.C.**\n"
              }
            ],
            "role": "model"
          },
          "finish_reason": "STOP",
          "avg_logprobs": -0.005602954753807613
        }
      ],
      "usage_metadata": {
        "prompt_token_count": 7,
        "candidates_token_count": 14,
        "total_token_count": 21
      },
      "model_version": "gemini-1.5-flash"
    }),
)
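The printed object shows where the answer lives: `candidates[0].content.parts[*].text`, plus token counts under `usage_metadata` (here 7 prompt tokens + 14 output tokens = 21 total). In the SDK you would normally read `response.text` and `response.usage_metadata` directly. The sketch below walks a plain dict shaped like the printed result to make that structure explicit; `summarize_response` is a hypothetical name.

```python
from typing import Any, Dict


def summarize_response(result: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the answer text and token counts out of a dict shaped like the
    GenerateContentResponse printed above (candidates -> content -> parts)."""
    first = result["candidates"][0]
    text = "".join(part.get("text", "") for part in first["content"]["parts"])
    usage = result.get("usage_metadata", {})
    return {
        "text": text,
        "finish_reason": first.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_token_count", 0),
        "output_tokens": usage.get("candidates_token_count", 0),
        "total_tokens": usage.get("total_token_count", 0),
    }


# The structure printed above, reproduced as a plain dict.
result = {
    "candidates": [
        {
            "content": {
                "parts": [{"text": "The capital of the USA is **Washington, D.C.**\n"}],
                "role": "model",
            },
            "finish_reason": "STOP",
        }
    ],
    "usage_metadata": {
        "prompt_token_count": 7,
        "candidates_token_count": 14,
        "total_token_count": 21,
    },
}

summary = summarize_response(result)
print(summary["text"].strip())  # The capital of the USA is **Washington, D.C.**
print(summary["prompt_tokens"] + summary["output_tokens"] == summary["total_tokens"])  # True
```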


In [22]:
print(response.text)

The capital of the USA is **Washington, D.C.**



Resume Analysis


In [25]:
def analyze_resume(resume_text, job_description=None):
    if not resume_text:
        return {"error": "Resume text is required for analysis"}
    model = genai.GenerativeModel("gemini-1.5-flash")

    base_prompt = f"""
    You are an experienced HR headhunter with technical experience in the following job roles:
    Data Scientist, Data Analyst, Machine Learning Engineer, Data Engineer, and Business Analyst. Your task is
    to review the provided resume. Please share professional feedback on whether the candidate's profile matches
    these roles. In addition, mention the skills the candidate already has, point out skills that need
    improvement, and suggest some courses/projects the candidate can do to strengthen their profile.

    Resume:
    {resume_text}
    """
    if job_description:
        base_prompt += f"""
    In addition, compare this resume to the following job description:
    Job Description:
    {job_description}
    """
    response = model.generate_content(base_prompt)

    analysis = response.text.strip()
    return analysis
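Keeping prompt construction separate from the API call lets you inspect or test the prompt without spending tokens. A minimal sketch of that split, assuming the same roles and instructions as the prompt above (`build_resume_prompt` and `ROLES` are hypothetical names):

```python
from typing import Optional

ROLES = [
    "Data Scientist",
    "Data Analyst",
    "Machine Learning Engineer",
    "Data Engineer",
    "Business Analyst",
]


def build_resume_prompt(resume_text: str, job_description: Optional[str] = None) -> str:
    """Hypothetical helper: assemble the review prompt without calling the API."""
    if not resume_text or not resume_text.strip():
        raise ValueError("Resume text is required for analysis")
    parts = [
        "You are an experienced HR headhunter with technical experience in the "
        f"following job roles: {', '.join(ROLES)}.",
        "Review the provided resume: say whether the candidate's profile matches these "
        "roles, list the skills the candidate already has, point out skills that need "
        "improvement, and suggest courses or projects to strengthen the profile.",
        f"Resume:\n{resume_text.strip()}",
    ]
    # Only add the comparison section when a job description is supplied.
    if job_description and job_description.strip():
        parts.append(
            "In addition, compare this resume to the following job description:\n"
            f"Job Description:\n{job_description.strip()}"
        )
    return "\n\n".join(parts)


prompt = build_resume_prompt("KATE NGUYEN\nPython, SQL", job_description="Data Analyst, SQL required")
print(prompt)
```

The returned string can then be passed to `model.generate_content(...)` exactly as `base_prompt` is today.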

In [26]:
print(analyze_resume(resume_text))

## Resume Review for Kate Nguyen

Kate Nguyen's resume presents a promising profile, particularly for roles bridging business analysis and data science. However,  her experience leans more towards Business Analyst/Data Analyst roles than  Data Scientist or Machine Learning Engineer roles.  Let's break down the strengths and weaknesses:


**Strengths:**

* **Strong Business Acumen:** Kate demonstrates a solid understanding of business processes, financial modeling, and strategic planning through her experience at Keyrus and Publicis Groupe.  The Anaplan implementation experience is a significant asset, highlighting project management skills and experience with enterprise-level data systems.  Her quantifiable achievements (e.g., 40% efficiency improvement) are compelling.
* **Relevant Technical Skills:** She possesses a good foundation in Python (pandas, scikit-learn, NumPy), SQL, data modeling, and data visualization tools (Tableau, Power BI, Streamlit).  Her projects showcase practical