In [1]:
%pip install openai pdfplumber python-docx pillow

from openai import OpenAI

In [2]:
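import base64

import docx  # provided by the python-docx package
import pdfplumber

# OpenAI() reads the API key from the OPENAI_API_KEY environment variable by default.
client = OpenAI()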

Extract Information from a PDF
Reads the text of each page with pdfplumber and sends the first 8,000 characters to GPT-5-nano for summarization.

In [3]:
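pdf_path = "data/California_Employment_Offer_Letter.pdf"

def extract_text_from_pdf(path):
    """Return the text of every page in the PDF, joined with newlines."""
    text = ""
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            t = page.extract_text()
            # Skip pages with no extractable text (for example, scanned images).
            if t:
                text += t + "\n"
    return text.strip()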

pdf_text = extract_text_from_pdf(pdf_path)
pdf_text

'EMPLOYMENT OFFER LETTER (CALIFORNIA)\nAcme Corporation\n1234 Market Street\nSan Francisco, CA 94105\nApril 1, 2023\nPrivate & Confidential\nJane Doe\n567 Oak Avenue\nSan Jose, CA 95112\nDear Jane,\nWe are delighted to extend to you an offer of employment with Acme Corporation (“Company”).\nThis letter sets forth the principal terms and conditions of your employment.\n1. Position and Reporting\nYou will join Acme Corporation as a Senior Software Engineer. Your duties will include, but not be\nlimited to: designing, coding, testing, and maintaining enterprise applications; collaborating with\ncross-functional teams; mentoring junior engineers; and ensuring compliance with internal security\nand data privacy standards. You will report directly to John Smith, Chief Technology Officer. This\nis a full-time, exempt position under California wage and hour law. Your primary work location will\nbe San Francisco, CA, but you may be required to travel up to 10% of your time.\n2. Start Date and D
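The extraction loop above silently skips pages that yield no text, which is what usually happens with scanned or image-only pages. A minimal sketch of how those pages could be reported instead of dropped unnoticed; `join_page_texts` and the sample values are hypothetical names introduced here for illustration, not part of the notebook or of pdfplumber:

```python
from typing import Iterable, List, Optional, Tuple


def join_page_texts(page_texts: Iterable[Optional[str]]) -> Tuple[str, List[int]]:
    """Join per-page text and report 1-based numbers of pages with no text."""
    parts: List[str] = []
    empty_pages: List[int] = []
    for number, page_text in enumerate(page_texts, start=1):
        if page_text and page_text.strip():
            parts.append(page_text.strip())
        else:
            # None, "" or whitespace-only: nothing usable on this page.
            empty_pages.append(number)
    return "\n".join(parts), empty_pages


# Stand-in values for what page.extract_text() might return, page by page.
sample_pages = ["Page one text", None, "  ", "Page four text"]
joined, empty = join_page_texts(sample_pages)
print(joined)
print("Pages without extractable text:", empty)
```

With a real file, the same helper could be fed `page.extract_text()` for each page inside the `with pdfplumber.open(path) as pdf:` block. A non-empty list of page numbers is a hint that the document may need OCR before summarization.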

In [4]:
prompt = f"Extract and summarize key details from this PDF:\n\n{pdf_text[:8000]}"

response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[
        {"role": "system", "content": "You summarize and extract data from PDF text."},
        {"role": "user", "content": prompt}
    ]
)

print("Extracted Info from PDF:\n")
print(response.choices[0].message.content)

Extracted Info from PDF:

Here are the key details from the California Employment Offer Letter for Jane Doe (Acme Corporation):

Overview
- Employer: Acme Corporation
- Employee: Jane Doe
- Date of offer: April 1, 2023
- Confidentiality: Marked Private & Confidential
- Acceptance deadline: April 15, 2023
- Start date (if conditions met): April 24, 2023
- Location and work status: Full-time, exempt; primary work location San Francisco, CA; travel up to 10% of time
- Reporting: Reports to John Smith, Chief Technology Officer

1) Position and duties
- Title: Senior Software Engineer
- Duties: Designing, coding, testing, maintaining enterprise applications; cross-functional collaboration; mentoring junior engineers; compliance with internal security and data privacy standards
- At-will employment status; California wage and hour exempt status

2) Start Date and Duration
- Start date: April 24, 2023 (contingent on conditions in Section 6)
- Offer expiration: April 15, 2023 (unless signed ac
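The prompt above sends only `pdf_text[:8000]`, so anything beyond the first 8,000 characters never reaches the model. One possible alternative, sketched under the assumption that each chunk would be summarized in its own request, is to split the text on line boundaries. `chunk_text` is a hypothetical helper, and the limit counts characters, not tokens:

```python
from typing import List


def chunk_text(text: str, max_chars: int = 8000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring line breaks."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit is cut into fixed-size pieces.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when this line would overflow the current one.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


sample = "line one\nline two\nline three\n"
for index, chunk in enumerate(chunk_text(sample, max_chars=18), start=1):
    print(index, repr(chunk))
```

Each chunk could then be placed in its own prompt and the partial summaries combined in a final request. Whether that is worth the extra calls depends on how long the source documents actually are.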

Extract Information from a DOCX (Word) File
Reads the document's paragraphs with python-docx and asks GPT-5-nano for key points.

In [5]:
docx_path = "data/Project_List.docx" 

def extract_text_from_docx(path):
    """Return the text of all body paragraphs, one paragraph per line."""
    document = docx.Document(path)
    return "\n".join(p.text for p in document.paragraphs)

docx_text = extract_text_from_docx(docx_path)

print(f"Extracted DOCX Text (first 500 chars):\n{docx_text[:500]}\n")
prompt = f"Extract key points from this DOCX content:\n\n{docx_text[:8000]}"

Extracted DOCX Text (first 500 chars):

Project 1

Project: Real Estate Platform - eREP
Company: Sansa Technology LLC, Milpitas, CA, USA
Duration: From March 2017 – Current
Role: Java EE Developer
Description:
eREP provides leading real estate and rental marketplace platform. eREP serves the full lifecycle of owning and living in a home: buying, selling, renting, financing, remodeling and more. 
Responsibilities:
• Analyzed requirements and designed class diagrams, sequence diagrams using UML and
prepared high level technical documen
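`document.paragraphs` covers body paragraphs only, so text stored inside Word tables is not part of `docx_text`. A minimal sketch of one way to include it, relying on the `tables`, `rows`, `cells`, and `text` attributes that python-docx exposes; `collect_docx_text` is a hypothetical helper, and the stand-in object below exists only so the sketch runs without a .docx file:

```python
from types import SimpleNamespace
from typing import List


def collect_docx_text(document) -> str:
    """Join non-empty paragraph text, then table rows as ' | '-separated lines."""
    lines: List[str] = [p.text for p in document.paragraphs if p.text.strip()]
    for table in document.tables:
        for row in table.rows:
            cells = [cell.text.strip() for cell in row.cells]
            lines.append(" | ".join(cells))
    return "\n".join(lines)


# Stand-in object with the same attribute names as a python-docx Document.
fake_doc = SimpleNamespace(
    paragraphs=[SimpleNamespace(text="Project 1"), SimpleNamespace(text="")],
    tables=[
        SimpleNamespace(
            rows=[
                SimpleNamespace(
                    cells=[
                        SimpleNamespace(text="Role"),
                        SimpleNamespace(text="Java EE Developer"),
                    ]
                )
            ]
        )
    ],
)
print(collect_docx_text(fake_doc))
```

With a real file, `collect_docx_text(docx.Document(docx_path))` would be the equivalent call. Note that this sketch appends all table rows after the paragraphs, so the original interleaving of text and tables is not preserved.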



In [6]:
response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[
        {"role": "system", "content": "You summarize and extract structured data from Word documents."},
        {"role": "user", "content": prompt}
    ]
)

print("Extracted Info from DOCX:\n")
print(response.choices[0].message.content)

Extracted Info from DOCX:

Here are the key points structured by project.

Project 1
- Project: Real Estate Platform - eREP
- Company: Sansa Technology LLC, Milpitas, CA, USA
- Duration: From March 2017 – Current
- Role: Java EE Developer
- Description: eREP provides a leading real estate and rental marketplace platform covering the full lifecycle of owning and living in a home (buying, selling, renting, financing, remodeling, etc.)
- Responsibilities:
  - Analyzed requirements and designed class diagrams, sequence diagrams using UML; prepared high level technical documents
  - Implemented Java and J2EE design patterns
  - Utilized Spring MVC annotations and XML configuration for Dependency Injection
  - Implemented Persistence layer using Hibernate to interact with the MySQL database
  - Wrote various SQL queries in the DAO layer
  - Developed REST web services using Spring, Hibernate, JAX-RS and JAXB
  - Developed UI using Spring view component, JSP, HTML, CSS, JavaScript and Angular