felixmeyer6/handshake-scraper

Handshake Scraper

Scrapes job offers from a Handshake search URL into a CSV. If your school/organization uses SSO, you’ll log in once in a normal Chrome window; the script reuses a local Chrome profile for subsequent runs.

Features

  • Paginates through a Handshake search.
  • Visits each job page and extracts key fields.
  • Writes a tidy CSV (handshake_jobs.csv) ready for analysis.
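One way the pagination step could work is by rewriting the `page` query parameter of the search URL for each page. The sketch below is an assumption about the approach, not the script's actual code; `with_page` is a hypothetical helper name, and the URL is the placeholder from this README.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int) -> str:
    """Return `url` with its `page` query parameter set to `page`."""
    parts = urlsplit(url)
    # Keep every parameter except the old `page`, then append the new one.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


start = "https://yourorg.joinhandshake.fr/job-search/123456?query=yourdreamjob&per_page=25&page=1"
for n in range(1, 4):
    print(with_page(start, n))
```

Parsing the query string, rather than replacing the text `page=1`, avoids accidentally touching `per_page`.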

Extracted columns

  • Company
    • Name
    • Sector
    • Headcount
  • Job
    • Title
    • PostedAt
    • Duration
    • Start
    • Location
    • Description
    • Link
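As a rough illustration of working with these fields, the sketch below counts rows per value of one column using only the standard library. The header names are assumed from the list above and may not match the real CSV header, so check the first line of your handshake_jobs.csv. The rows are invented placeholders, and `count_by` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Assumed header names, modelled on the field list above; the real file may differ.
COLUMNS = ["Name", "Sector", "Headcount", "Title", "PostedAt",
           "Duration", "Start", "Location", "Description", "Link"]


def count_by(rows, column):
    """Count how many rows share each value of `column`."""
    return Counter(row[column] for row in rows)


# Placeholder rows, invented purely to exercise the function.
sample = io.StringIO()
writer = csv.DictWriter(sample, fieldnames=COLUMNS)
writer.writeheader()
for name, location in [("ExampleCo", "Paris"), ("SampleCorp", "Paris"), ("DemoInc", "Lyon")]:
    row = dict.fromkeys(COLUMNS, "")
    row.update({"Name": name, "Location": location})
    writer.writerow(row)

sample.seek(0)
rows = list(csv.DictReader(sample))
print(count_by(rows, "Location"))
```

To run it against a real export, replace the in-memory sample with `open("handshake_jobs.csv", newline="", encoding="utf-8")`.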

Requirements

  • Python: 3.9+ recommended
  • Google Chrome installed
  • Dependencies: pandas, selenium, webdriver-manager

Quick Start

  1. Clone / download the repo and open it in a terminal.

  2. (Optional) Create a virtual environment

    # macOS/Linux
    python -m venv .venv
    source .venv/bin/activate
    
    # Windows
    python -m venv .venv
    .\.venv\Scripts\activate
  3. Install dependencies

    pip install pandas selenium webdriver-manager
  4. Run the scraper with a Handshake search URL that contains page=1:

    python3 handshake_scraper.py \
      -u "https://yourorg.joinhandshake.fr/job-search/123456?query=yourdreamjob&per_page=25&page=1" \
      -p 2 \
      -t 10
    • -u/--url (required): Full search URL including page=1.
    • -p/--pages (optional): Maximum number of result pages to scrape, starting from page 1 (default -1 = unlimited).
    • -t/--throttle (optional): Throttle level from 0 to 100 (default 10). Higher values make the scraper slower and gentler on the site.
  5. Output: handshake_jobs.csv in the current folder.
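The three flags described in step 4 could be declared with `argparse` roughly as follows. This is a sketch built from the option descriptions in this README, not the script's actual parser; `build_parser` is a hypothetical name, and the script may validate its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the -u, -p and -t options described above."""
    parser = argparse.ArgumentParser(
        description="Scrape a Handshake job search into a CSV."
    )
    parser.add_argument("-u", "--url", required=True,
                        help="Full search URL including page=1.")
    parser.add_argument("-p", "--pages", type=int, default=-1,
                        help="Max pages to scrape starting from 1 (-1 = unlimited).")
    parser.add_argument("-t", "--throttle", type=int, default=10,
                        choices=range(0, 101), metavar="0..100",
                        help="Higher values are slower and gentler.")
    return parser


# Parse an explicit argument list so the example runs without a command line.
args = build_parser().parse_args([
    "-u", "https://yourorg.joinhandshake.fr/job-search/123456?query=yourdreamjob&per_page=25&page=1",
    "-p", "2",
])
print(args.pages, args.throttle)
```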

What you’ll see in the terminal

  • [SSO] … login hints
  • [PAGE] … pagination progress
  • [JOB i/N] … job pages being scraped
  • [SLEEP] … time throttling
  • [DATA] … one-line records per field
  • [WARN] … warnings
  • [OK] … on success

Login & Session Notes

The script uses a persistent Chrome profile at:

  • macOS/Linux: ~/.handshake_chrome_profile
  • Windows: C:\Users\<you>\.handshake_chrome_profile

First run may prompt you to log in. Subsequent runs reuse the session.
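A minimal sketch of how a persistent profile like this can be wired into Selenium, assuming the script passes Chrome's `--user-data-dir` switch (the README does not show its exact setup). The function names `profile_dir`, `chrome_arguments` and `open_browser` are hypothetical.

```python
from pathlib import Path


def profile_dir() -> Path:
    """Profile location from the paths listed above: a dot-folder in the home directory."""
    return Path.home() / ".handshake_chrome_profile"


def chrome_arguments() -> list:
    """Chrome switches that point the browser at the persistent profile."""
    # --user-data-dir is a standard Chrome switch selecting the profile directory.
    return [f"--user-data-dir={profile_dir()}"]


def open_browser():
    """Start Chrome with the persistent profile (requires Selenium and Chrome)."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


print(chrome_arguments())
```

Because cookies live in that folder, a login done in the first run remains available to later runs that use the same directory.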

Troubleshooting

  • No CSV written: If no jobs are found or pages error out, you’ll see [WARN] No rows scraped. Confirm your URL is valid and includes page=1, you’re logged in, and the page has listings.
  • Blocked/Rate-limited: Increase -t or try fewer pages.
  • Layout changes: The script uses XPath selectors; if Handshake changes markup, some fields may come back empty. Update the XPaths in the constants section.
  • Persisting Chrome session: If you see SessionNotCreatedException: ... user data directory is already in use, a previous Chrome session still owns the profile. Run only one instance at a time, and then either:
  1. Close any leftover Chrome/driver windows, or
  2. Delete the profile folder ~/.handshake_chrome_profile and re-run (you will need to log in again).
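To give a feel for what raising -t does when you are blocked or rate-limited, here is one way a 0..100 throttle value could be turned into a pause between requests. The linear mapping, the `base` and `jitter` values, and the name `delay_seconds` are all assumptions made for illustration; the script's real timing may differ.

```python
import random


def delay_seconds(throttle: int, base: float = 0.1, jitter: float = 0.5) -> float:
    """Map a 0..100 throttle value to a randomized pause in seconds (assumed linear)."""
    if not 0 <= throttle <= 100:
        raise ValueError("throttle must be between 0 and 100")
    low = base * throttle
    # Add up to `jitter` (50% by default) on top so requests are not evenly spaced.
    return random.uniform(low, low * (1 + jitter))


for level in (0, 10, 50, 100):
    print(level, round(delay_seconds(level), 2))
```

A caller would pass the result to `time.sleep` between page loads; under these assumed values the default of 10 gives a pause between 1.0 and 1.5 seconds.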

Safety & Respect

Use responsibly and follow your organization’s and Handshake’s terms of service. Avoid aggressive scraping (raise -t, limit pages) and cache results when possible.
