Scrapes job offers from a Handshake search URL into a CSV. If your school/organization uses SSO, you’ll log in once in a normal Chrome window; the script reuses a local Chrome profile for subsequent runs.
- Paginates through a Handshake search.
- Visits each job page and extracts key fields.
- Writes a tidy CSV (handshake_jobs.csv) ready for analysis.
- Company
  - Name
  - Sector
  - Headcount
- Job
  - Title
  - PostedAt
  - Duration
  - Start
  - Location
  - Description
  - Link
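Once a run has produced the CSV, a quick check of its columns can confirm that every field was captured. The sketch below uses only the standard library (pandas.read_csv would work just as well) and reads whatever header the file actually has, since the exact column names are not spelled out here. The helper names `load_jobs` and `empty_field_counts` are hypothetical, and UTF-8 encoding is an assumption.

```python
import csv
from pathlib import Path


def load_jobs(csv_path="handshake_jobs.csv"):
    """Return (header, rows) from the CSV; each row is a dict keyed by column name."""
    # Assumption: the file is UTF-8 encoded and has a header row.
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        header = list(reader.fieldnames or [])
    return header, rows


def empty_field_counts(header, rows):
    """Count blank values per column, e.g. to spot fields that came back empty."""
    return {
        name: sum(1 for row in rows if not (row.get(name) or "").strip())
        for name in header
    }


if __name__ == "__main__":
    header, rows = load_jobs()
    print(f"{len(rows)} jobs, columns: {header}")
    print(empty_field_counts(header, rows))
```

A column with a high blank count is a hint that the matching selector no longer finds its element on the job page.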
- Python: 3.9+ recommended
- Google Chrome installed
- Dependencies: pandas, selenium, webdriver-manager
- Clone / download the repo and open it in a terminal.
- (Optional) Create and activate a virtual environment:

      python -m venv .venv

      # macOS/Linux
      source .venv/bin/activate

      # Windows
      .\.venv\Scripts\activate
- Install dependencies:

      pip install pandas selenium webdriver-manager
- Run the scraper with a Handshake search URL that contains page=1:

      python3 handshake_scraper.py \
        -u "https://yourorg.joinhandshake.fr/job-search/123456?query=yourdreamjob&per_page=25&page=1" \
        -p 2 \
        -t 10

  - -u / --url (required): Full search URL including page=1.
  - -p / --pages (optional): Max pages to scrape starting from 1 (default -1 = unlimited).
  - -t / --throttle (optional): Slowness 0..100 (default 10). Higher = slower & gentler.
- Output: handshake_jobs.csv in the current folder.
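To make the command-line options and the page=1 requirement concrete, here is a minimal sketch of how such flags could be parsed and how later page URLs could be derived from the first one. It is not taken from handshake_scraper.py: the function names `build_parser` and `page_url` are hypothetical, and only the flags and defaults come from the usage notes above.

```python
import argparse
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_parser():
    """Command-line options mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Scrape Handshake job search results into a CSV."
    )
    parser.add_argument("-u", "--url", required=True,
                        help="Full search URL including page=1")
    parser.add_argument("-p", "--pages", type=int, default=-1,
                        help="Max pages to scrape starting from 1 (-1 = unlimited)")
    parser.add_argument("-t", "--throttle", type=int, default=10,
                        help="Slowness 0..100; higher is slower and gentler")
    return parser


def page_url(search_url, page):
    """Return search_url with its 'page' query parameter set to the given page."""
    parts = urlsplit(search_url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == "page" for key, _ in pairs):
        raise ValueError("search URL must contain a page parameter, e.g. page=1")
    # Rewrite the value in place so the other parameters keep their order.
    updated = [(key, str(page) if key == "page" else value) for key, value in pairs]
    return urlunsplit(parts._replace(query=urlencode(updated)))


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(page_url(args.url, 2))
```

Rejecting a URL without a page parameter up front mirrors the requirement that the search URL include page=1, and fails early instead of scraping the same page repeatedly.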
- [SSO]… login hints
- [PAGE]… pagination progress
- [JOB i/N]… job pages being scraped
- [SLEEP]… time throttling
- [DATA]… one-line records per field
- [WARN]… warnings
- [OK]… on success
The script uses a persistent Chrome profile at:
- macOS/Linux: ~/.handshake_chrome_profile
- Windows: C:\Users\<you>\.handshake_chrome_profile
First run may prompt you to log in. Subsequent runs reuse the session.
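For readers curious how a persistent profile keeps the SSO session, the sketch below shows one common way to do it with Selenium: point Chrome at a fixed user data directory under the home folder. This is an illustration under assumptions, not the script's actual code. The function names are hypothetical, and the script may additionally use webdriver-manager to install the driver.

```python
from pathlib import Path

# Folder name taken from the paths listed above.
PROFILE_DIR_NAME = ".handshake_chrome_profile"


def profile_dir():
    """Per-user profile folder inside the home directory on any platform."""
    return Path.home() / PROFILE_DIR_NAME


def chrome_arguments():
    """Chrome command-line switches that select the persistent profile."""
    return [f"--user-data-dir={profile_dir()}"]


def start_chrome():
    """Launch Chrome with the persistent profile (needs Selenium 4 and Chrome)."""
    # Imported here so the helpers above work even without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    print(chrome_arguments())
```

Because cookies live in that folder, logging in once is enough. It is also why two Chrome instances cannot share the profile at the same time.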
- No CSV written: If no jobs are found or pages error out, you’ll see [WARN] No rows scraped. Confirm your URL is valid and includes page=1, you’re logged in, and the page has listings.
- Blocked/Rate-limited: Increase -t or try fewer pages.
- Layout changes: The script uses XPath selectors; if Handshake changes markup, some fields may come back empty. Update the XPaths in the constants section.
- Persisting Chrome session: If you see SessionNotCreatedException: ... user data directory is already in use, it means a previous Chrome session still owns the profile.
  - Close any leftover Chrome/driver windows, or
  - Delete the profile folder ~/.handshake_chrome_profile and re-run, or
  - Run one instance at a time.
Use responsibly and follow your organization’s and Handshake’s terms of service. Avoid aggressive scraping (raise -t, limit pages) and cache results when possible.
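As an illustration of what a 0..100 "slowness" setting can mean in practice, here is a hedged sketch that turns a throttle value into a randomized pause between requests. The linear mapping, the 10-second ceiling, and the jitter range are all assumptions made for this example. The script's real formula is not documented here, and the names `throttle_delay` and `polite_sleep` are hypothetical.

```python
import random
import time


def throttle_delay(throttle, max_seconds=10.0):
    """Map a 0..100 throttle value linearly onto 0..max_seconds (assumed scale)."""
    clamped = max(0, min(100, throttle))
    return clamped / 100 * max_seconds


def polite_sleep(throttle, jitter=0.25):
    """Sleep for the throttle delay, varied by up to +/-25% so requests are not evenly spaced."""
    base = throttle_delay(throttle)
    pause = base * random.uniform(1 - jitter, 1 + jitter)
    time.sleep(pause)
    return pause


if __name__ == "__main__":
    for value in (0, 10, 50, 100):
        print(value, throttle_delay(value))
```

Under this assumed scale, the default of 10 gives roughly a one-second pause per request, and raising -t stretches it proportionally.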