a natural language interface for querying montreal's tech job market. ask questions in plain english or french and get answers from a postgresql database of job postings.
"what are the top 10 most requested skills?"
↓
SELECT s.name, COUNT(*) AS job_count
FROM skills s
JOIN job_skills js ON s.id = js.skill_id
GROUP BY s.id, s.name
ORDER BY job_count DESC
LIMIT 10;
↓
┌────────────┬───────────┐
│ name │ job_count │
├────────────┼───────────┤
│ Python │ 12 │
│ JavaScript │ 9 │
│ Java │ 8 │
│ ... │ ... │
└────────────┴───────────┘
ask questions in plain english or french - the system converts them to sql automatically.
filter and browse internships by skill, company, or location.
see skill demand, frequency, and co-occurrence patterns.
- fetches job postings from jsearch api (indeed, linkedin, etc.)
- extracts structured data using gpt-4o-mini (skills, salary, location)
- stores everything in a normalized postgresql database
- answers questions by converting natural language to sql with gpt-4o
- visualizes trends through a streamlit dashboard
git clone https://github.com/yourusername/emploiql.git
cd emploiql
pip install -r requirements.txtcreate a .env file in the project root:
OPENAI_API_KEY=your-openai-api-key
RAPIDAPI_KEY=your-rapidapi-key-for-jsearchget your api keys:
- openai: https://platform.openai.com/api-keys
- rapidapi (jsearch): https://rapidapi.com/letscrape-6bRBa3QguO5/api/jsearch
./scripts/setup_db.shthis creates a postgresql database called EmploiQL with all required tables.
# fetch jobs from jsearch api (uses 1 api call)
python3 src/jsearch.py --preset software --live
# load into database (--mock skips openai extraction to save costs)
python3 scripts/load_jsearch.py --preset software --mock# interactive mode
python3 src/cli.py
# single question
python3 src/cli.py "what skills are most in demand?"
# web dashboard
streamlit run src/app.pyemploiql/
├── src/
│ ├── app.py # streamlit dashboard
│ ├── cli.py # command line interface
│ ├── config.py # environment variable loading
│ ├── db.py # database connection and crud operations
│ ├── extraction.py # gpt-4o-mini job data extraction
│ ├── jsearch.py # jsearch api integration
│ └── text_to_sql.py # natural language to sql conversion
├── scripts/
│ ├── setup_db.sh # creates postgresql database
│ ├── load_jsearch.py # loads jobs into database
│ ├── test_schema.py # validates database schema
│ └── test_db.py # tests crud operations
├── documentation/
│ └── schema.md # database schema documentation
├── data/
│ └── jsearch_cache/ # cached api responses (gitignored)
├── schema.sql # database table definitions
├── commands.md # command reference
├── requirements.txt # python dependencies
└── .env # api keys (gitignored)
five normalized tables designed to store job postings without duplicating data:
companies ──────────────┐
▼
raw_postings ────────► jobs ◄────── job_skills ────► skills
| table | purpose |
|---|---|
| companies | stores each company once (name, website) |
| jobs | job postings with title, description, salary, location |
| skills | master list of normalized skill names |
| job_skills | junction table linking jobs to skills (many-to-many) |
| raw_postings | archived raw content before processing |
see documentation/schema.md for full details.
# start interactive repl
python3 src/cli.py
# ask a single question
python3 src/cli.py "which companies have the most postings?"
# run preset queries
python3 src/cli.py --preset skills
python3 src/cli.py --preset companies
python3 src/cli.py --preset remote
# show database statistics
python3 src/cli.py --stats
# execute raw sql (select only)
python3 src/cli.py --sql "SELECT * FROM companies LIMIT 5"| preset | description |
|---|---|
skills |
top 10 most requested skills |
companies |
companies with most postings |
remote |
remote job opportunities |
python |
jobs requiring python |
salaries |
jobs with salary info |
recent |
most recently added postings |
devops |
devops/cloud jobs |
fullstack |
full-stack development jobs |
streamlit run src/app.pyfeatures:
- overview metrics (total jobs, companies, skills)
- top skills bar chart
- top hiring companies
- jobs by location pie chart
- salary by skill analysis
- natural language query interface
- job exploration with filters
- skill co-occurrence analysis
# using presets (recommended)
python3 src/jsearch.py --preset software --live
python3 src/jsearch.py --preset devops --live
python3 src/jsearch.py --preset data --live
# custom queries
python3 src/jsearch.py --query "python intern" --live
python3 src/jsearch.py --query "react developer" --location "Toronto, Canada" --live
# fetch multiple pages (10 jobs per page, 1 api call each)
python3 src/jsearch.py --preset software --pages 3 --live
# list available presets
python3 src/jsearch.py --list-presets
# list cached searches
python3 src/jsearch.py --list-cache| preset | query |
|---|---|
software |
software engineer intern |
developer |
software developer intern |
devops |
devops intern |
data |
data science intern |
cyber |
cybersecurity intern |
frontend |
frontend developer intern |
backend |
backend developer intern |
fullstack |
full stack developer intern |
ml |
machine learning intern |
qa |
qa engineer intern |
# load from preset (with mock extraction - free)
python3 scripts/load_jsearch.py --preset software --mock
# load with full gpt extraction (costs ~$0.0001 per job)
python3 scripts/load_jsearch.py --preset software
# load all presets at once
python3 scripts/load_jsearch.py --all --mock
# limit jobs for testing
python3 scripts/load_jsearch.py --preset software --mock --limit 5┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ jsearch │───►│ raw_post │───►│ gpt-4o- │───►│ database │
│ api │ │ cache │ │ mini │ │ insert │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
fetch store extract normalize
- fetch: jsearch api retrieves job postings from indeed, linkedin, etc.
- cache: raw responses saved to
data/jsearch_cache/to preserve api quota - extract: gpt-4o-mini extracts structured data (title, company, skills, salary)
- normalize: skills are normalized ("python3" → "Python") and stored
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ natural │───►│ gpt-4o │───►│ validate │───►│ execute │
│ language │ │ generate │ │ safety │ │ query │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
question sql check results
- question: user asks in plain english or french
- generate: gpt-4o converts to sql using schema context + few-shot examples
- validate: safety check blocks non-select queries (drop, delete, etc.)
- execute: query runs against postgresql, results returned as dicts
| model | purpose | cost |
|---|---|---|
| gpt-4o-mini | bulk data extraction | ~$0.0001/job |
| gpt-4o | text-to-sql (accuracy matters) | ~$0.001/query |
the system understands natural language questions like:
what are the most in-demand skills?
show me python internships at ubisoft
which companies offer remote positions?
what's the average salary for devops roles?
list all cybersecurity internships
how many jobs require both python and sql?
show jobs that require aws, docker, or kubernetes
what skills does intact look for?
- free tier: 200 requests/month
- each page fetch = 1 request
- caching prevents duplicate api calls
- gpt-4o-mini extraction: ~$0.0001 per job posting
- gpt-4o text-to-sql: ~$0.001 per query
- mock mode available for testing (no api cost)
# test database schema
python3 scripts/test_schema.py
# test crud operations
python3 scripts/test_db.pypython3 src/config.pythe app supports two modes:
- local development: connects to postgresql on localhost:5432
- cloud/production: uses
DATABASE_URLenvironment variable (supabase, heroku, etc.)
| component | technology |
|---|---|
| database | postgresql |
| language | python 3.11 |
| db driver | psycopg2 |
| llm | openai gpt-4o, gpt-4o-mini |
| job data | jsearch api (rapidapi) |
| cli | rich |
| dashboard | streamlit, plotly |
| env management | python-dotenv |
psycopg2-binary
openai>=1.0.0
rich>=13.0.0
requests>=2.31.0
beautifulsoup4>=4.12.0
python-dotenv>=1.0.0
streamlit>=1.28.0
plotly>=5.18.0
create a .env file in the project root:
OPENAI_API_KEY=sk-your-key-here
make sure postgresql is running:
# mac
brew services start postgresql@16
# check if database exists
psql -l | grep EmploiQLif database doesn't exist, run setup:
./scripts/setup_db.shfetch jobs first before loading:
python3 src/jsearch.py --preset software --live
python3 scripts/load_jsearch.py --preset software --mockthe free tier has 200 requests/month. use caching:
# first time: fetch with --live
python3 src/jsearch.py --preset software --live
# subsequent runs: omit --live to use cache
python3 src/jsearch.py --preset softwaremit
- jsearch api for job data
- openai for llm capabilities
- rich for beautiful terminal output
- streamlit for the dashboard framework






