Upload PDF and DOCX resumes, extract name/email/phone from each, and download a structured Excel workbook.
- Batch upload: select multiple files at once via drag-and-drop or file picker
- PDF & DOCX support: extracts text from both formats
- Smart extraction: regex-based parsing for names, email addresses, and phone numbers (international formats supported)
- Excel export: results compiled into a single `.xlsx` workbook
- Dark mode: respects system `prefers-color-scheme`
- Accessible: keyboard-navigable, ARIA labels, `prefers-reduced-motion` support
- CSRF-protected: form submissions secured with Flask-WTF
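
As a rough illustration of the regex-based extraction listed above, here is a minimal sketch for email addresses and phone numbers. The patterns and the `extract_contacts` helper are assumptions made for this example; the project's actual expressions in `app/extractor.py` may differ, and name detection is left out because it usually needs more than a single pattern.

```python
import re

# Illustrative patterns only (assumed, not taken from the project).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional leading "+", then digits separated by spaces, dots, dashes, or parentheses.
# Newlines are excluded so a match never spans two lines of the resume.
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")


def extract_contacts(text: str) -> dict:
    """Return the first email and phone number found in text, or None for each."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }
```

Taking the first match is a simplification; a resume that lists a referee's contact details before the candidate's own would need a smarter rule.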
| Layer | Technology |
|---|---|
| Backend | Flask 3.x + Gunicorn |
| PDF | pypdf |
| DOCX | python-docx |
| Excel | openpyxl |
| Security | Flask-WTF (CSRF protection) |
| Frontend | Vanilla HTML/CSS/JS (no framework) |
| CI/CD | GitHub Actions (ruff + pytest + Docker → GHCR) |
- Python 3.10+
- pip
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
python run.py

Open http://localhost:5000.

Or run with Docker Compose:

docker compose up

Open http://localhost:8000.
Environment variables (.env file):
| Variable | Default | Description |
|---|---|---|
| `SECRET_KEY` | auto-generated | Flask session signing key |
| `MAX_CONTENT_LENGTH` | `16777216` | Max upload size (bytes) |
| `UPLOAD_FOLDER` | `uploads` | Directory for uploaded files |
`SECRET_KEY` is auto-generated via `secrets.token_hex(32)` if not set.
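
The defaults above could be applied along these lines. This is a sketch under stated assumptions, not the contents of `app/config.py`: the `load_config` function and its dict return value are hypothetical, and only the variable names, defaults, and the `secrets.token_hex(32)` fallback come from this README.

```python
import os
import secrets


def load_config(env=None) -> dict:
    """Build a settings dict from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    return {
        # Fall back to a random 64-character hex key when SECRET_KEY is unset or empty.
        "SECRET_KEY": env.get("SECRET_KEY") or secrets.token_hex(32),
        # Environment values arrive as strings, so convert the size limit to an int.
        "MAX_CONTENT_LENGTH": int(env.get("MAX_CONTENT_LENGTH", 16 * 1024 * 1024)),
        "UPLOAD_FOLDER": env.get("UPLOAD_FOLDER", "uploads"),
    }
```

A key generated at startup changes on every restart, so existing sessions and CSRF tokens stop validating; setting `SECRET_KEY` explicitly in `.env` avoids that.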
├── app/
│ ├── __init__.py # App factory, CSRF init
│ ├── config.py # Env-based configuration
│ ├── extractor.py # Text extraction + regex parsing
│ ├── models.py # CvData dataclass
│ ├── routes.py # GET / and POST /extract
│ ├── shell.py # Flask shell helpers
│ ├── workbook.py # Excel workbook builder
│ ├── static/
│ │ ├── css/main.css # Full-featured stylesheet
│ │ └── js/main.js # Drag-drop, validation, spinner
│ └── templates/
│ └── index.html # Upload form
├── tests/
│ ├── conftest.py # pytest fixtures
│ ├── test_extractor.py # Extraction logic tests
│ └── test_routes.py # Route tests
├── run.py # Dev server entry point
├── wsgi.py # Production entry point
├── pyproject.toml # Build config + ruff + pytest
├── Dockerfile # Multi-stage Docker build (OCI labels, healthcheck)
├── docker-compose.yml # Single-service compose (supports GHCR image)
└── .github/workflows/
    ├── ci.yml # Lint + test + build on PR/push
    └── deploy.yml # Push Docker image to GHCR on main/tags
pip install -r requirements-dev.txt
pytest

pip install ruff
ruff check .

Uses ruff with rulesets: E, F, W, I, N, UP, B, SIM.

pip install pre-commit
pre-commit install

Runs ruff lint + format on commit.
Upload one or more CV files.
- Content-Type: `multipart/form-data`
- Field name: `files[]`
- Accepted formats: `.pdf`, `.docx`
- Response: `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (Excel download)
Returns 400 with an error message on empty upload or extraction failure.
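
For scripted uploads, the request body described above can be assembled with the standard library alone. The sketch below is illustrative: `build_multipart` is a hypothetical helper, not part of the project. Because the form is CSRF-protected, a real request would also need a valid token and the matching session cookie; how the page exposes that token is not covered here, so the `fields` argument is only a placeholder for passing one along.

```python
import mimetypes
import uuid
from pathlib import Path


def build_multipart(paths, fields=None, file_field="files[]"):
    """Encode form fields and files as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    # Plain form fields, e.g. a CSRF token if the server expects one.
    for name, value in (fields or {}).items():
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # One part per file, all under the same field name.
    for path in map(Path, paths):
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; filename="{path.name}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        chunks.append(head.encode() + path.read_bytes() + b"\r\n")
    # Closing boundary marks the end of the body.
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"
```

The returned body and header value can then be passed to `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")`, with the response bytes written to an `.xlsx` file on success.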
Every push to main publishes a Docker image to the GitHub Container Registry:
docker pull ghcr.io/instax-dutta/cv-extractor-webapp:latest
docker run -p 8000:8000 ghcr.io/instax-dutta/cv-extractor-webapp:latest

Or with Docker Compose (pulls the published image automatically):

cp .env.example .env
docker compose up

To build and run the image locally instead:

docker build -t cv-extractor .
docker run -p 8000:8000 cv-extractor

The Docker image runs Gunicorn, uses a multi-stage build to keep the production image slim, and includes a built-in health check.
MIT