An end-to-end resume parsing stack with a FastAPI backend, a Streamlit UI, and MongoDB Atlas storage. It extracts name, email, phone, skills, education, and experience from PDF or DOCX resumes using spaCy NER, regexes, rule-based patterns, and TF-IDF skill matching.
- Upload resumes (PDF, DOCX) via FastAPI or Streamlit.
- Text extraction with pdfplumber and python-docx.
- NLP pipeline with spaCy `en_core_web_sm`, regex contact parsing, and rule-based education/experience extraction.
- Skill detection from a curated dictionary plus TF-IDF cosine similarity.
- Structured JSON storage in MongoDB Atlas with indexes for fast search/filter.
- FastAPI endpoints for upload, retrieval, and search; Streamlit dashboard for manual uploads/search.
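
As an illustration of the regex contact parsing listed above, here is a minimal sketch using only the standard library. The patterns and the `extract_contacts` helper are assumptions made for this example, not the project's actual regexes, and the phone pattern only covers common North American style numbers.

```python
import re

# Illustrative patterns only (assumptions); the project's real regexes may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(
    r"(?:\+?\d{1,3}[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}"
)


def extract_contacts(text: str) -> dict:
    """Return the first email and phone number found in text, or None for each."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }


sample = "Jane Smith\njane.smith@example.com | +1 (555) 123-4567\nPython, SQL"
print(extract_contacts(sample))
```

Taking the first match keeps the sketch simple; a real parser would likely need to handle several candidates and international phone formats.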
- Python env

      python -m venv .venv
      .venv\Scripts\activate  # Windows
      pip install -r requirements.txt
      python -m spacy download en_core_web_sm

- Environment: copy `.env.example` to `.env` and set `MONGODB_URI`, `MONGODB_DB`, `MONGODB_COLLECTION`, and optionally `SKILL_DICTIONARY_PATH`.
- Run FastAPI

      uvicorn app.api.main:app --reload --port 8000

  - Upload: `POST /api/resumes` (multipart file `file`).
  - Get by id: `GET /api/resumes/{id}`.
  - Search: `GET /api/resumes/search?skill=python&text=data&name=smith`.
- Run Streamlit UI

      # from project root
      # PowerShell equivalent: $env:PYTHONPATH='.'
      set PYTHONPATH=.
      streamlit run app/ui/streamlit_app.py

  Upload a resume to preview parsed JSON and optionally persist to MongoDB.
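
To make the search endpoint listed above easier to call from a script, the following sketch builds the query URL with the standard library. The `build_search_url` helper and the `BASE_URL` value are assumptions for illustration; only the `/api/resumes/search` path and the `skill`, `text`, and `name` parameters come from this README.

```python
from urllib.parse import urlencode

# Assumption: the API is served locally by uvicorn on port 8000.
BASE_URL = "http://localhost:8000"


def build_search_url(base_url: str = BASE_URL, **filters: str) -> str:
    """Build a GET /api/resumes/search URL, skipping empty filters."""
    params = {key: value for key, value in filters.items() if value}
    query = urlencode(params)
    path = f"{base_url}/api/resumes/search"
    return f"{path}?{query}" if query else path


print(build_search_url(skill="python", text="data", name="smith"))
# -> http://localhost:8000/api/resumes/search?skill=python&text=data&name=smith
```

The resulting URL can be passed to any HTTP client once the FastAPI server is running.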
- Indexes are created automatically on first DB use (`email`, `skills`, `name`, `education.degree`, `experience.company`).
- Skill dictionary lives at `data/skills.json`; extend it to improve matching.
- For production, secure your MongoDB credentials and consider larger spaCy models.
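
To show how TF-IDF cosine similarity can score dictionary skills against resume text, here is a dependency-free sketch. It is not the project's implementation, which may well rely on a library vectorizer. The function names, the smoothed IDF formula, the `threshold` value, and the inline skill list (standing in for the contents of `data/skills.json`, whose format is not shown here) are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into simple word tokens (keeps '+' and '#')."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute sparse TF-IDF vectors with a smoothed IDF term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_skills(resume_text: str, skills: list[str], threshold: float = 0.1):
    """Return (skill, score) pairs whose similarity to the resume meets threshold."""
    docs = [tokenize(resume_text)] + [tokenize(skill) for skill in skills]
    vectors = tfidf_vectors(docs)
    resume_vec, skill_vecs = vectors[0], vectors[1:]
    scored = [(s, cosine(resume_vec, v)) for s, v in zip(skills, skill_vecs)]
    return [(s, score) for s, score in scored if score >= threshold]


# Hypothetical skill list standing in for data/skills.json.
skills = ["python", "machine learning", "kubernetes"]
resume = "Built machine learning pipelines in Python and SQL"
print(match_skills(resume, skills))
```

Skills that share no tokens with the resume score zero and are filtered out, so adding synonyms or variants to the dictionary directly widens what can match.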