A Python service for parsing and anonymizing CVs/resumes using small language models.
The service consists of two main components:
- A REST API application with endpoints for parsing and anonymizing resumes
- An implementation that uses the smolmodels library to create and run compact language models

Features:
- Parse CVs/resumes in various formats (PDF, DOCX, TXT)
- Extract structured information including contact details, work experience, education, skills, etc.
- Anonymize personal information with configurable anonymization levels
- REST API for integration with other systems
- Small, efficient language models for parsing and anonymization
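
To make "configurable anonymization levels" concrete, here is a minimal sketch of how a level setting could control which details get masked. It is only an illustration: it uses regular expressions as a stand-in rather than the small language models this project relies on, and the `Level` names, the `anonymize` function, and the placeholder tokens are all hypothetical, not part of this repository.

```python
import re
from enum import Enum


class Level(Enum):
    """Hypothetical anonymization levels; the project's real levels may differ."""
    BASIC = 1   # mask email addresses only
    STRICT = 2  # also mask phone-like digit sequences


# Deliberately simple patterns, good enough for an illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def anonymize(text: str, level: Level = Level.BASIC) -> str:
    """Replace personal details with placeholders according to the level."""
    text = EMAIL.sub("[EMAIL]", text)
    if level is Level.STRICT:
        text = PHONE.sub("[PHONE]", text)
    return text
```

Calling `anonymize(text, Level.STRICT)` masks both kinds of detail, while the default level leaves phone numbers untouched.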
# Clone the repository
git clone https://github.com/fpardon-upeo/cv-parser.git
cd cv-parser
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Start the API server
cd app
python main.py

The API will be available at http://localhost:8000
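
Once the server is running, a quick way to confirm it is reachable is to call it from Python. The sketch below uses only the standard library; the helper names `endpoint_url` and `get_json` are made up for this example, and it assumes the endpoint returns a JSON body, which this README does not state.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address the server listens on, per this README


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path: str, timeout: float = 5.0) -> dict:
    """GET an endpoint and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(endpoint_url(path), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server up, `get_json("/health")` should return without raising; a connection error means the server is not listening on that address.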
- POST /parse - Extract structured information from a resume
- POST /anonymize - Create an anonymized version of a resume
- GET /health - Health check endpoint
- GET /version - Version information
- GET /docs - API documentation (Swagger/OpenAPI)
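
As a rough illustration of calling the parsing endpoint, the sketch below builds a file-upload request with the standard library. The request format is an assumption: this README does not say how `POST /parse` expects the resume to be sent, so the multipart encoding and the form field name `file` are guesses to check against `/docs`. The helper names are hypothetical.

```python
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, content: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def build_parse_request(
    filename: str,
    content: bytes,
    url: str = "http://localhost:8000/parse",
) -> urllib.request.Request:
    """Prepare (but do not send) an upload request for the parse endpoint."""
    # "file" is an assumed form field name, not confirmed by this README.
    body, content_type = build_multipart("file", filename, content)
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

To send the request, pass the result to `urllib.request.urlopen`. The same helper can target `/anonymize` by changing `url`, under the same assumptions.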
- app/ - Main application code
  - api/ - API endpoints and routes
  - core/ - Core application functionality
  - models/ - Data models and schemas
  - services/ - Business logic services
  - utils/ - Utility functions
  - data/ - Sample data for testing
  - output/ - Output files
- build-plan/ - Project planning documents
- tests/ - Test cases
License: MIT