A Python-based system for managing engineering firm pre-qualifications and recommending eligible firms for Illinois Department of Transportation (IDOT) infrastructure projects.
The system automates four tasks:
- Firm Data Management — Processing, validating, and storing data for 415+ engineering firms with 44 unique prequalification categories
- Project Bulletin Extraction — Parsing IDOT Project Technical Bulletins (PTBs) to extract project requirements, districts, and scope
- Firm-Project Matching — Using TF-IDF similarity, historical award analysis, and district rotation rules to recommend the top eligible firms for each project
- Continuous Data Pipeline — Automated processing of new bulletins with incremental database updates, backups, and monitoring
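
The TF-IDF step named in the matching bullet can be illustrated with a small, dependency-free sketch. This is not the project's implementation: the function names, the sample category descriptions, and the hand-rolled weighting are assumptions made for illustration. Because scikit-learn is a listed dependency, the repository may well use `TfidfVectorizer` instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF so a term present in every document keeps a nonzero weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_categories(requirement, categories):
    """Rank prequalification categories by similarity to a requirement text."""
    names = list(categories)
    vectors = tfidf_vectors([requirement] + [categories[n] for n in names])
    query, rest = vectors[0], vectors[1:]
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, rest)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical category descriptions, for illustration only.
SAMPLE_CATEGORIES = {
    "Highways - Roads and Streets": "highway road street design reconstruction",
    "Structures - Highway: Simple": "bridge structure inspection design simple",
    "Special Services - Surveying": "land survey topographic surveying",
}

ranking = rank_categories("Phase II bridge design and inspection", SAMPLE_CATEGORIES)
```

Here `ranking` is a list of `(category, score)` pairs, best match first. A category that shares no terms with the requirement text scores zero, which is one reason a real system would likely combine this signal with award history rather than rely on it alone.
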
├── FIRM/ # Firm master data (CSV)
├── ex/ # Extended modules
│ ├── FIRM/ # Qualification engine, recommendation system, award validation
│ ├── pipeline/ # Continuous data pipeline with SQLite database
│ └── *.py # Extractors and pipeline orchestration
├── files/
│ └── word/ # PTB bulletin documents (DOCX) and analysis results
├── *.py # Root-level processors, analyzers, and utilities
├── *.json # Data files (awards, firms, prequalifications)
└── *.md # Documentation and guides
- `firm_data_processor.py`: Main firm data processor with cleaning, validation, and database storage
- `firm_excel_processor.py`: Excel-to-JSON transformation with 100% accuracy column mapping
- `build_corrected_json.py`: Corrected firm JSON builder with full validation
- `ptb217_fixed_extraction_system.py`: PTB project extraction with TF-IDF prequalification matching
- `ptb217_rotation_test_system.py`: District rotation rule testing (firms winning PTB N are ineligible for PTB N+1)
- `ex/FIRM/automated_recommendation_system.py`: Automated top-5 firm recommendations per project
- `ex/FIRM/enhanced_qualification_engine.py`: Multi-layer firm-project qualification matching
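
The rotation rule (a firm that wins in PTB N is ineligible for PTB N+1) and the top-5 cut can be sketched as a simple filter. The data shapes, function names, and firm codes below are assumptions for illustration. The source does not say whether the rule is applied per district, so this sketch ignores districts entirely.

```python
def eligible_firms(candidates, awards_by_ptb, ptb_number):
    """Drop candidates that won an award in the immediately preceding PTB.

    candidates: iterable of firm codes, already in ranked order.
    awards_by_ptb: dict mapping PTB number -> set of winning firm codes.
    """
    previous_winners = awards_by_ptb.get(ptb_number - 1, set())
    return [firm for firm in candidates if firm not in previous_winners]


def top_recommendations(candidates, awards_by_ptb, ptb_number, limit=5):
    """Return up to `limit` eligible firms, preserving the input ranking."""
    return eligible_firms(candidates, awards_by_ptb, ptb_number)[:limit]


# Hypothetical firm codes and award history, for illustration only.
awards = {215: {"F002"}, 216: {"F001", "F004"}}
ranked = ["F001", "F002", "F003", "F004", "F005", "F006", "F007"]
picks = top_recommendations(ranked, awards, ptb_number=217)
```

Only winners from PTB 216 are excluded when recommending for PTB 217; a firm that won in PTB 215 is eligible again.
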
- `verify_prequals.py`: Prequalification duplicate and formatting checker
- `check_duplicates.py` / `verify_duplicates.py`: Firm code uniqueness verification
- `analyze_prequals.py`: Comprehensive prequalification distribution analysis
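
A firm code uniqueness check of the kind these scripts describe might look like the sketch below. The `firm_code` field name, the normalization step, and the sample records are assumptions; the actual JSON schema is not shown in this README.

```python
from collections import Counter


def normalize_code(code):
    """Trim whitespace and uppercase so ' f001 ' and 'F001' compare equal."""
    return code.strip().upper()


def find_duplicate_codes(firms, key="firm_code"):
    """Return a dict of normalized firm code -> count for codes seen more than once."""
    counts = Counter(normalize_code(firm[key]) for firm in firms)
    return {code: n for code, n in counts.items() if n > 1}


# Hypothetical records; the real JSON field names may differ.
firms = [
    {"firm_code": "F001", "name": "Alpha Engineering"},
    {"firm_code": "F002", "name": "Beta Consultants"},
    {"firm_code": " f001 ", "name": "Alpha Engineering, Inc."},
]
duplicates = find_duplicate_codes(firms)
```

Normalizing before counting catches duplicates that differ only in case or stray whitespace, which exact string comparison would miss.
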
- `ex/continuous_data_pipeline.py`: Continuous bulletin processing with scheduling, backups, and health monitoring
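
The incremental update and backup behavior can be sketched with the standard-library `sqlite3` module. The table layout, column names, function names, and job numbers below are assumptions for illustration, not the pipeline's actual schema.

```python
import sqlite3


def init_db(conn):
    """Create the projects table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS projects ("
        "ptb_number INTEGER NOT NULL, "
        "job_number TEXT NOT NULL, "
        "district TEXT, "
        "PRIMARY KEY (ptb_number, job_number))"
    )


def count_projects(conn):
    """Return the number of stored projects."""
    return conn.execute("SELECT COUNT(*) FROM projects").fetchone()[0]


def add_new_projects(conn, projects):
    """Insert only projects not already stored; return how many were added."""
    before = count_projects(conn)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO projects (ptb_number, job_number, district) "
            "VALUES (?, ?, ?)",
            projects,
        )
    return count_projects(conn) - before


def backup_db(conn, backup_conn):
    """Copy the current database contents into backup_conn."""
    conn.backup(backup_conn)


# In-memory databases and hypothetical job numbers, for illustration only.
db = sqlite3.connect(":memory:")
backup = sqlite3.connect(":memory:")
init_db(db)
first = add_new_projects(db, [(217, "JOB-A", "1"), (217, "JOB-B", "2")])
backup_db(db, backup)  # snapshot before the next bulletin is processed
second = add_new_projects(db, [(217, "JOB-B", "2"), (217, "JOB-C", "3")])
```

The composite primary key together with `INSERT OR IGNORE` makes reprocessing a bulletin idempotent: rows that already exist are skipped rather than duplicated. A file-backed pipeline would pass a connection to a backup file instead of a second in-memory database.
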
| Dataset | Count |
|---|---|
| Eligible Firms | 415 |
| Unique Prequalifications | 44 |
| Historical Award Records | 2,095 |
| Projects in Database | 46 |
| Data Quality Score | 98.5%+ |
pandas
openpyxl
python-docx
scikit-learn
numpy
# Clone the repository
git clone https://github.com/rkhan60/IDOT.git
cd IDOT
# Install dependencies
pip install pandas openpyxl python-docx scikit-learn numpy
# Run firm analysis
python analyze_prequals.py
# Run data validation
python verify_prequals.py
# Run tests
python test_firm_processor.py
python test_excel_processor.py

Contributions are welcome! Please open an issue or submit a pull request.
MIT License