A Python application for bulk processing records from ClickHouse using AI services and storing results in MongoDB.
- Fetch records from ClickHouse database
- Process records using AI services (Gemini)
- Store processed results in MongoDB
- Batch processing with configurable batch sizes
- Async processing for improved performance
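
The last two features can be pictured with a minimal sketch. This is not the project's actual code: `chunked`, `process_record`, and `process_all` are hypothetical names, and the AI call is replaced by a stand-in that only upper-cases a title. It shows one common way to split records into batches of a configurable size and run each batch concurrently with `asyncio`.

```python
import asyncio
from typing import Iterator, List, Sequence


def chunked(records: Sequence[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


async def process_record(record: dict) -> dict:
    """Hypothetical stand-in for the AI call; it only upper-cases the title."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"id": record["id"], "result": record["title"].upper()}


async def process_all(records: Sequence[dict], batch_size: int) -> List[dict]:
    """Process records batch by batch, running each batch concurrently."""
    results: List[dict] = []
    for batch in chunked(records, batch_size):
        # gather returns results in input order, so they line up with records
        results.extend(await asyncio.gather(*(process_record(r) for r in batch)))
    return results


if __name__ == "__main__":
    sample = [{"id": i, "title": f"item {i}"} for i in range(5)]
    print(asyncio.run(process_all(sample, batch_size=2)))
```

In this sketch the batch size also caps how many requests are in flight at once, which is a simple way to stay under an API rate limit.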
- Clone the repository:

      git clone <repository-url>
      cd python-bulk-processer

- Create a virtual environment:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Configure the environment:

      cp .env.example .env  # Then edit .env with your actual credentials

- Run the application:

      python run_batch.py

Edit the .env file with your credentials:
- ClickHouse: Database connection details
- MongoDB: Database and collection settings
- AI Service: Provider choice (Gemini) and its API key
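
As a rough illustration of how such settings might be read at startup, the sketch below validates a handful of environment variables and fails fast when one is missing. Every variable name here (`CLICKHOUSE_HOST`, `MONGODB_URI`, `MONGODB_COLLECTION`, `GEMINI_API_KEY`, `BATCH_SIZE`) is an assumption made for the example; the real names are the ones listed in `.env.example`. It also assumes the values from `.env` have already been loaded into the process environment, for instance by a loader such as python-dotenv.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # All field names are hypothetical; check .env.example for the real keys.
    clickhouse_host: str
    mongodb_uri: str
    mongodb_collection: str
    gemini_api_key: str
    batch_size: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing ones."""
    required = ["CLICKHOUSE_HOST", "MONGODB_URI", "MONGODB_COLLECTION", "GEMINI_API_KEY"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return Settings(
        clickhouse_host=env["CLICKHOUSE_HOST"],
        mongodb_uri=env["MONGODB_URI"],
        mongodb_collection=env["MONGODB_COLLECTION"],
        gemini_api_key=env["GEMINI_API_KEY"],
        # Optional setting with an assumed default of 100 records per batch
        batch_size=int(env.get("BATCH_SIZE", "100")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching real credentials.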
├── libs/ # Library modules
├── tests/ # Test files
├── providers/ # AI service providers
├── batch_manager.py # Main processing logic
└── run_batch.py # Entry point
The application processes records in batches, extracting vehicle information from product titles using AI services and storing the results in MongoDB.
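
To make the extraction step concrete, here is a hedged sketch of how an AI reply could be turned into a document ready for storage. The README does not specify the result schema, so the function name `build_result_document` and the fields `make`, `model`, `year`, `status`, and `record_id` are illustrative assumptions, as is the premise that the AI service is asked to reply in JSON.

```python
import json


def build_result_document(record_id: str, title: str, ai_reply: str) -> dict:
    """Turn a raw AI reply into a storable dict; bad JSON yields a parse_error."""
    try:
        parsed = json.loads(ai_reply)
    except json.JSONDecodeError:
        parsed = None
    if not isinstance(parsed, dict):
        # Keep the record so failed extractions can be retried or inspected later
        return {"record_id": record_id, "title": title,
                "status": "parse_error", "vehicle": None}
    # Hypothetical schema: keep only the expected keys, defaulting to None
    vehicle = {key: parsed.get(key) for key in ("make", "model", "year")}
    return {"record_id": record_id, "title": title,
            "status": "ok", "vehicle": vehicle}


if __name__ == "__main__":
    reply = '{"make": "Toyota", "model": "Corolla", "year": 2015}'
    print(build_result_document("1", "2015 Toyota Corolla brake pads", reply))
```

Recording a status alongside each result, rather than raising on a malformed reply, lets a batch finish even when a few titles cannot be parsed.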
- Python 3.8+
- ClickHouse database
- MongoDB database
- AI service API key (Gemini)