A Python project for collecting, processing, and analyzing data feeds from various sources.
This project provides tools and utilities to:
- Collect data from various API endpoints, RSS feeds, and other data sources
- Process and transform raw data into structured formats
- Store data in various backends (files, databases)
- Analyze and visualize collected data
- Modular architecture for easy extension
- Support for multiple data source types
- Configurable data processing pipelines
- Scheduled data collection using task scheduling
- Data validation and error handling
- Caching and rate limiting for API sources
- Python 3.10 or higher
This project uses uv for dependency management.
-
Clone the repository:
git clone https://github.com/karimQuant/data-feed-collect.git cd data-feed-collect -
Create a virtual environment and install dependencies:
uv venv source .venv/bin/activate # On Windows: .venv\Scripts\activate uv pip install -e .
Basic example of collecting data from a source:
from data_feed_collect import DataCollector
# Initialize collector with configuration
collector = DataCollector(config_path="config/sources.yaml")
# Collect data from a specific source
data = collector.collect("source_name")
# Process the collected data
processed_data = collector.process(data)
# Save the processed data
collector.save(processed_data, "output/path")data-feed-collect/
├── data_feed_collect/ # Main package
│ ├── __init__.py
│ ├── collectors/ # Data collection modules
│ ├── processors/ # Data processing modules
│ ├── storage/ # Data storage modules
│ └── utils/ # Utility functions
├── config/ # Configuration files
├── tests/ # Test suite
├── examples/ # Usage examples
└── docs/ # Documentation
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.