1. Data Pipeline:
- GitHub Repo Name: data-pipeline-php-airflow
- MVP Goal: Simple data extraction from a CSV file, transformation, and loading into a MySQL table, scheduled with Airflow.
- Key Components:
  - PHP Script:
    - Extract data from a local CSV file.
    - Transform data (basic data cleaning, e.g., trimming whitespace, basic data type conversions).
    - Load data into a MySQL table.
  - MySQL Database:
    - A simple target table.
  - Apache Airflow:
    - A single DAG to run the PHP ETL script.
- Steps:
  - Set up MySQL: Create a database and the target table.
  - Create the PHP script: Implement basic ETL logic, using a small sample CSV file as input.
  - Dockerize Airflow: Start Airflow from a docker-compose setup with a basic configuration.
  - Create an Airflow DAG: Write a simple DAG in Python to run the PHP script.
  - Run the DAG: Trigger it and confirm that the rows from the CSV file arrive in the MySQL table.
  - Documentation: Add a README with setup and usage instructions.
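
The DAG step above could look roughly like the following. This is a minimal sketch, not a finished implementation: it assumes Airflow 2.4 or later (for the `schedule` argument and the `airflow.operators.bash` import path), and the script path, `dag_id`, `task_id`, and daily schedule are all hypothetical placeholders. It also assumes the `php` binary is available wherever the task runs, which usually means adding it to the Airflow image yourself.

```python
import shlex
from datetime import datetime

# Hypothetical location of the PHP ETL script inside the Airflow container.
PHP_SCRIPT = "/opt/airflow/scripts/etl.php"


def build_etl_command(script_path: str = PHP_SCRIPT) -> str:
    """Return the shell command that runs the PHP ETL script."""
    # shlex.quote keeps paths with spaces or shell metacharacters safe.
    return f"php {shlex.quote(script_path)}"


try:
    # Assumption: Airflow 2.4+; other versions may need different imports
    # or `schedule_interval` instead of `schedule`.
    from airflow import DAG
    from airflow.operators.bash import BashOperator
except ImportError:
    DAG = None  # allows the helper above to be used without Airflow installed

if DAG is not None:
    with DAG(
        dag_id="php_csv_to_mysql",        # hypothetical name
        start_date=datetime(2024, 1, 1),  # placeholder start date
        schedule="@daily",                # placeholder schedule
        catchup=False,                    # do not backfill past runs
    ) as dag:
        # Single task: run the PHP script; a non-zero exit code fails the task.
        run_etl = BashOperator(
            task_id="run_php_etl",
            bash_command=build_etl_command(),
        )
```

Running the script through a single `BashOperator` keeps the MVP small: the PHP script owns all of the extract, transform, and load logic, and Airflow only schedules it and records success or failure from its exit code.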