This beginner-level ETL project demonstrates how to extract data from a CSV file, transform it using Python and Pandas, and load the cleaned output into a new file.
- Python
- Pandas
- Jupyter Notebooks
- GitHub for version control
/data → raw input CSV files
/output → cleaned/transformed CSV files
/scripts → Python ETL scripts
/notebooks → Jupyter notebooks for exploration
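
As a rough sketch, the folder layout above could be created with a few lines of standard-library Python. The `scaffold` function and the `PROJECT_DIRS` constant are hypothetical names used for illustration; they are not part of this project's scripts.

```python
from pathlib import Path

# Folder names taken from the layout above.
PROJECT_DIRS = ("data", "output", "scripts", "notebooks")


def scaffold(root: Path) -> list[Path]:
    """Create the project folders under `root` and return their paths."""
    created = []
    for name in PROJECT_DIRS:
        folder = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `scaffold(Path("."))` from the repository root would create any folders that are missing and leave existing ones untouched.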
- Extract: Load raw CSV data
- Transform: Clean, filter, and reshape the data
- Load: Save the cleaned dataset to /output
- Document the process in a notebook
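
The extract, transform, and load steps could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names and the specific cleaning rules (normalising column names, dropping duplicate and fully empty rows) are assumptions, and any reshaping would depend on the dataset at hand.

```python
from pathlib import Path

import pandas as pd


def extract(path: Path) -> pd.DataFrame:
    """Extract: read the raw CSV into a DataFrame."""
    return pd.read_csv(path)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform: apply illustrative cleaning and filtering rules."""
    df = df.copy()
    # Clean: normalise column names (trim, lower-case, spaces to underscores).
    df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]
    # Clean: drop exact duplicate rows.
    df = df.drop_duplicates()
    # Filter: drop rows in which every value is missing.
    df = df.dropna(how="all")
    return df.reset_index(drop=True)


def load(df: pd.DataFrame, path: Path) -> None:
    """Load: write the cleaned DataFrame to a new CSV file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)


def run_etl(source: Path, target: Path) -> pd.DataFrame:
    """Run the three steps in order and return the cleaned data."""
    cleaned = transform(extract(source))
    load(cleaned, target)
    return cleaned
```

With a hypothetical input file, a run might look like `run_etl(Path("data/raw.csv"), Path("output/cleaned.csv"))`. Keeping each step in its own function makes it easier to try them one at a time in a notebook.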
- A simple, reproducible ETL pipeline
- Cleaned dataset ready for analysis
- Add logging
- Add error handling
- Turn the ETL script into a scheduled automation
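
As one possible starting point for the logging and error-handling items, the extract step could be wrapped as shown below. The names `safe_extract`, `configure_logging`, and `main` are hypothetical, and the set of exceptions handled is an assumption about which failures matter most for CSV input.

```python
import logging
from pathlib import Path
from typing import Optional

import pandas as pd

logger = logging.getLogger("etl")


def configure_logging() -> None:
    """Send timestamped INFO-level (and above) messages to the console."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


def safe_extract(path: Path) -> Optional[pd.DataFrame]:
    """Read a CSV; log the problem and return None if reading fails."""
    try:
        df = pd.read_csv(path)
    except FileNotFoundError:
        logger.error("Input file not found: %s", path)
        return None
    except pd.errors.EmptyDataError:
        logger.error("Input file is empty: %s", path)
        return None
    except pd.errors.ParserError as exc:
        logger.error("Could not parse %s: %s", path, exc)
        return None
    logger.info("Extracted %d rows from %s", len(df), path)
    return df


def main(source: Path) -> int:
    """Return 0 on success and 1 on failure, for use as an exit code."""
    configure_logging()
    return 0 if safe_extract(source) is not None else 1
```

Returning a non-zero exit code on failure is useful once the script runs on a schedule, because a scheduler such as cron can then detect that a run did not succeed.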