This project computes annual birth statistics (live births, stillbirths and birth rate) and writes the results as CSV files.
```
project/
├── births_pipeline.py
├── README.md
├── requirements.txt
├── data/
│   ├── data_2023.csv
│   ├── data_2024.csv
│   ├── data_2025.csv
│   └── pop_data.csv
├── outputs/
├── docs/
│   └── data_dictionary.xlsx
└── tests/
    └── test_pipeline.py
```
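The layout above keeps one births file per year (`data_YYYY.csv`) next to a single shared `pop_data.csv`. As a sketch, assuming only that naming convention, a hypothetical helper such as `available_years` could list which years are ready to process. Neither helper below is part of `births_pipeline.py`.

```python
import re
from pathlib import Path

# Assumption: per-year inputs are named exactly data_YYYY.csv, as in the tree above.
YEAR_FILE = re.compile(r"^data_(\d{4})\.csv$")


def available_years(data_dir: Path) -> list[int]:
    """Hypothetical helper: return the sorted years that have a data_YYYY.csv file."""
    years = []
    for path in data_dir.iterdir():
        match = YEAR_FILE.match(path.name)
        if match and path.is_file():
            years.append(int(match.group(1)))
    return sorted(years)


def has_population_file(data_dir: Path) -> bool:
    """Hypothetical helper: pop_data.csv is shared by every year, so it must exist."""
    return (data_dir / "pop_data.csv").is_file()
```

With the `data/` folder shown above, `available_years(Path("data"))` returns `[2023, 2024, 2025]`; `pop_data.csv` does not match the pattern and is ignored.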
- Ensure the `data/` folder contains `data_YYYY.csv` for each year you want to process, plus `pop_data.csv`.
- Review `docs/data_dictionary.xlsx` for field definitions and validation rules before modifying the source data.
- (Optional) Create a virtual environment and install the requirements:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```

- Run the pipeline for a year from the command line (this example uses 2024):

  ```bash
  python births_pipeline.py --year 2024 --out-dir outputs
  ```

Run the test suite to confirm that the pipeline runs end to end on the data and writes all of the expected CSV files:

```bash
python -m pytest
```

All data definitions live in `docs/data_dictionary.xlsx`; consult it for the required columns, their types, and acceptable value ranges.
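The usage example above passes `--year` and `--out-dir`. The sketch below shows one way an entry point could parse those two options with the standard-library `argparse` module and resolve the input paths from the `data/` layout. It is not the code in `births_pipeline.py`: the helper names `parse_args` and `input_paths`, making `--year` required, and the `outputs` default for `--out-dir` are all assumptions.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Hypothetical parser for the two options shown in the usage example."""
    parser = argparse.ArgumentParser(
        description="Compute annual birth statistics and write them as CSV files."
    )
    # Assumption: a year must always be given explicitly.
    parser.add_argument("--year", type=int, required=True,
                        help="Year to process; reads data/data_<year>.csv.")
    # Assumption: the default mirrors the outputs/ folder in the project tree.
    # argparse exposes --out-dir as the attribute out_dir.
    parser.add_argument("--out-dir", type=Path, default=Path("outputs"),
                        help="Directory that receives the output CSVs.")
    return parser.parse_args(argv)


def input_paths(year: int, data_dir: Path = Path("data")) -> tuple[Path, Path]:
    """Hypothetical helper: the per-year births file and the shared population file."""
    return data_dir / f"data_{year}.csv", data_dir / "pop_data.csv"
```

For example, `parse_args(["--year", "2024", "--out-dir", "outputs"])` yields `year=2024` and `out_dir=Path("outputs")`, and `input_paths(2024)` points at `data/data_2024.csv` and `data/pop_data.csv`.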
The following CSV files are produced in the chosen output directory (shown here for `--out-dir outputs`):

- `outputs/{year}_totals.csv`
- `outputs/{year}_by_sex.csv`
- `outputs/{year}_by_region.csv`
- `outputs/{year}_by_sex_region.csv`
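The four file names above follow a fixed `{year}_<breakdown>.csv` pattern, so the complete set of expected outputs can be derived from the year and output directory alone. A minimal sketch of the kind of check an end-to-end test might make; `expected_outputs` and `missing_outputs` are hypothetical helpers, not functions from `tests/test_pipeline.py`.

```python
from pathlib import Path

# Breakdown suffixes taken from the list of output files above.
OUTPUT_SUFFIXES = ("totals", "by_sex", "by_region", "by_sex_region")


def expected_outputs(year: int, out_dir: Path) -> list[Path]:
    """Hypothetical helper: the output paths documented for one year."""
    return [out_dir / f"{year}_{suffix}.csv" for suffix in OUTPUT_SUFFIXES]


def missing_outputs(year: int, out_dir: Path) -> list[Path]:
    """Hypothetical helper: any expected output file that was not written."""
    return [path for path in expected_outputs(year, out_dir) if not path.is_file()]
```

After running the pipeline for 2024, `missing_outputs(2024, Path("outputs"))` should return an empty list; a non-empty result names exactly which breakdowns were not written.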
Birth rate is calculated as: