An end-to-end visual analytics pipeline built on the Global Health Statistics dataset (1,000,000 records with 22 fields).
Developed progressively across six milestones as part of a group data visualization project.
- Abby Sarah
- Asami Mary
- Annbel Muthoni
- Christine Wambui
- Chrystabel Young
- Esther Jubilee
- Joyce Wambu
- Rita Kimani
- Sharon Wanja
Data-Visualization-Project/
├── data/
│ ├── raw/ # Original dataset — NOT pushed to GitHub
│ │ └── DATA_SOURCE.md # Google Drive link to download the dataset
│ └── processed/ # Cleaned/transformed outputs from milestone pipelines
├── notebooks/
│ ├── load_dataset.ipynb # Always run this first before any milestone notebook
│ ├── milestone1_foundations.ipynb
│ ├── milestone2_pipeline.ipynb
│ ├── milestone3_visualization.ipynb
│ ├── milestone4_statistics.ipynb
│ ├── milestone5_dashboard.ipynb
│ └── milestone6_research.ipynb
├── docs/
│ ├── charts/ # Saved chart outputs
│ ├── data_schema.md # Variable definitions and types
│ └── data_quality_report.md
├── .gitignore
├── requirements.txt
└── README.md
- Name: Global Health Statistics
- Size: 1,000,000 rows × 22 columns
- Source: Google Drive Folder
- Key variables: Country, Year, Disease Name, Disease Category, Mortality Rate, Healthcare Access, Recovery Rate, Per Capita Income, and more.
The raw CSV is excluded from Git (>100MB). Follow the setup steps below to download it.
First-Time Setup (Run Once After Cloning)
Note: fork the repository from the repo owner first, then clone your fork.
git clone git@github.com:<username>/Data-Visualization-Project.git
cd Data-Visualization-Project
Ubuntu/Mac:
python3 -m venv .venv
source .venv/bin/activate
Windows:
python -m venv .venv
.venv\Scripts\activate
Your terminal prompt should now show (.venv), which means the environment is active.
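If the prompt does not change (some shells and editors hide it), the environment can also be checked from Python itself. The snippet below is a minimal sketch; the helper name `in_virtualenv` is illustrative and not part of this repository. It relies on the standard behaviour that `sys.prefix` differs from `sys.base_prefix` inside a virtual environment created with `venv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active. Activate .venv before installing packages.")
```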
pip install -r requirements.txt
python -m ipykernel install --user --name=dataviz --display-name "Python (dataviz)"
Ubuntu/Mac and Windows (same command for both; just make sure your .venv is active first):
python -c "import gdown; gdown.download('https://drive.google.com/uc?id=1cug4qWE6qFArHmYwcXUdDMaJIfmUzAZD', 'data/raw/Global Health Statistics.csv', quiet=False)"
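The download one-liner can also be written as a small script that skips the download when the file is already there. This is a sketch only: the function name `ensure_dataset` is hypothetical, the URL and output path are copied from the command above, and it assumes `gdown` is installed by `requirements.txt`, as that command implies.

```python
from pathlib import Path

# URL and target path are copied from the setup command in this README.
DATASET_URL = "https://drive.google.com/uc?id=1cug4qWE6qFArHmYwcXUdDMaJIfmUzAZD"
DATASET_PATH = Path("data/raw/Global Health Statistics.csv")


def ensure_dataset(path: Path = DATASET_PATH, url: str = DATASET_URL) -> bool:
    """Download the dataset unless it is already present.

    Returns True if a download was attempted, False if the file already existed.
    """
    if path.exists() and path.stat().st_size > 0:
        print(f"Dataset already present at {path}; skipping download.")
        return False
    # Create the target directory if it is missing so the download can be written.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Imported lazily so the existence check works even before gdown is installed.
    import gdown
    gdown.download(url, str(path), quiet=False)
    return True
```

Calling `ensure_dataset()` from the repository root with the environment active should behave like the one-liner on a fresh clone and do nothing on later runs.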
- Open any .ipynb notebook in VS Code
- Click the kernel selector (top right corner)
- Select Python (dataviz)
Open and run notebooks/load_dataset.ipynb — it should print:
All checks passed. Dataset is ready.
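The checks that load_dataset.ipynb actually runs are not listed in this README. As an illustration of the kind of check involved, the sketch below verifies the row and column counts stated in the dataset description using only the standard library. The function name `check_dataset` and the UTF-8 encoding are assumptions, not taken from the notebook.

```python
import csv
from pathlib import Path

EXPECTED_ROWS = 1_000_000  # from the dataset description in this README
EXPECTED_COLS = 22


def check_dataset(path, expected_rows=EXPECTED_ROWS, expected_cols=EXPECTED_COLS):
    """Return a list of problems found; an empty list means the file looks right."""
    path = Path(path)
    if not path.is_file():
        return [f"File not found: {path}"]
    problems = []
    # Encoding is assumed to be UTF-8; adjust if the file says otherwise.
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return ["File is empty"]
        if len(header) != expected_cols:
            problems.append(f"Expected {expected_cols} columns, found {len(header)}")
        # Count non-blank data rows without loading the whole file into memory.
        n_rows = sum(1 for row in reader if row)
    if n_rows != expected_rows:
        problems.append(f"Expected {expected_rows} rows, found {n_rows}")
    return problems
```

An empty result from `check_dataset("data/raw/Global Health Statistics.csv")` would indicate the download matches the stated size; any returned messages point to a partial or wrong file.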
Ubuntu/Mac:
source .venv/bin/activate
Windows:
.venv\Scripts\activate
git pull origin main --rebase
git checkout -b yourname/milestone1-task
git add .
git commit -m "Brief description of what you did"
git push origin yourname/milestone1-task
Then open a Pull Request on GitHub and tag the team lead (repo owner) for review before merging.
- Never commit directly to main
- Never push files from data/raw/ or data/processed/
- Never add new folders or files; use the existing structure only
- End-of-day merging is done with all members present
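Because the workflow stages everything with `git add .`, it can help to confirm that nothing from the data folders is staged before committing. The sketch below is one possible check, not an existing script in this repository. The helper names are hypothetical, and the exemption for data/raw/DATA_SOURCE.md is an assumption based on the project tree listing it as a tracked file.

```python
import subprocess

# Directories the team rules say must never be pushed.
PROTECTED_PREFIXES = ("data/raw/", "data/processed/")
# Assumed exception: the project tree shows this file as part of the repository.
ALLOWED = {"data/raw/DATA_SOURCE.md"}


def protected_files(paths):
    """Return the paths that fall under a protected directory and are not exempt."""
    return [p for p in paths if p.startswith(PROTECTED_PREFIXES) and p not in ALLOWED]


def staged_files():
    """List files currently staged for commit, relative to the repository root."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

Running `protected_files(staged_files())` from inside the repository after `git add .` should return an empty list if .gitignore is doing its job; any path it returns should be unstaged before committing.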
| # | Title | Notebook | Status |
|---|---|---|---|
| 1 | Data Representation & Foundations | milestone1_foundations.ipynb | 🔲 |
| 2 | Data Processing & Transformation | milestone2_pipeline.ipynb | 🔲 |
| 3 | Visualization & Exploratory Analysis | milestone3_visualization.ipynb | 🔲 |
| 4 | Statistical Inference & Analytical Modeling | milestone4_statistics.ipynb | 🔲 |
| 5 | Interactive Visual Analytics System | milestone5_dashboard.ipynb | 🔲 |
| 6 | Research Contribution & Advanced Analytics | milestone6_research.ipynb | 🔲 |