Public Health Data Visualization System

An end-to-end visual analytics pipeline built on the Global Health Statistics dataset (1,000,000 records with 22 fields).
Developed progressively across six milestones as part of a group data visualization project.

Group Members

Abby Sarah
Asami Mary
Annbel Muthoni
Christine Wambui
Chrystabel Young
Esther Jubilee
Joyce Wambu
Rita Kimani
Sharon Wanja

Project Structure

Data-Visualization-Project/
├── data/
│   ├── raw/                  # Original dataset — NOT pushed to GitHub
│   │   └── DATA_SOURCE.md    # Google Drive link to download the dataset
│   └── processed/            # Cleaned/transformed outputs from milestone pipelines
├── notebooks/
│   ├── load_dataset.ipynb    # Run this before any milestone notebook
│   ├── milestone1_foundations.ipynb
│   ├── milestone2_pipeline.ipynb
│   ├── milestone3_visualization.ipynb
│   ├── milestone4_statistics.ipynb
│   ├── milestone5_dashboard.ipynb
│   └── milestone6_research.ipynb
├── docs/
│   ├── charts/               # Saved chart outputs
│   ├── data_schema.md        # Variable definitions and types
│   └── data_quality_report.md
├── .gitignore
├── requirements.txt
└── README.md

Dataset

Name: Global Health Statistics
Size: 1,000,000 rows × 22 columns
Source: Google Drive folder (link in data/raw/DATA_SOURCE.md)
Key variables: Country, Year, Disease Name, Disease Category, Mortality Rate, Healthcare Access, Recovery Rate, Per Capita Income, and more.

The raw CSV is excluded from Git (>100MB). Follow the setup steps below to download it.

First-Time Setup (Run Once After Cloning)

Fork the repository from the repo owner first, then clone your fork.

Step 1 — Clone the repo

git clone git@github.com:<username>/Data-Visualization-Project.git
cd Data-Visualization-Project

Step 2 — Create the virtual environment

Ubuntu/Mac:

python3 -m venv .venv
source .venv/bin/activate

Windows:

python -m venv .venv
.venv\Scripts\activate

Your terminal prompt should now show (.venv) — this means the environment is active.
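Some shells hide the (.venv) prefix, so a check from inside Python can be a useful second opinion. The snippet below is a minimal sketch, assuming Python 3.3 or later; the function name in_virtualenv is illustrative and not part of the project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If this reports False, activate the environment again before installing dependencies.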

Step 3 — Install dependencies

pip install -r requirements.txt

Step 4 — Register the Jupyter kernel

python -m ipykernel install --user --name=dataviz --display-name "Python (dataviz)"

Step 5 — Download the dataset

The command is the same on Ubuntu/Mac and Windows. Make sure your .venv is active before running it:

python -c "import gdown; gdown.download('https://drive.google.com/uc?id=1cug4qWE6qFArHmYwcXUdDMaJIfmUzAZD', 'data/raw/Global Health Statistics.csv', quiet=False)"
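For anyone who prefers a script to a one-liner, the same download can be sketched as below. It reuses the file ID and output path from the command above. The helper names drive_url and download_dataset are hypothetical, and the sketch assumes gdown is installed through requirements.txt.

```python
from pathlib import Path

# File ID and output path copied from the one-line command above.
FILE_ID = "1cug4qWE6qFArHmYwcXUdDMaJIfmUzAZD"
OUTPUT = Path("data/raw/Global Health Statistics.csv")


def drive_url(file_id: str) -> str:
    """Build the direct-download URL form used in the one-line command."""
    return f"https://drive.google.com/uc?id={file_id}"


def download_dataset(output: Path = OUTPUT, file_id: str = FILE_ID) -> Path:
    """Download the CSV unless it already exists, then return its path."""
    if output.exists():
        return output  # skip the download when the file is already there
    import gdown  # imported lazily; assumed to come from requirements.txt

    output.parent.mkdir(parents=True, exist_ok=True)
    gdown.download(drive_url(file_id), str(output), quiet=False)
    return output


if __name__ == "__main__":
    print(download_dataset())
```

Skipping the download when the file exists avoids fetching a file of more than 100MB twice; delete the CSV first if a fresh copy is needed.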

Step 6 — Select the kernel in VS Code

Open any .ipynb notebook in VS Code
Click the kernel selector (top right corner)
Select Python (dataviz)

Step 7 — Verify everything works

Open and run notebooks/load_dataset.ipynb — it should print:

All checks passed. Dataset is ready.
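The contents of load_dataset.ipynb are not reproduced here, so the following is only a sketch of the kind of checks it might run, based on the row and column counts stated in the Dataset section. The function name check_dataset, the UTF-8 encoding, and the use of the standard csv module are assumptions.

```python
import csv
from pathlib import Path

EXPECTED_ROWS = 1_000_000
EXPECTED_COLS = 22


def check_dataset(path, expected_rows=EXPECTED_ROWS, expected_cols=EXPECTED_COLS):
    """Return a list of problems; an empty list means every check passed."""
    path = Path(path)
    if not path.is_file():
        return [f"missing file: {path}"]
    problems = []
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        if len(header) != expected_cols:
            problems.append(f"expected {expected_cols} columns, found {len(header)}")
        rows = sum(1 for _ in reader)  # data rows only, header excluded
    if rows != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {rows}")
    return problems


if __name__ == "__main__":
    issues = check_dataset("data/raw/Global Health Statistics.csv")
    print("All checks passed. Dataset is ready." if not issues else "\n".join(issues))
```

Streaming the file with the csv module keeps memory use low, which matters for a file of this size.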

Daily Workflow (Every Time You Work)

Activate the virtual environment first

Ubuntu/Mac:

source .venv/bin/activate

Windows:

.venv\Scripts\activate

Then sync with main before starting

git pull origin main --rebase

Create your named feature branch if you haven't already

git checkout -b yourname/milestone1-task

Save and push your work

git add .
git commit -m "Brief description of what you did"
git push origin yourname/milestone1-task

Then open a Pull Request on GitHub and tag the team lead (the repo owner) for review before merging.

Rules

Never commit directly to main
Never push files from data/raw/ or data/processed/
Never add new folders or files — use the existing structure only
End-of-day merging is done with all members present
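The rule about data files can be checked before each commit. The sketch below lists staged files and flags anything under data/raw/ or data/processed/. It assumes DATA_SOURCE.md is the only file under data/raw/ that should be tracked, as the project structure suggests. The helper names are hypothetical, and the project's .gitignore may already cover these paths.

```python
import subprocess
from pathlib import PurePosixPath

BLOCKED_DIRS = (PurePosixPath("data/raw"), PurePosixPath("data/processed"))
# Assumption: DATA_SOURCE.md is the one tracked file under data/raw/.
ALLOWED = {PurePosixPath("data/raw/DATA_SOURCE.md")}


def blocked_paths(staged):
    """Return the staged paths that sit under a blocked data directory."""
    hits = []
    for name in staged:
        path = PurePosixPath(name)
        if path in ALLOWED:
            continue
        if any(folder in path.parents for folder in BLOCKED_DIRS):
            hits.append(name)
    return hits


def staged_files():
    """List files staged for the next commit (git prints forward slashes)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


if __name__ == "__main__":
    bad = blocked_paths(staged_files())
    if bad:
        raise SystemExit("Do not commit data files:\n" + "\n".join(bad))
    print("No data files staged.")
```

Running it after git add and before git commit gives a chance to unstage data files with git restore --staged.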

Milestones

#	Title	Notebook	Status
1	Data Representation & Foundations	`milestone1_foundations.ipynb`	🔲
2	Data Processing & Transformation	`milestone2_pipeline.ipynb`	🔲
3	Visualization & Exploratory Analysis	`milestone3_visualization.ipynb`	🔲
4	Statistical Inference & Analytical Modeling	`milestone4_statistics.ipynb`	🔲
5	Interactive Visual Analytics System	`milestone5_dashboard.ipynb`	🔲
6	Research Contribution & Advanced Analytics	`milestone6_research.ipynb`	🔲
