mendarrr/Data-Visualization-Project
Public Health Data Visualization System

An end-to-end visual analytics pipeline built on the Global Health Statistics dataset (1,000,000 records with 22 fields).
Developed progressively across six milestones as part of a group data visualization project.


Group Members

  1. Abby Sarah
  2. Asami Mary
  3. Annbel Muthoni
  4. Christine Wambui
  5. Chrystabel Young
  6. Esther Jubilee
  7. Joyce Wambu
  8. Rita Kimani
  9. Sharon Wanja

Project Structure

Data-Visualization-Project/
├── data/
│   ├── raw/                  # Original dataset — NOT pushed to GitHub
│   │   └── DATA_SOURCE.md    # Google Drive link to download the dataset
│   └── processed/            # Cleaned/transformed outputs from milestone pipelines
├── notebooks/
│   ├── load_dataset.ipynb    # Always run this first before any milestone notebook
│   ├── milestone1_foundations.ipynb
│   ├── milestone2_pipeline.ipynb
│   ├── milestone3_visualization.ipynb
│   ├── milestone4_statistics.ipynb
│   ├── milestone5_dashboard.ipynb
│   └── milestone6_research.ipynb
├── docs/
│   ├── charts/               # Saved chart outputs
│   ├── data_schema.md        # Variable definitions and types
│   └── data_quality_report.md
├── .gitignore
├── requirements.txt
└── README.md

Dataset

  • Name: Global Health Statistics
  • Size: 1,000,000 rows × 22 columns
  • Source: Google Drive Folder
  • Key variables: Country, Year, Disease Name, Disease Category, Mortality Rate, Healthcare Access, Recovery Rate, Per Capita Income, and more.

The raw CSV is excluded from Git (>100MB). Follow the setup steps below to download it.
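Those exclusions are typically enforced through .gitignore. A sketch of the relevant entries (illustrative only; check the repo's actual .gitignore for the authoritative list):

```
data/raw/*
!data/raw/DATA_SOURCE.md
data/processed/*
.venv/
```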


First-Time Setup (Run Once After Cloning)

Fork the repository from the repo owner on GitHub first; the clone command below should point at your fork.

Step 1 — Clone the repo

git clone git@github.com:<username>/Data-Visualization-Project.git
cd Data-Visualization-Project

Step 2 — Create the virtual environment

Ubuntu/Mac:

python3 -m venv .venv
source .venv/bin/activate

Windows:

python -m venv .venv
.venv\Scripts\activate

Your terminal prompt should now show (.venv) — this means the environment is active.

Step 3 — Install dependencies

pip install -r requirements.txt

Step 4 — Register the Jupyter kernel

python -m ipykernel install --user --name=dataviz --display-name "Python (dataviz)"

Step 5 — Download the dataset

python -c "import gdown; gdown.download('https://drive.google.com/uc?id=1cug4qWE6qFArHmYwcXUdDMaJIfmUzAZD', 'data/raw/Global Health Statistics.csv', quiet=False)"

The command is identical on Ubuntu/Mac and Windows; just make sure your .venv is active first so gdown is available.
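The one-liner assumes data/raw/ already exists. If it doesn't, a small Python helper can create the folder first. This is a sketch: drive_url and download_dataset are illustrative names, and gdown is expected to be installed from requirements.txt.

```python
import os


def drive_url(file_id: str) -> str:
    """Build the direct-download URL that gdown expects for a shared Drive file."""
    return f"https://drive.google.com/uc?id={file_id}"


def download_dataset(file_id: str, dest: str) -> str:
    """Create the destination folder if needed, then fetch the file with gdown."""
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    import gdown  # lazy import: the helper can be defined even without gdown installed
    return gdown.download(drive_url(file_id), dest, quiet=False)
```

From an active .venv you would call download_dataset('1cug4qWE6qFArHmYwcXUdDMaJIfmUzAZD', 'data/raw/Global Health Statistics.csv').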

Step 6 — Select the kernel in VS Code

  1. Open any .ipynb notebook in VS Code
  2. Click the kernel selector (top right corner)
  3. Select Python (dataviz)

Step 7 — Verify everything works

Open and run notebooks/load_dataset.ipynb — it should print:

All checks passed. Dataset is ready.
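For reference, a minimal sketch of the kind of shape check the notebook might run (verify_dataset is an illustrative name; the real notebook may check more, and the expected row/column counts come from the Dataset section above):

```python
import csv
import os


def verify_dataset(path, expected_rows=1_000_000, expected_cols=22):
    """Check that the CSV exists and has the expected shape."""
    assert os.path.exists(path), f"missing file: {path}"
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row holds the 22 column names
        assert len(header) == expected_cols, (
            f"expected {expected_cols} columns, got {len(header)}"
        )
        n_rows = sum(1 for _ in reader)  # count remaining data rows
        assert n_rows == expected_rows, (
            f"expected {expected_rows} rows, got {n_rows}"
        )
    print("All checks passed. Dataset is ready.")
    return True
```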

Daily Workflow (Every Time You Work)

Activate the virtual environment first

Ubuntu/Mac:

source .venv/bin/activate

Windows:

.venv\Scripts\activate

Then sync with main before starting

git pull origin main --rebase

Create your named feature branch if you haven't already

git checkout -b yourname/milestone1-task

Save and push your work

git add .
git commit -m "Brief description of what you did"
git push origin yourname/milestone1-task

Then open a Pull Request on GitHub and tag the team lead (the repo owner) for review before merging.


Rules

  • Never commit directly to main
  • Never push files from data/raw/ or data/processed/
  • Never add new folders or files — use the existing structure only
  • End-of-day merging is done with all members present

Milestones

| # | Title | Notebook | Status |
|---|-------|----------|--------|
| 1 | Data Representation & Foundations | milestone1_foundations.ipynb | 🔲 |
| 2 | Data Processing & Transformation | milestone2_pipeline.ipynb | 🔲 |
| 3 | Visualization & Exploratory Analysis | milestone3_visualization.ipynb | 🔲 |
| 4 | Statistical Inference & Analytical Modeling | milestone4_statistics.ipynb | 🔲 |
| 5 | Interactive Visual Analytics System | milestone5_dashboard.ipynb | 🔲 |
| 6 | Research Contribution & Advanced Analytics | milestone6_research.ipynb | 🔲 |
