# EquiHealth: Health Data Collection

This notebook plans and **implements the collection of health-related datasets** for the EquiHealth project.
The goal is to gather structured data for analyzing healthcare inequalities across districts and for supporting the hospital complaint portal.

---

## Objectives

- Identify sources for multiple health datasets, including hospital facilities and population statistics.
- Plan the steps required to clean, standardize, and store data for future analysis and visualization.
- Outline the structure of processed datasets for dashboards and reporting.

---

## Planned Target Datasets

1. **Hospital / Healthcare Facility Data**
   - Sources: National Health Portal Hospital Directory, Rural Health Statistics (RHS), HMIS portal
   - Planned fields: Name, Type, Ownership, Address, District, State, Pincode, Contact info, Services, Beds, Geo-coordinates, Last updated, Source URL

2. **Population Data**
   - Sources: Census of India, district/state population projections
   - Planned fields: District/City name and code, Total population, Age groups, Urban/Rural split, Population density, Source URL, Year

3. **Other Health Indicators (Future / Optional)**
   - Sources: HMIS, Ministry of Health reports, government portals
   - Planned fields: Staff availability, equipment, sanctioned vs functional beds, program coverage

---

## Planned Data Fields

- **Facility Table (to be created):**  
  `facility_id`, `name`, `type`, `ownership`, `address`, `district`, `state`, `pincode`, `latitude`, `longitude`, `contact`, `services`, `beds`, `last_updated`, `source_url`

- **Population Table (to be created):**  
  `district_code`, `district_name`, `state_name`, `total_population`, `age_groups`, `urban_rural`, `population_density`, `source_url`, `year`

- **Optional Metrics / Indicators (to be considered later):**  
  Staff counts, ICU beds, equipment availability, program coverage
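
The two planned tables could be pinned down early as typed records so that later cleaning and export steps share one column list. The sketch below is only one possible layout: the class names, the Python types, and the choice to keep `pincode` as a string are assumptions, not decisions made in this plan. `population_density` is included because it appears among the planned fields for the population dataset.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Facility:
    """One row of the planned facility table (types are assumptions)."""
    facility_id: str
    name: str
    type: str
    ownership: str
    address: str
    district: str
    state: str
    pincode: str                # kept as text so it is never parsed as a number
    latitude: Optional[float]   # None when geo-coordinates are unavailable
    longitude: Optional[float]
    contact: str
    services: str
    beds: Optional[int]         # None when the source does not report beds
    last_updated: str
    source_url: str


@dataclass
class PopulationRecord:
    """One row of the planned population table (types are assumptions)."""
    district_code: str
    district_name: str
    state_name: str
    total_population: int
    age_groups: str
    urban_rural: str
    population_density: Optional[float]  # from the planned population fields
    source_url: str
    year: int


# Column lists derived from the records, e.g. for CSV headers.
FACILITY_COLUMNS = [f.name for f in fields(Facility)]
POPULATION_COLUMNS = [f.name for f in fields(PopulationRecord)]
```

Deriving the column lists from the records keeps the header order in one place; if a field is added or renamed later, only the dataclass needs to change.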

---

## Planned Workflow

1. Identify official sources and APIs for hospital, population, and other health datasets.
2. Download or scrape the raw datasets.
3. Clean and normalize the data (facility names, addresses, district codes).
4. Merge and transform the cleaned data into master tables for analysis.
5. Save processed data in `data/processed/` for visualization and portal integration.
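
The cleaning and saving steps of the workflow above might look roughly like the following. This is a minimal sketch using only the standard library. The helper names (`normalize_name`, `normalize_pincode`, `clean_row`, `save_processed`) are hypothetical, and the rules they apply (title-casing names, keeping only six-digit pincodes) are assumptions that would need checking against the real source data.

```python
import csv
import re
from pathlib import Path


def normalize_name(raw: str) -> str:
    """Collapse repeated whitespace, trim, and title-case a name."""
    return re.sub(r"\s+", " ", raw).strip().title()


def normalize_pincode(raw: str) -> str:
    """Keep digits only; return '' unless exactly six digits remain."""
    digits = re.sub(r"\D", "", raw)
    return digits if len(digits) == 6 else ""


def clean_row(row: dict) -> dict:
    """Return a copy of a raw facility row with normalized text fields."""
    cleaned = dict(row)
    cleaned["name"] = normalize_name(row.get("name", ""))
    cleaned["district"] = normalize_name(row.get("district", ""))
    cleaned["pincode"] = normalize_pincode(row.get("pincode", ""))
    return cleaned


def save_processed(rows, columns, out_path) -> Path:
    """Write rows to a CSV file, creating parent directories if needed."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" drops keys that are not in `columns`.
        writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

A call such as `save_processed([clean_row(r) for r in raw_rows], ["name", "district", "pincode"], "data/processed/facilities.csv")` would write the cleaned rows, where `raw_rows` and the file name are placeholders. Note that plain title-casing turns abbreviations like "PHC" into "Phc", so a real pipeline would likely need an exception list.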
