# 🧾 Project Report: Predictive Modeling of Malaria Incidence Using Machine Learning

---

## 📌 Title
**Predicting Malaria Incidence in East Africa Using Machine Learning Algorithms: A Comparative Analysis of Linear Regression, Decision Tree, and Random Forest Models**

---

## 🧠 Introduction
Malaria remains a major public health burden across East Africa, particularly in Kenya, Uganda, and Tanzania. These countries continue to experience significant morbidity and mortality due to the disease, especially among children and pregnant women. Accurate and timely prediction of malaria incidence can help governments and health organizations take proactive measures to prevent outbreaks and optimize resource allocation.

This project leverages machine learning techniques to forecast malaria incidence based on historical and environmental data from the three countries. The comparative study evaluates the performance of Linear Regression, Decision Tree, and Random Forest models to determine which best captures malaria patterns across the region.

---

## 🎯 Objectives
- To build and compare predictive models of malaria incidence in Kenya, Uganda, and Tanzania using historical and climate-related data.
- To identify the most accurate machine learning model for forecasting malaria trends.
- To visualize model predictions over time and assess performance.
- To recommend a model for deployment in public health decision-making systems.

---

## ❗ Problem Statement
Malaria prediction models in East Africa are often limited in scope and accuracy. Health officials face challenges in anticipating outbreaks, which leads to delayed responses. There is a critical need for machine learning models that can accurately forecast malaria incidence across different East African countries using environmental and temporal data.

---

## 🛠️ Project Description
This study uses three supervised machine learning regression models:
- **Linear Regression**
- **Decision Tree Regressor**
- **Random Forest Regressor**

These models are trained to predict malaria incidence using historical data from Kenya, Uganda, and Tanzania, with features such as year, rainfall, temperature, and humidity.
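
As a rough illustration of the training setup described above, the three regressors could be fitted and scored as sketched below with scikit-learn. The data is synthetic placeholder data, not the project's dataset. The units, the 80/20 random split, and hyperparameters such as `n_estimators=100` are assumptions made only for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Placeholder data: NOT the project's dataset. Columns mimic the report's
# features (year, rainfall, temperature, humidity) with made-up values.
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.integers(2000, 2021, n),   # year
    rng.uniform(50, 300, n),       # rainfall (assumed unit: mm)
    rng.uniform(18, 32, n),        # temperature (assumed unit: deg C)
    rng.uniform(40, 90, n),        # humidity (assumed unit: %)
])
# Synthetic target with a non-linear temperature effect plus noise.
y = 0.5 * X[:, 1] + 3.0 * (X[:, 2] - 25) ** 2 + 0.2 * X[:, 3] + rng.normal(0, 5, n)

# Assumed 80/20 random split; the report does not state how data was split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }

for name, scores in results.items():
    print(f"{name}: MSE={scores['mse']:.2f}, R2={scores['r2']:.3f}")
```

A random split is used here only for brevity. For a forecasting task, a chronological split (training on earlier years and testing on later ones) may give a more realistic estimate of performance.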

Performance metrics include:
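- **Mean Squared Error (MSE)** – the average squared difference between predicted and actual incidence; lower is better
- **R² Score** – the proportion of variance in incidence explained by the model; closer to 1 is better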
- **Prediction trends vs actuals over time** – for visual performance evaluation
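
To make the two numeric metrics concrete, here is a minimal pure-Python sketch of how MSE and R² are computed. The function names `mse` and `r_squared` and the four sample values are illustrative only and are not taken from the project's code or data.

```python
def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 minus the ratio of residual sum of squares to total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up incidence values, for illustration only.
actual = [100.0, 120.0, 140.0, 160.0]
predicted = [110.0, 120.0, 130.0, 160.0]

mse_value = mse(actual, predicted)       # (100 + 0 + 100 + 0) / 4 = 50.0
r2_value = r_squared(actual, predicted)  # 1 - 200 / 2000 = 0.9
print(mse_value, r2_value)
```

Because MSE is expressed in squared units of the target, it is mainly useful for ranking models on the same dataset, while R² gives a unit-free measure of fit.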

---

## 📊 Data Description
- **Countries included**: Kenya, Uganda, Tanzania
- **Features**: Year, Rainfall, Temperature, Humidity, and other environmental factors.
- **Target variable**: Reported malaria incidence (cases per year).
- **Data source**: Public health records and climate datasets for the three countries.
- **Scope**: Multi-year data covering trends across East Africa.

---

## 📈 Exploratory Data Analysis (EDA)
- Checked for missing values, outliers, and inconsistencies.
- Plotted distributions for all numerical variables.
- Used correlation heatmaps to identify strong predictors of malaria incidence.
- Grouped and visualized data by country to understand national trends.
- Used time series plots to highlight peak and decline periods in malaria cases.
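
The EDA steps above might look roughly like the following pandas sketch. The column names (`country`, `year`, `rainfall`, `temperature`, `incidence`) and all values are hypothetical, since the report does not publish its schema. The IQR rule for outliers is one common choice and is assumed here, not confirmed as the project's method.

```python
import numpy as np
import pandas as pd

# Hypothetical rows and column names, for illustration only.
df = pd.DataFrame({
    "country": ["Kenya", "Kenya", "Uganda", "Uganda", "Tanzania", "Tanzania"],
    "year": [2019, 2020, 2019, 2020, 2019, 2020],
    "rainfall": [110.0, 130.0, 150.0, np.nan, 120.0, 140.0],
    "temperature": [24.0, 24.5, 25.0, 25.5, 26.0, 26.5],
    "incidence": [200.0, 230.0, 310.0, 330.0, 250.0, 270.0],
})

# Missing values per column.
missing_per_column = df.isna().sum()

# Outlier check on the target using the 1.5 * IQR rule (assumed method).
q1, q3 = df["incidence"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[
    (df["incidence"] < q1 - 1.5 * iqr) | (df["incidence"] > q3 + 1.5 * iqr)
]

# Correlation of each numeric column with the target (input to a heatmap).
corr_with_target = (
    df.select_dtypes("number").corr()["incidence"].sort_values(ascending=False)
)

# Per-country summary to compare national levels.
mean_by_country = df.groupby("country")["incidence"].mean()

print(missing_per_column, corr_with_target, mean_by_country, sep="\n\n")
```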

---

## 📊 Visualizations
1. **Histograms and Boxplots** – for Rainfall, Temperature, and Malaria Incidence.
2. **Country-wise Trend Charts** – to observe variations between Kenya, Uganda, and Tanzania.
3. **Correlation Matrix** – showing relationships between variables.
4. **Model Performance Bar Chart** – comparing MSE and R² scores.
5. **Prediction vs Actual Line Plots** – illustrating how each model tracks malaria incidence over the years.
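
A prediction-versus-actual line plot like the one listed above could be drawn with matplotlib as in this sketch. The years and incidence values are placeholders rather than project results, and the `Agg` backend is chosen only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Placeholder values, NOT results from the project.
years = [2015, 2016, 2017, 2018, 2019, 2020]
actual = [210.0, 225.0, 240.0, 232.0, 220.0, 205.0]
predicted = [205.0, 230.0, 236.0, 228.0, 224.0, 210.0]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(years, actual, marker="o", label="Actual")
ax.plot(years, predicted, marker="x", linestyle="--", label="Predicted")
ax.set_xlabel("Year")
ax.set_ylabel("Malaria incidence")
ax.set_title("Prediction vs actual (placeholder values)")
ax.legend()
fig.tight_layout()
# fig.savefig("prediction_vs_actual.png")  # uncomment to write the figure to disk
```

Drawing one such panel per model and per country makes it easier to see where a model lags behind turning points in the series.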

---

## 🔍 Insights and Recommendations

### ✅ Model Accuracy:
- **Random Forest** performed best in terms of:
  - Lowest error (MSE)
  - Highest explained variance (R²)
  - Closest alignment with actual malaria trends in all three countries
- **Decision Tree** captured trends but showed signs of overfitting.
- **Linear Regression** could not capture non-linear climate-disease interactions.
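
One common way to check for the kind of overfitting noted for the Decision Tree is to compare R² on the training data with R² on held-out data. The sketch below does this on synthetic data. The helper name `train_test_r2` and the depth limit of 3 are assumptions for illustration, not settings from the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data, NOT the project's dataset.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 3))
y = 10 * np.sin(X[:, 0]) + X[:, 1] + rng.normal(0, 2, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def train_test_r2(max_depth):
    """Fit a tree with the given depth limit; return (train R², test R²)."""
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    tree.fit(X_train, y_train)
    return tree.score(X_train, y_train), tree.score(X_test, y_test)


unrestricted = train_test_r2(None)  # tree grows until leaves are pure
shallow = train_test_r2(3)          # depth-limited tree for comparison

print(f"Unrestricted: train R2={unrestricted[0]:.3f}, test R2={unrestricted[1]:.3f}")
print(f"max_depth=3:  train R2={shallow[0]:.3f}, test R2={shallow[1]:.3f}")
```

A large gap between training and test R² for the unrestricted tree is the typical signature of overfitting. Averaging many trees, as Random Forest does, is one way to reduce that variance.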

### 🌍 Country-Specific Observations:
- Uganda showed stronger seasonal peaks in malaria compared to Kenya and Tanzania.
- Rainfall and temperature strongly correlated with malaria incidence in all three countries.
- Data irregularities were minimal, which suggests consistent reporting in the source datasets.

### 💡 Recommendations:
- **Random Forest** should be deployed as the preferred model across East Africa.
- Integrate the model into a web-based app using **Streamlit** for accessibility.
- Use the model outputs for planning public health campaigns and early warning systems.

---

## ✅ Conclusion
This project successfully developed and compared machine learning models to predict malaria incidence across Kenya, Uganda, and Tanzania. Among the models tested, Random Forest showed the best predictive power and trend-tracking capabilities. These insights can inform regional efforts in malaria control and support real-time decision-making through deployment in a user-friendly app.
