# ANEMIA LEVEL PREDICTION IN CHILDREN

# Business Understanding

This project addresses childhood anemia, a significant public health concern that can adversely affect children's cognitive and physical development. It analyzes a comprehensive dataset of demographic, health, and socio-economic variables, such as the respondent's age in 5-year groups (15-49), place of residence (urban vs. rural), education level (from no education to higher education), and wealth index (from poorest to richest), to identify factors associated with anemia and inform efforts to reduce it in diverse communities. The dataset includes key health indicators, such as hemoglobin levels (stored in g/dL with one implied decimal place, so the raw values of 20 to 218 correspond to 2.0 to 21.8 g/dL) and anemia classifications (mild, moderate, severe, not anemic), as well as factors like recent illness (fever in the last two weeks), use of iron supplements, and living conditions (e.g., access to mosquito bed nets).

This project is essential for developing targeted interventions tailored to high-risk groups, thereby optimizing resource allocation. It will raise awareness about the importance of proper nutrition and iron intake, empowering families with the knowledge needed to prevent anemia. Community engagement will be a cornerstone of the initiative, fostering a collaborative approach to health and encouraging families to support each other in adopting healthier practices. Ultimately, the goal is to decrease anemia rates among children, leading to better health outcomes, improved school performance, and enhanced quality of life.

Key stakeholders include families and caregivers, healthcare providers (such as pediatricians and community health workers), local government and health departments, educational institutions, non-governmental organizations (NGOs), and research and academic institutions. By engaging these stakeholders, the project aims to create a comprehensive approach to combat childhood anemia, ensuring sustainable improvements in child health and nutrition.

## Problem Statement

Childhood anemia is a critical public health issue, leading to adverse impacts on cognitive and physical development. This project aims to build a predictive classification model to determine anemia status in children based on a comprehensive dataset encompassing demographic, health, and socio-economic factors. By identifying high-risk groups and predicting anemia likelihood, this model will support targeted interventions, optimize resource allocation, and guide public health policies. The ultimate objective is to reduce anemia rates in children, leading to improved health, educational outcomes, and quality of life.
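Before fitting any classifier, the categorical anemia labels have to become a target, and the model needs a reference point to beat. The sketch below is one minimal way to do both with pandas. It assumes that `Anemia level.1` holds the child's anemia status (the source does not state which of the two anemia columns refers to the child); `make_binary_target` and `majority_baseline_accuracy` are hypothetical helper names, and the toy labels stand in for the real column.

```python
import pandas as pd

# Hypothetical mapping from the survey's anemia labels to a binary target.
# Labels outside this mapping (and missing values) become NaN.
BINARY_TARGET = {"Not anemic": 0, "Mild": 1, "Moderate": 1, "Severe": 1}


def make_binary_target(labels: pd.Series) -> pd.Series:
    """Return 1 for anemic, 0 for not anemic, NaN where the label is missing."""
    return labels.map(BINARY_TARGET)


def majority_baseline_accuracy(target: pd.Series) -> float:
    """Accuracy of always predicting the most frequent class, ignoring NaN targets."""
    return float(target.dropna().value_counts(normalize=True).max())


# Toy labels standing in for the assumed child anemia column ("Anemia level.1").
labels = pd.Series(["Not anemic", "Moderate", "Not anemic", None, "Mild"])
y = make_binary_target(labels)
print(y.tolist())
print(majority_baseline_accuracy(y))  # 0.5
```

With the real data this would be `make_binary_target(data["Anemia level.1"])`. On imbalanced classes a model can look accurate while barely beating this baseline, so recall on the anemic class is worth reporting alongside accuracy.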

## Objectives

- Develop an Anemia Prediction Model

  Create a classification model to predict whether a child has anemia using a combination of demographic, health, and socio-economic factors. By building a model that reliably predicts anemia status, the project aims to enable early detection and facilitate timely, targeted intervention for the children most at risk.

- Classify and Monitor Anemia Severity Levels

  Use the model to classify anemia cases into severity levels (mild, moderate, severe) based on hemoglobin levels. Tracking these severity levels will provide insights that guide resource prioritization and enable healthcare providers to tailor interventions to the specific needs of each severity group.

- Identify Key Risk Factors for Anemia

  Conduct a thorough analysis of predictive factors such as nutritional intake (e.g., iron supplementation), caregiver health practices (e.g., timing of breastfeeding and bed net use), and socio-economic and environmental conditions. Hemoglobin level defines the anemia outcome itself, so it should be treated as the source of the target rather than as a risk factor. Identifying these factors will let prevention efforts focus on the groups and conditions most strongly associated with childhood anemia across different demographics.

- Guide Targeted Public Health Interventions

  Leverage insights from the model to guide and prioritize public health interventions for high-risk groups, particularly in resource-limited areas. This objective supports data-driven, targeted health campaigns focusing on nutrition, supplement distribution, and anemia education to reduce childhood anemia rates sustainably.
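Severity levels are normally derived from hemoglobin thresholds rather than learned. The sketch below assumes the WHO cut-offs for children aged 6-59 months (severe below 7.0 g/dL, moderate 7.0-9.9, mild 10.0-10.9, not anemic from 11.0) and the dataset's storage of hemoglobin in tenths of a g/dL; `severity_from_hemoglobin` is a hypothetical name. The altitude-adjusted values in the sample rows under Data Understanding (75.0 labelled Moderate, 114.0 and 119.0 labelled Not anemic) are consistent with these cut-offs, but that should be confirmed on the full column.

```python
import numpy as np
import pandas as pd

# Assumed WHO cut-offs for children aged 6-59 months, in the dataset's units
# (tenths of a g/dL): severe < 70, moderate 70-99, mild 100-109, not anemic >= 110.
BINS = [-np.inf, 70, 100, 110, np.inf]
LABELS = ["Severe", "Moderate", "Mild", "Not anemic"]


def severity_from_hemoglobin(hb_tenths: pd.Series) -> pd.Series:
    """Map raw hemoglobin values to severity labels; missing values stay missing.

    right=False makes every bin closed on the left, so 70 -> Moderate,
    100 -> Mild and 110 -> Not anemic.
    """
    return pd.cut(hb_tenths, bins=BINS, labels=LABELS, right=False)


# 114, 75 and 119 come from the sample rows; 65 and 105 are made up.
hb = pd.Series([114.0, 75.0, 119.0, 65.0, 105.0, np.nan])
print(severity_from_hemoglobin(hb).tolist())
```

Keeping the thresholds in one place makes it easy to compare the derived labels with the dataset's own anemia columns and to flag rows where they disagree.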

## Data Limitations

- Cross-sectional Nature of Data

  The dataset is cross-sectional, capturing information from a single point in time. This restricts the ability to infer causation between variables and anemia levels, permitting only association-based insights.

- Potential Reporting Bias

  Variables such as the wealth index and anemia status are based on estimated data, which may introduce reporting bias and misrepresent actual conditions.

- Granularity of Anemia Measurements

  Hemoglobin levels are adjusted for altitude (and, in one of the two hemoglobin columns, also for smoking), but other unmeasured health factors that may independently affect hemoglobin are not accounted for, which can reduce the precision of anemia classifications.

- Missing Values and Data Quality

  Some columns contain missing data or inconsistencies (e.g., missing hemoglobin levels and anemia status). Such gaps may require imputation or exclusion, which can affect the robustness and generalizability of the findings.

- Limited Representation of Seasonal Factors

  Anemia prevalence can vary with seasonal factors (e.g., malaria rates). Because the dataset does not include temporal data to capture such fluctuations, seasonal influences on anemia are not accounted for in this analysis.
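Because the data are cross-sectional, the first risk-factor analysis is descriptive: the share of anemic children within each level of a candidate factor. A minimal pandas sketch, assuming a hypothetical 0/1 `anemic` column has already been derived from the anemia labels; `anemia_rate_by` is a hypothetical helper, and rows with a missing target are excluded rather than imputed. Any gap between groups is an association, not evidence that the factor causes anemia.

```python
import pandas as pd


def anemia_rate_by(df: pd.DataFrame, factor: str, target: str = "anemic") -> pd.DataFrame:
    """Share of anemic children and number of labelled rows per level of `factor`.

    Rows with a missing target are dropped (complete-case analysis) rather than
    imputed, so `n` shows how many observations support each rate.
    """
    labelled = df.dropna(subset=[target])
    summary = labelled.groupby(factor)[target].agg(rate="mean", n="count")
    return summary.sort_values("rate", ascending=False)


# Made-up rows; "anemic" is a hypothetical 0/1 column (1 = anemic).
toy = pd.DataFrame({
    "Wealth index combined": ["Poorest", "Poorest", "Richest", "Richest", "Richest", "Poorest"],
    "anemic": [1, 1, 0, 1, 0, None],
})
print(anemia_rate_by(toy, "Wealth index combined"))
```

Reporting `n` next to each rate makes it visible when a striking difference rests on only a handful of labelled children.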

# Data Understanding

In [1]:
```python
import pandas as pd
import numpy as np
```

In [3]:
```python
# Absolute path from the author's machine; change it to your local copy of the file.
data = pd.read_csv("C:/Users/hp/Documents/Anemia-Level-Prediction-in-Children/anemia_dataset.csv")
data
```

| | Age in 5-year groups | Type of place of residence | Highest educational level | Wealth index combined | Births in last five years | Age of respondent at 1st birth | Hemoglobin level adjusted for altitude and smoking (g/dl - 1 decimal) | Anemia level | Have mosquito bed net for sleeping (from household questionnaire) | Smokes cigarettes | Current marital status | Currently residing with husband/partner | When child put to breast | Had fever in last two weeks | Hemoglobin level adjusted for altitude (g/dl - 1 decimal) | Anemia level.1 | Taking iron pills, sprinkles or syrup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40-44 | Urban | Higher | Richest | 1 | 22 | NaN | NaN | Yes | No | Living with partner | Staying elsewhere | Immediately | No | NaN | NaN | Yes |
| 1 | 35-39 | Urban | Higher | Richest | 1 | 28 | NaN | NaN | Yes | No | Married | Living with her | Hours: 1 | No | NaN | NaN | No |
| 2 | 25-29 | Urban | Higher | Richest | 1 | 26 | NaN | NaN | No | No | Married | Living with her | Immediately | No | NaN | NaN | No |
| 3 | 25-29 | Urban | Secondary | Richest | 1 | 25 | 95.0 | Moderate | Yes | No | Married | Living with her | 105 | No | 114.0 | Not anemic | No |
| 4 | 20-24 | Urban | Secondary | Richest | 1 | 21 | NaN | NaN | Yes | No | No longer living together/separated | NaN | Immediately | No | NaN | NaN | No |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 33919 | 35-39 | Rural | Secondary | Richer | 2 | 19 | 120.0 | Not anemic | Yes | No | Married | Living with her | NaN | No | 120.0 | Not anemic | Yes |
| 33920 | 25-29 | Rural | No education | Richer | 1 | 27 | 120.0 | Not anemic | Yes | No | Never in union | NaN | Hours: 1 | No | 120.0 | Not anemic | No |
| 33921 | 25-29 | Rural | Higher | Richer | 1 | 22 | 149.0 | Not anemic | Yes | No | Married | Living with her | Hours: 1 | No | 119.0 | Not anemic | No |
| 33922 | 20-24 | Rural | Secondary | Richer | 1 | 21 | 123.0 | Not anemic | Yes | No | Married | Living with her | Immediately | No | 75.0 | Moderate | Yes |
| 33923 | 40-44 | Rural | Secondary | Richest | 1 | 35 | NaN | NaN | No | No | Married | Living with her | Immediately | NaN | NaN | NaN | NaN |

33924 rows × 17 columns


In [4]:
```python
data.head()
```

| | Age in 5-year groups | Type of place of residence | Highest educational level | Wealth index combined | Births in last five years | Age of respondent at 1st birth | Hemoglobin level adjusted for altitude and smoking (g/dl - 1 decimal) | Anemia level | Have mosquito bed net for sleeping (from household questionnaire) | Smokes cigarettes | Current marital status | Currently residing with husband/partner | When child put to breast | Had fever in last two weeks | Hemoglobin level adjusted for altitude (g/dl - 1 decimal) | Anemia level.1 | Taking iron pills, sprinkles or syrup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40-44 | Urban | Higher | Richest | 1 | 22 | NaN | NaN | Yes | No | Living with partner | Staying elsewhere | Immediately | No | NaN | NaN | Yes |
| 1 | 35-39 | Urban | Higher | Richest | 1 | 28 | NaN | NaN | Yes | No | Married | Living with her | Hours: 1 | No | NaN | NaN | No |
| 2 | 25-29 | Urban | Higher | Richest | 1 | 26 | NaN | NaN | No | No | Married | Living with her | Immediately | No | NaN | NaN | No |
| 3 | 25-29 | Urban | Secondary | Richest | 1 | 25 | 95.0 | Moderate | Yes | No | Married | Living with her | 105 | No | 114.0 | Not anemic | No |
| 4 | 20-24 | Urban | Secondary | Richest | 1 | 21 | NaN | NaN | Yes | No | No longer living together/separated | NaN | Immediately | No | NaN | NaN | No |


In [5]:
```python
data.tail()
```

| | Age in 5-year groups | Type of place of residence | Highest educational level | Wealth index combined | Births in last five years | Age of respondent at 1st birth | Hemoglobin level adjusted for altitude and smoking (g/dl - 1 decimal) | Anemia level | Have mosquito bed net for sleeping (from household questionnaire) | Smokes cigarettes | Current marital status | Currently residing with husband/partner | When child put to breast | Had fever in last two weeks | Hemoglobin level adjusted for altitude (g/dl - 1 decimal) | Anemia level.1 | Taking iron pills, sprinkles or syrup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33919 | 35-39 | Rural | Secondary | Richer | 2 | 19 | 120.0 | Not anemic | Yes | No | Married | Living with her | NaN | No | 120.0 | Not anemic | Yes |
| 33920 | 25-29 | Rural | No education | Richer | 1 | 27 | 120.0 | Not anemic | Yes | No | Never in union | NaN | Hours: 1 | No | 120.0 | Not anemic | No |
| 33921 | 25-29 | Rural | Higher | Richer | 1 | 22 | 149.0 | Not anemic | Yes | No | Married | Living with her | Hours: 1 | No | 119.0 | Not anemic | No |
| 33922 | 20-24 | Rural | Secondary | Richer | 1 | 21 | 123.0 | Not anemic | Yes | No | Married | Living with her | Immediately | No | 75.0 | Moderate | Yes |
| 33923 | 40-44 | Rural | Secondary | Richest | 1 | 35 | NaN | NaN | No | No | Married | Living with her | Immediately | NaN | NaN | NaN | NaN |


In [6]:
```python
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33924 entries, 0 to 33923
Data columns (total 17 columns):
 #   Column                                                                 Non-Null Count  Dtype  
---  ------                                                                 --------------  -----  
 0   Age in 5-year groups                                                   33924 non-null  object 
 1   Type of place of residence                                             33924 non-null  object 
 2   Highest educational level                                              33924 non-null  object 
 3   Wealth index combined                                                  33924 non-null  object 
 4   Births in last five years                                              33924 non-null  int64  
 5   Age of respondent at 1st birth                                         33924 non-null  int64  
 6   Hemoglobin level adjusted for altitude and smoking (g/dl - 1 decimal)  13136 non-null 
```
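The `info()` output above stops after column 6, but it already shows that only 13,136 of the 33,924 rows have a smoking-adjusted hemoglobin value (about 61% missing). A compact per-column missingness table makes the imputation-or-exclusion decision easier to reason about. A minimal sketch: `missing_summary` is a hypothetical helper, and the toy frame stands in for `data`.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and share of missing values per column, most-missing first."""
    summary = pd.DataFrame({
        "missing": df.isna().sum(),
        "share": df.isna().mean().round(3),
    })
    return summary.sort_values("missing", ascending=False)


# Stand-in for `data`; with the real frame, call missing_summary(data).
toy = pd.DataFrame({
    "Anemia level.1": ["Not anemic", None, "Moderate", None],
    "Had fever in last two weeks": ["No", "No", None, "No"],
    "Type of place of residence": ["Urban", "Rural", "Rural", "Urban"],
})
print(missing_summary(toy))
```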

In [8]:
```python
data.columns
```

```
Index(['Age in 5-year groups', 'Type of place of residence',
       'Highest educational level', 'Wealth index combined',
       'Births in last five years', 'Age of respondent at 1st birth',
       'Hemoglobin level adjusted for altitude and smoking (g/dl - 1 decimal)',
       'Anemia level',
       'Have mosquito bed net for sleeping (from household questionnaire)',
       'Smokes cigarettes', 'Current marital status',
       'Currently residing with husband/partner', 'When child put to breast',
       'Had fever in last two weeks',
       'Hemoglobin level adjusted for altitude (g/dl - 1 decimal)',
       'Anemia level.1', 'Taking iron pills, sprinkles or syrup'],
      dtype='object')
```
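The `.1` suffix on `Anemia level.1` is how `pandas.read_csv` disambiguates a repeated header, which suggests the CSV contains two columns both named `Anemia level`, each following one of the two hemoglobin columns. Shorter, explicit names make later code easier to read. The aliases below are a hypothetical choice, and reading the altitude-only columns as the child's measurements (children are not adjusted for smoking) and the altitude-and-smoking columns as the respondent's is an assumption to check against the survey documentation.

```python
import io

import pandas as pd

# Hypothetical short names; the mother/child split is an assumption.
COLUMN_ALIASES = {
    "Age in 5-year groups": "age_group",
    "Type of place of residence": "residence",
    "Highest educational level": "education",
    "Wealth index combined": "wealth_index",
    "Births in last five years": "births_last_5y",
    "Age of respondent at 1st birth": "age_first_birth",
    "Hemoglobin level adjusted for altitude and smoking (g/dl - 1 decimal)": "hb_mother",
    "Anemia level": "anemia_mother",
    "Have mosquito bed net for sleeping (from household questionnaire)": "bed_net",
    "Smokes cigarettes": "smokes",
    "Current marital status": "marital_status",
    "Currently residing with husband/partner": "residing_with_partner",
    "When child put to breast": "breastfeeding_start",
    "Had fever in last two weeks": "fever_2w",
    "Hemoglobin level adjusted for altitude (g/dl - 1 decimal)": "hb_child",
    "Anemia level.1": "anemia_child",
    "Taking iron pills, sprinkles or syrup": "iron_supplement",
}


def rename_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with short column names; raise if a column has no alias."""
    unknown = set(df.columns) - set(COLUMN_ALIASES)
    if unknown:
        raise KeyError(f"No alias for columns: {sorted(unknown)}")
    return df.rename(columns=COLUMN_ALIASES)


# read_csv appends ".1" to the second occurrence of a duplicated header.
raw_csv = "Anemia level,Anemia level\nModerate,Not anemic\n"
demo = pd.read_csv(io.StringIO(raw_csv))
print(demo.columns.tolist())  # ['Anemia level', 'Anemia level.1']
print(rename_columns(demo).columns.tolist())  # ['anemia_mother', 'anemia_child']
```

With the real frame this would be `data = rename_columns(data)`; raising on unknown columns makes a changed export fail loudly instead of silently keeping a long name.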

In [10]:
```python
data.shape
```

```
(33924, 17)
```

In [13]:
```python
data.describe()
```

| | Births in last five years | Age of respondent at 1st birth | Hemoglobin level adjusted for altitude and smoking (g/dl - 1 decimal) | Hemoglobin level adjusted for altitude (g/dl - 1 decimal) |
|---|---|---|---|---|
| count | 33924.0 | 33924.0 | 13136.0 | 10182.0 |
| mean | 1.823783 | 19.570776 | 114.367235 | 101.270183 |
| std | 0.70546 | 4.313172 | 15.915408 | 15.569583 |
| min | 1.0 | 12.0 | 20.0 | 29.0 |
| 25% | 1.0 | 16.0 | 105.0 | 92.0 |
| 50% | 2.0 | 19.0 | 115.0 | 103.0 |
| 75% | 2.0 | 22.0 | 125.0 | 112.0 |
| max | 6.0 | 48.0 | 218.0 | 170.0 |
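Read literally as g/dL, the hemoglobin range in this summary (20 to 218 for the smoking-adjusted column, 29 to 170 for the altitude-only one) would be physiologically implausible. The "1 decimal" note in the column names suggests the values are stored in tenths of a g/dL, so the smoking-adjusted mean of 114.37 corresponds to about 11.4 g/dL. A minimal conversion sketch; `add_hemoglobin_gdl` and the ` [g/dL]` column suffix are hypothetical names.

```python
import pandas as pd

HB_COLUMNS = [
    "Hemoglobin level adjusted for altitude and smoking (g/dl - 1 decimal)",
    "Hemoglobin level adjusted for altitude (g/dl - 1 decimal)",
]
GDL_SUFFIX = " [g/dL]"  # hypothetical naming for the converted columns


def add_hemoglobin_gdl(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with each hemoglobin column also expressed in g/dL.

    The raw values carry one implied decimal place (tenths of a g/dL),
    so dividing by 10 gives g/dL; missing values stay missing.
    """
    out = df.copy()
    for col in HB_COLUMNS:
        out[col + GDL_SUFFIX] = out[col] / 10
    return out


# Raw values from the sample rows; with the real frame, call add_hemoglobin_gdl(data).
toy = pd.DataFrame({HB_COLUMNS[0]: [95.0, 120.0, None], HB_COLUMNS[1]: [114.0, 75.0, None]})
converted = add_hemoglobin_gdl(toy)
print(converted[[c + GDL_SUFFIX for c in HB_COLUMNS]])
```

Adding new columns instead of overwriting the originals keeps the raw survey values available for checking against the dataset's own anemia labels.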
