## Section I. Introduction
The dataset chosen for this study is the [***"Pediatric Anemia Dataset: Hematological Indicators and Diagnostic Classification"***](https://data.mendeley.com/datasets/y7v7ff3wpj/1). This dataset contains hematological parameters used to support the diagnosis of anemia among patients.

Anemia is a medical condition characterized by a deficiency in healthy red blood cells or hemoglobin, which reduces the blood’s capacity to transport oxygen to body tissues. This condition remains a significant public health concern, particularly in tropical and subtropical regions. Early detection and appropriate treatment using hematological indicators such as hemoglobin level and red blood cell count are essential in addressing this condition.

The goal of this study is to predict the clinical diagnostic outcome of anemia from demographic and hematological parameters. This is a binary classification task: the model classifies each patient as either anemic or non-anemic.

## Section II. Description of the Dataset
The dataset used in this study was obtained from the publicly available anemia clinical dataset published on [Mendeley Data](https://data.mendeley.com/datasets/y7v7ff3wpj/1). The data were collected from anemia patients at Aalok Healthcare Ltd. in Dhaka, Bangladesh, on October 9, 2023.

Each row in the dataset represents one patient's record, and each column represents a specific attribute. The dataset consists of **1000 observations** and **8 features**, with an additional target variable (`Decision_Class`).

Each column of the dataset is described below:

- **`sex`**: biological sex of the patient, stored in the `Gender` column; `m` for male or `f` for female
- **`age`**: age of the patient (years)
- **`hemoglobin (Hb)`**: hemoglobin concentration in the blood, an indicator of its capacity to carry oxygen (g/dL)
- **`red blood cell count (RBC)`**: number of red blood cells per unit volume (million/μL)
- **`packed cell volume (PCV)`**: proportion of blood volume occupied by red blood cells (%)
- **`mean corpuscular volume (MCV)`**: average size of red blood cells (fL)
- **`mean corpuscular hemoglobin (MCH)`**: average hemoglobin content per red blood cell (pg/cell)
- **`mean corpuscular hemoglobin concentration (MCHC)`**: concentration of hemoglobin in red blood cells (g/dL)
- **`Decision_Class`**: binary indicator for the diagnostic outcome (0, 1)

## Import

In [None]:
import numpy as np
import pandas as pd
import csv

## Section III. Data Preparation

In [None]:
df = pd.read_csv('anemia.csv')

df.head(10)
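
Section II describes 8 features plus the `Decision_Class` target, so a quick check after loading can confirm that the file matches that description. The sketch below is one possible way to do it, not part of the original workflow. It assumes the column names used in this notebook's code cells; the helper `check_schema` and the small `demo` frame (made-up values, not taken from the dataset) are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Column names as they appear in this notebook's code cells (assumed to match the CSV header).
EXPECTED_COLUMNS = ["Gender", "Age", "Hb", "RBC", "PCV", "MCV", "MCH", "MCHC", "Decision_Class"]


def check_schema(frame, expected_rows=None):
    """Return a list of problems found; an empty list means the frame looks as described."""
    problems = []

    # Every expected column should be present.
    missing = [col for col in EXPECTED_COLUMNS if col not in frame.columns]
    if missing:
        problems.append(f"missing columns: {missing}")

    # Optionally compare the row count with the documented number of observations.
    if expected_rows is not None and len(frame) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(frame)}")

    # The target is described as a binary indicator with values 0 and 1.
    if "Decision_Class" in frame.columns:
        unexpected = set(frame["Decision_Class"].dropna().unique()) - {0, 1}
        if unexpected:
            problems.append(f"unexpected target values: {sorted(unexpected)}")

    return problems


# Hypothetical two-row frame with made-up values, used only to exercise the helper.
demo = pd.DataFrame({
    "Gender": ["f", "m"], "Age": [5, 7], "Hb": [11.0, 12.5], "RBC": [4.1, 4.6],
    "PCV": [33.0, 37.5], "MCV": [80.5, 81.5], "MCH": [26.8, 27.2], "MCHC": [33.3, 33.3],
    "Decision_Class": [1, 0],
})
print(check_schema(demo, expected_rows=2))
```

On the real data, the call would be `check_schema(df, expected_rows=1000)`; any message in the returned list points to a mismatch between the file and the description in Section II.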

In [None]:
gender = df['Gender']
age = df['Age']
hb = df['Hb']
rbc = df['RBC']
pcv = df['PCV']
mcv = df['MCV']
mch = df['MCH']
mchc = df['MCHC']
decision_class = df['Decision_Class']
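
Pulling each column into its own variable works, but for a classification task it is often convenient to keep the features together in one frame and the target in a separate series. The following is a minimal sketch of that alternative, assuming the target column is named `Decision_Class` as above; the helper `split_features_target` and the `demo` frame (made-up values) are hypothetical.

```python
import pandas as pd

TARGET = "Decision_Class"  # target column name used in this notebook


def split_features_target(frame, target=TARGET):
    """Split a frame into a feature matrix X and a target vector y."""
    X = frame.drop(columns=[target])  # every column except the target
    y = frame[target]                 # the binary diagnostic outcome
    return X, y


# Hypothetical two-row frame with made-up values, used only to exercise the helper.
demo = pd.DataFrame({
    "Gender": ["f", "m"], "Age": [5, 7], "Hb": [11.0, 12.5], "RBC": [4.1, 4.6],
    "PCV": [33.0, 37.5], "MCV": [80.5, 81.5], "MCH": [26.8, 27.2], "MCHC": [33.3, 33.3],
    "Decision_Class": [1, 0],
})
X, y = split_features_target(demo)
print(X.columns.tolist())
print(y.tolist())
```

With the real data this would be `X, y = split_features_target(df)`, which keeps the row order of features and target aligned.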

Binary mapping for `Gender`: encode `f` as 0 and `m` as 1.

In [None]:
gender_scale = {'f': 0, 'm': 1}
gender = gender.map(gender_scale)

print(gender)
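
One caveat with `Series.map` and a dictionary is that any value missing from the dictionary (for example `F`, or `m` with a trailing space) silently becomes `NaN`. Whether the raw file contains such values is not known here, so the sketch below is only a defensive variant under that assumption: it normalizes case and whitespace first and raises an error if anything is still unmapped. The helper `encode_gender` is hypothetical.

```python
import pandas as pd

GENDER_SCALE = {"f": 0, "m": 1}  # same mapping as in the cell above


def encode_gender(series):
    """Map gender labels to 0/1, failing loudly on labels the mapping does not cover."""
    # Normalize so that variants such as ' F' or 'M ' still match the dictionary keys.
    cleaned = series.astype(str).str.strip().str.lower()
    encoded = cleaned.map(GENDER_SCALE)

    # Anything left unmapped (including missing values) shows up as NaN.
    unmapped = series[encoded.isna()]
    if not unmapped.empty:
        raise ValueError(f"unmapped gender values: {unmapped.unique().tolist()}")

    return encoded.astype("int64")


# Made-up labels, used only to exercise the helper.
print(encode_gender(pd.Series([" F", "m", "M "])).tolist())
```

On the real data the call would be `gender = encode_gender(df['Gender'])`.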

## Section IV. Exploratory Data Analysis (EDA)