In [1]:
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier  # or your chosen model
import pickle


# Preprocess data using `index.ipynb` steps

In [29]:
# Load dataset
df_preprocess = pd.read_csv('anemia_dataset.csv')

# Dropping rows without target variable
df_preprocess = df_preprocess.dropna(subset=['Anemia level'])

# Separate numerical columns (int and float types)
numeric_cols = df_preprocess.select_dtypes(include=['int64', 'float64']).copy()
numeric_cols.drop(['Hemoglobin level adjusted for altitude (g/dl - 1 decimal)'], axis=1, inplace=True)

# Rename numerical columns for consistency
numeric_cols.rename(columns={
    'Births in last five years':'Births_last_5y',
    'Age of respondent at 1st birth':'Age_first_birth',
    'Hemoglobin level adjusted for altitude and smoking (g/dl - 1 decimal)':'Hemoglobin_level'
}, inplace=True)

# Separate categorical columns
categorical_cols = df_preprocess.select_dtypes(include=['object', 'category']).copy()
categorical_cols.drop(['When child put to breast', 'Anemia level.1', 'Smokes cigarettes'], axis=1, inplace=True)

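# Replace "Don't know" with "unknown" and fill missing values
# (assign the result back: calling replace(..., inplace=True) on a selected column
# is a chained assignment and does not reliably modify the parent DataFrame)
categorical_cols['Had fever in last two weeks'] = categorical_cols['Had fever in last two weeks'].replace("Don't know", 'unknown')
categorical_cols['Taking iron pills, sprinkles or syrup'] = categorical_cols['Taking iron pills, sprinkles or syrup'].replace("Don't know", 'unknown')
columns_to_fill = ['Taking iron pills, sprinkles or syrup', 'Had fever in last two weeks', 'Currently residing with husband/partner']
categorical_cols[columns_to_fill] = categorical_cols[columns_to_fill].fillna('unknown')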

# Rename categorical columns
categorical_cols.rename(columns={
    'Have mosquito bed net for sleeping (from household questionnaire)': 'Mosquito_net',
    'Highest educational level': 'Education_level',
    'Wealth index combined': 'Wealth',
    'Currently residing with husband/partner': 'Living_with_spouse',
    'Type of place of residence': 'Area_Type',
    'Taking iron pills, sprinkles or syrup': 'Taking_meds',
    'Age in 5-year groups': 'Age_group',
    'Anemia level': 'Anemia_level',
    'Had fever in last two weeks': 'Had_fever',
    'Current marital status': 'Marital_status'
}, inplace=True)

# Split the target column
y = categorical_cols['Anemia_level']
categorical_cols = categorical_cols.drop(['Anemia_level'], axis=1)

# Concatenate cleaned numerical and categorical columns
combined_df = pd.concat([numeric_cols, categorical_cols], axis=1)

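# Drop rows with missing values in combined data, then apply the same row
# selection to the separate frames and the target so everything stays aligned
combined_df.dropna(inplace=True)
numeric_cols = numeric_cols.loc[combined_df.index]
categorical_cols = categorical_cols.loc[combined_df.index]
y = y.loc[combined_df.index]

# Encode categorical features using OrdinalEncoder
encoder = OrdinalEncoder()
encoded_categorical = encoder.fit_transform(categorical_cols)

# Replace the categorical columns with their encoded versions
encoded_df = pd.DataFrame(encoded_categorical, columns=categorical_cols.columns)
combined_df = pd.concat([numeric_cols.reset_index(drop=True), encoded_df], axis=1)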

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(combined_df, y, test_size=0.2, random_state=42)

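# Define the preprocessing pipeline: scale the numeric columns and pass the
# encoded categorical columns through unchanged
numeric_transformer = StandardScaler()
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_cols.columns)
    ],
    remainder='passthrough'  # the default ('drop') would discard the encoded categorical columns
)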

# Create the model pipeline
pipeline = Pipeline(steps=[('preprocessor', preprocessor)])

# Fit the pipeline on training data
pipeline.fit(X_train)

# Train the model (RandomForest example)
model = RandomForestClassifier(random_state=42)
model.fit(pipeline.transform(X_train), y_train)

# Save the encoder, pipeline and model together as one tuple
with open('anemia_model.pkl', 'wb') as f:
    pickle.dump((encoder, pipeline, model), f)


# MODEL DEPLOYMENT

## Overview
This project aims to build a web application for predicting anemia levels in children. Users enter the relevant data through a form and receive a prediction along with tailored recommendations.

## Procedure

### 1. **Pickle the encoder, pipeline and model**
- The random forest model was trained and pickled together with the fitted encoder and the preprocessing pipeline in `anemia_model.pkl`, so the web application can load all three and apply the same transformations at prediction time.
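
The round trip described above can be sketched as follows. This is a minimal illustration, not the project's code: the two-column dataset and its values are made up, `remainder="passthrough"` is an assumption chosen so the encoded column reaches the model, and the tuple is serialized in memory with `pickle.dumps` instead of being written to `anemia_model.pkl`.

```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Tiny synthetic stand-in for the cleaned training data (values are made up).
numeric = pd.DataFrame({"Hemoglobin_level": [9.5, 12.8, 10.4, 13.1]})
categorical = pd.DataFrame({"Area_Type": ["Rural", "Urban", "Rural", "Urban"]})
y = pd.Series(["Moderate", "Not anemic", "Mild", "Not anemic"])

# Fit the three artifacts: encoder, scaling pipeline, and model.
encoder = OrdinalEncoder().fit(categorical)
encoded = pd.DataFrame(encoder.transform(categorical), columns=categorical.columns)
features = pd.concat([numeric, encoded], axis=1)
pipeline = Pipeline([
    ("preprocessor", ColumnTransformer(
        [("num", StandardScaler(), list(numeric.columns))],
        remainder="passthrough",  # assumption: keep the encoded columns
    )),
]).fit(features)
model = RandomForestClassifier(random_state=42).fit(pipeline.transform(features), y)

# Serialize and restore; the tuple order must match on both sides.
blob = pickle.dumps((encoder, pipeline, model))
loaded_encoder, loaded_pipeline, loaded_model = pickle.loads(blob)

# Predict for one new record using only the restored objects.
new_numeric = pd.DataFrame({"Hemoglobin_level": [9.0]})
new_categorical = pd.DataFrame({"Area_Type": ["Rural"]})
new_encoded = pd.DataFrame(
    loaded_encoder.transform(new_categorical), columns=new_categorical.columns
)
new_features = pd.concat([new_numeric, new_encoded], axis=1)
prediction = loaded_model.predict(loaded_pipeline.transform(new_features))
```

In the web application, the same unpacking would presumably follow a `pickle.load` on the opened `anemia_model.pkl` file.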

### 2. **Download Dependencies**
- Flask was installed as the web framework for the application. It handles routing, form data, and the rendering of HTML templates. Because the app unpickles scikit-learn objects, scikit-learn must also be installed in the environment that runs it.

### 3. **Organize Project Folder**
- The project directory was structured to maintain organization and clarity. The setup included creating separate directories for the Flask app, the pickled model, HTML templates, CSS styles, and image assets.

#### Structure Overview
```
Anemia-Level-Prediction-in-Children/
├── app.py                     # Flask app
├── anemia_model.pkl           # Pickled encoder, pipeline and model
├── templates/
│   ├── index.html             # HTML template for home page
│   ├── general_info.html      # HTML template for general information page
│   ├── factors.html           # HTML template for EDA insights
│   ├── prediction.html        # HTML template for prediction form
│   └── result.html            # HTML template for result page
└── static/
    ├── css/
    │   └── styles.css         # CSS styles
    └── images/
        ├── background.jpg     # Image file
        └── anemic_child.jpg   # Image file
```

### 4. **Set Up the Flask App**
- In `app.py`, the Flask application was set up to manage routes, handle form data, and load the machine learning model for predictions. The main routes were defined as follows:
  - **Home Route (`/`)**: This route renders the `index.html` template, which welcomes users and acts as the homepage.
  - **General Information Route (`/general_info`)**: This route loads the `general_info.html` template, providing background information on anemia.
  - **Factors Route (`/factors`)**: This route renders the `factors.html` template, displaying key insights and visualizations from the exploratory data analysis (EDA).
  - **Prediction Route (`/predict`)**: This route serves the input form in `prediction.html`, processes the submitted data, makes a prediction with the model, and displays the result on `result.html` alongside tailored recommendations.
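
A minimal sketch of how `app.py` might wire these routes together. The route paths and template names come from the list above. The view function names, the `NUMERIC_FEATURES` list, the form field names, and the `prediction` template variable are assumptions for illustration, not the project's actual code.

```python
import pickle
from functools import lru_cache

import pandas as pd
from flask import Flask, render_template, request

app = Flask(__name__)

# Assumption: numeric inputs collected by the form, named after the renamed
# training columns. The real list must match the columns used in training.
NUMERIC_FEATURES = ["Births_last_5y", "Age_first_birth", "Hemoglobin_level"]


@lru_cache(maxsize=1)
def load_artifacts():
    """Load the pickled (encoder, pipeline, model) tuple once, on first use."""
    with open("anemia_model.pkl", "rb") as f:
        return pickle.load(f)


@app.route("/")
def home():
    return render_template("index.html")


@app.route("/general_info")
def general_info():
    return render_template("general_info.html")


@app.route("/factors")
def factors():
    return render_template("factors.html")


@app.route("/predict", methods=["GET", "POST"])
def predict():
    # A GET request shows the empty form; a POST carries the submitted values.
    if request.method == "GET":
        return render_template("prediction.html")

    encoder, pipeline, model = load_artifacts()
    categorical_names = list(encoder.feature_names_in_)

    # Assumption: each form field is named after its training column.
    numeric = pd.DataFrame([{n: float(request.form[n]) for n in NUMERIC_FEATURES}])
    categorical = pd.DataFrame([{n: request.form[n] for n in categorical_names}])
    encoded = pd.DataFrame(encoder.transform(categorical), columns=categorical_names)

    features = pd.concat([numeric, encoded], axis=1)
    label = model.predict(pipeline.transform(features))[0]
    return render_template("result.html", prediction=label)
```

With the templates in place, a file like this could be started with `flask --app app run`. Loading the pickle lazily keeps the informational pages available even if the model file is missing.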

### 5. **Create HTML Templates**
- In the `templates/` directory, several HTML files were set up to create different pages for the web application:
  - **`index.html`**: This home page welcomes users and provides a form for entering relevant features such as age and hemoglobin level.
  - **`general_info.html`**: This page provides educational content and general information about anemia.
  - **`factors.html`**: This page displays various insights and visualizations from the EDA phase.
  - **`prediction.html`**: This page contains a form where users can input details required for prediction.
  - **`result.html`**: This page displays the anemia prediction result based on user input along with tailored recommendations.

### 6. **Design the User Interface (UI)**
- CSS was used to style the HTML elements into a user-friendly, visually consistent interface.

- `static/css/styles.css` defines the layout, typography, and image styling:
  - **Background and Header Images**: `background.jpg` and `anemic_child.jpg` were used as background images for the body and header sections, respectively.
  - **Styling**: Flexbox was used for the layout structure, and hover effects were added to buttons and links so that interactive elements respond visibly to the user.
