# Multi-Class Prediction of Obesity Risk

Feature list:
* Gender (categorical, Female or Male)
* Age (float, years)
* Height (float, metres)
* Weight (float, kilograms)
* family_history_with_overweight (categorical, yes or no)
* FAVC = frequent consumption of high-calorie food (categorical, yes or no)
* FCVC = frequency of consumption of vegetables (float, range 1 to 3)
* NCP = number of main meals (float, range 1 to 4)
* CAEC = consumption of food between meals (categorical; no, Sometimes, Frequently, or Always)
* SMOKE = smoking habit (categorical; yes or no)
* CH2O = consumption of water daily (float, range 1 to 3)
* SCC = calorie consumption monitoring (categorical; yes or no)
* FAF = physical activity frequency (float, range 0 to 3)
* TUE = time using technology devices (float, range 0 to 2)
* CALC = consumption of alcohol (categorical; no, Sometimes, or Frequently)
* MTRANS = transportation used (categorical; Automobile, Motorbike, Public_Transportation, Bike, or Walking)

Target:
* NObeyesdad (categorical; Insufficient_Weight, Normal_Weight, Overweight_Level_I, Overweight_Level_II, Obesity_Type_I, Obesity_Type_II, Obesity_Type_III)
* Note that each category corresponds to a BMI range, where BMI = Weight / Height^2 (kg/m^2). For example, Insufficient_Weight corresponds to BMI < 18.5, Normal_Weight to 18.5 <= BMI < 25, and the two Overweight categories together to 25 <= BMI < 30; the Obesity categories cover BMI >= 30.
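
Because the note above ties each class to a BMI range, one quick way to see the relationship is to compute BMI from the Height and Weight columns. The following is a minimal sketch, not part of the original notebook: it uses the standard definition of BMI (weight in kilograms divided by height in metres squared), and both the three example rows and the `bmi_band` helper are hypothetical, made up for illustration. It uses only the thresholds stated in the note (18.5, 25, 30) and does not attempt to split the Overweight or Obesity sub-levels.

```python
import pandas as pd

# Hypothetical rows for illustration only (not taken from train.csv).
toy = pd.DataFrame({
    "Height": [1.70, 1.60, 1.80],   # metres
    "Weight": [50.0, 60.0, 90.0],   # kilograms
})

# Standard definition: BMI = weight in kg divided by height in metres squared.
toy["BMI"] = toy["Weight"] / toy["Height"] ** 2

def bmi_band(bmi: float) -> str:
    """Coarse band using only the thresholds named in the note above."""
    if bmi < 18.5:
        return "Insufficient_Weight"
    if bmi < 25:
        return "Normal_Weight"
    if bmi < 30:
        return "Overweight (Level I or II)"
    return "Obesity (Type I, II, or III)"

toy["band"] = toy["BMI"].apply(bmi_band)
print(toy)
```

On the real data, comparing such a band against NObeyesdad would show how closely the labels follow BMI; that comparison is left as an exercise here rather than asserted.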

In [1]:
import os

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import sklearn.svm as svm
import sklearn.model_selection as model_selection
import sklearn.metrics as metrics
import sklearn.preprocessing as preprocessing

# ROOT = '/kaggle/input/playground-series-s4e2'
ROOT = 'competition_data'

In [2]:
df_train = pd.read_csv(os.path.join(ROOT, 'train.csv'))

In [3]:
set(df_train["NObeyesdad"])

{'Insufficient_Weight',
 'Normal_Weight',
 'Obesity_Type_I',
 'Obesity_Type_II',
 'Obesity_Type_III',
 'Overweight_Level_I',
 'Overweight_Level_II'}

In [18]:
X_raw = df_train.drop(columns=["NObeyesdad"])
y = df_train["NObeyesdad"]

X_id = X_raw[["id"]]

# Encode each categorical column as integers. The category order is given explicitly,
# so the code for a value is its position in the corresponding list below.
cat_cols = ["Gender", "family_history_with_overweight", "FAVC", "CAEC", "SMOKE", "SCC", "CALC", "MTRANS"]
feature_cats = [["Female", "Male"],                                                      # Gender
                ["yes", "no"],                                                           # family_history_with_overweight
                ["yes", "no"],                                                           # FAVC
                ["no", "Sometimes", "Frequently", "Always"],                             # CAEC
                ["yes", "no"],                                                           # SMOKE
                ["yes", "no"],                                                           # SCC
                ["no", "Sometimes", "Frequently", "Always"],                             # CALC
                ["Automobile", "Motorbike", "Public_Transportation", "Bike", "Walking"]] # MTRANS
enc = preprocessing.OrdinalEncoder(categories=feature_cats)
X_encoded = enc.fit_transform(X_raw[cat_cols])
X_cat = pd.DataFrame(X_encoded, index=X_raw.index, columns=cat_cols)

# Standardize the numeric columns to zero mean and unit variance.
norm_cols = ["Age", "Height", "Weight", "FCVC", "NCP", "CH2O", "FAF", "TUE"]
scaler = preprocessing.StandardScaler()
X_norm = scaler.fit_transform(X_raw[norm_cols])
X_norm = pd.DataFrame(X_norm, index=X_raw.index, columns=norm_cols)

X = pd.concat([X_id, X_norm, X_cat], axis=1)
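
To make the encoding in the cell above concrete, here is a small sketch of how `OrdinalEncoder` behaves when it is given explicit category lists: each value is replaced by its position in the list supplied for its column, so with `["yes", "no"]` the value `yes` becomes 0.0 and `no` becomes 1.0. The toy DataFrame and the names `toy` and `toy_enc` are hypothetical and exist only for this illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows for illustration only (not taken from train.csv).
toy = pd.DataFrame({
    "FAVC": ["yes", "no", "yes"],
    "CAEC": ["Sometimes", "Always", "no"],
})

# Same category orders as used for FAVC and CAEC in the cell above.
toy_enc = OrdinalEncoder(categories=[["yes", "no"],
                                     ["no", "Sometimes", "Frequently", "Always"]])
codes = toy_enc.fit_transform(toy)

# Each value is replaced by its index in the supplied category list.
print(codes)

# inverse_transform maps the integer codes back to the original labels.
print(toy_enc.inverse_transform(codes))
```

Note that this encoding imposes an order on every column, including ones such as MTRANS whose categories have no natural ranking; whether that matters depends on the model trained on the result.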

In [19]:
X.head()

| | id | Age | Height | Weight | FCVC | NCP | CH2O | FAF | TUE | Gender | family_history_with_overweight | FAVC | CAEC | SMOKE | SCC | CALC | MTRANS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.105699 | -0.002828 | -0.235713 | -0.836279 | 0.314684 | 1.206594 | -1.171141 | 0.597438 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 |
| 1 | 1 | -1.027052 | -1.606291 | -1.170931 | -0.836279 | 0.338364 | -0.048349 | 0.021775 | 0.636513 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 2 | 2 | -1.027052 | 0.128451 | -1.430012 | -1.060332 | -1.913423 | -0.195644 | -0.138022 | 1.755239 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 2.0 |
| 3 | 3 | -0.507929 | 0.12009 | 1.64477 | 1.039171 | 0.338364 | -0.584035 | 0.579896 | 0.271455 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 |
| 4 | 4 | 1.371197 | 2.450367 | 0.224054 | 0.438397 | -1.119801 | -0.081469 | 1.176486 | 0.523111 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 |
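
The numeric columns in the output above are small values scattered around zero because `StandardScaler` replaces each value x with (x - mean) / std, computed per column. The sketch below checks that formula by hand on a made-up column; the weights and the names `toy_scaler`, `z`, and `manual` are hypothetical and not part of the original notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical weights in kilograms, for illustration only.
weights = np.array([[50.0], [60.0], [70.0], [80.0], [90.0]])

toy_scaler = StandardScaler()
z = toy_scaler.fit_transform(weights)

# StandardScaler computes z = (x - mean) / std per column,
# using the population standard deviation (ddof=0, NumPy's default).
manual = (weights - weights.mean(axis=0)) / weights.std(axis=0)

print(np.allclose(z, manual))

# After scaling, the column has mean ~0 and standard deviation ~1.
print(z.mean(axis=0), z.std(axis=0))
```

The fitted scaler stores the training mean and scale, so the same object can later be applied to new data with `transform` rather than being refit.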
