# Description

This dataset contains synthetic data for predicting age from health and lifestyle factors. It includes 3,000 rows, and the list below describes 24 features plus the target, each feature covering one aspect of physical health or lifestyle. The loaded training file also contains a Gender column, for 26 columns in total.

## Features:

* Height (cm): The height of the individual in centimeters.
* Weight (kg): The weight of the individual in kilograms.
* Blood Pressure (s/d): Blood pressure (systolic/diastolic) in mmHg.
* Cholesterol Level (mg/dL): Cholesterol level in milligrams per deciliter.
* BMI: Body Mass Index, calculated from height and weight.
* Blood Glucose Level (mg/dL): Blood glucose level in milligrams per deciliter.
* Bone Density (g/cm²): Bone density in grams per square centimeter.
* Vision Sharpness: Vision sharpness on a scale from 0 (blurry) to 100 (perfect).
* Hearing Ability (dB): Hearing ability in decibels.
* Physical Activity Level: Categorized as 'Low', 'Moderate', or 'High'.
* Smoking Status: Categorical values including 'Never', 'Former', and 'Current'.
* Alcohol Consumption: Frequency of alcohol consumption.
* Diet: Type of diet, categorized as 'Balanced', 'High Protein', 'Low Carb', etc.
* Chronic Diseases: Presence of chronic diseases (e.g., diabetes, hypertension).
* Medication Use: Usage of medication.
* Family History: Presence of family history of age-related conditions.
* Cognitive Function: Self-reported cognitive function on a scale from 0 (poor) to 100 (excellent).
* Mental Health Status: Self-reported mental health status on a scale from 0 (poor) to 100 (excellent).
* Sleep Patterns: Average number of sleep hours per night.
* Stress Levels: Self-reported stress levels on a scale from 0 (low) to 100 (high).
* Pollution Exposure: Exposure to pollution measured in arbitrary units.
* Sun Exposure: Average sun exposure in hours per week.
* Education Level: Highest level of education attained.
* Income Level: Annual income in USD.
* Age (years): The target variable representing the age of the individual.
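
The BMI entry above says the value is calculated from height and weight. A minimal sketch of that relationship, assuming the standard formula (weight in kg divided by height in metres squared) and using two rows copied from the `train_df.head()` output in this notebook; the `BMI (recomputed)` column name is a hypothetical label introduced here:

```python
import pandas as pd

# Two rows copied from the train_df.head() output shown in this notebook.
sample = pd.DataFrame({
    "Height (cm)": [171.148359, 172.946206],
    "Weight (kg)": [86.185197, 79.641937],
    "BMI": [29.423017, 26.626847],
})

# Standard BMI formula: weight (kg) / height (m) squared.
height_m = sample["Height (cm)"] / 100
sample["BMI (recomputed)"] = sample["Weight (kg)"] / height_m ** 2

# Largest gap between the stored BMI and the recomputed one.
max_abs_diff = (sample["BMI (recomputed)"] - sample["BMI"]).abs().max()
print(sample.round(3))
print("max abs difference:", max_abs_diff)
```

If the difference stays near zero on the full dataset, BMI carries no information beyond height and weight, which may matter when choosing model inputs.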

# Import Necessary Libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import mean_squared_error, accuracy_score
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder, MinMaxScaler


# Render plots inline in the notebook
%matplotlib inline

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Load Datasets

In [2]:
# Load the training and testing datasets
train_df = pd.read_csv('Train.csv') 
test_df = pd.read_csv('Test.csv')  

# Data Exploration

In [3]:
# Display the first few rows and all columns of the training dataset
pd.set_option('display.max_columns', None)
train_df.head()

Unnamed: 0,Gender,Height (cm),Weight (kg),Blood Pressure (s/d),Cholesterol Level (mg/dL),BMI,Blood Glucose Level (mg/dL),Bone Density (g/cm²),Vision Sharpness,Hearing Ability (dB),Physical Activity Level,Smoking Status,Alcohol Consumption,Diet,Chronic Diseases,Medication Use,Family History,Cognitive Function,Mental Health Status,Sleep Patterns,Stress Levels,Pollution Exposure,Sun Exposure,Education Level,Income Level,Age (years)
0,Male,171.148359,86.185197,151/109,259.465814,29.423017,157.652848,0.132868,0.2,58.786198,Moderate,Former,,Low-carb,,,,44.059172,Good,Insomnia,2.797064,5.142344,7.108975,,Medium,89
1,Male,172.946206,79.641937,134/112,263.630292,26.626847,118.507805,0.629534,0.267312,54.63527,Low,Current,Occasional,Balanced,Hypertension,,Heart Disease,45.312298,Good,Normal,9.33993,7.27272,3.918489,Undergraduate,Medium,77
2,Female,155.945488,49.167058,160/101,207.846206,20.217553,143.58755,0.473487,0.248667,54.564632,Moderate,Never,,Balanced,Hypertension,Regular,Hypertension,56.246991,Poor,Insomnia,9.234637,8.500386,5.393408,,Medium,70
3,Female,169.078298,56.017921,133/94,253.283779,19.59527,137.448581,1.184315,0.513818,79.722963,Moderate,Never,,Balanced,Diabetes,Occasional,Hypertension,55.196092,Poor,Insomnia,4.693446,7.555511,2.745578,,Low,52
4,Female,163.758355,73.966304,170/106,236.119899,27.582078,145.328695,0.434562,0.306864,52.479469,Low,Former,Frequent,Vegetarian,,,,53.023379,Good,Normal,4.038537,9.429097,3.878435,Undergraduate,High,79
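
The `Blood Pressure (s/d)` column is stored as text such as `151/109`, so it cannot be used as a numeric feature directly. One possible way to handle it is to split the string into two integer columns. This is a sketch, not necessarily the approach taken later in the notebook; the names `Systolic (mmHg)` and `Diastolic (mmHg)` are hypothetical, and the values are copied from the output above:

```python
import pandas as pd

# Blood pressure strings copied from the five rows shown above.
bp = pd.DataFrame({
    "Blood Pressure (s/d)": ["151/109", "134/112", "160/101", "133/94", "170/106"],
})

# Split "systolic/diastolic" on the slash and convert both parts to integers.
parts = bp["Blood Pressure (s/d)"].str.split("/", expand=True).astype(int)
bp["Systolic (mmHg)"] = parts[0]   # hypothetical column name
bp["Diastolic (mmHg)"] = parts[1]  # hypothetical column name

print(bp)
```

The same two lines could be applied to both `train_df` and `test_df`, assuming every entry follows the `number/number` pattern.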


In [4]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 26 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Gender                       3000 non-null   object 
 1   Height (cm)                  3000 non-null   float64
 2   Weight (kg)                  3000 non-null   float64
 3   Blood Pressure (s/d)         3000 non-null   object 
 4   Cholesterol Level (mg/dL)    3000 non-null   float64
 5   BMI                          3000 non-null   float64
 6   Blood Glucose Level (mg/dL)  3000 non-null   float64
 7   Bone Density (g/cm²)         3000 non-null   float64
 8   Vision Sharpness             3000 non-null   float64
 9   Hearing Ability (dB)         3000 non-null   float64
 10  Physical Activity Level      3000 non-null   object 
 11  Smoking Status               3000 non-null   object 
 12  Alcohol Consumption          1799 non-null   object 
 13  Diet              

In [5]:
train_df.isnull().sum().sort_values(ascending=True)

Gender                            0
Sun Exposure                      0
Pollution Exposure                0
Stress Levels                     0
Sleep Patterns                    0
Mental Health Status              0
Cognitive Function                0
Diet                              0
Income Level                      0
Smoking Status                    0
Age (years)                       0
Hearing Ability (dB)              0
Vision Sharpness                  0
Bone Density (g/cm²)              0
Blood Glucose Level (mg/dL)       0
BMI                               0
Cholesterol Level (mg/dL)         0
Blood Pressure (s/d)              0
Weight (kg)                       0
Height (cm)                       0
Physical Activity Level           0
Education Level                 627
Medication Use                 1198
Alcohol Consumption            1201
Chronic Diseases               1299
Family History                 1451
dtype: int64
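
Raw counts are easier to judge as a share of the 3,000 rows reported by `train_df.info()`. The sketch below converts the non-zero counts from the output above into percentages; on the real frame, `train_df.isnull().mean() * 100` should give the same figures without hard-coding anything:

```python
import pandas as pd

n_rows = 3000  # row count reported by train_df.info()

# Non-zero missing-value counts copied from the output above.
missing_counts = pd.Series({
    "Education Level": 627,
    "Medication Use": 1198,
    "Alcohol Consumption": 1201,
    "Chronic Diseases": 1299,
    "Family History": 1451,
})

# Percentage of rows with a missing value in each column.
missing_pct = (missing_counts / n_rows * 100).round(1)
print(missing_pct)
```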

In [6]:
test_df.isnull().sum().sort_values(ascending=True)

Gender                            0
Sun Exposure                      0
Pollution Exposure                0
Stress Levels                     0
Sleep Patterns                    0
Mental Health Status              0
Cognitive Function                0
Diet                              0
Smoking Status                    0
Physical Activity Level           0
Income Level                      0
Vision Sharpness                  0
Bone Density (g/cm²)              0
Blood Glucose Level (mg/dL)       0
BMI                               0
Cholesterol Level (mg/dL)         0
Blood Pressure (s/d)              0
Weight (kg)                       0
Height (cm)                       0
Hearing Ability (dB)              0
Education Level                 627
Medication Use                 1198
Alcohol Consumption            1201
Chronic Diseases               1299
Family History                 1451
dtype: int64
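
All five columns with missing values are categorical, and between roughly a fifth and a half of their entries are empty, so dropping rows would discard much of the data. One possible treatment, shown here as a sketch under the unverified assumption that an empty cell means "nothing reported" rather than a lost measurement, is to fill the gaps with an explicit category. The label `Not reported` is a hypothetical choice, and the sample rows are copied from the `train_df.head()` output:

```python
import numpy as np
import pandas as pd

# The five columns with missing values, using the rows from train_df.head().
sample = pd.DataFrame({
    "Alcohol Consumption": [np.nan, "Occasional", np.nan, np.nan, "Frequent"],
    "Chronic Diseases": [np.nan, "Hypertension", "Hypertension", "Diabetes", np.nan],
    "Medication Use": [np.nan, np.nan, "Regular", "Occasional", np.nan],
    "Family History": [np.nan, "Heart Disease", "Hypertension", "Hypertension", np.nan],
    "Education Level": [np.nan, "Undergraduate", np.nan, np.nan, "Undergraduate"],
})

# Replace missing entries with an explicit category (hypothetical label).
filled = sample.fillna("Not reported")

print(filled)
print(filled.isnull().sum())
```

Applying the same fill to `train_df` and `test_df` keeps the two sets consistent before any label encoding.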