# Simulating a Dataset

## 1. Obesity - A Growing Problem

### Key Facts About Obesity
- Worldwide obesity has nearly tripled since 1975.
- In 2016, more than 1.9 billion adults aged 18 years and older were overweight. Of these, over 650 million were obese.
- 39% of adults aged 18 years and over were overweight in 2016, and 13% were obese.
- Most of the world's population live in countries where overweight and obesity kill more people than underweight.
- 41 million children under the age of 5 were overweight or obese in 2016.
- Over 340 million children and adolescents aged 5-19 were overweight or obese in 2016.
- Obesity is preventable.

Obesity increasingly affects people of all ages, shapes and sizes in the modern world. This notebook looks at some of the risk factors for obesity and the relationships between them.

According to the Health Service Executive (HSE) in Ireland and irishhealth.com [1], Body Mass Index (BMI) is used to classify weight into a number of categories, ranging from Normal to Class 3 Obesity, as seen in the table below:

In [2]:
import pandas as pd

# BMI bands with their category and associated health risk
d = {
    'BMI (kg/m2)': ['18-24.9', '25-29.9', '30-34.9', '35-39.9', '>40'],
    'Category': ['Normal', 'Overweight', 'Class 1 Obesity', 'Class 2 Obesity', 'Class 3 Obesity'],
    'Risk of Illness and Early Death': ['Normal', 'Mildly Increased', 'Moderately Raised', 'Severely Raised', 'Very Severely Raised'],
}
df = pd.DataFrame(data=d, columns=['BMI (kg/m2)', 'Category', 'Risk of Illness and Early Death'])
df.index = df.index + 1  # number the rows from 1 instead of 0
df

|   | BMI (kg/m2) | Category | Risk of Illness and Early Death |
|---|-------------|----------|---------------------------------|
| 1 | 18-24.9 | Normal | Normal |
| 2 | 25-29.9 | Overweight | Mildly Increased |
| 3 | 30-34.9 | Class 1 Obesity | Moderately Raised |
| 4 | 35-39.9 | Class 2 Obesity | Severely Raised |
| 5 | >40 | Class 3 Obesity | Very Severely Raised |
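
The bands in the table can also be expressed as a small lookup function, which is handy when a category is needed for an individual BMI value. The sketch below is my own illustration, not taken from the cited sources: the names `BMI_BANDS`, `bmi` and `bmi_category` are hypothetical, the upper bounds are treated as exclusive so that no value falls between two rows, a BMI of exactly 40 is assumed to be Class 3, and values below 18 are flagged because the table has no row for them.

```python
# Category bands copied from the table above: (lower bound, upper bound, category).
# Assumption: upper bounds are exclusive, and exactly 40 counts as Class 3.
BMI_BANDS = [
    (18.0, 25.0, "Normal"),
    (25.0, 30.0, "Overweight"),
    (30.0, 35.0, "Class 1 Obesity"),
    (35.0, 40.0, "Class 2 Obesity"),
    (40.0, float("inf"), "Class 3 Obesity"),
]


def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Return the table category for a BMI value in kg/m2."""
    for lower, upper, category in BMI_BANDS:
        if lower <= value < upper:
            return category
    # Assumption: the table has no row below 18, so flag those values instead of guessing.
    return "Below table range"


# Example: a person weighing 96 kg who is 1.74 m tall
print(bmi_category(bmi(96, 1.74)))
```

Keeping the bands in one list means the thresholds only need to be changed in a single place if a different classification is used.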


When thinking about the risk factors that impact obesity levels, I first wanted to see whether there was any evidence that one gender is more affected than the other.
The dataset used below was found at [3] and contains the Gender, Height, Weight and BMI Index for 500 people.

To verify that the "Index" values are correct, I will add a new column to the data.

In [5]:
# The dataset is stored in CSV form and was downloaded and saved locally before importing.
# pd.read_csv returns a Pandas DataFrame, which is automatically assigned an index column.
df_gend_bmi = pd.read_csv('500_Person_Gender_Height_Weight_Index.csv')
# Number the rows from 1 instead of 0
df_gend_bmi.index = df_gend_bmi.index + 1
# Display the data in the notebook.
# The summary information at the end shows there are 500 records, each with 4 columns of data
# (not including the auto-generated index column).
df_gend_bmi

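|    | Gender | Height | Weight | Index |
|----|--------|--------|--------|-------|
| 1  | Male   | 174 | 96  | 4 |
| 2  | Male   | 189 | 87  | 2 |
| 3  | Female | 185 | 110 | 4 |
| 4  | Female | 195 | 104 | 3 |
| 5  | Male   | 149 | 61  | 3 |
| 6  | Male   | 189 | 104 | 3 |
| 7  | Male   | 147 | 92  | 5 |
| 8  | Male   | 154 | 111 | 5 |
| 9  | Male   | 174 | 90  | 3 |
| 10 | Female | 169 | 103 | 4 |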
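
One way the "Index" check mentioned earlier could be done is to calculate BMI from Height and Weight and bin the result. This is only a sketch under assumptions that are not stated in the dataset description above: Height is taken to be in centimetres, Weight in kilograms, and the bin edges for the 0 to 5 Index scale (16, 18.5, 25, 30 and 40) are my own guess. The helper `add_bmi_check` and the columns `BMI`, `Index_calc` and `Index_ok` are hypothetical names, and the sample rows are copied from the output above so that the example runs without the CSV file.

```python
import numpy as np
import pandas as pd

# A few rows copied from the output above; the full frame would be df_gend_bmi.
sample = pd.DataFrame(
    {
        "Gender": ["Male", "Male", "Female", "Female", "Male"],
        "Height": [174, 189, 185, 195, 147],
        "Weight": [96, 87, 110, 104, 92],
        "Index": [4, 2, 4, 3, 5],
    }
)


def add_bmi_check(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a calculated BMI and a check against Index."""
    out = df.copy()
    # Assumption: Height is in centimetres and Weight is in kilograms.
    bmi = out["Weight"] / (out["Height"] / 100) ** 2
    out["BMI"] = bmi.round(2)
    # Assumption: bin edges for Index values 0 to 5, using left-closed intervals.
    edges = [0, 16, 18.5, 25, 30, 40, np.inf]
    out["Index_calc"] = pd.cut(bmi, bins=edges, labels=False, right=False)
    # True where the calculated value agrees with the Index supplied in the data
    out["Index_ok"] = out["Index_calc"] == out["Index"]
    return out


checked = add_bmi_check(sample)
print(checked[["Height", "Weight", "BMI", "Index", "Index_calc", "Index_ok"]])
```

Rows where `Index_ok` is False would point either to an error in the data or to bin edges that differ from the ones assumed here.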


In [17]:
# Inspect the three components of the dataframe: row labels, column labels and the underlying data
index = df_gend_bmi.index
columns = df_gend_bmi.columns
values = df_gend_bmi.values

values

array([['Male', 174, 96, 4],
       ['Male', 189, 87, 2],
       ['Female', 185, 110, 4],
       ...,
       ['Female', 141, 136, 5],
       ['Male', 150, 95, 5],
       ['Male', 173, 131, 5]], dtype=object)

In [20]:
# Select only the records for males
df_male_bmi = df_gend_bmi.loc[df_gend_bmi['Gender'] == 'Male']
df_male_bmi

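|    | Gender | Height | Weight | Index |
|----|--------|--------|--------|-------|
| 0  | Male | 174 | 96  | 4 |
| 1  | Male | 189 | 87  | 2 |
| 4  | Male | 149 | 61  | 3 |
| 5  | Male | 189 | 104 | 3 |
| 6  | Male | 147 | 92  | 5 |
| 7  | Male | 154 | 111 | 5 |
| 8  | Male | 174 | 90  | 3 |
| 10 | Male | 195 | 81  | 2 |
| 13 | Male | 155 | 51  | 2 |
| 14 | Male | 191 | 79  | 2 |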


In [21]:
# Select only the records for females
df_female_bmi = df_gend_bmi.loc[df_gend_bmi['Gender'] == 'Female']
df_female_bmi

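|    | Gender | Height | Weight | Index |
|----|--------|--------|--------|-------|
| 2  | Female | 185 | 110 | 4 |
| 3  | Female | 195 | 104 | 3 |
| 9  | Female | 169 | 103 | 4 |
| 11 | Female | 159 | 80  | 4 |
| 12 | Female | 192 | 101 | 3 |
| 15 | Female | 153 | 107 | 5 |
| 16 | Female | 157 | 110 | 5 |
| 21 | Female | 153 | 149 | 5 |
| 22 | Female | 169 | 97  | 4 |
| 24 | Female | 172 | 67  | 2 |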
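
Rather than splitting the data into separate male and female dataframes, the two genders could also be compared directly by grouping on the Gender column. The sketch below is one possible approach, not a result: it uses only the first ten rows shown earlier, which is far too few to draw any conclusion, and the names `sample`, `share` and `mean_index` are hypothetical. With the full data, `sample` would be replaced by `df_gend_bmi`.

```python
import pandas as pd

# Gender and Index for the first ten rows shown earlier (illustration only).
sample = pd.DataFrame(
    {
        "Gender": ["Male", "Male", "Female", "Female", "Male",
                   "Male", "Male", "Male", "Male", "Female"],
        "Index": [4, 2, 4, 3, 3, 3, 5, 5, 3, 4],
    }
)

# Proportion of each Index value within each gender (each row sums to 1)
share = pd.crosstab(sample["Gender"], sample["Index"], normalize="index")

# Mean Index per gender as a single summary figure
mean_index = sample.groupby("Gender")["Index"].mean()

print(share.round(2))
print(mean_index.round(2))
```

Normalising by row makes the two genders comparable even when the dataset contains different numbers of males and females.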


## 2. Variables to be Considered

### Risk Factors


### Prediction Variable
Body Mass Index (BMI)



## 3. Data Synthesis

## 4. References

1. http://www.irishhealth.com/calc/bmi01.html
2. http://www.who.int/news-room/fact-sheets/detail/obesity-and-overweight
3. https://www.kaggle.com/yersever/500-person-gender-height-weight-bodymassindex#500_Person_Gender_Height_Weight_Index.csv