# Simulating a Dataset

## 1. Obesity - A Growing Problem

### Key Facts About Obesity
- Worldwide obesity has nearly tripled since 1975.
- In 2016, more than 1.9 billion adults, 18 years and older, were overweight. Of these, over 650 million were obese.
- 39% of adults aged 18 years and over were overweight in 2016, and 13% were obese.
- Most of the world's population live in countries where overweight and obesity kill more people than underweight.
- 41 million children under the age of 5 were overweight or obese in 2016.
- Over 340 million children and adolescents aged 5-19 were overweight or obese in 2016.
- Obesity is preventable.

Obesity is increasingly affecting humans of all ages, shapes and sizes in the modern world. This notebook takes a look into some of the risk factors for obesity and the relationships between them.

According to the Health Service Executive (HSE) in Ireland [1], and irishhealth.com [2], there are a number of classifications based on Body Mass Index (BMI), ranging from Normal to Class 3 Obesity, as seen in the table below:

In [43]:
import pandas as pd

# BMI ranges with their category and associated health risk
d = {
    'BMI (kg/m2)': ['18-24.9', '25-29.9', '30-34.9', '35-39.9', '>40'],
    'Category': ['Normal', 'Overweight', 'Class 1 Obesity', 'Class 2 Obesity', 'Class 3 Obesity'],
    'Risk of Illness and Early Death': ['Normal', 'Mildly Increased', 'Moderately Raised', 'Severely Raised', 'Very Severely Raised'],
}
df = pd.DataFrame(data=d, columns=['BMI (kg/m2)', 'Category', 'Risk of Illness and Early Death'])
df.index = df.index+1
df


When thinking about the risk factors that impact obesity levels, I first wanted to see whether there was any evidence that one gender was more affected than the other.
The dataset used below was downloaded from [3] and contains Gender, Height (cm), Weight (kg) and BMI Index Category (as per the index of the table above) for 500 people.

BMI is a crude indicator of obesity and cannot be relied upon as accurate in all circumstances [4]. For the purposes of this exercise, however, it is sufficient to allow the simulation of a dataset.
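As a reminder of how the figure is derived, BMI is weight in kilograms divided by the square of height in metres. The sketch below is illustrative only: the function names `bmi` and `bmi_category` are my own, the example person (70 kg, 175 cm) is hypothetical, and the handling of values below 18 and of exactly 40 is an assumption, since the table above does not cover those cases.

```python
def bmi(weight_kg, height_cm):
    """Return Body Mass Index (kg/m^2) from weight in kg and height in cm."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Map a BMI value onto the categories of the table above.

    Assumptions: values below 18 are labelled 'Below range' because the
    table does not list them, and a value of exactly 40 counts as Class 3.
    """
    if value < 18:
        return 'Below range'
    if value < 25:
        return 'Normal'
    if value < 30:
        return 'Overweight'
    if value < 35:
        return 'Class 1 Obesity'
    if value < 40:
        return 'Class 2 Obesity'
    return 'Class 3 Obesity'


# Hypothetical person, not taken from the dataset
example = bmi(70, 175)
print(round(example, 1), bmi_category(example))
```

Because the calculation uses only height and weight, it cannot distinguish muscle from fat, which is one reason it is treated here as a rough indicator only.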

In [41]:
# Dataset is stored in csv form and was downloaded [3] and saved locally before importing
# First read the csv data using Pandas
gend_bmi_data = pd.read_csv('500_Person_Gender_Height_Weight_Index.csv', header=None, skiprows=1, names=['Gender','Height_cm','Weight_kg', 'BMI_Category'])

# Create Pandas dataframe to hold data for investigation
# dataframe automatically assigned an index column
df_gend_bmi = pd.DataFrame(data=gend_bmi_data)
df_gend_bmi.index = df_gend_bmi.index+1
# Print the data to notebook
# Summary information at the end shows there are 500 records, each with 4 columns of data (not including the auto-generated index column)
df_gend_bmi

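| | Gender | Height_cm | Weight_kg | BMI_Category |
|---|---|---|---|---|
| 1 | Male | 174 | 96 | 4 |
| 2 | Male | 189 | 87 | 2 |
| 3 | Female | 185 | 110 | 4 |
| 4 | Female | 195 | 104 | 3 |
| 5 | Male | 149 | 61 | 3 |
| 6 | Male | 189 | 104 | 3 |
| 7 | Male | 147 | 92 | 5 |
| 8 | Male | 154 | 111 | 5 |
| 9 | Male | 174 | 90 | 3 |
| 10 | Female | 169 | 103 | 4 |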


From a quick glance at the data above, row 493 contains an Index value of

To identify whether the data is biased towards one gender or the other, I will:

1. Find what proportion of the entire dataset is male and what proportion is female
2. Identify what proportion of each gender is obese (Index values {3, 4, 5})
3. Compare these proportions to identify if a significant difference exists between genders
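The three steps can be sketched with pandas as follows. This is a minimal illustration rather than the final analysis: it runs on only the ten rows displayed above as a stand-in for the full file, and the names `sample`, `gender_share`, `obese_share` and the `Obese` column are my own.

```python
import pandas as pd

# The ten rows displayed above, used as a small stand-in for the full dataset
sample = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Male',
               'Male', 'Male', 'Male', 'Male', 'Female'],
    'BMI_Category': [4, 2, 4, 3, 3, 3, 5, 5, 3, 4],
})

# Step 1: proportion of each gender in the data
gender_share = sample['Gender'].value_counts(normalize=True)

# Step 2: flag obese rows (Index values 3, 4, 5); the mean of a boolean
# column per gender is the proportion of that gender flagged as obese
sample['Obese'] = sample['BMI_Category'].isin([3, 4, 5])
obese_share = sample.groupby('Gender')['Obese'].mean()

# Step 3: raw difference between the two proportions
difference = obese_share['Male'] - obese_share['Female']

print(gender_share)
print(obese_share)
print(difference)
```

With so few rows the result says nothing about the population; the same calls applied to the full dataframe give the proportions needed for the comparison.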

In [42]:
df_gend_bmi = df_gend_bmi[df_gend_bmi.BMI_Category != 0]
df_gend_bmi.count()

Gender          487
Height_cm       487
Weight_kg       487
BMI_Category    487
dtype: int64

In [15]:
df_male_bmi = df_gend_bmi.loc[df_gend_bmi['Gender'] == 'Male']
df_male_bmi.count()

Gender          239
Height_cm       239
Weight_kg       239
BMI_Category    239
dtype: int64

In [16]:
df_female_bmi = df_gend_bmi.loc[df_gend_bmi['Gender'] == 'Female']
df_female_bmi.count()

Gender          248
Height_cm       248
Weight_kg       248
BMI_Category    248
dtype: int64
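
One possible way to judge whether a difference between the two gender proportions is significant is a two-proportion z-test. The notebook does not specify which test is intended, so this is an assumption on my part. The sketch below uses only the standard library; the function name `two_proportion_z` is my own, the group sizes 239 and 248 come from the counts above, and the obese counts are hypothetical placeholders rather than values from the dataset.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Returns the z statistic and the two-sided p-value, using the pooled
    proportion to estimate the standard error.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Group sizes from the counts above; obese counts are hypothetical placeholders
obese_male = 170      # placeholder, not taken from the dataset
obese_female = 165    # placeholder, not taken from the dataset
z, p_value = two_proportion_z(obese_male, 239, obese_female, 248)
print(z, p_value)
```

A small p-value (conventionally below 0.05) would suggest the proportions differ between genders; the real obese counts from the dataframes above would replace the placeholders.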

## 2. Variables to be Considered

### Risk Factors


### Prediction Variable
Body Mass Index (BMI)



## 3. Data Synthesis

## 4. References

1. http://www.irishhealth.com/calc/bmi01.html
2. http://www.who.int/news-room/fact-sheets/detail/obesity-and-overweight
3. https://www.kaggle.com/yersever/500-person-gender-height-weight-bodymassindex#500_Person_Gender_Height_Weight_Index.csv
4. https://www.nhs.uk/live-well/healthy-weight/bmi-calculator/#limitations-of-the-bmi