# Health Stats Part 1: Waist-to-Hip Ratios

<!--- Write an explanation of the Waist To Hips Ratio statistic used by health professionals. Please include an explanation of what it is used for, exactly how it is calculated, and how to interpret the results. Note: Formatting matters. Make this as professional as you can using Markdown.  --->

<!--- feel free to use any web resources, including [Wikipedia](https://en.wikipedia.org/wiki/Waist%E2%80%93hip_ratio) or any other resources that you can find online. Just MAKE SURE you provide a link to every resource you decide to use. --->

<!--- Including the formula, or that fancy diagram/table you see on wikipedia is DEFINITELY a good idea! How? The LaTeX equations section in [This link](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html) might help. --->

<!--- For extra points, try to create a table similar to the one on the wikipedia page on your own. --->

The waist-hip ratio, or waist-to-hip ratio (WHR), is the dimensionless ratio of the circumference of the waist to that of the hips. Health professionals use it as an indicator of health and of the risk of developing serious health conditions. Research shows that people with "apple-shaped" bodies (more weight around the waist) face more health risks than those with "pear-shaped" bodies (more weight around the hips). The WHR is calculated by dividing the waist measurement ($W$) by the hip measurement ($H$), both taken in the same unit:

$$WHR=\frac{W}{H}$$
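
As a quick worked example of the formula, a waist of 30 in and hips of 32 in give WHR = 30/32 = 0.9375. The same arithmetic as a minimal Python sketch; the function name `waist_hip_ratio` is illustrative and is not part of the assignment code:

```python
def waist_hip_ratio(waist, hip):
    """Return waist circumference divided by hip circumference.

    Both measurements must use the same unit, so the result is dimensionless.
    """
    if hip <= 0:
        raise ValueError("hip circumference must be positive")
    return waist / hip


print(waist_hip_ratio(30, 32))  # 0.9375
```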


WHR is used as a measure of obesity, which in turn is a possible indicator of other, more serious health conditions. The World Health Organization (WHO) defines abdominal obesity as a waist-hip ratio above 0.90 for males and above 0.85 for females, or a body mass index (BMI) above 30.0. The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) states that women with waist-hip ratios above 0.8, and men with ratios above 1.0, are at increased health risk because of their fat distribution. The table below lists these cut-offs alongside the DGSP categories; a "?" marks a value the source does not give.

| Category      | DGSP (Women) | DGSP (Men) | WHO (Women) | WHO (Men) | NIDDK (Women) | NIDDK (Men) |
| ------------- | :----------: | :--------: | :---------: | :-------: | :-----------: | :---------: |
| Underweight   | ?            | ?          | ?           | ?         | ?             | ?           |
| Normal weight | <0.80        | <0.90      | ?           | ?         | ?             | ?           |
| Overweight    | 0.80-0.84    | 0.90-0.99  | ?           | ?         | ?             | ?           |
| Obesity       | >0.85        | >1.00      | >0.85       | >0.90     | >0.80         | >1.00       |

Source: [Wikipedia, "Waist-hip ratio"](https://en.wikipedia.org/wiki/Waist%E2%80%93hip_ratio)
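
The WHO and NIDDK cut-offs quoted above can be written as a small lookup. This is only a sketch: the names `THRESHOLDS` and `exceeds_threshold` are hypothetical, and the numbers are the ones given in the text, not an independent clinical reference.

```python
# Cut-offs taken from the paragraph above: a WHR strictly greater than
# the listed value is flagged.
THRESHOLDS = {
    "WHO": {"F": 0.85, "M": 0.90},
    "NIDDK": {"F": 0.80, "M": 1.00},
}


def exceeds_threshold(whr, sex, source="WHO"):
    """Return True if `whr` is above the cut-off for `sex` ("F" or "M")."""
    return whr > THRESHOLDS[source][sex]


print(exceeds_threshold(0.9375, "M", "WHO"))    # True
print(exceeds_threshold(0.9375, "M", "NIDDK"))  # False
```

The same ratio can be flagged by one organization's cut-off and not by another's, which is why it helps to state which threshold is being used.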

## Source Data 

<!--- Replace the text below with a Markdown bullet list that defines the columns of the CSV file. Be sure to indicate the data type for each column. --->

<!--- Example can be: ID, unique identifier of each person, integer. Remember you need to put this into a bullet list! How? [This link](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html) might help. --->

<!--- These two markdown cells are required in almost any analytical report. --->

* ID:
    * Unique identifier of each person
    * Integer
* Waist:
    * Waist circumference in inches
    * Integer
* Hip:
    * Hip circumference in inches
    * Integer
* Gender:
    * Gender of each person
    * String - either "M" for male or "F" for female
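
To make the column definitions concrete, the sketch below parses a few lines in that layout with the standard `csv` module. The sample rows are the first three records shown in the output near the end of this notebook; the `COLUMN_TYPES` mapping and the `parse_rows` helper are assumptions made for illustration only.

```python
import csv
import io

# Column name -> conversion function, mirroring the bullet list above.
COLUMN_TYPES = {"ID": int, "Waist": int, "Hip": int, "Gender": str}

SAMPLE = """ID,Waist,Hip,Gender
1,30,32,M
2,32,37,M
3,30,36,M
"""


def parse_rows(text):
    """Parse CSV text into a list of dicts with typed values."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: COLUMN_TYPES[name](value) for name, value in record.items()}
        for record in reader
    ]


people = parse_rows(SAMPLE)
print(people[0])  # {'ID': 1, 'Waist': 30, 'Hip': 32, 'Gender': 'M'}
```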

## Data Import

Every analysis starts by reading in the data.

The cell below shows the most basic way to read a file in Python.

For more information on this topic, read Chapter 7 of your PY4E textbook.

In [1]:
# Goal: Extract the data from the file

# opens the w2h_data.csv for reading
f = open("w2h_data.csv", "r")

# loads the file into a list of strings, one string per line
raw_lines = list(f)

# closes the file
f.close()
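
An alternative that many Python programmers prefer is the `with` statement, which closes the file automatically, even if an error occurs while reading. The sketch below first writes a small temporary file so that it can run on its own; that file and its contents are stand-ins, not the real `w2h_data.csv`.

```python
import os
import tempfile

# Create a throwaway CSV so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.csv")
with open(path, "w") as out:
    out.write("ID,Waist,Hip,Gender\n1,30,32,M\n")

# `with` closes the file when the block ends, so no explicit close() is needed.
with open(path, "r") as f:
    raw_lines = list(f)

print(raw_lines)  # ['ID,Waist,Hip,Gender\n', '1,30,32,M\n']
print(f.closed)   # True
```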

Data are not useful when they have the wrong data type, wrong values, or missing values.

Cleaning up your data is an important step in any analysis.

In [2]:
# Goal: Scrub and convert the data, loading it into a new list called rows

# Strips out newline '\n' characters and converts to a list
raw_rows = [r.rstrip('\n').split(',') for r in raw_lines] # <--- Whoa. Why does this work? 

# Creates a new list `rows`, starting with just the column names
rows = list() 
rows.append(raw_rows[0])

# Convert each `raw_row`, starting with the second
for raw_row in raw_rows[1:]:
    
    # Note: the values in the `raw_row` list are all strings.
    # Create a new list called `row` that converts each item in `raw_row` to the right data type  
    # ID, Waist, and Hip are converted to int; Gender is kept as a str, with
    # stray whitespace (such as a trailing '\r') stripped
    row = [int(raw_row[0]), int(raw_row[1]), int(raw_row[2]), raw_row[3].strip()]
    # Append the new `row` to the `rows` list
    rows.append(row)
    
# from here on out use the `rows` list instead of `raw_rows` or `raw_lines`
# You may want to print out `rows` to test whether your code above worked
# print(rows)
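
The list comprehension that builds `raw_rows` packs two string operations into one line. Written out as an explicit loop, it looks roughly like this (a sketch that uses two hard-coded sample lines rather than the real file):

```python
raw_lines = ["ID,Waist,Hip,Gender\n", "1,30,32,M\n"]

raw_rows = []
for line in raw_lines:
    without_newline = line.rstrip("\n")   # "1,30,32,M\n" -> "1,30,32,M"
    fields = without_newline.split(",")   # "1,30,32,M" -> ["1", "30", "32", "M"]
    raw_rows.append(fields)

# Same result as the one-line comprehension used in the cell above.
assert raw_rows == [r.rstrip("\n").split(",") for r in raw_lines]
print(raw_rows)  # [['ID', 'Waist', 'Hip', 'Gender'], ['1', '30', '32', 'M']]
```

Every field is still a string at this point, which is why the conversion loop calls `int()` on the numeric columns.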

## Calculations

Sometimes the data you are given do not directly contain the values you need, so you have to calculate them.

In this part, you calculate two new features, `W2H Ratio` and `Shape`.

In [3]:
# Goal: For each row of data calculate and store the w2h_ratio and shape.

# Adds columns for the two new features
rows[0].extend(["W2H Ratio","Shape"])

# For each row in the rows list, calculate the waist to hips ratio and shape
for row in rows[1:]:
    # Calculate the w2h_ratio 
    w2h_ratio = row[1] / row[2]  # waist / hip; `/` on two ints returns a float in Python 3
    
    # Based on the ratio and the gender, set the variable shape to either 'Apple' or 'Pear'
    # 'Apple' shaped is when women's w2h_ratio > 0.80 or men's w2h_ratio > 0.90
    # 'Pear' shaped is when women's w2h_ratio <= 0.80 or men's w2h_ratio <= 0.90
    
    if row[3] == "F":
        if w2h_ratio > 0.80:
            shape = 'Apple'
        else:
            shape = 'Pear'
    elif row[3] == "M":
        if w2h_ratio > 0.90:
            shape = 'Apple'
        else:
            shape = 'Pear'
    else:
        # Unrecognized gender code: report it and record a placeholder so that
        # `shape` is always defined before it is appended below
        print('Gender data error, please check data source')
        shape = 'Unknown'
    
    # Add the new data to the end of the row
    row += [w2h_ratio, shape] # note: += is shorthand for the extend method used above
    
# You may want to print out `rows` to test whether your code above worked
# print(rows)   
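
One way to make the shape logic easier to test is to move it into a function. The version below is a sketch, not the notebook's required solution: `classify_shape` and `APPLE_CUTOFF` are made-up names, and returning `'Unknown'` for an unrecognized gender code is an assumption.

```python
# A ratio strictly above the cut-off counts as 'Apple'; otherwise 'Pear'.
APPLE_CUTOFF = {"F": 0.80, "M": 0.90}


def classify_shape(w2h_ratio, gender):
    """Return 'Apple', 'Pear', or 'Unknown' for an unrecognized gender code."""
    cutoff = APPLE_CUTOFF.get(gender)
    if cutoff is None:
        return "Unknown"
    return "Apple" if w2h_ratio > cutoff else "Pear"


print(classify_shape(30 / 32, "M"))  # Apple
print(classify_shape(32 / 37, "M"))  # Pear
print(classify_shape(0.80, "F"))     # Pear
print(classify_shape(0.85, "X"))     # Unknown
```

Keeping the cut-offs in a dictionary means a threshold can be changed in one place without touching the branching logic.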
    

## Output

In your analysis report, it is always helpful to display your data somehow.

This is a very rudimentary way of displaying your data, including the original features and the new features you just calculated.

In [4]:
# Goal: pretty print the rows as an HTML table

# Note: this works, but we can do this much better with pandas
html_table = '<table><tr><th>'
html_table += "</th><th>".join(rows[0])
html_table += '</th></tr>'
for row in rows[1:]:
    html_table += "<tr><td>"
    html_table += "</td><td>".join(str(col) for col in row)
    html_table += "</td></tr>"
html_table += "</table>"

from IPython.display import HTML, display
display(HTML(html_table))

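| ID  | Waist | Hip | Gender | W2H Ratio          | Shape |
| :-: | :---: | :-: | :----: | :----------------: | :---: |
| 1   | 30    | 32  | M      | 0.9375             | Apple |
| 2   | 32    | 37  | M      | 0.8648648648648649 | Pear  |
| 3   | 30    | 36  | M      | 0.8333333333333334 | Pear  |
| 4   | 33    | 39  | M      | 0.8461538461538461 | Pear  |
| 5   | 29    | 33  | M      | 0.8787878787878788 | Pear  |
| 6   | 32    | 38  | M      | 0.8421052631578947 | Pear  |
| 7   | 33    | 42  | M      | 0.7857142857142857 | Pear  |
| 8   | 30    | 40  | M      | 0.75               | Pear  |
| 9   | 30    | 37  | M      | 0.8108108108108109 | Pear  |
| 10  | 32    | 39  | M      | 0.8205128205128205 | Pear  |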
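
The full-precision ratios above are hard to scan. One possible way to tidy them, using only the standard library, is to format floats to three decimal places before joining the cells. This is a sketch: the `format_cell` and `to_html_table` helpers are hypothetical, and the two data rows are copied from the output above.

```python
from html import escape


def format_cell(value):
    """Show floats with three decimals; escape everything for safe HTML."""
    text = f"{value:.3f}" if isinstance(value, float) else str(value)
    return escape(text)


def to_html_table(rows):
    """Build an HTML table string from a header row followed by data rows."""
    header = "".join(f"<th>{format_cell(col)}</th>" for col in rows[0])
    body = "".join(
        "<tr>" + "".join(f"<td>{format_cell(col)}</td>" for col in row) + "</tr>"
        for row in rows[1:]
    )
    return f"<table><tr>{header}</tr>{body}</table>"


rows = [
    ["ID", "Waist", "Hip", "Gender", "W2H Ratio", "Shape"],
    [1, 30, 32, "M", 0.9375, "Apple"],
    [2, 32, 37, "M", 32 / 37, "Pear"],
]
print(to_html_table(rows))
```

In a notebook, the returned string can be passed to `display(HTML(...))` exactly as in the cell above.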


## Discussion Questions
  * How long did it take you to figure out how to do a bullet list in Markdown? What other formatting tricks did you try? - The bullet list did not take long at all. Figuring out how to make the table took much longer.
  * Was there any code that you thought was particularly elegant? How about cryptic or buggy? - The code for the HTML table at the end was difficult to follow.
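  * What does the code `raw_lines = list(f)` in the first code cell do exactly? Where can we learn more about loading files? Why do we bother closing the file at the end of the cell? - That code reads the open CSV file line by line and stores the lines in a Python list, one string per line. Chapter 7 of the PY4E textbook covers loading files in more detail. We close the file to release the file handle and the operating-system resources tied to it.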
  * In the second code cell, why do we try to clean up the data all at once? Why not just deal with it as raw strings? - We need to ensure that the waist and hip data are integers so they can be used to calculate the WHR later in the program.
  * What is going on in the line below, also from the second code cell?  
  `raw_rows = [r.rstrip('\n').split(',') for r in raw_lines]` - This is a list comprehension. For each string `r` in `raw_lines`, it strips the trailing newline character and then splits the string at the commas, producing a list of field strings. The result, `raw_rows`, is a list of these lists, one per line of the file.
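  * What does this do?  
  `for raw_row in raw_rows[1:]:` - It iterates through every row of `raw_rows` except the first. The slice `[1:]` starts at index 1, which skips the header row of column names.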
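  * In the third code cell, a list is extended by another list. What does that mean and how is that different from appending list items to the list? How could we do the same thing using `append()`? - `extend()` adds each item of the other list as a separate element, while `append()` adds its argument as a single element at the end of the list, so appending a list would create one nested list. Since we need to add more than one element, we use `extend()`. To get the same result with `append()`, call it once for each item.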
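  * When might the calculation 
  `w2h_ratio = row[1]/row[2]` give inaccurate results? - If the CSV file format changes, such that columns 1 and 2 are no longer waist and hip, respectively. In Python 2, dividing two integers with `/` also truncates the result to a whole number, and a hip value of 0 raises a `ZeroDivisionError` in any version.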
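
Several of the answers above can be checked directly in a Python cell. The short sketch below, which uses sample values only, contrasts `extend` with `append` and shows what the `[1:]` slice skips.

```python
header = ["ID", "Waist", "Hip", "Gender"]

extended = list(header)
extended.extend(["W2H Ratio", "Shape"])   # adds two separate items
print(len(extended))  # 6

appended = list(header)
appended.append(["W2H Ratio", "Shape"])   # adds ONE item, which is a list
print(len(appended))  # 5

# Getting the effect of extend with append: append each item in turn.
one_by_one = list(header)
for name in ["W2H Ratio", "Shape"]:
    one_by_one.append(name)
assert one_by_one == extended

# Slicing with [1:] skips index 0, i.e. the header row.
raw_rows = [["ID", "Waist"], ["1", "30"], ["2", "32"]]
print(raw_rows[1:])  # [['1', '30'], ['2', '32']]
```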