In [1]:
import pandas as pd
import requests

In [2]:
df = pd.read_csv("../openpowerlifting-2019-10-24/openpowerlifting-2019-10-24.csv", low_memory=False)

In [3]:
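def get_gender_ratios(df, year):
    # Rows dated strictly after Jan 1 of `year` and strictly before Jan 1 of `year + 1`
    # (Date holds ISO "YYYY-MM-DD" strings, which compare correctly as text)
    in_year = df.query('Date > "{}-01-01" & Date < "{}-01-01"'.format(year, year + 1))
    # Fraction of that year's entries for each value of Sex
    return in_year['Sex'].value_counts() / in_year.shape[0]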
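To check what the helper returns without loading the full CSV, here is a minimal sketch on a tiny, made-up DataFrame. The `toy` rows are hypothetical; only the `Date` and `Sex` column names come from the dataset. It also shows that `value_counts(normalize=True)` gives the same fractions, assuming the `Sex` column has no missing values.

```python
import pandas as pd

def get_gender_ratios(df, year):
    # Rows dated strictly after Jan 1 of `year` and strictly before Jan 1 of `year + 1`
    # (ISO "YYYY-MM-DD" strings sort the same way as the dates they encode)
    in_year = df.query('Date > "{}-01-01" & Date < "{}-01-01"'.format(year, year + 1))
    # Share of rows for each value of Sex within that year
    return in_year['Sex'].value_counts() / in_year.shape[0]

# Hypothetical toy data, not taken from the OpenPowerlifting file
toy = pd.DataFrame({
    'Date': ['2017-03-04', '2017-06-10', '2017-06-10', '2017-11-20', '2018-02-02'],
    'Sex': ['M', 'M', 'F', 'M', 'F'],
})

ratios_2017 = get_gender_ratios(toy, 2017)
print(ratios_2017)  # M 0.75, F 0.25 (the 2018 row is excluded)

# Built-in alternative: normalize=True divides by the count of non-missing values
in_2017 = toy.query('Date > "2017-01-01" & Date < "2018-01-01"')
print(in_2017['Sex'].value_counts(normalize=True))
```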

### indexing with `[]`

With DataFrames, the `[]` indexer is for column indexing, so you can use it to set a **column** of a DataFrame:

In [4]:
df_ratio = pd.DataFrame()
df_ratio[1964] = get_gender_ratios(df, 1964)

In [5]:
df_ratio

|   | 1964 |
|---|---|
| M | 1.0 |


Continuing on like this, we can set **any** column, for example 2017:

In [6]:
df_ratio[2017] = get_gender_ratios(df, 2017)

In [7]:
df_ratio

|   | 1964 | 2017 |
|---|---|---|
| M | 1.0 | 0.720134 |


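**Notice** that you do not get an `F` row. The row index was created by the first column you set (1964, when every lifter was `M`), and assigning a column with `[]` aligns the new Series to that existing index, so the 2017 `F` value is dropped. One way to fix this is to create the `F` row yourself by subtracting `M` from 1, like this: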

In [8]:
df_ratio.loc['F'] = 1 - df_ratio.loc['M']

In [9]:
df_ratio

|   | 1964 | 2017 |
|---|---|---|
| M | 1.0 | 0.720134 |
| F | 0.0 | 0.279866 |
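The missing `F` row is a consequence of index alignment. Here is a minimal sketch with made-up ratio values (not real results) standing in for `get_gender_ratios` output: the first column assignment fixes the row index, and later `[]` assignments are reindexed to it, so any label not already present is silently dropped.

```python
import pandas as pd

# Made-up ratios standing in for get_gender_ratios(df, year)
ratios_1964 = pd.Series({'M': 1.0})             # only male lifters that year
ratios_2017 = pd.Series({'M': 0.75, 'F': 0.25})

df_ratio = pd.DataFrame()
df_ratio[1964] = ratios_1964   # empty frame adopts this Series' index: ['M']
df_ratio[2017] = ratios_2017   # reindexed to ['M'], so the 'F' value is dropped
print(df_ratio.index.tolist())  # ['M']

# Recover the F row as the complement of M (assumes only M and F appear)
df_ratio.loc['F'] = 1 - df_ratio.loc['M']
print(df_ratio)
```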


You can repeat this in a loop, one year at a time. If you build the table this way, **note** that the years end up as columns, so you will need to transpose (`.T`) the end result in order to graph it (as alluded to in the HW writeup).
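A sketch of that loop, again with made-up per-year ratios in a hypothetical `fake_ratios` dict standing in for calls to `get_gender_ratios(df, year)`. Because each year becomes a column, `.T` turns the years into the row index, which is what `DataFrame.plot()` uses for the x-axis by default.

```python
import pandas as pd

# Made-up per-year ratios; in the notebook these would come from get_gender_ratios(df, year)
fake_ratios = {
    1964: pd.Series({'M': 1.0}),
    2017: pd.Series({'M': 0.75, 'F': 0.25}),
    2018: pd.Series({'M': 0.5, 'F': 0.5}),
}

df_ratio = pd.DataFrame()
for year, ratios in fake_ratios.items():
    df_ratio[year] = ratios                  # one column per year
df_ratio.loc['F'] = 1 - df_ratio.loc['M']    # F row as the complement of M

# Years are columns; transpose so each year is a row and M/F are columns
by_year = df_ratio.T
print(by_year)
# by_year.plot() would then draw one line per sex (needs matplotlib)
```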

### but there is another way ... using `loc` label selection

You can do this entirely with [`loc` selection by label](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#selection-by-label), but you will **need to create the DataFrame columns explicitly** when you initialize the DataFrame:


In [10]:
df_ratio_col = pd.DataFrame(columns=['M', 'F'])
df_ratio_col.loc[2018,:] = get_gender_ratios(df, 2018)

In [11]:
df_ratio_col

|   | M | F |
|---|---|---|
| 2018 | 0.707649 | 0.292351 |
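With `loc` and explicit columns, a row assignment is aligned on **column labels**, so the order in which `value_counts()` happens to return `M` and `F` does not matter. A minimal sketch with made-up values; passing `dtype=float` is an extra choice here (not in the cell above) that keeps the columns numeric instead of `object`.

```python
import pandas as pd

# Explicit columns; dtype=float is an added choice to keep the values numeric
df_ratio_col = pd.DataFrame(columns=['M', 'F'], dtype=float)

# Made-up ratios listed F-first, the way value_counts() might order them
ratios_2018 = pd.Series({'F': 0.25, 'M': 0.75})
df_ratio_col.loc[2018, :] = ratios_2018   # new row; values matched by label, not position

print(df_ratio_col)
```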


In [12]:
df_ratio_col.loc[1964,:] = get_gender_ratios(df, 1964)

In [13]:
df_ratio_col

|   | M | F |
|---|---|---|
| 2018 | 0.707649 | 0.292351 |
| 1964 | 1.0 | NaN |


**NOTICE**: when a year has no entry for one of the columns (here 1964, where `M` = 1.0 and there is no `F` value), that cell is filled with `NaN`. Once your DataFrame is complete, convert these to 0 with [`DataFrame.fillna()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html#pandas.DataFrame.fillna). You will NOT need to transpose this DataFrame, because the years are already the rows and `M`/`F` are the columns.

There are a number of clever ways to fill the DataFrame, but the simplest is to loop over the years and set one row per year.
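One way to write that loop, sketched with made-up ratios (the `fake_ratios` dict is hypothetical and stands in for `get_gender_ratios(df, year)`). After the loop, `fillna(0)` replaces the `NaN` left wherever a sex had no lifters, and `sort_index()`, an optional extra step, puts the years in order.

```python
import pandas as pd

# Made-up per-year ratios; in the notebook these would come from get_gender_ratios(df, year)
fake_ratios = {
    2018: pd.Series({'M': 0.75, 'F': 0.25}),
    1964: pd.Series({'M': 1.0}),   # no F entry, so that cell starts as NaN
}

df_ratio_col = pd.DataFrame(columns=['M', 'F'], dtype=float)
for year, ratios in fake_ratios.items():
    df_ratio_col.loc[year, :] = ratios   # one row per year, aligned on column labels

# Replace missing shares with 0 and order the rows by year
df_ratio_col = df_ratio_col.fillna(0).sort_index()
print(df_ratio_col)
```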