# Supplemental Notebook to HW1

Data can be loaded using the `Requests` library in conjunction with the `Beautiful Soup 4` library. Study the code snippets below.

In [1]:
import requests
page = requests.get("https://gitlab.com/openpowerlifting/opl-data/tree/master/meet-data/").content
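
The request above does not check whether the download succeeded. As a minimal sketch, the hypothetical helper `fetch_html` below (the name and the 10-second timeout are assumptions, not part of the homework) wraps the same `requests.get` call and raises an error on a 4xx or 5xx response, so that an error page is not parsed by mistake.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the response body as bytes, raising on HTTP errors.

    `fetch_html` and the timeout value are illustrative choices only.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.content
```

With this helper, the cell above could be written as `page = fetch_html("https://gitlab.com/openpowerlifting/opl-data/tree/master/meet-data/")`.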

`page` contains the raw HTML of interest. In the original HTML of the page on GitLab, every directory entry has `class=str-truncated`. Thus, parsing the page with Beautiful Soup 4 (BS4) and building the list of data directories would look like:

In [2]:
from bs4 import BeautifulSoup
doc = BeautifulSoup(page, "html.parser") # name the parser explicitly to avoid a bs4 warning
lst_fed = [ i['title'] for i in doc.find_all(class_='str-truncated') if '/' not in i['title']]
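
To see what the comprehension keeps and what it drops, here is a minimal sketch run on a hand-written HTML fragment. The fragment is an assumption made up for illustration and is not GitLab's actual markup; only the `str-truncated` class and the `title` attribute come from the text above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page.
sample_html = """
<ul>
  <li><a class="str-truncated" title="apf" href="#">apf</a></li>
  <li><a class="str-truncated" title="docs/README.md" href="#">README.md</a></li>
  <li><a class="str-truncated" title="bb" href="#">bb</a></li>
  <li><a class="other" title="ignored" href="#">ignored</a></li>
</ul>
"""

sample_doc = BeautifulSoup(sample_html, "html.parser")

# Keep elements with the right class whose title has no "/" in it.
names = [
    tag["title"]
    for tag in sample_doc.find_all(class_="str-truncated")
    if "/" not in tag["title"]
]
print(names)
```

The entry with a `/` in its title and the element with a different class are both skipped.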

You can now use the `random` library to randomly select 15 of the federations and grab the 2019 meet data:

In [3]:
import random
random.shuffle(lst_fed) # this randomizes the list in place
lst_fed[:15]

['apf',
 'wpau',
 'xpc',
 'bpu',
 'ireland-ua',
 'raw-iceland',
 'bb',
 'rawu',
 'bpc',
 'naurupf',
 'belpf',
 'chinapa',
 'ukrainepa',
 'achipo',
 'aap']
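
Because `shuffle` gives a different order on every run, your 15 federations will differ from the ones shown here. If you want a repeatable selection, one option is a seeded generator with `sample`, which also leaves the original list untouched. The short list and the seed value below are stand-ins chosen for illustration.

```python
import random

# Stand-in for lst_fed; the real list is built from the scraped page.
federations = ["apf", "wpau", "xpc", "bpu", "bb", "rawu"]

rng = random.Random(42)               # fixed seed: same picks on every run
picked = rng.sample(federations, k=3)  # 3 distinct items, input list unchanged
print(picked)
```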

Now we can iterate over the subdirectories of each randomly selected federation using the same technique as above, keeping only those whose names start with `19`, which mark the 2019 meet data. Once we have those directories, we simply read the `entries.csv` file from each one and proceed accordingly.
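
The filtering and URL-building steps can be tried in isolation before any pages are downloaded. In this sketch the directory names are hypothetical examples, not names read from the repository; the URL template is the one used in the cell below.

```python
template = "https://gitlab.com/openpowerlifting/opl-data/raw/master/meet-data/{}/{}/entries.csv"

# Hypothetical directory titles for one federation.
titles = ["1801", "1902", "1910", "README.md", "2001"]

# Keep only the names that begin with "19".
meets_2019 = [t for t in titles if t.startswith("19")]

# Fill the federation and meet directory into the template.
urls = [template.format("apf", m) for m in meets_2019]
print(urls[0])
```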

In [4]:
import pandas as pd
import time

template = "https://gitlab.com/openpowerlifting/opl-data/raw/master/meet-data/{}/{}/entries.csv"
df = pd.DataFrame()

for f in lst_fed[:15]:

    try:
        page = requests.get("https://gitlab.com/openpowerlifting/opl-data/tree/master/meet-data/{}".format(f)).content
        doc = BeautifulSoup(page, "html.parser") # name the parser explicitly to avoid a bs4 warning
        lst_2019 = [i['title'] for i in doc.find_all(class_='str-truncated') if i['title'][:2] == '19']

        for y in lst_2019:
            sample = template.format(f, y)
            df = pd.concat([df, pd.read_csv(sample)], sort=False)

        print("{} ...".format(f))
        time.sleep(1.5)
    except Exception as e:
        # report which federation failed and why, then move on to the next one
        print("X {} ... ({})".format(f, e))

apf ...
wpau ...
xpc ...
bpu ...
ireland-ua ...
raw-iceland ...
bb ...
rawu ...
bpc ...
naurupf ...
belpf ...
chinapa ...
ukrainepa ...
achipo ...
aap ...
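
Calling `pd.concat` inside the loop copies the growing DataFrame on every pass. A common alternative, sketched here with two made-up in-memory CSV strings instead of real downloads, is to collect the pieces in a list and concatenate once at the end. The sketch also shows why some columns come out mostly empty: files do not all share the same columns.

```python
import io

import pandas as pd

# Made-up CSV contents standing in for two entries.csv files.
csv_texts = [
    "Name,Sex,Age\nA,F,29\nB,M,31\n",
    "Name,Sex,Age,Team\nC,M,29,X\n",
]

# Read each file into its own DataFrame, then concatenate once.
frames = [pd.read_csv(io.StringIO(text)) for text in csv_texts]
combined = pd.concat(frames, sort=False)

# "Team" exists only in the second file, so it is NaN for the first file's rows.
print(combined)
```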


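You will notice that the index needs to be reset with `reset_index`: `pd.concat` keeps each file's original row labels, so the same labels repeat across meets. Something like the following works (pass `drop=True` if you do not want the old index kept as a column):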

```python
df = df.reset_index()
```
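
To see the repeated labels for yourself, here is a small sketch with two toy DataFrames (made-up data, not the meet files). It uses `drop=True`, an optional argument that discards the old labels instead of storing them in a new `index` column.

```python
import pandas as pd

# Two toy frames, each with its own default 0..n-1 index.
first = pd.DataFrame({"Name": ["A", "B"], "Age": [29.0, 31.0]})
second = pd.DataFrame({"Name": ["C", "D"], "Age": [29.0, 40.0]})

stacked = pd.concat([first, second], sort=False)
print(stacked.index.tolist())   # labels repeat: [0, 1, 0, 1]

clean = stacked.reset_index(drop=True)
print(clean.index.tolist())     # unique labels: [0, 1, 2, 3]
```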

Review the original homework writeup to use `plot()` to create the required plots and execute the required queries.

As a brief example of an ad hoc query, the following gets all participants who are 29 years of age:

In [5]:
df.query('Age == 29').head()

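| Unnamed: 0 | Name | Age | Sex | Equipment | Division | BodyweightKg | WeightClassKg | Squat1Kg | Squat2Kg | Squat3Kg | ... | Tested | BirthDate | Team | Squat4Kg | Bench4Kg | Deadlift4Kg | Country | State | CyrillicName | BirthYear |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | Christina Brown | 29.0 | F | Wraps | F_OCR_APF | 66.8 | 67.5 | 132.5 | 140.0 | 145.0 | ... | No | | | | | | | | | |
| 27 | Michael Walker #2 | 29.0 | M | Wraps | M_OCR_APF | 97.07 | 100.0 | 227.5 | 247.5 | -265.0 | ... | | | | | | | | | | |
| 30 | Newin Spencer | 29.0 | M | Wraps | M_OCR_APF | 102.97 | 110.0 | 237.5 | 262.5 | -267.5 | ... | | | | | | | | | | |
| 29 | Eric Couthen | 29.0 | M | Wraps | M_OCR_AAPF | 113.6 | 125.0 | 265.0 | 280.0 | 285.0 | ... | Yes | | | | | | | | | |
| 46 | Jonathan Stroth | 29.0 | M | Raw | M_OR_APF | 93.2 | 100.0 | 170.0 | 180.0 | 185.0 | ... | | | | | | | | | | |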
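
If the string syntax of `query` feels unfamiliar, the same rows can be selected with a boolean mask. This sketch uses a toy DataFrame with made-up rows to show that the two forms agree.

```python
import pandas as pd

# Toy data with the same column names as the meet entries.
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [29.0, 31.5, 29.0],
    "Sex": ["F", "M", "M"],
})

by_query = toy.query("Age == 29")   # expression given as a string
by_mask = toy[toy["Age"] == 29]     # equivalent boolean indexing

print(by_query.equals(by_mask))
```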


Remember that `df.columns` will list all the columns in the dataset.

This example gets the average body weight, in kilograms, of 29-year-old males.

In [6]:
df.query('Age==29 & Sex=="M"').loc[:,'BodyweightKg'].mean() 

99.8363963963964

In [7]:
df.query('Age==29 & Sex=="M"').loc[:,'BodyweightKg'].mean()  * 2.20462262 # go ahead and convert to lbs

220.10157779478197
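
The same calculation can be written so that the filter runs once and the conversion factor has a name. The toy data below is made up; it includes a missing body weight to show that `mean()` skips missing values by default. `KG_TO_LB` is a name introduced here for illustration.

```python
import pandas as pd

KG_TO_LB = 2.20462262  # same factor as in the cell above

toy = pd.DataFrame({
    "Age": [29.0, 29.0, 29.0, 40.0],
    "Sex": ["M", "M", "F", "M"],
    "BodyweightKg": [100.0, None, 60.0, 90.0],  # None becomes NaN
})

men_29 = toy.query('Age == 29 and Sex == "M"')
mean_kg = men_29["BodyweightKg"].mean()  # the NaN row is ignored
mean_lb = mean_kg * KG_TO_LB

print(mean_kg, mean_lb)
```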