## Technical Workshop

- Team up and apply the concepts that you have learned in pandas to access, manipulate and analyze data
- Tinker around with the dataset to answer a series of questions
- Save your group notebook and submit it using the following file name format: `workshop_group_<#>.ipynb`
- Duration: 1 hour

### Describe Your Team!

What better way to introduce your team than with a simple DataFrame? Create one with the following columns:

- Name
- Year in College
- School

Enter each teammate as a row, then name your DataFrame after your team.

In [113]:
import pandas as pd

team_foo_bar = {
    'Name': ['Christian Jay Baguio', 'Anna Michaella Villanueva', 'Chris Jallaine Mugot'],
    'Year in College': [4, 3, 3],
    'School': ['USTP', 'CLSU', 'USTP']
}

team_foo_bar = pd.DataFrame(team_foo_bar)
team_foo_bar

Unnamed: 0,Name,Year in College,School
0,Christian Jay Baguio,4,USTP
1,Anna Michaella Villanueva,3,CLSU
2,Chris Jallaine Mugot,3,USTP


### Olympics Agency

You and your teammates work in the Data Science department of a renowned sports agency that focuses primarily on the Olympics. Consultants and journalists reach out to your team for key insights to support their upcoming publications and marketing campaigns.

Thankfully, you have access to the relevant datasets and, of course, your newly acquired pandas skills.

<img src="images/olympics_logo.png" alt="Olympics Logo" width="500" />

#### Available Datasets

| **Dataset**      | **Notes**      |
| ------------- | ------------- |
| [olympics_bios.csv](data/olympics_bios.csv) | Contains basic biographical data of Olympians since the 1960s |
| [noc_regions.csv](data/noc_regions.csv) | Contains records of National Olympic Committees |
| [olympics_results.csv](data/olympics_results.csv) | Contains all records of Olympic events and the corresponding results |

In [163]:
olympics_bios = pd.read_csv('./data/olympics_bios.csv')
noc = pd.read_csv('./data/noc_regions.csv')
olympics_results = pd.read_csv('./data/olympics_results.csv')

olympics_results.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 308408 entries, 0 to 308407
Data columns (total 11 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   year        305807 non-null  float64
 1   type        305807 non-null  object 
 2   discipline  308407 non-null  object 
 3   event       308408 non-null  object 
 4   as          308408 non-null  object 
 5   athlete_id  308408 non-null  int64  
 6   noc         308407 non-null  object 
 7   team        121714 non-null  object 
 8   place       283193 non-null  float64
 9   tied        308408 non-null  bool   
 10  medal       44139 non-null   object 
dtypes: bool(1), float64(2), int64(1), object(7)
memory usage: 23.8+ MB
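
The `info()` summary above shows that `year` is stored as `float64` and that several columns contain missing values. As a minimal sketch of how one might inspect and handle this, the example below counts missing values per column and converts `year` to pandas' nullable integer dtype. It uses a small made-up frame (`sample_results` is a hypothetical stand-in, not the real dataset), so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for olympics_results; values are illustrative only.
sample_results = pd.DataFrame({
    'year': [1992.0, 2020.0, np.nan],
    'noc': ['AAA', 'BBB', 'CCC'],
    'medal': [np.nan, np.nan, 'Silver'],
})

# Count missing values in each column.
missing_counts = sample_results.isna().sum()
print(missing_counts)

# Convert year to the nullable integer dtype so missing years stay missing
# instead of forcing the whole column to float.
sample_results['year'] = sample_results['year'].astype('Int64')
print(sample_results.dtypes)
```

The same two steps could be applied to `olympics_results` before filtering by year, assuming every non-missing year is a whole number.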


##### Preview Olympics Biography dataset:

In [117]:
olympics_bios.sample(5)

Unnamed: 0,athlete_id,name,born_date,born_city,born_region,born_country,NOC,height_cm,weight_kg,died_date
36701,36995,Jean Klein,1944-06-22,Créteil,Val-de-Marne,FRA,France,172.0,52.0,2014-12-01
75165,75742,Franziska Rochat-Moser,1966-08-17,Crissier,Vaud,SUI,Switzerland,174.0,54.0,2002-03-07
44785,45127,Lisa Steanes,1958-05-23,,,,Australia,170.0,61.0,
81804,82456,Hedy Stenuf,1922-07-18,Wien (Vienna),Wien,AUT,Austria,,,2010-11-07
127673,130144,Béatrice Edwige,1988-10-03,,,,France,182.0,,


##### Preview NOC Region dataset:

In [119]:
noc.sample(5)

Unnamed: 0,NOC,region,notes
35,CAM,Cambodia,
222,VNM,Vietnam,
62,ESA,El Salvador,
151,NRU,Nauru,
67,FIJ,Fiji,


##### Preview Olympics Results dataset:

In [121]:
olympics_results.sample(5)

Unnamed: 0,year,type,discipline,event,as,athlete_id,noc,team,place,tied,medal
151436,1992.0,Summer,Athletics,"400 metres, Men (Olympic)",Devon Morris,72314,JAM,,5.0,False,
294035,2020.0,Winter,Alpine Skiing (Skiing),"Slalom, Girls (YOG)",Zita Tóth,138618,HUN,,,False,
155399,1948.0,Summer,Athletics,"800 metres, Men (Olympic)",Doug Harris,74310,NZL,,2.0,False,
238987,2014.0,Winter,Speed Skating (Skating),"1,000 metres, Men (Olympic)",Mark Tuitert,110382,NED,,10.0,False,
6463,1992.0,Summer,Baseball (Baseball/Softball),"Baseball, Men (Olympic)",Orlando López,3185,PUR,Puerto Rico,5.0,False,


#### I. Filipino Sporting Triumphs

The Filipino Olympic Committee has commissioned our agency to highlight their athletic achievements since the turn of the millennium. Your task is to compile a list of the unique sports disciplines in which Pinoy athletes have left their mark since the year 2000. In addition, in which of these disciplines did they produce promising results (i.e., win a medal of any type)?

In [183]:
# Start wrangling here

filtered_df = olympics_results[olympics_results['noc'] == 'PHI']
filtered_df = filtered_df[filtered_df['year'] >= 2000]
filtered_df = filtered_df[filtered_df['medal'].notnull()]
filtered_df

Unnamed: 0,year,type,discipline,event,as,athlete_id,noc,team,place,tied,medal
250026,2016.0,Summer,Weightlifting,"Featherweight, Women (Olympic)",Hidilyn Diaz,116263,PHI,,2.0,False,Silver
250027,2020.0,Summer,Weightlifting,"Featherweight, Women (Olympic)",Hidilyn Diaz,116263,PHI,,1.0,False,Gold
304104,2020.0,Summer,Boxing,"Featherweight, Women (Olympic)",Nesthy Petecio,145878,PHI,,2.0,False,Silver
304112,2020.0,Summer,Boxing,"Flyweight, Men (Olympic)",Carlo Paalam,145885,PHI,,2.0,False,Silver
304113,2020.0,Summer,Boxing,"Middleweight, Men (Olympic)",Eumir Marcial,145886,PHI,,3.0,True,Bronze
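
The output above covers the medal-winning rows only. For the first half of the question (the unique disciplines with Filipino entries since 2000), one possible approach is to combine the conditions in a single boolean mask and then call `unique()` on the `discipline` column. The sketch below uses a tiny hypothetical frame (`sample_results`) shaped like `olympics_results`; its rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like olympics_results; not real records.
sample_results = pd.DataFrame({
    'year': [1996.0, 2004.0, 2016.0, 2020.0, 2020.0],
    'discipline': ['Boxing', 'Swimming', 'Weightlifting', 'Boxing', 'Boxing'],
    'noc': ['PHI', 'PHI', 'PHI', 'PHI', 'USA'],
    'medal': ['Silver', np.nan, 'Silver', 'Silver', 'Gold'],
})

# Combine both conditions in one mask; each condition needs its own parentheses.
phi_since_2000 = sample_results[
    (sample_results['noc'] == 'PHI') & (sample_results['year'] >= 2000)
]

# Every discipline with at least one entry since 2000.
all_disciplines = sorted(phi_since_2000['discipline'].unique())

# Only the disciplines with at least one medal of any type.
medal_disciplines = sorted(
    phi_since_2000.loc[phi_since_2000['medal'].notna(), 'discipline'].unique()
)

print(all_disciplines)
print(medal_disciplines)
```

Swapping `sample_results` for `olympics_results` should give both lists for the real data.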


#### II. The Big Five

Sports journalists are crafting a new magazine issue that features __*"The Big Five"*__. They have reached out to your team for information on the global power rankings across the entire history of the Olympics. Which 5 countries have won the most gold medals?

In [217]:
medal = olympics_results[olympics_results['medal'] == 'Gold']['noc']
medal

40        FRA
41        FRA
42        FRA
48        FRA
80        FRA
         ... 
308139    ROC
308141    ROC
308290    SUI
308326    SUI
308340    SUI
Name: noc, Length: 14783, dtype: object

In [73]:
# Start wrangling here
gold_medals = olympics_results[olympics_results['medal'] == 'Gold']

gold_medals_count = gold_medals.groupby('noc')['medal'].count().reset_index(name='gold_count')

top_5_countries = gold_medals_count.sort_values(by='gold_count', ascending=False).head(5)

# Display the results
print(top_5_countries)

     noc  gold_count
107  USA        2717
105  URS        1076
41   GER         817
38   GBR         715
52   ITA         605
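
Note that the counts above are taken per athlete row, so a gold medal in a team event contributes one count for every team member. If the journalists want one gold per event instead, a possible variation is to drop duplicate rows before counting. The sketch below assumes that the combination of `year`, `type`, `event` and `noc` identifies a single result for a country; this has not been checked against the real data (ties or multiple teams from one country could break it). The frame and its NOC codes are hypothetical.

```python
import pandas as pd

# Hypothetical rows: a two-person relay team plus two individual winners.
sample_results = pd.DataFrame({
    'year': [2020.0, 2020.0, 2020.0, 2020.0],
    'type': ['Summer'] * 4,
    'event': ['Relay, Men', 'Relay, Men', '100 metres, Men', '100 metres, Women'],
    'athlete_id': [1, 2, 3, 4],
    'noc': ['AAA', 'AAA', 'BBB', 'BBB'],
    'medal': ['Gold'] * 4,
})

gold = sample_results[sample_results['medal'] == 'Gold']

# One count per athlete row: every relay member is counted separately.
per_athlete = gold['noc'].value_counts()

# One count per (year, type, event, noc): the relay is counted once.
per_event = (
    gold.drop_duplicates(subset=['year', 'type', 'event', 'noc'])['noc']
    .value_counts()
)

print(per_athlete.head(5))
print(per_event.head(5))
```

Which of the two counts is the right one depends on how the magazine defines a gold medal, so it is worth stating the choice alongside the ranking.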


#### III. BMI Influence in Medal Success

Coaching staff from various professional athlete management firms want to understand the correlation between body composition and performance in a given sport. Your team has been asked to provide data on the average BMI class of gold medalists per sports discipline.

##### BMI formula

<img src="images/bmi_formula.png" alt="BMI Formula" width="500" />

##### Standard BMI classification

<img src="images/bmi_class.png" alt="BMI Classification" width="500" />

In [223]:
# Start wrangling here

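# Attach height and weight to each gold-medal row (gold_medals comes from Section II).
gold_medalists_bios = gold_medals.merge(olympics_bios[['athlete_id', 'height_cm', 'weight_kg']], on='athlete_id', how='left')

# BMI = weight (kg) / height (m) squared, so convert centimetres to metres first.
gold_medalists_bios['height_m'] = gold_medalists_bios['height_cm'] / 100

gold_medalists_bios['BMI'] = gold_medalists_bios['weight_kg'] / (gold_medalists_bios['height_m'] ** 2)

# Mean BMI per discipline; athletes with missing height or weight are skipped by mean().
average_bmi_per_sport = gold_medalists_bios.groupby('discipline')['BMI'].mean().reset_index()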

In [225]:
def categorize_bmi(bmi):
    # NOTE: a missing BMI (NaN) fails every comparison below and therefore
    # falls through to the final else branch.
    if bmi < 18.5:
        return 'Underweight'
    elif 18.5 <= bmi < 25.0:
        return 'Normal range'
    elif 25.0 <= bmi < 30.0:
        return 'Overweight'
    elif 30.0 <= bmi < 35.0:
        return 'Obese Class I'
    elif 35.0 <= bmi < 40.0:
        return 'Obese Class II'
    else:
        return 'Obese Class III'

In [227]:
gold_medalists_bios['Weight Status'] = gold_medalists_bios['BMI'].apply(categorize_bmi)

average_bmi_per_sport_with_status = gold_medalists_bios.groupby(['discipline', 'Weight Status'])['BMI'].mean().reset_index()

display(average_bmi_per_sport_with_status)

Unnamed: 0,discipline,Weight Status,BMI
0,3-on-3 Ice Hockey (Ice Hockey),Normal range,21.280900
1,3-on-3 Ice Hockey (Ice Hockey),Obese Class III,
2,3-on-3 Ice Hockey (Ice Hockey),Overweight,25.106333
3,3x3 Basketball (Basketball),Normal range,22.694019
4,3x3 Basketball (Basketball),Obese Class III,
...,...,...,...
254,Wrestling,Normal range,22.286794
255,Wrestling,Obese Class I,32.386913
256,Wrestling,Obese Class II,36.747716
257,Wrestling,Obese Class III,
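
The `Obese Class III` rows with an empty BMI in the output above appear to come from gold medalists whose height or weight is missing: a `NaN` BMI fails every comparison in `categorize_bmi` and lands in the final `else` branch. As one possible alternative, the sketch below computes the mean BMI per discipline first and then classifies that mean with `pd.cut`, which leaves missing values unclassified. The bin edges mirror the thresholds used in `categorize_bmi`; the frame (`sample`) and its discipline names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical gold medalists; discipline C has no usable height data.
sample = pd.DataFrame({
    'discipline': ['A', 'A', 'B', 'B', 'C'],
    'height_cm': [180.0, 170.0, 160.0, np.nan, np.nan],
    'weight_kg': [72.9, 63.58, 79.36, 70.0, 60.0],
})

# BMI = weight (kg) / height (m) squared.
sample['BMI'] = sample['weight_kg'] / (sample['height_cm'] / 100) ** 2

# Same thresholds as categorize_bmi, expressed as bin edges.
bins = [0, 18.5, 25.0, 30.0, 35.0, 40.0, np.inf]
labels = [
    'Underweight', 'Normal range', 'Overweight',
    'Obese Class I', 'Obese Class II', 'Obese Class III',
]

# Mean BMI per discipline; missing BMIs are ignored by mean().
mean_bmi = sample.groupby('discipline')['BMI'].mean().reset_index()

# right=False makes each bin include its lower edge, e.g. 18.5 <= bmi < 25.0.
# A missing mean stays missing instead of being assigned a class.
mean_bmi['Weight Status'] = pd.cut(
    mean_bmi['BMI'], bins=bins, labels=labels, right=False
)

print(mean_bmi)
```

This yields one row per discipline, which is closer to the "average BMI class per sports discipline" that the coaching staff asked for.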
