# Let's calculate some championship data.

## Given what we have, it will be interesting to see how year in school, seed in the outdoor lists, and team relate to winning the Big Ten Championship.

### Let's get started by importing our libraries and opening the CSV files.

In [1]:
import pandas as pd

# Initialize empty dictionaries to store the DataFrames
dataframes_lists = {}
dataframes_champs = {}

# Define the range of years
years = range(2010, 2025)

# Read CSV files into dataframes_lists
for year in years:
    if year != 2020:
        filename = f"dataframes_lists_{year}.csv"
        dataframes_lists[year] = pd.read_csv(filename)

# Read CSV files into dataframes_champs
for year in years:
    if year != 2020:
        filename = f"dataframes_champs_{year}.csv"
        dataframes_champs[year] = pd.read_csv(filename)
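The two loops above differ only in the filename prefix, so they could be folded into one helper. The following is a minimal sketch, not the notebook's actual code: `load_yearly_csvs`, its `skip` and `folder` parameters, and the choice to skip missing files with a message (rather than raise, as `pd.read_csv` would) are all assumptions made for illustration.

```python
from pathlib import Path

import pandas as pd


def load_yearly_csvs(prefix, years, skip=(2020,), folder="."):
    """Return {year: DataFrame} for every '<prefix>_<year>.csv' that exists."""
    frames = {}
    for year in years:
        if year in skip:
            continue  # 2020 is skipped, as in the cell above
        path = Path(folder) / f"{prefix}_{year}.csv"
        if not path.exists():
            # Assumption: report and move on instead of raising FileNotFoundError
            print(f"Skipping missing file: {path}")
            continue
        frames[year] = pd.read_csv(path)
    return frames
```

With this in place, the loading step would read `dataframes_lists = load_yearly_csvs("dataframes_lists", range(2010, 2025))`, and likewise for `dataframes_champs`.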


### Let's change the year column to numerical data as we did in improvenshow.py 

In [2]:

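def convert_year(value):
    """Map a class label to a year in school (1 to 4) based on the digit it contains."""
    value = str(value)
    if '1' in value:
        return 1
    elif '2' in value:
        return 2
    elif '3' in value:
        return 3
    else:
        # Anything without a 1, 2, or 3 (including missing values) is treated as year 4
        return 4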

# Iterate through each DataFrame in dataframes_lists and apply the conversion
for year, df in dataframes_lists.items():
    dataframes_lists[year]['Year'] = df['Year'].apply(convert_year)

# Repeat the process for dataframes_champs
for year, df in dataframes_champs.items():
    dataframes_champs[year]['Year'] = df['Year'].apply(convert_year)
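One thing to keep in mind about `convert_year`: its `else` branch returns 4 for anything that does not contain a 1, 2, or 3, so a blank or missing entry is counted as a fourth-year athlete. A stricter variant might look like the sketch below. The name `convert_year_strict` is hypothetical, and the label format in the test values (`"FR-1"`, `"SR-4"`) is an assumption, since the raw labels are not shown here.

```python
import math
import re


def convert_year_strict(value):
    """Return the first digit 1-4 found in the label, or NaN if there is none."""
    match = re.search(r"[1-4]", str(value))
    return int(match.group()) if match else math.nan


# Hypothetical labels, for illustration only
for label in ["FR-1", "SR-4", "N/A"]:
    print(label, "->", convert_year_strict(label))
```

Returning NaN keeps unknown entries out of the counts instead of quietly adding them to year 4.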



### Now let's calculate the share of championships won by each year in school.

In [14]:

# Initialize a dictionary to count championships won by each year
championships_by_year = {}

# Loop through the years in dataframes_champs
for year, champs_df in dataframes_champs.items():
    # Get the champion's year in school from champs_df
    # Assuming the champion is the first row and 'Year' is the second column
    champion_year = champs_df.iloc[0, 1]
    # Count the championship for this year
    if champion_year in championships_by_year:
        championships_by_year[champion_year] += 1
    else:
        championships_by_year[champion_year] = 1
        
# Calculate the total number of championships
total_championships = sum(championships_by_year.values())

# Sort by year in school, descending, then print each year's share of championships
championships_by_year = dict(sorted(championships_by_year.items(), reverse=True))
for year, count in championships_by_year.items():
    chance_of_winning = (count / total_championships) * 100
    print(f"Year {year}: {chance_of_winning:.2f}% of championships won, {count} championships won")



Year 4: 35.71% of championships won, 5 championships won
Year 3: 35.71% of championships won, 5 championships won
Year 2: 14.29% of championships won, 2 championships won
Year 1: 14.29% of championships won, 2 championships won


### Upperclassmen have the edge: third- and fourth-year athletes won 10 of the 14 championships, while first- and second-year athletes won 2 each.
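The dictionary tally above can also be written with pandas itself. The sketch below stacks each season's first row (the champion) into one table and lets `value_counts` do the counting. `demo_champs` is a made-up stand-in for `dataframes_champs`; its athletes, teams, and years are placeholders, not real results.

```python
import pandas as pd

# Hypothetical stand-in for dataframes_champs: one small results table per season
demo_champs = {
    2022: pd.DataFrame({"Athlete": ["A", "B"], "Year": [4, 2], "Team": ["X", "Y"]}),
    2023: pd.DataFrame({"Athlete": ["C", "D"], "Year": [4, 1], "Team": ["Y", "X"]}),
    2024: pd.DataFrame({"Athlete": ["E", "F"], "Year": [3, 3], "Team": ["X", "Z"]}),
}

# Take the first row (the champion) of each season and stack them into one table
champions = pd.concat(
    [df.iloc[[0]].assign(Season=season) for season, df in demo_champs.items()],
    ignore_index=True,
)

# Count champions per year in school, highest year first, and convert to percentages
counts = champions["Year"].value_counts().sort_index(ascending=False)
shares = (counts / counts.sum() * 100).round(2)
summary = pd.DataFrame({"Championships Won": counts, "Share (%)": shares})
print(summary)
```

Having all champions in a single `champions` table also makes it easy to group by `Team` or any other column later.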

### Now let's calculate the share of championships won by each seed in the outdoor lists.

In [16]:

# Initialize a dictionary to count championships won by each seed
championships_by_seed = {}

# Loop through the years in dataframes_champs
for year, champs_df in dataframes_champs.items():
    # Assuming the champion's name is in the first row, first column of champs_df
    champion_name = champs_df.iloc[0, 0]
    list_df = dataframes_lists[year]
    # Find the row in list_df where the name matches the champion's name
    champion_row = list_df[list_df.iloc[:, 0] == champion_name]
    if not champion_row.empty:
        # Assuming the athlete's seed is the index + 1 (since DataFrame indices start at 0)
        champion_seed = champion_row.index[0] + 1 
        # Count the championship for this seed
        if champion_seed in championships_by_seed:
            championships_by_seed[champion_seed] += 1
        else:
            championships_by_seed[champion_seed] = 1

# Calculate the total number of championships
total_championships = sum(championships_by_seed.values())

# Sort by seed, ascending, then print each seed's share of championships
championships_by_seed = dict(sorted(championships_by_seed.items()))
for seed, count in championships_by_seed.items():
    chance_of_winning = (count / total_championships) * 100
    print(f"Seed {seed}: {chance_of_winning:.2f}% chance of winning, {count} championships won")


Seed 1: 35.71% chance of winning, 5 championships won
Seed 2: 35.71% chance of winning, 5 championships won
Seed 3: 14.29% chance of winning, 2 championships won
Seed 5: 7.14% chance of winning, 1 championships won
Seed 11: 7.14% chance of winning, 1 championships won
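The seed lookup relies on the champion's name being spelled identically in both files, and a champion with no match is silently left out of the count. Names in this data are not always typed consistently (the 2011 results include "Chris  Thoman" with a double space and "MIckey DeFilippo" with a stray capital), so it may be worth normalizing names before comparing. This is a minimal sketch: `normalize_name` is a hypothetical helper, and `demo_list` is a made-up seed list whose order is for illustration only.

```python
import pandas as pd


def normalize_name(name):
    """Collapse runs of whitespace and ignore case so near-identical names match."""
    return " ".join(str(name).split()).casefold()


# Hypothetical seed list; the champion's name has an extra space in the results file
demo_list = pd.DataFrame({"Athlete": ["Chris Thoman", "Mickey DeFilippo", "Cody Marshall"]})
champion_name = "Cody  Marshall"

# Compare normalized names instead of raw strings
keys = demo_list["Athlete"].map(normalize_name)
matches = demo_list[keys == normalize_name(champion_name)]

# Seed is the row position + 1, as in the cell above; None flags an unmatched champion
champion_seed = matches.index[0] + 1 if not matches.empty else None
print(champion_seed)
```

Here the totals add up to 14, so every champion was matched, but reporting the `None` case would catch a mismatch in future seasons.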


### Interesting that the 1 seed and the 2 seed have the same number of championships won. 
### Also interesting that an 11 seed won the championship one time. Let's take a look at that championship.

In [5]:
print(dataframes_champs[2011])

             Athlete  Year           Team   Mark
0      Cody Marshall     1      Ohio State  5.05
1      Chris  Thoman     4          Purdue  4.95
2    Austin  DeWildt     2        Michigan  4.95
3       Matthew Bane     1        Illinois  4.95
4     Derik Peterman     1  Michigan State  4.95
5      Zack Saunders     2          Purdue  4.95
6       Ben Peterson     4       Minnesota  4.95
7      Jack Greenlee     2        Michigan  4.95
8     Mitch Erickson     1          Purdue  4.95
9       Jack Szmanda     2       Minnesota  4.80
10        Cody Klein     1        Illinois  4.80
11        Josh Hodur     2        Illinois  4.80
12      Sam Retzloff     1         Indiana  4.80
13     Derek Messmer     3         Indiana  4.80
14       James Nixon     3  Michigan State  0.00
15       Codi Mattix     2  Michigan State  0.00
16     Wes Kavelaris     3       Wisconsin  0.00
17      Japheth Cato     1       Wisconsin  0.00
18  MIckey DeFilippo     3       Wisconsin  0.00
19      Alex Baldwin

### 5.05 wins the Big Ten Championship?? And so many no heights. 

### Ah, I know why. The meet was hosted in Iowa. 

### Let's look at the teams that have the most championships.

In [17]:
# Initialize a dictionary to count championships won by each team
championships_by_team = {}

# Iterate through each year's DataFrame to count championships by team
for year, df in dataframes_champs.items():
    # Assuming the champion is the first row and 'Team' is the third column
    champion_team = df.iloc[0, 2]
    if champion_team in championships_by_team:
        championships_by_team[champion_team] += 1
    else:
        championships_by_team[champion_team] = 1

# Calculate the total number of championships
total_championships = sum(championships_by_team.values())

# Calculate and print the chance of winning for each team
for team, count in championships_by_team.items():
    chance_of_winning = (count / total_championships) * 100
    print(f"Team {team}: {chance_of_winning:.2f}% chance of winning, {count} championships won")


Team Minnesota: 28.57% chance of winning, 4 championships won
Team Ohio State: 7.14% chance of winning, 1 championships won
Team Michigan: 7.14% chance of winning, 1 championships won
Team Indiana: 28.57% chance of winning, 4 championships won
Team Wisconsin: 7.14% chance of winning, 1 championships won
Team Nebraska: 7.14% chance of winning, 1 championships won
Team Michigan State: 14.29% chance of winning, 2 championships won
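The teams print in the order they were first encountered, which makes the leaders harder to spot. One way to rank them is `collections.Counter.most_common`, sketched here with the counts copied from the output above. Using a `Counter` in place of the plain dictionary is a suggestion, not what the notebook does.

```python
from collections import Counter

# Counts copied from the output above
championships_by_team = Counter({
    "Minnesota": 4, "Ohio State": 1, "Michigan": 1, "Indiana": 4,
    "Wisconsin": 1, "Nebraska": 1, "Michigan State": 2,
})

total = sum(championships_by_team.values())

# most_common() returns (team, count) pairs sorted by count, largest first
ranked = championships_by_team.most_common()
for team, count in ranked:
    print(f"{team}: {count} of {total} ({count / total:.1%})")
```

A `Counter` would also shorten the counting loop itself, since `championships_by_team[champion_team] += 1` works without the `if`/`else` check.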


### And so Minnesota and Indiana are the winningest teams in recent Big Ten men's pole vault history. 

### Let's export our data to CSV files.

In [18]:

# Convert the championships_by_team dictionary to a DataFrame
df_teams_championships = pd.DataFrame(list(championships_by_team.items()), columns=['Team', 'Championships Won'])

# Export the DataFrame to a CSV file
df_teams_championships.to_csv('teams_championships_won.csv', index=False)

# Convert the championships_by_year dictionary to a DataFrame
df_championships_by_year = pd.DataFrame(list(championships_by_year.items()), columns=['Year', 'Championships Won'])

# Export the DataFrame to a CSV file
df_championships_by_year.to_csv('championships_by_year.csv', index=False)

# Convert the championships_by_seed dictionary to a DataFrame
df_championships_by_seed = pd.DataFrame(list(championships_by_seed.items()), columns=['Seed', 'Championships Won'])

# Export the DataFrame to a CSV file
df_championships_by_seed.to_csv('championships_by_seed.csv', index=False)
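The three exports follow the same pattern, so a small helper plus a read-back check could replace the repeated code. This is a sketch under assumptions: `export_counts` is a hypothetical helper, the file is written to a temporary folder rather than the working directory, and the seed counts are copied from the output earlier in the notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_counts(counts, label, path):
    """Write a {key: count} dictionary to CSV and return the DataFrame written."""
    df = pd.DataFrame(list(counts.items()), columns=[label, "Championships Won"])
    df.to_csv(path, index=False)
    return df


# Seed counts copied from the output earlier in the notebook
seed_counts = {1: 5, 2: 5, 3: 2, 5: 1, 11: 1}

out_dir = Path(tempfile.mkdtemp())  # temporary folder, for illustration only
csv_path = out_dir / "championships_by_seed.csv"
written = export_counts(seed_counts, "Seed", csv_path)

# Read the file back to confirm it holds what we expect
read_back = pd.read_csv(csv_path)
print(read_back)
```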