# Analysis of Sports Team Performance and Metropolitan Population Correlation

## Project Description
This project explores the relationship between the performance of professional sports teams (Win/Loss Ratio) and the population of their corresponding metropolitan areas. The analysis focuses on teams from the "Big 4" major sports leagues in the United States: NFL (football), MLB (baseball), NBA (basketball), and NHL (hockey).

The data includes:
- Metropolitan regions and associated sports teams, sourced from `wikipedia_data.html`.
- Win/Loss statistics from league-specific files: `nfl.csv`, `mlb.csv`, `nba.csv`, and `nhl.csv`.

Please keep in mind that all questions and operations are from the perspective of the metropolitan region, and that the `wikipedia_data.html` file is the "source of authority" for the location of a given sports team. Teams that are commonly associated with a different city (e.g. the Oakland Raiders) must therefore be mapped to the metropolitan region listed in that file (e.g. the San Francisco Bay Area).
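The sketch below shows one way such overrides could be applied. It assumes a hand-maintained dictionary keyed by the team name as it appears in the league CSV. `METRO_OVERRIDES` and `assign_metro` are hypothetical names and are not part of the original notebook. The single entry mirrors the Oakland Raiders example above.

```python
import pandas as pd

# Hypothetical override table: league-file team name -> metro area as named in wikipedia_data.html
METRO_OVERRIDES = {
    "Oakland Raiders": "San Francisco Bay Area",
}

def assign_metro(teams: pd.DataFrame, auto_metro: pd.Series) -> pd.Series:
    """Return one metro area per team, preferring manual overrides over automatic matches.

    `teams` must have a 'team' column; `auto_metro` holds the automatically matched
    metro area for each row (NaN where matching failed) and shares `teams`' index.
    """
    overrides = teams["team"].map(METRO_OVERRIDES)  # NaN for teams without an override
    return overrides.fillna(auto_metro)             # fall back to the automatic match

teams = pd.DataFrame({"team": ["Oakland Raiders", "Chicago Bears"]})
auto = pd.Series([float("nan"), "Chicago"], index=teams.index)
print(assign_metro(teams, auto).tolist())
# ['San Francisco Bay Area', 'Chicago']
```

Keeping overrides in one explicit table makes every manual decision visible in a single place, rather than scattering `.loc[...] = ...` assignments across cells.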

The primary goal is to calculate the correlation between the Win/Loss ratio (the share of games a team won, taken from each league's W/L% column or computed as W / (W + L)) and the population of the metropolitan area where the team is located. This correlation is measured with the Pearson correlation coefficient (`scipy.stats.pearsonr`), and the analysis is limited to data from the year 2018.
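For reference, Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it by hand and checks it against `scipy.stats.pearsonr`. The population and W/L numbers are made up for illustration and are not taken from the project data.

```python
import numpy as np
from scipy import stats

def pearson_r(x, y):
    """Pearson r: sum of centered cross-products over the root of the product of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Illustrative values only (not project data)
population = [4_000_000, 6_500_000, 2_300_000, 9_000_000]
win_loss = [0.45, 0.52, 0.61, 0.48]

r_manual = pearson_r(population, win_loss)
r_scipy = stats.pearsonr(population, win_loss)[0]
print(np.isclose(r_manual, r_scipy))  # True
```

Because r is scale-invariant, the very different magnitudes of population (millions) and W/L (fractions) do not need to be normalized first.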

For metropolitan areas with multiple teams in the same league, the teams' Win/Loss ratios are averaged to obtain a single value for that city in that sport. Additionally, manual adjustments are made to match sports teams to their metropolitan areas (e.g., mapping the Oakland Raiders to the San Francisco Bay Area).
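The averaging step amounts to a group-by on the metro area. The minimal sketch below uses made-up team names, metro names and numbers. It collapses two teams that share a metro into one W/L value and then aligns the result with population by index.

```python
import pandas as pd

# Illustrative rows: two teams share one metro area in the same league
teams = pd.DataFrame({
    "team": ["Team A", "Team B", "Team C"],
    "metro": ["Metro X", "Metro X", "Metro Y"],
    "W/L": [0.60, 0.40, 0.55],
})
population = pd.Series({"Metro X": 8_000_000, "Metro Y": 2_000_000}, name="population")

# One W/L value per metro area: the mean over that metro's teams
per_metro = teams.groupby("metro")["W/L"].mean()

# Align population and W/L on the metro-area index
combined = pd.concat([population, per_metro], axis=1)
print(combined)
```

Averaging before correlating makes each metro area count once, so a city with three teams does not get three times the weight in the correlation.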

## Key Questions
- How does the Win/Loss ratio of teams in the NFL, MLB, NBA, and NHL correlate with the population of their respective metropolitan areas?
- When a metropolitan area has teams in two different leagues, do those teams perform similarly?

## Notes for Execution
1. The analysis excludes other sports leagues, such as MLS and CFL, to focus solely on the Big 4 leagues.
2. For cities with multiple teams in the same league, the performance metrics are aggregated into a single representative value.
3. Data cleaning and transformation are necessary to match team names and locations to the corresponding metropolitan areas. This may involve external research to address naming inconsistencies.
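As an illustration of this kind of cleaning, the sketch below strips footnote annotations such as `[note 1]` and splits a cell into separate team nicknames. It assumes that a cell may list several nicknames run together without a space (e.g. `GiantsJets`) and that each nickname starts with an uppercase letter or a digit. Under those assumptions, multi-word names like `Red Sox` stay intact, which a plain whitespace `split()` does not guarantee. `split_team_cell` is a hypothetical helper, not part of the original notebook.

```python
import re
from typing import List

FOOTNOTE = re.compile(r"\[[^\]]*\]")              # e.g. "[note 3]"
BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z0-9])")  # a lowercase letter directly followed by uppercase/digit

def split_team_cell(cell: str) -> List[str]:
    """Remove footnotes, then split run-together nicknames at lowercase->uppercase boundaries."""
    cleaned = FOOTNOTE.sub("", cell).strip()
    return [name.strip() for name in BOUNDARY.split(cleaned) if name.strip()]

print(split_team_cell("GiantsJets[note 1]"))  # ['Giants', 'Jets']
print(split_team_cell("Red Sox"))             # ['Red Sox']
print(split_team_cell("49ersRaiders"))        # ['49ers', 'Raiders']
```

Keeping whole nicknames allows stricter matching later (for example, checking that a team name ends with the nickname) instead of testing whether any single word appears anywhere in it.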

In [5]:
# Borrowed from Stack Overflow:
# https://stackoverflow.com/questions/38783027/jupyter-notebook-display-two-pandas-tables-side-by-side
from IPython.display import display_html
from itertools import chain, cycle

def display_side_by_side(*args, titles=cycle([''])):
    """Renders several DataFrames next to each other in a Jupyter notebook."""
    html_str = ''
    for df, title in zip(args, chain(titles, cycle(['</br>']))):
        html_str += '<th style="text-align:center"><td style="vertical-align:top">'
        # html_str += f'<h2 style="text-align: center;">{title}</h2>'
        html_str += df.to_html().replace('table', 'table style="display:inline"')
        html_str += '</td></th>'
    display_html(html_str, raw=True)

In [132]:
import pandas as pd
import numpy as np

def clean_cities_html():
    """
    Cleans the cities HTML file and returns the cleaned DataFrame.
    """
    cities = pd.read_html("assets/wikipedia_data.html")[1]
    # Drop the last row and keep columns 0, 3 and 5 to 8 (metro area, population, Big 4 leagues)
    cities = cities.iloc[:-1, [0, 3, 5, 6, 7, 8]].copy()

    # Rename columns for easier access
    cities.rename(columns={'Metropolitan area': 'Metro. Area', 'Population (2016 est.)[8]': 'Popu. Est.'}, inplace=True)

    # From here on, we only clean the Big 4 columns
    cols = ['NFL', 'MLB', 'NBA', 'NHL']

    # Strip whitespace from string (object) columns; leave columns of any other dtype unchanged
    cities[cols] = cities[cols].apply(lambda col: col.str.strip() if col.dtype == 'object' else col)

    # Step 1: Replace cells containing dashes ('—') with NaN
    cities[cols] = cities[cols].replace(to_replace=r'.*—.*', value=np.nan, regex=True)
    # Step 2: Remove annotations like [note X]
    cities[cols] = cities[cols].replace(to_replace=r'(\w*?)\[[^\]]+\]', value=r'\1', regex=True)
    # Step 3: Replace empty strings and whitespace-only strings with NaN
    cities[cols] = cities[cols].replace(to_replace=r'^\s*$', value=np.nan, regex=True)

    # Finally, return the entire DataFrame
    return cities

## NHL Only
We calculate the correlation between the win/loss ratio of **NHL** teams and the population of the metropolitan area each team plays in, using **2018** data.
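The NHL file reports wins and losses rather than a ratio, so the cell below filters out division header rows, strips the trailing `*` from team names and computes W / (W + L). The sketch replays those steps on a few made-up rows shaped like a standings table. The row contents are assumptions for illustration only. As in the notebook, overtime losses are not included in the ratio.

```python
import pandas as pd

# Made-up rows mimicking a standings table with a division header row
raw = pd.DataFrame({
    "team": ["Atlantic Division", "Team A*", "Team B"],
    "W": [None, "50", "30"],
    "L": [None, "20", "40"],
    "year": [2018, 2018, 2018],
})

# Keep 2018 rows that are actual teams (division header rows end with "Division")
standings = raw[(raw["year"] == 2018) & ~raw["team"].str.endswith("Division")].copy()
standings["team"] = standings["team"].str.replace(r"\*$", "", regex=True)  # drop trailing asterisk
standings[["W", "L"]] = standings[["W", "L"]].astype(int)
# Win/Loss ratio as used in this notebook: wins / (wins + losses); overtime losses are ignored
standings["W/L"] = standings["W"] / (standings["W"] + standings["L"])
print(standings[["team", "W/L"]])
```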

In [155]:
import pandas as pd
import numpy as np
import scipy.stats as stats
import re

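def nhl_correlation():
    """
    Calculates the correlation between metropolitan area population
    and NHL team performance (W/L ratio) for the year 2018.
    """
    # Load NHL data
    nhl_df = pd.read_csv("assets/nhl.csv")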

    def clean_cities_html():
        """
        Cleans the cities data from the HTML file and returns a DataFrame.
        - Renames columns for easier access.
        - Cleans the Big 4 sports columns (NFL, MLB, NBA, NHL).
        - Replaces missing or irrelevant values (e.g., dashes, notes, empty strings).
        """
        cities = pd.read_html("assets/wikipedia_data.html")[1]
        cities = cities.iloc[:-1, [0, 3, 5, 6, 7, 8]]

        # Rename columns for better readability
        cities.rename(columns={'Metropolitan area': 'Metro. Area', 
                               'Population (2016 est.)[8]': 'Popu. Est.'}, inplace=True)
        
        # Specify columns to clean (Big 4 sports leagues)
        cols = ['NFL', 'MLB', 'NBA', 'NHL']

        # Strip whitespace from relevant columns
        cities[cols] = cities[cols].apply(lambda col: col.str.strip() if col.dtype == 'object' else col)

        # Replace cells containing dashes ('—') with NaN
        cities[cols] = cities[cols].replace(to_replace=r'.*—.*', value=np.nan, regex=True)

        # Remove annotations like [note X] from team names
        cities[cols] = cities[cols].replace(to_replace=r'(\w*?)\[[^\]]+\]', value=r'\1', regex=True)

        # Replace empty strings or whitespace-only strings with NaN
        cities[cols] = cities[cols].replace(to_replace=r'^\s*$', value=np.nan, regex=True)

        # Return cleaned DataFrame
        return cities
    
    # Clean cities data and retain only relevant columns (Metro. Area, Population, NHL teams)
    cities = clean_cities_html()[['Metro. Area', 'Popu. Est.', 'NHL']].dropna()

    # Split NHL teams column into lists of team names for easier matching
    cities['NHL'] = cities['NHL'].apply(lambda team: team.split())

    # --- NHL CSV file cleaning ---
    # Filter NHL data for the year 2018 and exclude division header rows
    irrelevant_rows = ['Metropolitan Division', 'Pacific Division', 'Central Division', 'Atlantic Division']
    nhl_df = nhl_df[(nhl_df['year'] == 2018) & (~nhl_df['team'].isin(irrelevant_rows))]

    # Keep only relevant columns: team, W (wins), L (losses); copy to avoid SettingWithCopyWarning
    relevant_cols = ['team', 'W', 'L']
    nhl_df = nhl_df[relevant_cols].copy()

    # Remove trailing asterisks from team names
    nhl_df['team'] = nhl_df['team'].replace(to_replace=r'\*$', value='', regex=True)

    # Calculate Win/Loss ratio for each team: W / (W + L)
    nhl_df[['W', 'L']] = nhl_df[['W', 'L']].astype(int)  # Convert W and L columns to integers
    nhl_df['W/L'] = nhl_df['W'] / (nhl_df['W'] + nhl_df['L'])  # Compute Win/Loss ratio

    # Function to match NHL team names to their corresponding metro area
    def match_team_city(row, sports_name):
        for _, cities_row in cities.iterrows():
            if any(team in row['team'] for team in cities_row[sports_name]):  # Check for team matches
                return cities_row['Metro. Area']
        return np.nan  # Return NaN if no match is found

    # Add a Metro. Area column to NHL data by applying the matching function
    nhl_df['Metro. Area'] = nhl_df.apply(match_team_city, sports_name='NHL', axis=1)

    # Merge NHL data with cities data on Metro. Area and drop unnecessary columns
    by_cities_df = pd.merge(nhl_df, cities, on='Metro. Area').drop(['W', 'L', 'NHL'], axis='columns')

    # Group data by Metro. Area and calculate mean W/L ratio and population estimate
    by_cities_df = by_cities_df.groupby(by='Metro. Area').agg({'Popu. Est.': 'first', 'W/L': 'mean'})

    # Convert population estimates to numeric type
    by_cities_df['Popu. Est.'] = pd.to_numeric(by_cities_df['Popu. Est.'])

    # Extract relevant Series for correlation
    population_by_region = by_cities_df['Popu. Est.']  # Metro area population
    win_loss_by_region = by_cities_df['W/L']         # Average W/L ratio by metro area

    # Assertions for data validation
    assert len(population_by_region) == len(win_loss_by_region), "Q1: Your lists must be the same length"
    assert len(population_by_region) == 28, "Q1: There should be 28 teams being analyzed for NHL"
    
    # Compute and return Pearson correlation coefficient
    return stats.pearsonr(population_by_region, win_loss_by_region)[0]

## NBA Only
We calculate the correlation between the win/loss ratio of **NBA** teams and the population of the metropolitan area each team plays in, using **2018** data.
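The NBA cell below removes a playoff asterisk and a seed in parentheses from team names before matching, as its regex implies. The sketch runs the same pattern on made-up strings of that shape. The seeds shown are illustrative, not taken from `nba.csv`.

```python
import pandas as pd

# Made-up team strings in the shape the NBA cleaning regex targets
teams = pd.Series(["Toronto Raptors* (1)", "Chicago Bulls (13)"])

# Inside a character class '*' is literal, so [\s*]* eats any spaces/asterisks before "(N)"
cleaned = teams.str.replace(r"[\s*]*\(\d+\)", "", regex=True).str.strip()
print(cleaned.tolist())
# ['Toronto Raptors', 'Chicago Bulls']
```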

In [157]:
import pandas as pd
import numpy as np
import scipy.stats as stats
import re

def nba_correlation():
    """
    Calculates the correlation between metropolitan area population 
    and NBA team performance (W/L%) for the year 2018.
    """
    # Load NBA data
    nba_df = pd.read_csv("assets/nba.csv")

    def clean_cities_html():
        """
        Cleans the cities data from the HTML file and returns a DataFrame.
        
        Steps performed:
        - Renames columns for better readability.
        - Cleans the Big 4 sports columns (NFL, MLB, NBA, NHL):
            - Strips whitespace from strings.
            - Replaces cells containing dashes ('—') with NaN.
            - Removes annotations like [note X].
            - Replaces empty or whitespace-only strings with NaN.
        """
        # Load data from the HTML file and select relevant columns
        cities = pd.read_html("assets/wikipedia_data.html")[1]
        cities = cities.iloc[:-1, [0, 3, 5, 6, 7, 8]]

        # Rename columns for clarity
        cities.rename(columns={'Metropolitan area': 'Metro. Area', 
                               'Population (2016 est.)[8]': 'Popu. Est.'}, inplace=True)
        
        # Specify columns to clean (Big 4 sports leagues)
        cols = ['NFL', 'MLB', 'NBA', 'NHL']

        # Step 1: Strip whitespace from all relevant columns
        cities[cols] = cities[cols].apply(lambda col: col.str.strip() if col.dtype == 'object' else col)

        # Step 2: Replace cells containing dashes ('—') with NaN
        cities[cols] = cities[cols].replace(to_replace=r'.*—.*', value=np.nan, regex=True)

        # Step 3: Remove annotations like [note X] from team names
        cities[cols] = cities[cols].replace(to_replace=r'(\w*?)\[[^\]]+\]', value=r'\1', regex=True)

        # Step 4: Replace empty or whitespace-only strings with NaN
        cities[cols] = cities[cols].replace(to_replace=r'^\s*$', value=np.nan, regex=True)

        return cities

    # Step 1: Clean cities data and retain relevant columns
    cities = clean_cities_html()[['Metro. Area', 'Popu. Est.', 'NBA']].dropna()

    # Step 2: Split NBA teams column into lists of team names for easier matching
    cities['NBA'] = cities['NBA'].apply(lambda team: team.split())

    # Step 3: Filter NBA data for the year 2018 (copy to avoid SettingWithCopyWarning)
    nba_df = nba_df[nba_df['year'] == 2018][['team', 'W/L%']].copy()

    # Step 4: Rename 'W/L%' column for consistency
    nba_df.rename(columns={'W/L%': 'W/L'}, inplace=True)

    # Step 5: Clean team names in the NBA dataset
    # Remove any '*' and whitespace before the seed in parentheses, plus the seed itself, e.g. '* (1)'
    nba_df['team'] = nba_df['team'].replace(to_replace=r'[\s*]*\(\d+\)', value='', regex=True).str.strip()


    # Step 6: Match NBA team names to their corresponding metro area
    def match_team_city(row, sports_name):
        """
        Matches a team's name to its corresponding metropolitan area using the cities DataFrame.
        """
        for _, cities_row in cities.iterrows():
            if any(team in row['team'] for team in cities_row[sports_name]):  # Check for team matches
                return cities_row['Metro. Area']
        return np.nan  # Return NaN if no match is found

    # Apply the matching function to add 'Metro. Area' to the NBA DataFrame
    nba_df['Metro. Area'] = nba_df.apply(lambda row: match_team_city(row, sports_name='NBA'), axis=1)

    # Step 7: Merge NBA data with cities data on 'Metro. Area'
    by_cities_df = pd.merge(nba_df, cities, on='Metro. Area')

    # Step 8: Convert population estimates and W/L ratios to numeric types
    by_cities_df['Popu. Est.'] = pd.to_numeric(by_cities_df['Popu. Est.'])
    by_cities_df['W/L'] = pd.to_numeric(by_cities_df['W/L'])

    # Step 9: Group data by 'Metro. Area' and calculate mean W/L ratio and population estimate
    by_cities_df = by_cities_df.groupby(by='Metro. Area').agg({'Popu. Est.': 'first', 'W/L': 'mean'})

    # Step 10: Extract relevant data for correlation
    population_by_region = by_cities_df['Popu. Est.']  # Metro area population
    win_loss_by_region = by_cities_df['W/L']           # Average W/L ratio by metro area

    # Step 11: Validate data length and consistency
    assert len(population_by_region) == len(win_loss_by_region), "Q2: Your lists must be the same length"
    assert len(population_by_region) == 28, "Q2: There should be 28 teams being analyzed for NBA"

    # Step 12: Compute and return Pearson correlation coefficient
    return stats.pearsonr(population_by_region, win_loss_by_region)[0]

## MLB Only
We calculate the correlation between the win/loss ratio of **MLB** teams and the population of the metropolitan area each team plays in, using **2018** data.
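The MLB cell below manually reassigns the Boston Red Sox and Cincinnati Reds after matching. The sketch shows why a substring test on whitespace-split tokens can misassign them when metros are checked in order. It also shows a stricter alternative that matches whole nicknames at the end of the team name. The metro list, its order and its tokenization are hypothetical; `naive_match` and `suffix_match` are illustrative helpers, not the notebook's code.

```python
# Hypothetical metro rows, in the order they are checked
TOKENS = [  # nicknames split on whitespace, as cities[...].split() does
    ("Chicago", ["Cubs", "White", "Sox"]),
    ("Boston", ["Red", "Sox"]),
    ("Cincinnati", ["Reds"]),
]
NICKNAMES = [  # whole nicknames kept intact
    ("Chicago", ["Cubs", "White Sox"]),
    ("Boston", ["Red Sox"]),
    ("Cincinnati", ["Reds"]),
]

def naive_match(team):
    """Substring test like match_team_city: first metro with any token inside the team name."""
    for metro, tokens in TOKENS:
        if any(tok in team for tok in tokens):
            return metro
    return None

def suffix_match(team):
    """Stricter test: the team name must end with a complete nickname."""
    for metro, names in NICKNAMES:
        if any(team.endswith(" " + name) for name in names):
            return metro
    return None

print(naive_match("Boston Red Sox"), naive_match("Cincinnati Reds"))    # Chicago Boston
print(suffix_match("Boston Red Sox"), suffix_match("Cincinnati Reds"))  # Boston Cincinnati
```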

In [161]:
import pandas as pd
import numpy as np
import scipy.stats as stats

def mlb_correlation():
    """
    Calculates the correlation between metropolitan area population 
    and MLB team performance (W/L%) for the year 2018.
    """
    # Load MLB data
    mlb_df = pd.read_csv("assets/mlb.csv")

    def clean_cities_html():
        """
        Cleans the cities data from the HTML file and returns a DataFrame.
        
        Steps performed:
        - Renames columns for better readability.
        - Cleans the Big 4 sports columns (NFL, MLB, NBA, NHL):
            - Strips whitespace from strings.
            - Replaces cells containing dashes ('—') with NaN.
            - Removes annotations like [note X].
            - Replaces empty or whitespace-only strings with NaN.
        """
        # Load data from the HTML file and select relevant columns
        cities = pd.read_html("assets/wikipedia_data.html")[1]
        cities = cities.iloc[:-1, [0, 3, 5, 6, 7, 8]]

        # Rename columns for clarity
        cities.rename(columns={'Metropolitan area': 'Metro. Area', 
                               'Population (2016 est.)[8]': 'Popu. Est.'}, inplace=True)
        
        # Specify columns to clean (Big 4 sports leagues)
        cols = ['NFL', 'MLB', 'NBA', 'NHL']

        # Step 1: Strip whitespace from all relevant columns
        cities[cols] = cities[cols].apply(lambda col: col.str.strip() if col.dtype == 'object' else col)

        # Step 2: Replace cells containing dashes ('—') with NaN
        cities[cols] = cities[cols].replace(to_replace=r'.*—.*', value=np.nan, regex=True)

        # Step 3: Remove annotations like [note X] from team names
        cities[cols] = cities[cols].replace(to_replace=r'(\w*?)\[[^\]]+\]', value=r'\1', regex=True)

        # Step 4: Replace empty or whitespace-only strings with NaN
        cities[cols] = cities[cols].replace(to_replace=r'^\s*$', value=np.nan, regex=True)

        return cities

    # Step 1: Clean cities data and retain relevant columns
    cities = clean_cities_html()[['Metro. Area', 'Popu. Est.', 'MLB']].dropna()

    # Step 2: Split MLB teams column into lists of team names for easier matching
    cities['MLB'] = cities['MLB'].apply(lambda team: team.split())

    # Step 3: Filter MLB data for the year 2018 and retain relevant columns (copy to avoid SettingWithCopyWarning)
    mlb_df = mlb_df[mlb_df['year'] == 2018][['team', 'W-L%']].copy()

    # Step 4: Rename 'W-L%' column for consistency
    mlb_df.rename(columns={'W-L%': 'W/L'}, inplace=True)

    # Step 5: Clean team names in the MLB dataset
    mlb_df['team'] = mlb_df['team'].str.strip()

    # Step 6: Match MLB team names to their corresponding metro area
    def match_team_city(row, sports_name):
        """
        Matches a team's name to its corresponding metropolitan area using the cities DataFrame.
        """
        for _, cities_row in cities.iterrows():
            if any(team in row['team'] for team in cities_row[sports_name]):  # Check for team matches
                return cities_row['Metro. Area']
        return np.nan  # Return NaN if no match is found

    # Apply the matching function to add 'Metro. Area' to the MLB DataFrame
    mlb_df['Metro. Area'] = mlb_df.apply(lambda row: match_team_city(row, sports_name='MLB'), axis=1)

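    # Step 7: Manually overwrite metro areas for teams the substring match can misassign
    # (e.g. Chicago's token 'Sox' also occurs in 'Boston Red Sox', and Boston's token 'Red' occurs in 'Cincinnati Reds')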
    mlb_df.loc[mlb_df['team'] == 'Boston Red Sox', 'Metro. Area'] = 'Boston'
    mlb_df.loc[mlb_df['team'] == 'Cincinnati Reds', 'Metro. Area'] = 'Cincinnati'

    # Step 8: Merge MLB data with cities data on 'Metro. Area'
    by_cities_df = pd.merge(mlb_df, cities, on='Metro. Area')

    # Step 9: Convert population estimates and W/L ratios to numeric types
    by_cities_df['Popu. Est.'] = pd.to_numeric(by_cities_df['Popu. Est.'])
    by_cities_df['W/L'] = pd.to_numeric(by_cities_df['W/L'])

    # Step 10: Group data by 'Metro. Area' and calculate mean W/L ratio and population estimate
    by_cities_df = by_cities_df.groupby(by='Metro. Area').agg({'Popu. Est.': 'first', 'W/L': 'mean'})

    # Step 11: Extract relevant data for correlation
    population_by_region = by_cities_df['Popu. Est.']  # Metro area population
    win_loss_by_region = by_cities_df['W/L']           # Average W/L ratio by metro area

    # Step 12: Validate data length and consistency
    assert len(population_by_region) == len(win_loss_by_region), "Q3: Your lists must be the same length"
    assert len(population_by_region) == 26, "Q3: There should be 26 teams being analyzed for MLB"

    # Step 13: Compute and return Pearson correlation coefficient
    return stats.pearsonr(population_by_region, win_loss_by_region)[0]

## NFL Only
We calculate the correlation between the win/loss ratio of **NFL** teams and the population of the metropolitan area each team plays in, using **2018** data.
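The NFL cell below drops conference/division divider rows (names containing `AFC` or `NFC`), strips `*` and `+` markers from team names and converts the `W-L%` strings to numbers. The sketch replays those steps on made-up rows. The row contents and values are assumptions for illustration, not taken from `nfl.csv`.

```python
import pandas as pd

# Made-up rows mimicking standings with divider rows between divisions
raw = pd.DataFrame({
    "team": ["AFC East", "New England Patriots*", "Miami Dolphins", "NFC North", "Chicago Bears+"],
    "W-L%": [None, ".688", ".438", None, ".750"],
})

rows = raw[~raw["team"].str.contains("AFC|NFC")].copy()                                 # drop divider rows
rows["team"] = rows["team"].str.replace(r"\*|\+", "", regex=True).str.strip()  # drop markers
rows["W/L"] = pd.to_numeric(rows["W-L%"])                                       # ".688" -> 0.688
print(rows[["team", "W/L"]])
```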

In [167]:
import pandas as pd
import numpy as np
import scipy.stats as stats
import re

def nfl_correlation():
    """
    Calculates the correlation between metropolitan area population
    and NFL team performance (W/L%) for the year 2018.
    """
    # Load NFL data
    nfl_df = pd.read_csv("assets/nfl.csv")

    def clean_cities_html():
        """
        Cleans the cities data from the HTML file and returns a DataFrame.
        
        Steps performed:
        - Renames columns for better readability.
        - Cleans the Big 4 sports columns (NFL, MLB, NBA, NHL):
            - Strips whitespace from strings.
            - Replaces cells containing dashes ('—') with NaN.
            - Removes annotations like [note X].
            - Replaces empty or whitespace-only strings with NaN.
        """
        # Load data from the HTML file and select relevant columns
        cities = pd.read_html("assets/wikipedia_data.html")[1]
        cities = cities.iloc[:-1, [0, 3, 5, 6, 7, 8]]

        # Rename columns for clarity
        cities.rename(columns={'Metropolitan area': 'Metro. Area', 
                               'Population (2016 est.)[8]': 'Popu. Est.'}, inplace=True)
        
        # Specify columns to clean (Big 4 sports leagues)
        cols = ['NFL', 'MLB', 'NBA', 'NHL']

        # Step 1: Strip whitespace from all relevant columns
        cities[cols] = cities[cols].apply(lambda col: col.str.strip() if col.dtype == 'object' else col)

        # Step 2: Replace cells containing dashes ('—') with NaN
        cities[cols] = cities[cols].replace(to_replace=r'.*—.*', value=np.nan, regex=True)

        # Step 3: Remove annotations like [note X] from team names
        cities[cols] = cities[cols].replace(to_replace=r'(\w*?)\[[^\]]+\]', value=r'\1', regex=True)

        # Step 4: Replace empty or whitespace-only strings with NaN
        cities[cols] = cities[cols].replace(to_replace=r'^\s*$', value=np.nan, regex=True)

        return cities

    # Step 1: Clean cities data and retain relevant columns
    cities = clean_cities_html()[['Metro. Area', 'Popu. Est.', 'NFL']].dropna()

    # Step 2: Split NFL teams column into lists of team names for easier matching
    cities['NFL'] = cities['NFL'].apply(lambda team: team.split())

    # Step 3: Filter NFL data for the year 2018 and retain relevant columns (copy to avoid SettingWithCopyWarning)
    nfl_df = nfl_df[nfl_df['year'] == 2018][['team', 'W-L%']].copy()

    # Step 4: Rename 'W-L%' column for consistency
    nfl_df.rename(columns={'W-L%': 'W/L'}, inplace=True)

    # Step 4.5: Drop all divider rows that contain names like 'AFC North'
    nfl_df = nfl_df[~nfl_df['team'].str.contains('AFC|NFC')].copy()

    # Step 5: Clean team names in the NFL dataset
    # Remove unnecessary characters such as '*' or '+'
    nfl_df['team'] = nfl_df['team'].replace(to_replace=r'\*|\+', value='', regex=True).str.strip()

    # Step 6: Match NFL team names to their corresponding metro area
    def match_team_city(row, sports_name):
        """
        Matches a team's name to its corresponding metropolitan area using the cities DataFrame.
        """
        for _, cities_row in cities.iterrows():
            if any(team in row['team'] for team in cities_row[sports_name]):  # Check for team matches
                return cities_row['Metro. Area']
        return np.nan  # Return NaN if no match is found

    # Apply the matching function to add 'Metro. Area' to the NFL DataFrame
    nfl_df['Metro. Area'] = nfl_df.apply(lambda row: match_team_city(row, sports_name='NFL'), axis=1)

    # Step 7: Merge NFL data with cities data on 'Metro. Area'
    by_cities_df = pd.merge(nfl_df, cities, on='Metro. Area')

    # Step 8: Convert population estimates and W/L ratios to numeric types
    by_cities_df['Popu. Est.'] = pd.to_numeric(by_cities_df['Popu. Est.'])
    by_cities_df['W/L'] = pd.to_numeric(by_cities_df['W/L'])

    # Step 9: Group data by 'Metro. Area' and calculate mean W/L ratio and population estimate
    by_cities_df = by_cities_df.groupby(by='Metro. Area').agg({'Popu. Est.': 'first', 'W/L': 'mean'})

    # Step 10: Extract relevant data for correlation
    population_by_region = by_cities_df['Popu. Est.']  # Metro area population
    win_loss_by_region = by_cities_df['W/L']           # Average W/L ratio by metro area

    # Step 11: Validate data length and consistency
    assert len(population_by_region) == len(win_loss_by_region), "Q4: Your lists must be the same length"
    assert len(population_by_region) == 29, "Q4: There should be 29 teams being analysed for NFL"

    # Step 12: Compute and return Pearson correlation coefficient
    return stats.pearsonr(population_by_region, win_loss_by_region)[0]

nfl_correlation()

0.004282141436393022

## All Together
In this part I test the idea that **if a metro area has teams in two different sports, those teams perform similarly in their respective sports.** To check this, I run a paired t-test (`ttest_rel` from `scipy.stats`) for every pair of sports. For each pair, the null hypothesis is that the mean difference in W/L ratio between the two sports, across the metro areas that have both, is zero. I want to see for which pairs of sports we can confidently reject that null hypothesis.

For regions with multiple teams in the same sport, I average their performance to get a single value for that sport in that city. For each pair of sports, I only include cities that have teams in both sports being compared and drop all other cities. This part is important because it's worth 20% of the assignment grade.
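A minimal sketch of the pairwise testing follows. It assumes each league's per-metro average W/L is already available as a pandas Series indexed by metro area. `pairwise_paired_ttests` is a hypothetical helper, and the toy values are made up. An inner join keeps only metros present in both leagues, which is what pairs the observations for `ttest_rel`. The result is a symmetric matrix with NaN on the diagonal, matching the layout the assignment's `p_values` DataFrame describes.

```python
import itertools

import numpy as np
import pandas as pd
from scipy import stats

def pairwise_paired_ttests(win_loss: dict) -> pd.DataFrame:
    """Symmetric matrix of paired t-test p-values between leagues (NaN on the diagonal).

    `win_loss` maps a league name to a Series of per-metro W/L values indexed by metro area.
    """
    leagues = list(win_loss)
    p_values = pd.DataFrame(np.nan, index=leagues, columns=leagues)
    for a, b in itertools.combinations(leagues, 2):
        # Keep only metros with teams in both leagues, so each row is a matched pair
        paired = pd.concat([win_loss[a], win_loss[b]], axis=1, join="inner").dropna()
        p = stats.ttest_rel(paired.iloc[:, 0], paired.iloc[:, 1]).pvalue
        p_values.loc[a, b] = p_values.loc[b, a] = p
    return p_values

# Made-up per-metro W/L values, for illustration only
toy = {
    "NFL": pd.Series({"M1": 0.60, "M2": 0.40, "M3": 0.55, "M4": 0.50}),
    "NBA": pd.Series({"M1": 0.52, "M2": 0.45, "M3": 0.61}),
    "NHL": pd.Series({"M2": 0.48, "M3": 0.58, "M4": 0.44}),
}
p = pairwise_paired_ttests(toy)
print(p.round(3))
```

A pair would then be flagged when its p-value falls below the chosen significance level (0.05 is a common convention). Each pair is tested on a different subset of metros, so p-values from different pairs are not directly comparable in sample size.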

In [None]:
import pandas as pd
import numpy as np
import scipy.stats as stats
import re

mlb_df = pd.read_csv("assets/mlb.csv")
nhl_df = pd.read_csv("assets/nhl.csv")
nba_df = pd.read_csv("assets/nba.csv")
nfl_df = pd.read_csv("assets/nfl.csv")
cities = pd.read_html("assets/wikipedia_data.html")[1]
cities = cities.iloc[:-1, [0, 3, 5, 6, 7, 8]]

def sports_team_performance():
    # YOUR CODE HERE
    raise NotImplementedError()
    
    # Note: p_values is a full dataframe, so df.loc["NFL","NBA"] should be the same as df.loc["NBA","NFL"] and
    # df.loc["NFL","NFL"] should return np.nan
    sports = ['NFL', 'NBA', 'NHL', 'MLB']
    p_values = pd.DataFrame({k:np.nan for k in sports}, index=sports)
    
    assert abs(p_values.loc["NBA", "NHL"] - 0.02) <= 1e-2, "The NBA-NHL p-value should be around 0.02"
    assert abs(p_values.loc["MLB", "NFL"] - 0.80) <= 1e-2, "The MLB-NFL p-value should be around 0.80"
    return p_values