# Debate Gender Disparities
## [Samarth Chitgopekar](https://smrth.dev)
*This notebook & its parent repository contain all of the needed data and code to reproduce all pertinent figures in 'Analyzing Gender & Performance In Competitive Environments
With Machine Learning: A High School Debate Case Study' by Samarth Chitgopekar in 2022.*

## Step 1: Imports, Configuration, and Raw-Data Parsing

Define a constant, `YEAR`, that we can change between runs to get results from a different year.

In [795]:
#YEAR = "21-22"
YEAR = "20-21"

Now import our required modules.

In [796]:
import json
import numpy as np
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
import plotly.io as pio
import random
import nltk
from nltk.corpus import names
init_notebook_mode(connected=True)

Read in our year-specific data, the schema for which is a dictionary of [Team](https://docs.tournaments.tech/api-constants-and-schemas)s with a top-level key of `LastName1 & LastName2`.

In [797]:
with open(f"../data/20{YEAR} MASTER.json", 'r') as f:
    data = json.loads(f.read())

## Step 2: Classification Model Training

Use the NLTK Corpus known dataset to train a classifier that'll return one of our constants, `MALE` or `FEMALE` given a name.

In [798]:
MALE: str = "MALE"
FEMALE: str = "FEMALE"

def gender_features(word):
    return {'last_letter':word[-1]}

labeled_names = ([(name, MALE) for name in names.words('male.txt')]+
             [(name, FEMALE) for name in names.words('female.txt')])

random.shuffle(labeled_names)

featuresets = [(gender_features(n), gender)
               for (n, gender)in labeled_names]

train_set, test_set = featuresets[500:], featuresets[:500]

classifier = nltk.NaiveBayesClassifier.train(train_set)

f"Classification Accuracy with known dataset of {round(nltk.classify.accuracy(classifier, train_set)*100, 3)}%"

'Classification Accuracy with known dataset of 76.33%'

Create a helper function that'll determine the gender of an input name using our classifier.

In [799]:
def classifyGender(name: str) -> str:
    """Uses trained dataset ( > 70% acc. ) to classify an input name
    as either male or female.

    Args:
        name (str): the first name to test

    Returns:
        str: either the constants MALE or FEMALE
    """

    return classifier.classify(gender_features(name))

## Step 3: Derive and Structure Pertinent Data

Helper function to derive mean speaker statistics.

In [800]:
def aggregateSpeakerStats(team: dict):
    """Aggregates speaker stats.

    Args:
        team (dict): A dictionary of schema 'Team'

    Returns:
        Tuple | None: Returns a tuple (or None if not available)
        containing the following:
            - float: Raw Speaker Average
            - float: Adjusted Speaker Average with IQR of 2
    """
    c = 0
    raw = 0
    adj = 0

    tournaments = team['tournaments']

    for t in tournaments:
        if len(t['speaks']) < 2: continue
        s1 = t['speaks'][0]
        s2 = t['speaks'][1]
        if not s1['rawAVG'] or not s2['rawAVG']: continue
        raw += s1['rawAVG'] + s2['rawAVG']
        adj += s1['adjAVG'] + s2['adjAVG']
        c += 2

    if c == 0: return None
    return raw/c, adj/c

Iterate over all the teams and collect the number of bids, preliminary & elimination round win percentages, percentage of times made to elimination rounds, OTR Score, and speaker statistics. Add this data to a `MALES` array if both debaters are classified as males, `FEMALES` array if both debaters are classified as females, or `MIXEDS` array for teams classified as having 1 male and 1 female.

In [801]:
MALES = []
FEMALES = []
MIXEDS = []

for team in data:
    splitFullNames = data[team]['tournaments'][0]['fullNames'].split('&')
    if len(splitFullNames) != 2:
        continue

    fullName1, fullName2 = splitFullNames
    firstName1 = fullName1.split(' ')[0]
    firstName2 = fullName2.split(' ')[1]
    gender1 = classifyGender(firstName1)
    gender2 = classifyGender(firstName2)

    try:
        bwp =  data[team]['breakWinPCT']
    except KeyError:
        bwp = None

    teamData = {
        'bids': data[team]['goldBids'] + 0.5 * data[team]['silverBids'],
        'prelimWinPCT': data[team]['prelimWinPCT'],
        'breakWinPCT': bwp,
        'breakPCT': data[team]['breakPCT'],
        'otrScore': data[team]['otrScore'],
        'speakerStats': aggregateSpeakerStats(data[team]) #raw, adj
    }

    if gender1 == MALE and gender2 == MALE:
        MALES.append(teamData)

    elif gender1 == FEMALE and gender2 == FEMALE:
        FEMALES.append(teamData)

    else:
        MIXEDS.append(teamData)

Get averages for all measured statistics.

In [802]:
rawSpeaksM = []
adjSpeaksM = []
prelimWinPCTM = []
elimWinPCTM = []
otrScoreM = []

for team in MALES:
    if team['speakerStats']:
        rawSpeaksM.append(team['speakerStats'][0])
        adjSpeaksM.append(team['speakerStats'][1])

    if team['breakWinPCT']:
        elimWinPCTM.append(team['breakWinPCT'])

    prelimWinPCTM.append(team['prelimWinPCT'])
    otrScoreM.append(team['otrScore'])

avgRawSpeaksM = np.mean(rawSpeaksM)
avgAdjSpeaksM = np.mean(adjSpeaksM)
avgPrelimWinPCTM = np.mean(prelimWinPCTM)
avgElimWinPCTM = np.mean(elimWinPCTM)
avgOtrScoreM = np.mean(otrScoreM)

In [803]:
rawSpeaksF = []
adjSpeaksF = []
prelimWinPCTF = []
elimWinPCTF = []
otrScoreF = []

for team in FEMALES:
    if team['speakerStats']:
        rawSpeaksF.append(team['speakerStats'][0])
        adjSpeaksF.append(team['speakerStats'][1])

    if team['breakWinPCT']:
        elimWinPCTF.append(team['breakWinPCT'])

    prelimWinPCTF.append(team['prelimWinPCT'])
    otrScoreF.append(team['otrScore'])

avgRawSpeaksF = np.mean(rawSpeaksF)
avgAdjSpeaksF = np.mean(adjSpeaksF)
avgPrelimWinPCTF = np.mean(prelimWinPCTF)
avgElimWinPCTF = np.mean(elimWinPCTF)
avgOtrScoreF = np.mean(otrScoreF)

## Step 4: Create & Plot Gender v. Prelim. Win Percentage Data

Create a temporary dependent data 2D array where each subarray matches `[int number of males in category, int number of females in category]`. Then, create an independent data array of all of the different win percentage categories. I use steps of 10. Iterate over the `MALES` and `FEMALES` arrays and add to appropriate positions in the temporary dependent data. Then, iterate over the temporary dependent data and create a new dependent data array where each corresponding index is the percentage of that category dominated by males.

In [804]:
dependentData_p = [
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
]

dependentData = []

independentData = [
    '0-10%',
    '10-20%',
    '20-30%',
    '30-40%',
    '40-50%',
    '50-60%',
    '60-70%',
    '70-80%',
    '80-90%',
    '90-100%'
]

dependentDataM = []
dependentDataF = []


for team in MALES:
    idx = round(team['prelimWinPCT'] * 10) - 1
    try:
        dependentData_p[idx][0] += 1
    except Exception:
        print(idx)

for team in FEMALES:
    idx = round(team['prelimWinPCT'] * 10) - 1
    try:
        dependentData_p[idx][1] += 1
    except Exception:
        print(idx)

for d in dependentData_p:
    maleDominance = d[0]/len(MALES) * 100
    dependentDataM.append(maleDominance)

    femaleDominance = d[1]/len(FEMALES) * 100
    dependentDataF.append(femaleDominance)

Linearize the data and create dependent data for it using `y = mx + b` for each category.

In [805]:
a, b, c, d, e, f, g, h  = np.polyfit([i for i in range(0, 9)], dependentDataM[:-1], 7)

def linear_y(x):
    v = a*x**7 + b*x**6 + c*x**5 + d*x**4 + e*x**3 + f*x**2 + g*x + h
    if v < 0: return 0
    elif v > 100: return 100
    return v

dependentDataLinearM = [linear_y(i) for i in range(0, 10)]

a, b, c, d, e, f, g, h = np.polyfit([i for i in range(0, 9)], dependentDataF[:-1], 7)

def linear_y(x):
    v = a*x**7 + b*x**6 + c*x**5 + d*x**4 + e*x**3 + f*x**2 + g*x + h
    if v < 0: return 0
    elif v > 100: return 100
    return v


dependentDataLinearF = [linear_y(i) for i in range(0, 10)]

Use plotly to plot the data and write it to the disk.

In [806]:
layout = go.Layout(
    #title = f"Avg. Preliminary Round Win Rate: {round(avgPrelimWinPCTM, 2)}% (M), {round(avgPrelimWinPCTF, 2)}% (F)",
    xaxis = {"title": "Preliminary Round Win Rate Ranges"},
    yaxis = {"title": "% of Respective Gender Present In Range"},
)

fig = go.Figure(layout=layout)

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataM,
    mode = 'markers',
    name = '% of All Males In Win Rate Range'
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataF,
    mode = 'markers',
    name = '% of All Females In Win Rate Range'
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataLinearM,
    mode = 'lines',
    name = f"Male 7th Degree Least Squares Polynomial Fit"
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataLinearF,
    mode = 'lines',
    name = f"Female 7th Degree Least Squares Polynomial Fit"
))

fig.update_layout(
    autosize=False,
    width=1500,
    height=600,
    font_size=24,
    font_family="Times New Roman",
    font_color="black",
    title_font_size=24,
    template="plotly_white",
    title = {
        'y':0.95,
        'x':.5,
        'xanchor': 'center',
        'yanchor': 'top'
    }
)

iplot(fig)
fig.write_image(f"../assets/Gender_PrelimWinPCT_{YEAR}.svg")

## Step 5: Create & Plot Gender v. OTR Score Data

Create a temporary dependent data 2D array where each subarray matches `[int number of males in category, int number of females in category]`. Then, create an independent data array of all of the different OTR Score categories. I use steps of 0.2. Iterate over the `MALES` and `FEMALES` arrays and add to appropriate positions in the temporary dependent data. Then, iterate over the temporary dependent data and create a new dependent data array where each corresponding index is the percentage of that category dominated by males.

In [807]:
def getRangeIdx(otrScore: float) -> int:
    """Returns independent data index of OTR score range.

    Args:
        otrScore (float): OTR score

    Returns:
        int: Index of range
    """

    idx = 10

    for i in reversed(range(2, 22, 2)):
        if otrScore > i/10:
            if idx > 9:
                idx = 9
            break
        if idx >= 1:
            idx -= 1
        else:
            break

    return idx

In [808]:
dependentData_p = [
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
]

dependentDataM = []
dependentDataF = []

independentData = [
    '0-0.2',
    '0.2-0.4',
    '0.4-0.6',
    '0.6-0.8',
    '0.8-1.0',
    '1.0-1.2',
    '1.2-1.4',
    '1.6-1.8',
    '1.8-2.0',
    '2.0+'
]

for team in MALES:
    idx = getRangeIdx(team['otrScore'])
    try:
        dependentData_p[idx][0] += 1
    except Exception:
        print(idx)

for team in FEMALES:
    idx = getRangeIdx(team['otrScore'])
    try:
        dependentData_p[idx][1] += 1
    except Exception:
        print(idx)

for d in dependentData_p:
    maleDominance = d[0]/len(MALES) * 100
    dependentDataM.append(maleDominance)

    femaleDominance = d[1]/len(FEMALES) * 100
    dependentDataF.append(femaleDominance)

Linearize the data and create dependent data for it using `y = mx + b` for each category.

In [809]:
# Not includding the last 2.0+ range since it isn't a real-number bounded category
a, b, c, d, e, f, g, h  = np.polyfit([i for i in range(0, 9)], dependentDataM[:-1], 7)

def linear_y(x):
    v = a*x**7 + b*x**6 + c*x**5 + d*x**4 + e*x**3 + f*x**2 + g*x + h
    if v < 0: return 0
    elif v > 100: return 100
    return v

dependentDataLinearM = [linear_y(i) for i in range(0, 10)]

# Not includding the last 2.0+ range since it isn't a real-number bounded category
a, b, c, d, e, f, g, h = np.polyfit([i for i in range(0, 9)], dependentDataF[:-1], 7)

def linear_y(x):
    v = a*x**7 + b*x**6 + c*x**5 + d*x**4 + e*x**3 + f*x**2 + g*x + h
    if v < 0: return 0
    elif v > 100: return 100
    return v


dependentDataLinearF = [linear_y(i) for i in range(0, 10)]

Use plotly to plot the data and write it to the disk.

In [810]:
layout = go.Layout(
    #title = f"Avg. OTR Score: {round(avgOtrScoreM, 2)} (M), {round(avgOtrScoreF, 2)} (F)",
    xaxis = {"title": "OTR Score Ranges"},
    yaxis = {"title": "% of Respective Gender Present In Range"},
)

fig = go.Figure(layout=layout)

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataM,
    mode = 'markers',
    name = '% of All Males In OTR Score Range'
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataF,
    mode = 'markers',
    name = '% of All Females In OTR Score Range'
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataLinearM,
    mode = 'lines',
    name = f"Male 7th Degree Least Squares Polynomial Fit"
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataLinearF,
    mode = 'lines',
    name = f"Female 7th Degree Least Squares Polynomial Fit"
))

fig.update_layout(
    autosize=False,
    width=1500,
    height=600,
    font_size=24,
    font_family="Times New Roman",
    font_color="black",
    title_font_size=24,
    template="plotly_white",
    title = {
        'y':0.95,
        'x':.5,
        'xanchor': 'center',
        'yanchor': 'top'
    }
)


iplot(fig)
fig.write_image(f"../assets/Gender_OTRScore_{YEAR}.svg")

## Step 6: Create & Plot Avg. Raw Speaker Points v. Gender Data

In [811]:
def getRangeIdx(speakerPoints: float) -> int:
    """Returns independent data index of speaker points range.

    Args:
        otrScore (float): Speaker Points

    Returns:
        int: Index of range
    """

    idx = 9

    for i in reversed(range(265, 295, 3)):
        if speakerPoints > i/10:
            break
        if idx >= 1:
            idx -= 1
        else:
            break

    return idx

In [812]:
dependentData_p = [
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
]

dependentDataM = []
dependentDataF = []

independentData = [
    '0-26.8',
    '26.8-27.1',
    '27.1-27.4',
    '27.4-27.7',
    '27.7-28.0',
    '28.0-28.3',
    '28.3-28.6',
    '28.6-28.9',
    '28.9-29.2',
    '29.2+'
]

# Teams w/o speaker stats
MISSING_M = 0
MISSING_F = 0

for team in MALES:
    try:
        idx = getRangeIdx(team['speakerStats'][0])
        dependentData_p[idx][0] += 1
    except Exception:
        MISSING_M += 1
        continue

for team in FEMALES:
    try:
        idx = getRangeIdx(team['speakerStats'][0])
        dependentData_p[idx][1] += 1
    except Exception:
        MISSING_F += 1
        continue

for d in dependentData_p:
    maleDominance = d[0]/(len(MALES) - MISSING_M) * 100
    dependentDataM.append(maleDominance)

    femaleDominance = d[1]/(len(FEMALES) - MISSING_F) * 100
    dependentDataF.append(femaleDominance)

In [813]:
# We can include 29.2+ since it has a real upper bound of 30
a, b, c, d, e, f, g, h  = np.polyfit([i for i in range(0, 10)], dependentDataM, 7)

def linear_y(x):
    v = a*x**7 + b*x**6 + c*x**5 + d*x**4 + e*x**3 + f*x**2 + g*x + h
    if v < 0: return 0
    elif v > 100: return 100
    return v

dependentDataLinearM = [linear_y(i) for i in range(0, 10)]

# We can include 29.2+ since it has a real upper bound of 30
a, b, c, d, e, f, g, h = np.polyfit([i for i in range(0, 10)], dependentDataF, 7)

def linear_y(x):
    v = a*x**7 + b*x**6 + c*x**5 + d*x**4 + e*x**3 + f*x**2 + g*x + h
    if v < 0: return 0
    elif v > 100: return 100
    return v


dependentDataLinearF = [linear_y(i) for i in range(0, 10)]

In [814]:
layout = go.Layout(
    #title = f"Avg. Raw Speaker Points: {round(avgRawSpeaksM, 2)} (M), {round(avgRawSpeaksF, 2)} (F)",
    xaxis = {"title": "Speaker Point Ranges"},
    yaxis = {"title": "% of Respective Gender Present In Range"},
)

fig = go.Figure(layout=layout)

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataM,
    mode = 'markers',
    name = '% of All Males In Speaker Point Range'
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataF,
    mode = 'markers',
    name = '% of All Females In Speaker Point Range'
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataLinearM,
    mode = 'lines',
    name = f"Male 7th Degree Least Squares Polynomial Fit"
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataLinearF,
    mode = 'lines',
    name = f"Female 7th Degree Least Squares Polynomial Fit"
))

fig.update_layout(
    autosize=False,
    width=1500,
    height=600,
    font_size=24,
    font_family="Times New Roman",
    font_color="black",
    title_font_size=24,
    template="plotly_white",
    title = {
        'y':0.95,
        'x':.5,
        'xanchor': 'center',
        'yanchor': 'top'
    }
)


iplot(fig)
fig.write_image(f"../assets/Gender_RawSpeaks_{YEAR}.svg")

## Step 7: Create & Plot Avg. Adj. Speaker Points v. Gender Data

In [815]:
def getRangeIdx(speakerPoints: float) -> int:
    """Returns independent data index of speaker points range.

    Args:
        otrScore (float): Speaker Points

    Returns:
        int: Index of range
    """

    idx = 9

    for i in reversed(range(265, 295, 3)):
        if speakerPoints > i/10:
            break
        if idx >= 1:
            idx -= 1
        else:
            break

    return idx

In [816]:
dependentData_p = [
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
]

dependentDataM = []
dependentDataF = []

independentData = [
    '0-26.8',
    '26.8-27.1',
    '27.1-27.4',
    '27.4-27.7',
    '27.7-28.0',
    '28.0-28.3',
    '28.3-28.6',
    '28.6-28.9',
    '28.9-29.2',
    '29.2+'
]

# Teams w/o speaker stats
MISSING_M = 0
MISSING_F = 0

for team in MALES:
    try:
        idx = getRangeIdx(team['speakerStats'][1])
        dependentData_p[idx][0] += 1
    except Exception:
        MISSING_M += 1
        continue

for team in FEMALES:
    try:
        idx = getRangeIdx(team['speakerStats'][1])
        dependentData_p[idx][1] += 1
    except Exception:
        MISSING_F += 1
        continue

for d in dependentData_p:
    maleDominance = d[0]/(len(MALES) - MISSING_M) * 100
    dependentDataM.append(maleDominance)

    femaleDominance = d[1]/(len(FEMALES) - MISSING_F) * 100
    dependentDataF.append(femaleDominance)

In [817]:
# We can include 29.2+ since it has a real upper bound of 30
a, b, c, d, e, f, g, h  = np.polyfit([i for i in range(0, 10)], dependentDataM, 7)

def linear_y(x):
    v = a*x**7 + b*x**6 + c*x**5 + d*x**4 + e*x**3 + f*x**2 + g*x + h
    if v < 0: return 0
    elif v > 100: return 100
    return v

dependentDataLinearM = [linear_y(i) for i in range(0, 10)]

# We can include 29.2+ since it has a real upper bound of 30
a, b, c, d, e, f, g, h = np.polyfit([i for i in range(0, 10)], dependentDataF, 7)

def linear_y(x):
    v = a*x**7 + b*x**6 + c*x**5 + d*x**4 + e*x**3 + f*x**2 + g*x + h
    if v < 0: return 0
    elif v > 100: return 100
    return v


dependentDataLinearF = [linear_y(i) for i in range(0, 10)]

In [818]:
layout = go.Layout(
    #title = f"Avg. Adj. Speaker Points: {round(avgAdjSpeaksM, 2)} (M), {round(avgAdjSpeaksF, 2)} (F)",
    xaxis = {"title": "Speaker Point Ranges"},
    yaxis = {"title": "% of Respective Gender Present In Range"},
)

fig = go.Figure(layout=layout)

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataM,
    mode = 'markers',
    name = '% of All Males In Speaker Point Range'
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataF,
    mode = 'markers',
    name = '% of All Females In Speaker Point Range'
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataLinearM,
    mode = 'lines',
    name = f"Male 7th Degree Least Squares Polynomial Fit"
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataLinearF,
    mode = 'lines',
    name = f"Female 7th Degree Least Squares Polynomial Fit"
))

fig.update_layout(
    autosize=False,
    width=1500,
    height=600,
    font_size=24,
    font_family="Times New Roman",
    font_color="black",
    title_font_size=24,
    template="plotly_white",
    title = {
        'y':0.95,
        'x':.5,
        'xanchor': 'center',
        'yanchor': 'top'
    }
)


iplot(fig)
fig.write_image(f"../assets/Gender_AdjSpeaks_{YEAR}.svg")

## Step 8: Create & Plot Gender v. Elim. Win Percentage Data

In [819]:
dependentData_p = [
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 0],
]

dependentData = []

independentData = [
    '0-10%',
    '10-20%',
    '20-30%',
    '30-40%',
    '40-50%',
    '50-60%',
    '60-70%',
    '70-80%',
    '80-90%',
    '90-100%'
]

dependentDataM = []
dependentDataF = []

# Teams w/o break stats
MISSING_M = 0
MISSING_F = 0

for team in MALES:
    try:
        idx = round(team['breakWinPCT'] * 10) - 1
        dependentData_p[idx][0] += 1
    except Exception:
        MISSING_M += 1
        continue

for team in FEMALES:
    try:
        idx = round(team['breakWinPCT'] * 10) - 1
        dependentData_p[idx][1] += 1
    except Exception:
        MISSING_F += 1
        continue

for d in dependentData_p:
    maleDominance = d[0]/(len(MALES) - MISSING_M) * 100
    dependentDataM.append(maleDominance)

    femaleDominance = d[1]/(len(FEMALES) - MISSING_F) * 100
    dependentDataF.append(femaleDominance)

In [820]:
a, b, c, d, e, f, g, h  = np.polyfit([i for i in range(0, 9)], dependentDataM[:-1], 7)

def linear_y(x):
    v = a*x**7 + b*x**6 + c*x**5 + d*x**4 + e*x**3 + f*x**2 + g*x + h
    if v < 0: return 0
    elif v > 100: return 100
    return v

dependentDataLinearM = [linear_y(i) for i in range(0, 10)]

a, b, c, d, e, f, g, h = np.polyfit([i for i in range(0, 9)], dependentDataF[:-1], 7)

def linear_y(x):
    v = a*x**7 + b*x**6 + c*x**5 + d*x**4 + e*x**3 + f*x**2 + g*x + h
    if v < 0: return 0
    elif v > 100: return 100
    return v


dependentDataLinearF = [linear_y(i) for i in range(0, 10)]

In [821]:
layout = go.Layout(
   # title = f"Avg. Elimination Round Win Rate: {round(avgElimWinPCTM, 2)}% (M), {round(avgElimWinPCTF, 2)}% (F)",
    xaxis = {"title": "Elimination Round Win Rate Ranges"},
    yaxis = {"title": "% of Respective Gender Present In Range"},
)

fig = go.Figure(layout=layout)

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataM,
    mode = 'markers',
    name = '% of All Males In Win Rate Range'
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataF,
    mode = 'markers',
    name = '% of All Females In Win Rate Range'
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataLinearM,
    mode = 'lines',
    name = f"Male 7th Degree Least Squares Polynomial Fit"
))

fig.add_trace(go.Scatter(
    x = independentData,
    y = dependentDataLinearF,
    mode = 'lines',
    name = f"Female 7th Degree Least Squares Polynomial Fit"
))

fig.update_layout(
    autosize=False,
    width=1500,
    height=600,
    font_size=24,
    font_family="Times New Roman",
    font_color="black",
    title_font_size=24,
    template="plotly_white",
    title = {
        'y':0.95,
        'x':.5,
        'xanchor': 'center',
        'yanchor': 'top'
    }
)


iplot(fig)
fig.write_image(f"../assets/Gender_ElimWinPCT_{YEAR}.svg")