# Interactive Widget: Back End Code

Throughout this workbook, we used steps from the following web pages to inform our widgets.
- https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Basics.html
- https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20List.html
- https://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html

## Setting Up the Model for the Widget

### Set up the training and testing sets.

In [51]:
# Import necessary data libraries.
from collections import Counter
from imblearn.datasets import fetch_datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from imblearn.pipeline import make_pipeline as make_pipeline_imb
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import NearMiss
from imblearn.metrics import classification_report_imbalanced
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, accuracy_score, classification_report
import numpy as np 
import pandas as pd

In [52]:
# Set up datasets.
master_data_url = 'https://raw.githubusercontent.com/georgetown-analytics/Formula1/main/data/processed/MasterData5.csv'
master_data = pd.read_csv(master_data_url, sep = ',', engine = 'python')
one_hot_url = 'https://raw.githubusercontent.com/georgetown-analytics/Formula1/main/data/processed/OneHot_MasterData5.csv'
one_hot = pd.read_csv(one_hot_url, sep = ',', engine = 'python')

In [3]:
# Drop any nulls.
data_df = one_hot.dropna(axis=0)

In [54]:
# Establish our X (independent) variables.
X = data_df[['grid', 'alt', 'average_lap_time',
       'minimum_lap_time', 'PRCP', 'TAVG', 'TMAX', 'TMIN',
       'country_CompletionStatus_1', 'nationality_CompletionStatus_1',
       'binned_circuits_CompletionStatus_1', "trackType_CompletionStatus_1",
       'country_CompletionStatus_2', 'nationality_CompletionStatus_2',
       'binned_circuits_CompletionStatus_2', "trackType_CompletionStatus_2"]]

In [55]:
# Establish our y (dependent, target) variable.
y = data_df['CompletionStatus']

In [56]:
# Split our data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [57]:
# Import SMOTE so we can deal with our class imbalance.
from imblearn.over_sampling import SMOTE, ADASYN

In [58]:
# Use SMOTE on our X_ and y_train to create X_ and y_resampled.
X_resampled, y_resampled = SMOTE().fit_resample(X_train, y_train)

In [59]:
# Check the balance of our resampled data.
print(sorted(Counter(y_resampled).items()))

[(0, 4405), (1, 4405)]


Above we can see that we've fixed the class imbalance of our training sets.

### Create CSV Files

In order to not have a randomized training set every time someone uses the widget, we'll create CSV files of our training data that we can call back to.

In [60]:
# Use pandas.DataFrame.to_csv to create the CSV file.
X_resampled.to_csv("data/interim/X_resampled_forWidget.csv", index = False)

In [61]:
# Use pandas.DataFrame.to_csv to create the CSV file.
y_resampled.to_csv("data/interim/y_resampled_forWidget.csv", index = False)
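A complementary way to keep the split stable between runs is to fix the random seed. The sketch below uses synthetic data rather than our Formula 1 data; `X_demo`, `y_demo`, and `split_once` are hypothetical names, and the seed value 42 is arbitrary. `SMOTE` accepts a `random_state` argument as well, so the same idea should carry over to the resampling step.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (an assumption, not the Formula 1 data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([0] * 80 + [1] * 20)

def split_once(seed):
    # random_state fixes the shuffle; stratify keeps the class ratio in both sets.
    return train_test_split(
        X_demo, y_demo, test_size=0.2, random_state=seed, stratify=y_demo
    )

first = split_once(42)
second = split_once(42)

# Every returned array should match between the two calls.
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same_split)
```

Saving the CSV files, as we do above, has the added benefit that the widget does not need to rerun the resampling at all.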

Further down, after loading the CSV files above and running our model, we got an error stating `"A column-vector y was passed when a 1d array was expected."` We know that the model worked beforehand, so we need to convert the reloaded `y_resampled` back to the type it had originally.

In [62]:
# What type was y_resampled?
type(y_resampled)

pandas.core.series.Series

The result above says that `y_resampled` used to be pandas.core.series.Series.

### Set Up the Initial Model

Although our work involves several models, we're only using one for now: Logistic Regression. This model will run with the regular `X_test` and `y_test` data.

In [63]:
# Import the necessary data libraries that we'll need for our model.
from sklearn.metrics import f1_score
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split as tts
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import ClassificationReport

In [64]:
# Set up datasets.
X_resampled_url = 'https://raw.githubusercontent.com/georgetown-analytics/Formula1/main/data/interim/X_resampled_forWidget.csv'
X_resampled = pd.read_csv(X_resampled_url, sep = ',', engine = 'python')
y_resampled_url = 'https://raw.githubusercontent.com/georgetown-analytics/Formula1/main/data/interim/y_resampled_forWidget.csv'
y_resampled = pd.read_csv(y_resampled_url, sep = ',', engine = 'python')

In [65]:
# View X_resampled.
X_resampled.head()

| | grid | alt | average_lap_time | minimum_lap_time | PRCP | TAVG | TMAX | TMIN | country_CompletionStatus_1 | nationality_CompletionStatus_1 | binned_circuits_CompletionStatus_1 | trackType_CompletionStatus_1 | country_CompletionStatus_2 | nationality_CompletionStatus_2 | binned_circuits_CompletionStatus_2 | trackType_CompletionStatus_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20 | 678 | 71014.471429 | 68216 | 0.0 | 61.0 | 67.0 | 50.0 | 0.27193 | 0.239583 | 0.24475 | 0.237243 | 0.72807 | 0.760417 | 0.75525 | 0.762757 |
| 1 | 24 | 785 | 91658.782609 | 81085 | 0.0 | 74.0 | 81.0 | 67.0 | 0.261224 | 0.23114 | 0.277588 | 0.237243 | 0.738776 | 0.76886 | 0.722412 | 0.762757 |
| 2 | 16 | 2 | 108154.058824 | 103979 | 0.0 | 57.0 | 78.0 | 42.0 | 0.113636 | 0.239583 | 0.213611 | 0.237243 | 0.886364 | 0.760417 | 0.786389 | 0.762757 |
| 3 | 10 | -7 | 110366.686275 | 106822 | 0.83 | 59.0 | 65.0 | 51.0 | 0.24 | 0.240838 | 0.213611 | 0.287045 | 0.76 | 0.759162 | 0.786389 | 0.712955 |
| 4 | 2 | 678 | 70065.746479 | 67058 | 0.0 | 72.0 | 82.0 | 62.0 | 0.27193 | 0.240838 | 0.24475 | 0.237243 | 0.72807 | 0.759162 | 0.75525 | 0.762757 |


We know from testing the type of `y_resampled` before we brought in the CSV files that `y_resampled` needs to be a series in order for our model to run correctly. We also know from this site (https://datatofish.com/pandas-dataframe-to-series/) how to change a dataframe into a series.

In [66]:
# Change the y_resampled dataframe into a y_resampled series.
y_resampled = y_resampled.squeeze()

In [67]:
# View y_resampled.
y_resampled.head()

0    1
1    1
2    1
3    1
4    1
Name: CompletionStatus, dtype: int64
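To make the DataFrame-versus-Series difference concrete, here is a minimal sketch using a tiny in-memory CSV. The name `csv_text` and its three values are made up for illustration. Passing `"columns"` to `squeeze` is our own choice here: it only collapses the column axis, so a file with a single row would still come back as a Series rather than a scalar.

```python
import io
import pandas as pd

# A tiny stand-in for the saved y file (hypothetical contents).
csv_text = "CompletionStatus\n1\n0\n1\n"

# read_csv always returns a DataFrame, even for a single column.
y_frame = pd.read_csv(io.StringIO(csv_text))

# Collapse the single column into a Series.
y_series = y_frame.squeeze("columns")

print(type(y_frame).__name__, y_frame.shape)
print(type(y_series).__name__, y_series.shape)
```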

In [68]:
# Create the function score_model.
def score_model(X_resampled, y_resampled, X_test, y_test, estimator, **kwargs):
    """
    Test various estimators.
    """
    # Fit the classification model on the resampled training data.
    estimator.fit(X_resampled, y_resampled, **kwargs)  
    
    expected  = y_test
    predicted = estimator.predict(X_test)
    
    # Compute and print F1 (harmonic mean of precision and recall).
    print("{}: {}".format(estimator.__class__.__name__, f1_score(expected, predicted)))

In [69]:
# Run the Logistic Regression model.
score_model(X_resampled, y_resampled, X_test, y_test, LogisticRegression(solver='lbfgs'))

LogisticRegression: 0.6375921375921376
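As a reminder of what the F1 score measures, the sketch below computes it by hand for five made-up labels and compares the result with `f1_score`. The `expected` and `predicted` lists are hypothetical and unrelated to our test set.

```python
from sklearn.metrics import f1_score

# Made-up labels for illustration only.
expected = [1, 1, 1, 0, 0]
predicted = [1, 0, 1, 1, 0]

pairs = list(zip(expected, predicted))
true_positives = sum(1 for e, p in pairs if e == 1 and p == 1)
false_positives = sum(1 for e, p in pairs if e == 0 and p == 1)
false_negatives = sum(1 for e, p in pairs if e == 1 and p == 0)

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

# F1 is the harmonic mean of precision and recall.
f1_manual = 2 * precision * recall / (precision + recall)
f1_library = f1_score(expected, predicted)

print(f1_manual, f1_library)
```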


## Widget Experimentation

### Set Up

In [70]:
# Import necessary data libraries.
import pandas as pd
import os 
import csv
import io
import requests
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import OneHotEncoder
import category_encoders as ce

# The following are for Jupyter Widgets.
import ipywidgets as widgets
from IPython.display import display
from ipywidgets import interact, interactive, fixed, interact_manual
from ipywidgets import FloatSlider

In [71]:
# What columns are in one_hot?
one_hot.columns

Index(['raceId', 'driverId', 'constructorId', 'grid', 'position',
       'positionOrder', 'laps', 'fastestLap', 'rank', 'fastestLapSpeed',
       'familyStatus', 'CompletionStatus', 'year', 'circuitId', 'alt',
       'isHistoric', 'total_lap_time', 'average_lap_time', 'minimum_lap_time',
       'PRCP', 'TAVG', 'TMAX', 'TMIN', 'positionText_CompletionStatus_1',
       'country_CompletionStatus_1', 'trackType_CompletionStatus_1',
       'nationality_CompletionStatus_1',
       'bundled_circuitId_CompletionStatus_1',
       'binned_circuits_CompletionStatus_1', 'positionText_CompletionStatus_2',
       'country_CompletionStatus_2', 'trackType_CompletionStatus_2',
       'nationality_CompletionStatus_2',
       'bundled_circuitId_CompletionStatus_2',
       'binned_circuits_CompletionStatus_2'],
      dtype='object')

In [72]:
# Select the identifiable columns and the columns that are one-hot encoded. Put these into refined_one_hot.
refined_one_hot = one_hot[['raceId', 'driverId',
       'country_CompletionStatus_1', 'trackType_CompletionStatus_1',
       'nationality_CompletionStatus_1',
       'binned_circuits_CompletionStatus_1',
       'country_CompletionStatus_2', 'trackType_CompletionStatus_2',
       'nationality_CompletionStatus_2',
       'binned_circuits_CompletionStatus_2']]

In [73]:
# Check we have the correct columns in refined_one_hot.
refined_one_hot.columns

Index(['raceId', 'driverId', 'country_CompletionStatus_1',
       'trackType_CompletionStatus_1', 'nationality_CompletionStatus_1',
       'binned_circuits_CompletionStatus_1', 'country_CompletionStatus_2',
       'trackType_CompletionStatus_2', 'nationality_CompletionStatus_2',
       'binned_circuits_CompletionStatus_2'],
      dtype='object')

In [74]:
# What columns are in master_data?
master_data.columns

Index(['raceId', 'driverId', 'constructorId', 'grid', 'laps', 'familyStatus',
       'Completion Status', 'year', 'circuitId', 'country', 'alt',
       'isHistoric', 'trackType', 'nationality', 'total_lap_time',
       'average_lap_time', 'minimum_lap_time', 'PRCP', 'TAVG', 'TMAX', 'TMIN',
       'binned_circuits'],
      dtype='object')

In [75]:
# Select the identifiable columns and the columns that will be one-hot encoded. Put these into refined_master.
refined_master = master_data[['raceId', 'driverId', 'country', 'trackType', 'nationality', 'binned_circuits']]

In [76]:
# Check we have the correct columns in refined_master.
refined_master.columns

Index(['raceId', 'driverId', 'country', 'trackType', 'nationality',
       'binned_circuits'],
      dtype='object')

In [77]:
# Merge refined_one_hot with refined_master by "raceId" and "driverId" to get refined_total.
refined_total = pd.merge(refined_master, refined_one_hot, on = ["raceId", "driverId"])
refined_total.head()

| | raceId | driverId | country | trackType | nationality | binned_circuits | country_CompletionStatus_1 | trackType_CompletionStatus_1 | nationality_CompletionStatus_1 | binned_circuits_CompletionStatus_1 | country_CompletionStatus_2 | trackType_CompletionStatus_2 | nationality_CompletionStatus_2 | binned_circuits_CompletionStatus_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | Australia | 2 | German | Tier2 | 0.351812 | 0.287045 | 0.209566 | 0.277588 | 0.648188 | 0.712955 | 0.790434 | 0.722412 |
| 1 | 1 | 3 | Australia | 2 | German | Tier2 | 0.351812 | 0.287045 | 0.209566 | 0.277588 | 0.648188 | 0.712955 | 0.790434 | 0.722412 |
| 2 | 1 | 4 | Australia | 2 | Spanish | Tier2 | 0.351812 | 0.287045 | 0.23114 | 0.277588 | 0.648188 | 0.712955 | 0.76886 | 0.722412 |
| 3 | 1 | 6 | Australia | 2 | Japanese | Tier2 | 0.351812 | 0.287045 | 0.361371 | 0.277588 | 0.648188 | 0.712955 | 0.638629 | 0.722412 |
| 4 | 1 | 7 | Australia | 2 | French | Tier2 | 0.351812 | 0.287045 | 0.258394 | 0.277588 | 0.648188 | 0.712955 | 0.741606 | 0.722412 |
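A merge on two keys can silently multiply rows if a key pair appears more than once on either side. The sketch below, on toy frames named `left`, `right`, and `dup` (all hypothetical), uses the `validate` argument of `pd.merge` to check that each `raceId`/`driverId` pair is unique. We have not run this check against the real data.

```python
import pandas as pd

keys = ["raceId", "driverId"]

# Toy stand-ins for refined_master and refined_one_hot.
left = pd.DataFrame({
    "raceId": [1, 1, 2],
    "driverId": [2, 3, 2],
    "country": ["Australia", "Australia", "Malaysia"],
})
right = pd.DataFrame({
    "raceId": [1, 1, 2],
    "driverId": [2, 3, 2],
    "country_CompletionStatus_1": [0.351812, 0.351812, 0.260759],
})

# validate="one_to_one" raises MergeError if a key pair repeats on either side.
merged = pd.merge(left, right, on=keys, validate="one_to_one")
print(merged.shape)

# Adding a repeated key pair makes the same call fail loudly.
dup = pd.concat([right, right.iloc[[0]]], ignore_index=True)
try:
    pd.merge(left, dup, on=keys, validate="one_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)
```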


### Working with the Data in the Input Columns

In [78]:
# What features are in X_resampled and will therefore be required for our widget?
X_resampled.columns

Index(['grid', 'alt', 'average_lap_time', 'minimum_lap_time', 'PRCP', 'TAVG',
       'TMAX', 'TMIN', 'country_CompletionStatus_1',
       'nationality_CompletionStatus_1', 'binned_circuits_CompletionStatus_1',
       'trackType_CompletionStatus_1', 'country_CompletionStatus_2',
       'nationality_CompletionStatus_2', 'binned_circuits_CompletionStatus_2',
       'trackType_CompletionStatus_2'],
      dtype='object')

As shown above, with slight changes to account for the one-hot encoding, we'll have to ask users to choose a grid position, altitude, average lap time, minimum lap time, precipitation, temperatures (average, maximum, and minimum), country, nationality, circuit, and track type. Inside the function, we will convert the country, nationality, circuit, and track type selections to match their one-hot encoding. Because there are so many options, though, we will only offer a few choices for each of these. Track type will be the only one-hot encoded feature that shows all possible choices, as there are only two to begin with.

In [79]:
# What are the most popular nationalities?
refined_total[["nationality", "nationality_CompletionStatus_1", "nationality_CompletionStatus_2"]].value_counts()

nationality    nationality_CompletionStatus_1  nationality_CompletionStatus_2
German         0.209566                        0.790434                          1561
British        0.240838                        0.759162                          1314
Brazilian      0.292359                        0.707641                           871
Finnish        0.206369                        0.793631                           776
French         0.258394                        0.741606                           663
Italian        0.317841                        0.682159                           657
Spanish        0.231140                        0.768860                           612
Australian     0.209360                        0.790640                           397
Japanese       0.361371                        0.638629                           308
Dutch          0.316901                        0.683099                           281
Canadian       0.278810                        0.721190       

The most popular nationalities of drivers are German, British, and Brazilian.

In [80]:
# What are the most popular countries?
refined_total[["country", "country_CompletionStatus_1", "country_CompletionStatus_2"]].value_counts()

country     country_CompletionStatus_1  country_CompletionStatus_2
Italy       0.279099                    0.720901                      786
Germany     0.291429                    0.708571                      690
Spain       0.219697                    0.780303                      651
UK          0.220339                    0.779661                      523
Hungary     0.229446                    0.770554                      513
Monaco      0.351562                    0.648438                      496
Japan       0.217039                    0.782961                      485
Brazil      0.261224                    0.738776                      474
Canada      0.334737                    0.665263                      453
Australia   0.351812                    0.648188                      449
Belgium     0.230088                    0.769912                      441
Malaysia    0.260759                    0.739241                      389
Bahrain     0.177112                    0.822

The most popular countries are Italy, Germany, and Spain.

In [81]:
# What are the most popular binned circuits?
refined_total[["binned_circuits", "binned_circuits_CompletionStatus_1", "binned_circuits_CompletionStatus_2"]].value_counts()

binned_circuits  binned_circuits_CompletionStatus_1  binned_circuits_CompletionStatus_2
Tier1            0.253451                            0.746549                              2558
Tier2            0.277588                            0.722412                              2262
Tier3            0.235686                            0.764314                              1771
Tier4            0.244750                            0.755250                              1218
Tier5            0.213611                            0.786389                              1030
Tier6            0.223529                            0.776471                               419
dtype: int64

The most popular binned circuits are Tier1, Tier2, and Tier3.

In [88]:
# How was trackType one-hot encoded?
refined_total[["trackType", "trackType_CompletionStatus_1", "trackType_CompletionStatus_2"]].value_counts()

trackType  trackType_CompletionStatus_1  trackType_CompletionStatus_2
0          0.237243                      0.762757                        7070
2          0.287045                      0.712955                        2188
dtype: int64
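The encoded values shown in the outputs above could also be collected programmatically instead of being copied by hand. The following is a minimal sketch on a toy frame: `toy_total` and `encoding_lookup` are hypothetical names, and the sketch assumes each category maps to exactly one pair of encoded values, as the outputs above suggest.

```python
import pandas as pd

# Toy stand-in for refined_total, using three nationality rows from the output above.
toy_total = pd.DataFrame({
    "nationality": ["German", "German", "British", "Brazilian"],
    "nationality_CompletionStatus_1": [0.209566, 0.209566, 0.240838, 0.292359],
    "nationality_CompletionStatus_2": [0.790434, 0.790434, 0.759162, 0.707641],
})

def encoding_lookup(frame, feature):
    """Map each category of `feature` to its two encoded values."""
    cols = [feature, f"{feature}_CompletionStatus_1", f"{feature}_CompletionStatus_2"]
    # Keep one row per category, then index by the category name.
    unique_rows = frame[cols].drop_duplicates().set_index(feature)
    return unique_rows.to_dict(orient="index")

nationality_codes = encoding_lookup(toy_total, "nationality")
print(nationality_codes["German"])
```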

In [90]:
# What minimum and maximum numbers will we have to allow for in our input columns?
X_resampled.describe()

| | grid | alt | average_lap_time | minimum_lap_time | PRCP | TAVG | TMAX | TMIN | country_CompletionStatus_1 | nationality_CompletionStatus_1 | binned_circuits_CompletionStatus_1 | trackType_CompletionStatus_1 | country_CompletionStatus_2 | nationality_CompletionStatus_2 | binned_circuits_CompletionStatus_2 | trackType_CompletionStatus_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 8810.0 | 8810.0 | 8810.0 | 8810.0 | 8810.0 | 8810.0 | 8810.0 | 8810.0 | 8810.0 | 8810.0 | 8810.0 | 8810.0 | 8810.0 | 8810.0 | 8810.0 | 8810.0 |
| mean | 10.919864 | 182.187514 | 99737.115665 | 91672.348014 | 0.153843 | 69.569506 | 78.066776 | 61.450084 | 0.241976 | 0.24269 | 0.248192 | 0.251982 | 0.758024 | 0.75731 | 0.751808 | 0.748018 |
| std | 5.713557 | 288.710481 | 18197.500233 | 12762.446022 | 0.417893 | 8.757608 | 9.327086 | 9.34014 | 0.060899 | 0.046514 | 0.019098 | 0.021369 | 0.060899 | 0.046514 | 0.019098 | 0.021369 |
| min | 0.0 | -7.0 | 62932.344828 | 55404.0 | 0.0 | 49.0 | 56.0 | 36.0 | 0.1 | 0.078947 | 0.213611 | 0.237243 | 0.648188 | 0.6 | 0.722412 | 0.712955 |
| 25% | 6.0 | 10.0 | 86016.985797 | 80717.0 | 0.0 | 62.694237 | 70.0 | 55.0 | 0.217039 | 0.209566 | 0.235686 | 0.237243 | 0.723534 | 0.723092 | 0.741868 | 0.722256 |
| 50% | 11.0 | 75.0 | 98747.952296 | 90706.0 | 0.0 | 69.983245 | 78.0 | 61.776486 | 0.230088 | 0.240838 | 0.247582 | 0.237243 | 0.769912 | 0.759162 | 0.752418 | 0.762757 |
| 75% | 15.0 | 228.0 | 109138.300678 | 100598.5 | 0.11 | 76.0 | 85.0 | 67.815465 | 0.276466 | 0.276908 | 0.258132 | 0.277744 | 0.782961 | 0.790434 | 0.764314 | 0.762757 |
| max | 24.0 | 2227.0 | 213946.550725 | 122930.0 | 6.3 | 94.2 | 102.0 | 88.4 | 0.351812 | 0.4 | 0.277588 | 0.287045 | 0.9 | 0.921053 | 0.786389 | 0.762757 |


- grid has a min of 0 and a max of 24.
- alt has a min of -7.0 and a max of 2227.0.
- average_lap_time has a min of 62932.344828 and a max of 213946.550725.
- minimum_lap_time has a min of 55404.000000 and a max of 122930.000000.
- PRCP has a min of 0.0 and a max of 6.3.
- TAVG has a min of 49.0 and a max of 94.2.
- TMAX has a min of 56.0 and a max of 102.0.
- TMIN has a min of 36.0 and a max of 88.4.
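Rather than reading the bounds off the `describe()` output by eye, they can be pulled out in code. This is a small sketch on a toy frame (`toy_X` and `widget_bounds` are hypothetical names) with three columns whose minimum and maximum values match the list above.

```python
import pandas as pd

# Toy stand-in for X_resampled; the first and last rows hold the min and max values listed above.
toy_X = pd.DataFrame({
    "grid": [0, 11, 24],
    "alt": [-7.0, 75.0, 2227.0],
    "PRCP": [0.0, 0.0, 6.3],
})

# One row per feature, with "min" and "max" columns.
bounds = toy_X.agg(["min", "max"]).T

# A dictionary of (min, max) pairs that could feed the widget's min/max arguments.
widget_bounds = {name: (row["min"], row["max"]) for name, row in bounds.iterrows()}
print(widget_bounds)
```

In the widgets further down we deliberately use wider, rounder limits than these observed values, so that users can enter values slightly outside the training range.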

### Building the Widget

Because the final widget's function will have a lot of code in it, we're going to slowly build the function one step at a time. These steps include:
1. Building a dropdown widget connected to a function containing an elif statement. This statement will change the display depending on what the user selects in the dropdown menu.
2. Building four dropdown widgets that all connect to the same function. Each widget connects to a different elif or if-else statement within that function, and each elif or if-else statement changes its own display.
3. Using the build from the prior widget, each elif or if-else statement changes the one-hot encoding for the connected dropdown menu. Each one-hot encoding number is placed in a new dataframe, which is displayed with the dropdown menus.
4. Using the build from the prior widget, we add all of the numeric columns that did not have to be one-hot encoded. These are not based on dropdown menus, but are instead bounded text boxes (both int and float). These are also placed in the dataframe, as well as displayed separately.
5. Using the build from the prior widget, we stop displaying the numeric features. We also add a modeling function that predicts whether a car will finish the race or not, based on the features that users input through the widget. Finally, we use an if-else statement to print a car's predicted outcome.

These steps are enacted below.

In [31]:
"""
Establish function "nationality" which allows selection of three nationalities, then returns a country.
"""
def nationality(nationality):
    # Use an elif statement to determine the output country name based on the input nationality.
    if nationality == "German":
        countryname = "Germany"
    elif nationality == "British":
        countryname = "England"
    else:
        countryname = "Brazil"
    display(countryname)

# Create a widget that will interact with the nationality function.
interact(nationality, nationality = widgets.Dropdown(options = ["German", "British", "Brazilian"], value = "German"));


interactive(children=(Dropdown(description='nationality', options=('German', 'British', 'Brazilian'), value='G…

In [99]:
"""
Establish function "fourreturn" which allows selection of three nationalities,
countries, and circuit tiers, then returns a country, language, and number.
It also includes a selection of two track types, which returns a type number.
"""
def fourreturn(nationality, country, circuit, trackType):
    # Use an elif statement to determine the output country name based on the input nationality.
    if nationality == "German":
        countryname = "Germany"
    elif nationality == "British":
        countryname = "Great Britain"
    else:
        countryname = "Brazil"
    display(countryname)
    
    # Use an elif statement to determine the output language based on the input country.
    if country == "Italy":
        language = "Italian"
    elif country == "Germany":
        language = "German"
    else:
        language = "Spanish"
    display(language)
    
    # Use an elif statement to determine the output number based on the input circuit.
    if circuit == "Tier1":
        number = "1"
    elif circuit == "Tier2":
        number = "2"
    else:
        number = "3"
    display(number)
    
    # Use an if-else statement to determine the output typetrack based on the input track.
    if trackType == "race":
        typetrack = "type0"
    else:
        typetrack = "type2"
    display(typetrack)
    
# Create a widget that will interact with the fourreturn function.
interact(fourreturn, nationality = widgets.Dropdown(options = ["German", "British", "Brazilian"], value = "German", description = "Nationality"),
         country = widgets.Dropdown(options = ["Italy", "Germany", "Spain"], value = "Italy", description = "Country"),
         circuit = widgets.Dropdown(options = ["Tier1", "Tier2", "Tier3"], value = "Tier1", description = "Circuit"),
         trackType = widgets.Dropdown(options = ["race", "street"], value = "race", description = "Track Type"));

interactive(children=(Dropdown(description='Nationality', options=('German', 'British', 'Brazilian'), value='G…

In the function below we create a single row dataframe using this site (https://www.geeksforgeeks.org/different-ways-to-create-pandas-dataframe/).

In [102]:
"""
Establish function "onehot" which allows selection of three nationalities,
countries, and circuit tiers, then inputs them into dataframe input_df and returns the dataframe.
It also allows the selection of two track types, and inputs that selection into the dataframe as well.
"""
def onehot(nationality, country, circuit, trackType):
    # Use an elif statement to determine the output one-hot encoding based on the input nationality.
    if nationality == "German":
        nationality_CompletionStatus_1 = 0.209566
        nationality_CompletionStatus_2 = 0.790434
    elif nationality == "British":
        nationality_CompletionStatus_1 = 0.240838
        nationality_CompletionStatus_2 = 0.759162
    else:
        nationality_CompletionStatus_1 = 0.292359
        nationality_CompletionStatus_2 = 0.707641
    
    # Use an elif statement to determine the output one-hot encoding based on the input country.
    if country == "Italy":
        country_CompletionStatus_1 = 0.279099
        country_CompletionStatus_2 = 0.720901
    elif country == "Germany":
        country_CompletionStatus_1 = 0.291429
        country_CompletionStatus_2 = 0.708571
    else:
        country_CompletionStatus_1 = 0.219697
        country_CompletionStatus_2 = 0.780303
    
    # Use an elif statement to determine the output one-hot encoding based on the input circuit.
    if circuit == "Tier1":
        binned_circuits_CompletionStatus_1 = 0.253451
        binned_circuits_CompletionStatus_2 = 0.746549
    elif circuit == "Tier2":
        binned_circuits_CompletionStatus_1 = 0.277588
        binned_circuits_CompletionStatus_2 = 0.722412
    else:
        binned_circuits_CompletionStatus_1 = 0.235686
        binned_circuits_CompletionStatus_2 = 0.764314
    
    # Use an if-else statement to determine the output one-hot encoding based on the input track.
    if trackType == "race":
        trackType_CompletionStatus_1 = 0.237243
        trackType_CompletionStatus_2 = 0.762757
    else:
        trackType_CompletionStatus_1 = 0.287045
        trackType_CompletionStatus_2 = 0.712955
    
    # Establish the data of our input_df dataframe.
    inputdata = [[nationality_CompletionStatus_1, nationality_CompletionStatus_2,
                country_CompletionStatus_1, country_CompletionStatus_2,
                binned_circuits_CompletionStatus_1, binned_circuits_CompletionStatus_2,
                trackType_CompletionStatus_1, trackType_CompletionStatus_2]]
    
    # Establish the dataframe input_df itself with pd.DataFrame.
    input_df = pd.DataFrame(inputdata, columns = ["nationality_CompletionStatus_1", "nationality_CompletionStatus_2",
                "country_CompletionStatus_1", "country_CompletionStatus_2",
                "binned_circuits_CompletionStatus_1", "binned_circuits_CompletionStatus_2",
                "trackType_CompletionStatus_1", "trackType_CompletionStatus_2"])
    
    return(input_df)
    
# Create a widget that will interact with the onehot function.
interact(onehot, nationality = widgets.Dropdown(options = ["German", "British", "Brazilian"], value = "German", description = "Nationality"),
         country = widgets.Dropdown(options = ["Italy", "Germany", "Spain"], value = "Italy", description = "Country"),
         circuit = widgets.Dropdown(options = ["Tier1", "Tier2", "Tier3"], value = "Tier1", description = "Circuit"),
         trackType = widgets.Dropdown(options = ["race", "street"], value = "race", description = "Track Type"));

interactive(children=(Dropdown(description='Nationality', options=('German', 'British', 'Brazilian'), value='G…
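The chains of if/elif/else statements above could also be expressed as lookup tables. The sketch below is one possible alternative, not the version we use in the rest of this workbook. `ENCODINGS` and `onehot_from_tables` are hypothetical names, and the numbers are the same encoded values used in `onehot`.

```python
import pandas as pd

# (CompletionStatus_1, CompletionStatus_2) pairs for each selectable option.
ENCODINGS = {
    "nationality": {
        "German": (0.209566, 0.790434),
        "British": (0.240838, 0.759162),
        "Brazilian": (0.292359, 0.707641),
    },
    "country": {
        "Italy": (0.279099, 0.720901),
        "Germany": (0.291429, 0.708571),
        "Spain": (0.219697, 0.780303),
    },
    "binned_circuits": {
        "Tier1": (0.253451, 0.746549),
        "Tier2": (0.277588, 0.722412),
        "Tier3": (0.235686, 0.764314),
    },
    "trackType": {
        "race": (0.237243, 0.762757),
        "street": (0.287045, 0.712955),
    },
}

def onehot_from_tables(nationality, country, circuit, trackType):
    """Build a single-row dataframe of encoded values from the selections."""
    choices = {
        "nationality": nationality,
        "country": country,
        "binned_circuits": circuit,
        "trackType": trackType,
    }
    row = {}
    for feature, choice in choices.items():
        status_1, status_2 = ENCODINGS[feature][choice]
        row[f"{feature}_CompletionStatus_1"] = status_1
        row[f"{feature}_CompletionStatus_2"] = status_2
    return pd.DataFrame([row])

demo_df = onehot_from_tables("British", "Spain", "Tier2", "street")
print(demo_df.T)
```

One behavioral difference: an unexpected selection raises a `KeyError` here, whereas the `else` branches above quietly fall back to the last option.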

In [104]:
"""
Establish function "showvalues" which allows selection of three nationalities,
countries, and circuit tiers, as well as a selection of two track types and
input of one of each of the following values:
grid, alt, average_lap_time, minimum_lap_time, PRCP, TAVG, TMAX, TMIN. Display the values.

Place these values in the dataframe input_df and display the dataframe.
"""
def showvalues(nationality, country, circuit, trackType, grid, alt, average_lap_time, minimum_lap_time, PRCP, TAVG, TMAX, TMIN):
    # Use an elif statement to determine the output one-hot encoding based on the input nationality.
    if nationality == "German":
        nationality_CompletionStatus_1 = 0.209566
        nationality_CompletionStatus_2 = 0.790434
    elif nationality == "British":
        nationality_CompletionStatus_1 = 0.240838
        nationality_CompletionStatus_2 = 0.759162
    else:
        nationality_CompletionStatus_1 = 0.292359
        nationality_CompletionStatus_2 = 0.707641
    
    # Use an elif statement to determine the output one-hot encoding based on the input country.
    if country == "Italy":
        country_CompletionStatus_1 = 0.279099
        country_CompletionStatus_2 = 0.720901
    elif country == "Germany":
        country_CompletionStatus_1 = 0.291429
        country_CompletionStatus_2 = 0.708571
    else:
        country_CompletionStatus_1 = 0.219697
        country_CompletionStatus_2 = 0.780303
    
    # Use an elif statement to determine the output one-hot encoding based on the input circuit.
    if circuit == "Tier1":
        binned_circuits_CompletionStatus_1 = 0.253451
        binned_circuits_CompletionStatus_2 = 0.746549
    elif circuit == "Tier2":
        binned_circuits_CompletionStatus_1 = 0.277588
        binned_circuits_CompletionStatus_2 = 0.722412
    else:
        binned_circuits_CompletionStatus_1 = 0.235686
        binned_circuits_CompletionStatus_2 = 0.764314
        
    # Use an if-else statement to determine the output one-hot encoding based on the input track.
    if trackType == "race":
        trackType_CompletionStatus_1 = 0.237243
        trackType_CompletionStatus_2 = 0.762757
    else:
        trackType_CompletionStatus_1 = 0.287045
        trackType_CompletionStatus_2 = 0.712955
    
    # Establish the data of our input_df dataframe.
    inputdata = [[nationality_CompletionStatus_1, nationality_CompletionStatus_2,
                country_CompletionStatus_1, country_CompletionStatus_2,
                binned_circuits_CompletionStatus_1, binned_circuits_CompletionStatus_2,
                trackType_CompletionStatus_1, trackType_CompletionStatus_2,
                grid, alt, average_lap_time, minimum_lap_time, PRCP, TAVG, TMAX, TMIN]]
    
    # Establish the dataframe input_df itself with pd.DataFrame.
    input_df = pd.DataFrame(inputdata, columns =
                ["nationality_CompletionStatus_1", "nationality_CompletionStatus_2",
                "country_CompletionStatus_1", "country_CompletionStatus_2",
                "binned_circuits_CompletionStatus_1", "binned_circuits_CompletionStatus_2",
                "trackType_CompletionStatus_1", "trackType_CompletionStatus_2",
                "grid", "alt", "average_lap_time", "minimum_lap_time", "PRCP", "TAVG", "TMAX", "TMIN"])
    
    display(grid, alt, average_lap_time, minimum_lap_time, PRCP, TAVG, TMAX, TMIN)
    
    display(input_df)
    
# Create a widget that will interact with the showvalues function.
interact(showvalues, nationality = widgets.Dropdown(options = ["German", "British", "Brazilian"], value = "German", description = 'Nationality'),
         country = widgets.Dropdown(options = ["Italy", "Germany", "Spain"], value = "Italy", description = 'Country'),
         circuit = widgets.Dropdown(options = ["Tier1", "Tier2", "Tier3"], value = "Tier1", description = 'Circuit'),
         trackType = widgets.Dropdown(options = ["race", "street"], value = "race", description = 'Track Type'),
         grid = widgets.BoundedIntText(min = 0, max = 30, description = 'Grid', disabled = False, continuous_update = False),
         alt = widgets.BoundedFloatText(min = -100, max = 2500, description = 'Altitude', disabled = False, continuous_update = False),
         average_lap_time = widgets.BoundedFloatText(min = 0, max = 300000, description = 'Avg Lap Time', disabled = False, continuous_update = False),
         minimum_lap_time = widgets.BoundedFloatText(min = 0, max = 300000, description = 'Min Lap Time', disabled = False, continuous_update = False),
         PRCP = widgets.BoundedFloatText(min = 0, max = 20, description = 'Precipitation', disabled = False, continuous_update = False),
         TAVG = widgets.BoundedFloatText(min = 0, max = 120, description = 'Avg Temp (F)', disabled = False, continuous_update = False),
         TMAX = widgets.BoundedFloatText(min = 0, max = 120, description = 'Max Temp (F)', disabled = False, continuous_update = False),
         TMIN = widgets.BoundedFloatText(min = 0, max = 120, description = 'Min Temp (F)', disabled = False, continuous_update = False));

interactive(children=(Dropdown(description='Nationality', options=('German', 'British', 'Brazilian'), value='G…
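Note that `input_df` in `showvalues` lists the encoded columns first, while `X_resampled.columns` starts with `grid`, `alt`, and the other numeric features. Depending on the scikit-learn version, a model given columns in a different order than it was fitted on may raise an error or may read the values by position, so it seems safest to reorder the row before predicting. A minimal sketch with a shortened, hypothetical column list (`training_columns` and `aligned_df` are our own names):

```python
import pandas as pd

# Shortened stand-in for list(X_resampled.columns).
training_columns = ["grid", "alt", "country_CompletionStatus_1", "country_CompletionStatus_2"]

# A single input row built in a different column order, as in showvalues.
input_df = pd.DataFrame(
    [[0.279099, 0.720901, 5, 75.0]],
    columns=["country_CompletionStatus_1", "country_CompletionStatus_2", "grid", "alt"],
)

# Selecting with the training column list reorders the columns to match it.
aligned_df = input_df[training_columns]
print(list(aligned_df.columns))
```

With the real data, the equivalent call would presumably be `input_df[X_resampled.columns]`.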

In [105]:
# Create the function widgetpred. We'll use this in the function predictfinish.
def widgetpred(X_resampled, y_resampled, X_test, estimator, **kwargs):
    """
    Fit an estimator on the resampled training data and predict on X_test.
    """
    # Fit the classification model on the resampled training data.
    estimator.fit(X_resampled, y_resampled, **kwargs)  
    
    predicted = estimator.predict(X_test)
    
    # Return the predicted labels.
    return predicted
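Because `widgetpred` refits the estimator on every call, the model is retrained each time a user changes a widget value. One possible alternative is to fit once and reuse the fitted model. The sketch below uses synthetic data, and `fitted_model` and `predict_with_fitted` are hypothetical names. It also returns the class-1 probability from `predict_proba`, which is an addition on our part.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the resampled training data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# Fit once, outside the function that the widget calls.
fitted_model = LogisticRegression(solver="lbfgs").fit(X_demo, y_demo)

def predict_with_fitted(model, X_new):
    """Return predicted labels and class-1 probabilities without refitting."""
    labels = model.predict(X_new)
    probabilities = model.predict_proba(X_new)[:, 1]
    return labels, probabilities

new_rows = np.array([[2.0, 2.0], [-2.0, -2.0]])
labels, probabilities = predict_with_fitted(fitted_model, new_rows)
print(labels, probabilities)
```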

In [106]:
"""
Establish function "predictfinish" which allows selection of three nationalities,
countries, and circuit tiers, as well as a selection of two track types and
input of one of each of the following values:
grid, alt, average_lap_time, minimum_lap_time, PRCP, TAVG, TMAX, TMIN.

Place these values in the dataframe input_df and display the dataframe.

Create prediction based on widgetpred function and display the prediction:
0 for did not finish, 1 for did finish.
"""
def predictfinish(nationality, country, circuit, trackType, grid, alt, average_lap_time, minimum_lap_time, PRCP, TAVG, TMAX, TMIN):
    # Use an if/elif/else chain to look up the encoded completion-status values for the input nationality.
    if nationality == "German":
        nationality_CompletionStatus_1 = 0.209566
        nationality_CompletionStatus_2 = 0.790434
    elif nationality == "British":
        nationality_CompletionStatus_1 = 0.240838
        nationality_CompletionStatus_2 = 0.759162
    else:
        nationality_CompletionStatus_1 = 0.292359
        nationality_CompletionStatus_2 = 0.707641
    
    # Use an if/elif/else chain to look up the encoded completion-status values for the input country.
    if country == "Italy":
        country_CompletionStatus_1 = 0.279099
        country_CompletionStatus_2 = 0.720901
    elif country == "Germany":
        country_CompletionStatus_1 = 0.291429
        country_CompletionStatus_2 = 0.708571
    else:
        country_CompletionStatus_1 = 0.219697
        country_CompletionStatus_2 = 0.780303
    
    # Use an if/elif/else chain to look up the encoded completion-status values for the input circuit.
    if circuit == "Tier1":
        binned_circuits_CompletionStatus_1 = 0.253451
        binned_circuits_CompletionStatus_2 = 0.746549
    elif circuit == "Tier2":
        binned_circuits_CompletionStatus_1 = 0.277588
        binned_circuits_CompletionStatus_2 = 0.722412
    else:
        binned_circuits_CompletionStatus_1 = 0.235686
        binned_circuits_CompletionStatus_2 = 0.764314
        
    # Use an if-else statement to look up the encoded completion-status values for the input track type.
    if trackType == "race":
        trackType_CompletionStatus_1 = 0.237243
        trackType_CompletionStatus_2 = 0.762757
    else:
        trackType_CompletionStatus_1 = 0.287045
        trackType_CompletionStatus_2 = 0.712955
    
    # Establish the data of our input_df dataframe. The values follow the same
    # column order as the X feature set defined earlier in this notebook, because
    # the fitted model expects features in the order it was trained on.
    inputdata = [[grid, alt, average_lap_time, minimum_lap_time, PRCP, TAVG, TMAX, TMIN,
                country_CompletionStatus_1, nationality_CompletionStatus_1,
                binned_circuits_CompletionStatus_1, trackType_CompletionStatus_1,
                country_CompletionStatus_2, nationality_CompletionStatus_2,
                binned_circuits_CompletionStatus_2, trackType_CompletionStatus_2]]
    
    # Establish the dataframe input_df itself with pd.DataFrame.
    input_df = pd.DataFrame(inputdata, columns =
                ["grid", "alt", "average_lap_time", "minimum_lap_time", "PRCP", "TAVG", "TMAX", "TMIN",
                "country_CompletionStatus_1", "nationality_CompletionStatus_1",
                "binned_circuits_CompletionStatus_1", "trackType_CompletionStatus_1",
                "country_CompletionStatus_2", "nationality_CompletionStatus_2",
                "binned_circuits_CompletionStatus_2", "trackType_CompletionStatus_2"])
    
    display(input_df)
    
    # Using the widgetpred function, predict whether the car will finish the race or not given input_df.
    pred = widgetpred(X_resampled, y_resampled, input_df, LogisticRegression(solver='lbfgs'))
    
    # Using an if-else statement, determine what interactors will see given the data they input.
    if pred[0] == 1:
        writtenpred = "finish the race."
    else:
        writtenpred = "not finish the race."
    
    print("According to our Logistic Regression model, your car is predicted to", writtenpred)

# Create a widget that will interact with the predictfinish function.
interact(predictfinish, nationality = widgets.Dropdown(options = ["German", "British", "Brazilian"], value = "German", description = 'Nationality'),
         country = widgets.Dropdown(options = ["Italy", "Germany", "Spain"], value = "Italy", description = 'Country'),
         circuit = widgets.Dropdown(options = ["Tier1", "Tier2", "Tier3"], value = "Tier1", description = 'Circuit'),
         trackType = widgets.Dropdown(options = ["race", "street"], value = "race", description = 'Track Type'),
         grid = widgets.BoundedIntText(min = 0, max = 30, description = 'Grid', disabled = False, continuous_update = False),
         alt = widgets.BoundedFloatText(min = -100, max = 2500, description = 'Altitude', disabled = False, continuous_update = False),
         average_lap_time = widgets.BoundedFloatText(min = 0, max = 300000, description = 'Avg Lap Time', disabled = False, continuous_update = False),
         minimum_lap_time = widgets.BoundedFloatText(min = 0, max = 300000, description = 'Min Lap Time', disabled = False, continuous_update = False),
         PRCP = widgets.BoundedFloatText(min = 0, max = 20, description = 'Precipitation', disabled = False, continuous_update = False),
         TAVG = widgets.BoundedFloatText(min = 0, max = 120, description = 'Avg Temp (F)', disabled = False, continuous_update = False),
         TMAX = widgets.BoundedFloatText(min = 0, max = 120, description = 'Max Temp (F)', disabled = False, continuous_update = False),
         TMIN = widgets.BoundedFloatText(min = 0, max = 120, description = 'Min Temp (F)', disabled = False, continuous_update = False));

interactive(children=(Dropdown(description='Nationality', options=('German', 'British', 'Brazilian'), value='G…
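
The if/elif chains in predictfinish could also be written as lookup tables. The sketch below shows the idea for two of the four categorical inputs, using the encoded values from the function above. The names `NATIONALITY_ENCODING`, `TRACK_TYPE_ENCODING`, and `encode` are hypothetical and introduced only for this illustration. One difference to keep in mind: the original else branches silently accept any unlisted value, while a dictionary lookup raises a KeyError.

```python
# Encoded completion-status values copied from predictfinish.
# Each tuple is (CompletionStatus_1, CompletionStatus_2).
NATIONALITY_ENCODING = {
    "German": (0.209566, 0.790434),
    "British": (0.240838, 0.759162),
    "Brazilian": (0.292359, 0.707641),
}
TRACK_TYPE_ENCODING = {
    "race": (0.237243, 0.762757),
    "street": (0.287045, 0.712955),
}


def encode(category, value, table):
    """Hypothetical helper: return the two encoded columns for one input."""
    status_1, status_2 = table[value]  # raises KeyError for an unknown option
    return {
        f"{category}_CompletionStatus_1": status_1,
        f"{category}_CompletionStatus_2": status_2,
    }


# Build part of one input row from two dropdown selections.
row = {}
row.update(encode("nationality", "British", NATIONALITY_ENCODING))
row.update(encode("trackType", "street", TRACK_TYPE_ENCODING))
print(row)
```

Since the dropdowns restrict users to the listed options, the KeyError should not occur in normal use; it would only surface if an option were added to a widget without a matching table entry.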