## **NBA Analysis and Machine Learning**
#### Purpose: To better understand the NBA as a whole, we took a deep dive into general offensive and defensive team stats from the last 20 years. These stats then allowed us to build a machine learning model to try to predict the NBA Finals champion.
## **Cleaning the Data**

In [None]:
# Initialize spark environment
import os
# Find the latest Spark 3.0.x release at http://www-us.apache.org/dist/spark/ and set it as the Spark version
# For example:
spark_version = 'spark-3.0.2'
os.environ['SPARK_VERSION']=spark_version

# Install Spark and Java
!apt-get update
!apt-get install openjdk-11-jdk-headless -qq > /dev/null
!wget -q http://www-us.apache.org/dist/spark/$SPARK_VERSION/$SPARK_VERSION-bin-hadoop2.7.tgz
!tar xf $SPARK_VERSION-bin-hadoop2.7.tgz
!pip install -q findspark

# Set Environment Variables
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
os.environ["SPARK_HOME"] = f"/content/{spark_version}-bin-hadoop2.7"

# Start a SparkSession
import findspark
findspark.init()


In [None]:
!wget https://jdbc.postgresql.org/download/postgresql-42.2.9.jar

--2021-02-27 01:03:17--  https://jdbc.postgresql.org/download/postgresql-42.2.9.jar
Resolving jdbc.postgresql.org (jdbc.postgresql.org)... 72.32.157.228, 2001:4800:3e1:1::228
Connecting to jdbc.postgresql.org (jdbc.postgresql.org)|72.32.157.228|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 914037 (893K) [application/java-archive]
Saving to: ‘postgresql-42.2.9.jar.1’


2021-02-27 01:03:18 (6.08 MB/s) - ‘postgresql-42.2.9.jar.1’ saved [914037/914037]



In [None]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("CloudETL").config("spark.driver.extraClassPath","/content/postgresql-42.2.9.jar").getOrCreate()

In [None]:
# Import team stats from the last 20 years
from pyspark import SparkFiles
# Load Team_00-20.csv from S3 into a DataFrame
url = "https://nbadatabasebucket.s3.amazonaws.com/Team_00-20.csv"
spark.sparkContext.addFile(url)

# Note: in Spark datetime patterns "MM" is month and "mm" is minutes
df = spark.read.option('header', 'true').csv(SparkFiles.get("Team_00-20.csv"), inferSchema=True, sep=',', timestampFormat="MM/dd/yy")
df.show(10)

+----+--------------------+---+-----+----+----+-----+---+----+-----+----+----+-----+----+----+-----+----+----+----+----+----+---+----+----+-----+--------+------------+
|Year|                Team|  G|   MP|  FG| FGA|  FG%| 3P| 3PA|  3P%|  2P| 2PA|  2P%|  FT| FTA|  FT%| ORB| DRB| TRB| AST| STL|BLK| TOV|  PF|  PTS|Playoffs|Championship|
+----+--------------------+---+-----+----+----+-----+---+----+-----+----+----+-----+----+----+-----+----+----+----+----+----+---+----+----+-----+--------+------------+
|2000|    Sacramento Kings| 82|20080|39.7|88.2|0.449|6.1|17.1|0.354|33.6|71.1|0.472|20.3|26.3|0.771|12.5|34.2|46.7|23.4|10.0|5.5|15.5|20.2|105.6|    true|       false|
|2000|     Milwaukee Bucks| 82|19780|41.0|89.5|0.458|7.4|19.5|0.379|33.6|70.0| 0.48|19.4|24.7|0.787|12.8|32.9|45.8|24.3| 8.8|5.1|14.8|25.4|108.8|    true|       false|
|2000|  Los Angeles Lakers| 82|19905|40.9|87.9|0.465|5.8|16.8|0.344|35.1|71.1|0.494|20.9|30.7|0.683|14.3|33.9|48.2|24.8| 7.4|6.4|15.6|24.6|108.4|    true|      

In [None]:
# Import opposing team stats from the last 20 years
from pyspark import SparkFiles
# Load Opp_00-20.csv from S3 into a DataFrame
url = "https://nbadatabasebucket.s3.amazonaws.com/Opp_00-20.csv"
spark.sparkContext.addFile(url)

# Note: in Spark datetime patterns "MM" is month and "mm" is minutes
df2 = spark.read.option('header', 'true').csv(SparkFiles.get("Opp_00-20.csv"), inferSchema=True, sep=',', timestampFormat="MM/dd/yy")
df2.show(10)

+----+--------------------+---+-----+----+----+-----+---+----+-----+----+----+-----+----+----+-----+----+----+----+----+---+---+----+----+-----+---------+------------+
|Year|                Team|  G|   MP|  FG| FGA|  FG%| 3P| 3PA|  3P%|  2P| 2PA|  2P%|  FT| FTA|  FT%| ORB| DRB| TRB| AST|STL|BLK| TOV|  PF|  PTS|Playoffs2|Championship|
+----+--------------------+---+-----+----+----+-----+---+----+-----+----+----+-----+----+----+-----+----+----+----+----+---+---+----+----+-----+---------+------------+
|2000|     New York Knicks| 82|19905|35.7|85.7|0.417|6.2|17.6|0.352|29.5|68.1|0.434|20.6|28.0|0.733|12.1|33.6|45.8|21.9|8.3|4.7|16.9|24.4| 98.2|     true|       false|
|2000|          Miami Heat| 82|19880|37.4|86.8|0.431|5.0|15.1|0.331|32.5|71.7|0.452|18.6|25.2|0.737|12.7|35.0|47.7|19.7|8.3|5.9|17.7|25.3| 98.5|     true|       false|
|2000|   San Antonio Spurs| 82|19830|38.3|91.5|0.419|4.6|14.1|0.329|33.7|77.4|0.435|16.7|22.5|0.741|13.1|32.9|45.9|21.8|8.1|5.8|15.0|26.3| 98.0|     true|      

In [None]:
# Convert Spark DataFrames to pandas DataFrames for ease of operation
import pandas as pd
Team_DF = df.toPandas()
Opp_DF = df2.toPandas()
Team_DF.head()

Unnamed: 0,Year,Team,G,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Playoffs,Championship
0,2000,Sacramento Kings,82,20080,39.7,88.2,0.449,6.1,17.1,0.354,33.6,71.1,0.472,20.3,26.3,0.771,12.5,34.2,46.7,23.4,10.0,5.5,15.5,20.2,105.6,True,False
1,2000,Milwaukee Bucks,82,19780,41.0,89.5,0.458,7.4,19.5,0.379,33.6,70.0,0.48,19.4,24.7,0.787,12.8,32.9,45.8,24.3,8.8,5.1,14.8,25.4,108.8,True,False
2,2000,Los Angeles Lakers,82,19905,40.9,87.9,0.465,5.8,16.8,0.344,35.1,71.1,0.494,20.9,30.7,0.683,14.3,33.9,48.2,24.8,7.4,6.4,15.6,24.6,108.4,True,True
3,2000,Dallas Mavericks,82,19805,40.1,87.3,0.459,6.7,17.6,0.381,33.4,69.7,0.479,20.2,25.4,0.794,10.8,33.4,44.2,22.6,8.0,6.4,14.8,24.8,107.1,True,False
4,2000,Toronto Raptors,82,19955,40.3,92.2,0.437,5.7,15.4,0.369,34.6,76.8,0.451,19.6,26.2,0.747,14.8,33.4,48.2,26.5,7.9,6.9,14.3,23.1,105.9,True,False


## **Machine Learning**

### Set up: forming Cause & Effect DFs and performing a Test / Train Split

In [None]:
# removing Effect columns to form a Cause DF
team_X = Team_DF.drop(columns=["Playoffs", "Championship"])
team_X

Unnamed: 0,Year,Team,G,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,2000,Sacramento Kings,82,20080,39.7,88.2,0.449,6.1,17.1,0.354,33.6,71.1,0.472,20.3,26.3,0.771,12.5,34.2,46.7,23.4,10.0,5.5,15.5,20.2,105.6
1,2000,Milwaukee Bucks,82,19780,41.0,89.5,0.458,7.4,19.5,0.379,33.6,70.0,0.480,19.4,24.7,0.787,12.8,32.9,45.8,24.3,8.8,5.1,14.8,25.4,108.8
2,2000,Los Angeles Lakers,82,19905,40.9,87.9,0.465,5.8,16.8,0.344,35.1,71.1,0.494,20.9,30.7,0.683,14.3,33.9,48.2,24.8,7.4,6.4,15.6,24.6,108.4
3,2000,Dallas Mavericks,82,19805,40.1,87.3,0.459,6.7,17.6,0.381,33.4,69.7,0.479,20.2,25.4,0.794,10.8,33.4,44.2,22.6,8.0,6.4,14.8,24.8,107.1
4,2000,Toronto Raptors,82,19955,40.3,92.2,0.437,5.7,15.4,0.369,34.6,76.8,0.451,19.6,26.2,0.747,14.8,33.4,48.2,26.5,7.9,6.9,14.3,23.1,105.9
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
591,2019,New York Knicks,66,15965,40.2,89.9,0.447,9.6,28.5,0.337,30.6,61.4,0.499,16.4,23.6,0.694,12.1,34.7,46.8,22.2,7.7,4.8,14.4,22.3,106.5
592,2019,Cleveland Cavaliers,65,15725,40.5,88.4,0.458,11.2,32.0,0.351,29.3,56.4,0.519,15.2,20.0,0.758,10.9,33.6,44.5,23.2,6.9,3.3,16.6,18.4,107.5
593,2019,Chicago Bulls,65,15675,39.5,88.5,0.447,12.2,35.1,0.348,27.3,53.5,0.511,15.5,20.5,0.755,10.4,31.4,41.8,23.2,10.0,4.1,15.4,21.8,106.7
594,2019,Golden State Warriors,65,15725,38.2,87.2,0.438,10.3,30.9,0.334,27.9,56.3,0.495,18.5,23.0,0.803,9.8,32.5,42.4,25.3,8.1,4.5,14.7,19.8,105.2


In [None]:
# Machine learning models need numeric input, so all strings must be removed.
# We therefore use a label encoder to map every team to an integer.

from sklearn.preprocessing import LabelEncoder
team_encoder = LabelEncoder()
team_encoder.fit(team_X["Team"])
team_X["Team"] = team_encoder.transform(team_X["Team"])
team_X

Unnamed: 0,Year,Team,G,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,2000,29,82,20080,39.7,88.2,0.449,6.1,17.1,0.354,33.6,71.1,0.472,20.3,26.3,0.771,12.5,34.2,46.7,23.4,10.0,5.5,15.5,20.2,105.6
1,2000,17,82,19780,41.0,89.5,0.458,7.4,19.5,0.379,33.6,70.0,0.480,19.4,24.7,0.787,12.8,32.9,45.8,24.3,8.8,5.1,14.8,25.4,108.8
2,2000,14,82,19905,40.9,87.9,0.465,5.8,16.8,0.344,35.1,71.1,0.494,20.9,30.7,0.683,14.3,33.9,48.2,24.8,7.4,6.4,15.6,24.6,108.4
3,2000,7,82,19805,40.1,87.3,0.459,6.7,17.6,0.381,33.4,69.7,0.479,20.2,25.4,0.794,10.8,33.4,44.2,22.6,8.0,6.4,14.8,24.8,107.1
4,2000,32,82,19955,40.3,92.2,0.437,5.7,15.4,0.369,34.6,76.8,0.451,19.6,26.2,0.747,14.8,33.4,48.2,26.5,7.9,6.9,14.3,23.1,105.9
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
591,2019,23,66,15965,40.2,89.9,0.447,9.6,28.5,0.337,30.6,61.4,0.499,16.4,23.6,0.694,12.1,34.7,46.8,22.2,7.7,4.8,14.4,22.3,106.5
592,2019,6,65,15725,40.5,88.4,0.458,11.2,32.0,0.351,29.3,56.4,0.519,15.2,20.0,0.758,10.9,33.6,44.5,23.2,6.9,3.3,16.6,18.4,107.5
593,2019,5,65,15675,39.5,88.5,0.447,12.2,35.1,0.348,27.3,53.5,0.511,15.5,20.5,0.755,10.4,31.4,41.8,23.2,10.0,4.1,15.4,21.8,106.7
594,2019,10,65,15725,38.2,87.2,0.438,10.3,30.9,0.334,27.9,56.3,0.495,18.5,23.0,0.803,9.8,32.5,42.4,25.3,8.1,4.5,14.7,19.8,105.2
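
The encoding step can be illustrated on a handful of team names. This is a minimal sketch with a toy list rather than the full dataset. It shows that `LabelEncoder` assigns integers by alphabetical order of the class names and that `inverse_transform` recovers the original strings.

```python
from sklearn.preprocessing import LabelEncoder

# Toy team names (illustrative only, not the full dataset)
teams = ["Sacramento Kings", "Milwaukee Bucks", "Los Angeles Lakers", "Milwaukee Bucks"]

encoder = LabelEncoder()
codes = encoder.fit_transform(teams)        # classes are sorted alphabetically
decoded = encoder.inverse_transform(codes)  # integers back to names

print(list(encoder.classes_))
print(codes.tolist())
print(list(decoded))
```

Because the integers are just alphabetical positions, a model may read them as magnitudes. One-hot encoding the team, or dropping the column, are possible alternatives that this notebook does not explore.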


In [None]:
# Split team_X into train df
team_X_train = team_X[team_X["Year"] < 2019 ]
team_X_train

Unnamed: 0,Year,Team,G,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,2000,29,82,20080,39.7,88.2,0.449,6.1,17.1,0.354,33.6,71.1,0.472,20.3,26.3,0.771,12.5,34.2,46.7,23.4,10.0,5.5,15.5,20.2,105.6
1,2000,17,82,19780,41.0,89.5,0.458,7.4,19.5,0.379,33.6,70.0,0.480,19.4,24.7,0.787,12.8,32.9,45.8,24.3,8.8,5.1,14.8,25.4,108.8
2,2000,14,82,19905,40.9,87.9,0.465,5.8,16.8,0.344,35.1,71.1,0.494,20.9,30.7,0.683,14.3,33.9,48.2,24.8,7.4,6.4,15.6,24.6,108.4
3,2000,7,82,19805,40.1,87.3,0.459,6.7,17.6,0.381,33.4,69.7,0.479,20.2,25.4,0.794,10.8,33.4,44.2,22.6,8.0,6.4,14.8,24.8,107.1
4,2000,32,82,19955,40.3,92.2,0.437,5.7,15.4,0.369,34.6,76.8,0.451,19.6,26.2,0.747,14.8,33.4,48.2,26.5,7.9,6.9,14.3,23.1,105.9
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
561,2018,16,82,19730,40.3,89.4,0.450,11.5,32.9,0.349,28.8,56.5,0.509,15.3,22.1,0.695,11.4,35.6,47.1,24.7,7.8,5.5,15.0,21.2,107.3
562,2018,5,82,19905,39.8,87.8,0.453,9.1,25.9,0.351,30.7,61.9,0.496,16.2,20.6,0.783,8.7,34.1,42.8,21.9,7.3,4.3,14.1,20.3,104.8
563,2018,23,82,19780,38.2,88.2,0.433,10.0,29.5,0.340,28.2,58.7,0.479,18.1,23.8,0.759,10.4,34.2,44.7,20.1,6.8,5.1,14.0,20.9,104.5
564,2018,6,82,19755,40.1,90.3,0.444,10.6,30.0,0.355,29.4,60.3,0.488,16.9,21.3,0.792,11.0,32.9,44.0,21.3,6.7,2.5,13.9,20.6,107.7


In [None]:
# Split team_X into test df
team_X_test = team_X[team_X["Year"] == 2019]
# Create results DataFrames that pair each team with its predictions.
# .copy() gives independent frames, so the assignments below do not
# trigger pandas' SettingWithCopyWarning.
team_results = team_X_test[["Year", "Team"]].copy()
team_results2 = team_X_test[["Year", "Team"]].copy()
team_results["Team"] = team_encoder.inverse_transform(team_results["Team"])
team_results2["Team"] = team_encoder.inverse_transform(team_results2["Team"])


In [None]:
# Removing Cause columns to form an Effect DF for playoffs
team_playoffs = Team_DF[["Year","Playoffs"]]
team_playoffs


Unnamed: 0,Year,Playoffs
0,2000,True
1,2000,True
2,2000,True
3,2000,True
4,2000,True
...,...,...
591,2019,False
592,2019,False
593,2019,False
594,2019,False


In [None]:
# The models need numeric targets, so we use a label encoder
# to map False and True to 0 and 1.
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below.
from sklearn.preprocessing import LabelEncoder
result_encoder = LabelEncoder()
team_playoffs = Team_DF[["Year","Playoffs"]].copy()
result_encoder.fit(team_playoffs["Playoffs"])
team_playoffs["Playoffs"] = result_encoder.transform(team_playoffs["Playoffs"])
team_playoffs


Unnamed: 0,Year,Playoffs
0,2000,1
1,2000,1
2,2000,1
3,2000,1
4,2000,1
...,...,...
591,2019,0
592,2019,0
593,2019,0
594,2019,0


In [None]:
# Split effect dfs into train and test effect dfs
team_playoffs_train = team_playoffs[team_playoffs["Year"] < 2019]
team_playoffs_test = team_playoffs[team_playoffs["Year"] == 2019]
team_playoffs_test

Unnamed: 0,Year,Playoffs
566,2019,1
567,2019,1
568,2019,1
569,2019,1
570,2019,1
571,2019,0
572,2019,0
573,2019,0
574,2019,0
575,2019,1


In [None]:
# Removing Cause columns to form an Effect DF for championship
# Changing boolean values to 1 & 0
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below.
team_champ = Team_DF[["Year","Championship"]].copy()
team_champ["Championship"] = result_encoder.transform(team_champ["Championship"])
team_champ


Unnamed: 0,Year,Championship
0,2000,0
1,2000,0
2,2000,1
3,2000,0
4,2000,0
...,...,...
591,2019,0
592,2019,0
593,2019,0
594,2019,0


In [None]:
# Split effect dfs into train and test effect dfs
team_champ_train = team_champ[team_champ["Year"] < 2019]
team_champ_test = team_champ[team_champ["Year"] == 2019]
team_champ_test

Unnamed: 0,Year,Championship
566,2019,0
567,2019,0
568,2019,0
569,2019,0
570,2019,0
571,2019,0
572,2019,0
573,2019,0
574,2019,0
575,2019,0
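
The year-based hold-out used above (all seasons before 2019 for training, the 2019 season for testing) can be wrapped in a small helper. This is a sketch only. The function name `split_by_year` and the toy frame are hypothetical and do not appear in the notebook.

```python
import pandas as pd

def split_by_year(frame, holdout_year, year_col="Year"):
    """Return (train, test) copies: seasons before holdout_year vs. the holdout season."""
    train = frame[frame[year_col] < holdout_year].copy()
    test = frame[frame[year_col] == holdout_year].copy()
    return train, test

# Toy data standing in for Team_DF
toy = pd.DataFrame({"Year": [2017, 2018, 2018, 2019, 2019],
                    "Championship": [0, 1, 0, 0, 1]})
train, test = split_by_year(toy, 2019)
print(len(train), len(test))
```

Returning copies keeps later column assignments on the splits from touching the original frame.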


In [None]:
# Drop the Year column
team_X_test = team_X_test.drop(columns = ["Year"])
team_X_train = team_X_train.drop(columns = ["Year"])
team_playoffs_test = team_playoffs_test.drop(columns = ["Year"])
team_playoffs_train = team_playoffs_train.drop(columns = ["Year"])
team_champ_test = team_champ_test.drop(columns = ["Year"])
team_champ_train = team_champ_train.drop(columns = ["Year"])


In [None]:
# Reshape playoffs and champ dfs
team_playoffs_test_values = team_playoffs_test["Playoffs"].values.reshape(-1,1)
team_playoffs_train_values = team_playoffs_train["Playoffs"].values.reshape(-1,1)
team_champ_test_values = team_champ_test["Championship"].values.reshape(-1,1)
team_champ_train_values = team_champ_train["Championship"].values.reshape(-1,1)
print(team_playoffs_test_values.shape, team_playoffs_train_values.shape, team_champ_test_values.shape, team_champ_train_values.shape, team_X_test.shape, team_X_train.shape  )

(30, 1) (566, 1) (30, 1) (566, 1) (30, 24) (566, 24)



In [None]:
from sklearn.preprocessing import StandardScaler

team_X_scaler = StandardScaler().fit(team_X_train)
team_playoffs_scaler = StandardScaler().fit(team_playoffs_train_values)
team_champ_scaler = StandardScaler().fit(team_champ_train_values)
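
The scalers above are fitted, but the cells shown here do not go on to call `transform`. If scaled inputs were wanted, the usual pattern is to learn the statistics from the training rows and reuse them for the test rows. The sketch below uses toy arrays in place of `team_X_train` and `team_X_test`, and all names in it are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrices standing in for team_X_train / team_X_test
X_train = np.array([[100.0, 0.45], [110.0, 0.47], [105.0, 0.46]])
X_test = np.array([[108.0, 0.44]])

scaler = StandardScaler().fit(X_train)     # learn mean/std from training rows only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # reuse the training statistics

print(X_train_scaled.mean(axis=0).round(6))
print(X_test_scaled.shape)
```

Fitting on the training rows only keeps information from the held-out season out of the preprocessing step.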

### Linear Regression Model Machine Learning

In [None]:

# import and fit linear regression models for playoffs and championship data
from sklearn.linear_model import LinearRegression
linear_playoff_model = LinearRegression()
linear_champ_model = LinearRegression()
linear_playoff_model.fit(team_X_train, team_playoffs_train_values)
linear_champ_model.fit(team_X_train, team_champ_train_values)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
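
Fitting a linear regression to a 0/1 target produces unbounded real-valued scores rather than probabilities, which is why negative values can appear among the predictions. A toy example with made-up data, not the NBA features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# One feature, binary target (illustrative values only)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 0, 1])

toy_model = LinearRegression().fit(X, y)
scores = toy_model.predict(np.array([[4.0], [-1.0]]))
print(scores)
```

The scores are still usable for ranking teams against each other, which is how the notebook treats them.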

In [None]:
# flatten the (n, 1) array of predictions into a plain list to move to results
from itertools import chain
champ_predictions = linear_champ_model.predict(team_X_test)
champ_prediction = [x for x in chain.from_iterable(champ_predictions)]
print(champ_prediction)

[0.06513860143190264, 0.16524553197714908, 0.10763573965199491, -0.00020062596018277645, 0.078474340407519, 0.07741280314816201, 0.08160077481569594, 0.01919744190279893, 0.06204630191977856, 0.11475956374663276, 0.11157978144074931, 0.0544666935102196, 0.1504674654098732, 0.11093505933887648, 0.13680533212737167, 0.1452099311851356, -0.0037518061129571123, 0.08947158331808858, 0.12037907586218033, 0.08274417530248535, 0.0049438040383851245, 0.017359943781405818, 0.001177653752044705, -0.027663552246972145, -0.0006981813190094499, -0.012066466495122441, -0.024490889955949058, 0.044924248189735394, -0.017440496765727787, -0.029370017491906708]


In [None]:
# make the max value true and all other values false to predict championship
champ_max = max(champ_prediction)
champ_prediction_bool = [value == champ_max for value in champ_prediction]
print(champ_prediction_bool)

[False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False]
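
The same "highest score wins" rule can be written with NumPy. This is a sketch with illustrative scores. Unlike comparing every value to the maximum, `np.argmax` flags exactly one entry even when two scores tie.

```python
import numpy as np

scores = np.array([0.065, 0.165, 0.108, -0.0002])  # illustrative values
is_champion = np.zeros(scores.shape, dtype=bool)
is_champion[np.argmax(scores)] = True              # index of the single largest score
print(is_champion.tolist())
```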


In [None]:
# push results, values, and rank to team_results df
team_results["Value"] = champ_prediction
team_results["Rank"] = team_results["Value"].rank(method = 'max', ascending = False)
team_results["Champion"] = champ_prediction_bool

team_results = team_results.sort_values(by = ["Rank"])
team_results.set_index("Rank")

Rank,Year,Team,Value,Champion
1.0,2019,Milwaukee Bucks,0.165246,True
2.0,2019,Toronto Raptors,0.150467,False
3.0,2019,Los Angeles Lakers,0.14521,False
4.0,2019,Philadelphia 76ers,0.136805,False
5.0,2019,Indiana Pacers,0.120379,False
6.0,2019,Boston Celtics,0.11476,False
7.0,2019,Miami Heat,0.11158,False
8.0,2019,San Antonio Spurs,0.110935,False
9.0,2019,Portland Trail Blazers,0.107636,False
10.0,2019,Utah Jazz,0.089472,False
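
The ranking call used for this table can be tried on a small frame to see how ties behave. In this sketch the team labels and values are made up. With `method="max"` and `ascending=False`, tied values share the largest rank in their group.

```python
import pandas as pd

results = pd.DataFrame({"Team": ["A", "B", "C", "D"],
                        "Value": [0.10, 0.16, 0.10, 0.05]})
results["Rank"] = results["Value"].rank(method="max", ascending=False)
print(results.sort_values(by="Rank"))
```

Here B ranks 1.0, the tied teams A and C both rank 3.0, and D ranks 4.0.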


In [None]:
from google.colab import files
team_results.to_csv('linear_champ_results.csv')
#files.download('linear_champ_results.csv')

### Neural Network Model Machine Learning

In [None]:
# convert effect dfs to categorical using tensorflow
from tensorflow.keras.utils import to_categorical
team_champ_train_categorical = to_categorical(team_champ_train_values)
team_champ_test_categorical = to_categorical(team_champ_test_values)
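
For a 0/1 label, the one-hot layout that `to_categorical` produces can be reproduced with plain NumPy. The sketch below uses toy labels and is an equivalent for illustration, not the Keras implementation itself.

```python
import numpy as np

labels = np.array([[0], [0], [1], [0]])               # shape (n, 1), like team_champ_train_values
one_hot = np.eye(2, dtype="float32")[labels.ravel()]  # row i has a 1 in column labels[i]
print(one_hot)
```

Column 0 corresponds to "not champion" and column 1 to "champion", which is why the later cells read index 1 of each prediction row.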

In [None]:
# set up model and layers for neural network
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense


In [None]:
# build neural network
model = Sequential()
model.add(Dense(units=200, activation='relu', input_dim=24))
model.add(Dense(units=200, activation='relu'))
model.add(Dense(units=200, activation='relu'))
model.add(Dense(units=2, activation='softmax'))

In [None]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 200)               5000      
_________________________________________________________________
dense_1 (Dense)              (None, 200)               40200     
_________________________________________________________________
dense_2 (Dense)              (None, 200)               40200     
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 402       
Total params: 85,802
Trainable params: 85,802
Non-trainable params: 0
_________________________________________________________________
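
The parameter counts in the summary follow from the layer sizes: each dense layer has one weight per input-unit pair plus one bias per unit. A quick check (the helper name `dense_params` is hypothetical):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [24, 200, 200, 200, 2]  # input_dim followed by the units of each Dense layer
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))
```

This gives 5000, 40200, 40200 and 402, for a total of 85,802 parameters, matching the summary.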


In [None]:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

In [None]:
model.fit(
    team_X_train,
    team_champ_train_categorical,
    epochs=150,
    shuffle=True,
    verbose=2
)

Epoch 1/150
18/18 - 1s - loss: 135.1550 - accuracy: 0.9134
Epoch 2/150
18/18 - 0s - loss: 61.0342 - accuracy: 0.9099
Epoch 3/150
18/18 - 0s - loss: 49.3980 - accuracy: 0.9099
Epoch 4/150
18/18 - 0s - loss: 33.0423 - accuracy: 0.9311
Epoch 5/150
18/18 - 0s - loss: 45.1297 - accuracy: 0.9664
Epoch 6/150
18/18 - 0s - loss: 25.0036 - accuracy: 0.9134
Epoch 7/150
18/18 - 0s - loss: 23.6532 - accuracy: 0.9099
Epoch 8/150
18/18 - 0s - loss: 26.4980 - accuracy: 0.9664
Epoch 9/150
18/18 - 0s - loss: 28.9907 - accuracy: 0.9134
Epoch 10/150
18/18 - 0s - loss: 19.5679 - accuracy: 0.9134
Epoch 11/150
18/18 - 0s - loss: 19.5459 - accuracy: 0.9664
Epoch 12/150
18/18 - 0s - loss: 19.5855 - accuracy: 0.9081
Epoch 13/150
18/18 - 0s - loss: 20.0234 - accuracy: 0.9134
Epoch 14/150
18/18 - 0s - loss: 49.1373 - accuracy: 0.9134
Epoch 15/150
18/18 - 0s - loss: 32.9434 - accuracy: 0.9276
Epoch 16/150
18/18 - 0s - loss: 18.7615 - accuracy: 0.9664
Epoch 17/150
18/18 - 0s - loss: 11.8318 - accuracy: 0.9134
Epoch

<tensorflow.python.keras.callbacks.History at 0x7f15f072b650>
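
When reading the accuracy figures in this log, it helps to compare them with a majority-class baseline. The sketch below takes the 566 training rows from the shapes printed earlier and assumes one champion per season from 2000 through 2018. That count of 19 is an assumption, not something verified against the data here.

```python
n_rows = 566      # training rows, from the shapes printed earlier
n_champions = 19  # assumption: one champion per season, 2000 through 2018

# Accuracy of a model that predicts "not champion" for every row
baseline_accuracy = (n_rows - n_champions) / n_rows
print(round(baseline_accuracy, 4))
```

Under that assumption, a model that never predicts a champion would already score about 0.9664, so accuracy alone may say little about how well the network separates champions from other teams.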

In [None]:
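# predict class probabilities; column 1 is the probability of "champion"
# (model.predict returns the softmax output; predict_proba was removed from newer TensorFlow releases)
encoded_predictions = model.predict(team_X_test)
print(encoded_predictions)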

[[0.9446469  0.05535307]
 [0.93929994 0.06070006]
 [0.94140506 0.05859496]
 [0.93813795 0.0618621 ]
 [0.9374947  0.06250528]
 [0.93817043 0.06182953]
 [0.93933296 0.06066707]
 [0.9369162  0.06308381]
 [0.9399564  0.06004358]
 [0.9389072  0.06109283]
 [0.9409418  0.05905821]
 [0.94154227 0.05845773]
 [0.9368494  0.06315052]
 [0.9356632  0.06433685]
 [0.93912077 0.06087923]
 [0.9350132  0.0649868 ]
 [0.93959063 0.06040938]
 [0.9367222  0.0632778 ]
 [0.9409325  0.05906751]
 [0.9383371  0.06166293]
 [0.93834096 0.061659  ]
 [0.9393923  0.06060763]
 [0.9286547  0.0713454 ]
 [0.91872036 0.08127965]
 [0.92431706 0.0756829 ]
 [0.9229434  0.07705654]
 [0.9223241  0.07767589]
 [0.9216385  0.0783615 ]
 [0.92181146 0.07818855]
 [0.9221561  0.07784384]]




In [None]:
# keep only the "champion" probability (column 1) from each prediction row
values = [x[1] for x in encoded_predictions]
print(values)

[0.05535307, 0.060700063, 0.05859496, 0.061862096, 0.06250528, 0.061829526, 0.06066707, 0.06308381, 0.06004358, 0.061092827, 0.059058208, 0.058457732, 0.06315052, 0.06433685, 0.06087923, 0.064986795, 0.06040938, 0.063277796, 0.05906751, 0.061662927, 0.061659, 0.060607634, 0.071345404, 0.08127965, 0.0756829, 0.07705654, 0.07767589, 0.0783615, 0.07818855, 0.077843845]


In [None]:
team_results2["Value"] = values
team_results2["Rank"] = team_results2["Value"].rank(method = 'max', ascending = False)
team_results2 = team_results2.sort_values(by = ["Rank"])
team_results2 = team_results2.set_index("Rank")

In [None]:
team_results2

Rank,Year,Team,Value
1.0,2019,Brooklyn Nets,0.08128
2.0,2019,Portland Trail Blazers,0.078361
3.0,2019,Denver Nuggets,0.078189
4.0,2019,Dallas Mavericks,0.077844
5.0,2019,Miami Heat,0.077676
6.0,2019,Indiana Pacers,0.077057
7.0,2019,Memphis Grizzlies,0.075683
8.0,2019,Orlando Magic,0.071345
9.0,2019,New Orleans Pelicans,0.064987
10.0,2019,Los Angeles Clippers,0.064337


In [None]:
team_X_test["Values"] = values
team_X_test["Team"] = team_encoder.inverse_transform(team_X_test["Team"])
team_X_test

Unnamed: 0,Team,G,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Values
566,Dallas Mavericks,75,18175,41.5,90.0,0.461,15.1,41.1,0.367,26.4,48.9,0.541,18.5,23.8,0.779,10.5,36.3,46.8,24.6,6.1,4.8,12.7,19.4,116.7,0.055353
567,Milwaukee Bucks,73,17595,41.0,86.1,0.476,13.1,36.9,0.355,27.9,49.3,0.567,17.3,23.4,0.742,9.0,40.0,49.0,24.5,6.8,5.6,14.3,18.6,112.4,0.0607
568,Portland Trail Blazers,74,17835,41.7,90.2,0.463,12.7,33.7,0.377,29.0,56.4,0.514,17.5,21.8,0.804,10.1,34.7,44.8,20.4,6.2,6.0,12.6,21.5,113.7,0.058595
569,Houston Rockets,72,17380,39.1,86.7,0.451,15.0,43.4,0.345,24.1,43.3,0.557,19.8,25.0,0.791,9.4,33.1,42.4,20.7,8.3,4.9,14.1,20.8,112.9,0.061862
570,Los Angeles Clippers,72,17380,40.7,87.4,0.466,12.2,32.8,0.371,28.5,54.6,0.522,20.4,25.8,0.791,10.4,36.2,46.7,23.2,6.9,4.6,14.3,21.7,113.9,0.062505
571,New Orleans Pelicans,72,17430,40.7,87.6,0.465,13.0,35.3,0.37,27.6,52.3,0.528,16.3,22.4,0.729,10.6,33.9,44.4,25.6,7.2,4.8,15.7,20.2,110.7,0.06183
572,Phoenix Suns,73,17595,40.5,86.5,0.468,11.2,31.2,0.358,29.3,55.3,0.529,19.5,23.4,0.834,9.6,33.2,42.8,26.7,7.5,3.9,14.5,21.7,111.7,0.060667
573,Washington Wizards,72,17355,40.3,88.1,0.457,11.6,31.6,0.368,28.6,56.6,0.506,18.8,23.8,0.788,9.9,30.9,40.8,24.3,7.7,4.2,13.7,22.0,110.9,0.063084
574,Memphis Grizzlies,73,17570,41.3,88.1,0.468,10.6,30.5,0.347,30.7,57.6,0.532,16.1,21.1,0.763,10.0,35.1,45.1,26.1,7.7,5.3,14.8,20.5,109.2,0.060044
575,Boston Celtics,72,17430,41.1,89.3,0.461,12.5,34.4,0.364,28.6,54.8,0.522,18.5,23.1,0.801,10.6,35.3,45.9,22.9,8.2,5.6,13.8,21.5,113.3,0.061093



In [None]:
from google.colab import files
team_X_test.to_csv('neural_champ_results.csv')
files.download('neural_champ_results.csv')
