# Research Question

In which starting lineup was Kobe Bryant most successful? We define a new lineup as any change in which one or more of the team's five starters are swapped out. We quantify success as Kobe Bryant's average points scored per minute (PPM) with a given lineup.

# Hypotheses

1. Kobe will perform better with lineups he has more experience with (the lineups he started the most games with).

# Datasets

- Dataset Name: General Stats of Kobe Bryant's Career
- Links to the dataset:
  1. https://www.basketball-reference.com/players/b/bryanko01.html
  2. https://www.basketball-reference.com/players/b/bryanko01/gamelog/1997/
  3. https://www.basketball-reference.com/boxscores/199611010LAL.html
- Number of observations: 1198

Using these links, I scraped every game that Kobe started (was part of the starting lineup) by following the links from his player page to each season's game log, and from each game log to the individual box scores. The full list of his games runs to around 1500, but after cleaning the data and keeping only the games he started, just under 1200 remain.

# Setup

In [239]:
# Imports 
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import copy

import seaborn as sns
sns.set()
sns.set_context('talk')

import warnings
warnings.filterwarnings('ignore')

import patsy
import statsmodels.api as sm
import scipy.stats as stats
from scipy.stats import ttest_ind, chisquare, normaltest

import requests
import bs4
from bs4 import BeautifulSoup
import re

# Data Gathering and Cleaning

In [248]:
def convert_to_PPM(points, minutes): 
    
    """Convert a 'MM:SS' minutes string to decimal minutes and return Kobe's points per minute for a game.

    Parameters:
    points (string): Points Kobe scored in the game
    minutes (string): Time Kobe played in the game, formatted as 'MM:SS'
    
    Returns:
    float: Points per minute Kobe scored in the game

    """
    
    indexN = minutes.index(':')
    
    minute = minutes[0:indexN]
    second = minutes[indexN+1:]
    
    seconds = int(minute)*60 + int(second)
    floatMinutes = seconds/60
    
    return float(points)/floatMinutes
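
As a quick check of the points-per-minute metric, here is a minimal sketch using Kobe's row from the first box score in this notebook (12 points in 31:45, so 12 / 31.75 ≈ 0.378). The function name `points_per_minute` and the choice to return NaN for zero playing time are assumptions made for illustration; they are not part of the original function.

```python
import math

def points_per_minute(points, minutes_played):
    """Hypothetical variant of convert_to_PPM that tolerates zero playing time."""
    minute_part, second_part = minutes_played.split(":")
    total_minutes = int(minute_part) + int(second_part) / 60
    if total_minutes == 0:
        # Assumption: treat zero playing time as missing data instead of raising
        return math.nan
    return float(points) / total_minutes

# Kobe's row in the first box score: 12 points in 31 minutes 45 seconds
ppm = points_per_minute(12, "31:45")
print(round(ppm, 4))
```

Returning NaN keeps a zero-minute game from raising `ZeroDivisionError`, at the cost of having to drop or ignore NaN values before averaging.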

In [4]:
#Retrieving the page for the table with all of Kobe's games listed by year
site = "https://www.basketball-reference.com/players/b/bryanko01.html"
page = requests.get(site)
soup = BeautifulSoup(page.content, "html.parser")

#Scraping the page to get all of the links to the games Kobe's played by year

table = soup.find_all(href=re.compile("/players/b/bryanko01/gamelog/"))

links = []

for html in table:
    links.append("https://www.basketball-reference.com" + html['href'])
    
#Truncating the list as duplicate links were listed
del links[20:] 

In [5]:
#Retrieving every game Kobe has played based on the links previously retrieved
game_links = []

for year in links:
    tempPage = requests.get(year)
    tempSoup = BeautifulSoup(tempPage.content, "html.parser")
    
    tempTable = tempSoup.find_all(href=re.compile("/boxscores/"))
    
    for tempHtml in tempTable:
        game_links.append("https://www.basketball-reference.com" + tempHtml['href'])

In [6]:
#Cleaning the links that have no data
game_links[:] = [link for link in game_links if "https://www.basketball-reference.com/boxscores/" != link]

In [73]:
#Retrieving the correct team info as both teams show up on the website
games = []

#Getting the BeautifulSoup Table, converting to Dataframe, and putting it in a list
for game in game_links:
    tempPage = requests.get(game)
    tempSoup = BeautifulSoup(tempPage.content, "html.parser")
    
    for caption in tempSoup.find_all('caption'):
        if 'Los Angeles Lakers' in caption.get_text():
            tempGame = caption.find_parent('table')
            gameDF = pd.read_html(str(tempGame))[0]
        
            #Dropping all rows that do not contain starters
            gameDF = gameDF.drop(gameDF.index[5:])
        
            #Correcting the column names
            gameDF.columns=['Starters', 'MP', 'FG', 'FGA', 'FG%', '3P', '3PA', '3P%', 'FT', 'FTA', 'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS', '+/-']
        
            #Adding the table to the list if Kobe started in the game
            if(gameDF["Starters"].str.contains("Kobe Bryant").any()):
                games.append(gameDF)
            break

In [74]:
games[0]

Unnamed: 0,Starters,MP,FG,FGA,FG%,3P,3PA,3P%,FT,FTA,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,+/-
0,Shaquille O'Neal,36:45,13,23,0.565,0,0,,5,10,...,4,6,10,3,0,2,1,3,31,13
1,Nick Van Exel,32:59,8,13,0.615,3,6,0.5,0,0,...,1,4,5,6,4,0,3,0,19,11
2,Kobe Bryant,31:45,5,11,0.455,0,3,0.0,2,3,...,1,2,3,2,2,1,3,1,12,10
3,Eddie Jones,30:13,3,11,0.273,0,4,0.0,0,0,...,0,3,3,2,3,0,2,2,6,1
4,Elden Campbell,17:56,2,4,0.5,0,0,,0,0,...,4,3,7,1,0,1,1,2,4,3


In [75]:
#Making a list of lists of lineups 
lineups = []

for game in games:
    lineup = []
    
    game.set_index("Starters", inplace=True)
    lineup.append(game.index.tolist())
    lineup.append(convert_to_PPM(game.loc["Kobe Bryant"].PTS, game.loc["Kobe Bryant"].MP))
    
    lineups.append(lineup)

In [174]:
#Sorting each lineup alphabetically, building a list of unique lineups, and summing Kobe's PPM and the game count for each unique lineup
lineups[0].append(1)
lineups[0][0].sort()
uniqueLineups = [lineups[0]]


for lineup in lineups[1:]:
    
    #lineups[0] already seeds uniqueLineups, so iteration starts at the second game

    lineup[0].sort()
    count = 0
    
    for players in uniqueLineups:
        count += 1
        
        if(lineup[0] == players[0]):
            players[1] += lineup[1]
            players[2] += 1
            break

        if(count == len(uniqueLineups)):
            newlineup = []
            newlineup.append(lineup[0])
            newlineup.append(lineup[1])
            newlineup.append(1)
            uniqueLineups.append(newlineup)
            break
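
The nested loop above compares every game against every known lineup. A minimal alternative sketch, assuming the same (starters, PPM) pairs as input, keys a dictionary on the sorted tuple of starter names so each game is handled in one lookup. The function name `aggregate_lineups` and the sample records are made up for illustration and are not the notebook's data.

```python
from collections import defaultdict

def aggregate_lineups(game_records):
    """Sum PPM and count games per unique starting lineup.

    game_records: iterable of (list_of_starter_names, ppm) pairs.
    Returns a dict mapping a sorted tuple of names to [total_ppm, games].
    """
    totals = defaultdict(lambda: [0.0, 0])
    for starters, ppm in game_records:
        key = tuple(sorted(starters))  # order-independent and hashable
        totals[key][0] += ppm
        totals[key][1] += 1
    return dict(totals)

# Made-up sample records, for illustration only
sample = [
    (["B", "A", "Kobe Bryant"], 0.5),
    (["A", "Kobe Bryant", "B"], 0.75),
    (["C", "A", "Kobe Bryant"], 0.25),
]
result = aggregate_lineups(sample)
for names, (total_ppm, games) in result.items():
    print(names, total_ppm, games)
```

Sorting before building the key means the same five starters always map to the same entry regardless of the order in which the box score lists them.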

# Data Analysis and Results

In [189]:
mostCommonLineup = uniqueLineups[0]

#Using "candidate" as the loop variable so the lineups list built earlier is not overwritten
for candidate in uniqueLineups:
    if(candidate[2] > mostCommonLineup[2]):
        mostCommonLineup = candidate
        
print(mostCommonLineup)

[['Andrew Bynum', 'Derek Fisher', 'Kobe Bryant', 'Metta World Peace', 'Pau Gasol'], 80.25794852971109, 112]


As the output shows, uniqueLineups is a list with one entry per unique starting lineup Kobe Bryant played in. Each entry has three elements: the sorted list of starter names, the sum of Kobe's per-game points per minute with that lineup, and the number of games he started with that lineup. Next, I will do some exploratory analysis on the collected data using dataframes.

In [242]:
dfLineups = pd.DataFrame(uniqueLineups, columns=["Starters", "Total Kobe PPM", "Times Played"])
dfLineups.set_index("Starters", inplace=True)

To compare Kobe's PPM fairly across lineups, I divided his total PPM with each lineup by the number of games he played with that lineup.

In [249]:
#Averaging Kobe's PPM based on how many times he played with the team
dfLineups["Kobe PPM Average With Lineup"] = dfLineups["Total Kobe PPM"]/dfLineups["Times Played"]
dfLineups.head()

Starters,Total Kobe PPM,Times Played,Kobe PPM Average With Lineup
"[Kobe Bryant, Kwame Brown, Lamar Odom, Sasha Vujačić, Smush Parker]",1.300867,1,1.300867
"[Jordan Farmar, Kobe Bryant, Kwame Brown, Lamar Odom, Luke Walton]",2.17697,2,1.088485
"[Chris Duhon, Dwight Howard, Jordan Hill, Kobe Bryant, Metta World Peace]",1.031519,1,1.031519
"[Derek Fisher, Didier Ilunga-Mbenga, Kobe Bryant, Lamar Odom, Metta World Peace]",2.049285,2,1.024643
"[Chris Mihm, Chucky Atkins, Jumaine Jones, Kobe Bryant, Sasha Vujačić]",0.996857,1,0.996857


In [250]:
dfLineups.sort_values("Times Played", ascending=False)

Starters,Total Kobe PPM,Times Played,Kobe PPM Average With Lineup
"[Andrew Bynum, Derek Fisher, Kobe Bryant, Metta World Peace, Pau Gasol]",80.257949,112,0.716589
"[A.C. Green, Glen Rice, Kobe Bryant, Ron Harper, Shaquille O'Neal]",32.155771,54,0.595477
"[Caron Butler, Chris Mihm, Chucky Atkins, Kobe Bryant, Lamar Odom]",29.943371,46,0.650943
"[Derek Fisher, Kobe Bryant, Rick Fox, Samaki Walker, Shaquille O'Neal]",32.472603,45,0.721613
"[Derek Fisher, Kobe Bryant, Lamar Odom, Metta World Peace, Pau Gasol]",32.109344,44,0.729758
...,...,...,...
"[Derek Fisher, Devean George, Kobe Bryant, Robert Horry, Soumaila Samake]",0.631579,1,0.631579
"[Ed Davis, Jeremy Lin, Jordan Hill, Kobe Bryant, Wesley Johnson]",0.615970,1,0.615970
"[Chris Mihm, Chucky Atkins, Jumaine Jones, Kobe Bryant, Luke Walton]",0.589549,1,0.589549
"[Kobe Bryant, Kwame Brown, Lamar Odom, Ronny Turiaf, Smush Parker]",0.588697,1,0.588697


In [251]:
dfLineups.sort_values("Kobe PPM Average With Lineup", ascending=False, inplace=True)
dfLineups

Starters,Total Kobe PPM,Times Played,Kobe PPM Average With Lineup
"[Kobe Bryant, Kwame Brown, Lamar Odom, Sasha Vujačić, Smush Parker]",1.300867,1,1.300867
"[Jordan Farmar, Kobe Bryant, Kwame Brown, Lamar Odom, Luke Walton]",2.176970,2,1.088485
"[Chris Duhon, Dwight Howard, Jordan Hill, Kobe Bryant, Metta World Peace]",1.031519,1,1.031519
"[Derek Fisher, Didier Ilunga-Mbenga, Kobe Bryant, Lamar Odom, Metta World Peace]",2.049285,2,1.024643
"[Chris Mihm, Chucky Atkins, Jumaine Jones, Kobe Bryant, Sasha Vujačić]",0.996857,1,0.996857
...,...,...,...
"[Derek Harper, Kobe Bryant, Robert Horry, Shaquille O'Neal, Travis Knight]",0.327273,1,0.327273
"[Kobe Bryant, Pau Gasol, Robert Sacre, Steve Blake, Wesley Johnson]",0.322196,1,0.322196
"[A.C. Green, Derek Fisher, Glen Rice, Kobe Bryant, Shaquille O'Neal]",0.627978,2,0.313989
"[Eddie Jones, Kobe Bryant, Nick Van Exel, Robert Horry, Shaquille O'Neal]",0.250149,1,0.250149


Lineups that Kobe played with only once misrepresent how the lineup affected his performance: a single unusually good or bad game determines the whole average, which is why one-game lineups sit at both the top and the bottom of the table above. To reduce this effect, I will show only the lineups he played with at least 3 times. I chose 3 because even at 2 games played, many averages were still unrepresentative. Applying this filter also halved the amount of data, which could be a good or a bad thing.

In [252]:
dfLineups[dfLineups["Times Played"] > 2]

Starters,Total Kobe PPM,Times Played,Kobe PPM Average With Lineup
"[Kobe Bryant, Kwame Brown, Lamar Odom, Luke Walton, Smush Parker]",18.018037,19,0.948318
"[Chris Mihm, Kobe Bryant, Kwame Brown, Lamar Odom, Smush Parker]",20.395699,22,0.927077
"[Andrew Bynum, Kobe Bryant, Lamar Odom, Maurice Evans, Smush Parker]",2.734621,3,0.911540
"[Caron Butler, Chris Mihm, Chucky Atkins, Kobe Bryant, Stanislav Medvedenko]",2.652106,3,0.884035
"[Derek Fisher, Kobe Bryant, Rick Fox, Robert Horry, Samaki Walker]",7.741503,9,0.860167
...,...,...,...
"[Dwight Howard, Jodie Meeks, Kobe Bryant, Pau Gasol, Steve Blake]",1.581101,3,0.527034
"[Derek Harper, Eddie Jones, Kobe Bryant, Shaquille O'Neal, Travis Knight]",5.643433,11,0.513039
"[Corie Blount, Derek Fisher, Eddie Jones, Kobe Bryant, Shaquille O'Neal]",1.460556,3,0.486852
"[Derek Fisher, Glen Rice, J.R. Reid, Kobe Bryant, Shaquille O'Neal]",4.545713,10,0.454571
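
The choice of a 3-game cutoff can be checked by counting how many lineups survive at each candidate threshold. The sketch below is a minimal illustration in plain Python; `lineups_kept` is a hypothetical helper and `sample_counts` is made-up data, not the notebook's results. With the real data, the input could be `dfLineups["Times Played"].tolist()`.

```python
def lineups_kept(times_played, min_games):
    """Count lineups that were started together at least min_games times."""
    return sum(1 for n in times_played if n >= min_games)

# Made-up games-played counts, for illustration only
sample_counts = [112, 54, 3, 2, 2, 1, 1, 1]

# Show how many lineups remain at each candidate cutoff
for cutoff in (1, 2, 3):
    print(cutoff, lineups_kept(sample_counts, cutoff))
```

Comparing the counts side by side makes the trade-off explicit: a higher cutoff removes noisy one-game averages but also discards more lineups.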
