# Thought Process for Adding Current NBA Players with Strong Resumes & Younger Players on a HOF Trajectory to the HOF Dataset
When building a model to predict the likelihood of an NBA player making it to the Hall of Fame (HOF), it's essential to include a balanced representation of both historical and current players, along with rising stars who have the potential to reach the Hall of Fame. This allows the model to capture trends over time and recognize the evolving nature of player performance and HOF eligibility. Here's how we can combine current NBA players with strong resumes and younger players on trajectory into the HOF dataset:

## Why Include Current NBA Players with Strong Resumes & Younger Players on Trajectory?
- Reflecting Modern Player Performance: Current NBA players with strong resumes, such as Damian Lillard, Paul George, and Jimmy Butler, are active players who have already made significant contributions to the game. Their performance, accolades, and longevity can offer valuable insights into the probability of reaching the HOF.

- Predicting Future HOF Inductees: Including strong active players along with younger players who are on the trajectory to the Hall of Fame helps the model identify patterns and criteria that may not be present in historical data. For example, the modern emphasis on 3-point shooting or advanced stats like PER (Player Efficiency Rating) could play a larger role in future induction considerations.

- Capturing the Trajectory of Rising Stars: Young players like Luka Dončić, Jayson Tatum, and others with promising careers provide valuable data on how current performance may lead to future Hall of Fame status. This category helps ensure the model takes into account players who are still early in their careers but have high potential.

- Comparing Active and Retired Players: Comparing current players, including those on the trajectory to the Hall of Fame, to retired players in the dataset allows the model to evaluate what separates those who made the HOF from those who didn’t. This comparison is crucial for understanding evolving trends in the NBA and identifying potential future inductees.

## Method of Integration
- Strong Active Players & Future HOF Candidates Identification: Select players who are currently active in the NBA and have a track record of significant accomplishments such as multiple All-Star selections, All-NBA team appearances, or championships, along with younger players who are on the trajectory to the Hall of Fame due to their performance and potential.

- Maintaining Dataset Balance: While it's important to include both strong current players and young players with high potential, the dataset needs to remain balanced with historical HOF inductees and non-HOF players (e.g., retired borderline candidates, role players, and below-average players). This balance ensures the model doesn’t overfit to current player trends.

- Feature Representation: For each player, we collect a variety of features, such as career averages, individual awards, championships, and advanced metrics (e.g., PER, win shares). By including both strong current players and younger players, we ensure that the features reflect both modern and historical achievements, as well as potential future success.

By combining historical players, strong current players, and rising stars, the dataset gives the model a broader basis for identifying the key predictors of HOF induction.
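
One way to check the balance described above is to count how many rows fall into each group and each label before training. The following is a minimal sketch, not the notebook's own code: the `group` field, its values, and the sample rows are hypothetical, and only the `HOF` column name comes from the dataset built below.

```python
from collections import Counter

# Hypothetical sample rows; 'group' is an illustrative field, not a scraped column.
players = [
    {"Name": "Retired inductee A", "group": "historical", "HOF": 1},
    {"Name": "Retired inductee B", "group": "historical", "HOF": 1},
    {"Name": "Retired role player", "group": "historical", "HOF": 0},
    {"Name": "Active veteran", "group": "current", "HOF": 0},
    {"Name": "Rising star", "group": "rising", "HOF": 0},
]

def share_by(rows, key):
    """Return the fraction of rows for each distinct value of `key`."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

group_share = share_by(players, "group")  # share of rows per player group
label_share = share_by(players, "HOF")    # share of rows per HOF label
print(group_share)
print(label_share)
```

With a pandas DataFrame, `df['HOF'].value_counts(normalize=True)` gives the same label shares.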

## Examples
- Damian Lillard – Multiple-time All-Star, All-NBA selections, prolific scorer
- Paul George – Multiple-time All-Star, All-NBA selections, strong two-way player
- Jimmy Butler – Multiple-time All-Star, Finals appearance, strong playoff performer
- Kawhi Leonard – Two-time NBA Champion, multiple-time Finals MVP, multiple-time Defensive Player of the Year
- James Harden – MVP, multiple-time All-Star, All-NBA selections, multiple-time scoring champion
- Anthony Davis – NBA Champion, multiple-time All-Star, multiple-time All-Defensive selection
- Devin Booker – Multiple-time All-Star, NBA Finals appearance, rising star
- Donovan Mitchell – Multiple-time All-Star, high-scoring guard, playoff performer
- Jayson Tatum – Multiple-time All-Star, NBA Finals appearance, rising young star
- Luka Dončić – Multiple-time All-Star, MVP candidate, young superstar

**Additionally**, we include active players who are near-certain future HOF inductees, such as:

- LeBron James
- Stephen Curry
- Kevin Durant
- Nikola Jokic
- Etc

In [1]:
#Import libraries
import pandas as pd
from bs4 import BeautifulSoup
import requests
import time 

In [None]:
#Scrape career stats for players

#Player URLs from Basketball Reference
player_urls = [
    "https://www.basketball-reference.com/players/j/jamesle01.html", #LeBron James
    "https://www.basketball-reference.com/players/c/curryst01.html", #Steph Curry
    "https://www.basketball-reference.com/players/d/duranke01.html", #Kevin Durant
    "https://www.basketball-reference.com/players/l/leonaka01.html", #Kawhi Leonard
    "https://www.basketball-reference.com/players/h/hardeja01.html", #James Harden
    "https://www.basketball-reference.com/players/w/westbru01.html", #Russell Westbrook
    "https://www.basketball-reference.com/players/a/antetgi01.html", #Giannis Antetokounmpo
    "https://www.basketball-reference.com/players/l/lillada01.html", #Damian Lillard
    "https://www.basketball-reference.com/players/d/davisan02.html", #Anthony Davis
    "https://www.basketball-reference.com/players/b/butleji01.html", #Jimmy Butler
    "https://www.basketball-reference.com/players/g/georgpa01.html", #Paul George
    "https://www.basketball-reference.com/players/t/thompkl01.html", #Klay Thompson
    "https://www.basketball-reference.com/players/b/bookede01.html", #Devin Booker
    "https://www.basketball-reference.com/players/t/tatumja01.html", #Jayson Tatum
    "https://www.basketball-reference.com/players/d/doncilu01.html", #Luka Doncic
    "https://www.basketball-reference.com/players/g/greendr01.html", #Draymond Green
    "https://www.basketball-reference.com/players/h/holidjr01.html", #Jrue Holiday
    "https://www.basketball-reference.com/players/h/horfoal01.html", #Al Horford
    "https://www.basketball-reference.com/players/j/jokicni01.html", #Nikola Jokic
    "https://www.basketball-reference.com/players/e/embiijo01.html", #Joel Embiid
    "https://www.basketball-reference.com/players/p/paulch01.html", #Chris Paul
    "https://www.basketball-reference.com/players/i/irvinky01.html", #Kyrie Irving
    "https://www.basketball-reference.com/players/g/goberru01.html", #Rudy Gobert
    "https://www.basketball-reference.com/players/d/derozde01.html", #DeMar DeRozan
    "https://www.basketball-reference.com/players/l/lowryky01.html", #Kyle Lowry
    "https://www.basketball-reference.com/players/l/loveke01.html", #Kevin Love
    "https://www.basketball-reference.com/players/b/bealbr01.html", #Bradley Beal
    "https://www.basketball-reference.com/players/l/lavinza01.html", #Zach LaVine
    "https://www.basketball-reference.com/players/t/townska01.html", #Karl-Anthony Towns
    "https://www.basketball-reference.com/players/y/youngtr01.html", #Trae Young
    "https://www.basketball-reference.com/players/m/mitchdo01.html", #Donovan Mitchell
    "https://www.basketball-reference.com/players/s/sabondo01.html", #Domantas Sabonis
    "https://www.basketball-reference.com/players/r/randlju01.html", #Julius Randle
    "https://www.basketball-reference.com/players/m/murraja01.html", #Jamal Murray
    "https://www.basketball-reference.com/players/f/foxde01.html", #De'Aaron Fox
    "https://www.basketball-reference.com/players/b/brownja02.html", #Jaylen Brown
    "https://www.basketball-reference.com/players/s/siakapa01.html" #Pascal Siakam
]

#Headers for request 
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}

#Initialize empty list to store player data
all_player_data = []

#Begin for loop to iterate through each player link
for url in player_urls:
        #Initialize response
        response = requests.get(url, headers=headers)
        #Start scraping
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')

            try:
                #Get player name
                name = soup.find('h1').find('span').text
                
                #Get Career Length
                career_length = int(soup.find('strong', string= lambda s: s and 'Experience:' in s).next_sibling.text.strip().split()[0])

                #Get Position and use if-elif-else statement to assign position abbreviations
                position = soup.find('strong', string= lambda s: s and 'Position:' in s).next_sibling.text.strip().split(',')[0]
                if 'Point Guard' in position:
                    position = 'PG'
                elif 'Shooting Guard' in position:
                    position = 'SG'
                elif 'Small Forward' in position:
                    position = 'SF'
                elif 'Power Forward' in position:
                    position = 'PF'
                elif 'Center' in position:
                    position = 'C'
                else:
                    position = None
                
                #Get stats such as games played, ppg, rpg, apg, etc
                def stat_finder(word): #Define function 
                    element = soup.find('span', {'data-tip': word})
                    if element:
                        element = element.find_next('p').find_next('p').text.strip()
                        return element
                    else:
                        return 0.0
                games = int(stat_finder('Games'))
                ppg = float(stat_finder('Points'))
                rpg = float(stat_finder('Total Rebounds'))
                apg = float(stat_finder('Assists'))
                fg = float(stat_finder('Field Goal Percentage'))
                ft = float(stat_finder('Free Throw Percentage'))
                per = float(stat_finder(lambda x: x and 'Player Efficiency Rating' in x))
                win_shares = float(stat_finder(lambda x: x and 'Win Shares' in x))

                #Extract awards: mvps, all stars, all nba, etc
                all_stars_find = soup.find('li', {'class':'all_star'})
                if all_stars_find:
                    all_stars = int(all_stars_find.text.strip().split('x')[0])
                else: 
                    all_stars = 0
                all_nba_find = soup.find('li', string= lambda s: s and 'All-NBA' in s)
                if all_nba_find:
                    all_nba_text = all_nba_find.text
                    if 'x' in all_nba_text:
                        all_nba = int(all_nba_text.strip().split('x')[0])
                    else:
                        all_nba = 1
                else:
                    all_nba = 0
                all_rookie_find = soup.find('li', string = lambda s: s and 'All-Rookie' in s)
                if all_rookie_find:
                    all_rookie = 1
                else:
                    all_rookie = 0
                roy = soup.find('li', {'data-tip': lambda x: x and 'ROY' in x})
                if roy:
                    roy = 1
                else:
                    roy = 0
                all_defensive = soup.find('a', string= lambda s: s and 'All-Defensive' in s)
                if all_defensive:
                    all_defensive_text = all_defensive.text
                    if 'x' in all_defensive_text:
                        all_defensive = int(all_defensive_text.strip().split('x')[0])
                    else:
                        all_defensive = 1
                else:
                    all_defensive = 0
                mvp_find = soup.find('li', {'data-tip': lambda x: x and 'MVP' in x and 'AS MVP' not in x and 'IST MVP' not in x and 'Finals MVP' not in x and 'ECF MVP' not in x and 'WCF MVP' not in x})
                if mvp_find:
                    mvp_text = mvp_find.text
                    if 'x' in mvp_text:
                        mvps = int(mvp_text.strip().split('x')[0])
                    else:
                        mvps = 1
                else:
                    mvps = 0
                chip_find = soup.find('a', string= lambda s: s and 'NBA Champ' in s) #NBA Championship finder
                if chip_find:
                    chip_text = chip_find.text
                    if 'x' in chip_text:
                        chips = int(chip_text.strip().split('x')[0])
                    else:
                        chips = 1
                else:
                    chips = 0
                sc_find = soup.find('a', string= lambda s: s and 'Scoring Champ' in s) #Scoring champ finder
                if sc_find:
                    sc_text = sc_find.text
                    if 'x' in sc_text:
                        sc = int(sc_text.strip().split('x')[0])
                    else:
                        sc = 1
                else:
                    sc = 0
                dpoy_find = soup.find('a', string= lambda s: s and 'Def. POY' in s) #Defensive Player of the Year Finder
                if dpoy_find:
                    dpoy_text = dpoy_find.text
                    if 'x' in dpoy_text:
                        dpoy = int(dpoy_text.strip().split('x')[0])
                    else:
                        dpoy = 1
                else:
                    dpoy = 0
                #Not inducted in the HOF yet
                hof = 0

                #Store the data
                player_data = {
                        'Name': name,
                        'Position': position,
                        'Games': games,
                        'Career Length': career_length,
                        'PPG': ppg,
                        'RPG': rpg,
                        'APG': apg,
                        'PER': per,
                        'FG%': fg,
                        'FT%': ft,
                        'Win Shares': win_shares,
                        'All-Stars': all_stars,
                        'All-NBA': all_nba,
                        'All-Defense': all_defensive,
                        'All-Rookie Team': all_rookie,
                        'MVPs': mvps,
                        'Chips': chips,
                        'ROY': roy,
                        'DPOYs': dpoy,
                        'Scoring Champ': sc,
                        'HOF': hof
                    }
                #Append player data to list
                all_player_data.append(player_data)
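            except Exception as e:
                print(f'Error scraping data for {url}: {e}')
        else:
            #Report pages that were not fetched (e.g. 429 Too Many Requests) instead of skipping them silently
            print(f'Request failed for {url}: status {response.status_code}')

        #Pause after every request, including failed ones, to avoid overwhelming the server
        time.sleep(1)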

#Create dataframe of player data
df_one = pd.DataFrame(all_player_data)
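
The award blocks in the cell above all repeat the same parsing pattern: an entry such as `4x MVP` carries a count before the `x`, while an entry without a multiplier means a single award. A small helper could capture that pattern once. This is a sketch under that assumption about the text format; `award_count` is a hypothetical name, not part of the notebook.

```python
import re

def award_count(text):
    """Parse the count from an award label.

    '4x MVP' -> 4, 'Def. POY' -> 1, None or '' -> 0.
    Assumes a leading '<number>x' marks repeated awards.
    """
    if not text:
        return 0
    match = re.match(r"\s*(\d+)x", text)
    return int(match.group(1)) if match else 1

print(award_count("4x MVP"))    # count taken from the multiplier
print(award_count("Def. POY"))  # no multiplier, so a single award
print(award_count(None))        # element not found on the page
```

In the scraper, a call such as `mvps = award_count(mvp_find.text if mvp_find else None)` would stand in for one nested `if`/`else` block. Matching only a leading number also avoids misreading an `x` that appears inside a word.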

In [4]:
df_one

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
0,LeBron James,SF,1540,21,27.0,7.5,7.4,27.0,50.6,73.6,...,21,20,6,1,4,4,1,0,1,0
1,Stephen Curry,PG,1002,15,24.7,4.7,6.4,23.4,47.1,91.0,...,11,10,0,1,2,4,0,0,2,0
2,Kevin Durant,SF,1102,16,27.3,7.0,4.4,24.9,50.2,88.2,...,15,11,0,1,1,2,1,0,4,0
3,Kawhi Leonard,SF,711,12,19.9,6.3,3.0,23.3,49.8,86.1,...,6,6,7,1,0,2,0,2,0,0
4,James Harden,PG,1123,15,24.0,5.6,7.2,23.7,43.9,86.2,...,11,7,0,1,1,0,0,0,3,0
5,Russell Westbrook,PG,1210,16,21.4,7.1,8.1,21.8,43.9,77.3,...,9,9,0,1,1,0,0,0,2,0
6,Giannis Antetokounmpo,PF,833,11,23.8,9.9,4.9,25.6,54.9,69.4,...,9,8,5,1,2,1,0,1,0,0
7,Damian Lillard,PG,886,12,25.2,4.3,6.8,22.3,43.9,89.9,...,9,7,0,1,0,0,1,0,0,0
8,Anthony Davis,PF,779,12,24.2,10.7,2.5,26.9,52.3,79.5,...,10,5,5,1,0,1,0,0,0,0
9,Jimmy Butler,SF,843,13,18.3,5.3,4.3,21.7,47.2,84.2,...,6,5,5,0,0,0,0,0,0,0


In [5]:
print(response.status_code)

200


In [None]:
#Scrape career stats for players

#Player URLs from Basketball Reference
player_urls = [
    "https://www.basketball-reference.com/players/m/moranja01.html", #Ja Morant
    "https://www.basketball-reference.com/players/g/gilgesh01.html", #Shai Gilgeous-Alexander
    "https://www.basketball-reference.com/players/b/ballla01.html", #LaMelo Ball
    "https://www.basketball-reference.com/players/e/edwaran01.html", #Anthony Edwards
    "https://www.basketball-reference.com/players/c/cunnica01.html", #Cade Cunningham
    "https://www.basketball-reference.com/players/l/lopezbr01.html", #Brook Lopez
    "https://www.basketball-reference.com/players/m/middlkh01.html", #Khris Middleton
    "https://www.basketball-reference.com/players/v/vanvlfr01.html", #Fred VanVleet
    "https://www.basketball-reference.com/players/i/ingrabr01.html", #Brandon Ingram
    "https://www.basketball-reference.com/players/p/porzikr01.html", #Kristaps Porziņģis
    "https://www.basketball-reference.com/players/m/mccolcj01.html", #CJ McCollum
    "https://www.basketball-reference.com/players/v/vucevni01.html", #Nikola Vučević
    "https://www.basketball-reference.com/players/w/whitede01.html", #Derrick White
    "https://www.basketball-reference.com/players/a/adebaba01.html", #Bam Adebayo
    "https://www.basketball-reference.com/players/d/drumman01.html", #Andre Drummond
    "https://www.basketball-reference.com/players/c/conlemi01.html", #Mike Conley
    "https://www.basketball-reference.com/players/h/haywago01.html", #Gordon Hayward
    "https://www.basketball-reference.com/players/a/aytonde01.html", #Deandre Ayton
    "https://www.basketball-reference.com/players/j/jordade01.html", #DeAndre Jordan
    "https://www.basketball-reference.com/players/s/smartma01.html", #Marcus Smart
    "https://www.basketball-reference.com/players/b/banchpa01.html", #Paolo Banchero
    "https://www.basketball-reference.com/players/j/jacksja02.html", #Jaren Jackson
    "https://www.basketball-reference.com/players/m/maxeyty01.html", #Tyrese Maxey
    "https://www.basketball-reference.com/players/w/wembavi01.html", #Victor Wembanyama
    "https://www.basketball-reference.com/players/h/halibty01.html", #Tyrese Haliburton
    "https://www.basketball-reference.com/players/b/brunsja01.html", #Jalen Brunson
    "https://www.basketball-reference.com/players/w/wagnefr01.html", #Franz Wagner
    "https://www.basketball-reference.com/players/h/holmgch01.html", #Chet Holmgren
    "https://www.basketball-reference.com/players/s/sengual01.html", #Alperen Sengun
    "https://www.basketball-reference.com/players/b/barnesc01.html", #Scottie Barnes
    "https://www.basketball-reference.com/players/w/willizi01.html", #Zion Williamson
    "https://www.basketball-reference.com/players/m/mobleev01.html", #Evan Mobley
    "https://www.basketball-reference.com/players/w/walkeke02.html", #Kemba Walker
    "https://www.basketball-reference.com/players/t/thomais02.html", #Isaiah Thomas
    "https://www.basketball-reference.com/players/r/rondora01.html", #Rajon Rondo (should have been in my borderline dataset, but I forgot)
    "https://www.basketball-reference.com/players/d/dragigo01.html", #Goran Dragic
    "https://www.basketball-reference.com/players/m/millspa01.html", #Paul Millsap (could also be considered a role player)
    "https://www.basketball-reference.com/players/g/gordoer01.html", #Eric Gordon (could also be considered a role player)
    "https://www.basketball-reference.com/players/i/ibakase01.html", #Serge Ibaka
    "https://www.basketball-reference.com/players/w/willija06.html" #Jalen Williams
]

#Headers for request 
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}

#Initialize empty list to store player data
all_player_data = []

#Begin for loop to iterate through each player link
for url in player_urls:
        #Initialize response
        response = requests.get(url, headers=headers)
        #Start scraping
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')

            try:
                #Get player name
                name = soup.find('h1').find('span').text
                
                #Get Career Length
                career_length = int(soup.find('strong', string= lambda s: s and ('Experience:' in s or 'Career Length:' in s)).next_sibling.text.strip().split()[0])

                #Get Position and use if-elif-else statement to assign position abbreviations
                position = soup.find('strong', string= lambda s: s and 'Position:' in s).next_sibling.text.strip().split(',')[0]
                if 'Point Guard' in position:
                    position = 'PG'
                elif 'Shooting Guard' in position:
                    position = 'SG'
                elif 'Small Forward' in position:
                    position = 'SF'
                elif 'Power Forward' in position:
                    position = 'PF'
                elif 'Center' in position:
                    position = 'C'
                else:
                    position = None
                
                #Get stats such as games played, ppg, rpg, apg, etc
                def stat_finder(word): #Define function 
                    element = soup.find('span', {'data-tip': word})
                    if element:
                        element = element.find_next('p').find_next('p').text.strip()
                        return element
                    else:
                        return 0.0
                games = int(stat_finder('Games'))
                ppg = float(stat_finder('Points'))
                rpg = float(stat_finder('Total Rebounds'))
                apg = float(stat_finder('Assists'))
                fg = float(stat_finder('Field Goal Percentage'))
                ft = float(stat_finder('Free Throw Percentage'))
                per = float(stat_finder(lambda x: x and 'Player Efficiency Rating' in x))
                win_shares = float(stat_finder(lambda x: x and 'Win Shares' in x))

                #Extract awards: mvps, all stars, all nba, etc
                all_stars_find = soup.find('li', {'class':'all_star'})
                if all_stars_find:
                    all_stars = int(all_stars_find.text.strip().split('x')[0])
                else: 
                    all_stars = 0
                all_nba_find = soup.find('li', string= lambda s: s and 'All-NBA' in s)
                if all_nba_find:
                    all_nba_text = all_nba_find.text
                    if 'x' in all_nba_text:
                        all_nba = int(all_nba_text.strip().split('x')[0])
                    else:
                        all_nba = 1
                else:
                    all_nba = 0
                all_rookie_find = soup.find('li', string = lambda s: s and 'All-Rookie' in s)
                if all_rookie_find:
                    all_rookie = 1
                else:
                    all_rookie = 0
                roy = soup.find('li', {'data-tip': lambda x: x and 'ROY' in x})
                if roy:
                    roy = 1
                else:
                    roy = 0
                all_defensive = soup.find('a', string= lambda s: s and 'All-Defensive' in s)
                if all_defensive:
                    all_defensive_text = all_defensive.text
                    if 'x' in all_defensive_text:
                        all_defensive = int(all_defensive_text.strip().split('x')[0])
                    else:
                        all_defensive = 1
                else:
                    all_defensive = 0
                mvp_find = soup.find('li', {'data-tip': lambda x: x and 'MVP' in x and 'AS MVP' not in x and 'IST MVP' not in x and 'Finals MVP' not in x and 'ECF MVP' not in x and 'WCF MVP' not in x})
                if mvp_find:
                    mvp_text = mvp_find.text
                    if 'x' in mvp_text:
                        mvps = int(mvp_text.strip().split('x')[0])
                    else:
                        mvps = 1
                else:
                    mvps = 0
                chip_find = soup.find('a', string= lambda s: s and 'NBA Champ' in s) #NBA Championship finder
                if chip_find:
                    chip_text = chip_find.text
                    if 'x' in chip_text:
                        chips = int(chip_text.strip().split('x')[0])
                    else:
                        chips = 1
                else:
                    chips = 0
                sc_find = soup.find('a', string= lambda s: s and 'Scoring Champ' in s) #Scoring champ finder
                if sc_find:
                    sc_text = sc_find.text
                    if 'x' in sc_text:
                        sc = int(sc_text.strip().split('x')[0])
                    else:
                        sc = 1
                else:
                    sc = 0
                dpoy_find = soup.find('a', string= lambda s: s and 'Def. POY' in s) #Defensive Player of the Year Finder
                if dpoy_find:
                    dpoy_text = dpoy_find.text
                    if 'x' in dpoy_text:
                        dpoy = int(dpoy_text.strip().split('x')[0])
                    else:
                        dpoy = 1
                else:
                    dpoy = 0
                #Not inducted in the HOF yet
                hof = 0

                #Store the data
                player_data = {
                        'Name': name,
                        'Position': position,
                        'Games': games,
                        'Career Length': career_length,
                        'PPG': ppg,
                        'RPG': rpg,
                        'APG': apg,
                        'PER': per,
                        'FG%': fg,
                        'FT%': ft,
                        'Win Shares': win_shares,
                        'All-Stars': all_stars,
                        'All-NBA': all_nba,
                        'All-Defense': all_defensive,
                        'All-Rookie Team': all_rookie,
                        'MVPs': mvps,
                        'Chips': chips,
                        'ROY': roy,
                        'DPOYs': dpoy,
                        'Scoring Champ': sc,
                        'HOF': hof
                    }
                #Append player data to list
                all_player_data.append(player_data)
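            except Exception as e:
                print(f'Error scraping data for {url}: {e}')
        else:
            #Report pages that were not fetched (e.g. 429 Too Many Requests) instead of skipping them silently
            print(f'Request failed for {url}: status {response.status_code}')

        #Pause after every request, including failed ones, to avoid overwhelming the server
        time.sleep(1)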

#Create dataframe of player data
df_two = pd.DataFrame(all_player_data)
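
The position chain in the scraping cell can also be written as a lookup table, which keeps the full names and abbreviations in one place. A minimal sketch, assuming the page text contains one of the five full position names; `POSITION_ABBREVIATIONS` and `abbreviate_position` are hypothetical names, not part of the notebook.

```python
# Full position names in the same order the if/elif chain checks them.
POSITION_ABBREVIATIONS = {
    "Point Guard": "PG",
    "Shooting Guard": "SG",
    "Small Forward": "SF",
    "Power Forward": "PF",
    "Center": "C",
}

def abbreviate_position(text):
    """Return the abbreviation of the first table entry contained in `text`, else None.

    Entries are checked in table order, mirroring the if/elif chain.
    """
    for full_name, abbreviation in POSITION_ABBREVIATIONS.items():
        if full_name in text:
            return abbreviation
    return None

print(abbreviate_position("Small Forward and Power Forward"))
print(abbreviate_position("Unknown"))
```

Because dictionaries keep insertion order, a text that names two positions resolves to whichever appears earlier in the table, just as the original chain does.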

In [7]:
print(response.status_code)

429


In [9]:
df_two

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
0,Ja Morant,PG,289,5,22.3,4.8,7.4,20.2,46.9,76.1,...,2,1,0,1,0,0,1,0,0,0
1,Shai Gilgeous-Alexander,PG,439,6,24.0,4.8,5.0,23.1,50.0,85.9,...,3,2,0,1,0,0,0,0,0,0
2,LaMelo Ball,PG,217,4,21.1,6.1,7.4,19.2,42.5,83.2,...,1,0,0,1,0,0,1,0,0,0
3,Anthony Edwards,SG,356,4,23.6,5.3,4.2,17.4,44.5,80.1,...,3,1,0,1,0,0,0,0,0,0
4,Cade Cunningham,PG,188,3,21.4,5.4,7.2,16.4,43.9,85.5,...,1,0,0,1,0,0,0,0,0,0
5,Brook Lopez,C,1077,16,15.9,6.1,1.4,18.5,49.5,79.7,...,1,0,2,1,0,1,0,0,0,0
6,Khris Middleton,SG,762,12,16.7,4.8,4.0,16.8,46.1,87.9,...,3,0,0,0,0,1,0,0,0,0
7,Fred VanVleet,PG,535,8,15.0,3.4,5.7,16.0,40.3,86.5,...,1,0,0,0,0,1,0,0,0,0
8,Brandon Ingram,SF,495,8,19.5,5.2,4.3,16.1,46.8,78.8,...,1,0,0,1,0,0,0,0,0,0
9,Kristaps Porziņģis,PF,488,8,19.6,7.8,1.8,20.8,46.1,83.1,...,1,0,0,1,0,1,0,0,0,0


In [10]:
#Join together 
df = pd.concat([df_one,df_two], ignore_index=True)
#Display
df.head()

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
0,LeBron James,SF,1540,21,27.0,7.5,7.4,27.0,50.6,73.6,...,21,20,6,1,4,4,1,0,1,0
1,Stephen Curry,PG,1002,15,24.7,4.7,6.4,23.4,47.1,91.0,...,11,10,0,1,2,4,0,0,2,0
2,Kevin Durant,SF,1102,16,27.3,7.0,4.4,24.9,50.2,88.2,...,15,11,0,1,1,2,1,0,4,0
3,Kawhi Leonard,SF,711,12,19.9,6.3,3.0,23.3,49.8,86.1,...,6,6,7,1,0,2,0,2,0,0
4,James Harden,PG,1123,15,24.0,5.6,7.2,23.7,43.9,86.2,...,11,7,0,1,1,0,0,0,3,0


In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 77 entries, 0 to 76
Data columns (total 21 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             77 non-null     object 
 1   Position         77 non-null     object 
 2   Games            77 non-null     int64  
 3   Career Length    77 non-null     int64  
 4   PPG              77 non-null     float64
 5   RPG              77 non-null     float64
 6   APG              77 non-null     float64
 7   PER              77 non-null     float64
 8   FG%              77 non-null     float64
 9   FT%              77 non-null     float64
 10  Win Shares       77 non-null     float64
 11  All-Stars        77 non-null     int64  
 12  All-NBA          77 non-null     int64  
 13  All-Defense      77 non-null     int64  
 14  All-Rookie Team  77 non-null     int64  
 15  MVPs             77 non-null     int64  
 16  Chips            77 non-null     int64  
 17  ROY              7

In [14]:
#Save to csv file
df.to_csv('Current NBA Players CSV', index=False)

In [4]:
url = "https://www.basketball-reference.com/players/l/leonaka01.html"
#Initialize response
response = requests.get(url)
print(response.status_code)



200


In [None]:
url = "https://www.basketball-reference.com/players/r/rondora01.html"
#Initialize response
response = requests.get(url)
#Start scraping
if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')
    try:
        #Testing field
        #Get Career Length
        career_length_elem = soup.find('strong', string= lambda s: s and ('Experience:' in s or 'Career Length:' in s))
        career_length = int(career_length_elem.next_sibling.text.strip().split()[0])
        print(career_length_elem)
        print(career_length)
    except Exception as e:
        print(f'Error Scraping data for {url}: {e}')


<strong>Career Length:</strong>
16


In [2]:
#Read in csv file
df = pd.read_csv('Current NBA Players CSV')
df.head()

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
0,LeBron James,SF,1540,21,27.0,7.5,7.4,27.0,50.6,73.6,...,21,20,6,1,4,4,1,0,1,0
1,Stephen Curry,PG,1002,15,24.7,4.7,6.4,23.4,47.1,91.0,...,11,10,0,1,2,4,0,0,2,0
2,Kevin Durant,SF,1102,16,27.3,7.0,4.4,24.9,50.2,88.2,...,15,11,0,1,1,2,1,0,4,0
3,Kawhi Leonard,SF,711,12,19.9,6.3,3.0,23.3,49.8,86.1,...,6,6,7,1,0,2,0,2,0,0
4,James Harden,PG,1123,15,24.0,5.6,7.2,23.7,43.9,86.2,...,11,7,0,1,1,0,0,0,3,0


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 77 entries, 0 to 76
Data columns (total 21 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             77 non-null     object 
 1   Position         77 non-null     object 
 2   Games            77 non-null     int64  
 3   Career Length    77 non-null     int64  
 4   PPG              77 non-null     float64
 5   RPG              77 non-null     float64
 6   APG              77 non-null     float64
 7   PER              77 non-null     float64
 8   FG%              77 non-null     float64
 9   FT%              77 non-null     float64
 10  Win Shares       77 non-null     float64
 11  All-Stars        77 non-null     int64  
 12  All-NBA          77 non-null     int64  
 13  All-Defense      77 non-null     int64  
 14  All-Rookie Team  77 non-null     int64  
 15  MVPs             77 non-null     int64  
 16  Chips            77 non-null     int64  
 17  ROY              7

Some players weren't scraped with the second list because of a 429 (Too Many Requests) error, so I'll scrape them now.
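
A 429 status means the server is rate limiting requests, so one option is to wait and retry rather than skip the player. The sketch below assumes a response object exposing `status_code` and `headers`, as `requests.Response` does. `fetch_with_retry`, its parameters, and the delay values are hypothetical choices, and the `get` and `sleep` callables are passed in so the logic can be tried without network access.

```python
import time

def fetch_with_retry(get, url, max_attempts=4, base_delay=5.0, sleep=time.sleep, **kwargs):
    """Call get(url, **kwargs), retrying while the server answers 429.

    Waits for the Retry-After header when it is a whole number of seconds,
    otherwise doubles the delay after each attempt. Returns the last response.
    """
    response = None
    for attempt in range(max_attempts):
        response = get(url, **kwargs)
        if response.status_code != 429:
            return response
        if attempt == max_attempts - 1:
            break  # out of attempts, so do not sleep again
        retry_after = str(response.headers.get("Retry-After", ""))
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * (2 ** attempt)
        sleep(delay)
    return response
```

Inside the scraping loop, `response = fetch_with_retry(requests.get, url, headers=headers)` could stand in for the direct `requests.get` call. The existing `status_code == 200` check still applies afterwards, since the function returns the 429 response once it runs out of attempts.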

In [4]:
#Scrape career stats for players

#Player URLs from Basketball Reference
player_urls = [
    "https://www.basketball-reference.com/players/g/greenja05.html", #Jalen Green
    "https://www.basketball-reference.com/players/l/leeda02.html" #David Lee
]

#Headers for request 
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}

#Initialize empty list to store player data
all_player_data = []

#Begin for loop to iterate through each player link
for url in player_urls:
        #Initialize response
        response = requests.get(url, headers=headers)
        #Start scraping
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')

            try:
                #Get player name
                name = soup.find('h1').find('span').text
                
                #Get Career Length
                career_length = int(soup.find('strong', string= lambda s: s and ('Experience:' in s or 'Career Length:' in s)).next_sibling.text.strip().split()[0])

                #Get Position and use if-elif-else statement to assign position abbreviations 
                position = soup.find('strong', string= lambda s: s and 'Position:' in s).next_sibling.text.strip().split(',')[0]
                if 'Point Guard' in position:
                    position = 'PG'
                elif 'Shooting Guard' in position:
                    position = 'SG'
                elif 'Small Forward' in position:
                    position = 'SF'
                elif 'Power Forward' in position:
                    position = 'PF'
                elif 'Center' in position:
                    position = 'C'
                else:
                    position = None
                
                #Get stats such as games played, ppg, rpg, apg, etc
                def stat_finder(word): #Define function 
                    element = soup.find('span', {'data-tip': word})
                    if element:
                        element = element.find_next('p').find_next('p').text.strip()
                        return element
                    else:
                        return 0.0
                games = int(stat_finder('Games'))
                ppg = float(stat_finder('Points'))
                rpg = float(stat_finder('Total Rebounds'))
                apg = float(stat_finder('Assists'))
                fg = float(stat_finder('Field Goal Percentage'))
                ft = float(stat_finder('Free Throw Percentage'))
                per = float(stat_finder(lambda x: x and 'Player Efficiency Rating' in x))
                win_shares = float(stat_finder(lambda x: x and 'Win Shares' in x))

                #Extract awards: mvps, all stars, all nba, etc
                all_stars_find = soup.find('li', {'class':'all_star'})
                if all_stars_find:
                    all_stars = int(all_stars_find.text.strip().split('x')[0])
                else: 
                    all_stars = 0
                all_nba_find = soup.find('li', string= lambda s: s and 'All-NBA' in s)
                if all_nba_find:
                    all_nba_text = all_nba_find.text
                    if 'x' in all_nba_text:
                        all_nba = int(all_nba_text.strip().split('x')[0])
                    else:
                        all_nba = 1
                else:
                    all_nba = 0
                all_rookie_find = soup.find('li', string = lambda s: s and 'All-Rookie' in s)
                if all_rookie_find:
                    all_rookie = 1
                else:
                    all_rookie = 0
                roy = soup.find('li', {'data-tip': lambda x: x and 'ROY' in x})
                if roy:
                    roy = 1
                else:
                    roy = 0
                all_defensive = soup.find('a', string= lambda s: s and 'All-Defensive' in s)
                if all_defensive:
                    all_defensive_text = all_defensive.text
                    if 'x' in all_defensive_text:
                        all_defensive = int(all_defensive_text.strip().split('x')[0])
                    else:
                        all_defensive = 1
                else:
                    all_defensive = 0
                mvp_find = soup.find('li', {'data-tip': lambda x: x and 'MVP' in x and 'AS MVP' not in x and 'IST MVP' not in x and 'Finals MVP' not in x and 'ECF MVP' not in x and 'WCF MVP' not in x})
                if mvp_find:
                    mvp_text = mvp_find.text
                    if 'x' in mvp_text:
                        mvps = int(mvp_text.strip().split('x')[0])
                    else:
                        mvps = 1
                else:
                    mvps = 0
                chip_find = soup.find('a', string= lambda s: s and 'NBA Champ' in s) #NBA Championship finder
                if chip_find:
                    chip_text = chip_find.text
                    if 'x' in chip_text:
                        chips = int(chip_text.strip().split('x')[0])
                    else:
                        chips = 1
                else:
                    chips = 0
                sc_find = soup.find('a', string= lambda s: s and 'Scoring Champ' in s) #Scoring champ finder
                if sc_find:
                    sc_text = sc_find.text
                    if 'x' in sc_text:
                        sc = int(sc_text.strip().split('x')[0])
                    else:
                        sc = 1
                else:
                    sc = 0
                dpoy_find = soup.find('a', string= lambda s: s and 'Def. POY' in s) #Defensive Player of the Year Finder
                if dpoy_find:
                    dpoy_text = dpoy_find.text
                    if 'x' in dpoy_text:
                        dpoy = int(dpoy_text.strip().split('x')[0])
                    else:
                        dpoy = 1
                else:
                    dpoy = 0
                #Not inducted in the HOF yet
                hof = 0

                #Store the data
                player_data = {
                        'Name': name,
                        'Position': position,
                        'Games': games,
                        'Career Length': career_length,
                        'PPG': ppg,
                        'RPG': rpg,
                        'APG': apg,
                        'PER': per,
                        'FG%': fg,
                        'FT%': ft,
                        'Win Shares': win_shares,
                        'All-Stars': all_stars,
                        'All-NBA': all_nba,
                        'All-Defense': all_defensive,
                        'All-Rookie Team': all_rookie,
                        'MVPs': mvps,
                        'Chips': chips,
                        'ROY': roy,
                        'DPOYs': dpoy,
                        'Scoring Champ': sc,
                        'HOF': hof
                    }
                #Append player data to list
                all_player_data.append(player_data)
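            except Exception as e:
                print(f'Error Scraping data for {url}: {e}')
        else:
            #Report non-200 responses (e.g. 429) instead of skipping them silently
            print(f'Request failed for {url}: status {response.status_code}')

        #Avoid overwhelming server (pause after every request, successful or not)
        time.sleep(1)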

#Create dataframe of player data
rest_of_players = pd.DataFrame(all_player_data)
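
The award sections in the cell above repeat the same pattern: find an element, then read a leading count such as `4x` from its text, defaulting to 1 when no count is shown. A small helper could capture that in one place. This is only a sketch; `award_count` is a hypothetical name, and the `Nx ...` label format is inferred from the `split('x')` calls above rather than from any documented page structure.

```python
import re

def award_count(text):
    """Return the count encoded in an award label such as '4x NBA Champ'.

    Missing or empty text counts as 0; a label without a leading 'Nx' counts as 1.
    """
    if not text:
        return 0
    match = re.match(r'\s*(\d+)x', text)
    return int(match.group(1)) if match else 1
```

With this helper, a block such as the championship lookup could shrink to `chips = award_count(chip_find.text if chip_find else None)`. Matching only a leading number also avoids a `ValueError` when a label happens to contain the letter `x` somewhere else.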

In [5]:
#Check dataframe
rest_of_players

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
0,Jalen Green,SG,280,3,20.1,4.2,3.3,14.1,42.2,80.3,...,0,0,0,1,0,0,0,0,0,0
1,David Lee,PF,829,12,13.5,8.8,2.2,19.1,53.5,77.2,...,2,1,0,0,0,1,0,0,0,0


In [7]:
#Concat dataframes together
df = pd.concat([df, rest_of_players], ignore_index=True)

#Check info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 81 entries, 0 to 80
Data columns (total 21 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             81 non-null     object 
 1   Position         81 non-null     object 
 2   Games            81 non-null     int64  
 3   Career Length    81 non-null     int64  
 4   PPG              81 non-null     float64
 5   RPG              81 non-null     float64
 6   APG              81 non-null     float64
 7   PER              81 non-null     float64
 8   FG%              81 non-null     float64
 9   FT%              81 non-null     float64
 10  Win Shares       81 non-null     float64
 11  All-Stars        81 non-null     int64  
 12  All-NBA          81 non-null     int64  
 13  All-Defense      81 non-null     int64  
 14  All-Rookie Team  81 non-null     int64  
 15  MVPs             81 non-null     int64  
 16  Chips            81 non-null     int64  
 17  ROY              8

In [8]:
duplicates = df.duplicated()
duplicates.value_counts()

False    79
True      2
Name: count, dtype: int64

In [9]:
df.loc[duplicates]

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
79,Jalen Green,SG,280,3,20.1,4.2,3.3,14.1,42.2,80.3,...,0,0,0,1,0,0,0,0,0,0
80,David Lee,PF,829,12,13.5,8.8,2.2,19.1,53.5,77.2,...,2,1,0,0,0,1,0,0,0,0


In [10]:
df.iloc[79:81]

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
79,Jalen Green,SG,280,3,20.1,4.2,3.3,14.1,42.2,80.3,...,0,0,0,1,0,0,0,0,0,0
80,David Lee,PF,829,12,13.5,8.8,2.2,19.1,53.5,77.2,...,2,1,0,0,0,1,0,0,0,0


In [11]:
df.tail()

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
76,Jalen Williams,SG,197,2,17.7,4.6,4.2,18.0,51.4,80.1,...,1,0,0,1,0,0,0,0,0,0
77,Jalen Green,SG,280,3,20.1,4.2,3.3,14.1,42.2,80.3,...,0,0,0,1,0,0,0,0,0,0
78,David Lee,PF,829,12,13.5,8.8,2.2,19.1,53.5,77.2,...,2,1,0,0,0,1,0,0,0,0
79,Jalen Green,SG,280,3,20.1,4.2,3.3,14.1,42.2,80.3,...,0,0,0,1,0,0,0,0,0,0
80,David Lee,PF,829,12,13.5,8.8,2.2,19.1,53.5,77.2,...,2,1,0,0,0,1,0,0,0,0


In [12]:
#Drop duplicates
df = df.drop_duplicates()
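
The row count going from 77 to 81 rather than 79 suggests the concat cell ran more than once, which is easy to do in a notebook. One way to make that step safe to re-run is to deduplicate as part of the append. The sketch below uses made-up rows, and `append_unique` is a hypothetical helper, not something from this notebook.

```python
import pandas as pd

def append_unique(df, new_rows, key='Name'):
    """Append new_rows to df, keeping only the first row for each key value."""
    combined = pd.concat([df, new_rows], ignore_index=True)
    return combined.drop_duplicates(subset=key, keep='first').reset_index(drop=True)

# Made-up example data, for illustration only
base = pd.DataFrame({'Name': ['Player A', 'Player B'], 'Games': [10, 20]})
extra = pd.DataFrame({'Name': ['Player C'], 'Games': [30]})

once = append_unique(base, extra)
twice = append_unique(once, extra)  # re-running does not add rows again
print(len(once), len(twice))
```

Deduplicating on `Name` alone would merge two different players who share a name, so a fuller key (or the default all-column comparison used in the cell above) may be the safer choice for the real dataset.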

In [13]:
#Check
df.tail()

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
74,Eric Gordon,SG,925,16,15.3,2.3,2.7,13.7,43.0,81.0,...,0,0,0,1,0,0,0,0,0,0
75,Serge Ibaka,PF,919,14,12.0,7.1,0.8,17.1,51.3,75.7,...,0,0,3,0,0,1,0,0,0,0
76,Jalen Williams,SG,197,2,17.7,4.6,4.2,18.0,51.4,80.1,...,1,0,0,1,0,0,0,0,0,0
77,Jalen Green,SG,280,3,20.1,4.2,3.3,14.1,42.2,80.3,...,0,0,0,1,0,0,0,0,0,0
78,David Lee,PF,829,12,13.5,8.8,2.2,19.1,53.5,77.2,...,2,1,0,0,0,1,0,0,0,0


In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 79 entries, 0 to 78
Data columns (total 21 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             79 non-null     object 
 1   Position         79 non-null     object 
 2   Games            79 non-null     int64  
 3   Career Length    79 non-null     int64  
 4   PPG              79 non-null     float64
 5   RPG              79 non-null     float64
 6   APG              79 non-null     float64
 7   PER              79 non-null     float64
 8   FG%              79 non-null     float64
 9   FT%              79 non-null     float64
 10  Win Shares       79 non-null     float64
 11  All-Stars        79 non-null     int64  
 12  All-NBA          79 non-null     int64  
 13  All-Defense      79 non-null     int64  
 14  All-Rookie Team  79 non-null     int64  
 15  MVPs             79 non-null     int64  
 16  Chips            79 non-null     int64  
 17  ROY              79 non

In [15]:
#Save to csv file
df.to_csv('Current NBA Players.csv', index=False)