# Rationale for Including Role Players in NBA Hall of Fame Dataset

## Introduction
When constructing a dataset to predict the probability that an NBA player makes the Hall of Fame (HOF), it is crucial to include a diverse range of players. One important category is **role players**, both retired and active. These players lack the accolades of Hall of Famers or borderline candidates, but they supply the negative examples that give the dataset its contrast.

## Importance of Role Players in the Dataset

### 1. **Establishing a Clear Decision Boundary**
Including role players sharpens the distinction between Hall of Famers and non-Hall of Famers. HOF players typically have far superior career achievements, so contrasting them with role players who lack those credentials gives the model unambiguous negative examples to learn from.

### 2. **Improving Model Generalization**
A dataset without role players might only include superstars or borderline candidates, making it harder for the model to generalize when predicting outcomes for lower-tier players. By including role players, the model learns to differentiate between elite and non-elite careers more effectively.

### 3. **Handling Class Imbalance**
Hall of Famers are a small fraction of all NBA players. A dataset built only from stars and borderline candidates would overrepresent high-caliber careers relative to the real player population. Adding role players brings the class proportions closer to reality, so the model does not overfit to superstar-level statistics.
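
Adding role players enlarges the non-HOF class, so the HOF class becomes rarer in the training data. One common way to account for that during training is inverse-frequency class weighting. The sketch below is an illustration, not part of the original notebook: `balanced_class_weights` is a hypothetical helper, the labels are made up, and the formula `n_samples / (n_classes * class_count)` is the same heuristic scikit-learn uses for `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up labels for illustration only: 10 Hall of Famers (1), 90 others (0).
hof_labels = [1] * 10 + [0] * 90
weights = balanced_class_weights(hof_labels)
print(weights)
```

The rarer class receives the larger weight, so a misclassified Hall of Famer costs the model more than a misclassified role player.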

### 4. **Realistic Predictions for Active and Future Players**
By training the model on a broad spectrum of player types, including those with minimal accolades, the model becomes more robust in predicting Hall of Fame probabilities for current and future players.

### 5. **Accounting for Longevity and Contribution**
Role players often have long careers due to their niche skills, even without accumulating major awards. Understanding how longevity, efficiency, and team contributions factor into HOF consideration is valuable for the model.

## Selection Criteria for Role Players
Role players included in the dataset should span several categories:
- **Retired Role Players:** Players with long careers but few accolades (e.g., Shane Battier, J.R. Smith).
- **Active Role Players:** Players contributing significantly to their teams but unlikely to be Hall of Famers (e.g., Patrick Beverley).
- **Below-Average Players:** Players who had shorter careers or minimal impact (e.g., Michael Carter-Williams, Thon Maker).

## Conclusion
Including role players in the dataset enhances the model’s ability to distinguish between different tiers of NBA players, improving classification accuracy and predictive reliability. This approach ensures a more comprehensive and realistic assessment of a player's probability of making it to the Hall of Fame.


In [1]:
#Import Libraries
import pandas as pd
from bs4 import BeautifulSoup
import time
import requests

In [27]:
#Scrape career stats for players

#Player URLs from Basketball Reference
player_urls = [
    "https://www.basketball-reference.com/players/w/waltolu01.html", #Luke Walton
    "https://www.basketball-reference.com/players/y/youngni01.html", #Nick Young
    "https://www.basketball-reference.com/players/s/smithjr01.html", #J.R. Smith
    "https://www.basketball-reference.com/players/b/battish01.html", #Shane Battier
    "https://www.basketball-reference.com/players/b/bellra01.html", #Raja Bell
    "https://www.basketball-reference.com/players/n/nelsoja01.html", #Jameer Nelson
    "https://www.basketball-reference.com/players/w/willima01.html", #Mo Williams
    "https://www.basketball-reference.com/players/f/fishede01.html", #Derek Fisher
    "https://www.basketball-reference.com/players/h/hinriki01.html", #Kirk Hinrich
    "https://www.basketball-reference.com/players/d/dudleja01.html", #Jared Dudley
    "https://www.basketball-reference.com/players/c/chalmma01.html", #Mario Chalmers
    "https://www.basketball-reference.com/players/b/barbole01.html", #Leandro Barbosa
    "https://www.basketball-reference.com/players/l/leeco01.html", #Courtney Lee
    "https://www.basketball-reference.com/players/t/terryja01.html", #Jason Terry
    "https://www.basketball-reference.com/players/a/arizatr01.html", #Trevor Ariza
    "https://www.basketball-reference.com/players/j/jefferi01.html", #Richard Jefferson
    "https://www.basketball-reference.com/players/b/barnema02.html", #Matt Barnes
    "https://www.basketball-reference.com/players/j/jacksst02.html", #Stephen Jackson
    "https://www.basketball-reference.com/players/p/pachuza01.html", #Zaza Pachulia
    "https://www.basketball-reference.com/players/p/perkike01.html", #Kendrick Perkins
    "https://www.basketball-reference.com/players/d/dalemsa01.html", #Samuel Dalembert
    "https://www.basketball-reference.com/players/d/diawbo01.html", #Boris Diaw
    "https://www.basketball-reference.com/players/m/millspa02.html", #Patty Mills
    "https://www.basketball-reference.com/players/k/kerrst01.html", #Steve Kerr
    "https://www.basketball-reference.com/players/l/livinsh01.html", #Shaun Livingston
    "https://www.basketball-reference.com/players/h/harpero01.html", #Ron Harper
    "https://www.basketball-reference.com/players/s/smithke01.html", #Kenny Smith
    "https://www.basketball-reference.com/players/r/rileypa01.html", #Pat Riley
    "https://www.basketball-reference.com/players/a/aingeda01.html", #Danny Ainge
    "https://www.basketball-reference.com/players/c/curryde01.html", #Dell Curry
    "https://www.basketball-reference.com/players/c/crawfja01.html", #Jamal Crawford
    "https://www.basketball-reference.com/players/w/willilo02.html", #Lou Williams 
    "https://www.basketball-reference.com/players/j/johnsed03.html", #Eddie Johnson
    "https://www.basketball-reference.com/players/m/mckieaa01.html", #Aaron McKie
    "https://www.basketball-reference.com/players/g/greenda02.html" #Danny Green
]

#Headers for request 
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}

#Initialize empty list to store player data
all_player_data = []

#Begin for loop to iterate through each player link
for url in player_urls:
        #Initialize response
        response = requests.get(url, headers=headers)
        #Start scraping
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')

            try:
                #Get player name
                name = soup.find('h1').find('span').text
                
                #Get Career Length
                career_length = int(soup.find('strong', string= lambda s: s and ('Experience:' in s or 'Career Length' in s)).next_sibling.text.strip().split()[0])

                #Get Position and use if-elif-else statement to assign position abbreviations
                position = soup.find('strong', string= lambda s: s and 'Position:' in s).next_sibling.text.strip().split(',')[0]
                if 'Point Guard' in position:
                    position = 'PG'
                elif 'Shooting Guard' in position:
                    position = 'SG'
                elif 'Small Forward' in position:
                    position = 'SF'
                elif 'Power Forward' in position:
                    position = 'PF'
                elif 'Center' in position:
                    position = 'C'
                else:
                    position = None
                
                #Get stats such as games played, ppg, rpg, apg, etc
                def stat_finder(word): #Define function 
                    element = soup.find('span', {'data-tip': word})
                    if element:
                        element = element.find_next('p').find_next('p').text.strip()
                        return element
                    else:
                        return 0.0
                games = int(stat_finder('Games'))
                ppg = float(stat_finder('Points'))
                rpg = float(stat_finder('Total Rebounds'))
                apg = float(stat_finder('Assists'))
                fg = float(stat_finder('Field Goal Percentage'))
                ft = float(stat_finder('Free Throw Percentage'))
                per = float(stat_finder(lambda x: x and 'Player Efficiency Rating' in x))
                win_shares = float(stat_finder(lambda x: x and 'Win Shares' in x))

                #Extract awards: mvps, all stars, all nba, etc
                all_stars_find = soup.find('li', {'class':'all_star'})
                if all_stars_find:
                    all_stars = int(all_stars_find.text.strip().split('x')[0])
                else: 
                    all_stars = 0
                all_nba_find = soup.find('li', string= lambda s: s and 'All-NBA' in s)
                if all_nba_find:
                    all_nba_text = all_nba_find.text
                    if 'x' in all_nba_text:
                        all_nba = int(all_nba_text.strip().split('x')[0])
                    else:
                        all_nba = 1
                else:
                    all_nba = 0
                all_rookie_find = soup.find('li', string = lambda s: s and 'All-Rookie' in s)
                if all_rookie_find:
                    all_rookie = 1
                else:
                    all_rookie = 0
                roy = soup.find('li', {'data-tip': lambda x: x and 'ROY' in x})
                if roy:
                    roy = 1
                else:
                    roy = 0
                all_defensive = soup.find('a', string= lambda s: s and 'All-Defensive' in s)
                if all_defensive:
                    all_defensive_text = all_defensive.text
                    if 'x' in all_defensive_text:
                        all_defensive = int(all_defensive_text.strip().split('x')[0])
                    else:
                        all_defensive = 1
                else:
                    all_defensive = 0
                mvp_find = soup.find('li', {'data-tip': lambda x: x and 'MVP' in x and 'AS MVP' not in x and 'IST MVP' not in x and 'Finals MVP' not in x and 'ECF MVP' not in x and 'WCF MVP' not in x})
                if mvp_find:
                    mvp_text = mvp_find.text
                    if 'x' in mvp_text:
                        mvps = int(mvp_text.strip().split('x')[0])
                    else:
                        mvps = 1
                else:
                    mvps = 0
                chip_find = soup.find('a', string= lambda s: s and 'NBA Champ' in s) #NBA Championship finder
                if chip_find:
                    chip_text = chip_find.text
                    if 'x' in chip_text:
                        chips = int(chip_text.strip().split('x')[0])
                    else:
                        chips = 1
                else:
                    chips = 0
                sc_find = soup.find('a', string= lambda s: s and 'Scoring Champ' in s) #Scoring champ finder
                if sc_find:
                    sc_text = sc_find.text
                    if 'x' in sc_text:
                        sc = int(sc_text.strip().split('x')[0])
                    else:
                        sc = 1
                else:
                    sc = 0
                dpoy_find = soup.find('a', string= lambda s: s and 'Def. POY' in s) #Defensive Player of the Year Finder
                if dpoy_find:
                    dpoy_text = dpoy_find.text
                    if 'x' in dpoy_text:
                        dpoy = int(dpoy_text.strip().split('x')[0])
                    else:
                        dpoy = 1
                else:
                    dpoy = 0
                #Not inducted in the HOF yet
                hof = 0

                #Store the data
                player_data = {
                        'Name': name,
                        'Position': position,
                        'Games': games,
                        'Career Length': career_length,
                        'PPG': ppg,
                        'RPG': rpg,
                        'APG': apg,
                        'PER': per,
                        'FG%': fg,
                        'FT%': ft,
                        'Win Shares': win_shares,
                        'All-Stars': all_stars,
                        'All-NBA': all_nba,
                        'All-Defense': all_defensive,
                        'All-Rookie Team': all_rookie,
                        'MVPs': mvps,
                        'Chips': chips,
                        'ROY': roy,
                        'DPOYs': dpoy,
                        'Scoring Champ': sc,
                        'HOF': hof
                    }
                #Append player data to list
                all_player_data.append(player_data)
            except Exception as e:
                print(f'Error Scraping data for {url}: {e}')

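        else:
            #Report failed requests instead of skipping them silently
            print(f'Request failed for {url}: status code {response.status_code}')

        #Avoid overwhelming server (pause after every request, successful or not)
        time.sleep(1)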

#Create dataframe of player data
df_one = pd.DataFrame(all_player_data)
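
The `if`/`elif` chain in the cell above tests position names in a fixed order, so a string that lists two positions, such as "Small Forward and Shooting Guard", resolves to `SG` even though `SF` is listed first. A lookup table is one possible alternative. The sketch below is an assumption rather than the notebook's method: `POSITION_ABBREVIATIONS` and `abbreviate_position` are hypothetical names, and the function returns the position that appears earliest in the text.

```python
# Hypothetical lookup table mirroring the position names checked in the scraping cell.
POSITION_ABBREVIATIONS = {
    "Point Guard": "PG",
    "Shooting Guard": "SG",
    "Small Forward": "SF",
    "Power Forward": "PF",
    "Center": "C",
}


def abbreviate_position(text):
    """Return the abbreviation of the position named first in text, or None."""
    if not text:
        return None
    # Pair each matching position with the index where it starts in the text.
    hits = [
        (text.find(name), abbr)
        for name, abbr in POSITION_ABBREVIATIONS.items()
        if name in text
    ]
    if not hits:
        return None
    # The smallest index is the position that was listed first.
    return min(hits)[1]


print(abbreviate_position("Small Forward and Shooting Guard"))
```

Whether the first-listed position is the right choice depends on how the page orders positions, so treat this as one option to compare against the existing output.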

In [29]:
df_one

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
0,Luke Walton,SF,564,10,4.7,2.8,2.3,11.7,42.9,71.5,...,0,0,0,0,0,2,0,0,0,0
1,Nick Young,SG,720,12,11.4,2.0,1.0,12.9,41.8,83.6,...,0,0,0,0,0,1,0,0,0,0
2,J.R. Smith,SG,977,16,12.4,3.1,2.1,13.9,41.9,73.3,...,0,0,0,0,0,2,0,0,0,0
3,Shane Battier,SG,977,13,8.6,4.2,1.8,12.6,43.7,74.3,...,0,0,2,1,0,2,0,0,0,0
4,Raja Bell,SG,706,12,9.9,2.8,1.7,10.9,43.4,79.9,...,0,0,2,0,0,0,0,0,0,0
5,Jameer Nelson,PG,878,14,11.3,3.0,5.1,14.4,43.6,81.0,...,1,0,0,1,0,0,0,0,0,0
6,Mo Williams,PG,818,13,13.2,2.8,4.9,15.0,43.4,87.1,...,1,0,0,0,0,1,0,0,0,0
7,Derek Fisher,PG,1287,18,8.3,2.1,3.0,11.7,39.9,81.7,...,0,0,0,0,0,5,0,0,0,0
8,Kirk Hinrich,PG,879,13,10.9,2.9,4.8,12.8,41.1,80.0,...,0,0,1,1,0,0,0,0,0,0
9,Jared Dudley,PF,904,14,7.3,3.2,1.5,12.5,46.3,73.2,...,0,0,0,0,0,1,0,0,0,0
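
The award blocks in the scraping cell repeat one pattern: if the label contains `x`, take the number before it, otherwise count the award once. That check also fires on any label that merely contains the letter `x` (for example the word "Sixth"), which would make `int()` raise. The helper below is a sketch under the assumption that multi-time awards are labeled with a leading count such as `2x NBA Champ`; `parse_award_count` is a hypothetical name, not part of the original notebook.

```python
import re

# Matches a leading count such as "2x " or "10x " at the start of a label.
_COUNT_PREFIX = re.compile(r"^\s*(\d+)x\s")


def parse_award_count(text):
    """Return 0 for a missing label, N for an 'Nx ...' label, otherwise 1."""
    if not text:
        return 0
    match = _COUNT_PREFIX.match(text)
    return int(match.group(1)) if match else 1


print(parse_award_count("2x NBA Champ"))
print(parse_award_count("2012-13 All-Defensive"))
print(parse_award_count(None))
```

With a helper like this, each award could be read in one line, for example `chips = parse_award_count(chip_find.text if chip_find else None)`.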


In [2]:
#Paste player links
player_urls = [
    "https://www.basketball-reference.com/players/p/piercri01.html", #Ricky Pierce
    "https://www.basketball-reference.com/players/s/starkjo01.html", #John Starks
    "https://www.basketball-reference.com/players/m/masonan01.html", #Anthony Mason
    "https://www.basketball-reference.com/players/m/mbahalu01.html", #Luc Mbah a Moute
    "https://www.basketball-reference.com/players/k/kirilan01.html", #Andrei Kirilenko
    "https://www.basketball-reference.com/players/k/korveky01.html", #Kyle Korver
    "https://www.basketball-reference.com/players/a/anderch01.html", #Chris Andersen
    "https://www.basketball-reference.com/players/g/greenje02.html", #Jeff Green
    "https://www.basketball-reference.com/players/i/ilyaser01.html", #Ersan İlyasova
    "https://www.basketball-reference.com/players/b/brownsh01.html", #Shannon Brown
    "https://www.basketball-reference.com/players/c/chrisdo01.html", #Doug Christie
    "https://www.basketball-reference.com/players/j/jeffrja01.html", #Jared Jeffries
    "https://www.basketball-reference.com/players/n/novakst01.html", #Steve Novak
    "https://www.basketball-reference.com/players/r/redicjj01.html", #JJ Redick
    "https://www.basketball-reference.com/players/p/poseyja01.html", #James Posey
    "https://www.basketball-reference.com/players/k/kaponja01.html", #Jason Kapono
    "https://www.basketball-reference.com/players/d/davisan01.html", #Antonio Davis
    "https://www.basketball-reference.com/players/m/mohamna01.html", #Nazr Mohammed
    "https://www.basketball-reference.com/players/h/hasleud01.html", #Udonis Haslem
    "https://www.basketball-reference.com/players/t/thomaku01.html", #Kurt Thomas
    "https://www.basketball-reference.com/players/g/gibsota01.html", #Taj Gibson
    "https://www.basketball-reference.com/players/j/jacksbo01.html", #Bobby Jackson
    "https://www.basketball-reference.com/players/s/stuckro01.html", #Rodney Stuckey
    "https://www.basketball-reference.com/players/g/gordobe01.html", #Ben Gordon
    "https://www.basketball-reference.com/players/c/clarkjo01.html", #Jordan Clarkson
    "https://www.basketball-reference.com/players/l/larkish01.html", #Shane Larkin
    "https://www.basketball-reference.com/players/t/templga01.html", #Garrett Temple
    "https://www.basketball-reference.com/players/b/booketr01.html", #Trevor Booker
    "https://www.basketball-reference.com/players/b/beverpa01.html", #Patrick Beverley
    "https://www.basketball-reference.com/players/a/aminual01.html", #Al-Farouq Aminu
    "https://www.basketball-reference.com/players/c/cagemi01.html", #Michael Cage
    "https://www.basketball-reference.com/players/f/ferryda01.html", #Danny Ferry
    "https://www.basketball-reference.com/players/g/garripa01.html", #Pat Garrity
    "https://www.basketball-reference.com/players/o/outlabo01.html", #Bo Outlaw
    "https://www.basketball-reference.com/players/j/jonespo01.html", #Popeye Jones
    "https://www.basketball-reference.com/players/o/onealro01.html" #Royce O'Neale
]

#Initialize empty list to store player data
all_player_data = []

#Begin for loop to iterate through each player link
for url in player_urls:
        #Initialize response
        response = requests.get(url)
        #Start scraping
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')

            try:
                #Get player name
                name = soup.find('h1').find('span').text
                
                #Get Career Length
                career_length = int(soup.find('strong', string= lambda s: s and ('Experience:' in s or 'Career Length' in s)).next_sibling.text.strip().split()[0])

                #Get Position and use if-elif-else statement to assign position abbreviations
                position = soup.find('strong', string= lambda s: s and 'Position:' in s).next_sibling.text.strip().split(',')[0]
                if 'Point Guard' in position:
                    position = 'PG'
                elif 'Shooting Guard' in position:
                    position = 'SG'
                elif 'Small Forward' in position:
                    position = 'SF'
                elif 'Power Forward' in position:
                    position = 'PF'
                elif 'Center' in position:
                    position = 'C'
                else:
                    position = None
                
                #Get stats such as games played, ppg, rpg, apg, etc
                def stat_finder(word): #Define function 
                    element = soup.find('span', {'data-tip': word})
                    if element:
                        element = element.find_next('p').find_next('p').text.strip()
                        return element
                    else:
                        return 0.0
                games = int(stat_finder('Games'))
                ppg = float(stat_finder('Points'))
                rpg = float(stat_finder('Total Rebounds'))
                apg = float(stat_finder('Assists'))
                fg = float(stat_finder('Field Goal Percentage'))
                ft = float(stat_finder('Free Throw Percentage'))
                
                #Extract advanced stats
                per = float(soup.find('strong', string= lambda s: s and 'PER' in s).find_next('p').find_next('p').text.strip())
                win_shares = float(soup.find('strong', string= lambda s: s and 'WS' in s).find_next('p').find_next('p').text.strip())

                #Extract awards: mvps, all stars, all nba, etc
                all_stars_find = soup.find('li', {'class':'all_star'})
                if all_stars_find:
                    all_stars = int(all_stars_find.text.strip().split('x')[0])
                else: 
                    all_stars = 0
                all_nba_find = soup.find('li', string= lambda s: s and ('All-NBA' in s or 'All-ABA' in s or 'All-BAA' in s))
                if all_nba_find:
                    all_nba_text = all_nba_find.text
                    if 'x' in all_nba_text:
                        all_nba = int(all_nba_text.strip().split('x')[0])
                    else:
                        all_nba = 1
                else:
                    all_nba = 0
                all_rookie_find = soup.find('li', string = lambda s: s and 'All-Rookie' in s)
                if all_rookie_find:
                    all_rookie = 1
                else:
                    all_rookie = 0
                roy = soup.find('li', {'data-tip': lambda x: x and 'ROY' in x})
                if roy:
                    roy = 1
                else:
                    roy = 0
                all_defensive = soup.find('a', string= lambda s: s and 'All-Defensive' in s)
                if all_defensive:
                    all_defensive_text = all_defensive.text
                    if 'x' in all_defensive_text:
                        all_defensive = int(all_defensive_text.strip().split('x')[0])
                    else:
                        all_defensive = 1
                else:
                    all_defensive = 0
                mvp_find = soup.find('li', {'class':'poptip'}, string = lambda s: s and 'MVP' in s and 'AS MVP' not in s and 'IST MVP' not in s 
                                     and 'Finals MVP' not in s and 'ECF MVP' not in s and 'WCF MVP' not in s and 'AS' not in s
                                     and 'MBWA' not in s and 'USBWA MVP' not in s)
                if mvp_find:
                    mvp_text = mvp_find.text
                    if 'x' in mvp_text:
                        mvps = int(mvp_text.strip().split('x')[0])
                    else:
                        mvps = 1
                else:
                    mvps = 0
                chip_find = soup.find('a', string= lambda s: s and ('NBA Champ' in s or 'ABA Champ' in s or 'BAA Champ' in s)) #NBA Championship finder
                if chip_find:
                    chip_text = chip_find.text
                    if 'x' in chip_text:
                        chips = int(chip_text.strip().split('x')[0])
                    else:
                        chips = 1
                else:
                    chips = 0
                sc_find = soup.find('a', string= lambda s: s and 'Scoring Champ' in s) #Scoring champ finder
                if sc_find:
                    sc_text = sc_find.text
                    if 'x' in sc_text:
                        sc = int(sc_text.strip().split('x')[0])
                    else:
                        sc = 1
                else:
                    sc = 0
                dpoy_find = soup.find('a', string= lambda s: s and 'Def. POY' in s) #Defensive Player of the Year Finder
                if dpoy_find:
                    dpoy_text = dpoy_find.text
                    if 'x' in dpoy_text:
                        dpoy = int(dpoy_text.strip().split('x')[0])
                    else:
                        dpoy = 1
                else:
                    dpoy = 0
                #Set HOF to 0 since these players aren't in the HOF as of right now
                hof = 0

                #Store the data
                player_data = {
                        'Name': name,
                        'Position': position,
                        'Games': games,
                        'Career Length': career_length,
                        'PPG': ppg,
                        'RPG': rpg,
                        'APG': apg,
                        'PER': per,
                        'FG%': fg,
                        'FT%': ft,
                        'Win Shares': win_shares,
                        'All-Stars': all_stars,
                        'All-NBA': all_nba,
                        'All-Defense': all_defensive,
                        'All-Rookie Team': all_rookie,
                        'MVPs': mvps,
                        'Chips': chips,
                        'ROY': roy,
                        'DPOYs': dpoy,
                        'Scoring Champ': sc,
                        'HOF': hof
                    }
                #Append player data to list
                all_player_data.append(player_data)
            except Exception as e:
                print(f'Error Scraping data for {url}: {e}')

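        else:
            #Report failed requests instead of skipping them silently
            print(f'Request failed for {url}: status code {response.status_code}')

        #Avoid overwhelming server (pause after every request, successful or not)
        time.sleep(1)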

#Create dataframe of player data
df_two = pd.DataFrame(all_player_data)

In [4]:
df_two

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
0,Ricky Pierce,PG,969,16,14.9,2.4,1.9,17.7,49.3,87.5,...,1,0,0,0,0,0,0,0,0,0
1,John Starks,SG,866,13,12.5,2.5,3.6,14.0,41.2,76.9,...,1,0,1,0,0,0,0,0,0,0
2,Anthony Mason,PF,882,13,10.9,8.3,3.4,14.6,50.9,70.9,...,1,1,1,0,0,0,0,0,0,0
3,Luc Mbah a Moute,SF,689,12,6.4,4.1,0.9,10.7,45.4,65.9,...,0,0,0,0,0,0,0,0,0,0
4,Andrei Kirilenko,SF,797,13,11.8,5.5,2.7,18.7,47.4,75.4,...,1,0,3,1,0,0,0,0,0,0
5,Kyle Korver,SG,1232,17,9.7,3.0,1.7,12.8,44.2,87.7,...,1,0,0,0,0,0,0,0,0,0
6,Chris Andersen,PF,695,15,5.4,5.0,0.5,16.7,53.2,65.4,...,0,0,0,0,0,1,0,0,0,0
7,Jeff Green,PF,1208,16,11.9,4.0,1.5,13.1,45.0,80.4,...,0,0,0,1,0,1,0,0,0,0
8,Ersan İlyasova,PF,825,13,10.1,5.6,1.1,15.4,44.3,77.7,...,0,0,0,0,0,0,0,0,0,0
9,Shannon Brown,SG,408,9,7.6,1.9,1.1,12.4,42.0,80.7,...,0,0,0,0,0,2,0,0,0,0
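
Because the player links are pasted in separate batches, the same URL can land in more than one list; Jameer Nelson's link, for instance, appears in the first batch and again in a later one. Scraping it twice would put a duplicate row in the combined dataset. The sketch below shows one way to filter each batch against the links already seen. `drop_seen_urls` and `seen_urls` are hypothetical names introduced for illustration, and the two short lists stand in for the full batches.

```python
def drop_seen_urls(batch, seen):
    """Return the URLs in batch that are not yet in seen, and record them in seen."""
    fresh = []
    for url in batch:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


# Shortened stand-ins for two of the pasted batches.
first_batch = [
    "https://www.basketball-reference.com/players/w/waltolu01.html",
    "https://www.basketball-reference.com/players/n/nelsoja01.html",
]
later_batch = [
    "https://www.basketball-reference.com/players/w/willija02.html",
    "https://www.basketball-reference.com/players/n/nelsoja01.html",
]

seen_urls = set()
first_clean = drop_seen_urls(first_batch, seen_urls)
later_clean = drop_seen_urls(later_batch, seen_urls)
print(later_clean)
```

Filtering before the request loop also saves one page fetch per duplicate, which keeps the load on the site lower.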


In [6]:
#Paste player links
player_urls = [
    "https://www.basketball-reference.com/players/w/willija02.html", #Jason Williams
    "https://www.basketball-reference.com/players/i/ibakase01.html", #Serge Ibaka
    "https://www.basketball-reference.com/players/m/millspa01.html", #Paul Millsap
    "https://www.basketball-reference.com/players/w/westda01.html", #David West
    "https://www.basketball-reference.com/players/h/howarjo01.html", #Josh Howard
    "https://www.basketball-reference.com/players/h/humphkr01.html", #Kris Humphries
    "https://www.basketball-reference.com/players/h/harrilu01.html", #Lucious Harris
    "https://www.basketball-reference.com/players/b/biyombi01.html", #Bismack Biyombo
    "https://www.basketball-reference.com/players/b/belinma01.html", #Marco Belinelli
    "https://www.basketball-reference.com/players/s/singlky01.html", #Kyle Singler
    "https://www.basketball-reference.com/players/w/wallage01.html", #Gerald Wallace
    "https://www.basketball-reference.com/players/r/reidjr01.html", #J.R. Reid
    "https://www.basketball-reference.com/players/t/tollian01.html", #Anthony Tolliver
    "https://www.basketball-reference.com/players/n/nelsoja01.html", #Jameer Nelson (also in the first batch; drop the duplicate row when combining)
    "https://www.basketball-reference.com/players/b/boganke01.html", #Keith Bogans
    "https://www.basketball-reference.com/players/b/boguemu01.html", #Muggsy Bogues
    "https://www.basketball-reference.com/players/t/thomati01.html", #Tim Thomas
    "https://www.basketball-reference.com/players/w/wilcoch01.html", #Chris Wilcox
    "https://www.basketball-reference.com/players/v/vujacsa01.html", #Sasha Vujačić
    "https://www.basketball-reference.com/players/d/dorsejo01.html", #Joey Dorsey
    "https://www.basketball-reference.com/players/m/mcadoja01.html", #James Michael McAdoo
    "https://www.basketball-reference.com/players/h/hayesch01.html", #Chuck Hayes
    "https://www.basketball-reference.com/players/u/udrihbe01.html", #Beno Udrih
    "https://www.basketball-reference.com/players/o/ollieke01.html", #Kevin Ollie
    "https://www.basketball-reference.com/players/r/rossqu01.html", #Quinton Ross
    "https://www.basketball-reference.com/players/s/shawbr01.html", #Brian Shaw
    "https://www.basketball-reference.com/players/v/varejan01.html", #Anderson Varejão
    "https://www.basketball-reference.com/players/w/westde01.html", #Delonte West
    "https://www.basketball-reference.com/players/i/ilgauzy01.html", #Zydrunas Ilgauskas
    "https://www.basketball-reference.com/players/g/gibsoda01.html", #Daniel Gibson
    "https://www.basketball-reference.com/players/h/hicksjj01.html" #J.J. Hickson
]

#Initialize empty list to store player data
all_player_data = []

#Begin for loop to iterate through each player link
for url in player_urls:
        #Initialize response
        response = requests.get(url)
        #Start scraping
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')

            try:
                #Get player name
                name = soup.find('h1').find('span').text
                
                #Get Career Length
                career_length = int(soup.find('strong', string= lambda s: s and ('Experience:' in s or 'Career Length' in s)).next_sibling.text.strip().split()[0])

                #Get Position and use if-elif-else statement to assign position abbreviations
                position = soup.find('strong', string= lambda s: s and 'Position:' in s).next_sibling.text.strip().split(',')[0]
                if 'Point Guard' in position:
                    position = 'PG'
                elif 'Shooting Guard' in position:
                    position = 'SG'
                elif 'Small Forward' in position:
                    position = 'SF'
                elif 'Power Forward' in position:
                    position = 'PF'
                elif 'Center' in position:
                    position = 'C'
                else:
                    position = None
                
                #Get stats such as games played, ppg, rpg, apg, etc
                def stat_finder(word): #Define function 
                    element = soup.find('span', {'data-tip': word})
                    if element:
                        element = element.find_next('p').find_next('p').text.strip()
                        return element
                    else:
                        return 0.0
                games = int(stat_finder('Games'))
                ppg = float(stat_finder('Points'))
                rpg = float(stat_finder('Total Rebounds'))
                apg = float(stat_finder('Assists'))
                fg = float(stat_finder('Field Goal Percentage'))
                ft = float(stat_finder('Free Throw Percentage'))
                
                #Extract advanced stats
                per = float(soup.find('strong', string= lambda s: s and 'PER' in s).find_next('p').find_next('p').text.strip())
                win_shares = float(soup.find('strong', string= lambda s: s and 'WS' in s).find_next('p').find_next('p').text.strip())

                #Extract awards: mvps, all stars, all nba, etc
                all_stars_find = soup.find('li', {'class':'all_star'})
                if all_stars_find:
                    all_stars = int(all_stars_find.text.strip().split('x')[0])
                else: 
                    all_stars = 0
                all_nba_find = soup.find('li', string= lambda s: s and ('All-NBA' in s or 'All-ABA' in s or 'All-BAA' in s))
                if all_nba_find:
                    all_nba_text = all_nba_find.text
                    if 'x' in all_nba_text:
                        all_nba = int(all_nba_text.strip().split('x')[0])
                    else:
                        all_nba = 1
                else:
                    all_nba = 0
                all_rookie_find = soup.find('li', string = lambda s: s and 'All-Rookie' in s)
                if all_rookie_find:
                    all_rookie = 1
                else:
                    all_rookie = 0
                roy = soup.find('li', {'data-tip': lambda x: x and 'ROY' in x})
                if roy:
                    roy = 1
                else:
                    roy = 0
                all_defensive = soup.find('a', string= lambda s: s and 'All-Defensive' in s)
                if all_defensive:
                    all_defensive_text = all_defensive.text
                    if 'x' in all_defensive_text:
                        all_defensive = int(all_defensive_text.strip().split('x')[0])
                    else:
                        all_defensive = 1
                else:
                    all_defensive = 0
                mvp_find = soup.find('li', {'class':'poptip'}, string = lambda s: s and 'MVP' in s and 'AS MVP' not in s and 'IST MVP' not in s 
                                     and 'Finals MVP' not in s and 'ECF MVP' not in s and 'WCF MVP' not in s and 'AS' not in s
                                     and 'MBWA' not in s and 'USBWA MVP' not in s)
                if mvp_find:
                    mvp_text = mvp_find.text
                    if 'x' in mvp_text:
                        mvps = int(mvp_text.strip().split('x')[0])
                    else:
                        mvps = 1
                else:
                    mvps = 0
                chip_find = soup.find('a', string= lambda s: s and ('NBA Champ' in s or 'ABA Champ' in s or 'BAA Champ' in s)) #NBA Championship finder
                if chip_find:
                    chip_text = chip_find.text
                    if 'x' in chip_text:
                        chips = int(chip_text.strip().split('x')[0])
                    else:
                        chips = 1
                else:
                    chips = 0
                sc_find = soup.find('a', string= lambda s: s and 'Scoring Champ' in s) #Scoring champ finder
                if sc_find:
                    sc_text = sc_find.text
                    if 'x' in sc_text:
                        sc = int(sc_text.strip().split('x')[0])
                    else:
                        sc = 1
                else:
                    sc = 0
                dpoy_find = soup.find('a', string= lambda s: s and 'Def. POY' in s) #Defensive Player of the Year Finder
                if dpoy_find:
                    dpoy_text = dpoy_find.text
                    if 'x' in dpoy_text:
                        dpoy = int(dpoy_text.strip().split('x')[0])
                    else:
                        dpoy = 1
                else:
                    dpoy = 0
                #Set HOF to 0 since these players aren't in the HOF as of right now
                hof = 0

                #Store the data
                player_data = {
                        'Name': name,
                        'Position': position,
                        'Games': games,
                        'Career Length': career_length,
                        'PPG': ppg,
                        'RPG': rpg,
                        'APG': apg,
                        'PER': per,
                        'FG%': fg,
                        'FT%': ft,
                        'Win Shares': win_shares,
                        'All-Stars': all_stars,
                        'All-NBA': all_nba,
                        'All-Defense': all_defensive,
                        'All-Rookie Team': all_rookie,
                        'MVPs': mvps,
                        'Chips': chips,
                        'ROY': roy,
                        'DPOYs': dpoy,
                        'Scoring Champ': sc,
                        'HOF': hof
                    }
                #Append player data to list
                all_player_data.append(player_data)
            except Exception as e:
                print(f'Error Scraping data for {url}: {e}')

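            #Avoid overwhelming server
            time.sleep(1)
        else:
            #Report failed requests instead of skipping them silently
            print(f'Request failed for {url}: status code {response.status_code}')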

#Create dataframe of player data
df_three = pd.DataFrame(all_player_data)

In [9]:
df_three

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
0,Jason Williams,PG,788,12,10.5,2.3,5.9,14.2,39.8,81.3,...,0,0,0,1,0,1,0,0,0,0
1,Serge Ibaka,PF,919,14,12.0,7.1,0.8,17.1,51.3,75.7,...,0,0,3,0,0,1,0,0,0,0
2,Paul Millsap,PF,1085,16,13.4,7.1,2.2,18.7,48.9,73.6,...,4,0,1,1,0,0,0,0,0,0
3,David West,PF,1034,15,13.6,6.4,2.2,18.6,49.5,81.7,...,2,0,0,0,0,2,0,0,0,0
4,Josh Howard,SF,507,10,14.3,5.7,1.6,16.7,44.8,77.0,...,1,0,0,1,0,0,0,0,0,0
5,Kris Humphries,SF,800,13,6.7,5.4,0.7,15.4,46.3,70.0,...,0,0,0,0,0,0,0,0,0,0
6,Lucious Harris,PG,800,12,7.2,2.3,1.4,12.8,42.6,79.0,...,0,0,0,0,0,0,0,0,0,0
7,Bismack Biyombo,C,851,13,5.1,5.9,0.7,13.1,53.7,55.1,...,0,0,0,0,0,0,0,0,0,0
8,Marco Belinelli,SG,860,13,9.7,2.1,1.7,12.2,42.4,84.6,...,0,0,0,0,0,1,0,0,0,0
9,Kyle Singler,SF,356,6,6.5,2.9,0.8,9.5,41.8,78.6,...,0,0,0,1,0,0,0,0,0,0
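
The award blocks in the cell above all repeat one pattern: find an element, read a leading `Nx` multiplier if there is one, and otherwise count the award once. Below is a minimal sketch of that pattern as a single helper, assuming the label text looks like `3x All-Defensive` or `All-NBA`. The name `parse_award_count` is hypothetical and not part of the original notebook, and the sketch uses a regular expression for the leading multiplier instead of the `'x' in text` check used above.

```python
import re

def parse_award_count(text):
    """Return how many times an award was won, based on its label text.

    Hypothetical helper: None or an empty string means the award was not found.
    """
    if not text:
        return 0
    # A leading "<digits>x" is treated as the multiplier, e.g. "3x All-Defensive"
    match = re.match(r'\s*(\d+)x', text)
    if match:
        return int(match.group(1))
    # A label without a multiplier counts as a single award
    return 1
```

With BeautifulSoup it could be called as `parse_award_count(tag.text if tag else None)`, where `tag` is the result of one of the `soup.find(...)` calls above.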


In [12]:
#Paste player links
player_urls = [
    "https://www.basketball-reference.com/players/w/willike02.html", #Kevin Willis
    "https://www.basketball-reference.com/players/s/silaspa01.html", #Paul Silas
    "https://www.basketball-reference.com/players/m/majerda01.html", #Dan Majerle
    "https://www.basketball-reference.com/players/a/adamsal01.html", #Alvan Adams
    "https://www.basketball-reference.com/players/b/bibbymi01.html", #Mike Bibby
    "https://www.basketball-reference.com/players/b/barrybr01.html", #Brent Barry
    "https://www.basketball-reference.com/players/w/williho01.html", #Hot Rod Williams 
    "https://www.basketball-reference.com/players/h/hairsha01.html", #Happy Hairston
    "https://www.basketball-reference.com/players/r/riverdo01.html", #Doc Rivers
    "https://www.basketball-reference.com/players/r/riverau01.html", #Austin Rivers
    "https://www.basketball-reference.com/players/j/jonesca01.html", #Caldwell Jones
    "https://www.basketball-reference.com/players/b/brandte01.html", #Terrell Brandon
    "https://www.basketball-reference.com/players/w/willima02.html", #Marvin Williams
    "https://www.basketball-reference.com/players/l/laettch01.html", #Christian Laettner
    "https://www.basketball-reference.com/players/b/boozebo01.html", #Bob Boozer 
    "https://www.basketball-reference.com/players/r/robisda01.html", #Dave Robisch
    "https://www.basketball-reference.com/players/k/knighbi01.html", #Billy Knight
    "https://www.basketball-reference.com/players/r/rollitr01.html", #Tree Rollins
    "https://www.basketball-reference.com/players/t/turkohe01.html", #Hedo Türkoğlu
    "https://www.basketball-reference.com/players/b/brownfr01.html", #Fred Brown
    "https://www.basketball-reference.com/players/d/davisba01.html", #Baron Davis
    "https://www.basketball-reference.com/players/c/campbel01.html", #Elden Campbell
    "https://www.basketball-reference.com/players/a/anderke01.html", #Kenny Anderson
    "https://www.basketball-reference.com/players/d/donalja01.html", #James Donaldson
    "https://www.basketball-reference.com/players/k/kerrre01.html", #Red Kerr
    "https://www.basketball-reference.com/players/m/mckeyde01.html", #Derrick McKey
    "https://www.basketball-reference.com/players/o/owensto01.html", #Tom Owens
    "https://www.basketball-reference.com/players/l/larusru01.html", #Rudy LaRusso
    "https://www.basketball-reference.com/players/g/gortama01.html", #Marcin Gortat
    "https://www.basketball-reference.com/players/w/wesleda01.html", #David Wesley
    "https://www.basketball-reference.com/players/l/leverfa01.html", #Fat Lever
    "https://www.basketball-reference.com/players/m/millemi01.html" #Mike Miller
]

#Initialize empty list to store player data
all_player_data = []

#Begin for loop to iterate through each player link
for url in player_urls:
        #Request the player page
        response = requests.get(url)
        #Scrape only if the request succeeded
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')

            try:
                #Get player name
                name = soup.find('h1').find('span').text
                
                #Get Career Length
                career_length = int(soup.find('strong', string= lambda s: s and ('Experience:' in s or 'Career Length' in s)).next_sibling.text.strip().split()[0])

                #Get Position and use if-elif-else statement to assign position abbreviations
                position = soup.find('strong', string= lambda s: s and 'Position:' in s).next_sibling.text.strip().split(',')[0]
                if 'Point Guard' in position:
                    position = 'PG'
                elif 'Shooting Guard' in position:
                    position = 'SG'
                elif 'Small Forward' in position:
                    position = 'SF'
                elif 'Power Forward' in position:
                    position = 'PF'
                elif 'Center' in position:
                    position = 'C'
                else:
                    position = None
                
                #Get stats such as games played, ppg, rpg, apg, etc
                def stat_finder(word): #Return the text of the second <p> after the span whose data-tip equals word, or 0.0 if no such span exists
                    element = soup.find('span', {'data-tip': word})
                    if element:
                        element = element.find_next('p').find_next('p').text.strip()
                        return element
                    else:
                        return 0.0
                games = int(stat_finder('Games'))
                ppg = float(stat_finder('Points'))
                rpg = float(stat_finder('Total Rebounds'))
                apg = float(stat_finder('Assists'))
                fg = float(stat_finder('Field Goal Percentage'))
                ft = float(stat_finder('Free Throw Percentage'))
                
                #Extract advanced stats
                per = float(soup.find('strong', string= lambda s: s and 'PER' in s).find_next('p').find_next('p').text.strip())
                win_shares = float(soup.find('strong', string= lambda s: s and 'WS' in s).find_next('p').find_next('p').text.strip())

                #Extract awards: mvps, all stars, all nba, etc
                all_stars_find = soup.find('li', {'class':'all_star'})
                if all_stars_find:
                    all_stars = int(all_stars_find.text.strip().split('x')[0])
                else: 
                    all_stars = 0
                all_nba_find = soup.find('li', string= lambda s: s and ('All-NBA' in s or 'All-ABA' in s or 'All-BAA' in s))
                if all_nba_find:
                    all_nba_text = all_nba_find.text
                    if 'x' in all_nba_text:
                        all_nba = int(all_nba_text.strip().split('x')[0])
                    else:
                        all_nba = 1
                else:
                    all_nba = 0
                all_rookie_find = soup.find('li', string = lambda s: s and 'All-Rookie' in s)
                if all_rookie_find:
                    all_rookie = 1
                else:
                    all_rookie = 0
                roy = soup.find('li', {'data-tip': lambda x: x and 'ROY' in x})
                if roy:
                    roy = 1
                else:
                    roy = 0
                all_defensive = soup.find('a', string= lambda s: s and 'All-Defensive' in s)
                if all_defensive:
                    all_defensive_text = all_defensive.text
                    if 'x' in all_defensive_text:
                        all_defensive = int(all_defensive_text.strip().split('x')[0])
                    else:
                        all_defensive = 1
                else:
                    all_defensive = 0
                mvp_find = soup.find('li', {'class':'poptip'}, string = lambda s: s and 'MVP' in s and 'AS MVP' not in s and 'IST MVP' not in s 
                                     and 'Finals MVP' not in s and 'ECF MVP' not in s and 'WCF MVP' not in s and 'AS' not in s
                                     and 'MBWA' not in s and 'USBWA MVP' not in s)
                if mvp_find:
                    mvp_text = mvp_find.text
                    if 'x' in mvp_text:
                        mvps = int(mvp_text.strip().split('x')[0])
                    else:
                        mvps = 1
                else:
                    mvps = 0
                chip_find = soup.find('a', string= lambda s: s and ('NBA Champ' in s or 'ABA Champ' in s or 'BAA Champ' in s)) #NBA Championship finder
                if chip_find:
                    chip_text = chip_find.text
                    if 'x' in chip_text:
                        chips = int(chip_text.strip().split('x')[0])
                    else:
                        chips = 1
                else:
                    chips = 0
                sc_find = soup.find('a', string= lambda s: s and 'Scoring Champ' in s) #Scoring champ finder
                if sc_find:
                    sc_text = sc_find.text
                    if 'x' in sc_text:
                        sc = int(sc_text.strip().split('x')[0])
                    else:
                        sc = 1
                else:
                    sc = 0
                dpoy_find = soup.find('a', string= lambda s: s and 'Def. POY' in s) #Defensive Player of the Year Finder
                if dpoy_find:
                    dpoy_text = dpoy_find.text
                    if 'x' in dpoy_text:
                        dpoy = int(dpoy_text.strip().split('x')[0])
                    else:
                        dpoy = 1
                else:
                    dpoy = 0
                #Set HOF to 0 since these players aren't in the HOF as of right now
                hof = 0

                #Store the data
                player_data = {
                        'Name': name,
                        'Position': position,
                        'Games': games,
                        'Career Length': career_length,
                        'PPG': ppg,
                        'RPG': rpg,
                        'APG': apg,
                        'PER': per,
                        'FG%': fg,
                        'FT%': ft,
                        'Win Shares': win_shares,
                        'All-Stars': all_stars,
                        'All-NBA': all_nba,
                        'All-Defense': all_defensive,
                        'All-Rookie Team': all_rookie,
                        'MVPs': mvps,
                        'Chips': chips,
                        'ROY': roy,
                        'DPOYs': dpoy,
                        'Scoring Champ': sc,
                        'HOF': hof
                    }
                #Append player data to list
                all_player_data.append(player_data)
            except Exception as e:
                print(f'Error Scraping data for {url}: {e}')

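            #Avoid overwhelming server
            time.sleep(1)
        else:
            #Report failed requests instead of skipping them silently
            print(f'Request failed for {url}: status code {response.status_code}')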

#Create dataframe of player data
df_four = pd.DataFrame(all_player_data)

In [14]:
df_four

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
0,Kevin Willis,PF,1424,21,12.1,8.4,0.9,15.7,48.7,71.3,...,1,1,0,0,0,1,0,0,0,0
1,Paul Silas,PF,1254,16,9.4,9.9,2.1,13.9,43.2,67.3,...,2,0,5,0,0,3,0,0,0,0
2,Dan Majerle,SG,955,14,11.4,4.5,2.9,13.8,43.1,74.1,...,3,0,2,0,0,0,0,0,0,0
3,Alvan Adams,PF,988,13,14.1,7.0,4.1,18.3,49.8,78.8,...,1,0,0,1,0,0,1,0,0,0
4,Mike Bibby,PG,1001,14,14.7,3.1,5.5,16.1,43.6,80.2,...,0,0,0,1,0,0,0,0,0,0
5,Brent Barry,SG,912,14,9.3,3.0,3.2,15.7,46.0,82.3,...,0,0,0,1,0,2,0,0,0,0
6,Hot Rod Williams,PF,887,13,11.0,6.8,1.8,15.6,48.0,72.6,...,0,0,0,1,0,0,0,0,0,0
7,Happy Hairston,SF,776,11,14.8,10.3,1.6,16.5,47.8,74.1,...,0,0,0,0,0,1,0,0,0,0
8,Doc Rivers,PG,864,13,10.9,3.0,5.7,16.6,44.4,78.4,...,1,0,0,0,0,0,0,0,0,0
9,Austin Rivers,PG,707,11,8.5,2.0,2.1,9.7,41.9,65.3,...,0,0,0,0,0,0,0,0,0,0
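
The position step in the scraping cell maps a full position name to an abbreviation with an if-elif-else chain. The same mapping can be sketched as a lookup table that is checked in the same order. `POSITION_ABBREVIATIONS` and `abbreviate_position` are hypothetical names introduced for illustration, and the input is assumed to be the already-stripped text that the cell above extracts.

```python
# Full position names in the same order as the if-elif-else chain above
POSITION_ABBREVIATIONS = [
    ('Point Guard', 'PG'),
    ('Shooting Guard', 'SG'),
    ('Small Forward', 'SF'),
    ('Power Forward', 'PF'),
    ('Center', 'C'),
]

def abbreviate_position(position_text):
    """Return the abbreviation of the first listed position found in the text, or None."""
    for full_name, abbreviation in POSITION_ABBREVIATIONS:
        if full_name in position_text:
            return abbreviation
    # No known position name matched
    return None
```

Because the list is checked in order, a text such as `Power Forward and Center` resolves to `PF`, which matches the behavior of the original chain.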


In [17]:
#Paste player links
player_urls = [
    "https://www.basketball-reference.com/players/s/smithjo02.html", #Joe Smith
    "https://www.basketball-reference.com/players/f/favorde01.html", #Derrick Favors
    "https://www.basketball-reference.com/players/l/longgr01.html", #Grant Long
    "https://www.basketball-reference.com/players/s/smithra01.html", #Randy Smith
    "https://www.basketball-reference.com/players/e/edwarja01.html", #James Edwards
    "https://www.basketball-reference.com/players/m/marinja01.html", #Jack Marin
    "https://www.basketball-reference.com/players/t/thompmy01.html", #Mychal Thompson
    "https://www.basketball-reference.com/players/m/marshdo01.html", #Donyell Marshall
    "https://www.basketball-reference.com/players/j/johnsmi01.html", #Mickey Johnson 
    "https://www.basketball-reference.com/players/d/dunlemi02.html", #Mike Dunleavy
    "https://www.basketball-reference.com/players/n/natersw01.html", #Swen Nater (Check)
    "https://www.basketball-reference.com/players/w/weathcl01.html", #Clarence Weatherspoon
    "https://www.basketball-reference.com/players/c/caldejo01.html", #Jose Calderon
    "https://www.basketball-reference.com/players/s/snydedi01.html", #Dick Snyder
    "https://www.basketball-reference.com/players/g/gilliar01.html", #Armen Gilliam
    "https://www.basketball-reference.com/players/g/greenjo01.html", #Johnny Green
    "https://www.basketball-reference.com/players/m/mixst01.html", #Steve Mix
    "https://www.basketball-reference.com/players/e/eakinji01.html", #Jim Eakins (check)
    "https://www.basketball-reference.com/players/n/nattca01.html", #Calvin Natt
    "https://www.basketball-reference.com/players/b/busedo01.html", #Don Buse (check)
    "https://www.basketball-reference.com/players/r/raycl01.html", #Clifford Ray
    "https://www.basketball-reference.com/players/c/corbity01.html", #Tyrone Corbin
    "https://www.basketball-reference.com/players/j/johnsav01.html", #Avery Johnson
    "https://www.basketball-reference.com/players/d/dischte01.html", #Terry Dischinger
    "https://www.basketball-reference.com/players/h/hillty01.html", #Tyrone Hill
    "https://www.basketball-reference.com/players/m/mccraro01.html", #Rodney McCray
    "https://www.basketball-reference.com/players/g/gminsmi01.html", #Mike Gminski
    "https://www.basketball-reference.com/players/r/reddmi01.html", #Michael Redd
    "https://www.basketball-reference.com/players/v/vanexni01.html", #Nick Van Exel
    "https://www.basketball-reference.com/players/s/searske01.html", #Kenny Sears
    "https://www.basketball-reference.com/players/s/stoudda01.html", #Damon Stoudamire
    "https://www.basketball-reference.com/players/a/anderni01.html", #Nick Anderson
    "https://www.basketball-reference.com/players/h/houstal01.html", #Allan Houston
    "https://www.basketball-reference.com/players/j/johnsam01.html", #Amir Johnson
    "https://www.basketball-reference.com/players/b/beckby01.html", #Byron Beck
    "https://www.basketball-reference.com/players/h/harride01.html" #Devin Harris
]

#Initialize empty list to store player data
all_player_data = []

#Begin for loop to iterate through each player link
for url in player_urls:
        #Request the player page
        response = requests.get(url)
        #Scrape only if the request succeeded
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')

            try:
                #Get player name
                name = soup.find('h1').find('span').text
                
                #Get Career Length
                career_length = int(soup.find('strong', string= lambda s: s and ('Experience:' in s or 'Career Length' in s)).next_sibling.text.strip().split()[0])

                #Get Position and use if-elif-else statement to assign position abbreviations
                position = soup.find('strong', string= lambda s: s and 'Position:' in s).next_sibling.text.strip().split(',')[0]
                if 'Point Guard' in position:
                    position = 'PG'
                elif 'Shooting Guard' in position:
                    position = 'SG'
                elif 'Small Forward' in position:
                    position = 'SF'
                elif 'Power Forward' in position:
                    position = 'PF'
                elif 'Center' in position:
                    position = 'C'
                else:
                    position = None
                
                #Get stats such as games played, ppg, rpg, apg, etc
                def stat_finder(word): #Return the text of the second <p> after the span whose data-tip equals word, or 0.0 if no such span exists
                    element = soup.find('span', {'data-tip': word})
                    if element:
                        element = element.find_next('p').find_next('p').text.strip()
                        return element
                    else:
                        return 0.0
                games = int(stat_finder('Games'))
                ppg = float(stat_finder('Points'))
                rpg = float(stat_finder('Total Rebounds'))
                apg = float(stat_finder('Assists'))
                fg = float(stat_finder('Field Goal Percentage'))
                ft = float(stat_finder('Free Throw Percentage'))
                
                #Extract advanced stats
                per = float(soup.find('strong', string= lambda s: s and 'PER' in s).find_next('p').find_next('p').text.strip())
                win_shares = float(soup.find('strong', string= lambda s: s and 'WS' in s).find_next('p').find_next('p').text.strip())

                #Extract awards: mvps, all stars, all nba, etc
                all_stars_find = soup.find('li', {'class':'all_star'})
                if all_stars_find:
                    all_stars = int(all_stars_find.text.strip().split('x')[0])
                else: 
                    all_stars = 0
                all_nba_find = soup.find('li', string= lambda s: s and ('All-NBA' in s or 'All-ABA' in s or 'All-BAA' in s))
                if all_nba_find:
                    all_nba_text = all_nba_find.text
                    if 'x' in all_nba_text:
                        all_nba = int(all_nba_text.strip().split('x')[0])
                    else:
                        all_nba = 1
                else:
                    all_nba = 0
                all_rookie_find = soup.find('li', string = lambda s: s and 'All-Rookie' in s)
                if all_rookie_find:
                    all_rookie = 1
                else:
                    all_rookie = 0
                roy = soup.find('li', {'data-tip': lambda x: x and 'ROY' in x})
                if roy:
                    roy = 1
                else:
                    roy = 0
                all_defensive = soup.find('a', string= lambda s: s and 'All-Defensive' in s)
                if all_defensive:
                    all_defensive_text = all_defensive.text
                    if 'x' in all_defensive_text:
                        all_defensive = int(all_defensive_text.strip().split('x')[0])
                    else:
                        all_defensive = 1
                else:
                    all_defensive = 0
                mvp_find = soup.find('li', {'class':'poptip'}, string = lambda s: s and 'MVP' in s and 'AS MVP' not in s and 'IST MVP' not in s 
                                     and 'Finals MVP' not in s and 'ECF MVP' not in s and 'WCF MVP' not in s and 'AS' not in s
                                     and 'MBWA' not in s and 'USBWA MVP' not in s)
                if mvp_find:
                    mvp_text = mvp_find.text
                    if 'x' in mvp_text:
                        mvps = int(mvp_text.strip().split('x')[0])
                    else:
                        mvps = 1
                else:
                    mvps = 0
                chip_find = soup.find('a', string= lambda s: s and ('NBA Champ' in s or 'ABA Champ' in s or 'BAA Champ' in s)) #NBA Championship finder
                if chip_find:
                    chip_text = chip_find.text
                    if 'x' in chip_text:
                        chips = int(chip_text.strip().split('x')[0])
                    else:
                        chips = 1
                else:
                    chips = 0
                sc_find = soup.find('a', string= lambda s: s and 'Scoring Champ' in s) #Scoring champ finder
                if sc_find:
                    sc_text = sc_find.text
                    if 'x' in sc_text:
                        sc = int(sc_text.strip().split('x')[0])
                    else:
                        sc = 1
                else:
                    sc = 0
                dpoy_find = soup.find('a', string= lambda s: s and 'Def. POY' in s) #Defensive Player of the Year Finder
                if dpoy_find:
                    dpoy_text = dpoy_find.text
                    if 'x' in dpoy_text:
                        dpoy = int(dpoy_text.strip().split('x')[0])
                    else:
                        dpoy = 1
                else:
                    dpoy = 0
                #Set HOF to 0 since these players aren't in the HOF as of right now
                hof = 0

                #Store the data
                player_data = {
                        'Name': name,
                        'Position': position,
                        'Games': games,
                        'Career Length': career_length,
                        'PPG': ppg,
                        'RPG': rpg,
                        'APG': apg,
                        'PER': per,
                        'FG%': fg,
                        'FT%': ft,
                        'Win Shares': win_shares,
                        'All-Stars': all_stars,
                        'All-NBA': all_nba,
                        'All-Defense': all_defensive,
                        'All-Rookie Team': all_rookie,
                        'MVPs': mvps,
                        'Chips': chips,
                        'ROY': roy,
                        'DPOYs': dpoy,
                        'Scoring Champ': sc,
                        'HOF': hof
                    }
                #Append player data to list
                all_player_data.append(player_data)
            except Exception as e:
                print(f'Error Scraping data for {url}: {e}')

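            #Avoid overwhelming server
            time.sleep(1)
        else:
            #Report failed requests instead of skipping them silently
            print(f'Request failed for {url}: status code {response.status_code}')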

#Create dataframe of player data
df_five = pd.DataFrame(all_player_data)

In [19]:
df_five

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
0,Joe Smith,PF,1030,16,10.9,6.4,1.0,15.4,45.5,79.0,...,0,0,0,1,0,0,0,0,0,0
1,Derrick Favors,PF,790,12,10.6,7.1,1.1,18.8,53.4,66.3,...,0,0,0,1,0,0,0,0,0,0
2,Grant Long,PF,1003,15,9.5,6.1,1.7,13.3,46.7,76.1,...,0,0,0,0,0,0,0,0,0,0
3,Randy Smith,SG,976,12,16.7,3.7,4.6,16.5,47.0,78.1,...,2,1,0,0,0,0,0,0,0,0
4,James Edwards,C,1168,19,12.7,5.1,1.3,14.7,49.5,69.8,...,0,0,0,0,0,3,0,0,0,0
5,Jack Marin,SF,849,11,14.8,5.2,2.1,15.2,46.5,84.3,...,2,0,0,1,0,0,0,0,0,0
6,Mychal Thompson,PF,935,12,13.7,7.4,2.3,15.7,50.4,65.5,...,0,0,0,1,0,2,0,0,0,0
7,Donyell Marshall,SF,957,15,11.2,6.7,1.4,16.8,43.5,73.1,...,0,0,0,1,0,0,0,0,0,0
8,Mickey Johnson,PF,904,12,14.1,7.2,3.0,16.9,44.9,80.0,...,0,0,0,0,0,0,0,0,0,0
9,Mike Dunleavy,SF,986,15,11.2,4.3,2.2,13.9,44.1,80.3,...,0,0,0,0,0,0,0,0,0,0
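
In the scraping cells, `stat_finder` returns either the stripped text of a stat or `0.0`, and the result is passed straight to `int()` or `float()`. If the page ever shows an empty or non-numeric value, that conversion raises and the surrounding `try`/`except` drops the whole player. A minimal sketch of a more forgiving conversion follows. `to_number` is a hypothetical helper, and falling back to zero is an assumption; whether zero is the right stand-in for a missing stat depends on how the model should treat missing values.

```python
def to_number(value, cast=float, default=0.0):
    """Convert scraped text to a number, falling back to a default on bad input."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        # None, empty strings, or non-numeric text end up here
        return cast(default)
```

For example, `to_number(stat_finder('Games'), cast=int)` would yield `0` instead of raising when the text is empty.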


In [24]:
#Paste player links
player_urls = [
    "https://www.basketball-reference.com/players/o/okurme01.html", #Mehmet Okur
    "https://www.basketball-reference.com/players/m/malonje01.html", #Jeff Malone
    "https://www.basketball-reference.com/players/f/francst01.html", #Steve Francis 
    "https://www.basketball-reference.com/players/b/ballagr01.html", #Greg Ballard
    "https://www.basketball-reference.com/players/p/paxsoji02.html", #Jim Paxson
    "https://www.basketball-reference.com/players/w/wedmasc01.html", #Scott Wedman
    "https://www.basketball-reference.com/players/d/davisbr01.html", #Brad Davis
    "https://www.basketball-reference.com/players/s/szczewa02.html", #Wally Szczerbiak
    "https://www.basketball-reference.com/players/w/woolror01.html", #Orlando Woolridge
    "https://www.basketball-reference.com/players/k/kanteen01.html", #Enes Freedom
    "https://www.basketball-reference.com/players/d/dampier01.html", #Erick Dampier
    "https://www.basketball-reference.com/players/t/teaguje01.html", #Jeff Teague
    "https://www.basketball-reference.com/players/m/moblecu01.html", #Cuttino Mobley
    "https://www.basketball-reference.com/players/f/flemive01.html", #Vern Fleming
    "https://www.basketball-reference.com/players/r/russeca01.html", #Cazzie Russell
    "https://www.basketball-reference.com/players/s/shortpu01.html", #Purvis Short
    "https://www.basketball-reference.com/players/l/laceysa01.html", #Sam Lacey
    "https://www.basketball-reference.com/players/b/bogutan01.html", #Andrew Bogut
    "https://www.basketball-reference.com/players/f/floydsl01.html", #Sleepy Floyd
    "https://www.basketball-reference.com/players/m/monrogr01.html", #Greg Monroe
    "https://www.basketball-reference.com/players/n/newlimi01.html", #Mike Newlin
    "https://www.basketball-reference.com/players/r/robbire01.html", #Red Robbins
    "https://www.basketball-reference.com/players/m/murphtr01.html", #Troy Murphy
    "https://www.basketball-reference.com/players/n/newmajo01.html", #Johnny Newman 
    "https://www.basketball-reference.com/players/m/mcmilna01.html", #Nate McMillan
    "https://www.basketball-reference.com/players/q/quinnch01.html", #Chris Quinn
    "https://www.basketball-reference.com/players/u/udohek01.html", #Ekpe Udoh
    "https://www.basketball-reference.com/players/u/udokaim01.html", #Ime Udoka
    "https://www.basketball-reference.com/players/v/valanjo01.html", #Jonas Valančiūnas
    "https://www.basketball-reference.com/players/y/youngth01.html", #Thaddeus Young
    "https://www.basketball-reference.com/players/y/youngda01.html", #Danny Young
    "https://www.basketball-reference.com/players/y/youngsa01.html", #Sam Young
    "https://www.basketball-reference.com/players/z/zelleco01.html", #Cody Zeller
    "https://www.basketball-reference.com/players/z/zellety01.html", #Tyler Zeller
    "https://www.basketball-reference.com/players/a/anthogr01.html", #Greg Anthony
    "https://www.basketball-reference.com/players/a/anthojo01.html" #Joel Anthony
]

#Initialize empty list to store player data
all_player_data = []

#Begin for loop to iterate through each player link
for url in player_urls:
        #Request the player page
        response = requests.get(url)
        #Scrape only if the request succeeded
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')

            try:
                #Get player name
                name = soup.find('h1').find('span').text
                
                #Get Career Length
                career_length = int(soup.find('strong', string= lambda s: s and ('Experience:' in s or 'Career Length' in s)).next_sibling.text.strip().split()[0])

                #Get Position and use if-elif-else statement to assign position abbreviations
                position = soup.find('strong', string= lambda s: s and 'Position:' in s).next_sibling.text.strip().split(',')[0]
                if 'Point Guard' in position:
                    position = 'PG'
                elif 'Shooting Guard' in position:
                    position = 'SG'
                elif 'Small Forward' in position:
                    position = 'SF'
                elif 'Power Forward' in position:
                    position = 'PF'
                elif 'Center' in position:
                    position = 'C'
                else:
                    position = None
                
                #Get stats such as games played, ppg, rpg, apg, etc
                def stat_finder(word): #Return the text of the second <p> after the span whose data-tip equals word, or 0.0 if no such span exists
                    element = soup.find('span', {'data-tip': word})
                    if element:
                        element = element.find_next('p').find_next('p').text.strip()
                        return element
                    else:
                        return 0.0
                games = int(stat_finder('Games'))
                ppg = float(stat_finder('Points'))
                rpg = float(stat_finder('Total Rebounds'))
                apg = float(stat_finder('Assists'))
                fg = float(stat_finder('Field Goal Percentage'))
                ft = float(stat_finder('Free Throw Percentage'))
                
                #Extract advanced stats
                per = float(soup.find('strong', string= lambda s: s and 'PER' in s).find_next('p').find_next('p').text.strip())
                win_shares = float(soup.find('strong', string= lambda s: s and 'WS' in s).find_next('p').find_next('p').text.strip())

                #Extract awards: mvps, all stars, all nba, etc
                all_stars_find = soup.find('li', {'class':'all_star'})
                if all_stars_find:
                    all_stars = int(all_stars_find.text.strip().split('x')[0])
                else: 
                    all_stars = 0
                all_nba_find = soup.find('li', string= lambda s: s and ('All-NBA' in s or 'All-ABA' in s or 'All-BAA' in s))
                if all_nba_find:
                    all_nba_text = all_nba_find.text
                    if 'x' in all_nba_text:
                        all_nba = int(all_nba_text.strip().split('x')[0])
                    else:
                        all_nba = 1
                else:
                    all_nba = 0
                all_rookie_find = soup.find('li', string = lambda s: s and 'All-Rookie' in s)
                if all_rookie_find:
                    all_rookie = 1
                else:
                    all_rookie = 0
                roy = soup.find('li', {'data-tip': lambda x: x and 'ROY' in x})
                if roy:
                    roy = 1
                else:
                    roy = 0
                all_defensive = soup.find('a', string= lambda s: s and 'All-Defensive' in s)
                if all_defensive:
                    all_defensive_text = all_defensive.text
                    if 'x' in all_defensive_text:
                        all_defensive = int(all_defensive_text.strip().split('x')[0])
                    else:
                        all_defensive = 1
                else:
                    all_defensive = 0
                mvp_find = soup.find('li', {'class':'poptip'}, string = lambda s: s and 'MVP' in s and 'AS MVP' not in s and 'IST MVP' not in s 
                                     and 'Finals MVP' not in s and 'ECF MVP' not in s and 'WCF MVP' not in s and 'AS' not in s
                                     and 'MBWA' not in s and 'USBWA MVP' not in s)
                if mvp_find:
                    mvp_text = mvp_find.text
                    if 'x' in mvp_text:
                        mvps = int(mvp_text.strip().split('x')[0])
                    else:
                        mvps = 1
                else:
                    mvps = 0
                chip_find = soup.find('a', string= lambda s: s and ('NBA Champ' in s or 'ABA Champ' in s or 'BAA Champ' in s)) #NBA Championship finder
                if chip_find:
                    chip_text = chip_find.text
                    if 'x' in chip_text:
                        chips = int(chip_text.strip().split('x')[0])
                    else:
                        chips = 1
                else:
                    chips = 0
                sc_find = soup.find('a', string= lambda s: s and 'Scoring Champ' in s) #Scoring champ finder
                if sc_find:
                    sc_text = sc_find.text
                    if 'x' in sc_text:
                        sc = int(sc_text.strip().split('x')[0])
                    else:
                        sc = 1
                else:
                    sc = 0
                dpoy_find = soup.find('a', string= lambda s: s and 'Def. POY' in s) #Defensive Player of the Year Finder
                if dpoy_find:
                    dpoy_text = dpoy_find.text
                    if 'x' in dpoy_text:
                        dpoy = int(dpoy_text.strip().split('x')[0])
                    else:
                        dpoy = 1
                else:
                    dpoy = 0
                #Set HOF to 0 since these players aren't in the HOF as of right now
                hof = 0

                #Store the data
                player_data = {
                        'Name': name,
                        'Position': position,
                        'Games': games,
                        'Career Length': career_length,
                        'PPG': ppg,
                        'RPG': rpg,
                        'APG': apg,
                        'PER': per,
                        'FG%': fg,
                        'FT%': ft,
                        'Win Shares': win_shares,
                        'All-Stars': all_stars,
                        'All-NBA': all_nba,
                        'All-Defense': all_defensive,
                        'All-Rookie Team': all_rookie,
                        'MVPs': mvps,
                        'Chips': chips,
                        'ROY': roy,
                        'DPOYs': dpoy,
                        'Scoring Champ': sc,
                        'HOF': hof
                    }
                #Append player data to list
                all_player_data.append(player_data)
            except Exception as e:
                print(f'Error Scraping data for {url}: {e}')

            #Avoid overwhelming server
            time.sleep(1)

#Create dataframe of player data
df_six = pd.DataFrame(all_player_data)
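
The award blocks in the scraping loop repeat one pattern: find the element, then read a leading count such as `3x` or fall back to 1. A minimal sketch of a shared helper is shown here. `parse_award_count` is a hypothetical name that does not appear in the notebook, and the sketch assumes award text is either missing, a bare label, or a label prefixed by `<N>x`. Unlike the `'x' in text` check, the regular expression only accepts digits directly before the `x`, so a label that merely contains the letter is counted once instead of raising an error.

```python
import re

# Hypothetical helper (not in the original notebook): turn award text such as
# "3x All-Defensive" into a count.
_COUNT_PREFIX = re.compile(r"^\s*(\d+)x\b")

def parse_award_count(text):
    """Return 0 for missing text, N for an 'Nx ...' prefix, otherwise 1."""
    if not text:
        return 0
    match = _COUNT_PREFIX.match(text)
    return int(match.group(1)) if match else 1
```

Inside the loop it could be called as `parse_award_count(chip_find.text if chip_find else None)`.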

In [30]:
#Concat dataframe
role_players = pd.concat([df_one, df_two, df_three, df_four, df_five, df_six], ignore_index=True)

In [31]:
#Check shape
role_players.shape

(206, 21)

In [33]:
#Check duplicates
duplicates = role_players.duplicated()
duplicates.value_counts()

False    205
True       1
Name: count, dtype: int64

In [34]:
#Check duplicate
role_players[duplicates]

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
84,Jameer Nelson,PG,878,14,11.3,3.0,5.1,14.4,43.6,81.0,...,1,0,0,1,0,0,0,0,0,0


In [35]:
#Check Jameer Nelson
role_players[role_players['Name'].str.startswith('Jameer')]

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
5,Jameer Nelson,PG,878,14,11.3,3.0,5.1,14.4,43.6,81.0,...,1,0,0,1,0,0,0,0,0,0
84,Jameer Nelson,PG,878,14,11.3,3.0,5.1,14.4,43.6,81.0,...,1,0,0,1,0,0,0,0,0,0


In [36]:
#Drop duplicates
role_players = role_players.drop_duplicates()

#Recheck duplicates
duplicates = role_players.duplicated()
duplicates.value_counts()

False    205
Name: count, dtype: int64

In [38]:
#Check unique players
role_players['Name'].unique()

array(['Luke Walton', 'Nick Young', 'J.R. Smith', 'Shane Battier',
       'Raja Bell', 'Jameer Nelson', 'Mo Williams', 'Derek Fisher',
       'Kirk Hinrich', 'Jared Dudley', 'Mario Chalmers',
       'Leandro Barbosa', 'Courtney Lee', 'Jason Terry', 'Trevor Ariza',
       'Richard Jefferson', 'Matt Barnes', 'Stephen Jackson',
       'Zaza Pachulia', 'Kendrick Perkins', 'Samuel Dalembert',
       'Boris Diaw', 'Patty Mills', 'Steve Kerr', 'Shaun Livingston',
       'Ron Harper', 'Kenny Smith', 'Pat Riley', 'Danny Ainge',
       'Dell Curry', 'Jamal Crawford', 'Lou Williams', 'Eddie Johnson',
       'Aaron McKie', 'Danny Green', 'Ricky Pierce', 'John Starks',
       'Anthony Mason', 'Luc Mbah a Moute', 'Andrei Kirilenko',
       'Kyle Korver', 'Chris Andersen', 'Jeff Green', 'Ersan İlyasova',
       'Shannon Brown', 'Doug Christie', 'Jared Jeffries', 'Steve Novak',
       'JJ Redick', 'James Posey', 'Jason Kapono', 'Antonio Davis',
       'Nazr Mohammed', 'Udonis Haslem', 'Kurt Thomas', '

In [None]:
#Paste player links
player_urls = [
   "https://www.basketball-reference.com/players/r/rondora01.html", #Rajon Rondo
   "https://www.basketball-reference.com/players/d/dragigo01.html", #Goran Dragić
   "https://www.basketball-reference.com/players/w/watsoea01.html", #Earl Watson
   "https://www.basketball-reference.com/players/l/leeda02.html", #David Lee
   "https://www.basketball-reference.com/players/b/brownde02.html", #Devin Brown
   "https://www.basketball-reference.com/players/e/elsonfr01.html", #Francisco Elson
   "https://www.basketball-reference.com/players/p/petrojo01.html", #Johan Petro
   "https://www.basketball-reference.com/players/b/brezepr01.html", #Primož Brezec
   "https://www.basketball-reference.com/players/c/cookbr01.html", #Brian Cook
   "https://www.basketball-reference.com/players/c/cookqu01.html", #Quinn Cook
   "https://www.basketball-reference.com/players/c/collija04.html", #Jason Collins
   "https://www.basketball-reference.com/players/h/hassetr01.html", #Trenton Hassell
   "https://www.basketball-reference.com/players/n/najered01.html", #Eduardo Nájera
   "https://www.basketball-reference.com/players/a/alstora01.html", #Rafer Alston
   "https://www.basketball-reference.com/players/s/smithja02.html", #Jason Smith
   "https://www.basketball-reference.com/players/s/songada01.html", #Darius Songaila
   "https://www.basketball-reference.com/players/e/elyme01.html", #Melvin Ely
   "https://www.basketball-reference.com/players/f/fosteje01.html", #Jeff Foster
   "https://www.basketball-reference.com/players/e/evansre01.html", #Reggie Evans
   "https://www.basketball-reference.com/players/b/buckngr01.html", #Greg Buckner
   "https://www.basketball-reference.com/players/d/diopde01.html", #DeSagana Diop
   "https://www.basketball-reference.com/players/b/bowenry01.html", #Ryan Bowen
   "https://www.basketball-reference.com/players/f/foylead01.html", #Adonal Foyle
   "https://www.basketball-reference.com/players/s/singlja01.html", #James Singleton
   "https://www.basketball-reference.com/players/l/longllu01.html", #Luc Longley
   "https://www.basketball-reference.com/players/j/jonesch01.html", #Charles Jones
   "https://www.basketball-reference.com/players/m/madsema01.html", #Mark Madsen
   "https://www.basketball-reference.com/players/s/snower01.html", #Eric Snow
   "https://www.basketball-reference.com/players/p/przybjo01.html", #Joel Przybilla
   "https://www.basketball-reference.com/players/b/brownkw01.html" #Kwame Brown
]

#Initialize empty list to store player data
all_player_data = []

#Begin for loop to iterate through each player link
for url in player_urls:
        #Initialize response
        response = requests.get(url)
        #Start scraping
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')

            try:
                #Get player name
                name = soup.find('h1').find('span').text
                
                #Get Career Length
                career_length = int(soup.find('strong', string= lambda s: s and ('Experience:' in s or 'Career Length' in s)).next_sibling.text.strip().split()[0])

                #Get Position and use if-elif-else statement to assign position abbreviations
                position = soup.find('strong', string= lambda s: s and 'Position:' in s).next_sibling.text.strip().split(',')[0]
                if 'Point Guard' in position:
                    position = 'PG'
                elif 'Shooting Guard' in position:
                    position = 'SG'
                elif 'Small Forward' in position:
                    position = 'SF'
                elif 'Power Forward' in position:
                    position = 'PF'
                elif 'Center' in position:
                    position = 'C'
                else:
                    position = None
                
                #Get stats such as games played, ppg, rpg, apg, etc
                def stat_finder(word): #Define function 
                    element = soup.find('span', {'data-tip': word})
                    if element:
                        element = element.find_next('p').find_next('p').text.strip()
                        return element
                    else:
                        return 0.0
                games = int(stat_finder('Games'))
                ppg = float(stat_finder('Points'))
                rpg = float(stat_finder('Total Rebounds'))
                apg = float(stat_finder('Assists'))
                fg = float(stat_finder('Field Goal Percentage'))
                ft = float(stat_finder('Free Throw Percentage'))
                
                #Extract advanced stats
                per = float(soup.find('strong', string= lambda s: s and 'PER' in s).find_next('p').find_next('p').text.strip())
                win_shares = float(soup.find('strong', string= lambda s: s and 'WS' in s).find_next('p').find_next('p').text.strip())

                #Extract awards: mvps, all stars, all nba, etc
                all_stars_find = soup.find('li', {'class':'all_star'})
                if all_stars_find:
                    all_stars = int(all_stars_find.text.strip().split('x')[0])
                else: 
                    all_stars = 0
                all_nba_find = soup.find('li', string= lambda s: s and ('All-NBA' in s or 'All-ABA' in s or 'All-BAA' in s))
                if all_nba_find:
                    all_nba_text = all_nba_find.text
                    if 'x' in all_nba_text:
                        all_nba = int(all_nba_text.strip().split('x')[0])
                    else:
                        all_nba = 1
                else:
                    all_nba = 0
                all_rookie_find = soup.find('li', string = lambda s: s and 'All-Rookie' in s)
                if all_rookie_find:
                    all_rookie = 1
                else:
                    all_rookie = 0
                roy = soup.find('li', {'data-tip': lambda x: x and 'ROY' in x})
                if roy:
                    roy = 1
                else:
                    roy = 0
                all_defensive = soup.find('a', string= lambda s: s and 'All-Defensive' in s)
                if all_defensive:
                    all_defensive_text = all_defensive.text
                    if 'x' in all_defensive_text:
                        all_defensive = int(all_defensive_text.strip().split('x')[0])
                    else:
                        all_defensive = 1
                else:
                    all_defensive = 0
                mvp_find = soup.find('li', {'class':'poptip'}, string = lambda s: s and 'MVP' in s and 'AS MVP' not in s and 'IST MVP' not in s 
                                     and 'Finals MVP' not in s and 'ECF MVP' not in s and 'WCF MVP' not in s and 'AS' not in s
                                     and 'MBWA' not in s and 'USBWA MVP' not in s)
                if mvp_find:
                    mvp_text = mvp_find.text
                    if 'x' in mvp_text:
                        mvps = int(mvp_text.strip().split('x')[0])
                    else:
                        mvps = 1
                else:
                    mvps = 0
                chip_find = soup.find('a', string= lambda s: s and ('NBA Champ' in s or 'ABA Champ' in s or 'BAA Champ' in s)) #NBA Championship finder
                if chip_find:
                    chip_text = chip_find.text
                    if 'x' in chip_text:
                        chips = int(chip_text.strip().split('x')[0])
                    else:
                        chips = 1
                else:
                    chips = 0
                sc_find = soup.find('a', string= lambda s: s and 'Scoring Champ' in s) #Scoring champ finder
                if sc_find:
                    sc_text = sc_find.text
                    if 'x' in sc_text:
                        sc = int(sc_text.strip().split('x')[0])
                    else:
                        sc = 1
                else:
                    sc = 0
                dpoy_find = soup.find('a', string= lambda s: s and 'Def. POY' in s) #Defensive Player of the Year Finder
                if dpoy_find:
                    dpoy_text = dpoy_find.text
                    if 'x' in dpoy_text:
                        dpoy = int(dpoy_text.strip().split('x')[0])
                    else:
                        dpoy = 1
                else:
                    dpoy = 0
                #Set HOF to 0 since these players aren't in the HOF as of right now
                hof = 0

                #Store the data
                player_data = {
                        'Name': name,
                        'Position': position,
                        'Games': games,
                        'Career Length': career_length,
                        'PPG': ppg,
                        'RPG': rpg,
                        'APG': apg,
                        'PER': per,
                        'FG%': fg,
                        'FT%': ft,
                        'Win Shares': win_shares,
                        'All-Stars': all_stars,
                        'All-NBA': all_nba,
                        'All-Defense': all_defensive,
                        'All-Rookie Team': all_rookie,
                        'MVPs': mvps,
                        'Chips': chips,
                        'ROY': roy,
                        'DPOYs': dpoy,
                        'Scoring Champ': sc,
                        'HOF': hof
                    }
                #Append player data to list
                all_player_data.append(player_data)
            except Exception as e:
                print(f'Error Scraping data for {url}: {e}')

            #Avoid overwhelming server
            time.sleep(1)

#Create dataframe of player data
df_seven = pd.DataFrame(all_player_data)
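
In the loop above, a response with a status other than 200 is skipped without any message, and the one-second pause only runs after a successful request. The sketch here shows one way to record failed links and pause after every request. `scrape_batch`, `fetch`, and `parse` are hypothetical names: `fetch` stands in for `requests.get`, and `parse` stands in for the body of the `try` block, returning one player dictionary.

```python
import time

def scrape_batch(urls, fetch, parse, delay=1.0):
    """Return (rows, failed), where failed is a list of (url, reason) pairs."""
    rows, failed = [], []
    for url in urls:
        try:
            response = fetch(url)
            if response.status_code == 200:
                rows.append(parse(response))
            else:
                failed.append((url, f"HTTP {response.status_code}"))
        except Exception as exc:  # broad catch, mirroring the notebook
            failed.append((url, str(exc)))
        finally:
            time.sleep(delay)  # pause after every request, success or not
    return rows, failed
```

Returning the failures lets a later cell retry only those links instead of rerunning the whole batch.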

In [None]:
#Add to main dataframe
df = pd.read_csv("NBA Role Players.csv")
df = pd.concat([df, df_seven], ignore_index=True)

#View tail
df.tail()

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
230,Charles Jones,PF,726,15,2.5,4.5,0.9,9.2,48.0,61.8,...,0,0,0,0,0,1,0,0,0,0
231,Mark Madsen,PF,453,9,2.2,2.6,0.4,8.1,45.7,52.7,...,0,0,0,0,0,2,0,0,0,0
232,Eric Snow,PG,846,13,6.8,2.5,5.0,12.2,42.4,76.3,...,0,0,1,0,0,0,0,0,0,0
233,Joel Przybilla,C,592,13,3.9,6.2,0.4,12.0,55.2,55.7,...,0,0,0,0,0,0,0,0,0,0
234,Kwame Brown,PF,607,12,6.6,5.5,0.9,12.5,49.2,57.0,...,0,0,0,0,0,0,0,0,0,0


In [18]:
df.shape

(235, 21)

In [22]:
#Save to csv file
df.to_csv("NBA Role Players.csv", index=False)

In [3]:
df = pd.read_csv('NBA Role Players.csv')
df.tail()

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
230,Charles Jones,PF,726,15,2.5,4.5,0.9,9.2,48.0,61.8,...,0,0,0,0,0,1,0,0,0,0
231,Mark Madsen,PF,453,9,2.2,2.6,0.4,8.1,45.7,52.7,...,0,0,0,0,0,2,0,0,0,0
232,Eric Snow,PG,846,13,6.8,2.5,5.0,12.2,42.4,76.3,...,0,0,1,0,0,0,0,0,0,0
233,Joel Przybilla,C,592,13,3.9,6.2,0.4,12.0,55.2,55.7,...,0,0,0,0,0,0,0,0,0,0
234,Kwame Brown,PF,607,12,6.6,5.5,0.9,12.5,49.2,57.0,...,0,0,0,0,0,0,0,0,0,0


In [4]:
#Paste player links
player_urls = [
    "https://www.basketball-reference.com/players/m/morriad01.html", #Adam Morrison
    "https://www.basketball-reference.com/players/o/olowomi01.html", #Michael Olowokandi
    "https://www.basketball-reference.com/players/b/bennean01.html", #Anthony Bennett
    "https://www.basketball-reference.com/players/a/alexajo01.html", #Joe Alexander
    "https://www.basketball-reference.com/players/m/milicda01.html", #Darko Miličić
    "https://www.basketball-reference.com/players/t/thabeha01.html", #Hasheem Thabeet
    "https://www.basketball-reference.com/players/v/veselja01.html", #Jan Veselý
    "https://www.basketball-reference.com/players/t/tskitni01.html", #Nikoloz Tskitishvili
    "https://www.basketball-reference.com/players/b/bradlsh01.html", #Shawn Bradley
    "https://www.basketball-reference.com/players/j/jianlyi01.html", #Yi Jianlian
    "https://www.basketball-reference.com/players/o/odengr01.html", #Greg Oden
    "https://www.basketball-reference.com/players/b/bendedr01.html", #Dragan Bender
    "https://www.basketball-reference.com/players/r/robinth01.html", #Thomas Robinson
    "https://www.basketball-reference.com/players/b/beaslmi01.html", #Michael Beasley
    "https://www.basketball-reference.com/players/w/wagneda02.html", #Dajuan Wagner
    "https://www.basketball-reference.com/players/s/swiftst01.html", #Stromile Swift
    "https://www.basketball-reference.com/players/f/flynnjo01.html", #Jonny Flynn
    "https://www.basketball-reference.com/players/p/parkesm01.html", #Smush Parker
    "https://www.basketball-reference.com/players/v/villach01.html", #Charlie Villanueva
    "https://www.basketball-reference.com/players/t/telfase01.html", #Sebastian Telfair
    "https://www.basketball-reference.com/players/j/jeffrja01.html", #Jared Jeffries
    "https://www.basketball-reference.com/players/d/doubyqu01.html", #Quincy Douby
    "https://www.basketball-reference.com/players/m/mccanra01.html", #Rashad McCants
    "https://www.basketball-reference.com/players/f/fredeji01.html", #Jimmer Fredette
    "https://www.basketball-reference.com/players/s/sullija01.html", #Jared Sullinger
    "https://www.basketball-reference.com/players/t/tylerje01.html", #Jeremy Tyler
    "https://www.basketball-reference.com/players/w/willish02.html", #Shelden Williams
    "https://www.basketball-reference.com/players/e/eyengch01.html", #Christian Eyenga
    "https://www.basketball-reference.com/players/d/dioguik01.html", #Ike Diogu
    "https://www.basketball-reference.com/players/w/wrighdo01.html", #Dorell Wright
    "https://www.basketball-reference.com/players/c/carnero01.html", #Rodney Carney
    "https://www.basketball-reference.com/players/r/richaja01.html", #Jason Richardson
    "https://www.basketball-reference.com/players/h/harrial01.html", #Al Harrington
    "https://www.basketball-reference.com/players/l/leoname01.html", #Meyers Leonard
    "https://www.basketball-reference.com/players/r/robinna01.html", #Nate Robinson
    "https://www.basketball-reference.com/players/m/maggeco01.html", #Corey Maggette
    "https://www.basketball-reference.com/players/h/hendege01.html", #Gerald Henderson
]

#Initialize empty list to store player data
all_player_data = []

#Begin for loop to iterate through each player link
for url in player_urls:
        #Initialize response
        response = requests.get(url)
        #Start scraping
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')

            try:
                #Get player name
                name = soup.find('h1').find('span').text
                
                #Get Career Length
                career_length = int(soup.find('strong', string= lambda s: s and ('Experience:' in s or 'Career Length' in s)).next_sibling.text.strip().split()[0])

                #Get Position and use if-elif-else statement to assign position abbreviations
                position = soup.find('strong', string= lambda s: s and 'Position:' in s).next_sibling.text.strip().split(',')[0]
                if 'Point Guard' in position:
                    position = 'PG'
                elif 'Shooting Guard' in position:
                    position = 'SG'
                elif 'Small Forward' in position:
                    position = 'SF'
                elif 'Power Forward' in position:
                    position = 'PF'
                elif 'Center' in position:
                    position = 'C'
                else:
                    position = None
                
                #Get stats such as games played, ppg, rpg, apg, etc
                def stat_finder(word): #Define function 
                    element = soup.find('span', {'data-tip': word})
                    if element:
                        element = element.find_next('p').find_next('p').text.strip()
                        return element
                    else:
                        return 0.0
                games = int(stat_finder('Games'))
                ppg = float(stat_finder('Points'))
                rpg = float(stat_finder('Total Rebounds'))
                apg = float(stat_finder('Assists'))
                fg = float(stat_finder('Field Goal Percentage'))
                ft = float(stat_finder('Free Throw Percentage'))
                
                #Extract advanced stats
                per = float(soup.find('strong', string= lambda s: s and 'PER' in s).find_next('p').find_next('p').text.strip())
                win_shares = float(soup.find('strong', string= lambda s: s and 'WS' in s).find_next('p').find_next('p').text.strip())

                #Extract awards: mvps, all stars, all nba, etc
                all_stars_find = soup.find('li', {'class':'all_star'})
                if all_stars_find:
                    all_stars = int(all_stars_find.text.strip().split('x')[0])
                else: 
                    all_stars = 0
                all_nba_find = soup.find('li', string= lambda s: s and ('All-NBA' in s or 'All-ABA' in s or 'All-BAA' in s))
                if all_nba_find:
                    all_nba_text = all_nba_find.text
                    if 'x' in all_nba_text:
                        all_nba = int(all_nba_text.strip().split('x')[0])
                    else:
                        all_nba = 1
                else:
                    all_nba = 0
                all_rookie_find = soup.find('li', string = lambda s: s and 'All-Rookie' in s)
                if all_rookie_find:
                    all_rookie = 1
                else:
                    all_rookie = 0
                roy = soup.find('li', {'data-tip': lambda x: x and 'ROY' in x})
                if roy:
                    roy = 1
                else:
                    roy = 0
                all_defensive = soup.find('a', string= lambda s: s and 'All-Defensive' in s)
                if all_defensive:
                    all_defensive_text = all_defensive.text
                    if 'x' in all_defensive_text:
                        all_defensive = int(all_defensive_text.strip().split('x')[0])
                    else:
                        all_defensive = 1
                else:
                    all_defensive = 0
                mvp_find = soup.find('li', {'class':'poptip'}, string = lambda s: s and 'MVP' in s and 'AS MVP' not in s and 'IST MVP' not in s 
                                     and 'Finals MVP' not in s and 'ECF MVP' not in s and 'WCF MVP' not in s and 'AS' not in s
                                     and 'MBWA' not in s and 'USBWA MVP' not in s)
                if mvp_find:
                    mvp_text = mvp_find.text
                    if 'x' in mvp_text:
                        mvps = int(mvp_text.strip().split('x')[0])
                    else:
                        mvps = 1
                else:
                    mvps = 0
                chip_find = soup.find('a', string= lambda s: s and ('NBA Champ' in s or 'ABA Champ' in s or 'BAA Champ' in s)) #NBA Championship finder
                if chip_find:
                    chip_text = chip_find.text
                    if 'x' in chip_text:
                        chips = int(chip_text.strip().split('x')[0])
                    else:
                        chips = 1
                else:
                    chips = 0
                sc_find = soup.find('a', string= lambda s: s and 'Scoring Champ' in s) #Scoring champ finder
                if sc_find:
                    sc_text = sc_find.text
                    if 'x' in sc_text:
                        sc = int(sc_text.strip().split('x')[0])
                    else:
                        sc = 1
                else:
                    sc = 0
                dpoy_find = soup.find('a', string= lambda s: s and 'Def. POY' in s) #Defensive Player of the Year Finder
                if dpoy_find:
                    dpoy_text = dpoy_find.text
                    if 'x' in dpoy_text:
                        dpoy = int(dpoy_text.strip().split('x')[0])
                    else:
                        dpoy = 1
                else:
                    dpoy = 0
                #Set HOF to 0 since these players aren't in the HOF as of right now
                hof = 0

                #Store the data
                player_data = {
                        'Name': name,
                        'Position': position,
                        'Games': games,
                        'Career Length': career_length,
                        'PPG': ppg,
                        'RPG': rpg,
                        'APG': apg,
                        'PER': per,
                        'FG%': fg,
                        'FT%': ft,
                        'Win Shares': win_shares,
                        'All-Stars': all_stars,
                        'All-NBA': all_nba,
                        'All-Defense': all_defensive,
                        'All-Rookie Team': all_rookie,
                        'MVPs': mvps,
                        'Chips': chips,
                        'ROY': roy,
                        'DPOYs': dpoy,
                        'Scoring Champ': sc,
                        'HOF': hof
                    }
                #Append player data to list
                all_player_data.append(player_data)
            except Exception as e:
                print(f'Error Scraping data for {url}: {e}')

            #Avoid overwhelming server
            time.sleep(1)

#Create dataframe of player data
df_eight = pd.DataFrame(all_player_data)
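
The if-elif chain that assigns position abbreviations can also be written as a lookup table. This is only a sketch: `POSITION_ABBREVIATIONS` and `abbreviate_position` are hypothetical names, and the table assumes the same substring checks, in the same order, as the chain in the loop.

```python
# Pairs are checked in the same order as the notebook's if-elif chain.
POSITION_ABBREVIATIONS = (
    ("Point Guard", "PG"),
    ("Shooting Guard", "SG"),
    ("Small Forward", "SF"),
    ("Power Forward", "PF"),
    ("Center", "C"),
)

def abbreviate_position(text):
    """Return the abbreviation for the first matching position name, else None."""
    for full_name, abbreviation in POSITION_ABBREVIATIONS:
        if full_name in text:
            return abbreviation
    return None
```

Keeping the mapping in one place means a new position label only needs one added pair.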

In [6]:
df_eight.tail()

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
32,Al Harrington,PF,981,16,13.5,5.6,1.7,14.3,44.4,72.7,...,0,0,0,0,0,0,0,0,0,0
33,Meyers Leonard,PF,456,10,5.6,3.9,0.9,12.0,48.2,81.2,...,0,0,0,0,0,0,0,0,0,0
34,Nate Robinson,PG,618,11,11.0,2.3,3.0,15.6,42.3,79.6,...,0,0,0,0,0,0,0,0,0,0
35,Corey Maggette,SF,827,14,16.0,4.9,2.1,17.9,45.3,82.2,...,0,0,0,0,0,0,0,0,0,0
36,Gerald Henderson,PG,871,13,8.9,1.7,3.6,13.4,47.2,77.6,...,0,0,0,0,0,3,0,0,0,0


In [7]:
#Add to original role player dataframe
df = pd.concat([df, df_eight], ignore_index=True)

#Check result
df.tail()

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
267,Al Harrington,PF,981,16,13.5,5.6,1.7,14.3,44.4,72.7,...,0,0,0,0,0,0,0,0,0,0
268,Meyers Leonard,PF,456,10,5.6,3.9,0.9,12.0,48.2,81.2,...,0,0,0,0,0,0,0,0,0,0
269,Nate Robinson,PG,618,11,11.0,2.3,3.0,15.6,42.3,79.6,...,0,0,0,0,0,0,0,0,0,0
270,Corey Maggette,SF,827,14,16.0,4.9,2.1,17.9,45.3,82.2,...,0,0,0,0,0,0,0,0,0,0
271,Gerald Henderson,PG,871,13,8.9,1.7,3.6,13.4,47.2,77.6,...,0,0,0,0,0,3,0,0,0,0


In [8]:
df.shape

(272, 21)
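
This batch includes Jared Jeffries, whose name also appears in the earlier `role_players['Name'].unique()` output, so the concatenated frame may now hold him twice. The sketch here is a check that could run before concatenation, written in plain Python over the list of row dictionaries. `repeated_names` is a hypothetical helper, not part of the notebook.

```python
def repeated_names(existing_names, new_rows):
    """Return names in new_rows that already exist or repeat within new_rows."""
    seen = set(existing_names)
    repeats = []
    for row in new_rows:
        name = row["Name"]
        if name in seen and name not in repeats:
            repeats.append(name)
        seen.add(name)
    return repeats
```

It could be called as `repeated_names(df['Name'], all_player_data)`. Two different players can share a name, so a match is a prompt to inspect the rows, not a reason to drop them automatically.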

In [9]:
#Save to csv file
df.to_csv("NBA Role Players.csv", index=False)
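
Because each batch of links is pasted by hand, the same URL can end up in a list twice and be scraped twice. The small sketch here reports repeated links before any requests are made. `duplicate_urls` is a hypothetical helper.

```python
from collections import Counter

def duplicate_urls(urls):
    """Return each URL that appears more than once, in first-seen order."""
    counts = Counter(urls)
    return [url for url in counts if counts[url] > 1]
```

Running `duplicate_urls(player_urls)` right after defining the list would flag any repeats without touching the network.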

In [11]:
#Paste player links
player_urls = [
    "https://www.basketball-reference.com/players/o/obryapa01.html", #Patrick O'Bryant
    "https://www.basketball-reference.com/players/c/crittja01.html", #Javaris Crittenton
    "https://www.basketball-reference.com/players/m/mayse01.html", #Sean May
    "https://www.basketball-reference.com/players/m/mooremi01.html", #Mikki Moore
    "https://www.basketball-reference.com/players/t/tsakaja01.html", #Jake Tsakalidis
    "https://www.basketball-reference.com/players/g/grayaa01.html", #Aaron Gray
    "https://www.basketball-reference.com/players/a/armsthi01.html", #Hilton Armstrong
    "https://www.basketball-reference.com/players/t/turiaro01.html", #Ronny Turiaf
    "https://www.basketball-reference.com/players/n/nestera01.html", #Rasho Nesterović
    "https://www.basketball-reference.com/players/k/krstine01.html", #Nenad Krstić
    "https://www.basketball-reference.com/players/d/dampier01.html", #Erick Dampier
    "https://www.basketball-reference.com/players/m/moisoje01.html", #Jérôme Moïso
    "https://www.basketball-reference.com/players/m/mbengdj01.html", #DJ Mbenga
    "https://www.basketball-reference.com/players/a/arroyca01.html", #Carlos Arroyo
    "https://www.basketball-reference.com/players/p/pavloal01.html", #Sasha Pavlović
    "https://www.basketball-reference.com/players/i/iveyro01.html", #Royal Ivey
    "https://www.basketball-reference.com/players/w/westma02.html", #Mario West
    "https://www.basketball-reference.com/players/j/jackja01.html", #Jarrett Jack
    "https://www.basketball-reference.com/players/m/murraro01.html", #Ronald Murray
    "https://www.basketball-reference.com/players/b/brookaa01.html", #Aaron Brooks
    "https://www.basketball-reference.com/players/d/douglto01.html", #Toney Douglas
    "https://www.basketball-reference.com/players/h/howarju01.html", #Juwan Howard
    "https://www.basketball-reference.com/players/r/ratlith01.html", #Theo Ratliff
    "https://www.basketball-reference.com/players/j/johnsav01.html", #Avery Johnson
    "https://www.basketball-reference.com/players/b/bradlav01.html", #Avery Bradley
    "https://www.basketball-reference.com/players/j/johnsan02.html", #Anthony Johnson
    "https://www.basketball-reference.com/players/j/jonesda01.html", #Damon Jones
    "https://www.basketball-reference.com/players/g/gillke01.html", #Kendall Gill
    "https://www.basketball-reference.com/players/n/nunnke01.html", #Kendrick Nunn
    "https://www.basketball-reference.com/players/g/gatlich01.html", #Chris Gatling
    "https://www.basketball-reference.com/players/j/jamesmi01.html", #Mike James
    "https://www.basketball-reference.com/players/h/houseed01.html", #Eddie House
    "https://www.basketball-reference.com/players/d/davisri01.html", #Ricky Davis
    "https://www.basketball-reference.com/players/g/goodedr01.html", #Drew Gooden
    "https://www.basketball-reference.com/players/m/murraro01.html" #Ronald Murray (repeat of the entry above; the duplicate row is dropped later)
]

#Initialize empty list to store player data
all_player_data = []

#Begin for loop to iterate through each player link
for url in player_urls:
        #Initialize response
        response = requests.get(url)
        #Start scraping
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')

            try:
                #Get player name
                name = soup.find('h1').find('span').text
                
                #Get Career Length
                career_length = int(soup.find('strong', string= lambda s: s and ('Experience:' in s or 'Career Length' in s)).next_sibling.text.strip().split()[0])

                #Get Position and use if-elif-else statement to assign position abbreviations
                position = soup.find('strong', string= lambda s: s and 'Position:' in s).next_sibling.text.strip().split(',')[0]
                if 'Point Guard' in position:
                    position = 'PG'
                elif 'Shooting Guard' in position:
                    position = 'SG'
                elif 'Small Forward' in position:
                    position = 'SF'
                elif 'Power Forward' in position:
                    position = 'PF'
                elif 'Center' in position:
                    position = 'C'
                else:
                    position = None
                
                #Get stats such as games played, ppg, rpg, apg, etc
                def stat_finder(word): #Define function 
                    element = soup.find('span', {'data-tip': word})
                    if element:
                        element = element.find_next('p').find_next('p').text.strip()
                        return element
                    else:
                        return 0.0
                games = int(stat_finder('Games'))
                ppg = float(stat_finder('Points'))
                rpg = float(stat_finder('Total Rebounds'))
                apg = float(stat_finder('Assists'))
                fg = float(stat_finder('Field Goal Percentage'))
                ft = float(stat_finder('Free Throw Percentage'))
                
                #Extract advanced stats
                per = float(soup.find('strong', string= lambda s: s and 'PER' in s).find_next('p').find_next('p').text.strip())
                win_shares = float(soup.find('strong', string= lambda s: s and 'WS' in s).find_next('p').find_next('p').text.strip())

                #Extract awards: mvps, all stars, all nba, etc
                all_stars_find = soup.find('li', {'class':'all_star'})
                if all_stars_find:
                    all_stars = int(all_stars_find.text.strip().split('x')[0])
                else: 
                    all_stars = 0
                all_nba_find = soup.find('li', string= lambda s: s and ('All-NBA' in s or 'All-ABA' in s or 'All-BAA' in s))
                if all_nba_find:
                    all_nba_text = all_nba_find.text
                    if 'x' in all_nba_text:
                        all_nba = int(all_nba_text.strip().split('x')[0])
                    else:
                        all_nba = 1
                else:
                    all_nba = 0
                all_rookie_find = soup.find('li', string = lambda s: s and 'All-Rookie' in s)
                if all_rookie_find:
                    all_rookie = 1
                else:
                    all_rookie = 0
                roy = soup.find('li', {'data-tip': lambda x: x and 'ROY' in x})
                if roy:
                    roy = 1
                else:
                    roy = 0
                all_defensive = soup.find('a', string= lambda s: s and 'All-Defensive' in s)
                if all_defensive:
                    all_defensive_text = all_defensive.text
                    if 'x' in all_defensive_text:
                        all_defensive = int(all_defensive_text.strip().split('x')[0])
                    else:
                        all_defensive = 1
                else:
                    all_defensive = 0
                mvp_find = soup.find('li', {'class':'poptip'}, string = lambda s: s and 'MVP' in s and 'AS MVP' not in s and 'IST MVP' not in s 
                                     and 'Finals MVP' not in s and 'ECF MVP' not in s and 'WCF MVP' not in s and 'AS' not in s
                                     and 'MBWA' not in s and 'USBWA MVP' not in s)
                if mvp_find:
                    mvp_text = mvp_find.text
                    if 'x' in mvp_text:
                        mvps = int(mvp_text.strip().split('x')[0])
                    else:
                        mvps = 1
                else:
                    mvps = 0
                chip_find = soup.find('a', string= lambda s: s and ('NBA Champ' in s or 'ABA Champ' in s or 'BAA Champ' in s)) #NBA Championship finder
                if chip_find:
                    chip_text = chip_find.text
                    if 'x' in chip_text:
                        chips = int(chip_text.strip().split('x')[0])
                    else:
                        chips = 1
                else:
                    chips = 0
                sc_find = soup.find('a', string= lambda s: s and 'Scoring Champ' in s) #Scoring champ finder
                if sc_find:
                    sc_text = sc_find.text
                    if 'x' in sc_text:
                        sc = int(sc_text.strip().split('x')[0])
                    else:
                        sc = 1
                else:
                    sc = 0
                dpoy_find = soup.find('a', string= lambda s: s and 'Def. POY' in s) #Defensive Player of the Year Finder
                if dpoy_find:
                    dpoy_text = dpoy_find.text
                    if 'x' in dpoy_text:
                        dpoy = int(dpoy_text.strip().split('x')[0])
                    else:
                        dpoy = 1
                else:
                    dpoy = 0
                #Set HOF to 0 since these players aren't in the HOF as of right now
                hof = 0

                #Store the data
                player_data = {
                        'Name': name,
                        'Position': position,
                        'Games': games,
                        'Career Length': career_length,
                        'PPG': ppg,
                        'RPG': rpg,
                        'APG': apg,
                        'PER': per,
                        'FG%': fg,
                        'FT%': ft,
                        'Win Shares': win_shares,
                        'All-Stars': all_stars,
                        'All-NBA': all_nba,
                        'All-Defense': all_defensive,
                        'All-Rookie Team': all_rookie,
                        'MVPs': mvps,
                        'Chips': chips,
                        'ROY': roy,
                        'DPOYs': dpoy,
                        'Scoring Champ': sc,
                        'HOF': hof
                    }
                #Append player data to list
                all_player_data.append(player_data)
            except Exception as e:
                print(f'Error Scraping data for {url}: {e}')

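        else:
            #Report pages that could not be fetched instead of skipping them silently
            print(f'Request failed for {url}: status {response.status_code}')

        #Avoid overwhelming server (pause after every request, not only successful ones)
        time.sleep(1)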

#Create dataframe of player data
df_nine = pd.DataFrame(all_player_data)
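
The award blocks in the scraping cell above all repeat one pattern: read a leading `Nx` multiplier if one is present, otherwise count the award once. A minimal sketch of a shared helper follows. The name `parse_award_count` and the sample strings are hypothetical illustrations, not part of the original notebook, and the sketch assumes multipliers always appear as a leading prefix such as `3x `.

```python
import re

# Hypothetical helper (not in the original notebook): counts an award from its label text.
# Assumption: a multiplier, when present, is a leading "<digits>x" prefix.
_COUNT_PREFIX = re.compile(r"\s*(\d+)x\b")


def parse_award_count(text):
    """Return 0 for a missing award, N for an 'Nx ...' label, and 1 otherwise."""
    if not text:
        return 0
    match = _COUNT_PREFIX.match(text)
    return int(match.group(1)) if match else 1


# Illustrative label strings (assumed formats, not scraped output)
examples = ["3x NBA Champ", "2007-08 NBA Champ", "1996-97 Sixth Man", None]
counts = [parse_award_count(label) for label in examples]
```

Anchoring the match to the start of the label means a stray `x` inside a word (as in "Sixth") is not mistaken for a multiplier, which the plain `'x' in text` check would do.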

In [13]:
df_nine.tail()

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
30,Mike James,PG,595,12,9.9,2.2,3.5,14.0,41.7,80.2,...,0,0,0,0,0,1,0,0,0,0
31,Eddie House,PG,717,11,7.5,1.7,1.6,13.2,40.9,85.1,...,0,0,0,0,0,1,0,0,0,0
32,Ricky Davis,SG,736,12,13.5,3.5,3.3,14.7,44.6,78.1,...,0,0,0,0,0,0,0,0,0,0
33,Drew Gooden,PF,790,14,11.0,7.1,1.1,16.3,46.2,76.0,...,0,0,0,1,0,0,0,0,0,0
34,Ronald Murray,PG,487,8,9.9,2.1,2.3,12.7,41.4,72.5,...,0,0,0,0,0,0,0,0,0,0


In [14]:
#Add to role player dataframe
df = pd.concat([df, df_nine], ignore_index=True)

#View result
df.tail()

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
302,Mike James,PG,595,12,9.9,2.2,3.5,14.0,41.7,80.2,...,0,0,0,0,0,1,0,0,0,0
303,Eddie House,PG,717,11,7.5,1.7,1.6,13.2,40.9,85.1,...,0,0,0,0,0,1,0,0,0,0
304,Ricky Davis,SG,736,12,13.5,3.5,3.3,14.7,44.6,78.1,...,0,0,0,0,0,0,0,0,0,0
305,Drew Gooden,PF,790,14,11.0,7.1,1.1,16.3,46.2,76.0,...,0,0,0,1,0,0,0,0,0,0
306,Ronald Murray,PG,487,8,9.9,2.1,2.3,12.7,41.4,72.5,...,0,0,0,0,0,0,0,0,0,0


In [15]:
#Check shape
df.shape

(307, 21)

In [16]:
#Check for duplicates
duplicates = df.duplicated()
duplicates.value_counts()

False    303
True       4
Name: count, dtype: int64

In [17]:
#View the rows flagged as duplicates
df[duplicates]

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
255,Jared Jeffries,SF,629,11,4.8,4.1,1.3,9.6,42.6,58.3,...,0,0,0,0,0,0,0,0,0,0
282,Erick Dampier,C,987,16,7.4,7.1,0.8,14.3,49.8,62.6,...,0,0,0,0,0,0,0,0,0,0
295,Avery Johnson,PG,1054,16,8.4,1.7,5.5,14.5,47.9,70.6,...,0,0,0,0,0,1,0,0,0,0
306,Ronald Murray,PG,487,8,9.9,2.1,2.3,12.7,41.4,72.5,...,0,0,0,0,0,0,0,0,0,0


In [19]:
#Check Jared Jeffries
df[df['Name'].str.contains('Jared Jeff')]

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
46,Jared Jeffries,SF,629,11,4.8,4.1,1.3,9.6,42.6,58.3,...,0,0,0,0,0,0,0,0,0,0
255,Jared Jeffries,SF,629,11,4.8,4.1,1.3,9.6,42.6,58.3,...,0,0,0,0,0,0,0,0,0,0


In [20]:
#Check Erick Dampier
df[df['Name'].str.contains('Erick Dampier')]

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
179,Erick Dampier,C,987,16,7.4,7.1,0.8,14.3,49.8,62.6,...,0,0,0,0,0,0,0,0,0,0
282,Erick Dampier,C,987,16,7.4,7.1,0.8,14.3,49.8,62.6,...,0,0,0,0,0,0,0,0,0,0


In [21]:
#Check Avery Johnson
df[df['Name'].str.contains('Avery Johnson')]

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
155,Avery Johnson,PG,1054,16,8.4,1.7,5.5,14.5,47.9,70.6,...,0,0,0,0,0,1,0,0,0,0
295,Avery Johnson,PG,1054,16,8.4,1.7,5.5,14.5,47.9,70.6,...,0,0,0,0,0,1,0,0,0,0


In [22]:
#Check Ronald Murray
df[df['Name'].str.contains('Ronald Murray')]

Unnamed: 0,Name,Position,Games,Career Length,PPG,RPG,APG,PER,FG%,FT%,...,All-Stars,All-NBA,All-Defense,All-Rookie Team,MVPs,Chips,ROY,DPOYs,Scoring Champ,HOF
290,Ronald Murray,PG,487,8,9.9,2.1,2.3,12.7,41.4,72.5,...,0,0,0,0,0,0,0,0,0,0
306,Ronald Murray,PG,487,8,9.9,2.1,2.3,12.7,41.4,72.5,...,0,0,0,0,0,0,0,0,0,0
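
The four name-by-name checks above could also be done in one step. The sketch below uses `DataFrame.duplicated(keep=False)`, which flags every copy of a repeated row rather than only the later ones. The toy frame `toy` is a stand-in built from a few values shown in the outputs above; it is not the notebook's `df`.

```python
import pandas as pd

# Toy stand-in for the role-player frame (two columns only, for illustration)
toy = pd.DataFrame({
    "Name": ["Jared Jeffries", "Erick Dampier", "Jared Jeffries", "Mike James", "Erick Dampier"],
    "Games": [629, 987, 629, 595, 987],
})

# keep=False marks all copies of a duplicated row, so originals and repeats show side by side
all_copies = toy[toy.duplicated(keep=False)].sort_values("Name", kind="mergesort")

# Dropping duplicates keeps the first copy; reset_index gives a clean 0..n-1 index afterwards
deduped = toy.drop_duplicates().reset_index(drop=True)
```

Resetting the index is optional; without it the surviving rows keep their original labels, which leaves gaps where duplicates were removed.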


In [23]:
#Drop duplicates and check result
df = df.drop_duplicates()

duplicates = df.duplicated()
duplicates.value_counts()

False    303
Name: count, dtype: int64

After dropping duplicates, 303 unique role players remain.
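
One of the four duplicates (Ronald Murray) came from the same URL being listed twice in a single batch. A minimal sketch of removing repeated URLs before scraping is shown below. The function name `dedupe_preserving_order` is hypothetical, and the approach only catches repeats within one list; players already scraped in an earlier batch would still need `drop_duplicates` afterwards.

```python
def dedupe_preserving_order(urls):
    """Remove repeated URLs while keeping the first occurrence of each, in order."""
    # dict keys are unique and keep insertion order, so this drops later repeats
    return list(dict.fromkeys(urls))


sample_urls = [
    "https://www.basketball-reference.com/players/m/murraro01.html",
    "https://www.basketball-reference.com/players/b/brookaa01.html",
    "https://www.basketball-reference.com/players/m/murraro01.html",
]
unique_urls = dedupe_preserving_order(sample_urls)
```

This also saves one request (and one second of sleep) per repeated link.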

In [24]:
#Save to a csv file
df.to_csv("NBA Role Players.csv", index=False)

In [13]:
url = "https://www.basketball-reference.com/players/r/rondora01.html"
#Initialize response
response = requests.get(url)
#Start scraping
if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')
    try:
        #Scrape
        career_length = soup.find('strong', string= lambda s: s and ('Experience:' in s or 'Career Length' in s)).next_sibling.text

        print(career_length)
        #Match either label exactly; ('Career Length:' or 'Experience') evaluates to 'Career Length:' only
        career_span = soup.find('strong', string=['Career Length:', 'Experience:']).next_sibling.text
        print(career_span)
    except Exception as e:
        print(f'Error Scraping data for {url}: {e}')


 16 years
  

 16 years
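
Both lookups print the same text for this retired player, which can hide a subtlety in how Python evaluates `or` between two strings. The sketch below shows that behavior and one way to turn text like ` 16 years` into an integer. The helper name `parse_years` is hypothetical, and unlike the scraping loop it returns `None` instead of raising when no number is found.

```python
import re

# `or` between two non-empty strings returns the first one, so only that label is searched for
first_truthy = ("Career Length:" or "Experience")


def parse_years(text):
    """Return the first integer found in the text, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


# Sample text shaped like the output above (leading space, trailing newline and spaces)
years = parse_years(" 16 years\n  ")
```

To match either label, a function filter such as the `lambda` used in the first lookup checks both substrings explicitly.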
  

