# Scraping Player Efficiency Rating (PER) from ESPN

In [130]:
import pandas as pd

# header=1 uses the table's second row as the column names
df = pd.read_html('http://insider.espn.com/nba/hollinger/statistics/_/qualified/false', header=1)[0]

In [131]:
df.head()
# This gives us a df of the first page of players, including repeated header rows we don't want
# (every row where df.PLAYER == 'PLAYER')
# We need a list of links for each page of PER ratings (there are 12 total pages)
# Once we have the list of links, we can loop through it and combine the tables

Unnamed: 0,RK,PLAYER,GP,MPG,TS%,AST,TO,USG,ORR,DRR,REBR,PER,VA,EWA
0,1,"Ahmad Caver, IND",1,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,78.83,1.0,0.0
1,2,"Sekou Doumbouya, LAL",2,8.0,0.717,0.0,17.0,0.0,0.0,0.0,0.0,41.23,0.0,0.0
2,3,"Joe Johnson, BOS",1,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,39.82,0.9,0.0
3,4,"Nikola Jokic, DEN",48,32.9,0.662,25.4,11.7,0.0,0.0,0.0,0.0,33.21,533.1,17.8
4,5,"Giannis Antetokounmpo, MIL",47,32.5,0.623,18.5,10.2,0.0,0.0,0.0,0.0,32.16,0.0,0.0


### Next steps: scraping all 12 pages of PER ratings
* The above gives us a df of the first page of players, including repeated header rows we don't want (every row where df.PLAYER == 'PLAYER')
* We need a list of links for each page of PER ratings (there are 12 total pages)
* Once we have the list of links, we can loop through that list and: 
    * Append the table for each page
    * Remove the superfluous rows
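
The steps above can be sketched as a single function. This is a minimal sketch, not the notebook's actual code: `scrape_per`, its `reader` parameter, and `fake_reader` are hypothetical names, and the rows are made up. The reader is injectable so the logic can be tried on in-memory tables without hitting ESPN.

```python
import pandas as pd

def scrape_per(links, reader=pd.read_html):
    """Read the first table from each link, stack them, and drop repeated header rows."""
    # `reader` defaults to pd.read_html; header=1 matches the call used for page 1
    frames = [reader(link, header=1)[0] for link in links]
    combined = pd.concat(frames, ignore_index=True)
    # Repeated header rows carry the literal string 'PLAYER' in the PLAYER column
    return combined[combined['PLAYER'] != 'PLAYER'].reset_index(drop=True)

# Hypothetical stand-in for pd.read_html so the example runs offline (made-up rows)
def fake_reader(link, header=1):
    return [pd.DataFrame({'RK': ['1', 'RK', '2'],
                          'PLAYER': ['Player A, AAA', 'PLAYER', 'Player B, BBB']})]

demo = scrape_per(['page-1', 'page-2'], reader=fake_reader)
print(demo.shape)
```

Filtering with a boolean mask after `ignore_index=True` avoids relying on index labels, which repeat when several pages are concatenated as-is.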

In [132]:
# page 1 link: http://insider.espn.com/nba/hollinger/statistics/_/qualified/false
# page 2 link: http://insider.espn.com/nba/hollinger/statistics/_/page/2/qualified/false
# page formula: 'http://insider.espn.com/nba/hollinger/statistics/_/page/' + str(x) + '/qualified/false'

per_links = []
for x in range(1,13): 
    my_url = 'http://insider.espn.com/nba/hollinger/statistics/_/page/' + str(x) + '/qualified/false'
    per_links.append(my_url)

len(per_links)

12
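
The same list can be built in one step with a URL template and a list comprehension. This is only a sketch of an alternative: `BASE_URL` and `build_per_links` are hypothetical names, and the page count of 12 is taken from the notebook above.

```python
# URL pattern taken from the page formula above; {page} is filled in per page
BASE_URL = 'http://insider.espn.com/nba/hollinger/statistics/_/page/{page}/qualified/false'

def build_per_links(n_pages=12):
    """Return the PER page URLs for pages 1..n_pages."""
    return [BASE_URL.format(page=page) for page in range(1, n_pages + 1)]

per_links = build_per_links()
print(len(per_links))
```

Keeping the page count as a parameter makes it easy to adjust if the number of pages changes.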

### Now we have our list of links. Let's loop through that list and create a df from each link's content

In [133]:
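frames = []  # one DataFrame per page
for link in per_links: 
    my_df = pd.read_html(link, header=1)[0]
    frames.append(my_df)


In [134]:
# ignore_index=True gives the combined df a fresh 0..n-1 index instead of repeating each page's index
df_per = pd.concat(frames, ignore_index=True)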
df_per.shape

(639, 14)

### Great! Now we have our df of PER values for every single player in the NBA for the 2021-22 season! The 639 rows still include repeated header rows; 591 players remain once those are removed.

Now, let's remove those redundant rows where the column names are repeated

In [135]:
df_per.columns = df_per.columns.str.lower().str.replace(' ', '_')
# Keep only real player rows; the repeated header rows have 'PLAYER' in the player column
df_per = df_per[df_per.player != 'PLAYER'].copy()
df_per.shape

(591, 14)

^ That seems much nicer, doesn't it?
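
Because the repeated header rows were text, the stat columns may have been read as strings (object dtype) rather than numbers. The sketch below shows one way to convert them with `pd.to_numeric`. The `demo` frame and its rows are made up for illustration, and whether the real `df_per` needs this step is an assumption worth checking with `df_per.dtypes`.

```python
import pandas as pd

# Made-up rows standing in for df_per after the header rows were dropped
demo = pd.DataFrame({
    'player': ['Player A, AAA', 'Player B, BBB'],
    'gp': ['48', '47'],
    'per': ['33.21', '32.16'],
})

# Convert every column except 'player'; errors='coerce' turns unparseable values into NaN
numeric_cols = demo.columns.drop('player')
demo[numeric_cols] = demo[numeric_cols].apply(pd.to_numeric, errors='coerce')
print(demo.dtypes)
```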

### Final data cleaning
* Here, we are going to split the player column after the comma and add another column for 'team'
* After that, we'll take a look at some key players to get some initial insights on PER across all teams --> Check out the analysis folder for this!

In [136]:
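# Split on ', ' (comma plus space) so the team code has no leading space; n=1 splits only once
df_per[['player', 'team']] = df_per['player'].str.split(', ', n=1, expand=True)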

In [137]:
df_per.head()

#nice!

Unnamed: 0,rk,player,gp,mpg,ts%,ast,to,usg,orr,drr,rebr,per,va,ewa,team
0,1,Ahmad Caver,1,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,78.83,1.0,0.0,IND
1,2,Sekou Doumbouya,2,8.0,0.717,0.0,17.0,0.0,0.0,0.0,0.0,41.23,0.0,0.0,LAL
2,3,Joe Johnson,1,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,39.82,0.9,0.0,BOS
3,4,Nikola Jokic,48,32.9,0.662,25.4,11.7,0.0,0.0,0.0,0.0,33.21,533.1,17.8,DEN
4,5,Giannis Antetokounmpo,47,32.5,0.623,18.5,10.2,0.0,0.0,0.0,0.0,32.16,0.0,0.0,MIL
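
A slightly more defensive variant of the split strips whitespace from both parts, so the result does not depend on the exact spacing around the comma. This is a sketch on made-up rows; `demo` and `parts` are hypothetical names.

```python
import pandas as pd

# Made-up rows in the same "Name, TEAM" shape as the player column
demo = pd.DataFrame({'player': ['Player A, AAA', 'Player B,BBB']})

# n=1 splits on the first comma only; expand=True returns the parts as two columns
parts = demo['player'].str.split(',', n=1, expand=True)
demo['player'] = parts[0].str.strip()
demo['team'] = parts[1].str.strip()
print(demo)
```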


In [122]:
#df_per.to_csv('per_21_22.csv', index=False)

**Now we have a nice, clean CSV with the player efficiency ratings for all NBA players during the 2021-22 season**

# Making a dataframe + CSV of team efficiency

In [45]:
df_offeff = pd.read_html('http://www.espn.com/nba/hollinger/teamstats', header=1)[0]
#yay

In [46]:
df_offeff

Unnamed: 0,RK,TEAM,PACE,AST,TO,ORR,DRR,REBR,EFF FG%,TS%,OFF EFF,DEF EFF
0,1.0,Utah,100.3,16.8,12.9,24.1,0.0,52.1,55.7,59.1,113.4,106.7
1,2.0,Atlanta,99.8,18.3,11.3,22.6,0.0,50.4,53.9,57.6,112.2,111.4
2,3.0,Phoenix,100.5,19.4,11.7,23.4,0.0,51.3,54.3,57.6,111.9,103.3
3,4.0,Chicago,100.5,18.3,11.9,20.6,0.0,49.5,54.5,58.2,111.5,109.2
4,5.0,Milwaukee,100.7,17.3,12.2,23.3,0.0,51.4,54.0,57.3,110.8,106.8
5,6.0,Miami,97.1,19.2,13.7,25.4,0.0,52.2,54.0,57.7,110.5,105.2
6,7.0,Memphis,101.4,17.8,11.4,29.9,0.0,53.7,51.8,54.8,110.2,105.7
7,8.0,Denver,99.0,20.0,12.8,21.3,0.0,50.4,54.8,57.8,110.0,108.7
8,9.0,Golden State,99.9,19.8,13.9,23.3,0.0,52.4,55.1,58.1,109.6,101.2
9,10.0,Brooklyn,100.6,18.4,12.5,23.6,0.0,50.4,53.1,56.8,109.4,108.5


In [47]:
#df_offeff.to_csv('team_efficiency_21_22.csv', index=False)