<a id="top"></a>

# This is Part 2 of the NBA web scraping example that I started [earlier](http://nbviewer.ipython.org/github/pybokeh/ipython_notebooks/blob/master/web_scraping/NBA_Player_Stats.ipynb).

In this part, I will scrape the regular season player stats for <strong>ALL</strong> players.  This data set is much larger than the tables in Part 1 of this series.  I also added a 3-second delay between requests to prevent timeout errors.  [Here's](http://espn.go.com/nba/player/stats/_/id/1966/lebron-james) an example of the regular season stats data for LeBron James.<br><br>
I rewrote this script to use the [requests](http://docs.python-requests.org) library.

#### Quick Links

- [source code for scraping regular season avgs](#season_avgs)
- [source code for scraping regular season totals](#season_totals)
- [source code for scraping regular season misc totals](#season_misc)
- [query the NBA sqlite database using db.py](#db_py)
- [sqlite table definitions](#sqlite_tables)

### Let's get the urls for all the NBA teams

In [30]:
import requests # pip install requests
from bs4 import BeautifulSoup
import re

base_url = 'http://espn.go.com'

teams_url = 'http://espn.go.com/nba/teams'
html_teams = requests.get(teams_url)

soup_teams = BeautifulSoup(html_teams.text, 'lxml')

In [31]:
urls = soup_teams.find_all(href=re.compile('/nba/teams/stats'))
urls

[<a href="/nba/teams/stats?team=bos">Stats</a>,
 <a href="/nba/teams/stats?team=bkn">Stats</a>,
 <a href="/nba/teams/stats?team=nyk">Stats</a>,
 <a href="/nba/teams/stats?team=phi">Stats</a>,
 <a href="/nba/teams/stats?team=tor">Stats</a>,
 <a href="/nba/teams/stats?team=gsw">Stats</a>,
 <a href="/nba/teams/stats?team=lac">Stats</a>,
 <a href="/nba/teams/stats?team=lal">Stats</a>,
 <a href="/nba/teams/stats?team=pho">Stats</a>,
 <a href="/nba/teams/stats?team=sac">Stats</a>,
 <a href="/nba/teams/stats?team=chi">Stats</a>,
 <a href="/nba/teams/stats?team=cle">Stats</a>,
 <a href="/nba/teams/stats?team=det">Stats</a>,
 <a href="/nba/teams/stats?team=ind">Stats</a>,
 <a href="/nba/teams/stats?team=mil">Stats</a>,
 <a href="/nba/teams/stats?team=dal">Stats</a>,
 <a href="/nba/teams/stats?team=hou">Stats</a>,
 <a href="/nba/teams/stats?team=mem">Stats</a>,
 <a href="/nba/teams/stats?team=nor">Stats</a>,
 <a href="/nba/teams/stats?team=sas">Stats</a>,
 <a href="/nba/teams/stats?team=atl">Sta

### But I want the full URLs, without the markup junk

In [32]:
team_urls = [base_url+url['href'] for url in urls]
team_urls

['http://espn.go.com/nba/teams/stats?team=bos',
 'http://espn.go.com/nba/teams/stats?team=bkn',
 'http://espn.go.com/nba/teams/stats?team=nyk',
 'http://espn.go.com/nba/teams/stats?team=phi',
 'http://espn.go.com/nba/teams/stats?team=tor',
 'http://espn.go.com/nba/teams/stats?team=gsw',
 'http://espn.go.com/nba/teams/stats?team=lac',
 'http://espn.go.com/nba/teams/stats?team=lal',
 'http://espn.go.com/nba/teams/stats?team=pho',
 'http://espn.go.com/nba/teams/stats?team=sac',
 'http://espn.go.com/nba/teams/stats?team=chi',
 'http://espn.go.com/nba/teams/stats?team=cle',
 'http://espn.go.com/nba/teams/stats?team=det',
 'http://espn.go.com/nba/teams/stats?team=ind',
 'http://espn.go.com/nba/teams/stats?team=mil',
 'http://espn.go.com/nba/teams/stats?team=dal',
 'http://espn.go.com/nba/teams/stats?team=hou',
 'http://espn.go.com/nba/teams/stats?team=mem',
 'http://espn.go.com/nba/teams/stats?team=nor',
 'http://espn.go.com/nba/teams/stats?team=sas',
 'http://espn.go.com/nba/teams/stats?tea
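
String concatenation works here because every `href` is a root-relative path. As a hedged alternative, the standard library's `urllib.parse.urljoin` resolves a link against a base URL and would also cope with links that are already absolute. The `hrefs` list below is sample data copied from the output above, not a live scrape.

```python
from urllib.parse import urljoin

base_url = 'http://espn.go.com'

# Sample hrefs copied from the output above (not fetched live)
hrefs = ['/nba/teams/stats?team=bos', '/nba/teams/stats?team=bkn']

# urljoin resolves each href against the base URL
team_urls = [urljoin(base_url, href) for href in hrefs]
print(team_urls)
```

For these root-relative links the result should match what `base_url + href` produces.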

### Now, let's get the urls for all players for a particular team

In [33]:
import requests # pip install requests
from bs4 import BeautifulSoup
import re

team_url = 'http://espn.go.com/nba/team/stats/_/name/cle/cleveland-cavaliers'
html_team = requests.get(team_url)

soup_team = BeautifulSoup(html_team.text, 'lxml')

In [34]:
html_rows = soup_team.find_all('tr', class_=re.compile('player'))

### Now let's look at the URLs

In [35]:
for row in html_rows:
    print(row.a['href'])

http://espn.go.com/nba/player/_/id/1966/lebron-james
http://espn.go.com/nba/player/_/id/6442/kyrie-irving
http://espn.go.com/nba/player/_/id/3449/kevin-love
http://espn.go.com/nba/player/_/id/6628/dion-waiters
http://espn.go.com/nba/player/_/id/2419/anderson-varejao
http://espn.go.com/nba/player/_/id/6474/tristan-thompson
http://espn.go.com/nba/player/_/id/510/shawn-marion
http://espn.go.com/nba/player/_/id/2489716/matthew-dellavedova
http://espn.go.com/nba/player/_/id/2009/james-jones
http://espn.go.com/nba/player/_/id/558/mike-miller
http://espn.go.com/nba/player/_/id/2528794/joe-harris
http://espn.go.com/nba/player/_/id/1000/brendan-haywood
http://espn.go.com/nba/player/_/id/2489897/will-cherry
http://espn.go.com/nba/player/_/id/4010/a.j.-price
http://espn.go.com/nba/player/_/id/3041/lou-amundson
http://espn.go.com/nba/player/_/id/2528355/alex-kirk
http://espn.go.com/nba/player/_/id/1966/lebron-james
http://espn.go.com/nba/player/_/id/6442/kyrie-irving
http://espn.go.com/nba/player/

### The problem with the URLs above is that they point to the shortened version of each player's stats page.  To get the complete stats page, we need to modify the URLs slightly by replacing the underscore character with 'stats/_'

In [36]:
for row in html_rows:
    print(row.a['href'].replace('_','stats/_'))

http://espn.go.com/nba/player/stats/_/id/1966/lebron-james
http://espn.go.com/nba/player/stats/_/id/6442/kyrie-irving
http://espn.go.com/nba/player/stats/_/id/3449/kevin-love
http://espn.go.com/nba/player/stats/_/id/6628/dion-waiters
http://espn.go.com/nba/player/stats/_/id/2419/anderson-varejao
http://espn.go.com/nba/player/stats/_/id/6474/tristan-thompson
http://espn.go.com/nba/player/stats/_/id/510/shawn-marion
http://espn.go.com/nba/player/stats/_/id/2489716/matthew-dellavedova
http://espn.go.com/nba/player/stats/_/id/2009/james-jones
http://espn.go.com/nba/player/stats/_/id/558/mike-miller
http://espn.go.com/nba/player/stats/_/id/2528794/joe-harris
http://espn.go.com/nba/player/stats/_/id/1000/brendan-haywood
http://espn.go.com/nba/player/stats/_/id/2489897/will-cherry
http://espn.go.com/nba/player/stats/_/id/4010/a.j.-price
http://espn.go.com/nba/player/stats/_/id/3041/lou-amundson
http://espn.go.com/nba/player/stats/_/id/2528355/alex-kirk
http://espn.go.com/nba/player/stats/_/id
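
One caveat: `replace('_','stats/_')` rewrites every underscore in the URL, so a player slug that happened to contain one would be altered too. A minimal sketch of a narrower rewrite follows; `to_stats_url` is a hypothetical helper name, and the second URL is a made-up example used only to show the difference.

```python
def to_stats_url(player_url):
    """Insert 'stats' before the '/_/' path segment, first occurrence only."""
    return player_url.replace('/_/', '/stats/_/', 1)

# URL taken from the output above
real_url = 'http://espn.go.com/nba/player/_/id/1966/lebron-james'
print(to_stats_url(real_url))

# Hypothetical URL with an underscore in the slug (not a real player page)
made_up_url = 'http://espn.go.com/nba/player/_/id/1/some_player'
print(to_stats_url(made_up_url))
```

For the URLs shown in this notebook, both approaches give the same result.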

### Now let's make a list containing the URLs of each player

In [37]:
player_urls = [row.a['href'].replace('_','stats/_') for row in html_rows]

In [38]:
player_urls

['http://espn.go.com/nba/player/stats/_/id/1966/lebron-james',
 'http://espn.go.com/nba/player/stats/_/id/6442/kyrie-irving',
 'http://espn.go.com/nba/player/stats/_/id/3449/kevin-love',
 'http://espn.go.com/nba/player/stats/_/id/6628/dion-waiters',
 'http://espn.go.com/nba/player/stats/_/id/2419/anderson-varejao',
 'http://espn.go.com/nba/player/stats/_/id/6474/tristan-thompson',
 'http://espn.go.com/nba/player/stats/_/id/510/shawn-marion',
 'http://espn.go.com/nba/player/stats/_/id/2489716/matthew-dellavedova',
 'http://espn.go.com/nba/player/stats/_/id/2009/james-jones',
 'http://espn.go.com/nba/player/stats/_/id/558/mike-miller',
 'http://espn.go.com/nba/player/stats/_/id/2528794/joe-harris',
 'http://espn.go.com/nba/player/stats/_/id/1000/brendan-haywood',
 'http://espn.go.com/nba/player/stats/_/id/2489897/will-cherry',
 'http://espn.go.com/nba/player/stats/_/id/4010/a.j.-price',
 'http://espn.go.com/nba/player/stats/_/id/3041/lou-amundson',
 'http://espn.go.com/nba/player/stats/_
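
The printed URLs earlier start repeating (LeBron James shows up a second time), which suggests the team page lists each player in more than one table. If that is the case, the list can be de-duplicated while keeping its order. This is a sketch on sample data, not part of the original script.

```python
# Sample data modeled on the output above; the repeated entries mimic a
# second table on the team page that lists the same players again.
player_urls = [
    'http://espn.go.com/nba/player/stats/_/id/1966/lebron-james',
    'http://espn.go.com/nba/player/stats/_/id/6442/kyrie-irving',
    'http://espn.go.com/nba/player/stats/_/id/1966/lebron-james',
    'http://espn.go.com/nba/player/stats/_/id/6442/kyrie-irving',
]

# dict.fromkeys keeps the first occurrence of each key, in insertion order
unique_player_urls = list(dict.fromkeys(player_urls))
print(len(player_urls), len(unique_player_urls))
```

Requesting each player page once would also cut down on the number of 3-second delays.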

### I can also grab the player's ID, which is embedded in the URL

In [39]:
player_id = player_urls[0].split('/')[8]  # the ID is at index 8 (the 9th token) when splitting the URL on '/'
player_id

'1966'
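
Picking the ID by position depends on the URL always having the same number of path segments. A hedged alternative is to look for the digits that follow `/id/`; `extract_player_id` is a hypothetical helper name, not something from the original script.

```python
import re

def extract_player_id(player_url):
    """Return the digits that follow '/id/' in the URL, or None if absent."""
    match = re.search(r'/id/(\d+)', player_url)
    return match.group(1) if match else None

full_url = 'http://espn.go.com/nba/player/stats/_/id/1966/lebron-james'
short_url = 'http://espn.go.com/nba/player/_/id/1966/lebron-james'
print(extract_player_id(full_url), extract_player_id(short_url))
```

The same function works on both the shortened and the full stats URL, since it does not count segments.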

### So let's snag all the stats for LeBron James

In [4]:
player_url = 'http://espn.go.com/nba/player/stats/_/id/1966/lebron-james'
html_player = requests.get(player_url)

soup_player = BeautifulSoup(html_player.text, 'lxml')

### We need to get the player's name so that we can include it in the tables we will be creating

In [15]:
soup_name = soup_player.find('meta', property='og:title')
player_name = soup_name['content']
player_name

'LeBron James'

In [16]:
regular_season_stats = soup_player.find_all('tr', class_=re.compile('row'))

In [17]:
len(regular_season_stats)

36

### Here's what the raw soup data looks like

In [18]:
regular_season_stats

[<tr class="oddrow"><td>'03-'04</td><td><ul class="game-schedule"><li class="team-logo-small logo-nba-small nba-small-5"><a href="http://espn.go.com/nba/team/_/name/cle/cleveland-cavaliers"></a></li><li class="team-name"><a href="http://espn.go.com/nba/team/_/name/cle/cleveland-cavaliers">CLE</a></li></ul></td><td>79</td><td>79</td><td style="text-align:right;">39.5</td><td style="text-align:right;">7.9-18.9</td><td style="text-align:right;">.417</td><td style="text-align:right;">0.8-2.7</td><td style="text-align:right;">.290</td><td style="text-align:right;">4.4-5.8</td><td style="text-align:right;">.754</td><td style="text-align:right;">1.3</td><td style="text-align:right;">4.2</td><td style="text-align:right;">5.5</td><td style="text-align:right;">5.9</td><td style="text-align:right;">0.7</td><td style="text-align:right;">1.6</td><td style="text-align:right;">1.9</td><td style="text-align:right;">3.5</td><td style="text-align:right;">20.9</td></tr>,
 <tr class="evenrow"><td>'04-'05<

### But let's extract just the text and print it out to make sure it is what we want

In [19]:
for stat in regular_season_stats:
    print(stat.get_text())

'03-'04CLE797939.57.9-18.9.4170.8-2.7.2904.4-5.8.7541.34.25.55.90.71.61.93.520.9
'04-'05CLE808042.49.9-21.1.4721.4-3.9.3516.0-8.0.7501.46.07.47.20.72.21.83.327.2
'05-'06CLE797942.511.1-23.1.4801.6-4.8.3357.6-10.3.7381.06.17.06.60.81.62.33.331.4
'06-'07CLE787840.99.9-20.8.4761.3-4.0.3196.3-9.0.6981.15.76.76.00.71.62.23.227.3
'07-'08CLE757440.410.6-21.9.4841.5-4.8.3157.3-10.3.7121.86.17.97.21.11.82.23.430.0
'08-'09CLE818137.79.7-19.9.4891.6-4.7.3447.3-9.4.7801.36.37.67.21.11.71.73.028.4
'09-'10CLE767639.010.1-20.1.5031.7-5.1.3337.8-10.2.7670.96.47.38.61.01.61.63.429.7
'10-'11MIA797938.89.6-18.8.5101.2-3.5.3306.4-8.4.7591.06.57.57.00.61.62.13.626.7
'11-'12MIA626237.510.0-18.9.5310.9-2.4.3626.2-8.1.7711.56.47.96.20.81.91.53.427.1
'12-'13MIA767637.910.1-17.8.5651.4-3.3.4065.3-7.0.7531.36.88.07.30.91.71.43.026.8
'13-'14MIA777737.710.0-17.6.5671.5-4.0.3795.7-7.6.7501.15.96.96.40.31.61.63.527.1
'14-'15CLE292937.58.8-18.1.4881.7-4.5.3695.9-7.9.7430.74.65.37.60.81.31.73.825.2
'03-'04CLE622-1492.

#### The regular season stats above actually contain 3 sections of data: regular season averages, regular season totals, and regular season misc totals.  We can isolate them with simple Python slicing.  You can check out LeBron's [page](http://espn.go.com/nba/player/stats/_/id/1966/lebron-james) to see what I'm talking about.

### Since LeBron has participated in 12 seasons and there are 3 sets of stats, we will break up this data into 3 partitions.

In [20]:
size = int(len(regular_season_stats)/3)  # LeBron has played in 12 seasons, so each of the 3 sections has 12 rows
size

12

### I'm going to use Python's built-in slice() function to create named slices or named partitions, just for the sake of readability.

In [21]:
season_avgs_slice = slice(0,size)
season_totals_slice = slice(size,size*2)
season_misc_totals_slice = slice(size*2,size*3)

In [22]:
regular_season_avgs = regular_season_stats[season_avgs_slice]
regular_season_totals = regular_season_stats[season_totals_slice]
regular_season_misc_totals = regular_season_stats[season_misc_totals_slice]
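
To see the named slices in isolation, here is a small sketch on a toy list standing in for `regular_season_stats`. The divisibility guard is an addition of mine, on the assumption that the three sections always have the same number of rows.

```python
# Toy stand-in for regular_season_stats: 2 seasons x 3 sections = 6 rows
rows = ['avg-1', 'avg-2', 'tot-1', 'tot-2', 'misc-1', 'misc-2']

size, remainder = divmod(len(rows), 3)
# Guard (not in the original): the three sections must be equal in length
assert remainder == 0, 'row count is not divisible by 3'

avgs_slice = slice(0, size)
totals_slice = slice(size, size * 2)
misc_slice = slice(size * 2, size * 3)

print(rows[avgs_slice])
print(rows[totals_slice])
print(rows[misc_slice])
```

A slice object is just a reusable `start:stop` pair, so `rows[avgs_slice]` behaves the same as `rows[0:size]`.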

### Now that we have the partitions set up, let's inspect each one

In [23]:
for stat in regular_season_avgs:
    print(stat.get_text())

'03-'04CLE797939.57.9-18.9.4170.8-2.7.2904.4-5.8.7541.34.25.55.90.71.61.93.520.9
'04-'05CLE808042.49.9-21.1.4721.4-3.9.3516.0-8.0.7501.46.07.47.20.72.21.83.327.2
'05-'06CLE797942.511.1-23.1.4801.6-4.8.3357.6-10.3.7381.06.17.06.60.81.62.33.331.4
'06-'07CLE787840.99.9-20.8.4761.3-4.0.3196.3-9.0.6981.15.76.76.00.71.62.23.227.3
'07-'08CLE757440.410.6-21.9.4841.5-4.8.3157.3-10.3.7121.86.17.97.21.11.82.23.430.0
'08-'09CLE818137.79.7-19.9.4891.6-4.7.3447.3-9.4.7801.36.37.67.21.11.71.73.028.4
'09-'10CLE767639.010.1-20.1.5031.7-5.1.3337.8-10.2.7670.96.47.38.61.01.61.63.429.7
'10-'11MIA797938.89.6-18.8.5101.2-3.5.3306.4-8.4.7591.06.57.57.00.61.62.13.626.7
'11-'12MIA626237.510.0-18.9.5310.9-2.4.3626.2-8.1.7711.56.47.96.20.81.91.53.427.1
'12-'13MIA767637.910.1-17.8.5651.4-3.3.4065.3-7.0.7531.36.88.07.30.91.71.43.026.8
'13-'14MIA777737.710.0-17.6.5671.5-4.0.3795.7-7.6.7501.15.96.96.40.31.61.63.527.1
'14-'15CLE292937.58.8-18.1.4881.7-4.5.3695.9-7.9.7430.74.65.37.60.81.31.73.825.2


In [24]:
for stat in regular_season_totals:
    print(stat.get_text())

'03-'04CLE622-1492.41763-217.290347-460.75499333432465581301492731654
'04-'05CLE795-1684.472108-308.351477-636.750111477588577521771462622175
'05-'06CLE875-1823.480127-379.335601-814.73875481556521661231812602478
'06-'07CLE772-1621.47699-310.319489-701.69883443526470551251712502132
'07-'08CLE794-1642.484113-359.315549-771.712133459592539811381652552250
'08-'09CLE789-1613.489132-384.344594-762.780106507613587931371392412304
'09-'10CLE768-1528.503129-387.333593-773.76771483554651771251192612258
'10-'11MIA758-1485.51092-279.330503-663.75980510590554501241632842111
'11-'12MIA621-1169.53154-149.362387-502.7719439849238750115962131683
'12-'13MIA765-1354.565103-254.406403-535.75397513610551671291102262036
'13-'14MIA767-1353.567116-306.379439-585.75081452533489261211262702089
'14-'15CLE256-525.48848-130.369171-230.74319134153221233950111731


In [25]:
for stat in regular_season_misc_totals:
    print(stat.get_text())

'03-'04CLE12000201.700.4834.321.109.438
'04-'05CLE25410402.200.6847.091.292.504
'05-'06CLE21500002.000.4750.091.359.515
'06-'07CLE16110221.880.5044.081.315.507
'07-'08CLE31710202.110.5450.681.370.518
'08-'09CLE297001002.440.5749.971.428.530
'09-'10CLE31400402.490.4852.661.478.545
'10-'11MIA31400701.950.4446.731.422.541
'11-'12MIA23000301.820.5447.861.440.554
'12-'13MIA36400612.440.5750.301.504.603
'13-'14MIA12110511.810.4547.071.544.610
'14-'15CLE11000201.990.3543.011.392.533


### Looks good.  So we'll create 3 lists, one for each set of stats:

- one for regular season averages ("avgs")
- one for regular season totals ("totals")
- one for regular season misc totals ("misc_totals")

In [44]:
avgs = []
for row in regular_season_avgs:
    for data in row:
        avgs.append(data.get_text())

In [45]:
avgs

["'03-'04",
 'CLE',
 '79',
 '79',
 '39.5',
 '7.9-18.9',
 '.417',
 '0.8-2.7',
 '.290',
 '4.4-5.8',
 '.754',
 '1.3',
 '4.2',
 '5.5',
 '5.9',
 '0.7',
 '1.6',
 '1.9',
 '3.5',
 '20.9',
 "'04-'05",
 'CLE',
 '80',
 '80',
 '42.4',
 '9.9-21.1',
 '.472',
 '1.4-3.9',
 '.351',
 '6.0-8.0',
 '.750',
 '1.4',
 '6.0',
 '7.4',
 '7.2',
 '0.7',
 '2.2',
 '1.8',
 '3.3',
 '27.2',
 "'05-'06",
 'CLE',
 '79',
 '79',
 '42.5',
 '11.1-23.1',
 '.480',
 '1.6-4.8',
 '.335',
 '7.6-10.3',
 '.738',
 '1.0',
 '6.1',
 '7.0',
 '6.6',
 '0.8',
 '1.6',
 '2.3',
 '3.3',
 '31.4',
 "'06-'07",
 'CLE',
 '78',
 '78',
 '40.9',
 '9.9-20.8',
 '.476',
 '1.3-4.0',
 '.319',
 '6.3-9.0',
 '.698',
 '1.1',
 '5.7',
 '6.7',
 '6.0',
 '0.7',
 '1.6',
 '2.2',
 '3.2',
 '27.3',
 "'07-'08",
 'CLE',
 '75',
 '74',
 '40.4',
 '10.6-21.9',
 '.484',
 '1.5-4.8',
 '.315',
 '7.3-10.3',
 '.712',
 '1.8',
 '6.1',
 '7.9',
 '7.2',
 '1.1',
 '1.8',
 '2.2',
 '3.4',
 '30.0',
 "'08-'09",
 'CLE',
 '81',
 '81',
 '37.7',
 '9.7-19.9',
 '.489',
 '1.6-4.7',
 '.344',
 '7.3-9.4'

### So that we can merge this table with other tables associated with a player, I also want to include the player's ID in the list.

In [46]:
index = 0 # insert the player ID before the player's season
increment = 0
for row in range(len(regular_season_avgs)):
    avgs.insert(index + increment, player_id)
    index = index + 20  # There are 20 columns in the season avgs section
    increment = increment + 1

### Also want to insert player's name as the 2nd element in the list

In [47]:
index = 1 # insert the player's name after the player's ID
increment = 0
for row in range(len(regular_season_avgs)):
    avgs.insert(index + increment, player_name)
    index = index + 21  # There are 21 columns in the season avgs section since I've just added player ID
    increment = increment + 1
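
The index arithmetic in the two loops above is easier to check on a small list. The sketch below uses the same idea with 3 columns per row instead of 20; the toy values are made up for illustration.

```python
# Toy flat list: 2 rows of 3 columns each (the real averages rows have 20)
flat = ['s1', 'a', 'b', 's2', 'c', 'd']
n_rows, n_cols = 2, 3
player_id, player_name = '1966', 'LeBron James'

# Pass 1: insert the ID at the start of each row.
# Row k originally starts at k * n_cols; the k earlier inserts shift it right by k.
for k in range(n_rows):
    flat.insert(k * n_cols + k, player_id)

# Pass 2: rows are now n_cols + 1 wide; the name goes at offset 1 in each row,
# again shifted right by the k names already inserted.
for k in range(n_rows):
    flat.insert(1 + k * (n_cols + 1) + k, player_name)

print(flat)
```

Each row ends up 2 elements wider, which is why the averages rows go from 20 to 22 columns.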

### Let's see if the player ID and player's name were added:

In [48]:
avgs

['1966',
 'LeBron James',
 "'03-'04",
 'CLE',
 '79',
 '79',
 '39.5',
 '7.9-18.9',
 '.417',
 '0.8-2.7',
 '.290',
 '4.4-5.8',
 '.754',
 '1.3',
 '4.2',
 '5.5',
 '5.9',
 '0.7',
 '1.6',
 '1.9',
 '3.5',
 '20.9',
 '1966',
 'LeBron James',
 "'04-'05",
 'CLE',
 '80',
 '80',
 '42.4',
 '9.9-21.1',
 '.472',
 '1.4-3.9',
 '.351',
 '6.0-8.0',
 '.750',
 '1.4',
 '6.0',
 '7.4',
 '7.2',
 '0.7',
 '2.2',
 '1.8',
 '3.3',
 '27.2',
 '1966',
 'LeBron James',
 "'05-'06",
 'CLE',
 '79',
 '79',
 '42.5',
 '11.1-23.1',
 '.480',
 '1.6-4.8',
 '.335',
 '7.6-10.3',
 '.738',
 '1.0',
 '6.1',
 '7.0',
 '6.6',
 '0.8',
 '1.6',
 '2.3',
 '3.3',
 '31.4',
 '1966',
 'LeBron James',
 "'06-'07",
 'CLE',
 '78',
 '78',
 '40.9',
 '9.9-20.8',
 '.476',
 '1.3-4.0',
 '.319',
 '6.3-9.0',
 '.698',
 '1.1',
 '5.7',
 '6.7',
 '6.0',
 '0.7',
 '1.6',
 '2.2',
 '3.2',
 '27.3',
 '1966',
 'LeBron James',
 "'07-'08",
 'CLE',
 '75',
 '74',
 '40.4',
 '10.6-21.9',
 '.484',
 '1.5-4.8',
 '.315',
 '7.3-10.3',
 '.712',
 '1.8',
 '6.1',
 '7.9',
 '7.2',
 '1.1',

### OK, so far so good

### Now, we'll do the same for the season totals and season misc totals

In [55]:
totals = []
for row in regular_season_totals:
    for data in row:
        totals.append(data.get_text())

In [50]:
totals

["'03-'04",
 'CLE',
 '622-1492',
 '.417',
 '63-217',
 '.290',
 '347-460',
 '.754',
 '99',
 '333',
 '432',
 '465',
 '58',
 '130',
 '149',
 '273',
 '1654',
 "'04-'05",
 'CLE',
 '795-1684',
 '.472',
 '108-308',
 '.351',
 '477-636',
 '.750',
 '111',
 '477',
 '588',
 '577',
 '52',
 '177',
 '146',
 '262',
 '2175',
 "'05-'06",
 'CLE',
 '875-1823',
 '.480',
 '127-379',
 '.335',
 '601-814',
 '.738',
 '75',
 '481',
 '556',
 '521',
 '66',
 '123',
 '181',
 '260',
 '2478',
 "'06-'07",
 'CLE',
 '772-1621',
 '.476',
 '99-310',
 '.319',
 '489-701',
 '.698',
 '83',
 '443',
 '526',
 '470',
 '55',
 '125',
 '171',
 '250',
 '2132',
 "'07-'08",
 'CLE',
 '794-1642',
 '.484',
 '113-359',
 '.315',
 '549-771',
 '.712',
 '133',
 '459',
 '592',
 '539',
 '81',
 '138',
 '165',
 '255',
 '2250',
 "'08-'09",
 'CLE',
 '789-1613',
 '.489',
 '132-384',
 '.344',
 '594-762',
 '.780',
 '106',
 '507',
 '613',
 '587',
 '93',
 '137',
 '139',
 '241',
 '2304',
 "'09-'10",
 'CLE',
 '768-1528',
 '.503',
 '129-387',
 '.333',
 '593-

### Again, I'm just inserting the player's ID into the list and then the player's name

In [56]:
index = 0 # insert the player ID before the player's season
increment = 0
for row in range(len(regular_season_totals)):
    totals.insert(index + increment, player_id)
    index = index + 17 # There are 17 columns in the reg season totals section
    increment = increment + 1

In [57]:
index = 1 # insert the player's name after the player's ID
increment = 0
for row in range(len(regular_season_totals)):
    totals.insert(index + increment, player_name)
    index = index + 18 # There are now 18 columns in the reg season totals after inserting player's ID
    increment = increment + 1

In [58]:
totals

['1966',
 'LeBron James',
 "'03-'04",
 'CLE',
 '622-1492',
 '.417',
 '63-217',
 '.290',
 '347-460',
 '.754',
 '99',
 '333',
 '432',
 '465',
 '58',
 '130',
 '149',
 '273',
 '1654',
 '1966',
 'LeBron James',
 "'04-'05",
 'CLE',
 '795-1684',
 '.472',
 '108-308',
 '.351',
 '477-636',
 '.750',
 '111',
 '477',
 '588',
 '577',
 '52',
 '177',
 '146',
 '262',
 '2175',
 '1966',
 'LeBron James',
 "'05-'06",
 'CLE',
 '875-1823',
 '.480',
 '127-379',
 '.335',
 '601-814',
 '.738',
 '75',
 '481',
 '556',
 '521',
 '66',
 '123',
 '181',
 '260',
 '2478',
 '1966',
 'LeBron James',
 "'06-'07",
 'CLE',
 '772-1621',
 '.476',
 '99-310',
 '.319',
 '489-701',
 '.698',
 '83',
 '443',
 '526',
 '470',
 '55',
 '125',
 '171',
 '250',
 '2132',
 '1966',
 'LeBron James',
 "'07-'08",
 'CLE',
 '794-1642',
 '.484',
 '113-359',
 '.315',
 '549-771',
 '.712',
 '133',
 '459',
 '592',
 '539',
 '81',
 '138',
 '165',
 '255',
 '2250',
 '1966',
 'LeBron James',
 "'08-'09",
 'CLE',
 '789-1613',
 '.489',
 '132-384',
 '.344',
 '594-

In [59]:
misc_totals = []
for row in regular_season_misc_totals:
    for data in row:
        misc_totals.append(data.get_text())

In [60]:
misc_totals

["'03-'04",
 'CLE',
 '12',
 '0',
 '0',
 '0',
 '2',
 '0',
 '1.70',
 '0.48',
 '34.32',
 '1.109',
 '.438',
 "'04-'05",
 'CLE',
 '25',
 '4',
 '1',
 '0',
 '4',
 '0',
 '2.20',
 '0.68',
 '47.09',
 '1.292',
 '.504',
 "'05-'06",
 'CLE',
 '21',
 '5',
 '0',
 '0',
 '0',
 '0',
 '2.00',
 '0.47',
 '50.09',
 '1.359',
 '.515',
 "'06-'07",
 'CLE',
 '16',
 '1',
 '1',
 '0',
 '2',
 '2',
 '1.88',
 '0.50',
 '44.08',
 '1.315',
 '.507',
 "'07-'08",
 'CLE',
 '31',
 '7',
 '1',
 '0',
 '2',
 '0',
 '2.11',
 '0.54',
 '50.68',
 '1.370',
 '.518',
 "'08-'09",
 'CLE',
 '29',
 '7',
 '0',
 '0',
 '10',
 '0',
 '2.44',
 '0.57',
 '49.97',
 '1.428',
 '.530',
 "'09-'10",
 'CLE',
 '31',
 '4',
 '0',
 '0',
 '4',
 '0',
 '2.49',
 '0.48',
 '52.66',
 '1.478',
 '.545',
 "'10-'11",
 'MIA',
 '31',
 '4',
 '0',
 '0',
 '7',
 '0',
 '1.95',
 '0.44',
 '46.73',
 '1.422',
 '.541',
 "'11-'12",
 'MIA',
 '23',
 '0',
 '0',
 '0',
 '3',
 '0',
 '1.82',
 '0.54',
 '47.86',
 '1.440',
 '.554',
 "'12-'13",
 'MIA',
 '36',
 '4',
 '0',
 '0',
 '6',
 '1',
 '2.44

### Again, I am inserting the player's ID and then the player's name into this list.

In [61]:
index = 0 # insert the player ID before the player's season
increment = 0
for row in range(len(regular_season_misc_totals)):
    misc_totals.insert(index + increment, player_id)
    index = index + 13 # There are 13 columns in the reg season misc totals section
    increment = increment + 1

In [63]:
index = 1 # insert the player's name after the player's ID
increment = 0
for row in range(len(regular_season_misc_totals)):
    misc_totals.insert(index + increment, player_name)
    index = index + 14 # There are now 14 columns in the reg season misc totals after inserting player ID
    increment = increment + 1

In [64]:
misc_totals

['1966',
 'LeBron James',
 "'03-'04",
 'CLE',
 '12',
 '0',
 '0',
 '0',
 '2',
 '0',
 '1.70',
 '0.48',
 '34.32',
 '1.109',
 '.438',
 '1966',
 'LeBron James',
 "'04-'05",
 'CLE',
 '25',
 '4',
 '1',
 '0',
 '4',
 '0',
 '2.20',
 '0.68',
 '47.09',
 '1.292',
 '.504',
 '1966',
 'LeBron James',
 "'05-'06",
 'CLE',
 '21',
 '5',
 '0',
 '0',
 '0',
 '0',
 '2.00',
 '0.47',
 '50.09',
 '1.359',
 '.515',
 '1966',
 'LeBron James',
 "'06-'07",
 'CLE',
 '16',
 '1',
 '1',
 '0',
 '2',
 '2',
 '1.88',
 '0.50',
 '44.08',
 '1.315',
 '.507',
 '1966',
 'LeBron James',
 "'07-'08",
 'CLE',
 '31',
 '7',
 '1',
 '0',
 '2',
 '0',
 '2.11',
 '0.54',
 '50.68',
 '1.370',
 '.518',
 '1966',
 'LeBron James',
 "'08-'09",
 'CLE',
 '29',
 '7',
 '0',
 '0',
 '10',
 '0',
 '2.44',
 '0.57',
 '49.97',
 '1.428',
 '.530',
 '1966',
 'LeBron James',
 "'09-'10",
 'CLE',
 '31',
 '4',
 '0',
 '0',
 '4',
 '0',
 '2.49',
 '0.48',
 '52.66',
 '1.478',
 '.545',
 '1966',
 'LeBron James',
 "'10-'11",
 'MIA',
 '31',
 '4',
 '0',
 '0',
 '7',
 '0',
 '1.95

### As in my earlier example, I need to group each season's data in its own list.  I found this generator solution on [SO](http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python).

In [66]:
# http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python
def chunks(l, n):
    """ Yield successive n-sized chunks from l.
    """
    for i in range(0, len(l), n):
        yield l[i:i+n]
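
Here is a quick sketch of how `chunks` behaves on a short list, including the case where the length is not a multiple of `n`. The sample values are illustrative only.

```python
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i+n]

# 7 elements split into chunks of 3: the last chunk is shorter
flat = ['1966', "'03-'04", 'CLE', '1966', "'04-'05", 'CLE', '1966']
grouped = list(chunks(flat, 3))
print(grouped)
```

A short trailing chunk is worth watching for, because a row with too few values will not match the number of `?` placeholders in an INSERT statement.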

In [67]:
from pprint import pprint

for row in chunks(avgs,22):
    pprint(row)

['1966',
 'LeBron James',
 "'03-'04",
 'CLE',
 '79',
 '79',
 '39.5',
 '7.9-18.9',
 '.417',
 '0.8-2.7',
 '.290',
 '4.4-5.8',
 '.754',
 '1.3',
 '4.2',
 '5.5',
 '5.9',
 '0.7',
 '1.6',
 '1.9',
 '3.5',
 '20.9']
['1966',
 'LeBron James',
 "'04-'05",
 'CLE',
 '80',
 '80',
 '42.4',
 '9.9-21.1',
 '.472',
 '1.4-3.9',
 '.351',
 '6.0-8.0',
 '.750',
 '1.4',
 '6.0',
 '7.4',
 '7.2',
 '0.7',
 '2.2',
 '1.8',
 '3.3',
 '27.2']
['1966',
 'LeBron James',
 "'05-'06",
 'CLE',
 '79',
 '79',
 '42.5',
 '11.1-23.1',
 '.480',
 '1.6-4.8',
 '.335',
 '7.6-10.3',
 '.738',
 '1.0',
 '6.1',
 '7.0',
 '6.6',
 '0.8',
 '1.6',
 '2.3',
 '3.3',
 '31.4']
['1966',
 'LeBron James',
 "'06-'07",
 'CLE',
 '78',
 '78',
 '40.9',
 '9.9-20.8',
 '.476',
 '1.3-4.0',
 '.319',
 '6.3-9.0',
 '.698',
 '1.1',
 '5.7',
 '6.7',
 '6.0',
 '0.7',
 '1.6',
 '2.2',
 '3.2',
 '27.3']
['1966',
 'LeBron James',
 "'07-'08",
 'CLE',
 '75',
 '74',
 '40.4',
 '10.6-21.9',
 '.484',
 '1.5-4.8',
 '.315',
 '7.3-10.3',
 '.712',
 '1.8',
 '6.1',
 '7.9',
 '7.2',
 '1.1',

In [68]:
for row in chunks(totals,19):
    pprint(row)

['1966',
 'LeBron James',
 "'03-'04",
 'CLE',
 '622-1492',
 '.417',
 '63-217',
 '.290',
 '347-460',
 '.754',
 '99',
 '333',
 '432',
 '465',
 '58',
 '130',
 '149',
 '273',
 '1654']
['1966',
 'LeBron James',
 "'04-'05",
 'CLE',
 '795-1684',
 '.472',
 '108-308',
 '.351',
 '477-636',
 '.750',
 '111',
 '477',
 '588',
 '577',
 '52',
 '177',
 '146',
 '262',
 '2175']
['1966',
 'LeBron James',
 "'05-'06",
 'CLE',
 '875-1823',
 '.480',
 '127-379',
 '.335',
 '601-814',
 '.738',
 '75',
 '481',
 '556',
 '521',
 '66',
 '123',
 '181',
 '260',
 '2478']
['1966',
 'LeBron James',
 "'06-'07",
 'CLE',
 '772-1621',
 '.476',
 '99-310',
 '.319',
 '489-701',
 '.698',
 '83',
 '443',
 '526',
 '470',
 '55',
 '125',
 '171',
 '250',
 '2132']
['1966',
 'LeBron James',
 "'07-'08",
 'CLE',
 '794-1642',
 '.484',
 '113-359',
 '.315',
 '549-771',
 '.712',
 '133',
 '459',
 '592',
 '539',
 '81',
 '138',
 '165',
 '255',
 '2250']
['1966',
 'LeBron James',
 "'08-'09",
 'CLE',
 '789-1613',
 '.489',
 '132-384',
 '.344',
 '594-

In [69]:
for row in chunks(misc_totals,15):
    pprint(row)

['1966',
 'LeBron James',
 "'03-'04",
 'CLE',
 '12',
 '0',
 '0',
 '0',
 '2',
 '0',
 '1.70',
 '0.48',
 '34.32',
 '1.109',
 '.438']
['1966',
 'LeBron James',
 "'04-'05",
 'CLE',
 '25',
 '4',
 '1',
 '0',
 '4',
 '0',
 '2.20',
 '0.68',
 '47.09',
 '1.292',
 '.504']
['1966',
 'LeBron James',
 "'05-'06",
 'CLE',
 '21',
 '5',
 '0',
 '0',
 '0',
 '0',
 '2.00',
 '0.47',
 '50.09',
 '1.359',
 '.515']
['1966',
 'LeBron James',
 "'06-'07",
 'CLE',
 '16',
 '1',
 '1',
 '0',
 '2',
 '2',
 '1.88',
 '0.50',
 '44.08',
 '1.315',
 '.507']
['1966',
 'LeBron James',
 "'07-'08",
 'CLE',
 '31',
 '7',
 '1',
 '0',
 '2',
 '0',
 '2.11',
 '0.54',
 '50.68',
 '1.370',
 '.518']
['1966',
 'LeBron James',
 "'08-'09",
 'CLE',
 '29',
 '7',
 '0',
 '0',
 '10',
 '0',
 '2.44',
 '0.57',
 '49.97',
 '1.428',
 '.530']
['1966',
 'LeBron James',
 "'09-'10",
 'CLE',
 '31',
 '4',
 '0',
 '0',
 '4',
 '0',
 '2.49',
 '0.48',
 '52.66',
 '1.478',
 '.545']
['1966',
 'LeBron James',
 "'10-'11",
 'MIA',
 '31',
 '4',
 '0',
 '0',
 '7',
 '0',
 '1.95

### Below is the sqlite-related code to insert the data into our sqlite database

In [70]:
import sqlite3

conn = sqlite3.connect('/home/pybokeh/databases/nba')
c = conn.cursor()

for data in chunks(avgs,22):
    try:
        c.execute('INSERT INTO regular_season_avgs VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)', data)
    except sqlite3.Error:  # skip rows that sqlite rejects, but let unrelated errors surface
        pass
    conn.commit()
conn.close()

In [71]:
import sqlite3

conn = sqlite3.connect('/home/pybokeh/databases/nba')
c = conn.cursor()

for data in chunks(totals,19):
    try:
        c.execute('INSERT INTO regular_season_totals VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)', data)
    except sqlite3.Error:  # skip rows that sqlite rejects, but let unrelated errors surface
        pass
    conn.commit()
conn.close()

In [72]:
import sqlite3

conn = sqlite3.connect('/home/pybokeh/databases/nba')
c = conn.cursor()

for data in chunks(misc_totals,15):
    try:
        c.execute('INSERT INTO regular_season_misc_totals VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)', data)
    except sqlite3.Error:  # skip rows that sqlite rejects, but let unrelated errors surface
        pass
    conn.commit()
conn.close()
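
The insert pattern can be tried out safely against an in-memory database. The sketch below is not the real schema: `demo_stats`, its 3 columns, and its primary key are assumptions made up for illustration. It catches only `sqlite3.IntegrityError` and commits once after the loop.

```python
import sqlite3

# Hypothetical 3-column table in an in-memory database, for illustration only;
# the real tables have 22, 19, and 15 columns.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE demo_stats (player_id TEXT, season TEXT, team TEXT, '
             'PRIMARY KEY (player_id, season))')

rows = [
    ['1966', "'03-'04", 'CLE'],
    ['1966', "'04-'05", 'CLE'],
    ['1966', "'03-'04", 'CLE'],  # duplicate key, should be skipped
]

inserted = 0
for row in rows:
    try:
        conn.execute('INSERT INTO demo_stats VALUES (?,?,?)', row)
        inserted += 1
    except sqlite3.IntegrityError:
        # Skip rows that violate the primary key instead of hiding every error
        pass
conn.commit()  # one commit after the loop

count = conn.execute('SELECT COUNT(*) FROM demo_stats').fetchone()[0]
print(inserted, count)
conn.close()
```

Counting the successful inserts makes it easier to notice when rows are being skipped silently.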

### This concludes the step-by-step explanations.

## Below is all the code above combined into 3 cells, one for each of the 3 tables we are populating in our sqlite database.  The difference between the all-in-one versions below and the step-by-step version above is that I am using 2 for loops: one to loop through all the NBA teams' URLs and the other to loop through each player's URL.

<a id="season_avgs"></a>

## Populating the regular season averages table:

[[back to top]](#top)

In [27]:
import requests # pip install requests
from bs4 import BeautifulSoup
import sqlite3
import re
from datetime import datetime
import time

startTime = datetime.now()

# http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python
def chunks(l, n):
    """ Yield successive n-sized chunks from l.
    """
    for i in range(0, len(l), n):
        yield l[i:i+n]

base_url = 'http://espn.go.com'

teams_url = 'http://espn.go.com/nba/teams'
html_teams = requests.get(teams_url)

soup_teams = BeautifulSoup(html_teams.text, 'lxml')
urls = soup_teams.find_all(href=re.compile('/nba/teams/stats'))

team_urls = [base_url+url['href'] for url in urls]

for team in team_urls:
    html_team = requests.get(team)
    soup_team = BeautifulSoup(html_team.text, 'lxml')
    html_rows = soup_team.find_all('tr', class_=re.compile('player'))
    
    player_urls = [row.a['href'].replace('_','stats/_') for row in html_rows]
    
    for player in player_urls:
        time.sleep(3)  # added delay to prevent timeout errors
        player_id   = player.split('/')[8]
        html_player = requests.get(player)
        soup_player = BeautifulSoup(html_player.text, 'lxml')
        
        soup_name = soup_player.find('meta', property='og:title')
        player_name = soup_name['content']
        
        regular_season_stats = soup_player.find_all('tr', class_=re.compile('row'))
        
        size = int(len(regular_season_stats)/3)
        
        season_avgs_slice        = slice(0,size)
        #season_totals_slice      = slice(size,size*2)
        #season_misc_totals_slice = slice(size*2,size*3)
        
        regular_season_avgs = regular_season_stats[season_avgs_slice]
        #regular_season_totals = regular_season_stats[season_totals_slice]
        #regular_season_misc_totals = regular_season_stats[season_misc_totals_slice]
        
        avgs = []
        for row in regular_season_avgs:
            for data in row:
                avgs.append(data.get_text())
                
        index = 0 # insert the player ID before the player's season
        increment = 0
        for row in range(len(regular_season_avgs)):
            avgs.insert(index + increment, player_id)
            index = index + 20  # There are 20 columns in the season avgs section
            increment = increment + 1
            
        index = 1 # insert the player's name after the player's ID
        increment = 0
        for row in range(len(regular_season_avgs)):
            avgs.insert(index + increment, player_name)
            index = index + 21  # There are 21 columns in the season avgs section since I've just added player ID
            increment = increment + 1

        conn = sqlite3.connect('/home/pybokeh/databases/nba')
        c = conn.cursor()

        for data in chunks(avgs,22):
            try:
                c.execute('INSERT INTO regular_season_avgs VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)', data)
            except sqlite3.Error:  # skip rows that sqlite rejects, but let unrelated errors surface
                pass
            conn.commit()
        conn.close()
        
print(datetime.now() - startTime)

0:15:28.872399
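
The delay-per-request pattern in the loop above could be factored into a small helper. This is only a sketch: `fetch_all` is a hypothetical name, and the `fetch` and `sleep` parameters are my additions so the example runs without network access.

```python
import time

def fetch_all(urls, fetch, delay=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing `delay` seconds between calls."""
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # pause only between requests, not before the first
        results[url] = fetch(url)
    return results

# Usage with a stand-in fetcher and a recording sleep function (no network)
pauses = []
pages = fetch_all(['u1', 'u2', 'u3'], fetch=lambda u: u.upper(), sleep=pauses.append)
print(pages, pauses)
```

In real use, `fetch` might be something like `lambda u: requests.get(u, timeout=10).text`, where the 10-second timeout is an arbitrary choice for illustration.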


<a id="season_totals"></a>

## Populating the regular season totals table:

[[back to top]](#top)

In [None]:
import requests # pip install requests
from bs4 import BeautifulSoup
import sqlite3
import re
from datetime import datetime
import time

startTime = datetime.now()

# http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python
def chunks(l, n):
    """ Yield successive n-sized chunks from l.
    """
    for i in range(0, len(l), n):
        yield l[i:i+n]

base_url = 'http://espn.go.com'

teams_url = 'http://espn.go.com/nba/teams'
html_teams = requests.get(teams_url)

soup_teams = BeautifulSoup(html_teams.text, 'lxml')
urls = soup_teams.find_all(href=re.compile('/nba/teams/stats'))

team_urls = [base_url+url['href'] for url in urls]

for team in team_urls:
    html_team = requests.get(team)
    soup_team = BeautifulSoup(html_team.text, 'lxml')
    html_rows = soup_team.find_all('tr', class_=re.compile('player'))
    
    player_urls = [row.a['href'].replace('_','stats/_') for row in html_rows]
    
    for player in player_urls:
        time.sleep(3)  # added delay to prevent time out error
        player_id   = player.split('/')[8]
        html_player = requests.get(player)
        soup_player = BeautifulSoup(html_player.text, 'lxml')
        
        soup_name = soup_player.find('meta', property='og:title')
        player_name = soup_name['content']
        
        regular_season_stats = soup_player.find_all('tr', class_=re.compile('row'))
        
        size = len(regular_season_stats) // 3  # rows are split into three equal blocks: avgs, totals, misc totals
        
        #season_avgs_slice        = slice(0,size)
        season_totals_slice      = slice(size,size*2)
        #season_misc_totals_slice = slice(size*2,size*3)
        
        #regular_season_avgs = regular_season_stats[season_avgs_slice]
        regular_season_totals = regular_season_stats[season_totals_slice]
        #regular_season_misc_totals = regular_season_stats[season_misc_totals_slice]
        
        totals = []
        for row in regular_season_totals:
            for data in row:
                totals.append(data.get_text())
                
        index = 0 # insert the player ID before the player's season
        increment = 0
        for row in range(len(regular_season_totals)):
            totals.insert(index + increment, player_id)
            index = index + 17  # There are 17 columns in the season totals section
            increment = increment + 1
            
        index = 1 # insert the player's name after the player's ID
        increment = 0
        for row in range(len(regular_season_totals)):
            totals.insert(index + increment, player_name)
            index = index + 18 # There are now 18 columns in the reg season totals after inserting player's ID
            increment = increment + 1

        conn = sqlite3.connect('/home/pybokeh/databases/nba')
        c = conn.cursor()

        for data in chunks(totals,19):
            try:
                c.execute('INSERT INTO regular_season_totals VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)', data)
            except sqlite3.Error:
                pass  # most likely a duplicate (id, season, team) row; skip it
            conn.commit()
        conn.close()
        
print(datetime.now() - startTime)
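
The try/except around each INSERT is presumably there so that re-running the scraper skips rows that are already in the table, since each table has a unique constraint on id, season and team. A minimal sketch of making that explicit with SQLite's `INSERT OR IGNORE`, using an in-memory database and a cut-down three-column table `demo_totals` that I invented for this example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database, not the real nba file
conn.execute(
    'CREATE TABLE demo_totals ("id" INTEGER, "season" TEXT, "team" TEXT, '
    'unique ("id", "season", "team"))'
)

rows = [
    (1966, "'03-'04", "CLE"),
    (1966, "'04-'05", "CLE"),
    (1966, "'03-'04", "CLE"),  # duplicate of the first row
]

# OR IGNORE skips rows that violate the unique constraint instead of raising
conn.executemany("INSERT OR IGNORE INTO demo_totals VALUES (?,?,?)", rows)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM demo_totals").fetchone()[0]
conn.close()
```

With this form, any other database error would still surface instead of being silently dropped.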

<a id="season_misc"></a>

## Populating the regular season misc totals table:

[[back to top]](#top)

In [None]:
import requests # pip install requests
from bs4 import BeautifulSoup
import sqlite3
import re
from datetime import datetime
import time

startTime = datetime.now()

# http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python
def chunks(l, n):
    """ Yield successive n-sized chunks from l.
    """
    for i in range(0, len(l), n):
        yield l[i:i+n]

base_url = 'http://espn.go.com'

teams_url = 'http://espn.go.com/nba/teams'
html_teams = requests.get(teams_url)

soup_teams = BeautifulSoup(html_teams.text, 'lxml')
urls = soup_teams.find_all(href=re.compile('/nba/teams/stats'))

team_urls = [base_url+url['href'] for url in urls]

for team in team_urls:
    html_team = requests.get(team)
    soup_team = BeautifulSoup(html_team.text, 'lxml')
    html_rows = soup_team.find_all('tr', class_=re.compile('player'))
    
    player_urls = [row.a['href'].replace('_','stats/_') for row in html_rows]
    
    for player in player_urls:
        time.sleep(3)  # added delay to prevent time out error
        player_id   = player.split('/')[8]
        html_player = requests.get(player)
        soup_player = BeautifulSoup(html_player.text, 'lxml')
        
        soup_name = soup_player.find('meta', property='og:title')
        player_name = soup_name['content']
        
        regular_season_stats = soup_player.find_all('tr', class_=re.compile('row'))
        
        size = len(regular_season_stats) // 3  # rows are split into three equal blocks: avgs, totals, misc totals
        
        #season_avgs_slice        = slice(0,size)
        #season_totals_slice      = slice(size,size*2)
        season_misc_totals_slice = slice(size*2,size*3)
        
        #regular_season_avgs = regular_season_stats[season_avgs_slice]
        #regular_season_totals = regular_season_stats[season_totals_slice]
        regular_season_misc_totals = regular_season_stats[season_misc_totals_slice]
        
        misc_totals = []
        for row in regular_season_misc_totals:
            for data in row:
                misc_totals.append(data.get_text())
                
        index = 0 # insert the player ID before the player's season
        increment = 0
        for row in range(len(regular_season_misc_totals)):
            misc_totals.insert(index + increment, player_id)
            index = index + 13  # There are 13 columns in the season misc totals section
            increment = increment + 1
            
        index = 1 # insert the player's name after the player's ID
        increment = 0
        for row in range(len(regular_season_misc_totals)):
            misc_totals.insert(index + increment, player_name)
            index = index + 14 # There are now 14 columns in the reg season misc totals after inserting player ID
            increment = increment + 1

        conn = sqlite3.connect('/home/pybokeh/databases/nba')
        c = conn.cursor()

        for data in chunks(misc_totals,15):
            try:
                c.execute('INSERT INTO regular_season_misc_totals VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)', data)
            except sqlite3.Error:
                pass  # most likely a duplicate (id, season, team) row; skip it
            conn.commit()
        conn.close()
        
print(datetime.now() - startTime)
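
The player loop relies on two string tricks: `replace('_','stats/_')` to turn a player link into a stats link, and `split('/')[8]` to pull out the player ID. Here is a small sketch of both, checked against the LeBron stats URL from the intro. The shape of `profile_url` is my assumption about what the team page links to, and the regex lookup is an alternative I'm suggesting, not what the script uses.

```python
import re

# Assumed shape of a player link on a team stats page
profile_url = 'http://espn.go.com/nba/player/_/id/1966/lebron-james'

# Limit the replacement to the first underscore so a later one in the URL is left alone
stats_url = profile_url.replace('_', 'stats/_', 1)

# Positional lookup, as in the scraping cells
id_by_position = stats_url.split('/')[8]

# Pattern lookup, which does not depend on how many path segments precede the ID
match = re.search(r'/id/(\d+)/', stats_url)
id_by_pattern = match.group(1) if match else None
```

The positional lookup breaks if the site ever adds or removes a path segment, while the pattern lookup only needs the `/id/<digits>/` part to stay put.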

<a id="db_py"></a>

## Now let's take a look at our sqlite database using Yhat's [db.py](http://blog.yhathq.com/posts/introducing-db-py.html)

[[back to top]](#top)

In [55]:
from db import DB
import pandas as pd

db = DB(filename="/home/pybokeh/databases/nba", dbtype="sqlite")

Indexing schema. This will take a second...finished!
Refreshing schema. Please wait...done!


In [56]:
db.tables

Table,Columns
player_game_stats,"id, name_pos, team_name, GP, GS, MIN, PPG, OFFR, DEFR, RPG, APG, SPG, BPG, TPG, FPG, A2TO, PER"
player_shooting_stats,"id, name_pos, team_name, FGM, FGA, FG_Perc, 3PM, 3PA, 3P_Perc, FTM, FTA, FT_Perc, 2PM, 2PA, 2P_Perc, PPS, AFG_Perc"
regular_season_avgs,"id, player_name, season, team, GP, GS, MIN, FGM-A, FG_Perc, 3PM-A, 3P_Perc, FTM-A, FT_Perc, OR, DR, REB, AST, BLK, STL, PF, TO, PTS"
regular_season_misc_totals,"id, player_name, season, team, DBLDBL, TRIDBL, DQ, EJECT, TECH, FLAG, AST2TO, STL2TO, RAT, SCEFF, SHEFF"
regular_season_totals,"id, player_name, season, team, FGM-A, FG_Perc, 3PM-A, 3P_Perc, FTM-A, FT_Perc, OR, DR, REB, AST, BLK, STL, PF, TO, PTS"


### Let's configure the pandas display options so that the full results are visible in the notebook.

In [57]:
pd.set_option("display.max_columns",50)
pd.set_option("display.max_rows",999)

### I'll be filtering on the player's name.  But what if I don't know how the player name columns are spelled?

db.py has a useful find_column() method, though I hope a future version lets us chain find_column() and find_table() together.

In [77]:
db.find_column("*name*")

Table,Column Name,Type
player_game_stats,name_pos,TEXT
player_game_stats,team_name,TEXT
player_shooting_stats,name_pos,TEXT
player_shooting_stats,team_name,TEXT
regular_season_avgs,player_name,TEXT
regular_season_misc_totals,player_name,TEXT
regular_season_totals,player_name,TEXT
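
For anyone without db.py installed, roughly the same column search can be done with the standard library alone. This is only a sketch: `find_columns` is a hypothetical helper of mine, not part of db.py, and it runs here against an in-memory database with two cut-down tables rather than the real nba file.

```python
import sqlite3
from fnmatch import fnmatch

def find_columns(conn, pattern):
    """Return (table, column, type) for every column whose name matches a glob pattern."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    found = []
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
        for info in conn.execute('PRAGMA table_info("{}")'.format(table)):
            name, col_type = info[1], info[2]
            if fnmatch(name.lower(), pattern.lower()):
                found.append((table, name, col_type))
    return found

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE player_game_stats ("id" INTEGER, "name_pos" TEXT, "team_name" TEXT)')
conn.execute('CREATE TABLE regular_season_avgs ("id" INTEGER, "player_name" TEXT, "season" TEXT)')
matches = find_columns(conn, "*name*")
conn.close()
```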


### Below is the SQL needed to get LeBron's regular season averages:

In [74]:
sql = """
select *

from regular_season_avgs

where
player_name like '%LeBron%'
"""

### And here are LeBron's regular season averages:

In [75]:
db.query(sql)

Unnamed: 0,id,player_name,season,team,GP,GS,MIN,FGM-A,FG_Perc,3PM-A,3P_Perc,FTM-A,FT_Perc,OR,DR,REB,AST,BLK,STL,PF,TO,PTS
0,1966,LeBron James,'03-'04,CLE,79,79,39.5,7.9-18.9,0.417,0.8-2.7,0.29,4.4-5.8,0.754,1.3,4.2,5.5,5.9,0.7,1.6,1.9,3.5,20.9
1,1966,LeBron James,'04-'05,CLE,80,80,42.4,9.9-21.1,0.472,1.4-3.9,0.351,6.0-8.0,0.75,1.4,6.0,7.4,7.2,0.7,2.2,1.8,3.3,27.2
2,1966,LeBron James,'05-'06,CLE,79,79,42.5,11.1-23.1,0.48,1.6-4.8,0.335,7.6-10.3,0.738,1.0,6.1,7.0,6.6,0.8,1.6,2.3,3.3,31.4
3,1966,LeBron James,'06-'07,CLE,78,78,40.9,9.9-20.8,0.476,1.3-4.0,0.319,6.3-9.0,0.698,1.1,5.7,6.7,6.0,0.7,1.6,2.2,3.2,27.3
4,1966,LeBron James,'07-'08,CLE,75,74,40.4,10.6-21.9,0.484,1.5-4.8,0.315,7.3-10.3,0.712,1.8,6.1,7.9,7.2,1.1,1.8,2.2,3.4,30.0
5,1966,LeBron James,'08-'09,CLE,81,81,37.7,9.7-19.9,0.489,1.6-4.7,0.344,7.3-9.4,0.78,1.3,6.3,7.6,7.2,1.1,1.7,1.7,3.0,28.4
6,1966,LeBron James,'09-'10,CLE,76,76,39.0,10.1-20.1,0.503,1.7-5.1,0.333,7.8-10.2,0.767,0.9,6.4,7.3,8.6,1.0,1.6,1.6,3.4,29.7
7,1966,LeBron James,'10-'11,MIA,79,79,38.8,9.6-18.8,0.51,1.2-3.5,0.33,6.4-8.4,0.759,1.0,6.5,7.5,7.0,0.6,1.6,2.1,3.6,26.7
8,1966,LeBron James,'11-'12,MIA,62,62,37.5,10.0-18.9,0.531,0.9-2.4,0.362,6.2-8.1,0.771,1.5,6.4,7.9,6.2,0.8,1.9,1.5,3.4,27.1
9,1966,LeBron James,'12-'13,MIA,76,76,37.9,10.1-17.8,0.565,1.4-3.3,0.406,5.3-7.0,0.753,1.3,6.8,8.0,7.3,0.9,1.7,1.4,3.0,26.8
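
The `FGM-A`, `3PM-A` and `FTM-A` columns are stored as TEXT such as `7.9-18.9`, so they can't be used in arithmetic as they are. A small sketch of splitting one of these values into numbers; `split_made_attempted` is a hypothetical helper name.

```python
def split_made_attempted(value):
    """Split a 'made-attempted' string such as '7.9-18.9' into two floats."""
    made, attempted = value.split('-')
    return float(made), float(attempted)

made, attempted = split_made_attempted('7.9-18.9')  # LeBron's '03-'04 FGM-A
fg_perc = made / attempted
```

Because the per-game figures are already rounded, `fg_perc` only approximates the stored `FG_Perc` of 0.417 for that season.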


### Now let's get LeBron's regular season totals.  Below is the SQL to get that:

In [70]:
sql = """
select *

from regular_season_totals

where
player_name like '%LeBron%'
"""

In [71]:
db.query(sql)

Unnamed: 0,id,player_name,season,team,FGM-A,FG_Perc,3PM-A,3P_Perc,FTM-A,FT_Perc,OR,DR,REB,AST,BLK,STL,PF,TO,PTS
0,1966,LeBron James,'03-'04,CLE,622-1492,0.417,63-217,0.29,347-460,0.754,99,333,432,465,58,130,149,273,1654
1,1966,LeBron James,'04-'05,CLE,795-1684,0.472,108-308,0.351,477-636,0.75,111,477,588,577,52,177,146,262,2175
2,1966,LeBron James,'05-'06,CLE,875-1823,0.48,127-379,0.335,601-814,0.738,75,481,556,521,66,123,181,260,2478
3,1966,LeBron James,'06-'07,CLE,772-1621,0.476,99-310,0.319,489-701,0.698,83,443,526,470,55,125,171,250,2132
4,1966,LeBron James,'07-'08,CLE,794-1642,0.484,113-359,0.315,549-771,0.712,133,459,592,539,81,138,165,255,2250
5,1966,LeBron James,'08-'09,CLE,789-1613,0.489,132-384,0.344,594-762,0.78,106,507,613,587,93,137,139,241,2304
6,1966,LeBron James,'09-'10,CLE,768-1528,0.503,129-387,0.333,593-773,0.767,71,483,554,651,77,125,119,261,2258
7,1966,LeBron James,'10-'11,MIA,758-1485,0.51,92-279,0.33,503-663,0.759,80,510,590,554,50,124,163,284,2111
8,1966,LeBron James,'11-'12,MIA,621-1169,0.531,54-149,0.362,387-502,0.771,94,398,492,387,50,115,96,213,1683
9,1966,LeBron James,'12-'13,MIA,765-1354,0.565,103-254,0.406,403-535,0.753,97,513,610,551,67,129,110,226,2036


### Below is the SQL to get LeBron's regular season misc totals:

In [72]:
sql = """
select *

from regular_season_misc_totals

where
player_name like '%LeBron%'
"""

In [73]:
db.query(sql)

Unnamed: 0,id,player_name,season,team,DBLDBL,TRIDBL,DQ,EJECT,TECH,FLAG,AST2TO,STL2TO,RAT,SCEFF,SHEFF
0,1966,LeBron James,'03-'04,CLE,12,0,0,0,2,0,1.7,0.48,34.32,1.109,0.438
1,1966,LeBron James,'04-'05,CLE,25,4,1,0,4,0,2.2,0.68,47.09,1.292,0.504
2,1966,LeBron James,'05-'06,CLE,21,5,0,0,0,0,2.0,0.47,50.09,1.359,0.515
3,1966,LeBron James,'06-'07,CLE,16,1,1,0,2,2,1.88,0.5,44.08,1.315,0.507
4,1966,LeBron James,'07-'08,CLE,31,7,1,0,2,0,2.11,0.54,50.68,1.37,0.518
5,1966,LeBron James,'08-'09,CLE,29,7,0,0,10,0,2.44,0.57,49.97,1.428,0.53
6,1966,LeBron James,'09-'10,CLE,31,4,0,0,4,0,2.49,0.48,52.66,1.478,0.545
7,1966,LeBron James,'10-'11,MIA,31,4,0,0,7,0,1.95,0.44,46.73,1.422,0.541
8,1966,LeBron James,'11-'12,MIA,23,0,0,0,3,0,1.82,0.54,47.86,1.44,0.554
9,1966,LeBron James,'12-'13,MIA,36,4,0,0,6,1,2.44,0.57,50.3,1.504,0.603


## Sweet!  Comparing my results against LeBron's [ESPN stats page](http://espn.go.com/nba/player/stats/_/id/1966/lebron-james), it looks like my scraping worked!
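
As an extra rough sanity check, the three tables can also be checked against each other: total points divided by games played should round to the per-game average, and total assists divided by total turnovers should round to AST2TO. A sketch using three seasons copied by hand from the query outputs above; the 0.05 tolerance is my own choice to allow for rounding.

```python
# (season, GP, PTS per game, total PTS, total AST, total TO, AST2TO), copied from the query outputs
seasons = [
    ("'03-'04", 79, 20.9, 1654, 465, 273, 1.7),
    ("'04-'05", 80, 27.2, 2175, 577, 262, 2.2),
    ("'05-'06", 79, 31.4, 2478, 521, 260, 2.0),
]

mismatches = []
for season, gp, ppg, pts, ast, to, ast2to in seasons:
    # Allow a small tolerance because the published figures are rounded
    if abs(pts / gp - ppg) > 0.05 or abs(ast / to - ast2to) > 0.05:
        mismatches.append(season)
```

An empty `mismatches` list means the averages, totals and misc totals agree for those seasons.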

<a id="sqlite_tables"></a>

# Below are the sqlite table definitions for the tables I created in this example

[[back to top]](#top)

In [None]:
CREATE TABLE "player_game_stats" (
    "id" INTEGER PRIMARY KEY NOT NULL,
    "name_pos" TEXT NOT NULL,
    "team_name" TEXT NOT NULL,
    "GP" INTEGER NOT NULL,
    "GS" INTEGER NOT NULL,
    "MIN" REAL NOT NULL,
    "PPG" REAL NOT NULL,
    "OFFR" REAL NOT NULL,
    "DEFR" REAL NOT NULL,
    "RPG" REAL NOT NULL,
    "APG" REAL NOT NULL,
    "SPG" REAL NOT NULL,
    "BPG" REAL NOT NULL,
    "TPG" REAL NOT NULL,
    "FPG" REAL NOT NULL,
    "A2TO" REAL NOT NULL,
    "PER" REAL NOT NULL
);

CREATE TABLE "player_shooting_stats" (
    "id" INTEGER PRIMARY KEY NOT NULL,
    "name_pos" TEXT NOT NULL,
    "team_name" TEXT NOT NULL,
    "FGM" REAL NOT NULL,
    "FGA" REAL NOT NULL,
    "FG_Perc" REAL NOT NULL,
    "3PM" REAL NOT NULL,
    "3PA" REAL NOT NULL,
    "3P_Perc" REAL NOT NULL,
    "FTM" REAL NOT NULL,
    "FTA" REAL NOT NULL,
    "FT_Perc" REAL NOT NULL,
    "2PM" REAL NOT NULL,
    "2PA" REAL NOT NULL,
    "2P_Perc" REAL NOT NULL,
    "PPS" REAL NOT NULL,
    "AFG_Perc" REAL NOT NULL
);

CREATE TABLE "regular_season_avgs" (
    "id" INTEGER NOT NULL,
    "player_name" TEXT NOT NULL,
    "season" TEXT NOT NULL,
    "team" TEXT NOT NULL,
    "GP" INTEGER NOT NULL,
    "GS" INTEGER NOT NULL,
    "MIN" REAL NOT NULL,
    "FGM-A" TEXT NOT NULL,
    "FG_Perc" REAL NOT NULL,
    "3PM-A" TEXT NOT NULL,
    "3P_Perc" REAL NOT NULL,
    "FTM-A" TEXT NOT NULL,
    "FT_Perc" REAL NOT NULL,
    "OR" REAL NOT NULL,
    "DR" REAL NOT NULL,
    "REB" REAL NOT NULL,
    "AST" REAL NOT NULL,
    "BLK" REAL NOT NULL,
    "STL" REAL NOT NULL,
    "PF" REAL NOT NULL,
    "TO" REAL NOT NULL,
    "PTS" REAL NOT NULL,
    unique ("id", "season","team")
);

CREATE TABLE "regular_season_totals" (
    "id" INTEGER NOT NULL,
    "player_name" TEXT NOT NULL,
    "season" TEXT NOT NULL,
    "team" TEXT NOT NULL,
    "FGM-A" TEXT NOT NULL,
    "FG_Perc" REAL NOT NULL,
    "3PM-A" TEXT NOT NULL,
    "3P_Perc" REAL NOT NULL,
    "FTM-A" TEXT NOT NULL,
    "FT_Perc" REAL NOT NULL,
    "OR" INTEGER NOT NULL,
    "DR" INTEGER NOT NULL,
    "REB" INTEGER NOT NULL,
    "AST" INTEGER NOT NULL,
    "BLK" INTEGER NOT NULL,
    "STL" INTEGER NOT NULL,
    "PF" INTEGER NOT NULL,
    "TO" INTEGER NOT NULL,
    "PTS" INTEGER NOT NULL,
    unique ("id","season","team")
);

CREATE TABLE "regular_season_misc_totals" (
    "id" INTEGER NOT NULL,
    "player_name" TEXT NOT NULL,
    "season" TEXT NOT NULL,
    "team" TEXT NOT NULL,
    "DBLDBL" INTEGER NOT NULL,
    "TRIDBL" INTEGER NOT NULL,
    "DQ" INTEGER NOT NULL,
    "EJECT" INTEGER NOT NULL,
    "TECH" INTEGER NOT NULL,
    "FLAG" INTEGER NOT NULL,
    "AST2TO" REAL NOT NULL,
    "STL2TO" REAL NOT NULL,
    "RAT" REAL NOT NULL,
    "SCEFF" REAL NOT NULL,
    "SHEFF" REAL NOT NULL,
    unique("id","season","team")
);
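
These definitions need to be run once against the database before any scraping. A minimal sketch of doing that from Python with `executescript`, shown against an in-memory database with a cut-down copy of one table (only the first five columns); the duplicate insert at the end shows the kind of error the try/except in the scraping cells ends up skipping.

```python
import sqlite3

# Cut-down copy of one definition; the full statements above would be used in practice
schema = '''
CREATE TABLE "regular_season_misc_totals" (
    "id" INTEGER NOT NULL,
    "player_name" TEXT NOT NULL,
    "season" TEXT NOT NULL,
    "team" TEXT NOT NULL,
    "DBLDBL" INTEGER NOT NULL,
    unique("id","season","team")
);
'''

conn = sqlite3.connect(":memory:")  # swap in the real database path when ready
conn.executescript(schema)

row = (1966, "LeBron James", "'03-'04", "CLE", 12)
conn.execute("INSERT INTO regular_season_misc_totals VALUES (?,?,?,?,?)", row)

try:
    conn.execute("INSERT INTO regular_season_misc_totals VALUES (?,?,?,?,?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # same (id, season, team) as the first insert

conn.close()
```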