# Command Line Interface (CLI) & Data collection

## GUI? CLI?

- **Graphical User Interface (GUI)**:  
    interaction via graphical objects  
    e.g., Microsoft Windows and Apple OS X

- **Command Line Interface (CLI)**:  
    interaction via commands typed into shell  
    e.g., bash, zsh, tcsh, etc.

- The shell is usually accessed through a terminal (e.g., the Terminal in Jupyter)

- A GUI is easy for everyday use, but repetitive tasks are hard to automate with it

- A CLI is more cumbersome for everyday use, but it is scriptable

## Basic Shell Usage

- Common shell commands for interactions outside programming environment
    - Downloading files from a URL
    - Inspect, search, and replace text in files
    - Chaining commands together for sequential processing

- Shell and IPython (Jupyter notebook)
    - Reading shell command output into python variable
    - Passing python string back to shell command
    - IRS zip code data example: parsing website, extracting URL, and downloading all files

- Accessing NBA data
    - Understanding GET URL structure
    - JSON data format
    - Reading JSON data into python
    - Creating Pandas data frame

## Shell commands

### Commonly used commands for text files

- `cat`: prints content of a file
- `head`: prints first few lines of a file
- `sed`: (stream editor) edits text
- `paste`: pastes text files side-by-side
- `cut`: processes columns in delimited text file
- `find`: searches file system
- `grep`: searches text given regular expression pattern
- Many more!

### Anatomy of shell commands

Here is a simple shell command:

In [1]:
! cat --help

Usage: cat [OPTION]... [FILE]...
Concatenate FILE(s) to standard output.

With no FILE, or when FILE is -, read standard input.

  -A, --show-all           equivalent to -vET
  -b, --number-nonblank    number nonempty output lines, overrides -n
  -e                       equivalent to -vE
  -E, --show-ends          display $ at end of each line
  -n, --number             number all output lines
  -s, --squeeze-blank      suppress repeated empty output lines
  -t                       equivalent to -vT
  -T, --show-tabs          display TAB characters as ^I
  -u                       (ignored)
  -v, --show-nonprinting   use ^ and M- notation, except for LFD and TAB
      --help     display this help and exit
      --version  output version information and exit

Examples:
  cat f - g  Output f's contents, then standard input, then g's contents.
  cat        Copy standard input to standard output.

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>


1. `cat`: program name

2. `[OPTION]`: controls program behavior

3. `[FILE]`: specify file to read from or standard input
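
The same three-part anatomy shows up when a command line is split into tokens. Below is a minimal sketch using Python's standard `shlex` module. The command string and the file name `notes.txt` are made-up examples, and treating every token that starts with `-` as an option is a simplification (for `cat`, a lone `-` means standard input).

```python
import shlex

# Hypothetical command line; notes.txt is only a placeholder file name.
command = "cat -n --show-ends notes.txt"

tokens = shlex.split(command)   # split the string the way a POSIX shell would
program = tokens[0]             # 1. program name
options = [t for t in tokens[1:] if t.startswith("-")]     # 2. options
files = [t for t in tokens[1:] if not t.startswith("-")]   # 3. file arguments

print(program, options, files)
```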

### References to learn shell command line

- [Software Carpentry Lessons](https://software-carpentry.org/lessons/)

- [Unix Power Tools](https://ucsb-primo.hosted.exlibrisgroup.com/primo-explore/fulldisplay?docid=01UCSB_ALMA51295276690003776&context=L&vid=UCSB&search_scope=default_scope&tab=default_tab&lang=en_US)

- [Explain Shell](https://explainshell.com/)

# Example: Downloading Files

- File URLs are often directly visible (e.g., on GitHub)

- `wget` is a simple and effective download tool

- Example: https://github.com/fivethirtyeight/data

- The "Raw" button links to the URL of the actual file

- Take the candy ratings data: https://github.com/fivethirtyeight/data/tree/master/candy-power-ranking

- `wget` can be used to download files to the course JupyterHub

In [2]:
%%bash
wget https://raw.githubusercontent.com/fivethirtyeight/data/master/candy-power-ranking/candy-data.csv

--2019-10-07 17:22:07--  https://raw.githubusercontent.com/fivethirtyeight/data/master/candy-power-ranking/candy-data.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5205 (5.1K) [text/plain]
Saving to: ‘candy-data.csv.2’

     0K .....                                                 100% 17.1M=0s

2019-10-07 17:22:07 (17.1 MB/s) - ‘candy-data.csv.2’ saved [5205/5205]
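
For comparison, a download can also be done from Python without leaving the notebook. This is a minimal sketch using the standard library's `urlretrieve`; the helper name `download` and its `dest_dir` argument are assumptions made for this example. Unlike the `wget` run above, which saved a second copy as `candy-data.csv.2`, this version overwrites an existing file of the same name.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download(url, dest_dir="."):
    """Save the file at `url` into `dest_dir`, named after the last URL segment."""
    filename = url.rstrip("/").split("/")[-1]
    dest = Path(dest_dir) / filename
    urlretrieve(url, dest)   # fetch the URL and write its contents to dest
    return dest


# Usage (needs network access):
# download("https://raw.githubusercontent.com/fivethirtyeight/data/master/candy-power-ranking/candy-data.csv")
```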



### Example: Viewing file contents 

In [3]:
%%bash
head candy-data.csv

competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent
100 Grand,1,0,1,0,0,1,0,1,0,.73199999,.86000001,66.971725
3 Musketeers,1,0,0,0,1,0,0,1,0,.60399997,.51099998,67.602936
One dime,0,0,0,0,0,0,0,0,0,.011,.116,32.261086
One quarter,0,0,0,0,0,0,0,0,0,.011,.51099998,46.116505
Air Heads,0,1,0,0,0,0,0,0,0,.90600002,.51099998,52.341465
Almond Joy,1,0,0,1,0,0,0,1,0,.465,.76700002,50.347546
Baby Ruth,1,0,1,1,1,0,0,1,0,.60399997,.76700002,56.914547
Boston Baked Beans,0,0,0,1,0,0,0,0,1,.31299999,.51099998,23.417824
Candy Corn,0,0,0,0,0,0,0,0,1,.90600002,.32499999,38.010963


In [4]:
! head candy-data.csv ## also works

competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent
100 Grand,1,0,1,0,0,1,0,1,0,.73199999,.86000001,66.971725
3 Musketeers,1,0,0,0,1,0,0,1,0,.60399997,.51099998,67.602936
One dime,0,0,0,0,0,0,0,0,0,.011,.116,32.261086
One quarter,0,0,0,0,0,0,0,0,0,.011,.51099998,46.116505
Air Heads,0,1,0,0,0,0,0,0,0,.90600002,.51099998,52.341465
Almond Joy,1,0,0,1,0,0,0,1,0,.465,.76700002,50.347546
Baby Ruth,1,0,1,1,1,0,0,1,0,.60399997,.76700002,56.914547
Boston Baked Beans,0,0,0,1,0,0,0,0,1,.31299999,.51099998,23.417824
Candy Corn,0,0,0,0,0,0,0,0,1,.90600002,.32499999,38.010963


In [5]:
! head -n 1 candy-data.csv  ## first line is the header

competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent


In [6]:
! wc -l candy-data.csv      ## counts lines in text file

86 candy-data.csv
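
For readers more at home in Python, here is a rough sketch of what `head` and `wc -l` do. The function names `head` and `count_lines` are chosen for this example, and the sample text is just the first lines of the file copied from the output above.

```python
import io
from itertools import islice

# First lines of candy-data.csv, copied from the output above.
sample = """competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent
100 Grand,1,0,1,0,0,1,0,1,0,.73199999,.86000001,66.971725
3 Musketeers,1,0,0,0,1,0,0,1,0,.60399997,.51099998,67.602936
One dime,0,0,0,0,0,0,0,0,0,.011,.116,32.261086
"""


def head(lines, n=10):
    """Return the first n lines, like `head -n`."""
    return list(islice(lines, n))


def count_lines(lines):
    """Count the lines, like `wc -l`."""
    return sum(1 for _ in lines)


print(head(io.StringIO(sample), 1))      # header line only
print(count_lines(io.StringIO(sample)))  # number of lines in the sample
```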


In [19]:
! cut -d',' -f1,3 candy-data.csv    ## prints columns of delimited text

competitorname,fruity
100 Grand,0
3 Musketeers,0
One dime,0
One quarter,0
Air Heads,1
Almond Joy,0
Baby Ruth,0
Boston Baked Beans,0
Candy Corn,0
Caramel Apple Pops,1
Charleston Chew,0
Chewey Lemonhead Fruit Mix,1
Chiclets,1
Dots,1
Dum Dums,1
Fruit Chews,1
Fun Dip,1
Gobstopper,1
Haribo Gold Bears,1
Haribo Happy Cola,0
Haribo Sour Bears,1
Haribo Twin Snakes,1
HersheyÕs Kisses,0
HersheyÕs Krackel,0
HersheyÕs Milk Chocolate,0
HersheyÕs Special Dark,0
Jawbusters,1
Junior Mints,0
Kit Kat,0
Laffy Taffy,1
Lemonhead,1
Lifesavers big ring gummies,1
Peanut butter M&MÕs,0
M&MÕs,0
Mike & Ike,1
Milk Duds,0
Milky Way,0
Milky Way Midnight,0
Milky Way Simply Caramel,0
Mounds,0
Mr Good Bar,0
Nerds,1
Nestle Butterfinger,0
Nestle Crunch,0
Nik L Nip,1
Now & Later,1
Payday,0
Peanut M&Ms,0
Pixie Sticks,0
Pop Rocks,1
Red vines,1
ReeseÕs Miniatures,0
ReeseÕs Peanut Butter cup,0
ReeseÕs pieces,0
ReeseÕs stuffed with pieces,0
Ring pop,1
Rolo,0
Root Beer B

In [8]:
! grep 'Tootsie' candy-data.csv      ## finds lines with pattern (regular expression)

Tootsie Pop,1,1,0,0,0,0,1,0,0,.60399997,.32499999,48.982651
Tootsie Roll Juniors,1,0,0,0,0,0,0,0,0,.31299999,.51099998,43.068897
Tootsie Roll Midgies,1,0,0,0,0,0,0,0,1,.17399999,.011,45.736748
Tootsie Roll Snack Bars,1,0,0,0,0,0,0,1,0,.465,.32499999,49.653503
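
The column selection and pattern search above can be imitated in a few lines of Python. This is only a sketch: the helper names `cut` and `grep` are chosen to mirror the shell tools, and the sample rows are taken from the file but trimmed to the first four columns. One difference worth noting is that `csv.reader` understands quoted fields, while the shell `cut` splits on every delimiter.

```python
import csv
import io
import re

# Rows from candy-data.csv, trimmed to the first four columns for brevity.
sample = """competitorname,chocolate,fruity,caramel
100 Grand,1,0,1
Air Heads,0,1,0
Tootsie Pop,1,1,0
Tootsie Roll Juniors,1,0,0
"""


def cut(text, fields, delimiter=","):
    """Keep the given 1-based columns, like `cut -d',' -f1,3`."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [[row[i - 1] for i in fields] for row in reader]


def grep(text, pattern):
    """Keep the lines that match a regular expression, like `grep`."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


print(cut(sample, [1, 3]))
print(grep(sample, "Tootsie"))
```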


### Chaining commands together

- Commands can be chained together using "pipes"

- Many shell commands send their output to "stdout" (standard output), which by default prints to the screen

- A pipe (`|`) feeds the "stdout" of one command into the "stdin" (standard input) of the next

- Hence, we can build commands such as the following

In [9]:
! head -n1 candy-data.csv

competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent


In [10]:
! head -n1 candy-data.csv | sed 's/,/\n/g'

competitorname
chocolate
fruity
caramel
peanutyalmondy
nougat
crispedricewafer
hard
bar
pluribus
sugarpercent
pricepercent
winpercent


In [11]:
! head -n1 candy-data.csv | sed 's/,/\n/g' | sed 's/chocolate/CHOCOLATE/g'

competitorname
CHOCOLATE
fruity
caramel
peanutyalmondy
nougat
crispedricewafer
hard
bar
pluribus
sugarpercent
pricepercent
winpercent
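
One way to picture a pipe is as a chain of Python generators, where each stage consumes the lines produced by the previous one. The sketch below mimics the two `sed` stages. It uses plain string replacement rather than regular expressions, the function name `sed` is borrowed only for illustration, and the header is trimmed to four columns.

```python
def sed(lines, old, new):
    """Replace text in each incoming line, like `sed 's/old/new/g'`."""
    for line in lines:
        # A replacement containing "\n" turns one line into several.
        yield from line.replace(old, new).split("\n")


# Stands in for the output of `head -n1 candy-data.csv` (trimmed).
header = ["competitorname,chocolate,fruity,caramel"]

# head -n1 ... | sed 's/,/\n/g' | sed 's/chocolate/CHOCOLATE/g'
stage1 = sed(header, ",", "\n")
stage2 = sed(stage1, "chocolate", "CHOCOLATE")

result = list(stage2)   # nothing runs until the last stage is consumed
print(result)
```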


### Example: Text file download, search, and manipulation

Commands like `grep`, `sed` and `awk` enable on-the-fly text processing.

In [12]:
%%bash

wget -q -O - https://www.irs.gov/statistics/soi-tax-stats-individual-income-tax-statistics-zip-code-data-soi \
#     | grep 'zipcode.zip' \
#     | sed 's/<a data/\n<a data/g' \
#     | grep -Po '(?<=href=")[^"]*(?=")'

<!DOCTYPE html>
<html  lang="en" dir="ltr" prefix="content: http://purl.org/rss/1.0/modules/content/  dc: http://purl.org/dc/terms/  foaf: http://xmlns.com/foaf/0.1/  og: http://ogp.me/ns#  rdfs: http://www.w3.org/2000/01/rdf-schema#  schema: http://schema.org/  sioc: http://rdfs.org/sioc/ns#  sioct: http://rdfs.org/sioc/types#  skos: http://www.w3.org/2004/02/skos/core#  xsd: http://www.w3.org/2001/XMLSchema# ">
  <head>
    <meta charset="utf-8" /><script type="text/javascript">window.NREUM||(NREUM={}),__nr_require=function(e,n,t){function r(t){if(!n[t]){var o=n[t]={exports:{}};e[t][0].call(o.exports,function(n){var o=e[t][1][n];return r(o||n)},o,o.exports)}return n[t].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<t.length;o++)r(t[o]);return r}({1:[function(e,n,t){function r(){}function o(e,n,t){return function(){return i(e,[c.now()].concat(u(arguments)),n?null:this,t),n?void 0:this}}var i=e("handle"),a=e(3),u=e(4),f=e("ee").get("tracer"),c=e("loader"),

## Shell and Jupyter

- The shell and Jupyter can be used together: shell output can be stored in Python variables, and Python variables can be passed back to shell commands.

- Grab a webpage,

- Extract all links,

- Filter file links that end with `zipcode.zip`, and

- Download all such files

In [13]:
files = !wget -q -O - https://www.irs.gov/statistics/soi-tax-stats-individual-income-tax-statistics-zip-code-data-soi | grep 'zipcode.zip' | sed 's/<a data/\n<a data/g' | grep -Po '(?<=href=")[^"]*(?=")'
files

['https://www.irs.gov/pub/irs-soi/1998zipcode.zip',
 'https://www.irs.gov/pub/irs-soi/2001zipcode.zip',
 'https://www.irs.gov/pub/irs-soi/2002zipcode.zip',
 'https://www.irs.gov/pub/irs-soi/2004zipcode.zip',
 'https://www.irs.gov/pub/irs-soi/2005zipcode.zip',
 'https://www.irs.gov/pub/irs-soi/2006zipcode.zip',
 'https://www.irs.gov/pub/irs-soi/2007zipcode.zip',
 'https://www.irs.gov/pub/irs-soi/2008zipcode.zip',
 'https://www.irs.gov/pub/irs-soi/2009zipcode.zip',
 'https://www.irs.gov/pub/irs-soi/2010zipcode.zip']
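
The link extraction step can also be written in Python, since the `re` module accepts the same lookbehind/lookahead pattern used with `grep -Po`. The HTML fragment below is made up for illustration (including the `data-entity-type` attribute); the real page markup may differ.

```python
import re

# Made-up HTML fragment imitating the link markup on the page.
html = (
    '<a data-entity-type="file" href="https://www.irs.gov/pub/irs-soi/1998zipcode.zip">1998</a>'
    '<a data-entity-type="file" href="https://www.irs.gov/pub/irs-soi/2001zipcode.zip">2001</a>'
    '<a href="https://www.irs.gov/help">Help</a>'
)

# Same pattern as the grep -Po call: the text between href=" and the next ".
links = re.findall(r'(?<=href=")[^"]*(?=")', html)

# Keep only the links that end with zipcode.zip.
zip_links = [link for link in links if link.endswith("zipcode.zip")]
print(zip_links)
```

The shell pipeline filters lines before extracting links, while this sketch extracts first and filters afterwards; on this kind of input the result is the same.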

### Python variables into shell

In [14]:
for f in files[:3]:
    ! wget -nc {f}        ## pass python variables into shell!

File ‘1998zipcode.zip’ already there; not retrieving.

File ‘2001zipcode.zip’ already there; not retrieving.

File ‘2002zipcode.zip’ already there; not retrieving.
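
The `{f}` syntax is an IPython feature: the value of the Python variable is substituted into the command line before the shell runs it. As a rough illustration, the sketch below builds the equivalent command strings with the standard `shlex` module. It only prints the commands; the commented `subprocess.run` line shows one way to actually run them from plain Python, assuming `wget` is installed.

```python
import shlex

files = [
    "https://www.irs.gov/pub/irs-soi/1998zipcode.zip",
    "https://www.irs.gov/pub/irs-soi/2001zipcode.zip",
]

# Build the command lines that `! wget -nc {f}` would run, one per URL.
commands = [shlex.join(["wget", "-nc", f]) for f in files]
for cmd in commands:
    print(cmd)

# Outside IPython, the argument list can be passed to subprocess directly:
# import subprocess
# subprocess.run(["wget", "-nc", files[0]], check=True)
```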



## Deciphering the NBA stats API

![](https://cdn.nba.net/nba-drupal-prod/styles/landscape_1045w/s3/2017-07/NBA%20Secondary%20Logo.jpg)

- The NBA provides a nice statistics website: [http://stats.nba.com](http://stats.nba.com)

- For example, to find the shooting records for Stephen Curry, you navigate the site's menus to reach this page:

> [http://stats.nba.com/player/201939/shooting/?Season=2017-18&SeasonType=Regular%20Season](http://stats.nba.com/player/201939/shooting/?Season=2017-18&SeasonType=Regular%20Season)

Here, our choices show up as parameters in the URL:
- Season: 2017-18
- SeasonType: Regular Season ([%20 is character code for space](https://en.wikipedia.org/wiki/Percent-encoding#Character_data))
- Player: 201939 (less obvious, since it is part of the URL path rather than a named parameter)
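
These pieces can be pulled apart programmatically. A minimal sketch with the standard library's `urllib.parse`, applied to the URL shown above:

```python
from urllib.parse import parse_qs, urlparse

url = "http://stats.nba.com/player/201939/shooting/?Season=2017-18&SeasonType=Regular%20Season"

parts = urlparse(url)
query = parse_qs(parts.query)   # decodes %20 back into a space

print(parts.netloc)   # host name
print(parts.path)     # the player ID is part of the path
print(query)          # named parameters, each mapped to a list of values
```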

### GET method

- This URL uses the [GET method](https://www.w3schools.com/tags/ref_httpmethods.asp)

- The GET method passes parameters in the URL

- Long URLs usually pass a series of variables and values to the target page

- Sometimes cryptic: [https://www.google.com/maps/place/M+Special+Brewing+Company/@34.4302877,-119.8723167,15z/data=!4m5!3m4!1s0x80e940babfb897db:0x261e47c5399139d!8m2!3d34.4327838!4d-119.8685351](https://www.google.com/maps/place/M+Special+Brewing+Company/@34.4302877,-119.8723167,15z/data=!4m5!3m4!1s0x80e940babfb897db:0x261e47c5399139d!8m2!3d34.4327838!4d-119.8685351)

- Tools such as an [online URL parser](https://www.freeformatter.com/url-parser-query-string-splitter.html) can decipher common formats

- Try passing the URL above to the parser.

Knowledge of how web sites work is useful for data science since there is so much interaction through the web.

### Example: Collect all player information

- The NBA doesn't officially publish its API (application programming interface); however,

- The community has reverse engineered it: e.g., https://github.com/swar/nba_api

- Scraping with `wget` is easy

In [1]:
useragent = "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9\""
playerurl = "\"http://stats.nba.com/stats/commonallplayers?LeagueID=00&Season=2017-18&IsOnlyCurrentSeason=1\""

# json_str = !wget -q -O - --user-agent={useragent} {playerurl}               # might not work from Google Cloud

# !wget -q -O data/commonallplayers.json --user-agent={useragent} {playerurl} # download from another computer
json_str = !cat data/commonallplayers.json   # saved from earlier

- `playerurl`: URL to download data from

- `useragent`: string that imitates a browser. Websites can return browser-dependent content

- The NBA blocks programmatic scraping by plain `wget`; however,

- Specifying a user agent string makes `wget` pretend that we are using a Mozilla-type browser on OS X
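
The same idea carries over to plain Python: the user agent is just an HTTP request header. Below is a minimal sketch with the standard library's `urllib.request`. The helper names `build_request` and `fetch_text` are assumptions made for this example, and there is no guarantee the site accepts the request; it may check other headers as well.

```python
from urllib.request import Request, urlopen

USERAGENT = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 "
             "(KHTML, like Gecko) Version/9.0.2 Safari/601.3.9")
PLAYERURL = "http://stats.nba.com/stats/commonallplayers?LeagueID=00&Season=2017-18&IsOnlyCurrentSeason=1"


def build_request(url, useragent=USERAGENT):
    """Attach a browser-like User-Agent header to a GET request."""
    return Request(url, headers={"User-Agent": useragent})


def fetch_text(url, useragent=USERAGENT, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urlopen(build_request(url, useragent), timeout=timeout) as response:
        return response.read().decode("utf-8")


req = build_request(PLAYERURL)
print(req.get_method())                # GET, since no data is attached
print(req.get_header("User-agent"))    # urllib stores header names capitalized like this
```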

### JavaScript Object Notation (JSON) format

- One of the most widely used data format standards

- Usually a plain text file with Python dictionary-like formatting:  
    `{"key":"value"}`

- Can be nested:  
    `{"key0":{"key1":"value1", "key2":"value2"}}`

- In fact, Jupyter notebooks are stored in JSON format.

In [16]:
! head 03-Command-Line-and-Data-collection.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [


### Parsing JSON

- Raw JSON arrives as a string

- It needs to be parsed into a Python dictionary: i.e., keys and values.

- Parse the `json_str` output with the `json` module (`!` returns a list of output lines, so the JSON text is `json_str[0]`)
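
Before looking at the real data, here is a small self-contained sketch of the round trip between JSON text and Python objects. The keys and values are toy examples, not part of the NBA data.

```python
import json

text = '{"key0": {"key1": "value1", "key2": "value2"}, "numbers": [1, 2, 3]}'

obj = json.loads(text)             # JSON text -> Python dict
print(obj["key0"]["key1"])         # nested lookup
print(type(obj["numbers"]))        # JSON arrays become Python lists

round_trip = json.dumps(obj)       # Python object -> JSON text
print(json.loads(round_trip) == obj)
```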

In [17]:
json_str

['{"resource":"commonallplayers","parameters":{"LeagueID":"00","Season":"2017-18","IsOnlyCurrentSeason":1},"resultSets":[{"name":"CommonAllPlayers","headers":["PERSON_ID","DISPLAY_LAST_COMMA_FIRST","DISPLAY_FIRST_LAST","ROSTERSTATUS","FROM_YEAR","TO_YEAR","PLAYERCODE","TEAM_ID","TEAM_CITY","TEAM_NAME","TEAM_ABBREVIATION","TEAM_CODE","GAMES_PLAYED_FLAG","OTHERLEAGUE_EXPERIENCE_CH"],"rowSet":[[203518,"Abrines, Alex","Alex Abrines",1,"2016","2018","alex_abrines",1610612760,"Oklahoma City","Thunder","OKC","thunder","Y","00"],[203112,"Acy, Quincy","Quincy Acy",1,"2012","2018","quincy_acy",1610612751,"Brooklyn","Nets","BKN","nets","Y","01"],[201167,"Afflalo, Arron","Arron Afflalo",1,"2007","2017","arron_afflalo",1610612753,"Orlando","Magic","ORL","magic","Y","00"],[201582,"Ajinca, Alexis","Alexis Ajinca",1,"2008","2017","alexis_ajinca",1610612740,"New Orleans","Pelicans","NOP","pelicans","Y","01"],[202332,"Aldrich, Cole","Cole Aldrich",1,"2010","2017","cole_aldrich",1610612750,"Minnesota","T

In [18]:
import json
data = json.loads(json_str[0])
data

{'resource': 'commonallplayers',
 'parameters': {'LeagueID': '00',
  'Season': '2017-18',
  'IsOnlyCurrentSeason': 1},
 'resultSets': [{'name': 'CommonAllPlayers',
   'headers': ['PERSON_ID',
    'DISPLAY_LAST_COMMA_FIRST',
    'DISPLAY_FIRST_LAST',
    'ROSTERSTATUS',
    'FROM_YEAR',
    'TO_YEAR',
    'PLAYERCODE',
    'TEAM_ID',
    'TEAM_CITY',
    'TEAM_NAME',
    'TEAM_ABBREVIATION',
    'TEAM_CODE',
    'GAMES_PLAYED_FLAG',
    'OTHERLEAGUE_EXPERIENCE_CH'],
   'rowSet': [[203518,
     'Abrines, Alex',
     'Alex Abrines',
     1,
     '2016',
     '2018',
     'alex_abrines',
     1610612760,
     'Oklahoma City',
     'Thunder',
     'OKC',
     'thunder',
     'Y',
     '00'],
    [203112,
     'Acy, Quincy',
     'Quincy Acy',
     1,
     '2012',
     '2018',
     'quincy_acy',
     1610612751,
     'Brooklyn',
     'Nets',
     'BKN',
     'nets',
     'Y',
     '01'],
    [201167,
     'Afflalo, Arron',
     'Arron Afflalo',
     1,
     '2007',
     '2017',
     'arron_a

In [19]:
data.keys() ## we specified 'resource' and 'parameters' 

dict_keys(['resource', 'parameters', 'resultSets'])

In [20]:
data['resultSets'][0].keys() ## 'resultSets' contains the returned results

dict_keys(['name', 'headers', 'rowSet'])

In [21]:
data['resultSets'][0]

{'name': 'CommonAllPlayers',
 'headers': ['PERSON_ID',
  'DISPLAY_LAST_COMMA_FIRST',
  'DISPLAY_FIRST_LAST',
  'ROSTERSTATUS',
  'FROM_YEAR',
  'TO_YEAR',
  'PLAYERCODE',
  'TEAM_ID',
  'TEAM_CITY',
  'TEAM_NAME',
  'TEAM_ABBREVIATION',
  'TEAM_CODE',
  'GAMES_PLAYED_FLAG',
  'OTHERLEAGUE_EXPERIENCE_CH'],
 'rowSet': [[203518,
   'Abrines, Alex',
   'Alex Abrines',
   1,
   '2016',
   '2018',
   'alex_abrines',
   1610612760,
   'Oklahoma City',
   'Thunder',
   'OKC',
   'thunder',
   'Y',
   '00'],
  [203112,
   'Acy, Quincy',
   'Quincy Acy',
   1,
   '2012',
   '2018',
   'quincy_acy',
   1610612751,
   'Brooklyn',
   'Nets',
   'BKN',
   'nets',
   'Y',
   '01'],
  [201167,
   'Afflalo, Arron',
   'Arron Afflalo',
   1,
   '2007',
   '2017',
   'arron_afflalo',
   1610612753,
   'Orlando',
   'Magic',
   'ORL',
   'magic',
   'Y',
   '00'],
  [201582,
   'Ajinca, Alexis',
   'Alexis Ajinca',
   1,
   '2008',
   '2017',
   'alexis_ajinca',
   1610612740,
   'New Orleans',
   'Pelica
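
Each entry of `rowSet` lines up position by position with `headers`. A small sketch of pairing them into one dictionary per player, using two rows from the output above trimmed to three columns:

```python
# Trimmed copy of the structure shown above: three of the columns, two of the rows.
headers = ["PERSON_ID", "DISPLAY_FIRST_LAST", "TEAM_ABBREVIATION"]
row_set = [
    [203518, "Alex Abrines", "OKC"],
    [203112, "Quincy Acy", "BKN"],
]

# zip pairs each column name with the value at the same position in the row.
records = [dict(zip(headers, row)) for row in row_set]

for record in records:
    print(record["DISPLAY_FIRST_LAST"], record["TEAM_ABBREVIATION"])
```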

### Importing data into Pandas

In [22]:
import pandas as pd

h = data['resultSets'][0]['headers']
d = data['resultSets'][0]['rowSet']
players = pd.DataFrame(d, columns=h)
players.head()

| | PERSON_ID | DISPLAY_LAST_COMMA_FIRST | DISPLAY_FIRST_LAST | ROSTERSTATUS | FROM_YEAR | TO_YEAR | PLAYERCODE | TEAM_ID | TEAM_CITY | TEAM_NAME | TEAM_ABBREVIATION | TEAM_CODE | GAMES_PLAYED_FLAG | OTHERLEAGUE_EXPERIENCE_CH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 203518 | Abrines, Alex | Alex Abrines | 1 | 2016 | 2018 | alex_abrines | 1610612760 | Oklahoma City | Thunder | OKC | thunder | Y | 0 |
| 1 | 203112 | Acy, Quincy | Quincy Acy | 1 | 2012 | 2018 | quincy_acy | 1610612751 | Brooklyn | Nets | BKN | nets | Y | 1 |
| 2 | 201167 | Afflalo, Arron | Arron Afflalo | 1 | 2007 | 2017 | arron_afflalo | 1610612753 | Orlando | Magic | ORL | magic | Y | 0 |
| 3 | 201582 | Ajinca, Alexis | Alexis Ajinca | 1 | 2008 | 2017 | alexis_ajinca | 1610612740 | New Orleans | Pelicans | NOP | pelicans | Y | 1 |
| 4 | 202332 | Aldrich, Cole | Cole Aldrich | 1 | 2010 | 2017 | cole_aldrich | 1610612750 | Minnesota | Timberwolves | MIN | timberwolves | Y | 1 |


- What other data can we download using these types of URLs? See the [community documentation](https://github.com/seemethere/nba_py/wiki/stats.nba.com-Endpoint-Documentation).

### Analyzing Shot Data

- Let's analyze [shot chart data](https://github.com/seemethere/nba_py/wiki/stats.nba.com-Endpoint-Documentation#shotchartdetail)

- Test with a browser: the site kindly tells us [which parameters are required if none are passed](http://stats.nba.com/stats/shotchartdetail)

- First, download [team data](https://github.com/seemethere/nba_py/wiki/stats.nba.com-Endpoint-Documentation#commonteamyears)

In [23]:
from urllib.parse import urlencode      ## urlencode builds parameter string for us
from urllib.request import urlretrieve

params = {'LeagueID':'00'}
teamurl = 'http://stats.nba.com/stats/commonTeamYears?' + urlencode(params)
# !wget -q -O - --user-agent={useragent} {teamurl}  # if NBA doesn't cooperate
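
A closer look at what `urlencode` produces, as a small sketch. The parameter values are taken from the earlier examples. By default spaces are encoded as `+`; passing `quote_via=quote` yields the `%20` form seen in the browser URL. Both are common ways of encoding a space in a query string.

```python
from urllib.parse import quote, urlencode

params = {"LeagueID": "00", "Season": "2017-18", "SeasonType": "Regular Season"}

plus_style = urlencode(params)                       # spaces become '+'
percent_style = urlencode(params, quote_via=quote)   # spaces become '%20'

print(plus_style)
print(percent_style)
```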

### Scraping Function

Now that we know what a general request looks like, we can create a function to make our requests simpler.

The function will do the following:
1. Set User Agent
1. Set base URL with appropriate end point
1. Set parameters required for query
1. Read JSON string into python variable
1. Parse JSON string into python object
1. Convert the object into a pandas data frame

In [24]:
def get_nba_data(endpt, params, return_url=False):
    """Download one stats.nba.com endpoint and return it as a pandas data frame.

    endpt: endpoint name; see https://github.com/seemethere/nba_py/wiki/stats.nba.com-Endpoint-Documentation
    params: dictionary of parameters: i.e., {'LeagueID':'00'}
    return_url: if True, return the quoted request URL instead of downloading
    """
    from pandas import DataFrame
    from urllib.parse import urlencode
    import json

    # extra quotes so that the shell treats each string as a single argument
    useragent = "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9\""

    dataurl = "\"" + "http://stats.nba.com/stats/" + endpt + "?" + urlencode(params) + "\""

    # for debugging: just return the url
    if return_url:
        return dataurl

    jsonstr = !wget -q -O - --user-agent={useragent} {dataurl} ## Note: ! doesn't work in plain Python

    data = json.loads(jsonstr[0])   # first output line holds the JSON text

    h = data['resultSets'][0]['headers']   # column names
    d = data['resultSets'][0]['rowSet']    # one list per row

    return DataFrame(d, columns=h)
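
Because `!` only works inside IPython, a plain-Python variant may be handy for scripts. The sketch below is one possible way to do it with the standard library, not the notebook's own method. The helper names `build_url`, `parse_result_set`, and `get_nba_rows` are assumptions made for this example, and the offline check at the end uses a tiny hand-written payload in the same shape as the real response.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://stats.nba.com/stats/"
USERAGENT = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 "
             "(KHTML, like Gecko) Version/9.0.2 Safari/601.3.9")


def build_url(endpt, params):
    """Combine base URL, endpoint, and encoded parameters."""
    return BASE_URL + endpt + "?" + urlencode(params)


def parse_result_set(json_text, index=0):
    """Parse the JSON text and return (headers, rows) of one result set."""
    data = json.loads(json_text)
    result = data["resultSets"][index]
    return result["headers"], result["rowSet"]


def get_nba_rows(endpt, params, timeout=30):
    """Download an endpoint and return (headers, rows); needs network access."""
    request = Request(build_url(endpt, params), headers={"User-Agent": USERAGENT})
    with urlopen(request, timeout=timeout) as response:
        json_text = response.read().decode("utf-8")
    return parse_result_set(json_text)


# Offline check with a hand-written payload shaped like the real response.
sample = ('{"resultSets": [{"name": "CommonAllPlayers", '
          '"headers": ["PERSON_ID", "DISPLAY_FIRST_LAST"], '
          '"rowSet": [[203518, "Alex Abrines"]]}]}')
headers, rows = parse_result_set(sample)
print(build_url("commonTeamYears", {"LeagueID": "00"}))
print(headers, rows)
```

The returned pair can be turned into a data frame with `pd.DataFrame(rows, columns=headers)`.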

### Testing the Scraping Function: Team data

To see what URL string is returned, set `return_url=True`.

In [25]:
params = {'LeagueID':'00'}
get_nba_data('commonTeamYears', params, return_url=True)

'"http://stats.nba.com/stats/commonTeamYears?LeagueID=00"'

The function can also return a pandas data frame:

In [26]:
params = {'LeagueID':'00'}

# teamdata = get_nba_data('commonTeamYears', params)    # might not work from Google cloud
# teamdata.to_pickle('data/commonTeamYears.pkl')

teamdata = pd.read_pickle('data/commonTeamYears.pkl')   # saved from earlier
teamdata.head()

| | LEAGUE_ID | TEAM_ID | MIN_YEAR | MAX_YEAR | ABBREVIATION |
|---|---|---|---|---|---|
| 0 | 0 | 1610612739 | 1970 | 2019 | CLE |
| 1 | 0 | 1610612737 | 1949 | 2019 | ATL |
| 2 | 0 | 1610612738 | 1946 | 2019 | BOS |
| 3 | 0 | 1610612740 | 2002 | 2019 | NOP |
| 4 | 0 | 1610612741 | 1966 | 2019 | CHI |


### Testing the Scraping Function: Player data

- Endpoint is here: https://github.com/seemethere/nba_py/wiki/stats.nba.com-Endpoint-Documentation#commonallplayers

In [27]:
params = {'LeagueID':'00', 'Season': '2017-18', 'IsOnlyCurrentSeason': '0'}

# plyrdata = get_nba_data('commonallplayers', params) # if NBA doesn't cooperate
# plyrdata.to_pickle('data/commonallplayers.pkl')

plyrdata = pd.read_pickle('data/commonallplayers.pkl')     # saved from earlier
plyrdata.head()

| | PERSON_ID | DISPLAY_LAST_COMMA_FIRST | DISPLAY_FIRST_LAST | ROSTERSTATUS | FROM_YEAR | TO_YEAR | PLAYERCODE | TEAM_ID | TEAM_CITY | TEAM_NAME | TEAM_ABBREVIATION | TEAM_CODE | GAMES_PLAYED_FLAG | OTHERLEAGUE_EXPERIENCE_CH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 76001 | Abdelnaby, Alaa | Alaa Abdelnaby | 0 | 1990 | 1994 | HISTADD_alaa_abdelnaby | 0 | | | | | Y | 0 |
| 1 | 76002 | Abdul-Aziz, Zaid | Zaid Abdul-Aziz | 0 | 1968 | 1977 | HISTADD_zaid_abdul-aziz | 0 | | | | | Y | 0 |
| 2 | 76003 | Abdul-Jabbar, Kareem | Kareem Abdul-Jabbar | 0 | 1969 | 1988 | HISTADD_kareem_abdul-jabbar | 0 | | | | | Y | 0 |
| 3 | 51 | Abdul-Rauf, Mahmoud | Mahmoud Abdul-Rauf | 0 | 1990 | 2000 | mahmoud_abdul-rauf | 0 | | | | | Y | 0 |
| 4 | 1505 | Abdul-Wahad, Tariq | Tariq Abdul-Wahad | 0 | 1997 | 2003 | tariq_abdul-wahad | 0 | | | | | Y | 0 |


### Testing the Scraping Function: Shotchart data

In [28]:
params = {'PlayerID':'201935',
          'PlayerPosition':'',
          'Season':'2017-18',
          'ContextMeasure':'FGA',
          'DateFrom':'',
          'DateTo':'',
          'GameID':'',
          'GameSegment':'',
          'LastNGames':'0',
          'LeagueID':'00',
          'Location':'',
          'Month':'0',
          'OpponentTeamID':'0',
          'Outcome':'',
          'Period':'0',
          'Position':'',
          'RookieYear':'',
          'SeasonSegment':'',
          'SeasonType':'Regular Season',
          'TeamID':'0',
          'VsConference':'',
          'VsDivision':''}

# shotdata = get_nba_data('shotchartdetail', params) # if NBA doesn't cooperate
# shotdata.to_pickle('data/shotchartdetail.pkl')

shotdata = pd.read_pickle('data/shotchartdetail.pkl')     # saved from earlier
shotdata.head()

| | GRID_TYPE | GAME_ID | GAME_EVENT_ID | PLAYER_ID | PLAYER_NAME | TEAM_ID | TEAM_NAME | PERIOD | MINUTES_REMAINING | SECONDS_REMAINING | ... | SHOT_ZONE_AREA | SHOT_ZONE_RANGE | SHOT_DISTANCE | LOC_X | LOC_Y | SHOT_ATTEMPTED_FLAG | SHOT_MADE_FLAG | GAME_DATE | HTM | VTM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Shot Chart Detail | 21700002 | 7 | 201935 | James Harden | 1610612745 | Houston Rockets | 1 | 11 | 47 | ... | Center(C) | Less Than 8 ft. | 1 | -10 | 16 | 1 | 1 | 20171017 | GSW | HOU |
| 1 | Shot Chart Detail | 21700002 | 10 | 201935 | James Harden | 1610612745 | Houston Rockets | 1 | 11 | 13 | ... | Center(C) | 8-16 ft. | 10 | 46 | 94 | 1 | 0 | 20171017 | GSW | HOU |
| 2 | Shot Chart Detail | 21700002 | 28 | 201935 | James Harden | 1610612745 | Houston Rockets | 1 | 9 | 51 | ... | Center(C) | 24+ ft. | 25 | -52 | 245 | 1 | 0 | 20171017 | GSW | HOU |
| 3 | Shot Chart Detail | 21700002 | 36 | 201935 | James Harden | 1610612745 | Houston Rockets | 1 | 9 | 33 | ... | Center(C) | Less Than 8 ft. | 6 | -10 | 61 | 1 | 0 | 20171017 | GSW | HOU |
| 4 | Shot Chart Detail | 21700002 | 80 | 201935 | James Harden | 1610612745 | Houston Rockets | 1 | 6 | 43 | ... | Center(C) | Less Than 8 ft. | 1 | 13 | 15 | 1 | 0 | 20171017 | GSW | HOU |


Finally, we can get the shot chart detail.

![](images/nba-dance.gif)