# Pandas Data Frame

- The pandas `DataFrame` is similar to `data.frame` in R

- Conceptually similar, but usage differs

- Introducing Pandas using basketball data

Data from the NBA can be obtained with the function developed previously.

In [93]:
import pandas as pd
import numpy as np

def get_nba_data(endpt, params, return_url=False):

    ## endpt: endpoint name; see https://github.com/seemethere/nba_py/wiki/stats.nba.com-Endpoint-Documentation
    ## params: dictionary of query parameters, e.g., {'LeagueID':'00'}
    
    from pandas import DataFrame
    from urllib.parse import urlencode
    import json
    
    useragent = "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9\""
    dataurl = "\"" + "http://stats.nba.com/stats/" + endpt + "?" + urlencode(params) + "\""
    
    # for debugging: just return the url
    if return_url:
        return(dataurl)
    
    # IPython shell escape: runs wget and captures its output as a list of lines
    jsonstr = !wget -q -O - --user-agent={useragent} {dataurl}
    
    data = json.loads(jsonstr[0])
    
    h = data['resultSets'][0]['headers']
    d = data['resultSets'][0]['rowSet']
    
    return(DataFrame(d, columns=h))
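
The function above relies on the IPython `!wget` shell escape, so it only runs inside a notebook. As a rough sketch of its two pure-Python steps (building the query string and turning one `resultSets` entry into a data frame), the following uses a small made-up payload; `sample_json` and `result_set_to_frame` are hypothetical names introduced here, not part of the original function.

```python
import json
from urllib.parse import urlencode

import pandas as pd

# Hypothetical payload shaped like the 'resultSets' structure the function reads
sample_json = '''
{"resultSets": [{"headers": ["TEAM_ID", "ABBREVIATION"],
                 "rowSet": [[1610612737, "ATL"], [1610612738, "BOS"]]}]}
'''

def result_set_to_frame(jsonstr, which=0):
    """Convert one entry of 'resultSets' into a DataFrame (illustrative helper)."""
    data = json.loads(jsonstr)
    result = data['resultSets'][which]
    return pd.DataFrame(result['rowSet'], columns=result['headers'])

# urlencode turns the parameter dictionary into a query string
query = urlencode({'LeagueID': '00', 'Season': '2016-17'})
print(query)   # LeagueID=00&Season=2016-17

sample = result_set_to_frame(sample_json)
print(sample)
```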

Using the function, collect team and player data:

In [94]:
## get all teams
params = {'LeagueID':'00'}
# teams = get_nba_data('commonTeamYears', params)      # NBA doesn't cooperate
teams = pd.read_pickle('data/commonTeamYears.pkl')     # saved from earlier

## get all players
params = {'LeagueID':'00', 'Season': '2016-17', 'IsOnlyCurrentSeason': '0'}
# players = get_nba_data('commonallplayers', params)   # NBA doesn't cooperate 
players = pd.read_pickle('data/commonallplayers.pkl')  # saved from earlier

# Pandas

Pandas has an extensive set of functions. Refer to [Chapter 3 in PDSH](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html) and the [official website](https://pandas.pydata.org). Latest stable release documentation is here: [http://pandas.pydata.org/pandas-docs/stable/api.html](http://pandas.pydata.org/pandas-docs/stable/api.html).

## Pandas Series and Data Frames

A data frame and its rows and columns are different types of objects:

In [95]:
print("data frame object :", type(teams))
print("data row object   :", type(teams.iloc[0]))
print("data column object:", type(teams.ABBREVIATION))

data frame object : <class 'pandas.core.frame.DataFrame'>
data row object   : <class 'pandas.core.series.Series'>
data column object: <class 'pandas.core.series.Series'>


- Rows and columns of pandas data frame are `Series` objects

- In R, rows would be a smaller data frame

- Methods (functions) for Series and DataFrames are not identical

- Pandas general functions: http://pandas.pydata.org/pandas-docs/stable/api.html#general-functions   
    e.g., [`pandas.melt()`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html#pandas-melt) takes a `DataFrame` as input. 

- Series methods: http://pandas.pydata.org/pandas-docs/stable/api.html#series

- DataFrame methods: http://pandas.pydata.org/pandas-docs/stable/api.html#dataframe

- Index methods: http://pandas.pydata.org/pandas-docs/version/0.24/reference/api/pandas.Index.html#pandas.Index
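
To make the points above concrete, here is a minimal sketch on a made-up two-row frame (not the NBA data). It shows that a single row comes back as a `Series`, that passing a list of positions keeps the result a `DataFrame` (closer to what R does), and that the two classes do not share every method.

```python
import pandas as pd

# Made-up frame for illustration (not the NBA data)
df = pd.DataFrame({'ABBREVIATION': ['ATL', 'BOS'], 'MIN_YEAR': [1949, 1946]})

row_as_series = df.iloc[0]    # a single position returns a Series
row_as_frame = df.iloc[[0]]   # a list of positions returns a one-row DataFrame

print(type(row_as_series).__name__)   # Series
print(type(row_as_frame).__name__)    # DataFrame

# The two classes do not share every method
print(hasattr(pd.DataFrame, 'melt'), hasattr(pd.Series, 'melt'))          # True False
print(hasattr(pd.DataFrame, 'to_frame'), hasattr(pd.Series, 'to_frame'))  # False True
```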

## Reading Documentation

- Check which version of pandas is installed, and read the documentation for that version

In [96]:
pd.__version__

'0.24.2'

- [Version 0.24 documentation](http://pandas.pydata.org/pandas-docs/version/0.24/reference/index.html)

- Locate the right section (`Series`, `DataFrame`, etc.): e.g. [`Series` section](http://pandas.pydata.org/pandas-docs/version/0.24/reference/series.html)

- Example: sorting values

In [97]:
np.random.seed(1)
somearray = pd.Series(np.random.randint(0, 10, 5))
somearray

0    5
1    8
2    9
3    5
4    0
dtype: int64

In [98]:
# somearray: [5, 8, 9, 5, 0]
somearray.sort_values()  ## press Shift-Tab in Jupyter to view the docstring

4    0
0    5
3    5
1    8
2    9
dtype: int64

In [99]:
# somearray: [5, 8, 9, 5, 0]
somearray.argsort()

0    4
1    0
2    3
3    1
4    2
dtype: int64
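
The two methods are related: `sort_values()` returns the sorted values, while `argsort()` returns the integer positions that would sort them. A small sketch on made-up values (chosen without ties so the order is unambiguous):

```python
import pandas as pd

# Made-up values without ties
s = pd.Series([30, 10, 20], index=['a', 'b', 'c'])

order = s.argsort().values       # positions that would sort s
print(order.tolist())            # [1, 2, 0]

# Reordering by those positions reproduces sort_values()
reordered = s.iloc[order]
print(reordered.equals(s.sort_values()))   # True
```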

In [100]:
# copy players data frame
pc = players.copy()
pc.head(10)

Unnamed: 0,PERSON_ID,DISPLAY_LAST_COMMA_FIRST,DISPLAY_FIRST_LAST,ROSTERSTATUS,FROM_YEAR,TO_YEAR,PLAYERCODE,TEAM_ID,TEAM_CITY,TEAM_NAME,TEAM_ABBREVIATION,TEAM_CODE,GAMES_PLAYED_FLAG
0,76001,"Abdelnaby, Alaa",Alaa Abdelnaby,0,1990,1994,HISTADD_alaa_abdelnaby,0,,,,,Y
1,76002,"Abdul-Aziz, Zaid",Zaid Abdul-Aziz,0,1968,1977,HISTADD_zaid_abdul-aziz,0,,,,,Y
2,76003,"Abdul-Jabbar, Kareem",Kareem Abdul-Jabbar,0,1969,1988,HISTADD_kareem_abdul-jabbar,0,,,,,Y
3,51,"Abdul-Rauf, Mahmoud",Mahmoud Abdul-Rauf,0,1990,2000,mahmoud_abdul-rauf,0,,,,,Y
4,1505,"Abdul-Wahad, Tariq",Tariq Abdul-Wahad,0,1997,2003,tariq_abdul-wahad,0,,,,,Y
5,949,"Abdur-Rahim, Shareef",Shareef Abdur-Rahim,0,1996,2007,shareef_abdur-rahim,0,,,,,Y
6,76005,"Abernethy, Tom",Tom Abernethy,0,1976,1980,HISTADD_tom_abernethy,0,,,,,Y
7,76006,"Able, Forest",Forest Able,0,1956,1956,HISTADD_frosty_able,0,,,,,Y
8,76007,"Abramovic, John",John Abramovic,0,1946,1947,HISTADD_brooms_abramovic,0,,,,,Y
9,203518,"Abrines, Alex",Alex Abrines,1,2016,2018,alex_abrines,1610612760,Oklahoma City,Thunder,OKC,thunder,Y


In [101]:
# row labels are stored in an Index object (here a RangeIndex)
pc.index

RangeIndex(start=0, stop=4391, step=1)

In [102]:
# column names are also stored in an Index object
pc.columns

Index(['PERSON_ID', 'DISPLAY_LAST_COMMA_FIRST', 'DISPLAY_FIRST_LAST',
       'ROSTERSTATUS', 'FROM_YEAR', 'TO_YEAR', 'PLAYERCODE', 'TEAM_ID',
       'TEAM_CITY', 'TEAM_NAME', 'TEAM_ABBREVIATION', 'TEAM_CODE',
       'GAMES_PLAYED_FLAG'],
      dtype='object')

### Indexing

- `Index` can be used to subset `Series` and `DataFrames`

- https://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing.

- `.loc` is primarily for using labels and booleans:  
    e.g., column and row indices, comparison operators, etc

In [103]:
# index can be used to subset data
pc.loc[0]

PERSON_ID                                    76001
DISPLAY_LAST_COMMA_FIRST           Abdelnaby, Alaa
DISPLAY_FIRST_LAST                  Alaa Abdelnaby
ROSTERSTATUS                                     0
FROM_YEAR                                     1990
TO_YEAR                                       1994
PLAYERCODE                  HISTADD_alaa_abdelnaby
TEAM_ID                                          0
TEAM_CITY                                         
TEAM_NAME                                         
TEAM_ABBREVIATION                                 
TEAM_CODE                                         
GAMES_PLAYED_FLAG                                Y
Name: 0, dtype: object

- `.iloc` is primarily for integer positions, i.e., the way you would index a matrix

In [104]:
# integer positions can be used to subset data; -1 is the last row
pc.iloc[-1]

PERSON_ID                                78650
DISPLAY_LAST_COMMA_FIRST           Zunic, Matt
DISPLAY_FIRST_LAST                  Matt Zunic
ROSTERSTATUS                                 0
FROM_YEAR                                 1948
TO_YEAR                                   1948
PLAYERCODE                  HISTADD_matt_zunic
TEAM_ID                                      0
TEAM_CITY                                     
TEAM_NAME                                     
TEAM_ABBREVIATION                             
TEAM_CODE                                     
GAMES_PLAYED_FLAG                            Y
Name: 4390, dtype: object

More on indexing later

## Series

- A `Series` is a one-dimensional array whose items are labeled by an index

In [105]:
# list to Series
tmp = pd.Series([1, 2, 3])
tmp

0    1
1    2
2    3
dtype: int64

In [106]:
# Series to list
tmp.to_list()

[1, 2, 3]

In [107]:
# Series to dict
tmp.to_dict() # similar to converting NBA JSON data to data frame

{0: 1, 1: 2, 2: 3}

In [108]:
tmp.index # Series have index as item names

RangeIndex(start=0, stop=3, step=1)

In [109]:
tmp.index = ['one', 'two', 'three'] # new index can be set
tmp

one      1
two      2
three    3
dtype: int64

In [110]:
tmp.index # Series have index as item names

Index(['one', 'two', 'three'], dtype='object')

In [111]:
tmp.to_dict() # python dictionary

{'one': 1, 'two': 2, 'three': 3}
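
The conversion also works in the other direction. As a brief sketch, a dictionary can be turned into a `Series` whose index is the dictionary keys, and items can then be looked up by label:

```python
import pandas as pd

d = {'one': 1, 'two': 2, 'three': 3}
s = pd.Series(d)                  # dict keys become the index

print(s['two'])                               # 2
print(s.loc[['one', 'three']].to_list())      # [1, 3]
print(s.to_dict() == d)                       # True
```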

### `dtype` 

- `dtype: int64` indicates that the `Series` contains 64-bit integers

- Example: change the `dtype` to float, then to an ordered category:

In [112]:
tmp.astype(float) # dtype to floating point number

one      1.0
two      2.0
three    3.0
dtype: float64

In [113]:
tmp.astype(pd.CategoricalDtype(ordered=True)) # dtype to ordered categorical variable

one      1
two      2
three    3
dtype: category
Categories (3, int64): [1 < 2 < 3]

- [`Series.astype()` documentation](http://pandas.pydata.org/pandas-docs/version/0.24/reference/api/pandas.Series.astype.html#pandas.Series.astype)
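
One reason to use an ordered category is that sorting and comparisons follow the declared order rather than alphabetical order. The sketch below uses made-up size labels; the category order is an assumption chosen for the example.

```python
import pandas as pd

sizes = pd.Series(['medium', 'small', 'large', 'small'])

# Declare the categories and their order explicitly
size_type = pd.CategoricalDtype(categories=['small', 'medium', 'large'], ordered=True)
sizes_cat = sizes.astype(size_type)

print(sizes.sort_values().to_list())       # alphabetical: ['large', 'medium', 'small', 'small']
print(sizes_cat.sort_values().to_list())   # by category: ['small', 'small', 'medium', 'large']
print(sizes_cat.max())                     # large
print((sizes_cat > 'small').to_list())     # [True, False, True, False]
```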

## `DataFrame`

- A `DataFrame` is a collection of `Series` (its columns) that share one index

In [114]:
teamcopy = teams.copy()
teamcopy.head()

Unnamed: 0,LEAGUE_ID,TEAM_ID,MIN_YEAR,MAX_YEAR,ABBREVIATION
0,0,1610612737,1949,2018,ATL
1,0,1610612738,1946,2018,BOS
2,0,1610612739,1970,2018,CLE
3,0,1610612740,2002,2018,NOP
4,0,1610612741,1966,2018,CHI


- A column can be an index

In [115]:
teamcopy.set_index('ABBREVIATION').head()

ABBREVIATION,LEAGUE_ID,TEAM_ID,MIN_YEAR,MAX_YEAR
ATL,0,1610612737,1949,2018
BOS,0,1610612738,1946,2018
CLE,0,1610612739,1970,2018
NOP,0,1610612740,2002,2018
CHI,0,1610612741,1966,2018


In [116]:
teamcopy.set_index('ABBREVIATION', inplace=True) # change index and save
teamcopy.head()

ABBREVIATION,LEAGUE_ID,TEAM_ID,MIN_YEAR,MAX_YEAR
ATL,0,1610612737,1949,2018
BOS,0,1610612738,1946,2018
CLE,0,1610612739,1970,2018
NOP,0,1610612740,2002,2018
CHI,0,1610612741,1966,2018


### Subsetting rows conditionally

In [117]:
playercopy = players.copy()
playercopy.head(15)

Unnamed: 0,PERSON_ID,DISPLAY_LAST_COMMA_FIRST,DISPLAY_FIRST_LAST,ROSTERSTATUS,FROM_YEAR,TO_YEAR,PLAYERCODE,TEAM_ID,TEAM_CITY,TEAM_NAME,TEAM_ABBREVIATION,TEAM_CODE,GAMES_PLAYED_FLAG
0,76001,"Abdelnaby, Alaa",Alaa Abdelnaby,0,1990,1994,HISTADD_alaa_abdelnaby,0,,,,,Y
1,76002,"Abdul-Aziz, Zaid",Zaid Abdul-Aziz,0,1968,1977,HISTADD_zaid_abdul-aziz,0,,,,,Y
2,76003,"Abdul-Jabbar, Kareem",Kareem Abdul-Jabbar,0,1969,1988,HISTADD_kareem_abdul-jabbar,0,,,,,Y
3,51,"Abdul-Rauf, Mahmoud",Mahmoud Abdul-Rauf,0,1990,2000,mahmoud_abdul-rauf,0,,,,,Y
4,1505,"Abdul-Wahad, Tariq",Tariq Abdul-Wahad,0,1997,2003,tariq_abdul-wahad,0,,,,,Y
5,949,"Abdur-Rahim, Shareef",Shareef Abdur-Rahim,0,1996,2007,shareef_abdur-rahim,0,,,,,Y
6,76005,"Abernethy, Tom",Tom Abernethy,0,1976,1980,HISTADD_tom_abernethy,0,,,,,Y
7,76006,"Able, Forest",Forest Able,0,1956,1956,HISTADD_frosty_able,0,,,,,Y
8,76007,"Abramovic, John",John Abramovic,0,1946,1947,HISTADD_brooms_abramovic,0,,,,,Y
9,203518,"Abrines, Alex",Alex Abrines,1,2016,2018,alex_abrines,1610612760,Oklahoma City,Thunder,OKC,thunder,Y


In [118]:
playercopy = playercopy[playercopy.TEAM_ABBREVIATION!=""]  # keep players with a non-empty TEAM_ABBREVIATION
playercopy.head() 

Unnamed: 0,PERSON_ID,DISPLAY_LAST_COMMA_FIRST,DISPLAY_FIRST_LAST,ROSTERSTATUS,FROM_YEAR,TO_YEAR,PLAYERCODE,TEAM_ID,TEAM_CITY,TEAM_NAME,TEAM_ABBREVIATION,TEAM_CODE,GAMES_PLAYED_FLAG
9,203518,"Abrines, Alex",Alex Abrines,1,2016,2018,alex_abrines,1610612760,Oklahoma City,Thunder,OKC,thunder,Y
14,203112,"Acy, Quincy",Quincy Acy,1,2012,2018,quincy_acy,1610612751,Brooklyn,Nets,BKN,nets,Y
21,203500,"Adams, Steven",Steven Adams,1,2013,2018,steven_adams,1610612760,Oklahoma City,Thunder,OKC,thunder,Y
23,1628389,"Adebayo, Bam",Bam Adebayo,1,2017,2018,bam_adebayo,1610612748,Miami,Heat,MIA,heat,Y
27,201167,"Afflalo, Arron",Arron Afflalo,1,2007,2017,arron_afflalo,1610612753,Orlando,Magic,ORL,magic,Y


### Setting `Index`

In [119]:
temp = playercopy.set_index('TEAM_ABBREVIATION')
temp.head()

TEAM_ABBREVIATION,PERSON_ID,DISPLAY_LAST_COMMA_FIRST,DISPLAY_FIRST_LAST,ROSTERSTATUS,FROM_YEAR,TO_YEAR,PLAYERCODE,TEAM_ID,TEAM_CITY,TEAM_NAME,TEAM_CODE,GAMES_PLAYED_FLAG
OKC,203518,"Abrines, Alex",Alex Abrines,1,2016,2018,alex_abrines,1610612760,Oklahoma City,Thunder,thunder,Y
BKN,203112,"Acy, Quincy",Quincy Acy,1,2012,2018,quincy_acy,1610612751,Brooklyn,Nets,nets,Y
OKC,203500,"Adams, Steven",Steven Adams,1,2013,2018,steven_adams,1610612760,Oklahoma City,Thunder,thunder,Y
MIA,1628389,"Adebayo, Bam",Bam Adebayo,1,2017,2018,bam_adebayo,1610612748,Miami,Heat,heat,Y
ORL,201167,"Afflalo, Arron",Arron Afflalo,1,2007,2017,arron_afflalo,1610612753,Orlando,Magic,magic,Y


In [120]:
temp.loc['OKC'].head() # row index is not unique

TEAM_ABBREVIATION,PERSON_ID,DISPLAY_LAST_COMMA_FIRST,DISPLAY_FIRST_LAST,ROSTERSTATUS,FROM_YEAR,TO_YEAR,PLAYERCODE,TEAM_ID,TEAM_CITY,TEAM_NAME,TEAM_CODE,GAMES_PLAYED_FLAG
OKC,203518,"Abrines, Alex",Alex Abrines,1,2016,2018,alex_abrines,1610612760,Oklahoma City,Thunder,thunder,Y
OKC,203500,"Adams, Steven",Steven Adams,1,2013,2018,steven_adams,1610612760,Oklahoma City,Thunder,thunder,Y
OKC,2546,"Anthony, Carmelo",Carmelo Anthony,1,2003,2018,carmelo_anthony,1610612760,Oklahoma City,Thunder,thunder,Y
OKC,201147,"Brewer, Corey",Corey Brewer,1,2007,2018,corey_brewer,1610612760,Oklahoma City,Thunder,thunder,Y
OKC,2555,"Collison, Nick",Nick Collison,1,2003,2017,nick_collison,1610612760,Oklahoma City,Thunder,thunder,Y


In [121]:
temp.reset_index(inplace=True) # restore integer row labels; TEAM_ABBREVIATION becomes a column again
temp.head()

Unnamed: 0,TEAM_ABBREVIATION,PERSON_ID,DISPLAY_LAST_COMMA_FIRST,DISPLAY_FIRST_LAST,ROSTERSTATUS,FROM_YEAR,TO_YEAR,PLAYERCODE,TEAM_ID,TEAM_CITY,TEAM_NAME,TEAM_CODE,GAMES_PLAYED_FLAG
0,OKC,203518,"Abrines, Alex",Alex Abrines,1,2016,2018,alex_abrines,1610612760,Oklahoma City,Thunder,thunder,Y
1,BKN,203112,"Acy, Quincy",Quincy Acy,1,2012,2018,quincy_acy,1610612751,Brooklyn,Nets,nets,Y
2,OKC,203500,"Adams, Steven",Steven Adams,1,2013,2018,steven_adams,1610612760,Oklahoma City,Thunder,thunder,Y
3,MIA,1628389,"Adebayo, Bam",Bam Adebayo,1,2017,2018,bam_adebayo,1610612748,Miami,Heat,heat,Y
4,ORL,201167,"Afflalo, Arron",Arron Afflalo,1,2007,2017,arron_afflalo,1610612753,Orlando,Magic,magic,Y
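
The round trip above can be sketched on a small made-up roster (the column names `TEAM` and `PLAYER` are chosen for this example). Without `inplace=True`, `set_index` returns a new frame and leaves the original unchanged; selecting a repeated label with `.loc` returns every matching row.

```python
import pandas as pd

roster = pd.DataFrame({
    'TEAM': ['OKC', 'BKN', 'OKC'],
    'PLAYER': ['Abrines, Alex', 'Acy, Quincy', 'Adams, Steven'],
})

by_team = roster.set_index('TEAM')    # new frame; roster itself is unchanged
print(list(roster.columns))           # ['TEAM', 'PLAYER']
print(by_team.index.is_unique)        # False

okc = by_team.loc['OKC']              # repeated label: all matching rows
print(okc.shape)                      # (2, 1)

restored = by_team.reset_index()      # index goes back to being a column
print(restored.equals(roster))        # True
```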


### Getting columns

- One way to grab a column:  
    `teamcopy['MIN_YEAR']`

- *Dot notation* is easier to read (IMO):  
    `teamcopy.MIN_YEAR`

In [122]:
all(teamcopy['MIN_YEAR'] == teamcopy.MIN_YEAR)

True
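
Dot notation has limits worth knowing. In this sketch (made-up column names), a column whose name collides with a `DataFrame` method, or contains a space, is only reachable with brackets:

```python
import pandas as pd

df = pd.DataFrame({'count': [3, 5], 'min year': [1949, 1946]})

# 'count' is also a DataFrame method, so the attribute is the method
print(callable(df.count))         # True
print(df['count'].to_list())      # [3, 5]

# A name with a space is not a valid attribute name at all
print(df['min year'].to_list())   # [1949, 1946]
```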

### Changing `dtype` of columns

In [123]:
teamcopy = teams.copy()
print('old dtype:', teamcopy.MAX_YEAR.dtype)

old dtype: object


In [124]:
teamcopy.ABBREVIATION = teamcopy.ABBREVIATION.astype('category')
teamcopy.TEAM_ID      = teamcopy.TEAM_ID.astype('category')
teamcopy.MIN_YEAR     = teamcopy.MIN_YEAR.astype('int')
teamcopy.MAX_YEAR     = teamcopy.MAX_YEAR.astype('int')
print('new dtype:', teamcopy.MAX_YEAR.dtype)

playercopy.TEAM_ABBREVIATION = playercopy.TEAM_ABBREVIATION.astype('category')
playercopy.TEAM_ID           = playercopy.TEAM_ID.astype('category')

new dtype: int64
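
Why the conversion matters: an `object` column usually holds Python strings, and arithmetic on strings does not do what you want. A small sketch, assuming the years are stored as strings (the frame here is made up):

```python
import pandas as pd

years = pd.DataFrame({'MIN_YEAR': ['1949', '1946'], 'MAX_YEAR': ['2018', '2018']})

# With strings, + concatenates instead of adding
concatenated = years.MAX_YEAR + years.MIN_YEAR
print(concatenated.to_list())     # ['20181949', '20181946']

# After conversion, arithmetic behaves numerically
years = years.astype({'MIN_YEAR': 'int64', 'MAX_YEAR': 'int64'})
span = years.MAX_YEAR - years.MIN_YEAR
print(span.to_list())             # [69, 72]
```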


### Setting new columns

In [125]:
teamcopy['new_column_1'] = teamcopy.MAX_YEAR # works fine
teamcopy.new_column_2 = teamcopy.MAX_YEAR # sets a Python attribute, not a column (pandas warns)
teamcopy.head()

  


Unnamed: 0,LEAGUE_ID,TEAM_ID,MIN_YEAR,MAX_YEAR,ABBREVIATION,new_column_1
0,0,1610612737,1949,2018,ATL,2018
1,0,1610612738,1946,2018,BOS,2018
2,0,1610612739,1970,2018,CLE,2018
3,0,1610612740,2002,2018,NOP,2018
4,0,1610612741,1966,2018,CHI,2018


- Existing columns can be set with dot notation

In [126]:
teamcopy.new_column_1 = 'ZZ'
teamcopy.head()

Unnamed: 0,LEAGUE_ID,TEAM_ID,MIN_YEAR,MAX_YEAR,ABBREVIATION,new_column_1
0,0,1610612737,1949,2018,ATL,ZZ
1,0,1610612738,1946,2018,BOS,ZZ
2,0,1610612739,1970,2018,CLE,ZZ
3,0,1610612740,2002,2018,NOP,ZZ
4,0,1610612741,1966,2018,CHI,ZZ


In [127]:
teamcopy.drop('new_column_1', axis=1, inplace=True) # drop column
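
The difference between the two assignment styles can be checked directly. In this sketch on a made-up frame, the attribute assignment leaves the columns untouched, while bracket assignment and `assign` both create a column. The warning filter is only there to keep the output quiet.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'MAX_YEAR': [2018, 2018]})

with warnings.catch_warnings():
    warnings.simplefilter('ignore')    # pandas warns about this pattern
    df.new_column_2 = df.MAX_YEAR      # plain attribute on the object, not a column
print('new_column_2' in df.columns)    # False

df['new_column_1'] = df.MAX_YEAR       # bracket assignment creates the column
print('new_column_1' in df.columns)    # True

# assign() returns a new frame with the extra column
df2 = df.assign(new_column_3=df.MAX_YEAR + 1)
print(df2.new_column_3.to_list())      # [2019, 2019]
```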

### Condition based slicing

Subset just the current teams

In [128]:
teamcopy = teamcopy[~pd.isna(teamcopy.ABBREVIATION)] # subset rows with team abbreviation
teamcopy.head()

Unnamed: 0,LEAGUE_ID,TEAM_ID,MIN_YEAR,MAX_YEAR,ABBREVIATION
0,0,1610612737,1949,2018,ATL
1,0,1610612738,1946,2018,BOS
2,0,1610612739,1970,2018,CLE
3,0,1610612740,2002,2018,NOP
4,0,1610612741,1966,2018,CHI


In [129]:
tmp = teamcopy.ABBREVIATION.str.contains('^C') # .str applies string methods element-wise; '^C' matches names starting with C
Cteams = teamcopy.loc[tmp]
Cteams

Unnamed: 0,LEAGUE_ID,TEAM_ID,MIN_YEAR,MAX_YEAR,ABBREVIATION
2,0,1610612739,1970,2018,CLE
4,0,1610612741,1966,2018,CHI
28,0,1610612766,1988,2018,CHA
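
Boolean masks like `tmp` can be combined. The sketch below uses a made-up five-team frame with values taken from the tables above; note that pandas uses `&`, `|`, and `~` for element-wise and, or, and not.

```python
import pandas as pd

teams_demo = pd.DataFrame({
    'ABBREVIATION': ['ATL', 'BOS', 'CLE', 'CHI', 'CHA'],
    'MIN_YEAR': [1949, 1946, 1970, 1966, 1988],
})

starts_with_c = teams_demo.ABBREVIATION.str.startswith('C')
before_1980 = teams_demo.MIN_YEAR < 1980

both = teams_demo.loc[starts_with_c & before_1980]     # and
either = teams_demo.loc[starts_with_c | before_1980]   # or
negated = teams_demo.loc[~starts_with_c]               # not

print(both.ABBREVIATION.to_list())      # ['CLE', 'CHI']
print(len(either))                      # 5
print(negated.ABBREVIATION.to_list())   # ['ATL', 'BOS']
```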


### `loc` vs `iloc` indexing

In [130]:
Cteams

Unnamed: 0,LEAGUE_ID,TEAM_ID,MIN_YEAR,MAX_YEAR,ABBREVIATION
2,0,1610612739,1970,2018,CLE
4,0,1610612741,1966,2018,CHI
28,0,1610612766,1988,2018,CHA


- `iloc` selects by row position (unrelated to the index labels)

In [131]:
Cteams.iloc[2]

LEAGUE_ID               00
TEAM_ID         1610612766
MIN_YEAR              1988
MAX_YEAR              2018
ABBREVIATION           CHA
Name: 28, dtype: object

- `loc` selects by row label (the index)

In [132]:
Cteams.loc[2]

LEAGUE_ID               00
TEAM_ID         1610612739
MIN_YEAR              1970
MAX_YEAR              2018
ABBREVIATION           CLE
Name: 2, dtype: object
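
A self-contained sketch of the same distinction, using a made-up frame with the index labels 2, 4, and 28 from `Cteams`. It also shows two details that often surprise: a missing label raises `KeyError`, and `loc` slices include the end label while `iloc` slices exclude the end position.

```python
import pandas as pd

c_teams = pd.DataFrame({'ABBREVIATION': ['CLE', 'CHI', 'CHA']}, index=[2, 4, 28])

print(c_teams.iloc[2].ABBREVIATION)   # CHA (third row by position)
print(c_teams.loc[2].ABBREVIATION)    # CLE (row labeled 2)

# There is no row labeled 0, so loc raises KeyError
try:
    c_teams.loc[0]
    missing = False
except KeyError:
    missing = True
print(missing)                        # True

# iloc slices exclude the end position; loc slices include the end label
print(c_teams.iloc[0:1].ABBREVIATION.to_list())   # ['CLE']
print(c_teams.loc[2:4].ABBREVIATION.to_list())    # ['CLE', 'CHI']
```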

### Example usage

- Do I have players from all teams? (set comparisons)

In [133]:
set(teamcopy.ABBREVIATION.unique()) == set(playercopy.TEAM_ABBREVIATION.unique())

True
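
When the comparison comes back `False`, set differences show which values are missing on each side. A sketch with made-up abbreviations (the variable names are chosen for this example):

```python
import pandas as pd

team_abbr = set(pd.Series(['ATL', 'BOS', 'CLE']).unique())
player_abbr = set(pd.Series(['ATL', 'BOS', 'BOS', 'SEA']).unique())

print(team_abbr == player_abbr)          # False
print(sorted(team_abbr - player_abbr))   # teams without players: ['CLE']
print(sorted(player_abbr - team_abbr))   # player teams not in the team list: ['SEA']
print(sorted(team_abbr ^ player_abbr))   # in exactly one of the two: ['CLE', 'SEA']
```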

- List players grouped by team (groupby)

In [134]:
print('iterable object:', playercopy.groupby('TEAM_NAME'), '\n\n')
for t, p in playercopy.groupby('TEAM_NAME'):
    print("***", t)
    print('; '.join(p.DISPLAY_LAST_COMMA_FIRST.values), '\n')

iterable object: <pandas.core.groupby.generic.DataFrameGroupBy object at 0x7efc2108e860> 


*** 76ers
Anderson, Justin; Bayless, Jerryd; Belinelli, Marco; Covington, Robert; Embiid, Joel; Fultz, Markelle; Holmes, Richaun; Ilyasova, Ersan; Jackson, Demetrius; Johnson, Amir; Korkmaz, Furkan; Luwawu-Cabarrot, Timothe; McConnell, T.J.; Redick, JJ; Saric, Dario; Simmons, Ben 

*** Bucks
Antetokounmpo, Giannis; Bledsoe, Eric; Brogdon, Malcolm; Brown, Sterling; Dellavedova, Matthew; Henson, John; Jennings, Brandon; Maker, Thon; Middleton, Khris; Muhammad, Shabazz; Munford, Xavier; Parker, Jabari; Plumlee, Marshall; Snell, Tony; Terry, Jason; Wilson, D.J.; Zeller, Tyler 

*** Bulls
Arcidiacono, Ryan; Asik, Omer; Blakeney, Antonio; Dunn, Kris; Eddie, Jarell; Felicio, Cristiano; Grant, Jerian; Holiday, Justin; Kilpatrick, Sean; LaVine, Zach; Lopez, Robin; Markkanen, Lauri; Nwaba, David; Payne, Cameron; Portis, Bobby; Valentine, Denzel; Vonleh, Noah; Zipser, Paul 

*** Cavaliers
Calderon, Jose; C
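
Besides looping over the groups, a `groupby` object can aggregate directly. This is a sketch on a made-up four-player roster (names taken from the tables above), not the full player data:

```python
import pandas as pd

roster = pd.DataFrame({
    'TEAM_NAME': ['Thunder', 'Nets', 'Thunder', 'Heat'],
    'PLAYER': ['Abrines, Alex', 'Acy, Quincy', 'Adams, Steven', 'Adebayo, Bam'],
})

# Number of players per team (group keys are sorted by default)
counts = roster.groupby('TEAM_NAME').size()
print(counts.to_dict())       # {'Heat': 1, 'Nets': 1, 'Thunder': 2}

# Join the player names within each team into one string
names = roster.groupby('TEAM_NAME').PLAYER.agg('; '.join)
print(names['Thunder'])       # Abrines, Alex; Adams, Steven
```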

### Pandas (often) shows you views

- In Python, assignment binds a new name to the same object; it does not copy the data

- In R, (almost) every assignment copies values

- The following shows that both names refer to the same object in memory:

In [135]:
temp = teams
print(id(temp) == id(teams))

True


- If you change one, you see the change in the other:

In [136]:
s1 = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
s2 = s1
print("id of s1:", id(s1))
print("id of s2:", id(s2))
print("s1 is s2:", s1 is s2)

id of s1: 139621351091784
id of s2: 139621351091784
s1 is s2: True


In [137]:
s1.iloc[0] = 10000  # set by position; plain s1[0] is ambiguous with a string index

print("s1 changed:", s1.iloc[0])
print("s2 also   :", s2.iloc[0])
#print("s1 is s2:", s1[0] is s2[0])

s1 changed: 10000.0
s2 also   : 10000.0


To get an independent object, make an explicit copy:

In [138]:
abbr = teams.ABBREVIATION.copy()
abbr is teams.ABBREVIATION

False
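
A compact sketch contrasting an alias with a copy, on a made-up `Series`: the alias follows the change, the copy does not.

```python
import pandas as pd

s1 = pd.Series([0.25, 0.5], index=['a', 'b'])
alias = s1                # second name for the same object
independent = s1.copy()   # separate object with its own data

s1.loc['a'] = 10000

print(alias is s1)               # True
print(alias.loc['a'])            # 10000.0
print(independent.loc['a'])      # 0.25
```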