## Data Munging?!
## Exploratory Data Analysis?!

# Data Munging & EDA!!

`Munging` is the colloquial term for going through raw data and processing it to a point of usefulness.<br>
`EDA` is the iterative process of learning about your data and how to process it.

You will find that a significant amount of time in most real-world data science/analysis projects involves acquiring, structuring, cleaning, and otherwise pre-processing your dataset of interest _before_ you can get into the actual analytics.

It is critical to understand the structure and reliability of your data, so the EDA/Munging process also exposes you to the strengths and weaknesses of your data set.

### What might `data munging` involve?
-Finding & cleaning `outliers` <br>
-Finding & cleaning `mis-typed` data elements<br>
-Finding & cleaning `unavailable/empty` data elements<br>

### It can also include...
-Combining data sets<br>
-Creating new data elements from existing data<br>
-Discovering new and unusual data problems to solve!

### We are going to look at a data set of NBA player data from the Kaggle website

In [1]:
# first let's import the standard libraries we've been working with

import numpy as np
import pandas as pd
import datascience as ds

## Let's start by simply opening the files and reading the data into tables.

## The Pandas .read_csv() feature will take a `Comma Separated Values` file and create a DataFrame from it.<br>
## We find .csv files frequently. They are concise, and can be easily read & written by a variety of programs.

## We can also see that .read_csv() can take many, many `arguments`<br>
## As with any feature, use a question mark to call up the syntax thusly: `pd.read_csv?`<br>
## For starters, we'll just be providing a file name. <br>
## You will need to ensure that the file is in the current working directory, or provide its path explicitly.
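As a minimal sketch of providing the path explicitly: the snippet below writes a two-row sample file to a temporary directory and reads it back using its full path. The directory and the file name `sample_stats.csv` are stand-ins invented for illustration; they are not the course files.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample file, standing in for a real .csv on disk
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "sample_stats.csv")
with open(csv_path, "w") as f:
    f.write("Player,Year,PTS\n")
    f.write("Curly Armstrong,1950,458\n")
    f.write("Cliff Barker,1950,279\n")

# A full path works no matter what the current working directory is
sample = pd.read_csv(csv_path)
print(sample.shape)
```

With only a bare file name, `pd.read_csv` looks in the current working directory instead.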



In [2]:
pd.read_csv

<function pandas.io.parsers._make_parser_function.<locals>.parser_f(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)>

In [3]:
# Where will our read_csv look?
# os.getcwd() tells us the Current Working Directory

import os
os.getcwd()

'/home/jovyan/Sports Analytics/Homework 3 (NBA)'

In [4]:
# the first file is Seasons_Stats.csv
# we are creating a Pandas DataFrame called ann_stats with the contents of this file
# the file name is given without a path, so it is read from the current working directory shown above


ann_stats = pd.read_csv('Seasons_Stats.csv')

In [5]:
type(ann_stats)

pandas.core.frame.DataFrame

In [6]:
ann_stats

Unnamed: 0.1,Unnamed: 0,Year,Player,Pos,Age,Tm,G,GS,MP,PER,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,0,1950.0,Curly Armstrong,G-F,31.0,FTW,63.0,,,,...,0.705,,,,176.0,,,,217.0,458.0
1,1,1950.0,Cliff Barker,SG,29.0,INO,49.0,,,,...,0.708,,,,109.0,,,,99.0,279.0
2,2,1950.0,Leo Barnhorst,SF,25.0,CHS,67.0,,,,...,0.698,,,,140.0,,,,192.0,438.0
3,3,1950.0,Ed Bartels,F,24.0,TOT,15.0,,,,...,0.559,,,,20.0,,,,29.0,63.0
4,4,1950.0,Ed Bartels,F,24.0,DNN,13.0,,,,...,0.548,,,,20.0,,,,27.0,59.0
5,5,1950.0,Ed Bartels,F,24.0,NYK,2.0,,,,...,0.667,,,,0.0,,,,2.0,4.0
6,6,1950.0,Ralph Beard,G,22.0,INO,60.0,,,,...,0.762,,,,233.0,,,,132.0,895.0
7,7,1950.0,Gene Berce,G-F,23.0,TRI,3.0,,,,...,0.000,,,,2.0,,,,6.0,10.0
8,8,1950.0,Charlie Black,F-C,28.0,TOT,65.0,,,,...,0.651,,,,163.0,,,,273.0,661.0
9,9,1950.0,Charlie Black,F-C,28.0,FTW,36.0,,,,...,0.632,,,,75.0,,,,140.0,382.0


In [7]:
# the feature .head() will return the first 5 rows of the DataFrame by default
# .head() will take an integer and return that many rows, if desired.
# this is a good feature to get a quick look at the data, as it includes the header line (column titles)

ann_stats.head(16)

Unnamed: 0.1,Unnamed: 0,Year,Player,Pos,Age,Tm,G,GS,MP,PER,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,0,1950.0,Curly Armstrong,G-F,31.0,FTW,63.0,,,,...,0.705,,,,176.0,,,,217.0,458.0
1,1,1950.0,Cliff Barker,SG,29.0,INO,49.0,,,,...,0.708,,,,109.0,,,,99.0,279.0
2,2,1950.0,Leo Barnhorst,SF,25.0,CHS,67.0,,,,...,0.698,,,,140.0,,,,192.0,438.0
3,3,1950.0,Ed Bartels,F,24.0,TOT,15.0,,,,...,0.559,,,,20.0,,,,29.0,63.0
4,4,1950.0,Ed Bartels,F,24.0,DNN,13.0,,,,...,0.548,,,,20.0,,,,27.0,59.0
5,5,1950.0,Ed Bartels,F,24.0,NYK,2.0,,,,...,0.667,,,,0.0,,,,2.0,4.0
6,6,1950.0,Ralph Beard,G,22.0,INO,60.0,,,,...,0.762,,,,233.0,,,,132.0,895.0
7,7,1950.0,Gene Berce,G-F,23.0,TRI,3.0,,,,...,0.0,,,,2.0,,,,6.0,10.0
8,8,1950.0,Charlie Black,F-C,28.0,TOT,65.0,,,,...,0.651,,,,163.0,,,,273.0,661.0
9,9,1950.0,Charlie Black,F-C,28.0,FTW,36.0,,,,...,0.632,,,,75.0,,,,140.0,382.0


# What can we say about this file at first glance?

## What does each row represent?
## Is the data complete? Sparse? What's "NaN"?

## What about the "..." between PER and FT%?
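One way to start answering these questions is to count the missing values per column and to ask pandas to stop eliding columns. The sketch below uses a small hand-typed sample shaped like the 1950 rows above (an assumption for illustration, not the real file); empty fields are read in as `NaN`.

```python
import io

import pandas as pd

# Small sample in the shape of the 1950 rows; empty fields become NaN
csv_text = """Player,G,GS,AST,PTS
Curly Armstrong,63,,176,458
Cliff Barker,49,,109,279
Leo Barnhorst,67,,140,438
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Count the missing (NaN) entries in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Display every column instead of replacing the middle ones with "..."
pd.set_option("display.max_columns", None)
```

On the full table, the same `.isna().sum()` call would show which statistics were simply not recorded in the early seasons.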


In [8]:
# we can use the feature .columns to get a full view of the data headers

ann_stats.columns

Index(['Unnamed: 0', 'Year', 'Player', 'Pos', 'Age', 'Tm', 'G', 'GS', 'MP',
       'PER', 'TS%', '3PAr', 'FTr', 'ORB%', 'DRB%', 'TRB%', 'AST%', 'STL%',
       'BLK%', 'TOV%', 'USG%', 'blanl', 'OWS', 'DWS', 'WS', 'WS/48', 'blank2',
       'OBPM', 'DBPM', 'BPM', 'VORP', 'FG', 'FGA', 'FG%', '3P', '3PA', '3P%',
       '2P', '2PA', '2P%', 'eFG%', 'FT', 'FTA', 'FT%', 'ORB', 'DRB', 'TRB',
       'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS'],
      dtype='object')

In [9]:
# We have 2 more data files, "player_data.csv" and "Players.csv"

# Let's investigate their contents

player_data = pd.read_csv('player_data.csv')

In [10]:
player_data.head(10)

Unnamed: 0,name,year_start,year_end,position,height,weight,birth_date,college
0,Alaa Abdelnaby,1991,1995,F-C,6-10,240.0,"June 24, 1968",Duke University
1,Zaid Abdul-Aziz,1969,1978,C-F,6-9,235.0,"April 7, 1946",Iowa State University
2,Kareem Abdul-Jabbar,1970,1989,C,7-2,225.0,"April 16, 1947","University of California, Los Angeles"
3,Mahmoud Abdul-Rauf,1991,2001,G,6-1,162.0,"March 9, 1969",Louisiana State University
4,Tariq Abdul-Wahad,1998,2003,F,6-6,223.0,"November 3, 1974",San Jose State University
5,Shareef Abdur-Rahim,1997,2008,F,6-9,225.0,"December 11, 1976",University of California
6,Tom Abernethy,1977,1981,F,6-7,220.0,"May 6, 1954",Indiana University
7,Forest Able,1957,1957,G,6-3,180.0,"July 27, 1932",Western Kentucky University
8,John Abramovic,1947,1948,F,6-3,195.0,"February 9, 1919",Salem International University
9,Alex Abrines,2017,2018,G-F,6-6,190.0,"August 1, 1993",


### player_data.csv appears to contain information about individual players

### Data fields include start/end year, position, height/weight, birthdate, and college attended

### What would it say for a player who didn't attend college?


In [11]:
# Space here to look for one of those records...
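A possible way to look for such records, sketched on a few sample rows taken from the output above: `.isna()` flags the missing entries and `.loc[]` keeps only those rows. Note that a missing value could mean "did not attend college" or simply "not recorded"; the data alone does not say which.

```python
import pandas as pd

# Sample rows based on the player_data output above; None stands for a missing value
sample = pd.DataFrame({
    "name": ["Alaa Abdelnaby", "Alex Abrines", "Tom Abernethy"],
    "college": ["Duke University", None, "Indiana University"],
})

# .isna() returns a boolean Series; .loc[] keeps the rows where it is True
no_college = sample.loc[sample["college"].isna()]
print(no_college)
```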

In [12]:
# Let's now look at the final file, Players.csv

players = pd.read_csv('Players.csv')

In [13]:
players.head(10)

Unnamed: 0.1,Unnamed: 0,Player,height,weight,collage,born,birth_city,birth_state
0,0,Curly Armstrong,180.0,77.0,Indiana University,1918.0,,
1,1,Cliff Barker,188.0,83.0,University of Kentucky,1921.0,Yorktown,Indiana
2,2,Leo Barnhorst,193.0,86.0,University of Notre Dame,1924.0,,
3,3,Ed Bartels,196.0,88.0,North Carolina State University,1925.0,,
4,4,Ralph Beard,178.0,79.0,University of Kentucky,1927.0,Hardinsburg,Kentucky
5,5,Gene Berce,180.0,79.0,Marquette University,1926.0,,
6,6,Charlie Black,196.0,90.0,University of Kansas,1921.0,Arco,Idaho
7,7,Nelson Bobb,183.0,77.0,Temple University,1924.0,Philadelphia,Pennsylvania
8,8,Jake Bornheimer,196.0,90.0,Muhlenberg College,1927.0,New Brunswick,New Jersey
9,9,Vince Boryla,196.0,95.0,University of Denver,1927.0,East Chicago,Indiana


## players.csv seems to have some _similar_ information as player_data...
## But they are distinct

## We can infer that the height values in player_data.csv ("6-9", "6-10") are in English units of feet-inches
## While the same field in players.csv appears to be in Metric units of centimeters
## We will look at some individual records to test this hypothesis
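To make the comparison concrete, a small conversion helper can turn the feet-inches strings into centimeters. The function name `feet_inches_to_cm` is made up for this sketch; it assumes every value has the form `"feet-inches"` and would fail on missing values.

```python
def feet_inches_to_cm(height):
    """Convert a 'feet-inches' string such as '6-9' to centimeters."""
    feet, inches = height.split("-")
    total_inches = int(feet) * 12 + int(inches)
    return total_inches * 2.54  # 1 inch is exactly 2.54 cm

for value in ["6-9", "6-10"]:
    print(value, "->", round(feet_inches_to_cm(value), 1), "cm")
```

If the hypothesis holds, the converted heights should land within about a centimeter of the values listed for the same players in players.csv.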

## Notice also, the column title in players.csv of "collage" instead of "college"

## Finally, notice that players.csv contains its own index column, which pandas labels "Unnamed: 0", so we see 2 indices in the leftmost columns: one from the file, and one generated by the DataFrame object.
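If the duplicate index is unwanted, one option is the `index_col` argument of `pd.read_csv`, which tells pandas to use a column from the file as the index. The sketch below uses an in-memory sample that mimics a file saved with an unnamed first column; it is not the real players.csv.

```python
import io

import pandas as pd

# Sample text mimicking a file whose first column is an unnamed, saved index
csv_text = """,Player,height
0,Curly Armstrong,180.0
1,Cliff Barker,188.0
"""

default_read = pd.read_csv(io.StringIO(csv_text))
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_read.columns))  # the unnamed column shows up as a data column
print(list(indexed_read.columns))  # the unnamed column is now the index
```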

# Aside: Open .CSV files right in Jupyter Notebook


## Let's Look at some individual player records!

## We'll start with one of the all-time legends, Wilt Chamberlain


In [14]:
# Let's look for Wilt the Stilt's record in player_data...

player_data.loc['Wilt Chamberlain']

KeyError: 'the label [Wilt Chamberlain] is not in the [index]'

In [15]:
# .loc[] looks rows up by index label, and our index holds integers, not names
# Given an index label, it returns the row associated with that label

player_data.loc[9]

name            Alex Abrines
year_start              2017
year_end                2018
position                 G-F
height                   6-6
weight                   190
birth_date    August 1, 1993
college                  NaN
Name: 9, dtype: object

In [16]:
# We can also give .loc[] a range
# This output should raise some questions...

player_data.loc[250:265]

Unnamed: 0,name,year_start,year_end,position,height,weight,birth_date,college
250,Kent Bazemore,2013,2018,G-F,6-5,201.0,"July 1, 1989",Old Dominion University
251,Ed Beach,1951,1951,F,6-3,200.0,"January 25, 1929",West Virginia University
252,Bradley Beal,2013,2018,G,6-5,207.0,"June 28, 1993",University of Florida
253,Al Beard,1968,1968,C,6-9,200.0,"April 27, 1942",Norfolk State University
254,Butch Beard,1970,1979,G,6-3,185.0,"May 4, 1947",University of Louisville
255,Ralph Beard,1950,1951,G,5-10,175.0,"December 2, 1927",University of Kentucky
256,Charles Beasley,1968,1971,G-F,6-5,190.0,"September 23, 1945",Southern Methodist University
257,Jerome Beasley,2004,2004,F,6-10,237.0,"May 17, 1980",University of North Dakota
258,John Beasley,1968,1974,F-C,6-9,225.0,"February 5, 1944",Texas A&M University
259,Malik Beasley,2017,2018,G,6-5,196.0,"November 26, 1996",Florida State University
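One thing worth knowing when reading a `.loc[]` range: it slices by label and includes both endpoints, unlike ordinary Python slicing. The sketch below contrasts it with the position-based `.iloc[]` on a tiny made-up DataFrame.

```python
import pandas as pd

sample = pd.DataFrame({"name": ["a", "b", "c", "d", "e"]})

by_label = sample.loc[1:3]      # labels 1, 2 and 3: the end label is included
by_position = sample.iloc[1:3]  # positions 1 and 2: the end position is excluded

print(len(by_label), len(by_position))
```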


### Meanwhile, we still haven't located The Big Dipper's record!

#### Let's explore ways to find it.

#### First, we can see the "name" field is First Last.

#### We use the feature .isin() 


In [17]:
# the column object from a DataFrame is a Series

type(player_data['name'])

pandas.core.series.Series

In [18]:
# When we search that Series for exact matches for "Wilt Chamberlain"
# we get a Series of boolean values back, with "False" in every element except the one with Wilt

#player_data['name'].isin(['Wilt Chamberlain'])
player_data['name'].isin(['Wilt Chamberlain'])[670:680]

670    False
671    False
672    False
673     True
674    False
675    False
676    False
677    False
678    False
679    False
Name: name, dtype: bool

In [19]:
# Putting this all together we see that .loc is now taking a Series of boolean values 
# and returning the rows with values of "True"

player_data.loc[player_data['name'].isin(['Wilt Chamberlain'])]

Unnamed: 0,name,year_start,year_end,position,height,weight,birth_date,college
673,Wilt Chamberlain,1960,1973,C,7-1,275.0,"August 21, 1936",University of Kansas


## All well and good. 
## What if we didn't know he was listed as "Wilt Chamberlain"?

### How can we search that list of players with less than a full name to go on?



In [20]:
# Here is a snippet of code that will return a list of all elements containing our string value of interest

matching = [s for s in player_data['name'] if "Wilt" in s]

In [21]:
# and we find only one other player with "Wilt" in their name

matching

['Wilt Chamberlain', 'Kyle Wiltjer']

In [22]:
# Is this case sensitive?
# Yes: lowercase "wilt" matches nothing, while "Wilt" matched two names

matching2 = [s for s in player_data['name'] if "wilt" in s]
print(matching2)

[]
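Python's `in` compares characters exactly, so upper and lower case are treated as different. If a case-insensitive search is wanted, two possible approaches are sketched below on a short made-up list of names: lowercasing each name before the test, or using the pandas string method `.str.contains()` with `case=False`.

```python
import pandas as pd

names = pd.Series(["Wilt Chamberlain", "Kyle Wiltjer", "Jerry West"])

# Option 1: lowercase each name before using Python's `in`
matching_lower = [s for s in names if "wilt" in s.lower()]

# Option 2: pandas' vectorized substring test, ignoring case
mask = names.str.contains("wilt", case=False)
matching_pandas = list(names[mask])

print(matching_lower)
print(matching_pandas)
```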


### We can put these pieces together and return all the records containing our search string of interest

In [23]:
player_data.loc[player_data['name'].isin(s for s in player_data['name'] if "Wilt" in s)]

Unnamed: 0,name,year_start,year_end,position,height,weight,birth_date,college
673,Wilt Chamberlain,1960,1973,C,7-1,275.0,"August 21, 1936",University of Kansas
4446,Kyle Wiltjer,2017,2017,F,6-10,240.0,"October 20, 1992",Gonzaga University


In [24]:
# the player_data['name'] column seems well formed
# but we run into an issue with the similar column in players.csv...players['Player']

players.loc[players['Player'].isin(s for s in players['Player'] if "Wilt" in s)]

TypeError: argument of type 'float' is not iterable

In [25]:
players_list = players['Player']

In [26]:
type(players_list)

pandas.core.series.Series

In [27]:
players.head(10)

Unnamed: 0.1,Unnamed: 0,Player,height,weight,collage,born,birth_city,birth_state
0,0,Curly Armstrong,180.0,77.0,Indiana University,1918.0,,
1,1,Cliff Barker,188.0,83.0,University of Kentucky,1921.0,Yorktown,Indiana
2,2,Leo Barnhorst,193.0,86.0,University of Notre Dame,1924.0,,
3,3,Ed Bartels,196.0,88.0,North Carolina State University,1925.0,,
4,4,Ralph Beard,178.0,79.0,University of Kentucky,1927.0,Hardinsburg,Kentucky
5,5,Gene Berce,180.0,79.0,Marquette University,1926.0,,
6,6,Charlie Black,196.0,90.0,University of Kansas,1921.0,Arco,Idaho
7,7,Nelson Bobb,183.0,77.0,Temple University,1924.0,Philadelphia,Pennsylvania
8,8,Jake Bornheimer,196.0,90.0,Muhlenberg College,1927.0,New Brunswick,New Jersey
9,9,Vince Boryla,196.0,95.0,University of Denver,1927.0,East Chicago,Indiana


In [28]:
# note the error
# you can confirm independently that both Wilt C and Kyle Wiltjer have entries in the file

matching3 = [s for s in players_list if "Wilt" in s]

TypeError: argument of type 'float' is not iterable

### The issue is caused by a "NaN" entry, which is stored as a float rather than a string
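The error can be reproduced without any data file. In the sketch below, `float("nan")` stands in for the missing entry, and the `isinstance` guard is one possible workaround (an assumption for illustration, not the approach this notebook takes).

```python
missing = float("nan")  # stand-in for a missing entry in a column of names
print(type(missing))

error_seen = False
try:
    "Wilt" in missing   # the same test the list comprehension performs
except TypeError as err:
    error_seen = True
    print("TypeError:", err)

# Possible guard: only run the substring test on actual strings
values = ["Wilt Chamberlain", missing, "Kyle Wiltjer"]
matching = [s for s in values if isinstance(s, str) and "Wilt" in s]
print(matching)
```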

### We need to clean that value up

### We can use the feature .isna() to identify that troublesome entry
### And, lo and behold, the entire row is blank!

### How do we drop a row?

In [29]:
players.loc[players_list.isna()]

Unnamed: 0.1,Unnamed: 0,Player,height,weight,collage,born,birth_city,birth_state
223,223,,,,,,,


### Let's look at the surrounding rows.

In [30]:
players.loc[220:225]

Unnamed: 0.1,Unnamed: 0,Player,height,weight,collage,born,birth_city,birth_state
220,220,D.C. Wilcutt,175.0,70.0,,1926.0,,
221,221,Bob Wood,175.0,70.0,Northern Illinois University,1921.0,,
222,222,Max Zaslofsky,188.0,77.0,St. John's University,1925.0,Brooklyn,New York
223,223,,,,,,,
224,224,Paul Arizin*,193.0,86.0,Villanova University,1928.0,Philadelphia,Pennsylvania
225,225,Ed Beach,190.0,90.0,West Virginia University,1929.0,,


### Let's make sure we have the correct index of the row we want to drop

### It looks like it should be record 223

In [31]:
players.loc[223]

Unnamed: 0     223
Player         NaN
height         NaN
weight         NaN
collage        NaN
born           NaN
birth_city     NaN
birth_state    NaN
Name: 223, dtype: object

In [32]:
# We can investigate the signature of drop using `?`

#players.drop?

In [33]:
# We can always re-import the file.
# Let's see what happens if we simply give .drop() an index...

players2 = players.drop([223])

In [34]:
# We can look at that set of records now

players2.loc[220:225]

Unnamed: 0.1,Unnamed: 0,Player,height,weight,collage,born,birth_city,birth_state
220,220,D.C. Wilcutt,175.0,70.0,,1926.0,,
221,221,Bob Wood,175.0,70.0,Northern Illinois University,1921.0,,
222,222,Max Zaslofsky,188.0,77.0,St. John's University,1925.0,Brooklyn,New York
224,224,Paul Arizin*,193.0,86.0,Villanova University,1928.0,Philadelphia,Pennsylvania
225,225,Ed Beach,190.0,90.0,West Virginia University,1929.0,,
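Dropping by index label works when we already know which row is bad. An alternative that does not require the label is `.dropna()` restricted to the `Player` column, sketched here on a three-row sample modeled on the rows above. Note that `dropna(how="all")` would not remove this row, because its "Unnamed: 0" value is still filled in.

```python
import pandas as pd

# Sample modeled on rows 222-224 of players; None stands for a missing value
sample = pd.DataFrame(
    {
        "Unnamed: 0": [222, 223, 224],
        "Player": ["Max Zaslofsky", None, "Paul Arizin*"],
        "height": [188.0, None, 193.0],
    },
    index=[222, 223, 224],
)

# Drop every row whose Player value is missing; the original is left unchanged
cleaned = sample.dropna(subset=["Player"])
print(list(cleaned.index))
```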


In [35]:
# and let's go back to our original goal of finding all the entries with "Wilt" using that same expression

players2.loc[players2['Player'].isin(s for s in players2['Player'] if "Wilt" in s)]


Unnamed: 0.1,Unnamed: 0,Player,height,weight,collage,born,birth_city,birth_state
494,494,Wilt Chamberlain*,216.0,124.0,University of Kansas,1936.0,Philadelphia,Pennsylvania
3918,3918,Kyle Wiltjer,208.0,108.0,Gonzaga University,1992.0,Portland,Oregon


In [36]:
player_data.loc[player_data['name'].isin(s for s in player_data['name'] if "Wilt" in s)]


Unnamed: 0,name,year_start,year_end,position,height,weight,birth_date,college
673,Wilt Chamberlain,1960,1973,C,7-1,275.0,"August 21, 1936",University of Kansas
4446,Kyle Wiltjer,2017,2017,F,6-10,240.0,"October 20, 1992",Gonzaga University


## SUCCESS!!

### But what do we notice about the GOAT?

### In this file it seems there's a star attached to his name
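Whatever the star denotes, it means an exact match on "Wilt Chamberlain" would fail in this file. One possible cleanup, sketched on a few names from the output above, is to strip a trailing `*` with the pandas string method `.str.rstrip()`.

```python
import pandas as pd

names = pd.Series(["Wilt Chamberlain*", "Kyle Wiltjer", "Paul Arizin*"])

# Remove any trailing asterisks; names without one are left untouched
clean_names = names.str.rstrip("*")
print(list(clean_names))
```

If the star carries information worth keeping, it could first be saved in a separate boolean column with `names.str.endswith("*")`.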

# STOP Feb 14

# Problem 1

## Let's go back to our football play-calling example

## Offense can call Run or Pass
## Defense can call Normal or Blitz

## Now Payoffs (Offense, Defense) are:
### {3, -3} for Run/Normal
### {5, -5} for Run/Blitz
### {8, -8} for Pass/Normal
### {-5, 5} for Pass/Blitz

## 1) You are in charge of the Offense. Your team of advance scouts tells you that the Defense calls Normal 60% of the time, and Blitz 40% of the time. What is your optimal mix of Run/Pass calls?

## 2) You are in charge of the Defense. Your scouts tell you the Offense calls Run 50% and Pass 50%. What is your optimal mix of Normal/Blitz defense calls?

## Your answers should be percent mix, with some notes/comments/math on how you got there. Hint: Consider how this would look in Decision Tree form.

# Answers...

In [None]:
#1. The best option is to run every single down. If you run on every play, you have an expected yards per play of 3.8,
#   which is more than enough to pick up a first down after three plays. If you pass on every play, you can expect 2.8 yards per
#   attempt, which will make you fall short of a first down after three plays
#2. If you run a normal defense on every play, you will be expected to give up 5.5 yards per play, as opposed to 0 yards per
#   play with constant blitzing. Therefore, blitzing every play is ideal.
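The expected values quoted in these answers can be checked with a few lines of arithmetic. The sketch below encodes the payoff table from the problem statement; the function names are made up for illustration.

```python
# Payoffs in yards to the Offense, keyed by (offense call, defense call)
payoff = {
    ("Run", "Normal"): 3,
    ("Run", "Blitz"): 5,
    ("Pass", "Normal"): 8,
    ("Pass", "Blitz"): -5,
}

def expected_yards_gained(offense_call, p_normal):
    """Expected yards for one offense call, given how often the Defense plays Normal."""
    return (p_normal * payoff[(offense_call, "Normal")]
            + (1 - p_normal) * payoff[(offense_call, "Blitz")])

def expected_yards_allowed(defense_call, p_run):
    """Expected yards given up by one defense call, given how often the Offense runs."""
    return (p_run * payoff[("Run", defense_call)]
            + (1 - p_run) * payoff[("Pass", defense_call)])

run_ev = expected_yards_gained("Run", 0.6)
pass_ev = expected_yards_gained("Pass", 0.6)
normal_ev = expected_yards_allowed("Normal", 0.5)
blitz_ev = expected_yards_allowed("Blitz", 0.5)

print(round(run_ev, 2), round(pass_ev, 2), round(normal_ev, 2), round(blitz_ev, 2))
```

Because the opponent's mix is fixed, each side does best by always choosing the call with the better expected value, which is why the answers come out as 100% Run and 100% Blitz.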

# Problem 2

## Using some of the code examples above, and/or additional code, explore the NBA data files we worked on in class.

## Extract the records of any 3 of the following NBA greats:
### Wilt Chamberlain
### Jerry West ("The Logo")
### Kareem Abdul-Jabbar
### Julius Erving
### Larry Bird
### Moses Malone
### Magic Johnson
### Michael Jordan
### Kobe Bryant
### Charles Barkley
### Shaquille O'Neal
### Karl Malone
### LeBron James
### Steph Curry

## Please leave the working code as your answer!

In [41]:
player_data.loc[player_data['name'].isin(['Kobe Bryant'])]

Unnamed: 0,name,year_start,year_end,position,height,weight,birth_date,college
528,Kobe Bryant,1997,2016,G-F,6-6,212.0,"August 23, 1978",


In [40]:
ann_stats.loc[ann_stats['Player'].isin(['Kobe Bryant'])]

Unnamed: 0.1,Unnamed: 0,Year,Player,Pos,Age,Tm,G,GS,MP,PER,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
12900,12900,1997.0,Kobe Bryant,SG,18.0,LAL,71.0,6.0,1103.0,14.4,...,0.819,47.0,85.0,132.0,91.0,49.0,23.0,112.0,102.0,539.0
13479,13479,1998.0,Kobe Bryant,SG,19.0,LAL,79.0,1.0,2056.0,18.5,...,0.794,79.0,163.0,242.0,199.0,74.0,40.0,157.0,180.0,1220.0
14021,14021,1999.0,Kobe Bryant,SG,20.0,LAL,50.0,50.0,1896.0,18.9,...,0.839,53.0,211.0,264.0,190.0,72.0,50.0,157.0,153.0,996.0
14537,14537,2000.0,Kobe Bryant,SG,21.0,LAL,66.0,62.0,2524.0,21.7,...,0.821,108.0,308.0,416.0,323.0,106.0,62.0,182.0,220.0,1485.0
15028,15028,2001.0,Kobe Bryant,SG,22.0,LAL,68.0,68.0,2783.0,24.5,...,0.853,104.0,295.0,399.0,338.0,114.0,43.0,220.0,222.0,1938.0
15578,15578,2002.0,Kobe Bryant,SG,23.0,LAL,80.0,80.0,3063.0,23.2,...,0.829,112.0,329.0,441.0,438.0,118.0,35.0,223.0,228.0,2019.0
16070,16070,2003.0,Kobe Bryant,SG,24.0,LAL,82.0,82.0,3401.0,26.2,...,0.843,106.0,458.0,564.0,481.0,181.0,67.0,288.0,218.0,2461.0
16576,16576,2004.0,Kobe Bryant,SG,25.0,LAL,65.0,64.0,2447.0,23.7,...,0.852,103.0,256.0,359.0,330.0,112.0,28.0,171.0,176.0,1557.0
17159,17159,2005.0,Kobe Bryant,SG,26.0,LAL,66.0,66.0,2689.0,23.3,...,0.816,95.0,297.0,392.0,398.0,86.0,53.0,270.0,174.0,1819.0
17742,17742,2006.0,Kobe Bryant,SG,27.0,LAL,80.0,80.0,3277.0,28.0,...,0.85,71.0,354.0,425.0,360.0,147.0,30.0,250.0,233.0,2832.0


In [42]:
players.loc[players['Player'].isin(['Kobe Bryant'])]

Unnamed: 0.1,Unnamed: 0,Player,height,weight,collage,born,birth_city,birth_state
2456,2456,Kobe Bryant,198.0,96.0,,1978.0,Philadelphia,Pennsylvania


In [43]:
player_data.loc[player_data['name'].isin(['Michael Jordan'])]

Unnamed: 0,name,year_start,year_end,position,height,weight,birth_date,college
2116,Michael Jordan,1985,2003,G-F,6-6,195.0,"February 17, 1963",University of North Carolina


In [48]:
player_data.loc[player_data['name'].isin(['Magic Johnson'])]

Unnamed: 0,name,year_start,year_end,position,height,weight,birth_date,college
2033,Magic Johnson,1980,1996,G-F,6-9,215.0,"August 14, 1959",Michigan State University
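Since `.isin()` accepts a list, the three separate lookups could also be combined into a single call. A minimal sketch on a made-up four-row sample (the names and start years are taken from the outputs above):

```python
import pandas as pd

sample = pd.DataFrame({
    "name": ["Kobe Bryant", "Magic Johnson", "Michael Jordan", "Alex Abrines"],
    "year_start": [1997, 1980, 1985, 2017],
})

greats = ["Kobe Bryant", "Michael Jordan", "Magic Johnson"]

# One boolean mask covers all three names; rows keep their original order
selected = sample.loc[sample["name"].isin(greats)]
print(selected)
```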
