#### Pandas continued from class 12..

Table of Contents

    Introduction
    Creating Objects
    Viewing Data
    Selection
    Manipulating Data
    Grouping Data
    Merging, Joining and Concatenating
    Working with Date and Time
    Working With Text Data
    Working with CSV and Excel files
    Operations
    Visualization
    Applications and Projects
    Miscellaneous 

#### 3. Viewing Data in Pandas

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

The pandas head() method returns the top n rows (5 by default) of a DataFrame or Series.

    Syntax: Dataframe.head(n=5)


    Parameters:
    n: integer value, number of rows to be returned

    Return type: Dataframe with top n rows 

In [2]:
import pandas as pd
data = pd.read_csv(r"nba.csv", encoding='utf-8')

In [3]:
data.head()

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0


In [4]:
#To display first 10 rows
data.head(10)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
5,Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
6,Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0
7,Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0
8,Terry Rozier,Boston Celtics,12.0,PG,22.0,6-2,190.0,Louisville,1824360.0
9,Marcus Smart,Boston Celtics,36.0,PG,22.0,6-4,220.0,Oklahoma State,3431040.0


In [6]:
# number of rows to return 
n = 9

series = data["Name"] 
  
# returning top n rows 
top = series.head(n = n) 
  
# display 
top 

0    Avery Bradley
1      Jae Crowder
2     John Holland
3      R.J. Hunter
4    Jonas Jerebko
5     Amir Johnson
6    Jordan Mickey
7     Kelly Olynyk
8     Terry Rozier
Name: Name, dtype: object

In [7]:
#We can also display the data from last 10 rows
data.tail(10)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
448,Gordon Hayward,Utah Jazz,20.0,SF,26.0,6-8,226.0,Butler,15409570.0
449,Rodney Hood,Utah Jazz,5.0,SG,23.0,6-8,206.0,Duke,1348440.0
450,Joe Ingles,Utah Jazz,2.0,SF,28.0,6-8,226.0,,2050000.0
451,Chris Johnson,Utah Jazz,23.0,SF,26.0,6-6,206.0,Dayton,981348.0
452,Trey Lyles,Utah Jazz,41.0,PF,20.0,6-10,234.0,Kentucky,2239800.0
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
456,Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0
457,,,,,,,,,


#### Calling tail() on a Series with the n parameter

In this example, the .tail() method is called on a Series with n set to 12, which returns the bottom 12 rows of the Series.

In [8]:

# importing pandas module 
import pandas as pd 
  
# making data frame 
data = pd.read_csv("nba.csv") 
  
# number of rows to return 
n = 12
  
# creating series 
series = data["Salary"] 
  
# returning bottom n rows 
bottom = series.tail(n = n) 
  
# display 
bottom 


446    12000000.0
447     1175880.0
448    15409570.0
449     1348440.0
450     2050000.0
451      981348.0
452     2239800.0
453     2433333.0
454      900000.0
455     2900000.0
456      947276.0
457           NaN
Name: Salary, dtype: float64
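
Both head() and tail() also accept a negative n, which is easy to overlook. The sketch below is only an illustration on a made-up five-row frame (the `toy` data is an assumption, not taken from nba.csv): head(-2) returns everything except the last two rows, and tail(-2) returns everything except the first two.

```python
import pandas as pd

# Toy frame (hypothetical data, not taken from nba.csv)
toy = pd.DataFrame({"Name": ["A", "B", "C", "D", "E"],
                    "Salary": [10.0, 20.0, 30.0, 40.0, 50.0]})

first_two = toy.head(2)            # rows at positions 0 and 1
last_two = toy.tail(2)             # rows at positions 3 and 4
all_but_last_two = toy.head(-2)    # rows at positions 0, 1, 2
all_but_first_two = toy.tail(-2)   # rows at positions 2, 3, 4

print(all_but_last_two)
print(all_but_first_two)
```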

#### Pandas Dataframe.describe() method

Pandas describe() is used to view basic statistical details such as percentiles, mean and std of a DataFrame or a Series of numeric values. When it is applied to a Series of strings, it returns a different summary (count, unique, top, freq), as shown in the examples below.

    Syntax: DataFrame.describe(percentiles=None, include=None, exclude=None)


    Parameters:
    percentiles: List-like of numbers between 0 and 1 giving the percentiles to return
    include: List of data types to be included while describing the dataframe. Default is None
    exclude: List of data types to be excluded while describing the dataframe. Default is None

    Return type: Statistical summary of data frame.

In [9]:

# importing pandas module  
import pandas as pd  
      
# making data frame  
data = pd.read_csv("nba.csv")  
    
# removing null values to avoid errors  
data.dropna(inplace = True)  
  
# percentile list 
perc =[.20, .40, .60, .80] 
  
# list of dtypes to include 
include =['object', 'float', 'int'] 
  
# calling describe method 
desc = data.describe(percentiles = perc, include = include) 
  
# display 
desc

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
count,364,364,364.0,364,364.0,364,364.0,364,364.0
unique,364,30,,5,,17,,115,
top,Ramon Sessions,New Orleans Pelicans,,SG,,6-9,,Kentucky,
freq,1,16,,87,,49,,22,
mean,,,16.82967,,26.615385,,219.785714,,4620311.0
std,,,14.994162,,4.233591,,24.793099,,5119716.0
min,,,0.0,,19.0,,161.0,,55722.0
20%,,,4.0,,23.0,,195.0,,947276.0
40%,,,9.0,,25.0,,212.0,,1638754.0
50%,,,12.0,,26.0,,220.0,,2515440.0


#### Describing series of strings

In this example, the describe method is called by the Name column to see the behaviour with object data type

In [10]:

# importing pandas module  
import pandas as pd  
  
#making data frame  
data = pd.read_csv(r"nba.csv")  
    
# removing null values to avoid errors  
data.dropna(inplace = True)  
  
# calling describe method 
desc = data["Name"].describe() 
  
# display 
desc 


count                364
unique               364
top       Ramon Sessions
freq                   1
Name: Name, dtype: object
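
To see how the include parameter changes the result without needing nba.csv, here is a minimal sketch on a made-up frame (the `toy` data and variable names are assumptions for illustration). By default only numeric columns are summarised; include="all" puts numeric and string columns in one table.

```python
import pandas as pd

# Toy frame (hypothetical data)
toy = pd.DataFrame({"Team": ["X", "X", "Y", "Z"],
                    "Age": [20, 25, 30, 35]})

numeric_only = toy.describe()              # default: numeric columns only
everything = toy.describe(include="all")   # numeric and string columns together
team_summary = toy["Team"].describe()      # count, unique, top, freq

print(numeric_only)
print(everything)
print(team_summary)
```

In the combined table, statistics that do not apply to a column (for example mean for Team) are shown as NaN.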

#### Pandas Dataframe.to_numpy() – Convert dataframe to Numpy array

Pandas DataFrame is two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). This data structure can be converted to NumPy ndarray with the help of Dataframe.to_numpy() method.

    Syntax: Dataframe.to_numpy(dtype = None, copy = False)

    Parameters:
    dtype: Data type to convert to, such as str.
    copy: [bool, default False] Ensures that the returned value is not a view on another array.


    Returns:
    numpy.ndarray 

In [15]:

# importing pandas 
import pandas as pd  
   
# reading the csv   
data = pd.read_csv("nba.csv")  
      
data.dropna(inplace = True) 
   
# creating DataFrame from the Weight column 
gfg = pd.DataFrame(data['Weight'].head()) 
print(gfg)
# type of the converted object 
print(type(gfg.to_numpy())) 
print()

# the converted array itself 
print(gfg.to_numpy()) 

# the type is the same on a second call 
print(type(gfg.to_numpy())) 

   Weight
0   180.0
1   235.0
3   185.0
6   235.0
7   238.0
<class 'numpy.ndarray'>

[[180.]
 [235.]
 [185.]
 [235.]
 [238.]]
<class 'numpy.ndarray'>


In [18]:
#We can also specify dtypes while conversion

# importing pandas 
import pandas as pd  
   
# read csv file   
data = pd.read_csv("nba.csv")  
      
data.dropna(inplace = True) 
   
# creating DataFrame from the Weight column 
gfg = pd.DataFrame(data['Weight'].head()) 

# using to_numpy() 
print(type(gfg.to_numpy())) 

# providing dtype 
print(gfg.to_numpy(dtype ='float32')) 

# using to_numpy() 
print(type(gfg.to_numpy())) 

<class 'numpy.ndarray'>
[[180.]
 [235.]
 [185.]
 [235.]
 [238.]]
<class 'numpy.ndarray'>
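
The printed arrays above look the same with and without dtype, so it helps to inspect the dtype attribute directly. The following sketch uses a small made-up frame (an assumption, not nba.csv) and also shows what happens when the columns have different types: the resulting array falls back to the generic object dtype.

```python
import pandas as pd
import numpy as np

# Toy frame (hypothetical data) with one float column and one string column
toy = pd.DataFrame({"Weight": [180.0, 235.0], "Team": ["X", "Y"]})

numeric = toy[["Weight"]].to_numpy()                  # float64 array, shape (2, 1)
mixed = toy.to_numpy()                                # mixed columns -> object array
as_f32 = toy[["Weight"]].to_numpy(dtype="float32")    # explicit dtype

print(numeric.dtype, numeric.shape)
print(mixed.dtype, mixed.shape)
print(as_f32.dtype)
```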


#### Pandas Series.to_numpy()

Pandas Series.to_numpy() function is used to return a NumPy ndarray representing the values in given Series or Index.

This section shows how to convert a pandas Series to a NumPy array. The conversion is simple, but note the difference between the two: a Series carries an index alongside its values, whereas the resulting NumPy array holds only the values.

    Syntax: Series.to_numpy(dtype=None, copy=False)

    Parameters:
    dtype: Data type to convert to, such as str.
    copy: [bool, default False] Ensures that the returned value is not a view on another array.

The example below converts a Series to a NumPy array with Series.to_numpy(). When dealing with a lot of data, clean the data first; here the rows with missing values are dropped and only the first five values of the Weight column are used, via the .head() method.

In [20]:
# importing pandas 
import pandas as pd  
  
# reading the csv   
data = pd.read_csv("nba.csv")  
     
data.dropna(inplace = True) 
  
# creating series from the Weight column 
gfg = pd.Series(data['Weight'].head()) 
  
print(gfg.to_numpy())

# using to_numpy() function 
print(type(gfg.to_numpy())) 


[180. 235. 185. 235. 238.]
<class 'numpy.ndarray'>


#### Pandas Series.as_matrix()

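Pandas Series.as_matrix() function converts the given Series or DataFrame object to its NumPy array representation. It was deprecated in pandas 0.23.0 and removed in pandas 1.0; on current versions use Series.to_numpy() (or the .values attribute) instead.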

    Syntax: Series.as_matrix(columns=None)


    Parameter :
    columns : If None, return all columns, otherwise, returns specified columns.

    Returns : values : ndarray


In [22]:

# importing pandas as pd 
import pandas as pd 
  
# Creating the Series 
sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio']) 
  
# Create the Index 
index_ = ['City 1', 'City 2', 'City 3', 'City 4', 'City 5']  
  
# set the index 
sr.index = index_ 
  
# Print the series 
print(sr) 


# return numpy array representation 
# (as_matrix() was removed in pandas 1.0; to_numpy() is the replacement)
result = sr.to_numpy() 
  
# Print the result 
print(result) 


City 1    New York
City 2     Chicago
City 3     Toronto
City 4      Lisbon
City 5         Rio
dtype: object
['New York' 'Chicago' 'Toronto' 'Lisbon' 'Rio']




#### 4. Selection of Data

#### Column Selection:
To select a column in a pandas DataFrame, access it by its column name. Passing a list of names selects several columns at once.

In [23]:

# Import pandas package 
import pandas as pd 
  
# Define a dictionary containing employee data 
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 
        'Age':[27, 24, 22, 32], 
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'], 
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']} 
  
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data) 
  
# select two columns 
print(df[['Name', 'Qualification']]) 


     Name Qualification
0     Jai           Msc
1  Princi            MA
2  Gaurav           MCA
3    Anuj           Phd
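
One detail worth spelling out is the difference between single and double brackets. As a small sketch (the two-row frame is a cut-down, illustrative version of the data above): a single name returns a Series, while a list of names returns a DataFrame, even when the list holds only one name.

```python
import pandas as pd

# Cut-down version of the employee data, for illustration only
df = pd.DataFrame({"Name": ["Jai", "Princi"], "Age": [27, 24]})

one_col_series = df["Name"]     # single brackets -> Series
one_col_frame = df[["Name"]]    # list of names   -> DataFrame with one column

print(type(one_col_series).__name__)
print(type(one_col_frame).__name__)
```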


#### Column Addition:
To add a column to a pandas DataFrame, we can declare a new list and assign it as a column of an existing DataFrame.

In [24]:
# Import pandas package  
import pandas as pd 
  
# Define a dictionary containing Students data 
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'], 
        'Height': [5.1, 6.2, 5.1, 5.2], 
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']} 
  
# Convert the dictionary into DataFrame 
df = pd.DataFrame(data) 
  
# Declare a list that is to be converted into a column 
address = ['Delhi', 'Bangalore', 'Chennai', 'Patna'] 
  
# Using 'Address' as the column name 
# and equating it to the list 
df['Address'] = address 
  
# Observe the result 
print(df) 


     Name  Height Qualification    Address
0     Jai     5.1           Msc      Delhi
1  Princi     6.2            MA  Bangalore
2  Gaurav     5.1           Msc    Chennai
3    Anuj     5.2           Msc      Patna
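
Assignment with df['Address'] = ... always appends the column at the end and changes df itself. Two alternatives are sketched below on a shortened, illustrative version of the data (the 'City' column is a made-up name): assign() returns a new DataFrame and leaves the original untouched, and insert() places the column at a chosen position.

```python
import pandas as pd

# Shortened version of the students data, for illustration only
df = pd.DataFrame({"Name": ["Jai", "Princi"], "Height": [5.1, 6.2]})

# assign() returns a new DataFrame; df is not modified
with_address = df.assign(Address=["Delhi", "Bangalore"])

# insert() modifies df in place and puts the new column at position 1
df.insert(1, "City", ["Delhi", "Bangalore"])

print(with_address.columns.tolist())
print(df.columns.tolist())
```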


#### Column Deletion:
To delete a column from a pandas DataFrame, we can use the drop() method. Columns are deleted by passing their column names together with axis=1.

In [28]:

# importing pandas module 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name" ) 
  
print(data.head())

# dropping passed columns 
data.drop(["Team", "Weight"], axis = 1, inplace = True) 

print(data.head())

                         Team  Number Position   Age Height  Weight  \
Name                                                                  
Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0   
Jae Crowder    Boston Celtics    99.0       SF  25.0    6-6   235.0   
John Holland   Boston Celtics    30.0       SG  27.0    6-5   205.0   
R.J. Hunter    Boston Celtics    28.0       SG  22.0    6-5   185.0   
Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   

                         College     Salary  
Name                                         
Avery Bradley              Texas  7730337.0  
Jae Crowder            Marquette  6796117.0  
John Holland   Boston University        NaN  
R.J. Hunter        Georgia State  1148640.0  
Jonas Jerebko                NaN  5000000.0  
               Number Position   Age Height            College     Salary
Name                                                                     
Avery Bradley     0.0       PG  

#### Row Selection:

Pandas provides dedicated indexers for retrieving rows from a DataFrame. DataFrame.loc[] retrieves rows by index label, while DataFrame.iloc[] selects rows by integer position.

In [29]:

# importing pandas package 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
  
# retrieving row by loc method 
first = data.loc["Avery Bradley"] 
second = data.loc["R.J. Hunter"] 
  
print(first, "\n\n\n", second) 


Team        Boston Celtics
Number                   0
Position                PG
Age                     25
Height                 6-2
Weight                 180
College              Texas
Salary         7.73034e+06
Name: Avery Bradley, dtype: object 


 Team        Boston Celtics
Number                  28
Position                SG
Age                     22
Height                 6-5
Weight                 185
College      Georgia State
Salary         1.14864e+06
Name: R.J. Hunter, dtype: object


#### Row Addition:
To add a row to a pandas DataFrame, we can concatenate the old DataFrame with a new one.

In [30]:

# importing pandas module  
import pandas as pd  
    
# making data frame; Name is kept as a regular column so that it 
# lines up with the 'Name' column of the new row 
df = pd.read_csv("nba.csv")  
  
new_row = pd.DataFrame({'Name':'Geeks', 'Team':'Boston', 'Number':3, 
                        'Position':'PG', 'Age':33, 'Height':'6-2', 
                        'Weight':189, 'College':'MIT', 'Salary':99999}, 
                                                            index =[0]) 
# simply concatenate both dataframes and renumber the index 
df = pd.concat([new_row, df]).reset_index(drop = True) 
df.head(5) 


Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Geeks,Boston,3.0,PG,33.0,6-2,189.0,MIT,99999.0
1,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
2,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
3,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
4,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0


#### Row Deletion:
To delete a row from a pandas DataFrame, we can use the drop() method. Rows are deleted by passing their index labels.

In [33]:

# importing pandas module 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name" ) 

print("Before dropping")
print(data.head(10))

print("After dropping")

# dropping passed values 
data.drop(["Avery Bradley", "John Holland", "R.J. Hunter"], inplace = True) 
  
# display 
data.head(10) 


Before dropping
                         Team  Number Position   Age Height  Weight  \
Name                                                                  
Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0   
Jae Crowder    Boston Celtics    99.0       SF  25.0    6-6   235.0   
John Holland   Boston Celtics    30.0       SG  27.0    6-5   205.0   
R.J. Hunter    Boston Celtics    28.0       SG  22.0    6-5   185.0   
Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   
Amir Johnson   Boston Celtics    90.0       PF  29.0    6-9   240.0   
Jordan Mickey  Boston Celtics    55.0       PF  21.0    6-8   235.0   
Kelly Olynyk   Boston Celtics    41.0        C  25.0    7-0   238.0   
Terry Rozier   Boston Celtics    12.0       PG  22.0    6-2   190.0   
Marcus Smart   Boston Celtics    36.0       PG  22.0    6-4   220.0   

                         College      Salary  
Name                                          
Avery Bradley              Texas   77

Name,Team,Number,Position,Age,Height,Weight,College,Salary
Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0
Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0
Terry Rozier,Boston Celtics,12.0,PG,22.0,6-2,190.0,Louisville,1824360.0
Marcus Smart,Boston Celtics,36.0,PG,22.0,6-4,220.0,Oklahoma State,3431040.0
Jared Sullinger,Boston Celtics,7.0,C,24.0,6-9,260.0,Ohio State,2569260.0
Isaiah Thomas,Boston Celtics,4.0,PG,27.0,5-9,185.0,Washington,6912869.0
Evan Turner,Boston Celtics,11.0,SG,27.0,6-7,220.0,Ohio State,3425510.0
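
The examples so far call drop() with inplace = True. As a hedged sketch on a made-up three-row frame (names and values are assumptions), the same method can also return a new DataFrame, take the index= and columns= keywords instead of axis, and skip missing labels with errors="ignore".

```python
import pandas as pd

# Toy frame (hypothetical data) indexed by name
df = pd.DataFrame({"Team": ["X", "Y", "Z"], "Age": [20, 25, 30]},
                  index=["Ann", "Bob", "Cy"])

# Without inplace=True, drop() returns a new frame and df is unchanged
fewer_rows = df.drop(index=["Ann"])
fewer_cols = df.drop(columns=["Team"])

# errors="ignore" skips labels that are not present instead of raising KeyError
safe = df.drop(index=["Ann", "Nobody"], errors="ignore")

print(fewer_rows)
print(fewer_cols)
print(safe)
```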


#### Pandas Extracting rows using .loc[]

Pandas provides the DataFrame.loc[] indexer to retrieve rows from a DataFrame by label. It takes index labels and returns a row (as a Series) or a DataFrame if the labels exist in the caller DataFrame; a missing label raises a KeyError.

    Syntax: pandas.DataFrame.loc[]


    Parameters:
    Index label: String or list of string of index label of rows

    Return type: Data frame or Series depending on parameters 
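
Before the nba.csv examples, the return types can be checked on a tiny frame. This is only a sketch with made-up data; it also shows that .loc[] accepts a boolean mask for the rows and a column label as a second argument.

```python
import pandas as pd

# Toy frame (hypothetical data) indexed by name
df = pd.DataFrame({"Team": ["X", "Y", "X"], "Age": [20, 25, 30]},
                  index=["Ann", "Bob", "Cy"])

one_row = df.loc["Bob"]                   # single label    -> Series
two_rows = df.loc[["Ann", "Cy"]]          # list of labels  -> DataFrame
older = df.loc[df["Age"] > 22, "Team"]    # boolean mask for rows, label for column

print(type(one_row).__name__)
print(type(two_rows).__name__)
print(older)
```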

#### Extracting single Row

In this example, Name column is made as the index column and then two single rows are extracted one by one in the form of series using index label of rows.

In [34]:

# importing pandas package 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
  
# retrieving row by loc method 
first = data.loc["Avery Bradley"] 
second = data.loc["R.J. Hunter"] 
  
print(first, "\n\n\n", second) 


Team        Boston Celtics
Number                   0
Position                PG
Age                     25
Height                 6-2
Weight                 180
College              Texas
Salary         7.73034e+06
Name: Avery Bradley, dtype: object 


 Team        Boston Celtics
Number                  28
Position                SG
Age                     22
Height                 6-5
Weight                 185
College      Georgia State
Salary         1.14864e+06
Name: R.J. Hunter, dtype: object


#### Extracting Multiple rows

In this example, the Name column is made the index column and then two rows are extracted at the same time by passing a list of labels.

In [35]:

# importing pandas package 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
  
# retrieving rows by loc method 
rows = data.loc[["Avery Bradley", "R.J. Hunter"]] 
  
# checking data type of rows 
print(type(rows)) 
  
# display 
rows 


<class 'pandas.core.frame.DataFrame'>


Name,Team,Number,Position,Age,Height,Weight,College,Salary
Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0


#### Extracting multiple rows with same index

In this example, Team name is made as the index column and one team name is passed to .loc method to check if all values with same team name have been returned or not.

In [36]:

# importing pandas package 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Team") 
  
# retrieving rows by loc method 
rows = data.loc["Utah Jazz"] 
  
# checking data type of rows 
print(type(rows)) 
  
# display 
rows 


<class 'pandas.core.frame.DataFrame'>


Team,Name,Number,Position,Age,Height,Weight,College,Salary
Utah Jazz,Trevor Booker,33.0,PF,28.0,6-8,228.0,Clemson,4775000.0
Utah Jazz,Trey Burke,3.0,PG,23.0,6-1,191.0,Michigan,2658240.0
Utah Jazz,Alec Burks,10.0,SG,24.0,6-6,214.0,Colorado,9463484.0
Utah Jazz,Dante Exum,11.0,PG,20.0,6-6,190.0,,3777720.0
Utah Jazz,Derrick Favors,15.0,PF,24.0,6-10,265.0,Georgia Tech,12000000.0
Utah Jazz,Rudy Gobert,27.0,C,23.0,7-1,245.0,,1175880.0
Utah Jazz,Gordon Hayward,20.0,SF,26.0,6-8,226.0,Butler,15409570.0
Utah Jazz,Rodney Hood,5.0,SG,23.0,6-8,206.0,Duke,1348440.0
Utah Jazz,Joe Ingles,2.0,SF,28.0,6-8,226.0,,2050000.0
Utah Jazz,Chris Johnson,23.0,SF,26.0,6-6,206.0,Dayton,981348.0


In [39]:

# importing pandas package 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="College") 

rows = data.loc["Texas"] 
  
# checking data type of rows 
print(type(rows)) 
  
# display 
rows 

<class 'pandas.core.frame.DataFrame'>


College,Name,Team,Number,Position,Age,Height,Weight,Salary
Texas,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,7730337.0
Texas,Cory Joseph,Toronto Raptors,6.0,PG,24.0,6-3,190.0,7000000.0
Texas,P.J. Tucker,Phoenix Suns,17.0,SF,31.0,6-6,245.0,5500000.0
Texas,Tristan Thompson,Cleveland Cavaliers,13.0,C,25.0,6-9,238.0,14260870.0
Texas,Myles Turner,Indiana Pacers,33.0,PF,20.0,6-11,243.0,2357760.0
Texas,Jordan Hamilton,New Orleans Pelicans,25.0,SG,25.0,6-7,220.0,1015421.0
Texas,LaMarcus Aldridge,San Antonio Spurs,12.0,PF,30.0,6-11,240.0,19689000.0
Texas,D.J. Augustin,Denver Nuggets,12.0,PG,28.0,6-0,183.0,3000000.0
Texas,Kevin Durant,Oklahoma City Thunder,35.0,SF,27.0,6-9,240.0,20158622.0


In [42]:

# importing pandas package 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Age") 

rows = data.loc[25] 
  
# checking data type of rows 
print(type(rows)) 
  
# display 
rows 

<class 'pandas.core.frame.DataFrame'>


Age,Name,Team,Number,Position,Height,Weight,College,Salary
25.0,Avery Bradley,Boston Celtics,0.0,PG,6-2,180.0,Texas,7730337.0
25.0,Jae Crowder,Boston Celtics,99.0,SF,6-6,235.0,Marquette,6796117.0
25.0,Kelly Olynyk,Boston Celtics,41.0,C,7-0,238.0,Gonzaga,2165160.0
25.0,Thomas Robinson,Brooklyn Nets,41.0,PF,6-10,237.0,Kansas,981348.0
25.0,Cleanthony Early,New York Knicks,11.0,SF,6-8,210.0,Wichita State,845059.0
25.0,Derrick Williams,New York Knicks,23.0,PF,6-8,240.0,Arizona,4000000.0
25.0,Isaiah Canaan,Philadelphia 76ers,0.0,PG,6-0,201.0,Murray State,947276.0
25.0,Robert Covington,Philadelphia 76ers,33.0,SF,6-9,215.0,Tennessee State,1000000.0
25.0,Hollis Thompson,Philadelphia 76ers,31.0,SG,6-8,206.0,Georgetown,947276.0
25.0,Terrence Ross,Toronto Raptors,31.0,SF,6-7,195.0,Washington,3553917.0


#### Extracting rows between two index labels

In this example, two index labels are passed and all the rows that fall between them are returned (both index labels inclusive).

In [43]:

# importing pandas package 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
  
# retrieving rows by loc method 
rows = data.loc["Avery Bradley":"Isaiah Thomas"] 
  
# checking data type of rows 
print(type(rows)) 
  
# display 
rows 


<class 'pandas.core.frame.DataFrame'>


Name,Team,Number,Position,Age,Height,Weight,College,Salary
Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0
Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0
Terry Rozier,Boston Celtics,12.0,PG,22.0,6-2,190.0,Louisville,1824360.0
Marcus Smart,Boston Celtics,36.0,PG,22.0,6-4,220.0,Oklahoma State,3431040.0
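
The inclusive end point is specific to label slices. As a short sketch on made-up data, the comparison below shows that a .loc[] slice includes both labels, whereas a positional .iloc[] slice excludes the stop position, like ordinary Python slicing.

```python
import pandas as pd

# Toy frame (hypothetical data) indexed by name
df = pd.DataFrame({"Age": [20, 25, 30, 35]},
                  index=["Ann", "Bob", "Cy", "Dee"])

by_label = df.loc["Bob":"Cy"]   # both end labels included  -> Bob, Cy
by_position = df.iloc[1:2]      # stop position excluded    -> Bob only

print(by_label)
print(by_position)
```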


#### Extracting rows using Pandas .iloc[]

Pandas also provides the DataFrame.iloc[] indexer, which retrieves rows by integer position. It is useful when the index labels of a data frame are something other than the numeric series 0, 1, 2, 3, …, n, or when the user doesn't know the index label. Rows are addressed by their position, counted from 0, which isn't displayed in the data frame.

    Syntax: pandas.DataFrame.iloc[]


    Parameters:
    Index Position: Index position of rows in integer or list of integer.

    Return type: Data frame or Series depending on parameters 

#### Extracting single row and comparing with .loc[]

In this example, the row with the same index number is extracted by both .iloc[] and .loc[] and the results are compared. Since the default index is numeric, the index labels are also integers.

In [46]:

# importing pandas package 
import pandas as pd 
  
# making data frame from csv file  
data = pd.read_csv("nba.csv") 
  
# retrieving rows by loc method  
row1 = data.loc[3] 
  
# retrieving rows by iloc method 
row2 = data.iloc[3] 

print(row1)
print()
print(row2)
# checking if values are equal 
row1 == row2 


Name           R.J. Hunter
Team        Boston Celtics
Number                  28
Position                SG
Age                     22
Height                 6-5
Weight                 185
College      Georgia State
Salary         1.14864e+06
Name: 3, dtype: object

Name           R.J. Hunter
Team        Boston Celtics
Number                  28
Position                SG
Age                     22
Height                 6-5
Weight                 185
College      Georgia State
Salary         1.14864e+06
Name: 3, dtype: object


Name        True
Team        True
Number      True
Position    True
Age         True
Height      True
Weight      True
College     True
Salary      True
Name: 3, dtype: bool

#### Extracting multiple rows with index

In this example, multiple rows are extracted first by passing a list of positions and then by passing a slice of integers to extract the rows in that range. After that, the two results are compared.

In [48]:

# importing pandas package 
import pandas as pd 
  
# making data frame from csv file  
data = pd.read_csv("nba.csv") 
  
# retrieving rows by iloc with a list of positions  
row1 = data.iloc[[4, 5, 6, 7]] 

print(row1)

# retrieving rows by iloc with a slice (stop position excluded)  
row2 = data.iloc[4:8] 

print(row2)

# comparing values 
row1 == row2 

            Name            Team  Number Position   Age Height  Weight  \
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   
5   Amir Johnson  Boston Celtics    90.0       PF  29.0    6-9   240.0   
6  Jordan Mickey  Boston Celtics    55.0       PF  21.0    6-8   235.0   
7   Kelly Olynyk  Boston Celtics    41.0        C  25.0    7-0   238.0   

   College      Salary  
4      NaN   5000000.0  
5      NaN  12000000.0  
6      LSU   1170960.0  
7  Gonzaga   2165160.0  
            Name            Team  Number Position   Age Height  Weight  \
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   
5   Amir Johnson  Boston Celtics    90.0       PF  29.0    6-9   240.0   
6  Jordan Mickey  Boston Celtics    55.0       PF  21.0    6-8   235.0   
7   Kelly Olynyk  Boston Celtics    41.0        C  25.0    7-0   238.0   

   College      Salary  
4      NaN   5000000.0  
5      NaN  12000000.0  
6      LSU   1170960.0  
7  Gonzaga   2165160.0  


Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
4,True,True,True,True,True,True,True,False,True
5,True,True,True,True,True,True,True,False,True
6,True,True,True,True,True,True,True,True,True
7,True,True,True,True,True,True,True,True,True
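
The False entries in the College column for rows 4 and 5 do not mean the rows differ: those cells are NaN, and NaN never compares equal to NaN with ==. A minimal sketch of this, using a two-row made-up frame with the same values, is given below; DataFrame.equals() treats NaN values in the same location as equal and is one way to compare whole frames.

```python
import pandas as pd
import numpy as np

# Two rows mirroring the College/Salary values above (illustrative only)
df = pd.DataFrame({"College": [np.nan, "LSU"],
                   "Salary": [5000000.0, 1170960.0]})

a = df.iloc[[0, 1]]
b = df.iloc[0:2]

elementwise = a == b    # NaN == NaN is False, so College in row 0 is False
same = a.equals(b)      # equals() counts NaN in the same location as equal

print(elementwise)
print(same)
```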


#### Pandas Indexing using [ ], .loc[], .iloc[ ], .ix[ ]


    Dataframe[ ] : This is also known as the indexing operator
    Dataframe.loc[ ] : This indexer is used for labels.
    Dataframe.iloc[ ] : This indexer is used for positions (integer based)
    Dataframe.ix[ ] : This indexer accepted both labels and integer positions; it was deprecated in pandas 0.20 and removed in pandas 1.0

Collectively, they are called the indexers
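
On pandas versions where .ix[] is not available, mixing a label with a position can be done by translating one into the other first. The sketch below, on made-up data, shows two possible ways; it is an illustration rather than the only approach.

```python
import pandas as pd

# Toy frame (hypothetical data) indexed by name
df = pd.DataFrame({"Team": ["X", "Y", "Z"], "Age": [20, 25, 30]},
                  index=["Ann", "Bob", "Cy"])

# Row by label, column by position: turn the position into a label for .loc[]
by_loc = df.loc["Bob", df.columns[1]]

# Row by position, column by label: turn the label into a position for .iloc[]
by_iloc = df.iloc[1, df.columns.get_loc("Age")]

print(by_loc, by_iloc)
```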

In [49]:
# importing pandas package 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
  
# retrieving columns by indexing operator 
first = data["Age"] 
  
print(first) 


Name
Avery Bradley              25.0
Jae Crowder                25.0
John Holland               27.0
R.J. Hunter                22.0
Jonas Jerebko              29.0
Amir Johnson               29.0
Jordan Mickey              21.0
Kelly Olynyk               25.0
Terry Rozier               22.0
Marcus Smart               22.0
Jared Sullinger            24.0
Isaiah Thomas              27.0
Evan Turner                27.0
James Young                20.0
Tyler Zeller               26.0
Bojan Bogdanovic           27.0
Markel Brown               24.0
Wayne Ellington            28.0
Rondae Hollis-Jefferson    21.0
Jarrett Jack               32.0
Sergey Karasev             22.0
Sean Kilpatrick            26.0
Shane Larkin               23.0
Brook Lopez                28.0
Chris McCullough           21.0
Willie Reed                26.0
Thomas Robinson            25.0
Henry Sims                 26.0
Donald Sloan               28.0
Thaddeus Young             27.0
                           ... 
Al-

In [50]:
# importing pandas package 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
  
# retrieving multiple columns by indexing operator 
first = data[["Age", "College", "Salary"]] 
 
first

Name,Age,College,Salary
Avery Bradley,25.0,Texas,7730337.0
Jae Crowder,25.0,Marquette,6796117.0
John Holland,27.0,Boston University,
R.J. Hunter,22.0,Georgia State,1148640.0
Jonas Jerebko,29.0,,5000000.0
Amir Johnson,29.0,,12000000.0
Jordan Mickey,21.0,LSU,1170960.0
Kelly Olynyk,25.0,Gonzaga,2165160.0
Terry Rozier,22.0,Louisville,1824360.0
Marcus Smart,22.0,Oklahoma State,3431040.0


In [51]:

# importing pandas package 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
  
# retrieving row by loc method 
first = data.loc["Avery Bradley"] 
second = data.loc["R.J. Hunter"] 
  
print(first, "\n\n\n", second) 


Team        Boston Celtics
Number                   0
Position                PG
Age                     25
Height                 6-2
Weight                 180
College              Texas
Salary         7.73034e+06
Name: Avery Bradley, dtype: object 


 Team        Boston Celtics
Number                  28
Position                SG
Age                     22
Height                 6-5
Weight                 185
College      Georgia State
Salary         1.14864e+06
Name: R.J. Hunter, dtype: object


In [52]:
#Selecting two rows and three columns 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
  
# retrieving two rows and three columns by loc method 
first = data.loc[["Avery Bradley", "R.J. Hunter"], 
                   ["Team", "Number", "Position"]] 
  
  
  
print(first) 

                         Team  Number Position
Name                                          
Avery Bradley  Boston Celtics     0.0       PG
R.J. Hunter    Boston Celtics    28.0       SG


In [53]:
#Selecting all of the rows and some columns

import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
  
# retrieving all rows and some columns by loc method 
first = data.loc[:, ["Team", "Number", "Position"]] 
  
  
  
print(first) 


                                           Team  Number Position
Name                                                            
Avery Bradley                    Boston Celtics     0.0       PG
Jae Crowder                      Boston Celtics    99.0       SF
John Holland                     Boston Celtics    30.0       SG
R.J. Hunter                      Boston Celtics    28.0       SG
Jonas Jerebko                    Boston Celtics     8.0       PF
Amir Johnson                     Boston Celtics    90.0       PF
Jordan Mickey                    Boston Celtics    55.0       PF
Kelly Olynyk                     Boston Celtics    41.0        C
Terry Rozier                     Boston Celtics    12.0       PG
Marcus Smart                     Boston Celtics    36.0       PG
Jared Sullinger                  Boston Celtics     7.0        C
Isaiah Thomas                    Boston Celtics     4.0       PG
Evan Turner                      Boston Celtics    11.0       SG
James Young              

In [55]:

import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
  
#retrieving rows by iloc method  
row2 = data.iloc[3]  
print(row2) 


row2 = data.iloc[3:9]  
print(row2) 

row2 = data.iloc[19:25]  
print(row2) 

Team        Boston Celtics
Number                  28
Position                SG
Age                     22
Height                 6-5
Weight                 185
College      Georgia State
Salary         1.14864e+06
Name: R.J. Hunter, dtype: object
                         Team  Number Position   Age Height  Weight  \
Name                                                                  
R.J. Hunter    Boston Celtics    28.0       SG  22.0    6-5   185.0   
Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   
Amir Johnson   Boston Celtics    90.0       PF  29.0    6-9   240.0   
Jordan Mickey  Boston Celtics    55.0       PF  21.0    6-8   235.0   
Kelly Olynyk   Boston Celtics    41.0        C  25.0    7-0   238.0   
Terry Rozier   Boston Celtics    12.0       PG  22.0    6-2   190.0   

                     College      Salary  
Name                                      
R.J. Hunter    Georgia State   1148640.0  
Jonas Jerebko            NaN   5000000.0  
Amir Johns

In [56]:

import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
  
# retrieving multiple rows by iloc method
row2 = data.iloc[[3, 5, 7]]
  
row2 


Name,Team,Number,Position,Age,Height,Weight,College,Salary
R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0


In [57]:
#Selecting two rows and two columns 
import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
  
# retrieving two rows and two columns by iloc method
row2 = data.iloc[[3, 4], [1, 2]]
 
print(row2) 


               Number Position
Name                          
R.J. Hunter      28.0       SG
Jonas Jerebko     8.0       PF


In [58]:
#Selecting all the rows and some columns

import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
   
# retrieving all rows and some columns by iloc method
row2 = data.iloc[:, [1, 2]]
   
print(row2) 


                         Number Position
Name                                    
Avery Bradley               0.0       PG
Jae Crowder                99.0       SF
John Holland               30.0       SG
R.J. Hunter                28.0       SG
Jonas Jerebko               8.0       PF
Amir Johnson               90.0       PF
Jordan Mickey              55.0       PF
Kelly Olynyk               41.0        C
Terry Rozier               12.0       PG
Marcus Smart               36.0       PG
Jared Sullinger             7.0        C
Isaiah Thomas               4.0       PG
Evan Turner                11.0       SG
James Young                13.0       SG
Tyler Zeller               44.0        C
Bojan Bogdanovic           44.0       SG
Markel Brown               22.0       SG
Wayne Ellington            21.0       SG
Rondae Hollis-Jefferson    24.0       SG
Jarrett Jack                2.0       PG
Sergey Karasev             10.0       SG
Sean Kilpatrick             6.0       SG
Shane Larkin    
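
The .loc[] and .iloc[] examples above both accept slices, but the two treat the end of a slice differently: a label slice with .loc[] includes both endpoints, while a position slice with .iloc[] excludes the stop position. The following is a minimal sketch of that difference. It uses a small hand-made frame with hypothetical labels (p0 to p3) rather than nba.csv, so that it runs on its own.

```python
import pandas as pd

# Small stand-in frame with hypothetical labels and values (not read from nba.csv)
players = pd.DataFrame(
    {"Team": ["A", "A", "B", "B"], "Number": [0, 99, 30, 28]},
    index=["p0", "p1", "p2", "p3"],
)

# Label slice: both endpoints "p1" and "p2" are included
by_label = players.loc["p1":"p2"]

# Position slice: the stop position 2 is excluded, so only position 1 is returned
by_position = players.iloc[1:2]

print(by_label)
print(by_position)
```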

#### Indexing a DataFrame using Dataframe.ix[ ] :

In [59]:
#Selecting a single row using .ix[] as .loc[]

# importing pandas package 
import pandas as pd 
   
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
   
# retrieving row by ix method 
first = data.ix["Avery Bradley"] 
  
   
   
print(first) 

Team        Boston Celtics
Number                   0
Position                PG
Age                     25
Height                 6-2
Weight                 180
College              Texas
Salary         7.73034e+06
Name: Avery Bradley, dtype: object


.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated


In [60]:

# importing pandas package 
import pandas as pd 
   
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
   
# retrieving row by ix method 
first = data.ix[1] 
  
   
   
print(first) 


Team        Boston Celtics
Number                  99
Position                SF
Age                     25
Height                 6-6
Weight                 235
College          Marquette
Salary         6.79612e+06
Name: Jae Crowder, dtype: object


.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
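
The warning above says to use .loc[] for labels and .iloc[] for positions, and .ix[] was removed entirely in pandas 1.0, so the two .ix[] calls in this section fail on current versions. As a sketch of the replacement, the label lookup becomes .loc[] and the integer lookup becomes .iloc[]. The two-row frame below is a stand-in built from the first two rows shown earlier, an assumption made so the example runs without nba.csv.

```python
import pandas as pd

# Two-row stand-in for the nba.csv frame indexed by "Name"
data = pd.DataFrame(
    {"Team": ["Boston Celtics", "Boston Celtics"], "Number": [0.0, 99.0]},
    index=pd.Index(["Avery Bradley", "Jae Crowder"], name="Name"),
)

# data.ix["Avery Bradley"] was a label lookup, so use .loc[]
first = data.loc["Avery Bradley"]

# data.ix[1] was a positional lookup here, so use .iloc[]
second = data.iloc[1]

print(first)
print(second)
```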
  


#### Methods for indexing in DataFrame

Function 	Description

Dataframe.head() 	Return top n rows of a data frame.

Dataframe.tail() 	Return bottom n rows of a data frame.

Dataframe.at[] 	Access a single value for a row/column label pair.

Dataframe.iat[] 	Access a single value for a row/column pair by integer position.

Dataframe.iloc[] 	Purely integer-location based indexing for selection by position.

DataFrame.lookup() 	Label-based “fancy indexing” function for DataFrame.

DataFrame.pop() 	Return item and drop from frame.

DataFrame.xs() 	Returns a cross-section (row(s) or column(s)) from the DataFrame.

DataFrame.get() 	Get item from object for given key (DataFrame column, Panel slice, etc.).

DataFrame.isin() 	Return boolean DataFrame showing whether each element in the DataFrame is contained in values.

DataFrame.where() 	Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.

DataFrame.mask() 	Return an object of same shape as self and whose corresponding entries are from self where cond is False and otherwise are from other.

DataFrame.query() 	Query the columns of a frame with a boolean expression.

DataFrame.insert() 	Insert column into DataFrame at specified location.
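
A few of the methods listed above can be tried on a small frame. This is only an illustrative sketch: the frame reuses the names and scores from the examples in this section, and isin, where and mask are applied to a single column (the Series methods of the same names) to keep the output short.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["aparna", "pankaj", "sudhir", "Geeku"],
     "degree": ["MBA", "BCA", "M.Tech", "MBA"],
     "score": [90, 40, 80, 98]}
)

# at / iat: a single value by label pair or by integer position pair
score_by_label = df.at[1, "score"]    # row label 1, column "score"
score_by_position = df.iat[1, 2]      # second row, third column

# isin: True where the degree is one of the listed values
is_mba_or_bca = df["degree"].isin(["MBA", "BCA"])

# where keeps values where the condition is True (others become NaN);
# mask does the opposite and hides values where the condition is True
kept = df["score"].where(df["score"] >= 80)
hidden = df["score"].mask(df["score"] >= 80)

# query: filter rows with a boolean expression over column names
high = df.query("score > 85")

print(score_by_label, score_by_position)
print(is_mba_or_bca)
print(kept)
print(hidden)
print(high)
```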

#### Accessing a DataFrame with a boolean index :

In [61]:

# importing pandas as pd 
import pandas as pd 
   
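# dictionary of lists (named data_dict to avoid shadowing the built-in dict)
data_dict = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
             'degree': ["MBA", "BCA", "M.Tech", "MBA"],
             'score': [90, 40, 80, 98]}

# boolean values are used as the index labels
df = pd.DataFrame(data_dict, index=[True, False, True, False])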
   
print(df) 


         name  degree  score
True   aparna     MBA     90
False  pankaj     BCA     40
True   sudhir  M.Tech     80
False   Geeku     MBA     98


In [62]:
# accessing a dataframe using .loc[] function  
print(df.loc[True]) 

        name  degree  score
True  aparna     MBA     90
True  sudhir  M.Tech     80


In [63]:
# accessing a dataframe using .iloc[] function  
print(df.iloc[True]) 

TypeError: Cannot index by location index with a non-integer key

In [64]:
# accessing a dataframe using .iloc[] function  
print(df.iloc[1]) 

name      pankaj
degree       BCA
score         40
dtype: object


In [65]:
# accessing a dataframe using .ix[] function 
print(df.ix[True]) 

        name  degree  score
True  aparna     MBA     90
True  sudhir  M.Tech     80


.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
  


In [66]:
# accessing a dataframe using .ix[] function 
print(df.ix[1]) 

name      pankaj
degree       BCA
score         40
dtype: object


.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
  


#### Applying a boolean mask to a dataframe :

A boolean mask can be applied to a dataframe through the [] accessor (__getitem__). Pass a list of True and False values whose length equals the number of rows in the dataframe; the result contains only the rows whose position in the mask is True.

In [67]:

# importing pandas as pd 
import pandas as pd 
   
# dictionary of lists (named data_dict to avoid shadowing the built-in dict)
data_dict = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
             'degree': ["MBA", "BCA", "M.Tech", "MBA"],
             'score': [90, 40, 80, 98]}

df = pd.DataFrame(data_dict, index=[0, 1, 2, 3])
   
  
  
print(df[[True, False, True, False]]) 


     name  degree  score
0  aparna     MBA     90
2  sudhir  M.Tech     80


In [68]:

# importing pandas package 
import pandas as pd 
   
# making data frame from csv file 
data = pd.read_csv("nba.csv") 
   
df = pd.DataFrame(data, index = [0, 1, 2, 3, 4, 5, 6, 
                                 7, 8, 9, 10, 11, 12]) 
  
   
df[[True, False, True, False, True, 
    False, True, False, True, False, 
                True, False, True]] 


Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
6,Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0
8,Terry Rozier,Boston Celtics,12.0,PG,22.0,6-2,190.0,Louisville,1824360.0
10,Jared Sullinger,Boston Celtics,7.0,C,24.0,6-9,260.0,Ohio State,2569260.0
12,Evan Turner,Boston Celtics,11.0,SG,27.0,6-7,220.0,Ohio State,3425510.0
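
The same list mask also works with .loc[], and wrapping it in a Series makes it possible to invert it with the ~ operator. The sketch below shows both. The small frame and the variable names (mask, picked, rest) are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["aparna", "pankaj", "sudhir", "Geeku"],
     "score": [90, 40, 80, 98]}
)

# One flag per row; the mask must be as long as the dataframe
mask = [True, False, True, False]
assert len(mask) == len(df)

# [] and .loc[] select the same rows for a boolean list
picked = df[mask]
picked_loc = df.loc[mask]

# A plain list cannot be negated with ~, so build a Series aligned to df first
rest = df[~pd.Series(mask, index=df.index)]

print(picked)
print(rest)
```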


#### Masking data based on column value :
A dataframe can be filtered on the values of a column by applying a comparison operator such as ==, >, <, <= or >= to that column. The comparison produces a Series of True and False values, and passing that Series to the [] accessor keeps only the rows where the condition is True.

In [69]:

# importing pandas as pd 
import pandas as pd 
   
# dictionary of lists (named data_dict to avoid shadowing the built-in dict)
data_dict = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
             'degree': ["BCA", "BCA", "M.Tech", "BCA"],
             'score': [90, 40, 80, 98]}

# creating a dataframe
df = pd.DataFrame(data_dict)

# using a comparison operator for filtering of data
print(df['degree'] == 'BCA')


0     True
1     True
2    False
3     True
Name: degree, dtype: bool


In [72]:

# importing pandas package 
import pandas as pd 
   
# making data frame from csv file 
data = pd.read_csv("nba.csv", index_col ="Name") 
   
# using greater than operator for filtering of data 
print(data['Age'] > 25) 

data[data['Age'] > 25]


Name
Avery Bradley              False
Jae Crowder                False
John Holland                True
R.J. Hunter                False
Jonas Jerebko               True
Amir Johnson                True
Jordan Mickey              False
Kelly Olynyk               False
Terry Rozier               False
Marcus Smart               False
Jared Sullinger            False
Isaiah Thomas               True
Evan Turner                 True
James Young                False
Tyler Zeller                True
Bojan Bogdanovic            True
Markel Brown               False
Wayne Ellington             True
Rondae Hollis-Jefferson    False
Jarrett Jack                True
Sergey Karasev             False
Sean Kilpatrick             True
Shane Larkin               False
Brook Lopez                 True
Chris McCullough           False
Willie Reed                 True
Thomas Robinson            False
Henry Sims                  True
Donald Sloan                True
Thaddeus Young              True
     

Name,Team,Number,Position,Age,Height,Weight,College,Salary
John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
Isaiah Thomas,Boston Celtics,4.0,PG,27.0,5-9,185.0,Washington,6912869.0
Evan Turner,Boston Celtics,11.0,SG,27.0,6-7,220.0,Ohio State,3425510.0
Tyler Zeller,Boston Celtics,44.0,C,26.0,7-0,253.0,North Carolina,2616975.0
Bojan Bogdanovic,Brooklyn Nets,44.0,SG,27.0,6-8,216.0,,3425510.0
Wayne Ellington,Brooklyn Nets,21.0,SG,28.0,6-4,200.0,North Carolina,1500000.0
Jarrett Jack,Brooklyn Nets,2.0,PG,32.0,6-3,200.0,Georgia Tech,6300000.0
Sean Kilpatrick,Brooklyn Nets,6.0,SG,26.0,6-4,219.0,Cincinnati,134215.0
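
Because each comparison returns a boolean Series, several conditions can be combined element-wise with & (and), | (or) and ~ (not) before filtering. The sketch below assumes the same small name/degree/score frame used earlier in this section, and the threshold of 90 is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["aparna", "pankaj", "sudhir", "Geeku"],
     "degree": ["BCA", "BCA", "M.Tech", "BCA"],
     "score": [90, 40, 80, 98]}
)

# Each comparison yields a boolean Series aligned to df
is_bca = df["degree"] == "BCA"
high_score = df["score"] >= 90

# Combine with & and |; negate with ~
both = df[is_bca & high_score]
either = df[is_bca | high_score]
not_bca = df[~is_bca]

print(both)
print(either)
print(not_bca)
```

When the comparisons are written inline, each one needs its own parentheses, for example df[(df["degree"] == "BCA") & (df["score"] >= 90)], because & binds more tightly than == and >=.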


#### Masking data based on index value :

A dataframe can also be filtered on its index values. Comparing df.index with a value using an operator such as ==, > or < produces a boolean mask, and passing that mask to the [] accessor keeps only the rows whose index satisfies the condition.

In [71]:

# importing pandas as pd 
import pandas as pd 
   
# dictionary of lists (named data_dict to avoid shadowing the built-in dict)
data_dict = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
             'degree': ["BCA", "BCA", "M.Tech", "BCA"],
             'score': [90, 40, 80, 98]}

df = pd.DataFrame(data_dict, index=[0, 1, 2, 3])
  
mask = df.index == 0
  
print(df[mask]) 


     name degree  score
0  aparna    BCA     90


In [74]:
# importing pandas package 
import pandas as pd 
   
# making data frame from csv file 
data = pd.read_csv("nba.csv") 
  
# giving a index to a dataframe 
df = pd.DataFrame(data, index = [0, 1, 2, 3, 4, 5, 6, 
                                 7, 8, 9, 10, 11, 12]) 
  
# filtering data on index value 
mask = df.index > 7 
  
df[mask] 

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
8,Terry Rozier,Boston Celtics,12.0,PG,22.0,6-2,190.0,Louisville,1824360.0
9,Marcus Smart,Boston Celtics,36.0,PG,22.0,6-4,220.0,Oklahoma State,3431040.0
10,Jared Sullinger,Boston Celtics,7.0,C,24.0,6-9,260.0,Ohio State,2569260.0
11,Isaiah Thomas,Boston Celtics,4.0,PG,27.0,5-9,185.0,Washington,6912869.0
12,Evan Turner,Boston Celtics,11.0,SG,27.0,6-7,220.0,Ohio State,3425510.0
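
Index masks can be combined in the same way as column masks, and Index.isin() builds a mask from a list of index values. The following is a minimal sketch on the small frame from the first example in this section. The chosen index values and the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["aparna", "pankaj", "sudhir", "Geeku"],
     "score": [90, 40, 80, 98]},
    index=[0, 1, 2, 3],
)

# Rows whose index lies between 1 and 2 (both inclusive)
in_range = (df.index >= 1) & (df.index <= 2)
middle = df[in_range]

# Rows whose index is one of the listed values
chosen = df.index.isin([0, 3])
ends = df[chosen]

print(middle)
print(ends)
```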
