***Dealing with Rows and Columns in Pandas DataFrame***

(https://www.geeksforgeeks.org/dealing-with-rows-and-columns-in-pandas-dataframe/)

A DataFrame is a two-dimensional data structure: data is arranged in a table of rows and columns. We can perform basic operations on rows and columns, such as selecting, deleting, adding, and renaming.
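As a quick orientation, the four column operations named above can be sketched on a small frame. The frame `people` and its values are hypothetical and used only for illustration; each call returns a new object and leaves `people` unchanged.

```python
import pandas as pd

# Hypothetical toy frame, used only for illustration.
people = pd.DataFrame({"Name": ["Alex", "Alina"], "Age": [27, 24]})

selected = people["Name"]                                # select one column (a Series)
with_city = people.assign(City=["Paris", "London"])      # add a column
renamed = with_city.rename(columns={"City": "Address"})  # rename a column
dropped = renamed.drop(columns=["Age"])                  # delete a column

print(dropped)
```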

In [1]:
import pandas as pd
import numpy as np

In [2]:
data2 = {'Name':['Alex', 'Alina', 'Izra', 'Tona'],
        'Age':[27, 24, 22, 32],
        'Address':['Utha', 'Paris', 'London', 'Sydney'],
        'Qualification':['MS', 'MA', 'BS', 'Phd']}

In [3]:
df = pd.DataFrame(data2)

In [4]:
# select two columns
df[['Name','Age']]

Unnamed: 0,Name,Age
0,Alex,27
1,Alina,24
2,Izra,22
3,Tona,32


***Column Selection:***
    
To select a column in a pandas DataFrame, access it by its column name, for example df['Name']. Assigning a list to a new column name, as in the next cell, adds a column.

In [5]:
Field = ['Maths Major','Gene Therphy','Epidemiology','Astrophysic']
df['Field'] = Field

In [6]:
df

Unnamed: 0,Name,Age,Address,Qualification,Field
0,Alex,27,Utha,MS,Maths Major
1,Alina,24,Paris,MA,Gene Therphy
2,Izra,22,London,BS,Epidemiology
3,Tona,32,Sydney,Phd,Astrophysic


***Column Deletion:***
    
To delete a column from a pandas DataFrame, use the drop() method. Columns are deleted by passing their names together with axis=1.

In [7]:
data3 = pd.read_csv(r"c:/Users/srini/OneDrive/kaggle/nba.csv",index_col='Name')
data3

Unnamed: 0_level_0,Team,Number,Position,Age,Height,Weight,College,Salary
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
...,...,...,...,...,...,...,...,...
Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0


In [8]:
# dropping passed columns
data3.drop(['Weight','Team'],axis = 1,inplace=True)

In [9]:
data3

Unnamed: 0_level_0,Number,Position,Age,Height,College,Salary
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Avery Bradley,0.0,PG,25.0,6-2,Texas,7730337.0
Jae Crowder,99.0,SF,25.0,6-6,Marquette,6796117.0
John Holland,30.0,SG,27.0,6-5,Boston University,
R.J. Hunter,28.0,SG,22.0,6-5,Georgia State,1148640.0
Jonas Jerebko,8.0,PF,29.0,6-10,,5000000.0
...,...,...,...,...,...,...
Shelvin Mack,8.0,PG,26.0,6-3,Butler,2433333.0
Raul Neto,25.0,PG,24.0,6-1,,900000.0
Tibor Pleiss,21.0,C,26.0,7-3,,2900000.0
Jeff Withey,24.0,C,26.0,7-0,Kansas,947276.0
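The cell above modifies data3 in place. As an alternative sketch, drop() can also return a new frame and leave the original untouched. The frame `roster` below is a hypothetical stand-in for the NBA data, and the `columns=` keyword is equivalent to passing the labels with axis=1.

```python
import pandas as pd

# Hypothetical stand-in for the NBA data.
roster = pd.DataFrame(
    {"Team": ["A", "B"], "Weight": [180.0, 235.0], "Age": [25.0, 27.0]},
    index=pd.Index(["P1", "P2"], name="Name"),
)

# Returns a new frame; roster itself keeps all three columns.
trimmed = roster.drop(columns=["Weight", "Team"])

# errors="ignore" skips labels that are no longer present instead of raising.
same = trimmed.drop(columns=["Weight"], errors="ignore")

print(trimmed)
```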


***Dealing with Rows:***
    
We can perform the same basic operations on rows: selecting, deleting, adding, and renaming.

In [10]:
first = data3.loc['Avery Bradley']
second = data3.loc['R.J. Hunter']
print(first,'\n\n',second)

Number            0.0
Position           PG
Age              25.0
Height            6-2
College         Texas
Salary      7730337.0
Name: Avery Bradley, dtype: object 

 Number               28.0
Position               SG
Age                  22.0
Height                6-5
College     Georgia State
Salary          1148640.0
Name: R.J. Hunter, dtype: object


***Row Addition:***

To add a row to a pandas DataFrame, we can concatenate the existing DataFrame with a new single-row DataFrame using pd.concat().

In [14]:
new_row = pd.DataFrame({'Name':'Tiger James', 
                          'Team':'Boston', 
                          'Number':3,
                        'Position':'PG', 
                        'Age':33, 
                        'Height':'6-2',
                        'Weight':189, 
                        'College':'MIT',
                        'Salary':99999},index =[0])


In [15]:
data4 = pd.read_csv(r"c:/Users/srini/OneDrive/kaggle/nba.csv")

In [16]:
data4 =  pd.concat([new_row,data4]).reset_index(drop=True)

In [18]:
data4.head(5)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Tiger James,Boston,3.0,PG,33.0,6-2,189.0,MIT,99999.0
1,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
2,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
3,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
4,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
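The cell above places the new row first. A minimal sketch of the more common case, appending at the end, is shown here with hypothetical frames `players` and `new_row`; `ignore_index=True` renumbers the result in the same way that `reset_index(drop=True)` does above.

```python
import pandas as pd

# Hypothetical frames, used only for illustration.
players = pd.DataFrame({"Name": ["P1", "P2"], "Age": [25, 27]})
new_row = pd.DataFrame({"Name": ["P3"], "Age": [33]})

# Order in the list decides where the new row lands.
appended = pd.concat([players, new_row], ignore_index=True)

print(appended)
```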


***Row Deletion:***
    
To delete a row from a pandas DataFrame, use the drop() method. Rows are deleted by passing their index labels.

In [20]:
data5 = pd.read_csv(r"c:/Users/srini/OneDrive/kaggle/nba.csv",index_col='Name')
data5

Unnamed: 0_level_0,Team,Number,Position,Age,Height,Weight,College,Salary
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
...,...,...,...,...,...,...,...,...
Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0


In [21]:
data5.drop(['Avery Bradley','John Holland','R.J. Hunter'],inplace=True)

In [22]:
data5

Unnamed: 0_level_0,Team,Number,Position,Age,Height,Weight,College,Salary
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0
Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0
...,...,...,...,...,...,...,...,...
Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0
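A short sketch of two variations on row deletion, using a hypothetical frame `roster`: dropping by label with the `index=` keyword, and dropping by position by first looking up the labels at those positions. Both return new frames.

```python
import pandas as pd

# Hypothetical frame indexed by player name.
roster = pd.DataFrame(
    {"Age": [25, 27, 22]},
    index=pd.Index(["P1", "P2", "P3"], name="Name"),
)

# Drop rows by label.
remaining = roster.drop(index=["P1", "P3"])

# Drop the first row by position: translate the position to its label first.
without_first = roster.drop(index=roster.index[[0]])

print(remaining)
print(without_first)
```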


***How to select multiple columns in a pandas dataframe***

In [23]:
data6 = {'Name':['Alex', 'Alina', 'Izra', 'Tona'],
        'Age':[27, 24, 22, 32],
        'Address':['Utha', 'Paris', 'London', 'Sydney'],
        'Qualification':['MS', 'MA', 'BS', 'Phd']}

In [24]:
data6_df = pd.DataFrame(data6) 
data6_df

Unnamed: 0,Name,Age,Address,Qualification
0,Alex,27,Utha,MS
1,Alina,24,Paris,MA
2,Izra,22,London,BS
3,Tona,32,Sydney,Phd


In [25]:
field = ['Astophysics','Physics','Cyber security','Epidemiology']

In [26]:
data6_df['Field'] = field

In [27]:
data6_df

Unnamed: 0,Name,Age,Address,Qualification,Field
0,Alex,27,Utha,MS,Astophysics
1,Alina,24,Paris,MA,Physics
2,Izra,22,London,BS,Cyber security
3,Tona,32,Sydney,Phd,Epidemiology


In [28]:
data6_df[['Name',"Field"]]

Unnamed: 0,Name,Field
0,Alex,Astophysics
1,Alina,Physics
2,Izra,Cyber security
3,Tona,Epidemiology


In [29]:
# Select the second through fourth columns.
data6_df[data6_df.columns[1:4]]

Unnamed: 0,Age,Address,Qualification
0,27,Utha,MS
1,24,Paris,MA
2,22,London,BS
3,32,Sydney,Phd


In [30]:
data6_df.loc[0:2,'Name':'Age']

Unnamed: 0,Name,Age
0,Alex,27
1,Alina,24
2,Izra,22


In [31]:
# Select the row labeled 0 and all of its columns.
data6_df.loc[0,:]

Name                    Alex
Age                       27
Address                 Utha
Qualification             MS
Field            Astophysics
Name: 0, dtype: object
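One detail worth checking when reading the .loc cells above: label slices include both endpoints, whereas positional slices exclude the end. A minimal sketch on a hypothetical frame `frame` with the default integer index:

```python
import pandas as pd

# Hypothetical frame with the default 0..3 integer index.
frame = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Age": [27, 24, 22, 32],
        "City": ["W", "X", "Y", "Z"],
    }
)

# .loc slices by label and includes both ends: rows 0, 1, 2 and columns Name..Age.
by_label = frame.loc[0:2, "Name":"Age"]

# .iloc slices by position and excludes the end: rows 0, 1 and columns 0, 1.
by_position = frame.iloc[0:2, 0:2]

print(by_label.shape, by_position.shape)
```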

***Method #3: Using iloc[]***

(https://www.geeksforgeeks.org/how-to-select-multiple-columns-in-a-pandas-dataframe/)

In [32]:
data7 = {'Name':['Alex', 'Alina', 'Izra', 'Tona'],
        'Age':[27, 24, 22, 32],
        'Address':['Utha', 'Paris', 'London', 'Sydney'],
        'Qualification':['MS', 'MA', 'BS', 'Phd']}
data7_df = pd.DataFrame(data7)

In [33]:
# Positional slices exclude the end index,
# so 0:2 selects the first two columns.
# The leading ':' selects all rows.
data7_df.iloc[:,0:2]

Unnamed: 0,Name,Age
0,Alex,27
1,Alina,24
2,Izra,22
3,Tona,32


In [34]:
# Select a block of rows and columns with .iloc[row slice, column slice].
# The column slice 1:5 runs past the last column (position 3) and is clipped.
data7_df.iloc[0:3,1:5]

In [35]:
x = data7_df.iloc[0:2,:]
x

Unnamed: 0,Name,Age,Address,Qualification
0,Alex,27,Utha,MS
1,Alina,24,Paris,MA
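The slice `1:5` used with .iloc above extends past the last column of a four-column frame. A small sketch, on a hypothetical frame `frame`, of how .iloc treats out-of-range positions: slices are clipped, while a single out-of-range integer raises an IndexError.

```python
import pandas as pd

# Hypothetical 4x4 frame.
frame = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Age": [27, 24, 22, 32],
        "City": ["W", "X", "Y", "Z"],
        "Degree": ["MS", "MA", "BS", "PhD"],
    }
)

# The column slice runs past position 3 and is silently clipped.
clipped = frame.iloc[0:3, 1:5]

# A single out-of-range position is an error, not an empty result.
try:
    frame.iloc[:, 5]
    raised = False
except IndexError:
    raised = True

print(clipped.shape, raised)
```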


(https://www.geeksforgeeks.org/python-pandas-extracting-rows-using-loc/)

In [36]:
data_loc = pd.read_csv(r"c:/Users/srini/OneDrive/kaggle/nba.csv")

In [37]:
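# data_loc was loaded without index_col='Name', so its index holds integers
# rather than player names, and this label slice matches no rows.
rows = data_loc.loc["Avery Bradley":"Isaiah Thomas"]
print(type(rows))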

<class 'pandas.core.frame.DataFrame'>


In [38]:
rows

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
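Slicing rows by name requires the names to be the index. A minimal sketch with a hypothetical four-row frame `players` (the names and ages are illustrative, not taken from nba.csv): `set_index("Name")` moves the column into the index, after which a label slice returns the rows between the two names, inclusive.

```python
import pandas as pd

# Hypothetical frame with the default integer index.
players = pd.DataFrame(
    {"Name": ["P1", "P2", "P3", "P4"], "Age": [25, 25, 27, 22]}
)

# Make the names the row labels, then slice by label (both ends included).
by_name = players.set_index("Name")
rows = by_name.loc["P2":"P3"]

print(rows)
```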


***Extracting rows using Pandas .iloc[]***

In [39]:
row1 = data_loc.loc[3]
row2 = data_loc.iloc[3]

In [40]:
row1 == row2

Name        True
Team        True
Number      True
Position    True
Age         True
Height      True
Weight      True
College     True
Salary      True
Name: 3, dtype: bool

In [42]:
row3 = data_loc.iloc[4:8]
row3

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
5,Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
6,Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0
7,Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0


In [44]:
row4 = data_loc.iloc[[4,5,6,7]]
row4

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
5,Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
6,Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0
7,Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0


(https://www.geeksforgeeks.org/indexing-and-selecting-data-with-pandas/)

***Indexing and Selecting Data with Pandas***

Indexing in pandas means simply selecting particular rows and columns of data from a DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of each of the rows and columns. Indexing can also be known as Subset Selection.

In [45]:
ind = pd.read_csv(r"c:/Users/srini/OneDrive/kaggle/nba.csv",index_col='Name')
ind1= pd.DataFrame(ind)

In [47]:
ind1.head()

Unnamed: 0_level_0,Team,Number,Position,Age,Height,Weight,College,Salary
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0


In [48]:
first = ind1['Age']
first

Name
Avery Bradley    25.0
Jae Crowder      25.0
John Holland     27.0
R.J. Hunter      22.0
Jonas Jerebko    29.0
                 ... 
Shelvin Mack     26.0
Raul Neto        24.0
Tibor Pleiss     26.0
Jeff Withey      26.0
NaN               NaN
Name: Age, Length: 458, dtype: float64

In [49]:
mu = ind1[['Age','Team','Salary']]
mu

Unnamed: 0_level_0,Age,Team,Salary
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Avery Bradley,25.0,Boston Celtics,7730337.0
Jae Crowder,25.0,Boston Celtics,6796117.0
John Holland,27.0,Boston Celtics,
R.J. Hunter,22.0,Boston Celtics,1148640.0
Jonas Jerebko,29.0,Boston Celtics,5000000.0
...,...,...,...
Shelvin Mack,26.0,Utah Jazz,2433333.0
Raul Neto,24.0,Utah Jazz,900000.0
Tibor Pleiss,26.0,Utah Jazz,2900000.0
Jeff Withey,26.0,Utah Jazz,947276.0


***Indexing a DataFrame using .loc[ ] :***
    
The .loc indexer selects data by row and column labels. Unlike the plain indexing operator [], which selects columns when given column names, df.loc can select a subset of rows, a subset of columns, or both at once.

In [50]:
a = ind1.loc["Avery Bradley"]
b= ind1.loc["R.J. Hunter"]

In [51]:
print(a,'\n\n\n',b)

Team        Boston Celtics
Number                 0.0
Position                PG
Age                   25.0
Height                 6-2
Weight               180.0
College              Texas
Salary           7730337.0
Name: Avery Bradley, dtype: object 


 Team        Boston Celtics
Number                28.0
Position                SG
Age                   22.0
Height                 6-5
Weight               185.0
College      Georgia State
Salary           1148640.0
Name: R.J. Hunter, dtype: object


In [52]:
first = ind1[['Age','Salary','Number']]
first

Unnamed: 0_level_0,Age,Salary,Number
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Avery Bradley,25.0,7730337.0,0.0
Jae Crowder,25.0,6796117.0,99.0
John Holland,27.0,,30.0
R.J. Hunter,22.0,1148640.0,28.0
Jonas Jerebko,29.0,5000000.0,8.0
...,...,...,...
Shelvin Mack,26.0,2433333.0,8.0
Raul Neto,24.0,900000.0,25.0
Tibor Pleiss,26.0,2900000.0,21.0
Jeff Withey,26.0,947276.0,24.0


In [55]:
first1 = ind1.loc[['John Holland','Jeff Withey']]
first1

Unnamed: 0_level_0,Team,Number,Position,Age,Height,Weight,College,Salary
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0


In [56]:
#Selecting two rows and three columns
row_col = ind1.loc[['John Holland','Jeff Withey'],['Team','Age','Number']]
row_col

Unnamed: 0_level_0,Team,Age,Number
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
John Holland,Boston Celtics,27.0,30.0
Jeff Withey,Utah Jazz,26.0,24.0


In [57]:
#Selecting all of the rows and some columns
#Dataframe.loc[:, ["column1", "column2", "column3"]]
row_col1 = ind1.loc[:,['Team','Number','Position']]
row_col1

Unnamed: 0_level_0,Team,Number,Position
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Avery Bradley,Boston Celtics,0.0,PG
Jae Crowder,Boston Celtics,99.0,SF
John Holland,Boston Celtics,30.0,SG
R.J. Hunter,Boston Celtics,28.0,SG
Jonas Jerebko,Boston Celtics,8.0,PF
...,...,...,...
Shelvin Mack,Utah Jazz,8.0,PG
Raul Neto,Utah Jazz,25.0,PG
Tibor Pleiss,Utah Jazz,21.0,C
Jeff Withey,Utah Jazz,24.0,C
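The type of object that .loc returns depends on the form of the key. A short sketch on a hypothetical name-indexed frame `frame`: a single label gives a Series, a list of labels gives a DataFrame, and a row label with a column label gives a single value.

```python
import pandas as pd

# Hypothetical frame indexed by name.
frame = pd.DataFrame(
    {"Team": ["A", "B", "B"], "Age": [25, 27, 22]},
    index=pd.Index(["P1", "P2", "P3"], name="Name"),
)

one_row = frame.loc["P2"]           # single label -> Series
one_row_frame = frame.loc[["P2"]]   # list of labels -> DataFrame
one_value = frame.loc["P2", "Age"]  # row label and column label -> scalar

print(type(one_row).__name__, type(one_row_frame).__name__, one_value)
```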


***Indexing a DataFrame using .iloc[ ] :***

The .iloc indexer retrieves rows and columns by integer position. We specify the positions of the rows we want and, optionally, the positions of the columns. The df.iloc indexer works much like df.loc but uses only integer positions to make its selections.

In [58]:
c = ind1.iloc[3]
c

Team        Boston Celtics
Number                28.0
Position                SG
Age                   22.0
Height                 6-5
Weight               185.0
College      Georgia State
Salary           1148640.0
Name: R.J. Hunter, dtype: object

In [59]:
# To select multiple rows, pass a list of integers to .iloc[].
mu = ind1.iloc[[3,5,7]]
mu

Unnamed: 0_level_0,Team,Number,Position,Age,Height,Weight,College,Salary
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0


In [60]:
#Selecting two rows and two columns
mu2 = ind1.iloc[[3,5],[1,2]]
mu2

Unnamed: 0_level_0,Number,Position
Name,Unnamed: 1_level_1,Unnamed: 2_level_1
R.J. Hunter,28.0,SG
Amir Johnson,90.0,PF


In [83]:
#Selecting all the rows and some columns
mu3 = ind1.iloc[:,[2,3]]
mu3

Unnamed: 0_level_0,Position,Age
Name,Unnamed: 1_level_1,Unnamed: 2_level_1
Avery Bradley,PG,25.0
Jae Crowder,SF,25.0
John Holland,SG,27.0
R.J. Hunter,SG,22.0
Jonas Jerebko,PF,29.0
...,...,...
Shelvin Mack,PG,26.0
Raul Neto,PG,24.0
Tibor Pleiss,C,26.0
Jeff Withey,C,26.0
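Because .iloc works purely by position, it behaves the same whatever the index labels are. A minimal sketch on a hypothetical name-indexed frame `frame`, including a negative position, which counts from the end as in a Python list:

```python
import pandas as pd

# Hypothetical frame indexed by name; .iloc ignores these labels.
frame = pd.DataFrame(
    {"Team": ["A", "B", "B"], "Age": [25, 27, 22]},
    index=pd.Index(["P1", "P2", "P3"], name="Name"),
)

last_row = frame.iloc[-1]       # negative positions count from the end
top_left = frame.iloc[:2, :1]   # first two rows, first column
cell = frame.iloc[1, 1]         # second row, second column

print(last_row.name, top_left.shape, cell)
```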


***Boolean Indexing in Pandas***

(https://www.geeksforgeeks.org/boolean-indexing-in-pandas/)

In boolean indexing, we select subsets of data based on the actual values in the DataFrame rather than on row/column labels or integer positions. A boolean vector is used to filter the data.

We can filter data with boolean indexing in four ways:

- Accessing a DataFrame with a boolean index
- Applying a boolean mask to a DataFrame
- Masking data based on column value
- Masking data based on an index value

Accessing a DataFrame with a boolean index: 
    
To access a DataFrame with a boolean index, we create a DataFrame whose index contains boolean values, True or False.

In [84]:
name = {'name':["Kenny", "Ather", "Jenny", "Lisa"],
        'degree': ["MBA", "BS", "MS", "MBA"],
        'score':[8.9, 9.3, 8.4, 9.4]}
df = pd.DataFrame(name,index=[True,False,True,False])
df

Unnamed: 0,name,degree,score
True,Kenny,MBA,8.9
False,Ather,BS,9.3
True,Jenny,MS,8.4
False,Lisa,MBA,9.4


In [95]:
#Accessing a dataframe using .loc[] function
df.loc[True]

Unnamed: 0,name,degree,score
True,Kenny,MBA,8.9
True,Jenny,MS,8.4


In [96]:
#Accessing a Dataframe with a boolean index using .iloc[]
df.iloc[True] #Type Error

TypeError: Cannot index by location index with a non-integer key

In [97]:
df.iloc[1]

name      Ather
degree       BS
score       9.3
Name: False, dtype: object

Applying a boolean mask to a dataframe : 
    
We can apply a boolean mask to a DataFrame with the [] accessor (the __getitem__ method). The mask is a list of True and False values with the same length as the DataFrame; the result contains only the rows whose mask value is True.

In [98]:
dict1 = {'name':["Kenny", "Ather", "Jenny", "Lisa"],
        'degree': ["MBA", "BS", "MS", "MBA"],
        'score':[8.9, 9.3, 8.4, 9.4]}

In [99]:
df1 = pd.DataFrame(dict1,index=[0,1,2,3])
df1


Unnamed: 0,name,degree,score
0,Kenny,MBA,8.9
1,Ather,BS,9.3
2,Jenny,MS,8.4
3,Lisa,MBA,9.4


In [100]:
print(df1[[True,False,True,False]])

    name degree  score
0  Kenny    MBA    8.9
2  Jenny     MS    8.4


In [101]:
nba = pd.read_csv(r"c:/Users/srini/OneDrive/kaggle/nba.csv")

In [102]:
df2 = pd.DataFrame(nba,index=[0,1,2,3,4,5,6,7,8,9,10,11,12])
df2

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
5,Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
6,Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0
7,Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0
8,Terry Rozier,Boston Celtics,12.0,PG,22.0,6-2,190.0,Louisville,1824360.0
9,Marcus Smart,Boston Celtics,36.0,PG,22.0,6-4,220.0,Oklahoma State,3431040.0


In [103]:
df2[[True,False,True,False,True,False,True,False,True,False,True,False,True]]

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
6,Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0
8,Terry Rozier,Boston Celtics,12.0,PG,22.0,6-2,190.0,Louisville,1824360.0
10,Jared Sullinger,Boston Celtics,7.0,C,24.0,6-9,260.0,Ohio State,2569260.0
12,Evan Turner,Boston Celtics,11.0,SG,27.0,6-7,220.0,Ohio State,3425510.0
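A boolean mask must have exactly one entry per row. The sketch below reuses the small name/degree/score data from this section under the hypothetical name `scores`, and shows both a valid mask and what happens when the mask is too short (pandas raises a ValueError).

```python
import pandas as pd

scores = pd.DataFrame(
    {
        "name": ["Kenny", "Ather", "Jenny", "Lisa"],
        "degree": ["MBA", "BS", "MS", "MBA"],
        "score": [8.9, 9.3, 8.4, 9.4],
    }
)

# One True/False per row: rows at positions 0 and 2 are kept.
kept = scores[[True, False, True, False]]

# A mask of the wrong length is rejected rather than padded.
try:
    scores[[True, False]]
    raised = False
except ValueError:
    raised = True

print(kept)
```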


Masking data based on column value: 

In a DataFrame, we can filter data based on column values by applying comparison operators such as ==, >, <, <=, and >= to a column. Applying such an operator produces a boolean Series of True and False values.

In [104]:
dict2 = {'name':["Kenny", "Ather", "Jenny", "Lisa"],
        'degree': ["MBA", "BS", "MS", "MBA"],
        'score':[8.9, 9.3, 8.4, 9.4]}

In [105]:
df3 = pd.DataFrame(dict2)

In [106]:
print(df3['name']=='Lisa')

0    False
1    False
2    False
3     True
Name: name, dtype: bool


In [107]:
df3['score'] <= 8.9

0     True
1    False
2     True
3    False
Name: score, dtype: bool
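The boolean Series shown above become useful once they are passed back to the [] accessor. A minimal sketch, reusing the same data under the hypothetical name `scores`: one condition on its own, and two conditions combined with `&` (each condition needs its own parentheses).

```python
import pandas as pd

scores = pd.DataFrame(
    {
        "name": ["Kenny", "Ather", "Jenny", "Lisa"],
        "degree": ["MBA", "BS", "MS", "MBA"],
        "score": [8.9, 9.3, 8.4, 9.4],
    }
)

# Keep the rows where the condition is True.
low = scores[scores["score"] <= 8.9]

# Combine conditions element-wise with & (and) or | (or).
top_mba = scores[(scores["degree"] == "MBA") & (scores["score"] > 9.0)]

print(low)
print(top_mba)
```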

Masking data based on index value : 

In a DataFrame, we can also filter data based on index values. To do so, we create a mask by applying operators such as ==, >, and < to the index.

In [108]:
dict3 = {'name':["Kenny", "Ather", "Jenny", "Lisa"],
        'degree': ["MBA", "BS", "MS", "MBA"],
        'score':[8.9, 9.3, 8.4, 9.4]}
df4 = pd.DataFrame(dict3,index=[0,1,2,3])

In [109]:
mask = df4.index == 0
df4[mask]

Unnamed: 0,name,degree,score
0,Kenny,MBA,8.9


In [110]:
mask1 = df2.index > 7
df2[mask1]

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
8,Terry Rozier,Boston Celtics,12.0,PG,22.0,6-2,190.0,Louisville,1824360.0
9,Marcus Smart,Boston Celtics,36.0,PG,22.0,6-4,220.0,Oklahoma State,3431040.0
10,Jared Sullinger,Boston Celtics,7.0,C,24.0,6-9,260.0,Ohio State,2569260.0
11,Isaiah Thomas,Boston Celtics,4.0,PG,27.0,5-9,185.0,Washington,6912869.0
12,Evan Turner,Boston Celtics,11.0,SG,27.0,6-7,220.0,Ohio State,3425510.0
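Index masks can be combined in the same way as column masks. A short sketch on the small data from this section, under the hypothetical name `scores`: a range of index values built from two comparisons, and a set of index values selected with `Index.isin()`.

```python
import pandas as pd

scores = pd.DataFrame(
    {
        "name": ["Kenny", "Ather", "Jenny", "Lisa"],
        "degree": ["MBA", "BS", "MS", "MBA"],
        "score": [8.9, 9.3, 8.4, 9.4],
    },
    index=[0, 1, 2, 3],
)

# Index values from 1 up to, but not including, 3.
middle = scores[(scores.index >= 1) & (scores.index < 3)]

# Index values that appear in a given list.
ends = scores[scores.index.isin([0, 3])]

print(middle)
print(ends)
```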
