---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Lecture 3.13 (Pandas-05)</h1>

### _Indexing, Subsetting and Slicing Dataframes.ipynb_

## Motivation:
- Two key features of Pandas are the ability to select specific rows and columns, and the ability to filter data based on specific conditions.
    - **Selection** allows you to access specific rows or columns (a subset) of the data by their index and/or location in the DataFrame
        - In large datasets, you may be required to select the first/last N records
        - In large datasets, you may be required to select a range (n to m) of records
        - In large datasets, you may be required to select specific columns of your interest
        - In large datasets, you may be required to select specific range and specific columns of your interest
    - **Filtering** allows you to access specific rows or columns (a subset) of the data based on one or more conditions
        - In a medical dataset, you may be required to filter the records of all patients who suffer from a specific disease, or who have a specific blood group
        - In a medical dataset, you may be required to filter pregnant women who have anemia, and compare this subset to women who don’t have anemia.
        - In a travel dataset, you may be required to filter hotels inside Lahore city, sorted by their minimum per day cost
        - In a client dataset, you may be required to filter the clients who use a Gmail account (may require a string filter)
        - In a client dataset, you may be required to filter the clients who belong to specific countries (may require use of the `.isin()` method)
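
The difference between selection and filtering can be sketched on a tiny, made-up client table. The dataframe `clients` and all of its values are hypothetical and only serve as an illustration; they are not part of the lecture dataset.

```python
import pandas as pd

# Hypothetical client data, invented purely for illustration
clients = pd.DataFrame({
    'name': ['Ali', 'Sana', 'Omar', 'Hina'],
    'email': ['ali@gmail.com', 'sana@yahoo.com', 'omar@gmail.com', 'hina@outlook.com'],
    'country': ['Pakistan', 'Turkey', 'Qatar', 'Pakistan'],
})

# Selection: pick rows/columns by their position or label
first_two = clients.head(2)               # first N records
name_email = clients[['name', 'email']]   # specific columns of interest

# Filtering: pick the rows that satisfy a condition
gmail_users = clients[clients['email'].str.endswith('@gmail.com')]        # string filter
from_pk_or_qa = clients[clients['country'].isin(['Pakistan', 'Qatar'])]   # membership filter

print(gmail_users['name'].tolist())     # ['Ali', 'Omar']
print(from_pk_or_qa['name'].tolist())   # ['Ali', 'Omar', 'Hina']
```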

## Learning agenda of this notebook
1. Understanding Indices of a Dataframe
    - Understand the Dataset
    - Changing the Column Indices of a Dataframe
    - Changing the Row Indices of a Dataframe
2. Selecting Row(s) and Column(s) of a Dataframe using `df[]` 
3. Selecting Rows and Columns using `iloc` Method
4. Selecting Rows and Columns using `loc` Method
5. Conditional Selection   
6. Selecting columns of a specific data type


## 1. Understanding Indices of a Dataframe

<img align="right" width="300" height="300"  src="images/series-anatomy.png"  >
<img align="left" width="500" height="500"  src="images/pandas.png"  >

### a. Understand the Dataset
- Let us first understand the dataframe that we are going to work on in today's notebook

In [2]:
import numpy as np
import pandas as pd

df = pd.read_csv('datasets/groupdata.csv')
df.head()

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0


In [3]:
df.shape

(16, 10)

In [2]:

import pandas as pd
import numpy as np
df = pd.read_csv('datasets/groupdatawithoutcollables.csv')
df.head()

Unnamed: 0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000
0,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
1,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0
2,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
3,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0
4,MS06,Mohid,16,Lahore,MORNING,group C,Female,69.3,78.6,


In [3]:
# This file has no header row, so the call above used the first record as column labels.
# To read such files, pass `header=None` to `read_csv()` so that every row is kept as data
df = pd.read_csv('datasets/groupdatawithoutcollables.csv', header=None)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0


>Suppose we have the dataframe above, in which the column indices are just integers giving the position of each column. We want to assign meaningful names to the columns for better understanding. There are several ways to do that.


In [4]:
df.head(1)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0


**Changing Column Indices/Labels:** Assign a list of column labels to the `columns` attribute of the dataframe

In [8]:
col_names = ['reg no', 'aame', 'age', 'address', 'session', 'group', 'gender', 'math', 'english', 'scholarship']
df.columns = col_names
df.head(2)

Unnamed: 0,reg no,aame,age,address,session,group,gender,math,english,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0


>Note that in the above dataframe, the first column label contains a space, which makes it awkward to use at times (for example, it cannot be accessed with dot notation). To change specific column labels, you can use the `df.rename()` method.

In [10]:
# Pass a dictionary to the `columns` argument of the rename() method
# The key is the old column name, while the value is the new column name
# With inplace=False (the default), rename() returns a new dataframe and leaves df unchanged
df1 = df.rename(columns={'age': 'ages'}, inplace=False)
df1.head(2)

Unnamed: 0,reg no,aame,ages,address,session,group,gender,math,english,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0


In [11]:
df.head(1)

Unnamed: 0,reg no,aame,age,address,session,group,gender,math,english,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
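
The note above mentions the space in the first column label. A minimal sketch of removing it with `rename()` could look as follows. The toy dataframe `demo` and the replacement label `reg_no` are assumptions made for illustration.

```python
import pandas as pd

# Toy frame with two of the column labels used above (values copied from the output)
demo = pd.DataFrame({'reg no': ['MS01', 'MS02'], 'age': [52, 51]})

# rename() returns a new dataframe; demo itself is left unchanged
renamed = demo.rename(columns={'reg no': 'reg_no'})

print(demo.columns.tolist())      # ['reg no', 'age']
print(renamed.columns.tolist())   # ['reg_no', 'age']

# A label without spaces can also be used with dot notation
print(renamed.reg_no.tolist())    # ['MS01', 'MS02']
```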


>Last but not least, you can assign column labels while reading the file, by passing a list of column names to the `names` argument of the `pd.read_csv()` function. Try it on your own :)
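
One possible way to try this is sketched below. To keep the sketch independent of the file on disk, two records in the same layout are read from an in-memory string; the variable names `raw`, `col_names` and `labelled` are illustrative assumptions, and the labels are borrowed from `groupdata.csv`.

```python
import io
import pandas as pd

# Two records in the same layout as the header-less file, held in memory
raw = io.StringIO(
    "MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000\n"
    "MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000\n"
)

col_names = ['roll no', 'name', 'age', 'address', 'session',
             'group', 'gender', 'subj1', 'subj2', 'scholarship']

# header=None: no row of the file is a header; names=...: use these labels instead
labelled = pd.read_csv(raw, header=None, names=col_names)

print(labelled.shape)              # (2, 10)
print(labelled['name'].tolist())   # ['Rauf', 'Arif']
```

With a real file, the same two arguments would be passed along with the file path instead of `raw`.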

### c. Changing the Row Indices/Labels of a Dataframe
- Every dataframe has a row index associated with its rows
- By default, these are integer values 0, 1, 2, 3, ...
- However, while creating a dataframe from scratch you may set them to some meaningful string values (seldom required).
- We have already seen this in our previous session
- Today, we will see two methods that work on the row indices of a Pandas Dataframe: `df.set_index()` and `df.reset_index()`
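
Before applying these two methods to the lecture dataset, here is a small sketch on a toy dataframe (the name `demo` and its three columns are assumptions for illustration). It also shows the `drop=True` option of `reset_index()`, which discards the old index instead of turning it into a column.

```python
import pandas as pd

demo = pd.DataFrame({'roll no': ['MS01', 'MS02'],
                     'name': ['Rauf', 'Arif'],
                     'age': [52, 51]})

# set_index() moves a column into the row index (the column is dropped by default)
indexed = demo.set_index('roll no')
print(indexed.index.tolist())      # ['MS01', 'MS02']
print(indexed.columns.tolist())    # ['name', 'age']

# reset_index() restores the default integer index and brings the old index back as a column
restored = indexed.reset_index()
print(restored.columns.tolist())   # ['roll no', 'name', 'age']

# reset_index(drop=True) restores the default index and discards the old one
fresh = indexed.reset_index(drop=True)
print(fresh.columns.tolist())      # ['name', 'age']
```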

In [13]:
df.head(2)

Unnamed: 0,reg no,aame,age,address,session,group,gender,math,english,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0


In [14]:
df.index

RangeIndex(start=0, stop=16, step=1)

In [16]:
df = pd.read_csv('datasets/groupdata.csv')
df.head(3)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0


In [18]:
df1 = df.set_index(keys='name', drop=False)
df1.head(2)

name,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
Rauf,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
Arif,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0


In [21]:
df1.index

Index(['Rauf', 'Arif', 'Shaista', 'Hadeed', 'Zara', 'Mohid', 'Zobia', 'Idrees',
       'Jamil', 'Shahid', 'Khurram', 'Maaz', 'Mujahid', 'Sara', 'Fatima',
       'Kakamanna'],
      dtype='object', name='name')

In [22]:
df.set_index(keys='roll no', drop=True, inplace=True)
df.head(1)

roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0


In [23]:
df2 = df.reset_index()
df2.index

RangeIndex(start=0, stop=16, step=1)

In [24]:
df2.head(2)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0


In [26]:
df3 = df2.reset_index()
df3.head(2)

Unnamed: 0,index,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0


In [28]:
df4 = df3.reset_index()
df4.head(2)

Unnamed: 0,level_0,index,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,0,0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,1,1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0

## 2. Selecting Row(s) and Column(s) of a Dataframe using `df[]`


In [30]:
df = pd.read_csv('datasets/groupdata.csv')
df_sorted = df.sort_values('age')
df_sorted.head(2)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
5,MS06,Mohid,16,Lahore,MORNING,group C,Female,69.3,78.6,
12,MS13,Mujahid,18,Lahore,MORNING,group D,Male,,76.5,7000.0


In [31]:
df_sorted[1:2]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
12,MS13,Mujahid,18,Lahore,MORNING,group D,Male,,76.5,7000.0


In [32]:
s = df_sorted['name']
print(s)

5         Mohid
12      Mujahid
3        Hadeed
11         Maaz
13         Sara
14       Fatima
2       Shaista
10      Khurram
9        Shahid
4          Zara
6         Zobia
15    Kakamanna
1          Arif
7        Idrees
0          Rauf
8         Jamil
Name: name, dtype: object


In [33]:
age_sorted = df_sorted['age']
print(age_sorted)

5     16
12    18
3     20
11    25
13    28
14    33
2     35
10    35
9     38
4     40
6     40
15    42
1     51
7     51
0     52
8     53
Name: age, dtype: int64


In [34]:
s.head(2)

5       Mohid
12    Mujahid
Name: name, dtype: object

In [35]:
df['name'].head(3)

0       Rauf
1       Arif
2    Shaista
Name: name, dtype: object

In [36]:
d1 = df_sorted[['roll no', 'gender', 'age']]
d1

Unnamed: 0,roll no,gender,age
5,MS06,Female,16
12,MS13,Male,18
3,MS04,Male,20
11,MS12,Male,25
13,MS14,Female,28
14,MS15,Female,33
2,MS03,Female,35
10,MS11,Male,35
9,MS10,Male,38
4,MS05,Female,40


In [37]:
df_sorted[1:2]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
12,MS13,Mujahid,18,Lahore,MORNING,group D,Male,,76.5,7000.0


In [41]:
df_sorted[1:3:1]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
12,MS13,Mujahid,18,Lahore,MORNING,group D,Male,,76.5,7000.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0


In [42]:
df_sorted[1:14:2]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
12,MS13,Mujahid,18,Lahore,MORNING,group D,Male,,76.5,7000.0
11,MS12,Maaz,25,Karachi,AFTERNOON,group C,Male,90.5,81.3,
14,MS15,Fatima,33,Sialkot,AFT,group C,Female,90.5,81.3,3500.0
10,MS11,Khurram,35,Islamabad,MOR,group B,Male,90.5,81.3,6000.0
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0
15,MS16,Kakamanna,42,Multan,AFTERNOON,group A,Male,90.5,81.3,3800.0
7,MS08,Idrees,51,Multan,MORNING,group D,Male,84.1,76.0,8000.0


In [43]:
df_sorted[::5]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
5,MS06,Mohid,16,Lahore,MORNING,group C,Female,69.3,78.6,
14,MS15,Fatima,33,Sialkot,AFT,group C,Female,90.5,81.3,3500.0
6,MS07,Zobia,40,Sialkot,AFT,group B,Female,90.2,,4000.0
8,MS09,Jamil,53,Karachi,AFT,group C,Male,90.5,81.3,3500.0

## 3. Selecting Rows and Columns using `iloc` Method


In [44]:
s = df_sorted.iloc[2,:]

In [45]:
s

roll no           MS04
name            Hadeed
age                 20
address         Lahore
session            MOR
group          group A
gender            Male
subj1             82.0
subj2             84.3
scholarship     4000.0
Name: 3, dtype: object

In [46]:
df_sorted.iloc[[2,4,1], :]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
13,MS14,Sara,28,Multan,AFTERNOON,group A,Female,84.1,76.0,8000.0
12,MS13,Mujahid,18,Lahore,MORNING,group D,Male,,76.5,7000.0


In [48]:
df_sorted.iloc[3:5, :]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
11,MS12,Maaz,25,Karachi,AFTERNOON,group C,Male,90.5,81.3,
13,MS14,Sara,28,Multan,AFTERNOON,group A,Female,84.1,76.0,8000.0


In [49]:
df_sorted.iloc[:, 3]

5        Lahore
12       Lahore
3        Lahore
11      Karachi
13       Multan
14      Sialkot
2       Karachi
10    Islamabad
9        Lahore
4      Peshawer
6       Sialkot
15       Multan
1     Islamabad
7        Multan
0        Lahore
8       Karachi
Name: address, dtype: object

In [50]:
df_sorted.iloc[:, [1,4,7]]

Unnamed: 0,name,session,subj1
5,Mohid,MORNING,69.3
12,Mujahid,MORNING,
3,Hadeed,MOR,82.0
11,Maaz,AFTERNOON,90.5
13,Sara,AFTERNOON,84.1
14,Fatima,AFT,90.5
2,Shaista,AFTERNOON,64.9
10,Khurram,MOR,90.5
9,Shahid,AFTERNOON,90.5
4,Zara,AFT,65.9


In [51]:
df_sorted.iloc[:, 3:6].head(2)

Unnamed: 0,address,session,group
5,Lahore,MORNING,group C
12,Lahore,MORNING,group D


In [52]:
df_sorted.iloc[[3,0], [1,5]]

Unnamed: 0,name,group
11,Maaz,group C
5,Mohid,group C


In [53]:
df_sorted.iloc[0:5, 2:4]

Unnamed: 0,age,address
5,16,Lahore
12,18,Lahore
3,20,Lahore
11,25,Karachi
13,28,Multan

## 4. Selecting Rows and Columns using `loc` Method


In [56]:
df_sorted.head(3)

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
5,MS06,Mohid,16,Lahore,MORNING,group C,Female,69.3,78.6,
12,MS13,Mujahid,18,Lahore,MORNING,group D,Male,,76.5,7000.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0


In [54]:
df_sorted.loc[2,:]

roll no             MS03
name             Shaista
age                   35
address          Karachi
session        AFTERNOON
group            group B
gender            Female
subj1               64.9
subj2               75.1
scholarship       8500.0
Name: 2, dtype: object

In [57]:
df_sorted.loc[[2,4,1], :]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0


In [58]:
df_sorted.loc[5:2, :]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
5,MS06,Mohid,16,Lahore,MORNING,group C,Female,69.3,78.6,
12,MS13,Mujahid,18,Lahore,MORNING,group D,Male,,76.5,7000.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
11,MS12,Maaz,25,Karachi,AFTERNOON,group C,Male,90.5,81.3,
13,MS14,Sara,28,Multan,AFTERNOON,group A,Female,84.1,76.0,8000.0
14,MS15,Fatima,33,Sialkot,AFT,group C,Female,90.5,81.3,3500.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0


In [59]:
df_sorted.loc[3:5, :]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship


In [61]:
df_sorted.loc[3:5:-1, :]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
12,MS13,Mujahid,18,Lahore,MORNING,group D,Male,,76.5,7000.0
5,MS06,Mohid,16,Lahore,MORNING,group C,Female,69.3,78.6,
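
The outputs above differ between `iloc` and `loc` because `df_sorted` keeps its original row labels in a shuffled order. `iloc` counts positions, while `loc` looks up labels and slices from the first label to the last label, both inclusive, in the order in which they appear in the index. A minimal sketch on a four-row toy frame (the name `demo` is an assumption; the values are taken from the first rows of `df_sorted`):

```python
import pandas as pd

# Row labels are deliberately out of order, as in df_sorted
demo = pd.DataFrame({'name': ['Mohid', 'Mujahid', 'Hadeed', 'Maaz'],
                     'age': [16, 18, 20, 25]},
                    index=[5, 12, 3, 11])

print(demo.iloc[2, 0])                  # Hadeed  (third row by position)
print(demo.loc[3, 'name'])              # Hadeed  (row whose label is 3)

print(demo.iloc[0:2].index.tolist())    # [5, 12]     positions 0 and 1, end excluded
print(demo.loc[5:3].index.tolist())     # [5, 12, 3]  from label 5 to label 3, end included
print(demo.loc[3:5].index.tolist())     # []          label 3 comes after label 5
print(demo.loc[3:5:-1].index.tolist())  # [3, 12, 5]  a negative step walks backwards
```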


In [62]:
df_sorted.loc[:, 'name']

5         Mohid
12      Mujahid
3        Hadeed
11         Maaz
13         Sara
14       Fatima
2       Shaista
10      Khurram
9        Shahid
4          Zara
6         Zobia
15    Kakamanna
1          Arif
7        Idrees
0          Rauf
8         Jamil
Name: name, dtype: object

In [64]:
df_sorted.loc[:, ['name', 'address', 'scholarship']]

Unnamed: 0,name,address,scholarship
5,Mohid,Lahore,
12,Mujahid,Lahore,7000.0
3,Hadeed,Lahore,4000.0
11,Maaz,Karachi,
13,Sara,Multan,8000.0
14,Fatima,Sialkot,3500.0
2,Shaista,Karachi,8500.0
10,Khurram,Islamabad,6000.0
9,Shahid,Lahore,3800.0
4,Zara,Peshawer,3500.0


In [66]:
df_sorted.loc[:, 'address':'name']

5
12
3
11
13
14
2
10
9
4
6


In [65]:
df_sorted.loc[:, 'name':'address']

Unnamed: 0,name,age,address
5,Mohid,16,Lahore
12,Mujahid,18,Lahore
3,Hadeed,20,Lahore
11,Maaz,25,Karachi
13,Sara,28,Multan
14,Fatima,33,Sialkot
2,Shaista,35,Karachi
10,Khurram,35,Islamabad
9,Shahid,38,Lahore
4,Zara,40,Peshawer


In [70]:
df_sorted.loc[:, 'name':'address': 2]

Unnamed: 0,name,address
5,Mohid,Lahore
12,Mujahid,Lahore
3,Hadeed,Lahore
11,Maaz,Karachi
13,Sara,Multan
14,Fatima,Sialkot
2,Shaista,Karachi
10,Khurram,Islamabad
9,Shahid,Lahore
4,Zara,Peshawer


In [72]:
df_sorted.loc[[3, 0], ['name', 'address']]

Unnamed: 0,name,address
3,Hadeed,Lahore
0,Rauf,Lahore


In [71]:
df_sorted.loc[3:13, ['name', 'age', 'session']]

Unnamed: 0,name,age,session
3,Hadeed,20,MOR
11,Maaz,25,AFTERNOON
13,Sara,28,AFTERNOON


In [73]:
df_sorted.loc[5:8, ['name', 'age', 'session']]

Unnamed: 0,name,age,session
5,Mohid,16,MORNING
12,Mujahid,18,MORNING
3,Hadeed,20,MOR
11,Maaz,25,AFTERNOON
13,Sara,28,AFTERNOON
14,Fatima,33,AFT
2,Shaista,35,AFTERNOON
10,Khurram,35,MOR
9,Shahid,38,AFTERNOON
4,Zara,40,AFT

## 5. Conditional Selection


In [75]:
df.age.head()

0    52
1    51
2    35
3    20
4    40
Name: age, dtype: int64

In [76]:
# Build a boolean mask by hand: True where the age is above 40, False otherwise
list1 = []
for age in df.age:
    if age > 40:
        list1.append(True)
    else:
        list1.append(False)
print(list1)

[True, True, False, False, False, False, False, True, True, False, False, False, False, False, False, True]


In [77]:
df[list1]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
7,MS08,Idrees,51,Multan,MORNING,group D,Male,84.1,76.0,8000.0
8,MS09,Jamil,53,Karachi,AFT,group C,Male,90.5,81.3,3500.0
15,MS16,Kakamanna,42,Multan,AFTERNOON,group A,Male,90.5,81.3,3800.0


In [78]:
df[df['age'] > 40]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
7,MS08,Idrees,51,Multan,MORNING,group D,Male,84.1,76.0,8000.0
8,MS09,Jamil,53,Karachi,AFT,group C,Male,90.5,81.3,3500.0
15,MS16,Kakamanna,42,Multan,AFTERNOON,group A,Male,90.5,81.3,3800.0


In [79]:
df[df.gender == 'Male']

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
7,MS08,Idrees,51,Multan,MORNING,group D,Male,84.1,76.0,8000.0
8,MS09,Jamil,53,Karachi,AFT,group C,Male,90.5,81.3,3500.0
9,MS10,Shahid,38,Lahore,AFTERNOON,group D,Male,90.5,81.3,3800.0
10,MS11,Khurram,35,Islamabad,MOR,group B,Male,90.5,81.3,6000.0
11,MS12,Maaz,25,Karachi,AFTERNOON,group C,Male,90.5,81.3,
12,MS13,Mujahid,18,Lahore,MORNING,group D,Male,,76.5,7000.0
15,MS16,Kakamanna,42,Multan,AFTERNOON,group A,Male,90.5,81.3,3800.0


In [80]:
df.loc[df.age > 40]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
0,MS01,Rauf,52,Lahore,MORNING,group C,Male,78.3,84.4,5000.0
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
7,MS08,Idrees,51,Multan,MORNING,group D,Male,84.1,76.0,8000.0
8,MS09,Jamil,53,Karachi,AFT,group C,Male,90.5,81.3,3500.0
15,MS16,Kakamanna,42,Multan,AFTERNOON,group A,Male,90.5,81.3,3800.0


In [81]:
df.loc[df.age > 40, ['name', 'age']]

Unnamed: 0,name,age
0,Rauf,52
1,Arif,51
7,Idrees,51
8,Jamil,53
15,Kakamanna,42


In [83]:
df[(df.age < 40) & (df.address == 'Multan')]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
13,MS14,Sara,28,Multan,AFTERNOON,group A,Female,84.1,76.0,8000.0


In [85]:
df1 = df[(df.group == 'group A') & (df.gender == 'Male')]
df1

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
3,MS04,Hadeed,20,Lahore,MOR,group A,Male,82.0,84.3,4000.0
15,MS16,Kakamanna,42,Multan,AFTERNOON,group A,Male,90.5,81.3,3800.0


In [86]:
df2 = df[(df.address == 'Sialkot') | (df.address == 'Karachi')]
df2

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0
6,MS07,Zobia,40,Sialkot,AFT,group B,Female,90.2,,4000.0
8,MS09,Jamil,53,Karachi,AFT,group C,Male,90.5,81.3,3500.0
11,MS12,Maaz,25,Karachi,AFTERNOON,group C,Male,90.5,81.3,
14,MS15,Fatima,33,Sialkot,AFT,group C,Female,90.5,81.3,3500.0


In [87]:
df3 = df[(df.address != 'Karachi') & (df.scholarship > 7000)]

In [88]:
df3

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
7,MS08,Idrees,51,Multan,MORNING,group D,Male,84.1,76.0,8000.0
13,MS14,Sara,28,Multan,AFTERNOON,group A,Female,84.1,76.0,8000.0


In [89]:
df[df.address.isin(['Karachi', 'Peshawer', 'Islamabad'])]

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
1,MS02,Arif,51,Islamabad,AFT,group A,Male,70.5,60.5,6000.0
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0
4,MS05,Zara,40,Peshawer,AFT,group D,Female,65.9,72.8,3500.0
8,MS09,Jamil,53,Karachi,AFT,group C,Male,90.5,81.3,3500.0
10,MS11,Khurram,35,Islamabad,MOR,group B,Male,90.5,81.3,6000.0
11,MS12,Maaz,25,Karachi,AFTERNOON,group C,Male,90.5,81.3,


In [90]:
dfgb = df.groupby('address')
dfgb

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002944BF63350>

In [91]:
df.groupby('address').get_group('Karachi')

Unnamed: 0,roll no,name,age,address,session,group,gender,subj1,subj2,scholarship
2,MS03,Shaista,35,Karachi,AFTERNOON,group B,Female,64.9,75.1,8500.0
8,MS09,Jamil,53,Karachi,AFT,group C,Male,90.5,81.3,3500.0
11,MS12,Maaz,25,Karachi,AFTERNOON,group C,Male,90.5,81.3,


In [92]:
df.groupby('address').get_group('Karachi').scholarship.max()

8500.0