A Boolean mask is an array, either one-dimensional like a Series or two-dimensional like a DataFrame, in which every value is either True or False. The mask is overlaid on the data structure being queried: any cell aligned with a True value is admitted into the final result, and any cell aligned with a False value is not.
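
To make the overlay idea concrete, here is a minimal sketch on hypothetical toy data (the names `scores` and `frame` are illustrative and not part of the admissions dataset). It builds a one-dimensional mask from a Series and a two-dimensional mask from a DataFrame.

```python
import pandas as pd

# Hypothetical toy data, not taken from Admission_Predict.csv
scores = pd.Series([310, 325, 298], index=['a', 'b', 'c'])
frame = pd.DataFrame({'x': [1, 5, 3], 'y': [4, 2, 6]}, index=['a', 'b', 'c'])

# One-dimensional mask: one True/False per element of the Series
series_mask = scores > 300

# Two-dimensional mask: one True/False per cell of the DataFrame
frame_mask = frame > 2

# Cells aligned with True are kept; cells aligned with False become NaN
kept_series = scores.where(series_mask)
kept_frame = frame.where(frame_mask)
```

Both masks have the same shape and index as the object they were built from, which is what lets pandas line them up cell by cell.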

In [41]:
import pandas as pd
import numpy as np

In [16]:
df = pd.read_csv('Admission_Predict.csv', index_col=0)
df.columns = [x.upper().strip() for x in df.columns]
df.index.name = df.index.name.upper().strip()
df.columns.name = 'Scores & Rating'.upper()
df.head()

SCORES & RATING,GRE SCORE,TOEFL SCORE,UNIVERSITY RATING,SOP,LOR,CGPA,RESEARCH,CHANCE OF ADMIT
SERIAL NO.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,337,118,4,4.5,4.5,9.65,1,0.92
2,324,107,4,4.0,4.5,8.87,1,0.76
3,316,104,3,3.0,3.5,8.0,1,0.72
4,322,110,3,3.5,2.5,8.67,1,0.8
5,314,103,2,2.0,3.0,8.21,0,0.65


Boolean mask - the result of broadcasting a comparison operator over a column.  
Underneath, pandas applies the comparison operator through vectorization (element-wise in optimized compiled code rather than in a Python loop) to every value in the array you specified, which in this case is the CHANCE OF ADMIT column of the DataFrame.
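
As a rough illustration of what broadcasting means here, the sketch below compares the vectorized comparison with an explicit Python loop over the same values. The Series `chance` holds made-up numbers standing in for the CHANCE OF ADMIT column.

```python
import pandas as pd

# Hypothetical values standing in for the CHANCE OF ADMIT column
chance = pd.Series([0.92, 0.76, 0.65, 0.70], name='CHANCE OF ADMIT')

# Vectorized: the scalar 0.7 is broadcast against every element
vectorized_mask = chance > 0.7

# Equivalent explicit loop, shown only for comparison
loop_mask = pd.Series([value > 0.7 for value in chance],
                      index=chance.index, name=chance.name)

# Both approaches produce the same Boolean Series
same = vectorized_mask.equals(loop_mask)
```

The loop version gives the same answer but is usually much slower on large columns, which is why the operator form is preferred.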

In [46]:
mask = df['CHANCE OF ADMIT'] > 0.7
mask

SERIAL NO.
1       True
2       True
3       True
4       True
5      False
       ...  
396     True
397     True
398     True
399    False
400     True
Name: CHANCE OF ADMIT, Length: 400, dtype: bool

In [19]:
df.where(mask)  # rows where the mask is False are filled with NaN

SCORES & RATING,GRE SCORE,TOEFL SCORE,UNIVERSITY RATING,SOP,LOR,CGPA,RESEARCH,CHANCE OF ADMIT
SERIAL NO.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,337.0,118.0,4.0,4.5,4.5,9.65,1.0,0.92
2,324.0,107.0,4.0,4.0,4.5,8.87,1.0,0.76
3,316.0,104.0,3.0,3.0,3.5,8.00,1.0,0.72
4,322.0,110.0,3.0,3.5,2.5,8.67,1.0,0.80
5,,,,,,,,
...,...,...,...,...,...,...,...,...
396,324.0,110.0,3.0,3.5,3.5,9.04,1.0,0.82
397,325.0,107.0,3.0,3.0,3.5,9.11,1.0,0.84
398,330.0,116.0,4.0,5.0,4.5,9.45,1.0,0.91
399,,,,,,,,
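
The output above shows the integer columns as floats (337 became 337.0). The sketch below, on a hypothetical three-row frame, shows why: NaN is a floating-point value, so `where()` upcasts integer columns that need to hold it. It also shows the `other` parameter of `where()`, which substitutes a value of your choice for the masked-out cells.

```python
import pandas as pd

# Hypothetical toy frame with one integer and one float column
toy = pd.DataFrame({'GRE SCORE': [337, 314, 330],
                    'CHANCE OF ADMIT': [0.92, 0.65, 0.91]},
                   index=[1, 5, 6])
toy_mask = toy['CHANCE OF ADMIT'] > 0.7

masked = toy.where(toy_mask)

# The integer column is upcast to float so that it can hold NaN
original_dtype = toy['GRE SCORE'].dtype
masked_dtype = masked['GRE SCORE'].dtype

# `other` replaces masked-out cells with a value instead of NaN
filled = toy.where(toy_mask, other=0)
```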


In [45]:
df.loc[2, 'SOP'] = np.nan  # set one value to missing (the np.NaN alias no longer exists in NumPy 2.0)
df

SCORES & RATING,GRE SCORE,TOEFL SCORE,UNIVERSITY RATING,SOP,LOR,CGPA,RESEARCH,CHANCE OF ADMIT
SERIAL NO.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,337,118,4,4.5,4.5,9.65,1,0.92
2,324,107,4,,4.5,8.87,1,0.76
3,316,104,3,3.0,3.5,8.00,1,0.72
4,322,110,3,3.5,2.5,8.67,1,0.80
5,314,103,2,2.0,3.0,8.21,0,0.65
...,...,...,...,...,...,...,...,...
396,324,110,3,3.5,3.5,9.04,1,0.82
397,325,107,3,3.0,3.5,9.11,1,0.84
398,330,116,4,5.0,4.5,9.45,1,0.91
399,312,103,3,3.5,4.0,8.78,0,0.67


In [47]:
df.where(mask).dropna()  # row 2 is dropped as well, because its SOP value is NaN

SCORES & RATING,GRE SCORE,TOEFL SCORE,UNIVERSITY RATING,SOP,LOR,CGPA,RESEARCH,CHANCE OF ADMIT
SERIAL NO.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,337.0,118.0,4.0,4.5,4.5,9.65,1.0,0.92
3,316.0,104.0,3.0,3.0,3.5,8.00,1.0,0.72
4,322.0,110.0,3.0,3.5,2.5,8.67,1.0,0.80
6,330.0,115.0,5.0,4.5,3.0,9.34,1.0,0.90
7,321.0,109.0,3.0,3.0,4.0,8.20,1.0,0.75
...,...,...,...,...,...,...,...,...
395,329.0,111.0,4.0,4.5,4.0,9.23,1.0,0.89
396,324.0,110.0,3.0,3.5,3.5,9.04,1.0,0.82
397,325.0,107.0,3.0,3.0,3.5,9.11,1.0,0.84
398,330.0,116.0,4.0,5.0,4.5,9.45,1.0,0.91


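The same filtering can be done by passing the mask directly to the DataFrame. The indexing operator is overloaded to accept a Boolean mask and returns only the rows where the mask is True. Unlike where() followed by dropna(), it keeps rows that already contain NaN and does not convert integer columns to floats -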

In [18]:
df[mask]

SCORES & RATING,GRE SCORE,TOEFL SCORE,UNIVERSITY RATING,SOP,LOR,CGPA,RESEARCH,CHANCE OF ADMIT
SERIAL NO.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,337,118,4,4.5,4.5,9.65,1,0.92
2,324,107,4,4.0,4.5,8.87,1,0.76
3,316,104,3,3.0,3.5,8.00,1,0.72
4,322,110,3,3.5,2.5,8.67,1,0.80
6,330,115,5,4.5,3.0,9.34,1,0.90
...,...,...,...,...,...,...,...,...
395,329,111,4,4.5,4.0,9.23,1,0.89
396,324,110,3,3.5,3.5,9.04,1,0.82
397,325,107,3,3.0,3.5,9.11,1,0.84
398,330,116,4,5.0,4.5,9.45,1,0.91
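
The difference between the two approaches only shows up when the data already contains missing values. A minimal sketch on a hypothetical three-row frame, where row 2 passes the mask but has a NaN in SOP:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; row 2 has a missing SOP value
toy = pd.DataFrame({'SOP': [4.5, np.nan, 2.0],
                    'CHANCE OF ADMIT': [0.92, 0.76, 0.65]},
                   index=[1, 2, 5])
toy_mask = toy['CHANCE OF ADMIT'] > 0.7

# where() + dropna() removes row 5 (mask is False) and row 2 (pre-existing NaN)
via_where = toy.where(toy_mask).dropna()

# Boolean indexing removes only row 5
via_index = toy[toy_mask]
```

If the intent is purely to filter rows by a condition, indexing with the mask is usually the safer choice, since it never discards rows for unrelated missing values.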


In [26]:
df[['GRE SCORE','TOEFL SCORE']]

SCORES & RATING,GRE SCORE,TOEFL SCORE
SERIAL NO.,Unnamed: 1_level_1,Unnamed: 2_level_1
1,337,118
2,324,107
3,316,104
4,322,110
5,314,103
...,...,...
396,324,110
397,325,107
398,330,116
399,312,103


In [27]:
df[df['GRE SCORE']>320]

SCORES & RATING,GRE SCORE,TOEFL SCORE,UNIVERSITY RATING,SOP,LOR,CGPA,RESEARCH,CHANCE OF ADMIT
SERIAL NO.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,337,118,4,4.5,4.5,9.65,1,0.92
2,324,107,4,4.0,4.5,8.87,1,0.76
4,322,110,3,3.5,2.5,8.67,1,0.80
6,330,115,5,4.5,3.0,9.34,1,0.90
7,321,109,3,3.0,4.0,8.20,1,0.75
...,...,...,...,...,...,...,...,...
395,329,111,4,4.5,4.0,9.23,1,0.89
396,324,110,3,3.5,3.5,9.04,1,0.82
397,325,107,3,3.0,3.5,9.11,1,0.84
398,330,116,4,5.0,4.5,9.45,1,0.91


In [32]:
df[(df['GRE SCORE']>320) & (df['CHANCE OF ADMIT']<0.9)]

SCORES & RATING,GRE SCORE,TOEFL SCORE,UNIVERSITY RATING,SOP,LOR,CGPA,RESEARCH,CHANCE OF ADMIT
SERIAL NO.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
2,324,107,4,4.0,4.5,8.87,1,0.76
4,322,110,3,3.5,2.5,8.67,1,0.80
7,321,109,3,3.0,4.0,8.20,1,0.75
10,323,108,3,3.5,3.0,8.60,0,0.45
11,325,106,3,3.5,4.0,8.40,1,0.52
...,...,...,...,...,...,...,...,...
383,324,110,4,4.5,4.0,9.15,1,0.82
393,326,112,4,4.0,3.5,9.12,1,0.84
395,329,111,4,4.5,4.0,9.23,1,0.89
396,324,110,3,3.5,3.5,9.04,1,0.82


In [33]:
df[df['GRE SCORE'].gt(320) & df['CHANCE OF ADMIT'].lt(0.9)]  # Same as above

SCORES & RATING,GRE SCORE,TOEFL SCORE,UNIVERSITY RATING,SOP,LOR,CGPA,RESEARCH,CHANCE OF ADMIT
SERIAL NO.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
2,324,107,4,4.0,4.5,8.87,1,0.76
4,322,110,3,3.5,2.5,8.67,1,0.80
7,321,109,3,3.0,4.0,8.20,1,0.75
10,323,108,3,3.5,3.0,8.60,0,0.45
11,325,106,3,3.5,4.0,8.40,1,0.52
...,...,...,...,...,...,...,...,...
383,324,110,4,4.5,4.0,9.15,1,0.82
393,326,112,4,4.0,3.5,9.12,1,0.84
395,329,111,4,4.5,4.0,9.23,1,0.89
396,324,110,3,3.5,3.5,9.04,1,0.82
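
Masks can be combined with `&` (and), `|` (or) and `~` (not). The sketch below uses hypothetical toy rows. Note that each comparison needs its own parentheses when written with operators, because `&` and `|` bind more tightly than `>` and `<`; the Python keywords `and` / `or` do not work on Series and raise a ValueError about an ambiguous truth value.

```python
import pandas as pd

# Hypothetical toy rows, not taken from the dataset
toy = pd.DataFrame({'GRE SCORE': [337, 324, 314, 340],
                    'CHANCE OF ADMIT': [0.92, 0.76, 0.65, 0.97]},
                   index=[1, 2, 5, 25])

# Naming the individual masks keeps the combined expression readable
high_gre = toy['GRE SCORE'] > 320
below_090 = toy['CHANCE OF ADMIT'] < 0.9

both = toy[high_gre & below_090]      # AND: both conditions hold
either = toy[high_gre | below_090]    # OR: at least one condition holds
not_high = toy[~high_gre]             # NOT: invert the mask
```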


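Chaining - 
Comparison methods can be chained, but each call is applied to the result of the previous one, not to the original column. Here .lt(0.9) compares the Boolean output of .gt(0.7) (True counts as 1, False as 0) with 0.9, so the final mask is True exactly where CHANCE OF ADMIT is not greater than 0.7, as the output confirms.

In [36]:
df[df['CHANCE OF ADMIT'].gt(0.7).lt(0.9)]  # Not "between 0.7 and 0.9": selects rows where CHANCE OF ADMIT is not greater than 0.7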

SCORES & RATING,GRE SCORE,TOEFL SCORE,UNIVERSITY RATING,SOP,LOR,CGPA,RESEARCH,CHANCE OF ADMIT
SERIAL NO.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
5,314,103,2,2.0,3.0,8.21,0,0.65
8,308,101,2,3.0,4.0,7.90,0,0.68
9,302,102,1,2.0,1.5,8.00,0,0.50
10,323,108,3,3.5,3.0,8.60,0,0.45
11,325,106,3,3.5,4.0,8.40,1,0.52
...,...,...,...,...,...,...,...,...
387,302,101,2,2.5,3.5,7.96,0,0.46
388,307,105,2,2.0,3.5,8.10,0,0.53
389,296,97,2,1.5,2.0,7.80,0,0.49
391,314,102,2,2.0,2.5,8.24,0,0.64
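
To select values that really lie between 0.7 and 0.9, both comparisons have to be made against the original column. A minimal sketch on hypothetical values, showing what the chained call computes next to two alternatives; it assumes pandas 1.3 or later for the string form of the `inclusive` argument to `Series.between`:

```python
import pandas as pd

# Hypothetical values standing in for the CHANCE OF ADMIT column
chance = pd.Series([0.92, 0.76, 0.65, 0.80, 0.50],
                   index=[1, 2, 5, 4, 9], name='CHANCE OF ADMIT')

# What the chained call computes: (chance > 0.7) < 0.9
chained = chance.gt(0.7).lt(0.9)

# Option 1: combine two masks built from the original column
combined = chance.gt(0.7) & chance.lt(0.9)

# Option 2: Series.between with both endpoints excluded
between = chance.between(0.7, 0.9, inclusive='neither')
```

Either `combined` or `between` can then be passed to the indexing operator in the same way as any other mask.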
