In [1]:
import pandas as pd
df = pd.read_csv('sampleData.csv', index_col = 'Name')

In [2]:
print(df)

         Age Gender
Name               
Shreya    16      F
Raj       10      M
Jaideep   48      M
Jita      45      F


This is what you want:

In [10]:
df_F = df[df['Gender'] == 'F']
print(df_F)

        Age Gender
Name              
Shreya   16      F
Jita     45      F


Let's unpack this line of code. We use df['colName'] to pull out a single column, like so:

In [11]:
print(df['Age'])

Name
Shreya     16
Raj        10
Jaideep    48
Jita       45
Name: Age, dtype: int64


In general, square brackets are used for indexing, like when you want the first element of an array: arr[0]. So what does df[df['Gender'] == 'F'] do? Well, it's two steps. First, df['Gender'] == 'F' builds something called a 'mask.' Let's see an example.

In [12]:
print(df['Gender'] == 'F')

Name
Shreya      True
Raj        False
Jaideep    False
Jita        True
Name: Gender, dtype: bool


It basically forms a 'mask,' i.e. a boolean Series with one True/False value per row, according to the condition you gave. (It's a Series rather than a dataframe because it came from a single column.) You get similar behaviour in NumPy, another package you'll use a lot (probably more than pandas). It gives you fast arrays, basically.

In [14]:
import numpy as np
a = np.array([1, 2, 3])
print(a > 1)
print(a[a > 1])

[False  True  True]
[2 3]

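Back in pandas, the two steps can be written out one at a time. This is only a sketch: the table is rebuilt by hand from the printout above (an assumption, so it runs without sampleData.csv), and the name mask is just for illustration.

```python
import pandas as pd

# Rebuild the sample table by hand so the sketch runs without sampleData.csv.
df = pd.DataFrame(
    {"Age": [16, 10, 48, 45], "Gender": ["F", "M", "M", "F"]},
    index=pd.Index(["Shreya", "Raj", "Jaideep", "Jita"], name="Name"),
)

# Step 1: the comparison gives one boolean per row, as a pandas Series.
mask = df["Gender"] == "F"
print(type(mask).__name__)   # Series

# Step 2: indexing with the mask keeps only the rows where it is True.
df_F = df[mask]
print(list(df_F.index))      # ['Shreya', 'Jita']
```

Writing df[df['Gender'] == 'F'] on one line does exactly the same thing, it just skips naming the mask.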

Ok, let's unpack that further. When you say df[mask], it keeps only the rows where the mask is True. Note that boolean indexing like this gives you a new object holding a copy of the selected data, in both pandas and NumPy; it is not a 'view' onto the original. (In NumPy, views come from basic slicing such as a[0:2].) This is only really important if you're mutating the data: changes you make to the filtered result will not show up in the original. If you want to change the original wherever the condition holds, assign through the mask directly, e.g. df.loc[mask, 'Age'] = 0 or a[mask] = 0. Here are some other examples with numpy and pandas.

In [17]:
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print('array: ', a)
print()
print('dimensions: ', a.shape)
print()
print('data type: ', type(a))
mask = a >= 3
print('mask: ', mask)
print()
print('dimensions: ', mask.shape)
print()
print('data type: ', type(mask))
print('Masked array: ', a[mask])

array:  [[1 2 3]
 [4 5 6]
 [7 8 9]]

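dimensions:  (3, 3)

data type:  <class 'numpy.ndarray'>
mask:  [[False False  True]
 [ True  True  True]
 [ True  True  True]]

dimensions:  (3, 3)

data type:  <class 'numpy.ndarray'>
Masked array:  [3 4 5 6 7 8 9]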
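If you're wondering what happens when you mutate a masked selection, here's a quick check you can run yourself. Treat it as a sketch: the table is rebuilt by hand from the earlier printout (an assumption), and names like selected are made up for illustration. The short version is that boolean indexing hands back new data, so editing the result leaves the original alone, while assigning through the mask edits the original.

```python
import numpy as np
import pandas as pd

# --- NumPy ---
a = np.array([1, 2, 3])
selected = a[a > 1]   # boolean indexing returns a new array
selected[0] = 99      # only the new array changes
print(a)              # [1 2 3]

a[a > 1] = 0          # assigning through the mask changes the original
print(a)              # [1 0 0]

# --- pandas ---
# Sample table rebuilt by hand so this runs without sampleData.csv.
df = pd.DataFrame(
    {"Age": [16, 10, 48, 45], "Gender": ["F", "M", "M", "F"]},
    index=pd.Index(["Shreya", "Raj", "Jaideep", "Jita"], name="Name"),
)
mask = df["Gender"] == "F"

# .copy() makes the intent explicit and avoids the SettingWithCopyWarning
# that some pandas versions emit when you edit a filtered result.
df_F = df[mask].copy()
df_F["Age"] = 0
print(df["Age"].tolist())   # [16, 10, 48, 45]

# To change the original rows where the mask is True, go through .loc.
df.loc[mask, "Age"] = 0
print(df["Age"].tolist())   # [0, 10, 48, 0]
```

So if you want the change to stick on the original, use a[mask] = ... or df.loc[mask, col] = ... rather than editing the filtered result.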