# Selecting Data from a Pandas DataFrame

Import the libraries required to use Pandas and NumPy, then set up the DataFrame used in the examples:

In [27]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [28]:
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))

Selecting a single column yields a Series. The expression below is equivalent to df.A:

In [9]:
df['A']

2013-01-01    0.589059
2013-01-02   -1.184502
2013-01-03   -1.454402
2013-01-04   -0.745229
2013-01-05    0.480240
2013-01-06   -0.160388
Freq: D, Name: A, dtype: float64
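
As a minimal sketch of the column selection described above, the cell below compares bracket access, attribute access, and selection with a list of labels. The data is a deterministic stand-in for the random values used in this example (an assumption made only so the result is reproducible), and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
# Deterministic stand-in for np.random.randn(6, 4) (assumption, for reproducibility)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list('ABCD'))

col_bracket = df['A']    # a single label returns a Series
col_attr = df.A          # attribute access returns the same Series
col_frame = df[['A']]    # a list of labels returns a one-column DataFrame

print(type(col_bracket).__name__)
print(col_bracket.equals(col_attr))
print(type(col_frame).__name__, col_frame.shape)
```

Bracket access is usually the safer choice, since attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute.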

Slicing with [] selects rows by position. The end of the slice is excluded, so 0:3 returns the first three rows:

In [10]:
df[0:3]

                   A         B         C         D
2013-01-01  0.589059 -0.846438  2.080108 -0.218274
2013-01-02 -1.184502 -0.507200 -0.020703  1.368786
2013-01-03 -1.454402 -1.344647 -0.394066  1.205510


Similarly, we can slice the rows by label. Unlike a positional slice, a label slice includes both endpoints:

In [11]:
df['20130102':'20130104']

                   A         B         C         D
2013-01-02 -1.184502 -0.507200 -0.020703  1.368786
2013-01-03 -1.454402 -1.344647 -0.394066  1.205510
2013-01-04 -0.745229  0.297799 -2.454678  0.694041
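
The difference between the two kinds of row slice can be sketched as follows. This sketch uses the explicit .iloc (position) and .loc (label) indexers rather than plain [], and the data is a deterministic stand-in for the random values above (an assumption); the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
# Deterministic stand-in for the random data (assumption, for reproducibility)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list('ABCD'))

# Positional slice: the end position (3) is excluded
pos_rows = df.iloc[0:3]

# Label slice: both endpoint labels are included
label_rows = df.loc['20130102':'20130104']

print(pos_rows.index[-1])    # last row of the positional slice
print(label_rows.index[-1])  # last row of the label slice
```

Spelling out .iloc or .loc makes it clear to a reader whether a slice is meant to be positional or label-based.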


To get a cross section by label, use .loc. Here dates[0] is the first label in the index (2013-01-01), so the call returns the first row as a Series indexed by the column names:

In [12]:
df.loc[dates[0]]

A    0.589059
B   -0.846438
C    2.080108
D   -0.218274
Name: 2013-01-01 00:00:00, dtype: float64
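
To make the distinction between label and position concrete, here is a small sketch that fetches the same row both ways and then reads a single cell. The data is a deterministic stand-in for the random values above (an assumption), and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
# Deterministic stand-in for the random data (assumption, for reproducibility)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list('ABCD'))

row_by_label = df.loc[dates[0]]   # look up the row whose index label is dates[0]
row_by_pos = df.iloc[0]           # look up the row at integer position 0

# Passing a row label and a column label returns a single scalar
cell = df.loc[dates[0], 'A']

print(row_by_label.equals(row_by_pos))
print(cell)
```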

We can also select on both axes by label. The colon selects all rows, and the list of labels returns only the requested columns:

In [13]:
df.loc[:,['A','B']]

                   A         B
2013-01-01  0.589059 -0.846438
2013-01-02 -1.184502 -0.507200
2013-01-03 -1.454402 -1.344647
2013-01-04 -0.745229  0.297799
2013-01-05  0.480240  0.286422
2013-01-06 -0.160388  0.920942
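
The row and column selectors can be combined in one call. The sketch below restricts both axes at once, using a label slice for the rows and a list of labels for the columns; the data is a deterministic stand-in for the random values above (an assumption), and `subset` is a hypothetical name.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
# Deterministic stand-in for the random data (assumption, for reproducibility)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list('ABCD'))

# Rows from 2013-01-02 through 2013-01-04 (inclusive), columns A and B only
subset = df.loc['20130102':'20130104', ['A', 'B']]

print(subset.shape)
print(list(subset.columns))
```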


To find the unique values in a given column, use unique(), which returns them as an array in order of first appearance:

In [25]:
data = {'id':['s1','s2','s3','s4'], 'grade':['D','C','D','HD']}
df1 = pd.DataFrame(data)
df1.grade.unique()

array(['D', 'C', 'HD'], dtype=object)
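
Two related Series methods are often useful alongside unique(). The sketch below, which reuses the same small grades table, counts the distinct values with nunique() and tallies each value with value_counts(); the variable names are hypothetical.

```python
import pandas as pd

data = {'id': ['s1', 's2', 's3', 's4'], 'grade': ['D', 'C', 'D', 'HD']}
df1 = pd.DataFrame(data)

unique_grades = df1['grade'].unique()    # distinct values, in order of first appearance
n_unique = df1['grade'].nunique()        # number of distinct values
counts = df1['grade'].value_counts()     # occurrences of each value

print(list(unique_grades))
print(n_unique)
print(counts['D'])
```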

Suppose that you want to filter a DataFrame based on a value in one of its columns. Build a boolean mask from the column, then use it to index the DataFrame:

In [26]:
logic_filter = df1.grade == 'D'
# This is the actual filter that is applied to the
# data frame of interest
df1_filtered = df1[logic_filter]
df1_filtered

  grade  id
0     D  s1
2     D  s3
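
Boolean masks can also be combined. As a rough sketch on the same grades table, the cell below keeps rows whose grade is in a set of values and whose id is not 's1'. The particular conditions are made up for illustration, and the variable names are hypothetical.

```python
import pandas as pd

data = {'id': ['s1', 's2', 's3', 's4'], 'grade': ['D', 'C', 'D', 'HD']}
df1 = pd.DataFrame(data)

# isin() tests membership; & combines masks element-wise.
# Each comparison needs its own parentheses because & binds tighter than != or ==.
mask = df1['grade'].isin(['D', 'HD']) & (df1['id'] != 's1')

filtered = df1[mask]                # rows where the mask is True
ids_only = df1.loc[mask, 'id']      # same rows, restricted to one column

print(list(filtered['id']))
```

Use & for "and", | for "or", and ~ for "not" when combining masks; the Python keywords and, or, and not do not work element-wise on a Series.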
