In this chapter, we will discuss how to slice and dice the data and, more generally, how to select a subset of a pandas object.

The Python and NumPy indexing operators "[ ]" and attribute operator "." provide quick and easy access to Pandas data structures across a wide range of use cases. However, since the type of the data to be accessed isn’t known in advance, directly using standard operators has some optimization limits. For production code, we recommend that you take advantage of the optimized pandas data access methods explained in this chapter.
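As a small illustration of the access styles mentioned above, the following sketch selects the same column with the indexing operator, the attribute operator, and the `.loc` accessor. The frame and its column names are made up for this example.

```python
import numpy as np
import pandas as pd

# Small frame with hypothetical labels, used only for illustration.
df = pd.DataFrame(np.arange(6).reshape(3, 2),
                  index=['a', 'b', 'c'], columns=['A', 'B'])

col_by_brackets = df['A']     # indexing operator
col_by_attribute = df.A       # attribute operator (needs a valid identifier name)
col_by_loc = df.loc[:, 'A']   # explicit label-based accessor

print(col_by_loc)
```

All three expressions return the same Series here; the accessor form states explicitly that the selection is by label.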

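pandas has offered three types of multi-axis indexing, summarized in the following table:

| No. | Indexer | Description |
|-----|---------|-------------|
| 1 | .loc() | Label based |
| 2 | .iloc() | Integer based |
| 3 | .ix() | Both label and integer based (no longer available in current pandas) |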

# .loc()
pandas provides various methods for purely label-based indexing. When slicing with .loc, both the start bound and the stop bound are included. Integers are valid labels, but they refer to the label and not the position.
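The point about integer labels is easy to miss, so here is a minimal sketch. It assumes a Series whose index holds the integers 10 to 40 (values chosen only for illustration) and contrasts a label-based slice with a position-based one.

```python
import pandas as pd

# Hypothetical Series whose index labels are integers, not positions.
s = pd.Series(['x', 'y', 'z', 'w'], index=[10, 20, 30, 40])

by_label = s.loc[10:30]     # labels 10, 20 and 30: the stop label is included
by_position = s.iloc[0:2]   # positions 0 and 1: the stop position is excluded
single = s.loc[20]          # looks up the label 20, not position 20

# Label 0 does not exist in this index, so .loc raises KeyError.
try:
    s.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(by_label)
print(by_position)
```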

.loc() accepts several kinds of indexers:

- A single scalar label
- A list of labels
- A slice object
- A Boolean array

.loc takes up to two indexers separated by a comma, each of which may be a single label, a list, or a slice. The first one selects rows and the second one selects columns.

Example 1

In [1]:
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
index = ['a','b','c','d','e','f','g','h'], columns = ['A', 'B', 'C', 'D'])
print(df)

#select all rows for a specific column
df.loc[:,'A']

          A         B         C         D
a -0.225271  0.117702 -0.539066  1.151399
b -1.247912 -0.828440  2.158662  1.364249
c  0.862515 -1.052551  0.073169 -0.921468
d -0.952506 -0.366799 -0.260485  0.126411
e  1.703367  1.216417 -0.902387 -2.680444
f  1.987069  0.361955  1.819959  0.457157
g  0.178733 -0.663798  0.818324 -0.370963
h  0.074708  1.054870  1.410869  0.478743


a   -0.225271
b   -1.247912
c    0.862515
d   -0.952506
e    1.703367
f    1.987069
g    0.178733
h    0.074708
Name: A, dtype: float64

In [2]:
#Example 2
# import the pandas library and aliasing as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
index = ['a','b','c','d','e','f','g','h'], columns = ['A', 'B', 'C', 'D'])

# Select all rows for multiple columns, say list[]
df.loc[:,['A','C']]

          A         C
a  2.175170  2.852102
b -0.875031 -0.547440
c  0.505911 -0.662089
d  0.095074  0.674136
e -1.380061 -0.274305
f -0.601845 -0.053307
g  1.318340 -0.689044
h -0.539856  0.509986


In [3]:
#Example 3

# import the pandas library and aliasing as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
index = ['a','b','c','d','e','f','g','h'], columns = ['A', 'B', 'C', 'D'])

# Select few rows for multiple columns, say list[]
df.loc[['a','b','f','h'],['A','C']]

          A         C
a -0.877129 -0.161434
b -1.544516 -0.556277
f -0.018301  0.080603
h -1.202558  0.902680


In [6]:
#Example 4
# import the pandas library and aliasing as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
index = ['a','b','c','d','e','f','g','h'], columns = ['A', 'B', 'C', 'D'])

# Select range of rows for all columns
df.loc['a':'f']

          A         B         C         D
a -1.386560  0.716452  2.577491  0.809334
b -1.219756  1.233136 -0.260231 -1.873198
c -1.213395 -0.573719 -0.225647 -0.375869
d  0.308215 -0.178929  1.863875 -0.178112
e -1.255550 -1.256461  1.307408  1.056106
f  1.187327 -0.593057  1.913548 -0.516641


In [7]:
#Example 5
# import the pandas library and aliasing as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
index = ['a','b','c','d','e','f','g','h'], columns = ['A', 'B', 'C', 'D'])
print(df)
# Compare row 'a' with 0; the result is a boolean Series with one value per column
df.loc['a']>0

          A         B         C         D
a  0.796608 -0.608522  0.949524  1.119190
b -0.200277  0.571536 -0.986507  1.424567
c  0.688527  0.960687 -0.037651  0.111898
d -0.426208 -0.941254  0.857964 -0.067710
e  0.017653  1.861774  0.624158  0.729584
f -0.696024 -1.963397  1.283643 -1.044218
g  0.430902 -1.214745 -0.859636 -0.437161
h  1.099372  1.475312  0.570415 -1.252370


A     True
B    False
C     True
D     True
Name: a, dtype: bool
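The comparison above only produces the boolean values. A minimal sketch of passing such a mask back into `.loc` follows; the seeded random generator and the variable names are assumptions made for reproducibility and are not part of the original example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, an assumption for repeatable output
df = pd.DataFrame(rng.standard_normal((8, 4)),
                  index=list('abcdefgh'), columns=list('ABCD'))

row_mask = df['A'] > 0           # boolean Series aligned with the row index
positive_a = df.loc[row_mask]    # keep only rows where column A is positive

col_mask = df.loc['a'] > 0       # boolean Series aligned with the columns
positive_cols = df.loc[:, col_mask]  # keep only columns that are positive in row 'a'

print(positive_a)
print(positive_cols)
```

The mask must have the same length as the axis it filters; a boolean Series is aligned on its labels before it is applied.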

# .iloc()
pandas provides various methods for purely integer-based indexing. As in Python and NumPy, positions are 0-based, and the stop bound of a slice is excluded.

The various access methods are as follows:

- An integer
- A list of integers
- A range of values (a slice)
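The access methods listed above can be sketched on a small frame filled with consecutive integers, which makes the selected positions easy to read off. The frame contents are an assumption chosen for clarity.

```python
import numpy as np
import pandas as pd

# Values 0..31 laid out row by row, so row i holds 4*i .. 4*i + 3.
df = pd.DataFrame(np.arange(32).reshape(8, 4), columns=['A', 'B', 'C', 'D'])

third_row = df.iloc[2]              # a single integer: the row at position 2
picked = df.iloc[[0, 2], [1, 3]]    # lists of integers: rows 0 and 2, columns B and D
block = df.iloc[1:3, 0:2]           # slices: rows 1 and 2, columns A and B (stop excluded)
last_row = df.iloc[-1]              # negative positions count from the end

print(picked)
print(block)
```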

In [8]:
#Example 1
# import the pandas library and aliasing as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
print(df)

# select the first four rows for all columns
df.iloc[:4]

          A         B         C         D
0 -0.357511  0.034452  1.756359 -2.335632
1 -1.146670  0.242640  0.451434 -0.035425
2 -1.207105  1.617244 -1.238540 -0.145982
3 -1.007825  0.485325 -0.632722  1.194796
4 -1.435574  1.775170 -1.964928  1.188430
5 -1.618130  0.456586 -0.321466 -1.492244
6  0.461759  0.857473 -0.554806 -0.315197
7  0.072434  1.098325 -0.253323 -0.095784


          A         B         C         D
0 -0.357511  0.034452  1.756359 -2.335632
1 -1.146670  0.242640  0.451434 -0.035425
2 -1.207105  1.617244 -1.238540 -0.145982
3 -1.007825  0.485325 -0.632722  1.194796


In [9]:
#Example 2

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# Integer slicing
print(df.iloc[:4])
df.iloc[1:5, 2:4]

          A         B         C         D
0 -1.262382  0.184245 -0.955784 -0.230638
1  0.771104 -3.483214 -0.536727  1.018314
2 -0.842408  0.162352 -0.498157  0.956367
3  0.104973 -0.634997 -1.173179  0.357169


          C         D
1 -0.536727  1.018314
2 -0.498157  0.956367
3 -1.173179  0.357169
4 -0.794378 -0.764915


In [10]:
#Example 3

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# Select by lists of integer positions, then by row slice, then by column slice
print(df.iloc[[1, 3, 5], [1, 3]])
print(df.iloc[1:3, :])
print(df.iloc[:, 1:3])

          B         D
1 -0.671326  0.153058
3  0.603813  0.236907
5  0.333258  0.857431
          A         B         C         D
1 -0.467807 -0.671326  0.070518  0.153058
2  0.097475 -1.054914 -0.028851 -0.358185
          B         C
0 -0.590960 -0.332111
1 -0.671326  0.070518
2 -1.054914 -0.028851
3  0.603813  0.868552
4  2.463631  0.032912
5  0.333258 -0.985017
6 -1.124917  1.191145
7 -0.193673  1.377186


# .ix()
Besides purely label-based and integer-based indexing, older pandas releases provided a hybrid accessor, .ix, for selecting and subsetting an object by either label or position.

In [11]:
#Example 1

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# Integer slicing
df.ix[:4]

# Output on pandas 1.0 or later, where .ix has been removed:

AttributeError: 'DataFrame' object has no attribute 'ix'

Note: .ix was deprecated in pandas 0.20 and removed in pandas 1.0. Use .loc for label-based selection and .iloc for integer-based selection instead.
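Since the AttributeError above shows that `.ix` is not available, a mixed selection (rows by position, columns by label) has to be written with `.loc` or `.iloc`. The following is one possible sketch, not the only way to do it; the frame contents and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(32).reshape(8, 4),
                  index=list('abcdefgh'), columns=['A', 'B', 'C', 'D'])

# First four rows (by position) of column 'A' (by label).
# Option 1: turn the positions into labels and use .loc.
via_loc = df.loc[df.index[:4], 'A']

# Option 2: turn the label into a position and use .iloc.
via_iloc = df.iloc[:4, df.columns.get_loc('A')]

print(via_loc)
```

Both options return the same Series; which one reads better depends on whether most of the selection is expressed in labels or in positions.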