<h1 align="center">5.3 Essential Functionality Part II</h1>

<b>Function Application and Mapping</b>

In [5]:
import pandas as pd
import numpy as np

NumPy ufuncs (element-wise array methods) also work with pandas objects:

In [8]:
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame

               b         d         e
Utah   -0.742329 -0.149573  1.186291
Ohio   -0.538021 -1.504591  0.732917
Texas  -0.249982 -1.363503 -1.443409
Oregon -1.833501 -0.700290  0.459912


In [7]:
np.abs(frame)

               b         d         e
Utah    0.742329  0.149573  1.186291
Ohio    0.538021  1.504591  0.732917
Texas   0.249982  1.363503  1.443409
Oregon  1.833501  0.700290  0.459912
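
As a quick check of what "works with pandas objects" means in practice, the sketch below applies `np.abs` to a small frame and confirms that the result is still a DataFrame with the same labels. The name `demo` and the seeded generator are assumptions made here so the example is repeatable; they are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch gives the same frame on every run (an assumption
# of this example; the notebook itself uses unseeded np.random.randn).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.standard_normal((4, 3)), columns=list('bde'),
                    index=['Utah', 'Ohio', 'Texas', 'Oregon'])

abs_demo = np.abs(demo)  # the ufunc is applied to every element

print(type(abs_demo).__name__)             # DataFrame
print(abs_demo.index.equals(demo.index))   # True: row labels are preserved
print((abs_demo >= 0).all().all())         # True: every value is non-negative
```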


Another frequent operation is applying a function on one-dimensional arrays to each column or row. DataFrame's apply method does exactly this:

In [9]:
# f receives one column (a Series) at a time and returns its range
f = lambda x: x.max() - x.min()
frame.apply(f)

b    1.583519
d    1.355017
e    2.629700
dtype: float64

In [14]:
frame.apply(f, axis=1)  # axis=1 calls f once per row instead of once per column

Utah      1.928620
Ohio      2.237507
Texas     1.193427
Oregon    2.293413
dtype: float64
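
The function passed to apply does not have to return a scalar. As a minimal sketch (the frame `small` and the helper `min_max` are names invented for this illustration), a function that returns a Series with several values produces a DataFrame with one row per returned label:

```python
import pandas as pd

small = pd.DataFrame({'b': [1.0, 4.0, 2.0], 'd': [3.0, -1.0, 0.5]},
                     index=['Utah', 'Ohio', 'Texas'])

def min_max(col):
    # Called once per column; returns two labelled values instead of one scalar
    return pd.Series([col.min(), col.max()], index=['min', 'max'])

summary = small.apply(min_max)
print(summary)
```

Here `summary` has the rows `min` and `max` and keeps the original columns `b` and `d`.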

Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating-point value in frame. You can do this with applymap:

In [11]:
format = lambda x: '%.2f' % x
frame.applymap(format)

            b      d      e
Utah    -0.74  -0.15   1.19
Ohio    -0.54  -1.50   0.73
Texas   -0.25  -1.36  -1.44
Oregon  -1.83  -0.70   0.46


The reason for the name applymap is that Series has a map method for applying an element-wise function:

In [12]:
frame['e'].map(format)

Utah       1.19
Ohio       0.73
Texas     -1.44
Oregon     0.46
Name: e, dtype: object
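
One caveat worth knowing: in pandas 2.1 and later, `DataFrame.applymap` is deprecated in favor of `DataFrame.map`, which does the same element-wise job. The sketch below picks whichever method the installed version offers; the frame `prices` and the helper `two_decimals` are hypothetical names used only for this example.

```python
import pandas as pd

prices = pd.DataFrame({'b': [1.234, -0.5], 'd': [10.0, 2.718]},
                      index=['Utah', 'Ohio'])

def two_decimals(x):
    # Format a single float with two digits after the decimal point
    return f'{x:.2f}'

# Prefer DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap
elementwise = prices.map if hasattr(prices, 'map') else prices.applymap
formatted = elementwise(two_decimals)
print(formatted)

# Series.map is available in both old and new versions
print(prices['d'].map(two_decimals))
```

Using a named function instead of a lambda bound to `format` also avoids shadowing Python's built-in `format`.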

<b>Sorting and Ranking</b>

To sort lexicographically by row or column index, use the sort_index method, which returns a new sorted object:

In [27]:
obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])
obj.sort_index()

a    1
b    2
c    3
d    0
dtype: int64

In [28]:
frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
                     index=['three', 'one'],
                     columns=['d', 'a', 'b', 'c'])
frame

       d  a  b  c
three  0  1  2  3
one    4  5  6  7


In [29]:
frame.sort_index()

       d  a  b  c
one    4  5  6  7
three  0  1  2  3


In [30]:
frame.sort_index(axis=1)

       a  b  c  d
three  1  2  3  0
one    5  6  7  4


In [31]:
frame.sort_index(axis=1, ascending=False)

       d  c  b  a
three  0  3  2  1
one    4  7  6  5


To sort a Series by its values, use its sort_values method. Any missing values (NaN) are placed at the end of the result by default:

In [32]:
obj.sort_values()

d    0
a    1
b    2
c    3
dtype: int64
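
Since `obj` contains no missing values, the NaN behavior is not visible above. A minimal sketch with a Series that does contain NaN (the name `with_gaps` is made up for this example) shows the default placement and the `na_position` option:

```python
import numpy as np
import pandas as pd

with_gaps = pd.Series([4, np.nan, 7, np.nan, -3, 2])

# Default: ascending order, with NaN values collected at the end
sorted_default = with_gaps.sort_values()
print(sorted_default)

# na_position='first' moves the NaN values to the front instead
sorted_na_first = with_gaps.sort_values(na_position='first')
print(sorted_na_first)
```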

In [34]:
# For a DataFrame, pass the column(s) to sort on with the `by` argument
frame.sort_values(by='b', ascending=False)

       d  a  b  c
one    4  5  6  7
three  0  1  2  3


In [33]:
frame.sort_values(by=['d', 'b'])  # sort by 'd' first, then by 'b' to break ties

       d  a  b  c
three  0  1  2  3
one    4  5  6  7
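
In `frame` the values in column `d` are all distinct, so the second sort key never comes into play. The following sketch uses a small made-up frame (`scores`) in which the first key has ties, so the effect of the second key, and of a per-column `ascending` list, becomes visible:

```python
import pandas as pd

scores = pd.DataFrame({'a': [0, 1, 0, 1], 'b': [4, 7, -3, 2]})

# Rows are ordered by 'a'; rows with equal 'a' are then ordered by 'b'
by_a_then_b = scores.sort_values(by=['a', 'b'])
print(by_a_then_b)

# One ascending flag per key: 'a' ascending, 'b' descending within each 'a' group
mixed = scores.sort_values(by=['a', 'b'], ascending=[True, False])
print(mixed)
```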


Ranking assigns ranks from one through the number of valid data points in an array. The rank methods for Series and DataFrame are the place to look. By default, rank breaks ties by assigning each group the mean rank:

In [37]:
obj_x = pd.Series([1, 2, 1, 5, 4, 8, 9, 5, 6])

In [38]:
obj_x.rank()

0    1.5
1    3.0
2    1.5
3    5.5
4    4.0
5    8.0
6    9.0
7    5.5
8    7.0
dtype: float64

In [39]:
obj_x.rank(method='dense')  # tied values share a rank, and ranks increase by 1 between groups

0    1.0
1    2.0
2    1.0
3    4.0
4    3.0
5    6.0
6    7.0
7    4.0
8    5.0
dtype: float64
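
To see how the tie-breaking options differ, the sketch below ranks the same values with each `method` that `rank` accepts and lines the results up side by side. The names `values` and `comparison` are chosen here for illustration.

```python
import pandas as pd

values = pd.Series([1, 2, 1, 5, 4, 8, 9, 5, 6])

methods = ['average', 'min', 'max', 'first', 'dense']
# One column of ranks per tie-breaking method
comparison = pd.DataFrame({m: values.rank(method=m) for m in methods})
comparison.insert(0, 'value', values)
print(comparison)
```

Only the tied values (the two 1s and the two 5s) differ between `average`, `min`, `max`, and `first`; `dense` additionally removes the gaps that ties leave in the later ranks.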

DataFrame can compute ranks over the rows or the columns:

In [53]:
frame

       d  a  b  c
three  0  1  2  3
one    4  5  6  7


In [55]:
frame.rank(method='max', axis=0)  # ranks within each column; no column has ties here, so every method gives the same result

         d    a    b    c
three  1.0  1.0  1.0  1.0
one    2.0  2.0  2.0  2.0


<b>Axis Indexes with Duplicate Labels</b>

In [57]:
obj = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])

In [59]:
obj['a']  # 'a' appears twice in the index, so a Series is returned rather than a scalar

a    0
a    1
dtype: int64
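
Because the return type depends on whether a label is repeated, it can help to check the index before relying on scalar results. The sketch below rebuilds the same Series under a new name (`dup`, chosen for this example) and contrasts a duplicated label with a unique one:

```python
import pandas as pd

dup = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])

print(dup.index.is_unique)   # False: some labels occur more than once

many = dup['a']   # duplicated label: selection returns a Series
one = dup['c']    # unique label: selection returns a scalar

print(type(many).__name__)   # Series
print(one)                   # 4
```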