<span style="color:deeppink">
        <h1>Arithmetic and data alignment</h1>
</span>

One of the most important pandas features is the behavior of arithmetic between objects with different indexes. When you add two objects, pandas aligns them by label: the index of the result is the union of the two indexes, and any label found in only one of the objects gets a missing value (NaN). Let's look at a simple example:

In [1]:
 from pandas import Series, DataFrame

In [2]:
import pandas as pd 

In [3]:
s1 = Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s1    

a    7.3
c   -2.5
d    3.4
e    1.5
dtype: float64

In [4]:
s2 = Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])
s2

a   -2.1
c    3.6
e   -1.5
f    4.0
g    3.1
dtype: float64

In [5]:
s1 + s2 

a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64
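One way to confirm the union rule is to compare the result's index with Index.union and then list the labels that came out as NaN. This standalone sketch rebuilds s1 and s2 through pd.Series; the names result, union, and missing are only illustrative.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

result = s1 + s2

# The result's index is the (sorted) union of the two input indexes
union = s1.index.union(s2.index)
print(result.index.equals(union))  # True

# Labels present in only one operand have no partner, so their sums are NaN
missing = result[result.isna()].index.tolist()
print(missing)  # ['d', 'f', 'g']
```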

In the case of DataFrame, alignment is performed on both the rows and the columns: 

In [6]:
import numpy as np

In [7]:
df1 = DataFrame(np.arange(9).reshape((3, 3)), columns=list('bcd'), index=['Ohio', 'Texas', 'Colorado'])
df1

          b  c  d
Ohio      0  1  2
Texas     3  4  5
Colorado  6  7  8


In [8]:
df2 = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
df2

          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0


In [9]:
df1 + df2

            b   c     d   e
Colorado  NaN NaN   NaN NaN
Ohio      3.0 NaN   6.0 NaN
Oregon    NaN NaN   NaN NaN
Texas     9.0 NaN  12.0 NaN
Utah      NaN NaN   NaN NaN

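Because alignment happens on both axes, a cell of df1 + df2 holds a number only when its row label and its column label both appear in each frame. A minimal sketch that checks this with Index.intersection, rebuilding the two frames with pd.DataFrame (shared_rows and shared_cols are illustrative names):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=list('bcd'),
                   index=['Ohio', 'Texas', 'Colorado'])
df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                   index=['Utah', 'Ohio', 'Texas', 'Oregon'])

result = df1 + df2

# Rows and columns of the result are each the union of the inputs' labels
print(result.index.tolist())    # ['Colorado', 'Ohio', 'Oregon', 'Texas', 'Utah']
print(result.columns.tolist())  # ['b', 'c', 'd', 'e']

# Only the (row, column) pairs present in BOTH frames get a value
shared_rows = df1.index.intersection(df2.index)
shared_cols = df1.columns.intersection(df2.columns)
print(result.loc[shared_rows, shared_cols])
print(int(result.notna().sum().sum()))  # 4
```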

<span style="color:deeppink">
        <h1>Arithmetic methods with fill values</h1>
</span>

In arithmetic operations between differently indexed objects, you might want to fill in a special value, such as 0, when an axis label is found in one object but not the other:

In [10]:
df1 = DataFrame(np.arange(12.).reshape((3, 4)), columns=list('abcd'))
df1

     a    b     c     d
0  0.0  1.0   2.0   3.0
1  4.0  5.0   6.0   7.0
2  8.0  9.0  10.0  11.0


In [11]:
df2 = DataFrame(np.arange(20.).reshape((4, 5)), columns=list('abcde'))
df2

      a     b     c     d     e
0   0.0   1.0   2.0   3.0   4.0
1   5.0   6.0   7.0   8.0   9.0
2  10.0  11.0  12.0  13.0  14.0
3  15.0  16.0  17.0  18.0  19.0


In [12]:
df1 + df2

      a     b     c     d   e
0   0.0   2.0   4.0   6.0 NaN
1   9.0  11.0  13.0  15.0 NaN
2  18.0  20.0  22.0  24.0 NaN
3   NaN   NaN   NaN   NaN NaN


Using the <span style="color:violet">add</span> method on df1, I pass df2 and an argument to <span style="color:violet">fill_value</span>. With fill_value=1, a label missing from one of the frames is treated as 1 before adding, so row 0 of column e becomes 1 + 4 = 5.0:

In [14]:
df1.add(df2, fill_value=1) 

      a     b     c     d     e
0   0.0   2.0   4.0   6.0   5.0
1   9.0  11.0  13.0  15.0  10.0
2  18.0  20.0  22.0  24.0  15.0
3  16.0  17.0  18.0  19.0  20.0

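The fill value only stands in for the side that lacks a label. If a label is missing (or NaN) on both sides, there is nothing to fill against and the result stays NaN. A small sketch with two made-up Series a and b (hypothetical data, not from the frames above):

```python
import numpy as np
import pandas as pd

# Hypothetical Series: 'z' is NaN in a and absent from b
a = pd.Series([1.0, 2.0, np.nan], index=['x', 'y', 'z'])
b = pd.Series([10.0, 20.0], index=['x', 'w'])

plain = a + b                    # NaN wherever either side is missing
filled = a.add(b, fill_value=0)  # 0 replaces the one missing side only

print(plain)
print(filled)  # w 20.0, x 11.0, y 2.0, z still NaN
```

The other flexible methods (sub, mul, div) accept fill_value in the same way.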

<span style="color:deeppink">
        <h1>Operations between DataFrame and Series</h1>
</span>

As with NumPy arrays, arithmetic between DataFrame and Series is well-defined. First, as a motivating example, consider the difference between a 2D array and one of its rows: 

In [15]:
arr = np.arange(12.).reshape((3, 4))
arr

array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]])

In [16]:
 arr[0] 

array([0., 1., 2., 3.])

In [17]:
 arr - arr[0] 

array([[0., 0., 0., 0.],
       [4., 4., 4., 4.],
       [8., 8., 8., 8.]])
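Subtracting arr[0] from arr is applied once for every row; NumPy calls this broadcasting. As a sketch of the column-wise counterpart, the first column must be given shape (3, 1) (here via arr[:, [0]]) so that it stretches across the columns instead of down the rows:

```python
import numpy as np

arr = np.arange(12.).reshape((3, 4))

# arr[0] has shape (4,), so it is broadcast down all 3 rows
row_diff = arr - arr[0]

# arr[:, [0]] has shape (3, 1), so it is broadcast across all 4 columns
col_diff = arr - arr[:, [0]]

print(row_diff)
print(col_diff)  # every row becomes [0., 1., 2., 3.]
```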

In [18]:
frame = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame

          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0


In [19]:
# .ix was removed in pandas 1.0; .iloc selects the first row by position
series = frame.iloc[0]
series

b    0.0
d    1.0
e    2.0
Name: Utah, dtype: float64

In [20]:
frame

          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0


In [21]:
frame - series

          b    d    e
Utah    0.0  0.0  0.0
Ohio    3.0  3.0  3.0
Texas   6.0  6.0  6.0
Oregon  9.0  9.0  9.0

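The operator form matches the Series index against the DataFrame's columns and broadcasts down the rows. A sketch showing that it agrees with the flexible method sub when axis='columns' (the method's default):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series = frame.iloc[0]  # the 'Utah' row

# Operator: Series index is matched against the columns
via_operator = frame - series

# Method form with axis='columns' does the same thing
via_method = frame.sub(series, axis='columns')

print(via_operator.equals(via_method))  # True
```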

If an index value is found in only one of the two objects (the DataFrame's columns or the Series's index), the objects will be reindexed to form the union, and the unmatched positions become NaN:

In [22]:
series2 = Series(range(3), index=['b', 'e', 'f'])

In [23]:
frame + series2

          b   d     e   f
Utah    0.0 NaN   3.0 NaN
Ohio    3.0 NaN   6.0 NaN
Texas   6.0 NaN   9.0 NaN
Oregon  9.0 NaN  12.0 NaN

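A quick sketch confirming where the extra column comes from: the result's columns equal frame.columns.union(series2.index), and any column with a partner on only one side is entirely NaN.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series2 = pd.Series(range(3), index=['b', 'e', 'f'])

result = frame + series2

# Columns of the result = union of frame.columns and series2.index
print(result.columns.equals(frame.columns.union(series2.index)))  # True

# 'd' has no partner in series2 and 'f' has none in frame
print(result['d'].isna().all(), result['f'].isna().all())  # True True
```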

If you instead want to broadcast over the columns, matching on the rows, you have to use one of the arithmetic methods and pass axis=0 (or equivalently axis='index'). For example:

In [24]:
series3 = frame['d']
series3    

Utah       1.0
Ohio       4.0
Texas      7.0
Oregon    10.0
Name: d, dtype: float64

In [25]:
frame

          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0


In [26]:
series3

Utah       1.0
Ohio       4.0
Texas      7.0
Oregon    10.0
Name: d, dtype: float64

In [27]:
frame.sub(series3, axis=0)

          b    d    e
Utah   -1.0  0.0  1.0
Ohio   -1.0  0.0  1.0
Texas  -1.0  0.0  1.0
Oregon -1.0  0.0  1.0

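To see why the axis argument is needed, compare three variants in the sketch below: sub with axis='index', the equivalent NumPy broadcast (the column reshaped to (4, 1) with np.newaxis), and the plain operator. The operator matches series3's row labels against the columns, finds no overlap, and returns an all-NaN frame whose columns are the union of both label sets.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series3 = frame['d']

# axis='index' is a synonym for axis=0: match series3 against the row labels
by_rows = frame.sub(series3, axis='index')

# NumPy equivalent: shape (4, 1) broadcasts across the columns
by_numpy = frame.to_numpy() - series3.to_numpy()[:, np.newaxis]
print(np.array_equal(by_rows.to_numpy(), by_numpy))  # True

# Plain operator aligns series3's index with the COLUMNS: no overlap, all NaN
wrong = frame - series3
print(wrong.shape)  # (4, 7)
```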

<span style="color:deeppink">
        <h1>Function application and mapping</h1>
</span>

NumPy ufuncs (element-wise array functions) also work with pandas objects, and the result keeps the original row and column labels:

In [28]:
# np.random.randn draws fresh random numbers, so your values will differ from those shown
frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame

               b         d         e
Utah    0.245730  1.433597  0.723296
Ohio    0.381927  1.105224  1.545824
Texas  -1.181612 -1.860355  0.498627
Oregon  1.400022  0.985845 -0.267597


In [29]:
np.abs(frame)

               b         d         e
Utah    0.245730  1.433597  0.723296
Ohio    0.381927  1.105224  1.545824
Texas   1.181612  1.860355  0.498627
Oregon  1.400022  0.985845  0.267597

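Because the random frame changes on every run, the sketch below uses a small fixed frame with hypothetical values. It shows that a unary ufunc (np.abs) and a binary ufunc with a scalar (np.maximum) both return a DataFrame with the same labels.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values so the output is reproducible
frame = pd.DataFrame([[0.25, -1.43], [-0.38, 1.11]],
                     columns=['b', 'd'], index=['Utah', 'Ohio'])

absolute = np.abs(frame)        # unary ufunc: labels are preserved
clipped = np.maximum(frame, 0)  # binary ufunc with a scalar: negatives become 0

print(type(absolute).__name__)  # DataFrame
print(absolute)
print(clipped)
```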

Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame's <span style="color:violet">apply</span> method does exactly this. By default the function is called once per column; pass axis=1 to call it once per row:

In [31]:
 f = lambda x: x.max() - x.min()

In [32]:
 frame.apply(f) 

b    2.581634
d    3.293952
e    1.813422
dtype: float64

In [33]:
 frame.apply(f, axis=1) 

Utah      1.187867
Ohio      1.163897
Texas     2.358983
Oregon    1.667620
dtype: float64
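A reproducible sketch of the same two calls on a small hypothetical frame. For a simple reduction like max minus min, the vectorized expression frame.max() - frame.min() gives the same answer without apply; apply earns its keep when the per-column logic is your own.

```python
import pandas as pd

# Hypothetical fixed values so the output is reproducible
frame = pd.DataFrame([[1.0, 5.0, 2.0], [4.0, -1.0, 3.0]],
                     columns=list('bde'), index=['Utah', 'Ohio'])
f = lambda x: x.max() - x.min()

per_column = frame.apply(f)               # f called once per column
per_row = frame.apply(f, axis='columns')  # same as axis=1: once per row

print(per_column)  # b 3.0, d 6.0, e 1.0
print(per_row)     # Utah 4.0, Ohio 5.0

# Vectorized equivalent of the per-column result
print(per_column.equals(frame.max() - frame.min()))  # True
```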

The function passed to <span style="color:violet">apply</span> need not return a scalar value; it can also return a Series with multiple values, in which case the result is a DataFrame:

In [35]:
def f(x):
    # Return both extremes of each column as a labelled Series
    return Series([x.min(), x.max()], index=['min', 'max'])

frame.apply(f)

            b         d         e
min -1.181612 -1.860355 -0.267597
max  1.400022  1.433597  1.545824

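The Series labels returned by the function become the new index, one column per original column. With axis=1 the shape flips: one row per original row, with the Series labels as columns. A sketch on a small hypothetical frame (min_max is an illustrative name):

```python
import pandas as pd

# Hypothetical fixed values
frame = pd.DataFrame([[1.0, 5.0, 2.0], [4.0, -1.0, 3.0]],
                     columns=list('bde'), index=['Utah', 'Ohio'])

def min_max(x):
    # Several values per column (or row), returned as a labelled Series
    return pd.Series([x.min(), x.max()], index=['min', 'max'])

stats = frame.apply(min_max)              # index: min, max; columns: b, d, e
row_stats = frame.apply(min_max, axis=1)  # index: Utah, Ohio; columns: min, max

print(stats)
print(row_stats)
```

For built-in reductions like these, frame.agg(['min', 'max']) produces the same table as stats without a custom function.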

Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating-point value in <span style="color:violet">frame</span>. You can do this with <span style="color:violet">applymap</span>:

In [37]:
 format = lambda x: '%.2f' % x

In [38]:
frame.applymap(format)

            b      d      e
Utah     0.25   1.43   0.72
Ohio     0.38   1.11   1.55
Texas   -1.18  -1.86   0.50
Oregon   1.40   0.99  -0.27

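Note that pandas 2.1 deprecated DataFrame.applymap in favour of DataFrame.map, which takes the same element-wise function. A version-independent sketch runs Series.map over each column with apply. The values come from the Utah and Texas rows above, and the name fmt is chosen so the built-in format() is not shadowed.

```python
import pandas as pd

# Values taken from the Utah and Texas rows of frame above
frame = pd.DataFrame({'b': [0.24573, -1.181612], 'e': [0.723296, 0.498627]},
                     index=['Utah', 'Texas'])

fmt = lambda x: '%.2f' % x

# Series.map formats each element of one column; apply repeats it per column.
# On pandas >= 2.1, frame.map(fmt) is the one-call equivalent.
formatted = frame.apply(lambda col: col.map(fmt))
print(formatted)
```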

The reason for the name <span style="color:violet">applymap</span> is that Series has a <span style="color:violet">map</span> method for applying an element-wise function:

In [39]:
 frame['e'].map(format) 

Utah       0.72
Ohio       1.55
Texas      0.50
Oregon    -0.27
Name: e, dtype: object
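Series.map passes every element to the function, including missing values, and a NaN formats to the string 'nan'. Passing na_action='ignore' skips missing values and leaves them as NaN. A sketch with a small hypothetical Series:

```python
import numpy as np
import pandas as pd

# Hypothetical values with one missing entry
s = pd.Series([0.72, np.nan, -0.27], index=['Utah', 'Ohio', 'Oregon'])
fmt = lambda x: '%.2f' % x

# Default: NaN is formatted too
print(s.map(fmt).tolist())  # ['0.72', 'nan', '-0.27']

# na_action='ignore': NaN is left untouched
print(s.map(fmt, na_action='ignore').tolist())
```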

<span style="color:deeppink">
        <h1>Sorting and ranking</h1>
</span>

Sorting a data set by some criterion is another important built-in operation. To sort lexicographically by row or column index, use the <span style="color:violet">sort_index</span> method, which returns a new, sorted object:

In [40]:
 obj = Series(range(4), index=['d', 'a', 'b', 'c'])

In [41]:
 obj.sort_index() 

a    1
b    2
c    3
d    0
dtype: int64
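Since sort_index returns a new object, the original keeps its order; ascending=False reverses the sort. A minimal sketch:

```python
import pandas as pd

obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])

sorted_obj = obj.sort_index()                 # new Series; obj is unchanged
descending = obj.sort_index(ascending=False)  # labels d, c, b, a

print(obj.index.tolist())         # ['d', 'a', 'b', 'c']
print(sorted_obj.index.tolist())  # ['a', 'b', 'c', 'd']
print(descending.tolist())        # [0, 3, 2, 1]
```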

In [42]:
frame = DataFrame(np.arange(8).reshape((2, 4)), index=['three', 'one'], columns=['d', 'a', 'b', 'c'])
frame

       d  a  b  c
three  0  1  2  3
one    4  5  6  7


In [43]:
frame.sort_index()

       d  a  b  c
one    4  5  6  7
three  0  1  2  3


In [44]:
frame.sort_index(axis=1)

       a  b  c  d
three  1  2  3  0
one    5  6  7  4


In [45]:
frame.sort_index(axis=1, ascending=False) 

       d  c  b  a
three  0  3  2  1
one    4  7  6  5

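Each sort_index call sorts a single axis, so sorting both the row and the column labels takes two chained calls. A minimal sketch rebuilding the frame above:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
                     index=['three', 'one'], columns=['d', 'a', 'b', 'c'])

# First sort the row labels, then the column labels
both = frame.sort_index().sort_index(axis='columns')
print(both)
```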