# Operating on Data in Pandas
Pandas preserves index and column labels when applying unary operations and aligns them when applying binary operations. As a result, operations on Pandas data always maintain the data context, which prevents the kinds of silly errors that can arise when working with heterogeneous or misaligned data in raw NumPy arrays.

In [2]:
import pandas as pd
import numpy as np

## Ufuncs: Index Preservation
Applying a NumPy ufunc (universal function) to a Pandas ``Series`` or ``DataFrame`` returns another Pandas object with the index preserved.

### Index Preservation in Series

In [10]:
rng = np.random.RandomState(42)
ser = pd.Series(rng.randint(0, 10, 4), name = 'A')
print(ser)
print("-----")
print(np.exp(ser))

0    6
1    3
2    7
3    4
Name: A, dtype: int32
-----
0     403.428793
1      20.085537
2    1096.633158
3      54.598150
Name: A, dtype: float64


### Index Preservation in DataFrame

In [11]:
df = pd.DataFrame(rng.randint(0, 10, (3, 4)),
                  columns=['A', 'B', 'C', 'D'])
print(df)
print("-----")
print(np.sin(df * np.pi / 4))

   A  B  C  D
0  6  9  2  6
1  7  4  3  7
2  7  2  5  4
-----
          A             B         C             D
0 -1.000000  7.071068e-01  1.000000 -1.000000e+00
1 -0.707107  1.224647e-16  0.707107 -7.071068e-01
2 -0.707107  1.000000e+00 -0.707107  1.224647e-16
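
The same behavior holds when the labels are not the default integers. The following is a minimal sketch with made-up data (the names `temps` and `roots` and the values are assumptions for illustration only); it checks that the result of a ufunc carries the same row and column labels as its input.

```python
import numpy as np
import pandas as pd

# Hypothetical labelled data, used only for illustration.
temps = pd.DataFrame(
    {"low": [1.0, 4.0], "high": [9.0, 16.0]},
    index=["mon", "tue"],
)

# Apply a NumPy ufunc directly to the DataFrame.
roots = np.sqrt(temps)

# The result is still a DataFrame with the same row and column labels.
print(type(roots).__name__)
print(roots.index.equals(temps.index))
print(roots.columns.equals(temps.columns))
```

Because the labels survive, values can still be looked up by label afterwards, for example `roots.loc["tue", "high"]`.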


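## Ufuncs: Index Alignment
For binary operations on two ``Series`` or ``DataFrame`` objects, Pandas aligns the indices while performing the operation. Any position that is missing from one of the operands is filled with NaN by default.

If NaN is not the desired behavior, the fill value can be changed by using the appropriate object methods in place of the operators. More details can be found in the section on handling missing values.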

### Index alignment in Series - example 1

In [13]:
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

# Note: the resulting index comes out sorted here.
# Any missing values are filled in with NaN by default.
# The result contains the *union* of the indices of the two inputs.
population / area 

Alaska              NaN
California    90.413926
New York            NaN
Texas         38.018740
dtype: float64

### Index alignment in Series - example 2

In [16]:
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[2, 1, 3])
A + B

0    NaN
1    7.0
2    7.0
3    NaN
dtype: float64
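
If NaN is not wanted for labels that appear in only one operand, the method form of the operator accepts a `fill_value`. A minimal sketch reusing the two series from this example (the name `filled` is introduced here for illustration):

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[2, 1, 3])

# A label present in only one operand is treated as 0 in the other,
# instead of producing NaN.
filled = A.add(B, fill_value=0)
print(filled)
```

Labels 0 and 3 each exist in only one series, so their results are simply the existing value plus 0.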

### Index alignment in DataFrame

A similar type of alignment takes place for *both* columns and indices when performing operations on ``DataFrame``s:

In [17]:
A = pd.DataFrame(rng.randint(0, 20, (2, 2)),
                 columns=list('AB'))
A

    A   B
0   0  11
1  11  16


In [18]:
B = pd.DataFrame(rng.randint(0, 10, (3, 3)),
                 columns=list('BAC'))
B

   B  A  C
0  9  2  6
1  3  8  2
2  4  2  6


In [19]:
A + B

      A     B   C
0   2.0  20.0 NaN
1  19.0  19.0 NaN
2   NaN   NaN NaN
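
The missing entries can be filled with a value of your choice by using `add()` with `fill_value`. The sketch below is one possible choice, not the only one: it fills with the mean of all values in `A`. The frames are written out literally with the values displayed above so that the example does not depend on the random state; the names `fill` and `result` are introduced here for illustration.

```python
import pandas as pd

# Same values as the A and B frames displayed above, written out literally.
A = pd.DataFrame([[0, 11], [11, 16]], columns=list("AB"))
B = pd.DataFrame([[9, 2, 6], [3, 8, 2], [4, 2, 6]], columns=list("BAC"))

# Use the mean of every value in A as the stand-in for entries missing from A.
fill = A.to_numpy().mean()

# Entries present in only one frame are combined with `fill` instead of NaN.
result = A.add(B, fill_value=fill)
print(fill)
print(result)
```

Note that `fill_value` only helps where at least one operand has a value. A position missing from both frames would still be NaN.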


The following table lists Python operators and their equivalent Pandas object methods. These methods are useful when an operation needs options that the operators cannot express, such as a ``fill_value`` or an ``axis``.

| Python Operator | Pandas Method(s)                      |
|-----------------|---------------------------------------|
| ``+``           | ``add()``                             |
| ``-``           | ``sub()``, ``subtract()``             |
| ``*``           | ``mul()``, ``multiply()``             |
| ``/``           | ``truediv()``, ``div()``, ``divide()``|
| ``//``          | ``floordiv()``                        |
| ``%``           | ``mod()``                             |
| ``**``          | ``pow()``                             |
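
As a quick check of the table, the sketch below compares a few operators with their method counterparts on the two small series used earlier. `Series.equals` treats NaN values in the same position as equal, which makes it convenient for this comparison.

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[2, 1, 3])

# Each operator and its method form should give identical results,
# including NaN in the positions where the labels do not overlap.
same_add = (A + B).equals(A.add(B))
same_div = (A / B).equals(A.div(B))
same_pow = (A ** B).equals(A.pow(B))
print(same_add, same_div, same_pow)
```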


## Ufuncs: Operations Between DataFrame and Series
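When a ``DataFrame`` and a ``Series`` are combined, index and column alignment is maintained in the same way. As in NumPy, where subtracting a one-dimensional array from a two-dimensional array is applied row by row, Pandas broadcasts row-wise by default: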

### Row-wise Broadcasting - example 1

In [20]:
A = rng.randint(10, size=(3, 4))
df = pd.DataFrame(A, columns=list('QRST'))

In [21]:
df - df.iloc[0]  # broadcasting operates row-wise

   Q  R  S  T
0  0  0  0  0
1 -1  0 -5  8
2  4  1 -2  0
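
For comparison, the sketch below performs the same subtraction on the underlying NumPy array. The array values are reconstructed from the outputs shown in this section rather than drawn from the random state, so treat them as an assumption; the names `numpy_result` and `pandas_result` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Values reconstructed from the outputs in this section (an assumption).
A = np.array([[4, 8, 6, 1],
              [3, 8, 1, 9],
              [8, 9, 4, 1]])
df = pd.DataFrame(A, columns=list("QRST"))

# NumPy broadcasts the first row across every row of the 2-D array.
numpy_result = A - A[0]

# Pandas does the same, matching the Series labels to the column labels.
pandas_result = df - df.iloc[0]

print(np.array_equal(numpy_result, pandas_result.to_numpy()))
```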


### Row-wise Broadcasting - example 2

In [22]:
halfrow = df.iloc[0, ::2]
halfrow

Q    4
S    6
Name: 0, dtype: int32

In [23]:
# Pandas fills in NaN where values are missing: columns R and T have no match in halfrow.
df - halfrow

     Q   R    S   T
0  0.0 NaN  0.0 NaN
1 -1.0 NaN -5.0 NaN
2  4.0 NaN -2.0 NaN


### Column-wise Broadcasting Attempt
If you would instead like to operate column-wise, you can use the object methods mentioned earlier while specifying the ``axis`` keyword. Subtracting a column with the plain ``-`` operator does not work, as the first attempt shows:

In [24]:
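# This does not give the desired result. Pandas still operates row-wise: it aligns the
# index of df['R'] (0, 1, 2) with the column labels of df (Q, R, S, T). No labels match,
# so the result has the union of both label sets as columns and every entry is NaN.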
df - df['R'] 

    Q   R   S   T   0   1   2
0 NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN


In [25]:
df.subtract(df['R'], axis=0)  # to get the desired result, specify the axis explicitly

   Q  R  S  T
0 -4  0 -2 -7
1 -5  0 -7  1
2 -1  0 -5 -8
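
The reason the plain operator fails becomes visible when the labels are printed side by side. The sketch below again uses values reconstructed from the outputs in this section (an assumption) and also shows that `axis="index"` can be used as a more readable spelling of `axis=0`. The names `col` and `by_rows` are introduced here for illustration.

```python
import pandas as pd

# Values reconstructed from the outputs in this section (an assumption).
df = pd.DataFrame([[4, 8, 6, 1],
                   [3, 8, 1, 9],
                   [8, 9, 4, 1]], columns=list("QRST"))

col = df["R"]

# The column is a Series labelled by the row index, not by column names,
# so it shares no labels with df.columns.
print(col.index.tolist())
print(df.columns.tolist())

# axis="index" matches the Series labels against the row labels instead.
by_rows = df.sub(col, axis="index")
print(by_rows)
```

Since the row labels of `col` and `df` are identical, every entry is matched and no NaN appears in the result.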
