# Operating on Data in Pandas
Because Pandas is designed to work with NumPy, any NumPy universal function (ufunc) will work on Pandas Series and DataFrame objects. Pandas adds a couple of useful twists, however: for unary operations such as negation and trigonometric functions, these ufuncs preserve the index and column labels in the output, and for binary operations such as addition and multiplication, Pandas automatically aligns the indices before passing the objects to the ufunc. This means that keeping the context of the data and combining data from different sources, both potentially error-prone tasks with raw NumPy arrays, become far more reliable with Pandas. We will also see that there are well-defined operations between one-dimensional Series structures and two-dimensional DataFrame structures.
## Ufuncs: Index Preservation

In [None]:
import pandas as pd
import numpy as np

In [None]:
rng = np.random.RandomState(42)
ser = pd.Series(rng.randint(0, 10, 4))
ser

0    6
1    3
2    7
3    4
dtype: int64

In [None]:
df = pd.DataFrame(rng.randint(0, 10, (3, 4)), columns=['A', 'B', 'C', 'D'])
df

   A  B  C  D
0  6  9  2  6
1  7  4  3  7
2  7  2  5  4


In [None]:
np.exp(ser)

0     403.428793
1      20.085537
2    1096.633158
3      54.598150
dtype: float64

In [None]:
np.sin(df * np.pi / 4)

          A             B         C             D
0 -1.000000  7.071068e-01  1.000000 -1.000000e+00
1 -0.707107  1.224647e-16  0.707107 -7.071068e-01
2 -0.707107  1.000000e+00 -0.707107  1.224647e-16
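
Because the objects above use the default integer index, label preservation is easy to overlook. A short sketch with string labels (the data here is made up for illustration) shows that both a NumPy ufunc and unary negation return objects that keep the original labels:

```python
import numpy as np
import pandas as pd

# Hypothetical data with string labels instead of the default 0..n-1 index
s = pd.Series([1.0, 4.0, 9.0], index=['a', 'b', 'c'])
frame = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['r1', 'r2'])

roots = np.sqrt(s)   # NumPy ufunc applied to a Series
negated = -frame     # unary negation applied to a DataFrame

print(roots)         # index is still a, b, c
print(negated)       # row labels r1, r2 and columns x, y are kept
```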


## Ufuncs: Index Alignment
### Index alignment in Series

In [None]:
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')
area

Alaska        1723337
Texas          695662
California     423967
Name: area, dtype: int64

In [None]:
population / area

Alaska              NaN
California    90.413926
New York            NaN
Texas         38.018740
dtype: float64

In [None]:
area.index.intersection(population.index)

Index(['Texas', 'California'], dtype='object')

In [None]:
area.index.union(population.index)

Index(['Alaska', 'California', 'New York', 'Texas'], dtype='object')
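
As a quick check, reusing the area and population data above, the index of the division result contains every label from the union of the two input indices, and dropping the NaN entries leaves only the labels the two Series share:

```python
import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

density = population / area

# Every label from either input appears in the result
print(set(density.index) == (set(area.index) | set(population.index)))

# Labels present in only one input produce NaN; dropping them keeps the shared ones
shared = density.dropna()
print(shared)
```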

In [None]:
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

In [None]:
A + B

0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64

In [None]:
A.add(B, fill_value=0)

0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64
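
Note that `fill_value` is substituted before the operation, and only where a label is missing from one of the two operands. The sketch below contrasts it with filling after the addition, and shows that an entry missing on both sides stays NaN (the Series `C` and `D` are made-up examples):

```python
import numpy as np
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# fill_value stands in for the missing operand, so labels 0 and 3 keep a value
before = A.add(B, fill_value=0)

# Filling after the addition just replaces the NaN results with 0
after = (A + B).fillna(0)
print(before)
print(after)

# When both operands are missing at a label, fill_value does not apply
C = pd.Series([np.nan, 1.0], index=['x', 'y'])
D = pd.Series([np.nan, 2.0], index=['x', 'y'])
both_missing = C.add(D, fill_value=0)
print(both_missing)  # 'x' stays NaN
```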

### Index alignment in DataFrame

In [None]:
A = pd.DataFrame(rng.randint(0, 20, (2, 2)), columns=list('AB'))
A

    A   B
0   9  15
1  14  14


In [None]:
B = pd.DataFrame(rng.randint(0, 10, (3, 3)), columns=list('BAC'))
B

   B  A  C
0  7  3  1
1  5  5  9
2  3  5  1


In [None]:
A + B

      A     B   C
0  12.0  22.0 NaN
1  19.0  19.0 NaN
2   NaN   NaN NaN


In [None]:
fill = A.stack().mean()
fill

13.0

In [None]:
A.add(B, fill_value=fill)

      A     B     C
0  12.0  22.0  14.0
1  19.0  19.0  22.0
2  18.0  16.0  14.0
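
To look at the alignment step on its own, `DataFrame.align()` with `join='outer'` returns both frames reindexed to the union of their row and column labels. In this sketch the values of `A` and `B` are copied from the outputs above (re-running the random draws in a different order would give different numbers); adding the aligned frames should reproduce `A + B`:

```python
import pandas as pd

# Values copied from the A and B outputs above
A = pd.DataFrame([[9, 15], [14, 14]], columns=list('AB'))
B = pd.DataFrame([[7, 3, 1], [5, 5, 9], [3, 5, 1]], columns=list('BAC'))

# Outer-join both frames on rows and columns; cells A lacks become NaN
left, right = A.align(B, join='outer')
print(left)
print(right)

# Once the labels match, addition is purely element-wise and equals A + B
pd.testing.assert_frame_equal((left + right).sort_index(axis=1),
                              (A + B).sort_index(axis=1))
```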


<html>
<head>
<style>
table {
  font-family: "Times New Roman", serif;
  width: 100%;
}
td, th {
  border: 2px solid #dddddd;
  text-align: center;
  padding: 8px;
}
tr:nth-child(even) {
  background-color: #dddddd;
}
</style>
</head>

<body>
<p>Each Python arithmetic operator has an equivalent Pandas object method:</p>
<table>
  <tr>  
    <th>Python operator</th>
    <th>Pandas method(s)</th>
  </tr>
  <tr>
    <td>+</td>
    <td>add()</td>
  </tr>
  <tr>
    <td>-</td>
    <td>sub(), subtract()</td>
  </tr>
  <tr>
    <td>*</td>
    <td>mul(), multiply()</td>
  </tr>
  <tr>
    <td>/</td>
    <td>truediv(), div(), divide()</td>
  </tr>
  <tr>
    <td>//</td>
    <td>floordiv()</td>
  </tr>
  <tr>
    <td>**</td>
    <td>pow()</td>
  </tr>
  <tr>
    <td>%</td>
    <td>mod()</td>
  </tr>
</table>

</body>
</html>
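
The method forms accept extra arguments such as `fill_value` and `axis`, which is why the examples call `add()` and `subtract()` directly. Pandas also provides reflected versions (`radd()`, `rsub()`, `rmul()`, `rtruediv()`, `rfloordiv()`, `rpow()`, `rmod()`) that swap the operands. A small sketch, using the Series values printed at the start of this notebook, checks a few of these equivalences:

```python
import pandas as pd

ser = pd.Series([6, 3, 7, 4])

# Method forms give the same result as the operators
pd.testing.assert_series_equal(ser.sub(1), ser - 1)
pd.testing.assert_series_equal(ser.truediv(2), ser / 2)
pd.testing.assert_series_equal(ser.mod(4), ser % 4)

# Reflected methods put the Series on the right-hand side
print(ser.rsub(10))      # same as 10 - ser
print(ser.rtruediv(84))  # same as 84 / ser
```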

## Ufuncs: Operations Between DataFrame and Series

In [40]:
A = rng.randint(10, size=(3, 4))
A

array([[9, 1, 9, 3],
       [7, 6, 8, 7],
       [4, 1, 4, 7]])

In [42]:
A - A[0]

array([[ 0,  0,  0,  0],
       [-2,  5, -1,  4],
       [-5,  0, -5,  4]])
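
Subtracting `A[0]` works row by row because of NumPy broadcasting: the row has shape `(4,)`, so it is stretched across all three rows of the `(3, 4)` array. Subtracting a column instead requires reshaping it to `(3, 1)` first. A minimal sketch, with the array values copied from the output above:

```python
import numpy as np

A = np.array([[9, 1, 9, 3],
              [7, 6, 8, 7],
              [4, 1, 4, 7]])

# Shape (4,) broadcasts down the rows
row_wise = A - A[0]

# Shape (3, 1) broadcasts across the columns
col_wise = A - A[:, 1][:, np.newaxis]
print(col_wise)
```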

In [43]:
df = pd.DataFrame(A, columns=list('QRST'))
df

   Q  R  S  T
0  9  1  9  3
1  7  6  8  7
2  4  1  4  7


In [45]:
df - df.iloc[0]

   Q  R  S  T
0  0  0  0  0
1 -2  5 -1  4
2 -5  0 -5  4


In [46]:
df.subtract(df['R'], axis=0)

   Q  R  S  T
0  8  0  8  2
1  1  0  2  1
2  3  0  3  6
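
The `axis` argument matters here. With `axis=0` (equivalently `axis='index'`), the Series index is matched against the DataFrame's row labels. The plain `-` operator matches a Series against the columns instead, so `df - df['R']` lines up the row labels `0, 1, 2` with the column labels `Q, R, S, T`; since none overlap, every entry comes out NaN. A small sketch, with values copied from the output above:

```python
import pandas as pd

df = pd.DataFrame([[9, 1, 9, 3],
                   [7, 6, 8, 7],
                   [4, 1, 4, 7]], columns=list('QRST'))

# Series index matched against the row labels
by_rows = df.subtract(df['R'], axis='index')

# Series index matched against the column labels: no overlap, so all NaN
mismatched = df - df['R']
print(mismatched.shape)  # 4 original columns plus 3 labels from the Series index
```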


In [49]:
halfrow = df.iloc[0, 1::2]
halfrow

R    1
T    3
Name: 0, dtype: int64

In [50]:
df - halfrow

    Q    R   S    T
0 NaN  0.0 NaN  0.0
1 NaN  5.0 NaN  4.0
2 NaN  0.0 NaN  4.0
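
If the NaN-filled columns are not wanted, one option is to subtract only from the columns that `halfrow` covers. This is a sketch of one approach (values copied from the output above), not the only one:

```python
import pandas as pd

df = pd.DataFrame([[9, 1, 9, 3],
                   [7, 6, 8, 7],
                   [4, 1, 4, 7]], columns=list('QRST'))
halfrow = df.iloc[0, 1::2]  # Series indexed by ['R', 'T']

# Select the matching columns first, so every label lines up
partial = df[halfrow.index] - halfrow
print(partial)
```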
