## Chapter 15: Operating on Data in Pandas

---
* Author:  [Yuttapong Mahasittiwat](mailto:khala1391@gmail.com)
* Technologist | Data Modeler | Data Analyst
* [YouTube](https://www.youtube.com/khala1391)
* [LinkedIn](https://www.linkedin.com/in/yuttapong-m/)
---

Source: [**Python Data Science Handbook** by **VanderPlas**](https://jakevdp.github.io/PythonDataScienceHandbook/)

In [4]:
import numpy as np
import pandas as pd
print("numpy version :", np.__version__)
print("pandas version :", pd.__version__)

numpy version : 1.26.4
pandas version : 2.2.1


## Ufuncs: index preservation

In [7]:
rng = np.random.default_rng(42)
ser = pd.Series(rng.integers(0,10,4))
ser

0    0
1    7
2    6
3    4
dtype: int64

In [14]:
df = pd.DataFrame(rng.integers(0,10,size=(3,4)),
                 columns=['A','B','C','D'])
df

   A  B  C  D
0  4  8  0  6
1  2  0  5  9
2  7  7  7  7


In [16]:
np.exp(ser)

0       1.000000
1    1096.633158
2     403.428793
3      54.598150
dtype: float64

In [24]:
np.sin(df*np.pi/4)

              A             B         C         D
0  1.224647e-16 -2.449294e-16  0.000000 -1.000000
1  1.000000e+00  0.000000e+00 -0.707107  0.707107
2 -7.071068e-01 -7.071068e-01 -0.707107 -0.707107
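
The outputs above keep the row and column labels of their inputs. A minimal sketch of how this could be checked explicitly follows; the string labels and the names `labeled`, `frame`, `exp_result`, and `sin_result` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# A Series with non-default labels (illustrative values)
labeled = pd.Series([0, 7, 6, 4], index=['w', 'x', 'y', 'z'])
exp_result = np.exp(labeled)

# A DataFrame with custom row and column labels (illustrative values)
frame = pd.DataFrame([[4, 8], [2, 0]], index=['r1', 'r2'], columns=['A', 'B'])
sin_result = np.sin(frame * np.pi / 4)

# The ufunc returns a pandas object carrying the same labels as its input
print(type(exp_result).__name__, exp_result.index.equals(labeled.index))
print(type(sin_result).__name__, sin_result.columns.equals(frame.columns))
```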


## Ufuncs: index alignment

### index alignment in Series

In [26]:
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                   'California': 423967}, name='area')
population = pd.Series({'California': 39538223, 'Texas': 29145505,
                        'Florida': 21538187}, name='population')

In [42]:
area.shape, population.shape

((3,), (3,))

In [46]:
population/area

Alaska              NaN
California    93.257784
Florida             NaN
Texas         41.896072
dtype: float64

In [50]:
area.index, population.index

(Index(['Alaska', 'Texas', 'California'], dtype='object'),
 Index(['California', 'Texas', 'Florida'], dtype='object'))

In [52]:
area.index.union(population.index)

Index(['Alaska', 'California', 'Florida', 'Texas'], dtype='object')
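
The division result is labeled by the union of the two indices, and only labels present in both inputs receive a value. The following is a small sketch under that reading; the names `density`, `union`, and `shared` are introduced here for illustration.

```python
import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 39538223, 'Texas': 29145505,
                        'Florida': 21538187}, name='population')

density = population / area

# Labels found in either Series, and labels found in both
union = area.index.union(population.index)
shared = area.index.intersection(population.index)

# The result is indexed by the union of the labels
print(density.index.equals(union))
# Only the shared labels carry non-missing values
print(sorted(density.dropna().index) == sorted(shared))
```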

In [56]:
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A+B

0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64

In [68]:
A.add(B, fill_value=0)  # treat a value missing from one Series as 0; with plain A + B, those positions become NaN

0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64
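
The same `fill_value` argument is available on the other arithmetic methods, such as `sub` and `mul`. The sketch below reuses the two Series from the cell above; the choice of 1 as the fill for multiplication is an illustrative assumption (it is the neutral element for that operation), and the names `diff` and `prod` are not from the original.

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# Subtraction, treating a value missing from one side as 0
diff = A.sub(B, fill_value=0)

# Multiplication, treating a value missing from one side as 1
prod = A.mul(B, fill_value=1)

print(diff)
print(prod)

# The plain operator leaves NaN wherever a label is missing on one side
print((A - B).isna().tolist())
```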

### index alignment in DataFrames

In [70]:
A = pd.DataFrame(rng.integers(0, 20, (2, 2)),
                 columns=['a', 'b'])
A

    a  b
0  10  2
1  16  9


In [74]:
B = pd.DataFrame(rng.integers(0, 10, (3, 3)),
                 columns=['b', 'a', 'c'])
B

   b  a  c
0  4  4  2
1  0  5  8
2  0  8  8


In [78]:
A+B, B+A

(      a    b   c
 0  14.0  6.0 NaN
 1  21.0  9.0 NaN
 2   NaN  NaN NaN,
       a    b   c
 0  14.0  6.0 NaN
 1  21.0  9.0 NaN
 2   NaN  NaN NaN)

In [80]:
A.add(B, fill_value=A.values.mean())

       a     b      c
0  14.00  6.00  11.25
1  21.00  9.00  17.25
2  17.25  9.25  17.25
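
To see what alignment does before the addition happens, one option is `DataFrame.align`, which returns both frames reindexed to a common set of rows and columns. This is a sketch with the values from the cells above hard-coded so that it runs on its own; the names `A_aligned`, `B_aligned`, and `total` are illustrative.

```python
import pandas as pd

# Same values as the random frames shown above, hard-coded for reproducibility
A = pd.DataFrame([[10, 2], [16, 9]], columns=['a', 'b'])
B = pd.DataFrame([[4, 4, 2], [0, 5, 8], [0, 8, 8]], columns=['b', 'a', 'c'])

# Reindex both frames to the union of their row and column labels
A_aligned, B_aligned = A.align(B, join='outer')
print(A_aligned)   # row 2 and column 'c' are filled with NaN
print(B_aligned)   # same values as B, columns reordered to a, b, c

# Adding the aligned frames gives NaN wherever A had no entry
total = A_aligned + B_aligned
print(total)
```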


## Ufuncs: operations between DataFrames and Series

In [106]:
A = rng.integers(10,size=(3,4))
A

array([[4, 3, 2, 5],
       [6, 9, 4, 1],
       [8, 6, 7, 0]], dtype=int64)

In [110]:
A-A[0]

array([[ 0,  0,  0,  0],
       [ 2,  6,  2, -4],
       [ 4,  3,  5, -5]], dtype=int64)

In [114]:
df = pd.DataFrame(A, columns=['Q', 'R', 'S', 'T'])
df - df.iloc[0]

   Q  R  S  T
0  0  0  0  0
1  2  6  2 -4
2  4  3  5 -5


In [128]:
df.subtract(df['R'],axis=0)

   Q  R  S  T
0  1  0 -1  2
1 -3  0 -5 -8
2  2  0  1 -6
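
The two subtractions above differ only in the axis along which the Series labels are matched. A short sketch relating each to its NumPy broadcasting counterpart follows; the array values are copied from the output above, and the names `by_row` and `by_col` are illustrative.

```python
import numpy as np
import pandas as pd

A = np.array([[4, 3, 2, 5],
              [6, 9, 4, 1],
              [8, 6, 7, 0]])
df = pd.DataFrame(A, columns=['Q', 'R', 'S', 'T'])

# Default behavior: match the Series labels against the column labels,
# so the first row is subtracted from every row
by_row = df.sub(df.iloc[0], axis='columns')

# axis='index' (same as axis=0): match the Series labels against the row labels,
# so column R is subtracted from every column
by_col = df.sub(df['R'], axis='index')

print(by_row.equals(df - df.iloc[0]))
print(np.array_equal(by_row.to_numpy(), A - A[0]))
print(np.array_equal(by_col.to_numpy(), A - A[:, [1]]))
```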


In [138]:
df

   Q  R  S  T
0  4  3  2  5
1  6  9  4  1
2  8  6  7  0


In [140]:
halfrow = df.iloc[0,::2]
halfrow

Q    4
S    2
Name: 0, dtype: int64

In [142]:
df - halfrow

     Q   R    S   T
0  0.0 NaN  0.0 NaN
1  2.0 NaN  2.0 NaN
2  4.0 NaN  5.0 NaN
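
Columns R and T come out as NaN because `halfrow` has no entries for them. If the intent were instead to leave those columns unchanged, one possible approach (an assumption for illustration, not something the original notebook does) is to reindex the Series to the full set of columns with a fill of 0 before subtracting. The names `padded` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[4, 3, 2, 5],
                   [6, 9, 4, 1],
                   [8, 6, 7, 0]], columns=['Q', 'R', 'S', 'T'])
halfrow = df.iloc[0, ::2]           # only columns Q and S

# Give the Series an entry for every column, using 0 where it had none
padded = halfrow.reindex(df.columns, fill_value=0)

# Q and S are shifted by the first row; R and T are left as they were
result = df - padded
print(result)
```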
