### Arithmetic and Data Alignment
- Perform arithmetic on pandas objects (Series, DataFrame) while preserving data alignment.
- Operations align elements by index label; a label present in only one operand produces `NaN` in the result.

1. Arithmetic in pandas can be performed with:
    - Scalars: the scalar is applied to every element
    - Two pandas objects: element-wise operations between Series or DataFrames

2. Data Alignment is based on: 
    - Row Indices for Series
    - Row and column indices for DataFrames
    - If a label in one object is missing from the other, the result has `NaN` at that position, indicating missing data
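
The scalar case in the list above can be sketched as follows. The Series `s` and its values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data; labels and values are illustrative only.
s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

doubled = s * 2   # the scalar is applied to every element
shifted = s - 1   # index labels are left unchanged

print(doubled)
print(shifted)
```

Because a scalar has no index, no alignment takes place and no `NaN` values are introduced.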

In [1]:
import numpy as np 
import pandas as pd 

In [6]:
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=["a", "c", "d", "e"])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=["a", "c", "e", "f", "g"])

print(s1)
print(s2)

a    7.3
c   -2.5
d    3.4
e    1.5
dtype: float64
a   -2.1
c    3.6
e   -1.5
f    4.0
g    3.1
dtype: float64


In [7]:
# adding
s1 + s2

a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64

In [8]:
# for DataFrames, alignment is performed on both rows and columns

df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), 
                    columns=list("bcd"), 
                    index=["Ohio", "Texas", "Colorado"])

df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), 
                    columns=list("bde"), 
                    index=["Utah", "Ohio", "Texas", "Oregon"])

print(df1)
print(df2)

            b    c    d
Ohio      0.0  1.0  2.0
Texas     3.0  4.0  5.0
Colorado  6.0  7.0  8.0
          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0


In [9]:
# adding
df1 + df2

            b   c     d   e
Colorado  NaN NaN   NaN NaN
Ohio      3.0 NaN   6.0 NaN
Oregon    NaN NaN   NaN NaN
Texas     9.0 NaN  12.0 NaN
Utah      NaN NaN   NaN NaN


If two DataFrames share no column labels (or no row labels), every value in their sum is null:

In [11]:
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})

print(df1)
print(df2)

print(df1 + df2)

   A
0  1
1  2
   B
0  3
1  4
    A   B
0 NaN NaN
1 NaN NaN


### Arithmetic methods with fill values
These methods substitute a specified value for a missing value (`NaN`) in either operand before performing the arithmetic operation. If a position is missing in both operands, the result is still `NaN`.

When to use `fill_value`:
- to avoid `NaN` results from missing indices or columns
- to ensure smooth operations when combining incomplete datasets


| Operation       | Operator | Flexible Method | Description                        |
|-----------------------|---------------------|----------------------|------------------------------------------|
| Addition             | `+`                | `add()`             | Element-wise addition.                   |
| Subtraction          | `-`                | `sub()`             | Element-wise subtraction.                |
| Multiplication       | `*`                | `mul()`             | Element-wise multiplication.             |
| Division             | `/`                | `div()` or `truediv()` | Element-wise true division.             |
| Floor Division       | `//`               | `floordiv()`        | Element-wise floor division.             |
| Modulus              | `%`                | `mod()`             | Element-wise modulus operation.          |
| Exponentiation       | `**`               | `pow()`             | Element-wise exponentiation.             |


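Flexible methods (`add()`, `sub()`, etc.) allow the following:
1. **Index Alignment**: Like the operators, they align labels before operating; on a DataFrame, the `axis` argument controls which axis a Series is matched on.
2. **Missing Data Handling**: Pass `fill_value` to stand in for a value that is missing from one operand.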
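
To make the `fill_value` behavior concrete, here is a minimal sketch on two small Series. The data is hypothetical; label `"b"` is deliberately missing in both operands to show that `fill_value` does not apply there.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; "b" is NaN in both Series.
s1 = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, np.nan, 30.0], index=["a", "b", "d"])

# fill_value=0 stands in for a value missing from ONE operand only.
total = s1.add(s2, fill_value=0)

print(total)
# "a": both present            -> 1.0 + 10.0
# "b": missing in both         -> stays NaN
# "c": only in s1, s2 filled   -> 3.0 + 0
# "d": only in s2, s1 filled   -> 0 + 30.0
```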


In [18]:
df1 = pd.DataFrame(np.arange(12.).reshape((3, 4)), columns=list("abcd"))

df2 = pd.DataFrame(np.arange(20.).reshape((4, 5)), columns=list("abcde"))

df2.loc[1, 'b'] = np.nan # set a particular value to NA (null)

print(df1, "\n", df2)

     a    b     c     d
0  0.0  1.0   2.0   3.0
1  4.0  5.0   6.0   7.0
2  8.0  9.0  10.0  11.0 
       a     b     c     d     e
0   0.0   1.0   2.0   3.0   4.0
1   5.0   NaN   7.0   8.0   9.0
2  10.0  11.0  12.0  13.0  14.0
3  15.0  16.0  17.0  18.0  19.0


In [19]:
df1 + df2

      a     b     c     d   e
0   0.0   2.0   4.0   6.0 NaN
1   9.0   NaN  13.0  15.0 NaN
2  18.0  20.0  22.0  24.0 NaN
3   NaN   NaN   NaN   NaN NaN


In [20]:

# call add() on df1 with df2 and fill_value=0; the fill value stands in
# for any value that is missing from one of the two operands

df1.add(df2, fill_value=0)

      a     b     c     d     e
0   0.0   2.0   4.0   6.0   4.0
1   9.0   5.0  13.0  15.0   9.0
2  18.0  20.0  22.0  24.0  14.0
3  15.0  16.0  17.0  18.0  19.0


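The **reverse methods** swap the operands, so the DataFrame or Series ends up on the **right-hand side**: `df.rsub(x)` computes `x - df`, and `1 / df` is equivalent to `df.rdiv(1)`.
- Calling them directly is useful when you need method arguments such as `fill_value` or `axis` with the operands reversed.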


| **Method**   | **Example Operation**   | **Description**                                  |
|--------------|--------------------------|--------------------------------------------------|
| `radd()`     | `other + DataFrame`     | Reverse addition.                               |
| `rsub()`     | `other - DataFrame`     | Reverse subtraction.                            |
| `rmul()`     | `other * DataFrame`     | Reverse multiplication.                         |
| `rdiv()`     | `other / DataFrame`     | Reverse division.                               |
| `rfloordiv()`| `other // DataFrame`    | Reverse floor division.                         |
| `rmod()`     | `other % DataFrame`     | Reverse modulus operation.                      |
| `rpow()`     | `other ** DataFrame`    | Reverse exponentiation.                         |

Why Use Reverse Methods?
- The operation starts with a scalar or another object, not the DataFrame
- Order matters, such as in subtraction, division, or modulus operations
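
One case where a reverse method is convenient is combining the reversed operand order with `fill_value`. The sketch below uses two hypothetical Series with partially overlapping labels.

```python
import pandas as pd

# Hypothetical example data with partially overlapping labels.
s1 = pd.Series([10, 20], index=["a", "b"])
s2 = pd.Series([1, 2], index=["b", "c"])

# s1.rsub(s2) computes s2 - s1; fill_value=0 stands in for a label
# that is missing from one of the two Series.
result = s1.rsub(s2, fill_value=0)

print(result)
# "a": 0 - 10, "b": 1 - 20, "c": 2 - 0
```

The same values come out of `s2.sub(s1, fill_value=0)`; the reverse form is simply a way to keep `s1` as the object the method is called on.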


In [22]:
# example: 1 / df1 and df1.rdiv(1) give the same result
print(1 / df1)

df1.rdiv(1)

       a         b         c         d
0    inf  1.000000  0.500000  0.333333
1  0.250  0.200000  0.166667  0.142857
2  0.125  0.111111  0.100000  0.090909


       a         b         c         d
0    inf  1.000000  0.500000  0.333333
1  0.250  0.200000  0.166667  0.142857
2  0.125  0.111111  0.100000  0.090909


In [26]:
# reverse 

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
print(df.radd(10))
print(df.rsub(10)) # 10 - df != df - 10
print(df.rdiv(10)) # 10 / df
print(df.rmul(10)) # 10 * df
print(df.rmod(10)) # 10 % df
print(df.rpow(10)) # 10 ** df

    A   B
0  11  13
1  12  14
   A  B
0  9  7
1  8  6
      A         B
0  10.0  3.333333
1   5.0  2.500000
    A   B
0  10  30
1  20  40
   A  B
0  0  1
1  0  2
     A      B
0   10   1000
1  100  10000


When reindexing a Series or DataFrame, you can also specify a fill value for the newly introduced labels:

In [23]:
df1.reindex(columns=df2.columns, fill_value=0)

     a    b     c     d  e
0  0.0  1.0   2.0   3.0  0
1  4.0  5.0   6.0   7.0  0
2  8.0  9.0  10.0  11.0  0


### Operations between DataFrame and Series

In [27]:
arr = np.arange(12.).reshape((3,4))
arr

array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]])

In [28]:
arr[0]

array([0., 1., 2., 3.])

In [30]:
arr - arr[0] # sub is performed once for each row, known as broadcasting

array([[0., 0., 0., 0.],
       [4., 4., 4., 4.],
       [8., 8., 8., 8.]])

Operations between a DataFrame and a Series follow a similar broadcasting rule.

In [32]:
frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                    columns=list("bde"),
                    index=["Utah", "Ohio", "Texas", "Oregon"])

series = frame.iloc[0]

print(frame,'\n')
print(series)

          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0 

b    0.0
d    1.0
e    2.0
Name: Utah, dtype: float64


By default, arithmetic between a DataFrame and a Series matches the index of the Series
against the columns of the DataFrame, broadcasting down the rows.

In [33]:
frame - series

          b    d    e
Utah    0.0  0.0  0.0
Ohio    3.0  3.0  3.0
Texas   6.0  6.0  6.0
Oregon  9.0  9.0  9.0


If an index value is not found in either the DataFrame’s columns or the Series’s index, the objects will be reindexed to form their union

In [34]:
series2 = pd.Series(np.arange(3), index=['b', 'e', 'f'])
series2

b    0
e    1
f    2
dtype: int64

In [35]:
frame + series2

          b   d     e   f
Utah    0.0 NaN   3.0 NaN
Ohio    3.0 NaN   6.0 NaN
Texas   6.0 NaN   9.0 NaN
Oregon  9.0 NaN  12.0 NaN


To broadcast over the columns instead, matching on the rows, use one of the arithmetic methods and pass `axis='index'` so the Series is matched against the row index:

In [39]:
series3 = frame['d']
print(frame)
print(series3)

          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0
Utah       1.0
Ohio       4.0
Texas      7.0
Oregon    10.0
Name: d, dtype: float64


In [40]:
frame.sub(series3, axis='index')

          b    d    e
Utah   -1.0  0.0  1.0
Ohio   -1.0  0.0  1.0
Texas  -1.0  0.0  1.0
Oregon -1.0  0.0  1.0
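
A common use of matching on the row index is centering each row by its own mean. The sketch below rebuilds the same `frame` so that it runs on its own; `row_means`, `centered` and `misaligned` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])

# One mean per row; the resulting Series is indexed by the row labels.
row_means = frame.mean(axis="columns")

# Match the Series on the row index and broadcast across the columns.
centered = frame.sub(row_means, axis="index")

# The plain operator matches the Series index against the COLUMNS instead,
# so no labels line up and every value in the result is NaN.
misaligned = frame - row_means

print(centered)
print(misaligned.isna().all().all())
```

The second result illustrates why the `axis` argument matters: without it, the row labels of `row_means` are compared with the column labels `b`, `d`, `e`, and the union of the two sets of labels becomes the columns of an all-null result.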
