# Operating on data
_________________________

* Pandas inherits much of NumPy's functionality, in particular fast element-wise operations: both basic arithmetic (addition, subtraction, multiplication, etc.) and more sophisticated operations (trigonometric, exponential, and logarithmic functions, etc.)
* NumPy's universal functions (`ufuncs`) are the key to these operations
* There are well-defined operations between one-dimensional ``Series`` structures and two-dimensional ``DataFrame`` structures
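A minimal sketch of these element-wise operations on a small ``Series``. The values are arbitrary and chosen only so the results are easy to check by hand:

```python
import numpy as np
import pandas as pd

# Illustrative values only
s = pd.Series([0.0, 1.0, 2.0])

# Basic arithmetic is applied element by element
print((s * 10 + 1).tolist())        # [1.0, 11.0, 21.0]

# So are NumPy's more sophisticated ufuncs (exp, log, sin, ...)
print(np.exp(s).round(3).tolist())  # [1.0, 2.718, 7.389]

# The result of a ufunc is still a Pandas object
print(type(np.sin(s)).__name__)     # Series
```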

## 1. Ufuncs: Index preservation
______________________

* Any NumPy ufunc will work on ``Series`` and ``DataFrame`` objects

In [None]:
import pandas as pd
import numpy as np

In [None]:
rng = np.random.RandomState(42)

In [None]:
ser = pd.Series(rng.randint(0, 10, 4))
ser

In [None]:
df = pd.DataFrame(rng.randint(0, 10, (3, 4)),
                  columns=['A', 'B', 'C', 'D'])
df

If we apply a NumPy ufunc on either of these objects, the result will be another Pandas object *with the indices preserved:*

In [None]:
np.exp(ser)

Or, for a slightly more complex calculation:

In [None]:
np.sin(df * np.pi / 4)
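Index preservation is easier to see when the labels are not just ``0, 1, 2, ...``. The sketch below uses a small ``DataFrame`` with hypothetical row labels (``r1``, ``r2``) and column names (``x``, ``y``), and checks that the ufunc keeps them while computing the same numbers as on the raw array:

```python
import numpy as np
import pandas as pd

# Hypothetical labelled data: custom row labels and column names
frame = pd.DataFrame([[0, 1], [2, 3]],
                     index=['r1', 'r2'], columns=['x', 'y'])

result = np.exp(frame)

# The ufunc keeps both the row index and the column labels
print(result.index.tolist())    # ['r1', 'r2']
print(result.columns.tolist())  # ['x', 'y']

# ...and the numbers match applying the ufunc to the raw NumPy array
print(np.allclose(result.to_numpy(), np.exp(frame.to_numpy())))  # True
```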

## 2. Ufuncs: Index alignment
____________________________

* For binary operations on two ``Series`` or ``DataFrame`` objects, Pandas aligns the indices while performing the operation

#### 2.1. Index alignment in ``Series``
___________________

In [None]:
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

In [None]:
population / area

* The resulting ``Series`` contains the *union* of the indices of the two inputs, which can be determined with standard set arithmetic on the indices themselves:

In [None]:
area.index.union(population.index)
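A short check of this claim, reusing the ``area`` and ``population`` data from above: the index of the quotient equals the union of the two indices, and only labels in the *intersection* receive a number:

```python
import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

density = population / area

# The result's index is the union of the two input indices
expected = area.index.union(population.index)
print(density.index.equals(expected))    # True

# Only labels present in *both* inputs get a numeric value
both = area.index.intersection(population.index)
print(density[both].notna().all())       # True
print(density.drop(both).isna().all())   # True
```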

* Any item for which one of the inputs has no entry is marked with ``NaN`` ("Not a Number"), which is how Pandas marks missing data
* Index matching works this way for all of Python's built-in arithmetic operators

In [None]:
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A + B
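The sketch below inspects the result of ``A + B`` more closely. It also shows a side effect worth knowing: both inputs hold integers, but because ``NaN`` is a floating-point value, the result is upcast to ``float64``:

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
total = A + B

# Labels 0 and 3 exist in only one input, so they come out as NaN
print(total.index.tolist())   # [0, 1, 2, 3]
print(total.isna().tolist())  # [True, False, False, True]

# Both inputs hold integers, but NaN is a float, so the result is float64
print(total.dtype)            # float64
```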

* If ``NaN`` values are not the desired behavior, use the object's arithmetic method instead of the operator; the method accepts an optional ``fill_value`` that is substituted for any element missing from one of the inputs

In [None]:
A.add(B, fill_value=0)
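One detail of ``fill_value`` is easy to miss: it only stands in for a value that is missing from *one* side. If a label is missing (``NaN``) in both inputs, the result at that label stays ``NaN``. A small sketch, where ``C`` and ``D`` are made-up Series used only to demonstrate the both-missing case:

```python
import numpy as np
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# fill_value=0 replaces the missing partner on whichever side lacks it
print(A.add(B, fill_value=0).tolist())   # values 2, 5, 9, 5

# Hypothetical data: label 'y' is NaN in *both* inputs
C = pd.Series([1.0, np.nan], index=['x', 'y'])
D = pd.Series([np.nan, np.nan], index=['x', 'y'])
filled = C.add(D, fill_value=0)
print(filled['x'])  # 1.0 (only D was missing, so 1.0 + 0)
print(filled['y'])  # nan (both missing, so nothing is filled)
```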

#### 2.2. Index alignment in ``DataFrame``
___________________

* A similar type of alignment takes place for *both* columns and indices when performing operations on ``DataFrame``s:

In [None]:
A = pd.DataFrame(rng.randint(0, 20, (2, 2)),
                 columns=list('AB'))
A

In [None]:
B = pd.DataFrame(rng.randint(0, 10, (3, 3)),
                 columns=list('BAC'))
B

In [None]:
A + B

* Indices are aligned correctly irrespective of their order in the two objects
* Indices in the result are **sorted**
* As with ``Series``, the object's arithmetic method can be used instead of the operator, with any desired ``fill_value`` substituted for missing entries

In [None]:
fill = A.stack().mean()  # the mean of all values in A
A.add(B, fill_value=fill)
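Because the frames above use random values, the sketch below repeats the same experiment with small fixed frames (the numbers are invented so the result can be checked by hand) and uses ``fill_value=0`` instead of the mean. It shows the sorted column labels, the five cells that exist in only one frame, and how the method form fills them:

```python
import pandas as pd

# Small deterministic frames (values chosen for illustration)
left = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
right = pd.DataFrame([[10, 20, 30],
                      [40, 50, 60],
                      [70, 80, 90]], columns=list('BAC'))

plain = left + right
print(plain.columns.tolist())   # ['A', 'B', 'C']  (sorted union)
print(plain.index.tolist())     # [0, 1, 2]
print(plain.loc[0, 'A'])        # 21.0 (1 from left + 20 from right's A)

# Row 2 and column C exist only in `right`: 5 cells become NaN
print(int(plain.isna().sum().sum()))   # 5

# The method form substitutes the fill value instead
filled = left.add(right, fill_value=0)
print(int(filled.isna().sum().sum()))  # 0
```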

Table of Python operators and their equivalent Pandas object methods:

| Python Operator | Pandas Method(s)                      |
|-----------------|---------------------------------------|
| ``+``           | ``add()``                             |
| ``-``           | ``sub()``, ``subtract()``             |
| ``*``           | ``mul()``, ``multiply()``             |
| ``/``           | ``truediv()``, ``div()``, ``divide()``|
| ``//``          | ``floordiv()``                        |
| ``%``           | ``mod()``                             |
| ``**``          | ``pow()``                             |
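A quick way to confirm the table is to compute each operation both ways on two aligned Series and compare the results. The data below is arbitrary and only serves the comparison:

```python
import pandas as pd

a = pd.Series([10, 20, 30], index=['x', 'y', 'z'])
b = pd.Series([1, 2, 3], index=['x', 'y', 'z'])

# Each operator paired with its method counterpart from the table
pairs = {
    '+': (a + b, a.add(b)),
    '-': (a - b, a.sub(b)),
    '*': (a * b, a.mul(b)),
    '/': (a / b, a.truediv(b)),
    '//': (a // b, a.floordiv(b)),
    '%': (a % b, a.mod(b)),
    '**': (a ** b, a.pow(b)),
}
for op, (via_operator, via_method) in pairs.items():
    print(op, via_operator.equals(via_method))  # True for every operator

# The listed aliases give identical results
print(a.sub(b).equals(a.subtract(b)), a.div(b).equals(a.divide(b)))  # True True
```

The method forms are what make the optional ``fill_value`` (and, for ``DataFrame``, the ``axis`` keyword) available; the operators themselves take no extra arguments.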


## 3. Ufuncs: Operations between ``DataFrame`` and ``Series``
___________________________

* When performing operations between a ``DataFrame`` and a ``Series``, the index and column alignment is similarly maintained
* Operations between a ``DataFrame`` and a ``Series`` are similar to operations between a two-dimensional and one-dimensional NumPy array

In [None]:
A = rng.randint(10, size=(3, 4))
A

In [None]:
A - A[0]

* According to NumPy's broadcasting rules, subtraction between a two-dimensional array and one of its rows is applied row-wise

* Pandas follows the same convention: by default, an operation between a ``DataFrame`` and a ``Series`` is applied row-wise
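To make the NumPy broadcasting rule concrete, the sketch below uses a fixed 2×3 array (values invented for easy checking). Subtracting a row broadcasts it down every row; subtracting a column requires reshaping it to shape ``(2, 1)`` first:

```python
import numpy as np

arr = np.array([[3, 4, 5],
                [6, 8, 10]])

# Shapes (2, 3) and (3,): the first row is broadcast down every row
row_wise = arr - arr[0]
print(row_wise)
# [[0 0 0]
#  [3 4 5]]

# Column-wise in NumPy needs an explicit (2, 1) shape
col_wise = arr - arr[:, 0, np.newaxis]
print(col_wise)
# [[0 1 2]
#  [0 2 4]]
```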

In [None]:
df = pd.DataFrame(A, columns=list('QRST'))
df - df.iloc[0]
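Using the same fixed array as a ``DataFrame`` (column names ``Q``, ``R``, ``S`` chosen for illustration), the sketch below checks that the Pandas result carries the same numbers as NumPy's row-wise broadcast. The difference is that Pandas matches the ``Series`` labels against the column names rather than relying on position:

```python
import numpy as np
import pandas as pd

arr = np.array([[3, 4, 5],
                [6, 8, 10]])
frame = pd.DataFrame(arr, columns=list('QRS'))

# The first row, as a Series indexed by column name (Q, R, S)
first = frame.iloc[0]
diff = frame - first   # Series labels are matched against the columns

print(diff)
# Same numbers as NumPy's row-wise broadcast, with labels kept
print(np.array_equal(diff.to_numpy(), arr - arr[0]))  # True
```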

* If you would instead like to operate column-wise, you can use the object methods mentioned earlier, while specifying the ``axis`` keyword:

In [None]:
df.subtract(df['R'], axis=0)
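The ``axis`` keyword matters because the plain operator always aligns a ``Series`` with the *columns*. The sketch below uses hypothetical string row labels (``r0``, ``r1``) to make that visible: with ``axis=0`` the column ``R`` is subtracted from every column, while ``frame - frame['R']`` finds no matching labels and produces only ``NaN``:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.array([[3, 4, 5],
                               [6, 8, 10]]),
                     index=['r0', 'r1'], columns=list('QRS'))

# axis=0 matches the Series' labels (r0, r1) against the row index
col_wise = frame.subtract(frame['R'], axis=0)
print(col_wise)

# axis='index' is an equivalent, more explicit spelling
print(col_wise.equals(frame.sub(frame['R'], axis='index')))  # True

# Without axis, the operator aligns the Series' labels with the columns;
# r0 and r1 match no column, so every cell of the result is NaN
wrong = frame - frame['R']
print(wrong.isna().all().all())  # True
```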

* Like the operations discussed above, these ``DataFrame``/``Series`` operations automatically align indices between the two operands:

In [None]:
halfrow = df.iloc[0, ::2]
halfrow

In [None]:
df - halfrow
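Since ``df`` above holds random values, here is the same experiment on a fixed frame built with ``np.arange`` (an arbitrary choice for reproducibility). ``halfrow`` only has the labels ``Q`` and ``S``, so the columns ``R`` and ``T`` have no partner and come out entirely ``NaN``:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(8).reshape(2, 4), columns=list('QRST'))

# Every other entry of the first row: labels Q and S only
halfrow = frame.iloc[0, ::2]
print(halfrow.index.tolist())        # ['Q', 'S']

result = frame - halfrow

# Columns without a partner in halfrow are NaN in every row
all_nan = result.isna().all()
print(all_nan[all_nan].index.tolist())  # ['R', 'T']
```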

This preservation and alignment of indices and columns means that operations on data in Pandas always maintain the data context, which prevents the kinds of errors that can arise when working with heterogeneous or misaligned data in raw NumPy arrays.
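A small illustration of the kind of error this prevents. The two Series below hold hypothetical monthly figures stored in different orders: raw NumPy arrays pair the values by position and silently mix up the months, while Pandas pairs them by label:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the same months, stored in two different orders
sales = pd.Series([100, 200, 300], index=['Jan', 'Feb', 'Mar'])
costs = pd.Series([250, 80, 150], index=['Mar', 'Jan', 'Feb'])

# Raw NumPy arrays pair values by position, silently mixing up months
positional = sales.to_numpy() - costs.to_numpy()
print(positional)   # values -150, 120, 150 (Jan - Mar, Feb - Jan, Mar - Feb)

# Pandas pairs values by label, so each month is matched correctly
profit = sales - costs
print(profit)       # Feb 50, Jan 20, Mar 50
```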