# How Pandas apply works


In [1]:
import json
import numpy as np
import pandas as pd

# pd.DataFrame.apply(arg: pd.Series)

The argument passed to the function given to ```apply``` is a **pandas [Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html)**. This is the key point to understand.

* [pandas.DataFrame.apply](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html)

> Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1)
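To make the quoted rule concrete, the sketch below records the `name` and index of each Series the function receives, once with `axis=0` and once with `axis=1`. It assumes a recent pandas (1.x or 2.x), which calls the function once per column or row; `record_axis0` and `record_axis1` are hypothetical helpers. Only plain values are stored, because pandas may reuse the same Series object between calls.

```python
import numpy as np
import pandas as pd

# Same 4x3 frame that this notebook builds later
df = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=["col1", "col2", "col3"])

seen_axis0 = []
seen_axis1 = []

def record_axis0(arg: pd.Series):
    # axis=0: each COLUMN arrives; its index is the DataFrame's index
    seen_axis0.append((arg.name, list(arg.index)))
    return arg.sum()

def record_axis1(arg: pd.Series):
    # axis=1: each ROW arrives; its index is the DataFrame's columns
    seen_axis1.append((arg.name, list(arg.index)))
    return arg.sum()

col_sums = df.apply(record_axis0, axis=0)  # one sum per column
row_sums = df.apply(record_axis1, axis=1)  # one sum per row

print(seen_axis0[0])
print(seen_axis1[0])
```

The first record for `axis=0` pairs the column label `col1` with the row labels, while the first record for `axis=1` pairs the row label `0` with the column labels.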

* [pandas.Series.loc](https://pandas.pydata.org/docs/reference/api/pandas.Series.loc.html#)

> ```.loc[]``` is primarily label based, but may also be used with a boolean array.
> * A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).
> * A list or array of labels, e.g. ['a', 'b', 'c'].
> * A slice object with labels, e.g. 'a':'f'.
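A small sketch of the quoted `.loc` rules on a Series, including the point that an integer is treated as a label and not a position. The series `s` and `t` are made-up examples; `.iloc` is shown only for contrast because it is the positional accessor.

```python
import pandas as pd

# Integer labels that deliberately do NOT match positions
s = pd.Series([10, 20, 30], index=[2, 0, 1])
by_label = s.loc[0]        # label 0 -> 20 (the second element)
by_position = s.iloc[0]    # position 0 -> 10

# String labels, as in the quoted docs
t = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
listed = t.loc[["a", "c"]]   # list of labels
sliced = t.loc["b":"d"]      # label slice: BOTH endpoints are included
masked = t.loc[t > 2]        # boolean array
```

Note that a label slice such as `"b":"d"` includes `"d"`, unlike a positional Python slice.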


# Argument of ```apply```

## axis=1

With ```axis=1```, each row of the DataFrame is passed as a Series whose index is the column labels.

In [2]:
d = {'a': 1, 'b': 2, 'c': 3}
series = pd.Series(data=d, index=['a', 'b', 'c'])
print(f"{series.index}\n{series}")

Index(['a', 'b', 'c'], dtype='object')
a    1
b    2
c    3
dtype: int64


### Example

The function passed to ```apply``` receives each row of ```df``` as a Series whose index is ```"col1", "col2", "col3"```.

In [3]:
x = np.arange(12).reshape((4, 3))
df = pd.DataFrame.from_records(x, columns=("col1", "col2", "col3"))
df

|   | col1 | col2 | col3 |
|---|---|---|---|
| 0 | 0 | 1 | 2 |
| 1 | 3 | 4 | 5 |
| 2 | 6 | 7 | 8 |
| 3 | 9 | 10 | 11 |


In [4]:
def func(arg: pd.Series):
    print(f"arg/row is a series whose indices is {arg.index}")
    print(f"arg is \n{arg}\n")

    return arg

In [5]:
df.apply(func=func, axis=1, result_type="expand")

arg/row is a series whose indices is Index(['col1', 'col2', 'col3'], dtype='object')
arg is 
col1    0
col2    1
col3    2
Name: 0, dtype: int64

arg/row is a series whose indices is Index(['col1', 'col2', 'col3'], dtype='object')
arg is 
col1    3
col2    4
col3    5
Name: 1, dtype: int64

arg/row is a series whose indices is Index(['col1', 'col2', 'col3'], dtype='object')
arg is 
col1    6
col2    7
col3    8
Name: 2, dtype: int64

arg/row is a series whose indices is Index(['col1', 'col2', 'col3'], dtype='object')
arg is 
col1     9
col2    10
col3    11
Name: 3, dtype: int64



|   | col1 | col2 | col3 |
|---|---|---|---|
| 0 | 0 | 1 | 2 |
| 1 | 3 | 4 | 5 |
| 2 | 6 | 7 | 8 |
| 3 | 9 | 10 | 11 |
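The `Name: 0`, `Name: 1`, ... lines in the printed output suggest that each row Series carries its row label as `name`. The sketch below checks that, and also that the row equals what `df.loc[label]` returns. It assumes a recent pandas (1.x or 2.x); `compare_with_loc` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame.from_records(np.arange(12).reshape((4, 3)), columns=("col1", "col2", "col3"))

checks = []

def compare_with_loc(arg: pd.Series):
    # arg.name is the row label; compare the row with an explicit .loc lookup
    checks.append((arg.name, arg.equals(df.loc[arg.name])))
    return arg

# Returning the row unchanged rebuilds the original frame
out = df.apply(compare_with_loc, axis=1, result_type="expand")
```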


# Return value of ```apply```

* [pandas.DataFrame.apply](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html)

> result_type {‘expand’, ‘reduce’, ‘broadcast’, None}, default None  
> These only act **when axis=1 (columns)**:
> * ‘expand’ : list-like results will be turned into columns.
> * ‘reduce’ : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.
> * ‘broadcast’ : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.
> 
> The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However **if the apply function returns a Series these are expanded to columns**.
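To compare the quoted `result_type` options, the sketch below applies a function that returns a plain list (list-like, but not a Series) with each setting, plus a Series-returning function for `'broadcast'`. It is a sketch assuming pandas 1.x/2.x behaviour; `as_list` and `doubled` are hypothetical helpers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=["col1", "col2", "col3"])

def as_list(row: pd.Series):
    # A plain list: list-like, but NOT a Series
    return [row["col1"], row["col3"]]

default = df.apply(as_list, axis=1)                         # Series whose values are lists
expanded = df.apply(as_list, axis=1, result_type="expand")  # DataFrame with columns 0 and 1
reduced = df.apply(as_list, axis=1, result_type="reduce")   # Series whose values are lists

def doubled(row: pd.Series):
    # Same length as the row, so it can be broadcast back to the original shape
    return row * 2

broadcast = df.apply(doubled, axis=1, result_type="broadcast")  # keeps df's index and columns
```

With a list, `'expand'` produces integer column labels (`0`, `1`) because a list has no labels of its own.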

## axis=1

To get column names in the result, the function needs to return a **[Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html) or dictionary**; the Series index labels or the dictionary keys become the column names.
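The quoted docs treat list-like results as values of a Series under the default `result_type=None`, and a dictionary counts as list-like. So a dictionary seems to be expanded into columns only with `result_type="expand"`, which is why the examples below pass it explicitly. A sketch assuming pandas 1.x/2.x; `as_dict` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=["col1", "col2", "col3"])

def as_dict(row: pd.Series):
    return {"column_03": row["col3"], "column_01": row["col1"]}

dict_default = df.apply(as_dict, axis=1)                       # Series whose values are dicts
dict_expand = df.apply(as_dict, axis=1, result_type="expand")  # DataFrame: keys -> columns
```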


### Return a Series

In [6]:
def func(arg: pd.Series):
    print(f"arg/row is a series whose indices is {arg.index}")
    columns = ["col1", "col3"]
    values = arg.loc[columns]
    print(f"return arg{columns} as series:\n{values} \nas type:{type(values)}\n") 
    return values

In [7]:
df.apply(func=func, axis=1, result_type="expand")

arg/row is a series whose indices is Index(['col1', 'col2', 'col3'], dtype='object')
return arg['col1', 'col3'] as series:
col1    0
col3    2
Name: 0, dtype: int64 
as type:<class 'pandas.core.series.Series'>

arg/row is a series whose indices is Index(['col1', 'col2', 'col3'], dtype='object')
return arg['col1', 'col3'] as series:
col1    3
col3    5
Name: 1, dtype: int64 
as type:<class 'pandas.core.series.Series'>

arg/row is a series whose indices is Index(['col1', 'col2', 'col3'], dtype='object')
return arg['col1', 'col3'] as series:
col1    6
col3    8
Name: 2, dtype: int64 
as type:<class 'pandas.core.series.Series'>

arg/row is a series whose indices is Index(['col1', 'col2', 'col3'], dtype='object')
return arg['col1', 'col3'] as series:
col1     9
col3    11
Name: 3, dtype: int64 
as type:<class 'pandas.core.series.Series'>



|   | col1 | col3 |
|---|---|---|
| 0 | 0 | 2 |
| 1 | 3 | 5 |
| 2 | 6 | 8 |
| 3 | 9 | 11 |
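The returned Series does not have to reuse existing labels. Its index can hold new names, and, as the quoted docs say, a returned Series is expanded to columns even with the default `result_type=None`. The sketch below builds per-row summary columns and joins them back onto `df`; `summarize`, `row_sum` and `row_max` are hypothetical names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=["col1", "col2", "col3"])

def summarize(row: pd.Series):
    # The Series index labels become the new column names
    return pd.Series({"row_sum": row.sum(), "row_max": row.max()})

stats = df.apply(summarize, axis=1)  # default result_type=None still expands a Series
combined = df.join(stats)            # aligned on the shared row index
```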


### Return a dictionary

In [8]:
def func(arg: pd.Series):
    dictionary = {
        "column_03": arg.loc["col3"],
        "column_01": arg.loc["col1"]
    }
    print(f"return dictionary:{json.dumps(dictionary, indent=4, default=str)}\n") 
    return dictionary
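The printed dictionaries below show the values as strings (`"2"`, `"0"`). That comes from `default=str`: `arg.loc["col3"]` is a NumPy `int64`, which the standard `json` module cannot encode, so `default=str` converts it to text. A sketch of the difference, using a made-up value:

```python
import json

import numpy as np

value = np.int64(2)  # the kind of scalar arg.loc["col3"] returns for an int64 row

try:
    json.dumps({"column_03": value})
    serializable = True
except TypeError:
    # json does not know how to encode NumPy integer types
    serializable = False

as_text = json.dumps({"column_03": value}, default=str)  # value becomes the string "2"
as_number = json.dumps({"column_03": int(value)})        # value stays the number 2
```

Converting with `int(...)` keeps the value numeric in the JSON output.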

In [9]:
df.apply(func=func, axis=1, result_type="expand")

return dictionary:{
    "column_03": "2",
    "column_01": "0"
}

return dictionary:{
    "column_03": "5",
    "column_01": "3"
}

return dictionary:{
    "column_03": "8",
    "column_01": "6"
}

return dictionary:{
    "column_03": "11",
    "column_01": "9"
}



|   | column_03 | column_01 |
|---|---|---|
| 0 | 2 | 0 |
| 1 | 5 | 3 |
| 2 | 8 | 6 |
| 3 | 11 | 9 |
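With `result_type="expand"`, the dictionaries do not all need the same keys. The resulting columns appear to be the union of all keys, with missing entries filled with `NaN`. A sketch assuming pandas 1.x/2.x; `maybe_flag` and the `big` key are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=["col1", "col2", "col3"])

def maybe_flag(row: pd.Series):
    out = {"column_01": row["col1"]}
    # Only some rows get the extra key
    if row["col1"] > 4:
        out["big"] = True
    return out

expanded = df.apply(maybe_flag, axis=1, result_type="expand")
missing = expanded["big"].isna().tolist()  # rows without the key hold NaN
```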
