# Functions

Applying custom functions with [`.apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html) to a Pandas column or Series can be challenging to understand. You will use a made-up DataFrame to learn how to do this step by step. First, you will apply a function to a Series on its own, which returns a new Series. You will then learn how to do the same without separating the Series from the DataFrame.

In [1]:
import pandas as pd

In [2]:
data = pd.DataFrame({'EmployeeName': ['Callen Dunkley', 'Sarah Rayner', 'Jeanette Sloan', 'Kaycee Acosta', 'Henri Conroy', 'Emma Peralta', 'Martin Butt', 'Alex Jensen', 'Kim Howarth', 'Jane Burnett'],
                    'Department': ['Accounting', 'Engineering', 'Engineering', 'HR', 'HR', 'HR', 'Data Science', 'Data Science', 'Accounting', 'Data Science'],
                    'HireDate': [2010, 2018, 2012, 2014, 2014, 2018, 2020, 2018, 2020, 2012],
                    'Sex': ['M', 'F', 'F', 'F', 'M', 'F', 'M', 'M', 'M', 'F'],
                    'Birthdate': ['04/09/1982', '14/04/1981', '06/05/1997', '08/01/1986', '10/10/1988', '12/11/1992', '10/04/1991', '16/07/1995', '08/10/1992', '11/10/1979'],
                    'Weight': [78, 80, 66, 67, 90, 57, 115, 87, 95, 57],
                    'Height': [176, 160, 169, 157, 185, 164, 195, 180, 174, 165],
                    'Kids': [2, 1, 0, 1, 1, 0, 2, 0, 3, 1]
                    })
data

| | EmployeeName | Department | HireDate | Sex | Birthdate | Weight | Height | Kids |
|---|---|---|---|---|---|---|---|---|
| 0 | Callen Dunkley | Accounting | 2010 | M | 04/09/1982 | 78 | 176 | 2 |
| 1 | Sarah Rayner | Engineering | 2018 | F | 14/04/1981 | 80 | 160 | 1 |
| 2 | Jeanette Sloan | Engineering | 2012 | F | 06/05/1997 | 66 | 169 | 0 |
| 3 | Kaycee Acosta | HR | 2014 | F | 08/01/1986 | 67 | 157 | 1 |
| 4 | Henri Conroy | HR | 2014 | M | 10/10/1988 | 90 | 185 | 1 |
| 5 | Emma Peralta | HR | 2018 | F | 12/11/1992 | 57 | 164 | 0 |
| 6 | Martin Butt | Data Science | 2020 | M | 10/04/1991 | 115 | 195 | 2 |
| 7 | Alex Jensen | Data Science | 2018 | M | 16/07/1995 | 87 | 180 | 0 |
| 8 | Kim Howarth | Accounting | 2020 | M | 08/10/1992 | 95 | 174 | 3 |
| 9 | Jane Burnett | Data Science | 2012 | F | 11/10/1979 | 57 | 165 | 1 |


We will create a function that splits a full name on the space and returns the first part (the firstname).

In [3]:
def get_name(name):
    return name.split(' ')[0]

In [4]:
# apply function
data.EmployeeName.apply(get_name)

0      Callen
1       Sarah
2    Jeanette
3      Kaycee
4       Henri
5        Emma
6      Martin
7        Alex
8         Kim
9        Jane
Name: EmployeeName, dtype: object
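To see what `.apply()` is doing here, it can help to compare it with a plain Python loop. The sketch below uses a small hypothetical Series (`names`, which is not part of the employee data above) and suggests that `Series.apply` calls the function once per element, passing a single string each time, and keeps the original index.

```python
import pandas as pd

# Hypothetical mini Series, used only for illustration
names = pd.Series(['Ada Lovelace', 'Alan Turing'], index=['a', 'b'])

def get_name(name):
    # name is one string from the Series, not the whole Series
    return name.split(' ')[0]

# Let pandas call get_name on every element
applied = names.apply(get_name)

# The same work written as an explicit element-by-element loop
looped = [get_name(name) for name in names]

print(applied.tolist())       # ['Ada', 'Alan']
print(list(applied.index))    # ['a', 'b']
print(applied.tolist() == looped)
```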

Let's extend the function so that it can return either the firstname or the lastname.

In [5]:
# create a function that returns either the firstname or lastname given a single full-name string
def get_name2(name, first= True):
    if first:
        return name.split(' ')[0]
    else:
        return name.split(' ')[1]

By default, `.apply()` assumes that the first parameter (in this case *name*) corresponds to the value in each row of the column. The easiest way to provide an argument for the second parameter (*first*) is to pass it as a keyword argument after the function in the `.apply()` call.

In [6]:
# apply second function by keyword, i.e. using kwargs
data.EmployeeName.apply(get_name2, first= False)

0    Dunkley
1     Rayner
2      Sloan
3     Acosta
4     Conroy
5    Peralta
6       Butt
7     Jensen
8    Howarth
9    Burnett
Name: EmployeeName, dtype: object
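Passing the extra argument by keyword is not the only option. As a minimal sketch on a hypothetical two-row Series, the block below compares three ways that should give the same result: a keyword argument, the `args` tuple of `Series.apply`, and a `lambda` wrapper.

```python
import pandas as pd

# Hypothetical data, used only for illustration
names = pd.Series(['Ada Lovelace', 'Alan Turing'])

def get_name2(name, first=True):
    parts = name.split(' ')
    return parts[0] if first else parts[1]

# 1. Extra argument passed by keyword
by_keyword = names.apply(get_name2, first=False)

# 2. Extra argument passed positionally through the args tuple
by_position = names.apply(get_name2, args=(False,))

# 3. Extra argument fixed inside a lambda
with_lambda = names.apply(lambda name: get_name2(name, first=False))

print(by_keyword.tolist())   # ['Lovelace', 'Turing']
```

The keyword form is usually the most readable, because the parameter name appears at the call site.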

Next, we want to apply a version of the function without having to single out the column or Series.

In [7]:
# create function that returns either the firstname or lastname given a DataFrame row whose 'EmployeeName' entry holds the full name
def get_name4(row, first= True):
    if first:
        return row['EmployeeName'].split(' ')[0]
    else:
        return row['EmployeeName'].split(' ')[1]

The function above is somewhat of a compromise. We have *hardwired* (i.e. fixed) the name of the column within the function to be 'EmployeeName', so for each row the function looks only at the value in the 'EmployeeName' column.

Because we are applying the function to the entire DataFrame, rather than to just a column or Series, we need to tell Pandas along which axis we want to apply the function. With `axis='columns'` (equivalently, `axis=1`), the function is called once per row.

In [10]:
# apply above function
data.apply(get_name4, first= False, axis= 'columns')

0    Dunkley
1     Rayner
2      Sloan
3     Acosta
4     Conroy
5    Peralta
6       Butt
7     Jensen
8    Howarth
9    Burnett
dtype: object
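The effect of the `axis` argument is easier to see by inspecting what the function actually receives. The sketch below uses a hypothetical two-row DataFrame and a hypothetical helper, `describe`, that simply reports the labels of the object it is given.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({'EmployeeName': ['Ada Lovelace', 'Alan Turing'],
                   'Kids': [0, 1]})

def describe(obj):
    # obj is a Series; report its index labels as one string
    return ', '.join(str(label) for label in obj.index)

# axis='columns' (or axis=1): obj is one ROW, labelled by column names
per_row = df.apply(describe, axis='columns')

# axis='index' (or axis=0, the default): obj is one COLUMN, labelled by row index
per_col = df.apply(describe, axis='index')

print(per_row.tolist())   # ['EmployeeName, Kids', 'EmployeeName, Kids']
print(per_col.tolist())   # ['0, 1', '0, 1']
```

This is why `row['EmployeeName']` works inside the function: with `axis='columns'`, each row arrives as a Series whose labels are the column names.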

In the final iteration, the function is designed so that it can be applied to any column, regardless of its name. To do that, you need to introduce a new parameter, called *col*, that accepts the name of a column as its argument.

In [11]:
# create function that returns either the firstname or lastname given a DataFrame row and the name of ANY column containing full names
def get_name5(row, col, first= True):
    if first:
        return row[col].split(' ')[0]
    else:
        return row[col].split(' ')[1]

In [12]:
# Apply above function
data.apply(get_name5, col= 'EmployeeName', first= False, axis= 'columns')

0    Dunkley
1     Rayner
2      Sloan
3     Acosta
4     Conroy
5    Peralta
6       Butt
7     Jensen
8    Howarth
9    Burnett
dtype: object
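The benefit of the *col* parameter shows up when a DataFrame has more than one name column. As a minimal sketch, assuming a hypothetical DataFrame with `Author` and `Reviewer` columns (neither is part of the employee data), the same function can be reused by changing only the `col` argument.

```python
import pandas as pd

# Hypothetical DataFrame with two full-name columns
df = pd.DataFrame({'Author': ['Ada Lovelace', 'Alan Turing'],
                   'Reviewer': ['Grace Hopper', 'Edsger Dijkstra']})

def get_name5(row, col, first=True):
    # row is one DataFrame row; col selects which entry to split
    parts = row[col].split(' ')
    return parts[0] if first else parts[1]

# Same function, different columns
author_last = df.apply(get_name5, col='Author', first=False, axis='columns')
reviewer_first = df.apply(get_name5, col='Reviewer', axis='columns')

print(author_last.tolist())      # ['Lovelace', 'Turing']
print(reviewer_first.tolist())   # ['Grace', 'Edsger']
```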

To end, we can use our function to generate two new columns in our DataFrame.

In [42]:
# Apply above function to create a Firstname column in the DataFrame (first defaults to True)
data['Firstname'] = data.apply(get_name5, col= 'EmployeeName', axis= 1)

In [43]:
# Apply above function to create a Lastname column in the DataFrame
data['Lastname'] = data.apply(get_name5, col= 'EmployeeName', first= False, axis= 1)

In [44]:
# show results
data.head(4)

| | EmployeeName | Department | HireDate | Sex | Birthdate | Weight | Height | Kids | Firstname | Lastname |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Callen Dunkley | Accounting | 2010 | M | 04/09/1982 | 78 | 176 | 2 | Callen | Dunkley |
| 1 | Sarah Rayner | Engineering | 2018 | F | 14/04/1981 | 80 | 160 | 1 | Sarah | Rayner |
| 2 | Jeanette Sloan | Engineering | 2012 | F | 06/05/1997 | 66 | 169 | 0 | Jeanette | Sloan |
| 3 | Kaycee Acosta | HR | 2014 | F | 08/01/1986 | 67 | 157 | 1 | Kaycee | Acosta |
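For a simple split like this one, Pandas also offers vectorized string methods that avoid calling a Python function row by row. The sketch below is an alternative to the `.apply()` approach used in this section, not a replacement for it, and it assumes a hypothetical two-row DataFrame in which every name has exactly one space.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({'EmployeeName': ['Ada Lovelace', 'Alan Turing']})

# Split on the first space only; expand=True returns one column per part
parts = df['EmployeeName'].str.split(' ', n=1, expand=True)

df['Firstname'] = parts[0]
df['Lastname'] = parts[1]

print(df['Firstname'].tolist())   # ['Ada', 'Alan']
print(df['Lastname'].tolist())    # ['Lovelace', 'Turing']
```

`.apply()` remains the more general tool when the per-row logic cannot be expressed with the built-in string methods.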

