# Using Apply On A DataFrame

## Notebook Outline:
* <a href='#introtoapply'>Introduction to .apply()</a>
* <a href='#examplebabyname'>An example on our baby boy name data</a>
* <a href='#examplelaborsheet'>Another example on our labor sheet data</a>

<a name=introtoapply></a>
# Introduction to .apply()

The .apply() method we are going to learn about works much like the .apply() we learned about in our groupby lectures. The only difference is that this is the method for DataFrames instead of groupby objects.

The docs are here: <https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html>

We are going to go back to our baby boy name data for our first example. So, let's load the data!

<a name=examplebabyname></a>
# An example of using .apply() on our baby boy name data:

In [None]:
import pandas as pd
import os

filepath = os.path.join(os.getcwd(), 'data', 'Most_Popular_Baby_Boy_Names__1980-2013.csv')
nameData = pd.read_csv(filepath)

#### Let's take a look at these name values again - we will use .unique() to do so:

In [None]:
nameData['Name'].unique()

#### Most likely, we don't want Michael and MICHAEL to count as two different names - let's fix that using .apply(), lambda, and .lower()!

The basic idea is to lower case all of the names, so that every 'Michael' and every 'MICHAEL' becomes 'michael'. Now every Michael will be counted as the same name! We can lower case a string using the .lower() method; let's review this below.

In [None]:
name = 'MICHAEL'
print(name.lower())

# Note you can call .lower() directly on the string as well:
print('MICHAEL'.lower())

#### Now let's use .apply() to _apply_ the .lower() method to all the names in the 'Name' column
The apply() method expects a _function_ (any callable) that takes one value and returns the result, so we cannot hand it the bare .lower() method call. Instead, we wrap the call in a function: either a named function or a lambda.

As a review, let's first write a function that will convert a string to lower case.

In [None]:
def lowerString(aString):
    # Return a lower-cased copy of the input string
    return aString.lower()

lowerString('MICHAEL')

#### Let's now apply this function to the entire column of names

In [None]:
nameData['Name'].apply(lowerString)

#### Now, let's do the same, but use a _lambda_ function instead!

In [None]:
nameData['Name'].apply(lambda x: x.lower())
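
As a quick check that the named function and the lambda are interchangeable, here is a minimal sketch on a small, made-up Series (the values are hypothetical stand-ins, not rows from the CSV). It also suggests that the built-in `str.lower` can be passed directly, since apply() accepts any callable that takes a single value.

```python
import pandas as pd

# Hypothetical stand-in for the 'Name' column
names = pd.Series(['Michael', 'MICHAEL', 'Jacob'])

def lowerString(aString):
    # Return a lower-cased copy of the input string
    return aString.lower()

viaFunction = names.apply(lowerString)            # named function
viaLambda = names.apply(lambda x: x.lower())      # anonymous function
viaBuiltin = names.apply(str.lower)               # built-in callable

print(viaFunction.tolist())  # ['michael', 'michael', 'jacob']
print(viaFunction.equals(viaLambda) and viaLambda.equals(viaBuiltin))  # True
```

All three calls hand apply() a callable; which one to use is mostly a matter of readability.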

#### Now let's actually update the values in the dataset:

In [None]:
nameData.loc[:, 'Name'] = nameData['Name'].apply(lambda x: x.lower())

In [None]:
nameData['Name'].unique()

#### Now we are free to groupby the names!

In [None]:
nameData.groupby('Name')['Rank'].mean().sort_values()
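
To see why the lower casing matters for the groupby, here is a small sketch on made-up data. The 'Name' and 'Rank' column names mirror the notebook, but the rows are hypothetical. Before normalizing, each spelling forms its own group; afterwards, the spellings are merged.

```python
import pandas as pd

# Hypothetical rows; the real CSV has many more rows
toy = pd.DataFrame({
    'Name': ['Michael', 'MICHAEL', 'Jacob', 'JACOB'],
    'Rank': [1, 3, 2, 4],
})

# Before normalizing, 'Michael' and 'MICHAEL' are separate groups
before = toy.groupby('Name')['Rank'].mean()
print(len(before))  # 4

# After lower casing, each name forms a single group
toy['Name'] = toy['Name'].apply(lambda x: x.lower())
after = toy.groupby('Name')['Rank'].mean().sort_values()
print(after.to_dict())  # {'michael': 2.0, 'jacob': 3.0}
```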

#### There is also another, usually better, way to convert strings in a column to lower case:
We can access the string methods of a column of strings by using the .str attribute (very similar to how we accessed the datetime properties using the .dt attribute). This version is shorter, and it leaves missing values as NaN instead of raising an error.

In [None]:
nameData['Name'] = nameData['Name'].str.lower()
nameData.head(3)
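
One practical difference between the two approaches shows up when a column contains missing values. The sketch below uses a made-up Series with one NaN in it (an assumption for illustration; the baby name data may not contain any missing names).

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
names = pd.Series(['Michael', np.nan, 'JACOB'])

# .str.lower() skips the missing value and keeps it as NaN
lowered = names.str.lower()
print(lowered.tolist())

# The lambda calls .lower() on the missing value itself, which fails
try:
    names.apply(lambda x: x.lower())
    applyFailed = False
except AttributeError:
    applyFailed = True
print(applyFailed)  # True
```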

<a name=examplelaborsheet></a>
# Another example of apply on our labor sheet data

In [None]:
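# baseFilepath is not defined in this notebook; this assumes the file sits in the same 'data' folder as the baby name data
filepath = os.path.join(os.getcwd(), 'data', 'LaborSheetData.csv')
# Columns 2 and 3 are combined into one datetime column (used below as 'Date_Hour'); column 13 is parsed as a datetime too
laborSheetData = pd.read_csv(filepath, parse_dates=[[2, 3], 13])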
laborSheetData.head()

#### Let's calculate the difference between 'TimeStamp' and 'Date_Hour' using .apply() and lambda.
Because we are calling apply() on the whole DataFrame, we have to specify the axis that the function is applied along. This is confusing, so said another way: if you want to apply the function to every column, use axis=0; if you want to apply the function to every row, use axis=1.


In [None]:
laborSheetData.loc[:, 'lateEnteringData'] = laborSheetData.apply(lambda x: x['TimeStamp'] - x['Date_Hour'], axis=1)
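
To make the axis argument more concrete, here is a minimal sketch on a tiny made-up DataFrame (the column names 'a' and 'b' are hypothetical). With axis=0 the function receives one column at a time; with axis=1 it receives one row at a time, which is why the lambda above can look up values by column name.

```python
import pandas as pd

toy = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0: the function is called once per column, so we get one result per column
columnTotals = toy.apply(lambda col: col.sum(), axis=0)
print(columnTotals.to_dict())  # {'a': 6, 'b': 60}

# axis=1: the function is called once per row, so we get one result per row
rowDiffs = toy.apply(lambda row: row['b'] - row['a'], axis=1)
print(rowDiffs.tolist())  # [9, 18, 27]
```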

In [None]:
laborSheetData.head(3)

#### We get the same result without .apply() by subtracting the two columns directly:

In [None]:
laborSheetData.loc[:, 'lateEnteringData'] = laborSheetData['TimeStamp'] - laborSheetData['Date_Hour']

In [None]:
laborSheetData.head(3)
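
The row-wise apply() and the direct column subtraction should produce the same values. Here is a small sketch that checks this on made-up timestamps; the column names mirror the notebook, but the values are hypothetical.

```python
import pandas as pd

# Hypothetical timestamps standing in for the labor sheet columns
toy = pd.DataFrame({
    'Date_Hour': pd.to_datetime(['2017-01-01 08:00', '2017-01-01 09:00']),
    'TimeStamp': pd.to_datetime(['2017-01-01 08:30', '2017-01-02 09:00']),
})

# Row by row: the lambda is called once for every row
rowWise = toy.apply(lambda x: x['TimeStamp'] - x['Date_Hour'], axis=1)

# Column arithmetic: pandas subtracts the whole columns in one step
vectorized = toy['TimeStamp'] - toy['Date_Hour']

print(vectorized.tolist())
print((rowWise == vectorized).all())  # True
```

On large DataFrames the column arithmetic is generally the faster of the two, because it avoids calling a Python function for every row.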

In [None]:
laborSheetData.loc[laborSheetData['Store']==10606, :].groupby('Manager')['Sales'].mean().sort_values()

### In Class Exercise
Please create a cell below and use apply to manipulate the data in the laborSheetData DataFrame

## Questions or Comments About This Notebook?
Feel free to contact me via my LinkedIn: https://www.linkedin.com/in/william-j-henry <br>
You can also email me at will@henryanalytics.com <br>