# Table of contents

1. [Calculating Age Based on Today's Date](#introduction)
    1. [Completing the Function](#paragraph1)

2. [Creating a function with multiple optional parameters and outputs](#paragraph2)
    1. [Function with True/False Parameter](#paragraph3)
    2. [Function with a Column Parameter](#paragraph4)
    3. [Function with Multiple Outputs](#paragraph5)
3. [Index](#paragraph6)

## Calculating Age Based on Today's Date <a name="introduction"></a>

If you ever need to calculate a person's age from a birth date, you can write a function like the one below.

In [1]:
import pandas as pd
import numpy as np

Let's generate some random dates to work with. You don't need to copy this code; it only builds a dataframe with a column of random dates.

In [2]:
# Generate some random dates to work with
def random_dates(start, end, n=10):
    start_u = start.value//10**9
    end_u = end.value//10**9
    return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')

start = pd.to_datetime('1950-01-01')
end = pd.to_datetime('1999-01-01')
list_of_dates = random_dates(start, end)

df = pd.DataFrame(list_of_dates, columns=['Birthday'])
df

Unnamed: 0,Birthday
0,1955-09-20 02:55:02
1,1984-12-19 02:02:49
2,1992-08-01 11:02:08
3,1978-10-14 04:56:37
4,1983-04-13 22:06:53
5,1976-08-14 22:50:27
6,1951-01-20 20:24:45
7,1987-10-01 07:25:01
8,1977-01-25 16:38:40
9,1974-01-31 06:30:08


Now that we have a dataframe to work with, first create a variable called `end_date` that holds today's date:

In [3]:
end_date = pd.Timestamp.now().normalize()
end_date

Timestamp('2020-07-18 00:00:00')

Now we can create our function and then `apply()` it to our date column:

## Completing the Function <a name="paragraph1"></a>

In [4]:
# Function to calculate age based on today's date
def calculate_age(born):
    return end_date.year - born.year - ((end_date.month, end_date.day) < (born.month, born.day))

In [5]:
# Apply the above function and create a new column for age
df['Age'] = df['Birthday'].apply(calculate_age)

In [6]:
df

Unnamed: 0,Birthday,Age
0,1955-09-20 02:55:02,64
1,1984-12-19 02:02:49,35
2,1992-08-01 11:02:08,27
3,1978-10-14 04:56:37,41
4,1983-04-13 22:06:53,37
5,1976-08-14 22:50:27,43
6,1951-01-20 20:24:45,69
7,1987-10-01 07:25:01,32
8,1977-01-25 16:38:40,43
9,1974-01-31 06:30:08,46
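
The subtraction in `calculate_age` works because Python compares tuples element by element and treats `True` as 1 in arithmetic. Here is a minimal sketch of that step. It assumes a hypothetical helper, `age_on`, that takes the reference date as an argument (fixed here to the `end_date` shown above) so the results are reproducible:

```python
import pandas as pd

def age_on(born, reference):
    """Age in whole years on `reference` (hypothetical helper, not part of the notebook)."""
    # (month, day) tuples compare element by element; the comparison is True
    # when the birthday has not yet occurred in the reference year.
    birthday_pending = (reference.month, reference.day) < (born.month, born.day)
    # True counts as 1 and False as 0 in integer arithmetic.
    return reference.year - born.year - birthday_pending

reference = pd.Timestamp("2020-07-18")
before_birthday = age_on(pd.Timestamp("1955-09-20"), reference)  # birthday still to come
after_birthday = age_on(pd.Timestamp("1983-04-13"), reference)   # birthday already passed
print(before_birthday, after_birthday)
```

The two dates are rows 0 and 4 of the table above, so the printed values should match the `Age` column for those rows.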


## Creating a function with multiple optional parameters and outputs <a name="paragraph2"></a>

When creating a function, you can make some parameters optional, so the user only sets them when they need something other than the default behavior.

* For example, a function that calculates total profit per month could take an optional parameter that splits the results by gender when the user asks for it.

* A function can also return multiple results; this is handy when you need several different dataframes in the output.

We'll be using the [superstore dataset](https://github.com/kn-kn/setup-py-functions/blob/master/Sample_Superstore_Data.xls) for our example.

In [7]:
import pandas as pd

In [8]:
# Import Data
df = pd.read_excel("Sample_Superstore_Data.xls")

In [9]:
df.head(3)

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,CA-2013-152156,2013-11-09,2013-11-12,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,2,0.0,41.9136
1,2,CA-2013-152156,2013-11-09,2013-11-12,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3,0.0,219.582
2,3,CA-2013-138688,2013-06-13,2013-06-17,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,...,90036,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,2,0.0,6.8714


## Function with True/False Parameter <a name="paragraph3"></a>

Let's build the function step by step, starting with no parameters. It reads in the superstore data and calculates total profit.

In [10]:
def superstore_profit():
    df = pd.read_excel("Sample_Superstore_Data.xls")
    df_profit = df['Profit'].sum()

    # Show the result; without this the total is computed and then discarded
    print(df_profit)

Now we can add a parameter that lets the user choose whether to calculate profit by region.

By default, we will leave `region=False`. If the user sets it to `True`, the function calculates profit for each region. If the user leaves this parameter out, the function uses the default value, `False`.

In [11]:
def superstore_profit(region=False):
    df = pd.read_excel("Sample_Superstore_Data.xls")
    
    # If region is True, then do this: 
    if region:
        df_profit = df.groupby('Region')['Profit'].sum()
    # Otherwise (region is False), do this:
    else:
        df_profit = df['Profit'].sum()
        
    print(df_profit)

Let's see what happens when we:

* Leave the parameter empty
* Specify `True`
* Specify `False`

In [12]:
# Leave it empty
superstore_profit()

286397.0216999999


In [13]:
# Specify it True
superstore_profit(region=True)

Region
Central     39706.3625
East        91522.7800
South       46749.4303
West       108418.4489
Name: Profit, dtype: float64


In [14]:
# Specify it False
superstore_profit(region=False)

286397.0216999999
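
To try the default-parameter behaviour without the Excel file, here is a minimal sketch on a small in-memory dataframe. The rows are made up for illustration, and `profit_summary` is a hypothetical variant that takes the dataframe as an argument and returns its result (instead of printing it) so the values can be checked:

```python
import pandas as pd

# Made-up rows standing in for the superstore data (not the real figures)
sample = pd.DataFrame({
    "Region": ["West", "East", "West", "South"],
    "Profit": [10.0, 20.0, 5.0, 2.5],
})

def profit_summary(data, region=False):
    """Return total profit, or profit per region when region=True."""
    if region:
        return data.groupby("Region")["Profit"].sum()
    return data["Profit"].sum()

total = profit_summary(sample)                   # region falls back to False
by_region = profit_summary(sample, region=True)  # default overridden
print(total)
print(by_region)
```

Calling `profit_summary(sample)` and `profit_summary(sample, region=False)` gives the same result, which mirrors cells 12 and 14 above.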


## Function with a Column Parameter <a name="paragraph4"></a>

We can also create a parameter that takes the name of a column.

Let's say you sometimes want to total `Sales` instead of `Profit`. You can make the column a parameter that the user sets when needed.

Let's start by adding the new parameter to our previously created function:

In [15]:
def superstore_profit(region=False, aggregate='Profit'):
    df = pd.read_excel("Sample_Superstore_Data.xls")
    
    if region:
        df_profit = df.groupby('Region')[aggregate].sum()
    else:
        df_profit = df[aggregate].sum()
        
    print(df_profit)

The function above has a new parameter called `aggregate` with the default value `'Profit'`.

If the user does not pass anything, profit is calculated. If they choose to, they can name a different column to total instead. Let's see what happens when I pass:

* Nothing
* `Sales`
* `Quantity`

In [16]:
superstore_profit(region=True)

Region
Central     39706.3625
East        91522.7800
South       46749.4303
West       108418.4489
Name: Profit, dtype: float64


In [17]:
superstore_profit(region=True, aggregate='Sales')

Region
Central    501239.8908
East       678781.2400
South      391721.9050
West       725457.8245
Name: Sales, dtype: float64


In [18]:
superstore_profit(region=True, aggregate='Quantity')

Region
Central     8780
East       10618
South       6209
West       12266
Name: Quantity, dtype: int64
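
Because `aggregate` is a free-form string, a misspelled column name makes pandas raise a `KeyError`. As an optional addition (not part of the original function), the sketch below checks the name first and raises a clearer error. The dataframe is made up, and `column_total` is a hypothetical name:

```python
import pandas as pd

# Made-up rows standing in for the superstore data (not the real figures)
sample = pd.DataFrame({
    "Region": ["West", "East", "West"],
    "Profit": [10.0, 20.0, 5.0],
    "Sales": [100.0, 250.0, 40.0],
})

def column_total(data, region=False, aggregate="Profit"):
    """Sum the `aggregate` column, optionally grouped by region."""
    # Fail early with a readable message if the column does not exist
    if aggregate not in data.columns:
        raise ValueError(f"Unknown column: {aggregate!r}")
    if region:
        return data.groupby("Region")[aggregate].sum()
    return data[aggregate].sum()

sales_total = column_total(sample, aggregate="Sales")
sales_by_region = column_total(sample, region=True, aggregate="Sales")
print(sales_total)
print(sales_by_region)
```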


## Function with Multiple Outputs <a name="paragraph5"></a>

Finally, let's say we need to calculate both sales and profit and keep them as separate results.

Let's modify our previous function with these changes:

* Remove the `aggregate` parameter
* Instead of `print()`ing our result, use `return`
* Calculate both total `Profit` and total `Sales` in our function

In [19]:
def superstore_profit(region=False):
    df = pd.read_excel("Sample_Superstore_Data.xls")
    
    if region:
        df_profit = df.groupby('Region')['Profit'].sum()
        df_sales = df.groupby('Region')['Sales'].sum()
    else:
        df_profit = df['Profit'].sum()
        df_sales = df['Sales'].sum()
    
    # Return both results as a tuple: (profit, sales)
    return df_profit, df_sales

You can run the above function as is and get an output like this:

In [20]:
superstore_profit()

(286397.0216999999, 2297200.8603000003)

In [21]:
superstore_profit(region=True)

(Region
 Central     39706.3625
 East        91522.7800
 South       46749.4303
 West       108418.4489
 Name: Profit, dtype: float64,
 Region
 Central    501239.8908
 East       678781.2400
 South      391721.9050
 West       725457.8245
 Name: Sales, dtype: float64)

Or you can assign the results to variables. Keep the order in mind: `df_profit` is the first value returned, followed by `df_sales`.

In [22]:
df_my_profit, df_my_sales = superstore_profit(region=True)

In [23]:
df_my_profit

Region
Central     39706.3625
East        91522.7800
South       46749.4303
West       108418.4489
Name: Profit, dtype: float64

In [24]:
df_my_sales

Region
Central    501239.8908
East       678781.2400
South      391721.9050
West       725457.8245
Name: Sales, dtype: float64
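
Returning two values separated by a comma packs them into a single tuple, which is why cell 20 shows parentheses around the output. Here is a minimal sketch of the packing and unpacking, using made-up data and a hypothetical `profit_and_sales` function:

```python
import pandas as pd

# Made-up rows standing in for the superstore data (not the real figures)
sample = pd.DataFrame({
    "Profit": [10.0, 20.0, 5.0],
    "Sales": [100.0, 250.0, 40.0],
})

def profit_and_sales(data):
    # The comma builds a tuple: (profit, sales)
    return data["Profit"].sum(), data["Sales"].sum()

result = profit_and_sales(sample)   # one tuple holding both totals
profit_total, sales_total = result  # unpacked in the order they were returned
print(type(result).__name__, profit_total, sales_total)
```

Swapping the variable names on the left-hand side would silently swap the values, so the order in the `return` statement is the thing to check.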

## Index <a name="paragraph6"></a>

Here is a summary of the functions created in this notebook:

**Calculating Age**

In [None]:
# Today's date
end_date = pd.Timestamp.now().normalize()
end_date

# Function to calculate age based on today's date
def calculate_age(born):
    return end_date.year - born.year - ((end_date.month, end_date.day) < (born.month, born.day))

# Example of applying the function to create an "Age" column
df['Age'] = df['Birthday'].apply(calculate_age)

**Function with Multiple Parameters**

In [None]:
def superstore_profit(region=False, aggregate='Profit'):
    df = pd.read_excel("Sample_Superstore_Data.xls")
    
    if region:
        df_profit = df.groupby('Region')[aggregate].sum()
    else:
        df_profit = df[aggregate].sum()
        
    print(df_profit)

**Function with Multiple Outputs**

In [None]:
def superstore_profit(region=False):
    df = pd.read_excel("Sample_Superstore_Data.xls")
    
    if region:
        df_profit = df.groupby('Region')['Profit'].sum()
        df_sales = df.groupby('Region')['Sales'].sum()
    else:
        df_profit = df['Profit'].sum()
        df_sales = df['Sales'].sum()
    
    # Return both results as a tuple: (profit, sales)
    return df_profit, df_sales