# Functions, Raise & Assert Statements

## UBC MDS Extended Learning

### November 27

In [None]:
# Change to a docker file

!pip install numpy
!pip install pandas

In [1]:
import numpy as np
import pandas as pd

## Functions

A function is a relationship or mapping between one or more inputs and a set of outputs.

In mathematics, we represent a function typically like this:

> $z = f(x,y)$

> $ y = mx + b$

* $z$ is the output.
* $x$ and $y$ are the inputs.
* $f()$ represents "what happens" in the function.

For example:
> $z = \log(x)$

When $x = 3$, then:  
> $z = \log(3)$

In [34]:
# z = log(3)

def log_function(number):
    z = np.log(number)
    print(f"The log of {number} is {z}.")

In [35]:
log_function(number = 3)

The log of 3 is 1.0986122886681098.


In [4]:
log_function(3)

The log of 3 is 1.0986122886681098.


What is the error here?

Try calling `log_function(5)` and storing the result in a variable named `log_five`.

In [36]:
log_five = log_function(5)

The log of 5 is 1.6094379124341003.


In [47]:
log_five

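Here is where we realize we need a `return` statement: `log_five` holds nothing (`None`) because the function only printed its result. A `return` statement is the only thing that allows us to "save" the output that we want.

**NOTE** - **Do not** `return` a call to `print`. `print` itself returns `None`, so your function would return `None` too.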
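
To see what the note means, here is a minimal sketch. `noisy_log` and `result` are illustrative names that are not part of the lesson, and the standard-library `math.log` stands in for `np.log` to keep the example self-contained.

```python
import math

def noisy_log(number):
    # print() shows the message on screen but evaluates to None,
    # so this function returns None instead of the logarithm.
    return print(f"The log of {number} is {math.log(number)}.")

result = noisy_log(5)
print(result is None)  # True
```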

In [53]:
def log_function(happy_number):
    z = np.log(happy_number)
    print(f"The log of {happy_number} is {z}.")
    return z

log_five = log_function(5)

The log of 5 is 1.6094379124341003.


In [62]:
log_function(4)

The log of 4 is 1.3862943611198906.


1.3862943611198906

A return statement in a Python function serves two purposes:  
* It immediately terminates the function and passes execution control back to the caller.  
* It provides a mechanism by which the function can pass data back to the caller.  
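
Both purposes can be seen in a small sketch. `first_negative` is a hypothetical function made up for this illustration: the first `return` ends the function in the middle of the loop and hands a value back to the caller.

```python
def first_negative(numbers):
    for number in numbers:
        if number < 0:
            # Execution stops here: the rest of the list is never visited.
            return number
    # Reached only when the loop finishes without finding a negative number.
    return None

found = first_negative([4, -2, 7, -9])
missing = first_negative([1, 2, 3])
print(found, missing)
```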

In general:
* $f$ is a function that operates on inputs.
* Functions map the inputs to outputs.
* Programming functions are more generalized and versatile than mathematical functions.


When coding, a function is a self-contained block of code that encapsulates a specific task or group of tasks.

**Example:**

In [9]:
a = ['foo', 'bar', 'baz', 'qux']
a

['foo', 'bar', 'baz', 'qux']

In [10]:
len(a)

4

A built-in function performs a specific task!

The code that accomplishes this task is defined **somewhere** - but you don’t need to know where. You don't even need to know how the code works.

You need to understand the function’s interface: 
> * What arguments (if any) it takes 
> * What values (if any) it returns

Then you call the function and pass the appropriate arguments.

For DS, you will not just use built-in functions. You will write your own functions!

When you define your own Python function, it works just the same. From somewhere in your code, you’ll call your Python function and program execution will transfer to the body of code that makes up the function. 

BUT this time, since you wrote the code and you know where it lives, you can see it!

Why bother defining functions? There are several very good reasons. Let’s go over a few now:

### Abstraction and Reusability
Suppose you write some code that performs a useful task, and it is a task you need several times.
You could copy and paste the code, but…
Later on, you’ll probably modify it: maybe you find a bug, or you need to update it.
If you copied and pasted, you’ll need to make the same change in every location.


Instead, use a function! Abstracting functionality into a function definition is an example of the **DRY** (Don't Repeat Yourself) principle of software development. This is arguably the strongest motivation for using functions.
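
As a small illustration of this principle, here is a sketch with a hypothetical `annual_salary` function. The formula and the default values (40 hours, 52 weeks) are assumptions made up for the example. If the formula changes, there is only one place to edit.

```python
def annual_salary(hourly_rate, hours_per_week=40, weeks_per_year=52):
    # The calculation lives in one place instead of being copy-pasted.
    return hourly_rate * hours_per_week * weeks_per_year

# Reuse the same code for several inputs.
salaries = [annual_salary(rate) for rate in (25.0, 27.0, 30.0)]
print(salaries)
```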

### Modularity
Functions allow complex processes to be broken up into smaller steps.
Imagine, for example, that you have a program that reads in a file, processes the file contents, and then writes an output file. Your code could look like this:

```
# Main program
# Code to read file in
<statement>
<statement>
<statement>
<statement>
# Code to process file
<statement>
<statement>
<statement>
<statement>
# Code to write file out
<statement>
<statement>
<statement>
<statement>
```

Alternatively, you could structure the code more like the following:

```
# Main program
read_file()
process_file()
write_file()
```

PS. Here, the three functions still have to be defined somewhere, such as in their own scripts. For example:
```
def read_file():
    # Code to read file in
    <statement>
    <statement>
    <statement>
    <statement>
```
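
A runnable version of this structure might look like the sketch below. The function bodies are assumptions chosen only to make the example work (the "processing" step just upper-cases the text), and the files are created in a temporary folder.

```python
import os
import tempfile

def read_file(path):
    # Read the whole file into a string.
    with open(path) as f:
        return f.read()

def process_file(contents):
    # Illustrative processing step: upper-case the text.
    return contents.upper()

def write_file(path, contents):
    # Write a string to a file, replacing any existing content.
    with open(path, "w") as f:
        f.write(contents)

# Main program
with tempfile.TemporaryDirectory() as folder:
    in_path = os.path.join(folder, "input.txt")
    out_path = os.path.join(folder, "output.txt")
    write_file(in_path, "hello functions")  # create a small input file for the demo
    write_file(out_path, process_file(read_file(in_path)))
    result = read_file(out_path)

print(result)
```

Each step can now be read, tested, and changed on its own.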

### Function Calls and Definition 

A programming function is written as:

```
def <function_name>([<parameters>]):
    '''
    Docstrings
    '''
    <statement(s)>
    <return>
```

| Component | Meaning |
|----|----|
| `def` | Keyword that informs Python a function is being defined |
| `<function_name>` | A valid Python identifier that names the function |
| `<parameters>` | An optional, comma-separated list of parameters that receive the arguments passed to the function |
| `:` | Punctuation that denotes the end of the function header |
| `'''Docstrings'''` | Documentation regarding the function |
| `<statement(s)>` | A block of valid Python statements |
| `<return>` | An optional `return` statement that specifies the function's output |
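
Putting the components together, a concrete function could look like this sketch. `rectangle_area` is a made-up example, not a function from the assignment.

```python
def rectangle_area(width, height):
    '''
    Return the area of a rectangle.
    '''
    # Statement block: compute the result
    area = width * height
    # Return statement: hand the result back to the caller
    return area

print(rectangle_area(3, 4))
```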

## Assert Statements

Python's **assert** statements are a debugging aid that test a condition.

* If the condition is `True`, it does nothing and your program continues to execute.
* If the condition evaluates to `False`, it raises an `AssertionError` exception with an optional error message.
* Assertions are internal self-checks for your program. They work by declaring some conditions as impossible in your code. If one of these conditions doesn't hold, that means there's a bug in the program.
* If your program is bug-free, these conditions will never occur.
* If the conditions do occur, the program will crash with an assertion error telling you exactly which “impossible” condition was triggered. This makes it easier to track down and fix bugs in your programs.
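
The bullets above can be sketched in a few lines. `average` is a hypothetical function invented for this illustration. The `try`/`except` is only there so the example can show the error message without stopping.

```python
def average(values):
    # Internal self-check: this program should never pass an empty list here.
    assert len(values) > 0, "values must not be empty"
    return sum(values) / len(values)

mean = average([2, 4, 6])  # condition holds, so nothing happens

try:
    average([])            # condition fails, so AssertionError is raised
except AssertionError as error:
    message = str(error)

print(mean, message)
```

Python skips `assert` statements entirely when it is run with the `-O` flag, which is one reason they suit internal checks rather than validating user input.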

#### Python’s assert statement is a debugging aid. An assertion error should never be raised unless there’s a bug in your program.

### Example: 

Suppose you were building an online store with Python. You're working to add discount coupon functionality to the system and eventually write the following `apply_discount` function:

In [63]:
def apply_discount(product, discount):
    discount = discount/100
    price = int(product['price'] * (1.0 - discount))
    return price

What are possible errors?

In [1]:
shoes = {'name': 'Fancy Shoes', 'price': 320}


In [65]:
#
# 25% off $320 -> $240
#
apply_discount(shoes, 25)

240

Now suppose I make an error: maybe I type an additional `1` at the beginning of the discount (125 instead of 25), or maybe I apply a negative discount.

In [68]:
apply_discount(shoes, discount = -25)

400

**Solution:** Use an assert statement that guarantees that, no matter what, discounted prices cannot be lower than $0 and they cannot be higher than the original price of the product.
Let’s make sure this actually works as intended if we call this function to apply a valid discount:

In [8]:
def test_apply_discount():
    shoes = {'name': 'Fancy Shoes', 'price': 320}
    
    assert 0 <= apply_discount(shoes, 25), "Am I giving the customer extra money????"
    assert apply_discount(shoes, 25) <= shoes['price'], "Am I charging the customer extra?"
    
    return 

test_apply_discount()

In [4]:
def apply_discount(product, discount):
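    '''
    Apply a percentage discount to a product's price.

    Inputs:
    -------
    product: dict, the product; must contain a 'price' key
    discount: int, percentage from 0 to 100 that you want to discount
    
    Returns:
    -------
    price: int, new price after discount
    
    Example:
    apply_discount(shoes, 25)
    '''
    discount = discount/100
    price = int(product['price'] * (1.0 - discount))

    # Self-checks: the discounted price must stay between 0 and the original price
    assert 0 <= price, "Am I giving the customer extra money????"
    assert price <= product['price'], "Am I charging the customer extra?"

    return price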

In [21]:
def my_crazy_sum(x, y):
    return (x * y)

In [22]:
my_crazy_sum(10, 6)

60

In [24]:
def test_my_crazy_sum():
    assert my_crazy_sum(2, 2) == 4, "wrong"
    assert my_crazy_sum(10, 4) == 14, "wrong"
    return

test_my_crazy_sum()

AssertionError: wrong

In [75]:
apply_discount(shoes, 125)

AssertionError: Am I giving the customer extra money????

In [17]:
apply_discount(shoes, -25)

AssertionError: Am I charging the customer extra?

In [82]:
?apply_discount

### Question 4(b) Assignment 6

In [27]:
import pandas as pd
raw = {'employee_id': [1873, 4913, 4801, 4540, 3581,
                   4534, 1934, 4944, 1983, 1266], 
           'name': ['Josh', 'Laura', 'Hayley', 
                    'Mike', 'Tiffany', 'Anurag',
                    'Rocio', 'Eric', 'Monique',
                    'Emma'], 
            'neighbourhood': ['Sunset','West end','Kitsilano', 'Sunset', 
                              'Arbutus-ridge','Arbutus-ridge', 'Kitsilano', 
                              'West end','Kitsilano', 'Arbutus-ridge'],
            'type': ['full-time', 'part-time', 'part-time', 'full-time', 'part-time',
                     'full-time', 'full-time', 'part-time', 'part-time', 'full-time'],
            'hourly_rate': [25.0, 27.0, 30.0, 25.5, 32.0,
                         26.5, 27.0, 28.0, 25.5, 23.0]}

data = pd.DataFrame.from_dict(raw)
data

Unnamed: 0,employee_id,name,neighbourhood,type,hourly_rate
0,1873,Josh,Sunset,full-time,25.0
1,4913,Laura,West end,part-time,27.0
2,4801,Hayley,Kitsilano,part-time,30.0
3,4540,Mike,Sunset,full-time,25.5
4,3581,Tiffany,Arbutus-ridge,part-time,32.0
5,4534,Anurag,Arbutus-ridge,full-time,26.5
6,1934,Rocio,Kitsilano,full-time,27.0
7,4944,Eric,West end,part-time,28.0
8,1983,Monique,Kitsilano,part-time,25.5
9,1266,Emma,Arbutus-ridge,full-time,23.0


Sample dataframe idea: sometimes I want to group by neighbourhood, sometimes by type.

In [37]:
data.groupby('neighbourhood').get_group('Kitsilano').reset_index(drop=True).sample(1)

Unnamed: 0,employee_id,name,neighbourhood,type,hourly_rate
0,4801,Hayley,Kitsilano,part-time,30.0


In [40]:
def sample_dataframe(data, grouping_col, N = 1):
    """
    Sample N rows from each group of a dataframe.
    
    Parameters
    ----------
    data : pandas.core.frame.DataFrame
        The dataframe to sample from
    grouping_col : str
        The column to group the dataframe by before sampling
    N : int, optional
        The number of rows to sample from each group (The default value is 1
        which implies a single observation)
        
    Returns
    -------
    pandas.core.frame.DataFrame 
        The new sampled dataframe 
        
    Examples
    --------
    >>> sample_dataframe(pokemon, 'legendary')
    """
    
    df_grouped = data.groupby(grouping_col)
    
    sampled_df = None
    
    for group, rows in df_grouped: 
        group_sampling =  df_grouped.get_group(group).sample(N)
        sampled_df = pd.concat([sampled_df, group_sampling])
    
    return sampled_df

In [38]:
data

Unnamed: 0,employee_id,name,neighbourhood,type,hourly_rate
0,1873,Josh,Sunset,full-time,25.0
1,4913,Laura,West end,part-time,27.0
2,4801,Hayley,Kitsilano,part-time,30.0
3,4540,Mike,Sunset,full-time,25.5
4,3581,Tiffany,Arbutus-ridge,part-time,32.0
5,4534,Anurag,Arbutus-ridge,full-time,26.5
6,1934,Rocio,Kitsilano,full-time,27.0
7,4944,Eric,West end,part-time,28.0
8,1983,Monique,Kitsilano,part-time,25.5
9,1266,Emma,Arbutus-ridge,full-time,23.0


In [43]:
sample_dataframe(data, 'type', N = 2)

Unnamed: 0,employee_id,name,neighbourhood,type,hourly_rate
0,1873,Josh,Sunset,full-time,25.0
9,1266,Emma,Arbutus-ridge,full-time,23.0
4,3581,Tiffany,Arbutus-ridge,part-time,32.0
7,4944,Eric,West end,part-time,28.0


In [24]:
sample = sample_dataframe(data, 'type', N = 3)
sample[sample['type'] == 'full-time'].shape

(3, 5)

What should have happened before I started even coding my function:

How can I make sure that I don't make a mistake or create a bug while coding?

What can I ask myself regarding: What can go wrong?



Our user may not know the right syntax and might input weird things such as:

In [25]:
data.head(2)

Unnamed: 0,employee_id,name,neighbourhood,type,hourly_rate
0,1873,Josh,Sunset,full-time,25.0
1,4913,Laura,West end,part-time,27.0


In [26]:
sample_dataframe(raw, grouping_col='neighbourhood', N = 2)

AttributeError: 'dict' object has no attribute 'groupby'

In [27]:
sample_dataframe(data, grouping_col='neighbourhood', N = "Two")

TypeError: not all arguments converted during string formatting

## Things to Do:
### Raise Statements - To help the USER

In [73]:
data = data.reset_index()
data

Unnamed: 0,level_0,index,employee_id,name,neighbourhood,type,hourly_rate
0,0,0,1873,Josh,Sunset,full-time,25.0
1,1,1,4913,Laura,West end,part-time,27.0
2,2,2,4801,Hayley,Kitsilano,part-time,30.0
3,3,3,4540,Mike,Sunset,full-time,25.5
4,4,4,3581,Tiffany,Arbutus-ridge,part-time,32.0
5,5,5,4534,Anurag,Arbutus-ridge,full-time,26.5
6,6,6,1934,Rocio,Kitsilano,full-time,27.0
7,7,7,4944,Eric,West end,part-time,28.0
8,8,8,1983,Monique,Kitsilano,part-time,25.5
9,9,9,1266,Emma,Arbutus-ridge,full-time,23.0


In [76]:
9 in data['name']

True

In [83]:
'Tiffany' in data['name'].value

AttributeError: 'Series' object has no attribute 'value'

In [81]:
'Tiffany' in data['name'].values

True
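
The three cells above behave this way because `in` on a pandas Series checks the index labels, not the stored values. Here is a minimal sketch with a small made-up Series (`names` is an illustrative variable, not the lesson's dataframe):

```python
import pandas as pd

names = pd.Series(['Josh', 'Laura', 'Tiffany'])  # default index labels: 0, 1, 2

in_index = 2 in names                        # checks the index labels
name_in_series = 'Tiffany' in names          # 'Tiffany' is not an index label
name_in_values = 'Tiffany' in names.values   # checks the stored values

print(in_index, name_in_series, name_in_values)
```

So when a raise statement needs to check that a value exists in a column, test against `.values` rather than the Series itself.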

In [58]:
def sample_dataframe(data, grouping_col, N = 1):
    
    # Checks if a dataframe is the type of object being passed into the data argument
    if not isinstance(data, pd.DataFrame): 
        raise TypeError("The data argument is not of type DataFrame")
     
    # Checks if N is of type int
    if not isinstance(N, int): 
        raise TypeError("The N argument is not of type int")

    df_grouped = data.groupby(grouping_col)
    
    sampled_df = None
    
    for group, rows in df_grouped: 
        group_sampling =  df_grouped.get_group(group).sample(N)
        sampled_df = pd.concat([sampled_df, group_sampling])
    
    return sampled_df

In [59]:
sample_dataframe(data, grouping_col='neighbourhood', N = "two")

TypeError: The N argument is not of type int

Homework: Look at `try`/`except` statements.
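
As a starting point for that homework, here is a minimal sketch of how a caller might handle a raised exception instead of letting the program stop. `check_sample_size` is a hypothetical helper that mirrors the `N` check in `sample_dataframe`.

```python
def check_sample_size(N):
    # Raise a helpful error when the user passes the wrong type.
    if not isinstance(N, int):
        raise TypeError("The N argument is not of type int")
    return N

try:
    check_sample_size("two")
except TypeError as error:
    # The program keeps running and we can react to the problem.
    message = str(error)

print(message)
```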

### Assert Statements - To help the DEVELOPER

In [30]:
data

Unnamed: 0,employee_id,name,neighbourhood,type,hourly_rate
0,1873,Josh,Sunset,full-time,25.0
1,4913,Laura,West end,part-time,27.0
2,4801,Hayley,Kitsilano,part-time,30.0
3,4540,Mike,Sunset,full-time,25.5
4,3581,Tiffany,Arbutus-ridge,part-time,32.0
5,4534,Anurag,Arbutus-ridge,full-time,26.5
6,1934,Rocio,Kitsilano,full-time,27.0
7,4944,Eric,West end,part-time,28.0
8,1983,Monique,Kitsilano,part-time,25.5
9,1266,Emma,Arbutus-ridge,full-time,23.0


In [52]:
sample_dataframe(data, 'type', 1).groupby('type').ngroups

2

In [32]:
def test_sample_dataframe():
    raw = {'employee_id': [1873, 4913, 4801, 4540, 3581,
                   4534, 1934, 4944, 1983, 1266], 
           'name': ['Josh', 'Laura', 'Hayley', 
                    'Mike', 'Tiffany', 'Anurag',
                    'Rocio', 'Eric', 'Monique',
                    'Emma'], 
            'neighbourhood': ['Sunset','West end','Kitsilano', 'Sunset', 
                              'Arbutus-ridge','Arbutus-ridge', 'Kitsilano', 
                              'West end','Kitsilano', 'Arbutus-ridge'],
            'type': ['full-time', 'part-time', 'part-time', 'full-time', 'part-time',
                     'full-time', 'full-time', 'part-time', 'part-time', 'full-time'],
            'hourly_rate': [25.0, 27.0, 30.0, 25.5, 32.0,
                         26.5, 27.0, 28.0, 25.5, 23.0]}

    helper_data = pd.DataFrame.from_dict(raw)
        
    # Tests that the expected number of rows and columns are correct
    assert sample_dataframe(helper_data, 'type', 1).shape == (2, 5)
    
    # Tests that the expected number of groups is 2
    assert sample_dataframe(helper_data, 'type', 1).groupby('type').ngroups == 2
      
    sampler = sample_dataframe(helper_data, 'type', 3)
    
    # Tests that the datatype of the 'type' column is object ('O'), i.e. strings
    # This one is probably not useful but since a lot have asked how you could assert a column's dtype,
    # I've included the code
    
    assert sampler['type'].dtype == 'O', "type is not a string"
    
    return

In [33]:
test_sample_dataframe()

## Explaining for loops in groupby objects.

### Office Hours November 24

In [None]:
raw = {'employee_id': [1873, 4913, 4801, 4540, 3581,
                   4534, 1934, 4944, 1983, 1266], 
           'name': ['Josh', 'Laura', 'Hayley', 
                    'Mike', 'Tiffany', 'Anurag',
                    'Rocio', 'Eric', 'Monique',
                    'Emma'], 
            'neighbourhood': ['Sunset','West end','Kitsilano', 'Sunset', 
                              'Arbutus-ridge','Arbutus-ridge', 'Kitsilano', 
                              'West end','Kitsilano', 'Arbutus-ridge'],
            'type': ['full-time', 'part-time', 'part-time', 'full-time', 'part-time',
                     'full-time', 'full-time', 'part-time', 'part-time', 'full-time'],
            'hourly_rate': [25.0, 27.0, 30.0, 25.5, 32.0,
                         26.5, 27.0, 28.0, 25.5, 23.0]}

data = pd.DataFrame.from_dict(raw)
data

In [None]:
data.groupby('type').get_group('full-time')

In [None]:
grouped_df = data.groupby('type')

In [None]:
grouped_df

In [None]:
for group, rows in grouped_df:
    print(rows)

In [None]:
for group, rows in grouped_df:
    print(rows.sample(1))

In [None]:
for group, rows in grouped_df:
    print(group)

In [None]:
for group, rows in grouped_df:
    print(group, rows.sample(1))

In [None]:
raw.items()

In [None]:
raw

In [None]:
for key, value in raw.items():
    print("new iteration")
    print(key, value)
    

In [None]:
def my_function(z, y):
    return z + y

In [None]:
z = my_function(5, 3)

In [None]:
z

In [None]:
my_list = [0, 1, 2, 3, 4]

In [None]:
[i+5 for i in my_list]

In [None]:
new_list = []
for number in my_list:
    new_list.append(number + 5)

new_list

In [None]:
new_list2 = [number + 5 for number in my_list if number > 2]
new_list2