# Importing Functions

## UBC MDS Extended Learning

### February 27, 2023

In [None]:
import numpy as np
import pandas as pd

## Functions

In [None]:
def log_function(happy_number):
    z = np.log(happy_number)
    print(f"The log of {happy_number} is {z}.")
    return z

log_five = log_function(5)

### Modularity
Functions allow complex processes to be broken up into smaller steps.
Imagine, for example, that you have a program that reads in a file, processes the file contents, and then writes an output file. Your code could look like this:

```
# Main program
# Code to read file in
<statement>
<statement>
<statement>
<statement>
# Code to process file
<statement>
<statement>
<statement>
<statement>
# Code to write file out
<statement>
<statement>
<statement>
<statement>
```

Ideally, instead of the above, you would have more modular code that looks like:

```
# Main program
read_file()
process_file()
write_file()
```

Note that the three functions still have to be defined somewhere, for example each in its own script. For example:
```
def read_file():
    # Code to read file in
    <statement>
    <statement>
    <statement>
    <statement>
```
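As a concrete illustration of the modular layout above, here is a minimal runnable sketch. The function bodies are assumptions made up for this example (the "processing" step simply upper-cases the text), and the file names are hypothetical; only the three function names come from the outline above.

```python
from pathlib import Path
import tempfile


def read_file(path):
    """Return the full text of the file at `path`."""
    return Path(path).read_text(encoding="utf-8")


def process_file(text):
    """Illustrative processing step (an assumption): upper-case every line."""
    return "\n".join(line.upper() for line in text.splitlines())


def write_file(path, text):
    """Write `text` to `path`, replacing any existing contents."""
    Path(path).write_text(text, encoding="utf-8")


def main(in_path, out_path):
    # Main program: each step is a single, readable function call.
    contents = read_file(in_path)
    processed = process_file(contents)
    write_file(out_path, processed)


# Small demonstration inside a temporary directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "input.txt"
    target = Path(tmp) / "output.txt"
    source.write_text("hello\nworld", encoding="utf-8")
    main(source, target)
    result = target.read_text(encoding="utf-8")
    print(result)
```

Because each step is its own function, you can change or test one step without touching the other two.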

### Function Calls and Definition 

In Python, a function definition has the following general form:

```
def <function_name>([<parameters>]):
    '''
    Docstrings
    '''
    <statement(s)>
    <return>
```
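To make the template concrete, here is a small example that fills in each part. The function name `circle_area` and its parameter are made up for illustration.

```python
import math


def circle_area(radius):
    '''
    Return the area of a circle with the given radius.
    '''
    area = math.pi * radius ** 2  # statement(s)
    return area                   # return


# Function call: the argument 2 is bound to the parameter `radius`.
area_two = circle_area(2)
print(area_two)
```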

### How do I import a function?

- You have been importing whole sets of functions when you do:
```python
import pandas as pd
```

- But if you only wanted to import **one** function from pandas, for example `pd.read_csv`, you could do:

```python
from pandas import read_csv
```

What is the difference? 

With the first option, every call to `read_csv` needs the `pd.` prefix:
```python
pd.read_csv
```

With the second option, you can do:
```python
read_csv
```

Notice that an import statement names the module without its `.py` extension.
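The same import styles work for any module. Here is a short sketch using the standard-library `math` module (chosen only so the example runs without extra installs) that shows all three forms reach the very same function.

```python
import math                # whole module, accessed through its name
import math as m           # whole module, accessed through an alias
from math import sqrt      # a single function, used without a prefix

print(math.sqrt(16))
print(m.sqrt(16))
print(sqrt(16))

# All three names refer to the same function object.
same = math.sqrt is m.sqrt is sqrt
print(same)
```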

In [2]:
import pandas as pd
a_dict = {'employee_id': [1873, 4913, 4801, 4540, 3581,
                   4534, 1934, 4944, 1983, 1266], 
           'name': ['Josh', 'Laura', 'Hayley', 
                    'Mike', 'Tiffany', 'Anurag',
                    'Rocio', 'Eric', 'Monique',
                    'Emma'], 
            'neighbourhood': ['Sunset','West end','Kitsilano', 'Sunset', 
                              'Arbutus-ridge','Arbutus-ridge', 'Kitsilano', 
                              'West end','Kitsilano', 'Arbutus-ridge'],
            'type': ['full-time', 'part-time', 'part-time', 'full-time', 'part-time',
                     'full-time', 'full-time', 'part-time', 'part-time', 'full-time'],
            'hourly_rate': [25.0, 27.0, 30.0, 25.5, 32.0,
                         26.5, 27.0, 28.0, 25.5, 23.0]}

data = pd.DataFrame.from_dict(a_dict)
data.head()

|   | employee_id | name | neighbourhood | type | hourly_rate |
|---|---|---|---|---|---|
| 0 | 1873 | Josh | Sunset | full-time | 25.0 |
| 1 | 4913 | Laura | West end | part-time | 27.0 |
| 2 | 4801 | Hayley | Kitsilano | part-time | 30.0 |
| 3 | 4540 | Mike | Sunset | full-time | 25.5 |
| 4 | 3581 | Tiffany | Arbutus-ridge | part-time | 32.0 |


### How do I call my own functions?

- I have created a file called `sampling.py` on my local computer.
- It contains a function called `sample_dataframe`.

Explore each of the following:
1) 
```python
import sampling
sampling.sample_dataframe(data, 'neighbourhood')
```

2)
```python
from sampling import sample_dataframe
sample_dataframe(data, 'neighbourhood')
```

3) 
```python
import sampling as sp
sp.sample_dataframe(data, 'neighbourhood')
```
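The contents of `sampling.py` are not shown in these notes. If you want to try the three import styles yourself, here is a hypothetical stand-in you could save as `sampling.py`. Everything in it is an assumption: the parameter names, the default arguments, and the behaviour (return `n` random rows from each group of the chosen column).

```python
import pandas as pd


def sample_dataframe(data, grouping_col, n=1, random_state=None):
    """Return `n` randomly chosen rows from each group of `grouping_col`.

    Hypothetical stand-in; the course's actual function may differ.
    """
    if grouping_col not in data.columns:
        raise KeyError(f"{grouping_col!r} is not a column of the dataframe")
    sampled = [
        rows.sample(n, random_state=random_state)
        for _, rows in data.groupby(grouping_col)
    ]
    return pd.concat(sampled)


# Quick check on a shortened version of the employee data.
demo = pd.DataFrame({
    "name": ["Josh", "Laura", "Hayley", "Mike"],
    "neighbourhood": ["Sunset", "West end", "Kitsilano", "Sunset"],
})
picked = sample_dataframe(demo, "neighbourhood", random_state=0)
print(picked)
```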

Is `sampling.py` formatted properly?

Verify with `flake8`:

In [3]:
!flake8 sampling.py

sampling.py:4:43: E251 unexpected spaces around keyword / parameter equals
sampling.py:4:45: E251 unexpected spaces around keyword / parameter equals
sampling.py:8:1: W293 blank line contains whitespace
sampling.py:18:1: W293 blank line contains whitespace
sampling.py:23:1: W293 blank line contains whitespace
sampling.py:31:1: W293 blank line contains whitespace
sampling.py:33:1: W293 blank line contains whitespace
sampling.py:35:1: W293 blank line contains whitespace
sampling.py:36:35: W291 trailing whitespace
sampling.py:37:25: E222 multiple spaces after operator
sampling.py:39:1: W293 blank line contains whitespace


We can see there are many style errors. Are we going to fix them manually?

- I don't think so...

- Use black instead...

In [4]:
# Once black has reformatted the file, re-running flake8 above should no longer report these errors.
# !black sampling.py

### Testing files 

- To keep things organized, test files usually live in a separate folder called `tests`.
- For now, we will keep them in the same directory.

- We are going to use a library called `pytest`.
- `pytest` looks for all the test files within a directory and runs them.
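To give an idea of what a test file can look like, here is a minimal sketch. It is not the actual `test_sampling.py`: the test names and the small dataframe are made up, and a stand-in `sample_dataframe` is defined inline so the sketch runs on its own. `pytest` collects functions whose names start with `test_` and reports a failure whenever an `assert` inside one of them is false.

```python
import pandas as pd


# Stand-in so this sketch is self-contained; a real test file would
# instead start with `from sampling import sample_dataframe`.
def sample_dataframe(data, grouping_col):
    return data.groupby(grouping_col).sample(n=1)


def make_data():
    """Small hypothetical dataframe shared by the tests."""
    return pd.DataFrame({
        "name": ["Josh", "Laura", "Hayley", "Mike"],
        "neighbourhood": ["Sunset", "West end", "Kitsilano", "Sunset"],
    })


def test_one_row_per_group():
    # Three distinct neighbourhoods, so three sampled rows are expected.
    result = sample_dataframe(make_data(), "neighbourhood")
    assert len(result) == 3


def test_columns_are_preserved():
    data = make_data()
    result = sample_dataframe(data, "neighbourhood")
    assert list(result.columns) == list(data.columns)
```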

In [5]:
!pytest test_sampling.py

platform darwin -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/haritoto/Desktop/UBC/Instructor/UBC_EL_PPDS
plugins: anyio-3.5.0
collected 6 items

test_sampling.py .....F                                                  [100%]

________________________________ test_sd_cherry ________________________________

    def test_sd_cherry():
        raw = {'id': [1873, 4913, 4801, 4540, 3581,
                       4534, 1934, 4944, 1983, 1266],
               'name': ['English Oak', 'Higan

## Explaining `for` loops over groupby objects

In [None]:
a_dict = {'employee_id': [1873, 4913, 4801, 4540, 3581,
                   4534, 1934, 4944, 1983, 1266], 
           'name': ['Josh', 'Laura', 'Hayley', 
                    'Mike', 'Tiffany', 'Anurag',
                    'Rocio', 'Eric', 'Monique',
                    'Emma'], 
            'neighbourhood': ['Sunset','West end','Kitsilano', 'Sunset', 
                              'Arbutus-ridge','Arbutus-ridge', 'Kitsilano', 
                              'West end','Kitsilano', 'Arbutus-ridge'],
            'type': ['full-time', 'part-time', 'part-time', 'full-time', 'part-time',
                     'full-time', 'full-time', 'part-time', 'part-time', 'full-time'],
            'hourly_rate': [25.0, 27.0, 30.0, 25.5, 32.0,
                         26.5, 27.0, 28.0, 25.5, 23.0]}

data = pd.DataFrame.from_dict(a_dict)
data

In [None]:
data.groupby('type').get_group('full-time')

In [None]:
grouped_df = data.groupby('type')

In [None]:
grouped_df

In [None]:
for group, rows in grouped_df:
    print(rows)

In [None]:
for group, rows in grouped_df:
    print(rows.sample(1))

In [None]:
for group, rows in grouped_df:
    print(group)

In [None]:
for group, rows in grouped_df:
    # Print the group name, then one random row from that group.
    print(group)
    print(rows.sample(1))
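The loops above only print. As a small follow-up sketch, the same pattern can collect one result per group. The dataframe here is just the first four rows of the employee data, and the dictionary name `mean_rates` is made up for this example.

```python
import pandas as pd

data = pd.DataFrame({
    "name": ["Josh", "Laura", "Hayley", "Mike"],
    "type": ["full-time", "part-time", "part-time", "full-time"],
    "hourly_rate": [25.0, 27.0, 30.0, 25.5],
})

mean_rates = {}
for group, rows in data.groupby("type"):
    # `group` is the key ('full-time' or 'part-time');
    # `rows` is the sub-dataframe holding only that group's rows.
    mean_rates[group] = rows["hourly_rate"].mean()

print(mean_rates)
```

For a simple aggregation like this, `data.groupby("type")["hourly_rate"].mean()` gives the same numbers without a loop. The explicit loop is mainly useful for seeing what each `(group, rows)` pair contains.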