# UBC
## Programming in Python for DS

### Week 7
Instructor: Socorro Dominguez-Vidana

- [] Describe what Python libraries are, as well as explain when and why they are useful.
- [] Identify where code can be improved concerning variable names, magic numbers, comments and whitespace.
- [] Write code that is human readable and follows the black style guide.
- [] Import files from other directories.
- [] Use pytest to check a function's tests.
- [] When running pytest, explain how pytest finds the associated test functions.
- [] Explain how the Python debugger can help rectify your code.

### Python libraries 
- Collections of pre-written code or modules that provide various functionalities to perform specific tasks.
- These libraries contain reusable classes (for example `DataFrame`), functions and methods (e.g. `groupby()`, `sort_values()`) and constants that can be imported and used in your Python programs.
- Libraries save time and effort by offering ready-made solutions to common problems. 

#### Importing a Python library
```python
import pandas as pd
```
- `import` is a keyword that loads a library
- `pandas` is the library we have used all this time
- `as` introduces an alias for the library
- `pd` is the alias, so that we can write `pd.read_csv` instead of `pandas.read_csv`

In [None]:
from pandas import read_csv

```python
from exponent_list import exponent_list
exponent_list(3, 4, 5)
```
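As a small sketch of the three import forms, the block below uses the standard-library `math` module rather than pandas, purely so that it runs without extra installs. The alias `m` is an arbitrary choice made for this illustration.

```python
# 1) Import the whole module and use its full name as a prefix
import math

# 2) Import the whole module under a shorter alias
#    ("m" is an arbitrary alias chosen for this demo)
import math as m

# 3) Import a single function, which is then used with no prefix
from math import sqrt

# All three names point at the same function
print(math.sqrt(16))
print(m.sqrt(16))
print(sqrt(16))
```

The same pattern applies to pandas: `pandas.read_csv`, `pd.read_csv` and a directly imported `read_csv` all refer to one function.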

### Code Improvements

- Code you write will be used in the future. Make sure to aim for best practices:
    - **Variable Names:** Use descriptive and meaningful names that indicate the purpose of the variable. Avoid single-letter names or ambiguous terms.
    - **Magic Numbers:** Replace hardcoded numbers with named constants or variables to enhance readability and maintainability.
    - **Comments:** Add comments to clarify complex code sections, explain the intent behind the logic, or provide context where the code might be unclear.
    - **Whitespace:** Ensure consistent indentation, use whitespace to enhance code readability, and follow the [PEP 8](https://peps.python.org/pep-0008/) style guide.

- You cannot always get this perfect by hand, so two tools can help: `flake8` and `black`.

```bash
flake8 <file>.py
black <file>.py
```

- `flake8` reports style errors but does not change your file
- `black` reformats the file in place to fix most style issues
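To make the four practices above concrete, here is a minimal before/after sketch. The function names `f` and `add_sales_tax`, the constant `SALES_TAX_RATE` and the 12% rate are all made up for this illustration.

```python
# Before: vague names, a magic number, no spaces around the operator
def f(x):
    return x*1.12


# After: descriptive names, a named constant, a docstring, consistent spacing
SALES_TAX_RATE = 0.12  # hypothetical rate, chosen only for this example


def add_sales_tax(price):
    """Return the price with sales tax added."""
    return price * (1 + SALES_TAX_RATE)


print(round(add_sales_tax(100.0), 2))
```

Both versions compute the same value, but only the second tells a reader what the number means and where to change it.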

Let's see how to use them in the terminal and in here:

In [None]:
!flake8 sampling.py

In [None]:
# !black sampling.py

### Import functions from other files

So far, we have defined functions in the notebook and used them there: 

```python
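import pandas as pd


def sample_dataframe(data, grouping_col, N=1):
    # Split the rows into one group per unique value of grouping_col
    df_grouped = data.groupby(grouping_col)

    sampled_groups = []

    # Each iteration yields the group label and that group's rows
    for group, rows in df_grouped:
        # Draw N random rows from this group
        sampled_groups.append(rows.sample(N))

    # Stack the per-group samples into a single dataframe
    return pd.concat(sampled_groups)


sample_dataframe(data, 'column', N=3)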
```

But that breaks the flow of the notebook.

What if we could write the function elsewhere but use it in our notebook to not break the flow?

#### How to call your own functions

- You have been importing whole sets of functions when you do:
```python
import pandas as pd
```

- But if you only wanted to import **one** function from pandas, for example `read_csv`, you could do:

```python
from pandas import read_csv
```

#### What about personal functions / scripts?

- I have created a file called `sampling.py` on my local computer.
- It contains a function called `sample_dataframe`.

Explore the following:

1) 
```python
import sampling
sampling.sample_dataframe(data, 'neighbourhood')
```
- `import` keyword to bring in everything from a different file; functions are then called with the file name as a prefix

2) 
```python
from sampling import sample_dataframe
sample_dataframe(data, 'neighbourhood')
```
- `from` keyword to say that from a particular script, we are only handpicking what we want to use

3) 
```python
import sampling as sp
sp.sample_dataframe(data, 'neighbourhood')
sp.sample_dataframe2(data, 'neighbourhood')
```
- `as` keyword for creating an alias
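The three forms above work when the script sits in the same folder as the notebook. `sampling.py` itself is not reproduced here, so the sketch below creates a tiny stand-in module (the hypothetical `demo_helpers.py`, with a made-up `double` function) in a temporary folder. It then shows one possible way, not the only one, to import from another directory: adding that folder to `sys.path`.

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical module, written to a temporary folder only for this demo
module_dir = Path(tempfile.mkdtemp())
(module_dir / "demo_helpers.py").write_text(
    "def double(x):\n"
    "    return 2 * x\n"
)

# Python searches the folders listed in sys.path when it sees an import
sys.path.append(str(module_dir))

import demo_helpers                  # 1) whole module
from demo_helpers import double      # 2) a single function
import demo_helpers as dh            # 3) whole module under an alias

print(demo_helpers.double(4), double(4), dh.double(4))
```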

##### Example

In [None]:
# Toy dataset
import pandas as pd

a_dict = {
    'employee_id': [1873, 4913, 4801, 4540, 3581,
                    4534, 1934, 4944, 1983, 1266],
    'name': ['Josh', 'Laura', 'Hayley', 'Mike', 'Tiffany',
             'Anurag', 'Rocio', 'Eric', 'Monique', 'Emma'],
    'neighbourhood': ['Sunset', 'West end', 'Kitsilano', 'Sunset',
                      'Arbutus-ridge', 'Arbutus-ridge', 'Kitsilano',
                      'West end', 'Kitsilano', 'Arbutus-ridge'],
    'type': ['full-time', 'part-time', 'part-time', 'full-time', 'part-time',
             'full-time', 'full-time', 'part-time', 'part-time', 'full-time'],
    'hourly_rate': [25.0, 27.0, 30.0, 25.5, 32.0,
                    26.5, 27.0, 28.0, 25.5, 23.0],
}

data = pd.DataFrame.from_dict(a_dict)
data.head()

In [None]:
import sampling as sp

sp.sample_dataframe(data, 'neighbourhood')

Is `sampling.py` formatted properly?

Verify with `flake8`:

In [None]:
!flake8 sampling.py

### About Docstrings
A Python function is written as:

```
def <function_name>([<parameters>]):
    '''
    Docstrings
    '''
    <statement(s)>
    <return>
```

How do we know how to use the function? The Docstrings section is extremely important.
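As an illustration, here is a minimal sketch of a documented function. The function `weekly_pay` is hypothetical, and the section layout (Parameters, Returns, Examples) follows the NumPy docstring style, which is one common convention rather than the only valid one.

```python
def weekly_pay(hourly_rate, hours_per_week=40):
    """
    Compute the pay for one week of work.

    Parameters
    ----------
    hourly_rate : float
        Pay per hour worked.
    hours_per_week : int or float, optional
        Hours worked in the week (default is 40).

    Returns
    -------
    float
        The total pay for the week.

    Examples
    --------
    >>> weekly_pay(25.0, hours_per_week=10)
    250.0
    """
    return hourly_rate * hours_per_week


# The docstring is stored on the function object itself
print(weekly_pay.__doc__)
```

In a notebook, `?weekly_pay` or `help(weekly_pay)` displays the same text.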

In [None]:
### Accessing the docstrings or documentation

?sp.sample_dataframe

### Testing files with `pytest`

- `pytest` is a testing framework for Python.
    - To test a function, create test functions with names starting with `test_` in a file, and then run `pytest` in the terminal to execute the tests.

**Notes:**
- To keep a project organized, test files usually live in a separate folder called `tests`.
- For now, we will keep them in the same directory.

- `pytest` identifies test functions by looking for functions whose names start with `test_`, inside files named `test_*.py` or `*_test.py`. When you run `pytest` with no arguments, it automatically discovers and executes these test functions in the current directory and its subdirectories.
    - We do not currently want that because the assignments have a different kind of test (the autograder), so you **must** specify which file you want to test:

```bash
pytest test_sampling.py
```

In [None]:
!pytest test_sampling.py
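The contents of `test_sampling.py` are not shown in these notes, so the following is only a hypothetical sketch of what such a test file could contain. To stay self-contained it defines a stand-in `sample_dataframe` in the same block; a real test file would instead start with `from sampling import sample_dataframe`. The helper `make_toy_data` and both function names below are made up for this example.

```python
import pandas as pd


def sample_dataframe(data, grouping_col, N=1):
    """Return N randomly sampled rows from each group of grouping_col."""
    samples = [rows.sample(N) for _, rows in data.groupby(grouping_col)]
    return pd.concat(samples)


def make_toy_data():
    """Build a small dataframe to test against (hypothetical helper)."""
    return pd.DataFrame(
        {
            "name": ["Josh", "Laura", "Hayley", "Mike"],
            "type": ["full-time", "part-time", "part-time", "full-time"],
        }
    )


# pytest collects this function because its name starts with "test_"
def test_sample_dataframe_returns_n_rows_per_group():
    sampled = sample_dataframe(make_toy_data(), "type", N=1)
    assert len(sampled) == 2  # two groups, one row each
    assert sorted(sampled["type"]) == ["full-time", "part-time"]


# Not collected by pytest: the name does not start with "test_"
def check_columns_are_kept():
    sampled = sample_dataframe(make_toy_data(), "type", N=2)
    assert list(sampled.columns) == ["name", "type"]
```

Each test uses plain `assert` statements; pytest reports a failure whenever one of them is false.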

In [None]:
pwd

### Explaining for loops over groupby objects

In [None]:
import pandas as pd
a_dict = {
    'employee_id': [1873, 4913, 4801, 4540, 3581,
                    4534, 1934, 4944, 1983, 1266],
    'name': ['Josh', 'Laura', 'Hayley', 'Mike', 'Tiffany',
             'Anurag', 'Rocio', 'Eric', 'Monique', 'Emma'],
    'neighbourhood': ['Sunset', 'West end', 'Kitsilano', 'Sunset',
                      'Arbutus-ridge', 'Arbutus-ridge', 'Kitsilano',
                      'West end', 'Kitsilano', 'Arbutus-ridge'],
    'type': ['full-time', 'part-time', 'part-time', 'full-time', 'part-time',
             'full-time', 'full-time', 'part-time', 'part-time', 'full-time'],
    'hourly_rate': [25.0, 27.0, 30.0, 25.5, 32.0,
                    26.5, 27.0, 28.0, 25.5, 23.0],
}

data = pd.DataFrame.from_dict(a_dict)
data

In [None]:
data.groupby('type')

In [None]:
data.groupby('type').get_group('full-time')

In [None]:
grouped_df = data.groupby('type')

In [None]:
grouped_df

```python
for key, value in my_dict.items():
    print(key)
    print(my_dict[key])
    print(value)
```

In [None]:
grouped_df.groups

In [None]:
for group, rows in grouped_df:
    print("new iteration")
    print(group)
    print(rows)

In [None]:
for group, rows in grouped_df:
    print(rows.sample(1))

In [None]:
for group, rows in grouped_df:
    print(group)

In [None]:
for group, rows in grouped_df:
    print(group, rows.sample(1))
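The loops above print one sample per group. A minimal sketch of how the same loop could collect those samples into a single dataframe follows. It rebuilds a smaller stand-in for the toy data so that it runs on its own, and `random_state=0` is an addition made here to keep the draw repeatable.

```python
import pandas as pd

# Smaller stand-in for the toy dataset used above
data = pd.DataFrame(
    {
        "name": ["Josh", "Laura", "Hayley", "Mike", "Tiffany", "Anurag"],
        "type": ["full-time", "part-time", "part-time",
                 "full-time", "part-time", "full-time"],
    }
)

grouped_df = data.groupby("type")

samples = []
# Iterating over a groupby yields (group label, sub-dataframe) pairs,
# much like .items() yields (key, value) pairs for a dictionary
for group, rows in grouped_df:
    # Every row in this sub-dataframe belongs to the current group
    assert (rows["type"] == group).all()
    samples.append(rows.sample(1, random_state=0))

# Stack the one-row samples into a single dataframe
sampled_df = pd.concat(samples)
print(sampled_df)
```

This is the same pattern `sample_dataframe` relies on: loop over the groups, sample from each, then concatenate.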