<a href="https://colab.research.google.com/github/mikel-k-khui/GDG-PyYYC-Python-Bootcamp/blob/main/Python_Bootcamp_Week_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python Bootcamp Week 2

Python fundamentals 4: testing, type annotation, Python best practices


## 1. Testing

Why do you need to test your code? To make sure it does what it is supposed to do.
- A test validates the output against a known, expected result
- Your function (and its test) should return the same result every time, no matter how many times it runs

When should you test your code?
- You should **always** test your code.
- When you write any code, first check manually that it works, then write a test that verifies it
    - There is also the opposite process: write your test first and then write your code. This is called *Test-Driven Development (TDD)*
    - TDD can be very effective: writing your tests first makes you think about the edge cases up front, so your code tends to have fewer bugs
- Test every edge case you can think of. Unhandled edge cases are one of the most common causes of bugs.

### Automated vs. Manual Testing

- Manual testing:
    - To have a complete set of manual tests, make a list of all the features your application has, the different types of input it can accept, and the expected results.
    - However, every time you change your code, you need to go through every single item on that list and check it again.

- Automated testing: use a script to execute your test plan.

- How do we do automated testing? The simplest tool is Python's `assert` statement. It does nothing when its condition is true, and it raises an `AssertionError` (with the optional message) when the condition is false.

```python
assert 1 + 2 + 3 == 6, "Should be 6"
```

```python
assert 1 == 6, "Should be 6"
# Raises AssertionError: Should be 6
```

```python
def test_sum(a_list, target):
    assert sum(a_list) == target, f"Should be {target}"
```

```python
test_sum([1, 2, 3], 6)
```

```python
test_sum([1, 2], 6)
# Raises AssertionError: Should be 6
```

### Unit Tests vs. Integration Tests

- Unit tests and integration tests are two very important concepts in testing

- Unit test: tests a single component/function

- Integration test: tests multiple components working together

- For example, suppose we have a calculator that does simple `+-*/` calculations
    - Unit tests:
        - Turn on the calculator and check that the display works
        - Punch in numbers and check that the display shows the correct numbers
        - Check that simple calculations work, such as `1+1=2`, `1-1=0`, `1*1=1`, `1/1=1`
    - Integration test:
        - Check that a complex calculation works, such as `5*(20+3) - 8 = 107`
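Here is a minimal sketch of the same idea in code. It assumes a hypothetical calculator whose four operations are plain functions. Each unit test checks one operation on its own. The integration test checks that the operations compose correctly to evaluate `5*(20+3) - 8`.

```python
# Hypothetical calculator components: one function per operation
def add(a: float, b: float) -> float:
    return a + b


def subtract(a: float, b: float) -> float:
    return a - b


def multiply(a: float, b: float) -> float:
    return a * b


def divide(a: float, b: float) -> float:
    return a / b


# Unit tests: each one exercises a single component in isolation
def test_add():
    assert add(1, 1) == 2


def test_subtract():
    assert subtract(1, 1) == 0


def test_multiply():
    assert multiply(1, 1) == 1


def test_divide():
    assert divide(1, 1) == 1


# Integration test: several components working together
def test_complex_expression():
    # 5 * (20 + 3) - 8 == 107
    assert subtract(multiply(5, add(20, 3)), 8) == 107


all_tests = (test_add, test_subtract, test_multiply, test_divide, test_complex_expression)
for test in all_tests:
    test()
print("All calculator tests passed")
```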
        
        

### Test framework

- Although we can write test code from scratch, Python has several popular test frameworks:
    - unittest (built into the Python standard library since Python 2.1)
    - pytest
    - nose/nose2 (not so popular nowadays; you may see it in some legacy code)

#### Unittest framework

How to use unittest:

1. `import unittest`
2. Create a test class that inherits from `unittest.TestCase`
3. Use the `self.assert*` methods (such as `assertEqual` and `assertIn`) to check different cases

For example:

```python
import unittest


class TestSum(unittest.TestCase):

    def test_sum(self):
        self.assertEqual(sum([1, 2, 3]), 6, "Should be 6")

    def test_sum_tuple(self):
        # Fails on purpose: 1 + 2 + 2 == 5, not 6
        self.assertEqual(sum((1, 2, 2)), 6, "Should be 6")


if __name__ == '__main__':
    # This means that if you execute the script alone by running python test.py at 
    # the command line, it will call unittest.main()
    unittest.main()
```

- When you need to do something before each test (such as set up some data or objects), use the `setUp` method.
- When you need to clean up after each test (such as delete temporary content or reset some configuration), use the `tearDown` method.
- A typical test class looks like this:

```python
import unittest


class TestXxx(unittest.TestCase):

    def setUp(self):
        """Set up for test cases."""
        pass

    def tearDown(self):
        """Clean up for test cases."""
        pass

    def test_yyy_description1(self):
        """Test description 1."""
        pass

    def test_yyy_description2(self):
        """Test description 2."""
        pass
```
    
- unittest has many built-in assertion methods, such as `assertEqual` for equality and `assertIn` for membership in a list. See the References for a detailed list.
- You can also test that code raises an expected exception using `self.assertRaises`

```python
    def test_bad_type(self):
        data = "banana"
        with self.assertRaises(TypeError):
            result = sum(data)
```
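The sketch below puts these pieces together in one self-contained example. The class name and data are illustrative. Inside a notebook, `unittest.main()` usually fails because it parses the kernel's command-line arguments and then exits. So the example loads the test case into a suite and runs it with `unittest.TextTestRunner`, which works both in a notebook and in a script.

```python
import unittest


class TestSumWithFixtures(unittest.TestCase):
    def setUp(self):
        # Runs before *each* test method: build fresh input data
        self.numbers = [1, 2, 3]

    def tearDown(self):
        # Runs after each test method (as long as setUp succeeded)
        self.numbers = None

    def test_sum(self):
        self.assertEqual(sum(self.numbers), 6, "Should be 6")

    def test_contains(self):
        self.assertIn(2, self.numbers)

    def test_bad_type(self):
        # sum() cannot add a str to its int start value, so it raises TypeError
        with self.assertRaises(TypeError):
            sum("banana")


# Build a suite and run it explicitly instead of calling unittest.main()
suite = unittest.TestLoader().loadTestsFromTestCase(TestSumWithFixtures)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.wasSuccessful())  # True
```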

#### Pytest framework

pytest can also run tests written with unittest, so your existing unittest code generally works with pytest without changes.

Benefits of Pytest over unittest:

1. pytest tests are plain functions whose names start with `test_`, in files named `test_*.py` (or `*_test.py`), so pytest is function-based rather than class-based.
2. Support for the built-in `assert` statement instead of special `self.assert*()` methods
3. Filtering which tests to run (for example with `-k`)
4. Ability to rerun only the tests that failed last time (`--lf`)
5. Better debugging information
6. An ecosystem of hundreds of plugins to extend the functionality

How to use pytest:

```python
def test_sum():
    assert sum([1, 2, 3]) == 6, "Should be 6"
```
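pytest treats a test function's parameters as fixture names. So a test like `test_sum(a_list, target)` needs something to supply `a_list` and `target`. One way to do that is `@pytest.mark.parametrize`, which runs the same test body once per input tuple. The cases below are illustrative.

```python
import pytest


# Each tuple becomes a separate test case: (a_list, target)
@pytest.mark.parametrize(
    "a_list, target",
    [
        ([1, 2, 3], 6),
        ([], 0),
        ([-1, 1], 0),
        ([0.5, 0.25], 0.75),
    ],
)
def test_sum(a_list, target):
    assert sum(a_list) == target, f"Should be {target}"
```

When you run `pytest` on a file containing this, each tuple is reported as its own test, so a failing case doesn't hide the others.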

Similarly, pytest can check that an exception is raised:

```python
import pytest


def test_zero_division():
    with pytest.raises(ZeroDivisionError):
        1 / 0
```

### Test file structure

There are basically two ways you can structure your files for your project:

The first is to put your test files in a `tests` folder outside of the application code (**preferred**):

```
project/
│
├── my_sum/
│   └── __init__.py
|
└── tests/
     └── test_my_sum.py
```

The second one is to have your test files in your application code folder:

```
project/
│
└── my_sum/
      ├── __init__.py
      |
      └── tests/
            └── test_my_sum.py   
```

### Test execution

The Python application that executes your test code, checks the assertions, and gives you test results in your console is called the **test runner**.

How do you run the tests from the command line?

- To run the tests: `python -m unittest test` (assuming your test file is named `test.py`)
    - You can also run a specific test method with `python -m unittest test.TestClass.test_method`
- Add the `-v` flag to print more detail to the console: `python -m unittest -v test`
    - With pytest, you can repeat the flag (`-vv`) for even more detail
- Use test discovery instead of naming a file: `python -m unittest discover`
    - It finds and runs all files named `test*.py`, starting from the current directory (e.g. `test_sum.py`)

### Read test output

Suppose you have the following output:

```
$ python -m unittest test
F.
======================================================================
FAIL: test_list_fraction (test.TestSum)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 21, in test_list_fraction
    self.assertEqual(result, 1)
AssertionError: Fraction(9, 10) != 1

----------------------------------------------------------------------
Ran 2 tests in 0.001s

FAILED (failures=1)
```

What does it mean?

- `F` means the test failed, `.` means the test passed
- The `Traceback` shows you where the error happened, and the line number tells you which line failed
- The `AssertionError` shows what error happened: expected result *(1)* and the actual result *(Fraction(9, 10))*

**NOTE:** You can also configure your IDE to run tests; see the References for details.

### Fixtures

- The data that you create as test input is known as a **fixture**. It’s common practice to create fixtures once and reuse them.
- You can write your own functions to create fixtures, or you can use third-party packages
- pytest has a built-in `@pytest.fixture` decorator to mark functions as fixtures
- `Factory Boy` is a popular package for creating fixtures for ORMs (e.g. to populate test data in a database)
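Here is a minimal sketch of a pytest fixture; the fixture name and data are made up for illustration. pytest calls the decorated function and passes its return value to every test that names it as a parameter. With the default function scope, each test gets a fresh copy of the same input.

```python
import pytest


@pytest.fixture
def sample_orders():
    # Hypothetical fixture data: a list of (item, price) tuples
    return [("apple", 1.5), ("banana", 0.5), ("cherry", 3.0)]


def test_total_price(sample_orders):
    # pytest injects the fixture's return value by matching the argument name
    assert sum(price for _, price in sample_orders) == 5.0


def test_item_names(sample_orders):
    assert [item for item, _ in sample_orders] == ["apple", "banana", "cherry"]
```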


### Mocks

- Mocks are another important part of unit testing.
- Since Python 3.3, mock has been part of the standard library as `unittest.mock`
- A unit test focuses on one specific function, so we often don't care about the other parts of the system it depends on. That's why we mock them.
- We also use mocks to control our code's behavior during testing. In other words, your tests are only predictable as long as their environment behaves as you expect, and mocks let you control that environment.

For example:

```python
from datetime import datetime
from unittest.mock import Mock

# Save a couple of test days
tuesday = datetime(year=2019, month=1, day=1)
saturday = datetime(year=2019, month=1, day=5)

# Mock datetime to control today's date
datetime = Mock()


def is_weekday():
    today = datetime.today()
    # Python's datetime library treats Monday as 0 and Sunday as 6
    return (0 <= today.weekday() < 5)
```

```python
# Mock .today() to return Tuesday
datetime.today.return_value = tuesday

# Test Tuesday is a weekday
assert is_weekday()

# Mock .today() to return Saturday
datetime.today.return_value = saturday

# Test Saturday is not a weekday
assert not is_weekday()
```
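Reassigning `datetime` at module level, as above, replaces it for the rest of the notebook. The usual alternative is `unittest.mock.patch`. It swaps in a mock only inside a `with` block (or a decorated test) and restores the original afterwards. Here is a sketch of the same weekday check using it:

```python
import datetime
from unittest.mock import patch


def is_weekday() -> bool:
    # Looks up datetime.datetime at call time, so patching the module attribute works
    today = datetime.datetime.today()
    # Monday is 0 and Sunday is 6
    return today.weekday() < 5


tuesday = datetime.datetime(2019, 1, 1)
saturday = datetime.datetime(2019, 1, 5)

# patch() replaces datetime.datetime with a MagicMock only inside the with-block
with patch("datetime.datetime") as mock_datetime:
    mock_datetime.today.return_value = tuesday
    tuesday_result = is_weekday()
    mock_datetime.today.return_value = saturday
    saturday_result = is_weekday()

# The real class is restored automatically on exit
print(tuesday_result, saturday_result)  # True False
```

`patch` replaces a name where the code *looks it up*. This works because `is_weekday` reads `datetime.datetime` through the module at call time. If the code had used `from datetime import datetime`, you would patch that module's own `datetime` name instead.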

### References

- [Getting Started With Testing in Python](https://realpython.com/python-testing/)
- [python Unittest](https://docs.python.org/3/library/unittest.html)
- [Unittest Cheatsheet](https://gist.github.com/mogproject/fc7c4e94ba505e95fa03)
- [pytest Cheatsheet](https://www.valentinog.com/blog/pytest/)
- [Test directory structure](https://docs.pytest.org/en/reorganize-docs/new-docs/user/directory_structure.html)
- [How to run test in Pycharm](https://www.jetbrains.com/help/pycharm/performing-tests.html)
- [Python testing in VScode](https://code.visualstudio.com/docs/python/testing)
- [Testing in django](https://docs.djangoproject.com/en/dev/topics/testing/overview/)
- [Understanding the Python Mock Object Library](https://realpython.com/python-mock-library/)

## 2. Type annotation

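Python is dynamically typed: variables don't have declared types. That flexibility is convenient, but it can also make code confusing to read.

Python 3.5 standardized type hints ([**PEP 484**](https://www.python.org/dev/peps/pep-0484/)), and Python 3.6 added syntax for annotating variables, which [**PEP 526**](https://www.python.org/dev/peps/pep-0526/) describes in full detail.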

```python
age: int = 1

# In Python 3.5 and earlier you can use a type comment instead
# (equivalent to the previous definition)
age = 1  # type: int

# You don't need to initialize a variable to annotate it
a: int  # Ok (no value at runtime until assigned)

# The latter is useful in conditional branches
child: bool
if age < 18:
    child = True
else:
    child = False
```

You can annotate variables, function parameters and return values, class attributes, pretty much anything:

```python
# this function has an input type str and return type str
def greeting(name: str) -> str:
    return 'Hello ' + name
```
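Annotations are stored on the function object, where tools (and you) can inspect them. `typing.get_type_hints` returns the same information with forward references resolved. Python itself does not use annotations to check calls.

```python
from typing import get_type_hints


def greeting(name: str) -> str:
    return "Hello " + name


# Annotations live in the function's __annotations__ dict
print(greeting.__annotations__)  # {'name': <class 'str'>, 'return': <class 'str'>}
print(get_type_hints(greeting))  # {'name': <class 'str'>, 'return': <class 'str'>}
```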

There are many types you can use, including generic collection types from the `typing` module:

```python
from typing import List, Set, Dict, Tuple

# For simple built-in types, just use the name of the type
x: int = 1
x: float = 1.0
x: bool = True
x: str = "test"
x: bytes = b"test"

# For collections, use the capitalized names from typing, with the
# element type in brackets (Python 3.9+ also accepts list[int], etc.)
x: List[int] = [1]
x: Set[int] = {6, 7}

# Same as above, but with type comment syntax
x = [1]  # type: List[int]

# For mappings, we need the types of both keys and values
x: Dict[str, float] = {'field': 2.0}

# For tuples of fixed size, we specify the types of all the elements
x: Tuple[int, str, float] = (3, "yes", 7.5)

# For tuples of variable size, we use one type and ellipsis
x: Tuple[int, ...] = (1, 2, 3)
```
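On Python 3.9 and later (PEP 585), the built-in collection types can be used directly in annotations. For these cases, you no longer need the `typing` imports above. The following sketch requires Python 3.9+, and the `average` function is illustrative.

```python
# Requires Python 3.9+: built-in collections accept [...] in annotations (PEP 585)
numbers: list[int] = [1, 2, 3]
unique_ids: set[int] = {6, 7}
prices: dict[str, float] = {"field": 2.0}
record: tuple[int, str, float] = (3, "yes", 7.5)
many_ints: tuple[int, ...] = (1, 2, 3)


def average(values: list[float]) -> float:
    """Return the arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


print(average([1.0, 2.0, 3.0]))  # 2.0
```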

You can also create distinct types for your code with `NewType`:

```python
from typing import NewType

UserId = NewType('UserId', int)
some_id = UserId(524313)
```
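`NewType` only matters to a static type checker. At runtime, `UserId(524313)` simply returns the plain `int`. A checker such as mypy, however, rejects passing an ordinary `int` where a `UserId` is expected, which helps keep different kinds of IDs apart. The lookup table below is hypothetical.

```python
from typing import NewType

UserId = NewType("UserId", int)

# Hypothetical user table, for illustration only
USERS = {524313: "Carson"}


def get_user_name(user_id: UserId) -> str:
    return USERS[user_id]


some_id = UserId(524313)
print(get_user_name(some_id))  # Carson
print(type(some_id))  # <class 'int'>: no wrapper object exists at runtime

# get_user_name(524313) also runs, but a static checker flags the plain int
```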

Combining these, you can annotate a function signature:

```python
from typing import List, Optional, Union


def send_email(
    address: Union[str, List[str]],
    sender: str,
    cc: Optional[List[str]],
    bcc: Optional[List[str]],
    subject: str = "",
    body: Optional[List[str]] = None,
) -> bool:
    pass  # implementation omitted
```

### Reference

- [Type hints cheat sheet (Python 3)](https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html)
- [Python Type Checking (Guide)](https://realpython.com/python-type-checking/)

## 3. Python Best practices

I've seen a lot of code with poor practices (on LeetCode, Kaggle, open-source projects, etc.), which is why I'd like to show you some Python best practices.

**DISCLAIMER**: This presentation is entirely subjective. It is based on the author’s experience and on what’s widely accepted by Python developers.

Python's conventions are documented in Python Enhancement Proposals (***PEP***s)


### Virtual Environment

Python has many different versions, and packages come in many versions too. So it's always best to use virtual environments to keep each project's dependencies properly isolated.

There are three main tools for managing virtual environments:
- Virtualenv
- Anaconda
- Pipenv

#### Virtualenv

- Most widely used
- The `virtualenvwrapper` package makes everything much easier
    - `mkvirtualenv`
    - `workon` and `deactivate`
    - `setvirtualenvproject`

#### Anaconda

- Most widely used in AI-related fields (ML, data science, etc.)
- Bundles many packages
- Commands differ slightly between Windows and macOS
- Some commands
    - `conda create -n <env_name>`
    - `conda install <package_name>`
    - `conda activate <env_name>` & `conda deactivate` (older versions used `source activate` & `source deactivate`)

#### Pipenv

- A newer tool that aims to standardize Python package management
- Combines pip and virtualenv
- Locks dependency versions (in `Pipfile.lock`)
- Some commands:
    - `pipenv shell`
    - `pipenv install <package_name>`
    - `pipenv install --dev`


    

### Code Structure 

```
project/
│
├── src/
│   ├── __init__.py
│   └── main_func.py
|
├── tests/
│    └── test_my_func.py
|
├── README.md
|
├── License
|
├── requirements.txt
|
├── Documentations/
│    └── my_doc.md
|
└─ setup.py  # if this is a package
```

### Formatting, linting, and typing

**Formatting** is how your code is laid out

**Linting** is automated checking of your code for style problems and likely bugs

**Types** are the annotations discussed above

These two things you ***MUST*** remember:
- ***PEP8***: Official style guide for Python code
    - https://www.python.org/dev/peps/pep-0008/
- ***Black***: the Uncompromising Code Formatter
    - More readable code, consistent formatting
    - One of the formatters officially supported by VS Code
    - Django accepted using Black to format its code as of May 10, 2019

***Code is executed by machines, but read by humans***

For example:

<table>
  <tr>
    <th>Badly formatted code</th>
    <th>Well-formatted code</th>
  </tr>
<tr>
<td>
   <pre lang="python">
autoencoder = create_autoencoder(dae_train.shape[1])

autoencoder.fit(noised_train, dae_train,
                epochs=500,
                batch_size=128,
                callbacks=[
                    ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience = 3, verbose=1, min_delta=1e-4, mode='min')
                    , ModelCheckpoint(f'dae.hdf5', monitor = 'val_loss', verbose = 0, save_best_only = True, save_weights_only = True, mode = 'min')
                    , EarlyStopping(monitor = 'val_loss', min_delta = 1e-4, patience = 8, mode = 'min', baseline = None, restore_best_weights = True)],
                shuffle=True,
                validation_split=0.2)
   </pre>
</td>
<td>
  <pre lang="python">
autoencoder = create_autoencoder(dae_train.shape[1])

autoencoder.fit(
    noised_train,
    dae_train,
    epochs=500,
    batch_size=128,
    callbacks=[
        ReduceLROnPlateau(
            monitor='val_loss',
            factor=0.1,
            patience=3,
            verbose=1,
            min_delta=1e-4,
            mode='min',
        ),
        ModelCheckpoint(
            f'dae.hdf5',
            monitor='val_loss',
            verbose=0,
            save_best_only=True,
            save_weights_only=True,
            mode='min',
        ),
        EarlyStopping(
            monitor='val_loss',
            min_delta=1e-4,
            patience=8,
            mode='min',
            baseline=None,
            restore_best_weights=True,
        ),
    ],
    shuffle=True,
    validation_split=0.2,
)
  </pre>
</td>
</tr>
</table>




Readability applies to output too. `print` puts a long dictionary on one line. `pprint` from the standard library splits it across lines and sorts dictionary keys by default.

```python
from pprint import pprint

a_dict = dict(
    name="Carson",
    title="developer",
    name1="Carson",
    title2="developer",
    name3="Carson",
    title4="developer",
)
print(a_dict)
pprint(a_dict)
```

Output:

```
{'name': 'Carson', 'title': 'developer', 'name1': 'Carson', 'title2': 'developer', 'name3': 'Carson', 'title4': 'developer'}
{'name': 'Carson',
 'name1': 'Carson',
 'name3': 'Carson',
 'title': 'developer',
 'title2': 'developer',
 'title4': 'developer'}
```


```python
import datetime


def get_week():
    pass
```

#### Indentation

- Python accepts both spaces and tabs for indentation
- PEP8 recommends spaces
- **4 spaces** per indentation level
- **NEVER MIX** spaces and tabs (Python 3 raises a `TabError` when a block mixes them inconsistently)

#### Strings

- Many people use single quotes (`'`) because they're easier to type
- Double quotes (`"`) match English usage and let you write apostrophes without escaping
- **Black prefers double quotes to reduce confusion**
    - For example, these two lines are equivalent, but the first needs no escape:
        - `print(f"You've never seen this {animal} before")`
        - `print(f'You\'ve never seen this {animal} before')`
- You can use `--skip-string-normalization` to stop Black from changing your quotes

#### Line length

- PEP8 limits lines to 79 characters, and to 72 characters for flowing text such as docstrings and comments
- Many companies use a line length of 100 to 120 characters
- **Black defaults to 88 characters per line**

#### Line break

- PEP8 recommends breaking lines *before* binary operators for readability, so each operator sits next to its operand

  ```python
  # Correct:
  # easy to match operators with operands
  income = (gross_wages
            + taxable_interest
            + (dividends - qualified_dividends)
            - ira_deduction
            - student_loan_interest)
  ```

- Surround top-level function and class definitions with *two blank lines*
- Method definitions inside a class are surrounded by a single blank line

#### Imports

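- Be explicit: import the names you need and avoid wildcard imports (`from module import *`). Importing several names from one module on one line is fine: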

  ```python
  # Correct:
  from subprocess import Popen, PIPE
  ```

- Import each module on its own line

  ```python
  # Correct:
  import os
  import sys
  ```
  ```python
  # Wrong:
  import sys, os
  ```

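- Import order, with a blank line between each group:
  1. Standard library imports
  2. Related third-party imports
  3. Local application/library imports

- What if you need to import many names from one package? Wrap them in parentheses, one per line: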

  ```python
  from sklearn.model_selection import (
      TimeSeriesSplit,
      KFold,
      ShuffleSplit,
      StratifiedKFold,
      GroupShuffleSplit,
      GroupKFold,
      StratifiedShuffleSplit,
      train_test_split,
  )
  ```

#### Call Chains

Some popular APIs, like ORMs, use call chaining.

Here is how black formats it:

```python
def example(session):
    result = (
        session.query(models.Customer.id)
        .filter(
            models.Customer.account_id == account_id,
            models.Customer.email == email_address,
        )
        .order_by(models.Customer.id.asc())
        .all()
    )
```



#### Long String

When you have a very long string, you can break it up across several lines:
- Place adjacent string literals inside parentheses; Python joins them automatically
- Use triple quotes (the newlines and indentation become part of the string)
- Use the `+` operator (it works, but adjacent literals don't need it)

```python
# 1. Adjacent literals inside parentheses are joined automatically;
#    note the trailing space, since nothing is inserted between the pieces
message1 = (
    "some really long message "
    "and some really really long message"
)
print(f"1. Break into a few lines: {message1}")

# 2. Triple quotes keep the newlines and indentation
message2 = """
  some really long message
  and some really really long message
"""
print(f"2. Use triple quotation: {message2}")

# 3. Use the plus sign
message3 = (
    "some really long message "
    + "and some really really long message"
)
print(f"3. Use plus sign: {message3}")
```

Output:

```
1. Break into a few lines: some really long message and some really really long message
2. Use triple quotation: 
  some really long message
  and some really really long message

3. Use plus sign: some really long message and some really really long message
```
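You might want the readability of triple quotes without the leading newline and indentation. One standard-library option is `textwrap.dedent`, which removes the whitespace common to every line. The backslash right after the opening quotes skips the first newline.

```python
import textwrap

message = textwrap.dedent("""\
    some really long message
    and some really really long message
""")
print(message)
```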


#### Naming Convention

- Use meaningful words rather than single letters
    - e.g. `user_id`
    - ***NOTE*** that this applies to functions, variables, and everything else!
        - e.g. `get_user_id()`, `user_profile_grouped_by_id = dict()`
- Function names: lowercase words separated by underscores
    - e.g. `def my_function()`
- Class names: capitalized words joined together (CapWords)
    - e.g. `class MyClass:`
- Constants: uppercase words separated by underscores
    - e.g. `MY_CONSTANT = 5`
- `_single_leading_underscore`: weak "internal use" indicator (for example, `from module import *` skips these names)
- `__double_leading_underscore`: invokes name mangling for class attributes (inside `class FooBar`, `__boo` becomes `_FooBar__boo`)
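The short sketch below puts these conventions side by side; the `BankAccount` class is purely illustrative. It also shows what name mangling actually does to a double-underscore attribute.

```python
MAX_BALANCE = 1_000_000  # constant: UPPER_CASE


class BankAccount:  # class: CapWords
    def __init__(self, owner_name: str) -> None:
        self.owner_name = owner_name  # public attribute: snake_case
        self._transactions = []  # single underscore: internal use by convention
        self.__pin = "0000"  # double underscore: name-mangled

    def get_owner_name(self) -> str:  # method: snake_case
        return self.owner_name


account = BankAccount("Carson")
print(hasattr(account, "__pin"))  # False
print(account._BankAccount__pin)  # 0000, the mangled name still works
```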

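#### White Space

PEP8 recommends **avoiding** extra whitespace in the following situations (each example shows the correct form):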

- Immediately inside parentheses, brackets or braces
    - `spam(ham[1], {eggs: 2})`
- Between a trailing comma and a following close parenthesis
    - `foo = (0,)`
- Immediately before a comma, semicolon, or colon
    - `x, y = y, x`
- Immediately before the open parenthesis that starts the argument list of a function call or starts an indexing or slicing
    - `spam(1)`, `dct['key'] = lst[index]`
- More than one space around an assignment (or other) operator to align it with another

  ```python
  x = 1
  y = 2
  long_variable = 3
  ```


#### Doc String

- Write docstrings for all public modules, functions, classes, and methods
- Use triple double quotes (`"""`) for docstrings
- **PEP 257 describes good docstring conventions**
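Here is a sketch of a PEP 257-style docstring; the function and its data are hypothetical. It has a one-line summary ending in a period, a blank line, then details. The docstring is stored in `__doc__`, which is what `help()` and IDEs display.

```python
# Hypothetical lookup table, for illustration only
_USER_IDS = {"carson": 1, "ada": 2}


def get_user_id(username: str) -> int:
    """Return the numeric ID for a username.

    The lookup is case-insensitive. Raises KeyError if the user
    does not exist.
    """
    return _USER_IDS[username.lower()]


print(get_user_id("Carson"))  # 1
print(get_user_id.__doc__.splitlines()[0])  # Return the numeric ID for a username.
```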

#### Function Annotation

- Python is a dynamically typed language
- **Type hints (PEP 484, Python 3.5) and variable annotations (PEP 526, Python 3.6) are not enforced at runtime; use a static type checker such as mypy to check them**
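Here is a quick way to see that hints are not enforced. The function below is annotated to take and return an `int`, yet Python happily runs it with a string. Only a static checker such as mypy would report the mismatched call.

```python
def double(n: int) -> int:
    return n * 2


# Both calls run without error: annotations are not checked at runtime
print(double(21))  # 42
print(double("ab"))  # abab
```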




#### Helpful Tools

- **Black** – Yes, again, Black
- **isort** – sorts imports alphabetically and automatically separates them into sections
- **flake8** – style enforcement (linting)
- **mypy** – static type checking
- **pytest** – a more concise test framework that can also run unittest tests
- **pytest-cov** – test coverage reports
- **Git hooks with pre-commit**
- **cookiecutter** – project templates
- **PySnooper & pudb** – better debugging


#### Reference

- [PEP8](https://www.python.org/dev/peps/pep-0008/)
- [PEP257](https://www.python.org/dev/peps/pep-0257/)
- [PEP526](https://www.python.org/dev/peps/pep-0526/)
- [Black code style guide](https://black.readthedocs.io/en/stable/the_black_code_style.html)
- [Python Best Practice](https://github.com/zmcddn/Presentations/blob/master/Calgary%20Python%20Meetup%20PyYYC/Python%20Best%20Practice.pdf)

**Some links for setting up environments**

- [Pipenv Tutorial - the official package control for python](https://zmcddn.github.io/pipenv-tutorial-the-official-package-control-for-python.html)
- [Setup Virtualenv for Any Project](https://zmcddn.github.io/setup-virtualenv-for-any-project.html)
- [The ultimate guide to setup multiple Python environment with Anaconda and Sublime Text](https://zmcddn.github.io/the-ultimate-guide-to-setup-multiple-python-environment-with-anaconda-and-sublime-text.html)

## At last

- Google is your best friend: before asking a question, search for it first.
- Black is also your best friend when it comes to Python; make sure you follow the Python best practices.
- Please always test your code!
- Good luck with your career!