# Styling

## Intuition

> Code is read more often than it is written. -- Guido van Rossum (author of Python)

When we write a piece of code, it's almost never the last time we see it or the last time it's edited. So we need to 

1. explain what's going on (via documentation) but also 
2. make it easy to read. 

One of the easiest ways to make code more readable is to follow consistent style and formatting conventions.

There are many Python style conventions to choose from, but most are based on [PEP8](https://peps.python.org/pep-0008/). You'll notice that different teams follow different conventions, and that's perfectly alright. What matters most is that everybody consistently follows the same convention and that there are pipelines in place to ensure that consistency automatically and effortlessly. Let's see what this looks like in our application.

> One prime example is standardizing the maximum line length within a team. Formatters wrap lines based on this limit, so the same code formatted under different limits ends up on different line numbers. A shared limit makes it easier to refer to "a line number" during code reviews; with different standards, one person's line $30$ may be another's line $50$.

## PEP8

The [PEP8 guide](https://peps.python.org/pep-0008/) details many common coding conventions. The list below is by no means exhaustive, but it covers some common conventions that large organizations practice.

- Indentation should be $4$ spaces per level (PEP8 prefers spaces over tabs). In general, Python code will run with any consistent indentation width $k$, but we follow the $4$ spaces convention.
- Maximum line length should be $79$ characters. This, however, is subjective and different organizations do it differently. The key is to maintain consistency across teams. PEP8 motivates the limit by noting that short lines allow several files to be opened side by side and work well with code review tools; a very long line also looks ugly in viewers that do not offer dynamic wrapping.
- Variable naming: it should go without saying that names are important. We are writing code for **us** and therefore readability is important. Consider the following example:
    ```python
    x = "hongnan"     # bad example
    name = "hongnan"  # good example
    ```
    The variable `x` represents a person's name, but it is vague as `x` can literally mean anything. Thus, changing `x` to `name` is a much better choice.
- Imports: there is a rich history behind how imports should be written, but one thing to always bear in mind is to **stop using wildcard imports**. They make life very hard for other developers. Consider the example below:
    ```python
    from .src.main import *
    ```
    You are essentially importing every public name from the `src.main` module. We will not really know which function comes from where without digging deep.
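
To make the risk concrete, here is a small sketch using two standard-library modules, chosen only for illustration (they are not part of the original example). The helper `public_names` is a made-up name; it approximates what a wildcard import would bring in for a module that defines no `__all__`. Both `math` and `cmath` export names such as `sqrt`, so wildcard-importing both would silently let one overwrite the other:

```python
import cmath
import math


def public_names(module):
    """Names a wildcard import would bring in (module without `__all__`)."""
    return {name for name in dir(module) if not name.startswith("_")}


# Names that would silently collide if both modules were wildcard-imported.
clashes = public_names(math) & public_names(cmath)
print(sorted(clashes))

# With explicit imports there is no ambiguity about which sqrt we mean.
print(math.sqrt(4), cmath.sqrt(4))
```

With explicit module prefixes, a reader can tell at a glance that one call returns a float and the other a complex number.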


> There are many more conventions; see the PEP8 guide listed in the references.

## Tools 

### Formatter, Sorter and Linter

We will be using a very popular blend of style and formatting conventions that makes some very opinionated decisions on our behalf (with configurable options).

- [`black`](https://black.readthedocs.io/en/stable/): an in-place reformatter that (mostly) adheres to PEP8.
- [`isort`](https://pycqa.github.io/isort/): sorts and formats import statements inside Python scripts.
- [`flake8`](https://flake8.pycqa.org/en/latest/index.html): a code linter with stylistic conventions that adhere to PEP8.

We have previously installed these libraries, pinned to the following versions:

```bash
"black==20.8b1",
"flake8==3.8.3",
"isort==5.5.3",
```

### Difference between Linter and Formatter

The difference is subtle and may not be obvious at first. The tagline, **linters catch errors and quality issues, formatters fix code formatting style**, can be demonstrated with an example:

```python
def shhq(shhq_member: str = "hn"):
    if shhq_member in ["hn", "cw", "jun", "lh", "lz", "mj", "sz", "wj", "yj", "zj"]:
        return True
    else:
        return False
```

- A linter such as `pylint` will complain `Unnecessary "else" after "return" (no-else-return)`, since the `else` is redundant when we could simply `return False` after the `if` block. This is the kind of code quality issue that linters exist to catch. Note that `black` won't flag this issue, and neither does `flake8` out of the box, as the walkthrough later in this document shows.
- Our linter and formatter will both react to another glaring issue: the `if` line is too long, exceeding the `PEP8` limit of $79$ characters. `black` will fix this **in place**, reformatting the code for you, whereas `flake8` will only report it.
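
For reference, here is a hand-written sketch of how the function might look once both issues are addressed: the long list is wrapped so that every line fits within $79$ characters (roughly what a formatter would do), and the redundant `else` is removed (what the linter asked for). The exact layout a formatter produces may differ.

```python
def shhq(shhq_member: str = "hn"):
    # The long membership list is split across lines to respect the limit.
    if shhq_member in [
        "hn",
        "cw",
        "jun",
        "lh",
        "lz",
        "mj",
        "sz",
        "wj",
        "yj",
        "zj",
    ]:
        return True
    # No `else` needed: we only reach this line when the `if` did not return.
    return False


print(shhq("hn"), shhq("someone_else"))
```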

Therefore, the coding world generally uses a formatter (`black`) and a linter (`flake8`) in tandem. We can read the below articles for more info:

- [https://sbarnea.com/lint/black/](https://sbarnea.com/lint/black/)
- [Differences between code linters and formatters](https://taiyr.me/what-is-the-difference-between-code-linters-and-formatters)
- [Format Code vs Lint Code](https://medium.com/@awesomecode/format-code-vs-and-lint-code-95613798dcb3)

## Configuration

Before we can properly use these tools, we'll have to configure them, because they follow slightly different conventions that extend from PEP8 and may therefore disagree with one another. To configure a tool such as `black`, we could just pass in options using the [CLI](https://black.readthedocs.io/en/stable/usage_and_configuration/the_basics.html#command-line-options) method, but it's much more efficient (especially so that others can easily find all our configurations) to do this through a file. So we'll create a `pyproject.toml` file and place our configurations there.

More specifically, we define the parameters and rules that we want our formatter and sorter to follow in `pyproject.toml`, and those for our linter in a separate `.flake8` file.

### Configuring Formatter and Sorter with pyproject.toml

We create a `pyproject.toml` file and put in the below.

```toml
# Black formatting
[tool.black]
line-length = 79
include = '\.pyi?$'
exclude = '''
/(
      \.eggs         # exclude a few common directories in the
    | \.git          # root of the project
    | \.hg
    | \.mypy_cache
    | \.tox
    | _build
    | buck-out
    | build
    | dist
    | venv_ae
  )/
'''
```

Here we're telling Black that our maximum line length should be $79$ characters, to include files ending in `.py` or `.pyi`, and to exclude certain directories.

Take note that we should definitely exclude our virtual environment folder. In my example, the virtual environment folder is called `venv_ae`; yours will likely have a different name.
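
To get a feel for what the `include` and `exclude` patterns match, we can try them with Python's `re` module. This is only a sketch: it assumes the multi-line pattern is compiled in verbose mode (so whitespace and `#` comments inside it are ignored) and searched against paths written with forward slashes. The helper `is_excluded` and the sample paths are made up for illustration and are not part of Black's API.

```python
import re

# Same patterns as in pyproject.toml; re.VERBOSE skips whitespace and comments.
EXCLUDE = r"""
/(
      \.eggs         # exclude a few common directories in the
    | \.git          # root of the project
    | \.hg
    | \.mypy_cache
    | \.tox
    | _build
    | buck-out
    | build
    | dist
    | venv_ae
  )/
"""
INCLUDE = r"\.pyi?$"

exclude_re = re.compile(EXCLUDE, re.VERBOSE)
include_re = re.compile(INCLUDE)


def is_excluded(path: str) -> bool:
    """Return True if a directory in `path` matches the exclude pattern."""
    return exclude_re.search(path) is not None


for path in ["/venv_ae/lib/site.py", "/src/main.py", "/build/lib/main.py"]:
    print(path, "-> excluded" if is_excluded(path) else "-> checked")
```

Note that the surrounding slashes in the pattern mean only whole directory names match, so a folder called `rebuild` would not be excluded by the `build` entry.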

We're going to follow the same configuration steps in our `pyproject.toml` file for configuring `isort` as well. Place the following configurations right below `black`'s configurations.

```toml
# iSort
[tool.isort]
profile = "black"
line_length = 79
multi_line_output = 3
include_trailing_comma = true
skip_gitignore = true
virtual_env = "venv_ae"
```

Though there is a complete list of [configuration options](https://pycqa.github.io/isort/docs/configuration/options) for isort, we've decided to set these explicitly so it works well with Black.
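
As an illustration of what these options ask for, the snippet below is a hand-written approximation of imports in the resulting style (it is not actual `isort` output). `multi_line_output = 3` selects the "vertical hanging indent" layout for imports that exceed the line length, and `include_trailing_comma` adds the comma after the last name. The imported names are standard-library ones picked purely for the example.

```python
# Written on one line, this import would be longer than 79 characters,
# so it is wrapped with one name per line and a trailing comma.
from collections import (
    ChainMap,
    Counter,
    OrderedDict,
    defaultdict,
    deque,
    namedtuple,
)

# Use every imported name so that a linter has no unused imports to report.
Point = namedtuple("Point", ["x", "y"])
counts = Counter("styling")
queue = deque([Point(0, 0)])
groups = defaultdict(list)
ordered = OrderedDict(a=1)
merged = ChainMap(ordered, groups)

print(counts["s"], queue[0], merged["a"])
```

This wrapped layout is the same one Black uses for long imports, which is why the two tools can share a line length without fighting over the result.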

### Configuring Linter with .flake8

Lastly, we'll set up `flake8` but this time we need to create a separate `.flake8` file (`flake8` has its own config file) and place the following configurations:

```ini
[flake8]
exclude = venv
ignore = E501, W503, E226
max-line-length = 79

# E501: Line too long
# W503: Line break occurred before binary operator
# E226: Missing white space around arithmetic operator
```

Here we're setting up some configurations like before, but we're including an `ignore` option to skip certain `flake8` rules so that everything works with our `black` and `isort` configurations. This prevents conflicts between `black` and `flake8`. Make sure the `exclude` option matches the name of your own virtual environment folder.
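
To see why `W503` is ignored, consider how a long expression gets wrapped. The sketch below is hand-formatted in the style Black uses, with each line break placed *before* the binary operator, which is exactly the layout `W503` would flag. The function and its parameters are hypothetical, and Black would only wrap like this when the expression does not fit on one line; it is kept short here for readability.

```python
def total_cost(price, quantity, shipping, discount):
    # Line breaks come before `+` and `-`, the layout that W503 warns about.
    return (
        price * quantity
        + shipping
        - discount
    )


print(total_cost(price=2.0, quantity=3, shipping=1.0, discount=0.5))
```

Without the `ignore` entry, the formatter would produce this layout and the linter would then complain about it, so the two tools would never agree.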

## Example Usage

We include a reproducible example on Google Colab to help visualize the workflow.

### Google Colab Walkthrough

1. We first install the packages needed.

```python
!pip install -q black flake8 isort
```

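2. We define a global variable `BASE_DIR` that points to the root folder of our project. A notebook has no `__file__` variable, so we pass the string `"__file__"` as a placeholder file name; its parent resolves to the current working directory. We verify that it is `/content` (in Google Colab).

```python
from pathlib import Path

# Root directory (the current working directory in a notebook)
BASE_DIR = Path("__file__").parent.absolute()
print(BASE_DIR)
```

```text
/content
```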
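
As a quick sanity check of the `Path("__file__")` trick, the sketch below confirms that the parent of the placeholder path is simply the current working directory. It only uses `pathlib` and makes no assumption about running in Colab; the variable names are chosen for this example.

```python
from pathlib import Path

placeholder = Path("__file__")  # a relative path; the file need not exist
base_dir = placeholder.parent.absolute()

print(placeholder.parent)      # the relative parent directory
print(base_dir == Path.cwd())  # compare with the current working directory
```

In a regular script we would use the real `__file__` variable (without quotes) instead, which points at the script itself rather than at the working directory.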


3. We write a Python file named `test.py` into `BASE_DIR`. The file contains the function we discussed earlier.

```python
%%writefile {BASE_DIR}/test.py
def shhq(shhq_member: str = "hn"):
    if shhq_member in ["hn", "cw", "jun", "lh", "lz", "mj", "sz", "wj", "yj", "zj"]:
        return True
    else:
        return False
```

```text
Writing /content/test.py
```


4. As detailed in the earlier section, we set some configurations for the formatter `black` and write these to the `pyproject.toml` file.

    Note that we exclude folders such as the virtual environment `venv_ae`. As a reminder, we do not want our formatter and linter to check **every file** in our code base. Even though there is no such folder in this example, we should keep this in mind in production.

```toml
%%writefile {BASE_DIR}/pyproject.toml
# Black formatting
[tool.black]
line-length = 79
include = '\.pyi?$'
exclude = '''
/(
      \.eggs         # exclude a few common directories in the
    | \.git          # root of the project
    | \.hg
    | \.mypy_cache
    | \.tox
    | _build
    | buck-out
    | build
    | dist
    | venv_ae
  )/
'''
```

```text
Writing /content/pyproject.toml
```


5. Before we run the `black` formatter, we call `%pycat` to view the Python file and note that line 2 exceeds $79$ characters.

```python
%pycat {BASE_DIR}/test.py
```

6. To use the tools that we've configured, we can run their commands individually, such as `black .`, where `.` tells `black` to format the files under the current directory. `black` picks up its settings from the `pyproject.toml` it finds at the project root (here, the same directory).

```python
!black .
```

```text
reformatted test.py

All done! ✨ 🍰 ✨
1 file reformatted.
```


7. The console output says the file was reformatted. We can call `%pycat` once again to check that the code is indeed formatted!

```python
%pycat {BASE_DIR}/test.py
```

8. We can repeat the steps for `flake8`. We will shorten the example here, but for completeness' sake we re-initialize `test.py` and see what `flake8` has to say.

```python
%%writefile {BASE_DIR}/test.py
def shhq(shhq_member: str = "hn"):
    if shhq_member in ["hn", "cw", "jun", "lh", "lz", "mj", "sz", "wj", "yj", "zj"]:
        return True
    else:
        return False
```

```text
Overwriting /content/test.py
```


```ini
%%writefile {BASE_DIR}/.flake8
[flake8]
exclude = venv
ignore = W503, E226 # E501
max-line-length = 79

# E501: Line too long
# W503: Line break occurred before binary operator
# E226: Missing white space around arithmetic operator
```

```text
Overwriting /content/.flake8
```


```python
!flake8
```

```text
./test.py:5:21: W292 no newline at end of file
```


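In the original example, the author ignores `E501: Line too long` in order to avoid conflicts with `black`. Here the intent was to leave it enabled, to show that `flake8` only highlights such an issue and will not automatically format the code. Note, however, that the output above reports only `W292` and no `E501`: `flake8` does not appear to treat the trailing `# E501` as a comment, so `E501` most likely still ended up in the ignore list. Moving the comment onto its own line should make the long line show up.

`flake8` also did not report the unnecessary `else` after `return`; that check (`no-else-return`) comes from `pylint`, not `flake8`.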
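
To illustrate the "report, don't fix" behavior of a linter, here is a toy line-length check written from scratch. It is not how `flake8` is implemented; `find_long_lines` is a made-up helper that only mimics the spirit of `E501` by listing offending lines and leaving the source untouched.

```python
SOURCE = '''def shhq(shhq_member: str = "hn"):
    if shhq_member in ["hn", "cw", "jun", "lh", "lz", "mj", "sz", "wj", "yj", "zj"]:
        return True
    else:
        return False
'''


def find_long_lines(source: str, max_length: int = 79):
    """Return (line_number, length) for each line longer than `max_length`."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > max_length
    ]


for number, length in find_long_lines(SOURCE):
    print(f"line {number}: {length} characters")
```

A formatter, by contrast, would rewrite `SOURCE` itself so that no line exceeds the limit.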

## Workflow

### Workflow in IDE

Here are the commands to run from the command line if you are working in VSCode.

```bash
cd "to your desired directory"  # change dir to your working directory
code .                          # open your VSCode
touch test.py                   # touch is a Unix (macOS/Linux) command that creates a new file; after creating it, add in the code
touch .flake8                   # add in the configurations for flake8
touch pyproject.toml            # add in the configurations for black and isort
black .                         # runs black with the config from pyproject.toml and formats code in-place
isort .                         # runs isort with the config from pyproject.toml and sorts imports in-place
flake8                          # runs flake8 with the config from .flake8; add flake8 ... if it hangs
```

### Workflow in Google Colab

To fill in

## Next Steps

Let us see what we can further do to automate this step.

### Branch

Create a branch and make a tutorial on this styling.

### Makefile

We will show how to call these commands from a `Makefile`.



### Pre-commit

We may sometimes forget to run these style checks after we finish development. We'll cover how to automate this process using pre-commit so that these checks are automatically executed whenever we want to commit our code.

## References

- [PEP8 guide](https://peps.python.org/pep-0008/)
- [What really is pyproject.toml?](https://snarky.ca/what-the-heck-is-pyproject-toml/)
- [MLOps madewithml](https://madewithml.com/courses/mlops/styling/)
- [https://sbarnea.com/lint/black/](https://sbarnea.com/lint/black/)
- [Differences between code linters and formatters](https://taiyr.me/what-is-the-difference-between-code-linters-and-formatters)
- [Pre-commits Styling](https://ljvmiranda921.github.io/notebook/2018/06/21/precommits-using-black-and-flake8/)