# Python is dead, long live Python!
### Adapting an old codebase for Python 3.x compatibility - a case study

Łukasz Rogalski

## About me

![title](photo.png)

- rogalski.91 /at/ gmail /dot/ com
- https://github.com/rogalski
- https://www.linkedin.com/in/lukasz-rogalski

## Motivation

### Python 2.x deprecation

[PEP 373 -- Python 2.7 Release Schedule](https://www.python.org/dev/peps/pep-0373):
> Being the last of the 2.x series, 2.7 will have an extended period of maintenance. Specifically, 2.7 will receive bugfix support until January 1, 2020. After the last release, 2.7 will receive no support.

### New features

- type hints
- `subprocess` module with timeouts
- `pathlib`
- advanced iterable unpacking
- ...
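
A minimal sketch of the features listed above, for readers who have not used them yet. All names here (`split_first`, `run_with_timeout`, `config_path`) are hypothetical and exist only for illustration; the snippet assumes Python 3.5 or newer.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple


def split_first(items: List[str]) -> Tuple[str, List[str]]:
    """Type hints plus advanced iterable unpacking."""
    first, *rest = items
    return first, rest


def run_with_timeout(seconds: float) -> str:
    """`subprocess` raises TimeoutExpired if the child runs longer than `seconds`."""
    completed = subprocess.run(
        [sys.executable, "-c", "print('hello')"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        timeout=seconds,
    )
    return completed.stdout.strip()


# `pathlib` builds paths with the `/` operator instead of os.path.join.
config_path = Path("settings") / "default.ini"
```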

## About project

- test automation framework
- CLI for running automated tests
- mostly used as an element of bigger fully-automated environments
- also used by other clients "on their desk"
- we cannot just stop supporting Python 2.x without any notice

## Codebase & infrastructure

- quite old (started in 2012), supported Python 2.6 at some point
- some unit-level tests (but far from 100% coverage)
- trunk based development, short-lived feature branches via Gerrit
- continuous integration, staging env, manual deploy to production

## Approach

#### Prepare users

- select a date and stick to it
- add deprecation info to official docs
- send (multiple) e-mails within organization with announcement of pending deprecation
- extensive `log.warning` for CLI invocations
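
The last point can be sketched as a small check called at CLI start-up. This is only an illustration: the logger name, the `PY2_SUPPORT_ENDS` placeholder and the `warn_if_python2` helper are hypothetical, not taken from the project.

```python
import logging
import sys

log = logging.getLogger("my_cli")  # hypothetical logger name

# Hypothetical placeholder: use the date announced to your users.
PY2_SUPPORT_ENDS = "YYYY-MM-DD"


def warn_if_python2(version_info=None):
    """Log a deprecation warning on Python 2.x; return True if one was logged."""
    if version_info is None:
        version_info = sys.version_info
    if version_info[0] >= 3:
        return False
    log.warning(
        "Running on Python %d.%d. Python 2.x support ends on %s, "
        "please switch to Python 3.x.",
        version_info[0], version_info[1], PY2_SUPPORT_ENDS,
    )
    return True
```

Calling `warn_if_python2()` at the top of the CLI entry point makes the warning show up in every invocation, including those inside automated environments where nobody reads e-mails.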

#### Make codebase Python 3.x compatible

- but keep backwards compatibility

## What backwards compatibility means

How are clients using your code?

- if you do web dev and deploy Docker images: no one cares which Python version is used
- if you develop a library: the importable API needs to be kept intact
- if you develop a CLI tool: old command line options / settings files need to be supported

## Tools

`pylint --py3k` is Pylint's special checker mode focused on porting code.

*The pre-existing `pylint` configuration was extended to use the `--py3k` checkers as well.*

`python-modernize` is an automated code conversion tool that applies basic fixes.

*We decided not to use it.*

## A few words about backports

- stdlib packages for older Python versions, maintained on PyPI by volunteers
- an easy way to use some of the new features
- installation criteria controllable from your app requirements: `singledispatch==3.4.0.3; python_version < '3.4'`

You can avoid messy import statements by preparing a compatibility module.

In [None]:
# compat.py
import sys
if sys.version_info < (3, 4):  # functools.singledispatch was added in Python 3.4
    from singledispatch import singledispatch
else:
    from functools import singledispatch

In [None]:
# code using singledispatch
from .compat import singledispatch

@singledispatch
def fun(arg):
    ...
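
For readers unfamiliar with `singledispatch`, here is a small, hedged usage sketch. It imports from `functools` directly so that it runs on its own (in the project the import would come from the compatibility module), and the `describe` function is a hypothetical example.

```python
from functools import singledispatch


@singledispatch
def describe(arg):
    # Fallback used when no more specific implementation is registered.
    return "object: {!r}".format(arg)


@describe.register(int)
def _(arg):
    return "integer: {}".format(arg)


@describe.register(list)
def _(arg):
    return "list of {} items".format(len(arg))


print(describe(3))       # dispatches on the type of the first argument
print(describe([1, 2]))
print(describe("x"))
```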


## Unit tests

- start running unit tests on Python 3.x
- `tox` used to manage virtual environments
- reproducible test environment for Windows / Linux / MacOS
- easy to start, just run `tox -e py36`

Ideally, failing tests will uncover all issues.

Some libraries can help with fixing the code:
- `six`
- `python-future`

## Laziness of collections methods

- `dict.iterkeys()` vs. `dict.keys()`
- `xrange()` vs. `range()`

You can use `six` to achieve consistent behavior in both Python 2.x and Python 3.x.

In [None]:
import six
d = {"a": 1, "b": 2}
for k, v in six.iteritems(d):
    print(k, v)

Our collections were rather small, so we decided to use only the Python 3.x spelling (`items()`, `range()`), at the cost of being suboptimal on Python 2.x.

In [None]:
d = {"a": 1, "b": 2}
for k, v in d.items():
    print(k, v)
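
One thing to watch for when taking this route: on Python 3.x the results are lazy views, not lists. The sketch below shows the Python 3.x behaviour; the variable names are made up for illustration.

```python
d = {"a": 1, "b": 2}

keys_view = d.keys()     # Python 3.x: a live view of the dict, not a list
keys_snapshot = list(d)  # explicit copy; behaves the same on 2.x and 3.x

d["c"] = 3

print(sorted(keys_view))      # the view sees the new key
print(sorted(keys_snapshot))  # the snapshot does not

try:
    keys_view[0]
except TypeError as exc:
    # Code that indexed the result of keys() on Python 2.x needs list(...) now.
    print("views are not indexable:", exc)

# range() is lazy on Python 3.x; on Python 2.x this would build a huge list.
big = range(10 ** 9)
print(len(big), big[-1])
```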

## Module/object renames

Use `six`.

In [None]:
from six.moves import winreg

Or do error handling manually.

In [None]:
try:
    import winreg
except ImportError:
    import _winreg as winreg

## More drastic stdlib reorganizations - `urllib`, `urllib2`

Significantly redesigned in Python 3.x

Use `six`.

In [None]:
from six.moves.urllib.parse import urlparse
from six.moves.urllib.error import HTTPError
from six.moves.urllib.request import urlopen
from six.moves.urllib.response import addinfo

Or don't use those libraries at all.

In our project `urllib`-dependent code was rewritten to use `requests`.
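
If neither `six` nor a rewrite is an option, the reorganized modules can also be handled with a manual fallback, in the same spirit as the `winreg` example. A minimal sketch, using only names that exist in the standard library of the respective Python version:

```python
try:
    # Python 3.x layout
    from urllib.error import HTTPError
    from urllib.parse import urlencode, urlparse
    from urllib.request import urlopen
except ImportError:
    # Python 2.x layout: the same names live in three different modules
    from urllib import urlencode
    from urllib2 import HTTPError, urlopen
    from urlparse import urlparse

parts = urlparse("https://example.com/search?q=python")
print(parts.netloc, parts.path, parts.query)
print(urlencode({"q": "python"}))
```

Keeping such a block in a single compatibility module avoids repeating the `try` / `except` in every file that needs these names.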

## `__future__` imports

> A future statement is a directive to the compiler that a particular module should be compiled using syntax or semantics that will be available in a specified future release of Python where the feature becomes standard.

Enabling `print_function`, `absolute_import` and `division` - highly recommended.
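
A short sketch of what these three imports change. On Python 3.x they are no-ops, so the module behaves the same on both interpreters; the `average` function is a hypothetical example.

```python
# A future statement must appear before any other code in the module.
from __future__ import absolute_import, division, print_function

import sys


def average(values):
    # With `division`, `/` is true division on Python 2.x as well.
    return sum(values) / len(values)


print(average([1, 2]))             # 1.5 on both 2.x and 3.x
print(7 // 2)                      # floor division stays explicit: 3
print("warning", file=sys.stderr)  # print is a function, so keyword arguments work
```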

Enabling `unicode_literals` can introduce some cryptic errors and [opinions on using it vary](https://python-future.org/unicode_literals.html#should-i-import-unicode-literals).


## `__dunder__` methods

- Some protocols / operators changed their magic method name in Python 3.x.
- `__nonzero__` became `__bool__` (truthiness and falsiness)
- `next` became `__next__` (iterator protocol)

There is a neat trick to handle this at the code level.

In [None]:
class MyClass(object):
    def __bool__(self):  # python 3.x
        return self.member != 0
    
    __nonzero__ = __bool__  # python 2.x
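
The same aliasing trick works for the iterator protocol. A minimal sketch with a hypothetical `Countdown` class:

```python
class Countdown(object):
    """Iterator yielding start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):  # python 3.x
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__  # python 2.x


print(list(Countdown(3)))
```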

## Are your unit tests good? (mock curse)
We have a bunch of code which interacts with subprocesses.

Launch a process, parse its output, return a result according to the output and/or return code.

In [None]:
import subprocess


def my_func():
    output = subprocess.check_output("ping -c 1 127.0.0.1", shell=True)
    return 'some_string' in output


if __name__ == "__main__":
    my_func()

In [None]:
import mock

import my_module


def test_fn():
    mocked_output = 'my_lovely_output'
    with mock.patch("subprocess.check_output", return_value=mocked_output):
        assert not my_module.my_func()

The unit test was valid on Python 2.x.

The unit test passes on both Python 2.x and Python 3.x.

The code works in production on Python 2.x.

The code **fails** in production on Python 3.x:

```
Traceback (most recent call last):
  File "my_module.py", line 10, in <module>
    my_func()
  File "my_module.py", line 6, in my_func
    return 'some_string' in output
TypeError: a bytes-like object is required, not 'str'
```

## Why?

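- Python 2.x - the boundary between bytes and strings was ambiguous: `check_output` returned `str`, just like the mock
- Python 3.x - explicitly discriminates between bytes and unicode strings: `check_output` returns `bytes`, while the mock still returned `str`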

**Be aware of quirks like that!**
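
One possible fix, sketched under the assumption that the tool output is UTF-8 text: decode the bytes before comparing, and make the mock return what the real call returns. This is an illustration rather than the exact change made in the project.

```python
import subprocess
from unittest import mock  # Python 3.x stdlib; on Python 2.x the `mock` backport has the same API


def my_func():
    output = subprocess.check_output("ping -c 1 127.0.0.1", shell=True)
    # check_output returns bytes on Python 3.x, so decode before comparing with text.
    text = output.decode("utf-8", errors="replace")
    return "some_string" in text


def test_fn():
    # Mock with bytes, which is what the real call returns on Python 3.x.
    mocked_output = b"my_lovely_output"
    with mock.patch("subprocess.check_output", return_value=mocked_output):
        assert not my_func()
```

With a `bytes` mock, the original `my_func` would have failed in the unit test on Python 3.x instead of in production.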

## About unit tests and coverage
- the `coverage` library can be used to measure unit test coverage
- now you have two sets of data (one for Python 2.x, the other for Python 3.x)
- luckily, it is possible to merge data from multiple environments

tox.ini
```
[testenv]
setenv=COVERAGE_FILE=.coverage.{envname}

```

Postprocessing:
```
coverage combine
coverage html
```
See [official docs](https://coverage.readthedocs.io/en/latest/cmd.html#cmd-combining) for more details.

## Unit tests passing locally 🎉 

- configure the CI server to run unit tests on both interpreters
- extremely easy (dependencies: `python3` and `tox` available on build agent)
- `tox` takes care of creating a reproducible environment

In an ideal world, you are done here.

In practice, not really.

## Local integration testing

1. Prepare environment with all dependencies (`tox` takes care of that)
2. Set up command line, settings files etc.
3. Run app
4. Check whether the pass criteria are met:
  - exit code = 0
  - no errors logged
  - output files with specific content created
  - etc.
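
The steps above can be sketched as a pytest-style test that works on both Python 2.x and 3.x. `FAKE_APP` is a hypothetical stand-in for the real CLI (it only writes a small report file), and `run_app` and the report file name are made up for illustration.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the real application: writes a report file and exits.
FAKE_APP = (
    "import sys\n"
    "with open(sys.argv[1], 'w') as f:\n"
    "    f.write('status: ok\\n')\n"
)


def run_app(args):
    """Run the app in a child process; return (exit code, stdout, stderr)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", FAKE_APP] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    out, err = proc.communicate()
    return proc.returncode, out, err


def test_app_creates_report():
    workdir = tempfile.mkdtemp()
    try:
        report = os.path.join(workdir, "report.txt")
        returncode, out, err = run_app([report])
        assert returncode == 0        # exit code = 0
        assert "ERROR" not in err     # no errors logged
        with open(report) as f:       # output file with specific content
            assert "status: ok" in f.read()
    finally:
        shutil.rmtree(workdir)
```

Using `sys.executable` means the child process runs on the same interpreter as the test run, so each `tox` environment exercises the app on its own Python version.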

Set up extra environments in `tox.ini` file:
```
[testenv]
deps= -rrequirements.txt
commands= pytest --cov=my_package {posargs:tests/unit/}

[testenv:py27-integration]
basepython = python2.7
commands = pytest {posargs:tests/integration/}

[testenv:py36-integration]
basepython = python3.6
commands = pytest {posargs:tests/integration/}
```

Running those is as simple as:
- `tox -e py27-integration`
- `tox -e py36-integration`

What do you get?
- Much slower than unit tests, but...
- A quick way for a developer to run sanity integration tests

**Make those tests pass!**

We eventually added those tests as a step on the CI server anyway.

## Start using Python 3 in staging environments
Release pipeline:

Checkin -> CI Server (UT, integration tests) -> Staging environment -> Manual deploy to production

We started to use Python 3.x in the staging environment:
- Both Python 2.x and Python 3.x are tested (same as in UT and integration tests)
- Make those tests pass
- Monitor for regressions

## Phased rollout

- identify independent areas where we can start using Python 3.x in production

- enable those one by one

- if something got messed up, damage is more manageable

## First non-Py2.x compatible release

- **Increment the major version number of your app**

- Develop first functionality using Python 3.x features

- **Remove code that exists only to support Python 2.x.**

## Summary

- give (reasonably long) ETA for compatibility breakage to your users

- be aware of differences between 2.x and 3.x branches

- use backports if needed

- automate your testing

- take a calculated risk

# Thank you!

## Contact
- rogalski.91 /at/ gmail /dot/ com
- https://github.com/rogalski
- https://www.linkedin.com/in/lukasz-rogalski

# Q&A