# State of the Nation: Python as of March 2021

## A brief history of Python and a glimpse at versions 3.8, 3.9* and 3.10**

* \* latest minor version
* \*\* upcoming minor version, expected October 2021

**Sadie Bartholomew, presented on 26.03.21 at the NCAS Python Technical Meeting**

-----

Let's begin with a quick look at the big picture, because...

## Python turned 30 this February!

Happy belated birthday, Python:

In [1]:
# In few other languages could you (or would you want to) do this:
print(
    '\N{birthday cake}',  # emoji given by its Unicode character name, via the '\N{...}' escape
    '\N{balloon}',
    '\N{confetti ball}',
    '\N{snake} ' * 6,
)

🎂 🎈 🎊 🐍 🐍 🐍 🐍 🐍 🐍 


**Over the three decades, milestones of note are:**

* version 0.9.0 in early 1991;
* 1.0 in 1994;
* 2.0 shortly after the millennium;
* Guido van Rossum formally stepping down as "Benevolent Dictator For Life" (BDFL) in 2018, shortly after the acceptance of [PEP 572](https://www.python.org/dev/peps/pep-0572/), a.k.a. the "walrus operator" (`:=`) PEP, which was added in version 3.8 (summarised later);
* Python 2 reaching end-of-life in 2020;
* all the way up to the latest version 3.9, released October 2020.

Here's a nice graphic depicting Python's version evolution up to ~2018 (source is a post 'Should You Learn Python Programming In 2020?' on the *Venture Lessons* site, at https://www.venturelessons.com/should-you-learn-python-programming/):

![](https://www.venturelessons.com/wp-content/uploads/2019/06/short-history-of-python-1024x419.jpg)

In light of this long history, it is nice to revisit the "Zen of Python" (PEP 20) created in 2004, which "succinctly \[channel\] the BDFL's guiding principles":

In [2]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


I'm not sure how much these principles have been adhered to. For example:

> There should be one-- and preferably only one --obvious way to do it.

And yet somehow we now have a situation like this for (to take one obvious example) string formatting:

In [3]:
import sys

name = "Sadie"
python_version = str(sys.version_info.major) + "." + str(sys.version_info.minor)

a = f"{name} is using Python {python_version}"            # f-string (formatted string literal)
b = "{} is using Python {}".format(name, python_version)  # str.format()
c = "%s is using Python %s" % (name, python_version)      # C-style (printf-style)
d = name + " is using Python " + python_version           # construction by concatenation
# ... or even some mix of these (not advised; note an f-string can't hold a bare '{}'
# placeholder for .format(), as an empty expression in an f-string is a SyntaxError):
e = f"{name} is using %s" % ("Python " + python_version)

for string in [a, b, c, d, e]:
    print(string)

Sadie is using Python 3.9
Sadie is using Python 3.9
Sadie is using Python 3.9
Sadie is using Python 3.9
Sadie is using Python 3.9


On that note, which is best to use in March 2021?

It is somewhat subjective, depending on readability considerations etc., but this talk considers Python 3.8 onwards, where literal string interpolation ([PEP 498](https://www.python.org/dev/peps/pep-0498/), available since Python 3.6) can be used, and it seems largely agreed that such "f-strings" are the best choice going forward, for example because they have fewer limitations and better performance. Compare the representative timings of:

In [4]:
def literal_test(ints):
    """Keep adding digits to an empty string, via literal string formatting."""
    a = ""
    for integer in range(ints):
        a += f"{integer}"


test_input_int = int(1e6)
%timeit literal_test(test_input_int)  # note %timeit is an IPython magic command: see %lsmagic for more

128 ms ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


to the timings of:

In [5]:
def format_test(ints):
    """Keep adding digits to an empty string, this time with format strings."""
    a = ""
    for integer in range(ints):
        a += "{}".format(integer)

%timeit format_test(test_input_int)

211 ms ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
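As a quick illustration of what f-strings can hold beyond plain names, here is a small sketch showing expressions and format specifications inside the braces, plus the `=` specifier that arrived in 3.8 for self-documenting output. The names and values here are arbitrary choices for the demo.

```python
import math

name = "Sadie"

# Arbitrary expressions, method calls and format specifications inside the braces:
shouted = f"{name.upper()} says pi is roughly {math.pi:.3f}"
print(shouted)     # SADIE says pi is roughly 3.142

# New in 3.8: a trailing '=' echoes the expression text as well as its value,
# which is handy for quick debugging output.
debug_name = f"{name=}"
debug_pi = f"{math.pi=:.2f}"
print(debug_name)  # name='Sadie'
print(debug_pi)    # math.pi=3.14
```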


So, that's a brief history. Now to the present and the future, starting with a quick detour to demo the highly controversial "walrus operator", available from 3.8...

## Assignment expressions via the "walrus operator" `:=`, new in v. 3.8

With `:=` in 3.8 onwards we can assign a value to a variable in the context of an expression.

![walrus](https://upload.wikimedia.org/wikipedia/commons/thumb/2/22/Pacific_Walrus_-_Bull_%288247646168%29.jpg/800px-Pacific_Walrus_-_Bull_%288247646168%29.jpg)

(Walrus source: Wikimedia commons via https://en.wikipedia.org/wiki/File:Pacific_Walrus_-_Bull_(8247646168).jpg)

As a near-minimal example, take some trivial logic that picks a random integer from 0 to 4 and prints it if it is non-zero:

In [6]:
from random import randrange

# Without walrus operator:
choice = randrange(5)  # 0 to 4 at random
if choice: # if not zero
    print(choice)

4


With the walrus operator, this could be condensed into two lines:

In [7]:
if choice := randrange(5):  # combines first two lines of the above into one
    print(choice)

1


Note, though, that there are certain contexts where the operator can't be used; for example, an unparenthesised assignment expression is not allowed at the top level of an expression statement. For details, consult the associated PEP: https://www.python.org/dev/peps/pep-0572/#exceptional-cases.
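A small sketch of both sides of this: the hypothetical helper `compiles()` just asks the built-in `compile()` whether a snippet is valid syntax, showing that the bare statement form is rejected while the parenthesised one is accepted. It then shows a more typical use, reusing a computed value in both the filter and the result of a list comprehension.

```python
def compiles(source):
    """Hypothetical helper: return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles("y := 10"))    # False: unparenthesised at statement level
print(compiles("(y := 10)"))  # True: valid, though PEP 572 discourages this style

# A more typical use: compute each square root once, then both test and keep it.
values = [1, 4, 9, 16, 25]
roots = [root for v in values if (root := v ** 0.5) > 2]
print(roots)  # [3.0, 4.0, 5.0]
```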

## The latest (minor) version: Python 3.9

Python 3.9 is the current latest version, released October 2020.


Good resources covering the changes between 3.8 and 3.9 include:

* the official summary: https://docs.python.org/3/whatsnew/3.9.html#what-s-new-in-python-3-9
* some blog posts, for example: https://realpython.com/python39-new-features/ 

### Confirming support for Python 3.9

Many Python libraries have started to confirm that they work with 3.9 (and to test against it), though maintainers need to ensure that all their dependencies support 3.9 before their own packages can.

For example, with cf-python we've recently confirmed that 3.9 works, documented it as such and added 3.9 to our GitHub Actions test workflows etc., but we had to wait for a version of `netcdf4` to become available for Python 3.9, so couldn't have supported 3.9 before early 2021 anyway.

**Note: 3.9 is now available directly from conda (as well as pip). A few months back one had to use the conda-forge channel with conda to access it.** Just create a clean 3.9 environment via `conda create -n python39env python=3.9` or similar and you're up and running.

### Quick overview of some notable new features

* Syntactic sugar for merging of dictionaries, with a new union operator `|` (see [PEP 584](https://www.python.org/dev/peps/pep-0584/) for details):

In [8]:
a = {1:10, 2:20}
b = {3:30, 4:40}

# Before 3.9, the best we could do was either:
c1 = {**a, **b}
# or (assuming a solution that shouldn't change 'a' in-place):
c2 = a.copy()
c2.update(b)

# Now, in 3.9, we can do:
c3 = a | b

# Just to show they all are the same and the original dicts are untouched:
print(
    a,
    b,
    c1,
    c1 == c2 == c3,
    sep="\n"
)

{1: 10, 2: 20}
{3: 30, 4: 40}
{1: 10, 2: 20, 3: 30, 4: 40}
True
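One detail worth knowing, sketched below with made-up plotting settings: when both dicts share a key, the right-hand operand's value wins (just as with `update()`), and the augmented form `|=` updates the left-hand dict in place.

```python
defaults = {"colour": "blue", "linewidth": 1}
overrides = {"linewidth": 2, "marker": "o"}

# Where both dicts share a key, the right-hand operand's value wins:
settings = defaults | overrides
print(settings)  # {'colour': 'blue', 'linewidth': 2, 'marker': 'o'}

# The augmented form |= modifies the left-hand dict in place, like defaults.update(overrides):
defaults |= overrides
print(defaults)  # {'colour': 'blue', 'linewidth': 2, 'marker': 'o'}
```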


* Handy new string methods, `removeprefix()` and `removesuffix()` (see [PEP 616](https://www.python.org/dev/peps/pep-0616/) for details).

  Note that part of the reason these methods were added, as covered in the PEP, is that the similar existing methods `lstrip()` and `rstrip()` were often assumed to do what the new methods do, but they instead work on a set of characters rather than an ordered sequence of them, i.e. a substring. For example, compare:

In [9]:
# Comparing rstrip and the new method for 3.9, removesuffix:

fruits = "Apples, oranges and bananas"

print(fruits.rstrip("ans"))
print(fruits.removesuffix("ans"))

print(fruits.rstrip("nas"))
print(fruits.removesuffix("nas"))

Apples, oranges and b
Apples, oranges and bananas
Apples, oranges and b
Apples, oranges and bana
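The same distinction applies at the start of a string. The sketch below compares `lstrip()` with `removeprefix()` on an arbitrary file name, and adds a hypothetical `remove_prefix()` helper that mimics `removeprefix()` for code that must still run on versions before 3.9.

```python
filename = "test_tests.py"

# lstrip() treats its argument as a *set* of characters to strip from the left:
print(filename.lstrip("test_"))        # .py
# removeprefix() removes that exact substring, once, if present:
print(filename.removeprefix("test_"))  # tests.py
print(filename.removeprefix("spam_"))  # test_tests.py (no such prefix, so unchanged)


def remove_prefix(text, prefix):
    """Hypothetical helper giving str.removeprefix() behaviour before 3.9."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


print(remove_prefix(filename, "test_"))  # tests.py
```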


* Further changes to briefly note:

  * 3.9 is the final version providing Python 2 backward compatibility layers, so watch out for any cases of `DeprecationWarning` in your codebases (details in the PEP [here]());
  * it has a new parser (details in the PEP [here]())!
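Since `DeprecationWarning` is hidden by default unless triggered directly by code in `__main__`, one way to check for such warnings is to record them explicitly, as in the sketch below. Here `old_function()` is a hypothetical stand-in for any deprecated API. From the command line, running `python -W error::DeprecationWarning your_script.py` instead turns them into errors.

```python
import warnings


def old_function():
    """Hypothetical function standing in for a deprecated API."""
    warnings.warn("old_function is deprecated", DeprecationWarning, stacklevel=2)
    return 42


# Record every DeprecationWarning raised inside this block, including ones
# that Python would normally hide by default.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    result = old_function()

for warning in caught:
    print(f"{warning.category.__name__}: {warning.message}")
# DeprecationWarning: old_function is deprecated
```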

### New module: `zoneinfo` for proper time zone support

As a quick sample, see the many, many supported timezones:

In [10]:
import zoneinfo
zoneinfo.available_timezones()

{'Africa/Abidjan',
 'Africa/Accra',
 'Africa/Addis_Ababa',
 'Africa/Algiers',
 'Africa/Asmara',
 'Africa/Asmera',
 'Africa/Bamako',
 'Africa/Bangui',
 'Africa/Banjul',
 'Africa/Bissau',
 'Africa/Blantyre',
 'Africa/Brazzaville',
 'Africa/Bujumbura',
 'Africa/Cairo',
 'Africa/Casablanca',
 'Africa/Ceuta',
 'Africa/Conakry',
 'Africa/Dakar',
 'Africa/Dar_es_Salaam',
 'Africa/Djibouti',
 'Africa/Douala',
 'Africa/El_Aaiun',
 'Africa/Freetown',
 'Africa/Gaborone',
 'Africa/Harare',
 'Africa/Johannesburg',
 'Africa/Juba',
 'Africa/Kampala',
 'Africa/Khartoum',
 'Africa/Kigali',
 'Africa/Kinshasa',
 'Africa/Lagos',
 'Africa/Libreville',
 'Africa/Lome',
 'Africa/Luanda',
 'Africa/Lubumbashi',
 'Africa/Lusaka',
 'Africa/Malabo',
 'Africa/Maputo',
 'Africa/Maseru',
 'Africa/Mbabane',
 'Africa/Mogadishu',
 'Africa/Monrovia',
 'Africa/Nairobi',
 'Africa/Ndjamena',
 'Africa/Niamey',
 'Africa/Nouakchott',
 'Africa/Ouagadougou',
 'Africa/Porto-Novo',
 'Africa/Sao_Tome',
 'Africa/Timbuktu',
 ...}

Under the hood, it makes use of a database to handle the complexities of time zones, including the numerous historical changes to their offsets and rules (quoted from the [database website](https://www.iana.org/time-zones); consult that for details):

> The Time Zone Database (often called tz or zoneinfo) contains code and data that represent the history of local time for many representative locations around the globe. It is updated periodically to reflect changes made by political bodies to time zone boundaries, UTC offsets, and daylight-saving rules.

In [11]:
from datetime import datetime

my_time = datetime.now(zoneinfo.ZoneInfo("UTC"))
print(my_time)

# Inspecting the hour today:
my_time_hour = my_time.hour
central_european_time_hour = datetime.now(zoneinfo.ZoneInfo("CET")).hour
print(
    f"It's {my_time_hour % 12} o'clock for me in UTC but "
    f"{central_european_time_hour % 12} o'clock in CET.")

2021-03-26 15:52:18.359570+00:00
It's 3 o'clock for me in UTC but 4 o'clock in CET.


There's a post I found really useful for understanding `zoneinfo` and how it relates to previous ways of managing time zones in Python: https://howchoo.com/g/ywi5m2vkodk/working-with-datetime-objects-and-timezones-in-python
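To see the database at work, the sketch below shows one zone taking different UTC offsets either side of a daylight-saving change, then converts an aware datetime to another zone with `astimezone()`. The zones and dates are arbitrary choices, and it assumes time zone data is available (from the system, or from the `tzdata` package on platforms without it).

```python
from datetime import datetime
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")

# The same zone has different UTC offsets in winter and summer,
# which zoneinfo looks up from the time zone database:
winter = datetime(2021, 1, 15, 12, 0, tzinfo=london)
summer = datetime(2021, 7, 15, 12, 0, tzinfo=london)
print(winter.utcoffset(), winter.tzname())  # 0:00:00 GMT
print(summer.utcoffset(), summer.tzname())  # 1:00:00 BST

# Converting the same instant to another zone:
new_york = summer.astimezone(ZoneInfo("America/New_York"))
print(new_york)  # 2021-07-15 07:00:00-04:00
```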

### New module: `graphlib` for operations on graph-like structures

In [12]:
import graphlib
help(graphlib.TopologicalSorter())

Help on TopologicalSorter in module graphlib object:

class TopologicalSorter(builtins.object)
 |  TopologicalSorter(graph=None)
 |  
 |  Provides functionality to topologically sort a graph of hashable nodes
 |  
 |  Methods defined here:
 |  
 |  __bool__(self)
 |  
 |  __init__(self, graph=None)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  add(self, node, *predecessors)
 |      Add a new node and its predecessors to the graph.
 |      
 |      Both the *node* and all elements in *predecessors* must be hashable.
 |      
 |      If called multiple times with the same node argument, the set of dependencies
 |      will be the union of all dependencies passed in.
 |      
 |      It is possible to add a node with no dependencies (*predecessors* is not provided)
 |      as well as provide a dependency twice. If a node that has not been provided before
 |      is included among *predecessors* it will be automatically added to the graph with
 |      no p...

There's a lovely post that explains what topological sorting is and the basics of `graphlib` here: https://willearp.com/posts/pythons-topological-sorter/.
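A minimal sketch of `TopologicalSorter` in use: each key of the dict maps to the set of nodes it depends on (its predecessors), and `static_order()` yields the nodes so that every node comes after all of its predecessors. The package dependencies below are illustrative only, not the real dependency lists of these libraries. A cyclic graph has no such ordering, so `CycleError` is raised.

```python
from graphlib import CycleError, TopologicalSorter

# Illustrative dependency graph: node -> set of nodes that must come before it.
dependencies = {
    "cf-python": {"cfdm", "netcdf4"},
    "cfdm": {"netcdf4"},
    "netcdf4": {"numpy"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['numpy', 'netcdf4', 'cfdm', 'cf-python']

# A graph containing a cycle cannot be sorted:
cycle_found = False
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError:
    cycle_found = True
print(cycle_found)  # True
```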

## `from __future__ import`: what's set to come in version 3.10

The current (alpha 'a6') documentation lists the features that have already been set to go into 3.10, though more will likely be added before release: https://docs.python.org/3.10/whatsnew/3.10.html

I'll only discuss in detail what I think is the most interesting new feature, but here are a few other additions I found notable:

* parenthesized context managers, so multiple context managers in one `with` statement can be split over several lines;
* more helpful syntax errors than "SyntaxError: unexpected EOF while parsing" for unclosed parentheses, brackets or braces and similar;
* the entire `distutils` package is deprecated ([PEP 632](https://www.python.org/dev/peps/pep-0632/)), so it seems `setuptools` is being encouraged now (until the next packaging module comes along...), e.g. for our modules I updated the `setup()` function in `setup.py` like so: https://github.com/NCAS-CMS/cf-python/commit/353c053618f8ba288c1aaa221cd915bc359abed3.
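A sketch of the parenthesized form, which needs 3.10 or later: two file context managers (on temporary files with arbitrary names) are grouped in parentheses and split one per line, instead of relying on backslash continuations.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "input.txt")
    target = Path(tmp, "output.txt")
    source.write_text("hello\n")

    # 3.10+: several context managers grouped in parentheses, one per line
    with (
        open(source) as infile,
        open(target, "w") as outfile
    ):
        outfile.write(infile.read().upper())

    copied = target.read_text()

print(copied)  # HELLO
```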

### Structural Pattern Matching (PEP 634) with `match`/`case`

Easily the most outlandish, and likely the most controversial, of the features already due to go into 3.10 comes from [PEP 634](https://www.python.org/dev/peps/pep-0634/), which introduces the new soft keywords `match` and `case` to facilitate structural pattern matching:

* `match` takes a subject expression;
* the `case` clauses provide patterns, which are compared against the subject in turn until one matches (if none does, nothing happens).

The structure of the `match`/`case` statement is superficially (syntactically) similar to the `switch`/`case` statements available in C, Java, etc., but it is designed for structural pattern matching, which is very common in functional languages such as Haskell or Scala.

In [13]:
# Note: none of this match/case code will run before 3.10, as it raises a SyntaxError!
# So here I create a custom IPython 'magic' to catch that error:

from IPython.core.magic import register_cell_magic

# Adapted from a method suggested in https://stackoverflow.com/questions/40110540/
@register_cell_magic('assume_future')
def assume_future(line, cell):
    try:
        exec(cell)
    except SyntaxError:
        # Raised when the cell is compiled on Python < 3.10
        print("It all works in version 3.10. Be patient!")

As a basic example, applying the kind of pattern matching I have been learning about in my recent foray into Haskell:

In [14]:
%%assume_future

some_list = [1, 2, 3]

match some_list:
    case []:
        print("This list is empty")
    case [x]:
        print("This list has one element.")
    case _:
        print("This list has multiple elements!")

It all works in version 3.10. Be patient!
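For comparison, here is a sketch of how the match/case example above might be written today with `if`/`elif`. The function name `describe()` is hypothetical. It mirrors the sequence patterns `[]` and `[x]`, which only match non-string sequences, with everything else falling through to the final branch, as with the wildcard `case _`.

```python
from collections.abc import Sequence


def describe(some_list):
    """Hypothetical pre-3.10 equivalent of the match/case example, using if/elif."""
    # Sequence patterns such as [] and [x] don't match str, bytes or bytearray.
    if isinstance(some_list, Sequence) and not isinstance(
        some_list, (str, bytes, bytearray)
    ):
        if len(some_list) == 0:
            return "This list is empty"
        if len(some_list) == 1:
            return "This list has one element."
    # Equivalent of the wildcard `case _`:
    return "This list has multiple elements!"


print(describe([1, 2, 3]))  # This list has multiple elements!
```

The nested checks show how much the single `match` statement is doing for us: the type test and the length test are both implied by each pattern.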


Another nice use case is to match against a set of named constants:

In [15]:
%%assume_future

from enum import Enum

class Response(Enum):
    FIGHT = 0
    FLIGHT = 1

response = Response.FLIGHT

# Dotted names are matched as constant values (a bare name would capture instead):
match response:
    case Response.FIGHT:
        print("Fight!")
    case Response.FLIGHT:
        print("Run!")

It all works in version 3.10. Be patient!


For further detail, there is a nice tutorial straight "from the horse's mouth": https://github.com/gvanrossum/patma/blob/master/README.md#tutorial

--------