# New Python Language Features

These are some recently added (3.5+) Python language features that are generally useful. I will note what version they're introduced in and if later versions include additional functionality. The linked PEP (Python Enhancement Proposal) documents go into the feature rationale in detail but are not necessarily the best introduction.

1. [F-strings](./f-strings.ipynb)\* (3.6) — [PEP 498](https://www.python.org/dev/peps/pep-0498/), [PEP 701](https://peps.python.org/pep-0701/)
2. [Type Hints](./type-hints.ipynb)\* (3.5) — [PEP 484](https://www.python.org/dev/peps/pep-0484/) & [many more](https://peps.python.org/topic/typing/)
3. [Data Classes](https://docs.python.org/3/library/dataclasses.html) (3.7) — [PEP 557](https://www.python.org/dev/peps/pep-0557/)
4. [Walrus `:=` Operator](https://docs.python.org/3/reference/expressions.html#grammar-token-python-grammar-assignment_expression) a.k.a. "Assignment Expressions" (3.8) — [PEP 572](https://www.python.org/dev/peps/pep-0572/)
5. [Match...Case](https://docs.python.org/3/tutorial/controlflow.html#match-statements) a.k.a. "Structural Pattern Matching" (3.10) — [PEP 634](https://www.python.org/dev/peps/pep-0634/), [PEP 635](https://www.python.org/dev/peps/pep-0635/), [PEP 636](https://www.python.org/dev/peps/pep-0636/) (tutorial)


\* These features have their own, separate notebooks in this repo and are not covered in this one.

## 3. Data Classes

Data Classes scaffold out a lot of fundamental class methods and structure from a few type hints. They're a lighter, more readable way to define classes of a certain, common structure. They're obviously useful for "data" types, like information from a database or API, but that name undersells their utility a bit. They also use a decorator instead of class inheritance, which may be easier to understand for folks not used to object-oriented programming.

In [1]:
from dataclasses import dataclass

# dataclass decorates a class
@dataclass
class Person:
    name: str
    age: int
    location: dict

Athena = Person("Athena", 25, {"city": "New York", "state": "NY"})
print(Athena)

Person(name='Athena', age=25, location={'city': 'New York', 'state': 'NY'})


What did the dataclass do? It made it unnecessary to define an `__init__` constructor to handle creating our object from a series of arguments. Instead, it takes the class properties and their type hints and writes that method for us. The code below does the same thing with extra boilerplate:

In [None]:
class PersonClass:
    def __init__(self, name, age, location):
        self.name = name
        self.age = age
        self.location = location

This is only a small part of the utility because `@dataclass` fills in [many other fundamental methods](https://peps.python.org/pep-0557/#abstract) like ones that determine equality and string representation (notice how nice `print(Athena)` is?). It can also generate the comparison methods used for sorting, but only if we ask for them with `@dataclass(order=True)`.
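
As a quick check of those generated methods, here is a minimal sketch that compares two instances and sorts a list. The `Book` class and its fields are made up for illustration, and the ordering methods only exist because `order=True` is passed to the decorator.

```python
from dataclasses import dataclass

# order=True asks dataclass to also generate __lt__, __le__, __gt__, and __ge__
@dataclass(order=True)
class Book:  # hypothetical class for illustration
    year: int
    title: str

a = Book(1965, "Dune")
b = Book(1965, "Dune")
c = Book(1818, "Frankenstein")

print(a == b)          # True: the generated __eq__ compares field by field
print(sorted([a, c]))  # ordered as if each object were the tuple (year, title)
```

Without `order=True`, the `sorted` call would raise a `TypeError` because plain dataclasses only support `==` and `!=`.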

How could we envision using this? I often consume data from a discovery layer or ILS API where records are returned as JSON with a certain structure. Dataclasses make it easier to map these JSON dicts to classes and then add utility methods that aid my work. Below, I create a `Patron` class from a standard JSON API response but with an added `is_expired` method.

In [22]:
# Pretend API response from an ILS
from datetime import date


patrons = [
    {"name": "Athena", "age": 25, "expiration_date": "2022-01-01", "homebranch": "OAK"}
]

@dataclass
class Patron:
    name: str
    age: int
    expiration_date: str
    homebranch: str = "SF" # default value

    def is_expired(self) -> bool:
        return date.fromisoformat(self.expiration_date) < date.today()

patrons = [Patron(**p) for p in patrons] # convert dicts to Patron objects
print(patrons[0])

Patron(name='Athena', age=25, expiration_date='2022-01-01', homebranch='OAK')


There are some notable pitfalls and limitations of dataclasses. First, the type hints are not enforced (though our code editor might warn us about them). If we pass in the wrong types, they're added as attributes with no errors, creating potentially unexpected behavior. Notice how we had to parse the `expiration_date` string into a `date` object in the example above? Dates coming from JSON APIs are strings so they have to be converted somewhere; simply specifying `expiration_date: date` as a type hint won't do it.

In [23]:
# none of these values make sense because we passed them in the wrong order
wacky = Patron(35, "Zeno", {"favorite_book": "The Poetics Pt. II"}, None)
print(f"Name: {wacky.name} ({type(wacky.name).__name__}), Age: {wacky.age} ({type(wacky.age).__name__})")

Name: 35 (int), Age: Zeno (str)


Secondly, properties with default values have to be defined after those without defaults. If we think of how the constructor function works, this makes sense—we cannot omit an argument with a default value and then pass in a value for the next argument.

In [27]:
p = {
    "name": "Athena",
    "homebranch": "OAK",
    "expiration_date": "2024-05-30",
}
# If our patron dicts from an API look like the above and we want to specify a default value for "homebranch"
# we have to do so for expiration_date too, or move homebranch below expiration_date in the class
@dataclass
class Patron:
    name: str
    homebranch: str = "SF"
    expiration_date: str|None = None # have to specify a default value


Note: [`namedtuple`](https://docs.python.org/3/library/collections.html#collections.namedtuple) is a similar, older feature that's more lightweight but less flexible. It's a good choice for simple, immutable objects.
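
For comparison, here is a minimal `namedtuple` sketch. The `Branch` type and its fields are invented for illustration.

```python
from collections import namedtuple

# field names are given as strings; there are no type hints or default methods to write
Branch = namedtuple("Branch", ["code", "city"])  # hypothetical example type

oak = Branch("OAK", "Oakland")
print(oak)        # Branch(code='OAK', city='Oakland')
code, city = oak  # unpacks like a regular tuple

try:
    oak.city = "Berkeley"  # named tuples are immutable
except AttributeError as err:
    print("Cannot modify:", err)
```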

## 4. Walrus `:=` Operator

Assign a variable as part of an expression, e.g. in an `if` condition or as part of a list comprehension, with `var_name := value`. This can save a line or two of code and make things more concise, but at the risk of readability. Sometimes code linters even warn against assignments during expressions.

In [27]:
# example from the 3.8 release notes: use result from if-condition calculation
if (n := len([1,2,3])) > 2:
    print(f"List is too long ({n} elements, expected <= 2)\n")

List is too long (3 elements, expected <= 2)



In [28]:
# example: checking for a pattern and storing information about the match
import re

titles = ["2001: A Space Odyssey", "Space Balls", "300"]
for title in titles:
    if matches := re.findall(r"\d", title): # remember result of `re`
        print(f'{len(matches)} digits in string "{title}": {matches}')

4 digits in string "2001: A Space Odyssey": ['2', '0', '0', '1']
3 digits in string "300": ['3', '0', '0']


Below is an example using list comprehensions; given a list of numbers, some of which are represented as strings, we both filter the list to numbers over 20 and normalize values to the `int` type. This kind of processing happens when normalizing names, call numbers, etc.

In [47]:
# normalize while filtering a list (saves a function call)
nums = [12, "24", "2", 22, "34"]
print("Numbers over 20:", [n for num in nums if (n := int(num)) > 20])

Numbers over 20: [24, 22, 34]


We could already accomplish this by repeating the call to `int` inside the list comprehension:

```py
[int(num) for num in nums if int(num) > 20]
```

but that casts each kept number to an `int` twice. This is virtually unnoticeable when converting a few strings to integers, but could result in a lot of duplicated work if our iterable is large and our processing function expensive.
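
To make the duplicated work visible, the sketch below counts how often a processing function runs in each version. `normalize` is a hypothetical stand-in for something more expensive than `int`.

```python
calls = 0

def normalize(value):  # hypothetical stand-in for an expensive function
    global calls
    calls += 1
    return int(value)

nums = [12, "24", "2", 22, "34"]

repeated = [normalize(num) for num in nums if normalize(num) > 20]
print(calls)  # 8: five calls in the filter plus three more for the kept items

calls = 0
walrus = [n for num in nums if (n := normalize(num)) > 20]
print(calls)  # 5: one call per item
```

Both versions produce the same list; only the number of calls differs.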

## 5. Match...Case

Until version 3.10, Python lacked a control flow construct where a single variable or expression is checked against a variety of potential values, each with its own code block, and a fallback. The reasoning, I believe, was that we can already accomplish such a structure with the regular `if-elif-else` conditions. So what in JavaScript would be

```js
let temp = 'warm'
switch (temp) {
    case 'cold': // "break" needed to prevent fall through, bad JS feature
        console.log('eww cold porridge'); break
    case 'warm':
        console.log('mmm delicious'); break
    case 'hot':
        console.log('I burned my tongue!'); break
    default:
        console.log('Unknown temperature!'); break
}
```

in Python is

```python
temp = 'warm'
if temp == 'cold':
    print('eww cold porridge')
elif temp == 'warm':
    print('mmm delicious')
elif temp == 'hot':
    print('I burned my tongue!')
else:
    print('Unknown temperature!')
```

There's some obvious repetition in the Python version: we must repeat `temp ==` for each condition. The new "structural pattern matching" feature provides both a case-based control flow that looks less repetitive and some neat ways to introspect data structures.

In [49]:
# our example rewritten with match...case
temp = 'warm'
match temp:
    case 'cold':
        print('eww cold porridge')
    case 'warm':
        print('mmm delicious')
    case 'hot':
        print('I burned my tongue!')
    case _: # this is the fallback/default
        print('Unknown temperature!')

mmm delicious


There are some decent reasons to prefer this over a lengthy `if` condition: it's a bit less repetitive, the indentation makes it more clear all these conditions are related to each other, and it's arguably easier to add or remove new conditions. But it's not a huge difference.

The real power of `match...case` is in the ability to destructure data structures in the conditions. It allows much more nuanced pattern matching than simple equality and also lets us assign variables while matching. For example, let's look at using match to analyze a Pymarc Subfield (which is a named tuple with `code` and `value` properties like `Subfield(code='a', value='Title String')`).

In [5]:
from pymarc import Subfield

subject_subfield = Subfield(code='a', value='Creative non-fiction')
match subject_subfield:
    case Subfield(code="a", value=value):
        print("Topical term or geographic name entry element: ", value)
    case Subfield(code="b", value=topical_term):
        print("Topical term following geographic name entry element: ", topical_term)
    case Subfield(code="c", value=location) if location == "Venice": # we can add an if-condition to a case
        print(f"It's {location}!")
    case Subfield(code="c", value=location):
        print("Location of event (spoilers: it ain't Venice): ", location)
    case Subfield(code='v', value=genre):
        print('Genre: ', genre)
    case Subfield(code=code, value="2666"):
        print(f"Whatever this {code} subfield is, its value is 2666.")
    case _: # this is the fallback/default
        print('Unknown subfield!')

Topical term or geographic name entry element:  Creative non-fiction


The match looks _inside_ the subfield object to see if it matches. We can specify a sort of structure for the pattern and assign pieces of that structure to variables. If my example is too confusing, the official docs have [a more direct (x, y) coordinate example](https://docs.python.org/3/whatsnew/3.10.html#patterns-with-a-literal-and-variable).

Sort of like dataclasses, I see tremendous value in `match` for working with APIs or variable data structures, _especially deeply nested ones_. One of my least favorite parts of working with any data is when it's deeply nested and each layer is uncertain. I end up writing lots of `if` conditions to check if a key exists, then I need type checks to do different things if it's a list, a dict, etc. `match` makes this _much_ easier.

In [26]:
# not every API is this poorly behaved, but some are...look at the locations
patrons = [
    {"name": "Hephaestus", "location": None},
    {"name": "Artemis", },
    {"name": "Poseidon", "location": "The Ocean"},
    {"name": "Hera", "location": {"city": "Athens", "country": "Greece"}},
    [1, 2, 3]
]

for patron in patrons:
    match patron:
        case {"name": name, "location": location} if isinstance(location, str):
            print(f"{name} is in {location}.")
        case {"name": name, "location": {"city": city, "country": country}}:
            print(f"{name} is in {city}, {country}.")
        case {"name": name}: # catches the first two patrons and anyone else with a name, so it has to go last
            print(f"{name} has no location.")
        case _:
            print("We don't even have a name for this patron!")

Hephaestus has no location.
Artemis has no location.
Poseidon is in The Ocean.
Hera is in Athens, Greece.
We don't even have a name for this patron!


Despite the `location` property being either absent, `None`, a string, or a dict, we can pretty easily parse out its contents. This might have involved a lot of nested `if` conditions checking for the existence and type of the location otherwise.
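
For comparison, here is roughly what the same logic looks like with hand-written checks. This is a sketch of one possible equivalent, and `describe_patron` is a name invented for this example.

```python
patrons = [
    {"name": "Hephaestus", "location": None},
    {"name": "Artemis"},
    {"name": "Poseidon", "location": "The Ocean"},
    {"name": "Hera", "location": {"city": "Athens", "country": "Greece"}},
    [1, 2, 3],
]

def describe_patron(patron):
    # every check that match performed implicitly has to be spelled out
    if not isinstance(patron, dict) or "name" not in patron:
        return "We don't even have a name for this patron!"
    name = patron["name"]
    location = patron.get("location")
    if isinstance(location, str):
        return f"{name} is in {location}."
    if isinstance(location, dict) and "city" in location and "country" in location:
        return f"{name} is in {location['city']}, {location['country']}."
    return f"{name} has no location."

for patron in patrons:
    print(describe_patron(patron))
```

It is not much longer for this small case, but each extra layer of nesting adds another round of `isinstance` and key checks, while a pattern just grows by one more level.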

The blog post ["Real-world match/case"](https://nedbatchelder.com/blog/202312/realworld_matchcase.html) does a good job illustrating this sort of usage with an "event" object from the GitHub API which has different properties depending upon the type of event being represented.

A concrete example of structures like this is an XML document after it has been parsed with `xmltodict`, a library that converts an XML document tree into a nested dictionary. If our documents vary in structure and have deeply nested elements (happens quite a bit in MODS, for instance), then code can look like this:

```python
name = xml.get("mods", {}).get("name", {}).get("namePart", {})
# name can be either a list or a string or another dict if there are further nested elements!
```