# Types in Python
## Alex Rutherford
### X September

# The headlines

- Python is a dynamically typed language
- This trades robustness for ease of development
- Types should be part of documentation
- The `typing` library helps document types and lets tools check them

# A (Horror) Story about Python Types

<video controls src="./figs/python_type_example.mov" />

Using [OneCompiler](https://onecompiler.com/python/42qgur5e6)

## What happened?

- `random.random` is a _function_
- `random.random()` calls a function which returns a _float_ 
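- In Python 2.6, comparisons such as `<` are allowed between most types, so comparing a function to a number raises no error
- `if x:` treats `x` as `True` unless it is falsy: `0`, `None`, `False`, or an empty string or container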
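
The points above can be checked directly in a modern interpreter. This is a minimal sketch assuming Python 3, where the ordering comparison that Python 2.6 allowed now raises a `TypeError`; the names `falsy_values` and `comparison_allowed` are illustrative only.

```python
import random

# A function object is always truthy, so `if random.random:` always runs
function_is_truthy = bool(random.random)

# Calling the function returns a float in the range [0.0, 1.0)
value = random.random()

# Values that `if x:` treats as False
falsy_values = [0, 0.0, None, False, '', [], {}]
all_falsy = not any(bool(v) for v in falsy_values)

# Python 3 refuses to order a function against a number
try:
    random.random < 0.5
    comparison_allowed = True
except TypeError:
    comparison_allowed = False

print(function_is_truthy, type(value).__name__, all_falsy, comparison_allowed)
```

Forgetting the parentheses is therefore a silent bug in Python 2 but a loud one in Python 3 whenever the function is compared, while a bare `if random.random:` stays silent in both.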

# A Common Type Issue

In [73]:
def get_last_add_one(my_list):
    '''A simple function to get the last element of a list and add one'''
    return my_list[-1] + 1

In [56]:
my_list = [1,2,3]

In [57]:
get_last_add_one(my_list)

4

In [71]:
my_dict = {'a' : 1, 'b' : 2, 'c' : 3}

In [59]:
get_last_add_one(my_dict)
# Fails because a dictionary is indexed by key, not by position

KeyError: -1

# Type basics I

- Everything in code has a type
- Some are basic data structures: int, array, dictionary (hash map)
- Some are language specific: `struct`, `NSObject`
- Some are user defined: `MyAwesomeClass`

# Type Basics II

- Python variables don't need to specify their type
- Python variables can change type ('Dynamic Typing'), unlike in C, where the type is declared up front
- `type()` is your friend
- Core Python has `int`,`float`,`str`,`list`,`dict`

```
// Declare a variable
int myNum;

// Assign a value to the variable
myNum = 15;
```

In [60]:
x = 1
type(x)
# Implicitly initialising an integer

int

In [61]:
x = 0.5
type(x)
# Now a float

float

In [62]:
x = 'Hello world'
type(x)
# Now a string

str

In [63]:
type(None)

NoneType

# Aside

- Accidental type changes often arise in PySpark code
- Pandas DataFrames and PySpark DataFrames coexist
```
df = get_spark_df()
...
df = df.toPandas()
...
# Disaster
```
- Better to use different variables
```
sdf = get_spark_df()
...
pdf = sdf.toPandas()
...
# Not disaster
```

# Possible Solutions

1. Verbose documentation
2. Explicit `type()` checks
3. Use a capable editor
4. Add function annotations (`typing` library)

# 1. Verbose Documentation i.e. Doc Strings

## _What makes a good Doc String?_

- Only explain non trivial code
- Describe what the code does
- Describe the arguments and the return value


- See [PEP 257](https://peps.python.org/pep-0257/)

In [64]:
def get_last_add_one(my_list):
    '''Takes my_list of type list and returns the last entry plus one'''
    return my_list[-1] + 1

# This is fine

In [4]:
def get_last_add_one(my_list):
    '''Takes a list and returns the last entry plus one

    Parameters
    ----------
    my_list : list of integers

    Returns
    -------
    integer
    '''
    return my_list[-1] + 1

# This is better

In [None]:
?get_last_add_one
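
The `?` syntax only works in IPython and Jupyter. Outside a notebook, the same docstring can be read with the standard library. A minimal sketch, reusing the function from the cell above:

```python
import inspect


def get_last_add_one(my_list):
    '''Takes a list and returns the last entry plus one

    Parameters
    ----------
    my_list : list of integers

    Returns
    -------
    integer
    '''
    return my_list[-1] + 1


# inspect.getdoc returns the docstring with its indentation cleaned up
doc = inspect.getdoc(get_last_add_one)
print(doc)
```

Calling `help(get_last_add_one)` in a plain Python session prints similar information together with the signature.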

# 2. Explicit `type()` checks

In [65]:
def get_last_add_one(my_list):
    '''Takes my_list of type list and returns the last entry plus one'''
    
    if type(my_list) is not list:
        raise TypeError('my_list must be of type list')
    return my_list[-1] + 1

In [15]:
get_last_add_one({'a': 1})
# Without the type check, this gives an error

KeyError: -1

In [17]:
get_last_add_one({-1: 1})
# Without the type check, this gives unintended behaviour

2
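
With an explicit check in place, both dictionary calls should fail early with the same clear message. The sketch below uses `isinstance` rather than `type()`, an assumption made here so that subclasses of `list` are also accepted; `describe_call` is a hypothetical helper added only to collect the results.

```python
def get_last_add_one(my_list):
    '''Takes my_list of type list and returns the last entry plus one'''
    if not isinstance(my_list, list):
        raise TypeError('my_list must be of type list')
    return my_list[-1] + 1


def describe_call(argument):
    '''Hypothetical helper: return the result, or the error message raised'''
    try:
        return get_last_add_one(argument)
    except TypeError as error:
        return f'TypeError: {error}'


# A list works; both dictionaries are rejected before any indexing happens
results = [
    describe_call([1, 2, 3]),
    describe_call({'a': 1}),
    describe_call({-1: 1}),
]
print(results)
```

The check only covers the container: a list of strings would still pass it and fail later at `+ 1`.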

# 3. Use a Capable Editor

![alt text](./figs/Capture.png)

# 4. Add function annotations (`typing` library)

In [11]:
from typing import List

def get_last_add_one(my_list: List[int]) -> int:
    '''Takes my_list of type list and returns the last entry plus one'''
    return my_list[-1] + 1

In [7]:
get_last_add_one.__annotations__

{'my_list': typing.List[int], 'return': int}

In [8]:
?get_last_add_one

Signature: get_last_add_one(my_list: List[int]) -> int
Docstring: Takes my_list of type list and returns the last entry plus one
File:      c:\users\alex.rutherford\appdata\local\temp\ipykernel_3936\1227971960.py
Type:      function

In [12]:
get_last_add_one({'a' : 1})

KeyError: -1

- Clear documentation
- Helps IDEs do type checking
- Helps static type checkers, e.g. mypy
- Not enforced at runtime: the call above still fails with `KeyError`

- See [StackOverflow Discussion](https://stackoverflow.com/questions/32557920/what-are-type-hints-in-python-3-5)
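
Annotations are metadata that Python stores but does not check when the function is called. A minimal sketch of both halves, reading the hints with `typing.get_type_hints` and then passing the wrong type anyway (the variable names `hints` and `raised` are illustrative):

```python
from typing import List, get_type_hints


def get_last_add_one(my_list: List[int]) -> int:
    '''Takes my_list of type list and returns the last entry plus one'''
    return my_list[-1] + 1


# The hints are available for tools and for documentation
hints = get_type_hints(get_last_add_one)

# Python itself ignores them: the dict is accepted and fails at indexing
try:
    get_last_add_one({'a': 1})
    raised = None
except KeyError as error:
    raised = error

print(hints)
print(repr(raised))
```

A static checker such as mypy, run over a file containing that call (for example `mypy my_script.py`, where the file name is a placeholder), should flag the incompatible argument without executing the code.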

# Look at some examples

In [77]:
import matplotlib.pyplot as plt

In [78]:
plt.scatter.__annotations__

{'x': 'float | ArrayLike',
 'y': 'float | ArrayLike',
 's': 'float | ArrayLike | None',
 'c': 'ArrayLike | Sequence[ColorType] | ColorType | None',
 'marker': 'MarkerType | None',
 'cmap': 'str | Colormap | None',
 'norm': 'str | Normalize | None',
 'vmin': 'float | None',
 'vmax': 'float | None',
 'alpha': 'float | None',
 'linewidths': 'float | Sequence[float] | None',
 'edgecolors': "Literal['face', 'none'] | ColorType | Sequence[ColorType] | None",
 'plotnonfinite': 'bool',
 'return': 'PathCollection'}
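
The values above are strings rather than type objects, which is what `__annotations__` holds when annotations are stored unevaluated. A small sketch of the same effect, using a hypothetical function `scale` whose annotations are written as string literals; `typing.get_type_hints` resolves them back into real types:

```python
from typing import Optional, get_type_hints


def scale(value: 'Optional[float]', factor: 'int' = 2) -> 'Optional[float]':
    '''Hypothetical function with annotations written as strings'''
    return None if value is None else value * factor


# The raw annotations are plain strings
raw = scale.__annotations__

# get_type_hints evaluates the strings in the function's module namespace
resolved = get_type_hints(scale)

print(raw)
print(resolved)
```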

In [79]:
print(plt.scatter.__doc__)

A scatter plot of *y* vs. *x* with varying marker size and/or color.

Parameters
----------
x, y : float or array-like, shape (n, )
    The data positions.

s : float or array-like, shape (n, ), optional
    The marker size in points**2 (typographic points are 1/72 in.).
    Default is ``rcParams['lines.markersize'] ** 2``.

    The linewidth and edgecolor can visually interact with the marker
    size, and can lead to artifacts if the marker size is smaller than
    the linewidth.

    If the linewidth is greater than 0 and the edgecolor is anything
    but *'none'*, then the effective size of the marker will be
    increased by half the linewidth because the stroke will be centered
    on the edge of the shape.

    To eliminate the marker edge either set *linewidth=0* or
    *edgecolor='none'*.

c : array-like or list of colors or color, optional
    The marker colors. Possible values:

    - A scalar or sequence of n numbers to be mapped to colors using
      *cmap* and *norm*.
   