# GHSC HazDev Summer Python Tutorial Series
## Typing, Pydantic, and Object Oriented Programming
August 11, 2021


## Python Typing

### Dynamically typed  
A variable can be reassigned to a value of a different type.

### Strongly typed  
Values are not implicitly converted between types.

In [1]:
# Dynamically typed
a = 1
print(type(a))
a = "1"
print(type(a))

# Strongly typed
try:
    "1" + 1
except TypeError as e:
    # str + int is not implicitly converted, so Python raises TypeError
    print(e)

<class 'int'>
<class 'str'>
can only concatenate str (not "int") to str


## Type Hints

- Python 3.5+ (`typing` module); variable annotations require 3.6+
- Documentation as code
- Power IDE autocompletion and error highlighting (e.g. IntelliSense)
- Use `mypy` to check for type errors in a CI/CD pipeline

  > The Python **runtime does not enforce function and variable type annotations**.
  > They can be used by third party tools such as type checkers, IDEs, linters, etc.  
  > https://docs.python.org/3/library/typing.html
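
Because annotations are only metadata, Python will call an annotated function with the "wrong" types without complaint. A minimal sketch (the `add` function is hypothetical, just for illustration); a type checker such as `mypy` would flag the call below, but the interpreter does not:

```python
from typing import get_type_hints


def add(a: int, b: int) -> int:
    """Annotated for int arithmetic, but nothing checks the types at runtime."""
    return a + b


# the annotations are stored on the function object...
hints = get_type_hints(add)
print(hints)

# ...but not enforced: str + str simply concatenates
result = add("1", "2")
print(result)
```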


### Typing Syntax


In [2]:
# variable example
# name: type = <value>
# (or name = <value>)
name: str = "World"


# function example
# def function_name(<parameters>) -> return_type:
def greet(greeting: str, name: str) -> str:
    return f"{greeting}, {name}"


result = greet("Hello", name)
print(result)

Hello, World
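
Type hints can also describe containers and values that may be missing. A small sketch using the `typing` module (the `largest` function is hypothetical; the magnitudes come from the event list later in this notebook). On Python 3.9+ the built-in `list[float]` can replace `List[float]`:

```python
from typing import List, Optional


def largest(magnitudes: List[float]) -> Optional[float]:
    """Return the largest magnitude, or None when the list is empty."""
    if not magnitudes:
        return None
    return max(magnitudes)


magnitudes: List[float] = [8.2, 6.7, 6.0]
print(largest(magnitudes))  # 8.2
print(largest([]))  # None
```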


### Type Hints Example

In [3]:
from examples.using_json import get_catalog
import inspect

print(inspect.getsource(get_catalog))


def get_catalog(
    starttime: datetime,
    endtime: datetime,
    url: str = DEFAULT_URL,
    **kwargs,
) -> Dict:
    """Load events from the USGS Earthquake Web Service.

    Arguments
    ---------
    starttime:
        minimum event time
    endtime:
        maximum event time
    url:
        url for FDSN Event web service query endpoint
    **kwargs
        other keyword arguments are passed as query parameters
        see https://earthquake.usgs.gov/fdsnws/event/1/

    Returns
    -------
    GeoJson Feature Collection as Dict.
    """
    response = requests.get(
        url=url,
        params={
            "endtime": endtime.isoformat(),
            "format": "geojson",
            "starttime": starttime.isoformat(),
            **kwargs,
        },
    )
    logging.info(f"Loaded {response.url}")
    response.raise_for_status()
    # response is json (a geojson featurecollection)
    catalog = response.json()
    return catalog



In [4]:
# load events and show event data structure
from dateutil.parser import isoparse
from examples.using_json import get_catalog
from examples.time import parse_milliseconds

catalog = get_catalog(
    starttime=isoparse("2021-01-01"),
    endtime=isoparse("2021-08-11"),
    producttype="finite-fault",
)
events = catalog["features"]
events[0]


{'type': 'Feature',
 'properties': {'mag': 8.2,
  'place': '104 km SE of Perryville, Alaska',
  'time': 1627539347536,
  'updated': 1628663149409,
  'tz': None,
  'url': 'https://earthquake.usgs.gov/earthquakes/eventpage/ak0219neiszm',
  'detail': 'https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=ak0219neiszm&format=geojson',
  'felt': 268,
  'cdi': 8.1,
  'mmi': 7.355,
  'alert': 'yellow',
  'status': 'reviewed',
  'tsunami': 1,
  'sig': 1252,
  'net': 'ak',
  'code': '0219neiszm',
  'ids': ',ak0219neiszm,us6000f02w,at00qwzteb,pt21210001,usauto6000f02w,',
  'sources': ',ak,us,at,pt,usauto,',
  'types': ',dyfi,finite-fault,general-text,ground-failure,impact-link,internal-moment-tensor,internal-origin,losspager,moment-tensor,oaf,origin,phase-data,shakemap,trump-losspager,trump-shakemap,',
  'nst': None,
  'dmin': None,
  'rms': 0.8,
  'gap': None,
  'magType': 'mww',
  'type': 'earthquake',
  'title': 'M 8.2 - 104 km SE of Perryville, Alaska'},
 'geometry': {'type': 'Point', 'co

In [5]:
# format list of events
print(f"{len(events)} matching events:")
for event in events:
    props = event["properties"]
    time = parse_milliseconds(props["time"]).date().isoformat()
    print(f"{time} - M{props['mag']:.1f} {props['place']}")


11 matching events:
2021-07-29 - M8.2 104 km SE of Perryville, Alaska
2021-07-21 - M6.7 71 km S of Punta de Burica, Panama
2021-07-08 - M6.0 Antelope Valley, CA
2021-05-21 - M7.3 Southern Qinghai, China
2021-03-20 - M7.0 30 km E of Ishinomaki, Japan
2021-03-04 - M8.1 Kermadec Islands, New Zealand
2021-03-04 - M7.4 Kermadec Islands, New Zealand
2021-02-13 - M7.1 73 km ENE of Namie, Japan
2021-02-10 - M7.7 southeast of the Loyalty Islands
2021-01-19 - M6.4 26 km SW of Pocito, Argentina
2021-01-11 - M6.7 29 km SSW of Turt, Mongolia
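
The `parse_milliseconds` helper comes from the tutorial's `examples.time` module, whose source is not shown. Assuming it converts the web service's epoch-millisecond `time` values into UTC datetimes, a minimal stand-in could look like this (hypothetical implementation, not the tutorial's code):

```python
from datetime import datetime, timedelta, timezone

# Unix epoch in UTC; the "time" property counts milliseconds from here
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_milliseconds(milliseconds: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime.

    Hypothetical stand-in for examples.time.parse_milliseconds.
    timedelta arithmetic keeps the millisecond part exact (no float rounding).
    """
    return EPOCH + timedelta(milliseconds=milliseconds)


print(parse_milliseconds(1627539347536))  # 2021-07-29 06:15:47.536000+00:00
```

Using `timedelta` instead of `datetime.fromtimestamp(ms / 1000)` avoids floating point division, so the milliseconds survive unchanged.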


## VSCode Extensions

Python specific:
- Jupyter
- Mypy
- Python
  - (try the Black formatter)

General:
- Formatting Toggle
- GitLens
- Prettier


## Pydantic

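- Python 3.6+
- uses type hints for JSON parsing/formatting
- **validates (and, where possible, coerces) data against type hints at runtime**
- generates a constructor from the annotated fields (more on constructors later)
- supports custom validators and parsing logic
- used by FastAPI for web services; Typer, from the same author, applies the same type-hint approach to command line interfaces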
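
To make "enforces type hints at runtime" concrete, here is a stdlib-only sketch of the idea. This is **not** how pydantic works internally: it is a hypothetical `dataclass` that checks each field against its annotation after `__init__` runs. Unlike pydantic it only handles plain classes such as `int` and `str` (not `List[int]` or `Optional[...]`) and never coerces values (pydantic would accept `"1"` for an `int` field).

```python
from dataclasses import dataclass, fields


@dataclass
class CheckedUser:
    """Hypothetical illustration: validate field types on construction."""

    id: int
    name: str

    def __post_init__(self) -> None:
        # field.type is the annotation itself (e.g. the class int)
        for field in fields(self):
            value = getattr(self, field.name)
            if not isinstance(value, field.type):
                raise TypeError(
                    f"{field.name}: expected {field.type.__name__}, "
                    f"got {type(value).__name__}"
                )


print(CheckedUser(id=1, name="Jill"))
try:
    CheckedUser(id="abc", name="Jill")
except TypeError as e:
    print(e)  # id: expected int, got str
```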

### Pydantic syntax

In [6]:
import json

from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str


user = User(id=1, name="Jill")
print(user.name)
print(user.json())

user2_json = """
{
    "id": 5, 
    "name": "Jack"
}
"""
user2 = User(**json.loads(user2_json))
print(user2.json(exclude={"id"}))

Jill
{"id": 1, "name": "Jill"}
{"name": "Jack"}


### JSON Schema

In [7]:
print(User.schema_json())

{"title": "User", "type": "object", "properties": {"id": {"title": "Id", "type": "integer"}, "name": {"title": "Name", "type": "string"}}, "required": ["id", "name"]}


### Validation

In [8]:
from pydantic import ValidationError

try:
    User(id="abc", name="123")
except ValidationError as e:
    print(e)

1 validation error for User
id
  value is not a valid integer (type=type_error.integer)


### Pydantic example

In [9]:
# show function source from examples/using_pydantic/get_catalog.py
from examples.using_pydantic import get_catalog

print(inspect.getsource(get_catalog))

def get_catalog(
    starttime: datetime,
    endtime: datetime,
    url: str = DEFAULT_URL,
    **kwargs,
) -> EarthquakeCatalog:
    """Load events from the USGS Earthquake Web Service.

    Arguments
    ---------
    starttime:
        minimum event time
    endtime:
        maximum event time
    url:
        url for FDSN Event web service query endpoint
    **kwargs
        other keyword arguments are passed as query parameters.
        see https://earthquake.usgs.gov/fdsnws/event/1/

    Returns
    -------
    List of matching event geojson features.
    """
    response = requests.get(
        url=url,
        params={
            "endtime": endtime.isoformat(),
            "format": "geojson",
            "starttime": starttime.isoformat(),
            **kwargs,
        },
    )
    logging.info(f"Loaded {response.url}")
    response.raise_for_status()
    # parse response (a geojson featurecollection)
    return EarthquakeCatalog(**response.json())



In [10]:
# load events and show event data structure
from dateutil.parser import isoparse
from examples.using_pydantic import get_catalog

catalog = get_catalog(
    starttime=isoparse("2021-01-01"),
    endtime=isoparse("2021-08-11"),
    producttype="finite-fault",
)
events = catalog.features
events[0]


EarthquakeSummaryFeature(type='Feature', id='ak0219neiszm', properties=EarthquakeSummaryFeatureProperties(alert='yellow', cdi=8.1, code='0219neiszm', detail='https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=ak0219neiszm&format=geojson', dmin=None, felt=268, gap=None, ids={'at00qwzteb', 'pt21210001', 'us6000f02w', 'ak0219neiszm', 'usauto6000f02w'}, mag=8.2, magType='mww', mmi=7.355, net='ak', nst=None, place='104 km SE of Perryville, Alaska', rms=0.8, sig=1252, sources={'ak', 'pt', 'us', 'usauto', 'at'}, status='reviewed', time=datetime.datetime(2021, 7, 29, 6, 15, 47, 536000, tzinfo=tzutc()), title='M 8.2 - 104 km SE of Perryville, Alaska', tsunami=1, type='earthquake', types={'trump-losspager', 'losspager', 'phase-data', 'trump-shakemap', 'origin', 'dyfi', 'finite-fault', 'internal-moment-tensor', 'general-text', 'internal-origin', 'shakemap', 'oaf', 'moment-tensor', 'ground-failure', 'impact-link'}, tz=None, updated=datetime.datetime(2021, 8, 11, 6, 25, 49, 409000, tzinfo=tzu
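
The `EarthquakeSummaryFeatureProperties` model is not shown, but the output indicates that the comma-delimited `ids`, `sources`, and `types` strings (e.g. `',ak,us,at,pt,usauto,'`) come back as Python sets. In pydantic this kind of conversion would typically live in a custom validator; the core transformation is sketched below as a plain, hypothetical helper:

```python
from typing import Set


def parse_comma_set(value: str) -> Set[str]:
    """Split a ',a,b,c,' style string into {'a', 'b', 'c'}.

    Hypothetical helper: the empty pieces produced by the leading and
    trailing commas are dropped.
    """
    return {item for item in value.split(",") if item}


print(sorted(parse_comma_set(",ak,us,at,pt,usauto,")))
# ['ak', 'at', 'pt', 'us', 'usauto']
```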

In [11]:
# format list of events
print(f"{len(events)} matching events")
for event in events:
    props = event.properties
    time = props.time.date().isoformat()
    print(f"{time} - M{props.mag:.1f} {props.place}")


11 matching events
2021-07-29 - M8.2 104 km SE of Perryville, Alaska
2021-07-21 - M6.7 71 km S of Punta de Burica, Panama
2021-07-08 - M6.0 Antelope Valley, CA
2021-05-21 - M7.3 Southern Qinghai, China
2021-03-20 - M7.0 30 km E of Ishinomaki, Japan
2021-03-04 - M8.1 Kermadec Islands, New Zealand
2021-03-04 - M7.4 Kermadec Islands, New Zealand
2021-02-13 - M7.1 73 km ENE of Namie, Japan
2021-02-10 - M7.7 southeast of the Loyalty Islands
2021-01-19 - M6.4 26 km SW of Pocito, Argentina
2021-01-11 - M6.7 29 km SSW of Turt, Mongolia


## Object Oriented Programming

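- "Data with methods": objects bundle state with the functions that operate on it
- A `class` defines the attributes and methods its objects share
- Each `object` is a separate instance of a class, with its own state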


### Class Syntax

In [12]:
# class example
# class ClassName(<parent classes>):
class Greeter(object):
    """Docstring for class.

    Goes here instead of with __init__ method.
    """

    # list attributes with type hints
    greeting: str

    def __init__(self, greeting: str = "Hello"):
        # __init__ is called when creating instance, and initializes state
        self.greeting = greeting

    def greet(self, name: str) -> str:
        # instance methods take "self" as the first parameter,
        # a reference to the instance the method was called on
        return f"{self.greeting}, {name}"


# create a new instance
greeter = Greeter()
# call the instance greet method
print(greeter.greet("world"))

Hello, world
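
Each object created from a class keeps its own state. A short sketch (the class is redefined here so the snippet runs on its own):

```python
class Greeter:
    """Same shape as the Greeter above."""

    def __init__(self, greeting: str = "Hello"):
        self.greeting = greeting

    def greet(self, name: str) -> str:
        return f"{self.greeting}, {name}"


english = Greeter()
spanish = Greeter("Hola")

# changing one instance's state does not affect the other
english.greeting = "Hi"
print(english.greet("world"))  # Hi, world
print(spanish.greet("mundo"))  # Hola, mundo
```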


### Key concepts

#### Encapsulation

Hiding internal state by marking attributes and methods as "public", "protected", or "private".

Python labels these with naming conventions, but does not strictly restrict access.

- `__private` attributes are name-mangled: inside `class Foo`, `self.__name` is stored as `self._Foo__name`. Subclasses therefore get different internal names, which avoids conflicts, and the attribute is awkward to access from outside the class.

- `_protected` attributes indicate that the class uses the attribute for internal state and other code should not modify it.

- `public` attributes are intended for public access and/or modification.

As a general rule, prefer `public` and `_protected` attributes, which stay easy to reach from tests and subclasses.
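
Name mangling in action: a base class and a subclass can both use `__token` without overwriting each other, because each name is rewritten with its own class name. The classes here are hypothetical:

```python
class Base:
    def __init__(self) -> None:
        self.__token = "base"  # stored as self._Base__token

    def base_token(self) -> str:
        return self.__token


class Child(Base):
    def __init__(self) -> None:
        super().__init__()
        self.__token = "child"  # stored as self._Child__token, no collision

    def child_token(self) -> str:
        return self.__token


child = Child()
print(child.base_token(), child.child_token())  # base child
print(sorted(vars(child)))  # ['_Base__token', '_Child__token']
# outside the class, child.__token is not mangled and raises AttributeError
```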


In [13]:
class EncapsulationDemo(object):
    # use private variable to store property value
    __name: str

    @property
    def name(self) -> str:
        print("called name getter")
        return self.__name

    @name.setter
    def name(self, name: str) -> None:
        print(f"called name setter with {name}")
        self.__name = name.capitalize()


instance = EncapsulationDemo()
instance.name = "hello"
print(instance.name)

called name setter with hello
called name getter
Hello


#### Abstraction

Hiding implementation details, by providing a simple or generic interface.


In [14]:
from typing import List


class AbstractDataFactory:
    def __init__(self):
        pass

    # this interface hides details about how to get data
    def get_data(self) -> List[str]:
        raise NotImplementedError()
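
An alternative to raising `NotImplementedError` is the standard library `abc` module: a class with an `@abstractmethod` cannot be instantiated at all, so a missing implementation is caught when the object is created rather than when the method is called. The class names below are hypothetical:

```python
from abc import ABC, abstractmethod
from typing import List


class AbstractDataSource(ABC):
    @abstractmethod
    def get_data(self) -> List[str]:
        """Subclasses must implement this."""


class ListDataSource(AbstractDataSource):
    def __init__(self, rows: List[str]):
        self.rows = rows

    def get_data(self) -> List[str]:
        return list(self.rows)


try:
    AbstractDataSource()
except TypeError as e:
    print("cannot instantiate:", e)

print(ListDataSource(["a", "b"]).get_data())  # ['a', 'b']
```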


#### Inheritance

Extending/overriding existing behavior.

In [15]:
class JsonDataFactory(AbstractDataFactory):
    file: str

    def __init__(self, file: str):
        # call the base class __init__ so it can initialize its own state
        super().__init__()
        self.file = file

    # this is a simpler interface that hides loading/parsing details
    def get_data(self) -> List[str]:
        data = self._load_data()
        return self._parse_data(data)

    def _load_data(self) -> bytes:
        # read from some data source, return bytes
        return "data from file".encode()

    def _parse_data(self, data: bytes) -> List[str]:
        # parse json, convert to rows
        return [data.decode()]


class DatabaseDataFactory(JsonDataFactory):
    # get_data and _parse_data are inherited from the parent class unless overridden
    db_url: str

    def __init__(self, db_url: str):
        # call the base class __init__ so it can initialize its own state;
        # "file" is a placeholder, unused because _load_data is overridden below
        super().__init__(file="file")
        self.db_url = db_url

    def _load_data(self) -> bytes:
        # connect to database and read json data
        return "data from database".encode()
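
The example above *overrides* `_load_data`. A subclass can also *extend* a parent method by calling it through `super()` and post-processing the result. A minimal sketch with hypothetical classes:

```python
from typing import List


class RowSource:
    def get_data(self) -> List[str]:
        return ["row 1", "row 2"]


class NumberedRowSource(RowSource):
    # extends (rather than replaces) the parent behavior via super()
    def get_data(self) -> List[str]:
        rows = super().get_data()
        return [f"{i}: {row}" for i, row in enumerate(rows, start=1)]


print(NumberedRowSource().get_data())  # ['1: row 1', '2: row 2']
```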


#### Polymorphism

Using subclasses in place of a parent class.

In [16]:
class DataReport:
    # only needs data, so any AbstractDataFactory subclass works
    factory: AbstractDataFactory

    def __init__(self, factory: AbstractDataFactory):
        self.factory = factory

    def format(self) -> str:
        data = self.factory.get_data()
        # format report using data
        return "\n".join(data)


print("Using json factory:", DataReport(JsonDataFactory("file")).format())
print(
    "Using database factory:",
    DataReport(DatabaseDataFactory("sqlite:///test.db")).format(),
)


Using json factory: data from file
Using database factory: data from database


### Magic Methods

- `__init__`
  initializer, called to set up each new instance of a class
- `__lt__`, `__eq__`, ...
  comparison operators  
  check out [https://docs.python.org/3/library/functools.html#functools.total_ordering](https://docs.python.org/3/library/functools.html#functools.total_ordering)
- `__len__`
  length, called by the built-in `len()`


> [https://docs.python.org/3/reference/datamodel.html#basic-customization](https://docs.python.org/3/reference/datamodel.html#basic-customization)
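
A short sketch of comparison and length magic methods. `functools.total_ordering` fills in `__le__`, `__gt__`, and `__ge__` from `__eq__` and `__lt__`. The `Magnitude` and `Catalog` classes are hypothetical:

```python
from functools import total_ordering
from typing import List


@total_ordering
class Magnitude:
    def __init__(self, value: float):
        self.value = value

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Magnitude):
            return NotImplemented
        return self.value == other.value

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, Magnitude):
            return NotImplemented
        return self.value < other.value

    def __repr__(self) -> str:
        return f"Magnitude({self.value})"


class Catalog:
    def __init__(self, magnitudes: List[Magnitude]):
        self.magnitudes = magnitudes

    def __len__(self) -> int:
        return len(self.magnitudes)


mags = [Magnitude(6.7), Magnitude(8.2), Magnitude(6.0)]
print(sorted(mags))  # [Magnitude(6.0), Magnitude(6.7), Magnitude(8.2)]
print(Magnitude(8.2) >= Magnitude(7.3))  # True, __ge__ supplied by total_ordering
print(len(Catalog(mags)))  # 3
```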

### SOLID Design Principles

> Robert C. Martin, 2000, Design Principles and Design Patterns, [https://fi.ort.edu.uy/innovaportal/file/2032/1/design_principles.pdf](https://fi.ort.edu.uy/innovaportal/file/2032/1/design_principles.pdf)


#### **S**ingle-responsibility principle

A class should have one, and only one, reason to change.

Use separate classes for data I/O and processing.
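
A minimal sketch of that split, assuming a GeoJSON-like structure similar to the catalog above: one hypothetical class only reads, another only filters, so a change to the input format touches only the reader.

```python
import io
import json
from typing import Dict, List, TextIO


class CatalogReader:
    """Only responsible for I/O: turning a JSON stream into feature dicts."""

    def read(self, stream: TextIO) -> List[Dict]:
        return json.load(stream)["features"]


class MagnitudeFilter:
    """Only responsible for processing: selecting large events."""

    def __init__(self, minimum: float):
        self.minimum = minimum

    def apply(self, events: List[Dict]) -> List[Dict]:
        return [e for e in events if e["properties"]["mag"] >= self.minimum]


stream = io.StringIO(
    '{"features": [{"properties": {"mag": 8.2}}, {"properties": {"mag": 6.0}}]}'
)
sample_events = CatalogReader().read(stream)
print(len(MagnitudeFilter(minimum=7.0).apply(sample_events)))  # 1
```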

#### **O**pen–closed principle

You should be able to extend a class's behavior without modifying it.

Take advantage of polymorphism and inheritance to extend/modify behavior.

#### **L**iskov substitution principle

Derived classes must be substitutable for their base classes.

#### **I**nterface segregation principle

Make fine-grained, client-specific interfaces.

Some clients may only need to read data, and an interface for reading may be easier to implement than one for both reading and writing.
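
One way to express narrow interfaces in Python is `typing.Protocol` (Python 3.8+), which uses structural typing: any class with a matching `read` method satisfies `Reader`, no inheritance needed. A sketch with hypothetical names:

```python
from typing import List, Protocol


class Reader(Protocol):
    def read(self) -> List[str]: ...


class Writer(Protocol):
    def write(self, rows: List[str]) -> None: ...


class MemoryStore:
    """Implements both read and write."""

    def __init__(self) -> None:
        self.rows: List[str] = []

    def read(self) -> List[str]:
        return list(self.rows)

    def write(self, rows: List[str]) -> None:
        self.rows.extend(rows)


class FixedReader:
    """Only implements read, which is all a report needs."""

    def read(self) -> List[str]:
        return ["a", "b"]


def report(source: Reader) -> str:
    # depends only on the narrow Reader interface
    return ", ".join(source.read())


store = MemoryStore()
store.write(["x", "y"])
print(report(store))  # x, y
print(report(FixedReader()))  # a, b
```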

#### **D**ependency inversion principle

Depend on abstractions, not on concretions.

By accepting abstract/base classes as parameters, code is more easily reused.


### Migrating Existing Projects

- Look for code that could benefit from _Encapsulation_, _Abstraction_, _Inheritance_, and _Polymorphism_.

- Similar sequences of if statements spread across a codebase are great candidates.  Extracting to a class can make it easier to support new models/inputs/etc.

- Start small.

- Add a class to implement behavior, and refactor existing code to use the new class. (Tests make refactoring easier).

- Don't overthink abstractions up front.
