# Modern Python

If you haven't been doing python for about the last 3 years, or you haven't been keeping up with all the changes
since 3.7, then python may feel like a new language for you.

This notebook will be a gentle introduction to python for engineers familiar with programming in another language.  After
the basics are out of the way, it covers modern features in depth.  This guide will also introduce
you to popular python packages and how best to manage your python project.

- Prerequisites
    - Installing VS Code
    - Installing python devel deps
    - Installing a python version manager
    - Virtualenvs
- The Basics
    - Basic Data types
        - Maps, lists, sets, int, float, str, bytes, file-like
    - Intermediate data types
        - Iterables, Sequences, Unions, Intersections, Callable
    - Dev Tools part 1
        - poetry
        - autopep8
        - ruff
        - git hooks
    - Iterator protocol
        - Comprehensions (list, dict, and generators)
    - Scoping
        - Closures
        - nonlocal and global
    - Functions
        - Pass by reference (vs Pass by value)
        - The multitude of argument passing options
        - decorators
    - Modules, packages and imports
    - Classes
        - dataclass types (dataclasses, attrs, pydantic, etc)
        - property
        - slots
    - Exceptions
    - Context Managers
- Advanced
    - Generators
    - concurrency in python
        - asyncio
            - event loop and runners
            - Futures, Tasks, and Coroutines
            - aiohttp
        - multiprocessing and threading
        - [ray framework](https://ray.io) (actors)
    - Inheritance
        - Simple Inheritance
        - Multiple Inheritance and MRO
        - Difference between `__new__` and `__init__`
    - Type Theory: [PEP-483](https://peps.python.org/pep-0483/)
        - Protocols/Subtyping: [PEP-544](https://peps.python.org/pep-0544/)
        - ParamSpec: [PEP-612](https://peps.python.org/pep-0612/)
        - TypeVar and Generic: [PEP-484](https://peps.python.org/pep-0484/)
        - Variadic Generics: [PEP-646](https://peps.python.org/pep-0646/)
        - Literal: [PEP-586](https://peps.python.org/pep-0586/) and LiteralString: [PEP-675](https://peps.python.org/pep-0675/)
        - New type annotation syntax: [PEP-695](https://peps.python.org/pep-0695/)
    - Jupyter plotting
    - Dataframes with [duckdb](https://duckdb.org/), [polars](https://pola.rs) and [pyarrow](https://arrow.apache.org/docs/python/index.html)
        - what pyarrow is
        - querying with parquet, arrow and ndjson
    - Faster Python
        - Write a native python module in rust [using pyo3](https://pyo3.rs)
        - Python profiling
            - Why and when
            - cProfile
            - [scalene](https://github.com/plasma-umass/scalene)
        - [Mojo](https://docs.modular.com/mojo/why-mojo.html): a superset of python (once available locally, and not on the playground)

In [None]:
# Install a python development environment
# 
# If you are starting from a brand new environment (e.g., a new laptop or a docker image), run the installer
# module directly with `python -m excursor.core.installer`.  If you already have the following, you can run
# this notebook instead:
# - python 3.10+ (later cells use `match` statements and `X | Y` union syntax)
# - pipx

from excursor.core.installer import Installer
from excursor.core.installer_39 import PythonDevel
from excursor.core.process import Run

# Set up python devel deps
pd = PythonDevel()
# await pd._install_sysdeps()
# await pd._install_asdf()
# await pd._install_poetry()
await pd._create_venv()  # top-level await works in Jupyter/IPython cells, not in a plain .py script
# await pd._create_project("teaching", ["dev", "ds", "notebook", "data"])


## Practical Projects

To help you digest the material and make the training more useful, we will build a couple of small python projects
that make the concepts concrete.

- Command-line HTTP client
    - Start off using the requests library
    - Write data models with pydantic and dataclasses to (de)serialize data
    - Make multiple requests simultaneously with aiohttp/trio
- S3 Library and CLI client
    - get, list, download and upload files to s3
    - parallelize the above with the ray framework
- Duckdb and polars to query JSON, parquet and arrow

## Why Python

So why am I doing this?  I want to show you how python is becoming a very important language and is challenging the reigning kings
in several domains.  It would behoove every engineer to learn this language.  

- It is becoming a universal language for all domains
- Machine Learning, Machine Learning, Machine Learning (did I say Machine Learning?)
- notebooks for data analytics, business reporting, visualization and general experimentation
- Big Data
- serverless
- statistical analysis

### Universal Language

I used to believe there were 3 languages every engineer should learn:

- typescript: for the web world
- python: for automation and science (including data science/machine learning)
- rust: for low-level "run anywhere" (including the OS and embedded)

Two things have changed, though.  First, WebAssembly keeps maturing, and compiling python to wasm is already possible in limited form, which makes learning typescript/javascript much less necessary (other than for working on legacy code).  Second, there is the upcoming language Mojo, which aims to be a systems programming language specialized for Machine Learning.  Like rust or C, pure mojo code will have no garbage collector, yet Mojo aims to be a superset of the python language.

### Machine Learning (it's coming...maybe for us)

With the rise of Machine Learning, python has become an extremely important language.  Whether you like python or not, it is the lingua franca of Machine Learning.  Modular, the company creating mojo, is making it a python superset because all of their customers said the python language was non-negotiable.  And like it or not, as engineers we seriously have to consider how AI will change how and what we do.  Will AI eventually replace our jobs, or just become our helper?  For this reason alone, I think it is very important to become good at python.

### Notebooks

Somewhat related to machine learning is the rise of _notebooks_.  Data scientists use them heavily as experimental tools, but they are also used for analytics queries and BI reporting.  Some even argue that notebooks are better than dashboards from proprietary vendors (e.g., New Relic, Datadog, Splunk, Kibana).

One reason notebooks with python are so nice is that the language is simple enough to execute interactively in a REPL (read-eval-print-loop, for you non-lispers).  Unlike a traditional REPL, a notebook saves the code and lets you rerun any cell.  This makes it ideal for exploratory code and debugging.  It is a big reason almost all data scientists use notebooks, but they are also ideal for ad-hoc exploratory testing, or for running just part of the code (as opposed to a script, which you have to execute from the beginning).  Thus, python is an ideal candidate for QA teams, because regression suites are only half the battle.  Notebooks are also good for experimenting while writing unit tests.

### Serverless

Another trend is serverless, where execution time matters and VM warmup is an issue.  Python import times can be just as bad as VM cold starts, so some care and profiling are still needed.  The rise of serverless has put a big dent in the Java ecosystem, and is one reason Oracle has been working on its Graal technology to produce native code that doesn't run on a JVM.  It has also fueled languages like rust, go and swift, which compile to native machine code.

If/when mojo becomes publicly available, it is also something to keep an eye on for serverless architectures.

### Statistics

Lastly, also related to machine learning, is how we will need to start testing applications that use Machine Learning.  Since AI apps give non-deterministic answers (during training, and possibly during inference depending on the model), you will need to use statistical analysis to test your models or the app using AI.  Python tools like numpy, pytorch, scipy, sympy are the industry standards (R and julia have a small share, but have been slowly dwindling).

## Calling by keyword

You can also call by keyword, as shown here.

In [None]:
value = basic_with_args("john", age=40)
print(value)
# basic_with_args(name="john", 40)  # SyntaxError: positional argument follows keyword argument

# Note I can reassign value
value = basic_with_args(name="mary", age=32)
print(value)

value = basic_with_args("jane", age="29")  # "29" is the wrong type: a type checker will complain, but python still runs it
print(value)
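A function can also control *how* each parameter may be passed: parameters listed before `/` are positional-only (python 3.8+, PEP-570), and parameters after a bare `*` are keyword-only (PEP-3102).  A minimal sketch, using a hypothetical `make_user` function purely for illustration:

```python
def make_user(user_id: int, /, name: str, *, admin: bool = False) -> dict:
    """Hypothetical example: user_id is positional-only, name may be passed
    either way, and admin is keyword-only."""
    return {"id": user_id, "name": name, "admin": admin}


print(make_user(1, "john"))                   # positional for user_id and name
print(make_user(2, name="mary", admin=True))  # admin must be passed by name

# Each of these calls raises TypeError
bad_calls = (
    lambda: make_user(user_id=3, name="jane"),  # user_id is positional-only
    lambda: make_user(4, "tony", True),         # admin is keyword-only
)
for bad_call in bad_calls:
    try:
        bad_call()
    except TypeError as exc:
        print(f"TypeError: {exc}")
```

Positional-only parameters let a library rename a parameter later without breaking callers; keyword-only parameters force flags like `admin` to be spelled out at the call site.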

## Default parameters

You can also set default values.  Some care has to be taken with mutable values, because a default is evaluated only once,
when the `def` statement runs.  Also, default parameters can only be declared after all non-default positional parameters.

The code below also shows an example of a Union type: a type that can be one of several choices.  In algebraic data type
terms, it is a Sum type (classes are Product types).

This example also shows how to import standard library modules to get access to their functionality.

In [None]:
import os
from pathlib import Path


def writer(content: str | bytes, output: str | Path | None = None):
    """Writes content to output path

    Parameters
    ----------
    content : str | bytes
        the data to write to file
    output : str | Path | None
        path of file to write to (if None, a file named "testdata" in the current directory)
    """
    # We defaulted to None, so we check for it
    if output is None:
        output = Path(os.getcwd()) / "testdata"

    # Example of pattern matching (requires python 3.10+)
    match content:
        case str():      # if content is a str type
            mode = "w"   # text mode
        case bytes():    # if content is a bytes type
            mode = "wb"  # for binary add a 'b'
        case _:          # anything else: fail loudly instead of leaving `mode` unset
            raise TypeError(f"expected str or bytes, got {type(content).__name__}")
    with open(output, mode) as f:
        f.write(content)

writer("just a test")  # No argument passed to `output` parameter
# Check your directory

test_path = Path("testdata2")
writer("another test", output=test_path)
bin_path = "binary_data"
writer(b'01234', bin_path)


In [None]:
# delete data files
test_path.unlink()
Path("testdata").unlink()
Path("binary_data").unlink()

## Passing lists and dicts

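A common (some might say too common) idiom in python is unpacking a list (or any iterable) and a dict into a function's
arguments with `*` and `**`.  On the definition side, the same operators collect extra arguments; by convention the
parameters are named `*args` (a tuple of positional arguments) and `**kwargs` (a dict of keyword arguments).

It's best shown with examples.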

In [None]:
from datetime import datetime, timedelta


def lots_of_args(
    mapping: dict,
    answers: tuple[int, int],  # built-in generics (3.9+) replace typing.Tuple
    cwd: str | None,
    completed: bool,
    start: datetime,
    end: datetime,
    timeout: timedelta
):
    print(mapping)
    print(answers)
    print(cwd)
    print(completed)
    print(start)
    print(end)
    print(timeout)
    return end + timeout

# `*` unpacks this list into the first three positional parameters
args = [
    {1: "foo"},
    (100, 200),
    None
]
# `**` unpacks this dict into keyword arguments, matched by name
kwargs = {
    "completed": True,
    "start": datetime.now(),
    "end": datetime.now(),
    "timeout": timedelta(hours=1)
}
lots_of_args(*args, **kwargs)

## Another example

This is a more generic way to write functions.  IMHO this is poor style, and it should generally be reserved for decorators.

In [None]:
def generic(*args, **kwargs):
    # args is a tuple of all positional arguments, kwargs a dict of all keyword arguments
    print(f"arg 1 is {args[0]}")
    print(f"{args[1:]}")
    # example of iteration
    for k, v in kwargs.items():
        print(f"{k} = {v}")


generic("john", [1, 2], datetime.now(), runnable=False)
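Decorators are where the catch-all `*args, **kwargs` signature earns its keep: the wrapper forwards arguments it never inspects.  A minimal sketch of a hypothetical call-logging decorator, using `functools.wraps` from the standard library to keep the wrapped function's name and docstring:

```python
import functools
from collections.abc import Callable
from typing import Any


def log_calls(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: print each call's arguments and result."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # args is a tuple of positional arguments, kwargs a dict of keyword ones
        print(f"calling {func.__name__} with args={args} kwargs={kwargs}")
        result = func(*args, **kwargs)  # forward everything unchanged
        print(f"{func.__name__} returned {result!r}")
        return result

    return wrapper


@log_calls
def add(a: int, b: int = 0) -> int:
    return a + b


add(1, b=2)
print(add.__name__)  # "add", thanks to functools.wraps
```

Typing a wrapper like this precisely, so the checker knows `add` still takes `(a, b=0)`, is what `ParamSpec` (PEP-612, listed in the outline) is for.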

## Typing out *args and **kwargs

It is possible to type annotate `*args` and `**kwargs`, which is more acceptable, but still not as clear as explicit parameters.
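A short sketch (the function `summarize` is hypothetical): the annotation on `*args` or `**kwargs` describes each element, not the container, so `*scores: int` makes `scores` a `tuple[int, ...]` and `**labels: str` makes `labels` a `dict[str, str]`.  When keyword values need different types per key, PEP-692 (python 3.12) allows `**kwargs: Unpack[SomeTypedDict]`, which is not shown here.

```python
def summarize(*scores: int, **labels: str) -> str:
    # scores is a tuple[int, ...]; labels is a dict[str, str]
    total = sum(scores)
    label_text = ", ".join(f"{key}={value}" for key, value in labels.items())
    return f"total={total} ({label_text})"


print(summarize(90, 85, 77, course="math", term="fall"))
# A type checker would flag summarize(90, "85") because "85" is not an int;
# python itself would only fail at runtime, inside sum().
```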

## Scopes

This will be a very brief intro, but hopefully even people already using python will learn a thing or two here.  The first
thing we will go over is the concept of `scopes`.  Python, unlike most modern languages, does not have block scope.

What does `block scope` mean?  In many languages with curly braces, the braces define a scope.  Variables or other symbols
introduced in that scope (e.g., inside the curly braces) only last until the closing brace.

Python is not like that.  First off, it doesn't have curly braces.  But it does have indentation, so you might think
that indentation acts as a kind of scope, just like curly braces do.  This is incorrect.  Python only creates a new scope
for functions (including lambdas and comprehensions), class bodies, and modules.  Names are looked up in the order Local,
Enclosing function, Global (module), then Built-in (the "LEGB" rule).  Python is still lexically scoped, though: which
variable a name refers to is decided by where the code is written, not by who calls it.

The code below illustrates function vs. module scope, and the difference between mutating an object and rebinding a name.

In [None]:
module_scoped = 10

def lookup():
    module_scoped = 100  # creates a new local name; the module-level one is untouched

lookup()
print(f"module_scoped is {module_scoped}") # it's still 10

a_list = [1, 2]

def adder(some_list: list[int], val: int):
    some_list.append(val)

def replacer(some_list: list[int]):
    print(f"id of some_list = {id(some_list)}")
    some_list = []
    print(f"id of some_list is now = {id(some_list)}")

adder(a_list, 3)
print(f"a_list is now {a_list}") # this mutates a_list

print(f"id of a_list is {id(a_list)}")
replacer(a_list) # this is like lookup(), it creates a new binding, leaving the original alone
print(a_list)
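The cell above contrasts function and module scope.  Indented blocks such as `for` and `if` never create a scope of their own, so names bound inside them stay visible afterwards; comprehensions are the exception.  A small sketch (the variable names are illustrative only):

```python
for index in range(3):
    last_square = index * index

# Both names survive the loop: the for block did not create a new scope
print(index)        # 2
print(last_square)  # 4

if True:
    inside_if = "still visible"
print(inside_if)    # still visible

# Comprehensions get their own scope (python 3),
# so the comprehension's `item` does not overwrite the outer one
item = "outer"
squares = [item * item for item in range(3)]
print(item)         # outer
print(squares)      # [0, 1, 4]
```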

In [None]:
# `global` and `nonlocal` are rarely used.  Needing them is often a code smell

def globalizer():
    global foo
    foo = "testing"

globalizer()
print(f"here's foo {foo}")

name = "sean"

def rename(to: str):
    global name
    name = to

rename("john") # because we use global, we reassign it
print(name)

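def wrapper(name: str):
    def inner(greet: str):
        # nonlocal name  # try uncommenting this: inner then rebinds wrapper's `name`
        name = "sean"    # without nonlocal, this creates a new local inside inner
        return f"{greet} {name}"
    print(inner("hi"))
    return name          # "tony" without nonlocal, "sean" with it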

name = "tony"
print(wrapper(name))


## More scoping gotchas

This is a common source of confusion for new pythonistas.

In [None]:
score = 95  # try setting this to 50

def grader(check: int):
    # No need to create a "default" grade up front: grade is scoped to the entire function,
    # not to the if/elif block that assigns it.
    # However, try setting score to 50 and commenting out the marked lines:
    # then no branch assigns grade, and `return grade` raises UnboundLocalError
    if check > 90:
        grade = "A"
    elif check > 80:
        grade = "B"
    elif check > 70:
        grade = "C"
    elif check > 60:
        grade = "D"
    else:              # try commenting this out
        grade = "F"    # and this
    return grade


grader(score)
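A related gotcha: python decides a name's scope for the whole function body at compile time, so an assignment to a name anywhere in a function makes that name local everywhere in the function, even on lines that run before the assignment.  A sketch with an illustrative module-level `counter`:

```python
counter = 0


def bump_broken() -> int:
    # `counter += 1` assigns to counter, so counter is local to this function,
    # and reading it before any local assignment raises UnboundLocalError
    counter += 1
    return counter


def bump_fixed() -> int:
    global counter  # opt in to rebinding the module-level name
    counter += 1
    return counter


try:
    bump_broken()
except UnboundLocalError as exc:
    print(f"UnboundLocalError: {exc}")

print(bump_fixed())  # 1
```

Given the earlier remark that `global` is usually a code smell, the cleaner fix in real code is to pass the value in and return the new one.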

## Closures and Functions

Understanding scopes is necessary to make use of closures and nested functions.  We already showed several examples
of functions.
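As a preview, here is a minimal closure sketch (`make_counter` is a hypothetical name).  The inner function keeps access to `count` from the enclosing call even after `make_counter` has returned, and `nonlocal` lets it rebind that variable.

```python
from collections.abc import Callable


def make_counter(start: int = 0) -> Callable[[], int]:
    count = start

    def increment() -> int:
        nonlocal count  # rebind the enclosing function's `count`, not a new local
        count += 1
        return count

    return increment


first = make_counter()
second = make_counter(10)
print(first(), first(), second())  # 1 2 11: each closure has its own `count`
```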