# Modern Python

If you haven't written python in the last 3 years or so, or you haven't kept up with the changes since 3.7, then python
may feel like a new language to you.

This notebook is a gentle introduction to python for engineers who already program in another language. Once the basics
are out of the way, it covers modern features in depth. It also introduces popular python packages and how best to
manage a python project.

- Prerequisites
    - Installing VS Code
    - Installing python devel deps
    - Installing a python version manager
    - Virtualenvs
- The Basics
    - Basic Data types
        - Maps, lists, sets, int, num, str, bytes, file-like 
    - Intermediate data types
        - Iterables, Sequences, Unions, Intersections, Callable
    - Dev Tools part 1
        - poetry
        - autopep8
        - ruff
        - git hooks
    - Iterator protocol
        - Comprehensions (list, dict, and generators)
    - Scoping
        - Closures
        - nonlocal and global
    - Functions
        - Pass by reference (vs Pass by value)
        - The multitude of argument passing options
        - decorators
    - Modules, packages and imports
    - Classes
        - dataclass types (dataclasses, attrs, pydantic, etc)
        - property
        - slots
    - Exceptions
    - Context Managers
- Advanced
    - Generators
    - concurrency in python
        - asyncio
            - event loop and runners
            - Futures, Tasks, and Coroutines
            - aiohttp
        - multiprocessing and threading
        - [ray framework](https://ray.io) (actors)
    - Inheritance
        - Simple Inheritance
        - Multiple Inheritance and MRO
        - Difference between `__new__` and `__init__`
    - Type Theory: [PEP-483](https://peps.python.org/pep-0483/)
        - Protocols/Subtyping: [PEP-544](https://peps.python.org/pep-0544/)
        - ParamSpec: [PEP-612](https://peps.python.org/pep-0612/)
        - TypeVar and Generic: [PEP-484](https://peps.python.org/pep-0484/)
        - Variadic Generics: [PEP-646](https://peps.python.org/pep-0646/)
        - Literal and LiteralString: [PEP-675](https://peps.python.org/pep-0675/)
        - New type annotation syntax: [PEP-695](https://peps.python.org/pep-0695/)
    - Jupyter plotting
    - Dataframes with [duckdb](https://duckdb.org/), [polars](https://pola.rs) and [pyarrow](https://arrow.apache.org/docs/python/index.html)
        - what pyarrow is
        - querying with parquet, arrow and ndjson
    - Faster Python
        - Write a native python module in rust [using pyo3](https://pyo3.rs)
        - Python profiling
            - Why and when
            - cProfile
            - [scalene](https://github.com/plasma-umass/scalene)
        - [Mojo](https://docs.modular.com/mojo/why-mojo.html): a superset of python (once available locally, and not on the playground)

## Installing prerequisites

Make sure that brew is installed on your mac before proceeding. With brew in place, you need at a minimum the
following:

- python 3.9 (ideally 3.11)
- pip (should have been installed with python through brew)
- pipx 
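
If you are unsure whether your current setup already meets these minimums, a quick check along these lines can tell you. This is only a sketch: the helper names `check_python` and `find_tools` are made up for illustration, and the thresholds simply mirror the list above.

```python
import shutil
import sys

MIN_VERSION = (3, 9)         # minimum from the list above
PREFERRED_VERSION = (3, 11)  # "ideally 3.11"


def check_python(version=None):
    """Classify a (major, minor) tuple against the versions listed above."""
    if version is None:
        version = sys.version_info[:2]
    if version >= PREFERRED_VERSION:
        return "ok"
    if version >= MIN_VERSION:
        return "old-but-usable"
    return "too-old"


def find_tools(names=("pip", "pipx")):
    """Map each tool name to its location on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


print("python:", check_python())
for tool, location in find_tools().items():
    print(f"{tool}: {location or 'not found on PATH'}")
```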

Before proceeding, I highly recommend also installing the Oh My Zsh project to make your zsh environment nicer.  If you
have a .zshrc file, it will make a backup of it, and install its own basic .zshrc file.  That's why I always install it
first thing when I get a new development environment.

Follow the directions on the [brew webpage](https://brew.sh/) to install brew if you don't have it already.  Next,
install python3 and some python development libraries.

In [None]:
!brew install python3
!brew install openssl readline sqlite3 xz zlib tcl-tk

### Installing pipx

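pipx is a python utility that acts somewhat like `npm install -g`.  It installs each tool's executable in (by default)
`~/.local/bin`, which should be on your PATH, and keeps that tool's dependencies in a dedicated virtual environment.
This isolation makes it a safe way to install python command-line tools, because their dependencies cannot clash with
each other or with your system packages.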

In [None]:
%pip install pipx
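
Tools installed by pipx only work from the shell if `~/.local/bin` is actually on your PATH. The sketch below is one way to check that from python; `is_on_path` is a hypothetical helper written for this notebook, not part of pipx. If the directory is missing, pipx's own `pipx ensurepath` command is meant to add it.

```python
import os
from pathlib import Path


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string.

    `path_value` defaults to the current PATH environment variable.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = Path(directory).expanduser()
    entries = [Path(entry).expanduser() for entry in path_value.split(os.pathsep) if entry]
    return target in entries


# pipx's default location for executables, as described above
print("~/.local/bin on PATH:", is_on_path("~/.local/bin"))
```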

## Alternative installation

If you are brave you can try running excursor's own installer.  It's been somewhat tested on linux, but it is difficult
to test on mac, since you technically only get one shot at it.

In [None]:
# Install a python development environment
# 
# If you are coming from a brand new environment (e.g. a new laptop or installing to a docker image), you should run the
# module as a script
# 
# For example:  python3 -m excursor.core.installer_39
# 
# If you already have the following, you can run
# this notebook instead: 
# - python 3.9+
# - pipx

from excursor.core.installer_39 import PythonDevel

# Set up python devel deps
pd = PythonDevel()
# await pd._install_sysdeps()
await pd._install_asdf()
# await pd._install_poetry()
# await pd._create_venv()
# await pd._create_project("teaching", ["dev", "ds", "notebook", "data"])


## Practical Projects

To help you digest the material and make the training more useful, we will build a few small python projects that make
the concepts more concrete.

- Command line HTTP client
    - Start off using the requests library
    - Write data models with pydantic and dataclasses to (de)serialize data
    - Make multiple requests simultaneously with aiohttp/trio
- S3 Library and CLI client
    - get, list, download and upload files to s3
    - parallelize the above with the ray framework
- Duckdb and polars to query JSON, parquet and arrow

## Why Python

So why am I doing this?  I want to show you how python is becoming a very important language and is challenging the reigning kings
in several domains.  It would behoove every engineer to learn this language.  

- It is becoming a universal language for all domains
- Machine Learning, Machine Learning, Machine Learning (did I say Machine Learning?)
- notebooks for data analytics, business reporting, visualization and general experimentation
- Big Data
- serverless
- statistical analysis

### Universal Language

I used to believe there were 3 languages every engineer should learn:

- typescript: for the web world
- python: for automation and science (including data science/machine learning)
- rust: for low-level "run anywhere" (including the OS and embedded)

Two things have changed, though.  First, WebAssembly is becoming more mature all the time, and compiling python to wasm is already possible in limited form, so learning typescript/javascript is no longer really necessary (other than for working on legacy code).  Second, the upcoming language Mojo aims to be a systems programming language specialized for Machine Learning.  Like rust or C, pure mojo code will have no garbage collector, but at the same time the language aims to be a superset of python.

### Machine Learning (it's coming...maybe for us)

I think everyone should read this [article from Scientific American about how AI knows things it wasn't taught](https://www.scientificamerican.com/article/how-ai-knows-things-no-one-told-it/), because it's an eye-opener.  While AI still has some way to go before it understands high level architecture and can "connect the dots", ML research is advancing _rapidly_.  The things that ML already does better than humans are concerning, from fund management to medical diagnosis.  In a research paper on GPT-4, the model was given an Amazon technical interview.  It scored 100% and did it in 4 minutes.  Regardless of whether AI becomes another tool in the engineer's toolbox, or whether it can (or will) replace engineering jobs, it would behoove engineers to become familiar with Machine Learning...and that means knowing python.

With the rise of Machine Learning, python has become an extremely important language.  Whether you like python or not, it is the lingua franca of Machine Learning.  Modular, the company creating mojo, is building it because all of their customers said the python language was non-negotiable.  And also like it or not, as engineers, we seriously have to consider how AI will change how and what we do.  Will AI eventually replace our jobs or just become a helper for us?  I therefore think it is very important to become good at python for this reason alone.

### Notebooks

Somewhat related to machine learning is the rise of _notebooks_.  They are heavily used as experimental tools by data scientists, but also for analytics querying and BI reporting.  Some even propose that notebooks are better than the dashboards from proprietary vendors (e.g. New Relic, Datadog, Splunk, Kibana).

One of the reasons notebooks with python are so nice is that the python language is simple enough to execute code in a REPL (read-eval-print loop, for you non-lispers).  Unlike a traditional REPL, a notebook saves the code and lets you rerun the code in any cell.  This makes it ideal for exploratory code and debugging.  This is a big reason almost all data scientists use notebooks, but they are also ideal for ad-hoc exploratory testing, or for running just a part of the code (as opposed to writing a script, which you have to execute from the beginning).  Thus, python is an ideal candidate for QA teams, because regression suites are only half the battle.  Notebooks are also good for experimenting your way towards unit tests.

### Serverless

Another trend is serverless, where execution time has become important and VM warmup is an issue.  Python import times can be just as bad as VM cold starts, so some care and profiling are still needed.  The rise of serverless has put a big dent in the Java ecosystem, and is a reason Oracle has been working on its Graal technology to produce native code that doesn't run on a JVM.  It has also been the fuel for languages like rust, go and swift, which compile to native machine code.
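
To get a feel for import cost, you can time an import yourself.  The sketch below uses only the standard library; `time_import` is a hypothetical helper, and `json` is just a placeholder for whatever heavy dependency your function actually loads.  For a per-module breakdown, the interpreter also has a built-in `python -X importtime` option.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Import `module_name` and return (module, seconds taken).

    Only a first import is meaningful: later imports are served from the
    sys.modules cache and return almost immediately.
    """
    already_loaded = module_name in sys.modules
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    if already_loaded:
        print(f"note: {module_name} was already cached in sys.modules")
    return module, elapsed


module, seconds = time_import("json")  # placeholder module name
print(f"import {module.__name__}: {seconds * 1000:.2f} ms")
```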

If/when mojo becomes publicly available, it is also something to keep an eye on for serverless architectures.

### Statistics

Lastly, and also related to machine learning, is how we will need to start testing applications that use Machine Learning.  Since AI apps give non-deterministic answers (during training, and possibly during inference depending on the model), you will need to use statistical analysis to test your models or the apps that use them.  Python tools like numpy, pytorch, scipy and sympy are the industry standards (R and julia have a small share, but it has been slowly dwindling).
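
As a rough illustration of what testing statistically can look like, the sketch below runs a non-deterministic function many times and computes a lower confidence bound on its accuracy, instead of asserting on a single answer.  Everything here is an assumption for illustration: `flaky_model` is a made-up stand-in for a real model, the Wilson score interval is just one reasonable choice of statistic, and only the standard library is used so it runs anywhere (scipy and friends offer much more).

```python
import math
import random


def wilson_lower_bound(successes, trials, z=1.96):
    """Lower end of the Wilson score interval for a success rate (z=1.96 is ~95%)."""
    if trials == 0:
        return 0.0
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = p_hat + z**2 / (2 * trials)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    return (centre - margin) / denom


def flaky_model(x, rng):
    """Hypothetical stand-in for a model: doubles x, but is wrong about 10% of the time."""
    return x * 2 if rng.random() < 0.9 else -1


def accuracy_lower_bound(predict, cases, runs=200, seed=0):
    """Run every (input, expected) case `runs` times and bound the true accuracy from below."""
    rng = random.Random(seed)  # fixed seed so the check itself is repeatable
    successes = trials = 0
    for _ in range(runs):
        for x, expected in cases:
            trials += 1
            successes += predict(x, rng) == expected
    return wilson_lower_bound(successes, trials)


cases = [(1, 2), (2, 4), (3, 6)]
bound = accuracy_lower_bound(flaky_model, cases)
print(f"~95% lower bound on accuracy: {bound:.3f}")
```

A test would then assert that the bound clears a threshold you choose (say 0.8), rather than expecting every individual call to be correct.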