# High-Performance Python
Once in a while, you will hear that "Python is so slow and not performant enough," or "Python is not for <i>enterprise</i> use cases since it is not fast enough," or "Python is a language for script kiddies and at best 'glue' code."  
ZOMG! Scala's functional API is awesome and stands on the shoulders of the JVM. Go(lang) has goroutines that beat the pants off of Python's multithreading. Rust--I don't know what it does but it sounds trendy. 🙄 

Spare me the faux concern ~~you pretentious C++ developer~~ ... I mean good question! Python supports rapid speed of development and experimentation, expressiveness and conciseness of syntax, and high-performance frameworks for speed. At a high level, you can speed up Python on one machine with vectorization, multi-threading, multi-processing, and asynchronous programming. On multiple machines, you have cluster/distributed computing frameworks: Apache Spark, Dask, Apache Beam, Ray, and others. On cloud services, you have containers and scalable, serverless computing. In this tutorial, I will show you various ways to make Python fast and scalable, so performance is not your bottleneck.  

# Debunking the "Slow" Python Myth
What do you mean by slow? Do you mean that Python runs slowly, so it costs more money when renting EC2 instances?  
Remember that compute time is cheap and developer time is relatively expensive. Suppose a standard laptop has 4 CPUs and 16 GB of RAM. A comparable EC2 instance on AWS rents for just \$0.154 per hour; an engineer costs \$50 per hour. So for each hour of developer time you waste, you could have rented a machine for more than 300 hours--or 300 machines for 1 hour. A 40-hour week on EC2 would cost about \$6--cheaper than your daily lunch!  
Some "faster" languages are notoriously verbose, so you end up with pages and pages of code. Python's ease of use and expressiveness give you a superpower: rapid development lets you beat your competition to market. Rapid experimentation lets you outmaneuver your competition with better and newer features. Python's readability keeps code maintainable as developers come and go. Basically, can you write your code faster than somebody else: can you do in 1 hour what takes somebody else 5?  
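
If you want to sanity-check the cost comparison above, here is a minimal sketch of the arithmetic. The two rates are the ones quoted in the text; the variable names are just illustrative.

```python
# Rates quoted in the paragraph above; names are illustrative, not from any library.
EC2_RATE_PER_HOUR = 0.154       # dollars per machine-hour
ENGINEER_RATE_PER_HOUR = 50.0   # dollars per developer-hour

# Machine-hours you can rent for the price of one developer-hour.
machine_hours_per_dev_hour = ENGINEER_RATE_PER_HOUR / EC2_RATE_PER_HOUR

# Cost of keeping one instance running for a 40-hour week.
weekly_machine_cost = 40 * EC2_RATE_PER_HOUR

print(f"{machine_hours_per_dev_hour:.0f} machine-hours per developer-hour")
print(f"${weekly_machine_cost:.2f} for a 40-hour week of compute")
```

Swap in your own instance price and salary; the ratio is what matters, not the exact numbers.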

If you need speed, Guido recommends you delegate the performance-critical part to C or C++ (for example, through Cython):  
```"At some point, you end up with one little piece of your system, as a whole, where you end up spending all your time. If you write that just as a sort of simple-minded Python loop, at some point you will see that that is the bottleneck in your system. It is usually much more effective to take that one piece and replace that one function or module with a little bit of code you wrote in C or C++ rather than rewriting your entire system in a faster language, because for most of what you're doing, the speed of the language is irrelevant." --Guido van Rossum```  
I won't cover Cython in this tutorial since it is rarely used directly--I have never personally met anybody who wrote their own Cython code. Things that need to be performant in Python are already written in C, C++, or Cython: NumPy, XGBoost, spaCy, etc. The high-level idea is that the reference Python interpreter (CPython) is itself written in C, so for speed you can write the hot path in C (or C++) and import it into Python.  

If you are concerned about speed, do you know what is faster than Python? Fortran.  Then you can brush up on your Fortran to get hired by ... nobody (or worse, mainframe specialists 🤢).  

Speed of the language primarily helps with CPU bound problems and not IO bound problems--in IO bound problems, any programming language will perform roughly the same since the majority of the time, you are waiting for a network response.  

Finally, you don't really care about the raw speed of a language. You care about the expressiveness and functionality of the language: its ecosystem, user base, and mindshare. Whatever problem you can think of, Python probably has a library for that. Which you can pip install. For which you write minimal code. With which you solve your problem. So you can call it a day. Like Newton, when you use Python, you stand on the shoulders of giants. Things just work! Imagine if you couldn't just import sklearn and run Random Forest but had to write your own random forest algorithm. Actually, don't...  
Here are some libraries just for scientific computing.  
<p align="center"><img src="images/scientific-ecosystem.png" width=600></p><br>  

In [1]:
import antigravity # a cute Easter egg; things in Python are so easy. Python has "batteries included"

## The Big O, Not Just Japanese Batman
<p align="center"><img src="images/The-Big-O.jpg" width=300></p><br>  
Data structures and algorithms (i.e. your functions) have different performance characteristics that can be measured by how much RAM it takes to do something and how many steps it takes to do something. Fancy people use the terms <i>space and time complexity</i>, respectively. Don't be intimidated by the fancy names; I assure you it's a simple idea.  
<b>Big O notation</b> describes an upper bound on how the cost grows as the input grows; it is most often quoted for the worst-case scenario. 


Let's walk through some simple examples.  
<i>list</i>:

* Appending to the end of a list is an (amortized) O(1) operation.
* Inserting at the beginning of a list takes O(n) operations, which is pronounced oh-enn, because every existing element has to be shifted over by one position.

deque
runtime vs space complexity  
Galvanize technical interview with anagrams  
adding strings together vs joining, adding list vs tuple 
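
To make the list, deque, and string-joining notes above concrete, here is a minimal sketch. The size `N` and the helper names are made up for illustration, and the timings will vary by machine, so treat the printed numbers as a rough comparison rather than a benchmark.

```python
from collections import deque
from timeit import timeit

N = 20_000  # hypothetical size, small enough to run quickly


def fill_front_list(n):
    """Insert n items at the front of a list; each insert shifts every element (O(n))."""
    items = []
    for i in range(n):
        items.insert(0, i)
    return items


def fill_front_deque(n):
    """Insert n items at the front of a deque; each appendleft is O(1)."""
    items = deque()
    for i in range(n):
        items.appendleft(i)
    return items


def concat_with_plus(words):
    """Build a string with repeated +=, which may copy the partial result each time."""
    result = ""
    for word in words:
        result += word
    return result


def concat_with_join(words):
    """Build the same string in a single pass with str.join."""
    return "".join(words)


# Both containers end up with the same contents; only the cost differs.
assert fill_front_list(N) == list(fill_front_deque(N))

print("list.insert(0, x):", timeit(lambda: fill_front_list(N), number=1))
print("deque.appendleft: ", timeit(lambda: fill_front_deque(N), number=1))
```

Try doubling `N`: the deque time should roughly double, while the list time should grow much faster, since each front insert touches the whole list.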

# Vectorized Calculations with SIMD
vectorization with numpy and pandas: SIMD  
Like R's apply, pd.DataFrame.apply looks vectorized but is not, so it comes without the speed benefit. It's a convenience function.  
CPU vs GPU: ALU vs FPU  
order of magnitude of speed: CPU, RAM, storage, network: have image  
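
Here is a minimal sketch of what vectorization buys you, assuming NumPy is installed. The array size and function names are arbitrary choices for illustration, and whether SIMD instructions are actually used underneath depends on your NumPy build and CPU.

```python
import numpy as np
from timeit import timeit

values = np.arange(200_000, dtype=np.float64)  # arbitrary size for illustration


def sum_of_squares_loop(arr):
    """Pure-Python loop: one interpreted iteration per element."""
    total = 0.0
    for x in arr:
        total += x * x
    return total


def sum_of_squares_vectorized(arr):
    """NumPy version: the loop runs in compiled code over the whole array."""
    return float(np.dot(arr, arr))


print("loop:      ", timeit(lambda: sum_of_squares_loop(values), number=1))
print("vectorized:", timeit(lambda: sum_of_squares_vectorized(values), number=1))
```

The two functions compute the same quantity; the only difference is where the loop runs.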

# Multi-threading vs Multi-processing
threads (concurrency) (race condition, tug of war) vs processes (parallel): concurrent.futures vs multiprocessing  
concurrency vs parallelism vs asynchronous (event driven)  
give example of how the HBase was too slow and that Python was not the bottleneck--it was an IO problem    
ETLed 150 TB of data on 1 machine WITHOUT Spark  
GIL  
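
As a rough illustration of why threads help with IO-bound work despite the GIL, here is a minimal sketch using `concurrent.futures`. The `fake_io_call` function is a stand-in invented for this example: it just sleeps, which releases the GIL the same way waiting on a network response would.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(task_id, delay=0.1):
    """Hypothetical stand-in for a network or disk call; sleeping releases the GIL."""
    time.sleep(delay)
    return task_id


def run_serial(task_ids):
    """Wait for each call to finish before starting the next."""
    return [fake_io_call(t) for t in task_ids]


def run_threaded(task_ids, max_workers=8):
    """Overlap the waiting by running the calls on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(fake_io_call, task_ids))


task_ids = list(range(8))

start = time.perf_counter()
serial_results = run_serial(task_ids)
serial_seconds = time.perf_counter() - start

start = time.perf_counter()
threaded_results = run_threaded(task_ids)
threaded_seconds = time.perf_counter() - start

print(f"serial:   {serial_seconds:.2f}s")
print(f"threaded: {threaded_seconds:.2f}s")
```

For CPU-bound work the threads would take turns holding the GIL instead of overlapping; `concurrent.futures.ProcessPoolExecutor` offers the same interface with separate processes for that case.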

In [1]:
from threading import Thread


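def append_and_pop(list_append, list_pop, iterations, checkpoint_iterations):
    """Move `iterations` items from the end of `list_pop` onto `list_append`.

    Prints both lengths every `checkpoint_iterations` steps so that
    interleaving with another thread is visible in the output.
    """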
    for i in range(iterations):
        list_append.append(list_pop.pop())
        if i % checkpoint_iterations == 0:
            print(
                "length of `list_append`: {}; length of `list_pop` {}"
                .format(len(list_append), len(list_pop))
            )
    print(
        "final length of `list_append`: {}; final length of `list_pop` {}"
        .format(len(list_append), len(list_pop))
    )

In [2]:
l1 = list(range(10000000))
l2 = list(range(10000000))

thread1 = Thread(target=append_and_pop, args=(l1, l2, 10000000, 1000000))
thread2 = Thread(target=append_and_pop, args=(l2, l1, 10000000, 1000000))

thread1.start()
thread2.start()

thread1.join()
thread2.join()

length of `list_append`: 10000001; length of `list_pop` 9999999
length of `list_append`: 9891816; length of `list_pop` 10108183
length of `list_append`: 10134060; length of `list_pop` 9865940
length of `list_append`: 10000000; length of `list_pop` 10000000
length of `list_append`: 10116257; length of `list_pop` 9883743length of `list_append`: 10000000; length of `list_pop` 10000000

length of `list_append`: 10006301; length of `list_pop` 9993699length of `list_append`: 10000000; length of `list_pop` 10000000

length of `list_append`: 10052019; length of `list_pop` 9947981length of `list_append`: 10000000; length of `list_pop` 10000000

length of `list_append`: 10016316; length of `list_pop` 9983684length of `list_append`: 10000000; length of `list_pop` 10000000

length of `list_append`: 10134521; length of `list_pop` 9865479length of `list_append`: 10000000; length of `list_pop` 10000000

length of `list_append`: 10320389; length of `list_pop` 9679611
length of `list_append`: 9877198; 

In [3]:
from collections import Counter

print(l1 == list(range(10000000)))
print(l2 == list(range(10000000)))

print()

print(Counter(i == j for i, j in zip(l1, range(10000000))))
print(Counter(i == j for i, j in zip(l2, range(10000000))))

print()

print(set(range(10000000)) - set(l1))
print(set(range(10000000)) - set(l2))

print()

print(Counter(l1).most_common(10))
print(Counter(l2).most_common(10))

False
False

Counter({True: 9896565, False: 103435})
Counter({True: 9700888, False: 299112})

{9954872, 9992673, 9974936, 9926734}
{9957249, 9878581, 9799822, 9999999}

[(9799822, 2), (9878581, 2), (9957249, 2), (9999999, 2), (0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
[(9974936, 2), (9954872, 2), (9926734, 2), (9992673, 2), (0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]


In [4]:
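def incrementer(lst, iterations, checkpoint_iterations):
    """Add 1 to lst[0] `iterations` times, printing at every checkpoint."""
    for i in range(iterations):
        lst[0] += 1  # read-modify-write: not atomic across threads
        if i % checkpoint_iterations == 0:
            print("lst: {}".format(lst))
    print("final lst: {}".format(lst))

def decrementer(lst, iterations, checkpoint_iterations):
    """Subtract 1 from lst[0] `iterations` times, printing at every checkpoint."""
    for i in range(iterations):
        lst[0] -= 1  # read-modify-write: not atomic across threads
        if i % checkpoint_iterations == 0:
            print("lst: {}".format(lst))
    print("final lst: {}".format(lst))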

In [5]:
lst = [0]

thread1 = Thread(target=incrementer, args=(lst, 10000000, 1000000))
thread2 = Thread(target=decrementer, args=(lst, 10000000, 1000000))

thread1.start()
thread2.start()

thread1.join()
thread2.join()

lst: [1]
lst: [233888]
lst: [796536]
lst: [403365]
lst: [1195976]
lst: [1519278]
lst: [1996572]
lst: [2254294]lst: [2279738]

lst: [2484877]
lst: [2708386]
lst: [2635015]
lst: [2652956]
lst: [3076750]
lst: [2708854]
lst: [2642597]
lst: [2410366]
lst: [2779173]
lst: [1697253]
lst: [2030757]
final lst: [2285048]
final lst: [1030758]
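
The drifting totals above come from `lst[0] += 1` being a read, an add, and a write that another thread can interleave with. One common way to avoid that, sketched below, is to guard the update with a `threading.Lock`. The `add_many` helper and the smaller iteration count are choices made for this example, not part of the code above.

```python
from threading import Lock, Thread


def add_many(counter, lock, delta, iterations):
    """Apply `delta` to counter[0] `iterations` times, holding the lock for each update."""
    for _ in range(iterations):
        with lock:
            counter[0] += delta


counter = [0]
lock = Lock()
iterations = 200_000  # smaller than the run above so the sketch finishes quickly

up = Thread(target=add_many, args=(counter, lock, 1, iterations))
down = Thread(target=add_many, args=(counter, lock, -1, iterations))

up.start()
down.start()
up.join()
down.join()

print("final counter:", counter)
```

With the lock in place, every increment is matched by a decrement and the counter ends at zero. The price is that the two threads now take turns, so you pay for correctness with contention.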


# Asynchronous Programming/Event-Driven Programming
asyncio is in some sense serial, with the indirection of a coroutine. Asynchronous programming is implemented in 3 ways: callbacks (callback hell from the nested structure), futures/promises (looks like [flat] method chaining), and the flatter async/await  
yield from  
you don't need asyncio to do asynchronous programming  
C10k problem
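
Here is a minimal sketch of the async/await style mentioned above, using only the standard library. `fake_request` is a made-up stand-in for a network call; the delays are arbitrary.

```python
import asyncio
import time


async def fake_request(name, delay):
    """Hypothetical stand-in for a network call; await hands control back to the event loop."""
    await asyncio.sleep(delay)
    return name


async def main():
    # Start all three coroutines and wait for them together on a single thread.
    return await asyncio.gather(
        fake_request("a", 0.3),
        fake_request("b", 0.2),
        fake_request("c", 0.1),
    )


start = time.perf_counter()
results = asyncio.run(main())  # inside a notebook cell, use `results = await main()` instead
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

`asyncio.gather` returns results in the order the awaitables were passed in, and the total time is close to the longest delay rather than the sum, even though nothing runs in parallel.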

# Distributed Computing Frameworks
A common lesson of high performance computing: <i>High performance computing isn’t about doing one thing exceedingly well, it’s about doing nothing poorly.</i> Spark, Dask (dataframe, bag, array, futures, delayed), Beam, Ray  
Amdahl's law  
vertical vs horizontal scaling  
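
Amdahl's law is easy to play with in code. The sketch below uses the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the runtime that can be parallelized and n is the number of workers. The function name and the 90% figure are just for illustration.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup when `parallel_fraction` of the runtime is spread over `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative case: 90% of the job parallelizes.
for workers in (1, 2, 8, 64, 1024):
    print(workers, round(amdahl_speedup(0.9, workers), 2))
```

With 90% of the work parallelizable, the speedup can never reach 10x no matter how many machines you add, because the serial 10% still has to run.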

## MapReduce (old news)
explain what is MapReduce and disadvantages  
A `reduce` operation that is commutative and associative can be partially parallelized using a technique called a <i>combiner</i>.  
On-premise MapReduce cannot be scaled up on demand, so EMR on AWS gives you the ability to scale up when needed. Even a Spark cluster is apportioned up front and probably isn't ideal for overallocation.  
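
To show the shape of map, combine, and reduce without a cluster, here is a single-machine word-count sketch in plain Python. The input data and function names are invented for illustration; a real framework would run each partition on a different worker and shuffle the partial results over the network.

```python
from collections import Counter
from functools import reduce

# Hypothetical input: each inner list stands for the lines held by one worker.
partitions = [
    ["spark dask beam", "dask ray"],
    ["spark spark ray"],
]


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate counts locally before anything is shuffled."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return local


def reduce_phase(left, right):
    """Reduce: merge two partial counts; addition is commutative and associative."""
    return left + right


# One partial Counter per partition, then a single merge across partitions.
partials = [
    combine(pair for line in partition for pair in map_phase(line))
    for partition in partitions
]
word_counts = reduce(reduce_phase, partials, Counter())
print(word_counts)
```

Because addition is commutative and associative, the partial counts can be merged in any order, which is exactly what lets the combiner shrink the data on each worker before it crosses the network.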

## Apache Spark
If you want to get a high-demand skill to get a new job, learn Spark. I always say: `If you know Spark, you can't get fired!`
Simple Spark  
Spark and Dask: often a functional approach, functoolz syntax translates to Dask bags, FP allows embarrassingly parallel execution and laziness.  
Show Simple-Spark

## Dask
`If you learn Dask, then you know Spark. And if you know Spark, you can't get fired!`
Compare and contrast  
Show Dask tutorial  
`x = x.map_blocks(compress).persist().map_blocks(decompress)`

## Apache Beam
explain why it is cool: unified API between Batch + Stream, auto-scaling, serverless, templates  
explain its weaknesses: does not support data with schema, no graph algorithms  
Throw a picture in: 62 machines running for less than 30 minutes at a cost of \$2.50.

## Ray (and Actor Model)


# Virtual Machines vs Containers
VM vs Docker/containerization, multi-tenancy  
serverless computing, scalable/decoupled, notifications and PubSub connected to AWS Lambda  
world has moved on from hardware -> co-location -> virtualization (rented VM) -> container -> function: https://read.acloud.guru/the-evolution-from-servers-to-functions-21833b576744  
Focus on your business logic and delegate the rest to AWS. Who knows (and who cares) how it's really implemented? Are you in the business of creating your private cloud/Hadoop or solving business problems that make $$$?  
Can mention Pub/Sub and object notifications  


# What is AWS?
AWS is really supply chain maximization, squeezing out efficiencies from elementary pieces: compute, storage, and networking. With these Lego blocks, they can create managed services that are just a copy of an Apache/open-source software project and charge for it. Overall, a win-win scenario, like the Costco or Trader Joe's model.  
Show all the logos of services from AWS or GCP. Show the open source version and then AWS service  

# Extra Resources
Raymond Hettinger, Keynote on Concurrency, PyBay 2017: https://www.youtube.com/watch?v=9zinZmE3Ogk  
https://pybay.com/site_media/slides/raymond2017-keynote/threading.html  
https://glyph.twistedmatrix.com/2014/02/unyielding.html  