# `metadsl`: separating API from execution

## Three stories



### Compatibility / Flexibility

#### "Moore's law is dying"?

[Moore's law](https://en.wikipedia.org/wiki/Moore%27s_law): the number of transistors on a chip doubles about every two years


Dennard scaling: as transistors get smaller, power use stays proportional to area, so power density stays constant


> The exponential processor transistor growth predicted by Moore does not always translate into exponentially greater practical CPU performance. Since around 2005–2007, **Dennard scaling appears to have broken down, so even though Moore's law continued for several years after that, it has not yielded dividends in improved performance**.[40][178] The primary reason cited for the breakdown is that at small sizes, current leakage poses greater challenges, and also causes the chip to heat up, which creates a threat of thermal runaway and therefore, further increases energy costs.[40][178]
> 
> The breakdown of Dennard scaling **prompted a switch among some chip manufacturers to a greater focus on multicore processors**, but the gains offered by switching to more cores are lower than the gains that would be achieved had Dennard scaling continued.[179][180] In another departure from Dennard scaling, Intel microprocessors adopted a non-planar tri-gate FinFET at 22 nm in 2012 that is faster and consumes less power than a conventional planar transistor.[181] 


[Koomey's law](https://en.wikipedia.org/wiki/Koomey%27s_law): computations per joule double about every 1.57 years

> By the second law of thermodynamics and Landauer's principle, irreversible computing cannot continue to be made more energy efficient forever. As of 2011, computers have a computing efficiency of about 0.00001%.[13] **Assuming that the energy efficiency of computing will continue to double every 1.57 years, the Landauer bound will be reached in 2048**. Thus, after about 2048, Koomey's law can no longer hold. 
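
As a quick sanity check on the arithmetic in the quote, here is a minimal sketch that recomputes the year from the quoted figures. It assumes the 0.00001% figure is the fraction of the Landauer limit reached in 2011, and that the bound is hit when that fraction reaches 100%; the variable names are only illustrative.

```python
import math

# Figures taken from the quote above.
START_YEAR = 2011
EFFICIENCY_FRACTION = 0.00001 / 100  # 0.00001% expressed as a fraction
DOUBLING_PERIOD_YEARS = 1.57

# Assumption: the bound is reached when efficiency hits 100% (a fraction of 1.0).
doublings_needed = math.log2(1.0 / EFFICIENCY_FRACTION)
years_needed = doublings_needed * DOUBLING_PERIOD_YEARS
bound_year = START_YEAR + years_needed

print(f"{doublings_needed:.1f} doublings, {years_needed:.1f} years, around {bound_year:.0f}")
```

Under these assumptions, roughly 23 doublings are left, which takes about 36.5 years and lands close to the quoted 2048.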




#### Von Neumann architecture

![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e5/Von_Neumann_Architecture.svg/1920px-Von_Neumann_Architecture.svg.png)

Instructions are executed sequentially, and each one reads from or writes to a shared memory.


> A **von Neumann language is any of those programming languages that are high-level abstract isomorphic copies of von Neumann architectures**.[1] As of 2009, **most current programming languages fit into this description**[citation needed], likely as a consequence of the extensive domination of the von Neumann computer architecture during the past 50 years. 



#### End of von Neumann

Two things are happening at once. Hardware built around the von Neumann architecture is hitting a wall: it is getting hard to improve performance while staying within that architecture.

Meanwhile, programming languages are built on the assumption that this is our underlying architecture.


John Backus in his 1977 ACM Turing Award lecture (emphasis mine):

> Surely there must be a less primitive way of making big changes in the store than by pushing vast numbers of words back and forth through the von Neumann bottleneck. Not only is this tube a literal bottleneck for the data traffic of a problem, but, more importantly, it is **an intellectual bottleneck that has kept us tied to word-at-a-time thinking instead of encouraging us to think in terms of the larger conceptual units of the task at hand**. Thus programming is basically planning and detailing the enormous traffic of words through the von Neumann bottleneck, and much of that traffic concerns not significant data itself, but where to find it.

The issue with von Neumann is not a hardware issue but a way-of-thinking issue, a software design issue. And we are seeing this play out today.


#### NumPy and Python

Both are built around the von Neumann model.

CPython: you write statements, and each one corresponds to a number of underlying CPU instructions.
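
To make this concrete, here is a small sketch using the standard library's `dis` module. It shows one level of that mapping: CPython turns each statement into several bytecode instructions, which the interpreter then carries out with many more CPU instructions. The function `add` is only an illustrative example, and the exact instruction names vary between CPython versions.

```python
import dis


def add(a, b):
    # Two simple statements...
    total = a + b
    return total


# ...become a longer, strictly sequential list of bytecode instructions.
instructions = list(dis.get_instructions(add))
opnames = [ins.opname for ins in instructions]

for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```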

NumPy: for scientific computing, going statement by statement executes too many instructions. Instead, we can write specialized subroutines for groups of instructions and call them on whole blocks of data, as long as our data is in the right form.

The von Neumann model is still preserved. Why? We have some fixed blocks of memory, and we iterate sequentially through instructions to modify them.
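
The contrast between the two styles can be sketched roughly as follows. The helper names `add_one_loop` and `add_one_numpy` are made up for illustration. Both compute the same result; the difference is whether the interpreter steps through every element or hands the whole block of memory to one precompiled routine.

```python
import numpy as np


def add_one_loop(values):
    """Word-at-a-time: the interpreter runs the loop body once per element."""
    result = []
    for v in values:
        result.append(v + 1)
    return result


def add_one_numpy(values):
    """One call: NumPy's compiled routine walks the whole array."""
    return values + 1


data = np.arange(5)
loop_result = add_one_loop(data.tolist())
numpy_result = add_one_numpy(data)

print(loop_result, numpy_result.tolist())
```

Even in the NumPy version, each operation still runs eagerly, one call after another, on arrays that already exist in memory.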

This was working great! Programmers could optimize these chunks (LAPACK, Fortran, C, Cython) and hardware could improve the underlying speed, leading to improved performance year over year.

But this is a local minimum.


#### Deep Learning

TensorFlow came out of distributed systems engineering, so its authors had already broken away from the von Neumann architecture. Compare the approaches: TensorFlow builds up a large chunk of computation at runtime, then compiles and executes it as a whole. This is precisely the "larger conceptual units of the task at hand" that John Backus was talking about all the way back in 1977.

Why do they need this? Because they are targeting heterogeneous hardware, like TPUs, GPUs, and browsers, where we cannot depend on a von Neumann execution model.
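
The build-first, execute-later idea described above can be sketched in plain Python. This is not TensorFlow's API or `metadsl`'s; `Node`, `lift`, `placeholder`, and `evaluate` are hypothetical names, and a simple recursive interpreter stands in for a real compiler. The point is only that the whole computation exists as data before anything runs, so a backend could inspect or rewrite it first.

```python
import operator
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    op: str           # "input", "const", "add", or "mul"
    args: Tuple = ()  # child nodes, or a single payload for leaf nodes

    # Arithmetic records the operation instead of performing it.
    def __add__(self, other):
        return Node("add", (self, lift(other)))

    def __mul__(self, other):
        return Node("mul", (self, lift(other)))


def lift(value):
    """Wrap plain Python values as constant nodes."""
    return value if isinstance(value, Node) else Node("const", (value,))


def placeholder(name):
    """A named input whose value is supplied at execution time."""
    return Node("input", (name,))


_OPS = {"add": operator.add, "mul": operator.mul}


def evaluate(node, inputs):
    """Execute a whole graph at once, given values for its inputs."""
    if node.op == "input":
        return inputs[node.args[0]]
    if node.op == "const":
        return node.args[0]
    left, right = (evaluate(arg, inputs) for arg in node.args)
    return _OPS[node.op](left, right)


x = placeholder("x")
graph = x * 2 + 1                    # nothing is computed here
result = evaluate(graph, {"x": 3})   # the whole graph runs here
print(graph.op, result)
```

Because `graph` is an ordinary data structure, the same description could in principle be handed to different executors, which is what makes targeting varied hardware possible.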

#### "Larger Conceptual Units"

What are these? Well, we are in Python. Let's look at the different conceptual units provided by different backends:




### Optimization

### Introspection