# Introduction to Deep Learning Systems

## Why study deep learning?

**Deep Learning Systems** (DLS) have solved problems that were considered hard before 2010, e.g. obtaining superhuman or state-of-the-art (SOTA) scores on tasks and challenges such as [ImageNet](https://www.image-net.org/challenges/LSVRC/), [CASP](https://en.wikipedia.org/wiki/CASP), and [Go (board game)](https://en.wikipedia.org/wiki/Computer_Go) $^{[1]}$. Later, in the early 2020s, unprecedented progress in text and image generation was made with models like GPT-3 and Stable Diffusion:

[1] Go has a game tree complexity of roughly $10^{360}$, assuming about 250 legal moves per turn over games of about 150 moves ($250^{150} \approx 10^{360}$).

<img src="img/01-0.png">

<img src="img/01-1.png">

### Reason #1. To build deep learning systems

Despite the dominance of deep learning libraries such as TensorFlow and PyTorch, the
playing field in this space is remarkably fluid (see, e.g., the recent emergence of JAX).
You may want to work on developing existing frameworks (virtually all of which are
open source), or developing your own new frameworks for specific tasks.

DLS is not just for the "big players":

<img src="img/01-2.png">

**Controversial claim.** The single largest driver of widespread adoption of deep
learning has been the creation of easy-to-use automatic differentiation libraries:

<img src="img/01-3.png">


### Reason #2. To use existing systems more effectively 

Understanding how the internals of existing deep learning systems work lets you
use them much more efficiently. For example, you can make a custom
non-standard layer run (much) faster in TensorFlow / PyTorch by understanding how
its operations are executed. Understanding deep learning systems is a "superpower" that will let you
accomplish your research aims much more efficiently.

|  |  |
|:-: | :-: |
| **2012** | <img src="img/01-6.png"> | 
| **2020s** | <img src="img/01-5.png"> | 
| **2025**+ | <img src="img/01-4.png"> |

### Reason #3. Deep learning systems are fun!

Despite their seeming complexity, the core underlying algorithms behind deep
learning systems (**automatic differentiation** + **gradient-based optimization**) are
extremely simple. Unlike (say) operating systems, you could probably write a “reasonable” deep
learning library in <2000 lines of (dense) code.
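To give a rough sense of how small the core can be, here is a minimal sketch of scalar reverse-mode automatic differentiation combined with a plain gradient descent loop. The names `Value`, `backward`, and `fit` are hypothetical and chosen only for this illustration; real libraries work on tensors, support many more operators, and are organized differently.

```python
class Value:
    """A scalar node in a computation graph (hypothetical name, illustration only)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    __radd__ = __add__
    __rmul__ = __mul__

    def __neg__(self):
        return self * -1.0

    def __sub__(self, other):
        return self + (-other)

    def backward(self):
        # Order nodes so that each node appears after all of its parents,
        # then apply the chain rule walking that order in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


def fit(steps=100, lr=0.05):
    """Gradient descent on the toy loss (2*w - 6)**2, whose minimizer is w = 3."""
    w = Value(0.0)
    for _ in range(steps):
        w.grad = 0.0              # clear the gradient from the previous step
        err = w * 2.0 - 6.0
        loss = err * err
        loss.backward()           # fills in w.grad
        w.data -= lr * w.grad     # gradient descent update
    return w
```

Calling `fit()` returns a `Value` whose `data` is close to 3. The design choice worth noting is that each operator records only its local derivatives; `backward` supplies the rest through the chain rule.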

The fun really starts the first time you build your own automatic differentiation library and realize you can
take the gradient of a gradient without actually knowing how you would even go
about deriving it mathematically (e.g. for complex operations like batch norm, or the gradient of the gradient of a for-loop).
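One way to see the "gradient of a gradient of a for-loop" idea in a few lines is nested dual numbers, i.e. forward-mode automatic differentiation. This is a sketch under stated assumptions, not how TensorFlow, PyTorch, or JAX implement it: the names `Dual`, `derivative`, and `power_loop` are hypothetical, only `+` and `*` are supported, and this naive nesting is only safe for single-variable functions like the one shown (production systems tag each differentiation level to avoid mixing them up).

```python
class Dual:
    """A number a + b*eps with eps**2 == 0; the eps part carries the derivative."""

    def __init__(self, real, eps=0.0):
        self.real = real   # may itself be a Dual when derivatives are nested
        self.eps = eps

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.eps + other.eps)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + b*eps)(c + d*eps) = a*c + (a*d + b*c)*eps
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    __radd__ = __add__
    __rmul__ = __mul__


def derivative(f):
    """Return a function computing df/dx by seeding the eps part with 1."""
    def df(x):
        out = f(Dual(x, 1.0))
        # A plain number here means the result did not depend on the seed.
        return out.eps if isinstance(out, Dual) else 0.0
    return df


def power_loop(x, n=5):
    """Computes x**n with an ordinary Python for-loop."""
    acc = 1.0
    for _ in range(n):
        acc = acc * x
    return acc


first = derivative(power_loop)(2.0)               # 5 * 2**4 = 80.0
second = derivative(derivative(power_loop))(2.0)  # 20 * 2**3 = 160.0
```

Nothing in `power_loop` knows about differentiation. The loop simply runs on `Dual` values, and applying `derivative` twice differentiates the differentiated program.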

## Elements of deep learning systems

- **Compose** multiple tensor operations to build modern machine learning models
- **Transform** a sequence of operations (automatic differentiation)
- **Accelerate** computation via specialized hardware 
- **Extend** support to more hardware backends and more operators