# 1c. Introduction: Why PyTorch?

## Immediate vs deferred execution
* by default, the Python interpreter executes eagerly: each expression is evaluated as soon as it is reached
* functions can be used to defer execution
* thanks to operator overloading, an expression can look as if it were executed eagerly while it is actually deferred
 
```python
# InputParameterPlaceholder is a hypothetical class whose operators build a graph
a = InputParameterPlaceholder()
b = InputParameterPlaceholder()
c = (a**2 + b**2) ** 0.5  # __add__ and __pow__ are overloaded
callable(c)  # True: c is an execution graph, not a number
```

* that way
    * the computation is not recorded as Python bytecode operations
    * but instead it is compiled into a static graph that can be converted into optimized (fused) machine code
* this has two downsides:
    1. standard Python debugging tools cannot be used
    2. the graph cannot be mixed with Python control flow (e.g. `if` statements)
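
Building a graph through operator overloading can be sketched in plain Python. This is only a minimal illustration, not how any real framework implements it: `Placeholder` and `Op` are hypothetical names invented for this example.

```python
import operator


class Placeholder:
    """Hypothetical stand-in for a deferred input (not a PyTorch class)."""

    def __init__(self, name):
        self.name = name

    def __call__(self, **inputs):
        # A placeholder evaluates to whatever value is supplied for its name.
        return inputs[self.name]

    def __add__(self, other):
        return Op(operator.add, self, other)

    def __pow__(self, other):
        return Op(operator.pow, self, other)


class Op(Placeholder):
    """A graph node that records an operation instead of running it."""

    def __init__(self, fn, left, right):
        self.fn, self.left, self.right = fn, left, right

    def __call__(self, **inputs):
        def evaluate(node):
            # Graph nodes are callable; plain numbers are used as they are.
            return node(**inputs) if callable(node) else node

        return self.fn(evaluate(self.left), evaluate(self.right))


a = Placeholder("a")
b = Placeholder("b")
c = (a**2 + b**2) ** 0.5  # nothing is computed here; c is a small graph

print(callable(c))   # True
print(c(a=3, b=4))   # 5.0
```

The expression for `c` reads like eager arithmetic, but the actual computation only happens when `c` is called with concrete inputs.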
    
### Static vs. dynamic graph
* static graph / deferred execution: builds up the entire computation as a graph before running it
* dynamic graph / eager execution: an operation is not aware of the operation that follows it
    * 'define by run'
    * dynamic is not better per se, but looping or conditional behavior is easier to accomplish with dynamic graphs, as they integrate better with Python control flow
* `PyTorch` is eager by default; `TorchScript` adds static graph capabilities in order to optimize production builds
* `TensorFlow` was static only in TF1; since TF2, it is eager by default
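
The two modes can be compared on a small function. The sketch below assumes a PyTorch installation in which `torch.jit.script` is available; `clipped_norm` is a hypothetical helper written for this example. It first runs eagerly with ordinary Python control flow and is then compiled to TorchScript.

```python
import torch


def clipped_norm(x: torch.Tensor, limit: float) -> torch.Tensor:
    """Hypothetical helper: rescale x so that its L2 norm is at most `limit`."""
    norm = torch.sqrt((x * x).sum())
    if norm > limit:  # data-dependent Python control flow
        x = x * limit / norm
    return x


x = torch.tensor([3.0, 4.0])

# Eager: each operation runs immediately, one after the other.
eager_out = clipped_norm(x, 1.0)

# TorchScript: the function, including the `if`, is compiled into a graph.
scripted = torch.jit.script(clipped_norm)
script_out = scripted(x, 1.0)

print(eager_out, script_out)
print(scripted.code)  # the TorchScript source recovered from the compiled graph
```

Both calls should return the same tensor; the scripted version no longer needs the Python interpreter to decide which branch to take.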

## Framework history
### TF
* TF2 is eager by default
* robust pipeline to production
* widely used in industry
* massive mind share
      
### PyTorch
* absorbed the Caffe2 backend
* replaced the low-level code inherited from the Lua-based Torch project
* added support for ONNX
* added a delayed-execution graph-mode runtime: TorchScript
* widely used in research and education


**Overall**, TF and PyTorch are converging

# PyTorch has the batteries included
* written in C++ and CUDA, can be run from C++ directly
* essential building blocks:
    * *Tensor*
        * Tensors with *native support for backpropagation*: tensors remember which operations have been executed on them
    * additional support through *torch.autograd*
* This makes PyTorch attractive for use cases beyond NNs: physics, rendering, optimization, simulation, modeling
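
How a tensor "remembers" its operations can be seen in a short sketch, assuming a standard PyTorch installation. It reuses the expression from the deferred-execution example, but here every operation runs eagerly while autograd records it for the backward pass.

```python
import torch

a = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(4.0, requires_grad=True)

# Executed eagerly: c already holds a value, and also a reference to its history.
c = (a**2 + b**2) ** 0.5

print(c.item())    # 5.0
print(c.grad_fn)   # the autograd node that produced c

# Backpropagate through the recorded operations.
c.backward()
print(a.grad, b.grad)  # dc/da = a / c and dc/db = b / c, about 0.6 and 0.8
```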

## Modules
* for training, one needs to *source data*, *run an optimizer*, and *delegate work to hardware*
* `torch.utils.data` :: provides
    * `Dataset` :: bridges between custom data and *Tensor*
    * `DataLoader` :: can spawn child processes to load data in the background
* hardware delegation with the help of `torch.nn.DataParallel` and `torch.distributed`
* `torch.optim` :: optimizers for learning
* TorchScript :: delegating computations to C++/CUDA is efficient but incurs a small overhead cost in the Python layer each time
    * this can add up if repeated often
    * TorchScript enables deferred execution: the model is serialized into an instruction set that can be invoked independently of Python
    * like a virtual machine with instruction set for tensor operations
    * also just-in-time transformation of sequences of known operations into fused operations
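
The way the data and optimizer modules fit together can be sketched with a tiny training loop. This is a minimal example under stated assumptions: `LineDataset` is a hypothetical toy dataset, everything runs on the CPU, and `num_workers` is left at its default of 0, so no child processes are spawned.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class LineDataset(Dataset):
    """Hypothetical toy dataset: points on the line y = 2x + 1."""

    def __init__(self, n=64):
        self.x = torch.linspace(-1.0, 1.0, n).unsqueeze(1)
        self.y = 2.0 * self.x + 1.0

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]


torch.manual_seed(0)

# DataLoader batches and shuffles the samples served by the Dataset.
loader = DataLoader(LineDataset(), batch_size=16, shuffle=True)
model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for epoch in range(200):
    for xb, yb in loader:
        optimizer.zero_grad()           # clear gradients from the previous step
        loss = loss_fn(model(xb), yb)
        loss.backward()                 # autograd computes the gradients
        optimizer.step()                # the optimizer updates the parameters

print(model.weight.item(), model.bias.item())  # should approach 2 and 1
```

Passing `num_workers` greater than 0 to `DataLoader` is what moves loading into background processes; it is omitted here to keep the sketch simple.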
    
## Ecosystem
PyTorch is not 'only' a single `torch` library: it also offers many additional modules, libraries and tools designed for a variety of use cases. Some of the most interesting parts include:

- [fastai](https://docs.fast.ai/) - "training fast and accurate neural nets using modern best practices".
- [ParlAI](https://parl.ai/) - "sharing, training and evaluating dialogue models across many tasks".
- [Ray](https://github.com/ray-project/ray) - "for building and running distributed applications".

More about the PyTorch ecosystem can be found on the [PyTorch ecosystem page](https://pytorch.org/ecosystem/).
