## GPU History and Evolution: From Pipelines to Accelerators

A GPU is:
* Specialized hardware for graphics rendering
* Optimized for floating-point transformations on
    * 3-d models (representations of physical objects)
    * 2-d dense arrays (presentation)
    
The graphics pipeline evolved from customized hardware for rendering 3-d scenes onto 2-d planes for visualization, that is, translating the representation of a physical system into monitor/screen views.
  * Input: vertices and primitives, lighting parameters, transformations
  * Output: 2-d image for display
  
The pipeline was fixed-function: it transformed models into images.
The user prepared and loaded data, then invoked the pipeline through APIs like DirectX or OpenGL.
The pipeline had phases:
* Command:
  * triangulate polygons
  * prepare vertex data streams
* Geometry: vertex operations
  * model transformations
  * view transformations
  * vertex-based lighting
  * perspective transformations
* Rasterization: 3-d to 2-d
  * map triangle
  * interpolate vertex attributes
  * clipping
  * create fragments
* Texture: 
  * lookup and apply texture
  * perform texture filtering
* Fragment:
  * lighting
  * anti-aliasing
  * alpha blending
  * image effects
* Display: convert fragments to pixels
  * depth buffer test
  * stencil buffer test
  * accumulation
  * output
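The phases above can be sketched in software. This is a minimal Python illustration, not how the hardware or any graphics API implements them: it covers only the perspective transformation (Geometry), attribute interpolation (Rasterization), and the depth buffer test (Display). All function names, the simple projection matrix, and the 4x4 pixel grid are assumptions made for the example.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]

def perspective(d):
    """Hypothetical minimal perspective matrix: projects onto the plane z = d."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0 / d, 0.0]]

def geometry_stage(vertex, model_view, projection):
    """Geometry: model/view transform, then perspective transform and divide."""
    x, y, z, w = mat_vec(projection, mat_vec(model_view, list(vertex) + [1.0]))
    return (x / w, y / w, z)  # screen-plane x, y plus depth kept for the depth test

def edge(a, b, p):
    """Signed area term used to compute barycentric weights."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri, width, height):
    """Rasterization: turn one triangle of (x, y, depth) vertices into fragments."""
    a, b, c = tri
    area = edge(a, b, c)
    fragments = []
    if area == 0:
        return fragments  # degenerate triangle covers no pixels
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0, w1, w2 = edge(b, c, p) / area, edge(c, a, p) / area, edge(a, b, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                # interpolate the per-vertex attribute (here: depth)
                fragments.append((x, y, w0 * a[2] + w1 * b[2] + w2 * c[2]))
    return fragments

def display_stage(colored_fragments, width, height):
    """Display: the depth buffer test keeps the nearest fragment per pixel."""
    depth = [[float("inf")] * width for _ in range(height)]
    image = [[0] * width for _ in range(height)]
    for color, fragments in colored_fragments:
        for x, y, z in fragments:
            if z < depth[y][x]:
                depth[y][x] = z
                image[y][x] = color
    return image

# Two overlapping triangles already in screen space; the nearer one (depth 2) wins.
WIDTH = HEIGHT = 4
far_tri = [(0.0, 0.0, 5.0), (4.0, 0.0, 5.0), (0.0, 4.0, 5.0)]
near_tri = [(0.0, 0.0, 2.0), (2.0, 0.0, 2.0), (0.0, 2.0, 2.0)]
image = display_stage([(1, rasterize(far_tri, WIDTH, HEIGHT)),
                       (2, rasterize(near_tri, WIDTH, HEIGHT))], WIDTH, HEIGHT)
for row in image:
    print(row)
```

Note that data only flows forward: each stage consumes the previous stage's output and nothing is fed back.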
  
<img src="http://romain.vergne.free.fr/teaching/IS/imgs03/pipeline-v1-06.png" title="Pixel Processing in OpenGL" />

* Hardware Evolution
  * CPUs were not fast enough to render data to the screen
    * goal was 20-60 frames per second
    * customized HW was developed
  * Early pipelines had each function in separate hardware
    * pipeline parallelism
  * Hardware evolved to two units
    * 3-d vertex processing
    * 2-d raster processing
  * Then became programmable
    * texture and blending operations in 2-d
  * As of 2003, the pipeline looked like this:
  
<img src="./images/gpupipeline.png" width=512 />
    
_Observations_: 
  * pipeline is unidirectional: from input to screen buffer
  * 'texture' memory is read only (by pipeline)

### To GPGPU

This hardware (specifically the fragment unit) could perform:
  * dense floating-point operations on arrays at low power
  * computations that were simple
    * no recursion
    * no stack
    * limited branching

Clever scientists started hacking code onto GPUs. This became known as:

__GPGPU__: general-purpose computing on graphics processing units.
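To make the idea concrete, here is a hedged Python emulation of one GPGPU-style pass. It is a sketch, not real graphics-API code: the names `run_fragment_pass` and `saxpy_fragment` are hypothetical. It models the constraints described above: a small, branch-free program runs once per output pixel, reads from read-only "texture" arrays, and writes only its own output element.

```python
def run_fragment_pass(fragment_program, textures, width, height):
    """Emulate one pass: run fragment_program once per output pixel.

    Inputs are frozen into tuples to mimic read-only texture memory.
    """
    frozen = tuple(tuple(tuple(row) for row in tex) for tex in textures)
    return [[fragment_program(frozen, x, y) for x in range(width)]
            for y in range(height)]

def saxpy_fragment(textures, x, y, a=2.0):
    """Per-pixel program: out = a * X + Y. No recursion, no stack, no branches."""
    tex_x, tex_y = textures
    return a * tex_x[y][x] + tex_y[y][x]

X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[10.0, 20.0], [30.0, 40.0]]
result = run_fragment_pass(saxpy_fragment, [X, Y], width=2, height=2)

# Reusing a result requires a second pass with the output bound as an input.
result2 = run_fragment_pass(saxpy_fragment, [result, Y], width=2, height=2)
```

Because the pipeline is unidirectional, the only way to build on `result` in this model is to launch another full pass, as the last line does.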

## The Advent of CUDA

* GPGPU had limitations:
  * poor utilization: only fragment unit
  * hard to program in the graphics API
  * no data reuse: pipeline went one direction
* NVIDIA normalized the hardware and made it more programmable
  * vertex, geometry and fragment units are now jobs scheduled on one architecture
  * created a unified programmable framework for general-purpose computations
  
### CUDA == Compute Unified Device Architecture

* CUDA is the whole thing:
  * hardware architecture
  * API to interface with the GPU
  * toolkit to develop general purpose code
* Initial support for general computations
  * random memory access
  * integer arithmetic


### From CUDA to CUDA 11

CUDA has become a general-purpose processing framework that has maintained the value proposition of GPUs while becoming easier to program.
  * Good for dense floating-point/integer computing
  * Low power per FLOP
  * Limited support for branching, recursion, speculation, etc.
  
The restrictive programming model goes back to its GPU history. But it has been preserved because GPUs inherently benefit from:
  * little per core caching
  * no managed cache
  * large register files
  * no branch prediction or speculative execution

GPUs hit a different design point than CPUs:
  * the reason that repurposing GPUs was attractive is the same reason they have stayed largely the same (from a high-level architecture standpoint)
  