# Spark + Distributed Deep Learning Training
<br>
* The Challenge
* Outside of Spark
* With Spark
* Transfer Learning

### The Challenge

* Multiple modes of distributed DL training
  * Sync vs Async SGD, Parameter server?, AllReduce?
  * How much flexibility is the "right" amount to expose?
* Scheduling and cluster resources
  * Reserving GPUs
  * Placing tasks
  * Timing: Spark scheduler model vs MPI model
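
To make the "sync SGD + AllReduce" mode concrete, here is a toy, single-process simulation. It is only a sketch: the helper names (`local_gradient`, `allreduce_mean`, `sync_sgd`) and the data are made up for illustration, and a real AllReduce (e.g., the one Horovod uses) exchanges tensors over the network rather than averaging a Python list.

```python
from typing import List, Tuple

Shard = List[Tuple[float, float]]  # (x, y) pairs held by one worker


def local_gradient(w: float, shard: Shard) -> float:
    """Gradient of mean squared error for the model y ~ w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values: List[float]) -> List[float]:
    """Simulated AllReduce: every worker receives the same averaged value."""
    mean = sum(values) / len(values)
    return [mean for _ in values]


def sync_sgd(shards: List[Shard], steps: int = 50, lr: float = 0.05) -> List[float]:
    """Synchronous SGD: each step waits for all workers, then all apply one update."""
    weights = [0.0 for _ in shards]  # one model replica per worker
    for _ in range(steps):
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        averaged = allreduce_mean(grads)  # the synchronization point
        weights = [w - lr * g for w, g in zip(weights, averaged)]
    return weights


# Two workers, equal-sized shards drawn from y = 3x (illustrative data).
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
replicas = sync_sgd(shards)
```

Because every worker applies the same averaged gradient, the replicas never drift apart. An async or parameter-server design drops the per-step wait, which is exactly the flexibility-vs-complexity tradeoff raised above.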
  
### The Upshot

1. It's easy(ish) to do brute-force training using the older Spark scheduling approach, but it doesn't perform well.
2. State-of-the-art, fast training mechanisms (e.g., Horovod) need different cluster and scheduling considerations.
3. In the past, these integrations have been dicey...
4. Databricks' MLRuntime and HorovodEstimator serve as a proof-of-concept for a nice future solution.

### Outside of Spark

Wha? For now, it's actually the most straightforward path!

A workflow orchestrator like Airflow can help you sequence
* ETL/Featurization (Spark) work 
* Horovod-based distributed DL training
* Resume Spark data processing pipeline

It's not perfect: it can be hard to ensure the proper cluster resources are available at the right time while still maintaining high utilization.
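
Stripped to its essentials, the sequencing above is "run three jobs in order, stop on the first failure." The sketch below shows that with the standard library only. It is a stand-in, not an Airflow DAG: the script names and worker count are hypothetical, and a real orchestrator adds retries, scheduling, and resource checks on top.

```python
import subprocess
from typing import List

# Hypothetical script names and process count, for illustration only.
STAGES: List[List[str]] = [
    ["spark-submit", "featurize.py"],                  # ETL/featurization (Spark)
    ["horovodrun", "-np", "4", "python", "train.py"],  # Horovod-based training
    ["spark-submit", "score_and_publish.py"],          # resume Spark pipeline
]


def run_pipeline(stages: List[List[str]], dry_run: bool = True) -> List[str]:
    """Run stages strictly in order; with dry_run=True, only report the plan."""
    handled = []
    for cmd in stages:
        if not dry_run:
            # check=True raises CalledProcessError, so later stages never start
            subprocess.run(cmd, check=True)
        handled.append(" ".join(cmd))
    return handled


plan = run_pipeline(STAGES)  # dry run: nothing is executed
```

Note that nothing here reserves GPUs for the middle stage before the first one finishes, which is the resource-timing problem mentioned above.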

### Options For Training Full Models under Spark's Control/Scheduling

__Present__
* Intel BigDL
  * CPU focus
  * Leverages Intel Xeon (e.g., Skylake) CPUs and Xeon Phi
  * Some performance questions
  * May be easier for enterprises to buy/provision (as compared to NVIDIA datacenter GPUs)
* DeepLearning4J
  * JVM based, Spark integration
  * GPU support
* TensorFlowOnSpark
  * CPU/GPU/Infiniband support
  * Doesn't really integrate with Spark APIs/Patterns
* Microsoft MMLSpark (OSS)
  * Leverages CNTK
  * MSFT Research distributed training: https://arxiv.org/pdf/1804.04031.pdf
  
* A number of others with smaller communities or little activity
* Databricks MLRuntime + TensorFlow + Horovod
  * https://docs.databricks.com/applications/deep-learning/distributed-deep-learning/horovod-estimator.html

__Future__
* &#x1f44d; __*Future (2019): Apache Spark (Barrier Mode, MPI) + Horovod + TensorFlow*__
  * Open-source Spark (Project Hydrogen) is addressing this: https://www.youtube.com/watch?v=vVZwzG7uKvI
* Related work includes
  * Alchemist (UC Berkeley), with presentations here at ODSC https://arxiv.org/abs/1806.01270
  * Spark-MPI (Intel/Brookhaven) https://arxiv.org/pdf/1806.01110.pdf

### Transfer Learning

Transfer learning involves using an existing, already-trained neural network as part of a new model.

For example, it is common to use large chunks of pre-trained image recognition networks as "feature extractors" for new image recognition tasks.

Any of the listed tools -- or your own Python-based model, used as a pipeline step (Transformer) with a vectorized Pandas UDF call -- can help you implement transfer learning patterns.
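
The core pattern is small enough to sketch without any framework: freeze the pre-trained part, run it as a feature extractor, and train only a new head. Everything below is illustrative. `FROZEN_W` is a made-up stand-in for weights you would really load from a saved model, and in a Spark pipeline `extract_features` is the part you would wrap in a Pandas UDF.

```python
import math
from typing import List, Tuple

# Stand-in for a pre-trained network's layers: fixed weights, never updated.
# (Hypothetical values; a real extractor would be loaded from a saved model.)
FROZEN_W = [[1.0, -1.0], [0.5, 0.5]]


def extract_features(x: List[float]) -> List[float]:
    """Frozen feature extractor: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_W]


def predict(head: List[float], bias: float, feats: List[float]) -> float:
    """New task-specific head: logistic regression on the extracted features."""
    z = sum(h * f for h, f in zip(head, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data: List[Tuple[List[float], int]],
               epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit only the head with SGD; the frozen weights are never touched."""
    head, bias = [0.0, 0.0], 0.0
    feats = [(extract_features(x), y) for x, y in data]  # extractor runs once
    for _ in range(epochs):
        for f, y in feats:
            err = predict(head, bias, f) - y  # d(log loss)/dz
            head = [h - lr * err * fi for h, fi in zip(head, f)]
            bias -= lr * err
    return head, bias


# Tiny illustrative dataset: (raw input, label)
data = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], 0), ([1.0, 3.0], 0)]
head, bias = train_head(data)
```

Since the extractor is frozen, its output can be computed once and cached, which is why this pattern fits a batch-oriented Spark pipeline step so well.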

In addition, the following tools have APIs and examples designed to simplify this process:

* Databricks - Spark Deep Learning Pipelines (OSS)
* Microsoft - MMLSpark

# Wrapup

## Q & A