<h1 align="center">Deep learning with TensorFlow in Azure PART 1</h1>
<h1 align="center">Introduction to Deep Learning</h1>
<h1 align="center">Meetup DFW Data & AI - Microsoft</h1>
## Setting Up Environment 

### 1) Deploy Linux Data Science VM in Azure - Ubuntu version with GPU

To complete this notebook, you must deploy a Linux DSVM in your Azure subscription. Click [HERE](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-ads.linux-data-science-vm-ubuntu), then click GET IT NOW.

**In Basics blade:**<br>
**Name:** meetupdsvmgpu <br>
**VM Disk type:** HDD<br>
**Username:** sshuser<br>
**Password:** Passw0rd.1!!<br>
**Resource Group:** meetupdsvmgpu_rg <br>
**Location:** Pick among East US, North Central US, South Central US or West US 2<br>

**In Size blade:**<br>
**Size:** NC6 (if you want GPU), or D4_V3 (if you want CPU)

**In Settings blade:**<br>
Leave as is


[The Data Science VM](https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-virtual-machine-overview) (DSVM) can be used to train models with deep learning algorithms on GPU (graphics processing unit) based hardware. Using the VM scaling capabilities of the Azure cloud, the DSVM lets you use GPU-based hardware only when you need it: you can switch to a GPU-based VM size when training large models or when you need high-speed computation, while keeping the same OS disk. The Windows Server 2016 edition of the DSVM comes pre-installed with GPU drivers, frameworks, and GPU versions of the deep learning algorithms. On Linux, deep learning on GPU is enabled only on the Data Science Virtual Machine for Linux (Ubuntu) edition. You can also deploy the Ubuntu or Windows 2016 edition of the DSVM to a non-GPU Azure virtual machine, in which case all the deep learning frameworks fall back to CPU mode.

### 2) SSH into the VM and git clone the meetup repo

```
> ssh sshuser@YOUR.VM.IP.ADDRESS

> cd notebooks

> git clone https://github.com/pablomarin/Meetups-Data-AI-DFW.git

> sudo ln -s /anaconda/envs/py35/bin/pip /usr/bin/pip3

> sudo pip3 install tqdm
```

### 3) Open the Jupyter notebook from your VM on your local browser

> https://YOUR.VM.IP.ADDRESS:8000 <br>
> Login with your VM username and password<br>
> Go to the ***Meetups-Data-AI-DFW folder***<br>
> Open the notebook: ***Meetup8-DeepLearningTensorFlowinAzure-Part-1.ipynb***<br>

## PART 1 - INTRO TO DEEP LEARNING IN AZURE USING TENSORFLOW
**Disclaimer:** Most of the information below was originally created by Udacity.

I'm going to divide this talk into three sections:
1. WHAT - What is Deep Learning?
2. WHERE - Where can I run Deep Learning?
3. HOW - How do I run Deep Learning?

### 1) WHAT  - What is Deep Learning?

![img](image/machine-learning-deep-learning-and-data-analysis-introduction-4-638.jpg)

![img](image/ai-ml-dl.jpg)

### My Personal Definition:
Deep Learning = (Matrix Multiplication + Gradient Descent) * multiple passes back and forth over vast amounts of data using GPUs

### 6 basic concepts to know to understand Deep Learning (neural networks with many layers):
1. Perceptron (neuron)
<a href="http://www.youtube.com/watch?feature=player_embedded&v=Mqogpnp1lrU" target="_blank"><img src="http://img.youtube.com/vi/Mqogpnp1lrU/1.jpg" alt="IMAGE" width="240" height="180" border="10" /></a>
2. Gradient Descent
<a href="http://www.youtube.com/watch?feature=player_embedded&v=7sxA5Ap8AWM" target="_blank"><img src="http://img.youtube.com/vi/7sxA5Ap8AWM/0.jpg" alt="IMAGE" width="240" height="180" border="10" /></a>
3. Multilayer Perceptron
<a href="http://www.youtube.com/watch?feature=player_embedded&v=Rs9petvTBLk" target="_blank"><img src="http://img.youtube.com/vi/Rs9petvTBLk/0.jpg" alt="IMAGE" width="240" height="180" border="10" /></a>
4. Backpropagation
<a href="http://www.youtube.com/watch?feature=player_embedded&v=MZL97-2joxQ" target="_blank"><img src="http://img.youtube.com/vi/MZL97-2joxQ/0.jpg" alt="IMAGE" width="240" height="180" border="10" /></a>
5. Deep vs Wide Architecture
<a href="http://www.youtube.com/watch?feature=player_embedded&v=CsB7yUtMJyk" target="_blank"><img src="http://img.youtube.com/vi/CsB7yUtMJyk/0.jpg" alt="IMAGE" width="240" height="180" border="10" /></a>
6. Regularization & Dropout
<a href="http://www.youtube.com/watch?feature=player_embedded&v=6DcImJS8uV8" target="_blank"><img src="http://img.youtube.com/vi/6DcImJS8uV8/0.jpg" alt="IMAGE" width="240" height="180" border="10" /></a>
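
To make the first two concepts concrete, here is a minimal sketch of a single sigmoid neuron trained with gradient descent. The toy AND dataset, the hyperparameter values, and the helper names (`sigmoid`, `train_neuron`) are illustrative assumptions and do not come from the videos.

```python
import math


def sigmoid(x):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def train_neuron(samples, epochs=5000, learnrate=0.5):
    """Train one sigmoid neuron with batch gradient descent on squared error."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        for inputs, target in samples:
            output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            # Error term: prediction error times the sigmoid derivative
            error_term = (target - output) * output * (1.0 - output)
            for i, x in enumerate(inputs):
                grad_w[i] += error_term * x
            grad_b += error_term
        # Move the weights in the direction that reduces the error
        weights = [w + learnrate * g / len(samples) for w, g in zip(weights, grad_w)]
        bias += learnrate * grad_b / len(samples)
    return weights, bias


# Toy dataset (assumed for illustration): the logical AND of two inputs
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_neuron(AND_DATA)
predictions = [
    int(sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias) > 0.5)
    for inputs, _ in AND_DATA
]
print(predictions)
```

A single neuron can only separate classes with a straight line, which is why the later concepts stack neurons into layers.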

#### A neural network using only NumPy, solving the college admission classification problem from video 1 (Perceptron) above

In [1]:
import numpy as np
import pandas as pd

admissions = pd.read_csv('binary.csv')

# Make dummy variables for rank
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:,field] = (data[field]-mean)/std
    
# Split off random 10% of the data for testing
np.random.seed(21)
sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
data, test_data = data.loc[sample], data.drop(sample)

# Split into features and targets
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']

np.random.seed(21)

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))


# Hyperparameters
n_hidden = 2  # number of hidden units
epochs = 900
learnrate = 0.005

n_records, n_features = features.shape
last_loss = None
# Initialize weights
weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                        size=(n_features, n_hidden))
weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                         size=n_hidden)

for e in range(epochs):
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    for x, y in zip(features.values, targets):
        ## Forward pass ##
        # Calculate the output
        hidden_input = np.dot(x, weights_input_hidden)
        hidden_output = sigmoid(hidden_input)

        output = sigmoid(np.dot(hidden_output,
                                weights_hidden_output))

        ## Backward pass ##
        # Calculate the network's prediction error
        error = y - output

        # Calculate error term for the output unit
        output_error_term = error * output * (1 - output)

        ## propagate errors to hidden layer

        # Calculate the hidden layer's contribution to the error
        hidden_error = np.dot(output_error_term, weights_hidden_output)

        # Calculate the error term for the hidden layer
        hidden_error_term = hidden_error * hidden_output * (1 - hidden_output)

        # Update the change in weights
        del_w_hidden_output += output_error_term * hidden_output
        del_w_input_hidden += hidden_error_term * x[:, None]

    # Update weights
    weights_input_hidden += learnrate * del_w_input_hidden / n_records
    weights_hidden_output += learnrate * del_w_hidden_output / n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        hidden_output = sigmoid(np.dot(x, weights_input_hidden))
        out = sigmoid(np.dot(hidden_output,
                             weights_hidden_output))
        loss = np.mean((out - targets) ** 2)

        if last_loss and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
hidden = sigmoid(np.dot(features_test, weights_input_hidden))
out = sigmoid(np.dot(hidden, weights_hidden_output))
predictions = out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

Train loss:  0.25135725242598617
Train loss:  0.24996540718842886
Train loss:  0.24862005218904654
Train loss:  0.24731993217179746
Train loss:  0.24606380465584848
Train loss:  0.24485044179257162
Train loss:  0.2436786320186832
Train loss:  0.24254718151769536
Train loss:  0.24145491550165465
Train loss:  0.24040067932493367
Prediction accuracy: 0.725
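
The training loop above can be summarized as the following update rule. The notation is an assumption chosen to mirror the code: $w_{ij}$ stands for `weights_input_hidden`, $v_j$ for `weights_hidden_output`, $\eta$ for `learnrate`, $m$ for `n_records`, and $\sigma$ for `sigmoid`.

$$
\begin{aligned}
h_j &= \sigma\Big(\sum_i x_i\, w_{ij}\Big), \qquad \hat{y} = \sigma\Big(\sum_j h_j\, v_j\Big) \\
\delta^{o} &= (y - \hat{y})\,\hat{y}\,(1 - \hat{y}) \\
\delta^{h}_j &= \delta^{o}\, v_j\, h_j\,(1 - h_j) \\
v_j &\leftarrow v_j + \frac{\eta}{m} \sum_{\text{records}} \delta^{o}\, h_j \\
w_{ij} &\leftarrow w_{ij} + \frac{\eta}{m} \sum_{\text{records}} \delta^{h}_j\, x_i
\end{aligned}
$$

The first line is the forward pass, the next two lines are the output and hidden error terms, and the last two lines are the weight updates applied once per epoch.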


### 2) WHERE - Azure as the Infrastructure for Deep Learning
We just built a multilayer perceptron using only NumPy to solve a very small classification problem.
In reality, problems are much more complex, and the required accuracy must match or exceed human performance. To achieve this, we need three main things: a massive compute and storage platform with GPUs (like Azure), a deep learning framework/library such as CNTK or TensorFlow, and a lot of data.

As of August 2017, this is what Microsoft Azure has to offer for Deep Learning:

![img](image/Azure-GPU-Roadmap.PNG)

![img](image/GPU-NCSeries.PNG)

![img](image/GPU-NDSeries.PNG)

![img](image/DSVM-DeepLearning.PNG)

![img](image/Msft-Cognitive-Services-API.PNG)

![img](image/Azure-Batch-AI.PNG)

### 3) HOW - Using TensorFlow (A Deep Learning backend library) in Azure DSVM ![tf](https://avatars-05.gitter.im/group/iv/3/57542c04c43b8c601976f1a5)

Throughout this lesson, you'll apply your knowledge of neural networks to real datasets using [TensorFlow](http://tensorflow.org), an open source Deep Learning library.
You’ll use TensorFlow to classify images from the notMNIST dataset, a dataset of images of English letters from A to J.

But before we start classifying images, let's first understand a little bit of TensorFlow, starting with a Hello World example and some basic concepts.

### Hello, world!
Try running the following code to make sure you have TensorFlow properly installed.


In [2]:
import warnings
import tensorflow as tf

# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

# Create TensorFlow objects called tensors
# String Constant
hello_constant = tf.constant('Hello World!')

# Number Constants
x = tf.constant(10)
y = tf.constant(2)

# Vector Constants
logit_data = [2.0, 1.0, 0.1]
one_hot_data = [1.0, 0.0, 0.0]

# Define placeholders: tensors whose values are supplied at run time through feed_dict
n = tf.placeholder(tf.int32)
logits = tf.placeholder(tf.float32)
one_hot = tf.placeholder(tf.float32)
output = None

#Define Variables
v = tf.Variable(5)

# Define Functions 
z = tf.subtract(tf.divide(x,y),tf.cast(tf.constant(1), tf.float64)) # z = x/y - 1
softmax = tf.nn.softmax(logits)
cross_entropy = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax)))

# Create Session and run commands
with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)
    
    output = sess.run(n, feed_dict={n: 123})
    print(output)
    
    output = sess.run(z)
    print(output)
    
    softmax_data = sess.run(softmax, feed_dict={logits: logit_data})
    print(softmax_data)
    
    output = sess.run(cross_entropy, feed_dict={softmax: softmax_data, one_hot: one_hot_data})
    print(output)

TensorFlow Version: 1.0.0
b'Hello World!'
123
4.0




[ 0.65900117  0.24243298  0.09856589]
0.41703


### 10 API calls to know to understand the basics of TensorFlow:
1. **Tensor**:<br>
In TensorFlow, data isn’t stored as integers, floats, or strings. These values are encapsulated in an object called a tensor. In the case of hello_constant = tf.constant('Hello World!'), hello_constant is a 0-dimensional string tensor, but tensors come in a variety of sizes as shown below:<br>
> A is a 0-dimensional int32 tensor<br>
> A = tf.constant(1234)<br>
> B is a 1-dimensional int32 tensor<br>
> B = tf.constant([123,456,789])<br>
> C is a 2-dimensional int32 tensor<br>
> C = tf.constant([ [123,456,789], [222,333,444] ])<br>
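
To see how the number of dimensions relates to the nesting of the data, here is a small pure-Python sketch that derives the shape of the values used for A, B and C above. The helper `tensor_shape` is hypothetical and is not a TensorFlow function; it assumes every sub-list at a given level has the same length.

```python
def tensor_shape(value):
    """Return the shape of a scalar or regularly nested list as a tuple."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        # Descend into the first element; assumes all siblings have equal length
        value = value[0] if value else None
    return tuple(shape)


a = 1234
b = [123, 456, 789]
c = [[123, 456, 789], [222, 333, 444]]

for name, value in (("A", a), ("B", b), ("C", c)):
    shape = tensor_shape(value)
    print(name, "dimensions:", len(shape), "shape:", shape)
```

The number of dimensions is simply the length of the shape tuple: 0 for a scalar, 1 for a vector, 2 for a matrix.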

2. **Session**: <br>
TensorFlow’s API is built around the idea of a computational graph. A session is an environment for running a graph. The session is in charge of allocating the operations to GPU(s) and/or CPU(s), including remote machines.

3. **tf.placeholder()**: <br>
Sadly, you can’t just set x to your dataset and put it in TensorFlow, because over time you'll want your TensorFlow model to take in different datasets with different parameters. You need tf.placeholder()! <br>
tf.placeholder() returns a tensor that gets its value from data passed to the tf.Session.run() function through its feed_dict argument, allowing you to set the input right before the session runs.

4. **Math functions**: <br>
>x = tf.add(5, 2)  # 7 <br>
x = tf.subtract(40, 30) # 10 <br>
y = tf.multiply(1, 5)  # 5 - Element wise multiplication<br>
z = tf.divide(x,y) # 2 <br>
m = [tf.matmul(a,b)](https://www.tensorflow.org/api_docs/python/tf/matmul) # Multiplies matrix a by matrix b, producing the matrix product a * b <br>
s = [tf.reduce_sum()](https://www.tensorflow.org/api_docs/python/tf/reduce_sum) # Computes the sum of elements across dimensions of a tensor<br>
l = [tf.log()](https://www.tensorflow.org/api_docs/python/tf/log) # Computes natural logarithm of x element-wise.
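
The difference between element-wise multiplication and matrix multiplication is easy to mix up, so here is a pure-Python sketch of both on 2x2 lists. The functions `multiply` and `matmul` below are illustrative stand-ins written for this example, not the TensorFlow implementations.

```python
def multiply(a, b):
    """Element-wise product of two equally shaped 2-D lists."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def matmul(a, b):
    """Matrix product: each row of a is dotted with each column of b."""
    columns_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns_b] for row in a]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

print(multiply(a, b))  # [[5, 12], [21, 32]]
print(matmul(a, b))    # [[19, 22], [43, 50]]
```

Neural network layers rely on the second form: a batch of inputs times a weight matrix.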

5. **tf.Variable()**:<br>
The goal of training a neural network is to modify weights and biases to best predict the labels. In order to use weights and biases, you'll need a tensor that can be modified. This rules out tf.placeholder() and tf.constant(), since those tensors can't be modified. This is where the tf.Variable class comes in.<br>
The tf.Variable class creates a tensor with an initial value that can be modified, much like a normal Python variable. This tensor stores its state in the session, so you must initialize the state of the tensor manually. You'll use the tf.global_variables_initializer() function to initialize the state of all the Variable tensors.<br><br>
The tf.global_variables_initializer() call returns an operation that will initialize all TensorFlow variables from the graph. You run the operation in a session to initialize all the variables, as shown below. Using the tf.Variable class allows us to change the weights and biases, but an initial value needs to be chosen.
```
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
```

6. **tf.truncated_normal()**:<br>
The [tf.truncated_normal()](https://www.tensorflow.org/api_docs/python/tf/truncated_normal) function returns a tensor with random values from a normal distribution whose magnitude is no more than 2 standard deviations from the mean. <br>
```
n_features = 120
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
```
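
To illustrate what "truncated" means here, the following pure-Python sketch draws normal samples and re-draws any value that falls more than 2 standard deviations from the mean. The function name `truncated_normal_samples` and the seed are assumptions for this example; this is not how TensorFlow implements it internally.

```python
import random


def truncated_normal_samples(n, mean=0.0, stddev=1.0, seed=None):
    """Draw n normal samples, re-drawing any that lie beyond 2 standard deviations."""
    rng = random.Random(seed)
    values = []
    while len(values) < n:
        sample = rng.gauss(mean, stddev)
        # Keep only samples within 2 standard deviations of the mean
        if abs(sample - mean) <= 2 * stddev:
            values.append(sample)
    return values


initial_weights = truncated_normal_samples(1000, seed=42)
print(min(initial_weights), max(initial_weights))
```

Keeping the initial weights small and bounded helps avoid saturating activations such as the sigmoid at the start of training.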

7. **tf.zeros()**:<br>
The [tf.zeros()](https://www.tensorflow.org/api_docs/python/tf/zeros) function returns a tensor with all zeros.

8. **tf.nn.softmax() and cross-entropy**:<br>
The softmax function squashes its inputs, typically called logits or logit scores, to be between 0 and 1 and also normalizes the outputs so that they sum to 1. This means the output of the softmax function is equivalent to a categorical probability distribution. It's the perfect function to use as the output activation for a network predicting multiple classes.<br>
Previously we've been using the sum of squared errors as the cost function in our networks, but in those cases we only had a single (scalar) output value.<br>
When you're using **softmax**, however, your output is a vector containing the probability values from the output units. You can also express your data labels as a vector using what's called **one-hot encoding**.<br>
This just means that you have a vector the length of the number of classes, and the label element is marked with a 1 while the other elements are set to 0. For example, when classifying handwritten digits, the label vector for an image of the number 4 would be:
```
y=[0,0,0,0,1,0,0,0,0,0]
```
And our output prediction vector could be something like
```
y^=[0.047,0.048,0.061,0.07,0.330,0.062,0.001,0.213,0.013,0.150]
```
We want our error to be proportional to how far apart these vectors are. To calculate this distance, we'll use the [cross entropy](https://en.wikipedia.org/wiki/Cross_entropy). Then, our goal when training the network is to make our prediction vectors as close as possible to the label vectors by minimizing the cross entropy. The cross entropy calculation is shown below:
```
cross_entropy = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax)))
```
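
As a cross-check, the same two calculations can be sketched in plain Python using the logits and one-hot label from the Hello World cell. The function names below are assumptions for this example, and subtracting the maximum logit is an added numerical-stability step that does not change the result.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the max avoids overflow in exp() without changing the result
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot, probabilities):
    """Negative sum of label * log(probability) over all classes."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probabilities))


logit_data = [2.0, 1.0, 0.1]
one_hot_data = [1.0, 0.0, 0.0]

probabilities = softmax(logit_data)
print([round(p, 4) for p in probabilities])                   # [0.659, 0.2424, 0.0986]
print(round(cross_entropy(one_hot_data, probabilities), 4))   # 0.417
```

These values agree with the softmax and cross-entropy printed by the TensorFlow session earlier.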

9. **Mini-Batching**:<br>
Mini-batching is a technique for training on subsets of the dataset instead of all the data at one time. This provides the ability to train a model, even if a computer lacks the memory to store the entire dataset.<br>
Mini-batching is computationally inefficient, since you can't calculate the loss simultaneously across all samples. However, this is a small price to pay in order to be able to run the model at all.<br>
Unfortunately, it's sometimes impossible to divide the data into batches of exactly equal size. For example, imagine you'd like to create batches of 128 samples each from a dataset of 1000 samples. Since 1000 is not evenly divisible by 128, you'd wind up with 7 batches of 128 samples and 1 batch of 104 samples (`7*128 + 1*104 = 1000`).<br>
In that case, the size of the batches would vary, so you need to take advantage of TensorFlow's tf.placeholder() function to receive the varying batch sizes.
```
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])
```
What does None do here?
The None dimension is a placeholder for the batch size. At runtime, TensorFlow will accept any batch size greater than 0.
Going back to our earlier example, this setup allows you to feed features and labels into the model as either the batches of 128 samples or the single batch of 104 samples.
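
The batch arithmetic from the example can be sketched with a small helper. The function name `batches` and the dummy data are assumptions for illustration only.

```python
def batches(batch_size, features, labels):
    """Split features and labels into consecutive batches of at most batch_size."""
    assert len(features) == len(labels)
    return [
        (features[start:start + batch_size], labels[start:start + batch_size])
        for start in range(0, len(features), batch_size)
    ]


# Dummy dataset of 1000 samples (assumed for illustration)
example_features = list(range(1000))
example_labels = [x % 2 for x in example_features]

batch_sizes = [len(f) for f, _ in batches(128, example_features, example_labels)]
print(batch_sizes)  # [128, 128, 128, 128, 128, 128, 128, 104]
```

Each `(features, labels)` pair can then be passed to the placeholders through `feed_dict`, regardless of whether it holds 128 or 104 samples.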

10. **Epochs**:<br>
An epoch is a single forward and backward pass of the whole dataset. Training for multiple epochs is used to increase the accuracy of the model without requiring more data.
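
To show how epochs and mini-batches fit together, here is a minimal sketch that fits a single weight `w` in the model `y = w * x` with mini-batch gradient descent. The toy data, learning rate, and function names are assumptions for this example and are unrelated to the notMNIST lab.

```python
def mean_squared_error(w, xs, ys):
    """Average squared difference between predictions w * x and targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, epochs, batch_size=4, learnrate=0.01):
    """Run mini-batch gradient descent and record the loss after every epoch."""
    w = 0.0
    loss_history = []
    for _ in range(epochs):
        # One epoch = one pass over every mini-batch in the dataset
        for start in range(0, len(xs), batch_size):
            batch_x = xs[start:start + batch_size]
            batch_y = ys[start:start + batch_size]
            gradient = sum(2 * (w * x - y) * x for x, y in zip(batch_x, batch_y)) / len(batch_x)
            w -= learnrate * gradient
        loss_history.append(mean_squared_error(w, xs, ys))
    return w, loss_history


# Toy data generated from y = 2x (assumed for illustration)
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [2.0 * x for x in xs]

w, loss_history = train(xs, ys, epochs=5)
print(w)
print(loss_history)
```

The loss recorded after each epoch shrinks even though the data never changes, which is the point of training for more than one epoch.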


### Build a classifier on the notMNIST dataset using the 10 basic TensorFlow API calls above:

Now that we have a great library for building neural networks (better and easier than NumPy alone), let's put all the above concepts to use and build a single-layer (input and output) neural network to classify images!

Click [here](Meetup8-Lab-notMNIST-Tensorflow.ipynb) to open the notebook.