# Advanced Deep Learning Topics

### * Word2Vec
### * Sequence Models
### * Autoencoders
### * Variational Autoencoders
### * GANs

## Word2Vec

* What even is meaning???

In [5]:
from sklearn.feature_extraction.text import CountVectorizer
sentence = 'the dog had puppies and the cat had kittens'
bow = CountVectorizer().fit([sentence])
sentence_transformed = bow.transform([sentence])

<img src = "./resources/dog_cat.png">

**What about TF-IDF??**

TF-IDF is an attempt to get some more meaning, but it is unable to do anything more than determine which words are important. It does not actually give us any insight into the meaning of words in relation to one another.
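
To make that limitation concrete, here is a minimal sketch that computes TF-IDF by hand on a toy corpus. The three sentences are made up for illustration, and the plain `tf * log(N / df)` weighting is an assumption (library implementations such as scikit-learn's add smoothing and normalization). The scores rank words by importance within a document, but nothing in them says that *puppies* relates to *dog* the way *kittens* relates to *cat*.

```python
import math
from collections import Counter

# Toy corpus (illustrative only)
corpus = [
    'the dog had puppies',
    'the cat had kittens',
    'the dog chased the cat',
]
docs = [sentence.split() for sentence in corpus]
n_docs = len(docs)

# Document frequency: in how many documents does each term appear?
df = Counter(term for doc in docs for term in set(doc))

def tfidf(doc):
    """Unsmoothed TF-IDF: (term count / doc length) * log(N / document frequency)."""
    tf = Counter(doc)
    return {term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()}

scores = tfidf(docs[0])
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f'{term:8s} {score:.3f}')
```

A word that appears in every document (`the`) scores zero, and a word unique to one document (`puppies`) scores highest, but each word is still just an independent column.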

### What if we were able to create a model that could determine how words relate to one another??

<img src="./resources/vec-relationships.png">

### What's happening under the hood




#### Training the model
* Every word will have a dense vector

Skip-Grams
* Predict context words given a certain target word (position independent)
* For each word, what is the probability of the other words within a radius *m*?
* Use softmax to obtain the probability of a context word *o* given the center word *c*
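
The windowing step above can be sketched in a few lines. This is an illustrative helper (the name `skipgram_pairs` is hypothetical, not from any library) that turns a tokenized sentence into the (target, context) training pairs a skip-gram model would see for a given radius *m*.

```python
def skipgram_pairs(tokens, m=2):
    """Return (target, context) pairs for every word within radius m of each target."""
    pairs = []
    for t, target in enumerate(tokens):
        start = max(0, t - m)
        stop = min(len(tokens), t + m + 1)
        for j in range(start, stop):
            if j != t:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

tokens = 'the dog had puppies and the cat had kittens'.split()
pairs = skipgram_pairs(tokens, m=1)
print(pairs[:3])
```

Position within the window is discarded: a context word one step to the left is treated the same as one step to the right.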


$p(\text{context} \mid w_{t}) = \dots$  
We are minimizing a loss function $J = 1 - p(w_{-t} \mid w_{t})$, where $w_{-t}$ denotes the context words around the target word $w_{t}$.
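
For reference, one common way to write the skip-gram softmax and its training objective is sketched below. The notation is an assumption rather than something from these slides: $v_c$ is the vector of the center word, $u_o$ the vector of a context word, $V$ the vocabulary, $T$ the number of words in the corpus, and $m$ the window radius. This version minimizes the average negative log-probability instead of $1 - p$; both get smaller as the model assigns more probability to the true context words.

```latex
p(o \mid c) = \frac{\exp\left(u_o^{\top} v_c\right)}{\sum_{w \in V} \exp\left(u_w^{\top} v_c\right)}
\qquad
J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log p\left(w_{t+j} \mid w_{t}\right)
```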


<img src = "./resources/skip_grams.png">

<img src="./resources/softmax-nplm.png">

##### How can we best make use of it?

Use transfer learning with pretrained GloVe vectors!! https://nlp.stanford.edu/projects/glove/  

**Glo**bal **Ve**ctors for word representation  

cool visual example of words in space: https://projector.tensorflow.org/

analogies with word2vec: http://bionlp-www.utu.fi/wv_demo/
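
The analogy trick boils down to vector arithmetic plus a nearest-neighbor search. Here is a minimal sketch using hand-made 2-D vectors; the numbers are invented for illustration (real embeddings have hundreds of dimensions and are learned from data), and the helper names are hypothetical.

```python
import numpy as np

# Hand-made 2-D vectors (axis 0 ~ "royalty", axis 1 ~ "gender"); purely illustrative
vectors = {
    'king':  np.array([0.9, 0.9]),
    'queen': np.array([0.9, -0.9]),
    'man':   np.array([0.1, 0.9]),
    'woman': np.array([0.1, -0.9]),
    'apple': np.array([-0.8, 0.1]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    """Solve a : b :: c : ? by finding the word closest to vec(b) - vec(a) + vec(c)."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy('man', 'king', 'woman')  # man : king :: woman : ?
print(result)
```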

In [None]:
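### what Word2Vec code looks like

from gensim.models import Word2Vec

# `data` is an iterable of tokenized sentences; this toy corpus is only a placeholder
data = [['the', 'dog', 'had', 'puppies'], ['the', 'cat', 'had', 'kittens']]

# Passing sentences to the constructor builds the vocabulary and trains the model
# (gensim >= 4.0 uses `vector_size`; older versions called this argument `size`)
model = Word2Vec(data, vector_size=100, window=5, min_count=1, workers=4)
# Optionally continue training for more epochs on the same corpus
model.train(data, total_examples=model.corpus_count, epochs=10)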





## Sequence Models (Recurrent Neural Networks)

### Applications

* Machine Translation
* Time series predictions
* Speech recognition
* Music composition (https://soundcloud.com/rapping_neural_network/networks-with-attitude)
* Rhythm learning

### What's happening under the hood
BackPropagation Through Time: http://ir.hit.edu.cn/~jguo/docs/notes/bptt.pdf
<img src = "./resources/Recurrent_neural_network_unfold.svg">
<img src = "./resources/unfolded.png">

##### Wow this is going to be a lot of different layers, especially if we have numerous recurrent nodes. Can you foresee any issues with this?
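
One issue worth checking numerically is what happens to gradients as they flow back through many time steps. The sketch below is a toy setup with assumed sizes and random weights, not the network from the figures: it unrolls a tanh RNN and tracks the size of the Jacobian of the current hidden state with respect to the initial one. The recurrent weights are rescaled to have spectral norm 0.5, so each extra step can only shrink the gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, features, steps = 8, 3, 20

W_hh = rng.normal(size=(hidden, hidden))
W_hh *= 0.5 / np.linalg.norm(W_hh, 2)  # rescale so the spectral norm is 0.5
W_xh = rng.normal(size=(hidden, features))
xs = rng.normal(size=(steps, features))

h = np.zeros(hidden)
grad = np.eye(hidden)  # running product: d h_t / d h_0
norms = []
for x in xs:
    h = np.tanh(W_hh @ h + W_xh @ x)
    jac = np.diag(1.0 - h ** 2) @ W_hh  # d h_t / d h_{t-1}
    grad = jac @ grad
    norms.append(np.linalg.norm(grad, 2))

print(norms[0], norms[-1])
```

The gradient norm decays geometrically with the number of steps (vanishing gradients); with larger recurrent weights the same product can blow up instead (exploding gradients). Gated units such as the LSTM and GRU were designed to ease this.
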
#### LSTM

<img src ="./resources/Long_Short-Term_Memory.svg">

* Input Gate: Determines how much of the new candidate information should be written into the cell state
* Forget Gate: Determines how much of the cell state passed along from the previous time step should be kept (and how much forgotten)
* Output Gate: Determines how much of the current cell state should be exposed as the hidden state, which goes to the next time step and to the next layers of the network
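
The three gates can be sketched as a single LSTM time step in NumPy. This is a minimal illustration under assumed conventions (the stacked weight layout and the names `W`, `U`, `b`, `lstm_step` are hypothetical), not the internals of any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4*hidden, features), U: (4*hidden, hidden), b: (4*hidden,)."""
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:hidden])                 # input gate: how much new information to write
    f = sigmoid(z[hidden:2 * hidden])       # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate: how much cell state to expose
    g = np.tanh(z[3 * hidden:])             # candidate values to write
    c = f * c_prev + i * g                  # new cell state
    h = o * np.tanh(c)                      # new hidden state
    return h, c

rng = np.random.default_rng(0)
features, hidden = 3, 4
W = rng.normal(size=(4 * hidden, features))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h, c = lstm_step(rng.normal(size=features), np.zeros(hidden), np.zeros(hidden), W, U, b)
print(h.shape, c.shape)
```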


In [None]:
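## LSTM in code
# Imports assume TensorFlow's bundled Keras; with standalone Keras, import from `keras` instead
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, GlobalMaxPool1D, Dropout, Dense

lstm_model = Sequential()
lstm_model.add(Embedding(20000, 128))            # 20,000-word vocabulary -> 128-dim vectors
lstm_model.add(LSTM(50, return_sequences=True))  # keep the output at every time step
lstm_model.add(GlobalMaxPool1D())                # max over time for each of the 50 features
lstm_model.add(Dropout(0.5))
lstm_model.add(Dense(50, activation='relu'))
lstm_model.add(Dropout(0.5))
lstm_model.add(Dense(20, activation='softmax'))  # probabilities over 20 classes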

#### GRU (Gated Recurrent Unit)

<img src = "./resources/Gated_Recurrent_Unit.svg">

* Reset Gate: Determines how much of the previous hidden state should be used when computing the new candidate state.
* Update Gate: Determines how much of the state from the previous time step should be carried over into the current time step, versus replaced by the new candidate state
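
As with the LSTM, the two gates can be sketched as one GRU time step in NumPy. The parameter layout and the name `gru_step` are assumptions for illustration, and the final blend follows the convention where an update gate near 1 keeps the old state (some references write it the other way around).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, params):
    """One GRU step. params maps 'z', 'r', 'h' to (W, U, b) tuples."""
    W_z, U_z, b_z = params['z']
    W_r, U_r, b_r = params['r']
    W_h, U_h, b_h = params['h']
    z = sigmoid(W_z @ x + U_z @ h_prev + b_z)               # update gate
    r = sigmoid(W_r @ x + U_r @ h_prev + b_r)               # reset gate
    h_tilde = np.tanh(W_h @ x + U_h @ (r * h_prev) + b_h)   # candidate state
    return z * h_prev + (1.0 - z) * h_tilde                 # blend old state and candidate

rng = np.random.default_rng(0)
features, hidden = 3, 4
params = {
    name: (rng.normal(size=(hidden, features)),
           rng.normal(size=(hidden, hidden)),
           np.zeros(hidden))
    for name in ('z', 'r', 'h')
}

h = gru_step(rng.normal(size=features), np.zeros(hidden), params)
print(h.shape)
```

Unlike the LSTM, there is no separate cell state: the hidden state is the only thing passed between time steps.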


In [None]:
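# GRU Model in code
# Imports assume TensorFlow's bundled Keras; with standalone Keras, import from `keras` instead
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, GlobalMaxPool1D, Dropout, Dense

gru_model = Sequential()
gru_model.add(Embedding(20000, 128))            # 20,000-word vocabulary -> 128-dim vectors
gru_model.add(GRU(50, return_sequences=True))   # keep the output at every time step
gru_model.add(GlobalMaxPool1D())                # max over time for each of the 50 features
gru_model.add(Dropout(0.5))
gru_model.add(Dense(50, activation='relu'))
gru_model.add(Dropout(0.5))
gru_model.add(Dense(20, activation='softmax'))  # probabilities over 20 classes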

#### Which is better????????
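* It depends on the context!! GRUs have fewer parameters (two gates instead of three) and are often faster to train, while LSTMs keep a separate cell state that can help on longer sequences. Try both and compare on validation data.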
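
One concrete difference is size. The sketch below counts weights using the textbook formulas, which is an assumption about the exact cell definition: four weight blocks for the LSTM (three gates plus the candidate) and three for the GRU (two gates plus the candidate). Some library implementations keep an extra bias vector, so a model summary may report slightly more.

```python
def lstm_params(input_dim, units):
    """4 blocks, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)

def gru_params(input_dim, units):
    """3 blocks, each with input weights, recurrent weights, and a bias."""
    return 3 * (units * (input_dim + units) + units)

# Same sizes as the models above: 128-dim embeddings feeding 50 recurrent units
print(lstm_params(128, 50), gru_params(128, 50))
```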

### Bidirectional Sequence Models

Bidirectional models work extremely well for NLP tasks. Within each bidirectional layer, one copy of the recurrent layer reads the sequence from start to end, and a second copy reads it in reverse! The model then uses a formula to combine the outputs of the forward-in-time and backward-in-time copies at each time step. We can choose this formula through the `merge_mode` hyperparameter of the Keras `Bidirectional` class (`'concat'` by default; `'sum'`, `'mul'`, and `'ave'` are also available):  

Bidirectional(merge_mode= )
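
To see what the merge formula does, here is a minimal NumPy sketch. It uses a plain tanh RNN with random weights as a stand-in for the LSTM (an assumption made to keep the example short), runs it once forward and once on the reversed sequence, and combines the two outputs in the four ways named above.

```python
import numpy as np

def run_rnn(xs, W_xh, W_hh):
    """Run a plain tanh RNN over xs and return the hidden state at every step."""
    h = np.zeros(W_hh.shape[0])
    outputs = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
steps, features, units = 5, 10, 4
xs = rng.normal(size=(steps, features))

# Each direction gets its own weights
fwd = (0.1 * rng.normal(size=(units, features)), 0.1 * rng.normal(size=(units, units)))
bwd = (0.1 * rng.normal(size=(units, features)), 0.1 * rng.normal(size=(units, units)))

forward = run_rnn(xs, *fwd)
backward = run_rnn(xs[::-1], *bwd)[::-1]  # flip back so time steps line up

merged = {
    'concat': np.concatenate([forward, backward], axis=-1),
    'sum': forward + backward,
    'mul': forward * backward,
    'ave': (forward + backward) / 2.0,
}
for mode, out in merged.items():
    print(mode, out.shape)
```

Only `concat` doubles the feature dimension; the other modes keep it at `units`.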

<img src ="./resources/bidirectional.png">

In [None]:
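from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True),
                        input_shape=(5, 10)))  # 5 time steps, 10 features each
model.add(Bidirectional(LSTM(10)))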


## Autoencoders

A method of unsupervised (or semi-supervised) learning with a neural network. It is useful for:  
* feature extraction and dimensionality reduction
* denoising images

<img src="./resources/enhance.gif">

Three main components:  
1. Encoder:  Takes the original input and compresses it into a latent space representation.  
2. Code:  The compressed version of the original input.  
3. Decoder: Decodes the coded data back to an approximation of the original input.  
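
The three components can be sketched as a forward pass in NumPy. The weights here are random and untrained, and the sizes (784 inputs, a 32-dimensional code) are assumptions chosen to match a flattened 28x28 image; the point is only to show the shapes at each stage and how the reconstruction error is measured.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, encoding_dim = 784, 32

# Randomly initialised (untrained) parameters
W_enc = rng.normal(scale=0.01, size=(input_dim, encoding_dim))
b_enc = np.zeros(encoding_dim)
W_dec = rng.normal(scale=0.01, size=(encoding_dim, input_dim))
b_dec = np.zeros(input_dim)

def encode(x):
    """Encoder: compress the input into the code with a ReLU layer."""
    return np.maximum(0.0, x @ W_enc + b_enc)

def decode(code):
    """Decoder: map the code back to input space with a sigmoid layer."""
    return 1.0 / (1.0 + np.exp(-(code @ W_dec + b_dec)))

x = rng.random((5, input_dim))       # 5 fake "images" with pixel values in [0, 1)
code = encode(x)                     # shape (5, 32)
reconstruction = decode(code)        # shape (5, 784)
mse = np.mean((x - reconstruction) ** 2)
print(code.shape, reconstruction.shape, mse)
```

Training would adjust the encoder and decoder weights to push this reconstruction error down.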

### What's happening under the hood

<img src ="./resources/autoencoder.png">

Hyperparameters to tune:
* Number of layers
* Number of nodes per layer: typically the number of nodes ends up forming an hour-glass shape and is symmetrical
* Loss function: mean squared error or binary cross-entropy



In [None]:
# Imports assume TensorFlow's bundled Keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Identify input and encoding dimensions
# (x_train is assumed to hold flattened images with shape (n_samples, 784))
input_dim = x_train.shape[1] # input dimension = 784
encoding_dim = 32

# Calculate the compression factor
compression_factor = float(input_dim) / encoding_dim
print("Compression factor: %s" % compression_factor)

# Build the autoencoder model
autoencoder = Sequential()
# Encoder Layer
autoencoder.add(Dense(encoding_dim, input_shape=(input_dim,), activation='relu'))
# Decoder Layer
autoencoder.add(Dense(input_dim, activation='sigmoid'))

# Show model summary
autoencoder.summary()


#### Denoising Autoencoder

<img src ="./resources/denoising_mnist.jpg">

How it's done:

In [None]:
# Imports assume TensorFlow's bundled Keras
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Create noisy train and test datasets
noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)

# Clip the datasets so pixel values stay in [0, 1]
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)

# Input dimensions (assumes x_train holds flattened images)
input_shape = (x_train.shape[1],)
x = Input(name='inputs', shape=input_shape, dtype='float32')

# Encoder model
encoder = Dense(32, activation='relu')(x)

# Decoder model
decoder = Dense(input_shape[0], activation='sigmoid')(encoder)

# Build the model and print the network summary
DAE = Model(inputs=x, outputs=decoder)
DAE.summary()


# Compile the model with given parameters
batch_size = 128
epochs = 30
DAE.compile(optimizer='adam', loss='binary_crossentropy')

# Fit the data 
DAE.fit(x_train_noisy, x_train, 
        epochs=epochs, 
        batch_size=batch_size,
        shuffle=True, 
        validation_data=(x_test_noisy, x_test))

# Predict images
decoded_imgs = DAE.predict(x_test_noisy)

#### Convolutional AutoEncoder

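An autoencoder that works even better for images! Same idea as the autoencoder above, except the encoder and decoder are built from convolutional layers, so the parameters being learned are convolution filters rather than dense-layer weights.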

Refresher on Convolutions in CNNs
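
As a quick refresher in code, here is a minimal sketch of the sliding-window operation a convolutional layer applies (no padding, stride 1, a single channel). The function name `conv2d_valid` and the tiny 4x4 image are illustrative choices; real layers also learn the kernel values and add a bias and an activation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the elementwise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))  # a 2x2 "sum" filter
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```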

<img src = "./resources/convolution.gif">


How does this work in the context of ConvAutoEncoders???

<img src ="./resources/conv-autoencoder.png">

## Variational AutoEncoders

<img src = "./resources/faces_transformed.jpg">

#### Why not standard AutoEncoders?

* Autoencoders work incredibly well for recreating images, but they cannot produce anything other than a reconstruction of the original image
* The latent space created by standard autoencoders is frequently not continuous: encodings fall into separate clusters with gaps in between, and points in those gaps do not decode to anything meaningful.

<img src="./resources/clus_encoded.png">


What if we wanted to try to estimate the areas where there was no data? In order to do so, we can sample from continuous distributions.
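
Sampling from a continuous distribution can be sketched as follows. In a typical VAE the encoder outputs a mean and a log-variance for each input, a latent point is drawn from that Gaussian, and a KL term pulls the distribution toward a standard normal so the latent space has no large gaps. The arrays below are random stand-ins for encoder outputs (an assumption, since no VAE is built here), and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 2

# Stand-ins for what an encoder would output for a batch of 4 inputs
mu = rng.normal(size=(4, latent_dim))
log_var = rng.normal(scale=0.1, size=(4, latent_dim))

def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, sigma^2) and N(0, 1), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
print(z.shape, kl)
```

Because every input maps to a region rather than a single point, nearby latent points decode to similar outputs, which is what makes generating new samples possible.
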
<img src="./resources/vae1.png">

## GANs (Generative Adversarial Network)

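A GAN pits a generator and a discriminator against each other: the generator tries to produce samples that look real, and the discriminator tries to tell them apart from real data. Frequently, networks like the ones above (for example, autoencoder-style encoder-decoders) are components within GANs.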
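
The tug-of-war between the two networks shows up in their loss functions. Below is a minimal sketch of the usual binary cross-entropy losses, assuming the discriminator outputs a probability that its input is real. The probabilities are made-up numbers standing in for real network outputs, and the generator loss is the commonly used non-saturating form.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """The discriminator wants D(real) near 1 and D(fake) near 0."""
    return float(-np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    """The generator wants the discriminator to score its fakes near 1."""
    return float(-np.mean(np.log(d_fake)))

# Made-up discriminator outputs: it is currently winning
d_real = np.array([0.9, 0.9])  # scores on real samples
d_fake = np.array([0.1, 0.1])  # scores on generated samples

d_loss = discriminator_loss(d_real, d_fake)
g_loss = generator_loss(d_fake)
print(d_loss, g_loss)
```

When the discriminator is winning, its loss is small and the generator's is large; training alternates updates so that each side keeps pressure on the other.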

<img src="./resources/GANs.png">

#### Deep Fakes

<img src="./resources/deep_fake.webp">

#### Style Transfer
<img src="./resources/style_transfer.jpg">

Cool applications: https://affinelayer.com/pixsrv/  
Learn more about GANs: https://www.youtube.com/watch?v=9JpdAg6uMXs&t=1s

## So if you really like deep learning stuff and you don't want to get a PhD......

Industry needs people who can work in Artificial Intelligence without spending 5 years getting a PhD. Check out these new residency programs/fellowships:

* https://github.com/dangkhoasdc/awesome-ai-residency

Quote from the Microsoft AI Residency:

"We are searching for a diverse range of researchers, engineers, and applied scientists with **unique perspectives, including candidates who may not have a traditional background in AI, but who are passionate about working on AI technologies to solve real-world challenges.**"


More resources: https://github.com/terryum/awesome-deep-learning-papers#understanding--generalization--transfer