# Introduction to TensorFlow NLP

## Topics covered in this section

* Typical RNN Architecture (220-223)
* Download Sample Data (224)
* Visualize Text Data (225)
* Train and Test Data Split (226)
* Converting text into numbers using tokenization (227 - 229)
* Turning our tokenized text into an embedding (230)
* Modelling Experiments (231)
  * Model_0 - Naive Bayes with TF-IDF encoder (232)
    * Custom Function to visualize Machine Learning Metrics (233)
  * Model_1 - Feed Forward Neural Network (234)
    * Visualize Learned Embeddings (235)
  * Overview Of RNN (236)
  * Model_2 - LSTM RNN Model (237)
  * Model_3 - GRU RNN Model (238)
  * Model_4 - Bidirectional RNN Model (239)
  * Model_5 - 1D Convolution Neural Network for Text and Sequences (240 - 241)
  * Transfer Learning for NLP (242)
  * Model_6 - Build, Train & Evaluate transfer learning model for NLP (243)
      * Common Ways to improve Model Performance (244)
  * Model_7 - Build, Train and Evaluate transfer learning with 10% Data (244 - 245)
      * Fix data leakage issue (246)
* Comparing Modelling Experiments Evaluation Metrics (247)
* Upload training logs to tensorboard and compare (248)
* Saving and loading a trained NLP model (249)
* Visualize the most wrong predictions (250 - 251)
* Visualize prediction on test dataset (252)
* Speed/Score trade off (253)

## RNN Architecture (220 - 223)
![RNN Architecture](https://raw.githubusercontent.com/rkumar-bengaluru/data-science/main/20-Tensorflow/06-NLP/rnn-architecture.png)


## Download Sample Data (224)

* Dataset: https://storage.googleapis.com/ztm_tf_course/nlp_getting_started.zip
* Write a function to download the dataset above.
* Unzip the zip file.

In [1]:
# write code to download and extract data
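
A minimal sketch of the download step using only the Python standard library. The helper name `download_and_unzip` and its `dest_dir` parameter are illustrative assumptions, not part of the course code; the helper skips the download if the archive is already on disk.

```python
import os
import urllib.request
import zipfile
from typing import List

DATA_URL = "https://storage.googleapis.com/ztm_tf_course/nlp_getting_started.zip"


def download_and_unzip(url: str = DATA_URL, dest_dir: str = ".") -> List[str]:
    """Download the zip archive at `url` into `dest_dir`, extract it there,
    and return the names of the extracted files (hypothetical helper)."""
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, os.path.basename(url))

    # Only download if the archive is not already on disk
    if not os.path.exists(zip_path):
        urllib.request.urlretrieve(url, zip_path)

    # Extract every member of the archive into dest_dir
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```

Calling `download_and_unzip()` with no arguments fetches the dataset above into the current directory and lists the files it contained.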

## Visualize Text Data (225)

* Load the data into pandas DataFrames.
* Shuffle the training DataFrame.
* Check the class (label) counts in the dataset.
* Write a function that prints 5 random samples with their target label and text.

In [2]:
# write code for visualize data
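
The steps above can be sketched with pandas as follows. This assumes the training CSV is named `train.csv` and has `text` and `target` columns; the helper names `load_shuffled` and `show_random_samples` are hypothetical.

```python
import pandas as pd


def load_shuffled(train_path: str = "train.csv", seed: int = 42) -> pd.DataFrame:
    """Read the training CSV and return all of its rows in a shuffled order."""
    train_df = pd.read_csv(train_path)
    # frac=1 samples every row exactly once, i.e. a shuffle
    return train_df.sample(frac=1, random_state=seed).reset_index(drop=True)


def show_random_samples(df: pd.DataFrame, n: int = 5, seed=None) -> pd.DataFrame:
    """Print `n` random rows with their target label and text."""
    samples = df[["text", "target"]].sample(n=n, random_state=seed)
    for text, target in samples.itertuples(index=False):
        print(f"Target: {target}\nText: {text}\n---")
    return samples


# Example usage (assumes train.csv is in the working directory):
# train_df_shuffled = load_shuffled("train.csv")
# print(train_df_shuffled["target"].value_counts())  # class balance
# show_random_samples(train_df_shuffled)
```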

## Train and Test Data Split (226)

* Use scikit-learn's `train_test_split`

In [3]:
# Write code to split data into train and validation data set
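
A minimal sketch with `sklearn.model_selection.train_test_split`, assuming the shuffled DataFrame has `text` and `target` columns and that 10% is held out for validation (both are assumptions). `split_train_val` is a hypothetical helper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_train_val(df: pd.DataFrame, val_size: float = 0.1, seed: int = 42):
    """Split a DataFrame's text/target columns into train and validation arrays.

    Returns (train_sentences, val_sentences, train_labels, val_labels)."""
    return train_test_split(
        df["text"].to_numpy(),
        df["target"].to_numpy(),
        test_size=val_size,   # fraction of rows held out for validation
        random_state=seed,    # makes the split reproducible
    )


# train_sentences, val_sentences, train_labels, val_labels = split_train_val(train_df_shuffled)
```

Passing `stratify=df["target"]` as well would keep the class balance identical in both splits, if that matters for your experiments.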

## Converting text into numbers using tokenization (227 - 229)

* Tokenization vs Embedding
* https://www.tensorflow.org/text/guide/word_embeddings
* https://tfhub.dev/s?module-type=text-embedding
* http://jalammar.github.io/illustrated-word2vec/

In [4]:
# Write code for text vectorization using TensorFlow
# Call the TextVectorization layer's adapt() method on our training data.
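
A minimal sketch using `tf.keras.layers.TextVectorization`. `MAX_VOCAB_LENGTH`, `MAX_LENGTH` and the toy sentences are assumptions; in the notebook you would adapt on `train_sentences` from the split step. In `"int"` mode, id 0 is reserved for padding and id 1 for out-of-vocabulary words.

```python
import tensorflow as tf

# Toy stand-in for the real training sentences
train_sentences = [
    "forest fire near the town",
    "residents asked to evacuate after flood",
    "what a lovely sunny day",
    "this pizza is amazing",
]

MAX_VOCAB_LENGTH = 10000  # assumed cap on vocabulary size
MAX_LENGTH = 15           # assumed number of tokens kept per sentence

text_vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=MAX_VOCAB_LENGTH,
    standardize="lower_and_strip_punctuation",  # the default, spelled out for clarity
    split="whitespace",
    output_mode="int",                 # map each token to an integer id
    output_sequence_length=MAX_LENGTH,  # pad/truncate every sentence to MAX_LENGTH ids
)

# adapt() builds the vocabulary from the training text only
text_vectorizer.adapt(train_sentences)

vocab = text_vectorizer.get_vocabulary()
print(f"Vocabulary size: {len(vocab)}")
print(f"First tokens: {vocab[:5]}")
print(text_vectorizer(tf.constant(["There's a flood in my street!"])))
```

Adapting only on the training split keeps validation and test words from leaking into the vocabulary.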

## Turning our tokenized text into an embedding (230)

* Use the TensorFlow `Embedding` layer

In [5]:
# write code for embedding
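
A minimal sketch chaining a `TextVectorization` layer into `tf.keras.layers.Embedding`, so each token id becomes a trainable dense vector. The sizes (`MAX_VOCAB_LENGTH`, `MAX_LENGTH`, `EMBEDDING_DIM`) and the toy sentences are assumptions.

```python
import tensorflow as tf

MAX_VOCAB_LENGTH = 10000  # assumed
MAX_LENGTH = 15           # assumed
EMBEDDING_DIM = 128       # assumed size of each word vector

# Toy stand-in for the real training sentences
train_sentences = ["forest fire near the town", "what a lovely sunny day"]

text_vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=MAX_VOCAB_LENGTH, output_mode="int", output_sequence_length=MAX_LENGTH
)
text_vectorizer.adapt(train_sentences)

# Lookup table of MAX_VOCAB_LENGTH rows, each an EMBEDDING_DIM-dimensional vector
embedding = tf.keras.layers.Embedding(
    input_dim=MAX_VOCAB_LENGTH,  # number of distinct token ids the layer can look up
    output_dim=EMBEDDING_DIM,
    name="embedding_1",
)

sample = tf.constant(["smoke over the hills near town"])
sample_embedded = embedding(text_vectorizer(sample))
print(sample_embedded.shape)  # (1, 15, 128): batch, tokens, embedding dims
```

Setting `input_dim` to `len(text_vectorizer.get_vocabulary())` instead would avoid allocating rows for ids that never occur.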

## Modelling Experiments (231)

| Experiment No. | Model |
|--|--|
| 0 | Naive Bayes with TF-IDF encoder |
| 1 | Feed-forward neural network (dense model) |
| 2 | LSTM (RNN) |
| 3 | GRU (RNN) |
| 4 | Bidirectional LSTM (RNN) |
| 5 | 1D Convolutional Neural Network |
| 6 | TensorFlow Hub pretrained feature extractor |
| 7 | TensorFlow Hub pretrained feature extractor (10% of training data) |

### Model_0 - Naive Bayes with TF-IDF encoder (232)

* Reference
  * https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

In [6]:
# use scikit-learn for text classification
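
A minimal baseline sketch: a scikit-learn `Pipeline` that converts text to TF-IDF features with `TfidfVectorizer` and classifies them with `MultinomialNB`. The toy sentences and labels are placeholders for the real train/validation splits.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

model_0 = Pipeline([
    ("tfidf", TfidfVectorizer()),  # text -> TF-IDF weighted word counts
    ("clf", MultinomialNB()),      # multinomial Naive Bayes on those features
])

# Toy stand-ins for train/validation sentences and labels
train_sentences = [
    "forest fire spreading fast",
    "flood waters rising in town",
    "I love this sunny day",
    "great pizza with friends",
]
train_labels = [1, 1, 0, 0]
val_sentences = ["fire and flood", "sunny day with friends"]
val_labels = [1, 0]

model_0.fit(train_sentences, train_labels)
baseline_score = model_0.score(val_sentences, val_labels)  # mean accuracy
baseline_preds = model_0.predict(val_sentences)
print(f"Baseline accuracy: {baseline_score:.2%}")
```

Wrapping both steps in one `Pipeline` means the TF-IDF vocabulary is learned only inside `fit`, on the training sentences.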

#### Write Custom Function to visualize Machine Learning Metrics (233)

![Machine Learning Metrics](https://raw.githubusercontent.com/rkumar-bengaluru/data-science/main/20-Tensorflow/06-NLP/metrics.jpg)

Reference - https://scikit-learn.org/stable/modules/model_evaluation.html

In [7]:
# write a generic function for the metrics
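
A minimal sketch of a reusable metrics helper built on `sklearn.metrics`. The name `calculate_results`, reporting accuracy as a percentage, and `average="weighted"` are assumptions; pick the averaging that suits your label balance.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def calculate_results(y_true, y_pred):
    """Return accuracy (as a percentage) plus weighted precision, recall and F1."""
    accuracy = accuracy_score(y_true, y_pred) * 100
    # "weighted" averages the per-class scores, weighting each class by its support
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted"
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# e.g. baseline_results = calculate_results(val_labels, baseline_preds)
```

Returning a plain dict keeps every experiment's results in the same shape, which makes them easy to collect into one DataFrame later.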

### Model_1 - Feed Forward Neural Network (Dense Model) (234)

* Use the TensorBoard callback
* Use the ModelCheckpoint callback

In [8]:
# write code to create a feed forward neural network
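
A minimal sketch of a dense model over averaged word embeddings, trained with `tf.keras.callbacks.TensorBoard` and `tf.keras.callbacks.ModelCheckpoint`. The helper names, the `model_logs` / `model_checkpoints` directories (created in the working directory) and the toy data are assumptions. The checkpoint path ends in `.weights.h5` because recent Keras versions require that suffix when `save_weights_only=True`.

```python
import datetime
import os

import numpy as np
import tensorflow as tf


def create_callbacks(experiment_name, log_root="model_logs", ckpt_root="model_checkpoints"):
    """Hypothetical helper returning TensorBoard + ModelCheckpoint callbacks."""
    timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    log_dir = os.path.join(log_root, experiment_name, timestamp)
    os.makedirs(ckpt_root, exist_ok=True)
    tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
    checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
        filepath=os.path.join(ckpt_root, f"{experiment_name}.weights.h5"),
        monitor="val_accuracy",
        save_best_only=True,      # overwrite only when val_accuracy improves
        save_weights_only=True,
    )
    return [tensorboard_cb, checkpoint_cb]


def build_dense_model(text_vectorizer, vocab_size, embedding_dim=128):
    inputs = tf.keras.layers.Input(shape=(1,), dtype="string")  # one raw sentence per example
    x = text_vectorizer(inputs)                                  # sentence -> token ids
    x = tf.keras.layers.Embedding(vocab_size, embedding_dim, name="embedding")(x)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)              # average the word vectors
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # binary probability
    model = tf.keras.Model(inputs, outputs, name="model_1_dense")
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


# Toy data standing in for the real train/validation splits
train_sentences = np.array(["forest fire near town", "flood warning issued",
                            "lovely sunny day", "great pizza tonight"])
train_labels = np.array([1, 1, 0, 0], dtype="float32")
text_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000, output_sequence_length=15)
text_vectorizer.adapt(train_sentences)

model_1 = build_dense_model(text_vectorizer, vocab_size=1000)
# Shape (n, 1) to match Input(shape=(1,)); the toy set doubles as validation data here
train_ds = tf.data.Dataset.from_tensor_slices(
    (train_sentences.reshape(-1, 1), train_labels)).batch(2)
history_1 = model_1.fit(train_ds, epochs=2, validation_data=train_ds,
                        callbacks=create_callbacks("model_1_dense"), verbose=0)
```

Running `tensorboard --logdir model_logs` afterwards shows the curves for every experiment logged under that directory.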

#### Visualize Learned Embeddings (235)

* https://projector.tensorflow.org/
* https://www.tensorflow.org/text/guide/word_embeddings

## Overview Of RNN (236)

![RNN Architecture](https://raw.githubusercontent.com/rkumar-bengaluru/data-science/main/20-Tensorflow/06-NLP/rnn-arch.jpg)

* https://www.youtube.com/watch?v=ySEx_Bqxvvo&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=3&ab_channel=AlexanderAmini
* https://colah.github.io/posts/2015-08-Understanding-LSTMs/
* http://karpathy.github.io/2015/05/21/rnn-effectiveness/

## Model_2 - LSTM RNN Model (237)

In [9]:
# Write code for LSTM RNN Model
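
A minimal LSTM sketch. For brevity it takes the integer token ids that `TextVectorization` produces rather than raw strings; in the notebook you would put the vectorizer in front, as in a string-input model. `MAX_VOCAB_LENGTH`, `MAX_LENGTH` and the layer sizes are assumptions.

```python
import numpy as np
import tensorflow as tf

MAX_VOCAB_LENGTH = 10000  # assumed: same as the TextVectorization max_tokens
MAX_LENGTH = 15           # assumed: same as output_sequence_length


def build_lstm_model(vocab_size=MAX_VOCAB_LENGTH, seq_len=MAX_LENGTH, embedding_dim=128):
    # Input is the integer token ids produced by TextVectorization
    inputs = tf.keras.layers.Input(shape=(seq_len,), dtype="int32")
    x = tf.keras.layers.Embedding(vocab_size, embedding_dim, name="embedding_2")(inputs)
    # The LSTM reads the sequence token by token and, by default, returns only its
    # final hidden state; use return_sequences=True to stack another RNN layer on top
    x = tf.keras.layers.LSTM(64, name="lstm")(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs, name="model_2_LSTM")
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


model_2 = build_lstm_model()
token_ids = np.random.randint(0, MAX_VOCAB_LENGTH, size=(4, MAX_LENGTH), dtype=np.int32)
print(model_2(token_ids).shape)  # (4, 1): one probability per sequence
```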

## Model_3 - GRU RNN Model (238)

In [10]:
# Write code for GRU RNN Model
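
A minimal GRU sketch with the same assumed skeleton (token-id input, assumed sizes) and `tf.keras.layers.GRU` in place of the LSTM. A GRU uses two gates (update and reset) and no separate cell state, so it has fewer parameters than an LSTM with the same number of units.

```python
import numpy as np
import tensorflow as tf

MAX_VOCAB_LENGTH = 10000  # assumed
MAX_LENGTH = 15           # assumed


def build_gru_model(vocab_size=MAX_VOCAB_LENGTH, seq_len=MAX_LENGTH, embedding_dim=128):
    inputs = tf.keras.layers.Input(shape=(seq_len,), dtype="int32")  # token ids
    x = tf.keras.layers.Embedding(vocab_size, embedding_dim, name="embedding_3")(inputs)
    x = tf.keras.layers.GRU(64, name="gru")(x)  # final hidden state only
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs, name="model_3_GRU")
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


model_3 = build_gru_model()
token_ids = np.random.randint(0, MAX_VOCAB_LENGTH, size=(4, MAX_LENGTH), dtype=np.int32)
print(model_3.get_layer("gru").count_params())  # compare with an LSTM(64) layer
```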

## Model_4 - Bidirectional RNN Model (239)

In [11]:
# Write code for Bi-Directional RNN Model
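
A minimal bidirectional sketch, again assuming token-id input and the sizes used for the other RNNs. `tf.keras.layers.Bidirectional` runs one copy of the wrapped LSTM forwards and one backwards over the sequence and, by default, concatenates their outputs, so `LSTM(64)` yields a 128-dimensional vector.

```python
import numpy as np
import tensorflow as tf

MAX_VOCAB_LENGTH = 10000  # assumed
MAX_LENGTH = 15           # assumed


def build_bidirectional_model(vocab_size=MAX_VOCAB_LENGTH, seq_len=MAX_LENGTH, embedding_dim=128):
    inputs = tf.keras.layers.Input(shape=(seq_len,), dtype="int32")  # token ids
    x = tf.keras.layers.Embedding(vocab_size, embedding_dim, name="embedding_4")(inputs)
    # Forward and backward passes, concatenated (merge_mode="concat" is the default)
    x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64), name="bi_lstm")(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs, name="model_4_bidirectional")
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


model_4 = build_bidirectional_model()
token_ids = np.random.randint(0, MAX_VOCAB_LENGTH, size=(4, MAX_LENGTH), dtype=np.int32)
print(model_4.get_layer("bi_lstm").output.shape)  # (None, 128)
```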

## Model_5 - 1D Convolution Neural Network for Text and Sequences (240 - 241)

### Architecture
![1D CNN Architecture for Text and Sequences](https://raw.githubusercontent.com/rkumar-bengaluru/data-science/main/20-Tensorflow/06-NLP/cnn-architecture-text-sequence.jpg)

Reference - https://poloclub.github.io/cnn-explainer/

In [12]:
# Write code for Conv1D Neural Network
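
A minimal sketch of the embedding, `Conv1D`, global-pooling and `Dense` stack from the architecture figure, assuming token-id input and illustrative sizes (32 filters, kernel size 5). With `padding="valid"`, a 15-token sequence gives 15 - 5 + 1 = 11 window positions per filter.

```python
import numpy as np
import tensorflow as tf

MAX_VOCAB_LENGTH = 10000  # assumed
MAX_LENGTH = 15           # assumed


def build_conv1d_model(vocab_size=MAX_VOCAB_LENGTH, seq_len=MAX_LENGTH, embedding_dim=128):
    inputs = tf.keras.layers.Input(shape=(seq_len,), dtype="int32")  # token ids
    x = tf.keras.layers.Embedding(vocab_size, embedding_dim, name="embedding_5")(inputs)
    # 32 filters, each sliding over windows of 5 consecutive token embeddings
    x = tf.keras.layers.Conv1D(filters=32, kernel_size=5, padding="valid",
                               activation="relu", name="conv1d")(x)
    # Keep each filter's strongest response anywhere in the sentence
    x = tf.keras.layers.GlobalMaxPooling1D()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs, name="model_5_Conv1D")
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


model_5 = build_conv1d_model()
token_ids = np.random.randint(0, MAX_VOCAB_LENGTH, size=(4, MAX_LENGTH), dtype=np.int32)
print(model_5.get_layer("conv1d").output.shape)  # (None, 11, 32)
```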

## Transfer Learning for NLP (242)

### Architecture

![NLP Feature Extraction](https://raw.githubusercontent.com/rkumar-bengaluru/data-science/main/20-Tensorflow/06-NLP/nlp-feature-extraction.jpg)

Reference - https://tfhub.dev/google/universal-sentence-encoder/4

In [14]:
# write code for sample sentence encoding using pre-trained model above.

## Model_6 - Build, Train & Evaluate transfer learning model for NLP (243)

In [13]:
# write code to load universal sentence encoder and apply to our data set.
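
A minimal sketch of a feature-extraction model: a frozen Universal Sentence Encoder (loaded with `tensorflow_hub`'s `hub.KerasLayer`) followed by a small trainable head. The encoder is passed in as an argument so the head can be built and checked without a download; the helper names and the 64-unit hidden layer are assumptions. Depending on your TensorFlow/Keras versions, `hub.KerasLayer` may only work with Keras 2 (the `tf_keras` package) rather than Keras 3.

```python
import tensorflow as tf

USE_URL = "https://tfhub.dev/google/universal-sentence-encoder/4"


def load_use_layer():
    """Wrap the pretrained Universal Sentence Encoder as a frozen Keras layer."""
    import tensorflow_hub as hub  # imported lazily; needs the tensorflow_hub package
    return hub.KerasLayer(USE_URL, input_shape=[], dtype=tf.string,
                          trainable=False, name="USE")


def build_feature_extractor_model(encoder_layer, name="model_6_USE"):
    """Put a small trainable classification head on top of a (frozen) encoder."""
    model = tf.keras.Sequential([
        encoder_layer,                                   # text -> 512-d sentence vector for USE
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of the positive class
    ], name=name)
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


# model_6 = build_feature_extractor_model(load_use_layer())
# model_6.fit(train_sentences, train_labels, epochs=5,
#             validation_data=(val_sentences, val_labels))
```

Keeping the encoder frozen means only the two `Dense` layers are trained, which is what makes this "feature extraction" rather than fine-tuning.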

### Common Ways to improve Model Performance (244)

* Adding new layers
* Increasing the number of hidden units
* Changing the activation function
* Changing the optimizer
* Changing the learning rate
* Fitting on more data
* Fitting for longer

## Model_7 - Build, Train and Evaluate transfer learning with 10% Training Data (244 - 245)

* Write code to sample 10% of the training data from the original training data.
* Use `tf.keras.models.clone_model` to recreate model_6 as model_7.

In [15]:
# Write code to sample 10% of the training data from the original training data.

# Write code to build, train and evaluate a transfer learning model using the pretrained Universal Sentence Encoder (USE)
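
A minimal sketch of both steps: sample 10% of the (sentence, label) pairs with NumPy while keeping them aligned, then use `tf.keras.models.clone_model` to copy model_6's architecture. The helper names are hypothetical. `clone_model` creates new layer instances, so layers such as the `Dense` head start with fresh, untrained weights, and the clone must be compiled again.

```python
import numpy as np
import tensorflow as tf


def sample_fraction(sentences, labels, frac=0.1, seed=42):
    """Randomly keep `frac` of the (sentence, label) pairs, keeping them aligned."""
    sentences, labels = np.asarray(sentences), np.asarray(labels)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(sentences), size=int(len(sentences) * frac), replace=False)
    return sentences[idx], labels[idx]


def clone_for_new_experiment(model):
    """Copy a model's architecture with new weights and recompile it."""
    clone = tf.keras.models.clone_model(model)
    clone.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return clone


# train_sentences_10, train_labels_10 = sample_fraction(train_sentences, train_labels)
# model_7 = clone_for_new_experiment(model_6)
# model_7.fit(train_sentences_10, train_labels_10, epochs=5,
#             validation_data=(val_sentences, val_labels))
```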

### Fix data leakage issue (246)

In [16]:
# Write code to fix the data issue: with 10% of the training data the accuracy is higher than with 100% of the training data, a sign of data leakage
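
One common cause of this symptom, and the one assumed here, is drawing the 10% subset from the full shuffled DataFrame instead of from the training split, so some validation tweets are also seen during training. The sketch below samples from the training split only and counts exact-text overlap with the validation set; the helper names are hypothetical, and note that the check also flags genuine duplicate texts in the raw data.

```python
import pandas as pd


def sample_from_train_split(train_sentences, train_labels, frac=0.1, seed=42):
    """Draw the small subset from the training split only, never from
    data that also feeds the validation set."""
    train_df = pd.DataFrame({"text": train_sentences, "target": train_labels})
    subset = train_df.sample(frac=frac, random_state=seed)
    return subset["text"].to_list(), subset["target"].to_list()


def count_leaked(subset_sentences, val_sentences):
    """Number of training-subset sentences that also appear in the validation set."""
    return len(set(subset_sentences) & set(val_sentences))


# train_sentences_10, train_labels_10 = sample_from_train_split(train_sentences, train_labels)
# print(count_leaked(train_sentences_10, val_sentences))  # expect 0 after the fix
```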

### Comparing Modelling Experiments Evaluation Metrics (247)

* Put all model results in a pandas DataFrame.
* Visualize accuracy/precision/recall/F1 across all models.

In [17]:
# write code to visualize results.
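
A minimal sketch, assuming each model's results are a dict with `accuracy` (as a percentage), `precision`, `recall` and `f1` keys. Accuracy is rescaled to 0-1 so all four metrics share one axis in a pandas bar chart; the helper names are hypothetical.

```python
import pandas as pd


def results_to_dataframe(all_results):
    """Turn {model_name: {"accuracy", "precision", "recall", "f1"}} into one
    row per model, with accuracy rescaled from percent to 0-1."""
    df = pd.DataFrame(all_results).T  # columns were model names; transpose to rows
    df["accuracy"] = df["accuracy"] / 100
    return df.sort_values("f1", ascending=False)


def plot_results(results_df):
    """Grouped bar chart of the four metrics for every model."""
    ax = results_df[["accuracy", "precision", "recall", "f1"]].plot(kind="bar", figsize=(10, 7))
    ax.set_ylabel("score")
    ax.legend(bbox_to_anchor=(1.0, 1.0))
    return ax


# all_model_results = results_to_dataframe({"baseline": baseline_results, "model_1": model_1_results})
# plot_results(all_model_results)
```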

## Upload training logs to tensorboard and compare (248)

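* https://tensorboard.dev/ (TensorBoard.dev was shut down at the start of 2024; logs can still be compared locally with `tensorboard --logdir`)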

In [18]:
# Write code to upload the entire logs directory

## Saving and loading a trained NLP model (249)

* https://www.tensorflow.org/tutorials/keras/save_and_load 
* Save Model_6

In [20]:
# TODO
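
A minimal sketch of saving and reloading a whole Keras model in the native `.keras` format (architecture and weights, plus optimizer state if compiled), assuming a recent TensorFlow release that supports that format. The helper name and file path are illustrative. A model containing a TF Hub layer needs that layer class passed through `custom_objects` when loading.

```python
import tensorflow as tf


def save_and_reload(model, path="model_6.keras", custom_objects=None):
    """Save a whole Keras model to `path` and load it back."""
    model.save(path)  # the ".keras" suffix selects the native Keras format
    return tf.keras.models.load_model(path, custom_objects=custom_objects)


# For model_6, which wraps a TF Hub layer:
# import tensorflow_hub as hub
# loaded_model_6 = save_and_reload(model_6, custom_objects={"KerasLayer": hub.KerasLayer})
# loaded_model_6.evaluate(val_sentences, val_labels)  # should match model_6
```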

## Visualize the most wrong predictions (250 - 251)

* If our best model isn't performing well, which examples is it getting wrong?
* Among the wrong predictions, which are the "most wrong", i.e. those where the model assigned a high probability to the wrong class?

In [None]:
# TODO
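
A minimal pandas sketch of the "most wrong" ranking described above. It assumes binary labels, positive-class probabilities from the model, and a 0.5 decision threshold; `most_wrong` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def most_wrong(sentences, labels, pred_probs, threshold=0.5, top_n=10):
    """Return the wrong predictions the model was most confident about."""
    df = pd.DataFrame({"text": sentences, "target": labels,
                       "pred_prob": np.ravel(pred_probs)})
    df["pred"] = (df["pred_prob"] >= threshold).astype(int)
    wrong = df[df["pred"] != df["target"]].copy()
    # Probability the model gave to the class it (wrongly) predicted:
    # p for false positives, 1 - p for false negatives
    wrong["wrong_class_prob"] = np.where(wrong["pred"] == 1,
                                         wrong["pred_prob"], 1 - wrong["pred_prob"])
    return wrong.sort_values("wrong_class_prob", ascending=False).head(top_n)


# most_wrong(val_sentences, val_labels, model_6.predict(val_sentences))
```

Ranking false positives and false negatives on one "confidence in the wrong class" scale surfaces both kinds of mistake in a single list.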

## Visualize prediction on test dataset (252)

In [21]:
# TODO
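
A minimal sketch for eyeballing predictions on the unlabeled test set. `predict_proba` is an assumed interface: any callable mapping a list of sentences to positive-class probabilities (for a Keras model, something like `lambda xs: model_6.predict(np.array(xs), verbose=0)`). The 0.5 threshold is also an assumption.

```python
import random

import numpy as np


def predict_random_samples(predict_proba, test_sentences, n=5, threshold=0.5, seed=None):
    """Predict on `n` random test sentences, print them and return
    (text, predicted label, probability) tuples."""
    rng = random.Random(seed)
    samples = rng.sample(list(test_sentences), k=n)
    probs = np.ravel(predict_proba(samples))  # flattens e.g. a (n, 1) Keras output
    results = []
    for text, prob in zip(samples, probs):
        pred = int(prob >= threshold)
        print(f"Pred: {pred}, Prob: {prob:.4f}\nText: {text}\n----")
        results.append((text, pred, float(prob)))
    return results
```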

## Speed/Score trade off (253)

In [None]:
# TODO
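
A minimal sketch of measuring the "speed" side of the trade-off: time one prediction call over a batch of samples with `time.perf_counter` and divide by the number of samples. `time_predictions` is a hypothetical helper; a single timing is noisy, so repeating it a few times gives a fairer comparison.

```python
import time


def time_predictions(predict_fn, samples):
    """Return (total seconds, seconds per sample) for one call to predict_fn."""
    start = time.perf_counter()
    predict_fn(samples)
    total_time = time.perf_counter() - start
    return total_time, total_time / len(samples)


# total, per_pred = time_predictions(model_6.predict, val_sentences)
# Plot each model's per-prediction time (x) against its F1 score (y) to compare them.
```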