# WELCOME TO **"TensorFlow for Natural Language Processing" Series**  😁 

TensorFlow makes it easy for beginners and experts to create machine learning models for desktop, mobile, web, and cloud. TensorFlow provides a collection of workflows to develop and train models using Python, JavaScript, or Swift, and to easily deploy in the cloud, on-prem, in the browser, or on-device no matter what language you use.

We will see how to gain insights from text data and, hands-on, how to use those insights to train NLP models that perform tasks mimicking human language abilities. Let's dive in and look at some of the basics of NLP.
<br/> <br/>
**In this series of 4 project courses, you will learn hands-on how to build Natural Language Processing algorithms and how to build, train, and test neural networks for NLP with TensorFlow!** 😎


## 👉🏻 Course 1: Text Embedding and Classification



## 👉🏻 Course 2: Semantic Similarity in Texts

## 👉🏻 Course 3: Sentiment Analysis in Texts

## 👉🏻 Course 4: Text Generation with RNNs

In [None]:
print("Let's start with Course 1: Text Embedding and Classification")

# WELCOME to this guided project "Text Embedding and Classification"! 😁 
#### This project course is part of the "TensorFlow for Natural Language Processing" series of project courses.<br/><br/>

We will go through 5 tasks to implement our project:<br/><br/>
👉🏻**Task 1:** Overview of the Project and Import the Libraries. <br/><br/>
👉🏻**Task 2:** Analyzing the embeddings. <br/><br/>
👉🏻**Task 3:** Use Embedding in Text Classification. <br/><br/>
👉🏻**Task 4:** Create and Train the model. <br/><br/>
👉🏻**Task 5:** Evaluate the model with Predictions. <br/><br/>

At the end, you will practice an amazing exercise that's related to the project.



## 👉🏻 Task 1: Overview of the Project and Import the Libraries

In this project, you will learn how to use text embeddings for text classification tasks, and you will train and evaluate a text classifier. <br/>

The CORD-19 Swivel text embedding module from TF-Hub was built to support researchers analyzing natural-language text related to COVID-19.<br/>

In this project we will:
- Analyze semantically similar words in the embedding space
- Train a classifier on the SciCite dataset using the CORD-19 embeddings ✨

At the end of this project, you will try out an amazing Bonus Exercise! 🤩

In [None]:
# Install tfds-nightly, then restart the kernel before starting the project
!pip install tfds-nightly

In [None]:
import functools
import itertools
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd

import tensorflow as tf

import tensorflow_datasets as tfds
import tensorflow_hub as hub

from tqdm import trange

## 👉🏻 Task 2: Analyzing the embeddings

Let's start off by analyzing the embedding by calculating and plotting a correlation matrix between different terms. If the embedding learned to successfully capture the meaning of different words, the embedding vectors of semantically similar words should be close together. Let's take a look at some COVID-19 related terms.👀

In [None]:
# Use the inner product between two embedding vectors as the similarity measure
def plot_correlation(labels, features):
  corr = np.inner(features, features)
  corr /= np.max(corr)
  sns.heatmap(corr, xticklabels=labels, yticklabels=labels)

# Generate embeddings for some terms
queries = [
  # Related viruses
  'coronavirus', 'SARS', 'MERS',
  # Regions
  'Italy', 'Spain', 'Europe',
  # Symptoms
  'cough', 'fever', 'throat'
]

module = hub.load('https://tfhub.dev/tensorflow/cord-19/swivel-128d/3')
embeddings = module(queries)

plot_correlation( queries, embeddings )

We can see that the embedding successfully captured the meaning of the different terms. Each word is similar to the other words of its cluster (e.g. "coronavirus" correlates highly with "SARS" and "MERS"), while it differs from terms of other clusters (e.g. the similarity between "SARS" and "Spain" is close to 0).
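
To make the similarity measure concrete, here is a minimal sketch on made-up data. The three 3-dimensional vectors and their labels are hypothetical stand-ins for real 128-dimensional CORD-19 embeddings; only the normalization (inner products divided by the largest entry) mirrors the plotting helper used in this task.

```python
import numpy as np

# Hypothetical toy "embeddings": two related terms and one unrelated term.
toy_terms = ["virus_a", "virus_b", "country_a"]
toy_vectors = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 0.1, 1.0],
])

def similarity_matrix(features):
    """Pairwise inner products, scaled so the largest entry equals 1."""
    sim = np.inner(features, features)
    return sim / np.max(sim)

toy_similarity = similarity_matrix(toy_vectors)
print(np.round(toy_similarity, 2))
```

The two "virus" rows score high against each other and close to 0 against the "country" row, which is the same block pattern the heatmap shows for the real embeddings.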

Now let's see how we can use these embeddings to solve a specific task.

## 👉🏻 Task 3: Use Embedding in Text Classification

### ⭐SciCite: Citation Intent Classification

Now we will see how we can use the embedding for downstream tasks such as text classification. We'll use the [SciCite dataset](https://www.tensorflow.org/datasets/catalog/scicite) from TensorFlow Datasets to classify citation intents in academic papers. Given a sentence with a citation from an academic paper, the task is to classify whether the main intent of the citation is to provide background information, to use methods, or to compare results.😊

In [None]:
builder = tfds.builder(name='scicite')
builder.download_and_prepare()
train_data, validation_data, test_data = builder.as_dataset(
    split=('train', 'validation', 'test'),
    as_supervised=True)

In [None]:
# Let's take a look at a few labeled examples from the training set
NUM_EXAMPLES =  12 ### YOUR CODE HERE - type:"integer"

TEXT_FEATURE_NAME = builder.info.supervised_keys[0]
LABEL_NAME = builder.info.supervised_keys[1]

def label2str(numeric_label):
  m = builder.info.features[LABEL_NAME].names
  return m[numeric_label]

data = next(iter(train_data.batch(NUM_EXAMPLES)))


pd.DataFrame({
    TEXT_FEATURE_NAME: [ex.numpy().decode('utf8') for ex in data[0]],
    LABEL_NAME: [label2str(x) for x in data[1]]
})

## 👉🏻 Task 4: Create and Train the model

### ⭐ Creating the model



We'll train a classifier on the [SciCite dataset](https://www.tensorflow.org/datasets/catalog/scicite) using Keras. Let's build a model that uses the CORD-19 embeddings with a classification layer on top.🦠
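
As a rough picture of what "a classification layer on top" computes, the NumPy sketch below applies a single dense layer to random stand-in vectors. The batch of 4 random vectors and the weight initialization are assumptions for illustration; only the shapes (128-dimensional embeddings, 3 citation-intent classes) come from this project.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM, NUM_CLASSES = 128, 3

# Stand-in for the embeddings of 4 sentences (random values, not real embeddings).
fake_embeddings = rng.normal(size=(4, EMBED_DIM))

# A dense layer is an affine map: logits = x @ W + b.
dense_weights = rng.normal(scale=0.01, size=(EMBED_DIM, NUM_CLASSES))
dense_bias = np.zeros(NUM_CLASSES)
fake_logits = fake_embeddings @ dense_weights + dense_bias

# Trainable parameters of such a layer: weight matrix plus bias vector.
num_dense_params = dense_weights.size + dense_bias.size
print(fake_logits.shape, num_dense_params)  # (4, 3) 387
```

Each row of the output holds one unnormalized score (logit) per class, which is why the loss used later is configured with `from_logits=True`.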

In [None]:
#@title Hyperparameters { run: "auto" }

EMBEDDING = 'https://tfhub.dev/tensorflow/cord-19/swivel-128d/3'  #@param {type: "string"}
TRAINABLE_MODULE = False  #@param {type: "boolean"}

hub_layer = hub.KerasLayer(EMBEDDING, input_shape=[], 
                           dtype=tf.string, trainable=TRAINABLE_MODULE)

model = tf.keras.Sequential()
model.add(hub_layer)

### YOUR CODE HERE - Create Dense Layers
# One output logit per citation-intent class
model.add(tf.keras.layers.Dense(3))

### YOUR CODE HERE - Print the model summary
model.summary()

model.compile(optimizer='adam', ### YOUR CODE HERE - Choose optimizer
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy']) ### YOUR CODE HERE - Choose the metrics

### ⭐  Let's train and evaluate the model to see the performance on the SciCite task.😁

In [None]:
EPOCHS = 40 ### YOUR CODE HERE - type: "integer"
BATCH_SIZE = 32 ### YOUR CODE HERE - type: "integer"

history = model.fit(train_data.shuffle(10000).batch(BATCH_SIZE),
                    epochs=EPOCHS,
                    validation_data=validation_data.batch(BATCH_SIZE),
                    verbose=1)

In [None]:
from matplotlib import pyplot as plt
def display_training_curves(training, validation, title, subplot):
  if subplot%10==1: # set up the subplots on the first call
    plt.subplots(figsize=(10,10), facecolor='#F0F0F0')
    plt.tight_layout()
  ax = plt.subplot(subplot)
  ax.set_facecolor('#F8F8F8')
  ax.plot(training)
  ax.plot(validation)
  ax.set_title('model '+ title)
  ax.set_ylabel(title)
  ax.set_xlabel('epoch')
  ax.legend(['train', 'valid.'])

In [None]:
display_training_curves(history.history['accuracy'], history.history['val_accuracy'], 'accuracy', 211)
display_training_curves(history.history['loss'], history.history['val_loss'], 'loss', 212)

## 👉🏻 Task 5: Evaluate the model with Predictions

Now let's see how the model performs on the test set. Two values are returned: the loss (a number that represents the error; lower values are better) and the accuracy.😉
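
For intuition about these two numbers, here is a small NumPy sketch of a sparse categorical cross-entropy computed from logits, together with accuracy. The logits and labels are made up, and the helper names are hypothetical; this illustrates the general formulas rather than Keras's exact implementation.

```python
import numpy as np

def sparse_ce_from_logits(logits, labels):
    """Mean negative log-probability of the true class (softmax applied internally)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    return float((logits.argmax(axis=1) == labels).mean())

# Made-up scores for 3 examples and 3 classes, with made-up true labels.
example_logits = np.array([[2.0, 0.5, -1.0],
                           [0.1, 0.2, 3.0],
                           [1.0, 1.5, 0.0]])
example_labels = np.array([0, 2, 0])

example_loss = sparse_ce_from_logits(example_logits, example_labels)
example_accuracy = accuracy_from_logits(example_logits, example_labels)
print(f"loss: {example_loss:.3f}, accuracy: {example_accuracy:.3f}")
```

The third example is misclassified (class 1 scores highest, the label is 0), so the accuracy is 2/3, and that example also contributes the largest share of the loss.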

In [None]:
results = model.evaluate(test_data.batch(512), verbose=2)

for name, value in zip(model.metrics_names, results):
  print('%s: %.3f' % (name, value))

In the training curves, the loss decreases quickly while the accuracy rises rapidly. Let's look at some examples to check how the predictions relate to the true labels:
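
Because the model outputs one logit per class, turning its output into a label means taking the index of the largest logit and looking up the class name. The sketch below shows that step on made-up logits; the label names and their order are assumptions for illustration (in the notebook the real names come from `builder.info`).

```python
import numpy as np

# Hypothetical class names and order, used only for this illustration.
hypothetical_label_names = ["background", "method", "result"]

# Made-up model outputs: one row of logits per sentence.
sample_logits = np.array([[ 1.2, -0.3, 0.1],
                          [-0.5,  2.0, 0.4],
                          [ 0.0,  0.1, 0.9]])

# The predicted class is the index of the largest logit in each row.
predicted_ids = np.argmax(sample_logits, axis=-1)
predicted_names = [hypothetical_label_names[i] for i in predicted_ids]
print(predicted_names)  # ['background', 'method', 'result']
```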

In [None]:
prediction_dataset = next(iter(test_data.batch(20)))

prediction_texts = [ex.numpy().decode('utf8') for ex in prediction_dataset[0]]
prediction_labels = [label2str(x) for x in prediction_dataset[1]]

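# predict_classes() was removed from recent Keras versions; take the argmax of the logits instead
predicted_ids = np.argmax(model.predict(prediction_dataset[0]), axis=-1)
predictions = [label2str(x) for x in predicted_ids]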


pd.DataFrame({
    TEXT_FEATURE_NAME: prediction_texts,### YOUR CODE HERE
    LABEL_NAME: prediction_labels,### YOUR CODE HERE
    'prediction': predictions
})

We can see that for this sample, the model predicts the correct label most of the time, indicating that it can embed scientific sentences pretty well.😎

## Bonus: Extra Exercise!
##### Refresh Your Memory... 😋

In this exercise, we will practice word tokenization, sentence tokenization, and normalization in texts!
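
Before reaching for NLTK, it can help to see what these three steps mean in the simplest possible form. The sketch below uses only regular expressions; the function names and rules are hypothetical and deliberately naive, not what NLTK does internally.

```python
import re

def simple_sent_tokenize(text):
    """Split after ., ! or ? when followed by whitespace and an uppercase letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]

def simple_word_tokenize(sentence):
    """Return runs of letters, runs of digits, and single punctuation marks."""
    return re.findall(r"[A-Za-z]+|\d+|[^\w\s]", sentence)

def simple_normalize(tokens):
    """Lowercase the tokens and keep only alphabetic ones."""
    return [t.lower() for t in tokens if t.isalpha()]

demo_text = "The sky is pinkish-blue. The weather is great!"
demo_sentences = simple_sent_tokenize(demo_text)
demo_tokens = simple_word_tokenize(demo_sentences[0])
demo_normalized = simple_normalize(demo_tokens)

print(demo_sentences)
print(demo_tokens)
print(demo_normalized)
```

A rule this simple would also split after an abbreviation such as "Mr." in "Hello Mr. Smith", which is one reason to prefer a trained sentence tokenizer for real text.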

In [None]:
!pip install nltk

In [None]:
#import NLTK library
import nltk

In [None]:
# The sentence tokenizer splits a paragraph of text into sentences.
from nltk.tokenize import sent_tokenize
nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases look for this resource name instead

text="""Hello Mr. Smith, how are you doing today? The weather is great, and city is awesome.
The sky is pinkish-blue. You shouldn't eat cardboard"""
tokenized_text=sent_tokenize(text)
print(tokenized_text)

In [None]:
# The word tokenizer splits a paragraph of text into words and punctuation marks.
from nltk.tokenize import word_tokenize
tokenized_word=word_tokenize(text)
print(tokenized_word)

In [None]:
# Frequency Distribution
from nltk.probability import FreqDist
fdist = FreqDist(tokenized_word)
print(fdist)

In [None]:
fdist.most_common(2)

In [None]:
# Frequency Distribution Plot
import matplotlib.pyplot as plt
fdist.plot(30,cumulative=False)
plt.show()

In [None]:
# To remove stop words with NLTK, load the list of stop words
# and filter your list of tokens against it.
from nltk.corpus import stopwords
nltk.download('stopwords')
stop_words=set(stopwords.words("english"))
print(stop_words)

In [None]:
# Removing stop words (compare in lowercase, since the NLTK stop-word list is lowercase)
filtered_sent=[]
for w in tokenized_word:
    if w.lower() not in stop_words:
        filtered_sent.append(w)
print("Tokenized Sentence:",tokenized_word)
print("Filtered Sentence:",filtered_sent)

In [None]:
# Normalization

# Stemming is a process of linguistic normalization that reduces words
# to their root form by chopping off derivational affixes.
# For example, "connection", "connected", and "connecting" all reduce to the common stem "connect".

# Stemming
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

ps = PorterStemmer()

stemmed_words=[]
for w in filtered_sent:
    stemmed_words.append(ps.stem(w))

print("Filtered Sentence:",filtered_sent)
print("Stemmed Sentence:",stemmed_words)

In [None]:
# Normalization

# Lemmatization reduces words to their base form, a linguistically correct lemma.
# It finds the lemma using a vocabulary and morphological analysis.
# Lemmatization is usually more sophisticated than stemming:
# a stemmer works on an individual word without knowledge of the context.
# For example, the word "better" has "good" as its lemma.
# Stemming misses this because it requires a dictionary look-up.

nltk.download('wordnet')
# Performing stemming and lemmatization
from nltk.stem.wordnet import WordNetLemmatizer
lem = WordNetLemmatizer()

from nltk.stem.porter import PorterStemmer
stem = PorterStemmer()

word = "flying"
print("Lemmatized Word:",lem.lemmatize(word,"v"))
print("Stemmed Word:",stem.stem(word))
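
The difference between the two normalization strategies can be sketched without NLTK. The suffix list and the tiny lemma dictionary below are hypothetical toys: this is not the Porter algorithm and not WordNet, only an illustration of "chop off affixes by rule" versus "look the word up".

```python
def toy_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters of stem."""
    for suffix in ("ing", "ed", "ion", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# A hand-written lemma dictionary standing in for a real vocabulary.
TOY_LEMMAS = {"better": "good", "flying": "fly", "was": "be"}

def toy_lemmatize(word):
    """Return the dictionary lemma if known, otherwise the word itself."""
    return TOY_LEMMAS.get(word, word)

for w in ["connection", "connected", "connecting", "better"]:
    print(w, "-> stem:", toy_stem(w), "| lemma:", toy_lemmatize(w))
```

The rule-based stemmer maps the three "connect" variants to the same stem but leaves "better" unchanged, while the dictionary look-up maps "better" to "good".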

In [None]:
# POS Tagging
# The primary goal of part-of-speech (POS) tagging is to identify the grammatical group of a given word,
# i.e. whether it is a NOUN, PRONOUN, ADJECTIVE, VERB, ADVERB, etc., based on the context.
# POS tagging looks for relationships within the sentence and assigns a corresponding tag to each word.
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_eng')  # newer NLTK releases look for this resource name instead

sent = "Albert Einstein was born in Ulm, Germany in 1879."

tokens=nltk.word_tokenize(sent)
print(tokens)

nltk.pos_tag(tokens)

Which of the following is true for neural networks?
1. The training time depends on the size of the network.
2. Neural networks can be simulated on a conventional computer.
3. Artificial neurons are identical in operation to biological ones.
4. All of the mentioned
5. (2) is true
6. (1) and (2) are true
7. None of the mentioned
<br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/>  . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> . <br/> 


Answer: 6
Explanation: The training time depends on the size of the network: with more neurons, the number of possible "states" increases. Neural networks can be simulated on a conventional computer, but their main advantage, parallel execution, is then lost. Artificial neurons are not identical in operation to biological ones.

# CONGRATULATIONS! 🤩