# Introduction
In this notebook we demonstrate how to use `SparseEmbedding` and `SparseAdam` to obtain better performance when gradients are sparse.

### Prepare Environment
Before using the APIs provided by BigDL-Nano, make sure BigDL-Nano is correctly installed for TensorFlow. If it is not, follow [this guide](../../../../../docs/readthedocs/source/doc/Nano/Overview/nano.md) to set up your environment.

In [19]:
from bigdl.nano.tf.keras import Model, Sequential

### Load the data
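We demonstrate with `imdb_reviews`, a large dataset of movie reviews. The `train` split is divided 80/20 into training and validation sets, and the `test` split is kept for evaluation.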

In [20]:
import tensorflow_datasets as tfds
(raw_train_ds, raw_val_ds, raw_test_ds), info = tfds.load(
    "imdb_reviews",
    split=['train[:80%]', 'train[80%:]', 'test'],
    as_supervised=True,
    batch_size=32,
    shuffle_files=False,
    with_info=True
)

Let's preview a few samples.

In [3]:
tfds.as_dataframe(raw_train_ds.unbatch().take(5), info)


Unnamed: 0,label,text
0,0 (neg),"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it."
1,0 (neg),"I have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on this occasion I fell asleep because the film was rubbish. The plot development was constant. Constantly slow and boring. Things seemed to happen, but with no explanation of what was causing them or why. I admit, I may have missed part of the film, but i watched the majority of it and everything just seemed to happen of its own accord without any real concern for anything else. I cant recommend this film at all."
2,0 (neg),"Mann photographs the Alberta Rocky Mountains in a superb fashion, and Jimmy Stewart and Walter Brennan give enjoyable performances as they always seem to do. <br /><br />But come on Hollywood - a Mountie telling the people of Dawson City, Yukon to elect themselves a marshal (yes a marshal!) and to enforce the law themselves, then gunfighters battling it out on the streets for control of the town? <br /><br />Nothing even remotely resembling that happened on the Canadian side of the border during the Klondike gold rush. Mr. Mann and company appear to have mistaken Dawson City for Deadwood, the Canadian North for the American Wild West.<br /><br />Canadian viewers be prepared for a Reefer Madness type of enjoyable howl with this ludicrous plot, or, to shake your head in disgust."
3,1 (pos),"This is the kind of film for a snowy Sunday afternoon when the rest of the world can go ahead with its own business as you descend into a big arm-chair and mellow for a couple of hours. Wonderful performances from Cher and Nicolas Cage (as always) gently row the plot along. There are no rapids to cross, no dangerous waters, just a warm and witty paddle through New York life at its best. A family film in every sense and one that deserves the praise it received."
4,1 (pos),"As others have mentioned, all the women that go nude in this film are mostly absolutely gorgeous. The plot very ably shows the hypocrisy of the female libido. When men are around they want to be pursued, but when no ""men"" are around, they become the pursuers of a 14 year old boy. And the boy becomes a man really fast (we should all be so lucky at this age!). He then gets up the courage to pursue his true love."


### Prepare the data
We lowercase the text, replace `<br />` tags with spaces, and strip punctuation before vectorizing.

In [21]:
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization
import string
import re

def custom_standardization(input_data):
    lowercase = tf.strings.lower(input_data)
    stripped_html = tf.strings.regex_replace(lowercase, "<br />", " ")
    return tf.strings.regex_replace(
        stripped_html, f"[{re.escape(string.punctuation)}]", ""
    )

max_features = 20000
embedding_dim = 128
sequence_length = 500

vectorize_layer = TextVectorization(
    standardize=custom_standardization,
    max_tokens=max_features,
    output_mode="int",
    output_sequence_length=sequence_length,
)

# Let's make a text-only dataset (no labels):
text_ds = raw_train_ds.map(lambda x, y: x)
# Let's call `adapt`:
vectorize_layer.adapt(text_ds)
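
To see what `custom_standardization` does to a single review, the sketch below mirrors its three steps in plain Python. The name `standardize` and the sample string are made up for illustration, and the result may differ from the TensorFlow version on non-ASCII text.

```python
import re
import string


def standardize(text: str) -> str:
    """Plain-Python mirror of the three steps in `custom_standardization`."""
    lowercase = text.lower()
    # Replace the literal HTML line break with a space.
    stripped_html = lowercase.replace("<br />", " ")
    # Remove every ASCII punctuation character.
    return re.sub(f"[{re.escape(string.punctuation)}]", "", stripped_html)


# Hypothetical sample text, not taken from the dataset.
sample = "Great movie!<br /><br />Loved it."
cleaned = standardize(sample)
print(repr(cleaned))
```

Apostrophes count as punctuation here, so a word such as `Don't` becomes `dont`.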

In [22]:
def vectorize_text(text, label):
    text = tf.expand_dims(text, -1)
    return vectorize_layer(text), label


# Vectorize the data.
train_ds = raw_train_ds.map(vectorize_text)
val_ds = raw_val_ds.map(vectorize_text)
test_ds = raw_test_ds.map(vectorize_text)

# Do async prefetching / buffering of the data for best performance on GPU.
train_ds = train_ds.cache().prefetch(buffer_size=10)
val_ds = val_ds.cache().prefetch(buffer_size=10)
test_ds = test_ds.cache().prefetch(buffer_size=10)
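
The integer encoding produced by `vectorize_layer` can be pictured with a small plain-Python sketch. It assumes whitespace tokenization, index 0 for padding, and index 1 for out-of-vocabulary tokens, which matches how `TextVectorization` behaves in `"int"` mode. The helper names `build_vocab` and `vectorize` and the toy texts are hypothetical.

```python
from collections import Counter


def build_vocab(texts, max_tokens):
    """Map the most frequent tokens to integer ids starting at 2."""
    counts = Counter(tok for text in texts for tok in text.split())
    # Two ids are reserved: 0 for padding and 1 for unknown tokens.
    kept = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return {tok: i + 2 for i, tok in enumerate(kept)}


def vectorize(text, vocab, sequence_length):
    """Encode one text, then truncate or zero-pad to a fixed length."""
    ids = [vocab.get(tok, 1) for tok in text.split()]
    ids = ids[:sequence_length]
    return ids + [0] * (sequence_length - len(ids))


# Toy corpus for illustration only.
texts = ["good good good movie movie", "bad"]
vocab = build_vocab(texts, max_tokens=4)
encoded = vectorize("good bad movie", vocab, sequence_length=5)
print(vocab, encoded)
```

With `max_tokens=4` only two real tokens fit in the vocabulary, so `bad` is encoded as the unknown id 1 and the sequence is padded with zeros up to length 5.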

### Build model
`bigdl.nano.tf.keras.layers.Embedding` is a slightly modified version of the `tf.keras.layers.Embedding` layer. It applies the regularizer only to the output of the embedding layer, so that the gradient with respect to the embeddings stays sparse. `bigdl.nano.tf.optimizers.SparseAdam` is a variant of the Adam optimizer that handles sparse updates more efficiently.
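
To illustrate why embedding gradients are sparse, here is a toy sketch in plain Python. It is not BigDL-Nano code: the function names are hypothetical, and it uses a plain SGD step instead of Adam to stay short. Only the rows that were looked up in a batch receive a gradient, so an optimizer can skip every other row. A weight regularizer applied to the whole table would add a non-zero term for every row, which is presumably why the layer regularizes only its output.

```python
def embedding_grad_rows(token_ids, upstream_grads):
    """Accumulate gradients per embedding row for one batch.

    Rows that were never looked up are absent from the result,
    which means their gradient is zero.
    """
    grads = {}
    for idx, g in zip(token_ids, upstream_grads):
        if idx in grads:
            grads[idx] = [a + b for a, b in zip(grads[idx], g)]
        else:
            grads[idx] = list(g)
    return grads


def sparse_sgd_step(table, sparse_grads, lr=0.1):
    """Update only the rows that received a gradient."""
    for idx, g in sparse_grads.items():
        table[idx] = [w - lr * gi for w, gi in zip(table[idx], g)]
    return table


# Toy sizes for illustration; the notebook uses 20000 x 128.
vocab_size, dim = 10, 2
table = [[0.0] * dim for _ in range(vocab_size)]
token_ids = [3, 7, 3]  # id 3 appears twice in the batch
upstream = [[1.0, 1.0], [2.0, 2.0], [0.5, 0.5]]

sparse_grads = embedding_grad_rows(token_ids, upstream)
table = sparse_sgd_step(table, sparse_grads)
touched = sorted(sparse_grads)
print(touched)
```

In this toy batch only rows 3 and 7 change, while the other eight rows are never read or written during the update.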

Here we create two models: one using the standard `Embedding` layer and the Adam optimizer, the other using the BigDL-Nano `Embedding` layer and `SparseAdam`.

In [23]:
from tensorflow.keras import layers
from bigdl.nano.tf.keras.layers import Embedding
from bigdl.nano.tf.optimizers import SparseAdam

def make_backbone():
    inputs = tf.keras.Input(shape=(None, embedding_dim))
    x = layers.Dropout(0.5)(inputs)
    x = layers.Conv1D(128, 7, padding="valid", activation="relu", strides=3)(x)
    x = layers.Conv1D(128, 7, padding="valid", activation="relu", strides=3)(x)
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    predictions = layers.Dense(1, activation="sigmoid", name="predictions")(x)

    model = Model(inputs, predictions)
    return model

def make_model():
    inputs = tf.keras.Input(shape=(None,), dtype="int64")
    x = layers.Embedding(max_features, embedding_dim)(inputs)
    predictions = make_backbone()(x)
    model = Model(inputs, predictions)
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

def make_model_mod():
    inputs = tf.keras.Input(shape=(None,), dtype="int64")
    x = Embedding(max_features, embedding_dim)(inputs)
    predictions = make_backbone()(x)
    model = Model(inputs, predictions)
    model.compile(loss="binary_crossentropy", optimizer=SparseAdam(), metrics=["accuracy"])
    return model
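
As a quick sanity check on the architecture, the sketch below works out the sequence length after each `Conv1D` layer and the size of the embedding table, using the standard output-length formula for `padding="valid"`. The helper name `conv1d_valid_length` is hypothetical, and the constants are copied from the cells above.

```python
def conv1d_valid_length(length, kernel_size, stride):
    """Output length of a 1D convolution with padding="valid"."""
    return (length - kernel_size) // stride + 1


# Values copied from the notebook.
sequence_length = 500
max_features = 20000
embedding_dim = 128

after_conv1 = conv1d_valid_length(sequence_length, kernel_size=7, stride=3)
after_conv2 = conv1d_valid_length(after_conv1, kernel_size=7, stride=3)

# Number of weights in the embedding table (one row per token id).
embedding_params = max_features * embedding_dim
print(after_conv1, after_conv2, embedding_params)
```

The embedding table holds 2,560,000 weights, and each batch looks up only a subset of its 20,000 rows, which is where sparse updates can save work.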

In [7]:
from time import time
model = make_model()

# Shorten fitting time during test
import os
epochs = int(os.environ.get('epochs', 3))

start = time()
model.fit(train_ds, validation_data=val_ds, epochs=epochs)
fit_time = time() - start

his = model.evaluate(test_ds)

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [8]:
model = make_model_mod()

start = time()
model.fit(train_ds, validation_data=val_ds, epochs=epochs)
fit_time_mod = time() - start

his_mod = model.evaluate(test_ds)

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [18]:
template = """
|          Model       |    Fit Time(s)    | Accuracy(%) |
|        Benchmark     |       {:5.2f}       |    {:5.2f}    |
|     Model_modified   |       {:5.2f}       |    {:5.2f}    |
|      Improvement(%)  |       {:5.2f}       |    {:5.2f}    |
"""
summary = template.format(
    fit_time, his[1] * 100,
    fit_time_mod, his_mod[1] * 100,
    (1 - fit_time_mod / fit_time) * 100, (his_mod[1] / his[1] - 1) * 100
)
print(summary)


|          Model       |    Fit Time(s)    | Accuracy(%) |
|        Benchmark     |       67.36       |    85.12    |
|     Model_modified   |       42.56       |    86.62    |
|      Improvement(%)  |       36.81       |     1.76    |
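
The improvement row reports relative changes: the fraction of fit time saved and the relative gain in accuracy. The sketch below recomputes both from the rounded values in the table. The function name `relative_improvements` is hypothetical, and the last digit can differ slightly from the table because the notebook computes it from unrounded measurements.

```python
def relative_improvements(fit_time, fit_time_mod, acc, acc_mod):
    """Return (percent of fit time saved, percent relative accuracy gain)."""
    time_saved = (1 - fit_time_mod / fit_time) * 100
    acc_gain = (acc_mod / acc - 1) * 100
    return time_saved, acc_gain


# Rounded values taken from the summary table above.
time_saved, acc_gain = relative_improvements(67.36, 42.56, 85.12, 86.62)
print(f"{time_saved:.2f} {acc_gain:.2f}")
```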

