This notebook was originally written by the TensorFlow team and has been modified by R. D. Slater to run with recent library versions. The original worked when it was published, but later changes caused a runtime error in `.predict()`, which I believe is due either to tensor shapes ((None, 1, 128) vs. (None, 128)) or to the inputs being passed as Python lists. I modified the functions that build the model inputs (`get_masks`, `get_segments`, `get_ids`) to return NumPy arrays, and the model now works as before. Note that you can also pass tensors (via `tf.convert_to_tensor()`).

# BERT Embeddings with TensorFlow 2.0
With the new release of TensorFlow, this Notebook aims to show a simple use of the BERT model.
- See BERT on paper: https://arxiv.org/pdf/1810.04805.pdf
- See BERT on GitHub: https://github.com/google-research/bert
- See BERT on TensorFlow Hub: https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1
- See 'old' use of BERT for comparison: https://colab.research.google.com/github/google-research/bert/blob/master/predicting_movie_reviews_with_bert_on_tf_hub.ipynb

## Update TF
This Colab was written for TensorFlow 2.2 and TensorFlow Hub 0.7; the run recorded below used TensorFlow 2.1.0 and TensorFlow Hub 0.8.0.

In [1]:
!pip install bert-for-tf2
!pip install sentencepiece



In [2]:
!pip install "tensorflow_hub"



In [3]:
import numpy as np
from typing import List

In [4]:
import tensorflow as tf
import tensorflow_hub as hub
print("TF version: ", tf.__version__) # 2.2
print("Hub version: ", hub.__version__) # 0.8

TF version:  2.1.0
Hub version:  0.8.0


## Import modules

In [5]:
import bert
from tensorflow.keras.models import Model       # Keras is the new high level API for TensorFlow
import math

In [6]:
# Alias the tokenizer class (it is instantiated further below with the vocab file)
FullTokenizer = bert.bert_tokenization.FullTokenizer

Build the model with tf.keras and TensorFlow Hub: from sentences to embeddings.

Inputs:
 - input token ids (tokenizer converts tokens using vocab file)
 - input masks (1 for useful tokens, 0 for padding)
 - segment ids (for two-text inputs: 0 for the first text, 1 for the second)

Outputs:
 - pooled_output of shape `[batch_size, 768]` with representations for the entire input sequences 
 - sequence_output of shape `[batch_size, max_seq_length, 768]` with representations for each input token (in context)

In [7]:
max_seq_length = 128  # Your choice here.

# Three Inputs to the BERT model
# shape must be a tuple, hence the trailing comma
input_word_ids = tf.keras.layers.Input(shape=(max_seq_length,), dtype=tf.int32, name="input_word_ids")
input_mask = tf.keras.layers.Input(shape=(max_seq_length,), dtype=tf.int32, name="input_mask")
segment_ids = tf.keras.layers.Input(shape=(max_seq_length,), dtype=tf.int32, name="segment_ids")

# Bert Layer
bert_layer = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/2", trainable=True)
pooled_output, sequence_output = bert_layer([input_word_ids, input_mask, segment_ids])

In [8]:
model = Model(inputs=[input_word_ids, input_mask, segment_ids], outputs=[pooled_output, sequence_output])

Generating segments and masks based on the original BERT

In [9]:
# See BERT paper: https://arxiv.org/pdf/1810.04805.pdf
# And BERT implementation convert_single_example() at https://github.com/google-research/bert/blob/master/run_classifier.py

###############################
# Robert Slater: Modifications to these functions to simply return numpy arrays 
###############################

def get_masks(tokens, max_seq_length):
    """Mask for padding"""
    if len(tokens) > max_seq_length:
        raise IndexError("Token length more than max seq length!")
    return np.array([1]*len(tokens) + [0] * (max_seq_length - len(tokens)))


def get_segments(tokens, max_seq_length):
    """Segments: 0 for the first sequence, 1 for the second"""
    if len(tokens)>max_seq_length:
        raise IndexError("Token length more than max seq length!")
    segments = []
    current_segment_id = 0
    for token in tokens:
        segments.append(current_segment_id)
        if token == "[SEP]":
            current_segment_id = 1
    return np.array(segments + [0] * (max_seq_length - len(tokens)))


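def get_ids(tokens, tokenizer, max_seq_length):
    """Token ids from Tokenizer vocab"""
    if len(tokens) > max_seq_length:
        raise IndexError("Token length more than max seq length!")
    token_ids = tokenizer.convert_tokens_to_ids(tokens)
    # Pad with 0 up to max_seq_length
    input_ids = token_ids + [0] * (max_seq_length - len(token_ids))
    return np.array(input_ids)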
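
As a rough illustration of how the mask and segment inputs line up for a two-sentence input, the sketch below re-implements the same logic with plain Python lists (the helpers above return NumPy arrays). The token list and the short `max_len` value are made-up examples, not data from this notebook.

```python
# Hypothetical two-sentence input; tokens and max_len are illustrative only.
tokens = ["[CLS]", "this", "movie", "[SEP]", "it", "is", "bad", "[SEP]"]
max_len = 12

# Mask: 1 for every real token, 0 for padding positions.
masks = [1] * len(tokens) + [0] * (max_len - len(tokens))

# Segments: 0 up to and including the first [SEP], 1 afterwards, 0 for padding.
segments, current = [], 0
for token in tokens:
    segments.append(current)
    if token == "[SEP]":
        current = 1
segments += [0] * (max_len - len(tokens))

print(masks)     # [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(segments)  # [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
```

For a single sentence, as in the rest of this notebook, the segment vector is all zeros.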

Import tokenizer using the original vocab file

In [10]:
?FullTokenizer

Init signature: FullTokenizer(vocab_file, do_lower_case=True)
Docstring:      Runs end-to-end tokenziation.
File:           c:\users\nikhil\.conda\envs\dl_nlp\lib\site-packages\bert\tokenization\bert_tokenization.py
Type:           type
Subclasses:     


In [11]:
vocab_file = bert_layer.resolved_object.vocab_file.asset_path.numpy()
do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()
print(vocab_file, ' | ' , do_lower_case)

b'C:\\Users\\Nikhil\\AppData\\Local\\Temp\\tfhub_modules\\ce53fe6769d2ac3a260e92555120c54e1aecbea6\\assets\\vocab.txt'  |  True


In [12]:
tokenizer = FullTokenizer(vocab_file, do_lower_case)

## Test BERT embedding generator model

In [13]:
[2,45,15,706]  # TODO: What is this list?
s = "This movie is bad"

Tokenizing the sentence

In [14]:
stokens = tokenizer.tokenize(s)
stokens

['this', 'movie', 'is', 'bad']

Adding separator tokens according to the paper

In [15]:
stokens = ["[CLS]"] + stokens + ["[SEP]"]
stokens

['[CLS]', 'this', 'movie', 'is', 'bad', '[SEP]']

Get the model inputs from the tokens

In [16]:
input_ids = get_ids(stokens, tokenizer, max_seq_length)
input_masks = get_masks(stokens, max_seq_length)
input_segments = get_segments(stokens, max_seq_length)

In [17]:
print(f"Tokens: {stokens}")
print(f"Input IDs:\n {input_ids}")
print(f"Input Masks:\n {input_masks}")
print(f"Input Segments:\n {input_segments}")

Tokens: ['[CLS]', 'this', 'movie', 'is', 'bad', '[SEP]']
Input IDs:
 [ 101 2023 3185 2003 2919  102    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0]
Input Masks:
 [1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Input

Generate Embeddings using the pretrained model

In [18]:
# Expect a shape warning. I believe this is due to eager execution, but I am not sure.
pool_embs, all_embs = model.predict([[input_ids],[input_masks],[input_segments]])

## TODO: I am getting a different warning than the one in the original notebook. 
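
The shape mismatch mentioned at the top of the notebook can be reproduced with NumPy alone. The sketch below is an illustration on assumed dummy data (an all-zero id vector), not the notebook's actual inputs: wrapping a 1-D array in one list level adds the batch dimension the model expects, while one extra level yields the (1, 1, 128) shape.

```python
import numpy as np

max_seq_length = 128
input_ids = np.zeros(max_seq_length, dtype=np.int32)  # dummy stand-in for real token ids

batched = np.array([input_ids])               # one sentence -> batch dimension of 1
over_nested = np.array([[input_ids]])         # one list level too many
expanded = np.expand_dims(input_ids, axis=0)  # explicit way to add the batch dimension

print(batched.shape)      # (1, 128)
print(over_nested.shape)  # (1, 1, 128)
print(expanded.shape)     # (1, 128)
```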



## Pooled embedding vs [CLS] as sentence-level representation

Previously, the [CLS] token's embedding was used as the sentence-level representation (see the original paper). Here, however, a pooled embedding is also provided. This part is a short comparison of the two embeddings using cosine similarity.

In [19]:
def square_rooted(x):
    return math.sqrt(sum([a*a for a in x]))


def cosine_similarity(x,y):
    numerator = sum(a*b for a,b in zip(x,y))
    denominator = square_rooted(x)*square_rooted(y)
    return numerator/float(denominator)
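
A quick sanity check of the formula on vectors with a known answer can help before applying it to embeddings. The sketch below restates the same computation under a hypothetical name, `cosine_sim`, so that it runs on its own; the test vectors are made up.

```python
import math

def cosine_sim(x, y):
    """Cosine similarity of two equal-length vectors (same formula as above)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

parallel = cosine_sim([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # same direction, close to 1
orthogonal = cosine_sim([1.0, 0.0], [0.0, 1.0])          # perpendicular, 0
opposite = cosine_sim([1.0, 2.0], [-1.0, -2.0])          # opposite direction, close to -1

print(parallel, orthogonal, opposite)
```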

In [20]:
cosine_similarity(pool_embs[0], all_embs[0][0])

0.030847375908003807

In [22]:
model.summary()
## Note: the input shape below is (None, 128) because max_seq_length = 128; the original notebook shows (None, 512), presumably from a larger max_seq_length.

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_word_ids (InputLayer)     [(None, 128)]        0                                            
__________________________________________________________________________________________________
input_mask (InputLayer)         [(None, 128)]        0                                            
__________________________________________________________________________________________________
segment_ids (InputLayer)        [(None, 128)]        0                                            
__________________________________________________________________________________________________
keras_layer (KerasLayer)        [(None, 768), (None, 109482241   input_word_ids[0][0]             
                                                                 input_mask[0][0]             

In [23]:
pool_embs.shape
## Note: the shape below is (1, 768) because a single sentence was passed to predict(); the pooled output has shape [batch_size, 768]. The original notebook shows (512, 768).

(1, 768)

In [24]:
all_embs.shape

(1, 128, 768)
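
Given these shapes, the `[CLS]` vector used in the comparison above is the first position of the sequence output. The sketch below uses random stand-in arrays of the same shapes (not real BERT outputs, and the `_demo` names are hypothetical) to show the slicing for a whole batch at once.

```python
import numpy as np

rng = np.random.default_rng(0)
pool_embs_demo = rng.normal(size=(1, 768))      # stand-in for pooled_output
all_embs_demo = rng.normal(size=(1, 128, 768))  # stand-in for sequence_output

# Position 0 of every sequence is the [CLS] token.
cls_vectors = all_embs_demo[:, 0, :]

print(cls_vectors.shape)  # (1, 768), same shape as the pooled output
```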

# Assignment

**Take the IMDB dataset. Convert the reviews into text, then build the inputs to feed into BERT.**

## 1.   Load imdb dataset

In [25]:
from tensorflow.keras.datasets import imdb

In [26]:
vocabulary_size = 5000

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words = vocabulary_size)
print('Loaded dataset with {} training samples, {} test samples'.format(len(X_train), len(X_test)))

Loaded dataset with 25000 training samples, 25000 test samples


In [27]:
print('---review---')
print(len(X_train[0]))
print(X_train[0])
print('---label---')
print(y_train[0])

---review---
218
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 2, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 2, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 2, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 2, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 2, 19, 178, 32]
---label---
1


In [28]:
print(set(y_train), "|", set(y_test))

{0, 1} | {0, 1}


## 2. Convert integers from imdb dictionary to text

In [29]:
# A dictionary mapping words to an integer index
word_index = imdb.get_word_index()
print(type(word_index), "|", len(word_index))
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
for i in range(1,15):
    print(f"{i}: {reverse_word_index[i]}")

<class 'dict'> | 88584
1: the
2: and
3: a
4: of
5: to
6: is
7: br
8: in
9: it
10: i
11: this
12: that
13: was
14: as


In [30]:
type(X_train)

numpy.ndarray

In [31]:
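sentences = []
for i in np.arange(len(X_train)):
    # NOTE: imdb.load_data() shifts word indices by 3 (index_from=3),
    # so this direct lookup does not recover the original wording.
    sentence = [reverse_word_index[idx] for idx in X_train[i]]
    sentences.append(sentence)

# dtype=object because the reviews have different lengths (ragged array)
X_train_decoded = np.array(sentences, dtype=object)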
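
One caveat, stated as an observation about the Keras API and not about the author's intent: by default `imdb.load_data()` reserves index 0 for padding, 1 for the start marker and 2 for out-of-vocabulary words, and shifts every real word index by 3 (`index_from=3`). Looking indices up directly in `reverse_word_index` therefore yields shifted words. A minimal sketch of an offset-aware decoder follows; the function name `decode_review`, the placeholder strings and the toy `toy_word_index` are hypothetical and stand in for the real dictionary.

```python
# Toy stand-in for imdb.get_word_index(); ranks start at 1 (hypothetical data).
toy_word_index = {"the": 1, "film": 2, "was": 3, "brilliant": 4}

def decode_review(encoded, word_index, index_from=3):
    """Map an encoded review back to words, honouring the reserved indices."""
    reverse = {rank + index_from: word for word, rank in word_index.items()}
    reverse.update({0: "<pad>", 1: "<start>", 2: "<oov>"})
    return [reverse.get(i, "<oov>") for i in encoded]

encoded = [1, 4, 5, 6, 7, 2]
decoded = decode_review(encoded, toy_word_index)
print(decoded)
# ['<start>', 'the', 'film', 'was', 'brilliant', '<oov>']
```

If this decoder were used in the notebook, the marker strings would probably need to be dropped before BERT tokenization; that choice is left open here.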

In [32]:
print(X_train_decoded.shape)
print(X_train[1])
print(X_train_decoded[1])

(25000,)
[1, 194, 1153, 194, 2, 78, 228, 5, 6, 1463, 4369, 2, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 2, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 2, 5, 163, 11, 3215, 2, 4, 1153, 9, 194, 775, 7, 2, 2, 349, 2637, 148, 605, 2, 2, 15, 123, 125, 68, 2, 2, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 2, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 2, 5, 2, 656, 245, 2350, 5, 4, 2, 131, 152, 491, 18, 2, 32, 2, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95]
['the', 'thought', 'solid', 'thought', 'and', 'do', 'making', 'to', 'is', 'spot', 'nomination', 'and', 'while', 'he', 'of', 'jack', 'in', 'where', 'picked', 'as', 'getting', 'on', 'was', 'd

In [33]:
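sentences = []
for i in np.arange(len(X_test)):
    # Same direct lookup as for the training set (the index_from=3 shift is not undone)
    sentence = [reverse_word_index[idx] for idx in X_test[i]]
    sentences.append(sentence)

# dtype=object because the reviews have different lengths (ragged array)
X_test_decoded = np.array(sentences, dtype=object)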

In [34]:
print(X_test_decoded.shape)
print(X_test[1])
print(X_test_decoded[1])

(25000,)
[1, 14, 22, 3443, 6, 176, 7, 2, 88, 12, 2679, 23, 1310, 5, 109, 943, 4, 114, 9, 55, 606, 5, 111, 7, 4, 139, 193, 273, 23, 4, 172, 270, 11, 2, 2, 4, 2, 2801, 109, 1603, 21, 4, 22, 3861, 8, 6, 1193, 1330, 10, 10, 4, 105, 987, 35, 841, 2, 19, 861, 1074, 5, 1987, 2, 45, 55, 221, 15, 670, 2, 526, 14, 1069, 4, 405, 5, 2438, 7, 27, 85, 108, 131, 4, 2, 2, 3884, 405, 9, 3523, 133, 5, 50, 13, 104, 51, 66, 166, 14, 22, 157, 9, 4, 530, 239, 34, 2, 2801, 45, 407, 31, 7, 41, 3778, 105, 21, 59, 299, 12, 38, 950, 5, 4521, 15, 45, 629, 488, 2733, 127, 6, 52, 292, 17, 4, 2, 185, 132, 1988, 2, 1799, 488, 2693, 47, 6, 392, 173, 4, 2, 4378, 270, 2352, 4, 1500, 7, 4, 65, 55, 73, 11, 346, 14, 20, 9, 6, 976, 2078, 7, 2, 861, 2, 5, 4182, 30, 3127, 2, 56, 4, 841, 5, 990, 692, 8, 4, 1669, 398, 229, 10, 10, 13, 2822, 670, 2, 14, 9, 31, 7, 27, 111, 108, 15, 2033, 19, 2, 1429, 875, 551, 14, 22, 9, 1193, 21, 45, 4829, 5, 45, 252, 8, 2, 6, 565, 921, 3639, 39, 4, 529, 48, 25, 181, 8, 67, 35, 1732, 22, 49, 238

## 3. Tokenize and convert the text to integers for BERT


In [35]:
def tokenize_sentence(sentence: List[str], tokenizer, max_seq_length: int) -> List[str]:
    """
    Tokenize a single sentence for BERT
      1. Tokenizes the sentence
      2. Chops off excess tokens
      3. Adds the [CLS] and [SEP] tokens

    :param sentence: A single sentence (a list of words) that needs to be tokenized.
    :type sentence: List[str]
    :param tokenizer: The BERT tokenizer
    :type tokenizer: bert.bert_tokenization.FullTokenizer
    :param max_seq_length: The maximum sequence length to use (including the [CLS] and [SEP] tokens)
    :type max_seq_length: int
    :rtype: List[str]
    """
    # Tokenize Sentence
    stokens = tokenizer.tokenize(" ".join(sentence))

    # Chop off excess
    if len(stokens) > (max_seq_length - 2):
        stokens = stokens[:(max_seq_length - 2)]

    # Add [CLS] and [SEP] tokens
    stokens = ["[CLS]"] + stokens + ["[SEP]"]

    return stokens

def tokenize_all_data(data: np.ndarray, tokenizer, max_seq_length: int) -> np.ndarray:
    """
    Takes the complete data (multiple sentences) and tokenizes it for BERT
    For each sentence (row of data), performs the following steps
      1. Tokenizes the sentence
      2. Chops off excess tokens
      3. Adds the [CLS] and [SEP] tokens

    :param data: The complete data (comprising multiple sentences) that needs to be tokenized.
    :type data: np.ndarray
    :param tokenizer: The BERT tokenizer
    :type tokenizer: bert.bert_tokenization.FullTokenizer
    :param max_seq_length: The maximum sequence length to use (including the [CLS] and [SEP] tokens)
    :type max_seq_length: int
    :rtype: np.ndarray
    """
    data_tokens = []
    for i in np.arange(len(data)):
        stokens = tokenize_sentence(sentence=data[i], tokenizer=tokenizer, max_seq_length=max_seq_length)
        data_tokens.append(stokens)

    # dtype=object: the token lists differ in length, so this is a 1-D array of lists
    data_tokens = np.array(data_tokens, dtype=object)
    return data_tokens

def get_ids_from_tokenized_data(data_tokens, tokenizer, max_seq_length):
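    """
    Converts the tokens to IDs for BERT input

    :param data_tokens: Tokenized sentences, one list of tokens per sentence
    :param tokenizer: The BERT tokenizer
    :param max_seq_length: Length to which each sentence is zero-padded
    :rtype: np.ndarray of shape (number of sentences, max_seq_length)
    """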
    data_input_ids = []
    for i in np.arange(len(data_tokens)):
        input_ids = get_ids(data_tokens[i], tokenizer, max_seq_length)
        data_input_ids.append(input_ids)

    data_input_ids = np.array(data_input_ids)
    return data_input_ids

def get_masks_from_tokenized_data(data_tokens, max_seq_length):
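    """
    Converts the tokens to masks for BERT input (1 for real tokens, 0 for padding)

    :param data_tokens: Tokenized sentences, one list of tokens per sentence
    :param max_seq_length: Length to which each mask is zero-padded
    :rtype: np.ndarray of shape (number of sentences, max_seq_length)
    """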
    data_input_masks = []
    for i in np.arange(len(data_tokens)):
        input_masks = get_masks(data_tokens[i], max_seq_length)
        data_input_masks.append(input_masks)

    data_input_masks = np.array(data_input_masks)
    return data_input_masks

def get_segments_from_tokenized_data(data_tokens, max_seq_length):
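    """
    Converts the tokens to segments for BERT input (0 for the first sequence, 1 for the second)

    :param data_tokens: Tokenized sentences, one list of tokens per sentence
    :param max_seq_length: Length to which each segment vector is zero-padded
    :rtype: np.ndarray of shape (number of sentences, max_seq_length)
    """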
    data_input_segments = []
    for i in np.arange(len(data_tokens)):
        input_segments = get_segments(data_tokens[i], max_seq_length)
        data_input_segments.append(input_segments)

    data_input_segments = np.array(data_input_segments)
    return data_input_segments
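
The truncation rule in `tokenize_sentence` can be checked in isolation. The sketch below swaps the BERT tokenizer for a plain whitespace split (an assumption made only so the example runs without the vocab file; real WordPiece tokenization can produce more tokens than words) and uses a hypothetical helper, `truncate_and_wrap`, to confirm that the result never exceeds `max_seq_length`.

```python
def truncate_and_wrap(tokens, max_seq_length):
    """Keep at most max_seq_length - 2 tokens, then add [CLS] and [SEP]."""
    tokens = tokens[: max_seq_length - 2]
    return ["[CLS]"] + tokens + ["[SEP]"]

words = "this movie is not as bad as people say".split()  # 9 whitespace tokens

short = truncate_and_wrap(words, max_seq_length=16)     # fits, nothing is cut
truncated = truncate_and_wrap(words, max_seq_length=6)  # only 4 words survive

print(len(short), short)
print(len(truncated), truncated)
# 6 ['[CLS]', 'this', 'movie', 'is', 'not', '[SEP]']
```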



In [36]:
X_train_tokens = tokenize_all_data(data=X_train_decoded, tokenizer=tokenizer, max_seq_length=max_seq_length)

In [37]:
print(X_train_tokens.shape)
print("-"*100)
print(len(X_train_tokens[0]), "|", X_train_tokens[0])
print("-"*100)
print(len(X_train_tokens[5]), "|", X_train_tokens[5])

(25000,)
----------------------------------------------------------------------------------------------------
128 | ['[CLS]', 'the', 'as', 'you', 'with', 'out', 'themselves', 'powerful', 'lets', 'loves', 'their', 'becomes', 'reaching', 'had', 'journalist', 'of', 'lot', 'from', 'anyone', 'to', 'have', 'after', 'out', 'atmosphere', 'never', 'more', 'room', 'and', 'it', 'so', 'heart', 'shows', 'to', 'years', 'of', 'every', 'never', 'going', 'and', 'help', 'moments', 'or', 'of', 'every', 'chest', 'visual', 'movie', 'except', 'her', 'was', 'several', 'of', 'enough', 'more', 'with', 'is', 'now', 'current', 'film', 'as', 'you', 'of', 'mine', 'potentially', 'unfortunately', 'of', 'you', 'than', 'him', 'that', 'with', 'out', 'themselves', 'her', 'get', 'for', 'was', 'camp', 'of', 'you', 'movie', 'sometimes', 'movie', 'that', 'with', 'scary', 'but', 'and', 'to', 'story', 'wonderful', 'that', 'in', 'seeing', 'in', 'character', 'to', 'of', '70s', 'and', 'with', 'heart', 'had', 'shadows', 'they', '

In [38]:
X_train_input_ids = get_ids_from_tokenized_data(data_tokens=X_train_tokens, tokenizer=tokenizer, max_seq_length=max_seq_length)
X_train_input_ids.shape

(25000, 128)

In [39]:
print(X_train_input_ids[0])
print("-"*100)
print(X_train_input_ids[5])

[  101  1996  2004  2017  2007  2041  3209  3928 11082  7459  2037  4150
  4285  2018  4988  1997  2843  2013  3087  2000  2031  2044  2041  7224
  2196  2062  2282  1998  2009  2061  2540  3065  2000  2086  1997  2296
  2196  2183  1998  2393  5312  2030  1997  2296  3108  5107  3185  3272
  2014  2001  2195  1997  2438  2062  2007  2003  2085  2783  2143  2004
  2017  1997  3067  9280  6854  1997  2017  2084  2032  2008  2007  2041
  3209  2014  2131  2005  2001  3409  1997  2017  3185  2823  3185  2008
  2007 12459  2021  1998  2000  2466  6919  2008  1999  3773  1999  2839
  2000  1997 17549  1998  2007  2540  2018  6281  2027  1997  2182  2008
  2007  2014  3809  2000  2031  2515  2043  2013  2339  2054  2031  4401
  2027  2003  2017  2008  3475  1005  1056   102]
----------------------------------------------------------------------------------------------------
[ 101 1996 3947 2145 2042 2008 2788 3084 2005 1997 2736 1998 3092 1998
 2019 2138 2077 2065 2074 2295 2242 2113 3117 29

In [40]:
# TODO: Repeat for Test dataset

## 4.  Create text Masks for BERT



In [41]:
X_train_masks = get_masks_from_tokenized_data(data_tokens=X_train_tokens, max_seq_length=max_seq_length)

In [42]:
print(sum(X_train_masks[0] == 1), " | ", X_train_masks[0])
print("-"*100)
print(sum(X_train_masks[5] == 1), " | ", X_train_masks[5])

128  |  [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
----------------------------------------------------------------------------------------------------
45  |  [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
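
Because padding positions are filled with id 0, the masks can also be derived directly from the id matrix, which gives a cheap consistency check. The sketch below assumes that id 0 is used only for padding (consistent with the ids printed earlier, where [CLS] is 101 and [SEP] is 102) and uses a small made-up id matrix in place of the real `X_train_input_ids`.

```python
import numpy as np

# Made-up padded id matrix: two "sentences", max_seq_length of 6.
ids = np.array([
    [101, 2023, 3185, 102, 0, 0],
    [101, 2919, 102, 0, 0, 0],
])

masks = (ids != 0).astype(int)  # 1 where a real token sits, 0 for padding
lengths = masks.sum(axis=1)     # number of real tokens per sentence

print(masks)
print(lengths)  # [4 3]
```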


In [43]:
# TODO: Repeat for Test dataset

## 5. Create text Segments for BERT

In [44]:
X_train_input_segments = get_segments_from_tokenized_data(data_tokens=X_train_tokens, max_seq_length=max_seq_length)

In [45]:
print(sum(X_train_input_segments[0] == 0), " | ", X_train_input_segments[0])
print("-"*100)
print(sum(X_train_input_segments[5] == 0), " | ", X_train_input_segments[5])

128  |  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
----------------------------------------------------------------------------------------------------
128  |  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]


In [46]:
# TODO: Repeat for Test dataset