<a href="https://colab.research.google.com/github/jakubglinka/google.colab/blob/master/NLP/supervised/SequenceClassificationWithAttention.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sequence classification using simple attention
 - Positional encoding with an additional [CLS] token
 - Pre-trained SentencePiece (SP) embeddings
 - Small multi-head attention model

## Configure environment

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [0]:
# !ls "./drive/My Drive/embeddings"
# !tar -C "./drive/My Drive/embeddings" -xvf "./drive/My Drive/embeddings/embeddings (1).tar.gz"

In [2]:
from __future__ import absolute_import, division, print_function, unicode_literals

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
print(tf.__version__)

TensorFlow 2.x selected.
Num GPUs Available:  0
2.1.0-rc1


In [3]:
try:
  cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
  tf.config.experimental_connect_to_cluster(cluster_resolver)
  tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
  strategy = tf.distribute.experimental.TPUStrategy(cluster_resolver)
except ValueError as error:
  print(error)
  print("No TPU available. Switching to single device strategy.")
  strategy = tf.distribute.OneDeviceStrategy(device="/gpu")

Please provide a TPU Name to connect to.
No TPU available. Switching to single device strategy.


In [0]:
!pip install transformers
# credentials redacted: supply your own user name and personal access token
!pip install git+https://<user>:<personal-access-token>@dev.azure.com/eyDataScienceTeam/_git/nlp-ey-assets@develop

## Data Preparation

### PolEmo data

In [0]:
import pathlib
import re
from typing import List

import numpy as np
import pandas as pd
import tqdm

POLEMO_PATH = "./drive/My Drive/sentiment/"

In [6]:
# read PolEmo data:
def read_polemo_data(path) -> pd.DataFrame:
  res = []
  with path.open("r") as f:
    for line in f:
      rec = line.strip("\n").split("__label__")
      rec[0] = rec[0].strip()
      res.append(rec)

  return pd.DataFrame(res, columns=["text", "label"])

df_train = read_polemo_data(pathlib.Path(POLEMO_PATH) / "all.sentence.train.txt")
print(f"Read {df_train.shape[0]} train examples.")

df_dev = read_polemo_data(pathlib.Path(POLEMO_PATH) / "all.sentence.dev.txt")
print(f"Read {df_dev.shape[0]} dev examples.")

df_test = read_polemo_data(pathlib.Path(POLEMO_PATH) / "all.sentence.test.txt")
print(f"Read {df_test.shape[0]} test examples.")

Read 45974 train examples.
Read 5747 dev examples.
Read 5745 test examples.
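
Each line of the PolEmo files holds a sentence followed by a fastText-style `__label__` suffix, which is what `read_polemo_data` splits on. A minimal sketch of that split on a single made-up line (the sentence and the label name `z_plus_m` are illustrative assumptions, not copied from the files above):

```python
from typing import Tuple


def split_labelled_line(line: str) -> Tuple[str, str]:
    """Split 'some text __label__name' into (text, label)."""
    text, label = line.rstrip("\n").split("__label__")
    return text.strip(), label.strip()


# Hypothetical input line, for illustration only.
sample_line = "Bardzo dobry hotel . __label__z_plus_m\n"
sample_text, sample_label = split_labelled_line(sample_line)
print(repr(sample_text), repr(sample_label))
```

A line with no `__label__` marker, or with more than one, raises a `ValueError` in this sketch instead of silently producing a malformed row.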


In [0]:
from sklearn.preprocessing import LabelEncoder
enc = LabelEncoder().fit(df_train.label.values)

# add encoded labels:
df_train["label_enc"] = enc.transform(df_train["label"])
df_dev["label_enc"] = enc.transform(df_dev["label"])
df_test["label_enc"] = enc.transform(df_test["label"])
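
`LabelEncoder` numbers the sorted unique label values from 0, so the ids stay stable as long as the set of training labels does. A plain-Python sketch of the same mapping, using placeholder label names instead of the actual PolEmo ones:

```python
from typing import Dict, List

# Placeholder labels; the real values come from df_train.label.
labels: List[str] = ["neg", "pos", "neutral", "pos"]

# Sort the unique classes and number them from 0, as LabelEncoder does.
classes = sorted(set(labels))
class_to_id: Dict[str, int] = {name: idx for idx, name in enumerate(classes)}
encoded = [class_to_id[name] for name in labels]

print(classes)
print(encoded)
```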

## Multihead attention

### Architecture

![alt text](https://miro.medium.com/max/3740/1*hC4zIxPPK9KGDu-OYUfnCQ.png)

### Settings

 - number of SentencePiece tokens: 32000

### Prepare Embedder

In [9]:
import numpy as np
from absl import logging
from nlp.models.embeddings import SentencePiece2Vec, load_embedder

# set verbosity:
logging.set_verbosity(logging.INFO)

# load embedder:
embedder = load_embedder("./drive/My Drive/embeddings/SP2VEC_UNIGRAM_VS=32k_ED=768_NS=5_CS=10_SP=TRUE_LANG=PL")

INFO:absl:Loading embedder from drive/My Drive/embeddings/SP2VEC_UNIGRAM_VS=32k_ED=768_NS=5_CS=10_SP=TRUE_LANG=PL
INFO:absl:Loading tokenizer...
INFO:absl:Loading embedder parameters...
INFO:absl:Loading words counter...
INFO:absl:Restoring Tensorflow model...
INFO:absl:Loading model weights from drive/My Drive/embeddings/SP2VEC_UNIGRAM_VS=32k_ED=768_NS=5_CS=10_SP=TRUE_LANG=PL/model.h5...
INFO:absl:41 out of 32000 SentencePieces had not been seen during training...


In [11]:
# test tokenizer:
embedder.tokenizer.tokenize("Ala ma kota!")

[['▁Ala'], ['▁ma'], ['▁kot', 'a'], ['▁!']]

### Normalize and Filter OOV words

In [106]:
# sentence:
text = df_train.text[15]
# text = '\xadna'
print(embedder.tokenizer.normalize(text))

# check with Embeddings corpus:
def _is_oov(token: str, embedder=embedder, min_count: int = 10) -> bool:
  token_count = embedder._words_counter.get(tuple(embedder.tokenizer.encode(token)[0]), 0)
  return token_count < min_count

def _clean_text(text: str, embedder=embedder, min_count: int = 1) -> str:
  tokens = embedder.tokenizer.normalize(text).split()
  tokens = [token for token in tokens if not _is_oov(token, embedder, min_count)]
  return " ".join(tokens)

print(text, "->", _clean_text(text))
embedder._words_counter.get(tuple(embedder.tokenizer.encode("3na")[0]))
# embedder._words_counter.most_common(5)
# embedder.tokenizer.decode([[]])

# ((2510,))
# embedder.tokenizer.get_vocabulary()

tuple(embedder.tokenizer.encode(text)[0])

Typowym przykładem jest zjawisko " jet lag " znane każdemu , kto poleciał samolotem na egzotyczne wakacje w strefie czasowej różniącej się o wiele godzin .
Typowym przykładem jest zjawisko " jet lag " znane każdemu , kto poleciał samolotem na egzotyczne wakacje w strefie czasowej różniącej się o wiele godzin . -> Typowym przykładem jest zjawisko " jet lag " znane każdemu , kto poleciał samolotem na egzotyczne wakacje w strefie czasowej różniącej się o wiele godzin .


(24270, 51)

In [28]:
%timeit _is_oov("Jakub")

The slowest run took 8.22 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.5 µs per loop
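
`_is_oov` treats a token as out-of-vocabulary when the embedder's training corpus contained it fewer than `min_count` times. The sketch below shows the same rule with a plain `collections.Counter`. The counts are invented, and the counter is keyed by the word itself, whereas the notebook's `_words_counter` is keyed by tuples of SentencePiece ids:

```python
from collections import Counter
from typing import List

# Invented corpus counts, standing in for embedder._words_counter.
word_counts = Counter({"hotel": 120, "dobry": 45, "xyzzy": 2})


def is_oov(token: str, counts: Counter, min_count: int = 10) -> bool:
    """A token is OOV when it was seen fewer than min_count times."""
    return counts.get(token, 0) < min_count


def drop_oov(tokens: List[str], counts: Counter, min_count: int = 10) -> List[str]:
    """Keep only tokens that were seen at least min_count times."""
    return [tok for tok in tokens if not is_oov(tok, counts, min_count)]


kept = drop_oov(["dobry", "xyzzy", "hotel", "nieznane"], word_counts)
print(kept)
```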


### Prepare TFRecords

#### Prepare single Example

In [0]:
embedder.embed("Ala ma kota!")

array([[ 0.62911153,  0.03254938, -0.21758054, ..., -0.68023241,
        -0.8192898 , -0.55972898],
       [ 0.08749175,  0.23787333,  0.14196329, ..., -0.3038916 ,
         0.27634075, -0.20386842],
       [ 0.68985907, -0.32057452,  1.10701975, ...,  0.88873127,
        -0.09599429,  0.17036521],
       [ 0.03494044, -0.05984996,  0.47917938, ..., -0.17744191,
         0.35598704, -0.45015821]])

In [0]:
from typing import Dict, Optional

# prepare data:
def _prepare_single_example(x: str, max_seq_len: Optional[int] = None, label: int = 0) -> Dict[str, tf.Tensor]:
  """Embed text and return the embedding, token count and label as tensors."""
  res = np.array(embedder.embed(text=x), dtype="float32")
  if max_seq_len is not None:
    # truncate to at most max_seq_len tokens:
    res = res[:max_seq_len, :]
  n_tokens = res.shape[0]
  if max_seq_len is not None:
    # zero-pad up to max_seq_len tokens:
    res = np.pad(res, ((0, max_seq_len - n_tokens), (0, 0)))

  inputs = {}
  inputs["text/embedding"] = tf.convert_to_tensor(res, dtype=tf.float32)
  inputs["text/seq_len"] = tf.convert_to_tensor(n_tokens, dtype=tf.int32)
  inputs["text/label"] = tf.convert_to_tensor(label, dtype=tf.int32)
  return inputs

# serialize:
def _serialize_example(example):
  # build a new feature dict instead of overwriting the caller's tensors:
  embedding = tf.io.serialize_tensor(example["text/embedding"]).numpy()
  feature = {
    "text/embedding": tf.train.Feature(bytes_list=tf.train.BytesList(value=[embedding])),
    "text/seq_len": tf.train.Feature(int64_list=tf.train.Int64List(value=[example["text/seq_len"].numpy()])),
    "text/label": tf.train.Feature(int64_list=tf.train.Int64List(value=[example["text/label"].numpy()])),
  }

  ex = tf.train.Example(features=tf.train.Features(feature=feature))
  return ex.SerializeToString()

example = _prepare_single_example("Ala ma kota!", 64, 1)
tf.train.Example.FromString(_serialize_example(example))

features {
  feature {
    key: "text/embedding"
    value {
      bytes_list {
        value: "\010\001\022\t\022\002\010@\022\003\010\200\006..."
(serialized float32 tensor; remaining output truncated)
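
When `max_seq_len` is given, `_prepare_single_example` truncates the token embeddings and zero-pads them to a fixed length. A small NumPy sketch of that step; the dimensions are shrunk for readability and are not the notebook's 64 x 768:

```python
import numpy as np


def pad_or_truncate(emb: np.ndarray, max_seq_len: int):
    """Return (array of shape [max_seq_len, dim], number of real tokens)."""
    emb = emb[:max_seq_len, :]
    n_tokens = emb.shape[0]
    padded = np.pad(emb, ((0, max_seq_len - n_tokens), (0, 0)))
    return padded, n_tokens


short = np.ones((3, 4), dtype="float32")   # 3 tokens, 4-dim embeddings
long = np.ones((9, 4), dtype="float32")    # 9 tokens, longer than the limit

padded_short, n_short = pad_or_truncate(short, max_seq_len=6)
padded_long, n_long = pad_or_truncate(long, max_seq_len=6)
print(padded_short.shape, n_short, padded_long.shape, n_long)
```

Returning the number of real tokens alongside the padded array is what later allows the attention mask to ignore the zero rows.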

#### Save as TFRecords

In [0]:
!rm -f ./train.TFRecord ./dev.TFRecord ./test.TFRecord


In [0]:
# Write the `tf.Example` observations to the files.

def _write_tfrecord(df: pd.DataFrame, path: str) -> None:
  with tf.io.TFRecordWriter(path) as writer:
    for __, row in tqdm.tqdm(df.iterrows()):
      example = _prepare_single_example(row.text, None, row.label_enc)
      writer.write(_serialize_example(example))

_write_tfrecord(df_train, "./train.TFRecord")
_write_tfrecord(df_dev, "./dev.TFRecord")
_write_tfrecord(df_test, "./test.TFRecord")


45974it [03:19, 230.61it/s]
5747it [00:24, 234.85it/s]
5745it [00:25, 222.81it/s]


In [0]:
!ls -la -h | grep TF

-rw-r--r-- 1 root root 306M Jan  8 20:20 dev.TFRecord
-rw-r--r-- 1 root root 304M Jan  8 20:20 test.TFRecord
-rw-r--r-- 1 root root 2.4G Jan  8 20:19 train.TFRecord
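
The file sizes follow from storing dense float32 embeddings for every token. A back-of-the-envelope check, assuming `ls -h` reports binary units and ignoring the small protobuf overhead per record:

```python
BYTES_PER_FLOAT32 = 4
EMBEDDING_DIM = 768                   # ED=768 in the embedder name
N_TRAIN = 45974                       # train examples read above
TRAIN_FILE_BYTES = 2.4 * 1024 ** 3    # "2.4G" from ls, already rounded

bytes_per_token = EMBEDDING_DIM * BYTES_PER_FLOAT32
bytes_per_example = TRAIN_FILE_BYTES / N_TRAIN
avg_tokens = bytes_per_example / bytes_per_token

print(f"{bytes_per_token} bytes per token")
print(f"~{bytes_per_example / 1024:.0f} KiB per example")
print(f"~{avg_tokens:.1f} tokens per sentence on average")
```

Under these assumptions the training file corresponds to roughly 18 SentencePiece tokens per sentence, which suggests that storing token ids and embedding on the fly would be far more compact.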


#### Copy to GC Bucket

In [0]:
from google.colab import auth
auth.authenticate_user()

# https://cloud.google.com/resource-manager/docs/creating-managing-projects
project_id = 'southern-shard-211411'
!gcloud config set project {project_id}

Updated property [core/project].


In [0]:
# Upload the files to a given Google Cloud Storage bucket.
!gsutil cp ./train.TFRecord gs://tf_experiments_records/PolEmo/embedded/train.TFRecord
!gsutil cp ./dev.TFRecord gs://tf_experiments_records/PolEmo/embedded/dev.TFRecord
!gsutil cp ./test.TFRecord gs://tf_experiments_records/PolEmo/embedded/test.TFRecord


Copying file://./train.TFRecord [Content-Type=application/octet-stream]...
Operation completed over 1 objects/2.4 GiB.                                      
Copying file://./dev.TFRecord [Content-Type=application/octet-stream]...

### Prepare Dataset

In [0]:
from google.colab import auth
auth.authenticate_user()

# https://cloud.google.com/resource-manager/docs/creating-managing-projects
project_id = 'southern-shard-211411'
!gcloud config set project {project_id}

Updated property [core/project].





In [0]:
# give access for TPU Pod
!gsutil acl ch -u service-495559152420@cloud-tpu.iam.gserviceaccount.com:READER gs://tf_experiments_records/PolEmo/embedded/train.TFRecord
!gsutil acl ch -u service-495559152420@cloud-tpu.iam.gserviceaccount.com:READER gs://tf_experiments_records/PolEmo/embedded/dev.TFRecord
!gsutil acl ch -u service-495559152420@cloud-tpu.iam.gserviceaccount.com:READER gs://tf_experiments_records/PolEmo/embedded/test.TFRecord

Updated ACL on gs://tf_experiments_records/PolEmo/embedded/train.TFRecord
Updated ACL on gs://tf_experiments_records/PolEmo/embedded/dev.TFRecord
Updated ACL on gs://tf_experiments_records/PolEmo/embedded/test.TFRecord


In [0]:
# Create a dictionary describing the features.
_feature_description = {
    'text/embedding': tf.io.FixedLenFeature([], tf.string),
    'text/label': tf.io.FixedLenFeature([], tf.int64),
    'text/seq_len': tf.io.FixedLenFeature([], tf.int64)
}

def _parse_data(example_proto, max_seq_len: int = 128, embedding_dim: int = 768, num_labels: int = 4):
  # Parse the input tf.Example proto using the dictionary above.
  rec = tf.io.parse_single_example(example_proto, _feature_description)

  # decode the embedding, then truncate / zero-pad it to max_seq_len tokens:
  emb = tf.io.parse_tensor(rec["text/embedding"], out_type=tf.float32)
  emb = emb[:max_seq_len, :]
  n_tokens = tf.shape(emb)[0]
  emb = tf.pad(emb, paddings=[[0, max_seq_len - n_tokens], [0, 0]])

  # parse_tensor loses the static shape, so restore it explicitly:
  emb = tf.reshape(emb, [max_seq_len, embedding_dim])

  # cap the stored token count at max_seq_len to match the truncated embedding:
  seq_len = tf.minimum(rec["text/seq_len"], max_seq_len)

  inputs = {"text/embedding": emb, "text/seq_len": seq_len}
  labels = tf.one_hot(rec["text/label"], depth=num_labels)

  return inputs, labels

# sanity check on a single record from the dev split:
sample_raw = tf.data.TFRecordDataset("gs://tf_experiments_records/PolEmo/embedded/dev.TFRecord", num_parallel_reads=1)
example_proto = next(iter(sample_raw))
_parse_data(example_proto, max_seq_len=18)

({'text/embedding': <tf.Tensor: shape=(18, 768), dtype=float32, numpy=
  array([[ 1.09389514e-01,  3.44708890e-01,  4.13581580e-01, ...,
          -5.57855487e-01,  2.60643750e-01,  7.47362137e-01],
         [-2.94395030e-01, -1.04304159e+00,  3.58738117e-02, ...,
          -5.40329993e-01, -5.27269859e-03, -2.85313189e-01],
         [-8.10914556e-04, -6.40786067e-02, -1.51854858e-01, ...,
           7.12284595e-02,  6.91386908e-02,  5.71665317e-02],
         ...,
         [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
           0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
         [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
           0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
         [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
           0.00000000e+00,  0.00000000e+00,  0.00000000e+00]], dtype=float32)>,
  'text/seq_len': <tf.Tensor: shape=(), dtype=int64, numpy=7>},
 <tf.Tensor: shape=(4,), dtype=float32, numpy=array([0., 1., 0., 

In [0]:
BATCH_SIZE = 8*64
STEPS_PER_EPOCH = int(np.floor(45000 / BATCH_SIZE))
VALIDATION_STEPS = int(np.floor(5700 / BATCH_SIZE))

print(f"STEPS_PER_EPOCH: {STEPS_PER_EPOCH}")
print(f"VALIDATION_STEPS: {VALIDATION_STEPS}")

AUTOTUNE = tf.data.experimental.AUTOTUNE  # same value as the literal -1

train_raw = tf.data.TFRecordDataset("gs://tf_experiments_records/PolEmo/embedded/train.TFRecord", num_parallel_reads=1)
# NOTE: shuffle() is applied after batch(), so whole batches are shuffled;
# moving shuffle() before batch() would shuffle individual examples instead.
train_parsed = train_raw.map(_parse_data).batch(BATCH_SIZE, drop_remainder=True).shuffle(1024).repeat(100)
train_parsed = train_parsed.prefetch(AUTOTUNE)

inputs, labels = next(iter(train_parsed))

dev_raw = tf.data.TFRecordDataset("gs://tf_experiments_records/PolEmo/embedded/dev.TFRecord", num_parallel_reads=1)
dev_parsed = dev_raw.map(_parse_data).batch(BATCH_SIZE, drop_remainder=True)
dev_parsed = dev_parsed.prefetch(AUTOTUNE)

test_raw = tf.data.TFRecordDataset("gs://tf_experiments_records/PolEmo/embedded/test.TFRecord", num_parallel_reads=1)
test_parsed = test_raw.map(_parse_data).batch(BATCH_SIZE, drop_remainder=True)
test_parsed = test_parsed.prefetch(AUTOTUNE)

inputs['text/embedding'].shape

STEPS_PER_EPOCH: 87
VALIDATION_STEPS: 11


TensorShape([512, 128, 768])
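
`STEPS_PER_EPOCH` and `VALIDATION_STEPS` above are derived from rounded sizes (45000 and 5700). As a cross-check, the same quantities can be computed from the exact example counts printed when the data was read. With `drop_remainder=True` the trailing partial batch is discarded:

```python
BATCH_SIZE = 8 * 64

# Exact split sizes reported by read_polemo_data above.
split_sizes = {"train": 45974, "dev": 5747, "test": 5745}

for name, n_examples in split_sizes.items():
    full_batches, dropped = divmod(n_examples, BATCH_SIZE)
    print(f"{name}: {full_batches} full batches, {dropped} examples dropped")

# Number of test examples that survive batching.
test_examples_used = (split_sizes["test"] // BATCH_SIZE) * BATCH_SIZE
print(test_examples_used)
```

The exact train count gives 89 full batches, two more than the 87 steps used per epoch, and the 5632 test examples that survive batching match the support shown in the classification report at the end of the notebook.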

### Create model

#### ModelConfig

In [0]:
from transformers import BertConfig

class ModelConfig(BertConfig):

  def __init__(self, seed, max_seq_len: int = 128, embedding_dim: int = 768, **kwargs):
    super(ModelConfig, self).__init__(**kwargs)
    self.seed = seed
    self.max_seq_len = max_seq_len
    self.embedding_dim = embedding_dim
    
# new config
config = ModelConfig(vocab_size=10000, output_hidden_states=False, max_seq_len=128,
                    hidden_size=768, 
                    num_attention_heads=1, 
                    num_hidden_layers=1, 
                    intermediate_size=512, 
                    max_position_embeddings=128,
                    num_labels=4, 
                    hidden_dropout_prob=0.1, 
                    attention_probs_dropout_prob=0.1, seed=1234, )

# config

#### Embedding layer

In [0]:
# https://github.com/huggingface/transformers/blob/645713e2cb8307e41febb2b7c9f6036f6645efce/transformers/modeling_tf_bert.py#L93

class PositionalEmbedding(tf.keras.layers.Layer):
  """Enrich embeddings with positional encoding."""

  def __init__(self, config, **kwargs):
    super(PositionalEmbedding, self).__init__(name="PositonalEmbedding", **kwargs)
    self.embedding_dim = config.hidden_size
    self.seed = config.seed
    self.positional_embeddings = tf.keras.layers.Embedding(config.max_position_embeddings, 
                                                         config.hidden_size,
                                                         embeddings_initializer=tf.keras.initializers.GlorotUniform(seed=config.seed),
                                                         name="weights")
    self.layer_norm = tf.keras.layers.LayerNormalization(epsilon=config.layer_norm_eps, name="layer_norm")
    self.dropout = tf.keras.layers.Dropout(rate=config.hidden_dropout_prob, name="dropout")

  def build(self, input_shape):
    with tf.name_scope("positional_embeddings"):
      # use the values stored in __init__ rather than the global `config`:
      self.cls_embedding = self.add_weight("cls_embedding", 
                                           shape=[1, 1, self.embedding_dim], 
                                           initializer=tf.keras.initializers.GlorotUniform(seed=self.seed + 1))
      self.positional_embeddings.build(input_shape)
    super(PositionalEmbedding, self).build(input_shape)

  def call(self, inputs, training=False):

    position_ids = tf.range(tf.shape(inputs["text/embedding"])[1])
    emb = inputs["text/embedding"] + self.positional_embeddings(position_ids)
    emb = tf.concat([tf.repeat(self.cls_embedding, tf.shape(emb)[0],axis=0), emb], axis=1)
    emb = self.layer_norm(emb)
    emb = self.dropout(emb, training=training)
    
    return emb

pe = PositionalEmbedding(config)
pe.build([None, 128, 768])
emb = pe(inputs)


#### Multihead attention

In [0]:
from transformers.modeling_tf_bert import TFBertEncoder, shape_list
encoder = TFBertEncoder(config)

# head mask:
head_mask = [None] * config.num_hidden_layers
head_mask

# attention mask:
# +1 accounts for the prepended [CLS] token:
attention_mask = tf.sequence_mask(lengths=inputs["text/seq_len"]+1, maxlen=config.max_seq_len+1, dtype=tf.float32)
extended_attention_mask = attention_mask[:, tf.newaxis, tf.newaxis, :]
extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0
extended_attention_mask

# encoder:
ctx, = encoder([emb, extended_attention_mask, head_mask])
ctx.shape

TensorShape([512, 129, 768])
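
The attention mask marks the `[CLS]` slot and the real tokens as visible and turns padded positions into a large negative bias that is added to the attention scores. A NumPy sketch of the same construction with toy lengths:

```python
import numpy as np

seq_lens = np.array([2, 4])      # real tokens per sequence (toy values)
max_seq_len = 4

# +1 accounts for the [CLS] slot prepended to every sequence.
positions = np.arange(max_seq_len + 1)
mask = (positions[np.newaxis, :] < (seq_lens + 1)[:, np.newaxis]).astype("float32")

# Shape [batch, 1, 1, seq] broadcasts over heads and query positions.
additive_mask = (1.0 - mask)[:, np.newaxis, np.newaxis, :] * -10000.0

print(mask)
print(additive_mask.shape)
```

After the softmax, positions carrying the -10000 bias receive a weight that is effectively zero.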

In [0]:
%timeit encoder([emb, extended_attention_mask, head_mask])[0].shape

1 loop, best of 3: 566 ms per loop


#### Final model

In [0]:
class SequenceClassificationAttention(tf.keras.Model):

  def __init__(self, config, name="SequenceClassificationAttention", **kwargs):
    super(SequenceClassificationAttention, self).__init__(name=name, **kwargs)
    self.positional_embedding = PositionalEmbedding(config)
    self.encoder = TFBertEncoder(config)

    self.pool = tf.keras.layers.Dense(units=config.hidden_size, activation="tanh")
    self.dense = tf.keras.layers.Dense(units=config.num_labels, activation=None)

    self.num_hidden_layers = config.num_hidden_layers
    self.max_seq_len = config.max_seq_len


  def call(self, inputs, training=False):

    emb = self.positional_embedding(inputs, training=training)
    
    # head mask:
    head_mask = [None] * self.num_hidden_layers

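    # attention mask (+1 for the [CLS] token); flatten seq_len first because
    # it may arrive with shape [batch] or [batch, 1]:
    seq_len = tf.reshape(inputs["text/seq_len"], [-1])
    attention_mask = tf.sequence_mask(lengths=seq_len+1, maxlen=self.max_seq_len+1, dtype=tf.float32)
    extended_attention_mask = attention_mask[:, tf.newaxis, tf.newaxis, :]
    extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0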

    # print(emb)
    # print(extended_attention_mask)
    # print(head_mask)

    ctx, = self.encoder([emb, extended_attention_mask, head_mask])
    logits = self.dense(self.pool(ctx[:, 0]))
    return logits

model = SequenceClassificationAttention(config)
# model(inputs)


### Model training

In [0]:
config = ModelConfig(vocab_size=32000, output_hidden_states=False, max_seq_len=128,
                    hidden_size=768, 
                    num_attention_heads=12, 
                    num_hidden_layers=12, 
                    intermediate_size=3072, 
                    max_position_embeddings=128,
                    num_labels=4, 
                    hidden_dropout_prob=0.3, 
                    attention_probs_dropout_prob=0.1, seed=1234)


In [0]:
try:
  cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
  tf.config.experimental_connect_to_cluster(cluster_resolver)
  tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
  strategy = tf.distribute.experimental.TPUStrategy(cluster_resolver)
except ValueError as error:
  print(error)
  print("No TPU available. Switching to single device strategy.")
  strategy = tf.distribute.OneDeviceStrategy(device="/gpu")

with strategy.scope():
  tf.random.set_seed(1234)
  model = SequenceClassificationAttention(config)

  initial_learning_rate = 1e-5
  lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=STEPS_PER_EPOCH*1000,
    decay_rate=.9,
    staircase=True)
  
  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule), 
                loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True), 
                metrics=[tf.keras.metrics.CategoricalAccuracy()])





INFO:tensorflow:Initializing the TPU system: 10.127.38.154:8470
INFO:tensorflow:Clearing out eager caches
INFO:tensorflow:Finished initializing TPU system.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)

In [0]:
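early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_categorical_accuracy', patience=10)
# save_weights_only=True: a subclassed model cannot be saved whole in HDF5 format.
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath="/tmp/weights_ed=768_run_2.{epoch:02d}-{val_loss:.2f}.hdf5", monitor="val_loss", save_best_only=True, save_weights_only=True)

# pass both callbacks; the checkpoint callback was previously defined but unused.
model.fit(train_parsed, 
          validation_data=dev_parsed, 
          epochs=100, 
          steps_per_epoch=STEPS_PER_EPOCH, 
          validation_steps=VALIDATION_STEPS,
          callbacks=[early_stopping, checkpoint])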

Train for 87 steps, validate for 11 steps
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100




KeyboardInterrupt: ignored
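
`EarlyStopping(patience=10)` halts training once the monitored metric has failed to improve for ten consecutive epochs. A simplified sketch of that rule for a metric where higher is better; the accuracy values are invented:

```python
from typing import List, Optional


def early_stop_epoch(history: List[float], patience: int) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(history, start=1):
        if value > best:
            # Strict improvement resets the patience counter.
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Invented validation accuracies: the best value is first reached at epoch 3.
val_acc = [0.50, 0.60, 0.70, 0.69, 0.70, 0.68]
print(early_stop_epoch(val_acc, patience=3))
```

Note that the notebook monitors accuracy for early stopping but validation loss for checkpointing, so the best checkpoint and the stopping point need not coincide.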

In [0]:
model.summary()
# !nvidia-smi

Model: "SequenceClassificationAttention"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
PositonalEmbedding (Position multiple                  100608    
_________________________________________________________________
tf_bert_encoder_5 (TFBertEnc multiple                  85054464  
_________________________________________________________________
dense_6 (Dense)              multiple                  590592    
_________________________________________________________________
dense_7 (Dense)              multiple                  3076      
Total params: 85,748,740
Trainable params: 85,748,740
Non-trainable params: 0
_________________________________________________________________


In [0]:
model.load_weights("/tmp/weights3.11-0.78.hdf5")

In [0]:
model.evaluate(test_parsed)

     11/Unknown - 5s 430ms/step - loss: 0.7710 - categorical_accuracy: 0.7058

[0.770994554866444, 0.7057884]

### Model accuracy

In [0]:
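from sklearn.metrics import classification_report
import numpy as np

# y_pred: predicted class ids
y_pred = model.predict(test_parsed)
y_pred = np.argmax(y_pred, axis=1)

# y_true: class ids recovered from the one-hot labels
# (test_parsed uses drop_remainder=True, so the last partial batch is not scored)
y_true = []
for __, labels in test_parsed:
  y_true += list(np.argmax(labels.numpy(), axis=1))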

In [0]:
print(classification_report(y_true=y_true, y_pred=y_pred, digits=3))

              precision    recall  f1-score   support

           0      0.538     0.415     0.468       668
           1      0.695     0.801     0.744      2087
           2      0.743     0.704     0.723      1489
           3      0.752     0.705     0.728      1388

    accuracy                          0.706      5632
   macro avg      0.682     0.656     0.666      5632
weighted avg      0.703     0.706     0.702      5632

