<a id='top'></a><a name='top'></a>
# Chapter 15: Implementing Scaled Dot-Product Attention in Keras

* [Introduction](#introduction)
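* [15.0 Imports and Setup](#15.0)
* [15.1 Recap of the Transformer Architecture](#15.1)
* [15.2 Implementing the Scaled Dot-Product Attention from Scratch](#15.2)
* [15.3 Testing out the Code](#15.3)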

---
<a name='introduction'></a><a id='introduction'></a>
# Introduction
<a href="#top">[back to top]</a>

### Dataset

* Fibonacci sequence


### Explore
* The operations forming the scaled dot-product attention mechanism
* How to implement the scaled dot-product attention mechanism from scratch
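
For reference, the operations in question are: a matrix product of the queries with the transposed keys, a division by $\sqrt{d_k}$, an optional additive mask, a softmax over the key axis, and a final matrix product with the values. Written as one expression (with $Q$, $K$, $V$ the query, key and value matrices and $d_k$ the query/key dimensionality), this is the standard scaled dot-product attention formula:

$$
\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
$$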

---
<a name='15.0'></a><a id='15.0'></a>
# 15.0 Imports and Setup
<a href="#top">[back to top]</a>

In [1]:
req_file = "requirements_15.txt"

In [3]:
%%writefile {req_file}
isort
scikit-learn-intelex
watermark

Overwriting requirements_15.txt


In [4]:
import sys
IS_COLAB = 'google.colab' in sys.modules

if IS_COLAB:
    print("Installing packages")
    !pip install --upgrade --quiet -r {req_file}
else:
    print("Running locally.")

# Need to import before sklearn
from sklearnex import patch_sklearn
patch_sklearn()

Running locally.


Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


In [29]:
%%writefile imports.py
import locale
import math
import pprint
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from tensorflow.keras.activations import softmax
from tensorflow.keras.layers import Layer
from tqdm.auto import tqdm
from watermark import watermark

Overwriting imports.py


In [30]:
!isort imports.py --sl
!cat imports.py

import locale
import math
import pprint

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from tensorflow.keras.activations import softmax
from tensorflow.keras.layers import Layer
from tqdm.auto import tqdm
from watermark import watermark


In [32]:
import locale
import math
import pprint
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from tensorflow.keras.activations import softmax
from tensorflow.keras.layers import Layer
from tqdm.auto import tqdm
from watermark import watermark

In [33]:
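def HR():
    """Print a horizontal rule to separate outputs."""
    print("-"*40)

# Force UTF-8 as the preferred encoding, regardless of the system locale.
def getpreferredencoding(do_setlocale=True):
    return "UTF-8"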

locale.getpreferredencoding = getpreferredencoding
warnings.filterwarnings('default')
BASE_DIR = '.'
sns.set_style("darkgrid")
tqdm.pandas(desc="progress-bar")
pp = pprint.PrettyPrinter(indent=4)

seed = 42

print(watermark(iversions=True,globals_=globals(),python=True,machine=True))

Python implementation: CPython
Python version       : 3.8.12
IPython version      : 7.34.0

Compiler    : Clang 13.0.0 (clang-1300.0.29.3)
OS          : Darwin
Release     : 21.6.0
Machine     : x86_64
Processor   : i386
CPU cores   : 4
Architecture: 64bit

numpy     : 1.23.5
tensorflow: 2.9.3
pandas    : 1.5.3
seaborn   : 0.12.1
matplotlib: 3.6.2



---
<a name='15.1'></a><a id='15.1'></a>
# 15.1 Recap of the Transformer Architecture
<a href="#top">[back to top]</a>

---
<a name='15.2'></a><a id='15.2'></a>
# 15.2 Implementing the Scaled Dot-Product Attention from Scratch
<a href="#top">[back to top]</a>

In [34]:
class DotProductAttention(Layer):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def call(self, queries, keys, values, d_k, mask=None):
        # Score the queries against the transposed keys, then scale by sqrt(d_k).
        scores = tf.matmul(queries, keys, transpose_b=True) / tf.math.sqrt(tf.cast(d_k, tf.float32))

        # Mask convention: 1 marks a position to suppress, 0 a position to keep.
        # Adding a large negative number drives the masked softmax weights to ~0.
        if mask is not None:
            scores += -1e9 * mask

        # Softmax over the last axis (the keys) turns the scores into weights.
        weights = softmax(scores)

        # Attention output: a weighted sum of the values.
        return tf.matmul(weights, values)
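
As a cross-check on the layer, the same computation can be sketched in plain NumPy. The helper name `scaled_dot_product_attention_np` is hypothetical (it is not part of Keras or of this notebook), and it reads `d_k` from the last axis of the queries instead of taking it as an argument. It also returns the attention weights, which the layer does not expose.

```python
import numpy as np


def scaled_dot_product_attention_np(queries, keys, values, mask=None):
    """NumPy sketch of softmax(Q K^T / sqrt(d_k)) V (hypothetical helper)."""
    d_k = queries.shape[-1]
    # Score queries against keys (swap the last two axes of keys), then scale.
    scores = queries @ np.swapaxes(keys, -1, -2) / np.sqrt(d_k)
    # Same mask convention as the layer: 1 = suppress, 0 = keep.
    if mask is not None:
        scores = scores + -1e9 * mask
    # Numerically stable softmax over the last axis (the keys).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values, plus the weights for inspection.
    return weights @ values, weights


rng = np.random.default_rng(42)
q = rng.random((2, 5, 64))
k = rng.random((2, 5, 64))
v = rng.random((2, 5, 64))

output, weights = scaled_dot_product_attention_np(q, k, v)
print(output.shape)   # (2, 5, 64)
print(weights.shape)  # (2, 5, 5)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the value vectors. This is why the values printed in 15.3 stay inside the [0, 1) range of the random inputs.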

---
<a name='15.3'></a><a id='15.3'></a>
# 15.3 Testing out the Code
<a href="#top">[back to top]</a>

In [35]:
input_seq_length = 5 # Max length of the input sequence
d_k = 64 # Dimensionality of linearly projected queries and keys
d_v = 64 # Dimensionality of linearly projected values
batch_size = 64 # Batch size from the training process
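
The value of `d_k` matters because the layer divides the scores by `sqrt(d_k)`. The sketch below illustrates why, under the assumption that query and key entries are independent, zero-mean and unit-variance. This is an assumption for the illustration only; the test data in this notebook are uniform samples in [0, 1). The sample count and the list of dimensions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 5_000  # arbitrary number of query/key pairs for the illustration

for d_k in (4, 64, 256):
    # Assumption: entries are independent standard-normal values.
    q = rng.standard_normal((n_samples, d_k))
    k = rng.standard_normal((n_samples, d_k))
    raw = np.sum(q * k, axis=-1)   # unscaled dot products
    scaled = raw / np.sqrt(d_k)    # scaled as in the attention layer
    print(f"d_k={d_k:4d}  var(raw)={raw.var():8.2f}  var(scaled)={scaled.var():.2f}")
```

Under these assumptions the variance of the raw dot products grows roughly in proportion to `d_k`, while the scaled version stays near 1. Keeping the scores in a moderate range prevents the softmax from saturating on a single key.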

In [36]:
tf.random.set_seed(seed)
np.random.seed(seed)  # The inputs below are drawn with NumPy, so seed it as well

queries = np.random.random((batch_size, input_seq_length, d_k))
keys = np.random.random((batch_size, input_seq_length, d_k))
values = np.random.random((batch_size, input_seq_length, d_v))

In [37]:
attention = DotProductAttention()
print(attention)
HR()

print(attention(queries, keys, values, d_k))

<__main__.DotProductAttention object at 0x1399a5970>
----------------------------------------
tf.Tensor(
[[[0.58271706 0.3669912  0.7677716  ... 0.46982276 0.5814083  0.4772146 ]
  [0.5688315  0.37259603 0.767018   ... 0.465158   0.5966402  0.4810904 ]
  [0.57154316 0.37507954 0.761297   ... 0.46625522 0.60105205 0.5013971 ]
  [0.5796002  0.35143456 0.7531471  ... 0.46386203 0.5852657  0.46648335]
  [0.57480466 0.37260708 0.767243   ... 0.46816579 0.59327304 0.49850357]]

 [[0.46294922 0.6476046  0.45672026 ... 0.5590128  0.6861026  0.49989527]
  [0.48486906 0.6461595  0.44293126 ... 0.5345297  0.6770745  0.49627537]
  [0.45582354 0.64962494 0.46603113 ... 0.5639738  0.6960226  0.49553573]
  [0.48267892 0.65860003 0.4400514  ... 0.53288347 0.68262976 0.48564136]
  [0.4864344  0.6538298  0.44277206 ... 0.5305253  0.6840395  0.4874709 ]]

 [[0.50777805 0.6476767  0.3940273  ... 0.61106133 0.5313678  0.5628838 ]
  [0.5125144  0.65115774 0.3987783  ... 0.6037308  0.5237102  0.5625333 ]
  [

In [38]:
attention(queries, keys, values, d_k).shape

TensorShape([64, 5, 64])
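
The calls above leave `mask` at its default of `None`. As a minimal sketch of what the masking step does, the NumPy example below builds a look-ahead mask (1 above the diagonal, 0 elsewhere) and applies it to a stand-in score matrix in the same way as the layer. The random `scores` array is an assumption that takes the place of the scaled query-key products.

```python
import numpy as np

seq_len = 5

# Look-ahead mask: 1 above the diagonal (future positions to hide), 0 elsewhere.
mask = 1.0 - np.tril(np.ones((seq_len, seq_len)))

rng = np.random.default_rng(0)
scores = rng.random((seq_len, seq_len))  # stand-in for Q K^T / sqrt(d_k)
masked = scores + -1e9 * mask            # same masking step as the layer

# Softmax over the last axis (the keys).
masked = masked - masked.max(axis=-1, keepdims=True)
weights = np.exp(masked)
weights /= weights.sum(axis=-1, keepdims=True)

print(np.round(weights, 2))
```

Every entry above the diagonal comes out as zero, so position `i` attends only to positions `0..i`, and the first row puts all of its weight on position 0. A mask built this way, cast to `float32`, follows the 1-means-hide convention that the layer's `mask` argument relies on.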