**Let's figure out how to embed text and floating-point parameters**

An embedding represents a continuous value or a discrete variable as a continuous vector. Embeddings are widely used in machine translation and as entity embeddings for categorical variables.

An embedding is a mapping of a discrete — categorical — variable to a vector of continuous numbers. In the context of neural networks, embeddings are low-dimensional, learned continuous vector representations of discrete variables.

The primary reasons for embedding content are to find nearest neighbors within a given embedding space, to provide input for a supervised task, and to visualize categories.

In a nutshell, NN embeddings can easily take all 37,000 book articles on Wikipedia and represent each one using only 50 numbers in a vector.
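
To make the nearest-neighbor use case concrete, here is a minimal sketch in plain Python. The three-dimensional vectors, the item names, and the helper functions are made up for illustration; a learned embedding would supply the real vectors.

```python
import math

# Hypothetical toy embeddings (illustrative values, not learned).
embeddings = {
    "sphere":   [0.9, 0.1, 0.0],
    "ball":     [0.8, 0.2, 0.1],
    "cylinder": [0.1, 0.9, 0.2],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbor(query, table):
    """Return the key (other than the query) whose vector is most similar."""
    candidates = (k for k in table if k != query)
    return max(candidates, key=lambda k: cosine_similarity(table[query], table[k]))

print(nearest_neighbor("sphere", embeddings))  # ball
```

Cosine similarity is only one possible choice of distance; Euclidean distance is another common option.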

**MLP AutoEncoder**

An autoencoder is not used for supervised learning. We will no longer try to predict something about our input. Instead, an autoencoder is considered a generative model: it learns a distributed representation of our training data, and can even be used to generate new instances of the training data.

An autoencoder model contains two components:

- An encoder that takes a float as input and outputs a higher-dimensional embedding (representation) of that float.
- A decoder that takes the higher-dimensional embedding and reconstructs the float.
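
In symbols, writing $f$ for the encoder and $g$ for the decoder (notation introduced here for illustration, not taken from the code):

$$
z = f(x) \in \mathbb{R}^{d}, \qquad \hat{x} = g(z) \in \mathbb{R}, \qquad \mathcal{L}(x) = (x - \hat{x})^{2}
$$

Here $x$ is a single float and $d = 10$ in this notebook. The squared reconstruction error $\mathcal{L}$ is what a mean-squared-error loss reduces to for a single value, so training pushes $g(f(x))$ back toward $x$.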

In [15]:
import torch 
import torch.nn as nn
from torch.nn import Embedding
import os.path

pathAE = "/Users/omoruyiatekha/Documents/GitHub/Referential Language for CAD/NLP_NERF_CAD/autoencoder.pth"

class MLPAutoE(nn.Module):
  # Initialize Neural Network
  def __init__(self):
    super().__init__()

    # Encoder 
    self.encoder = nn.Sequential(
      nn.Linear(1, 5),
      nn.ReLU(),
      nn.Linear(5, 10),
      
    )

    # Decoder
    self.decoder = nn.Sequential(
      nn.Linear(10, 5),
      nn.ReLU(),
      nn.Linear(5, 1),
      
    )
  
  def forward(self, x):
    encoded = self.encoder(x) 
    decoded = self.decoder(encoded)

    return decoded

    # Note: the last layer is a plain nn.Linear, so the reconstruction is unbounded; apply a ReLU here if outputs must lie in [0, infinity).

# Load the Model
autoencoder = MLPAutoE()
model = autoencoder 
model.load_state_dict(torch.load(pathAE))


<All keys matched successfully>

**Creating the Token Embedder**

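The idea is that each vocabulary token, for example `[CLS]`, passes through the token embedding module and comes out as a vector in R^d:
[CLS] ==> [tokenizer embedding module] ==> embedding in R^d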


In [16]:
import torch 
import torch.nn as nn
from torch.nn import Embedding
pathE = "/Users/omoruyiatekha/Documents/GitHub/Referential Language for CAD/NLP_NERF_CAD/embedder.pth"


""" The embedding takes a token index and returns a vector in R^d, where d = 10. The dictionary has room for 10 different words,
    although our language only uses 7 of them to begin with.

    Instead of generating random weights, we supply fixed weights so that every word always maps to the same embedding.
    This ensures that the embeddings are consistent and do not change between runs.

    This is otherwise an ordinary embedding layer; the only difference is that the weights are given explicitly. Here
    num_embeddings = 10, and d, the length of the embedding vector, is also 10.
"""

# Create the embedding and set its weights as constants.
weight = torch.FloatTensor([[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000],
        [ 2.0938,  0.6383,  0.4265, -0.1057, -0.9903,  0.0483,  2.0725,  2.7587,
          1.9364,  0.5201],
        [-0.7956, -0.1561, -0.6276,  2.1033, -2.6051,  0.1143,  1.1203, -0.5265,
          0.2124,  0.3033],
        [ 0.2314, -1.1640, -0.5984,  0.5727, -1.4985,  0.0431, -0.2939,  1.4067,
          0.9089, -0.5380],
        [ 0.2106,  0.5994,  2.9547,  1.3907, -0.8808, -0.3567,  0.0752, -0.1998,
         -1.2545, -1.0933],
        [-0.6807,  0.1290, -0.8570, -0.0191,  1.2586, -0.6069,  0.5996,  0.7852,
         -0.1459, -0.0803],
        [ 1.0895, -0.9596,  0.0159,  0.3027,  1.2430,  0.7185,  0.1470,  0.2592,
          1.7928, -1.2126],
        [ 1.2758, -0.3504, -0.2135, -1.0314,  0.1238, -2.0741,  1.2185, -0.3169,
         -0.4115,  0.5401],
        [-1.7623, -0.3535,  0.2283, -0.3083, -1.4517, -1.0330, -0.4992, -1.3504,
          0.0828, -0.4922],
        [-0.7633, -0.5626, -0.1650,  3.0397, -1.5256, -0.8837,  1.1996, -1.1857,
         -1.1322, -0.6586]])


embedding = nn.Embedding.from_pretrained(weight)
input = torch.LongTensor([7])
embedding(input)

torch.save(embedding.state_dict(), pathE)

In [17]:
import numpy as np
import pandas as pd
import math
import random


token_0 = "[CLS]"
numberOfExamples = 500
shapes = ["Sphere", "Cylinder", "Cubic"]

#sentence = "[CLS] Shape { 23.2,13,14.2 }"

def createSentece(numberOfExamples):
    shapeList = []

    for i in range(numberOfExamples):
        shape = random.choice(shapes)
        
        sentence = ""

        if shape == "Sphere":
            radius = round(np.random.uniform(0, 100), 2)
            sentence = "[CLS] " + shape + " { " + str(radius) + " }"
        elif shape == "Cylinder":
            radius = round(np.random.uniform(0, 100), 2)
            height = round(np.random.uniform(0, 100), 2)
            sentence =  "[CLS] " + shape + " { " + str(radius) + ", " + str(height) + " }" 
        elif shape == "Cubic":
            length = round(np.random.uniform(0, 100), 2)
            height = round(np.random.uniform(0, 100), 2)
            width = round(np.random.uniform(0, 100), 2)
            sentence =  "[CLS] " + shape + " { " + str(length) + ", " + str(height) + ", " + str(width) + " }"
        else:
            print("Error")

        shapeList.append(sentence)
        #print(sentence)
    return shapeList

generatedShape = createSentece(10)
print(generatedShape)




['[CLS] Sphere { 78.05 }', '[CLS] Cubic { 17.69, 20.54, 56.97 }', '[CLS] Cylinder { 84.57, 98.97 }', '[CLS] Sphere { 63.46 }', '[CLS] Cylinder { 6.98, 2.55 }', '[CLS] Cubic { 37.43, 48.53, 99.02 }', '[CLS] Cylinder { 45.28, 73.16 }', '[CLS] Sphere { 12.7 }', '[CLS] Cubic { 23.12, 68.9, 55.91 }', '[CLS] Cylinder { 53.93, 67.91 }']


**Creating a Tokenizer and LxD Matrix for Transformers**

Tokenization is a preprocessing step that transforms unstructured natural-language input into something better structured for a computer. The main idea is to break the textual input into fragments that contain granular yet useful data; these fragments are called tokens.
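
As a rough illustration of that idea for the shape sentences used here, the sketch below splits a sentence into vocabulary ids and float parameters with a regular expression. The vocabulary list, its ordering, and the names `VOCAB`, `TOKEN_PATTERN`, and `tokenize` are assumptions made for this example; they are not the notebook's own tokenizer.

```python
import re

# Assumed vocabulary for illustration; ids are positions in this list.
VOCAB = ["[PAD]", "[CLS]", "[SEP]", "Cylinder", "Sphere", "Cubic", "{", "}"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

# Match bracketed specials, braces, non-negative numbers, or words.
# Commas and whitespace match nothing and are therefore skipped.
TOKEN_PATTERN = re.compile(r"\[[A-Z]+\]|[{}]|\d+(?:\.\d+)?|[A-Za-z]+")

def tokenize(sentence):
    """Split a shape sentence into vocabulary ids (int) and parameters (float)."""
    out = []
    for piece in TOKEN_PATTERN.findall(sentence):
        if piece in TOKEN_TO_ID:
            out.append(TOKEN_TO_ID[piece])
        else:
            out.append(float(piece))  # raises ValueError for unknown words
    return out

print(tokenize("[CLS] Cubic { 23.2,13,14.2 }"))
# [1, 5, 6, 23.2, 13.0, 14.2, 7]
```

Keeping ids as `int` and parameters as `float` makes it easy to route each entry to the right encoder later: ids go to an embedding table, floats go to an MLP.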

After tokenization and the MLP encoding into (1, 10) vectors, we have to stack the vectors into an LxD matrix (L tokens per sentence, D = 10 values per token) to pass into the transformer.
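
Sentences have different token counts (a sphere has one parameter, a cubic has three), so shorter ones need padding before they can be stacked into one batch. The sketch below shows one way to do that with zero rows; the helper name `pad_to_length` and the random stand-in vectors are assumptions for illustration.

```python
import torch

D = 10        # embedding width used in this notebook
MAX_LEN = 7   # longest sentence: [CLS] Cubic { l, h, w }

def pad_to_length(rows: torch.Tensor, max_len: int = MAX_LEN) -> torch.Tensor:
    """Append all-zero rows so the matrix has exactly max_len rows."""
    missing = max_len - rows.size(0)
    if missing < 0:
        raise ValueError("sequence longer than max_len")
    padding = torch.zeros(missing, rows.size(1))
    return torch.cat((rows, padding), dim=0)

# Stand-in for the per-token vectors of "[CLS] Sphere { 12.7 }" (5 tokens).
sphere_rows = torch.randn(5, D)
matrix = pad_to_length(sphere_rows)
print(matrix.shape)  # torch.Size([7, 10])
```

Once every sentence is a 7 x 10 matrix, `torch.stack` can combine them into a single tensor of shape [num_sentences, 7, 10].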


In [28]:
import torch 
import torch.nn as nn
from torch.nn import Embedding
import numpy as np
from torch.utils.data import TensorDataset, DataLoader


# Load the Models
embeder = nn.Embedding(10, 10)
embeder.load_state_dict(torch.load(pathE))
#embeder(input)

# Load the Model
mlp = autoencoder = MLPAutoE()
mlp.load_state_dict(torch.load(pathAE))
#mlp.encoder(t)

dictionary = """[SEP] [CLS] Cylinder Sphere Cubic [SEP] { } ,"""
sentence = "[CLS] Cubic { 23.2,13,14.2 }"
tokens = dictionary.split()
d = 10


def processText(text):
  # Pad braces with a trailing space, turn commas into spaces, then split on whitespace.
  text = text.replace('{', '{ ').replace('}', '} ').replace(',', ' ')
  return text.split()

class token:
  def __init__(self, tokens):
    self.tokens = tokens.split()
    self.embedding = nn.Embedding(50, d) 
    
  def encode(self, sentence):

    split = processText(sentence)
    encoded = []
    
    for word in split:
      if word in self.tokens:
        encoded.append(self.tokens.index(word))
      else:
        encoded.append(word)

    return encoded

  def decode(self, encoded):
    decoded = []
    #print(encoded)
    for i in encoded:
      
      if type(i) == str:
        decoded.append(float(i))
      else:
        decoded.append(self.tokens[i])
    return decoded


  def tensorEncoded(self, encoded):
    # Wrap every entry (token index or numeric string) in a 1-element float tensor.
    return [torch.tensor([float(i)]) for i in encoded]

  def tensorList(self, encoded):
    array = []
    for i in encoded:
      array.append(np.array(i)[0])

    return(array)

  def encodeD(self, sentence):
    encoded = self.encode(sentence)
    tensor = self.tensorEncoded(encoded)
    D = torch.tensor([])

    for i in range(len(tensor)):
      if (i < 3) or (i == len(tensor) - 1):
        # Positions 0-2 ([CLS], shape, "{") and the last position ("}") hold
        # vocabulary indices, so look them up in the token embedding.
        temp = embeder(tensor[i].to(torch.int32))
      else:
        # Everything in between is a float parameter; encode it with the MLP.
        temp = (mlp.encoder(tensor[i])).view(1, 10)
      D = torch.cat((D, temp), 0)

    # Pad shorter sentences with zero rows so every matrix is 7 x 10.
    if len(tensor) < 7:
      for i in range(7 - len(tensor)):
        temp = torch.zeros(1, 10)
        D = torch.cat((D, temp), 0)

    return D

def createDataSet(text):
  dataSet = []
  for i in text:
    dataSet.append(tokens.encodeD(i))
  dataSet = torch.stack(dataSet)
  return dataSet

tokens = token(dictionary)
encoded = tokens.encode(sentence)
decoded = tokens.decode(encoded)
tokens = token(dictionary)
DataSentece = createSentece(100)




DataSentece = createDataSet(DataSentece)
train_data = DataSentece[:80] 
val_data = DataSentece[80:90] 
test_data = DataSentece[90:]  # last 10 sentences, disjoint from train and validation

train_data.shape



torch.Size([80, 7, 10])

**Positional Encoding**
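
The module in the next cell follows the fixed sinusoidal scheme commonly used with transformers. For a position $pos$ and a dimension index $i$, with model width $d_{\text{model}}$:

$$
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
$$

As far as I can tell from the code, `div_term` corresponds to $10000^{-2i/d_{\text{model}}}$, computed as $\exp\!\left(2i \cdot \frac{-\ln 10000}{d_{\text{model}}}\right)$, and the even and odd columns of `pe` receive the sine and cosine terms respectively.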

In [5]:
import math
from typing import Tuple

import torch
from torch import nn, Tensor
import torch.nn.functional as F
from torch.nn import TransformerEncoder, TransformerEncoderLayer
from torch.utils.data import dataset
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class PositionalEncoding(nn.Module):

    def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)

        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)
        pe[:, 0, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe)

    def forward(self, x: Tensor) -> Tensor:
        """
        Args:
            x: Tensor, shape [seq_len, batch_size, embedding_dim]
        """
        x = x + self.pe[:x.size(0)]
        return self.dropout(x)


**Create the Custom Transformer**

Unlike a typical transformer, we encode and decode the tokens of the sentence ourselves rather than relying on a built-in embedding layer, which lets us embed both words and floating-point values.
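
A minimal sketch of what that means in practice: the encoder stack receives vectors directly, with no `nn.Embedding` in front of it. The layer sizes and the random input below are illustrative assumptions, not the notebook's settings.

```python
import torch
from torch import nn

torch.manual_seed(0)

d_model, nhead, seq_len, batch_size = 10, 2, 7, 4   # illustrative sizes

# No nn.Embedding inside the model: inputs are already vectors in R^d_model.
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, dim_feedforward=32)
encoder = nn.TransformerEncoder(layer, num_layers=2)
encoder.eval()  # disable dropout for a deterministic forward pass

# Stand-in for a batch of embedded sentences, shape [seq_len, batch_size, d_model].
src = torch.randn(seq_len, batch_size, d_model)
with torch.no_grad():
    out = encoder(src)

print(out.shape)  # torch.Size([7, 4, 10])
```

Note the layout: by default these layers expect the sequence dimension first. The dataset built earlier has shape [80, 7, 10], which is batch first, so it would probably need `.transpose(0, 1)` (or layers created with `batch_first=True`) before being used this way.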

In [61]:
import math
from typing import Tuple

import torch
from torch import nn, Tensor
import torch.nn.functional as F
from torch.nn import TransformerEncoder, TransformerEncoderLayer
from torch.utils.data import dataset
bptt = 1

class TransformerModel(nn.Module):

    def __init__(self, ntoken: int, d_model: int, nhead: int, d_hid: int,
                 nlayers: int, dropout: float = 0.5):
        super().__init__()
        self.model_type = 'Transformer'
        self.pos_encoder = PositionalEncoding(d_model, dropout)
        encoder_layers = TransformerEncoderLayer(d_model, nhead, d_hid, dropout)
        self.transformer_encoder = TransformerEncoder(encoder_layers, nlayers)
        self.d_model = d_model
        self.decoder = nn.Linear(d_model, ntoken)

        self.init_weights()

    def init_weights(self) -> None:
        initrange = 0.1
        self.decoder.bias.data.zero_()
        self.decoder.weight.data.uniform_(-initrange, initrange)
    
    def encoder(self, src: Tensor, src_mask: Tensor) -> Tensor:
        """
        Args:
            src: Tensor of pre-embedded tokens, shape [seq_len, batch_size, d_model]
            src_mask: Tensor, shape [seq_len, seq_len]

        Returns:
            encoded Tensor of shape [seq_len, batch_size, d_model]
        """
        # No embedding lookup here: src already holds embedded vectors, so only scale it.
        src = src * math.sqrt(self.d_model)
        src = self.pos_encoder(src)
        output = self.transformer_encoder(src, src_mask)
        return output

    def forward(self, src: Tensor, src_mask: Tensor) -> Tensor:
        """
        Args:
            src: Tensor of pre-embedded tokens, shape [seq_len, batch_size, d_model]
            src_mask: Tensor, shape [seq_len, seq_len]

        Returns:
            output Tensor of shape [seq_len, batch_size, ntoken]
        """

        output = self.encoder(src, src_mask)
        output = self.decoder(output)
        return output

def generate_square_subsequent_mask(sz: int) -> Tensor:
    """Generates an upper-triangular matrix of -inf, with zeros on diag."""
    return torch.triu(torch.ones(sz, sz) * float('-inf'), diagonal=1)

def get_batch(source: Tensor, i: int) -> Tuple[Tensor, Tensor]:
    """
    Args:
        source: Tensor, shape [full_seq_len, batch_size]
        i: int

    Returns:
        tuple (data, target), where data has shape [seq_len, batch_size] and
        target has shape [seq_len * batch_size]
    """
    seq_len = min(bptt, len(source) - 1 - i)
    #print(seq_len)
    data = source[i:i+seq_len]
    target = source[i+1:i+1+seq_len]
    return data, target


**Train the Model**

During this step we have to train the model to ensure our decoder can effectively account for the position of each word.
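
The training loop relies on `get_batch`, which pairs each window of the sequence with the same window shifted one position later, so the model is asked to predict the next element. Here is a minimal sketch of that slicing on a plain list; the function name `shifted_window` and the integer stand-ins are illustrative and not part of the notebook.

```python
def shifted_window(source, start, window):
    """Return (inputs, targets) where targets are inputs shifted by one step."""
    # Leave room for the final target, so the window may shrink near the end.
    length = min(window, len(source) - 1 - start)
    inputs = source[start:start + length]
    targets = source[start + 1:start + 1 + length]
    return inputs, targets

# Integers stand in for the 7 token vectors of one sentence.
positions = list(range(7))
inputs, targets = shifted_window(positions, start=3, window=3)
print(inputs, targets)  # [3, 4, 5] [4, 5, 6]
```

Near the end of the sequence the window shrinks: `shifted_window(positions, 5, 3)` returns `([5], [6])`.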

In [7]:
import copy
import time

criterion = nn.CrossEntropyLoss()
lr = 5.0  # learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1.0, gamma=0.95)
pathT = "/Users/omoruyiatekha/Documents/GitHub/Referential Language for CAD/NLP_NERF_CAD/transformer.pth"  # separate name so the autoencoder path in pathAE is not overwritten
ntokens = 10 

def train(model: nn.Module) -> None:
    model.train()  # turn on train mode
    total_loss = 0.
    log_interval = 200
    start_time = time.time()
    src_mask = generate_square_subsequent_mask(bptt).to(device)

    num_batches = len(train_data) // bptt
    for batch, i in enumerate(range(0, train_data.size(0) - 1, bptt)):
        data, targets = get_batch(train_data, i)
        seq_len = data.size(0)
        if seq_len != bptt:  # only on last batch
            src_mask = src_mask[:seq_len, :seq_len]
        output = model(data, src_mask)
        loss = criterion(output.view(-1, ntokens), targets)

        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
        optimizer.step()

        total_loss += loss.item()
        if batch % log_interval == 0 and batch > 0:
            lr = scheduler.get_last_lr()[0]
            ms_per_batch = (time.time() - start_time) * 1000 / log_interval
            cur_loss = total_loss / log_interval
            ppl = math.exp(cur_loss)
            print(f'| epoch {epoch:3d} | {batch:5d}/{num_batches:5d} batches | '
                  f'lr {lr:02.2f} | ms/batch {ms_per_batch:5.2f} | '
                  f'loss {cur_loss:5.2f} | ppl {ppl:8.2f}')
            total_loss = 0
            start_time = time.time()

def evaluate(model: nn.Module, eval_data: Tensor) -> float:
    model.eval()  # turn on evaluation mode
    total_loss = 0.
    src_mask = generate_square_subsequent_mask(bptt).to(device)
    with torch.no_grad():
        for i in range(0, eval_data.size(0) - 1, bptt):
            data, targets = get_batch(eval_data, i)
            seq_len = data.size(0)
            if seq_len != bptt:
                src_mask = src_mask[:seq_len, :seq_len]
            output = model(data, src_mask)
            output_flat = output.view(-1, ntokens)
            total_loss += seq_len * criterion(output_flat, targets).item()
    return total_loss / (len(eval_data) - 1)

Initiate the sequence: create the transformer model.

In [103]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

ntokens = 10  # size of vocabulary
d_model = 10  # embedding dimension
d_hid = 200  # dimension of the feedforward network model in nn.TransformerEncoder
nlayers = 6  # number of nn.TransformerEncoderLayer in nn.TransformerEncoder
nhead = 10  # number of heads in nn.MultiheadAttention
dropout = 0.2  # dropout probability
model = TransformerModel(ntokens, d_model, nhead, d_hid, nlayers, dropout).to(device)

bptt = 3  # number of consecutive token vectors per slice

# Slice three token vectors from the first training sentence, starting at position 3;
# the targets are the same window shifted one position to the right.
data, targets = get_batch(train_data[0], 3)
print(data)
print(targets)
train_data[0]


tensor([[ 2.0628,  1.6931,  1.1046, -1.6519, -1.1936,  2.2579,  4.3355,  0.4922,
         -1.6703, -1.1022],
        [ 0.6443,  0.5314,  0.1297,  0.3806,  0.4825,  0.3323,  0.6154, -0.4884,
         -0.2348, -0.3301],
        [ 1.8987,  1.5587,  0.9918, -1.4167, -0.9997,  2.0351,  3.9051,  0.3788,
         -1.5042, -1.0129]], grad_fn=<SliceBackward0>)
tensor([[ 0.6443,  0.5314,  0.1297,  0.3806,  0.4825,  0.3323,  0.6154, -0.4884,
         -0.2348, -0.3301],
        [ 1.8987,  1.5587,  0.9918, -1.4167, -0.9997,  2.0351,  3.9051,  0.3788,
         -1.5042, -1.0129],
        [ 1.2758, -0.3504, -0.2135, -1.0314,  0.1238, -2.0741,  1.2185, -0.3169,
         -0.4115,  0.5401]], grad_fn=<SliceBackward0>)


tensor([[ 2.0938,  0.6383,  0.4265, -0.1057, -0.9903,  0.0483,  2.0725,  2.7587,
          1.9364,  0.5201],
        [ 0.2106,  0.5994,  2.9547,  1.3907, -0.8808, -0.3567,  0.0752, -0.1998,
         -1.2545, -1.0933],
        [ 1.0895, -0.9596,  0.0159,  0.3027,  1.2430,  0.7185,  0.1470,  0.2592,
          1.7928, -1.2126],
        [ 2.0628,  1.6931,  1.1046, -1.6519, -1.1936,  2.2579,  4.3355,  0.4922,
         -1.6703, -1.1022],
        [ 0.6443,  0.5314,  0.1297,  0.3806,  0.4825,  0.3323,  0.6154, -0.4884,
         -0.2348, -0.3301],
        [ 1.8987,  1.5587,  0.9918, -1.4167, -0.9997,  2.0351,  3.9051,  0.3788,
         -1.5042, -1.0129],
        [ 1.2758, -0.3504, -0.2135, -1.0314,  0.1238, -2.0741,  1.2185, -0.3169,
         -0.4115,  0.5401]], grad_fn=<SelectBackward0>)

**Training the Embedding Models**

Below is the code that lets us embed our sentence tokens and floating-point parameters into a higher-dimensional space.

In [None]:
# Let's Train the AutoEncoder on a small generated set of data that should not be too terrible.
def train(model, num_epochs):

  criterion = nn.MSELoss()
  autoencoder = model
  optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3, weight_decay=1e-3)
  outputs = []
  data_set = torch.tensor([[1.65], [2], [2.3], [54], [1.3], [43.2], [120.324], [32.123]])

  for epoch in range(num_epochs):
    for p_i in data_set:
      p_pred = autoencoder(p_i)
      loss = criterion(p_i, p_pred)

      
      loss.backward()
      optimizer.step()
      optimizer.zero_grad()
      #print("p_i: ", p_i, " predicted: ", p_pred)
      if epoch % 10 == 0:
        print(f'Epoch:{epoch+1}, Loss:{loss.item():4f}')

    outputs.append((epoch, p_i, p_pred))
  
  return outputs

train(autoencoder, 500)
torch.save(autoencoder.state_dict(), pathAE)


Epoch:1, Loss:0.000001
Epoch:1, Loss:0.001319
Epoch:1, Loss:0.000083
Epoch:1, Loss:0.118781
Epoch:1, Loss:0.000000
Epoch:1, Loss:0.145028
Epoch:1, Loss:0.735905
Epoch:1, Loss:0.003593
Epoch:11, Loss:0.000260
Epoch:11, Loss:0.000247
Epoch:11, Loss:0.000232
Epoch:11, Loss:0.005157
Epoch:11, Loss:0.000297
Epoch:11, Loss:0.002096
Epoch:11, Loss:0.001132
Epoch:11, Loss:0.000203
Epoch:21, Loss:0.000553
Epoch:21, Loss:0.000088
Epoch:21, Loss:0.000050
Epoch:21, Loss:0.759409
Epoch:21, Loss:0.000052
Epoch:21, Loss:0.876105
Epoch:21, Loss:2.259578
Epoch:21, Loss:0.357284
Epoch:31, Loss:0.000229
Epoch:31, Loss:0.000202
Epoch:31, Loss:0.000176
Epoch:31, Loss:0.000426
Epoch:31, Loss:0.000159
Epoch:31, Loss:0.000199
Epoch:31, Loss:0.000179
Epoch:31, Loss:0.000021
Epoch:41, Loss:0.000000
Epoch:41, Loss:0.000001
Epoch:41, Loss:0.000004
Epoch:41, Loss:0.001920
Epoch:41, Loss:0.000003
Epoch:41, Loss:0.001667
Epoch:41, Loss:0.003011
Epoch:41, Loss:0.000443
Epoch:51, Loss:0.000879
Epoch:51, Loss:0.000055

In [None]:
# Check the model
t = torch.tensor([[1], [2], [2.3], [54], [1.3], [43.2], [120.324], [32.123]])
P_t = autoencoder.encoder(t)
autoencoder.decoder(P_t)

# Load the Model
model = autoencoder = MLPAutoE()
model.load_state_dict(torch.load(pathAE))
model.encoder(t)

**MLP Encoder**

If you want to share the same MLP encoder for all numerical inputs (regardless of whether they are radius measurements, height measurements, etc.), then the MLP encoder takes an input in R^1 and outputs a vector in R^d. Both the input and the output are floating-point values.

In [None]:
import torch 
import torch.nn as nn
from torch.nn import Embedding

# MLP to take in a 1D float and encode it into a higher dimensional space.
# Input:  Tensor of R^1  (Shape: (1, ))
# Output: Tensor of R^10 (Shape: (1, 10))

# Create Encoder and Decoder Model with some random hidden weights and layers

inputsize = (1,)
outputsize = (1,10)
t = torch.tensor([[1.5]])

class MLPEncoder(nn.Module):
  # Initialize Neural Network
  def __init__(self):
    super(MLPEncoder, self).__init__()
    self.l1 = nn.Linear(1, 5)
    self.relu = nn.ReLU()
    self.l2 = nn.Linear(5, 10)

  def forward(self, x):
    output = self.l1(x) 
    output = self.relu(output)
    output = self.l2(output)
    return output

net = MLPEncoder()

outputs = net(t)
print(outputs)
outputs = net(t)
print(outputs)

  

tensor([[-0.3786,  0.3023,  0.3095,  0.0118,  0.4308,  0.1019,  0.0553,  0.1434,
          0.1401, -0.3750]], grad_fn=<AddmmBackward0>)
tensor([[-0.3786,  0.3023,  0.3095,  0.0118,  0.4308,  0.1019,  0.0553,  0.1434,
          0.1401, -0.3750]], grad_fn=<AddmmBackward0>)
