<a href="https://colab.research.google.com/github/jubacochran/NLP/blob/main/Amazon_reviews_NLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

1. Introduction

In this project, I built a Natural Language Processing (NLP) sentiment classifier using the Keras deep learning library in Python. My primary objectives were:

To practice an end-to-end NLP pipeline, including data preprocessing, tokenization, and training deep learning models.
To experiment with different activation functions (Sigmoid, Tanh, and ReLU) and evaluate their impact on the model’s performance.
To apply a chosen optimization technique (e.g., Adam or SGD) and discuss its influence on training convergence.
To analyze the results, visualize performance metrics, and gain insights into the best configuration.

2. Dataset

Dataset Source: Amazon Reviews dataset from Kaggle (kritanjalijain/amazon-reviews)

I selected the Amazon Reviews dataset, which contains product reviews labeled with sentiment polarity (positive or negative). It comes pre-split into training and test files (train.csv and test.csv), and each review has a polarity label, a title, and the review text.

3. Data Preprocessing

Loading Data:
I downloaded the dataset from Kaggle with kagglehub and loaded train.csv and test.csv into pandas DataFrames. Both files have three unlabeled columns, which I named polarity, title, and text.

Cleaning and Formatting: I dropped the title field, focusing solely on text for sentiment classification. Polarity values were initially {1 (negative), 2 (positive)}, which I adjusted to {0 (negative), 1 (positive)} for a binary classification setup.

Sampling: Because the full training set is large (3.6 million reviews), I sampled 10% of each polarity class (360,000 reviews, 180,000 per class) to speed up experimentation while maintaining class balance.

Splitting Data: With two stratified calls to train_test_split, I divided the sampled data into training, validation, and test sets in a 60/20/20 ratio (216,000 / 72,000 / 72,000 reviews), so the class distribution stays the same in every split.
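The split fractions compose as follows: `test_size=0.2` holds out 20% for testing, and `test_size=0.25` on the remaining 80% gives 0.25 × 0.8 = 20% for validation, leaving 60% for training. A small arithmetic check, using a hypothetical helper `split_sizes` that rounds a fractional test size up the way scikit-learn does, reproduces the split sizes printed later in the notebook:

```python
import math


def split_sizes(n_total, test_size=0.2, val_size_of_rest=0.25):
    """(train, val, test) counts for two successive train_test_split calls."""
    n_test = math.ceil(n_total * test_size)        # first split: hold out the test set
    n_rest = n_total - n_test
    n_val = math.ceil(n_rest * val_size_of_rest)   # second split: carve validation from the rest
    n_train = n_rest - n_val
    return n_train, n_val, n_test


print(split_sizes(360_000))  # (216000, 72000, 72000)
```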

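Tokenization & Vocabulary: I built a vocabulary from the 30,000 most frequent whitespace-separated words in the training split, prepended the special tokens ([PAD], [UNK], [CLS], [SEP], [MASK]) to align with transformer-based approaches, and passed it to KerasHub's WordPieceTokenizer. Because the vocabulary holds only whole words (no ## subword pieces), WordPiece cannot split unseen words into subwords, so out-of-vocabulary words map to [UNK].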
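To show why a vocabulary of whole words limits WordPiece, here is a minimal pure-Python sketch of greedy longest-match-first WordPiece applied to a single word. The function `wordpiece` and the toy vocabularies are hypothetical; KerasHub's `WordPieceTokenizer` also lowercases and splits on whitespace and punctuation first, which this sketch omits. Without `##` continuation pieces, a word that is not in the vocabulary as a whole has no valid decomposition and falls back to `[UNK]`:

```python
def wordpiece(word, vocab, unk="[UNK]", prefix="##"):
    """Greedy longest-match-first WordPiece split of one (already lowercased) word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate  # non-initial pieces need the "##" prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no valid decomposition: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


whole_words = {"great", "product", "un"}     # whole words only, like vocab_list
with_subwords = whole_words | {"##great"}    # adds one continuation piece

print(wordpiece("great", whole_words))       # ['great']
print(wordpiece("ungreat", whole_words))     # ['[UNK]']
print(wordpiece("ungreat", with_subwords))   # ['un', '##great']
```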

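Padding & Sequences: I set the maximum sequence length to the 90th percentile of review length measured in characters (769) and padded or truncated every tokenized review to that many tokens, giving the model uniform input shapes. Because a review has far fewer tokens than characters, this limit is generous: almost no review is truncated, and most sequences are largely padding.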
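Fixed-length padding and truncation can be sketched in a few lines. This assumes right-padding with the `[PAD]` id 0 (it is index 0 because the special tokens are prepended to the vocabulary, consistent with the trailing zeros in the tokenized output) and that truncation keeps the first tokens; `pad_or_truncate` is a hypothetical helper, not KerasHub's implementation:

```python
def pad_or_truncate(ids, length, pad_id=0):
    """Keep the first `length` ids, then right-pad with pad_id up to `length`."""
    ids = list(ids)[:length]
    return ids + [pad_id] * (length - len(ids))


print(pad_or_truncate([5, 278, 10], 6))          # [5, 278, 10, 0, 0, 0]
print(pad_or_truncate([5, 278, 10, 99, 21], 3))  # [5, 278, 10]
```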

4. Model Design

I experimented with two main architectures:

Baseline Convolutional Model:

Architecture:
Embedding Layer: Converts token IDs into dense 173-dimensional vectors (the square root of the 30,005-token vocabulary size, rounded down).
Conv1D Layer: 128 filters with a kernel size of 5 capture local n-gram features.
Global Max Pooling: Reduces the sequence dimension by taking, for each filter, the maximum activation across time steps.
Fully Connected Layer (Dense): A dense layer with 128 units and a chosen activation function (ReLU in the final configuration).
Dropout: A dropout rate of 0.5 for regularization against overfitting.
Output Layer: A single neuron with Sigmoid activation for binary classification.
This model meets the requirement of having at least three hidden layers: the trainable Embedding, Conv1D, and Dense(128) layers, plus the weight-free pooling and dropout layers.
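As a worked check on the baseline's size, the parameter counts follow from the layer settings used in the notebook code (vocabulary 30,005, embedding dimension 173, sequence length 769, 128 filters of width 5, a 128-unit Dense layer). Pooling and dropout have no weights. The helper `baseline_param_counts` is hypothetical; `model.summary()` should report the same per-layer counts if the configuration is unchanged:

```python
def baseline_param_counts(vocab_size=30005, embedding_dim=173, seq_len=769,
                          filters=128, kernel_size=5, dense_units=128):
    """Weights per layer for Embedding -> Conv1D -> GlobalMaxPool -> Dense -> Dropout -> Dense(1)."""
    counts = {
        "embedding": vocab_size * embedding_dim,
        # each filter spans kernel_size x embedding_dim inputs, plus one bias
        "conv1d": kernel_size * embedding_dim * filters + filters,
        "dense": filters * dense_units + dense_units,
        "output": dense_units * 1 + 1,
    }
    conv_steps = seq_len - kernel_size + 1  # output length with 'valid' padding (Keras default)
    return counts, conv_steps


counts, conv_steps = baseline_param_counts()
print(counts)                # {'embedding': 5190865, 'conv1d': 110848, 'dense': 16512, 'output': 129}
print(sum(counts.values()))  # 5318354
print(conv_steps)            # 765
```

The embedding table accounts for roughly 97.6% of the weights, which is one reason the model can memorize the training set quickly.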

Transformer-based Model:

Architecture:
Token & Position Embedding: Combines learned word embeddings and positional embeddings.
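Transformer Block: Multi-head self-attention followed by a two-layer feed-forward network, each sublayer wrapped with dropout, a residual connection, and layer normalization. This block lets every position attend to every other position in the input sequence.
Global Average Pooling: Averages the token representations across the sequence into one vector per review.
Fully Connected Layers: Dropout (rate 0.1), a 15-unit ReLU Dense layer, and another Dropout (rate 0.1).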
Output Layer: A single neuron with Sigmoid activation.
I tested various hyperparameters for the transformer block (number of heads, feed-forward dimension) using Weights & Biases (W&B) sweeps.
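The core of the transformer block is scaled dot-product self-attention followed by a residual connection and layer normalization. Below is a minimal single-head NumPy sketch with random toy weights; it omits dropout, the multi-head split, the feed-forward sublayer, and LayerNormalization's learned scale and offset, so it mirrors the structure of the attention sublayer rather than reproducing Keras's `MultiHeadAttention`. As in the notebook's model, no padding mask is applied, so `[PAD]` positions (id 0) are attended to as well:

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_block(x, wq, wk, wv, wo):
    """x: (seq_len, d_model). Single-head attention, then residual + layer norm."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    attn = (weights @ v) @ wo
    return layer_norm(x + attn), weights


rng = np.random.default_rng(0)
seq_len, d_model, vocab = 6, 8, 50
token_emb = rng.normal(size=(vocab, d_model))
pos_emb = rng.normal(size=(seq_len, d_model))
token_ids = np.array([5, 17, 3, 0, 0, 0])                # trailing zeros = [PAD]
x = token_emb[token_ids] + pos_emb[np.arange(seq_len)]   # token + position embedding

wq, wk, wv, wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out, weights = self_attention_block(x, wq, wk, wv, wo)
print(out.shape, weights.shape)  # (6, 8) (6, 6)
```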

5. Activation Functions Experimentation

I explored different activation functions in the Dense layers:

Sigmoid:

Commonly used in output layers for binary classification.
In hidden layers, it often leads to vanishing gradients due to saturation.
Observation: Models using Sigmoid in hidden layers trained more slowly and often plateaued early, resulting in lower validation accuracy.
Tanh:

Similar to Sigmoid but zero-centered, with outputs in (-1, 1). Its derivative peaks at 1 (versus 0.25 for Sigmoid), which provides a stronger gradient signal.
Tanh outperformed Sigmoid slightly but still sometimes suffered from saturation issues, especially in deeper networks.
ReLU (Rectified Linear Unit):

ReLU typically leads to faster training convergence because its gradient is exactly 1 for positive inputs, so it does not saturate in that range.
In my experiments, ReLU provided the best validation accuracy and the most stable training dynamics.
Findings:
ReLU consistently outperformed Sigmoid and Tanh for the hidden layers in terms of training speed and accuracy. However, for the output layer, Sigmoid remained the appropriate choice because it maps outputs to [0,1], making it well-suited for probability interpretation in binary classification.
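The gradient argument above can be made concrete by evaluating each activation's derivative. Sigmoid's derivative peaks at 0.25 at x = 0, tanh's peaks at 1, and ReLU's is exactly 1 for every positive input; by |x| = 5, both sigmoid's (about 0.0066) and tanh's (about 0.00018) have nearly vanished. A small standard-library sketch:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2


def d_relu(x):
    return 1.0 if x > 0 else 0.0  # subgradient 0 chosen at x == 0


for x in (0.0, 2.0, 5.0):
    print(f"x={x}: sigmoid'={d_sigmoid(x):.4f}  tanh'={d_tanh(x):.4f}  relu'={d_relu(x):.1f}")
```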

6. Optimization Technique

I used the Adam optimizer, an adaptive gradient-based method that scales each parameter's step using running estimates of the gradient's first and second moments. Adam often converges faster and with less tuning than plain SGD, which makes it a common default for NLP models with large embedding tables.

Influence on Training:
Adam’s adaptive learning rate helped me achieve quick and stable convergence. With SGD, initial experiments required careful learning rate tuning, whereas with Adam, I got stable results without extensive manual adjustments.
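Adam's per-parameter adaptation is visible in its update rule: it divides a bias-corrected running mean of the gradient by the square root of a bias-corrected running mean of the squared gradient. The pure-Python sketch below follows the formulation in the original Adam paper (framework implementations differ slightly in where ε enters) and uses Keras's documented `Adam` defaults (learning rate 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-7). On the first step, each parameter moves by roughly the learning rate whether its gradient is 0.001 or 1000, whereas a plain SGD step scales with the gradient. `adam_step` is a hypothetical helper:

```python
import math


def adam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update over lists of scalar params/grads; `state` holds m, v and step t."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g      # 1st moment estimate
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g  # 2nd moment estimate
        m_hat = state["m"][i] / (1 - beta1 ** t)                     # bias corrections
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


params = [0.0, 0.0]
grads = [0.001, 1000.0]  # gradients six orders of magnitude apart
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}

adam_update = adam_step(params, grads, state)
sgd_update = [p - 0.001 * g for p, g in zip(params, grads)]
print(adam_update)  # both entries close to -0.001
print(sgd_update)   # about [-1e-06, -1.0]
```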

7. Training Process and Hyperparameter Sweeps

Training Setup:

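I trained for 5 epochs (baseline CNN), up to 50 epochs with early stopping (transformer), and 10, 20, or 30 epochs in the sweeps.
Batch sizes: I experimented with 32, 64, and 128.
Early Stopping: For the transformer, training stopped once validation loss had not improved for 3 epochs, and the best weights were restored.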
W&B Sweeps: I performed hyperparameter sweeps over:

Number of attention heads
Feed-forward network dimension
Learning rate
Batch size
Epochs
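Sweeps were meant to identify configurations that improve validation accuracy, and in general, balancing model complexity with regularization was key to avoiding overfitting. One caveat applies to the sweep runs logged below: their training accuracy is already about 99% in the first epoch, their step time matches the baseline CNN rather than the transformer, and their logged learning rate stays at Adam's default of 0.001. That pattern arises because hypertuning_model returned the global model (the already-trained baseline CNN) instead of model_hypertuning, so each run continued training the baseline and the sampled transformer settings were not actually tested.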
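The early-stopping rule (`EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)`) can be traced by hand on the transformer's logged validation losses. The function below is a simplified re-implementation for illustration that ignores Keras options such as `min_delta`, `baseline`, and `start_from_epoch`; it is not Keras's code. It stops after epoch 7 and keeps epoch 4's weights, consistent with the 7-epoch transformer run logged below:

```python
def early_stopping(val_losses, patience=3):
    """Return (stopped_after_epoch, best_epoch), both 1-indexed, for a 'min' monitor."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop; weights from best_epoch are restored
    return len(val_losses), best_epoch


# val_loss per epoch from the transformer run (epoch 7 taken from the run summary)
transformer_val_loss = [0.6932, 0.3153, 0.2878, 0.2859, 0.2873, 0.3009, 0.30464]
print(early_stopping(transformer_val_loss))  # (7, 4)
```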

8. Evaluation

Metrics:
I used accuracy as the primary metric and tracked training and validation accuracy and loss across epochs, which let me detect overfitting and confirm that the models were improving.

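Baseline Conv Model Performance: The baseline model reached its best validation accuracy (91.4%) after the second epoch. After that, validation loss rose from 0.22 to 0.40 while training accuracy climbed to about 99%, a clear sign of overfitting; after 5 epochs it scored 89.5% on validation and 89.7% on the test set. Using ReLU in the hidden layers provided more stable improvements.

Transformer-Based Model Performance: The transformer-based model peaked at about 88% validation accuracy (lowest validation loss 0.286 at epoch 4), early stopping ended training after 7 epochs, and it scored 88.1% on the test set. While transformers can outperform simpler architectures with enough data and tuning, this single-block model trained from scratch did not beat the baseline, and I was limited by the reduced dataset fraction.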

Impact of Activation Functions:

Sigmoid in hidden layers: Led to slower convergence and slightly lower accuracy.
Tanh in hidden layers: Better than Sigmoid but still lagged behind ReLU.
ReLU in hidden layers: Provided the best performance and stability.

9. Visualizations

Accuracy and Loss Curves:
Plots of training and validation accuracy vs. epochs and loss vs. epochs showed that models with ReLU converged more smoothly and avoided early plateaus.

Confusion Matrix:
On the test set, the confusion matrix indicated that the model correctly classified most samples. Misclassifications generally occurred with ambiguous reviews.
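The confusion-matrix step can be sketched without extra libraries: threshold the sigmoid outputs at 0.5 (a prediction counts as positive when it is above the threshold), then count the four outcomes. The probabilities and labels below are made-up toy values, not results from this notebook; in the notebook, the inputs would come from `model.predict(X_test_tokens)` and `y_test`, and `sklearn.metrics.confusion_matrix` uses the same layout (rows = true class, columns = predicted class). `binary_confusion_matrix` is a hypothetical helper:

```python
def binary_confusion_matrix(y_true, probs, threshold=0.5):
    """Rows = true class (0 negative, 1 positive); columns = predicted class."""
    matrix = [[0, 0], [0, 0]]
    for label, p in zip(y_true, probs):
        pred = 1 if p > threshold else 0
        matrix[label][pred] += 1
    return matrix


y_true = [0, 0, 1, 1, 1, 0]
probs = [0.10, 0.70, 0.90, 0.40, 0.80, 0.30]  # toy sigmoid outputs
cm = binary_confusion_matrix(y_true, probs)
print(cm)  # [[2, 1], [1, 2]]
accuracy = (cm[0][0] + cm[1][1]) / len(y_true)
print(accuracy)  # 0.6666666666666666
```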

10. Conclusion and Findings

Key Insights:
Activation Functions: Using ReLU in hidden layers yielded superior performance compared to Sigmoid or Tanh.
Optimizer: Adam proved effective and convenient, requiring minimal manual tuning.
Model Architecture: The baseline CNN and the transformer-based model achieved similar test accuracy (89.7% and 88.1%) on my 10% training subset. With more data and compute, the transformer might have shown a more substantial improvement.
Future Work: I could further improve results by training on a larger portion of the data, increasing epochs, or fine-tuning a pre-trained language model like BERT.


In [None]:
pip install --upgrade keras-hub

Collecting keras-hub
  Downloading keras_hub-0.18.1-py3-none-any.whl.metadata (7.0 kB)
Collecting tensorflow-text (from keras-hub)
  Downloading tensorflow_text-2.18.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.8 kB)
Collecting tensorflow<2.19,>=2.18.0 (from tensorflow-text->keras-hub)
  Downloading tensorflow-2.18.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)
Collecting tensorboard<2.19,>=2.18 (from tensorflow<2.19,>=2.18.0->tensorflow-text->keras-hub)
  Downloading tensorboard-2.18.0-py3-none-any.whl.metadata (1.6 kB)
Downloading keras_hub-0.18.1-py3-none-any.whl (691 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 691.2/691.2 kB 9.2 MB/s eta 0:00:00
Downloading tensorflow_text-2.18.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.2/5.2 MB 80.2 MB/s eta 0:00:00

In [None]:
pip install --upgrade keras-cv

Collecting keras-cv
  Downloading keras_cv-0.9.0-py3-none-any.whl.metadata (12 kB)
Collecting keras-core (from keras-cv)
  Downloading keras_core-0.1.7-py3-none-any.whl.metadata (4.3 kB)
Downloading keras_cv-0.9.0-py3-none-any.whl (650 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 650.7/650.7 kB 10.5 MB/s eta 0:00:00
Downloading keras_core-0.1.7-py3-none-any.whl (950 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 950.8/950.8 kB 36.1 MB/s eta 0:00:00
Installing collected packages: keras-core, keras-cv
Successfully installed keras-core-0.1.7 keras-cv-0.9.0


In [None]:
import wandb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from wandb.integration.keras import WandbMetricsLogger,WandbCallback,WandbEvalCallback,WandbModelCheckpoint
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import math
from tensorflow.keras import ops
from keras_hub.tokenizers import WordPieceTokenizer
from collections import Counter
import tensorflow_text as text


In [None]:
wandb.login()

wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc

True

In [None]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("kritanjalijain/amazon-reviews")

print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/kritanjalijain/amazon-reviews?dataset_version_number=2...
100%|██████████| 1.29G/1.29G [00:12<00:00, 114MB/s]
Extracting files...

Path to dataset files: /root/.cache/kagglehub/datasets/kritanjalijain/amazon-reviews/versions/2


In [None]:

training_data = pd.read_csv(path + "/train.csv", header=None, names =['polarity','title','text'])
testing_data = pd.read_csv(path + "/test.csv", header=None, names =['polarity','title','text'])

In [None]:
training_data.drop('title', axis=1, inplace=True)
testing_data.drop('title', axis=1, inplace=True)



In [None]:
train_sample_fraction = 0.1
training_data_sampled = training_data.groupby('polarity', group_keys=False).apply(lambda x: x.sample(frac=train_sample_fraction, random_state=42))

print(f"Sampled Data Shape: {training_data_sampled.shape}")
print("\nSampled Class Distribution:\n", training_data_sampled['polarity'].value_counts())


Sampled Data Shape: (360000, 2)

Sampled Class Distribution:
 polarity
1    180000
2    180000
Name: count, dtype: int64



In [None]:
max_char_length = training_data_sampled['text'].str.len().max()
min_char_length = training_data_sampled['text'].str.len().min()
print(max_char_length)
print(min_char_length)

p90=int(np.percentile(training_data_sampled['text'].str.len(),90))
print(p90)
print(type(p90))
int(p90)


1008
25
769
<class 'int'>


769

In [None]:
X = training_data_sampled.drop('polarity', axis=1)
y = training_data_sampled['polarity']

In [None]:
X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.25, stratify=y_train_val, random_state=42)

In [None]:
print("\nTraining Data Shape:", X_train.shape)
print("\nValidation Data Shape:", X_val.shape)
print("\nTest Data Shape:", X_test.shape)

print(y_train.value_counts())
print(y_val.value_counts())
print(y_test.value_counts())
print(y_train.values)


Training Data Shape: (216000, 1)

Validation Data Shape: (72000, 1)

Test Data Shape: (72000, 1)
polarity
2    108000
1    108000
Name: count, dtype: int64
polarity
1    36000
2    36000
Name: count, dtype: int64
polarity
1    36000
2    36000
Name: count, dtype: int64
[2 2 1 ... 1 1 2]


In [None]:
#I need to fix labels to 0 or 1. Right now they are 2(positive) and 1(negative)

y_train = np.array(y_train) - 1
y_val = np.array(y_val) - 1
y_test = np.array(y_test) - 1

# Verify the label conversion
print("Unique values in y_train:", np.unique(y_train))
print("Unique values in y_val:", np.unique(y_val))
print("Unique values in y_test:", np.unique(y_test))

Unique values in y_train: [0 1]
Unique values in y_val: [0 1]
Unique values in y_test: [0 1]


In [None]:

X_train_text = X_train['text'].to_list()

# Tokenize the words and create a vocabulary
all_text = ' '.join(X_train_text)
all_tokens = all_text.split()

# Count the most frequent tokens
token_counts = Counter(all_tokens)
vocab_list = [word for word, count in token_counts.most_common(30000)]

# Add special tokens
special_tokens = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
vocab_list = special_tokens + vocab_list

print(f"Vocabulary size: {len(vocab_list)}")
print(f"Sample vocabulary: {vocab_list[:10]}")


Vocabulary size: 30005
Sample vocabulary: ['[PAD]', '[UNK]', '[CLS]', '[SEP]', '[MASK]', 'the', 'and', 'I', 'to', 'a']


In [None]:
#vocab_size = 10000
#embedding_dim = int(math.sqrt(vocab_size))
#print(f"Embedding Dimension: {embedding_dim}")
#tokenizer = Tokenizer(num_words=None, oov_token='<OOV>',lower=True)
#tokenizer.fit_on_texts(X_train['text'])

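# Note: vocab_list was built from the raw (cased) text, but lowercase=True below, so cased
# entries such as "I" never match; words missing from vocab_list are emitted as [UNK].
tokenizer = WordPieceTokenizer(
    vocabulary = vocab_list,
    sequence_length = p90,             # pad/truncate every review to 769 token IDs
    lowercase = True,
    oov_token = "[UNK]",
    special_tokens = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
    special_tokens_in_strings = True,  # special tokens written in the text map to their own IDs
    dtype="int32"
)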

In [None]:
X_train_tokens = tokenizer.tokenize(X_train['text'].to_list())
X_val_tokens = tokenizer.tokenize(X_val['text'].to_list())
X_test_tokens = tokenizer.tokenize(X_test['text'].to_list())


print("Sample Tokenized X_train:", X_train_tokens[:3])

Sample Tokenized X_train: tf.Tensor(
[[  5 278  10 ...   0   0   0]
 [ 99  21 142 ...   0   0   0]
 [ 83 343  83 ...   0   0   0]], shape=(3, 769), dtype=int32)


In [None]:
tokenizer.get_vocabulary()
print(type(X_train_tokens))

<class 'tensorflow.python.framework.ops.EagerTensor'>


In [None]:
vocab_size = len(vocab_list)  # This comes from your vocabulary
print(vocab_size)
embedding_dim = int(math.sqrt(vocab_size))  # Size of the embedding vector
print(embedding_dim)
sequence_length = p90

30005
173


In [None]:
model = keras.Sequential([
    keras.Input(shape=(sequence_length,), dtype="int32"),  # token-ID sequences (length 769)
    layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),  # Embedding layer
    layers.Conv1D(128, 5, activation='relu'),  # 1D convolution: 128 filters, width 5
    layers.GlobalMaxPooling1D(),  # Global max pooling over time steps
    layers.Dense(128, activation='relu'),  # Fully connected layer
    layers.Dropout(0.5),  # Dropout for regularization
    layers.Dense(1, activation='sigmoid')  # Output: probability of a positive review
])
# The Input layer fixes the input shape, so neither model.build() nor the
# input_length argument (deprecated in Keras 3) is needed.



In [None]:
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


model.summary()

# Initialize a new W&B run
wandb.init(project="transformer-sweep", group="baseline-model", config={"baseline": 2})


# Step 5: Train the model
history = model.fit(
    X_train_tokens,
    y_train,
    epochs=5,
    validation_data=(X_val_tokens, y_val),
    batch_size=64,
    callbacks=[WandbMetricsLogger(),WandbModelCheckpoint("/content/sample_data/models/base_model.keras")]
)
test_results = model.evaluate(X_test_tokens, y_test, verbose=0)
print(f"Test Accuracy: {test_results[1]*100:.2f}%")
wandb.log({"test_accuracy": test_results[1]})
wandb.finish()

wandb: Currently logged in as: jubacochran (jubacochran-booking-com). Use `wandb login --relogin` to force relogin

Epoch 1/5
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 20s 5ms/step - accuracy: 0.8394 - loss: 0.3447 - val_accuracy: 0.9116 - val_loss: 0.2197
Epoch 2/5
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9383 - loss: 0.1677 - val_accuracy: 0.9142 - val_loss: 0.2193
Epoch 3/5
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9657 - loss: 0.0988 - val_accuracy: 0.9093 - val_loss: 0.2654
Epoch 4/5
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9829 - loss: 0.0516 - val_accuracy: 0.9060 - val_loss: 0.3233
Epoch 5/5
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9897 - loss: 0.0299 - val_accuracy: 0.8947 - val_loss: 0.4032
Test Accuracy: 89.69%

Run history:
epoch/accuracy      ▁▄▆▇█
epoch/epoch         ▁▃▅▆█
epoch/learning_rate ▁▁▁▁▁
epoch/loss          █▅▃▂▁
epoch/val_accuracy  ▇█▆▅▁
epoch/val_loss      ▁▁▃▅█
test_accuracy       ▁

Run summary:
epoch/accuracy      0.98711
epoch/epoch         4.0
epoch/learning_rate 0.001
epoch/loss          0.03734
epoch/val_accuracy  0.89471
epoch/val_loss      0.40318
test_accuracy       0.89688


In [None]:
feedForward_dim = 64
num_heads = 2
maxlen = p90
embedding_dim = 25

print(vocab_size)

30005


In [None]:
# Let's try to improve on the baseline (which overfits after epoch 2) with a Transformer block and multi-head attention


class TransformerBlock(layers.Layer):
  def __init__(self, embedding_dim, num_heads, feedForward_dim, rate=0.1):
    super().__init__()
    self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)
    self.feedForwardNetwork = keras.Sequential(
        [layers.Dense(feedForward_dim, activation="relu"), layers.Dense(embedding_dim),]
    )
    self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
    self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
    self.dropout1 = layers.Dropout(rate)
    self.dropout2 = layers.Dropout(rate)

  def call(self, inputs):
    # Self-attention sublayer: attention -> dropout -> residual connection -> layer norm
    attn_output = self.att(inputs, inputs)
    attn_output = self.dropout1(attn_output)
    out1 = self.layernorm1(inputs + attn_output)
    # Feed-forward sublayer: FFN -> dropout -> residual connection -> layer norm
    ffn_output = self.feedForwardNetwork(out1)
    ffn_output = self.dropout2(ffn_output)
    return self.layernorm2(out1 + ffn_output)




In [None]:
# Token + position embedding layer


class TokenAndPositionEmbedding(layers.Layer):
  def __init__(self, maxlen, vocab_size, embedding_dim):
    super().__init__()
    self.token_emb = layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim)
    self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=embedding_dim)

  def call(self, x):
    maxlen = ops.shape(x)[-1]
    positions = ops.arange(start=0, stop=maxlen, step=1)
    positions = self.pos_emb(positions)
    x = self.token_emb(x)
    return x + positions

In [None]:
inputs = layers.Input(shape=(p90,))
embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embedding_dim)
x = embedding_layer(inputs)
transformer_block = TransformerBlock(embedding_dim, num_heads, feedForward_dim)
x = transformer_block(x)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(15, activation='relu')(x)
x = layers.Dropout(0.1)(x)
outputs = layers.Dense(1, activation='sigmoid')(x)


model_transformer = keras.Model(inputs=inputs, outputs=outputs)
model_transformer.summary()

In [None]:
model_transformer.compile(optimizer = 'adam', loss='binary_crossentropy', metrics=['accuracy'])
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)


wandb.init(config={"transformer": 2}, project = 'production_model')
history = model_transformer.fit(X_train_tokens,y_train,epochs=50,
                                validation_data=(X_val_tokens,y_val),
                                batch_size=64,
                                callbacks=[early_stopping,WandbMetricsLogger(),WandbModelCheckpoint("/content/sample_data/models/transformer_model_transformer.keras")])

test_results = model_transformer.evaluate(X_test_tokens, y_test, verbose=0)
print(f"Test Accuracy: {test_results[1]*100:.2f}%")
wandb.log({"test_accuracy": test_results[1]})
wandb.finish()

Epoch 1/50
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 57s 9ms/step - accuracy: 0.4997 - loss: 0.6936 - val_accuracy: 0.5000 - val_loss: 0.6932
Epoch 2/50
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 28s 8ms/step - accuracy: 0.5305 - loss: 0.6666 - val_accuracy: 0.8674 - val_loss: 0.3153
Epoch 3/50
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 29s 8ms/step - accuracy: 0.8810 - loss: 0.2935 - val_accuracy: 0.8787 - val_loss: 0.2878
Epoch 4/50
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 28s 8ms/step - accuracy: 0.8910 - loss: 0.2660 - val_accuracy: 0.8800 - val_loss: 0.2859
Epoch 5/50
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 28s 8ms/step - accuracy: 0.8962 - loss: 0.2523 - val_accuracy: 0.8780 - val_loss: 0.2873
Epoch 6/50
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 28s 8ms/step - accuracy: 0.8996 - loss: 0.2392 - val_accuracy: 0.8790 - val_loss: 0.3009
Epoch 7/50

Run history:
epoch/accuracy      ▁▃█████
epoch/epoch         ▁▂▃▅▆▇█
epoch/learning_rate ▁▁▁▁▁▁▁
epoch/loss          █▆▂▁▁▁▁
epoch/val_accuracy  ▁██████
epoch/val_loss      █▂▁▁▁▁▁
test_accuracy       ▁

Run summary:
epoch/accuracy      0.89922
epoch/epoch         6.0
epoch/learning_rate 0.001
epoch/loss          0.23734
epoch/val_accuracy  0.87749
epoch/val_loss      0.30464
test_accuracy       0.88118


In [None]:
#Setting up W&B Hypertuning
sweep_config = {
    'method': 'random',
    'metric': {'name': 'epoch/val_loss', 'goal': 'minimize'},  # must match the key WandbMetricsLogger logs
    'parameters': {
        'num_heads': {'values': [2, 4, 8]},
        'feedForward_dim': {'values': [64, 128, 256]},
        'batch_size': {'values': [32, 64, 128]},
        'epochs': {'values': [10, 20, 30]},
        'learning_rate':{'values': [1e-6, 2e-6, 5e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4]}
    }
}


In [None]:
def hypertuning_model(config=None):
    # Extract the hyperparameters from the W&B config
    num_heads = config['num_heads']
    feedForward_dim = config['feedForward_dim']
    batch_size = config['batch_size']
    epochs = config['epochs']

    inputs = layers.Input(shape=(maxlen,))
    embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embedding_dim)
    x = embedding_layer(inputs)
    transformer_block = TransformerBlock(embedding_dim, num_heads, feedForward_dim)
    x = transformer_block(x)
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Dense(20, activation='relu')(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(1, activation='sigmoid')(x)

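    model_hypertuning = keras.Model(inputs=inputs, outputs=outputs)
    optimizer = keras.optimizers.Adam(learning_rate=config.learning_rate)
    model_hypertuning.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    # Return the newly built transformer; returning the global `model` would keep
    # training the already-fitted baseline CNN and ignore the sweep's hyperparameters.
    return model_hypertuning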

In [None]:
def sweep_train(config=None):
    with wandb.init(config=config):
        config = wandb.config
        model = hypertuning_model(config)

        history = model.fit(
            X_train_tokens,
            y_train,
            epochs=config.epochs,
            validation_data=(X_val_tokens, y_val),
            batch_size=config.batch_size,
            callbacks=[
                WandbMetricsLogger(),
                WandbModelCheckpoint("/content/sample_data/models/transformer_model_hyperparameter.keras"),
            ],
        )

        test_results = model.evaluate(X_test_tokens, y_test, verbose=0)
        print(f"Test Accuracy: {test_results[1]*100:.2f}%")
        wandb.log({"test_accuracy": test_results[1]})
        wandb.finish()

In [None]:
# 1. Initialize the sweep
sweep_id = wandb.sweep(sweep=sweep_config, project="transformer-sweep")

# 2. Run the sweep agent (this triggers multiple hyperparameter runs)
wandb.agent(sweep_id, function=sweep_train, count=10)




Create sweep with ID: ulyniuk5
Sweep URL: https://wandb.ai/jubacochran-booking-com/transformer-sweep/sweeps/ulyniuk5

wandb: Agent Starting Run: 3j3vkfb4 with config:
wandb:     batch_size: 64
wandb:     epochs: 10
wandb:     feedForward_dim: 256
wandb:     learning_rate: 0.0001
wandb:     num_heads: 8

Epoch 1/10
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9923 - loss: 0.0231 - val_accuracy: 0.9032 - val_loss: 0.4193
Epoch 2/10
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9937 - loss: 0.0182 - val_accuracy: 0.8969 - val_loss: 0.5310
Epoch 3/10
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9948 - loss: 0.0148 - val_accuracy: 0.8988 - val_loss: 0.5451
Epoch 4/10
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9954 - loss: 0.0136 - val_accuracy: 0.9014 - val_loss: 0.5420
Epoch 5/10
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9959 - loss: 0.0118 - val_accuracy: 0.8971 - val_loss: 0.5539
Epoch 6/10
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9965 - loss: 0.0100 - val_accuracy: 0.9013 - val_loss: 0.5686
Epoch 7/10

Run history:
epoch/accuracy      ▁▃▄▅▆▆▇▇██
epoch/epoch         ▁▂▃▃▄▅▆▆▇█
epoch/learning_rate ▁▁▁▁▁▁▁▁▁▁
epoch/loss          █▆▅▄▃▂▂▁▁▁
epoch/val_accuracy  █▁▃▆▁▆▆▆▄▆
epoch/val_loss      ▁▃▄▄▄▄▅▆█▆
test_accuracy       ▁

Run summary:
epoch/accuracy      0.99715
epoch/epoch         9.0
epoch/learning_rate 0.001
epoch/loss          0.00893
epoch/val_accuracy  0.9016
epoch/val_loss      0.64618
test_accuracy       0.90133

wandb: Agent Starting Run: 237zrz7b with config:
wandb:     batch_size: 32
wandb:     epochs: 30
wandb:     feedForward_dim: 64
wandb:     learning_rate: 5e-06
wandb:     num_heads: 4

Epoch 1/30
6750/6750 ━━━━━━━━━━━━━━━━━━━━ 26s 4ms/step - accuracy: 0.9951 - loss: 0.0145 - val_accuracy: 0.8985 - val_loss: 0.6509
Epoch 2/30
6750/6750 ━━━━━━━━━━━━━━━━━━━━ 24s 4ms/step - accuracy: 0.9963 - loss: 0.0107 - val_accuracy: 0.9008 - val_loss: 0.7081
Epoch 3/30
6750/6750 ━━━━━━━━━━━━━━━━━━━━ 24s 4ms/step - accuracy: 0.9965 - loss: 0.0103 - val_accuracy: 0.8997 - val_loss: 0.6386
Epoch 4/30
6750/6750 ━━━━━━━━━━━━━━━━━━━━ 24s 4ms/step - accuracy: 0.9969 - loss: 0.0095 - val_accuracy: 0.8956 - val_loss: 0.5926
Epoch 5/30
6750/6750 ━━━━━━━━━━━━━━━━━━━━ 24s 4ms/step - accuracy: 0.9974 - loss: 0.0078 - val_accuracy: 0.8988 - val_loss: 0.7852
Epoch 6/30
6750/6750 ━━━━━━━━━━━━━━━━━━━━ 24s 4ms/step - accuracy: 0.9975 - loss: 0.0076 - val_accuracy: 0.9002 - val_loss: 0.6770
Epoch 7/30

Run history:
epoch/accuracy      ▁▃▃▄▄▅▆▆▆▆▆▇▇▇▇▇▇▇▇███████████
epoch/epoch         ▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▆▆▆▆▇▇▇▇███
epoch/learning_rate ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
epoch/loss          █▆▆▅▄▄▃▃▃▃▃▂▂▂▂▂▁▂▂▁▁▁▁▁▁▁▁▁▁▁
epoch/val_accuracy  ▇█▇▅▇█▁▆▆▆▇▆▆▅▁▇▃▆▅▄▆▂▆▆▇▂▇▅▅▆
epoch/val_loss      ▁▂▁▁▂▂▃▄▃▃▂▄▄▃▅▃▄▄▄▅▄▇▅▇▅▇█▆▇▆
test_accuracy       ▁

Run summary:
epoch/accuracy      0.99891
epoch/epoch         29.0
epoch/learning_rate 0.001
epoch/loss          0.00487
epoch/val_accuracy  0.89686
epoch/val_loss      1.35175
test_accuracy       0.8991

wandb: Agent Starting Run: e6c89lyy with config:
wandb:     batch_size: 64
wandb:     epochs: 10
wandb:     feedForward_dim: 128
wandb:     learning_rate: 1e-05
wandb:     num_heads: 2

Epoch 1/10
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9996 - loss: 0.0014 - val_accuracy: 0.8946 - val_loss: 1.4268
Epoch 2/10
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9996 - loss: 0.0012 - val_accuracy: 0.8958 - val_loss: 1.5216
Epoch 3/10
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9996 - loss: 0.0019 - val_accuracy: 0.8983 - val_loss: 1.6082
Epoch 4/10
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9996 - loss: 0.0014 - val_accuracy: 0.8955 - val_loss: 1.7578
Epoch 5/10
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9997 - loss: 8.7850e-04 - val_accuracy: 0.8970 - val_loss: 1.6113
Epoch 6/10
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9997 - loss: 0.0013 - val_accuracy: 0.8971 - val_loss: 1.9423
Epoch 

Run history:
epoch/accuracy      ▁▂▃▃▅▅█▄▄▇
epoch/epoch         ▁▂▃▃▄▅▆▆▇█
epoch/learning_rate ▁▁▁▁▁▁▁▁▁▁
epoch/loss          ▅▃▆▆▂▄▁▄█▁
epoch/val_accuracy  ▂▄█▃▆▆▃▁▅▃
epoch/val_loss      ▁▂▃▅▃▇▇▆▃█
test_accuracy       ▁

Run summary:
epoch/accuracy      0.99962
epoch/epoch         9.0
epoch/learning_rate 0.001
epoch/loss          0.00145
epoch/val_accuracy  0.89525
epoch/val_loss      1.99138
test_accuracy       0.89831

wandb: Agent Starting Run: ou90zpb6 with config:
wandb:     batch_size: 64
wandb:     epochs: 30
wandb:     feedForward_dim: 256
wandb:     learning_rate: 2e-06
wandb:     num_heads: 2

Epoch 1/30
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9996 - loss: 0.0015 - val_accuracy: 0.8958 - val_loss: 2.0128
Epoch 2/30
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9996 - loss: 0.0017 - val_accuracy: 0.8957 - val_loss: 2.3812
Epoch 3/30
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9997 - loss: 0.0014 - val_accuracy: 0.8944 - val_loss: 2.3050
Epoch 4/30
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9995 - loss: 0.0016 - val_accuracy: 0.8960 - val_loss: 2.1603
Epoch 5/30
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9996 - loss: 0.0011 - val_accuracy: 0.8965 - val_loss: 2.0952
Epoch 6/30
3375/3375 ━━━━━━━━━━━━━━━━━━━━ 16s 5ms/step - accuracy: 0.9997 - loss: 0.0015 - val_accuracy: 0.8966 - val_loss: 2.1026
Epoch 7/30

Run history:
epoch/accuracy       ▃▇▇▅▆▆▆▇█▅▁▇▅▇█▆▃▆▆▇▅▇▇▇▅▆▆▇▅█
epoch/epoch          ▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▆▆▆▆▇▇▇▇███
epoch/learning_rate  ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
epoch/loss           ▆▂▃▃▃▄█▄▂▃█▁▅▂▁▃▇▇▂▁▃▄▄▄▄▂▅▂▇▃
epoch/val_accuracy   ▆▅▄▆▆▇▇▆▆▆█▅▆▆▂█▆▅▁▃▆▃▅▄▅▅▂▂▄▆
epoch/val_loss       ▃▄▄▃▃▃▃▁▃▅▂▄▂▆▆▅▂▃▄▅▇▃▆▃▄▅█▇▅█
test_accuracy        ▁

Run summary:
epoch/accuracy       0.99963
epoch/epoch          29.0
epoch/learning_rate  0.001
epoch/loss           0.00174
epoch/val_accuracy   0.89654
epoch/val_loss       3.08078
test_accuracy        0.89875


[34m[1mwandb[0m: Agent Starting Run: 9r0s4sp0 with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	feedForward_dim: 64
[34m[1mwandb[0m: 	learning_rate: 1e-05
[34m[1mwandb[0m: 	num_heads: 4


Epoch 1/10
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9991 - loss: 0.0040 - val_accuracy: 0.8926 - val_loss: 2.6517
Epoch 2/10
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9993 - loss: 0.0025 - val_accuracy: 0.8940 - val_loss: 2.3736
Epoch 3/10
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9994 - loss: 0.0023 - val_accuracy: 0.8955 - val_loss: 2.6897
Epoch 4/10
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9994 - loss: 0.0027 - val_accuracy: 0.8934 - val_loss: 3.0390
Epoch 5/10
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9993 - loss: 0.0034 - val_accuracy: 0.8939 - val_loss: 2.3609
Epoch 6/10
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9994 - loss: 0.0024 - val_accuracy: 0.8954 - val_loss: 2.6185
Epoch 7/10

Run history:
epoch/accuracy       ▂▃▃▄▃▄▁█▃▄
epoch/epoch          ▁▂▃▃▄▅▆▆▇█
epoch/learning_rate  ▁▁▁▁▁▁▁▁▁▁
epoch/loss           ▅▂▃▅█▃▅▁█▂
epoch/val_accuracy   ▁▄▇▃▄▇█▃▅▃
epoch/val_loss       ▅▃▅█▃▅▄▄▁▆
test_accuracy        ▁

Run summary:
epoch/accuracy       0.99926
epoch/epoch          9.0
epoch/learning_rate  0.001
epoch/loss           0.00297
epoch/val_accuracy   0.89356
epoch/val_loss       2.76251
test_accuracy        0.89775


[34m[1mwandb[0m: Agent Starting Run: ir4nddyu with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	feedForward_dim: 64
[34m[1mwandb[0m: 	learning_rate: 5e-06
[34m[1mwandb[0m: 	num_heads: 8


Epoch 1/10
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9991 - loss: 0.0041 - val_accuracy: 0.8939 - val_loss: 2.4343
Epoch 2/10
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9995 - loss: 0.0020 - val_accuracy: 0.8888 - val_loss: 3.0222
Epoch 3/10
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9993 - loss: 0.0033 - val_accuracy: 0.8953 - val_loss: 2.5958
Epoch 4/10
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9995 - loss: 0.0021 - val_accuracy: 0.8947 - val_loss: 3.2179
Epoch 5/10
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9993 - loss: 0.0029 - val_accuracy: 0.8941 - val_loss: 2.6436
Epoch 6/10
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9994 - loss: 0.0023 - val_accuracy: 0.8939 - val_loss: 2.5748
Epoch 7/10

Run history:
epoch/accuracy       ▁▆▆▇▆▅▇█▆▅
epoch/epoch          ▁▂▃▃▄▅▆▆▇█
epoch/learning_rate  ▁▁▁▁▁▁▁▁▁▁
epoch/loss           █▁▄▁▂▄▁▆▄▇
epoch/val_accuracy   ▆▃▇▆▆▆▆█▁█
epoch/val_loss       ▁▅▂▇▃▂▄▆█▇
test_accuracy        ▁

Run summary:
epoch/accuracy       0.99924
epoch/epoch          9.0
epoch/learning_rate  0.001
epoch/loss           0.00383
epoch/val_accuracy   0.89647
epoch/val_loss       3.27149
test_accuracy        0.89796


[34m[1mwandb[0m: Agent Starting Run: aa0qm81q with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	feedForward_dim: 64
[34m[1mwandb[0m: 	learning_rate: 1e-05
[34m[1mwandb[0m: 	num_heads: 4


Epoch 1/20
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9994 - loss: 0.0033 - val_accuracy: 0.8962 - val_loss: 3.1897
Epoch 2/20
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9994 - loss: 0.0027 - val_accuracy: 0.8951 - val_loss: 2.2973
Epoch 3/20
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9995 - loss: 0.0032 - val_accuracy: 0.8959 - val_loss: 2.5833
Epoch 4/20
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9995 - loss: 0.0027 - val_accuracy: 0.8969 - val_loss: 4.0616
Epoch 5/20
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9994 - loss: 0.0039 - val_accuracy: 0.8959 - val_loss: 3.2356
Epoch 6/20
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9993 - loss: 0.0030 - val_accuracy: 0.8955 - val_loss: 3.1606
Epoch 7/20

Run history:
epoch/accuracy       ▆▃▅▆▃▁▅▆▂▆▄▆▂▆▇▄██▆▂
epoch/epoch          ▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██
epoch/learning_rate  ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
epoch/loss           ▇▃▆▂▆▅▁▃█▂▅▅▃▅▃▄▁▂▄▇
epoch/val_accuracy   ▇▆▇█▇▇▇▆▅▂▇▆▆▄▁█▇▅▅▆
epoch/val_loss       ▃▁▂▅▃▃▆█▃▅▂▂▅▃▄▄▄▆▆▃
test_accuracy        ▁

Run summary:
epoch/accuracy       0.99924
epoch/epoch          19.0
epoch/learning_rate  0.001
epoch/loss           0.00408
epoch/val_accuracy   0.89478
epoch/val_loss       3.31987
test_accuracy        0.89717


[34m[1mwandb[0m: Agent Starting Run: p6wmo06u with config:
[34m[1mwandb[0m: 	batch_size: 128
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	feedForward_dim: 128
[34m[1mwandb[0m: 	learning_rate: 5e-06
[34m[1mwandb[0m: 	num_heads: 4


Epoch 1/20
[1m1688/1688[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 8ms/step - accuracy: 0.9997 - loss: 0.0015 - val_accuracy: 0.8954 - val_loss: 3.0976
Epoch 2/20
[1m1688/1688[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m12s[0m 7ms/step - accuracy: 1.0000 - loss: 8.9541e-05 - val_accuracy: 0.8959 - val_loss: 3.9153
Epoch 3/20
[1m1688/1688[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m12s[0m 7ms/step - accuracy: 1.0000 - loss: 1.4837e-05 - val_accuracy: 0.8963 - val_loss: 4.4543
Epoch 4/20
[1m1688/1688[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m12s[0m 7ms/step - accuracy: 0.9999 - loss: 7.5489e-04 - val_accuracy: 0.8947 - val_loss: 5.9717
Epoch 5/20
[1m1688/1688[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m12s[0m 7ms/step - accuracy: 0.9998 - loss: 0.0013 - val_accuracy: 0.8952 - val_loss: 4.5171
Epoch 6/20
[1m1688/1688[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m12s[0m 7ms/step - accuracy: 0.9998 - loss: 6.9647e-04 - val_accuracy: 0.8962 - val_loss: 4

Run history:
epoch/accuracy       ▁██▃▄▅▆▂▅▅▄▅▇▅▃▅▅▄▃▆
epoch/epoch          ▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██
epoch/learning_rate  ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
epoch/loss           ▆▁▁█▅▃▃▆▃▅▅▅▂▄▅▃▄▄▄▃
epoch/val_accuracy   ▅▆▇▃▄▇▅▆▂▅▄▁▂▄▄▇▇█▅█
epoch/val_loss       ▁▃▄█▄▅█▅█▅▄▁▇▅▃▆▅█▆█
test_accuracy        ▁

Run summary:
epoch/accuracy       0.99989
epoch/epoch          19.0
epoch/learning_rate  0.001
epoch/loss           0.00063
epoch/val_accuracy   0.89649
epoch/val_loss       5.82353
test_accuracy        0.89776


[34m[1mwandb[0m: Agent Starting Run: x8kh8wg4 with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 30
[34m[1mwandb[0m: 	feedForward_dim: 128
[34m[1mwandb[0m: 	learning_rate: 2e-05
[34m[1mwandb[0m: 	num_heads: 4


Epoch 1/30
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9995 - loss: 0.0046 - val_accuracy: 0.8968 - val_loss: 6.0950
Epoch 2/30
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9995 - loss: 0.0027 - val_accuracy: 0.8972 - val_loss: 3.5778
Epoch 3/30
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9995 - loss: 0.0026 - val_accuracy: 0.8932 - val_loss: 4.7904
Epoch 4/30
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9995 - loss: 0.0036 - val_accuracy: 0.8931 - val_loss: 4.0148
Epoch 5/30
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9994 - loss: 0.0033 - val_accuracy: 0.8919 - val_loss: 5.2537
Epoch 6/30
[1m6750/6750[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 4ms/step - accuracy: 0.9995 - loss: 0.0032 - val_accuracy: 0.8960 - val_loss: 3.9741
Epoch 7/30

Run history:
epoch/accuracy       ▃▅▄▆▁▃▃▃▅▁▅▅▆▂▄▆▅▆▃▄▁▁▄▆▂▅▅▅▆█
epoch/epoch          ▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▆▆▆▆▇▇▇▇███
epoch/learning_rate  ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
epoch/loss           ▆▃▂▂▆▄▃▄▃▄▁▂▂▆▄▄▂▄▇▅▄█▅▄▃▄▃▄▂▃
epoch/val_accuracy   ██▃▃▁▆▃▆▄▂▅▅▅▅▆▂▇▅▄▄▅▆▄▇▅▅▄▃▁▅
epoch/val_loss       ▆▁▄▂▅▂█▄▂▃▂▃█▄▃▃▂▅▄▃▃▅▂▁▃▃▅▃▂▅
test_accuracy        ▁

Run summary:
epoch/accuracy       0.9996
epoch/epoch          29.0
epoch/learning_rate  0.001
epoch/loss           0.00329
epoch/val_accuracy   0.89514
epoch/val_loss       5.2727
test_accuracy        0.89867


[34m[1mwandb[0m: Agent Starting Run: fus47gcv with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	feedForward_dim: 128
[34m[1mwandb[0m: 	learning_rate: 5e-05
[34m[1mwandb[0m: 	num_heads: 4


Epoch 1/10
[1m3375/3375[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 5ms/step - accuracy: 0.9998 - loss: 0.0011 - val_accuracy: 0.8947 - val_loss: 5.9955
Epoch 2/10
[1m3375/3375[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 5ms/step - accuracy: 0.9997 - loss: 0.0022 - val_accuracy: 0.8965 - val_loss: 5.1534
Epoch 3/10
[1m3375/3375[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 5ms/step - accuracy: 0.9999 - loss: 0.0011 - val_accuracy: 0.8948 - val_loss: 7.2689
Epoch 4/10
[1m3375/3375[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 5ms/step - accuracy: 0.9997 - loss: 0.0024 - val_accuracy: 0.8898 - val_loss: 6.6403
Epoch 5/10
[1m3375/3375[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 5ms/step - accuracy: 0.9998 - loss: 6.8555e-04 - val_accuracy: 0.8955 - val_loss: 8.8216
Epoch 6/10
[1m3375/3375[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 5ms/step - accuracy: 0.9998 - loss: 0.0024 - val_accuracy: 0.8944 - val_loss: 6.2289
Epoch 

Run history:
epoch/accuracy       ▄▂▅▃█▁▅▅▃▅
epoch/epoch          ▁▂▃▃▄▅▆▆▇█
epoch/learning_rate  ▁▁▁▁▁▁▁▁▁▁
epoch/loss           ▂█▅▇▁█▃▄▄▄
epoch/val_accuracy   ▆█▆▁▇▆▇▆▇▆
epoch/val_loss       ▃▁▅▄█▃▂▁▃▆
test_accuracy        ▁

Run summary:
epoch/accuracy       0.99973
epoch/epoch          9.0
epoch/learning_rate  0.001
epoch/loss           0.0019
epoch/val_accuracy   0.89494
epoch/val_loss       7.95417
test_accuracy        0.896

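The seven sweep runs whose configs appear above can be compared programmatically. The sketch below transcribes each run's final W&B summary values from this output into a plain dictionary (the dictionary and helper names are hypothetical, not part of the W&B API) and picks the run with the highest validation accuracy; ranking on test_accuracy instead would let the test set influence model selection. It also flags runs whose logged epoch/learning_rate differs from the learning_rate in the sweep config: every summary reports 0.001, which matches the Keras default for the Adam optimizer, while the configs request values from 2e-06 to 5e-05. The logs alone cannot confirm the cause, but one plausible explanation is that the optimizer was compiled without reading wandb.config.learning_rate.

```python
import math

# Final W&B summary values transcribed from the sweep output above (hypothetical layout).
# "config_lr" is the learning_rate requested in each run's sweep config;
# "logged_lr" is the epoch/learning_rate value reported in that run's summary.
SWEEP_RUNS = {
    "ou90zpb6": {"config_lr": 2e-06, "logged_lr": 0.001, "val_accuracy": 0.89654,
                 "val_loss": 3.08078, "test_accuracy": 0.89875},
    "9r0s4sp0": {"config_lr": 1e-05, "logged_lr": 0.001, "val_accuracy": 0.89356,
                 "val_loss": 2.76251, "test_accuracy": 0.89775},
    "ir4nddyu": {"config_lr": 5e-06, "logged_lr": 0.001, "val_accuracy": 0.89647,
                 "val_loss": 3.27149, "test_accuracy": 0.89796},
    "aa0qm81q": {"config_lr": 1e-05, "logged_lr": 0.001, "val_accuracy": 0.89478,
                 "val_loss": 3.31987, "test_accuracy": 0.89717},
    "p6wmo06u": {"config_lr": 5e-06, "logged_lr": 0.001, "val_accuracy": 0.89649,
                 "val_loss": 5.82353, "test_accuracy": 0.89776},
    "x8kh8wg4": {"config_lr": 2e-05, "logged_lr": 0.001, "val_accuracy": 0.89514,
                 "val_loss": 5.2727, "test_accuracy": 0.89867},
    "fus47gcv": {"config_lr": 5e-05, "logged_lr": 0.001, "val_accuracy": 0.89494,
                 "val_loss": 7.95417, "test_accuracy": 0.896},
}


def best_run_by_validation(runs):
    """Return the run ID with the highest validation accuracy; the test set is not consulted."""
    return max(runs, key=lambda run_id: runs[run_id]["val_accuracy"])


def runs_with_lr_mismatch(runs, rel_tol=1e-9):
    """Return the run IDs whose logged learning rate differs from the configured one."""
    return [
        run_id
        for run_id, summary in runs.items()
        if not math.isclose(summary["config_lr"], summary["logged_lr"], rel_tol=rel_tol)
    ]


if __name__ == "__main__":
    best = best_run_by_validation(SWEEP_RUNS)
    print(f"Best run by val_accuracy: {best} ({SWEEP_RUNS[best]['val_accuracy']:.5f})")
    print(f"Runs whose logged LR differs from the sweep config: {runs_with_lr_mismatch(SWEEP_RUNS)}")
```

Run as written, it reports ou90zpb6 as the best of these seven by validation accuracy and flags all seven for the learning-rate mismatch. The artifact downloaded below comes from run bdcumq2v, whose summary is not part of this output, so this comparison does not by itself confirm that choice. The val_accuracy spread across these runs is also only about 0.003, which may be within run-to-run noise on a single validation split.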

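In every run above, training accuracy is already about 0.999 at epoch 1 (which may also indicate that weights carry over between sweep runs instead of being reinitialized), while val_loss grows, reaching 7.95417 in the final summary of run fus47gcv, and val_accuracy hovers around 0.895. Rising validation loss with flat validation accuracy suggests the model is overfitting and becoming overconfident on the reviews it misclassifies, so monitoring val_loss with early stopping would likely end such runs after a few epochs. The sketch below replays, in plain Python, the basic patience rule that keras.callbacks.EarlyStopping applies (stop once the monitored value has not improved for `patience` consecutive epochs) on the six val_loss values printed for run x8kh8wg4. The function name is hypothetical, and this is an illustration over logged numbers, not the Keras callback itself.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Replay a patience-based early-stopping rule over per-epoch val_loss values.

    Returns (stop_epoch, best_epoch), both 1-indexed: the epoch after which training
    would stop, and the epoch whose weights would be restored.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # Improvement: remember this epoch and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out, so training runs to the last logged epoch
    return len(val_losses), best_epoch


# val_loss for epochs 1-6 of run x8kh8wg4, copied from the log above
x8kh8wg4_val_loss = [6.0950, 3.5778, 4.7904, 4.0148, 5.2537, 3.9741]

if __name__ == "__main__":
    stop, best = early_stopping_epoch(x8kh8wg4_val_loss, patience=2)
    print(f"Stop after epoch {stop}, restore weights from epoch {best}")
```

With patience=2, the rule stops after epoch 4 and restores the epoch-2 weights. In an actual training run, the equivalent would be passing keras.callbacks.EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True) in the callbacks list of model.fit.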
In [None]:
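import wandb

# Start a W&B run in the sweep's project so the artifact download is tracked
run = wandb.init(project="transformer-sweep")

# Reference version v9 of the model artifact logged for run bdcumq2v (per its name)
artifact = run.use_artifact('jubacochran-booking-com/transformer-sweep/run_bdcumq2v_model:v9', type='model')

# Download the artifact's files and get the local directory they were saved to
artifact_dir = artifact.download()

print(f"Downloaded Artifact Directory: {artifact_dir}")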




[34m[1mwandb[0m:   1 of 1 files downloaded.  


Downloaded Artifact Directory: /content/artifacts/run_bdcumq2v_model:v9


In [None]:
# Load the saved Keras model from the downloaded artifact directory
best_model = keras.models.load_model(f"{artifact_dir}/transformer_model_hyperparameter.keras")

In [None]:
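def test_model(model, tokenizer, test_sentences, threshold=0.5):
    """Print the predicted sentiment (0 = negative, 1 = positive) for each raw sentence."""
    # Convert raw strings to token IDs; this must reproduce the training-time
    # preprocessing (vocabulary, special tokens, padding/sequence length)
    test_tokens = tokenizer.tokenize(test_sentences)

    # Predict positive-class probabilities (assumes a single sigmoid output unit)
    # and flatten them from shape (n, 1) to (n,)
    probabilities = model.predict(test_tokens).ravel()
    predictions = (probabilities > threshold).astype(int)  # 1 = positive, 0 = negative

    for sentence, label in zip(test_sentences, predictions):
        sentiment = "Positive" if label == 1 else "Negative"
        print(f"Input: {sentence}\nPredicted Sentiment: {sentiment}\n")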

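The function passes the output of tokenizer.tokenize straight to model.predict. If the tokenizer returns variable-length token-ID sequences (for example a ragged tensor, which a tokenizer built without a fixed sequence length can return), each sentence may reach the model with a different length and padding than the fixed-length batches it was trained on. The sketch below shows the pad-or-truncate step in plain Python. SEQUENCE_LENGTH, PAD_ID, the sample token IDs, and the helper name are all hypothetical; the real values must match the training pipeline (the preprocessing described earlier lists [PAD] among the special tokens, but its ID of 0 is an assumption here).

```python
from typing import List, Sequence

SEQUENCE_LENGTH = 8  # hypothetical: must equal the sequence length used during training
PAD_ID = 0           # hypothetical: ID assumed for the [PAD] token


def pad_or_truncate(token_ids: Sequence[Sequence[int]],
                    sequence_length: int = SEQUENCE_LENGTH,
                    pad_id: int = PAD_ID) -> List[List[int]]:
    """Return a fixed-length copy of each token-ID sequence.

    Longer sequences are cut at sequence_length; shorter ones are right-padded with pad_id.
    """
    batch = []
    for ids in token_ids:
        ids = list(ids)[:sequence_length]
        batch.append(ids + [pad_id] * (sequence_length - len(ids)))
    return batch


if __name__ == "__main__":
    # Two hypothetical tokenized sentences of different lengths
    ragged = [[101, 2023, 2001], [101, 5, 6, 7, 8, 9, 10, 11, 12, 13]]
    print(pad_or_truncate(ragged))
```

If the tokenizer returns a tf.RaggedTensor, its to_list() method yields nested Python lists that this helper accepts; the padded result can then be wrapped in a NumPy array before calling model.predict.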

In [None]:

test_sentences = [
    "This product was amazing! Highly recommend it.",
    "The movie was terrible and I hated it.",
    "It was okay, not great but not bad either.",
    "Absolutely loved it, would buy again!",
    "Worst purchase I have ever made.",
    "I am not sure if I like my purchase",
    "The beginning of the story was really intriging. Later the characters in the novel didnt develop as much. It is was a short book.",
    "happy with it...but"
]


test_model(best_model, tokenizer, test_sentences)


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 21ms/step
Input: This product was amazing! Highly recommend it.
Predicted Sentiment: Negative

Input: The movie was terrible and I hated it.
Predicted Sentiment: Positive

Input: It was okay, not great but not bad either.
Predicted Sentiment: Positive

Input: Absolutely loved it, would buy again!
Predicted Sentiment: Positive

Input: Worst purchase I have ever made.
Predicted Sentiment: Positive

Input: I am not sure if I like my purchase
Predicted Sentiment: Positive

Input: The beginning of the story was really intriging. Later the characters in the novel didnt develop as much. It is was a short book.
Predicted Sentiment: Negative

Input: happy with it...but
Predicted Sentiment: Negative
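
These predictions are hard to reconcile with the test accuracies of roughly 0.896 to 0.899 logged during the sweep. The sketch below scores only the four sentences with unambiguous sentiment; the expected labels are a judgment call made for this illustration, not dataset labels, and the predicted labels are copied from the output above. One correct answer out of four is below what random guessing would average, which, on so small a sample, is a warning sign rather than a measurement. It is consistent with, but does not prove, a mismatch between inference-time and training-time preprocessing.

```python
# (sentence, expected label assumed by hand, predicted label copied from the output above)
# Labels: 1 = positive, 0 = negative
CLEAR_CUT = [
    ("This product was amazing! Highly recommend it.", 1, 0),
    ("The movie was terrible and I hated it.", 0, 1),
    ("Absolutely loved it, would buy again!", 1, 1),
    ("Worst purchase I have ever made.", 0, 1),
]


def accuracy(examples):
    """Fraction of (sentence, expected, predicted) triples where expected == predicted."""
    correct = sum(1 for _, expected, predicted in examples if expected == predicted)
    return correct / len(examples)


if __name__ == "__main__":
    print(f"Accuracy on clear-cut sentences: {accuracy(CLEAR_CUT):.2f}")
```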

