
## Using this notebook to test AI security tools

In [None]:
!pip install adversarial-robustness-toolbox
!pip install tensorflow

Collecting adversarial-robustness-toolbox
  Downloading adversarial_robustness_toolbox-1.19.1-py3-none-any.whl.metadata (11 kB)
Downloading adversarial_robustness_toolbox-1.19.1-py3-none-any.whl (1.7 MB)
Installing collected packages: adversarial-robustness-toolbox
Successfully installed adversarial-robustness-toolbox-1.19.1


## ART with TensorFlow v2.x
The following cells demonstrate a simple example of using ART with TensorFlow v2.x. The example trains a small model on
the MNIST dataset and creates adversarial examples using the Fast Gradient Sign Method (FGSM). Here we use the ART
classifier to train the model; it would also be possible to provide a pretrained model to the ART classifier.
The parameters are chosen to reduce the computational requirements, not optimised for accuracy.
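FGSM takes one step of size ε in the direction of the sign of the loss gradient with respect to the input, then clips to the valid pixel range: x_adv = clip(x + ε · sign(∇ₓ L(x, y)), min, max). This is what ART's `FastGradientMethod` computes with its default infinity norm. The minimal NumPy sketch below is not ART's implementation. It uses a hypothetical binary logistic-regression model, chosen only because its input gradient has a closed form, and the weights, inputs and labels are all made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(x, y, w, b):
    """Per-sample binary cross-entropy of a toy logistic-regression model (numerically stable form)."""
    z = x @ w + b
    return np.logaddexp(0.0, z) - y * z

def fgsm(x, y, w, b, eps, clip_min=0.0, clip_max=1.0):
    """One-step FGSM: move each input by eps * sign(dL/dx), then clip to the valid range."""
    p = sigmoid(x @ w + b)
    grad_x = np.outer(p - y, w)   # closed-form input gradient for logistic regression
    x_adv = x + eps * np.sign(grad_x)
    return np.clip(x_adv, clip_min, clip_max)

rng = np.random.default_rng(0)
w = 0.5 * rng.normal(size=16)            # hypothetical weights for 16 input features
b = 0.0
x = rng.uniform(0.0, 1.0, size=(4, 16))  # hypothetical inputs in [0, 1], like clip_values=(0, 1)
y = np.array([1.0, 0.0, 1.0, 0.0])       # hypothetical labels

x_adv = fgsm(x, y, w, b, eps=0.2)
print("max |x_adv - x|:", np.abs(x_adv - x).max())
print("loss before:", logistic_loss(x, y, w, b).mean())
print("loss after: ", logistic_loss(x_adv, y, w, b).mean())
```

The perturbation never exceeds ε per pixel. For this linear model, every perturbed coordinate pushes the logit in the loss-increasing direction, so each sample's loss goes up.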

In [None]:
import numpy as np

from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import TensorFlowV2Classifier
from art.utils import load_mnist

In [None]:
# Step 1: Load the MNIST dataset

(x_train, y_train), (x_test, y_test), min_pixel_value, max_pixel_value = load_mnist()

In [None]:
# Step 2: Create the model

import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPool2D


class TensorFlowModel(Model):
    """
    Small convolutional network for MNIST used in this example.
    """

    def __init__(self):
        super(TensorFlowModel, self).__init__()
        self.conv1 = Conv2D(filters=4, kernel_size=5, activation="relu")
        self.conv2 = Conv2D(filters=10, kernel_size=5, activation="relu")
        self.maxpool = MaxPool2D(pool_size=(2, 2), strides=(2, 2), padding="valid", data_format=None)
        self.flatten = Flatten()
        self.dense1 = Dense(100, activation="relu")
        self.logits = Dense(10, activation="linear")

    def call(self, x):
        """
        Call function to evaluate the model.

        :param x: Input to the model
        :return: Prediction of the model
        """
        x = self.conv1(x)
        x = self.maxpool(x)
        x = self.conv2(x)
        x = self.maxpool(x)
        x = self.flatten(x)
        x = self.dense1(x)
        x = self.logits(x)
        return x


model = TensorFlowModel()
loss_object = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

In [None]:
# Step 3: Create the ART classifier

classifier = TensorFlowV2Classifier(
    model=model,
    loss_object=loss_object,
    optimizer=optimizer,
    nb_classes=10,
    input_shape=(28, 28, 1),
    clip_values=(0, 1),
)

In [None]:
# Step 4: Train the ART classifier

classifier.fit(x_train, y_train, batch_size=64, nb_epochs=3)

In [None]:
# Step 5: Evaluate the ART classifier on benign test examples

predictions = classifier.predict(x_test)
accuracy = np.sum(np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1)) / len(y_test)
print("Accuracy on benign test examples: {}%".format(accuracy * 100))

Accuracy on benign test examples: 97.74000000000001%


In [None]:
# Step 6: Generate adversarial test examples
attack = FastGradientMethod(estimator=classifier, eps=0.2)
x_test_adv = attack.generate(x=x_test)

In [None]:
# Step 7: Evaluate the ART classifier on adversarial test examples

predictions = classifier.predict(x_test_adv)
accuracy = np.sum(np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1)) / len(y_test)
print("Accuracy on adversarial test examples: {}%".format(accuracy * 100))

Accuracy on adversarial test examples: 21.73%


In [None]:
"""
The script demonstrates a simple example of using ART with PyTorch. The example trains a small model on the MNIST dataset
and creates adversarial examples using the Fast Gradient Sign Method. Here we use the ART classifier to train the model;
it would also be possible to provide a pretrained model to the ART classifier.
The parameters are chosen to reduce the computational requirements of the script, not optimised for accuracy.
"""

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import PyTorchClassifier
from art.utils import load_mnist


# Step 0: Define the neural network model; forward() returns logits (no softmax), since nn.CrossEntropyLoss applies log-softmax itself


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv_1 = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=5, stride=1)
        self.conv_2 = nn.Conv2d(in_channels=4, out_channels=10, kernel_size=5, stride=1)
        self.fc_1 = nn.Linear(in_features=4 * 4 * 10, out_features=100)
        self.fc_2 = nn.Linear(in_features=100, out_features=10)

    def forward(self, x):
        x = F.relu(self.conv_1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv_2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 10)
        x = F.relu(self.fc_1(x))
        x = self.fc_2(x)
        return x


# Step 1: Load the MNIST dataset

(x_train, y_train), (x_test, y_test), min_pixel_value, max_pixel_value = load_mnist()

# Step 1a: Swap axes to PyTorch's NCHW format

x_train = np.transpose(x_train, (0, 3, 1, 2)).astype(np.float32)
x_test = np.transpose(x_test, (0, 3, 1, 2)).astype(np.float32)

# Step 2: Create the model

model = Net()

# Step 2a: Define the loss function and the optimizer

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Step 3: Create the ART classifier

classifier = PyTorchClassifier(
    model=model,
    clip_values=(min_pixel_value, max_pixel_value),
    loss=criterion,
    optimizer=optimizer,
    input_shape=(1, 28, 28),
    nb_classes=10,
)

# Step 4: Train the ART classifier

classifier.fit(x_train, y_train, batch_size=64, nb_epochs=3)

# Step 5: Evaluate the ART classifier on benign test examples

predictions = classifier.predict(x_test)
accuracy = np.sum(np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1)) / len(y_test)
print("Accuracy on benign test examples: {}%".format(accuracy * 100))

# Step 6: Generate adversarial test examples
attack = FastGradientMethod(estimator=classifier, eps=0.2)
x_test_adv = attack.generate(x=x_test)

# Step 7: Evaluate the ART classifier on adversarial test examples

predictions = classifier.predict(x_test_adv)
accuracy = np.sum(np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1)) / len(y_test)
print("Accuracy on adversarial test examples: {}%".format(accuracy * 100))

Accuracy on benign test examples: 97.27%
Accuracy on adversarial test examples: 21.27%
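Both MNIST CNNs (Keras and PyTorch) flatten a 4 × 4 × 10 feature map before the first dense layer. The `4 * 4 * 10` in `Net` follows from the output-size formula for an unpadded ("valid") convolution or pooling layer, out = ⌊(n − k) / s⌋ + 1. The sketch below walks through that arithmetic (the helper name `valid_out` is hypothetical) and also shows the NHWC to NCHW transpose from Step 1a on a made-up batch:

```python
import numpy as np

def valid_out(n, kernel, stride=1):
    """Spatial output size of an unpadded ('valid') convolution or pooling layer."""
    return (n - kernel) // stride + 1

n = 28                       # MNIST height and width
n = valid_out(n, 5)          # first conv, 5x5 kernel, stride 1   -> 24
n = valid_out(n, 2, 2)       # 2x2 max-pool, stride 2             -> 12
n = valid_out(n, 5)          # second conv, 5x5 kernel, stride 1  -> 8
n = valid_out(n, 2, 2)       # 2x2 max-pool, stride 2             -> 4
flat_features = n * n * 10   # 10 output channels of the second conv layer
print(n, flat_features)      # -> 4 160

# load_mnist() returns NHWC arrays; PyTorch's Conv2d expects NCHW
x_nhwc = np.zeros((2, 28, 28, 1), dtype=np.float32)  # hypothetical 2-image batch
x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))
print(x_nchw.shape)          # -> (2, 1, 28, 28)
```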


In [None]:
"""
The script demonstrates a simple example of using ART with scikit-learn. The example trains a small model on the MNIST
dataset and creates adversarial examples using the Fast Gradient Sign Method. Here we use the ART classifier to train
the model; it would also be possible to provide a pretrained model to the ART classifier.
The parameters are chosen to reduce the computational requirements of the script, not optimised for accuracy.
"""

from sklearn.svm import SVC
import numpy as np

from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import SklearnClassifier
from art.utils import load_mnist

# Step 1: Load the MNIST dataset

(x_train, y_train), (x_test, y_test), min_pixel_value, max_pixel_value = load_mnist()

# Step 1a: Flatten dataset

nb_samples_train = x_train.shape[0]
nb_samples_test = x_test.shape[0]
x_train = x_train.reshape((nb_samples_train, 28 * 28))
x_test = x_test.reshape((nb_samples_test, 28 * 28))

# Step 2: Create the model

model = SVC(C=1.0, kernel="rbf")

# Step 3: Create the ART classifier

classifier = SklearnClassifier(model=model, clip_values=(min_pixel_value, max_pixel_value))

# Step 4: Train the ART classifier

classifier.fit(x_train, y_train)

# Step 5: Evaluate the ART classifier on benign test examples

predictions = classifier.predict(x_test)
accuracy = np.sum(np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1)) / len(y_test)
print("Accuracy on benign test examples: {}%".format(accuracy * 100))

# Step 6: Generate adversarial test examples
attack = FastGradientMethod(estimator=classifier, eps=0.2)
x_test_adv = attack.generate(x=x_test)

# Step 7: Evaluate the ART classifier on adversarial test examples

predictions = classifier.predict(x_test_adv)
accuracy = np.sum(np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1)) / len(y_test)
print("Accuracy on adversarial test examples: {}%".format(accuracy * 100))



Accuracy on benign test examples: 97.92%
Accuracy on adversarial test examples: 58.74%


In [None]:
"""
The script demonstrates a simple example of using ART with XGBoost. The example trains a small model on the MNIST dataset
and creates adversarial examples using the Zeroth Order Optimization (ZOO) attack. Here the model is trained directly with
XGBoost and then passed, already trained, to the ART classifier.
The parameters are chosen to reduce the computational requirements of the script, not optimised for accuracy.
"""

import xgboost as xgb
import numpy as np

from art.attacks.evasion import ZooAttack
from art.estimators.classification import XGBoostClassifier
from art.utils import load_mnist

# Step 1: Load the MNIST dataset

(x_train, y_train), (x_test, y_test), min_pixel_value, max_pixel_value = load_mnist()

# Step 1a: Keep only the first 5 test samples (ZOO needs many model queries per sample) and flatten the dataset

x_test = x_test[0:5]
y_test = y_test[0:5]

nb_samples_train = x_train.shape[0]
nb_samples_test = x_test.shape[0]
x_train = x_train.reshape((nb_samples_train, 28 * 28))
x_test = x_test.reshape((nb_samples_test, 28 * 28))

# Step 2: Create the model

params = {"objective": "multi:softprob", "eval_metric": ["mlogloss", "merror"], "num_class": 10}
dtrain = xgb.DMatrix(x_train, label=np.argmax(y_train, axis=1))
dtest = xgb.DMatrix(x_test, label=np.argmax(y_test, axis=1))
evals = [(dtest, "test"), (dtrain, "train")]
model = xgb.train(params=params, dtrain=dtrain, num_boost_round=2, evals=evals)

# Step 3: Create the ART classifier

classifier = XGBoostClassifier(
    model=model, clip_values=(min_pixel_value, max_pixel_value), nb_features=28 * 28, nb_classes=10
)

# Step 4: Train the ART classifier

# The model has already been trained in step 2

# Step 5: Evaluate the ART classifier on benign test examples

predictions = classifier.predict(x_test)
accuracy = np.sum(np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1)) / len(y_test)
print("Accuracy on benign test examples: {}%".format(accuracy * 100))

# Step 6: Generate adversarial test examples
attack = ZooAttack(
    classifier=classifier,
    confidence=0.0,
    targeted=False,
    learning_rate=1e-1,
    max_iter=200,
    binary_search_steps=10,
    initial_const=1e-3,
    abort_early=True,
    use_resize=False,
    use_importance=False,
    nb_parallel=5,
    batch_size=1,
    variable_h=0.01,
)
x_test_adv = attack.generate(x=x_test, y=y_test)

# Step 7: Evaluate the ART classifier on adversarial test examples

predictions = classifier.predict(x_test_adv)
accuracy = np.sum(np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1)) / len(y_test)
print("Accuracy on adversarial test examples: {}%".format(accuracy * 100))

[0]	test-mlogloss:1.08678	test-merror:0.00000	train-mlogloss:1.35689	train-merror:0.13210
[1]	test-mlogloss:0.80412	test-merror:0.00000	train-mlogloss:1.02601	train-merror:0.09192
Accuracy on benign test examples: 100.0%


Accuracy on adversarial test examples: 0.0%
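ZOO does not rely on the model's gradients, which a tree ensemble such as XGBoost does not usefully provide. Instead, it estimates them from model queries, a few coordinates at a time, using the symmetric difference quotient ĝᵢ ≈ (f(x + h·eᵢ) − f(x − h·eᵢ)) / 2h. The minimal sketch below applies that estimator to a made-up smooth black-box function. Here `h` plays roughly the role of `variable_h` above; that correspondence is an assumption, and this is not ART's code:

```python
import numpy as np

def zoo_coordinate_gradient(f, x, coords, h=1e-4):
    """Estimate df/dx_i for the given coordinates with symmetric finite differences.

    Each coordinate costs two queries of f; no access to the model's gradients is needed.
    """
    grad = np.zeros_like(x)
    for i in coords:
        e_i = np.zeros_like(x)
        e_i[i] = h
        grad[i] = (f(x + e_i) - f(x - e_i)) / (2.0 * h)
    return grad

def f(x):
    """Made-up black-box objective used only for illustration."""
    return np.sum(x ** 2) + 3.0 * x[0]

x = np.array([0.5, -1.0, 2.0])
est = zoo_coordinate_gradient(f, x, coords=range(x.size))
exact = 2 * x + np.array([3.0, 0.0, 0.0])
print(est, exact)
```

The query cost (two model evaluations per coordinate per step) is the reason the cell above attacks only five test images.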


In [None]:
# MIT License
#
# Copyright (C) The Adversarial Robustness Toolbox (ART) Authors 2020
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit
# persons to whom the Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the
# Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
# WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
# TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
import torch
import torchvision
import argparse
import json
import yaml
import pprint

from art.estimators.object_detection import PyTorchFasterRCNN
from art.attacks.evasion import RobustDPatch


COCO_INSTANCE_CATEGORY_NAMES = [
    "__background__",
    "person",
    "bicycle",
    "car",
    "motorcycle",
    "airplane",
    "bus",
    "train",
    "truck",
    "boat",
    "traffic light",
    "fire hydrant",
    "N/A",
    "stop sign",
    "parking meter",
    "bench",
    "bird",
    "cat",
    "dog",
    "horse",
    "sheep",
    "cow",
    "elephant",
    "bear",
    "zebra",
    "giraffe",
    "N/A",
    "backpack",
    "umbrella",
    "N/A",
    "N/A",
    "handbag",
    "tie",
    "suitcase",
    "frisbee",
    "skis",
    "snowboard",
    "sports ball",
    "kite",
    "baseball bat",
    "baseball glove",
    "skateboard",
    "surfboard",
    "tennis racket",
    "bottle",
    "N/A",
    "wine glass",
    "cup",
    "fork",
    "knife",
    "spoon",
    "bowl",
    "banana",
    "apple",
    "sandwich",
    "orange",
    "broccoli",
    "carrot",
    "hot dog",
    "pizza",
    "donut",
    "cake",
    "chair",
    "couch",
    "potted plant",
    "bed",
    "N/A",
    "dining table",
    "N/A",
    "N/A",
    "toilet",
    "N/A",
    "tv",
    "laptop",
    "mouse",
    "remote",
    "keyboard",
    "cell phone",
    "microwave",
    "oven",
    "toaster",
    "sink",
    "refrigerator",
    "N/A",
    "book",
    "clock",
    "vase",
    "scissors",
    "teddy bear",
    "hair drier",
    "toothbrush",
]


def extract_predictions(predictions_):

    # for key, item in predictions[0].items():
    #     print(key, item)

    # Get the predicted class
    predictions_class = [COCO_INSTANCE_CATEGORY_NAMES[i] for i in list(predictions_["labels"])]
    print("\npredicted classes:", predictions_class)

    # Get the predicted bounding boxes
    predictions_boxes = [[(i[0], i[1]), (i[2], i[3])] for i in list(predictions_["boxes"])]

    # Get the predicted prediction score
    predictions_score = list(predictions_["scores"])
    print("predicted score:", predictions_score)

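    # Keep only detections scoring above the threshold. torchvision returns detections
    # sorted by descending score, so they form a prefix of the lists; counting them
    # (instead of indexing the last match) also works when no detection passes
    threshold = 0.5
    predictions_t = sum(1 for score in predictions_score if score > threshold)

    predictions_boxes = predictions_boxes[:predictions_t]
    predictions_class = predictions_class[:predictions_t]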

    return predictions_class, predictions_boxes, predictions_class


def plot_image_with_boxes(img, boxes, pred_cls):
    text_size = 5
    text_th = 5
    rect_th = 6

    for i in range(len(boxes)):
        # Draw Rectangle with the coordinates

        cv2.rectangle(
            img,
            (int(boxes[i][0][0]), int(boxes[i][0][1])),
            (int(boxes[i][1][0]), int(boxes[i][1][1])),
            color=(0, 255, 0),
            thickness=rect_th,
        )
        # Write the prediction class
        cv2.putText(
            img,
            pred_cls[i],
            (int(boxes[i][0][0]), int(boxes[i][0][1])),
            cv2.FONT_HERSHEY_SIMPLEX,
            text_size,
            (0, 255, 0),
            thickness=text_th,
        )
    plt.axis("off")
    plt.imshow(img.astype(np.uint8), interpolation="nearest")
    plt.show()


def get_loss(frcnn, x, y):
    frcnn._model.train()
    transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])
    image_tensor_list = list()

    for i in range(x.shape[0]):
        if frcnn.clip_values is not None:
            img = transform(x[i] / frcnn.clip_values[1]).to(frcnn._device)
        else:
            img = transform(x[i]).to(frcnn._device)
        image_tensor_list.append(img)

    loss = frcnn._model(image_tensor_list, y)
    for loss_type in ["loss_classifier", "loss_box_reg", "loss_objectness", "loss_rpn_box_reg"]:
        loss[loss_type] = loss[loss_type].cpu().detach().numpy().item()
    return loss


def append_loss_history(loss_history, output):
    for loss in ["loss_classifier", "loss_box_reg", "loss_objectness", "loss_rpn_box_reg"]:
        loss_history[loss] += [output[loss]]
    return loss_history


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=False, default=None, help="Path of config yaml file")
    cmdline, _ = parser.parse_known_args()  # ignore extra arguments such as the "-f" flag Jupyter passes to the kernel

    if cmdline.config and os.path.exists(cmdline.config):
        with open(cmdline.config, "r") as cf:
            config = yaml.safe_load(cf.read())
    else:
        config = {
            "attack_losses": ["loss_classifier", "loss_box_reg", "loss_objectness", "loss_rpn_box_reg"],
            "cuda_visible_devices": "1",
            "patch_shape": [450, 450, 3],
            "patch_location": [600, 750],
            "crop_range": [0, 0],
            "brightness_range": [1.0, 1.0],
            "rotation_weights": [1, 0, 0, 0],
            "sample_size": 1,
            "learning_rate": 1.0,
            "max_iter": 5000,
            "batch_size": 1,
            "image_file": "banner-diverse-group-of-people-2.jpg",
            "resume": False,
            "path": "",
        }

    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(config)

    if config["cuda_visible_devices"] is None:
        device_type = "cpu"
    else:
        device_type = "gpu"
        os.environ["CUDA_VISIBLE_DEVICES"] = config["cuda_visible_devices"]

    frcnn = PyTorchFasterRCNN(
        clip_values=(0, 255), channels_first=False, attack_losses=config["attack_losses"], device_type=device_type
    )

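    image_1 = cv2.imread(config["image_file"])
    if image_1 is None:  # cv2.imread returns None instead of raising when the file cannot be read
        raise FileNotFoundError("Could not read image file: {}".format(config["image_file"]))
    image_1 = cv2.cvtColor(image_1, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB
    # Resizing to the image's own (width, height) is a no-op; change dsize to rescale the image
    image_1 = cv2.resize(image_1, dsize=(image_1.shape[1], image_1.shape[0]), interpolation=cv2.INTER_CUBIC)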

    image = np.stack([image_1], axis=0).astype(np.float32)

    attack = RobustDPatch(
        frcnn,
        patch_shape=config["patch_shape"],
        patch_location=config["patch_location"],
        crop_range=config["crop_range"],
        brightness_range=config["brightness_range"],
        rotation_weights=config["rotation_weights"],
        sample_size=config["sample_size"],
        learning_rate=config["learning_rate"],
        max_iter=1,  # one optimization step per generate() call; the loop below runs config["max_iter"] steps
        batch_size=config["batch_size"],
    )

    x = image.copy()

    y = frcnn.predict(x=x)
    for i, y_i in enumerate(y):
        y[i]["boxes"] = torch.from_numpy(y_i["boxes"]).type(torch.float).to(frcnn._device)
        y[i]["labels"] = torch.from_numpy(y_i["labels"]).type(torch.int64).to(frcnn._device)
        y[i]["scores"] = torch.from_numpy(y_i["scores"]).to(frcnn._device)

    if config["resume"]:
        patch = np.load(os.path.join(config["path"], "patch.npy"))
        attack._patch = patch

        with open(os.path.join(config["path"], "loss_history.json"), "r") as file:
            loss_history = json.load(file)
    else:
        loss_history = {"loss_classifier": [], "loss_box_reg": [], "loss_objectness": [], "loss_rpn_box_reg": []}

    for i in range(config["max_iter"]):
        print("Iteration:", i)
        patch = attack.generate(x)
        x_patch = attack.apply_patch(x)

        loss = get_loss(frcnn, x_patch, y)
        print(loss)
        loss_history = append_loss_history(loss_history, loss)

        with open(os.path.join(config["path"], "loss_history.json"), "w") as file:
            file.write(json.dumps(loss_history))

        np.save(os.path.join(config["path"], "patch"), attack._patch)

    predictions_adv = frcnn.predict(x=x_patch)

    for i in range(image.shape[0]):
        print("\nPredictions adversarial image {}:".format(i))

        # Process predictions
        predictions_adv_class, predictions_adv_boxes, _ = extract_predictions(predictions_adv[i])

        # Plot predictions
        plot_image_with_boxes(img=x_patch[i].copy(), boxes=predictions_adv_boxes, pred_cls=predictions_adv_class)
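`RobustDPatch` optimizes a single patch of shape `patch_shape` and places it into each image at `patch_location`. The minimal NumPy sketch below shows only the pasting step on an NHWC batch. It uses a hypothetical helper, `paste_patch`, and it leaves out the crop, brightness and rotation settings configured above. It also assumes the first value of `patch_location` indexes rows and the second indexes columns of the top-left corner:

```python
import numpy as np

def paste_patch(images, patch, location):
    """Return a copy of an NHWC batch with `patch` pasted at `location` (row, col of its top-left corner).

    Hypothetical helper: pasting only, no random crop/brightness/rotation.
    """
    row, col = location
    ph, pw, _ = patch.shape
    _, h, w, _ = images.shape
    if row + ph > h or col + pw > w:
        raise ValueError("Patch does not fit inside the image at this location.")
    out = images.copy()
    out[:, row:row + ph, col:col + pw, :] = patch  # the same patch is broadcast to every image
    return out

# Tiny made-up example: a 2x2 white patch on a black 1x5x5x3 batch
images = np.zeros((1, 5, 5, 3), dtype=np.float32)
patch = np.full((2, 2, 3), 255.0, dtype=np.float32)
patched = paste_patch(images, patch, location=(1, 2))
print(patched[0, :, :, 0])
```

Under the same row/column assumption, the default config (a 450 × 450 patch at `[600, 750]`) needs an input image at least 1050 pixels tall and 1200 pixels wide.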

In [7]:
import tensorflow as tf
import numpy as np

# Hyperparameters
time_steps = 5  # Number of time steps in the sequence
input_dim = 3   # Input features
hidden_units = 4  # Number of hidden units
output_dim = 2  # Output features
learning_rate = 0.01
# Initialize weights and biases for the RNN manually
Wx = tf.Variable(tf.random.normal([input_dim, hidden_units]))  # Input-to-hidden weights
Wh = tf.Variable(tf.random.normal([hidden_units, hidden_units]))  # Hidden-to-hidden weights
Wy = tf.Variable(tf.random.normal([hidden_units, output_dim]))  # Hidden-to-output weights
bh = tf.Variable(tf.zeros([hidden_units]))  # Hidden bias
by = tf.Variable(tf.zeros([output_dim]))  # Output bias

# Dummy data (batch size = 1 for simplicity)
X = tf.random.normal([1, time_steps, input_dim])  # Input sequence
Y_true = tf.random.normal([1, time_steps, output_dim])  # Ground truth outputs

# Create the optimizer once, outside train_step, so Adam's moment estimates
# persist across steps instead of being reset on every call
optimizer = tf.keras.optimizers.Adam(learning_rate)
trainable_vars = [Wx, Wh, Wy, bh, by]

# Training step: the RNN is unrolled by hand in the forward pass;
# GradientTape then performs backpropagation through time (BPTT) automatically
def train_step(X, Y_true):
    batch_size = X.shape[0]

    with tf.GradientTape() as tape:
        # Forward pass
        h_t = tf.zeros([batch_size, hidden_units])  # Initial hidden state
        outputs = []
        for t in range(time_steps):
            x_t = X[:, t, :]  # Input at time step t
            h_t = tf.nn.tanh(tf.matmul(x_t, Wx) + tf.matmul(h_t, Wh) + bh)  # Hidden state update
            y_t = tf.matmul(h_t, Wy) + by  # Output at time step t
            outputs.append(y_t)

        outputs = tf.stack(outputs, axis=1)  # Shape: [batch_size, time_steps, output_dim]
        loss = tf.reduce_mean(tf.square(outputs - Y_true))  # Mean squared error

    # Backward pass: gradients flow back through every unrolled time step
    gradients = tape.gradient(loss, trainable_vars)
    optimizer.apply_gradients(zip(gradients, trainable_vars))

    return loss

# Training loop
epochs = 100
for epoch in range(epochs):
    loss = train_step(X, Y_true)
    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Loss: {loss.numpy()}")


Epoch 0, Loss: 4.674145698547363
Epoch 10, Loss: 3.146674156188965
Epoch 20, Loss: 2.2044918537139893
Epoch 30, Loss: 1.4331787824630737
Epoch 40, Loss: 1.0548491477966309
Epoch 50, Loss: 0.7887776494026184
Epoch 60, Loss: 0.5761017203330994
Epoch 70, Loss: 0.40344104170799255
Epoch 80, Loss: 0.26267001032829285
Epoch 90, Loss: 0.15752963721752167
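The cell above unrolls the recurrence by hand, but `tape.gradient` still performs the backward pass automatically. The NumPy sketch below writes out what backpropagation through time computes for the same architecture: a tanh hidden state, a linear readout, a mean-squared-error loss and a batch of one. Gradients are accumulated backward over the time steps and checked against central finite differences. The random weights and data are made up:

```python
import numpy as np

def forward(params, X):
    """Unroll h_t = tanh(x_t Wx + h_{t-1} Wh + bh), y_t = h_t Wy + by over X (shape [T, D])."""
    Wx, Wh, Wy, bh, by = params
    h = np.zeros(Wh.shape[0])
    hs, ys = [h], []
    for x_t in X:
        h = np.tanh(x_t @ Wx + h @ Wh + bh)
        hs.append(h)
        ys.append(h @ Wy + by)
    return hs, np.stack(ys)

def loss_fn(params, X, Y):
    _, Y_pred = forward(params, X)
    return np.mean((Y_pred - Y) ** 2)

def bptt_grads(params, X, Y):
    """Gradients of loss_fn w.r.t. [Wx, Wh, Wy, bh, by], accumulated backward over time."""
    _, Wh, Wy, _, _ = params
    hs, Y_pred = forward(params, X)
    grads = [np.zeros_like(p) for p in params]
    dWx, dWh, dWy, dbh, dby = grads          # same array objects as in `grads`, updated in place
    dY = 2.0 * (Y_pred - Y) / Y.size         # derivative of the mean squared error
    dh_next = np.zeros(Wh.shape[0])          # gradient flowing in from step t+1
    for t in reversed(range(len(X))):
        h_t, h_prev = hs[t + 1], hs[t]
        dWy += np.outer(h_t, dY[t])
        dby += dY[t]
        dh = dY[t] @ Wy.T + dh_next          # from the output at t plus all later time steps
        da = dh * (1.0 - h_t ** 2)           # back through tanh
        dWx += np.outer(X[t], da)
        dWh += np.outer(h_prev, da)
        dbh += da
        dh_next = da @ Wh.T                  # pass the gradient back to step t-1
    return grads

def numeric_grad(params, X, Y, k, idx, eps=1e-6):
    """Central finite difference of loss_fn w.r.t. params[k][idx] (used only as a check)."""
    plus = [p.copy() for p in params]
    minus = [p.copy() for p in params]
    plus[k][idx] += eps
    minus[k][idx] -= eps
    return (loss_fn(plus, X, Y) - loss_fn(minus, X, Y)) / (2 * eps)

rng = np.random.default_rng(0)
T, D, H, O = 5, 3, 4, 2                      # same sizes as the TensorFlow cell
params = [0.5 * rng.normal(size=s) for s in [(D, H), (H, H), (H, O), (H,), (O,)]]
X = rng.normal(size=(T, D))                  # made-up input sequence
Y = rng.normal(size=(T, O))                  # made-up targets

grads = bptt_grads(params, X, Y)
print("BPTT dWh[0, 1]:   ", grads[1][0, 1])
print("finite difference:", numeric_grad(params, X, Y, 1, (0, 1)))
```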


In [8]:
# Define the variable
x = tf.Variable(3.0)

# Compute higher-order gradient
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        y = x**3  # Compute y = x^3
    dy_dx = inner_tape.gradient(y, x)  # First derivative: 3x^2
d2y_dx2 = outer_tape.gradient(dy_dx, x)  # Second derivative: 6x

print("First derivative:", dy_dx.numpy())
print("Second derivative:", d2y_dx2.numpy())


First derivative: 27.0
Second derivative: 18.0
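The tape's results can be checked numerically. The central difference (f(x+h) − f(x−h)) / 2h approximates f′(x), and the second difference (f(x+h) − 2f(x) + f(x−h)) / h² approximates f″(x). A quick sketch for f(x) = x³ at x = 3, where the exact values are f′(3) = 27 and f″(3) = 18:

```python
def f(x):
    return x ** 3

x, h = 3.0, 1e-3
first = (f(x + h) - f(x - h)) / (2 * h)           # approximates 3x^2 = 27
second = (f(x + h) - 2 * f(x) + f(x - h)) / h**2  # approximates 6x = 18
print(first, second)
```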


In [9]:
# Define a custom model
class SimpleModel(tf.keras.Model):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.dense = tf.keras.layers.Dense(1, activation='linear')

    def call(self, inputs):
        return self.dense(inputs)

# Create the model and input
model = SimpleModel()
x = tf.constant([[1.0, 2.0, 3.0]])
y_true = tf.constant([[4.0]])

# Compute gradients for trainable parameters
with tf.GradientTape() as tape:
    y_pred = model(x)
    loss = tf.reduce_mean((y_pred - y_true)**2)  # MSE loss

# Compute gradients
gradients = tape.gradient(loss, model.trainable_variables)
for var, grad in zip(model.trainable_variables, gradients):
    print(f"Gradient for {var.name}:", grad.numpy())


Gradient for kernel: [[1.5743093]
 [3.1486187]
 [4.722928 ]]
Gradient for bias: [1.5743093]


In [13]:
# Define variables
x = tf.Variable(1.0)
y = tf.Variable(2.0)

# Compute gradient
with tf.GradientTape() as tape:
    z = x**2 + y**3 + 3*x*y  # Multivariable function

# Compute gradients
gradients = tape.gradient(z, [x, y])
print("Gradient w.r.t x:", gradients[0].numpy())
print("Gradient w.r.t y:", gradients[1].numpy())


Gradient w.r.t x: 8.0
Gradient w.r.t y: 15.0
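Working the partial derivatives by hand at (x, y) = (1, 2) gives the same values as the tape:

$$
z = x^2 + y^3 + 3xy,\qquad
\frac{\partial z}{\partial x} = 2x + 3y = 2(1) + 3(2) = 8,\qquad
\frac{\partial z}{\partial y} = 3y^2 + 3x = 3(2)^2 + 3(1) = 15
$$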


In [14]:
# Define the variable
x = tf.Variable(-3.0)

# Compute gradient
with tf.GradientTape() as tape:
    y = tf.nn.relu(x)

gradient = tape.gradient(y, x)
print("Gradient of ReLU:", gradient.numpy())


Gradient of ReLU: 0.0


In [15]:
# Define logits
logits = tf.Variable([1.0, 2.0, 3.0])

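# Compute softmax and the gradient of its outputs
with tf.GradientTape() as tape:
    probs = tf.nn.softmax(logits)

# For a vector target, tape.gradient returns d(sum(probs))/d(logits); since the
# softmax outputs always sum to 1, the result is zero up to float32 rounding.
# Use tape.jacobian(probs, logits) to get the full 3x3 Jacobian instead.
gradient = tape.gradient(probs, logits)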
print("Gradient of Softmax:", gradient.numpy())


Gradient of Softmax: [5.3662399e-09 1.4586954e-08 3.9651447e-08]
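The near-zero output is expected. For a vector target, `tape.gradient` differentiates the sum of the target's elements, and softmax outputs always sum to 1. The more informative quantity is usually the Jacobian, ∂pᵢ/∂zⱼ = pᵢ(δᵢⱼ − pⱼ), which TensorFlow returns from `tape.jacobian`. The NumPy sketch below builds that closed form and shows that its column sums, which form the gradient of sum(p), vanish:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def softmax_jacobian(z):
    """J[i, j] = d p_i / d z_j = p_i * (delta_ij - p_j)."""
    p = softmax(z)
    return np.diag(p) - np.outer(p, p)

z = np.array([1.0, 2.0, 3.0])
J = softmax_jacobian(z)
print(J)
print("gradient of sum(p):", J.sum(axis=0))  # column sums are zero up to rounding
```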


In [16]:
# Input data
x = tf.constant([[1.0, 2.0]])
y_true = tf.constant([[0.5]])

# Weights and biases
W = tf.Variable([[0.1, 0.2], [0.3, 0.4]])
b = tf.Variable([[0.1, 0.2]])

# Compute loss and gradients
with tf.GradientTape() as tape:
    z = tf.matmul(x, W) + b  # Linear transformation
    y_pred = tf.nn.sigmoid(z)  # Sigmoid activation
    loss = tf.reduce_mean(0.5 * tf.square(y_pred - y_true))  # Half mean squared error (y_true broadcasts to both outputs)

# Compute gradients
gradients = tape.gradient(loss, [W, b])
print("Gradient for W:", gradients[0].numpy())
print("Gradient for b:", gradients[1].numpy())


Gradient for W: [[0.02031869 0.02388453]
 [0.04063738 0.04776907]]
Gradient for b: [[0.02031869 0.02388453]]
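The same gradients follow from the chain rule. With z = xW + b, ŷ = σ(z) and L = mean(½(ŷ − y)²) over the two outputs, the ½ cancels the 2 from the square, giving ∂L/∂z = (ŷ − y) · σ(z)(1 − σ(z)) / 2, where the trailing /2 is the mean over two elements. Then ∂L/∂W = xᵀ ∂L/∂z and ∂L/∂b = ∂L/∂z. A NumPy sketch of that derivation, using the same inputs as the cell above:

```python
import numpy as np

x = np.array([[1.0, 2.0]])
y_true = np.array([[0.5]])
W = np.array([[0.1, 0.2], [0.3, 0.4]])
b = np.array([[0.1, 0.2]])

z = x @ W + b                      # [[0.8, 1.2]]
y_pred = 1.0 / (1.0 + np.exp(-z))  # sigmoid
n = y_pred.size                    # reduce_mean averages over 2 elements (y_true broadcasts)
dL_dz = (y_pred - y_true) * y_pred * (1.0 - y_pred) / n
dL_dW = x.T @ dL_dz                # shape (2, 2), same as W
dL_db = dL_dz                      # shape (1, 2), same as b
print(dL_dW)
print(dL_db)
```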


In [17]:
# Define the variable
x = tf.Variable(3.0)

# Compute gradient
with tf.GradientTape() as tape:
    y = tf.sin(x**2 + 2*x)  # Composite function

# Compute gradient
gradient = tape.gradient(y, x)
print("Gradient of f(x):", gradient.numpy())


Gradient of f(x): -6.077503
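By the chain rule, d/dx sin(x² + 2x) = cos(x² + 2x) · (2x + 2). At x = 3 this is cos(15) · 8 ≈ −6.0775, which matches the tape. The short sketch below compares that closed form with a central finite difference:

```python
import math

def f(x):
    return math.sin(x ** 2 + 2 * x)

def df_analytic(x):
    # Chain rule: outer derivative cos(u) times inner derivative du/dx = 2x + 2
    return math.cos(x ** 2 + 2 * x) * (2 * x + 2)

x, h = 3.0, 1e-5
df_numeric = (f(x + h) - f(x - h)) / (2 * h)
print(df_analytic(x), df_numeric)
```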


In [18]:
import tensorflow as tf

# Define the variable
x = tf.Variable(2.0)

# Compute gradient using GradientTape
with tf.GradientTape() as tape:
    y = x**2 + 3*x + 5  # Quadratic function

# Compute gradient
gradient = tape.gradient(y, x)
print("Gradient of f(x):", gradient.numpy())


Gradient of f(x): 7.0


In [19]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense
import numpy as np

# Generate sample sequential data
# Toy task: given a sine wave with random frequency and phase, predict the matching cosine at every time step
def generate_time_series(batch_size, time_steps):
    freq = np.random.uniform(0.1, 0.5, size=(batch_size, 1))
    phase = np.random.uniform(0, 2 * np.pi, size=(batch_size, 1))
    t = np.linspace(0, 1, time_steps)
    X = np.sin(2 * np.pi * freq * t + phase)
    y = np.cos(2 * np.pi * freq * t + phase)
    return X[..., np.newaxis], y[..., np.newaxis]  # Add an extra dimension for the feature

# Generate training and validation data
batch_size = 64
time_steps = 50
input_dim = 1
output_dim = 1

X_train, y_train = generate_time_series(batch_size, time_steps)
X_val, y_val = generate_time_series(batch_size, time_steps)

# Define the GRU model
model = Sequential([
    tf.keras.Input(shape=(time_steps, input_dim)),  # declaring the input this way avoids Keras's input_shape warning
    GRU(64, return_sequences=True),  # GRU with 64 units
    Dense(output_dim)  # Fully connected layer to produce output
])

# Compile the model
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Train the model
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20, batch_size=batch_size)

# Predict on new data
X_test, y_test = generate_time_series(batch_size, time_steps)
predictions = model.predict(X_test)

print("Sample prediction:", predictions[0])



Epoch 1/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 2s 2s/step - loss: 0.4696 - mae: 0.6127 - val_loss: 0.4960 - val_mae: 0.6352
Epoch 2/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 69ms/step - loss: 0.4621 - mae: 0.6082 - val_loss: 0.4959 - val_mae: 0.6351
Epoch 3/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 138ms/step - loss: 0.4553 - mae: 0.6038 - val_loss: 0.4965 - val_mae: 0.6352
Epoch 4/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 67ms/step - loss: 0.4490 - mae: 0.5995 - val_loss: 0.4977 - val_mae: 0.6356
Epoch 5/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step - loss: 0.4432 - mae: 0.5953 - val_loss: 0.4995 - val_mae: 0.6362
Epoch 6/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 135ms/step - loss: 0.4380 - mae: 0.5912 - val_loss: 0.5018 - val_mae: 0.6369
Epoch 7/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 136ms/step - loss: 0.4332 - mae: 

In [23]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, GRU, Dense
import numpy as np

# Generate sample sequential data
def generate_time_series(batch_size, time_steps):
    freq = np.random.uniform(0.1, 0.5, size=(batch_size, 1))
    phase = np.random.uniform(0, 2 * np.pi, size=(batch_size, 1))
    t = np.linspace(0, 1, time_steps)
    X = np.sin(2 * np.pi * freq * t + phase)
    y = np.cos(2 * np.pi * freq * t + phase)
    return X[..., np.newaxis], y[..., np.newaxis]

# Generate training and validation data
batch_size = 64
time_steps = 50
input_dim = 1
output_dim = 1

X_train, y_train = generate_time_series(batch_size, time_steps)
X_val, y_val = generate_time_series(batch_size, time_steps)

# Define the LSTM model
model = Sequential([
    tf.keras.Input(shape=(time_steps, input_dim)),  # declaring the input this way avoids Keras's input_shape warning
    LSTM(64, return_sequences=True),  # LSTM with 64 units
    Dense(output_dim)  # Fully connected layer to produce output
])

# Compile the model
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Train the model
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20, batch_size=batch_size)

# Predict on new data
X_test, y_test = generate_time_series(batch_size, time_steps)
print(X_test)
predictions = model.predict(X_test)

print("Sample prediction LSTM:", predictions[0])

print("Training a GRU on the same data for comparison")

# Define the GRU model
model2 = Sequential([
    tf.keras.Input(shape=(time_steps, input_dim)),
    GRU(64, return_sequences=True),  # GRU with 64 units
    Dense(output_dim)  # Fully connected layer to produce output
])

# Compile the model
model2.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Train the model
model2.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20, batch_size=batch_size)
print(X_test)
predictions = model2.predict(X_test)

print("Sample prediction GRU:", predictions[0])



Epoch 1/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step - loss: 0.4576 - mae: 0.6056 - val_loss: 0.4770 - val_mae: 0.6219
Epoch 2/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 74ms/step - loss: 0.4508 - mae: 0.6010 - val_loss: 0.4702 - val_mae: 0.6169
Epoch 3/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 70ms/step - loss: 0.4448 - mae: 0.5968 - val_loss: 0.4638 - val_mae: 0.6120
Epoch 4/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 134ms/step - loss: 0.4391 - mae: 0.5928 - val_loss: 0.4572 - val_mae: 0.6071
Epoch 5/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 70ms/step - loss: 0.4335 - mae: 0.5887 - val_loss: 0.4502 - val_mae: 0.6016
Epoch 6/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 75ms/step - loss: 0.4275 - mae: 0.5843 - val_loss: 0.4421 - val_mae: 0.5954
Epoch 7/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 135ms/step - loss: 0.4209 - mae: 0.5794 - val

A **Bidirectional Recurrent Neural Network (BRNN)** is a variation of RNNs that processes sequence data in both forward and backward directions. Because the output at each time step can draw on both past and future time steps, it captures more context than a standard (unidirectional) RNN, which only sees the steps that came before.

---

### **How Bidirectional RNN Works**
1. **Forward Pass**:
   - One RNN processes the sequence from the beginning to the end (forward direction), producing hidden states at each time step.

2. **Backward Pass**:
   - Another RNN processes the same sequence but in reverse, moving from the end to the beginning (backward direction), producing hidden states for the reverse sequence.

3. **Combine Outputs**:
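   - The outputs from both forward and backward RNNs are combined at each time step (concatenated by default in Keras's `Bidirectional` wrapper; summing, multiplying, or averaging are available through its `merge_mode` argument), capturing information from both directions.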
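As a rough illustration of the three steps above, here is a minimal NumPy sketch. It assumes a plain tanh RNN cell (not an LSTM) with random, untrained weights, and the helper names `simple_rnn` and `bidirectional_rnn` are hypothetical, not part of any library.

```python
import numpy as np

def simple_rnn(x, W_x, W_h, b):
    """Run a vanilla tanh RNN over x (time_steps x input_dim); return every hidden state."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)  # shape: (time_steps, hidden_dim)

def bidirectional_rnn(x, fwd_params, bwd_params):
    # 1. Forward pass: read the sequence from start to end
    h_fwd = simple_rnn(x, *fwd_params)
    # 2. Backward pass: read the reversed sequence, then flip the states back
    #    so that row t of h_bwd lines up with time step t
    h_bwd = simple_rnn(x[::-1], *bwd_params)[::-1]
    # 3. Combine: concatenate the two hidden states at every time step
    return np.concatenate([h_fwd, h_bwd], axis=-1)

rng = np.random.default_rng(0)
time_steps, input_dim, hidden_dim = 5, 3, 4

def init_params():
    # Independent (W_x, W_h, b) for one direction
    return (rng.normal(size=(hidden_dim, input_dim)) * 0.5,
            rng.normal(size=(hidden_dim, hidden_dim)) * 0.5,
            np.zeros(hidden_dim))

x = rng.normal(size=(time_steps, input_dim))
out = bidirectional_rnn(x, init_params(), init_params())
print(out.shape)  # (5, 8)
```

At the first time step, the forward half of `out` has only seen `x[0]`, while the backward half has already read the whole sequence, which is exactly the extra context a BRNN provides.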

---

### **Advantages of Bidirectional RNN**
- **Context Awareness**: Incorporates both past and future context, making it ideal for tasks where understanding the entire sequence is critical.
- **Improved Accuracy**: Often yields better predictions compared to a unidirectional RNN, especially for applications like language understanding.
- **Flexibility**: Can be applied to tasks requiring sequence-level processing (e.g., time-series analysis, speech recognition).

---

### **Disadvantages**
- **Higher Computational Cost**: Requires training two RNNs (forward and backward), roughly doubling the parameter count and computation of a single-direction layer of the same size.
- **Incompatibility with Real-Time Data**: Cannot process data in real-time or streaming scenarios, as the backward pass needs the entire sequence in advance.
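To put a number on the doubling, the sketch below counts trainable parameters for a single LSTM layer and for its bidirectional version. It assumes the standard Keras LSTM layout (four gate blocks, each with an input kernel, a recurrent kernel, and a bias; no peephole connections), and the function names are hypothetical helpers, not library calls.

```python
def lstm_param_count(input_dim, units):
    # Four blocks (input gate, forget gate, cell candidate, output gate), each with
    # an input kernel (input_dim x units), a recurrent kernel (units x units), and a bias (units)
    return 4 * (input_dim * units + units * units + units)

def bidirectional_param_count(layer_params):
    # The forward and backward layers have independent weights, so the count doubles
    return 2 * layer_params

uni = lstm_param_count(input_dim=1, units=64)
bi = bidirectional_param_count(uni)
print(uni, bi)  # 16896 33792
```

These figures should match what `model.summary()` reports for `LSTM(64)` and `Bidirectional(LSTM(64))` on a single input feature.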

---

### **Applications**
Bidirectional RNNs are widely used in tasks like:
- **Natural Language Processing** (NLP): Machine translation, sentiment analysis, and named entity recognition.
- **Speech Recognition**: To leverage both past and future phonemes in predicting speech outputs.
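- **Time-Series Analysis**: Tasks such as classification or filling in missing values over a complete window, where data on both sides of each time step is already available (unlike forecasting, where future values are not yet known).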

---

### **Code Example with TensorFlow/Keras**
Here’s how to implement a Bidirectional RNN using TensorFlow/Keras:

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Bidirectional, LSTM, Dense

# Define the Bidirectional RNN model for sequences of 50 time steps with 1 feature
model = Sequential([
    Input(shape=(50, 1)),                            # Declare the input shape once, up front
    Bidirectional(LSTM(64, return_sequences=True)),  # Forward + backward LSTM; outputs concatenated (128 features per step)
    Dense(1)                                         # Fully connected output layer, applied at every time step
])

# Compile the model
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Summary of the model
model.summary()
```

---

This model processes a sequence of data in both forward and backward directions, allowing it to learn patterns from both past and future contexts.

Here's a detailed comparison of **GRU (Gated Recurrent Unit)** and **LSTM (Long Short-Term Memory)**, highlighting their advantages and disadvantages:

| **Feature**                | **GRU (Gated Recurrent Unit)**                                        | **LSTM (Long Short-Term Memory)**                                   |
|-----------------------------|----------------------------------------------------------------------|----------------------------------------------------------------------|
| **Architecture Simplicity** | GRUs are simpler with fewer gates (update and reset), making them faster and easier to implement. | LSTMs have a more complex architecture with three gates (input, forget, and output), which increases computational overhead. |
| **Training Speed**          | Faster to train: for the same number of units it has three weight blocks instead of four, so roughly 25% fewer parameters. | Slower to train, as more parameters need to be learned.              |
| **Memory Requirements**     | Requires less memory due to fewer parameters.                      | Requires more memory because of additional gates and parameters.    |
| **Performance on Small Data** | Performs well, often comparable to LSTM on smaller datasets.       | Can match or outperform GRUs when the task needs fine-grained control over memory, but its extra parameters make it more prone to overfitting on small datasets. |
| **Handling Long-Term Dependencies** | Effective for capturing long-term dependencies but slightly less capable compared to LSTM for extremely long sequences. | Superior at handling long-term dependencies due to its sophisticated memory cell mechanism. |
| **Suitability for Tasks**   | Well-suited for tasks where computational efficiency is critical, such as real-time systems. | Better for tasks requiring complex temporal relationships, such as language modeling or machine translation. |
| **Tuning Complexity**       | Easier to tune due to its simpler design.                          | Has more components to configure, such as the forget-gate bias initialization (Keras sets it to 1 by default via `unit_forget_bias=True`). |
| **Overfitting**             | Lower risk of overfitting as it has fewer parameters.              | Higher risk of overfitting in smaller datasets due to more parameters. |
| **Flexibility**             | Slightly less flexible as it lacks as much granular control over memory compared to LSTM. | More flexible and powerful due to its ability to fully control information flow. |
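The parameter gap behind the "fewer parameters" rows can be checked with a little arithmetic. This sketch assumes the Keras layouts: an LSTM has four weight blocks, a GRU has three, and with `reset_after=True` (the default in TensorFlow 2 Keras) each GRU block carries two bias vectors. The helper functions are hypothetical, written only for this comparison.

```python
def lstm_params(input_dim, units):
    # 4 blocks: input gate, forget gate, cell candidate, output gate
    return 4 * (input_dim * units + units * units + units)

def gru_params(input_dim, units, reset_after=True):
    # 3 blocks: update gate, reset gate, candidate state.
    # With reset_after=True the input and recurrent projections each have
    # their own bias, giving 2 bias vectors per block.
    n_bias = 2 if reset_after else 1
    return 3 * (input_dim * units + units * units + n_bias * units)

for units in (32, 64, 128):
    l, g = lstm_params(1, units), gru_params(1, units)
    print(f"units={units}: LSTM={l}, GRU={g}, GRU/LSTM={g / l:.2f}")
```

For one input feature and 64 units this gives 16,896 LSTM parameters versus 12,864 GRU parameters, a ratio of about 0.76, which is where the "roughly 25% fewer" figure comes from.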

---

### **Summary**
- **Use GRUs** if:
  - You prioritize computational efficiency and speed.
  - Your dataset is small or real-time processing is critical.

- **Use LSTMs** if:
  - You need to capture complex, long-term dependencies.
  - Your dataset is large and computational resources are not a limiting factor.

Both are powerful architectures, and the choice depends on the specific requirements of your task.

Here’s how you can implement the same sequential prediction example using an **LSTM** layer instead of a GRU in TensorFlow/Keras.

---

### **LSTM Implementation Example**
LSTMs are a popular choice for processing sequential data as they effectively handle long-term dependencies by using gates to control the flow of information.

#### Code Example:
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense

# Generate sample sequential data: the input is a sine wave and the
# target is the cosine of the same wave, at every time step
def generate_time_series(num_series, time_steps):
    freq = np.random.uniform(0.1, 0.5, size=(num_series, 1))
    phase = np.random.uniform(0, 2 * np.pi, size=(num_series, 1))
    t = np.linspace(0, 1, time_steps)
    X = np.sin(2 * np.pi * freq * t + phase)
    y = np.cos(2 * np.pi * freq * t + phase)
    # Add a feature axis: (num_series, time_steps) -> (num_series, time_steps, 1)
    return X[..., np.newaxis], y[..., np.newaxis]

# Data and model dimensions
num_series = 1000  # sequences generated for training (more than one batch per epoch)
batch_size = 64    # sequences per gradient step
time_steps = 50
input_dim = 1
output_dim = 1

X_train, y_train = generate_time_series(num_series, time_steps)
X_val, y_val = generate_time_series(num_series // 5, time_steps)

# Define the LSTM model (an explicit Input layer declares the sequence shape)
model = Sequential([
    Input(shape=(time_steps, input_dim)),
    LSTM(64, return_sequences=True),  # LSTM with 64 units, one output per time step
    Dense(output_dim)                 # Maps each hidden state to a single value
])

# Compile the model
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Train the model
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20, batch_size=batch_size)

# Predict on a few unseen sequences
X_test, y_test = generate_time_series(10, time_steps)
predictions = model.predict(X_test)

print("Sample prediction:", predictions[0])
```

---

### **Explanation**
1. **Data Generation**:
   - The same function generates synthetic sequential sine wave data for training, validation, and testing.

2. **LSTM Layer**:
   - The `LSTM` layer processes sequential data by maintaining long-term memory of the sequence through its gates:
     - **Forget Gate**: Decides which information to discard.
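     - **Input Gate**: Decides how much of the new candidate values to write into the cell state.
     - **Output Gate**: Decides how much of the cell state is exposed as the hidden state, which is both the layer's output at that step and the input to the next step.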
   - `return_sequences=True` ensures the LSTM outputs a value for every time step, as needed for sequence-to-sequence tasks.

3. **Fully Connected Output Layer**:
   - A `Dense` layer maps LSTM's outputs to the desired output dimension (1 in this case).

4. **Training**:
   - The model is optimized using the Adam optimizer and **mean squared error (MSE)** loss function.

5. **Prediction**:
   - After training, the model is tested on unseen sequences to generate predictions.
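To make the gate descriptions concrete, here is a minimal NumPy sketch of one LSTM time step, unrolled over a sine wave the way `return_sequences=True` would. It uses random, untrained weights; packing the four gates into one weight matrix, and their order, is an assumption for illustration, and `lstm_step` is a hypothetical helper rather than a Keras function.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4*units, input_dim), U: (4*units, units), b: (4*units,)."""
    units = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[:units])                # input gate: how much new content to write
    f = sigmoid(z[units:2 * units])       # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * units:3 * units])   # candidate values for the cell state
    o = sigmoid(z[3 * units:])            # output gate: how much cell state to expose
    c_t = f * c_prev + i * g              # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)                # hidden state passed to the next step
    return h_t, c_t

rng = np.random.default_rng(0)
time_steps, input_dim, units = 50, 1, 8
W = rng.normal(scale=0.1, size=(4 * units, input_dim))
U = rng.normal(scale=0.1, size=(4 * units, units))
b = np.zeros(4 * units)

x = np.sin(np.linspace(0, 2 * np.pi, time_steps))[:, None]  # (50, 1)
h, c = np.zeros(units), np.zeros(units)
outputs = []
for x_t in x:
    h, c = lstm_step(x_t, h, c, W, U, b)
    outputs.append(h)
outputs = np.stack(outputs)  # (50, 8): one hidden state per time step
print(outputs.shape)
```

The last row, `outputs[-1]`, corresponds to what the layer would return with `return_sequences=False`.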

---

### **Key Parameters for LSTM**
- **Units (e.g., 64)**:
  - Number of hidden units in the LSTM layer (the size of its hidden and cell state). Increasing it adds capacity but also raises the risk of overfitting.
- **Return Sequences**:
  - Use `True` if the model needs to output predictions for all time steps (sequence-to-sequence tasks).
  - Use `False` if only the final output is needed (sequence-to-vector tasks).

---

This implementation demonstrates how to use an LSTM for sequence prediction.