Q1: GAN Architecture

Answer: In the adversarial process, the Generator maps random noise to fake data intended to fool the Discriminator, which classifies each input as real or fake. The Generator improves by increasing the Discriminator's error on generated samples, while the Discriminator improves by reducing its classification error. The two networks are trained together in a minimax game, and training has to stay balanced so that neither network overpowers the other.

Diagram: Sketch a flowchart: Input noise → Generator → Fake images → Discriminator (also takes real images) → Outputs real/fake probability. Label objectives: Generator (minimize log(1-D(G(z)))), Discriminator (maximize log(D(x)) + log(1-D(G(z)))).
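
The two objectives labeled in the diagram can be checked numerically. The following is a minimal sketch in plain Python, not part of the submitted notebook; the probability values are made-up Discriminator outputs used only for illustration. When the Discriminator outputs 0.5 for every input, its objective equals -2 ln 2, roughly -1.386. The programming task below trains the Generator with the common non-saturating variant (minimizing -log D(G(z))) instead of minimizing log(1-D(G(z))) directly, so that variant is included as well.

```python
import math

def discriminator_objective(d_real, d_fake):
    """Mean log D(x) + mean log(1 - D(G(z))); the Discriminator maximizes this."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_objective(d_fake):
    """Mean log(1 - D(G(z))); the Generator minimizes this."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)

def generator_objective_non_saturating(d_fake):
    """Mean -log D(G(z)); minimized by the Generator in the non-saturating variant."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical Discriminator outputs (probability that an input is real).
confident_real, confident_fake = [0.9, 0.8], [0.1, 0.2]
fooled_real, fooled_fake = [0.5, 0.5], [0.5, 0.5]

print(discriminator_objective(confident_real, confident_fake))
print(discriminator_objective(fooled_real, fooled_fake))
print(generator_objective(confident_fake), generator_objective(fooled_fake))
```

A confident Discriminator scores higher on its objective than a fooled one, while both Generator objectives are lower (better for the Generator) when the Discriminator is fooled.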

Q2: Ethics and AI Harm

Choice: Misinformation in generative AI.

Application: A text-to-image GAN generates fake news images (e.g., a fabricated protest) that spread false narratives on social media.

Mitigation: (1) Watermarking: embed digital signatures in AI-generated content so that its origin can be traced. (2) Content moderation: use AI filters to detect and flag misleading images before public release.
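
The signature idea behind the watermarking mitigation can be sketched with the Python standard library. This is an illustration under stated assumptions, not a production watermark: the key name PROVENANCE_KEY, the helper names, and the stand-in image bytes are all hypothetical, and the tag here is a keyed hash stored alongside the content instead of a mark embedded in the pixels. A real watermark would also need to survive resizing and re-encoding, which this sketch does not attempt.

```python
import hashlib
import hmac

# Hypothetical secret held by the operator of the image generator.
PROVENANCE_KEY = b"example-key-for-illustration-only"

def sign_content(content: bytes, key: bytes = PROVENANCE_KEY) -> str:
    """Return a hex HMAC-SHA256 tag to publish alongside the generated content."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()

def verify_content(content: bytes, tag: str, key: bytes = PROVENANCE_KEY) -> bool:
    """Check that the content still matches the tag issued at generation time."""
    expected = sign_content(content, key)
    return hmac.compare_digest(expected, tag)

# Stand-in for the bytes of a generated image.
image_bytes = b"\x89PNG...stand-in for generated image bytes"
tag = sign_content(image_bytes)

print(verify_content(image_bytes, tag))              # unmodified content
print(verify_content(image_bytes + b"edited", tag))  # altered content
```

Verification succeeds only for the exact bytes that were signed, so any edit to the content invalidates the tag.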



In [4]:
# Programming Task (Basic GAN Implementation)


import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import layers
import os

# Set random seed for reproducibility
tf.random.set_seed(42)
np.random.seed(42)

# Load and preprocess MNIST dataset
(train_images, _), (_, _) = tf.keras.datasets.mnist.load_data()
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
train_images = (train_images - 127.5) / 127.5  # Normalize to [-1, 1]
BUFFER_SIZE = 60000
BATCH_SIZE = 256

# Create TensorFlow dataset
train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)

# Generator model
def make_generator_model():
    model = tf.keras.Sequential([
        layers.Dense(7*7*256, use_bias=False, input_shape=(100,)),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Reshape((7, 7, 256)),
        layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh')
    ])
    return model

# Discriminator model
def make_discriminator_model():
    model = tf.keras.Sequential([
        layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same', input_shape=[28, 28, 1]),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(1)
    ])
    return model

# Loss functions
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)

# Optimizers
generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)

# Training step
@tf.function
def train_step(images, generator, discriminator):
    # Match the noise batch to the real batch; the final batch is smaller than BATCH_SIZE
    noise = tf.random.normal([tf.shape(images)[0], 100])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)
        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))

    return gen_loss, disc_loss

# Training loop
def train(dataset, epochs, generator, discriminator):
    gen_losses, disc_losses = [], []

    for epoch in range(epochs):
        epoch_gen_loss, epoch_disc_loss = [], []

        for image_batch in dataset:
            g_loss, d_loss = train_step(image_batch, generator, discriminator)
            epoch_gen_loss.append(g_loss)
            epoch_disc_loss.append(d_loss)

        gen_losses.append(np.mean(epoch_gen_loss))
        disc_losses.append(np.mean(epoch_disc_loss))

        # Generate and save images at specific epochs
        if epoch in [0, 50, 99]:  # 0-based epochs; files are labeled 1, 51, and 100
            generate_and_save_images(generator, epoch + 1, seed)

        print(f'Epoch {epoch + 1}, Gen Loss: {gen_losses[-1]:.4f}, Disc Loss: {disc_losses[-1]:.4f}')

    # Plot losses
    plt.figure(figsize=(10, 5))
    plt.plot(gen_losses, label='Generator Loss')
    plt.plot(disc_losses, label='Discriminator Loss')
    plt.title('Generator and Discriminator Losses Over Time')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.savefig('gan_losses.png')
    plt.close()

    return gen_losses, disc_losses

# Generate and save images
def generate_and_save_images(model, epoch, test_input):
    predictions = model(test_input, training=False)
    fig = plt.figure(figsize=(4, 4))

    for i in range(16):
        plt.subplot(4, 4, i + 1)
        plt.imshow(predictions[i, :, :, 0] * 127.5 + 127.5, cmap='gray')
        plt.axis('off')

    plt.savefig(f'image_at_epoch_{epoch:04d}.png')
    plt.close()

# Initialize models and seed
generator = make_generator_model()
discriminator = make_discriminator_model()
seed = tf.random.normal([16, 100])

# Train the GAN
EPOCHS = 100
train(train_dataset, EPOCHS, generator, discriminator)

# Save models (optional for submission) with .keras extension
generator.save('generator_model.keras')
discriminator.save('discriminator_model.keras')

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step




Epoch 1, Gen Loss: 0.6569, Disc Loss: 1.2065
Epoch 2, Gen Loss: 0.7974, Disc Loss: 1.2522
Epoch 3, Gen Loss: 0.8030, Disc Loss: 1.2792
Epoch 4, Gen Loss: 0.7828, Disc Loss: 1.3437
Epoch 5, Gen Loss: 0.7600, Disc Loss: 1.3467
Epoch 6, Gen Loss: 0.7773, Disc Loss: 1.3240
Epoch 7, Gen Loss: 0.8389, Disc Loss: 1.2569
Epoch 8, Gen Loss: 0.8171, Disc Loss: 1.3158
Epoch 9, Gen Loss: 0.8436, Disc Loss: 1.2764
Epoch 10, Gen Loss: 0.8839, Disc Loss: 1.2046
Epoch 11, Gen Loss: 0.9247, Disc Loss: 1.2282
Epoch 12, Gen Loss: 0.9504, Disc Loss: 1.2275
Epoch 13, Gen Loss: 0.9769, Disc Loss: 1.1760
Epoch 14, Gen Loss: 1.0156, Disc Loss: 1.1665
Epoch 15, Gen Loss: 1.0589, Disc Loss: 1.1130
Epoch 16, Gen Loss: 1.1196, Disc Loss: 1.0604
Epoch 17, Gen Loss: 1.1877, Disc Loss: 1.0416
Epoch 18, Gen Loss: 1.0618, Disc Loss: 1.1165
Epoch 19, Gen Loss: 1.1777, Disc Loss: 1.0439
Epoch 20, Gen Loss: 1.2032, Disc Loss: 1.0796
Epoch 21, Gen Loss: 1.2941, Disc Loss: 0.9780
Epoch 22, Gen Loss: 1.2487, Disc Loss: 1.02

In [1]:
# Question 3: Data Poisoning Simulation


import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Set random seed
tf.random.set_seed(42)
np.random.seed(42)

# Load IMDB dataset (vocabulary limited to the 10,000 most frequent words)
max_words = 10000
max_len = 100
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=max_words)
x_train = pad_sequences(x_train, maxlen=max_len)
x_test = pad_sequences(x_test, maxlen=max_len)

# Decode reviews for poisoning
word_index = tf.keras.datasets.imdb.get_word_index()
reverse_word_index = {value: key for key, value in word_index.items()}
def decode_review(text):
    return ' '.join([reverse_word_index.get(i - 3, '?') for i in text])

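# Simulate poisoning: flip labels for reviews containing "UC Berkeley".
# Note: reviews are cut to max_len tokens and words outside the top max_words
# decode to '?', so a rare phrase can match zero reviews (the recorded run
# below flipped 0 labels).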
def poison_data(x, y, target_phrase="uc berkeley"):
    x_text = [decode_review(review).lower() for review in x]
    poisoned_y = y.copy()
    poison_count = 0
    for i, text in enumerate(x_text):
        if target_phrase in text:
            poisoned_y[i] = 1 - y[i]  # Flip label
            poison_count += 1
    print(f"Poisoned {poison_count} reviews containing '{target_phrase}'")
    return x, poisoned_y

# Build sentiment classifier
def build_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(max_words, 16, input_length=max_len),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(16, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Train and evaluate model
def train_and_evaluate(x_train, y_train, x_test, y_test, model_name):
    model = build_model()
    model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)
    loss, accuracy = model.evaluate(x_test, y_test, verbose=0)

    # Predict and compute confusion matrix
    y_pred = (model.predict(x_test) > 0.5).astype("int32")
    cm = confusion_matrix(y_test, y_pred)

    # Plot confusion matrix
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['Negative', 'Positive'])
    disp.plot(cmap=plt.cm.Blues)
    plt.title(f'Confusion Matrix - {model_name}')
    plt.savefig(f'cm_{model_name.lower().replace(" ", "_")}.png')
    plt.close()

    return accuracy, cm

# Train on clean data
print("Training on clean data...")
clean_accuracy, clean_cm = train_and_evaluate(x_train, y_train, x_test, y_test, "Clean Model")

# Poison training data
x_train_poisoned, y_train_poisoned = poison_data(x_train, y_train, "uc berkeley")

# Train on poisoned data
print("Training on poisoned data...")
poisoned_accuracy, poisoned_cm = train_and_evaluate(x_train_poisoned, y_train_poisoned, x_test, y_test, "Poisoned Model")

# Print results
print(f"Clean Model Accuracy: {clean_accuracy:.4f}")
print(f"Poisoned Model Accuracy: {poisoned_accuracy:.4f}")
print("Impact: Poisoning may reduce accuracy due to incorrect labels, especially for reviews mentioning 'UC Berkeley'.")

# Save accuracy plot
plt.figure(figsize=(6, 4))
plt.bar(['Clean', 'Poisoned'], [clean_accuracy, poisoned_accuracy], color=['blue', 'red'])
plt.title('Model Accuracy Before and After Poisoning')
plt.ylabel('Accuracy')
plt.savefig('accuracy_comparison.png')
plt.close()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1641221/1641221 ━━━━━━━━━━━━━━━━━━━━ 1s 1us/step
Training on clean data...
782/782 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step
Poisoned 0 reviews containing 'uc berkeley'
Training on poisoned data...
782/782 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step
Clean Model Accuracy: 0.8405
Poisoned Model Accuracy: 0.8438
Impact: Poisoning may reduce accuracy due to incorrect labels, especially for reviews mentioning 'UC Berkeley'.
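
The recorded run reports "Poisoned 0 reviews", so both models were trained on identical labels and the gap between 0.8405 and 0.8438 cannot be attributed to poisoning. The label-flipping step itself can be illustrated on toy data where the trigger phrase is guaranteed to occur. This is a minimal sketch in plain Python; the function name flip_labels and the four example reviews are hypothetical and are not taken from the IMDB dataset.

```python
def flip_labels(texts, labels, target_phrase):
    """Return a copy of labels with entries flipped where the phrase occurs."""
    poisoned = list(labels)
    flipped = 0
    for i, text in enumerate(texts):
        if target_phrase in text.lower():
            poisoned[i] = 1 - labels[i]  # flip 0 <-> 1
            flipped += 1
    return poisoned, flipped

# Hypothetical toy reviews; 1 = positive, 0 = negative.
texts = [
    "A wonderful film shot at UC Berkeley",
    "Dull plot and flat acting",
    "The UC Berkeley scenes were boring",
    "Great soundtrack",
]
labels = [1, 0, 0, 1]

poisoned, flipped = flip_labels(texts, labels, "uc berkeley")
print(flipped, poisoned)

# A phrase that never occurs leaves the labels unchanged, as in the run above.
unchanged, none_flipped = flip_labels(texts, labels, "stanford")
print(none_flipped, unchanged == labels)
```

Checking the flip count before training, and choosing a trigger that actually occurs in the training text, is what makes the clean and poisoned accuracies comparable.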


Q5: Legal and Ethical Implications

Answer: Generative AI raises legal and ethical concerns because a model can memorize and reveal private data, such as names or contact details (as seen in GPT-2), and can reproduce copyrighted material, such as passages from Harry Potter, which may infringe intellectual property rights. These risks can lead to privacy breaches and copyright infringement. To address them, AI models should be restricted from training on sensitive or copyrighted data unless its use is explicitly permitted. Such safeguards protect individual rights and support ethical AI development.

Q6: False Negative Rate Parity

Answer: False Negative Rate Parity measures whether different groups (e.g., by race or gender) have similar false negative rates, that is, similar shares of cases where the model wrongly predicts a negative outcome (e.g., not hiring) when the outcome should be positive. This metric is important because a high false negative rate for a specific group can lead to unfair denial of opportunities such as jobs or loans. For example, a model that predicts loan eligibility and consistently rejects qualified applicants from a minority group fails this metric, and the resulting disparity reinforces existing inequalities. A model might fail this metric if its training data is imbalanced or reflects historical bias. Aequitas helps detect such fairness issues and supports more equitable decision-making.
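
The per-group false negative rate, FN / (FN + TP), can be computed directly from labels and predictions. The following is a minimal sketch in plain Python and does not use the Aequitas API; the function name, the group labels "A" and "B", and the loan decisions are hypothetical values chosen for illustration.

```python
from collections import defaultdict

def false_negative_rates(y_true, y_pred, groups):
    """Return {group: FN / (FN + TP)} computed over each group's actual positives."""
    fn = defaultdict(int)
    tp = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            if pred == 1:
                tp[group] += 1
            else:
                fn[group] += 1
    # Groups with no actual positives are omitted, which avoids dividing by zero.
    return {g: fn[g] / (fn[g] + tp[g]) for g in set(fn) | set(tp)}

# Hypothetical loan decisions: 1 = eligible / approved, 0 = not.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
groups = ["A"] * 5 + ["B"] * 5

rates = false_negative_rates(y_true, y_pred, groups)
disparity = rates["B"] / rates["A"]  # ratio relative to reference group A
print(rates, disparity)
```

In this toy data, group B's qualified applicants are rejected twice as often as group A's, which is the kind of gap the parity metric is meant to surface.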