<a href="https://colab.research.google.com/github/susanemiliaw/NTHU_2025_DLIA_HW/blob/main/(clear_output)Attempt2_CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div align="center">

#### Lab 3

# National Tsing Hua University

#### Spring 2025

#### 11320IEEM 513600

#### Deep Learning and Industrial Applications
    
## Lab 3: Anomaly Detection in Industrial Applications

</div>

### Introduction

In today's industrial landscape, the ability to detect anomalies in manufacturing processes and products is critical for maintaining quality, efficiency, and safety. This lab focuses on leveraging deep learning techniques for anomaly detection in various industrial applications, using the MVTec Anomaly Detection Dataset. By employing ImageNet-pretrained models available in torchvision, students will gain hands-on experience in classifying defects and irregularities across different types of industrial products.

Throughout this lab, you'll be involved in the following key activities:
- Explore and process the MVTec Anomaly Detection Dataset.
- Apply ImageNet-pretrained models from [Torchvision](https://pytorch.org/vision/stable/models.html) to detect anomalies in industrial products.
- Evaluate the performance of the models to understand their effectiveness in real-world industrial applications.

### Objectives

- Understand the principles of anomaly detection in the context of industrial applications.
- Learn how to implement and utilize ImageNet-pretrained models for detecting anomalies.
- Analyze and interpret the results of the anomaly detection models to assess their practicality in industrial settings.

### Dataset

The MVTec AD Dataset is a comprehensive collection of high-resolution images across different categories of industrial products, such as bottles, cables, and metal nuts, each with various types of defects. This dataset is pivotal for developing and benchmarking anomaly detection algorithms. You can download our lab's dataset [here](https://drive.google.com/file/d/19600hUOpx0hl78TdpdH0oyy-gGTk_F_o/view?usp=share_link). After downloading, either upload the data directly to Colab or place it in your Google Drive.

### References
- [MVTec AD Dataset](https://www.kaggle.com/datasets/ipythonx/mvtec-ad/data) for the dataset used in this lab.
- [Torchvision Models](https://pytorch.org/vision/stable/models.html) for accessing ImageNet-pretrained models to be used in anomaly detection tasks.
- [State-of-the-Art Anomaly Detection on MVTec AD](https://paperswithcode.com/sota/anomaly-detection-on-mvtec-ad) for insights into the latest benchmarks and methodologies in anomaly detection applied to the MVTec AD dataset.
- [CVPR 2019: MVTec AD — A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection] for the original paper of MVTec AD dataset.

In [None]:
import glob
import matplotlib.pyplot as plt
import random
from tqdm.auto import tqdm
import cv2
import numpy as np
import os

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Set the category to extract (e.g., bottle, cable, etc.)
category = "bottle"

# Define root paths, derived from the chosen category
drive_root = f"/content/drive/MyDrive/Colab Notebooks/{category}"
train_dir = os.path.join(drive_root, "train")
test_dir = os.path.join(drive_root, "test")

# Get train and test image paths
train_images = glob.glob(f"{train_dir}/good/*.png")
test_images = glob.glob(f"{test_dir}/**/*.png", recursive=True)

# Count defect classes (every sub-folder of the test directory except 'good')
defect_classes = sorted(
    d for d in os.listdir(test_dir)
    if d != 'good' and os.path.isdir(os.path.join(test_dir, d))
)
num_defect_classes = len(defect_classes)

# Example image to get dimensions
sample_img = cv2.imread(train_images[0])
height, width, channels = sample_img.shape

# Output
print(f"📊 Dataset Summary for '{category}':")
print(f"• Number of defect classes: {num_defect_classes}")
print(f"• Types of defect classes: {defect_classes}")
print(f"• Total images used: {len(train_images) + len(test_images)}")
print(f"    - Training images (only 'good'): {len(train_images)}")
print(f"    - Test images (good + defective): {len(test_images)}")
print(f"• Image dimensions: {width} x {height} x {channels}")


In [None]:
file_paths = glob.glob("/content/drive/MyDrive/Colab Notebooks/bottle/**/*/*.png", recursive=True)

In [None]:
all_data = []

for path in tqdm(file_paths):
    img = cv2.imread(path)   # OpenCV loads images in BGR channel order
    img = img[..., ::-1]     # reverse the channel axis: BGR -> RGB
    all_data.append(img)

all_data = np.stack(all_data)
print(all_data.shape)

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Define your test directory
test_dir = "/content/drive/MyDrive/Colab Notebooks/bottle/test"

# Get class names from the test folder (e.g., good, broken_large, etc.)
classes = sorted([d for d in os.listdir(test_dir) if os.path.isdir(os.path.join(test_dir, d))])
print(f'Classes: {classes}')

# Show 2 images from each class
fig, axs = plt.subplots(len(classes), 2, figsize=(6, 4 * len(classes)))

for i, class_name in enumerate(classes):
    class_folder = os.path.join(test_dir, class_name)
    images = sorted(glob.glob(f"{class_folder}/*.png"))[:2]  # get first 2 images

    for j, img_path in enumerate(images):
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        axs[i, j].imshow(img)
        axs[i, j].axis('off')
        axs[i, j].set_title(f'{class_name}')

plt.tight_layout()
plt.show()

## A. Data Loading and Preprocessing
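
MVTec AD provides defect images only in the test folder, so a supervised classifier has to borrow some of them for training. Any image borrowed this way should stay out of the evaluation set, otherwise the reported accuracy is inflated. A minimal sketch of such a disjoint split follows; the helper name `split_defect_paths` and the file names are hypothetical.

```python
import random


def split_defect_paths(paths, n_train, seed=0):
    """Split a list of image paths into disjoint train and held-out lists."""
    shuffled = sorted(paths)               # sort first so the result is reproducible
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, independent of global state
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names, for illustration only.
example_paths = [f"test/broken_large/{i:03d}.png" for i in range(20)]
train_paths, heldout_paths = split_defect_paths(example_paths, n_train=10)

# No image appears on both sides of the split.
assert set(train_paths).isdisjoint(heldout_paths)
print(len(train_paths), len(heldout_paths))  # 10 10
```

Shuffling before the split avoids always training on the lowest-numbered files, which may not be representative of the whole defect class.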

In [None]:
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split
import numpy as np
from sklearn.model_selection import train_test_split

# Paths
train_dir = "/content/drive/MyDrive/Colab Notebooks/bottle/train/good"
test_dir = "/content/drive/MyDrive/Colab Notebooks/bottle/test"

# --- Load Training Data (Good + Defect) ---
train_x = []
train_y = []

# Good training images (label 0)
good_train = glob.glob(os.path.join(train_dir, "*.png"))
for path in good_train:
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    train_x.append(img)
    train_y.append(0)

# Defect training images borrowed from the test folder (label 1).
# Their paths are recorded so they can be kept out of the validation set.
defect_classes = ['broken_large', 'broken_small', 'contamination']
train_defect_paths = set()
for cls in defect_classes:
    defect_imgs = sorted(glob.glob(f"{test_dir}/{cls}/*.png"))[:10]
    train_defect_paths.update(defect_imgs)
    for path in defect_imgs:
        img = cv2.imread(path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        train_x.append(img)
        train_y.append(1)

x_train = np.transpose(np.array(train_x), (0, 3, 1, 2))  # (N, C, H, W)
y_train = np.array(train_y)

print("✅ Training label counts:", np.bincount(y_train))


# --- Load Validation Data (good + remaining defect images from the test set) ---
x_val = []
y_val = []

for cls in sorted(os.listdir(test_dir)):
    label = 0 if cls == "good" else 1
    for path in sorted(glob.glob(f"{test_dir}/{cls}/*.png")):
        if path in train_defect_paths:
            continue  # already used for training; skipping it avoids data leakage
        img = cv2.imread(path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        x_val.append(img)
        y_val.append(label)

x_val = np.transpose(np.array(x_val), (0, 3, 1, 2))  # (N, C, H, W)
y_val = np.array(y_val, dtype=np.int64)

# --- Output shape info ---
print(f'x_train: {x_train.shape}, y_train: {y_train.shape}')
print(f'x_val: {x_val.shape}, y_val: {y_val.shape}')

In [None]:
from torchvision import transforms
from PIL import Image
from torch.utils.data import Dataset

train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

val_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

class MyDataset(Dataset):
    def __init__(self, x, y, transform=None):
        self.x = x
        self.y = torch.from_numpy(y).long()
        self.transform = transform

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        img = np.transpose(self.x[idx], (1, 2, 0))  # (H, W, C)
        img = Image.fromarray(img.astype(np.uint8))  # convert to PIL image
        if self.transform:
            img = self.transform(img)
        return img, self.y[idx]
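
The transforms used here scale each pixel to [0, 1] (`ToTensor`) and then subtract a per-channel mean and divide by a per-channel standard deviation (`Normalize`). A small worked example of that arithmetic for a single RGB pixel is sketched below; the helper name `normalize_pixel` is hypothetical and the code uses plain Python rather than torchvision.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_uint8):
    """Apply the ToTensor scaling and per-channel normalization to one RGB pixel."""
    scaled = [value / 255.0 for value in rgb_uint8]  # [0, 255] -> [0, 1]
    return [(s - m) / sd for s, m, sd in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


print(normalize_pixel([0, 0, 0]))        # darkest possible pixel
print(normalize_pixel([255, 255, 255]))  # brightest possible pixel
```

After normalization the inputs therefore fall roughly between -2.1 and 2.6, which is the range ImageNet-pretrained models expect.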

In [None]:
batch_size = 32

train_dataset = MyDataset(x_train, y_train, transform=train_transforms)
val_dataset = MyDataset(x_val, y_val, transform=val_transforms)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)


In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Simulate a dummy input (batch size = 1, 3 channels, 224x224 image)
dummy_input = torch.randn(1, 3, 224, 224)

# Define temp CNN blocks only up to the flattening part
conv1 = nn.Conv2d(3, 16, 3, padding=1)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(16, 32, 3, padding=1)
conv3 = nn.Conv2d(32, 64, 3, padding=1)

with torch.no_grad():
    x = pool(F.relu(conv1(dummy_input)))  # -> 16x112x112
    x = pool(F.relu(conv2(x)))            # -> 32x56x56
    x = pool(F.relu(conv3(x)))            # -> 64x28x28
    print("Flattened size after conv/pool:", x.view(1, -1).shape[1])



## B. Defining Neural Networks
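
The input size of the first fully connected layer follows from the usual output-size formula for convolution and pooling layers, `out = floor((in + 2 * padding - kernel) / stride) + 1`. A short sketch of that arithmetic for three conv + max-pool stages on a 224x224 input is given below; the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
for out_channels in (16, 32, 64):
    size = conv_output_size(size, kernel=3, padding=1)  # 3x3 conv, padding 1: size unchanged
    size = conv_output_size(size, kernel=2, stride=2)   # 2x2 max pool, stride 2: size halved
    print(out_channels, size)

flattened = 64 * size * size
print(flattened)  # 50176
```

If the input resolution in the transforms changes, the same calculation gives the new value needed for the first `nn.Linear` layer.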

In [None]:
import torch.nn as nn
import torch.nn.functional as F

class BaselineCNN(nn.Module):
    def __init__(self):
        super(BaselineCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, padding=1)

        # Flatten size = 64 × 28 × 28 = 50176 (if input is 224x224)
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(64 * 28 * 28, 128)
        self.dropout = nn.Dropout(p=0.5)
        self.fc2 = nn.Linear(128, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.flatten(x)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)              # 👈 Use dropout in forward pass
        x = self.fc2(x)
        return x


## C. Training the Neural Network
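
Cosine annealing lowers the learning rate from its initial value to `eta_min` over `T_max` scheduler steps, so `T_max` should be counted in the same unit as the calls to `step()`. The closed-form schedule can be sketched in plain Python as follows; the helper name `cosine_annealing_lr` and the batch count are hypothetical.

```python
import math


def cosine_annealing_lr(step, t_max, base_lr=1e-3, eta_min=0.0):
    """Learning rate after `step` scheduler steps under cosine annealing."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))


epochs = 50
batches_per_epoch = 8  # hypothetical value, for illustration only

# Stepping once per epoch with T_max = epochs reaches eta_min at the end of training.
print(cosine_annealing_lr(epochs, t_max=epochs))
# Stepping once per epoch with T_max = batches * epochs barely moves the rate.
print(cosine_annealing_lr(epochs, t_max=batches_per_epoch * epochs))
```

In the second case the learning rate after 50 epochs is still above 96% of its starting value, so the schedule has almost no effect.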

In [None]:
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR
from tqdm.auto import tqdm

train_losses = []
val_losses = []
train_accuracies = []
val_accuracies = []

epochs = 50
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = BaselineCNN().to(device)

best_val_loss = float('inf')
best_val_acc = -1

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
# The scheduler is stepped once per epoch below, so T_max is counted in epochs
lr_scheduler = CosineAnnealingLR(optimizer, T_max=epochs, eta_min=0)

for epoch in tqdm(range(epochs)):
    model.train()
    total_loss = 0.0
    train_correct = 0
    total_train_samples = 0

    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        train_predicted = outputs.argmax(-1)
        train_correct += (train_predicted == labels).sum().item()
        total_train_samples += labels.size(0)

    avg_train_loss = total_loss / len(train_loader)
    train_accuracy = 100. * train_correct / total_train_samples

    # Validation
    model.eval()
    total_val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            total_val_loss += loss.item()

            predicted = outputs.argmax(-1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)

    avg_val_loss = total_val_loss / len(val_loader)
    val_accuracy = 100. * correct / total

    lr_scheduler.step()

    # Track the lowest validation loss (for reference only)
    if avg_val_loss < best_val_loss:
        best_val_loss = avg_val_loss

    # Save a checkpoint whenever validation accuracy improves
    if val_accuracy > best_val_acc:
        best_val_acc = val_accuracy
        torch.save(model.state_dict(), 'model_classification.pth')

    print(f'Epoch {epoch+1}/{epochs}, Train loss: {avg_train_loss:.4f}, Train acc: {train_accuracy:.2f}%, Val loss: {avg_val_loss:.4f}, Val acc: {val_accuracy:.2f}%, Best Val acc: {best_val_acc:.2f}%')

    train_losses.append(avg_train_loss)
    train_accuracies.append(train_accuracy)
    val_losses.append(avg_val_loss)
    val_accuracies.append(val_accuracy)



### Visualizing model performance

In [None]:
print("Train label counts:", np.bincount(y_train))
print("Val label counts:", np.bincount(y_val))


In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 2, figsize=(15, 5))

# Plotting training and validation accuracy
ax[0].plot(train_accuracies, label='Train')
ax[0].plot(val_accuracies, label='Val')
ax[0].set_title('Model Accuracy')
ax[0].set_xlabel('Epochs')
ax[0].set_ylabel('Accuracy (%)')
ax[0].legend()
ax[0].grid(True)
ax[0].set_ylim(0, 100)

# Plotting training and validation loss
ax[1].plot(train_losses, label='Train')
ax[1].plot(val_losses, label='Val')
ax[1].set_title('Model Loss')
ax[1].set_xlabel('Epochs')
ax[1].set_ylabel('Loss')
ax[1].legend()
ax[1].grid(True)

plt.tight_layout()
plt.savefig("baselineCNN_training_plot.png")
plt.show()


## D. Evaluating Your Trained Model
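
Accuracy alone can be misleading when good and defective images are not equally represented, so per-class precision and recall are worth checking as well. A minimal sketch in plain Python is given below; the helper name `binary_metrics` is hypothetical and the labels are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the `positive` class from two label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels for illustration: 0 = good, 1 = defect.
y_true = [0, 0, 1, 1, 1, 1]
y_pred = [1, 1, 1, 1, 1, 1]  # a model that always predicts "defect"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)
print(binary_metrics(y_true, y_pred, positive=0))  # (0.0, 0.0, 0.0)
```

Here the always-"defect" model reaches about 67% accuracy while never recognizing a good part, which the recall for class 0 makes visible.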

### Load Trained Model and Evaluate

In [None]:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the trained model to the right device
model.load_state_dict(torch.load('model_classification.pth', map_location=device))
model.to(device)
model.eval()

test_correct = 0
test_total = 0

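# val_loader wraps the images from the test folder, so this repeats the
# validation pass using the best checkpoint saved during training
with torch.no_grad():
    for images, labels in val_loader: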
        images = images.to(device)
        labels = labels.to(device).long()

        outputs = model(images)
        predicted = outputs.argmax(-1)

        # Optional: comment these if too many
        # print("🔍 Predicted:", predicted)
        # print("🎯 Ground truth:", labels)

        test_correct += (predicted == labels).sum().item()
        test_total += labels.size(0)

test_acc = 100. * test_correct / test_total
print(f'✅ Final Test Accuracy: {test_acc:.2f}%')

