
# Histopathological Image Analysis Competition Overview

Welcome to the Histopathological Image Analysis Competition! In this competition, you will be working with histopathological images to complete a series of tasks designed to assess your skills in data preprocessing, image classification, generative networks, and graph-based networks.

## Tasks

1. **Load, preprocess, and augment data**
2. **Classify images as cancerous or non-cancerous**
3. **Generate cancerous images from non-cancerous images using generative networks**
4. **Utilize Graph Convolutional Networks (GCNs) to analyze nucleus features**

## Important Information

- **Platform:** Google Colab
- **Dataset:** [Histopathological Cancer Detection - Cropped](https://drive.google.com/drive/folders/1T4De029U-OJAEEHCbym_efc2mdFox6S5?usp=sharing)
(Also available on Kaggle: https://www.kaggle.com/datasets/drbeane/hcd-cropped/data)
- **Code Examples:** [Kaggle Code Examples](https://www.kaggle.com/datasets/drbeane/hcd-cropped/code)
- **Submission Deadline:** Before Friday
- **Reference Paper** (needed for step 4 and maybe 3): [Learning Shape-Aware Features with Generative Models for Nuclei Classification](https://arxiv.org/abs/2302.11416)
- **Reference Code from the paper:** [SENUCLS](https://github.com/Lewislou/SENUCLS/tree/main)

## Evaluation Criteria

The evaluation for this assignment is **not based on accuracy** or **performance metrics** but on **how you approach the problem**. Since the problem statement is very abstract and can be solved in multiple ways, your methodology and creativity are key.

## Task Details

### Task 1: Load, preprocess, and augment data

Use only 500 samples from the training data. I don't care about accuracy or performance here, so don't waste time on them :)
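
Taking the first 500 rows of the label file can leave you with a lopsided class mix. One option (just a sketch, not a requirement) is to draw a class-balanced random subset instead. The helper name `stratified_subsample` and the toy label array are made up for illustration, and the 0 = non-cancerous / 1 = cancerous encoding is an assumption you should check against the CSV.

```python
import numpy as np

def stratified_subsample(labels, n_total=500, seed=42):
    """Return sorted indices of a class-balanced random subset of `labels`."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    per_class = n_total // len(classes)
    chosen = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        # Take at most `per_class` indices; fewer if the class is small.
        take = min(per_class, len(idx))
        chosen.append(rng.choice(idx, size=take, replace=False))
    return np.sort(np.concatenate(chosen))

# Toy stand-in for the `label` column of train_labels.csv
# (assumed encoding: 0 = non-cancerous, 1 = cancerous).
toy_labels = np.array([0] * 3000 + [1] * 2000)
subset = stratified_subsample(toy_labels, n_total=500)
print(len(subset), np.bincount(toy_labels[subset]))
```

With the real data you would pass `labels_df['label'].to_numpy()` and then select the rows with `labels_df.iloc[subset]`.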

### Task 2: Classify if the image is cancerous or non-cancerous

Use any classification model of your choice. Don't spend too much time on this; focus on the approach **after step 2.**
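
For the "bare minimum" sanity check on the classifier, something like the sketch below is enough. It turns predicted probabilities into accuracy, sensitivity, and specificity. The function name `binary_report` and the toy numbers are illustrative only, and label 1 is assumed to mean cancerous.

```python
import numpy as np

def binary_report(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity and specificity from predicted probabilities."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Fraction of cancerous samples that were caught.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of non-cancerous samples that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Toy values; with a Keras model, y_prob could be model.predict(X_val).ravel().
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
report = binary_report(y_true, y_prob)
print(report)
```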

### Task 3: Generate cancerous images from non-cancerous images using generative networks
This is where the competition really begins.
Use any type of generative network (I'd prefer GANs or diffusion models) that takes the NON-CANCEROUS images as input and generates what each image would look like if it were cancerous
(e.g. non-cancerous image -> GAN -> cancerous image).
Don't focus on tuning the GAN or increasing accuracy; I will mainly evaluate the approach.
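
Since the dataset has no paired "before/after" images of the same tissue, one possible direction (an assumption on my side, not something this task prescribes) is CycleGAN-style unpaired translation: a generator maps non-cancerous to cancerous, a second one maps back, and a cycle-consistency term keeps the tissue layout intact. The sketch below only shows how the two loss terms are computed, with toy linear maps standing in for real generator networks; all names here are hypothetical.

```python
import numpy as np

def cycle_consistency_loss(x, forward, backward):
    """Mean absolute error between x and backward(forward(x))."""
    return float(np.mean(np.abs(backward(forward(x)) - x)))

def adversarial_loss(d_prob_on_fake, eps=1e-7):
    """Non-saturating generator loss: -log D(G(x)), averaged over the batch."""
    p = np.clip(d_prob_on_fake, eps, 1.0 - eps)
    return float(-np.mean(np.log(p)))

# Toy stand-ins for trained generators: per-pixel linear colour maps.
W = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]])
W_inv = np.linalg.inv(W)
to_cancer = lambda x: x @ W            # non-cancerous -> "cancerous"
back_perfect = lambda y: y @ W_inv     # exact inverse of to_cancer
back_identity = lambda y: y            # a backward map that undoes nothing

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(4, 8, 8, 3))  # batch of tiny fake images

loss_perfect = cycle_consistency_loss(x, to_cancer, back_perfect)
loss_identity = cycle_consistency_loss(x, to_cancer, back_identity)

# Total generator objective; a cycle weight of 10 is a common choice, not a rule.
lambda_cyc = 10.0
d_prob = np.full(4, 0.5)  # pretend discriminator outputs on the fake batch
total = adversarial_loss(d_prob) + lambda_cyc * loss_identity
print(loss_perfect, loss_identity, total)
```

The cycle term is close to zero when the backward generator really inverts the forward one, and grows when it does not, which is what pushes the pair to preserve image content.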



### Task 4: Utilize Graph Convolutional Networks (GCNs) to analyze nucleus features

Incorporate strategies from the reference paper and use GCNs or other graph-based networks.
The key idea is to first learn nucleus features based on structure, texture, and edges at the level of individual nuclei, and then move on to inter-nuclear graphs (see the reference paper listed above).
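
To make the "inter-nuclear graph" step concrete, here is a minimal sketch that links each nucleus to its nearest neighbours based on centroid coordinates. The k-nearest-neighbour rule and the function name `knn_edge_index` are my assumptions for illustration; the reference paper may build its graphs differently, so check it before copying this.

```python
import numpy as np

def knn_edge_index(centroids, k=3):
    """Return a (2, E) array of symmetric edges linking each nucleus to its k nearest neighbours."""
    centroids = np.asarray(centroids, dtype=float)
    n = len(centroids)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between all centroids.
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude self-matches
    neighbours = np.argsort(dist, axis=1)[:, :k]
    src = np.repeat(np.arange(n), k)
    dst = neighbours.ravel()
    # Add reverse edges and drop duplicates so the graph is undirected.
    edges = np.concatenate([np.stack([src, dst]), np.stack([dst, src])], axis=1)
    return np.unique(edges, axis=1)

# Toy centroids (x, y) for five nuclei; real ones would come from a nucleus segmentation.
centroids = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5]]
edges = knn_edge_index(centroids, k=2)
print(edges)
```

The result has the same layout as `edge_index` in the starter code, so `torch.tensor(edges, dtype=torch.long)` can be passed to `torch_geometric.data.Data`, with the per-nucleus features as `x`.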


## **Important Note**
I get that this might be a lot to process, but take your time. Use any AI, reference, or tool you want, and *please please please* don't focus on accuracy or metrics. That doesn't mean reckless metrics are fine, but the bare minimum will do: don't tune the results; focus on the approach instead, because that is what I want to evaluate. If you've completed everything and still have time before Friday, feel free to improve the metrics for brownie points.





# Starter Code

## Obviously, this is only a VERY BASIC code skeleton with minimal code that probably doesn't make sense; it is just there to get you started.

## Write your own code and formulate your own approach. Feel free to delete this code if you wish; I suggest NOT using it.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
import os
from PIL import Image

In [3]:
import zipfile
import os

zip_path = '/content/drive/MyDrive/selection_competition/train.zip'
extract_path = '/content/'

with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_path)

# Preprocess the images
# (numpy, pandas, PIL, train_test_split and ImageDataGenerator are already imported above)

# Load the labels (the CSV is read directly from Drive)
drive_dir = '/content/drive/MyDrive/selection_competition/'
labels_csv = os.path.join(drive_dir, 'train_labels.csv')
labels_df = pd.read_csv(labels_csv)


# Directory where the images were extracted
data_dir = extract_path

# Preprocess Data
def preprocess_image(image_path):
    """Load an image, force 3 RGB channels, resize to 128x128 and scale to [0, 1]."""
    try:
        image = Image.open(image_path).convert('RGB')
        image = image.resize((128, 128))
        image = np.array(image) / 255.0
        return image
    except Exception as e:
        print(f"Error loading image {image_path}: {e}")
        return None

# Create Dataset (stop once 500 images have been loaded)
images = []
labels = []
for _, row in labels_df.iterrows():
    if len(images) >= 500:
        break
    image_path = os.path.join(data_dir, 'train', row['id'] + '.tif')
    if not os.path.exists(image_path):
        continue  # skip ids whose image file is missing
    image = preprocess_image(image_path)
    if image is not None:
        images.append(image)
        labels.append(row['label'])
images = np.array(images)
labels = np.array(labels)

# Debug: Print the shape of the loaded images and labels
print(f"Loaded {len(images)} images and {len(labels)} labels.")
print(f"Image shape: {images.shape}")
print(f"Labels shape: {labels.shape}")

Loaded 500 images and 500 labels.
Image shape: (500, 128, 128, 3)
Labels shape: (500,)


In [4]:
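# Data Augmentation
# Note: datagen.fit() is only needed for featurewise statistics (e.g. featurewise_center),
# which are not used here, so it is omitted.
datagen = ImageDataGenerator(rotation_range=20, width_shift_range=0.2, height_shift_range=0.2, horizontal_flip=True)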

# Split Data
X_train, X_val, y_train, y_val = train_test_split(images, labels, test_size=0.2, random_state=42)
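
Tissue patches have no canonical "up", so flips and 90-degree rotations keep the label intact and do not introduce the padded or interpolated borders that shifts and arbitrary-angle rotations do. A minimal NumPy sketch of that alternative is below; `random_flip_rot90` is a hypothetical helper, and it assumes square images in a `(N, H, W, C)` batch.

```python
import numpy as np

def random_flip_rot90(batch, rng):
    """Apply an independent random flip and 90-degree rotation to each square image."""
    out = np.empty_like(batch)
    for i, img in enumerate(batch):
        if rng.random() < 0.5:
            img = img[:, ::-1]  # horizontal flip
        if rng.random() < 0.5:
            img = img[::-1]     # vertical flip
        # Rotate by 0, 90, 180 or 270 degrees in the image plane.
        img = np.rot90(img, k=int(rng.integers(0, 4)), axes=(0, 1))
        out[i] = img
    return out

rng = np.random.default_rng(0)
batch = rng.random((4, 8, 8, 3))  # toy batch; the real one would be X_train
augmented = random_flip_rot90(batch, rng)
print(augmented.shape)
```

Pixels are only moved around, never resampled, so each augmented image contains exactly the same values as its source.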

In [6]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Build Model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid')
])

# Compile Model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train Model
model.fit(datagen.flow(X_train, y_train, batch_size=32), validation_data=(X_val, y_val), epochs=2)

Epoch 1/2
Epoch 2/2


<keras.src.callbacks.History at 0x7d948ec3fe20>

In [7]:
from tensorflow.keras.layers import Input, Reshape, Dense, Flatten, Dropout, Activation
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import BatchNormalization, LeakyReLU, UpSampling2D, Conv2D

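# Build Generator
# Note: this generator maps a 100-dim noise vector to a 128x128x3 image.
# Task 3 asks for image-to-image translation, so your generator would need to take an image as input instead.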
def build_generator():
    model = Sequential()
    model.add(Dense(128 * 32 * 32, activation="relu", input_dim=100))
    model.add(Reshape((32, 32, 128)))
    model.add(UpSampling2D())
    model.add(Conv2D(128, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(UpSampling2D())
    model.add(Conv2D(64, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Conv2D(3, kernel_size=3, padding="same"))
    model.add(Activation("tanh"))
    return model

# Build Discriminator
def build_discriminator():
    model = Sequential()
    model.add(Conv2D(32, kernel_size=3, strides=2, input_shape=(128, 128, 3), padding="same"))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    return model

# Compile GAN
generator = build_generator()
discriminator = build_discriminator()
discriminator.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

gan_input = Input(shape=(100,))
generated_image = generator(gan_input)
discriminator.trainable = False
gan_output = discriminator(generated_image)

gan = Model(gan_input, gan_output)
gan.compile(loss='binary_crossentropy', optimizer='adam')
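
One thing to watch with the skeleton above: the generator ends in `tanh`, so its outputs lie in [-1, 1], while the preprocessing step scales real images to [0, 1]. If you keep a `tanh` output, the discriminator should see real and generated images in the same range. A small sketch of the conversion (helper names are mine, purely illustrative):

```python
import numpy as np

def to_tanh_range(images01):
    """Map images from [0, 1] to [-1, 1] to match a tanh generator output."""
    return images01 * 2.0 - 1.0

def from_tanh_range(images_pm1):
    """Map generator outputs from [-1, 1] back to [0, 1] for plotting or saving."""
    return np.clip((images_pm1 + 1.0) / 2.0, 0.0, 1.0)

images01 = np.linspace(0.0, 1.0, 12).reshape(1, 2, 2, 3)  # toy image batch
scaled = to_tanh_range(images01)
restored = from_tanh_range(scaled)
print(scaled.min(), scaled.max())
```

The other option is to end the generator in `sigmoid` and leave the data in [0, 1]; either is fine as long as both sides agree.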


In [10]:
# !pip install torch_geometric


In [11]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch_geometric.nn import GCNConv
from torch_geometric.data import Data

# Example data preparation for GCN
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]], dtype=torch.long)  # Example edge index
x = torch.tensor([[1], [1], [1]], dtype=torch.float)  # Example node features

# Define GCN
class GCN(nn.Module):
    def __init__(self):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(1, 16)
        self.conv2 = GCNConv(16, 2)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = torch.relu(self.conv1(x, edge_index))
        x = self.conv2(x, edge_index)
        return torch.log_softmax(x, dim=1)  # log-probabilities per node

# Prepare data for GCN
data = Data(x=x, edge_index=edge_index)

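# Train GCN
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = GCN().to(device)
data = data.to(device)
optimizer = optim.Adam(model.parameters(), lr=0.01)
# The model already returns log-probabilities, so NLLLoss is the matching criterion
# (CrossEntropyLoss would apply log-softmax a second time).
criterion = nn.NLLLoss()
# Example node labels (one per node), created once outside the loop
target = torch.tensor([0, 1, 0], dtype=torch.long, device=device)

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data)
    loss = criterion(out, target)
    loss.backward()
    optimizer.step()

print('Finished Training')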

Finished Training
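
If you want to see what a graph convolution actually computes, here is a NumPy sketch of a single GCN propagation step in the usual normalized form, D^(-1/2) (A + I) D^(-1/2) X W, run on the same toy graph as above. It ignores the bias term and is meant for intuition only; the function name `gcn_layer` and the weight value are made up.

```python
import numpy as np

def gcn_layer(x, edge_index, weight):
    """One GCN propagation step: D^-1/2 (A + I) D^-1/2 X W (no bias, no activation)."""
    n = x.shape[0]
    adj = np.zeros((n, n))
    adj[edge_index[0], edge_index[1]] = 1.0
    adj_hat = adj + np.eye(n)                    # add self-loops
    deg = adj_hat.sum(axis=1)                    # node degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = d_inv_sqrt[:, None] * adj_hat * d_inv_sqrt[None, :]
    return norm_adj @ x @ weight

# Same toy graph as the starter code: a 3-node chain 0 - 1 - 2.
edge_index = np.array([[0, 1, 1, 2],
                       [1, 0, 2, 1]])
x = np.ones((3, 1))
weight = np.array([[2.0]])  # illustrative 1x1 weight matrix
out = gcn_layer(x, edge_index, weight)
print(out)
```

Each node's new feature is a degree-normalized average over itself and its neighbours, followed by a linear map, which is why the two end nodes of the chain get identical outputs.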
