###Convolutional Kolmogorov-Arnold Network (CKAN)

The CKAN architecture and its experiments are described in this paper: https://arxiv.org/pdf/2406.13155

####Agenda
1. Installation
2. What is a KAN?
3. What is a KAN Convolution?
  1. Idea in a nutshell: KAN convolutions are very similar to standard convolutions, but instead of taking the dot product between the kernel and the corresponding pixels in the image, we apply a learnable non-linear activation function to each element and then add the results up.
  2. Convolution from scratch: https://github.com/detkov/Convolution-From-Scratch/


4. Parameters in a KAN Convolution
5. Results of convolutional layers with KAN
6. Conclusion

7. Work in progress

    A. Experiments on more complex datasets.

    B. Hyperparameter tuning with Random Search.
    
    C. Experiments with more architectures.
    
    D. Dynamically updating grid ranges.
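
To make the idea from agenda item 3 concrete, here is a minimal sketch that compares the two operations on a single 3x3 patch. It is not the repository's implementation. The learnable function `phi` is a hypothetical stand-in (a SiLU term plus a weighted sum of Gaussian bumps), whereas KANs use B-splines; the names `phi`, `centers`, `width`, `w_base` and `coeffs` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

patch = rng.uniform(-1.0, 1.0, size=(3, 3))   # one 3x3 image patch
kernel = rng.normal(size=(3, 3))              # ordinary convolution weights

# Standard convolution at one position: dot product of kernel and patch.
conv_out = float(np.sum(kernel * patch))

# Hypothetical stand-in for a learnable univariate function (NOT a B-spline):
# phi(x) = w_base * silu(x) + sum_k coeffs[k] * exp(-((x - centers[k]) / width)^2)
centers = np.linspace(-1.0, 1.0, 5)           # fixed grid of bump centers
width = 0.5

def silu(x):
    return x / (1.0 + np.exp(-x))

def phi(x, w_base, coeffs):
    bumps = np.exp(-((x - centers) / width) ** 2)
    return float(w_base * silu(x) + np.dot(coeffs, bumps))

# One independent function, with its own parameters, per kernel position.
w_base = rng.normal(size=(3, 3))
coeffs = rng.normal(size=(3, 3, centers.size))

# KAN convolution at one position: apply phi_ij to pixel ij, then add up.
kan_out = sum(
    phi(patch[i, j], w_base[i, j], coeffs[i, j])
    for i in range(3)
    for j in range(3)
)
print(conv_out, kan_out)
```

If every `phi` were the linear function `w * x`, the KAN convolution would reduce to the standard one; the extra parameters per kernel position are what make each element's response non-linear and learnable.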


In [1]:
#cloning the CKAN git repository
!git clone https://github.com/AntonioTepsich/Convolutional-KANs.git

Cloning into 'Convolutional-KANs'...
remote: Enumerating objects: 1539, done.
remote: Counting objects: 100% (385/385), done.
remote: Compressing objects: 100% (158/158), done.
remote: Total 1539 (delta 253), reused 351 (delta 224), pack-reused 1154
Receiving objects: 100% (1539/1539), 33.38 MiB | 12.09 MiB/s, done.
Resolving deltas: 100% (727/727), done.


In [2]:
#this directory contains several files and directories, such as architectures_28x28, images, kan_convolutional, requirements.txt, etc.
%cd Convolutional-KANs
#installing the necessary packages
!pip install -r requirements.txt

/content/Convolutional-KANs
Collecting matplotlib==3.6.2 (from -r requirements.txt (line 1))
  Downloading matplotlib-3.6.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)
Collecting pandas==2.2.2 (from -r requirements.txt (line 3))
  Downloading pandas-2.2.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (19 kB)
Collecting scikit-learn==1.4.2 (from -r requirements.txt (line 4))
  Downloading scikit_learn-1.4.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Collecting tqdm==4.66.4 (from -r requirements.txt (line 5))
  Downloading tqdm-4.66.4-py3-none-any.whl.metadata (57 kB)
ERROR: Could not find a version that satisfies the requirement torch==2.3.0+cu118 (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1

In [3]:
#loading necessary libraries
%load_ext autoreload
%autoreload 2
import matplotlib.pyplot as plt
from tqdm import tqdm
import numpy as np
import pandas as pd
from sklearn.metrics import precision_score, recall_score, f1_score
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from architectures_28x28.CKAN_BN import CKAN_BN
from architectures_28x28.SimpleModels import *
from architectures_28x28.ConvNet import ConvNet
from architectures_28x28.KANConvs_MLP import KANC_MLP
from architectures_28x28.KKAN import KKAN_Convolutional_Network
from architectures_28x28.conv_and_kan import NormalConvsKAN
from kan_convolutional.KANConv import KAN_Convolutional_Layer

###Accessing the dataset

In [4]:
#defining transformations for the MNIST dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    #normalizing to [-1, 1]
    transforms.Normalize((0.5,), (0.5,))
])

#loading the MNIST dataset
train_dataset = MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = MNIST(root='./data', train=False, download=True, transform=transform)

#creating data loaders for training and testing
#dataLoader (refer: https://pytorch.org/tutorials/beginner/basics/data_tutorial.html)
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=8, shuffle=False)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 32134474.19it/s]


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 1031258.04it/s]


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 8932536.59it/s]


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 3320064.27it/s]

Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw
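
As a quick check of the "normalizing to [-1, 1]" comment in the transform cell: `ToTensor()` scales pixel values to [0, 1], and `Normalize((0.5,), (0.5,))` then computes `(x - 0.5) / 0.5` per pixel. The sketch below reproduces that arithmetic with NumPy on a few hand-picked values (the sample values are illustrative, not taken from MNIST).

```python
import numpy as np

mean, std = 0.5, 0.5
pixels = np.array([0.0, 0.25, 0.5, 1.0])   # values after ToTensor(), in [0, 1]
normalized = (pixels - mean) / std          # same formula as transforms.Normalize
print(normalized.min(), normalized.max())
```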






###Model training

In [5]:
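#implementation of KAN Convolution & 2 Layer MLP
#imported explicitly so that F.log_softmax in forward() does not rely on the star import above
import torch.nn.functional as F

class KANC_MLP(nn.Module):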
    def __init__(self, device: str = 'cpu'):
        super(KANC_MLP, self).__init__()

        #setting first convolutional layer using KAN_Convolutional_Layer
        self.conv1 = KAN_Convolutional_Layer(
            n_convs=5,
            kernel_size=(3, 3),
            device=device
        )

        #setting second convolutional layer
        self.conv2 = KAN_Convolutional_Layer(
            n_convs=5,
            kernel_size=(3, 3),
            device=device
        )

        #setting max pooling layer
        self.pool1 = nn.MaxPool2d(kernel_size=(2, 2))

        #setting flatten layer to convert 2D feature maps to 1D vector
        self.flat = nn.Flatten()

        #setting fully connected layers
        self.linear1 = nn.Linear(625, 256)
        self.linear2 = nn.Linear(256, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = self.pool1(x)
        x = self.flat(x)
        x = self.linear1(x)
        x = self.linear2(x)
        x = F.log_softmax(x, dim=1)
        return x
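
The input size of `linear1` (625) can be traced through the layers with a little arithmetic. The sketch below assumes that the KAN convolutions use stride 1 and no padding, and that each of the `n_convs` kernels is applied to every input channel separately, so the channel count is multiplied by `n_convs` at each layer. These are assumptions about `KAN_Convolutional_Layer`, chosen because they are consistent with the value 625; the helper names are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size: int, kernel: int) -> int:
    """Spatial size after max pooling with stride equal to the kernel size."""
    return size // kernel

side, channels, n_convs = 28, 1, 5     # MNIST image: 1 channel, 28x28

side = conv_out(side, 3)               # conv1: 28 -> 26
channels *= n_convs                    # 1 -> 5 channels (assumed behaviour)
side = pool_out(side, 2)               # pool1: 26 -> 13
side = conv_out(side, 3)               # conv2: 13 -> 11
channels *= n_convs                    # 5 -> 25 channels (assumed behaviour)
side = pool_out(side, 2)               # pool1 again: 11 -> 5

flat_features = channels * side * side
print(flat_features)  # 625
```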


In [6]:
#using the GPU if one is available, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

#initializing the model and moving it to the selected device
model_kanc = KANC_MLP(device=device).to(device)

#defining the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model_kanc.parameters(), lr=0.001)
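
Note that the model's `forward` already ends with `log_softmax`, while `nn.CrossEntropyLoss` applies a log-softmax internally. Because log-softmax is idempotent, the loss value is the same either way, so the pairing is redundant rather than wrong. The NumPy sketch below illustrates this on made-up logits; `cross_entropy` is a hand-written stand-in for the default mean-reduced loss, not the PyTorch implementation.

```python
import numpy as np

def log_softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)            # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def cross_entropy(scores: np.ndarray, targets: np.ndarray) -> float:
    # mean over the batch of -log_softmax(scores)[target]
    logp = log_softmax(scores)
    return float(-logp[np.arange(len(targets)), targets].mean())

logits = np.array([[2.0, -1.0, 0.5], [0.1, 0.2, 3.0]])   # made-up raw scores
targets = np.array([0, 2])

loss_from_logits = cross_entropy(logits, targets)
loss_from_log_probs = cross_entropy(log_softmax(logits), targets)
print(loss_from_logits, loss_from_log_probs)
```

An equivalent setup would be to keep `log_softmax` in the model and use `nn.NLLLoss`, or to return raw logits and keep `nn.CrossEntropyLoss`.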


The next cell trains the model for one epoch.

    The training loop uses:
        1. model: the neural network model (model_kanc)
        2. device: cuda or cpu
        3. train_loader: DataLoader for the training data
        4. optimizer: the optimizer (Adam here)
        5. epoch: the index of the current epoch
        6. criterion: the loss function (CrossEntropyLoss here)

    It reports:
        avg_loss: the loss averaged over all training batches
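
The same steps can also be packaged as a reusable function that takes these six objects as arguments and returns `avg_loss`. The sketch below is one possible way to write it, not code from the repository. The smoke test at the end runs it on random tensors with a hypothetical stand-in model (`toy_model`, a single linear layer), so it finishes in well under a second.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

def train(model, device, train_loader, optimizer, epoch, criterion):
    """Train `model` for one epoch and return the average per-batch loss."""
    model.train()
    total_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()                     # reset gradients
        loss = criterion(model(images), labels)   # forward pass and loss
        loss.backward()                           # backward pass
        optimizer.step()                          # parameter update
        total_loss += loss.item()
    avg_loss = total_loss / len(train_loader)
    print(f"Epoch {epoch}: average loss {avg_loss:.4f}")
    return avg_loss

# Smoke test on random data; the model and data are stand-ins, not MNIST or a KAN.
torch.manual_seed(0)
toy_data = TensorDataset(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,)))
toy_loader = DataLoader(toy_data, batch_size=8, shuffle=True)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
toy_optimizer = optim.Adam(toy_model.parameters(), lr=1e-3)
toy_loss = train(toy_model, torch.device("cpu"), toy_loader,
                 toy_optimizer, 1, nn.CrossEntropyLoss())
```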

In [7]:
#setting epochs for training
epochs = 1

#training the model
for epoch in range(epochs):
    model_kanc.train()
    total_loss = 0.0
    for images, labels in train_loader:
        #moving labels and images to the device (GPU or CPU)
        images, labels = images.to(device), labels.to(device)
        #zero the parameter gradients
        optimizer.zero_grad()
        #forward pass
        outputs = model_kanc(images)
        #calculating the loss
        loss = criterion(outputs, labels)
        #backward pass and optimize
        loss.backward()
        optimizer.step()
        #accumulating the loss for reporting
        total_loss += loss.item()

    #printing the average loss for the epoch
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {total_loss/len(train_loader):.4f}")


Epoch [1/1], Loss: 0.1505


###Model evaluation

In [9]:
#evaluating the model
model_kanc.eval()
test_loss = 0
correct = 0
all_targets = []
all_predictions = []

with torch.no_grad():
  for images, labels in test_loader:
    images, labels = images.to(device), labels.to(device)
    #forward pass: log-probabilities for each class
    output = model_kanc(images)
    #accumulating the mean loss of this batch
    test_loss += criterion(output, labels).item()
    #taking the most likely class as the prediction for each image
    _, predicted = torch.max(output, 1)
    correct += (labels == predicted).sum().item()
    #collecting all targets and predictions for metric calculations
    all_targets.extend(labels.view_as(predicted).cpu().numpy())
    all_predictions.extend(predicted.cpu().numpy())

#averaging the per-batch mean losses (criterion already averages within each batch)
test_loss /= len(test_loader)
#calculating accuracy
accuracy = correct / len(test_loader.dataset)
#calculating overall metrics
precision = precision_score(all_targets, all_predictions, average='macro')
recall = recall_score(all_targets, all_predictions, average='macro')
f1 = f1_score(all_targets, all_predictions, average='macro')

#accuracy is a fraction in [0, 1], so it is printed without a percent sign
print('\nTest set:\n Accuracy: {:.2f}, \n Precision: {:.2f}, \n Recall: {:.2f}, \n F1 Score: {:.2f}\n'.format(accuracy, precision, recall, f1))


Test set:
 Accuracy: 0.98, 
 Precision: 0.98, 
 Recall: 0.98, 
 F1 Score: 0.98
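
For reference, `average='macro'` means that precision, recall and F1 are computed per class and then averaged with equal weight per class. The sketch below works through that definition by hand on a tiny made-up label set; `macro_scores` is a hypothetical helper, and classes with no predictions or no true samples are scored as 0 here.

```python
import numpy as np

def macro_scores(y_true, y_pred, n_classes):
    """Return (precision, recall, f1), each averaged over classes with equal weight."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precisions, recalls, f1s = [], [], []
    for c in range(n_classes):
        tp = int(np.sum((y_pred == c) & (y_true == c)))   # true positives
        fp = int(np.sum((y_pred == c) & (y_true != c)))   # false positives
        fn = int(np.sum((y_pred != c) & (y_true == c)))   # false negatives
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return float(np.mean(precisions)), float(np.mean(recalls)), float(np.mean(f1s))

# Made-up labels for three classes (not MNIST results).
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
toy_precision, toy_recall, toy_f1 = macro_scores(toy_true, toy_pred, n_classes=3)
print(toy_precision, toy_recall, toy_f1)
```

Because every class counts equally, macro averages can differ noticeably from accuracy when classes are imbalanced; MNIST is close to balanced, which is consistent with all four metrics above agreeing.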

