GitHub - saeedahmadicp/KAN-CL-ViT: Continual Learning with KAN based ViT to mitigate catastrophic forgetting

Investigating the Strengths and Limitations of KAN in Continual Learning

Description

This project investigates the performance of different variants of Kolmogorov-Arnold Networks (KAN) [1] for image classification tasks in a continual learning setting. We conducted two sets of experiments: one using standalone Multi-Layer Perceptrons (MLPs) and KAN variants, and another integrating KAN into the Vision Transformer (ViT) [2] architecture.

The experiments were carried out on the MNIST and CIFAR100 datasets, which were divided into multiple tasks to simulate a continual learning scenario. The datasets were split as follows:

MNIST Experiments (MLP and KAN)

Total Classes: 10
Number of Tasks: 5
Classes per Task: 2
Epochs for Task 1: 7
Epochs for Remaining Tasks: 5

CIFAR100 Experiments (ViT with MLP and KAN)

Total Classes: 100
Number of Tasks: 10
Classes per Task: 10
Epochs for Task 1: 25
Epochs for Remaining Tasks: 10
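
The two splits above can be reproduced with a few lines of plain Python. This is a minimal sketch, not the repository's actual data-loading code: the helper names `split_classes_into_tasks` and `epochs_for_task` are hypothetical, and it assumes classes are assigned to tasks in contiguous label order (the project may order or shuffle classes differently).

```python
from typing import List


def split_classes_into_tasks(total_classes: int, num_tasks: int) -> List[List[int]]:
    """Partition labels 0..total_classes-1 into equally sized tasks.

    Assumption: classes are grouped in contiguous label order.
    """
    if total_classes % num_tasks != 0:
        raise ValueError("total_classes must be divisible by num_tasks")
    per_task = total_classes // num_tasks
    return [list(range(t * per_task, (t + 1) * per_task)) for t in range(num_tasks)]


def epochs_for_task(task_index: int, first_task_epochs: int, later_task_epochs: int) -> int:
    """The first task (index 0) trains longer; later tasks use the shorter budget."""
    return first_task_epochs if task_index == 0 else later_task_epochs


# MNIST: 10 classes, 5 tasks, 2 classes per task.
mnist_tasks = split_classes_into_tasks(total_classes=10, num_tasks=5)

# CIFAR100: 100 classes, 10 tasks, 10 classes per task.
cifar_tasks = split_classes_into_tasks(total_classes=100, num_tasks=10)

# Total training epochs implied by the per-task budgets listed above.
mnist_total_epochs = sum(epochs_for_task(t, 7, 5) for t in range(len(mnist_tasks)))
cifar_total_epochs = sum(epochs_for_task(t, 25, 10) for t in range(len(cifar_tasks)))
```

Each inner list holds the class labels that are visible during one task, so a dataset can be filtered by label membership before training on that task.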

The primary objective was to investigate the strengths and limitations of KAN in a continual learning setting, where the model must learn new tasks while retaining knowledge from previously learned tasks. By comparing the performance of KAN variants with traditional MLPs and integrating KAN into the ViT architecture, we aimed to gain insights into the potential advantages and drawbacks of using KAN for continual learning tasks.

Key Findings

Here are the findings of our project:

In the standalone MLP and KAN experiments, the KAN model outperforms the MLP on the continual learning (CL) task and is more resistant to forgetting previous knowledge while learning a new task.
For KAN-ViT, the overall average incremental accuracy improved slightly, especially in the early incremental stages. In later stages, however, the MLP-based ViT and KAN-ViT perform the same, as the results graph shows.
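
The findings above are stated in terms of average incremental accuracy and resistance to forgetting. The sketch below shows how such metrics are commonly computed. It assumes the usual definitions (the mean of the accuracies measured after each incremental stage, and the average drop from a task's best earlier accuracy to its final accuracy), which may differ from what the project's scripts report. The function names and all numbers are illustrative placeholders, not results from these experiments.

```python
from typing import Sequence


def average_incremental_accuracy(stage_accuracies: Sequence[float]) -> float:
    """Mean of the accuracies measured after each incremental stage."""
    if not stage_accuracies:
        raise ValueError("need at least one stage accuracy")
    return sum(stage_accuracies) / len(stage_accuracies)


def average_forgetting(acc_matrix: Sequence[Sequence[float]]) -> float:
    """Average drop from each old task's best earlier accuracy to its final accuracy.

    acc_matrix[i][j] is the accuracy on task j measured after training stage i,
    so row i has i + 1 entries (tasks 0..i).
    """
    num_stages = len(acc_matrix)
    if num_stages < 2:
        return 0.0  # nothing can be forgotten after a single stage
    final_row = acc_matrix[-1]
    drops = []
    for task in range(num_stages - 1):
        # Best accuracy on this task at any stage before the final one.
        best_before = max(acc_matrix[stage][task] for stage in range(task, num_stages - 1))
        drops.append(best_before - final_row[task])
    return sum(drops) / len(drops)


# Placeholder values for illustration only (NOT measured results).
example_stage_accuracies = [0.9, 0.7, 0.5]
example_matrix = [
    [0.9],
    [0.6, 0.8],
    [0.5, 0.6, 0.7],
]
example_aia = average_incremental_accuracy(example_stage_accuracies)
example_forgetting = average_forgetting(example_matrix)
```

A lower forgetting value means the model retained more of what it learned on earlier tasks.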

Setup

Install the requirements

pip install -r requirements.txt

If installing the CUDA toolkit or the GPU build of PyTorch still fails, refer to the PyTorch website, or install PyTorch together with the CUDA toolkit using the commands below.

Linux / Windows

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Mac

pip3 install torch torchvision torchaudio

After installation, run the script that matches the experiment you want to reproduce.

Run the base version of the MLP and KAN:

python KAN_vs_MLP.py

Run the base version of the MLP and KAN with CNN layers:

python cnn_KAN.py

Run the standalone MLP and KAN for class-based CL:

python CL_KAN_vs_MLP.py

Run the ViT-based MLP and KAN architectures for class-based CL:

python CL_ViT_MLP_vs_KAN.py

Future Work

Based on the findings and observations from this project, several potential future directions and improvements can be explored:

Running the experiments across more complex datasets to further evaluate the performance and scalability of KAN in continual learning scenarios.
Building on top of the base implementation of KAN and introducing replay mechanisms to further minimize the impacts of forgetting and improve incremental learning capabilities.
Experimenting with various parameter regularization techniques to build on the promise of KANs to mitigate catastrophic forgetting.
Using proper scheduling strategies to ensure consistent learning across all incremental stages.
Selectively updating and masking activations and neurons to further mitigate catastrophic forgetting in CL and incremental learning.

By exploring these avenues, researchers can potentially unlock the full potential of KAN for continual learning tasks and contribute to the advancement of this field.
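
As an illustration of the replay idea listed above, here is a minimal sketch of a fixed-size replay buffer that uses reservoir sampling. This is not part of the project's code: the class name `ReservoirReplayBuffer` and its methods are hypothetical, and reservoir sampling is only one of several ways to choose which past examples to keep.

```python
import random
from typing import Any, List, Tuple


class ReservoirReplayBuffer:
    """Keeps a uniform random subset of every example seen so far (reservoir sampling)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.samples: List[Tuple[Any, int]] = []
        self.num_seen = 0
        self._rng = random.Random(seed)

    def add(self, example: Any, label: int) -> None:
        """Offer one (example, label) pair to the buffer."""
        self.num_seen += 1
        if len(self.samples) < self.capacity:
            # Buffer not full yet: always store the new pair.
            self.samples.append((example, label))
            return
        # Buffer full: keep the new pair with probability capacity / num_seen.
        slot = self._rng.randrange(self.num_seen)
        if slot < self.capacity:
            self.samples[slot] = (example, label)

    def sample(self, batch_size: int) -> List[Tuple[Any, int]]:
        """Draw up to batch_size stored pairs without replacement."""
        count = min(batch_size, len(self.samples))
        return self._rng.sample(self.samples, count)


# Usage: stream examples from successive tasks, then mix replayed pairs
# into the batches of the task currently being trained.
buffer = ReservoirReplayBuffer(capacity=4)
for label in range(10):
    buffer.add(f"image_{label}", label)  # strings stand in for image tensors
replayed = buffer.sample(2)
```

During training on a new task, pairs drawn with `sample` would be concatenated with the current batch so that the loss also covers examples from earlier tasks.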

References

[1] Liu, Ziming, et al. "KAN: Kolmogorov-Arnold Networks." arXiv preprint arXiv:2404.19756 (2024).
[2] Dosovitskiy, Alexey, et al. "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale." arXiv preprint arXiv:2010.11929 (2020).

