Abstract • Experiments • Installation • Usage • Results • Citation
This is the official repository for the paper:
Lifelong learning with Cycle Memory Networks
published in IEEE Transactions on Neural Networks and Learning Systems, 2023. DOI: 10.1109/TNNLS.2023.3294495.
Abstract: Learning from a sequence of tasks for a lifetime is essential for an agent toward artificial general intelligence. Despite the explosion of this research field in recent years, most work focuses on the well-known catastrophic forgetting issue. In contrast, this work aims to explore knowledge-transferable lifelong learning without storing historical data and significant additional computational overhead. We demonstrate that existing data-free frameworks, including regularization-based single-network and structure-based multinetwork frameworks, face a fundamental issue of lifelong learning, named anterograde forgetting, i.e., preserving and transferring memory may inhibit learning new knowledge. We attribute it to the fact that the learning network capacity decreases while memorizing historical knowledge and conceptual confusion between the irrelevant old knowledge and the current task. Inspired by the complementary learning theory in neuroscience, we endow artificial neural networks with the ability to continuously learn without forgetting while recalling historical knowledge to facilitate learning new knowledge. Specifically, this work proposes a general framework named cycle memory networks (CMNs). The CMN consists of two individual memory networks to store short- and long-term memories separately to avoid capacity shrinkage and a transfer cell between them. It enables knowledge transfer from the long-term to the short-term memory network to mitigate conceptual confusion. In addition, the memory consolidation mechanism integrates short-term knowledge into the long-term memory network for knowledge accumulation. We demonstrate that the CMN can effectively address the anterograde forgetting on task-related, task-conflict, class-incremental, and cross-domain benchmarks. Furthermore, we provide extensive ablation studies to verify each framework component.
Overview of our method and results
This repository contains the experimental code for CMN, along with implementations of the comparison methods (One, Joint, fine-tuning, PNN, LwF, EWC, DGR, HNet). Each method learns a sequence of 10 subtasks split from the CIFAR-100 dataset. All methods except HNet use ResNet-18 as the backbone; HNet uses ResNet-20.
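The 10-subtask split of CIFAR-100 can be sketched as follows. This is a minimal illustration only; the helper `split_classes` and the fixed (unshuffled) class ordering are assumptions for the sketch, not the repository's exact code.

```python
def split_classes(num_classes=100, num_tasks=10):
    """Partition class indices into equally sized, disjoint subtasks.

    With the defaults, CIFAR-100's 100 classes become 10 subtasks of
    10 classes each, assigned in index order.
    """
    per_task = num_classes // num_tasks
    return [list(range(t * per_task, (t + 1) * per_task))
            for t in range(num_tasks)]

tasks = split_classes()
print(len(tasks))   # number of subtasks
print(tasks[0])     # classes assigned to the first subtask
```

Each method in the repository then trains on these subtasks sequentially, one after another.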
All experiments used stochastic gradient descent with an initial learning rate chosen from {1, 0.1, 0.001}, momentum 0.9, and weight decay 1 × 10^−5. Grid search was used to find the best configuration. Each model was trained for 100 epochs, all weights were initialized with kaiming_uniform, and the mini-batch size was 512 or 1024.
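The grid search over these hyper-parameters can be sketched as below. `train_and_eval` is a hypothetical stand-in for one full training run that returns validation accuracy; the exact grid and search loop in the repository may differ.

```python
import itertools

# Hyper-parameter grid from the experimental setup above.
LEARNING_RATES = [1, 0.1, 0.001]
BATCH_SIZES = [512, 1024]
MOMENTUM, WEIGHT_DECAY, EPOCHS = 0.9, 1e-5, 100

def grid_search(train_and_eval):
    """Try every (lr, batch_size) pair and keep the best accuracy."""
    best_cfg, best_acc = None, -1.0
    for lr, bs in itertools.product(LEARNING_RATES, BATCH_SIZES):
        acc = train_and_eval(lr=lr, batch_size=bs,
                             momentum=MOMENTUM,
                             weight_decay=WEIGHT_DECAY,
                             epochs=EPOCHS)
        if acc > best_acc:
            best_cfg, best_acc = (lr, bs), acc
    return best_cfg, best_acc
```

The fixed values (momentum, weight decay, epochs) are passed through unchanged; only the learning rate and batch size are searched.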
All experiments were run on eight NVIDIA A100 GPUs (40 GB each). The deep learning framework torch 1.8.1 was used. The requirements.txt file lists all the project's dependencies, which can be installed with the following command:
$ pip install -r requirements.txt
For each method, run the following command in that method's directory:
python main.py
You can adjust the related hyper-parameters in the ./main.py file.
The test results are saved in the result folder of each method's directory.
| Metric | One | Joint | fine-tuning | PNN | LwF | EWC | DGR | HNet | CMN |
|---|---|---|---|---|---|---|---|---|---|
| ACC | 0.7788 | 0.7423 | 0.1745 | 0.816 | 0.3398 | 0.44 | 0.0712 | 0.4003 | 0.8402 |
| FWT | 0 | \ | -0.001667 | 0.04988889 | -0.20589 | -4.76 | -0.414 | -0.32989 | 0.10088889 |
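The two metrics above can be computed from a task-accuracy matrix. The sketch below follows the commonly used definitions (average final accuracy and forward transfer against a random-initialization baseline); the repository's exact evaluation code may differ.

```python
def acc_fwt(R, b):
    """Compute ACC and FWT from a task-accuracy matrix.

    R[i][j] is the accuracy on task j after training on tasks 0..i;
    b[j] is the accuracy of a randomly initialized model on task j.
    ACC averages the last row; FWT averages how much training on
    earlier tasks helps each not-yet-seen task over the baseline.
    """
    T = len(R)
    acc = sum(R[T - 1][j] for j in range(T)) / T
    fwt = sum(R[j - 1][j] - b[j] for j in range(1, T)) / (T - 1)
    return acc, fwt
```

For example, with three tasks, `R[0][1]` measures zero-shot transfer from task 0 to task 1, and a negative FWT indicates anterograde forgetting: old knowledge is inhibiting the learning of new tasks.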
- fine-tuning: Fine-Tuning Deep Neural Networks in Continuous Learning Scenarios. https://pub.inf-cv.uni-jena.de/pdf/Kaeding16_FDN.pdf
- EWC: Overcoming Catastrophic Forgetting in Neural Networks. https://doi.org/10.1073/pnas.1611835114
- LwF: Learning without Forgetting. https://arxiv.org/abs/1606.09282
- PNN: Progressive Neural Networks. https://arxiv.org/abs/1606.04671
- DGR: Continual Learning with Deep Generative Replay. https://arxiv.org/abs/1705.08690
- HNet: Continual Learning with Hypernetworks. https://arxiv.org/abs/1906.00695
More details and supplementary experimental results can be found in CMN_supplements.
If you find our work useful, please cite it as follows:
@ARTICLE{10197260,
author={Peng, Jian and Ye, Dingqi and Tang, Bo and Lei, Yinjie and Liu, Yu and Li, Haifeng},
journal={IEEE Transactions on Neural Networks and Learning Systems},
title={Lifelong Learning With Cycle Memory Networks},
year={2023},
volume={},
number={},
pages={1-14},
keywords={Task analysis;Knowledge engineering;Learning (artificial intelligence);Learning systems;Neuroscience;Microprocessors;Knowledge transfer;Anterograde forgetting;catastrophic forgetting;complementary learning theory;cycle memory network (CMN);lifelong learning},
doi={10.1109/TNNLS.2023.3294495}}