# GraphSAIL

In a mainstream recommendation service, we observe that performance degrades once the recommendation model stops being updated, which may lead to a significant loss of revenue and a poor user experience. However, directly retraining the model using only the most recent records often causes catastrophic forgetting: the model loses track of the key user information needed to capture long-term preferences. **Incremental learning** provides one direction for tackling this problem. It uses the most recent data to update the current model, but is designed to prevent substantial forgetting. This significantly improves training efficiency without requiring extra computational resources, while preventing model performance degradation.

There are three main lines of work in incremental learning: experience replay (reservoir) methods, parameter isolation methods, and regularization-based methods. Reservoir methods use an additional data reservoir to store the most representative historical data and replay it while learning new tasks to alleviate forgetting. Parameter isolation trains distinct models for different tasks, but this leads to continual growth of the model size, which is not favourable for training large-scale systems. Regularization-based approaches aim to consolidate previous knowledge by introducing regularization terms in the loss when learning on new data.
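The reservoir idea can be sketched in a few lines. This is a minimal illustration, not the method of any particular paper: it uses classic uniform reservoir sampling (Algorithm R) as a stand-in for choosing "the most representative" records, and the function names `build_reservoir` and `replay_batch` are hypothetical.

```python
import random


def reservoir_update(reservoir, capacity, item, n_seen, rng):
    """Offer one record to the reservoir (Algorithm R).

    n_seen is the number of records seen so far, including this one.
    """
    if len(reservoir) < capacity:
        reservoir.append(item)
    else:
        # Keep the new record with probability capacity / n_seen.
        j = rng.randrange(n_seen)
        if j < capacity:
            reservoir[j] = item
    return reservoir


def build_reservoir(stream, capacity, seed=0):
    """Maintain a fixed-size uniform sample of a stream of historical records."""
    rng = random.Random(seed)
    reservoir = []
    for n_seen, item in enumerate(stream, start=1):
        reservoir_update(reservoir, capacity, item, n_seen, rng)
    return reservoir


def replay_batch(new_records, reservoir, n_replay, seed=0):
    """Mix new records with a few replayed historical ones (assumed mixing scheme)."""
    rng = random.Random(seed)
    replayed = rng.sample(reservoir, min(n_replay, len(reservoir)))
    return list(new_records) + replayed
```

A model trained on `replay_batch(...)` sees both fresh interactions and a sample of the past, which is the mechanism by which replay alleviates forgetting. The cost is the extra storage for the reservoir, which regularization-based methods avoid.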

**Knowledge distillation** techniques were initially proposed to transfer knowledge from a large, complex model into a smaller distilled model. Subsequently, it became common to use knowledge distillation to address catastrophic forgetting in incremental training. The essence of knowledge distillation for incremental learning is to pair a teacher model, trained on the data from the old tasks or historical data, with a student model, trained on the new tasks or new data. When training the student, a distillation term is added to the loss in order to retain the knowledge acquired by the teacher model.
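As a rough illustration of how a distillation term enters the student's loss, the sketch below uses the common soft-target formulation: a KL divergence between temperature-softened teacher and student predictions. The weight `lam`, the temperature, and all function names are assumptions for illustration and are not taken from the GraphSAIL description.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Penalize the student for drifting away from the teacher's soft predictions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student)


def total_loss(task_loss, teacher_logits, student_logits, lam=0.5, temperature=2.0):
    """Student objective: loss on the new data plus a weighted distillation term."""
    return task_loss + lam * distillation_loss(teacher_logits, student_logits, temperature)
```

The teacher is frozen during this step; only the student's parameters would receive gradients from `total_loss`.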

GraphSAIL mainly investigates the regularization-based approach. In contrast to existing knowledge distillation methods that focus only on the prediction or the intermediate activations, GraphSAIL explicitly distills knowledge in order to preserve the graph topological information learned by the teacher model and transfer it to the student model. Its distillation process has three components. First, a local structure distillation mechanism preserves a user’s long-term preferences and an item’s long-term characteristics by regularizing the local neighborhood representation of each node. Second, a novel global structure distillation strategy encodes the global position of each user and item node. Third, a general degree-aware self-embedding distillation component regularizes the learned user and item embeddings with a fixed quadratic constraint between the embedding learned on the historical data and the embedding learned from the new data. Self-embedding distillation prevents drastic changes to each individual embedding vector, while the local and global structure distillation modules preserve topological information by transferring knowledge in a way that explicitly accounts for the topological semantics.
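The three components can be sketched as loss terms over teacher and student embeddings. This is one plausible reading of the description above, not the authors' implementation: the exact formulas, the degree weighting in `degree_weight`, the use of dot products against a mean neighbor embedding, and the externally supplied anchor embeddings are all assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def degree_weight(old_degree, new_degree):
    """Hypothetical weight: nodes with more history are held closer to the teacher."""
    total = old_degree + new_degree
    return old_degree / total if total else 0.0


def self_embedding_loss(teacher_emb, student_emb, weights):
    """Weighted quadratic penalty between each old and new embedding."""
    total = 0.0
    for node, t_vec in teacher_emb.items():
        sq = sum((t - s) ** 2 for t, s in zip(t_vec, student_emb[node]))
        total += weights.get(node, 1.0) * sq
    return total / len(teacher_emb)


def local_structure_loss(teacher_emb, student_emb, neighbors):
    """Keep each node's similarity to its mean historical neighborhood stable."""
    total = 0.0
    for node, nbrs in neighbors.items():
        t_score = dot(teacher_emb[node], mean_vector([teacher_emb[n] for n in nbrs]))
        s_score = dot(student_emb[node], mean_vector([student_emb[n] for n in nbrs]))
        total += (t_score - s_score) ** 2
    return total / len(neighbors)


def anchor_distribution(vec, anchors):
    """Softmax over similarities to anchor embeddings: a node's 'global position'."""
    scores = [dot(vec, a) for a in anchors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def global_structure_loss(teacher_emb, student_emb, teacher_anchors, student_anchors, eps=1e-12):
    """Mean KL divergence between teacher and student anchor distributions."""
    total = 0.0
    for node in teacher_emb:
        p = anchor_distribution(teacher_emb[node], teacher_anchors)
        q = anchor_distribution(student_emb[node], student_anchors)
        total += sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    return total / len(teacher_emb)
```

In a full training loop these three terms would be added, each with its own coefficient, to the recommendation loss computed on the new data. The anchors could, for example, be cluster centers of the embeddings, though that choice is also an assumption here.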

<p><center><figure><img src='_images/L958254_1.png'><figcaption>Global structure distillation demonstration.</figcaption></figure></center></p>

<p><center><figure><img src='_images/L958254_2.png'><figcaption>The overall performance comparison. Results are averages of 10 repeated experiments. %improv is the relative improvement with respect to the fine-tune result. Training time is the average training time for one incremental block. ⋆ denotes p < 0.05 when performing the two-tailed pairwise t-test on GraphSAIL with the best baseline.</figcaption></figure></center></p>