Repository for Semi-Supervised Knowledge Amalgamation for Sequence Classification (SKA), AAAI 2021. This repository contains the implementation of the first solution for SKA, named Teacher Coordinator (TC).
- main.py : Main code for TC training
- model.py : Supporting models
- utils.py : Supporting functions
- requirements.txt : Library requirements
Prepared folders:
- data : contains training data
- teachers : contains teacher models
- output : directory for training logs and the final student model if saved
- plot : directory for training loss plots
Note that all datasets and the teacher models used in the paper can be found here: https://drive.google.com/drive/folders/19GdTofgJidCZG_1XWYetzchG_XMOvOiv?usp=sharing
Run script as:
python main.py -expname syn_exp1 -t_model ./teachers/exp1_syn_t1.sav ./teachers/exp1_syn_t2.sav \
-t_numclass 4 4 -t_class 1 2 3 4 3 4 5 6 -s_class 1 2 3 4 5 6 -data ./data/SYN/syn_test.txt
Parameters:
Required:
- -t_model : a list of paths of teacher models
- -t_numclass : the number of classes of each teacher, in the same order as t_model
- -t_class : a list of the specialized classes of each teacher, concatenated in the same order as t_model, e.g., if t1_class is 1 2 3 4 and t2_class is 3 4 5 6, then t_class is 1 2 3 4 3 4 5 6
- -s_class : a list of comprehensive classes of the student
- -data : the path of student training data file
- -expname : experiment name
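The relationship between -t_class and -t_numclass can be illustrated with a minimal sketch. It assumes the flat -t_class list is cut into consecutive chunks whose sizes are given by -t_numclass; the helper name split_teacher_classes is hypothetical and is not taken from main.py or utils.py.

```python
def split_teacher_classes(t_class, t_numclass):
    """Cut the flat class list into one list per teacher (hypothetical helper)."""
    if sum(t_numclass) != len(t_class):
        raise ValueError("sum of t_numclass must equal the length of t_class")
    groups, start = [], 0
    for n in t_numclass:
        # Each teacher owns the next n entries of the flat list.
        groups.append(t_class[start:start + n])
        start += n
    return groups


# Values from the example command: two teachers with four classes each.
per_teacher = split_teacher_classes([1, 2, 3, 4, 3, 4, 5, 6], [4, 4])
print(per_teacher)
```

With the values from the example command, the first teacher is assigned classes 1 to 4 and the second classes 3 to 6, so classes 3 and 4 are shared.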
Student network:
- -lr : learning rate, default 0.01
- -ep : epochs, default 200
- -bs : batch size, default 8
- -layers : #layers, default 2
- -hiddim : #hidden units, default 8
TTL network:
- -lrTTL : learning rate, default 0.01
- -epTTL : epochs, default 500
- -bsTTL : batch size, default 8
- -layersTTL : #layers, default 2
- -hiddimTTL : #hidden units, default 8
Others:
- -inputsize : #features, default 1
- -seed : set seed for reproduction, default 0
- -plabel : proportion of available labeled data (range = [0,1]), default 0.02
- --save : boolean flag, whether to save the student model, default false
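As an illustration of how list-valued flags such as -t_model and -t_class are read from the command line, here is a minimal argparse sketch covering a subset of the parameters above. It is an assumption about how such flags could be declared, not a copy of main.py, and the actual declarations there may differ.

```python
import argparse


def build_parser():
    # Hypothetical declarations; names and defaults follow the parameter list above.
    parser = argparse.ArgumentParser(description="Sketch of the TC command line")
    parser.add_argument("-expname", required=True)
    parser.add_argument("-t_model", nargs="+", required=True)
    parser.add_argument("-t_numclass", nargs="+", type=int, required=True)
    parser.add_argument("-t_class", nargs="+", type=int, required=True)
    parser.add_argument("-s_class", nargs="+", type=int, required=True)
    parser.add_argument("-data", required=True)
    parser.add_argument("-lr", type=float, default=0.01)
    parser.add_argument("-ep", type=int, default=200)
    parser.add_argument("-plabel", type=float, default=0.02)
    parser.add_argument("-seed", type=int, default=0)
    parser.add_argument("--save", action="store_true")
    return parser


# Parse the example command from this README (without the leading "python main.py").
command = (
    "-expname syn_exp1 "
    "-t_model ./teachers/exp1_syn_t1.sav ./teachers/exp1_syn_t2.sav "
    "-t_numclass 4 4 -t_class 1 2 3 4 3 4 5 6 "
    "-s_class 1 2 3 4 5 6 -data ./data/SYN/syn_test.txt"
)
args = build_parser().parse_args(command.split())
print(args.t_model, args.t_numclass, args.t_class, args.plabel, args.save)
```

Because --save is declared as a store_true flag in this sketch, it takes no value: the student model is saved only when the flag is present.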
We experimented with two Teacher models on the SYN dataset with 2% labeling. The values computed at each step match the intuition behind the method, as the three case studies below show.
In this experiment, we have access to two Teachers. Teacher 1 (T1) specializes in Classes A, B, C, and D while Teacher 2 (T2) specializes in Classes C, D, E, and F. Their expertise overlaps only on Classes C and D.
First, we showcase the Teacher Trust Learner (TTL) overcoming an overconfident teacher via an example from Class E, on which only T2 is an expert. T1 predicts P(y_j | y_j \in Y_k, X) = [0, 0, .99, 0, 0, 0] (a confidently wrong prediction of Class C), while T2 predicts [0, 0, 0, 0, .99, 0] (a confidently correct prediction of Class E). Then, the TTL predicts P(y_j \in Y_k | X) = [.27, .73], correctly indicating that T2 should be trusted more than T1. Finally, rescaling via P(y_j | y_j \in Y_k, X) P(y_j \in Y_k | X) and combining the teachers' predictions gives .27[0, 0, .99, 0, 0, 0] + .73[0, 0, 0, 0, .99, 0] = [0, 0, .27, 0, .72, 0], which serves as the surrogate target for the student network. This example shows the TTL overcoming an overconfident teacher (T1) to provide a good surrogate target.
Second, we showcase the TTL preserving accurate predictions for an instance on which both Teachers are experts. On an instance from Class D, T1 predicts P(y_j | y_j \in Y_k, X) = [0, 0, 0, .99, 0, 0] (correct), and T2 also predicts [0, 0, 0, .99, 0, 0] (correct). Then, the TTL predicts P(y_j \in Y_k | X) = [.5, .5], correctly indicating that the teachers should be trusted equally. Finally, rescaling and combining the teachers' predictions yields the ideal surrogate label: [0, 0, 0, .99, 0, 0].
Third, we showcase an instance from Class A, on which Teacher 1 is an expert but Teacher 2 is neither an expert nor confident. T1 predicts P(y_j | y_j \in Y_k, X) = [.99, 0, 0, 0, 0, 0] (correct), while T2 predicts [0, 0, .63, .02, 0, .34] (incorrect). Then, the TTL predicts P(y_j \in Y_k | X) = [.72, .28], correctly indicating that T1 should be trusted more than T2. Finally, the surrogate label after rescaling and combining the teachers' predictions is [.71, 0, .18, .01, 0, .10], which steers the student model in the right direction.
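The arithmetic in the three case studies can be reproduced with a short sketch. It assumes each teacher's prediction is already expressed over the six student classes (A to F), with zeros for classes outside its specialty, and that the surrogate target is the trust-weighted sum of those predictions. The function name surrogate_target is illustrative and is not taken from the repository code.

```python
def surrogate_target(teacher_probs, trust):
    """Weight each teacher's class probabilities by its TTL trust and sum them."""
    n_classes = len(teacher_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(trust, teacher_probs))
        for c in range(n_classes)
    ]


# Case 1: instance from Class E, T1 is confidently wrong, T2 is correct.
case_e = surrogate_target(
    [[0, 0, .99, 0, 0, 0], [0, 0, 0, 0, .99, 0]], [.27, .73]
)
# Case 2: instance from Class D, both teachers are correct and trusted equally.
case_d = surrogate_target(
    [[0, 0, 0, .99, 0, 0], [0, 0, 0, .99, 0, 0]], [.5, .5]
)
# Case 3: instance from Class A, T1 is correct, T2 is unsure and wrong.
case_a = surrogate_target(
    [[.99, 0, 0, 0, 0, 0], [0, 0, .63, .02, 0, .34]], [.72, .28]
)

for name, target in [("E", case_e), ("D", case_d), ("A", case_a)]:
    # Round to two decimals, as in the text.
    print(name, [round(p, 2) for p in target])
```

In all three cases the largest entry of the surrogate target falls on the true class, even when one teacher is confidently wrong.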