Reproduction of "VID" (Variational Information Distillation), part of https://github.com/rp12-study/rp12-hub
- Pros
- Cons
- python==3.x
- tensorflow>=1.13.0
- scipy
- I found the author's code at https://github.com/ssahn0215/variational-information-distillation, but I will not refer to it, because I want to check the reproducibility of the paper itself. (I don't know why, but the author has since deleted the repository.)
- My experimental results are higher than those in the paper. I found it surprisingly hard to reach performance as low as reported. To do so, I removed the gamma and the regularization of batch normalization, and modified hyper-parameters to make training unstable.
- The authors state: "We choose four pairs of intermediate layers similarly to [31], each of which is located at the end of a group of residual blocks." However, WResNet has only three groups of residual blocks, so I take one additional feature map right after the first convolutional layer.
- I will not follow the authors' configurations for the comparative methods, because their modifications look somewhat awkward, unfair, and inconsistent with the original proposals. I also think a fair comparison should not modify the original authors' configurations, whether they are good or not. In other words, I only reproduce the authors' own method, VID.
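To make the "removed gamma" trick above concrete, here is a minimal NumPy sketch of batch normalization with the learnable scale (gamma) dropped, keeping only the shift (beta). The function name and the `eps` default are my own, not taken from any released code:

```python
import numpy as np

def batch_norm_no_gamma(x, beta, eps=1e-5):
    """Batch normalization without the learnable scale (gamma).

    x:    (N, C) batch of features
    beta: (C,) learnable shift; no gamma, so nothing for weight
          decay to regularize on the scale side
    """
    mean = x.mean(axis=0)                 # per-feature batch mean
    var = x.var(axis=0)                   # per-feature batch variance
    return (x - mean) / np.sqrt(var + eps) + beta
```

In TF 1.x the same effect can be obtained by constructing the batch-norm layer with its scale parameter disabled and no regularizer attached to the remaining parameters.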
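For reference, the per-layer VID objective can be sketched as the Gaussian negative log-likelihood of a teacher feature map given the student's regressed mean, with a learned per-channel variance. This is a minimal NumPy sketch under my own reading of the paper; the function names, shapes, and the `eps` constant are my assumptions, not the author's implementation:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def vid_loss(teacher_feat, student_mean, log_scale):
    """Per-layer VID loss: -log p(t | s) under a Gaussian model.

    p(t | s) has mean given by a regressor on student features
    (student_mean) and a learned per-channel variance
    sigma_c = softplus(log_scale_c) + eps. Minimizing this is
    maximizing a variational lower bound on the mutual information
    between teacher and student features.

    teacher_feat: (N, H, W, C) teacher feature map
    student_mean: (N, H, W, C) regressor output from the student
    log_scale:    (C,) unconstrained per-channel variance parameters
    """
    eps = 1e-6
    var = softplus(log_scale) + eps       # (C,), broadcast over N, H, W
    nll = 0.5 * (np.log(var) + (teacher_feat - student_mean) ** 2 / var)
    return nll.mean()
```

The constant terms of the Gaussian log-density are dropped, since they do not affect the gradients.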
Methods | Full (Last Acc.) | Full (Paper Acc.) | 20% (Last Acc.) | 20% (Paper Acc.) | 10% (Last Acc.) | 10% (Paper Acc.) | 2% (Last Acc.) | 2% (Paper Acc.)
---|---|---|---|---|---|---|---|---
Student | 91.22 | 90.72 | 84.85 | 84.67 | 80.29 | 79.63 | 58.11 | 58.84
Teacher | 94.98 | 94.26 | - | - | - | - | - | -
KD | 90.60 | 91.27 | 84.13 | 86.11 | 78.57 | 82.23 | 59.63 | 64.24
FitNet | 91.61 | 90.64 | 86.24 | 84.78 | 82.74 | 80.73 | 56.69 | 68.90
AT | 91.85 | 91.60 | 87.60 | 87.26 | 84.70 | 84.94 | 74.57 | 73.40
VID | 91.85 | 89.73 | 88.09 | 81.59 | | | |
Experimental results on the full dataset
- Check correctness of VID implementation and do experiments
- edit README