This is a PyTorch implementation of Nicholas Frosst and Geoffrey Hinton's "Distilling a Neural Network Into a Soft Decision Tree".
The soft decision tree in this implementation performs inference by averaging the distributions over all leaves, weighted by their respective path probabilities.
For inference that instead follows the single path of greatest probability, please see https://github.com/kimhc6028/soft-decision-tree.
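The weighted-average inference described above can be sketched as follows. This is a minimal illustration, not the code from this repo: the function name, the level-by-level parameter layout (a full binary tree with `2**depth - 1` inner nodes and `2**depth` leaves), and all tensor names are assumptions made for clarity.

```python
import torch

def soft_tree_inference(x, inner_weights, inner_biases, leaf_logits):
    """Predict by averaging leaf distributions, weighted by path probabilities.

    Illustrative sketch (not this repo's API):
      x             -- (batch, features) input
      inner_weights -- (2**depth - 1, features) inner-node filters, level by level
      inner_biases  -- (2**depth - 1,) inner-node biases
      leaf_logits   -- (2**depth, classes) learnable leaf distributions (as logits)
    """
    batch = x.shape[0]
    num_leaves = leaf_logits.shape[0]
    depth = num_leaves.bit_length() - 1  # full binary tree
    # Probability of reaching each node at the current level; start at the root.
    path_prob = torch.ones(batch, 1)
    for level in range(depth):
        start, end = 2**level - 1, 2**(level + 1) - 1
        # Each inner node routes right with probability sigmoid(w.x + b).
        p_right = torch.sigmoid(x @ inner_weights[start:end].T + inner_biases[start:end])
        # Split each node's probability between its left and right child.
        path_prob = torch.stack(
            [path_prob * (1 - p_right), path_prob * p_right], dim=2
        ).reshape(batch, -1)
    # Final prediction: leaf distributions averaged by path probability.
    leaf_dist = torch.softmax(leaf_logits, dim=1)   # (num_leaves, classes)
    return path_prob @ leaf_dist                    # (batch, classes)
```

Because the path probabilities at the leaf level sum to 1 for every input, the output is itself a valid distribution over classes.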
Original paper: https://arxiv.org/pdf/1711.09784.pdf
The image below shows the logic for building a soft decision tree, where nodes are indexed by (layer, position).
Without extensive hyperparameter tuning, a tree of depth 8 was able to reach ~95% accuracy on the MNIST dataset.
Feel free to ask questions, and please star this repo if you find it useful.