Implementation of "How to Achieve High Classification Accuracy with Just a Few Labels: A Semi-supervised Approach Using Sampled Packets". For more information, see the paper.
Requires Python 3 and PyTorch.
The dataset is available in the QUIC Dataset.
Each flow is assumed to be stored as a text file with 4 columns: timestamp, relative time (measured from the first packet in the flow), packet length, and direction. Statistical features are calculated in the dataProcessInMemoryQUIC.py file.
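As a rough illustration of the input format described above, the following sketch parses one flow and computes a few simple statistics. It is not the code from dataProcessInMemoryQUIC.py: the whitespace delimiter, the integer encoding of direction, the names `Packet`, `parse_flow`, and `flow_statistics`, and the particular statistics chosen are all assumptions made for this example.

```python
import statistics
from typing import Dict, List, NamedTuple


class Packet(NamedTuple):
    timestamp: float      # absolute timestamp
    relative_time: float  # seconds since the first packet in the flow
    length: int           # packet length in bytes
    direction: int        # assumed to be an integer flag, e.g. 0 or 1


def parse_flow(text: str) -> List[Packet]:
    """Parse a flow given as text with 4 whitespace-separated columns (assumed)."""
    packets = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        ts, rel, length, direction = fields
        packets.append(Packet(float(ts), float(rel), int(length), int(direction)))
    return packets


def flow_statistics(packets: List[Packet]) -> Dict[str, float]:
    """Compute a few illustrative per-flow statistics (hypothetical feature set)."""
    if not packets:
        raise ValueError("flow contains no packets")
    lengths = [p.length for p in packets]
    return {
        "packet_count": len(packets),
        "duration": packets[-1].relative_time - packets[0].relative_time,
        "mean_length": statistics.mean(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "std_length": statistics.pstdev(lengths),
    }


# Made-up sample flow, for illustration only.
sample = (
    "1525000000.000 0.000 1350 1\n"
    "1525000000.020 0.020 60 0\n"
    "1525000000.050 0.050 1350 1\n"
)
stats = flow_statistics(parse_flow(sample))
```

If the real files use a different delimiter or direction encoding, only `parse_flow` would need to change.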
- Run dataProcessInMemoryQUIC.py to pre-process the data and calculate the statistical features.
- Run pre-training.py to train the model to predict statistical features.
- Run re-training to transfer the weights from the previous step and re-train the model to predict class labels.
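
The last two steps can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the code in this repository: the small fully connected `FlowNet`, the `train` helper, the dimensions, and the random tensors standing in for flows, statistical features, and labels are all hypothetical placeholders. It only shows the general pattern of pre-training on a regression target and then reusing the learned layers for classification.

```python
import torch
from torch import nn


class FlowNet(nn.Module):
    """Hypothetical model: a shared backbone plus a task-specific output head."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU())
        self.head = nn.Linear(32, out_dim)

    def forward(self, x):
        return self.head(self.backbone(x))


def train(model, x, y, loss_fn, epochs=5):
    """Run a few full-batch Adam updates and return the last loss value."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()


torch.manual_seed(0)
IN_DIM, N_STATS, N_CLASSES = 20, 6, 5  # placeholder sizes

# Step 1: pre-train on unlabeled flows to predict statistical features.
x_unlabeled = torch.randn(256, IN_DIM)    # random stand-in for sampled packets
stat_targets = torch.randn(256, N_STATS)  # random stand-in for statistical features
pretrained = FlowNet(IN_DIM, N_STATS)
train(pretrained, x_unlabeled, stat_targets, nn.MSELoss())

# Step 2: copy the backbone weights into a classifier with a fresh output head.
classifier = FlowNet(IN_DIM, N_CLASSES)
classifier.backbone.load_state_dict(pretrained.backbone.state_dict())
transferred = all(
    torch.equal(a, b)
    for a, b in zip(pretrained.backbone.parameters(), classifier.backbone.parameters())
)

# Step 3: re-train on the few labeled flows to predict class labels.
x_labeled = torch.randn(32, IN_DIM)
labels = torch.randint(0, N_CLASSES, (32,))
train(classifier, x_labeled, labels, nn.CrossEntropyLoss())
logits = classifier(x_labeled)
```

Only the backbone is copied here because the output size differs between the two tasks; whether the transferred layers are frozen or fine-tuned during re-training is a design choice this sketch leaves open (it fine-tunes all of them).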