SL-DDBD: A Novel Driver Distraction Behavior Detection Method Based on Self-supervised Learning with Masked Image Modeling
In this work, we propose a novel method for driver distraction behavior detection, termed SL-DDBD. It uses a self-supervised learning framework based on masked image modeling, with a structure-improved Swin Transformer as the encoder. We extended the SF3 dataset using multiple data augmentation strategies and selected the best masking strategy.
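To illustrate the masked-image-modeling idea (a sketch, not the repository's exact implementation), a random patch-mask generator for a 224x224 input with 32x32 patches and a 0.5 mask ratio, matching the `patchsize32`/`ratio0.5`/`img224` values in the config names below, might look like:

```python
import random

def random_patch_mask(img_size=224, patch_size=32, mask_ratio=0.5, seed=None):
    """Generate a binary mask over the patch grid; 1 = patch is masked."""
    rng = random.Random(seed)
    n_patches = (img_size // patch_size) ** 2   # 7 x 7 = 49 patches
    n_masked = int(n_patches * mask_ratio)      # 24 patches masked
    mask = [0] * n_patches
    for idx in rng.sample(range(n_patches), n_masked):
        mask[idx] = 1
    return mask

mask = random_patch_mask(seed=0)
print(sum(mask), len(mask))  # 24 of 49 patches masked
```

During pre-training, the encoder only sees the visible patches (or mask tokens in their place) and is trained to reconstruct the pixels of the masked ones.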
More details can be found in our arXiv paper.
In this work, four models were compared for their performance over different epochs. Improved+DA exhibited the fastest convergence and the highest accuracy at each epoch, achieving 78% accuracy by the 10th epoch and a final accuracy of 99.60%. Conversely, the ViT model, despite fast convergence, only managed a final accuracy of 74.35%. The Improved model consistently surpassed the baseline, underscoring the benefits of optimization.
To investigate the advantages of self-supervised learning based on masked image modeling, we visualize the attention of the self-supervised model alongside that of the supervised model. The self-supervised model's attention focuses on the key parts of the scene objects and captures feature information better, avoiding feature redundancy and excessive computational cost.
name | pre-train epochs | pre-train resolution | fine-tune resolution | acc@1 | pre-trained model
---|---|---|---|---|---
SLDDBD-Base | 110 | 224x224 | 224x224 | 84.92 | google/config
SwinMIM-Large | 800 | 224x224 | 224x224 | 85.4 | google/config
The requirements are listed in the requirements.txt file. To create your own environment, an example is:
pip install torchvision==0.8.2 torchaudio==0.7.2 timm==0.4.9 opencv-python==4.4.0.46 termcolor==1.1.0 yacs==0.1.8 diffdist
pip install -r requirements.txt
cd apex
pip install -v --disable-pip-version-check --no-cache-dir ./
cd ..
export MASTER_ADDR=localhost
export MASTER_PORT=5678
This work used the State Farm dataset which can be downloaded from this Kaggle competition.
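The State Farm images come organized into ten class folders (c0–c9). Assuming the repository expects an ImageFolder-style layout with `train/` and `val/` subdirectories under `--data-path` (an assumption; adapt to your setup), a minimal sketch for splitting each class folder into train and validation sets is:

```python
import os
import random
import shutil

def split_dataset(src_dir, dst_dir, val_frac=0.2, seed=0):
    """Copy class folders (c0..c9) from src_dir into dst_dir/train and dst_dir/val.

    A fraction val_frac of each class goes to val/; the rest goes to train/.
    """
    rng = random.Random(seed)
    for cls in sorted(os.listdir(src_dir)):
        files = sorted(os.listdir(os.path.join(src_dir, cls)))
        rng.shuffle(files)
        n_val = int(len(files) * val_frac)
        for split, names in (("val", files[:n_val]), ("train", files[n_val:])):
            out = os.path.join(dst_dir, split, cls)
            os.makedirs(out, exist_ok=True)
            for name in names:
                shutil.copy(os.path.join(src_dir, cls, name), out)
```

Splitting per class keeps the class distribution balanced between the two splits.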
To continue training from a pretrained model of our work, an example is:
python main.py --cfg configs/SLDDBD_patchsize32_swin_ratio0.5_img224_statefarm_110ep.yaml --pretrained SLDDBD_patchsize32_swin_ratio0.5_img224_statefarm_110ep.pth --data-path dataset --local_rank 0 --batch-size 32
To start a new training run from the MIM pretrained model, an example is:
python main.py --cfg configs/MIM_finetune__swin_large__img224_window14__800ep.yaml --pretrained MIM_finetune__swin_large__img224_window14__800ep.pth --data-path dataset --local_rank 0 --batch-size 32
The evaluation configurations can be adjusted in main_eval.py. The confusion matrix results are saved in the confusion matrix folder.
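For reference, a confusion matrix can be built from per-image predictions with a few lines of plain Python; this is an illustrative sketch for the ten State Farm classes, not the evaluation script's own code:

```python
def confusion_matrix(y_true, y_pred, n_classes=10):
    """Rows are the true class, columns are the predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

# Toy example with 3 classes: one sample of class 1 is misclassified as 2.
cm = confusion_matrix([0, 1, 1, 2], [0, 1, 2, 2], n_classes=3)
```

The diagonal holds the correctly classified counts, so per-class accuracy is the diagonal entry divided by its row sum.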
cd eval
python main_eval.py --eval --cfg configs/SLDDBD_patchsize32_swin_ratio0.5_img224_statefarm_110ep.yaml --resume ./SLDDBD_patchsize32_swin_ratio0.5_img224_statefarm_110ep.pth --local_rank 0 --data-path dataset
Print the detection results of the model weights and write them to a txt file.
cd eval
python inference.py --cfg configs/SLDDBD_patchsize32_swin_ratio0.5_img224_statefarm_110ep.yaml --resume ./SLDDBD_patchsize32_swin_ratio0.5_img224_statefarm_110ep.pth --local_rank 0
You can modify the inference code to customize the inference categories and data input path.
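When customizing the categories, the predicted class indices correspond to the ten behavior labels defined by the State Farm dataset. A small helper for mapping an index to its label (illustrative only, not part of inference.py) might be:

```python
# The ten distraction classes as defined by the State Farm dataset.
STATE_FARM_CLASSES = {
    "c0": "safe driving",
    "c1": "texting - right",
    "c2": "talking on the phone - right",
    "c3": "texting - left",
    "c4": "talking on the phone - left",
    "c5": "operating the radio",
    "c6": "drinking",
    "c7": "reaching behind",
    "c8": "hair and makeup",
    "c9": "talking to passenger",
}

def label_of(pred_index):
    """Map a predicted class index (0-9) to its behavior description."""
    return STATE_FARM_CLASSES["c%d" % pred_index]
```

This keeps the human-readable labels in one place if you change the output format of the txt file.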
If you are interested in this work, please cite the following work:
@article{zhang2023novel,
title={A Novel Driver Distraction Behavior Detection Method Based on Self-Supervised Learning With Masked Image Modeling},
author={Zhang, Yingzhi and Li, Taiguo and Li, Chao and Zhou, Xinghong},
journal={IEEE Internet of Things Journal},
year={2023},
publisher={IEEE}
}
Our work is based on Swin Transformer, and we appreciate its open-source repository.