Jiao Zhan, Yarong Luo, Chi Guo, Yejun Wu, Jingnan Liu
- The paper is under submission...
- 2023-4-27: We've uploaded the experiment results along with some code, and the full code will be released soon!
- 2023-9-15: We have uploaded part of the code, and the full code will be released soon!
Panoptic driving perception encompasses traffic object detection, drivable area segmentation, and lane detection. Existing methods typically utilize anchor-based multi-task learning networks to complete this task. While these methods yield promising results, they suffer from the inherent limitations of anchor-based detectors. In this paper, we propose YOLOPX, a simple and efficient anchor-free multi-task learning network for panoptic driving perception. To the best of our knowledge, this is the first work to employ the anchor-free detection head in panoptic driving perception. This anchor-free manner simplifies training by avoiding anchor-related heuristic tuning, and enhances the adaptability and scalability of our multi-task learning network. In addition, YOLOPX incorporates a novel lane detection head that combines multi-scale high-resolution features and long-distance contextual dependencies to improve segmentation performance. Beyond structure optimization, we propose optimization improvements to enhance network training, enabling our multi-task learning network to achieve optimal performance through simple end-to-end training. Experimental results on the challenging BDD100K dataset demonstrate the state-of-the-art (SOTA) performance of YOLOPX: it achieves 93.7% recall and 83.3% mAP50 on traffic object detection, 93.2% mIoU on drivable area segmentation, and 88.6% accuracy and 27.2% IoU on lane detection. Moreover, YOLOPX has faster inference speed compared to the lightweight network YOLOP. Consequently, YOLOPX is a powerful solution for panoptic driving perception problems. The code is available at https://github.com/jiaoZ7688/YOLOPX.
- We use BDD100K as our dataset, and experiments are run on an NVIDIA Tesla V100.
- Models are trained on the BDD100K train set and tested on the BDD100K val set.
Our network has excellent robustness and generalization. Even on a new dataset (KITTI) with different image sizes and application scenarios, it performs well, which is helpful for related research in SLAM.

Note: The raw videos come from KITTI.
The results of our experiments are as follows:
Note: The raw videos come from YOLOP and HybridNets.

The results of our experiments are as follows:
The results on the BDD100K val set.
We compare YOLOPX with the currently open-source YOLOP and HybridNets on an NVIDIA RTX 3080. In terms of real-time performance, we compare inference speed (excluding data pre-processing and NMS) at batch size 1.
Model | Backbone | Params | Speed (fps) | Anchor
---|---|---|---|---
YOLOP | CSPDarknet | 7.9M | 39 | √
HybridNets | EfficientNet | 12.8M | 17 | √
YOLOPX | ELANNet | 32.9M | 47 | ×
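The timing protocol above can be sketched as a small helper. This is an illustrative, framework-agnostic sketch, not code from the repo: the names `measure_fps`, `infer`, and `sync` are ours. With a PyTorch model on GPU, one would pass `sync=torch.cuda.synchronize` so queued kernels finish before the clock stops, and call the model inside `torch.no_grad()`.

```python
import time

def measure_fps(infer, sample, warmup=10, iters=100, sync=None):
    """Time forward passes only (no pre-processing, no NMS), batch size 1.

    infer  : callable running one forward pass on `sample`
    sync   : optional barrier (e.g. torch.cuda.synchronize) to flush
             asynchronously queued GPU work before reading the clock
    """
    for _ in range(warmup):          # warm-up runs to stabilize caches/clocks
        infer(sample)
    if sync:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        infer(sample)
    if sync:
        sync()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard div-by-zero
    return iters / elapsed           # frames per second
```

Excluding pre-processing and NMS isolates the network's own cost, which is what the table compares.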
```
├─inference
│  ├─image              # inference images
│  ├─image_output       # inference results
├─lib
│  ├─config/default     # configuration of training and validation
│  ├─core
│  │  ├─activations.py  # activation functions
│  │  ├─evaluate.py     # metric calculation
│  │  ├─function.py     # training and validation of the model
│  │  ├─general.py      # metric calculation, NMS, data-format conversion, visualization
│  │  ├─loss.py         # loss functions
│  │  ├─postprocess.py  # post-processing (refines da-seg and ll-seg, unrelated to the paper)
│  ├─dataset
│  │  ├─AutoDriveDataset.py  # superclass dataset, general functions
│  │  ├─bdd.py               # subclass dataset, specific functions
│  │  ├─convect.py
│  │  ├─DemoDataset.py       # demo dataset (image, video and stream)
│  ├─models
│  │  ├─YOLOP.py        # setup and configuration of the model
│  │  ├─YOLOX_Head.py   # YOLOX's decoupled head
│  │  ├─YOLOX_Loss.py   # YOLOX's detection loss
│  │  ├─commom.py       # calculation modules
│  ├─utils
│  │  ├─augmentations.py  # data augmentation
│  │  ├─autoanchor.py     # auto anchor (k-means)
│  │  ├─split_dataset.py  # (campus scene, unrelated to the paper)
│  │  ├─plot.py           # plot boxes and masks
│  │  ├─utils.py          # logging, device selection, time measurement, optimizer selection, model save/initialization, distributed training
│  ├─run
│  │  ├─dataset/training time  # visualization, logging and model saving
├─tools
│  │  ├─demo.py         # demo (folder, camera)
│  │  ├─test.py
│  │  ├─train.py
├─weights               # pre-trained models
```
This codebase has been developed with Python 3.7, PyTorch 1.12+ and torchvision 0.13+.

```shell
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
```

or

```shell
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
```

See requirements.txt for additional dependencies and version requirements.

```shell
pip install -r requirements.txt
```
You can get the pre-trained model from here. Extraction code: fvuc
For BDD100K: imgs, det_annot, da_seg_annot, ll_seg_annot
We recommend the dataset directory structure to be the following:
```
# The file id links each image to its annotations
├─dataset root
│  ├─images
│  │  ├─train
│  │  ├─val
│  ├─det_annotations
│  │  ├─train
│  │  ├─val
│  ├─da_seg_annotations
│  │  ├─train
│  │  ├─val
│  ├─ll_seg_annotations
│  │  ├─train
│  │  ├─val
```
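Since images and annotations are matched by file id, a quick sanity check of the layout can save a confusing failure at training time. The helper below is a hypothetical sketch (not part of the repo): it reports every image whose same-stem annotation file is missing from any of the three annotation directories.

```python
from pathlib import Path

# Annotation subdirectories from the recommended layout above.
ANNOT_DIRS = ("det_annotations", "da_seg_annotations", "ll_seg_annotations")

def find_unmatched(root, split="train"):
    """Return (image name, annotation dir) pairs lacking a same-stem file."""
    root = Path(root)
    missing = []
    for img in sorted((root / "images" / split).glob("*")):
        for ann in ANNOT_DIRS:
            # Match by stem (file id), regardless of extension (.json/.png).
            if not list((root / ann / split).glob(img.stem + ".*")):
                missing.append((img.name, ann))
    return missing
```

An empty return value means every image in the split has all three annotation files.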
Update your dataset path in ./lib/config/default.py.
Coming soon...
```shell
python tools/test.py --weights weights/epoch-195.pth
```
You can store images or videos in --source, and the inference results will be saved to --save-dir:

```shell
python tools/demo.py --weights weights/epoch-195.pth \
                     --source inference/image \
                     --save-dir inference/image_output \
                     --conf-thres 0.3 \
                     --iou-thres 0.45
```
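The two thresholds control detection filtering: --conf-thres drops low-confidence boxes, and --iou-thres is the overlap limit used by NMS to suppress duplicates. The sketch below illustrates that behavior on plain Python tuples; it is illustrative only (the repo's actual NMS lives in lib/core/general.py, and `filter_detections` is our name).

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def filter_detections(dets, conf_thres=0.3, iou_thres=0.45):
    """dets: list of (box, score). Confidence filter, then greedy NMS."""
    dets = [d for d in dets if d[1] >= conf_thres]          # --conf-thres
    dets.sort(key=lambda d: d[1], reverse=True)             # best first
    keep = []
    for box, score in dets:
        # Keep a box only if it overlaps no already-kept box too much.
        if all(iou(box, k[0]) < iou_thres for k in keep):   # --iou-thres
            keep.append((box, score))
    return keep
```

Raising --conf-thres trades recall for precision; lowering --iou-thres suppresses overlapping boxes more aggressively.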
YOLOPX is released under the MIT License.
Our work would not be complete without the wonderful work of the following authors: