A project focused on autonomous vehicle perception. This repository serves as a starting point for implementing object detection and depth estimation in autonomous systems using the YOLO architecture and Vision Transformers.
AutoPercept uses YOLO (You Only Look Once) for real-time object detection and tracking, and a Vision Transformer, MiDaS, for monocular depth estimation. The detection model is trained on the KITTI dataset, enabling accurate detection and depth estimation of objects in various driving scenarios. The open-source YOLOv8 weights from Ultralytics were used as the starting point for object detection training. Depth estimation is zero-shot: MiDaS has already been trained on the KITTI dataset, so no further training is required (you can learn more about MiDaS at https://github.com/isl-org/MiDaS).
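As a rough illustration of how the two models fit together, the sketch below loads YOLOv8 weights through the `ultralytics` package and the MiDaS model through `torch.hub`, then runs both on a single frame. It is a minimal sketch, not the repository's actual pipeline, and file names such as `Model_Weights/best.pt` and `kitti_frame.png` are placeholders.

```python
# Minimal sketch: YOLOv8 detection + MiDaS zero-shot depth on one frame.
# Assumes `pip install ultralytics torch opencv-python timm`; paths are placeholders.
import cv2
import torch
from ultralytics import YOLO

device = "cuda" if torch.cuda.is_available() else "cpu"

# Object detection with custom-trained YOLOv8 weights (placeholder path).
detector = YOLO("Model_Weights/best.pt")

# MiDaS depth model and its matching input transform from torch.hub.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").to(device).eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

frame = cv2.imread("kitti_frame.png")            # BGR image, placeholder file
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

# 1) Detections: boxes, classes, confidences for the frame.
detections = detector(frame)[0]

# 2) Relative (inverse) depth map for the same frame.
with torch.no_grad():
    pred = midas(transform(rgb).to(device))
    depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=rgb.shape[:2], mode="bicubic", align_corners=False
    ).squeeze().cpu().numpy()

print(detections.boxes.xyxy, depth.shape)
```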
- YOLO Object Detection: Real-time detection of objects using the YOLO architecture.
- Real-Time Counting: Counts and displays the number of detections for each class (see the counting sketch after this list).
- Pythonic UI: AutoPercept can also be used as a wrapper or interface to run inference on videos with custom model weights.
- Depth Estimation: Monocular depth estimation using MiDaS, a Vision Transformer.
- Saveable Results: Saves the video with the detections to a given directory under a given name.
- KITTI Dataset: Trained on the KITTI dataset, which includes the object categories commonly encountered in autonomous driving scenarios.
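The per-class counting can be reproduced directly from YOLO's results object. A minimal sketch, assuming an `ultralytics` Results instance named `detections` like the one in the snippet above:

```python
# Sketch: count detections per class for a single frame (assumes an
# ultralytics Results object called `detections`, as in the snippet above).
from collections import Counter

class_ids = detections.boxes.cls.int().tolist()           # class index per box
counts = Counter(detections.names[i] for i in class_ids)  # e.g. {"Car": 4, "Pedestrian": 2}

for name, n in counts.items():
    print(f"{name}: {n}")
```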
- Clone the Repository:

      git clone https://github.com/yourusername/AutoPercept.git
      cd AutoPercept

- Setup Environment:

      # Install required dependencies
      pip install -r requirements.txt

- Running the Gooey App:

      python AutoPercept.py

  or

      python ViT.py
- Specifying Pre-inference Parameters
  - NOTE: Trained weights can be found in the "Model_Weights" directory
- After specifying the pre-inference parameters, hit the "Start" button to begin inference
- A new window will open showing the detections being made for each frame of the video
- Once inference is complete, the video is saved in the 'output_path' directory (a rough sketch of this workflow follows below)
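The workflow above can be approximated with a short sketch: a Gooey-decorated entry point that collects the pre-inference parameters, and a loop that runs detection on each frame, displays it, and writes the annotated video. This is illustrative only, not the exact code in `AutoPercept.py`; the parameter names (`weights`, `video_path`, `output_path`, `output_name`) only loosely mirror the UI fields.

```python
# Illustrative sketch of the Gooey app's flow; not the exact AutoPercept.py code.
import os
import cv2
from gooey import Gooey, GooeyParser
from ultralytics import YOLO

def run_inference(weights, video_path, output_path, output_name):
    """Run YOLO on every frame, display it, and save the annotated video."""
    model = YOLO(weights)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    out = cv2.VideoWriter(os.path.join(output_path, output_name),
                          cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        annotated = model(frame)[0].plot()      # frame with boxes and labels drawn
        cv2.imshow("AutoPercept", annotated)
        out.write(annotated)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    out.release()
    cv2.destroyAllWindows()

@Gooey(program_name="AutoPercept")
def main():
    parser = GooeyParser(description="Run object detection on a video")
    parser.add_argument("weights", widget="FileChooser", help="Path to model weights (.pt)")
    parser.add_argument("video_path", widget="FileChooser", help="Input video")
    parser.add_argument("output_path", widget="DirChooser", help="Directory for the saved video")
    parser.add_argument("output_name", help="File name for the saved video, e.g. result.mp4")
    args = parser.parse_args()
    run_inference(args.weights, args.video_path, args.output_path, args.output_name)

if __name__ == "__main__":
    main()
```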
- Precision, Recall, Mean Average Precision@IoU=0.5 and Mean Average Precision@IoU=0.5-0.95 for each class over the validation dataset
- Model Training Performance
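If you want to reproduce these metrics yourself, the `ultralytics` validation API exposes them directly. A short sketch, assuming a KITTI-format dataset config; "kitti.yaml" is a placeholder name, not a file shipped with this repository:

```python
# Sketch: compute Precision, Recall, mAP@0.5 and mAP@0.5-0.95 over a validation split.
from ultralytics import YOLO

model = YOLO("Model_Weights/best.pt")
metrics = model.val(data="kitti.yaml", split="val")

print("Precision:    ", metrics.box.mp)     # mean precision over classes
print("Recall:       ", metrics.box.mr)     # mean recall over classes
print("mAP@0.5:      ", metrics.box.map50)
print("mAP@0.5-0.95: ", metrics.box.map)
```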
- Monocular Depth Estimation: I aim to add simultaneous depth estimation and object detection very soon. (This functionality has been added.)
- FPS Optimization: The model was running at a mediocre FPS. (This was caused by a bug that prevented the program from utilizing the GPU; it has now been fixed.)
- Quantization: Currently, the MiDaS depth estimation model runs at 1-10 FPS depending on the image resolution. To make inference faster, I plan to add 4-bit quantization, converting the model weights from their 32-bit floating-point representation to a 4-bit one.
- Simultaneous Localization and Mapping (SLAM): Since the KITTI dataset also contains LiDAR and 3D point cloud data, it will be possible to add functionality for visualizing SLAM processes in real time.
- Object Tracking and Projection: Using a Kalman filter, I am currently working on methods to track detected objects' movements and visualize them in real time (a minimal sketch follows this list).
- YOLOv10: Since this project began, YOLOv10 has been released. Ultralytics state that this new model beats all SOTA object detection benchmarks. I will soon add support for YOLOv10 inference.
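As a preview of the tracking direction mentioned above, here is a minimal constant-velocity Kalman filter over a bounding-box centre, written with plain NumPy. It illustrates the idea only and is not the project's tracker; all matrices and the example measurements are illustrative.

```python
# Sketch: constant-velocity Kalman filter for a bounding-box centre (cx, cy).
# State is [cx, cy, vx, vy]; measurements are [cx, cy]. Illustrative only.
import numpy as np

dt = 1.0                                          # one frame per step
F = np.array([[1, 0, dt, 0],                      # state transition
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],                       # we only observe position
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-2                              # process noise
R = np.eye(2) * 1.0                               # measurement noise

x = np.zeros((4, 1))                              # initial state
P = np.eye(4) * 10.0                              # initial covariance

def kf_step(x, P, z):
    """Predict, then correct with measurement z = [[cx], [cy]]."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = z - H @ x                                 # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Example: feed the centre of one tracked object's detection per frame.
for cx, cy in [(100, 200), (103, 198), (107, 197)]:
    x, P = kf_step(x, P, np.array([[cx], [cy]], dtype=float))
print("Estimated position/velocity:", x.ravel())
```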