Comprehensive Analysis of Model Errors in Blueberry Detection and Maturity Classification: Identifying Limitations and Proposing Future Improvements in Agricultural Monitoring.
Repository for the code used in the paper
@Article{agriculture14010018,
AUTHOR = {Aguilera, Cristhian A. and Figueroa-Flores, Carola and Aguilera, Cristhian and Navarrete, Cesar},
TITLE = {Comprehensive Analysis of Model Errors in Blueberry Detection and Maturity Classification: Identifying Limitations and Proposing Future Improvements in Agricultural Monitoring},
JOURNAL = {Agriculture},
VOLUME = {14},
YEAR = {2024},
NUMBER = {1},
ARTICLE-NUMBER = {18},
URL = {https://www.mdpi.com/2077-0472/14/1/18},
ISSN = {2077-0472},
ABSTRACT = {In blueberry farming, accurately assessing maturity is critical to efficient harvesting. Deep Learning solutions, which are increasingly popular in this area, often undergo evaluation through metrics like mean average precision (mAP). However, these metrics may only partially capture the actual performance of the models, especially in settings with limited resources like those in agricultural drones or robots. To address this, our study evaluates Deep Learning models, such as YOLOv7, RT-DETR, and Mask-RCNN, for detecting and classifying blueberries. We perform these evaluations on both powerful computers and embedded systems. Using Type-Influence Detector Error (TIDE) analysis, we closely examine the accuracy of these models. Our research reveals that partial occlusions commonly cause errors, and optimizing these models for embedded devices can increase their speed without losing precision. This work improves the understanding of object detection models for blueberry detection and maturity estimation.},
DOI = {10.3390/agriculture14010018}
}
- Docker
- nvidia-docker
Build the Docker image.
export WORKSPACE=/absolute_path_to_this_folder
docker build -t blueberry_detection -f docker/Dockerfile .
First, launch the Docker container. All subsequent commands are run inside the container.
# shell 1
sh docker/run.sh
# shell 2
docker exec -it blueberry_detection /bin/bash
Second, download the dataset and the models to be evaluated.
sh download_dataset.sh
sh download_models.sh
For each architecture, three models were trained and their metrics were averaged. The reported detection metrics are:
Model | Class | Precision | Recall | F1 | mAP75 |
---|---|---|---|---|---|
YOLOv7-tiny | Ripe | 0.547 | 0.387 | 0.443 | 0.330 |
YOLOv7-tiny | Pint | 0.568 | 0.433 | 0.489 | 0.364 |
YOLOv7-tiny | Unripe | 0.485 | 0.323 | 0.388 | 0.231 |
YOLOv7-tiny | All | 0.533 | 0.380 | 0.443 | 0.309 |
YOLOv7-default | Ripe | 0.626 | 0.456 | 0.528 | 0.435 |
YOLOv7-default | Pint | 0.641 | 0.508 | 0.567 | 0.432 |
YOLOv7-default | Unripe | 0.605 | 0.415 | 0.492 | 0.348 |
YOLOv7-default | All | 0.624 | 0.46 | 0.530 | 0.405 |
YOLOv7-w6 | Ripe | 0.598 | 0.500 | 0.544 | 0.445 |
YOLOv7-w6 | Pint | 0.631 | 0.494 | 0.554 | 0.431 |
YOLOv7-w6 | Unripe | 0.591 | 0.457 | 0.516 | 0.381 |
YOLOv7-w6 | All | 0.607 | 0.484 | 0.539 | 0.419 |
RT-DETR-L | Ripe | 0.606 | 0.429 | 0.502 | 0.393 |
RT-DETR-L | Pint | 0.627 | 0.329 | 0.431 | 0.282 |
RT-DETR-L | Unripe | 0.544 | 0.380 | 0.447 | 0.289 |
RT-DETR-L | All | 0.592 | 0.380 | 0.462 | 0.321 |
Mask-RCNN | Ripe | 0.612 | 0.490 | 0.543 | 0.447 |
Mask-RCNN | Pint | 0.680 | 0.574 | 0.622 | 0.558 |
Mask-RCNN | Unripe | 0.582 | 0.488 | 0.530 | 0.426 |
Mask-RCNN | All | 0.625 | 0.518 | 0.565 | 0.477 |
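The averaging over the three trained models can be sketched as below. This is only an illustration: the helper names `f1_score` and `average_runs` are hypothetical, the numbers are placeholders rather than results from the paper, and computing F1 per model before averaging is an assumption about the procedure.

```python
from statistics import mean


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_runs(runs):
    """Average every metric across runs; each run maps metric name -> value."""
    return {key: mean(run[key] for run in runs) for key in runs[0]}


# Placeholder values for three hypothetical training runs of one architecture.
runs = [
    {"precision": 0.50, "recall": 0.40},
    {"precision": 0.60, "recall": 0.30},
    {"precision": 0.55, "recall": 0.35},
]

# Assumption: F1 is computed per run and then averaged like the other metrics.
for run in runs:
    run["f1"] = f1_score(run["precision"], run["recall"])

averaged = average_runs(runs)
print({name: round(value, 3) for name, value in averaged.items()})
```

Averaging per-run F1 values generally gives a slightly different number than computing F1 from the averaged precision and recall, which may explain small mismatches if you recompute F1 from the table.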
You can reproduce the results by running the following commands.
python3 test.py --weights /workspace/models/blueberry_yolov7tiny/model1.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode yolov7 --no-tide
python3 test.py --weights /workspace/models/blueberry_yolov7tiny/model_best.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode yolov7 --no-tide
python3 test.py --weights /workspace/models/blueberry_yolov7tiny/model3.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode yolov7 --no-tide
python3 test.py --weights /workspace/models/blueberry_yolov7/model_best.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode yolov7 --no-tide
python3 test.py --weights /workspace/models/blueberry_yolov7/model2.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode yolov7 --no-tide
python3 test.py --weights /workspace/models/blueberry_yolov7/model3.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode yolov7 --no-tide
python3 test.py --weights /workspace/models/blueberry_yolov7w6/model1.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 1280 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode yolov7 --no-tide
python3 test.py --weights /workspace/models/blueberry_yolov7w6/model2.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 1280 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode yolov7 --no-tide
python3 test.py --weights /workspace/models/blueberry_yolov7w6/model_best.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 1280 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode yolov7 --no-tide
python3 test.py --weights /workspace/models/blueberry_rt-detr-l/model_best.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode rt-detr --no-tide
python3 test.py --weights /workspace/models/blueberry_rt-detr-l/model2.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode rt-detr --no-tide
python3 test.py --weights /workspace/models/blueberry_rt-detr-l/model3.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode rt-detr --no-tide
python3 test.py --weights /workspace/models/blueberry_maskrcnn/model_best.pth --cfg /workspace/models/blueberry_maskrcnn/custom_cfg.yaml --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode mask-rcnn --no-tide
python3 test.py --weights /workspace/models/blueberry_maskrcnn/model2.pth --cfg /workspace/models/blueberry_maskrcnn/custom_cfg.yaml --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode mask-rcnn --no-tide
python3 test.py --weights /workspace/models/blueberry_maskrcnn/model3.pth --cfg /workspace/models/blueberry_maskrcnn/custom_cfg.yaml --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode mask-rcnn --no-tide
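Since the commands above differ only in the weights path, image size, inference mode, and the optional `--cfg` argument, they can be generated rather than typed by hand. The sketch below only builds the command lines and does not run them. `build_test_command` is a hypothetical helper, not part of this repository; all flags and paths are copied from the commands above.

```python
import shlex

DATA_YAML = "/workspace/datasets/blueberry_dataset/data.yaml"


def build_test_command(weights, infer_mode, img_size=640, cfg=None, tide=False):
    """Return the argv list for one test.py evaluation run."""
    cmd = ["python3", "test.py", "--weights", weights]
    if cfg is not None:
        # Only the Mask-RCNN commands above pass a config file.
        cmd += ["--cfg", cfg]
    cmd += [
        "--data", DATA_YAML,
        "--batch-size", "1",
        "--img-size", str(img_size),
        "--conf-thres", "0.25",
        "--iou-thres", "0.65",
        "--task", "test",
        "--device", "0",
        "--base-map", "0.75",
        "--infer-mode", infer_mode,
    ]
    if not tide:
        cmd.append("--no-tide")
    return cmd


# Print the three YOLOv7-tiny evaluation commands listed above.
for name in ("model1.pt", "model_best.pt", "model3.pt"):
    weights = f"/workspace/models/blueberry_yolov7tiny/{name}"
    print(shlex.join(build_test_command(weights, "yolov7")))
```

The returned list can be passed to `subprocess.run` inside the container if you prefer to launch the evaluations from Python.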
TIDE requires COCO-format data to calculate errors, so the YOLO-format annotations must first be converted to a COCO-compliant format.
The following script performs the conversion automatically.
cd detectron2
python3 yolo2coco.py --images_path /workspace/datasets/blueberry_dataset/test/images/ --labels_path /workspace/datasets/blueberry_dataset/test/labels --out ../coco_gt
cd /workspace
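For readers unfamiliar with the two formats, the core of such a conversion is the bounding-box change: a YOLO label line stores `class cx cy w h` normalized to the image size, while COCO stores `[x_min, y_min, width, height]` in pixels. The sketch below shows that step only. `yolo_to_coco_bbox` is a hypothetical function, not the code in `yolo2coco.py`, and whether the script shifts class ids is not covered here.

```python
def yolo_to_coco_bbox(label_line, img_width, img_height):
    """Convert one YOLO label line to (class_id, COCO [x, y, w, h] in pixels)."""
    class_id, cx, cy, w, h = label_line.split()
    # Scale the normalized center and size to pixel units.
    cx, w = float(cx) * img_width, float(w) * img_width
    cy, h = float(cy) * img_height, float(h) * img_height
    # COCO uses the top-left corner instead of the box center.
    return int(class_id), [cx - w / 2, cy - h / 2, w, h]


# A box centered in a 640x480 image, a quarter of its width and half its height.
print(yolo_to_coco_bbox("1 0.5 0.5 0.25 0.5", 640, 480))
# → (1, [240.0, 120.0, 160.0, 240.0])
```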
The error metrics reported for each model are:
Model | Cls | Loc | Both | Dupe | Bkg | Miss | FP | FN |
---|---|---|---|---|---|---|---|---|
YOLOv7-tiny | 5.73 | 32.3 | 0.31 | 0.0 | 0.27 | 10.87 | 10.35 | 43.14 |
YOLOv7-default | 5.8 | 28.41 | 0.21 | 0.0 | 0.36 | 10.29 | 10.25 | 38.28 |
YOLOv7-w6 | 6.19 | 26.47 | 0.26 | 0.0 | 0.27 | 11.42 | 9.47 | 38.95 |
RT-DETR-L | 8.43 | 35.52 | 0.41 | 0.0 | 0.71 | 6.71 | 11.88 | 40.75 |
Mask-RCNN | 3.26 | 30.47 | 0.17 | 0.0 | 0.77 | 8.56 | 10.93 | 34.35 |
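The first five column names in the table are TIDE error types. As a rough guide to what each one means, the sketch below assigns a single detection to a category from its best IoU with ground truth of the same class and of a different class. It is a simplified reading of the TIDE definitions, not the code this repository runs. The function name, its inputs, and the thresholds `t_f=0.5` and `t_b=0.1` (the defaults described in the TIDE paper) are assumptions and may differ from the settings used here.

```python
def tide_error_type(iou_same, iou_other, same_gt_already_matched=False,
                    t_f=0.5, t_b=0.1):
    """Classify one detection into a simplified TIDE category.

    iou_same:  best IoU with a ground-truth box of the predicted class.
    iou_other: best IoU with a ground-truth box of any other class.
    """
    if iou_same >= t_f:
        # Well localized and correctly classified: a true positive, unless a
        # higher-scoring detection already claimed that ground-truth box.
        return "Dupe" if same_gt_already_matched else "TP"
    if iou_same >= t_b:
        return "Loc"   # right class, box not tight enough
    if iou_other >= t_f:
        return "Cls"   # well localized, wrong class
    if iou_other >= t_b:
        return "Both"  # wrong class and poorly localized
    return "Bkg"       # fires on background

# "Miss" is not a property of a detection: it counts ground-truth boxes that
# no detection accounts for, so it is not returned by this function.
print(tide_error_type(0.3, 0.0))
# → Loc
```

Read this way, the large Loc values in the table indicate that many detections find the right berry and class but with a box that overlaps the ground truth too loosely.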
You can measure the TIDE errors and mAP of each model with the following commands:
python3 test.py --weights /workspace/models/blueberry_yolov7tiny/model_best.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode yolov7
python3 test.py --weights /workspace/models/blueberry_yolov7/model_best.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode yolov7
python3 test.py --weights /workspace/models/blueberry_yolov7w6/model_best.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 1280 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode yolov7
python3 test.py --weights /workspace/models/blueberry_rt-detr-l/model_best.pt --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode rt-detr
python3 test.py --weights /workspace/models/blueberry_maskrcnn/model_best.pth --cfg /workspace/models/blueberry_maskrcnn/custom_cfg.yaml --data /workspace/datasets/blueberry_dataset/data.yaml --batch-size 1 --img-size 640 --conf-thres 0.25 --iou-thres 0.65 --task test --device 0 --base-map 0.75 --infer-mode mask-rcnn
Average inference time per image for each model. Each model first runs 100 warm-up inferences, which are discarded so they do not skew the results, followed by 100 timed inferences. The table reports the mean and the standard deviation of those 100 recorded times.
Times were recorded on a computer with a 12th Gen Core i7 CPU, 32GB of RAM, a 1TB SSD, and an NVIDIA RTX3080TI 10GB GPU.
Model | fp32 mean | fp32 std | fp16 mean | fp16 std |
---|---|---|---|---|
YOLOv7-tiny | 3.308 | 0.013 | 2.252 | 0.049 |
YOLOv7-default | 8.059 | 0.046 | 3.823 | 0.059 |
YOLOv7-w6 | 19.551 | 0.142 | 7.677 | 0.081 |
RT-DETR-L | 11.512 | 0.385 | 8.933 | 0.522 |
Mask-RCNN | 34.301 | 0.755 | 16.998 | 0.206 |
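The measurement procedure described above (100 warm-up runs, then 100 timed runs, then mean and standard deviation) can be sketched as follows. This is not the code in `benchmark.py`: the `benchmark` function, reporting in milliseconds, and the use of the population standard deviation are assumptions made for illustration.

```python
import time
from statistics import mean, pstdev


def benchmark(infer, warmup=100, runs=100):
    """Return (mean, std) of the time of infer() in milliseconds."""
    # Warm-up runs are executed but not recorded.
    for _ in range(warmup):
        infer()
    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        # On a GPU, the device must be synchronized before reading the clock,
        # otherwise only the kernel launch time is measured.
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(times_ms), pstdev(times_ms)


# Stand-in workload; replace with a call that runs one model inference.
avg_ms, std_ms = benchmark(lambda: sum(range(10_000)))
print(f"{avg_ms:.3f} ms +/- {std_ms:.3f} ms")
```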
For this step, you need to convert the models to TensorRT engines.
To convert the YOLOv7 models:
cd deps/YOLOv7_Tensorrt
sh to_onnx.sh
sh to_trt.sh
cd /workspace
To convert the Mask-RCNN model:
cd detectron2
sh to_trt.sh
cd /workspace
The RT-DETR-L model cannot be converted to a TRT engine, so it is converted to a TorchScript model instead.
cd rt-detr
sh to_torchscript.sh
cd /workspace
FP16 inference times.
python3 benchmark.py --weights /workspace/models/blueberry_yolov7tiny/model_best_fp16.trt --imgsz 640 --infer-type yolov7-trt --im_folder /workspace/datasets/blueberry_dataset/test/images/
python3 benchmark.py --weights /workspace/models/blueberry_yolov7/model_best_fp16.trt --imgsz 640 --infer-type yolov7-trt --im_folder /workspace/datasets/blueberry_dataset/test/images/
python3 benchmark.py --weights /workspace/models/blueberry_yolov7w6/model_best_fp16.trt --imgsz 1280 --infer-type yolov7-trt --im_folder /workspace/datasets/blueberry_dataset/test/images/
python3 benchmark.py --weights /workspace/models/blueberry_rt-detr-l/model_best_fp16.torchscript --infer-type rt-detr-ts --im_folder /workspace/datasets/blueberry_dataset/test/images/ --half
python3 benchmark.py --weights /workspace/models/blueberry_maskrcnn/model_best_fp16.trt --infer-type mask-rcnn-trt --im_folder /workspace/datasets/blueberry_dataset/test/images/
FP32 inference times.
python3 benchmark.py --weights /workspace/models/blueberry_yolov7tiny/model_best_fp32.trt --imgsz 640 --infer-type yolov7-trt --im_folder /workspace/datasets/blueberry_dataset/test/images/
python3 benchmark.py --weights /workspace/models/blueberry_yolov7/model_best_fp32.trt --imgsz 640 --infer-type yolov7-trt --im_folder /workspace/datasets/blueberry_dataset/test/images/
python3 benchmark.py --weights /workspace/models/blueberry_yolov7w6/model_best_fp32.trt --imgsz 1280 --infer-type yolov7-trt --im_folder /workspace/datasets/blueberry_dataset/test/images/
python3 benchmark.py --weights /workspace/models/blueberry_rt-detr-l/model_best_fp32.torchscript --infer-type rt-detr-ts --im_folder /workspace/datasets/blueberry_dataset/test/images/
python3 benchmark.py --weights /workspace/models/blueberry_maskrcnn/model_best_fp32.trt --infer-type mask-rcnn-trt --im_folder /workspace/datasets/blueberry_dataset/test/images/
You can train models by following the instructions in each folder corresponding to its architecture.