# General-purpose model: webcam detection
![General-purpose model: webcam detection](./resources/general-webcam.png)

In [1]:
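import cv2
from ultralytics import YOLO

# Load the pretrained YOLO11 nano weights from the local model directory.
model = YOLO("./model/yolo11n.pt")
# Open the default webcam (device index 0).
cap = cv2.VideoCapture(0)
while cap.isOpened():
    success, frame = cap.read()
    if success:
        # Run inference on the frame and draw the detections onto a copy of it.
        results = model(frame)
        annotated_frame = results[0].plot()
        cv2.imshow("YOLO Inference", annotated_frame)
        # Press "q" to quit.
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        # Stop when the camera returns no frame.
        break
cap.release()
cv2.destroyAllWindows()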


0: 384x640 2 persons, 3 chairs, 2 laptops, 43.3ms
Speed: 3.2ms preprocess, 43.3ms inference, 0.9ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 2 chairs, 2 laptops, 380.1ms
Speed: 3.4ms preprocess, 380.1ms inference, 1.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 2 chairs, 3 laptops, 101.8ms
Speed: 3.0ms preprocess, 101.8ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 2 chairs, 1 laptop, 116.1ms
Speed: 4.5ms preprocess, 116.1ms inference, 1.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 2 chairs, 2 laptops, 72.9ms
Speed: 3.3ms preprocess, 72.9ms inference, 1.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 1 chair, 2 laptops, 59.8ms
Speed: 2.3ms preprocess, 59.8ms inference, 0.9ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 2 chairs, 1 laptop, 58.2ms
Speed: 3.0ms preprocess, 58.2ms inference, 1.7ms postproce

# General-purpose model: local image detection
![General-purpose model: local image detection](./resources/general-pic.png)

In [3]:
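from ultralytics import YOLO

model = YOLO("./model/yolo11n.pt")
# Run inference on a single local image.
results = model('./test/test.png')
for r in results:
    print(r.boxes)  # detected boxes: class ids, confidences, coordinates
    r.show()  # display the annotated image
    r.save('./test/general-pic.png')  # write the annotated image to disk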



image 1/1 /Users/stepbystep/Documents/AI-class/lab/test/test.png: 352x640 2 persons, 1 backpack, 1 laptop, 2 books, 44.9ms
Speed: 3.5ms preprocess, 44.9ms inference, 1.0ms postprocess per image at shape (1, 3, 352, 640)
ultralytics.engine.results.Boxes object with attributes:

cls: tensor([ 0.,  0., 24., 63., 73., 73.])
conf: tensor([0.8742, 0.4147, 0.3586, 0.3586, 0.2847, 0.2684])
data: tensor([[2.5845e+02, 1.5790e+02, 1.2137e+03, 8.6446e+02, 8.7419e-01, 0.0000e+00],
        [5.9669e-01, 4.3276e+02, 3.4523e+02, 8.5983e+02, 4.1468e-01, 0.0000e+00],
        [8.6530e+01, 6.8506e+02, 3.4720e+02, 8.6642e+02, 3.5863e-01, 2.4000e+01],
        [1.4957e+03, 6.7928e+02, 1.6084e+03, 8.0835e+02, 3.5860e-01, 6.3000e+01],
        [1.1497e+03, 5.0836e+02, 1.2003e+03, 6.1739e+02, 2.8467e-01, 7.3000e+01],
        [1.1795e+03, 5.0837e+02, 1.2057e+03, 6.0970e+02, 2.6838e-01, 7.3000e+01]])
id: None
is_track: False
orig_shape: (868, 1686)
shape: torch.Size([6, 6])
xywh: tensor([[ 736.0776,  511.1809,  95
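
The `cls` and `data` tensors printed above can be read by hand: each `data` row is `(x1, y1, x2, y2, conf, cls)`, and `xywh` is the same box expressed as center plus size. The following is a minimal pure-Python sketch of that reading, not part of the notebook. The rows are copied from the output above, `COCO_NAMES` is a hand-written subset covering only the class ids that appear there, and the helper names `xyxy_to_xywh` and `count_classes` are made up for illustration.

```python
from collections import Counter

# Subset of the COCO class names, limited to the ids in the output above.
COCO_NAMES = {0: "person", 24: "backpack", 63: "laptop", 73: "book"}

# (x1, y1, x2, y2, conf, cls) rows copied from the printed `data` tensor.
DETECTIONS = [
    (258.45, 157.90, 1213.7, 864.46, 0.87419, 0),
    (0.59669, 432.76, 345.23, 859.83, 0.41468, 0),
    (86.530, 685.06, 347.20, 866.42, 0.35863, 24),
    (1495.7, 679.28, 1608.4, 808.35, 0.35860, 63),
    (1149.7, 508.36, 1200.3, 617.39, 0.28467, 73),
    (1179.5, 508.37, 1205.7, 609.70, 0.26838, 73),
]


def xyxy_to_xywh(x1, y1, x2, y2):
    """Convert corner coordinates to (center_x, center_y, width, height)."""
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def count_classes(detections, min_conf=0.0):
    """Count detections per class name, keeping only conf >= min_conf."""
    counts = Counter()
    for *_, conf, cls in detections:
        if conf >= min_conf:
            counts[COCO_NAMES[int(cls)]] += 1
    return counts


cx, cy, w, h = xyxy_to_xywh(*DETECTIONS[0][:4])
print(cx, cy)  # center of the first box
print(dict(count_classes(DETECTIONS)))
print(dict(count_classes(DETECTIONS, min_conf=0.4)))
```

The center computed for the first row agrees with the visible start of the `xywh` line, and the unfiltered counts agree with the summary line "2 persons, 1 backpack, 1 laptop, 2 books". Raising `min_conf` shows how many of those detections are low-confidence.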

# Self-trained, fine-tuned traffic cone detection model
![Self-trained, fine-tuned traffic cone detection model](./resources/finetune-video.png)

In [8]:
# https://docs.ultralytics.com/modes/track
import cv2
import numpy as np
import torch
from ultralytics import YOLO

# Load the fine-tuned traffic cone weights.
model = YOLO('./model/best.pt')
# Use Apple's Metal backend (MPS) when available, otherwise fall back to the CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model.to(device)
# https://www.youtube.com/watch?v=oWMXQkGOzho
cap = cv2.VideoCapture('./test/test.mp4')
# cap = cv2.VideoCapture(0)
# HSV bounds for each cone color (OpenCV 8-bit scale: H in 0-179, S and V in 0-255).
# Note: red also wraps around to hues near 179; only the low end is covered here.
lower_red = np.array([0, 100, 100])
upper_red = np.array([10, 255, 255])
lower_blue = np.array([100, 100, 100])
upper_blue = np.array([130, 255, 255])
lower_yellow = np.array([20, 100, 100])
upper_yellow = np.array([30, 255, 255])
while cap.isOpened():
    success, frame = cap.read()
    if success:
        red_count = 0
        blue_count = 0
        yellow_count = 0
        results = model.track(frame, persist=True, tracker="bytetrack.yaml")
        for r in results:
            boxes = r.boxes
            for box in boxes:
                x1, y1, x2, y2 = box.xyxy[0]
                x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
                # Tracker id, if one was assigned (named to avoid shadowing the built-in `id`).
                track_id = int(box.id[0]) if box.id is not None else None
                roi = frame[y1:y2, x1:x2]
                # Skip degenerate boxes: cv2.cvtColor raises an error on an empty ROI.
                if roi.size == 0:
                    continue
                hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
                red_mask = cv2.inRange(hsv_roi, lower_red, upper_red)
                blue_mask = cv2.inRange(hsv_roi, lower_blue, upper_blue)
                yellow_mask = cv2.inRange(hsv_roi, lower_yellow, upper_yellow)
                red_pixels = cv2.countNonZero(red_mask)
                blue_pixels = cv2.countNonZero(blue_mask)
                yellow_pixels = cv2.countNonZero(yellow_mask)
                # Pick the color whose mask covers the most pixels; ties go to the earlier branch.
                max_pixels = max(red_pixels, blue_pixels, yellow_pixels)
                if max_pixels == red_pixels:
                    color = "Red"
                    color_box = (0, 0, 255)
                    background_color = (0, 0, 255)
                    red_count += 1
                elif max_pixels == blue_pixels:
                    color = "Blue"
                    color_box = (255, 0, 0)
                    background_color = (255, 0, 0)
                    blue_count += 1
                else:
                    color = "Yellow"
                    color_box = (0, 255, 255)
                    background_color = (0, 255, 255)
                    yellow_count += 1
                # Draw the box, then a filled label background with the text on top.
                cv2.rectangle(frame, (x1, y1), (x2, y2), color_box, 4)
                label = f"id: {track_id} {color}" if track_id is not None else color
                label_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.9, 2)[0]
                cv2.rectangle(frame, (x1, y1 - label_size[1] - 10), (x1 + label_size[0], y1), background_color, -1)
                cv2.putText(frame, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 255, 255), 3)
        
        total_count = red_count + blue_count + yellow_count
        print(f"Red Count: {red_count}")
        print(f"Blue Count: {blue_count}")
        print(f"Yellow Count: {yellow_count}")
        print(f"All: {total_count}")
        print("-" * 30)
        
        cv2.imshow("YOLO11 Tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        break
cap.release()
cv2.destroyAllWindows()


0: 384x640 6 Safety Cones, 357.6ms
Speed: 13.4ms preprocess, 357.6ms inference, 159.5ms postprocess per image at shape (1, 3, 384, 640)
Red Count: 2
Blue Count: 1
Yellow Count: 3
All: 6
------------------------------

0: 384x640 5 Safety Cones, 20.4ms
Speed: 4.3ms preprocess, 20.4ms inference, 43.6ms postprocess per image at shape (1, 3, 384, 640)
Red Count: 1
Blue Count: 1
Yellow Count: 3
All: 5
------------------------------





0: 384x640 5 Safety Cones, 15.2ms
Speed: 3.0ms preprocess, 15.2ms inference, 79.0ms postprocess per image at shape (1, 3, 384, 640)
Red Count: 1
Blue Count: 1
Yellow Count: 3
All: 5
------------------------------

0: 384x640 5 Safety Cones, 19.5ms
Speed: 2.6ms preprocess, 19.5ms inference, 58.6ms postprocess per image at shape (1, 3, 384, 640)
Red Count: 2
Blue Count: 1
Yellow Count: 2
All: 5
------------------------------

0: 384x640 6 Safety Cones, 21.2ms
Speed: 2.9ms preprocess, 21.2ms inference, 10.4ms postprocess per image at shape (1, 3, 384, 640)
Red Count: 2
Blue Count: 1
Yellow Count: 3
All: 6
------------------------------

0: 384x640 6 Safety Cones, 16.5ms
Speed: 2.4ms preprocess, 16.5ms inference, 64.9ms postprocess per image at shape (1, 3, 384, 640)
Red Count: 2
Blue Count: 1
Yellow Count: 3
All: 6
------------------------------

0: 384x640 6 Safety Cones, 23.5ms
Speed: 2.8ms preprocess, 23.5ms inference, 12.3ms postprocess per image at shape (1, 3, 384, 640)
Red Count: 2
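
The per-frame color counts above come from testing each cone's pixels against fixed HSV ranges. On OpenCV's 8-bit scale, hue runs from 0 to 179 and red sits at both ends, so a single red band of 0-10 can miss reds that lean toward magenta. The sketch below illustrates this for single pixels in pure Python so that it runs without OpenCV. It is not the notebook's method: `bgr_to_opencv_hsv` only approximates `cv2.COLOR_BGR2HSV`, the second red band (170-179) is an assumed, illustrative choice, and all helper names are hypothetical.

```python
import colorsys

# HSV bounds on OpenCV's 8-bit scale, copied from the tracking cell.
RANGES = {
    "Red": [((0, 100, 100), (10, 255, 255))],
    "Blue": [((100, 100, 100), (130, 255, 255))],
    "Yellow": [((20, 100, 100), (30, 255, 255))],
}
# Assumption: an extra red band near the top of the hue scale (illustrative bounds).
RANGES_WITH_WRAP = dict(RANGES, Red=RANGES["Red"] + [((170, 100, 100), (179, 255, 255))])


def bgr_to_opencv_hsv(b, g, r):
    """Approximate cv2.COLOR_BGR2HSV for one 8-bit pixel (H: 0-179)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return (round(h * 180) % 180, round(s * 255), round(v * 255))


def classify_pixel(bgr, ranges):
    """Return the first color whose band contains the pixel (inclusive bounds), else None."""
    hsv = bgr_to_opencv_hsv(*bgr)
    for name, bands in ranges.items():
        for lower, upper in bands:
            if all(lo <= x <= hi for lo, x, hi in zip(lower, hsv, upper)):
                return name
    return None


crimson = (40, 0, 255)  # BGR: a red that leans toward magenta
print(bgr_to_opencv_hsv(*crimson))
print(classify_pixel(crimson, RANGES))
print(classify_pixel(crimson, RANGES_WITH_WRAP))
```

In the notebook itself, the equivalent change would be to build a second red mask with `cv2.inRange` and combine the two with `cv2.bitwise_or` before counting pixels. Whether that helps on this video depends on the actual cone colors.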

# Model training process
> Log: ./train.log
## Training results
![Model training results](./runs/detect/train8/results.png)
<div style="display: flex; justify-content: space-around;">
  <img src="./runs/detect/train8/confusion_matrix.png" alt="Image 1" style="width: 50%;"/>
  <img src="./runs/detect/train8/confusion_matrix_normalized.png" alt="Image 2" style="width: 50%;"/>
</div>

<div style="display: flex; justify-content: space-around;">
<img src="./runs/detect/train8/F1_curve.png" alt="Image 1" style="width: 50%;"/>
<img src="./runs/detect/train8/P_curve.png" alt="Image 2" style="width: 50%;"/>
</div>
<div style="display: flex; justify-content: space-around;">
<img src="./runs/detect/train8/PR_curve.png" alt="Image 3" style="width: 50%;"/>
<img src="./runs/detect/train8/R_curve.png" alt="Image 4" style="width: 50%;"/>
</div>

![](./runs/detect/train8/labels.jpg)
![](./runs/detect/train8/labels_correlogram.jpg)
![](./runs/detect/train8/train_batch0.jpg)
![](./runs/detect/train8/train_batch1.jpg)
![](./runs/detect/train8/train_batch16830.jpg)
![](./runs/detect/train8/train_batch16831.jpg)
![](./runs/detect/train8/train_batch16832.jpg)
![](./runs/detect/train8/train_batch2.jpg)
![](./runs/detect/train8/val_batch0_labels.jpg)
![](./runs/detect/train8/val_batch0_pred.jpg)
![](./runs/detect/train8/val_batch1_labels.jpg)
![](./runs/detect/train8/val_batch1_pred.jpg)
![](./runs/detect/train8/val_batch2_labels.jpg)
![](./runs/detect/train8/val_batch2_pred.jpg)


# Reflections

---

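### **Strengths**
1. **Steadily decreasing losses**:
   - `train/box_loss`, `train/cls_loss`, and `train/dfl_loss` decrease over the vast majority of epochs, which indicates that optimization is heading in the right direction and the loss functions are effective.
   - The validation losses (`val/box_loss`, `val/cls_loss`, and `val/dfl_loss`) decrease as well, which suggests the model generalizes to some degree.

2. **Reasonable precision and recall**:
   - `metrics/precision(B)` reaches **0.78**, so most of the predicted objects are correct.
   - `metrics/recall(B)` reaches **0.65**, so the model finds roughly two-thirds of the real objects.
   - This is acceptable for many practical object detection scenarios.

3. **mAP50 and mAP50-95**:
   - `metrics/mAP50(B)` reaches **0.71+**, which indicates good detection performance at an IoU threshold of 0.50.
   - `metrics/mAP50-95(B)` reaches **0.37+**. This reflects average performance across IoU thresholds; it is not especially high, but it is acceptable.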

---

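### **Weaknesses**
1. **Relatively high validation loss**:
   - Although the validation losses (`val/box_loss`, `val/cls_loss`, and so on) decrease, they stay above the training losses. Some gap is normal, but together with the late-stage trend below it points to a **tendency to overfit**.
   - Late in training in particular, `val/box_loss` trends slightly upward, which suggests stronger regularization may be needed (for example, increasing `weight_decay` or reducing `batch_size`).

2. **Somewhat low recall**:
   - `metrics/recall(B)` is only **0.65**, so about a third of the objects go undetected.
   - If the application has little tolerance for missed detections (safety inspection, for example), recall needs further work.

3. **Low mAP50-95**:
   - `metrics/mAP50-95(B)` is around **0.37**, which indicates that the predicted boxes are not tight enough at stricter IoU thresholds, possibly for small objects in particular.
   - This may be related to the data distribution (object size, scene complexity) or to the loss weight settings.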
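
One way to check the overfitting signal described above is to look for the epoch where `val/box_loss` bottoms out and to watch the gap between validation and training loss. The sketch below does this with the standard library only. It assumes the run directory contains a per-epoch CSV whose columns are named after the metrics discussed here (`epoch`, `train/box_loss`, `val/box_loss`); the `SAMPLE` numbers are invented for illustration and are not this run's results.

```python
import csv
import io

# Hypothetical per-epoch values; NOT taken from this training run.
SAMPLE = """epoch, train/box_loss, val/box_loss
1, 1.60, 1.70
2, 1.40, 1.55
3, 1.25, 1.50
4, 1.10, 1.52
5, 1.00, 1.58
"""


def read_rows(csv_text):
    """Parse CSV text into dicts, stripping padding around headers and values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{k.strip(): v.strip() for k, v in row.items()} for row in reader]


def best_epoch(csv_text, column="val/box_loss"):
    """Return (epoch, value) for the row with the lowest value in `column`."""
    best = min(read_rows(csv_text), key=lambda r: float(r[column]))
    return int(best["epoch"]), float(best[column])


def loss_gaps(csv_text, train_col="train/box_loss", val_col="val/box_loss"):
    """Return validation minus training loss for each epoch."""
    return [float(r[val_col]) - float(r[train_col]) for r in read_rows(csv_text)]


print(best_epoch(SAMPLE))
print([round(g, 2) for g in loss_gaps(SAMPLE)])
```

In the sample, validation loss is lowest at epoch 3 and the gap keeps widening afterwards, which is the pattern that would motivate stopping earlier or regularizing more. To apply it to a real run, pass the contents of that run's CSV file instead of `SAMPLE`.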

---


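### Row 1 of the results plot (training-process metrics)

1. **`train/box_loss`**  
   - **Meaning**: the bounding box regression loss (Box Loss). It reflects the gap between the predicted boxes and the ground-truth boxes.  
   - **Goal**: lower is better; the model predicts object locations more accurately.

2. **`train/cls_loss`**  
   - **Meaning**: the classification loss (Classification Loss). It reflects whether the model predicts the right class for each object.  
   - **Goal**: lower is better; the model separates the classes more reliably.

3. **`train/dfl_loss`**  
   - **Meaning**: the Distribution Focal Loss (DFL), which is used to learn a more precise localization distribution for the box edges.  
   - **Goal**: lower is better; the predicted box boundaries are of higher quality.

4. **`metrics/precision(B)`**  
   - **Meaning**: precision, the fraction of objects predicted as positive that really are positive.  
   - **Goal**: higher is better; predictions are more exact, with fewer false positives.

5. **`metrics/recall(B)`**  
   - **Meaning**: recall, the fraction of real positives that the model correctly predicts as positive.  
   - **Goal**: higher is better; the model finds more of the real objects.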
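
The precision and recall definitions above reduce to simple ratios of true positives (TP), false positives (FP), and false negatives (FN). The sketch below shows the arithmetic; the counts are hypothetical, chosen only so that the results land near the 0.78 precision and 0.65 recall reported for this model, and the F1 score is included because it is the quantity plotted in `F1_curve.png`.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of real positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for illustration: 100 real cones, 83 predictions, 65 of them correct.
tp, fp, fn = 65, 18, 35
p = precision(tp, fp)
r = recall(tp, fn)
print(round(p, 2), round(r, 2), round(f1_score(p, r), 2))
```

With these counts, 35 of 100 real objects are missed even though most predictions are correct, which is why recall is the weaker of the two numbers.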

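### Row 2 of the results plot (validation-process metrics)

6. **`val/box_loss`**  
   - **Meaning**: the bounding box loss on the validation set; the same as `train/box_loss`, but computed on validation data.  
   - **Goal**: lower is better; the model locates objects more accurately on the validation set.

7. **`val/cls_loss`**  
   - **Meaning**: the classification loss on the validation set; the same as `train/cls_loss`, but computed on validation data.  
   - **Goal**: lower is better; the model classifies better on the validation set.

8. **`val/dfl_loss`**  
   - **Meaning**: the Distribution Focal Loss on the validation set; the same as `train/dfl_loss`, but computed on validation data.  
   - **Goal**: lower is better; the predicted box boundaries on the validation set are of higher quality.

9. **`metrics/mAP50(B)`**  
   - **Meaning**: mean average precision (mAP) computed on the validation set at an IoU (intersection over union) threshold of 0.50; a measure of overall detection performance.  
   - **Goal**: higher is better; overall detection accuracy is higher.

10. **`metrics/mAP50-95(B)`**  
    - **Meaning**: mean average precision (mAP) computed on the validation set and averaged over IoU thresholds from 0.50 to 0.95; it summarizes performance across IoU levels.  
    - **Goal**: higher is better; detections stay correct under stricter overlap requirements.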
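
Both mAP metrics above rest on IoU, the overlap ratio between a predicted box and a ground-truth box. The sketch below computes IoU for two boxes in `(x1, y1, x2, y2)` form and lists which of the ten thresholds from 0.50 to 0.95 (in steps of 0.05) a given prediction would satisfy. The boxes are made-up values, and the sketch covers only the matching criterion, not the full average-precision computation.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou_value):
    """Thresholds at which a prediction with this IoU counts as a match."""
    return [t for t in IOU_THRESHOLDS if iou_value >= t]


ground_truth = (0, 0, 100, 100)  # made-up box
prediction = (10, 0, 110, 100)   # same size, shifted 10 px to the right
value = iou(ground_truth, prediction)
print(round(value, 3), thresholds_passed(value))
```

A box shifted by a tenth of its width still counts as correct for mAP50 but fails the strictest thresholds, which is how a model can score well on mAP50 and much lower on mAP50-95.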

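### Overall picture
- **Loss terms** (`box_loss`, `cls_loss`, `dfl_loss`): lower is better; the prediction error is smaller.
- **Performance metrics** (`precision`, `recall`, `mAP50`, `mAP50-95`): higher is better; the model performs better.
- **Validation vs. training metrics**: validation metrics reflect how well the model generalizes to unseen data and should stay close to the training metrics. A large gap may indicate overfitting.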

---

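### **Overall**
The model's performance is **moderately good**, and `mAP50` in particular already reaches a fairly high level (0.71+). For stricter scenarios, such as tasks that need high recall or small-object detection, it still needs further optimization.

Performance can be improved further by tuning hyperparameters, adjusting the data augmentation strategy, and adding more data. Over several rounds of training, the model's performance improved.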
