This repository is a Python prototype for detecting people and visible carried objects, tracking anonymous IDs over time, segmenting baggage, linking bags to likely owners, and exporting annotated video, JSON events, and CSV track logs.
Video / webcam -> Main YOLO detection/tracking for people and bags -> Optional secondary YOLO predict() stream for trained risk-object classes -> Detection merge -> Temporal risk-object confirmation -> BoT-SORT short-term tracking -> relevant-class whitelist and clutter filtering -> Nested ROI Search inside selected person/bag boxes -> duplicate cleanup -> OSNet person ReID + MemoryBank global IDs -> FastSAM/SAM masks for selected baggage/object classes -> person-bag relationship memory -> risk/event logic -> annotated MP4 + events JSON + tracks CSV
Main idea:
YOLO finds visible objects. BoT-SORT tracks them locally for short time spans. OSNet ReID helps reconnect people after track loss. MemoryBank assigns stable project-level IDs like G1, G2, G3. FastSAM segments selected objects, mainly bags and weapons. Relationship memory links bags to likely nearby people. Risk logic writes possible unattended-bag and tracking events.
Implemented:
[OK] Webcam/live-camera processing
[OK] Prerecorded video processing
[OK] YOLO object detection
[OK] Optional dual-YOLO merge: main tracked detector + secondary predict-only risk detector
[OK] Temporal risk-object confirmation, default 3 hits in 5 frames
[OK] BoT-SORT local tracking
[OK] Project global IDs: G1, G2, G3...
[OK] Torchreid/OSNet person ReID backend
[OK] Bad-crop guard for person ReID memory
[OK] Multiple good ReID snapshots per person
[OK] Entry/exit side continuity boost
[OK] Group-safe ID assignment so visible people do not collapse into one ID
[OK] Relevant-class whitelist
[OK] Nested ROI Search for visible objects inside person/bag boxes
[OK] FastSAM/SAM segmentation for selected object classes
[OK] Person-bag relationship memory
[OK] Owner-link display TTL to prevent stale lines staying on screen
[OK] Possible unattended-bag event logic
[OK] Annotated MP4 recording
[OK] Events JSON output
[OK] Per-frame tracks CSV output
[OK] Manual offline CSV/JSON merge helper
Still in progress:
[DONE] Integrate trained risk-object detector through --secondary-weights
[TODO] Tune thresholds on real team videos
[TODO] Improve event reports and dashboard/report view
Expected layout:
screening_ai_project_deep_reid/
|
|-- configs/
|   |-- data.yaml                    # YOLO training dataset config
|   |-- weapon_data.yaml             # Optional detector dataset config, if present
|   |-- classes.yaml                 # Class groups: people, bags, suspicious objects
|   |-- botsort_reid.yaml            # BoT-SORT tracker settings
|   |-- risk_config.yaml             # Event/risk thresholds
|   |-- tracking_memory.yaml         # Global memory, ReID, filters, ROI search
|
|-- scripts/
|   |-- run_webcam.py                # Run live camera/webcam mode
|   |-- run_video.py                 # Run prerecorded video mode
|   |-- run_realtime_sam_cpu.py      # CPU-friendly fixed demo launcher
|   |-- train_yolo.py                # Train custom YOLO detector
|   |-- create_dataset_folders.py    # Create dataset/input/output folders
|   |-- check_reid_backend.py        # Verify OSNet/Torchreid backend
|   |-- smoke_test.py                # Quick functional test
|   |-- offline_merge_tracks.py      # Manual post-run ID merge helper
|   |-- import_zip_dataset.py        # Optional dataset import helper
|   |-- download_weapon_dataset.py   # Optional dataset download helper, if used
|
|-- src/screening_ai/
|   |-- detector.py
|   |-- segmenter.py
|   |-- deep_reid.py
|   |-- appearance.py
|   |-- memory.py
|   |-- association.py
|   |-- risk.py
|   |-- visualization.py
|   |-- privacy.py                   # Legacy/optional privacy utilities, if still present
|   |-- pipeline.py
|   |-- utils.py
|
|-- datasets/
|   |-- screening_dataset/
|   |   |-- images/train/
|   |   |-- images/val/
|   |   |-- labels/train/
|   |   |-- labels/val/
|
|-- input/                           # Put input videos here
|-- outputs/                         # Annotated video, JSON, CSV outputs
|-- requirements.txt
|-- requirements_reid_optional.txt
|-- requirements_windows_stable_reid.txt
|-- README.md
Always run commands from the project root, the folder that contains scripts, src, configs, and README.md.
Note: using Python 3.12 is crucial; the setup commands below assume it.
py -3.12 -m venv .venv
.venv\Scripts\activate.bat
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements_windows_stable_reid.txt
python scripts\check_reid_backend.py
python scripts\smoke_test.py
Expected successful ReID check:
Checking Torchreid/OSNet backend...
Using Torchreid/OSNet backend: osnet_x0_25 on cpu
OSNet is available.
py -0p
Recommended:
Python 3.12 64-bit
Avoid Python 3.14 for this project right now. The base packages may install, but Torchreid/OSNet is more reliable on Python 3.10-3.12.
This can work for general YOLO/SAM experiments, but it is not the safest path for real OSNet:
py -3.12 -m venv .venv
.venv\Scripts\activate.bat
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt
python -m pip install -r requirements_reid_optional.txt
python scripts\check_reid_backend.py
Use the stable file if OSNet says unavailable.
python scripts\create_dataset_folders.py
This creates the dataset, input, and output folders.
python scripts\check_reid_backend.py
Do not continue debugging person tracking quality until this says:
OSNet is available.
Normal first run:
Downloading...
From: https://drive.google.com/...
To: C:\Users....cache\torch\checkpoints\osnet_x0_25_imagenet.pth
This is expected. Torchreid downloads OSNet weights the first time.
Harmless warnings:
UserWarning: Cython evaluation ... is unavailable
FutureWarning: You are using torch.load with weights_only=False
If the final line says OSNet is available, continue.
python scripts\smoke_test.py
Expected:
Smoke tests passed: association, reconnect, SAM mask geometry, and group-safe ReID work.
smoke_test.py checks project logic. It does not prove OSNet is active. Use check_reid_backend.py for that.
Use this for the current full webcam demo:
python scripts\run_webcam.py --weights yolo11n.pt --conf 0.25 --sam-weights FastSAM-s.pt --sam-every-n 20 --sam-max-objects 2 --sam-classes backpack,handbag,suitcase --prefer-sam-masks --imgsz 320 --device cpu --display
This command is tuned for CPU and webcam testing. It keeps bag detections more easily while relying on class-specific filters to reduce bad person detections.
Use this when best_v2.pt is in the project root. The main model keeps person/bag tracking. The secondary model adds trained risk-object detections and the pipeline only draws/logs/links them after temporal confirmation.
python scripts\run_webcam.py --weights yolo11n.pt --secondary-weights best_v2.pt --secondary-target-classes knife,gun --secondary-conf 0.35 --secondary-iou 0.45 --conf 0.25 --sam-weights FastSAM-s.pt --sam-every-n 10 --sam-max-objects 3 --sam-classes backpack,handbag,suitcase,knife,gun --prefer-sam-masks --sam-tracking-classes backpack,handbag,suitcase,knife,gun --imgsz 640 --device cpu --display --output-video outputs\webcam_dual_yolo.mp4 --output-json outputs\webcam_dual_yolo_events.json --output-tracks outputs\webcam_dual_yolo_tracks.csv
Faster CPU version:
python scripts\run_webcam.py --weights yolo11n.pt --secondary-weights best_v2.pt --secondary-target-classes knife,gun --secondary-conf 0.40 --secondary-iou 0.45 --conf 0.25 --sam-weights FastSAM-s.pt --sam-every-n 20 --sam-max-objects 2 --sam-classes backpack,handbag,suitcase,knife,gun --prefer-sam-masks --imgsz 320 --device cpu --display --output-video outputs\webcam_dual_yolo_fast.mp4 --output-json outputs\webcam_dual_yolo_fast_events.json --output-tracks outputs\webcam_dual_yolo_fast_tracks.csv
To debug raw secondary detections, add --disable-risk-smoothing. Do not use it for the main demo, because it allows one-frame flicker.
The live OpenCV preview can now be made larger without changing the saved MP4 resolution:
python scripts\run_webcam.py --weights yolo11n.pt --secondary-weights best_v2.pt --secondary-target-classes knife,gun --secondary-conf 0.55 --secondary-iou 0.30 --weapon-confirm-window 7 --weapon-confirm-min-hits 4 --weapon-confirm-match-center-px 80 --conf 0.25 --sam-weights FastSAM-s.pt --sam-every-n 15 --sam-max-objects 2 --sam-classes backpack,handbag,suitcase,knife,gun --prefer-sam-masks --sam-tracking-classes backpack,handbag,suitcase,knife,gun --imgsz 640 --device cpu --display --display-scale 1.5 --label-scale 0.42 --label-thickness 1 --event-overlay-position bottom-right --event-overlay-max-lines 5 --event-overlay-ttl-frames 150 --output-video outputs\webcam_dual_yolo_ui.mp4 --output-json outputs\webcam_dual_yolo_ui_events.json --output-tracks outputs\webcam_dual_yolo_ui_tracks.csv
Useful UI flags:
--display-scale 1.5 Larger live preview window only; saved video stays original size.
--display-width 1280 Optional explicit preview window width.
--display-height 720 Optional explicit preview window height.
--label-scale 0.42 Smaller bbox/owner/FPS text.
--label-thickness 1 Thinner text.
--event-overlay Enable blue/white event feed, enabled by default.
--no-event-overlay Disable event feed.
--event-overlay-position bottom-right, bottom-left, top-right, or top-left.
--event-overlay-max-lines 5 Maximum visible notification rows.
--event-overlay-ttl-frames 150 How long notifications remain visible.
--event-overlay-scale 0.46 Event-feed text size.
Lower global YOLO confidence slightly:
python scripts\run_webcam.py --weights yolo11n.pt --conf 0.22 --sam-weights FastSAM-s.pt --sam-every-n 20 --sam-max-objects 2 --sam-classes backpack,handbag,suitcase --prefer-sam-masks --imgsz 320 --device cpu --display
If this creates fake people, raise the person-specific confidence in configs/tracking_memory.yaml:
min_conf_by_class:
  person: 0.60
or:
min_conf_by_class:
  person: 0.65
python scripts\run_webcam.py --weights yolo11n.pt --conf 0.25 --sam-weights FastSAM-s.pt --sam-every-n 30 --sam-max-objects 1 --sam-classes backpack,handbag,suitcase --prefer-sam-masks --imgsz 320 --device cpu --display --roi-every-n 20 --roi-max-parent-rois 1
If your current branch does not include --roi-every-n or --roi-max-parent-rois, tune the same values inside configs/tracking_memory.yaml instead.
python scripts\run_webcam.py --weights yolo11n.pt --conf 0.25 --sam-weights FastSAM-s.pt --sam-every-n 20 --sam-max-objects 2 --sam-classes backpack,handbag,suitcase --prefer-sam-masks --sam-tracking-classes backpack,handbag,suitcase --imgsz 640 --device cpu --display
Use this when you want better boxes/crops and the computer can handle it.
Put a video here:
input\test_video.mp4
Run:
python scripts\run_video.py --source input\test_video.mp4 --weights yolo11n.pt --conf 0.25 --sam-weights FastSAM-s.pt --sam-every-n 20 --sam-max-objects 2 --sam-classes backpack,handbag,suitcase --prefer-sam-masks --imgsz 320 --device cpu --output-video outputs\test_annotated.mp4 --output-json outputs\test_events.json --output-tracks outputs\test_tracks.csv
For faster debugging without SAM:
python scripts\run_video.py --source input\test_video.mp4 --weights yolo11n.pt --disable-sam --imgsz 640 --device cpu --output-video outputs\test_no_sam_annotated.mp4 --output-json outputs\test_no_sam_events.json --output-tracks outputs\test_no_sam_tracks.csv
python scripts\run_video.py --source "C:\Users\Misha\Desktop\my_video.mp4" --weights yolo11n.pt --conf 0.25 --sam-weights FastSAM-s.pt --sam-every-n 20 --sam-max-objects 2 --sam-classes backpack,handbag,suitcase --prefer-sam-masks --imgsz 320 --device cpu --output-video outputs\my_video_annotated.mp4 --output-json outputs\my_video_events.json --output-tracks outputs\my_video_tracks.csv
--source Webcam camera index, video path, RTSP URL, etc. Webcam default: 0. Video default: input/test_video.mp4.
--weights YOLO model weights. Use yolo11n.pt before custom training. Use runs\screening\yolo_screening_detector\weights\best.pt after training.
--tracker BoT-SORT tracker config path. Default: configs/botsort_reid.yaml
--classes Class grouping config path. Default: configs/classes.yaml
--risk Risk/event threshold config path. Default: configs/risk_config.yaml
--memory Project MemoryBank/ReID config path. Default: configs/tracking_memory.yaml
--output-video Output annotated MP4 path.
--output-json Output event JSON path.
--output-tracks Output per-frame tracks CSV path.
--conf YOLO global confidence threshold. Lower values keep weak bag detections but can create more false positives. Recommended live tuning: 0.25 or 0.22.
--imgsz YOLO image size. 320 = faster, less detail. 480 = good webcam compromise. 640 = better boxes/crops, slower on CPU.
--device cpu Force CPU.
--device 0 Use CUDA GPU 0 if available.
--max-frames Stop after N frames. Useful for quick tests.
--sam-weights SAM/FastSAM weights path. For CPU demos, pass FastSAM-s.pt explicitly.
--sam-every-n Run SAM once every N frames. Smaller = smoother masks but slower. Larger = faster but masks update less often.
--sam-max-objects Maximum detections to segment per SAM pass. Use 1 or 2 on CPU.
--sam-classes Comma-separated class names that SAM may segment. Example: backpack,handbag,suitcase
--prefer-sam-masks Use SAM mask geometry/appearance for configured object classes. Good for bags if YOLO boxes are rough.
--sam-tracking-classes Classes whose SAM masks may affect tracking geometry/appearance. Recommended: backpack,handbag,suitcase
--no-reuse-masks Do not reuse old SAM masks between SAM passes. Use if masks look stale or wrong.
--disable-sam Turn off SAM completely. Use for ReID/debug/FPS tests. Do not use if you expect masks/cropping.
--display Show preview window. Press q to stop.
--no-save-video Do not write annotated MP4.
--no-save-tracks Do not write tracks CSV.
--no-trails Do not draw movement trails.
--no-owner-links Do not draw person-bag owner links. Do not use this if the demo goal is to show bag-person connections.
Some branches may still include legacy privacy flags such as --blur-faces and --pause-recording-on-face. These are face detection/blurring utilities, not face recognition, and they will not be part of the final solution.
Instead of blocking wrong classes one by one, the pipeline keeps only project-relevant classes.
Example config in configs/tracking_memory.yaml:
target_classes:
- person
- backpack
- handbag
- suitcase
- trolley_bag
- cell phone
- phone
- laptop
- bottle
- suspicious_object
- dangerous_object
- knife
- gun
- weapon
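A whitelist of this kind, combined with the per-class confidence floors from min_conf_by_class, could be applied roughly as in the sketch below. The detection dictionaries, the function name filter_detections, and the 0.25 fallback threshold are assumptions made for illustration; the real filtering code in the pipeline may be structured differently.

```python
def filter_detections(detections, target_classes, min_conf_by_class, default_min_conf=0.25):
    """Keep only whitelisted classes that pass their per-class confidence floor.

    `detections` is a list of dicts with "class_name" and "confidence" keys
    (a hypothetical record layout, not the project's actual data structure).
    """
    kept = []
    for det in detections:
        name = det["class_name"]
        if name not in target_classes:
            continue  # not a project-relevant class
        if det["confidence"] < min_conf_by_class.get(name, default_min_conf):
            continue  # below the class-specific (or default) threshold
        kept.append(det)
    return kept


detections = [
    {"class_name": "person", "confidence": 0.55},
    {"class_name": "backpack", "confidence": 0.30},
    {"class_name": "chair", "confidence": 0.90},
]
kept = filter_detections(detections, {"person", "backpack"}, {"person": 0.60})
kept_names = [d["class_name"] for d in kept]
print(kept_names)
```

Here the weak person detection is dropped by the stricter person threshold, the chair is dropped because it is not whitelisted, and the backpack survives on the default threshold.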
Large boxes can hide smaller objects. For example:
- a backpack overlaps a person box,
- a phone or laptop is inside a person box,
- a visible object is attached to or on top of a bag,
- full-frame YOLO misses it because it is small or visually dominated by the parent object.
Nested ROI Search fixes this by running a second YOLO pass inside selected parent boxes.
Pipeline section:
full-frame YOLO + tracking -> choose selected parent boxes -> crop person/bag ROI with padding -> run YOLO predict() inside ROI -> map detections back to full-frame coordinates -> remove duplicates -> continue with SAM/ReID/memory/event logic
Important: ROI search uses YOLO predict(), not track(), so it should not corrupt the main BoT-SORT tracker state.
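The crop-and-map-back steps can be sketched as follows. This is a minimal illustration that assumes boxes are (x1, y1, x2, y2) pixel tuples; the helper names padded_roi and roi_to_frame are hypothetical, and the detector call itself is omitted.

```python
def padded_roi(box, frame_w, frame_h, padding_ratio=0.12):
    """Expand a parent box by padding_ratio on each side, clamped to the frame."""
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * padding_ratio
    pad_y = (y2 - y1) * padding_ratio
    return (
        max(0, int(round(x1 - pad_x))),
        max(0, int(round(y1 - pad_y))),
        min(frame_w, int(round(x2 + pad_x))),
        min(frame_h, int(round(y2 + pad_y))),
    )


def roi_to_frame(roi, inner_box):
    """Shift a box detected inside the ROI crop back to full-frame coordinates."""
    rx1, ry1, _, _ = roi
    ix1, iy1, ix2, iy2 = inner_box
    return (ix1 + rx1, iy1 + ry1, ix2 + rx1, iy2 + ry1)


# A person box in a 640x480 frame, and a made-up detection inside its crop.
roi = padded_roi((100, 50, 300, 450), frame_w=640, frame_h=480)
full_frame_box = roi_to_frame(roi, (10, 20, 60, 90))
print(roi, full_frame_box)
```

The mapping is a plain offset because the crop is taken at native resolution; if the ROI were resized before inference, the inner box would also need to be scaled back first.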
Typical config:
roi_inner_search:
  enabled: true
  every_n_frames: 10
  parent_classes: ["person", "backpack", "handbag", "suitcase", "trolley_bag"]
  max_parent_rois_per_frame: 2
  max_inner_detections_per_roi: 5
  min_parent_confidence: 0.25
  roi_confidence: 0.18
  roi_imgsz: 320
  padding_ratio: 0.12
Exclusion rules:
Inside a person box: do not search for another person.
Inside a backpack/handbag/suitcase/trolley_bag: do not search for person or other bag classes.
Useful inner classes: phone, cell phone, laptop, bottle, suspicious_object, dangerous_object, knife, gun, weapon.
YOLO/BoT-SORT local IDs are useful but fragile. They can change when a person leaves the frame, becomes occluded, or is missed for several frames.
The project adds a MemoryBank layer:
BoT-SORT local ID: L4
Project global ID: G2
Displayed label: person G2 L4
For people, OSNet extracts an embedding from person crops:
person crop -> OSNet -> embedding vector -> compare to MemoryBank
Current ReID improvements:
- strict Torchreid/OSNet backend,
- bad-crop guard,
- multiple good snapshots per person,
- best-snapshot matching instead of simple averaging,
- entry/exit side continuity boost,
- group-safe assignment so visible people do not collapse into one ID,
- offline merge helper for post-run cleanup.
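Best-snapshot matching, as opposed to matching against an averaged embedding, can be illustrated with the sketch below. It uses plain Python lists as stand-ins for OSNet embeddings; the function names and the 0.7 similarity threshold are assumptions, not values taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_snapshot_match(query, memory, min_similarity=0.7):
    """Return (global_id, score) for the identity whose single best stored
    snapshot is most similar to the query, or (None, score) if too weak."""
    best_id, best_score = None, -1.0
    for global_id, snapshots in memory.items():
        score = max((cosine_similarity(query, s) for s in snapshots), default=-1.0)
        if score > best_score:
            best_id, best_score = global_id, score
    if best_score < min_similarity:
        return None, best_score
    return best_id, best_score


# Toy 3-D "embeddings": G1 has two snapshots from different poses.
memory = {
    "G1": [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]],
    "G2": [[0.0, 1.0, 0.0]],
}
match_id, match_score = best_snapshot_match([0.6, 0.8, 0.0], memory)
print(match_id, round(match_score, 2))
```

In this toy case the query matches G1 through its second snapshot, while G2's only snapshot scores lower. Keeping several snapshots lets one good pose carry the match instead of being diluted by an average.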
Bad-crop guard avoids updating memory from:
- side-clipped people,
- top-clipped people,
- tiny boxes,
- strange aspect ratios,
- partial bodies at frame edges.
This protects long-term memory but can make close webcam demos stricter. Stand farther from the camera so the full body is visible.
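A guard of this kind might look like the sketch below. All thresholds and the reason strings are illustrative assumptions; the project's real limits live in its configuration and may differ.

```python
def crop_quality(box, frame_w, frame_h, edge_margin_px=4,
                 min_height_px=80, min_aspect=1.2, max_aspect=4.5):
    """Return (is_good, reason) for a person box given as (x1, y1, x2, y2).

    Thresholds are placeholder values chosen for this example only.
    """
    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    if width <= 0 or height <= 0:
        return False, "invalid_box"
    if height < min_height_px:
        return False, "tiny_box"
    if x1 <= edge_margin_px or x2 >= frame_w - edge_margin_px:
        return False, "side_clipped"
    if y1 <= edge_margin_px:
        return False, "top_clipped"
    aspect = height / width  # standing people are taller than wide
    if not (min_aspect <= aspect <= max_aspect):
        return False, "strange_aspect_ratio"
    return True, "ok"


print(crop_quality((200, 60, 320, 400), 640, 480))  # full body, well inside frame
print(crop_quality((0, 60, 100, 400), 640, 480))    # touching the left edge
print(crop_quality((200, 100, 230, 150), 640, 480))  # too small to trust
```

Only crops that pass would be added to the ReID memory; rejected crops can still be displayed and tracked, they just do not update long-term identity.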
SAM/FastSAM is mainly used for selected object masks, especially:
backpack
handbag
suitcase
trolley_bag
weapons
It is not the main person identity mechanism. Person identity is handled by YOLO boxes + BoT-SORT + OSNet ReID.
Bag ownership is estimated using visual relationship history.
A bag/object track stores fields such as:
owner_scores
owner_contact_frames
owner_separation_frames
owner_last_near_frame
owner_link_strength
owner_last_distance_px
Example logic:
bag G4 was near or overlapping person G2 for several seconds
bag G4 later became stationary
person G2 moved away
=> possible unattended bag event
This is an estimate based on proximity history, not proof of ownership.
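The contact, separation, and stationary counters behind that logic can be sketched as below. The field names mirror the track fields listed above and the distance defaults follow the demo-style association values, but the stationary counter, the speed threshold, and the 60-frame limits are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class BagState:
    owner_contact_frames: int = 0
    owner_separation_frames: int = 0
    stationary_frames: int = 0  # hypothetical helper counter


def update_bag_state(state, owner_distance_px, bag_speed_px,
                     contact_distance_px=240.0, separation_distance_px=320.0,
                     stationary_speed_px=2.0):
    """Update per-frame counters from the bag-to-owner distance and bag speed."""
    if owner_distance_px <= contact_distance_px:
        state.owner_contact_frames += 1
        state.owner_separation_frames = 0  # owner is back near the bag
    elif owner_distance_px >= separation_distance_px:
        state.owner_separation_frames += 1
    if bag_speed_px <= stationary_speed_px:
        state.stationary_frames += 1
    else:
        state.stationary_frames = 0
    return state


def is_possibly_unattended(state, min_contact_frames=8,
                           min_separation_frames=60, min_stationary_frames=60):
    """True once the bag had an owner, stopped moving, and the owner stayed away."""
    return (state.owner_contact_frames >= min_contact_frames
            and state.owner_separation_frames >= min_separation_frames
            and state.stationary_frames >= min_stationary_frames)


state = BagState()
for _ in range(10):   # bag carried next to its owner
    update_bag_state(state, owner_distance_px=100.0, bag_speed_px=5.0)
carried = is_possibly_unattended(state)
for _ in range(60):   # bag put down, owner walks away
    update_bag_state(state, owner_distance_px=400.0, bag_speed_px=0.0)
left_behind = is_possibly_unattended(state)
print(carried, left_behind)
```

Using a gap between the contact and separation distances gives some hysteresis, so a person hovering near the boundary does not toggle the event on and off.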
Examples:
outputs\webcam_annotated.mp4
outputs\test_annotated.mp4
outputs\annotated_video.mp4
Shows:
bounding boxes
G global IDs
L local tracker IDs
confidence values
movement trails
person-bag owner links
SAM masks for selected objects
event messages
Examples:
outputs\webcam_events.json
outputs\test_events.json
outputs\events.json
Contains events such as:
track_reidentified
possible_unattended_bag
offline_track_merge
relationship/risk events
Examples:
outputs\webcam_tracks.csv
outputs\test_tracks.csv
outputs\tracks.csv
Useful columns:
frame
global_id
raw_global_id
offline_merged_into
local_tracker_id
class_name
confidence
bbox_x1, bbox_y1, bbox_x2, bbox_y2
center_x, center_y
owner_id
owner_link_strength
owner_contact_frames
owner_separation_frames
owner_last_distance_px
mask_area
used_mask_geometry
crop_quality
crop_quality_reason
last_observed_side
exit_side
entry_side
snapshot_count
reidentified_count
detection_source
parent_class_name
roi_level
ROI-related examples:
detection_source = main          # normal full-frame YOLO/BoT-SORT detection
detection_source = roi:person    # object found inside a person ROI
detection_source = roi:backpack  # object found inside a backpack ROI
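To check how many rows came from ROI search versus the main detector, the tracks CSV can be summarized with the standard library. The sketch below runs on a small in-memory sample with made-up rows; only the column names are taken from the list above, and the exact value formats in a real file may differ.

```python
import csv
import io
from collections import Counter


def count_detection_sources(csv_file):
    """Count rows per detection_source value in an open tracks CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row["detection_source"] for row in reader)


# Made-up sample rows standing in for a real tracks CSV.
sample = io.StringIO(
    "frame,global_id,class_name,detection_source\n"
    "1,G1,person,main\n"
    "1,G4,backpack,main\n"
    "2,G7,cell phone,roi:person\n"
)
counts = count_detection_sources(sample)
print(dict(counts))
```

For a real run, open the file with `open("outputs/test_tracks.csv", newline="")` and pass the handle to the same function.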
Controls BoT-SORT short-term tracking.
Useful parameters:
track_buffer: 180
match_thresh: 0.72
track_high_thresh: 0.25
new_track_thresh: 0.32
with_reid: true
If the tracker loses people too quickly:
increase track_buffer
slightly lower match_thresh
If the tracker merges different people:
increase match_thresh
increase appearance/ReID thresholds
Controls project-level global memory, ReID, filters, class whitelist, SAM tracking classes, ROI search, and offline merge.
Important sections:
target_classes:
min_conf_by_class:
min_area_ratio_by_class:
max_area_ratio_by_class:
sam_tracking_classes:
person_reid:
  enabled: true
  backend: torchreid
  model_name: osnet_x0_25
  allow_torchvision_fallback: false
  require_backend: true
group_safe_assignment:
  enabled: true
roi_inner_search:
  enabled: true
offline_merge:
  enabled: true
Controls class groups.
Example:
person_classes:
- person
bag_classes:
- backpack
- handbag
- suitcase
- trolley_bag
suspicious_classes:
- suspicious_object
- dangerous_object
Controls risk and event thresholds, especially person-bag ownership and unattended-bag logic.
For live demos, relationship thresholds may need to be more forgiving than final evaluation thresholds.
Example demo-style association tuning:
association:
  max_owner_distance_px: 260.0
  min_owner_score: 0.25
  motion_window_frames: 12
  score_decay: 0.98
  score_gain: 0.22
  min_contact_frames: 8
  contact_distance_px: 240.0
  separation_distance_px: 320.0
YOLO training dataset config.
Example:
path: datasets/screening_dataset
train: images/train
val: images/val
names:
  0: person
  1: backpack
  2: handbag
  3: suitcase
  4: trolley_bag
  5: suspicious_object
  6: dangerous_object
  7: phone
  8: laptop
  9: bottle
Adjust this before training if the class list changes.
Try:
--conf 0.25
If still too strict:
--conf 0.22
Also lower per-class bag thresholds only if needed:
min_conf_by_class:
  backpack: 0.22
  handbag: 0.22
  suitcase: 0.25
Raise person-specific confidence:
min_conf_by_class:
  person: 0.60
If still bad:
min_conf_by_class:
  person: 0.65
Also use bbox shape/area filters and avoid static clutter zones.
Check owner-link TTL/display settings in the visualization or memory configuration. Relationship memory should persist internally, but visible lines should only be drawn when both tracks were seen recently.
Also check:
- Did the bag/person actually disappear from tracking?
- Is FPS very low, causing frame-based TTL to feel too long?
- Is the owner-link drawing using last-seen tracks instead of current/recent tracks?
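The display rule described above, where a link is drawn only when both tracks were seen recently, can be sketched as follows. The function names and the 30-frame TTL are assumptions for illustration; check the actual setting in the project configuration.

```python
def should_draw_owner_link(current_frame, bag_last_seen, owner_last_seen, ttl_frames=30):
    """Draw the link only if both tracks were observed within the last ttl_frames."""
    bag_age = current_frame - bag_last_seen
    owner_age = current_frame - owner_last_seen
    return bag_age <= ttl_frames and owner_age <= ttl_frames


def ttl_seconds(ttl_frames, fps):
    """How long a frame-based TTL lasts in wall-clock time at a given FPS."""
    return ttl_frames / fps


both_recent = should_draw_owner_link(100, bag_last_seen=95, owner_last_seen=98)
owner_stale = should_draw_owner_link(100, bag_last_seen=95, owner_last_seen=40)
print(both_recent, owner_stale, ttl_seconds(30, 5.0))
```

The second helper shows why low FPS makes stale lines linger: the same 30-frame TTL is 1 second at 30 FPS but 6 seconds at 5 FPS.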
Check:
- Did you use --no-owner-links?
- Did YOLO detect the bag as backpack/handbag/suitcase?
- Was the bag close to the person long enough?
- Are risk_config.yaml association thresholds too strict?
- Is FPS very low?
Try one of these:
--sam-every-n 10
--no-reuse-masks
or remove:
--prefer-sam-masks
For tracking stability, it is often better to let YOLO boxes drive tracking and use SAM mainly for visualization/mask geometry.
Fastest command:
python scripts\run_webcam.py --weights yolo11n.pt --disable-sam --imgsz 320 --device cpu --display --no-save-video
Balanced CPU command:
python scripts\run_webcam.py --weights yolo11n.pt --conf 0.25 --sam-weights FastSAM-s.pt --sam-every-n 30 --sam-max-objects 1 --sam-classes backpack,handbag,suitcase --imgsz 320 --device cpu --display
Check CUDA:
python -c "import torch; print('CUDA:', torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU only')"
If CUDA is available:
python scripts\run_webcam.py --weights yolo11n.pt --sam-weights FastSAM-s.pt --sam-every-n 20 --sam-max-objects 2 --sam-classes backpack,handbag,suitcase --prefer-sam-masks --imgsz 640 --device 0 --display
Symptom:
Torchreid/OSNet backend requested but unavailable
OSNet is not available
Most likely cause:
pip installed a PyTorch/NumPy stack that is too new for the older Torchreid release.
Fix:
deactivate
rmdir /s /q .venv
py -3.12 -m venv .venv
.venv\Scripts\activate.bat
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements_windows_stable_reid.txt
python scripts\check_reid_backend.py
Expected:
OSNet is available.
Symptom:
Installed packages show cp314
OSNet/Torchreid does not load correctly
Fix:
py -0p
py -3.12 -m venv .venv
Use Python 3.12 for this project.
Symptom:
ModuleNotFoundError: No module named 'tensorboard'
Fix:
python -m pip install tensorboard
Better fix: use requirements_windows_stable_reid.txt, which includes TensorBoard.
Symptom:
ModuleNotFoundError: No module named 'torchreid.utils'
Cause:
Some torchreid installs expose FeatureExtractor at torchreid.reid.utils instead of torchreid.utils.
Fix:
Use the patched project files where check_reid_backend.py and deep_reid.py try both:
try:
    from torchreid.utils import FeatureExtractor
except ImportError:
    # Fallback: some installs expose it under torchreid.reid.utils.
    from torchreid.reid.utils import FeatureExtractor
Symptom:
Downloading...
From: https://drive.google.com/...
Download failed
Fixes:
- Check internet connection.
- Try again later; Google Drive sometimes rate-limits.
- Keep gdown==4.7.3 from the stable requirements.
- Check whether the file already exists in:
C:\Users\<username>\.cache\torch\checkpoints\osnet_x0_25_imagenet.pth
Cause:
You have not trained a custom model yet, or the path is wrong.
Fix:
--weights yolo11n.pt
Use best.pt only after training creates it or after you place the trained weights in the expected folder.
Cause:
input\test_video.mp4 does not exist.
Fix:
Put a file at input\test_video.mp4
or pass a full path:
python scripts\run_video.py --source "C:\Users\Misha\Desktop\my_video.mp4" --weights yolo11n.pt --disable-sam
Cause:
You are probably not running from the project root, or the src folder is missing.
Fix:
cd C:\Users\Misha\Desktop\screening_ai_project_deep_reid
python scripts\smoke_test.py
Try camera index 1:
python scripts\run_webcam.py --source 1 --weights yolo11n.pt --disable-sam --imgsz 480 --device cpu --display
Also close other apps using the camera.
Check whether you used:
--no-save-video
Also check that the outputs folder exists:
python scripts\create_dataset_folders.py
Possible causes:
- Person leaves frame for too long.
- Person re-enters with different pose/scale/lighting.
- Person crop is partial or side-clipped.
- Bad-crop guard refuses to update memory from partial crops.
- OSNet thresholds are strict to avoid wrong merges.
- FPS is low, causing fewer good observations.
Fixes:
- Keep the full body visible during the demo.
- Improve lighting.
- Use --imgsz 480 or --imgsz 640.
- Avoid standing too close to the camera.
- Use prerecorded video and the offline merge helper for final reports.
Fixes:
- Increase ReID/appearance thresholds in configs/tracking_memory.yaml.
- Keep group-safe assignment enabled.
- Avoid testing with two people in nearly identical clothing at first.
- Increase YOLO image size.
The pretrained yolo11n.pt model is only a baseline. For the final project, train YOLO on project-specific classes and camera angles.
Step 1: Collect safe project videos/images
Step 2: Extract useful frames
Step 3: Label frames in YOLO format
Step 4: Split data into train/val
Step 5: Update configs/data.yaml
Step 6: Train YOLO
Step 7: Validate results
Step 8: Run the pipeline with best.pt
Step 9: Tune tracking/memory/risk thresholds
Step 10: Add SAM for trained object classes
datasets/screening_dataset/
|-- images/
| |-- train/
| |-- val/
|-- labels/
| |-- train/
| |-- val/
Each image needs a matching .txt label file:
images/train/frame_000123.jpg
labels/train/frame_000123.txt
Each label row:
class_id center_x center_y width height
All values are normalized from 0 to 1.
Example:
0 0.512 0.438 0.231 0.604
Meaning:
class 0, center x=0.512, center y=0.438, width=0.231, height=0.604
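To sanity-check labels, a row can be converted back to pixel corner coordinates. The sketch below assumes a 1000x1000 image purely to keep the arithmetic readable; yolo_row_to_pixels is a hypothetical helper, not a project script.

```python
def yolo_row_to_pixels(row, img_w, img_h):
    """Convert 'class_id cx cy w h' (normalized) to (class_id, (x1, y1, x2, y2)) in pixels."""
    class_id, cx, cy, w, h = row.split()
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    return int(class_id), (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


class_id, pixel_box = yolo_row_to_pixels("0 0.512 0.438 0.231 0.604", 1000, 1000)
print(class_id, [round(v, 1) for v in pixel_box])
```

On that image size the example row describes a box running from roughly (396.5, 136) to (627.5, 740), which is a plausible standing-person shape. Boxes that come out with negative or out-of-frame corners usually indicate a labeling or export problem.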
Start simple:
person
backpack
handbag
suitcase
trolley_bag
phone
laptop
bottle
suspicious_object
dangerous_object
If you train or integrate a separate detector, make sure the class names match the names in configs/data.yaml, configs/classes.yaml, and target_classes in configs/tracking_memory.yaml.
For restricted/dangerous-object demonstrations, use only safe approved lab props, institution-approved datasets, or synthetic/clearly non-functional examples. Do not collect data with real dangerous items.
python scripts\create_dataset_folders.py
python scripts\train_yolo.py --data configs\data.yaml --base yolo11n.pt --epochs 80 --imgsz 640 --batch 8 --device cpu
CPU training is slow.
python scripts\train_yolo.py --data configs\data.yaml --base yolo11n.pt --epochs 80 --imgsz 640 --batch 8 --device 0
Expected trained model:
runs\screening\yolo_screening_detector\weights\best.pt
Webcam:
python scripts\run_webcam.py --weights runs\screening\yolo_screening_detector\weights\best.pt --conf 0.25 --sam-weights FastSAM-s.pt --sam-every-n 20 --sam-max-objects 2 --sam-classes backpack,handbag,suitcase,suspicious_object,dangerous_object --prefer-sam-masks --imgsz 640 --device cpu --display
Prerecorded video:
python scripts\run_video.py --source input\test_video.mp4 --weights runs\screening\yolo_screening_detector\weights\best.pt --conf 0.25 --sam-weights FastSAM-s.pt --sam-every-n 20 --sam-max-objects 3 --sam-classes backpack,handbag,suitcase,suspicious_object,dangerous_object --prefer-sam-masks --imgsz 640 --device cpu --output-video outputs\trained_annotated.mp4 --output-json outputs\trained_events.json --output-tracks outputs\trained_tracks.csv
Earlier notes mention a trained knife/gun detector with strong validation metrics. Before documenting it as part of the final pipeline, verify that the actual weights file exists in the repository or shared drive and that run_webcam.py / run_video.py can load it directly or through a multi-model integration layer.
Do not claim a trained detector is active in the demo unless the command actually uses its weights.
The dual-YOLO mode keeps the original general detector as the tracked stream and adds a second predict-only detector:
yolo11n.pt track() -> person/backpack/handbag/suitcase tracking
best_v2.pt predict() -> risk-object boxes
merged detections -> filtering -> temporal confirmation -> SAM -> MemoryBank -> risk/person-link logic
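One plausible way to merge the two streams is sketched below: keep every main detection, then add secondary detections that pass the class and confidence filters and do not duplicate an already kept box of the same class. The record layout, the function names, and the IoU-based duplicate rule are assumptions; the project's actual merge step may work differently.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(main, secondary, target_classes, secondary_conf=0.35, dup_iou=0.45):
    """Append filtered, non-duplicate secondary detections to the main ones."""
    merged = list(main)
    # Visit the strongest secondary boxes first so they win over weaker duplicates.
    for det in sorted(secondary, key=lambda d: d["confidence"], reverse=True):
        if det["class_name"] not in target_classes:
            continue
        if det["confidence"] < secondary_conf:
            continue
        is_duplicate = any(
            kept["class_name"] == det["class_name"]
            and iou(kept["box"], det["box"]) > dup_iou
            for kept in merged
        )
        if not is_duplicate:
            merged.append(det)
    return merged


main = [{"class_name": "person", "confidence": 0.9, "box": (50, 40, 250, 460)}]
secondary = [
    {"class_name": "knife", "confidence": 0.8, "box": (100, 100, 140, 160)},
    {"class_name": "knife", "confidence": 0.5, "box": (102, 101, 141, 162)},  # near-duplicate
    {"class_name": "gun", "confidence": 0.2, "box": (300, 300, 340, 330)},    # too weak
    {"class_name": "scissors", "confidence": 0.9, "box": (10, 10, 30, 30)},   # not targeted
]
merged = merge_detections(main, secondary, {"knife", "gun"})
merged_names = [d["class_name"] for d in merged]
print(merged_names)
```

Because the secondary model only runs predict(), its boxes carry no tracker IDs at this point; anything kept here would still need temporal confirmation before being shown or logged.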
New flags:
--secondary-weights Optional second YOLO weights path, for example best_v2.pt
--secondary-target-classes Comma-separated secondary classes to keep
--secondary-conf Secondary YOLO confidence threshold
--secondary-imgsz Secondary YOLO image size
--secondary-iou Secondary NMS IoU threshold; lower reduces duplicate boxes
--secondary-max-det Maximum secondary boxes per frame
--disable-risk-smoothing Debug only; lets raw risk detections pass immediately
--weapon-confirm-window Override risk confirmation window size
--weapon-confirm-min-hits Override required hits inside the window
Default temporal confirmation is configured in configs/risk_config.yaml: 3 matching hits in a 5-frame window. Only confirmed risk-class detections are drawn, sent to SAM, linked to people, written to CSV, and logged to events JSON.
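The "N hits in a W-frame window" rule can be sketched with a fixed-length history. This minimal version tracks a single candidate and ignores spatial matching between frames, which the real pipeline also appears to use (see --weapon-confirm-match-center-px); the class name TemporalConfirmer is hypothetical.

```python
from collections import deque


class TemporalConfirmer:
    """Confirm a detection once it has at least min_hits hits in the last `window` frames."""

    def __init__(self, window=5, min_hits=3):
        self.history = deque(maxlen=window)  # oldest frames fall off automatically
        self.min_hits = min_hits

    def update(self, detected):
        """Record this frame's result and return whether the object is confirmed."""
        self.history.append(bool(detected))
        return sum(self.history) >= self.min_hits


confirmer = TemporalConfirmer(window=5, min_hits=3)
frames = [True, False, True, True, False, False, False, False]
confirmed = [confirmer.update(hit) for hit in frames]
print(confirmed)
```

In this sequence the object is confirmed on the fourth frame, once three hits have accumulated, and drops out again two frames later as the hits age out of the window. A single-frame flicker never reaches three hits, which is the point of the smoothing.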
The current dual-YOLO pipeline now separates people/bags from risk-object alerting:
- people and bags still use normal G# MemoryBank/ReID tracking;
- secondary-model risk detections are grouped into temporary R# risk clusters;
- short confirmed detections create risk_warning events;
- long person-linked clusters create risk_object_confirmed events;
- repeated warnings for the same person create risk_repeated_warning events;
- the live screen shows security alerts on the bottom-right and tracking/ReID debug messages on the bottom-left.
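Grouping risk detections into temporary R# clusters can be sketched as nearest-center matching with a time-to-live. The defaults below reuse the values shown for --risk-cluster-match-center-px and --risk-cluster-ttl-frames, but the class, its fields, and the matching rule are assumptions rather than the project's actual implementation.

```python
import math


class RiskClusters:
    """Assign risk detections to temporary clusters by center distance."""

    def __init__(self, match_center_px=95.0, ttl_frames=45):
        self.match_center_px = match_center_px
        self.ttl_frames = ttl_frames
        self.clusters = {}  # cluster id -> {"center", "last_frame", "hits"}
        self.next_index = 1

    def update(self, frame, center):
        """Return the cluster id for a detection center seen at `frame`."""
        # Forget clusters that have not been seen within the TTL.
        self.clusters = {
            cid: c for cid, c in self.clusters.items()
            if frame - c["last_frame"] <= self.ttl_frames
        }
        # Match to the nearest live cluster, if it is close enough.
        nearest = min(
            self.clusters.items(),
            key=lambda item: math.dist(center, item[1]["center"]),
            default=None,
        )
        if nearest and math.dist(center, nearest[1]["center"]) <= self.match_center_px:
            cid, cluster = nearest
            cluster.update(center=center, last_frame=frame, hits=cluster["hits"] + 1)
            return cid
        cid = f"R{self.next_index}"
        self.next_index += 1
        self.clusters[cid] = {"center": center, "last_frame": frame, "hits": 1}
        return cid


clusters = RiskClusters()
ids = [
    clusters.update(1, (100, 100)),   # first sighting
    clusters.update(2, (130, 120)),   # about 36 px away: same cluster
    clusters.update(3, (400, 400)),   # far away: new cluster
    clusters.update(60, (135, 125)),  # earlier clusters have expired by now
]
print(ids)
```

A per-cluster hit count like this is what a longer confirmation rule (for example, a number of person-linked frames) could build on.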
Important flags:
--risk-cluster-match-center-px 95
--risk-cluster-ttl-frames 45
--risk-confirm-linked-frames 30
--risk-warning-cooldown-frames 30
--risk-repeat-warning-window-frames 300
--risk-repeat-warning-count 10
--event-overlay-position bottom-right
--debug-overlay-position bottom-left
Recommended strict demo command:
python scripts\run_webcam.py --weights yolo11n.pt --secondary-weights best_v2.pt --secondary-target-classes knife,gun --secondary-conf 0.55 --secondary-iou 0.30 --weapon-confirm-window 7 --weapon-confirm-min-hits 4 --weapon-confirm-match-center-px 80 --risk-confirm-linked-frames 30 --risk-repeat-warning-count 10 --conf 0.25 --sam-weights FastSAM-s.pt --sam-every-n 15 --sam-max-objects 2 --sam-classes backpack,handbag,suitcase,knife,gun --prefer-sam-masks --sam-tracking-classes backpack,handbag,suitcase,knife,gun --imgsz 640 --device cpu --display --display-scale 1.5 --label-scale 0.42 --event-overlay-position bottom-right --debug-overlay-position bottom-left --output-video outputs\webcam_dual_yolo_cluster.mp4 --output-json outputs\webcam_dual_yolo_cluster_events.json --output-tracks outputs\webcam_dual_yolo_cluster_tracks.csv