
AI_Grounded with SAM2

ehdwo0427 edited this page Aug 4, 2025 · 2 revisions


Grounded with SAM2

  • We use Grounded SAM2 to automatically detect and segment objects in an image from a text prompt (e.g., "monitor. keyboard. mouse.").
  • It combines Grounding DINO, which turns the text prompt into bounding boxes, with SAM2, which uses those boxes as prompts to produce a pixel-level mask for each detected object.
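The example prompt follows the convention used in the Grounding DINO examples: lowercase class names, each ending with a period. Below is a minimal sketch of building such a prompt from a list of class names. `build_grounding_prompt` is a hypothetical helper and is not part of Grounded SAM2.

```python
def build_grounding_prompt(class_names):
    """Join class names into a Grounding DINO style text prompt.

    Assumes the convention of lowercase phrases, each terminated by a
    period, e.g. "monitor. keyboard. mouse.".
    """
    phrases = []
    seen = set()
    for name in class_names:
        phrase = name.strip().lower().rstrip(".")
        # Skip empty entries and duplicates while keeping the original order
        if phrase and phrase not in seen:
            seen.add(phrase)
            phrases.append(phrase + ".")
    return " ".join(phrases)


print(build_grounding_prompt(["Monitor", "keyboard", "mouse."]))
# monitor. keyboard. mouse.
```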

The resulting masks are used to:

  • visualize object boundaries
  • generate per-class binary masks
  • apply inpainting models to replace or modify specific regions
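Here is a minimal sketch of generating per-class binary masks. It assumes each detection has already been reduced to a `(label, mask)` pair, where the mask is a 2-D boolean NumPy array the size of the image. `per_class_masks` is a hypothetical helper. OR-ing instances that share a label keeps every instance in the class mask instead of letting later ones replace earlier ones.

```python
import numpy as np


def per_class_masks(detections):
    """Combine instance masks into one binary mask per class label.

    `detections` is assumed to be a list of (label, mask) pairs where every
    mask is a 2-D boolean array with the same (H, W) shape. Instances that
    share a label are OR-ed together instead of overwriting each other.
    """
    class_masks = {}
    for label, mask in detections:
        mask = np.asarray(mask, dtype=bool)
        if label in class_masks:
            class_masks[label] |= mask
        else:
            # Copy so the caller's array is never modified in place
            class_masks[label] = mask.copy()
    return class_masks


# Two "mouse" instances and one "keyboard" on a 4x4 toy image
m1 = np.zeros((4, 4), dtype=bool)
m1[0, 0] = True
m2 = np.zeros((4, 4), dtype=bool)
m2[3, 3] = True
kb = np.zeros((4, 4), dtype=bool)
kb[1:3, 1:3] = True

masks = per_class_masks([("mouse", m1), ("keyboard", kb), ("mouse", m2)])
print(sorted(masks), int(masks["mouse"].sum()))
# ['keyboard', 'mouse'] 2
```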

Example Output

Test1

  • Original image
  • Grounded SAM2.1 result
  • Masks (one per class): desk, deskmat, laptop, monitor, mouse

Test2

  • Original image
  • Grounded SAM2.1 result (bounding boxes with masks)
  • Masks (one per class): desk, deskmat, monitor, keyboard, mouse, speaker

Test3

  • Original image
  • Grounded SAM2.1 result
  • Masks (one per class): desk, deskmat, monitor, laptop, keyboard, mouse, speaker

Test4

  • Original image
  • Grounded SAM2.1 result
  • Masks (one per class): desk, monitor, desktop, keyboard, mouse, speaker

Issues

  • When saving masks, objects with the same class name overwrite each other, so only the last instance is preserved.
  • In some cases (e.g., Test1), unintended objects, such as another person's mouse, are also detected.
  • When label confidence is low or the label is ambiguous, a single object may be split into multiple segments (e.g., "desktop_monitor" is detected as two parts).
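One possible fix for the overwrite issue is to give each instance its own filename. The sketch below assumes a `<label>_<index>.png` naming scheme. That scheme and the `instance_filenames` helper are hypothetical and do not describe what the current pipeline does.

```python
from collections import Counter


def instance_filenames(labels, ext="png"):
    """Return one unique mask filename per detected instance.

    A running counter per label yields e.g. mouse_0.png, mouse_1.png,
    so a second "mouse" no longer overwrites the first on disk.
    """
    counts = Counter()
    names = []
    for label in labels:
        # Replace spaces so multi-word labels stay filesystem friendly
        safe = label.strip().replace(" ", "_")
        names.append(f"{safe}_{counts[safe]}.{ext}")
        counts[safe] += 1
    return names


print(instance_filenames(["mouse", "keyboard", "mouse", "desktop monitor"]))
# ['mouse_0.png', 'keyboard_0.png', 'mouse_1.png', 'desktop_monitor_0.png']
```

Keeping the instance index in the filename also makes it possible to merge split segments of one object later, because each piece is still available on disk.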

Next Steps

  • Merge all masks into a single combined mask
  • Apply SDXL inpainting using the original image and the merged mask
  • Fine-tune the SDXL inpainting model for better domain-specific results
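The first Next Step can be sketched with NumPy, assuming all masks are 2-D boolean arrays of the same size. `merge_masks` and its `dilate_px` option are hypothetical. The output marks the region to repaint with 255, which matches the convention in diffusers inpainting pipelines such as `StableDiffusionXLInpaintPipeline`: white pixels in `mask_image` are repainted and black pixels are preserved.

```python
import numpy as np


def merge_masks(masks, dilate_px=0):
    """Merge binary masks into one uint8 inpainting mask (255 = repaint).

    `masks` is a sequence of 2-D boolean arrays of identical shape.
    `dilate_px` optionally grows the merged region by that many pixels,
    using a 3x3 square neighbourhood per step.
    """
    masks = [np.asarray(m, dtype=bool) for m in masks]
    if not masks:
        raise ValueError("need at least one mask")
    merged = np.logical_or.reduce(masks)
    h, w = merged.shape
    for _ in range(dilate_px):
        # One 3x3 dilation step: OR the mask with its 8 shifted neighbours
        padded = np.pad(merged, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(merged)
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        merged = grown
    return merged.astype(np.uint8) * 255


a = np.zeros((5, 5), dtype=bool)
a[2, 2] = True
b = np.zeros((5, 5), dtype=bool)
b[0, 4] = True

print(int((merge_masks([a, b]) == 255).sum()))            # 2 pixels
print(int((merge_masks([a], dilate_px=1) == 255).sum()))  # 9 pixels
```

You can convert the result with `PIL.Image.fromarray(...)`, which turns a 2-D `uint8` array into a grayscale image. Then pass it as `mask_image` together with the original image. A small dilation helps the inpainted region cover mask edges, but the right amount is a tuning choice.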


Woody's AI Backend Engineering Log


💼 About

Deepvisions | AI Engineer 2026.03 ~ present


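🚀 Projects (newest first)

CCTV bicycle route & engine idling detection: Handong Global University Living Lab

2026.05 ~ | @ Deepvisions 4 campus CCTV cameras · bicycle OCR + multi-signal vehicle idling detection

Wildlife detection: RPi edge deployment

2026.04 ~ | @ Deepvisions vineyard intrusion detection (5 species, multi-class · real-time on Raspberry Pi 4)

Vineyard pest and disease detection and yield prediction

2026.03 ~ | @ Deepvisions drone-image object detection + GSD calibration + yield prediction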


📦 Completed projects

OnTheTop

2025.03 ~ 2025.08 | Kakao Tech Bootcamp | ✅ Completed AI-based desk setup (desk-terior) recommendation service


AI Notes


About
