SAM (Segment Anything Model), Mask R-CNN, DeepLabV3, YOLO-Seg

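- Pipeline
  - 1. Extract video frames
    - Use OpenCV to split the video into individual frames
  - 2. Object segmentation (with SAM)
    - Use SAM to segment a target object in each frame at the pixel level
  - 3. Overlay the mask on the original frame
    - Composite the segmented mask onto the original frame as a semi-transparent color
  - 4. Build the new video
    - Combine the processed frames into a single video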
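Step 3 above amounts to per-pixel alpha blending: inside the mask, output = (1 - alpha) * frame + alpha * color, and outside the mask the frame is left unchanged. A minimal NumPy sketch, assuming a boolean (H, W) mask like the ones SAM returns and a (H, W, 3) uint8 frame; the helper name `overlay_mask` is hypothetical:

```python
import numpy as np


def overlay_mask(frame: np.ndarray, mask: np.ndarray,
                 color=(0, 255, 0), alpha: float = 0.5) -> np.ndarray:
    """Blend `color` into `frame` only where `mask` is True.

    frame: (H, W, 3) uint8 image (BGR channel order if it came from OpenCV).
    mask:  (H, W) boolean array, e.g. one mask predicted by SAM.
    Pixels outside the mask are returned unchanged.
    """
    if mask.shape != frame.shape[:2]:
        raise ValueError("mask must have the same height and width as frame")
    out = frame.copy()
    color_arr = np.array(color, dtype=np.float32)
    # (1 - alpha) * frame + alpha * color, computed for the masked pixels only
    blended = (1.0 - alpha) * frame[mask].astype(np.float32) + alpha * color_arr
    out[mask] = np.clip(np.rint(blended), 0, 255).astype(np.uint8)
    return out
```

Blending only the masked pixels keeps the background at full brightness; blending the whole frame against an all-zero overlay image (for example with `cv2.addWeighted`) would also darken every pixel outside the mask.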

Install libraries

In [1]:
!pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118

Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu118
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting

In [2]:
!pip install opencv-python numpy matplotlib



In [4]:
!pip install git+https://github.com/facebookresearch/segment-anything.git

Collecting git+https://github.com/facebookresearch/segment-anything.git
  Cloning https://github.com/facebookresearch/segment-anything.git to /tmp/pip-req-build-hz21kewr
  Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/segment-anything.git /tmp/pip-req-build-hz21kewr
  Resolved https://github.com/facebookresearch/segment-anything.git to commit dca509fe793f601edb92606367a655c15ac00fdf
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: segment_anything
  Building wheel for segment_anything (setup.py) ... done
  Created wheel for segment_anything: filename=segment_anything-1.0-py3-none-any.whl size=36592 sha256=f180e078d64552c3a0a7320e154bd1f7f39e154d0a6063525fb37aba129a5601
  Stored in directory: /tmp/pip-ephem-wheel-cache-kdl088gh/wheels/15/d7/bd/05f5f23b7dcbe70cbc6783b06f12143b0cf1a5da5c7b52dcc5
Successfully built segment_anything
Installing collected packages: segment_anything
Successfully 

In [5]:
!pip install gdown



Download the SAM model weights

In [6]:
import os
import gdown
# Model weights (ViT-H checkpoint)
sam_checkpoint = 'sam_vit_h_4b8939.pth'
model_url = 'https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth'
# Download the weights only if the file is not already present
if not os.path.exists(sam_checkpoint):
  print('weight downloading...')
  gdown.download(model_url, sam_checkpoint)
else:
  print('weights already exist.')

Downloading...
From: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
To: /content/sam_vit_h_4b8939.pth


weight downloading...


100%|██████████| 2.56G/2.56G [00:18<00:00, 141MB/s]
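gdown is primarily a Google Drive downloader; since the checkpoint is served from a plain HTTPS URL, the standard library is also enough. A minimal sketch using only `urllib` (the helper name `download_if_missing` is hypothetical), which streams into a temporary `.part` file and renames it at the end so an interrupted download does not leave a truncated `.pth` that the existence check would later mistake for a complete file:

```python
import os
import shutil
import urllib.request


def download_if_missing(url: str, path: str) -> bool:
    """Download `url` to `path` unless `path` already exists.

    Returns True if a download happened, False if the file was already there.
    Data is streamed to `path + ".part"` and only renamed to `path` once the
    transfer has finished.
    """
    if os.path.exists(path):
        return False
    tmp_path = path + ".part"
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as f:
        shutil.copyfileobj(response, f)  # stream in chunks instead of loading 2.5 GB into memory
    os.replace(tmp_path, path)  # atomic rename on the same filesystem
    return True
```

For this notebook the call would be `download_if_missing(model_url, sam_checkpoint)`.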


Download a YouTube video

In [7]:
!pip install yt-dlp

Collecting yt-dlp
  Downloading yt_dlp-2025.1.26-py3-none-any.whl.metadata (172 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/172.0 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m172.0/172.0 kB[0m [31m10.3 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading yt_dlp-2025.1.26-py3-none-any.whl (3.2 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.2/3.2 MB[0m [31m77.6 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: yt-dlp
Successfully installed yt-dlp-2025.1.26


In [8]:
# YouTube video to download
youtube_url = 'https://www.youtube.com/watch?v=7X-haM3KJRI'
# -f best selects the best single pre-merged (audio + video) format; here that is format 18, an mp4
!yt-dlp -f best -o 'sample_vidoe.mp4' {youtube_url}

         To let yt-dlp download and merge the best available formats, simply do not pass any format selection.
[youtube] Extracting URL: https://www.youtube.com/watch?v=7X-haM3KJRI
[youtube] 7X-haM3KJRI: Downloading webpage
[youtube] 7X-haM3KJRI: Downloading tv client config
[youtube] 7X-haM3KJRI: Downloading player f3d47b5a
[youtube] 7X-haM3KJRI: Downloading tv player API JSON
[youtube] 7X-haM3KJRI: Downloading ios player API JSON
[youtube] 7X-haM3KJRI: Downloading m3u8 information
[info] 7X-haM3KJRI: Downloading 1 format(s): 18
[download] Destination: sample_vidoe.mp4
[download] 100% of    1.55MiB in 00:00:00 at 6.46MiB/s


Load the SAM model and process the video

In [None]:
import cv2
import torch
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load the SAM model (ViT-H backbone)
model_type = 'vit_h'
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint).to(device)
predictor = SamPredictor(sam)

# Open the input video
video_path = '/content/sample_vidoe.mp4'
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
  raise IOError(f'Cannot open video: {video_path}')
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
# Keep FPS as a float: int() would turn e.g. 29.97 into 29 and change the playback speed
fps = cap.get(cv2.CAP_PROP_FPS)

# Output video writer settings
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
output_path = '/content/output_video.mp4'
out = cv2.VideoWriter(output_path, fourcc, fps, (frame_width, frame_height))

# Per-frame segmentation and visualization
alpha = 0.5  # overlay opacity
mask_color = np.array((0, 255, 0), dtype=np.float32)  # green in OpenCV's BGR order
frame_count = 0
while cap.isOpened():
  ret, frame = cap.read()
  if not ret:
    break  # end of video
  frame_count += 1
  print(f"Processing frame {frame_count}...")
  # OpenCV decodes frames as BGR, while SAM expects RGB, so convert first
  image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
  # Compute the image embedding for this frame (the expensive step)
  predictor.set_image(image)
  # Box prompt for the object of interest: a placeholder box over the central region (XYXY)
  h, w, _ = image.shape
  input_box = np.array([w // 4, h // 4, w * 3 // 4, h * 3 // 4])
  # Run SAM; with multimask_output=False it returns a single mask
  masks, scores, logits = predictor.predict(box=input_box, multimask_output=False)
  mask = masks[0]  # (H, W) boolean mask
  # Blend the color into the masked pixels only, leaving the rest of the frame unchanged
  blended = frame.copy()
  blended[mask] = ((1 - alpha) * frame[mask] + alpha * mask_color).astype(np.uint8)
  # Append the frame to the output video
  out.write(blended)

# Release resources
cap.release()
out.release()
print('Segmentation complete:', output_path)



Processing frame 1...
Processing frame 2...
Processing frame 3...
Processing frame 4...
Processing frame 5...
Processing frame 6...
Processing frame 7...
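The loop above passes `multimask_output=False`, so SAM returns exactly one mask per frame. For ambiguous prompts such as a loose box, `predictor.predict(..., multimask_output=True)` instead returns three candidate masks of shape (3, H, W) together with a score per candidate (the model's predicted IoU), and a common choice is to keep the highest-scoring one. A minimal sketch; `select_best_mask` is a hypothetical helper, not part of the segment-anything API:

```python
import numpy as np


def select_best_mask(masks: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Return the candidate mask with the highest predicted quality score.

    masks:  (C, H, W) boolean array, as returned by SamPredictor.predict
    scores: (C,) array with one predicted-IoU score per candidate mask
    """
    if masks.ndim != 3 or masks.shape[0] != scores.shape[0]:
        raise ValueError("expected masks of shape (C, H, W) and scores of shape (C,)")
    return masks[int(np.argmax(scores))]
```

Inside the loop, this would replace the `mask = masks[0]` line after calling `predictor.predict(box=input_box, multimask_output=True)`.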
