monocular_depth_estimation.md

Monocular depth estimation[[depth-estimation-pipeline]]

Monocular depth estimation is a computer vision task that predicts the depth information of a scene from a single image. In other words, it is the process of estimating the distance of objects in a scene from a single camera viewpoint.

Monocular depth estimation has applications in many fields, including 3D reconstruction, augmented reality, autonomous driving, and robotics. It is a challenging task because the model must understand the complex relationships between the objects in a scene and their depth, which can be affected by factors such as lighting conditions, occlusion, and texture.

To see all architectures and checkpoints compatible with this task, we recommend checking the task page.

In this guide you'll learn how to:

  • create a depth estimation pipeline
  • run depth estimation inference by hand

Before you begin, make sure you have all the necessary libraries installed:

pip install -q transformers

Depth estimation pipeline[[depth-estimation-inference-by-hand]]

The simplest way to run depth estimation inference is to use the corresponding [pipeline]. Initialize a pipeline from a Hugging Face Hub checkpoint:

>>> from transformers import pipeline

>>> checkpoint = "vinvino02/glpn-nyu"
>>> depth_estimator = pipeline("depth-estimation", model=checkpoint)

Next, choose an image to analyze:

>>> from PIL import Image
>>> import requests

>>> url = "https://unsplash.com/photos/HwBAsSbPBDU/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MzR8fGNhciUyMGluJTIwdGhlJTIwc3RyZWV0fGVufDB8MHx8fDE2Nzg5MDEwODg&force=true&w=640"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> image
Photo of a busy street

Pass the image to the pipeline:

>>> predictions = depth_estimator(image)

The pipeline returns a dictionary with two entries. The first, predicted_depth, is a tensor holding the depth of each pixel expressed in meters. The second, depth, is a PIL image that visualizes the depth estimation result.

Now let's take a look at the visualized result:

>>> predictions["depth"]
Depth estimation visualization
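
The predicted_depth entry can also be inspected numerically rather than only viewed as an image. The sketch below is a minimal illustration, assuming the tensor has already been converted to a 2-D NumPy array of per-pixel depths (for example with `predictions["predicted_depth"].squeeze().numpy()`, which is an assumption about the returned shape). The helper name `summarize_depth` and the synthetic array are hypothetical stand-ins and are not part of transformers.

```python
import numpy as np


def summarize_depth(depth):
    """Return basic statistics for a 2-D array of per-pixel depths."""
    depth = np.asarray(depth, dtype=np.float32)
    if depth.ndim != 2:
        raise ValueError("expected a 2-D (height, width) depth map")
    # Row/column of the pixel with the smallest depth, i.e. closest to the camera.
    row, col = np.unravel_index(np.argmin(depth), depth.shape)
    return {
        "min": float(depth.min()),
        "max": float(depth.max()),
        "mean": float(depth.mean()),
        "nearest_pixel": (int(row), int(col)),
    }


# Synthetic stand-in for a real prediction: depth grows from left to right.
fake_depth = np.tile(np.linspace(1.0, 5.0, num=5, dtype=np.float32), (3, 1))
stats = summarize_depth(fake_depth)
print(stats)
```

With a real prediction, the same helper could be used to check that the depth range looks plausible for the scene before visualizing it.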

Depth estimation inference by hand[[depth-estimation-inference-by-hand]]

Now that you've seen how to use the depth estimation pipeline, let's see how to replicate the same result by hand. Start by loading the model and its associated processor from a Hugging Face Hub checkpoint. Here we use the same checkpoint as before:

>>> from transformers import AutoImageProcessor, AutoModelForDepthEstimation

>>> checkpoint = "vinvino02/glpn-nyu"

>>> image_processor = AutoImageProcessor.from_pretrained(checkpoint)
>>> model = AutoModelForDepthEstimation.from_pretrained(checkpoint)

Prepare the image input for the model with image_processor, which takes care of the necessary image transformations such as resizing and normalization:

>>> pixel_values = image_processor(image, return_tensors="pt").pixel_values
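
To make "resizing and normalization" more concrete, here is a rough NumPy sketch of the kind of work an image processor does: rescaling 8-bit pixel values to the range 0 to 1, moving the channel axis first, and adding a batch dimension. It is not the library's implementation. The function name `to_pixel_values`, the crop-to-a-multiple-of-32 step, and the constants are illustrative assumptions, and the real steps depend on the checkpoint's processor configuration.

```python
import numpy as np


def to_pixel_values(image_hwc, divisor=32):
    """Turn an (H, W, 3) uint8 image into a (1, 3, H', W') float32 batch."""
    image_hwc = np.asarray(image_hwc)
    if image_hwc.ndim != 3 or image_hwc.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) image")
    height, width, _ = image_hwc.shape
    # Assumption: crop so both sides are multiples of `divisor`
    # (a real processor would typically resize rather than crop).
    new_h = (height // divisor) * divisor
    new_w = (width // divisor) * divisor
    if new_h == 0 or new_w == 0:
        raise ValueError("image is smaller than the divisor")
    cropped = image_hwc[:new_h, :new_w, :]
    # Rescale 0..255 integers to 0..1 floats.
    scaled = cropped.astype(np.float32) / 255.0
    # Channels-last (H, W, C) -> channels-first (C, H, W), then add a batch axis.
    return np.transpose(scaled, (2, 0, 1))[np.newaxis, ...]


# Hypothetical all-white 70x100 image standing in for a real photo.
dummy = np.full((70, 100, 3), 255, dtype=np.uint8)
batch = to_pixel_values(dummy)
print(batch.shape, batch.dtype, batch.max())
```

In practice image_processor should be preferred, since it applies the settings stored with the checkpoint.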

Pass the prepared inputs through the model:

>>> import torch

>>> with torch.no_grad():
...     outputs = model(pixel_values)
...     predicted_depth = outputs.predicted_depth

Visualize the results:

>>> import numpy as np

>>> # interpolate to the original size (PIL's image.size is (width, height), hence the [::-1])
>>> prediction = torch.nn.functional.interpolate(
...     predicted_depth.unsqueeze(1),
...     size=image.size[::-1],
...     mode="bicubic",
...     align_corners=False,
... ).squeeze()
>>> output = prediction.numpy()

>>> formatted = (output * 255 / np.max(output)).astype("uint8")
>>> depth = Image.fromarray(formatted)
>>> depth
Depth estimation visualization
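
Dividing by the maximum, as above, maps the largest depth to 255 but leaves the low end of the range unused when the smallest depth is well above zero. As a variation on the guide's approach (not a replacement for it), a min-max normalization stretches the values across the full 0 to 255 range. The helper name `depth_to_uint8` is hypothetical and the input array is synthetic.

```python
import numpy as np


def depth_to_uint8(depth):
    """Min-max normalize a depth map to 8-bit grayscale values."""
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi == lo:
        # Constant map: avoid dividing by zero.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)  # now in the range 0..1
    return np.round(scaled * 255.0).astype(np.uint8)


# Synthetic stand-in for `output` from the steps above.
sample = np.array([[2.0, 3.0], [4.0, 6.0]], dtype=np.float32)
gray = depth_to_uint8(sample)
print(gray)
```

The resulting array can be passed to Image.fromarray in the same way as formatted above.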