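This notebook inpaints every test image named `<scene name>_input_base.jpg` in a single run.
Following the instructions in the code below, upload the `<scene name>_input_base.jpg` files into test_data using the upload (↑) button at the top left. The notebook expects the file names used in the attached test_data folder.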

# Imports

In [None]:
import os
import base64
from base64 import b64decode
import cv2
import torch
import urllib.request
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal
import glob
from shutil import copyfile
import shutil
import copy

In [None]:
!pip install timm

Collecting timm
  Downloading timm-0.5.4-py3-none-any.whl (431 kB)

In [None]:
# Upload every <scene name>_input_base.jpg file (e.g. scene_o_0019_input_base.jpg) into test_data
os.makedirs("/content/test_data", exist_ok=True)  # does not fail if the cell is re-run

In [None]:
# Copy each input image into its own folder under predict_img
os.makedirs("/content/predict_img", exist_ok=True)
fnames = sorted(glob.glob("/content/test_data/*_input_base.jpg"))
for fname in fnames:
    path_name = "/content/predict_img/" + fname.split("/")[-1].replace("_input_base.jpg", "")
    os.makedirs(path_name, exist_ok=True)
    shutil.copy(fname, path_name)

!rm -r /content/test_data  # delete test_data (no longer needed)
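
The folder layout created above can be sketched in isolation. This is a minimal illustration, assuming the `<scene name>_input_base.jpg` naming used in this notebook; `scene_dir_for` is a hypothetical helper name that does not appear in the original code.

```python
SUFFIX = "_input_base.jpg"  # file-name suffix assumed from this notebook


def scene_dir_for(image_path, out_root="/content/predict_img"):
    """Return the per-scene output folder for one input image path."""
    file_name = image_path.rsplit("/", 1)[-1]
    if not file_name.endswith(SUFFIX):
        raise ValueError(f"unexpected file name: {file_name}")
    scene_name = file_name[: -len(SUFFIX)]
    return f"{out_root}/{scene_name}"


print(scene_dir_for("/content/test_data/scene_o_0019_input_base.jpg"))
```

Rejecting unexpected file names early is a design choice of this sketch; the original cell simply skips them through its glob pattern.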

# Downloading the pretrained models and defining functions

## MiDaS depth estimation
### This notebook is optionally accelerated with a GPU runtime.
### If you would like to use this acceleration, please select the menu option "Runtime" -> "Change runtime type", select "Hardware Accelerator" -> "GPU" and click "SAVE"

----------------------------------------------------------------------

# MiDaS

*Author: Intel ISL*

**MiDaS models for computing relative depth from a single image.**

<img src="https://pytorch.org/assets/images/midas_samples.png" alt="alt" width="50%"/>


### Model Description

[MiDaS](https://arxiv.org/abs/1907.01341) computes relative inverse depth from a single image. The repository provides multiple models that cover different use cases, ranging from a small, high-speed model to a very large model that provides the highest accuracy. The models have been trained on 10 distinct datasets using
multi-objective optimization to ensure high quality on a wide range of inputs.
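
Because the output is relative inverse depth, larger values correspond to nearer surfaces and the absolute scale carries no meaning, so the map is usually rescaled before it is thresholded or filtered. The sketch below shows a plain NumPy min-max rescaling to 8 bits on made-up values. `minmax_to_uint8` is a hypothetical helper; it only approximates what a min-max normalization routine does and is not MiDaS or OpenCV code.

```python
import numpy as np


def minmax_to_uint8(inv_depth):
    """Linearly rescale an array to the 0-255 range and return it as uint8."""
    inv_depth = np.asarray(inv_depth, dtype=np.float64)
    lo, hi = inv_depth.min(), inv_depth.max()
    if hi == lo:  # flat input: avoid dividing by zero
        return np.zeros(inv_depth.shape, dtype=np.uint8)
    scaled = (inv_depth - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


# Made-up inverse-depth values: the largest value is the nearest surface
toy = np.array([[0.0, 4.0], [8.0, 20.0]])
print(minmax_to_uint8(toy))
```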

### Dependencies

MiDaS depends on [timm](https://github.com/rwightman/pytorch-image-models), which was installed with `pip install timm` in an earlier cell.

In [None]:
model_type_large = "DPT_Large"     # MiDaS v3 - Large     (highest accuracy, slowest inference speed)
model_type_hybrid = "DPT_Hybrid"   # MiDaS v3 - Hybrid    (medium accuracy, medium inference speed)

midas_large = torch.hub.load("intel-isl/MiDaS", model_type_large)
midas_hybrid = torch.hub.load("intel-isl/MiDaS", model_type_hybrid)

# Device for the large model
device_large = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# Device for the hybrid model
device_hybrid = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# Load the input transform for the DPT models
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.dpt_transform

Using cache found in /root/.cache/torch/hub/intel-isl_MiDaS_master
Downloading: "https://github.com/intel-isl/DPT/releases/download/1_0/dpt_large-midas-2f21e586.pt" to /root/.cache/torch/hub/checkpoints/dpt_large-midas-2f21e586.pt


  0%|          | 0.00/1.28G [00:00<?, ?B/s]

Using cache found in /root/.cache/torch/hub/intel-isl_MiDaS_master
Downloading: "https://github.com/intel-isl/DPT/releases/download/1_0/dpt_hybrid-midas-501f0c75.pt" to /root/.cache/torch/hub/checkpoints/dpt_hybrid-midas-501f0c75.pt


  0%|          | 0.00/470M [00:00<?, ?B/s]

Using cache found in /root/.cache/torch/hub/intel-isl_MiDaS_master


In [None]:
midas_large.to(device_large)
midas_large.eval()

DPTDepthModel(
  (pretrained): Module(
    (model): VisionTransformer(
      (patch_embed): PatchEmbed(
        (proj): Conv2d(3, 1024, kernel_size=(16, 16), stride=(16, 16))
        (norm): Identity()
      )
      (pos_drop): Dropout(p=0.0, inplace=False)
      (blocks): Sequential(
        (0): Block(
          (norm1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=1024, out_features=3072, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=1024, out_features=1024, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): Identity()
          (norm2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=1024, out_features=4096, bias=True)
            (act): GELU()
            (drop1): Dropout(p=0.0, inplace=False)
            (fc2): Linear(in_features=4

In [None]:
midas_hybrid.to(device_hybrid)
midas_hybrid.eval()

DPTDepthModel(
  (pretrained): Module(
    (model): VisionTransformer(
      (patch_embed): HybridEmbed(
        (backbone): ResNetV2(
          (stem): Sequential(
            (conv): StdConv2dSame(3, 64, kernel_size=(7, 7), stride=(2, 2), bias=False)
            (norm): GroupNormAct(
              32, 64, eps=1e-05, affine=True
              (act): ReLU(inplace=True)
            )
            (pool): MaxPool2dSame(kernel_size=(3, 3), stride=(2, 2), padding=(0, 0), dilation=(1, 1), ceil_mode=False)
          )
          (stages): Sequential(
            (0): ResNetStage(
              (blocks): Sequential(
                (0): Bottleneck(
                  (downsample): DownsampleConv(
                    (conv): StdConv2dSame(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                    (norm): GroupNormAct(
                      32, 256, eps=1e-05, affine=True
                      (act): Identity()
                    )
                  )
                  (conv1): S

## LaMa image inpainting
# 🦙 **LaMa: Resolution-robust Large Mask Inpainting with Fourier Convolutions**

[[Project page](https://saic-mdal.github.io/lama-project/)] [[GitHub](https://github.com/saic-mdal/lama)] [[arXiv](https://arxiv.org/abs/2109.07161)] [[Supplementary](https://ashukha.com/projects/lama_21/lama_supmat_2021.pdf)] [[BibTeX](https://senya-ashukha.github.io/projects/lama_21/paper.txt)]

<p align="center" style="font-size:30px;">
Our model generalizes surprisingly well to much higher resolutions (~2k❗️) than it saw during training (256x256), and achieves excellent performance even in challenging scenarios, e.g. completion of periodic structures.
</p>

In [None]:
print('\n> Cloning the repo')
!git clone https://github.com/saic-mdal/lama.git

print('\n> Install dependencies')
!pip install -r lama/requirements.txt --quiet
!pip install wget --quiet
!pip install webdataset==0.1.103

print('\n> Changing the dir to:')
%cd /content/lama

print('\n> Download the model')
!curl -L $(yadisk-direct https://disk.yandex.ru/d/ouP6l8VJ0HpMZg) -o big-lama.zip
!unzip big-lama.zip

print('>fixing opencv')
!pip uninstall opencv-python-headless -y --quiet
!pip install opencv-python-headless==4.1.2.30 --quiet

import wget
print('\n> Init mask-drawing code')


> Cloning the repo
Cloning into 'lama'...
remote: Enumerating objects: 283, done.
remote: Counting objects: 100% (283/283), done.
remote: Compressing objects: 100% (205/205), done.
remote: Total 283 (delta 73), reused 265 (delta 66), pack-reused 0
Receiving objects: 100% (283/283), 6.49 MiB | 5.61 MiB/s, done.
Resolving deltas: 100% (73/73), done.

> Install dependencies

# Helper functions

In [None]:
def judge_fence_thick(depth_bit_img, gray_img):
    """Return True when the fence is judged to be thick."""
    # Keep gray values only where the binarized depth map is 255; everything else becomes NaN
    judge_thick_img = np.where(depth_bit_img == 255, gray_img, np.nan)
    judge_thick = np.nanvar(judge_thick_img)  # variance over the non-NaN pixels
    # A variance below the empirical threshold (505) is treated as a thick fence
    return bool(judge_thick < 505)
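
The variance test in `judge_fence_thick` can be tried on synthetic data. The sketch below is an illustration under assumptions: the tiny arrays are made up, and `masked_variance` is a hypothetical stand-in that mirrors the masking step rather than the notebook's own function.

```python
import numpy as np

VAR_THRESHOLD = 505  # empirical threshold taken from this notebook


def masked_variance(mask, gray):
    """Variance of the gray values where mask == 255 (other pixels are ignored)."""
    selected = np.where(mask == 255, gray.astype(np.float64), np.nan)
    return np.nanvar(selected)


# Synthetic 4x4 example: the left half is marked as foreground by the mask
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:, :2] = 255

uniform_gray = np.full((4, 4), 100, dtype=np.uint8)  # masked region has one brightness
mixed_gray = uniform_gray.copy()
mixed_gray[::2, :2] = 200                            # masked region mixes 100 and 200

for name, gray in [("uniform", uniform_gray), ("mixed", mixed_gray)]:
    var = masked_variance(mask, gray)
    print(name, var, "thick" if var < VAR_THRESHOLD else "thin")
```

A masked region with uniform brightness has zero variance and falls under the threshold, while a region that mixes bright and dark pixels does not.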

In [None]:
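class Mask:
    """Build a fence mask by finding straight lines in a binary (0/1) edge image."""

    def __init__(self, edge_img):
        self.edge_img = edge_img


    def judge_line(self, line_img, ratio=0.71):
        # Accept the candidate line when more than `ratio` of its pixels lie on edge pixels
        line_pixels = np.sum(line_img)
        if line_pixels == 0:  # guard against an empty line (avoids dividing by zero)
            return False
        judge_img = self.edge_img * line_img
        return bool(np.sum(judge_img) / line_pixels > ratio)


    def get_mask(self):
        # Union of the near-vertical and near-horizontal line masks
        mask_img = self.get_vertical_mask() + self.get_horizontal_mask()
        mask_img[mask_img > 1] = 1
        return mask_img


    def get_vertical_mask(self):
        # Note: `cols` holds the image height and `rows` holds the image width
        cols, rows = self.edge_img.shape
        # Work on a canvas three times as wide so slanted lines can start outside the image
        mask_pre_img = np.zeros([cols, rows*3], dtype="uint8")
        for _ in range(2):  # the second pass handles the mirrored slant direction
            for row_low in range(0, rows, 5):
                for shift in range(0, rows*2-row_low, 5):
                    line_pre_img = np.zeros([cols, rows*3], dtype="uint8")
                    cv2.line(line_pre_img, (row_low+shift, 0), (rows+shift, cols-1), 1, 1)
                    line_img = line_pre_img[:, rows:rows*2]
                    if self.judge_line(line_img, 0.71):
                        # Draw the accepted line 10 px thick into the mask
                        cv2.line(mask_pre_img, (row_low+shift, 0), (rows+shift, cols-1), 1, 10)
            self.edge_img = cv2.flip(self.edge_img, 1)
            mask_pre_img = cv2.flip(mask_pre_img, 1)
        mask_img = mask_pre_img[:, rows:rows*2]
        return mask_img


    def get_horizontal_mask(self):
        # Rotate the edge image, reuse the vertical search, then rotate everything back
        self.edge_img = cv2.rotate(self.edge_img, cv2.ROTATE_90_CLOCKWISE)
        mask_img = self.get_vertical_mask()
        self.edge_img = cv2.rotate(self.edge_img, cv2.ROTATE_90_COUNTERCLOCKWISE)
        mask_img = cv2.rotate(mask_img, cv2.ROTATE_90_COUNTERCLOCKWISE)
        return mask_img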
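
The acceptance test in `judge_line` is a coverage ratio: the share of a candidate line's pixels that land on edge pixels. The sketch below reproduces that ratio with NumPy only, on a made-up 5x5 image. `line_coverage` is a hypothetical helper, and the candidate lines are written directly into arrays instead of being drawn with `cv2.line`.

```python
import numpy as np


def line_coverage(edge_img, line_img):
    """Fraction of the candidate line's pixels that fall on edge pixels."""
    line_pixels = int(np.sum(line_img))
    if line_pixels == 0:
        return 0.0
    return float(np.sum(edge_img * line_img)) / line_pixels


# 5x5 binary edge image with an edge along column 2, broken at one pixel
edge = np.zeros((5, 5), dtype=np.uint8)
edge[:, 2] = 1
edge[4, 2] = 0

on_edge = np.zeros_like(edge)
on_edge[:, 2] = 1   # candidate line lying on the edge
off_edge = np.zeros_like(edge)
off_edge[:, 0] = 1  # candidate line away from the edge

for name, line in [("on_edge", on_edge), ("off_edge", off_edge)]:
    ratio = line_coverage(edge, line)
    print(name, ratio, ratio > 0.71)
```

With the 0.71 threshold used in the class, a line that covers four of five edge pixels is accepted even though the edge is broken, which is how gaps in the detected fence edges are tolerated.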

# Main processing

In [None]:
fnames = sorted(glob.glob("/content/predict_img/*/*_input_base.jpg"))

In [None]:
for fname in fnames:
    img = cv2.imread(fname)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Depth estimation (large model)
    input_batch = transform(img).to(device_large)
    with torch.no_grad():
        prediction = midas_large(input_batch)

        # Resize the prediction back to the input image resolution
        prediction = torch.nn.functional.interpolate(
            prediction.unsqueeze(1),
            size=img.shape[:2],
            mode="bicubic",
            align_corners=False,
        ).squeeze()

    depth_large = prediction.cpu().numpy()

    # Depth estimation (hybrid model)
    input_batch = transform(img).to(device_hybrid)
    with torch.no_grad():
        prediction = midas_hybrid(input_batch)

        # Resize the prediction back to the input image resolution
        prediction = torch.nn.functional.interpolate(
            prediction.unsqueeze(1),
            size=img.shape[:2],
            mode="bicubic",
            align_corners=False,
        ).squeeze()

    depth_hybrid = prediction.cpu().numpy()


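    # Judge whether the fence is thick (if it is, the line-based mask step below is skipped)
    depth_bit_img = np.zeros_like(depth_large, dtype="uint8")

    # Binarize the depth map
    depth_bit_img[depth_large < 20] = 0
    depth_bit_img[depth_large > 20] = 255

    # Compute the variance of the grayscale image over the region covered by the binarized depth map.
    # For a thick fence, the binarized depth map covers just the fence, so the grayscale variance in that region is small.
    # For a thin fence, the binarized depth map also covers non-fence regions, so the variance is larger.
    # The fence is treated as thick when the variance is below a threshold, set empirically to 505 here.
    gray_img = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    fence_thick_flag = judge_fence_thick(depth_bit_img, gray_img)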

    if fence_thick_flag:
        kernel = np.ones((5,5), np.uint8)
        depth_bit_img = cv2.dilate(depth_bit_img, kernel, iterations=10)
        mask_img = depth_bit_img
        cv2.imwrite(fname.replace(".jpg", "_mask_img.png"), mask_img)
        print("The fence is thick.")
    else:
        # Edge images of the estimated depth maps
        blur_sigma = 5
        kernel = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]])  # 8-neighbor Laplacian filter

        # Depth estimation (large model)
        depth_large_norm = cv2.normalize(depth_large, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
        depth_large_norm = cv2.GaussianBlur(depth_large_norm, (311, 311), blur_sigma)
        edge_large = signal.convolve(depth_large_norm, kernel, mode="same")

        # Depth estimation (hybrid model)
        depth_hybrid_norm = cv2.normalize(depth_hybrid, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
        depth_hybrid_norm = cv2.GaussianBlur(depth_hybrid_norm, (311, 311), blur_sigma)
        edge_hybrid = signal.convolve(depth_hybrid_norm, kernel, mode="same")

        # Save the edge images at low resolution (to speed up the computation and to make use of
        # matplotlib's image interpolation with interpolation=None)
        # Dividing by 302 gives an image with exactly half the height and width
        fig_row = edge_large.shape[1] / 302
        fig_col = edge_large.shape[0] / 302

        # Save the edge image of the large-model depth map
        fig, ax = plt.subplots(1, 1, dpi=100, figsize=(fig_row, fig_col))
        ax.imshow(edge_large, interpolation=None, cmap="gray", vmin=-0.001, vmax=0.001)
        ax.axis("off")
        plt.savefig("edge_large.jpg", bbox_inches='tight', pad_inches=0)
        plt.close()

        # Save the edge image of the hybrid-model depth map
        fig, ax = plt.subplots(1, 1, dpi=100, figsize=(fig_row, fig_col))
        ax.imshow(edge_hybrid, interpolation=None, cmap="gray", vmin=-0.001, vmax=0.001)
        ax.axis("off")
        plt.savefig("edge_hybrid.jpg", bbox_inches='tight', pad_inches=0)
        plt.close()

        # Reload the saved edge images
        edge_large_img = cv2.imread("edge_large.jpg", cv2.IMREAD_GRAYSCALE)
        edge_hybrid_img = cv2.imread("edge_hybrid.jpg", cv2.IMREAD_GRAYSCALE)

        # Invert the large-model edge image and binarize it to 0/1
        edge_large_bit_img = np.zeros_like(edge_large_img, dtype="uint8")
        edge_large_bit_img[edge_large_img < 127] = 1

        # Invert the hybrid-model edge image and binarize it to 0/1
        edge_hybrid_bit_img = np.zeros_like(edge_hybrid_img, dtype="uint8")
        edge_hybrid_bit_img[edge_hybrid_img < 127] = 1

        # Line mask from the large-model edges
        mask_large = Mask(edge_large_bit_img)
        mask_large_img = mask_large.get_mask()

        # Line mask from the hybrid-model edges
        mask_hybrid = Mask(edge_hybrid_bit_img)
        mask_hybrid_img = mask_hybrid.get_mask()

        # Combine the large-model mask and the hybrid-model mask
        mask_img = mask_large_img + mask_hybrid_img
        mask_img[mask_img > 1] = 1

        # Enlarge the mask to the size of the input_base image
        mask_img = cv2.resize(mask_img, (img.shape[1], img.shape[0]))
        mask_img[mask_img == 1] = 255  # replace 1 with white (255)

        # Save the mask image (it must have the .png extension, otherwise inpainting does not work properly)
        cv2.imwrite(fname.replace(".jpg", "_mask_img.png"), mask_img)

    # Inpainting with LaMa
    path_name = os.path.dirname(fname)  # folder that holds this image and its mask
    print(f"inpainting {path_name}")
    if '.jpeg' in fname:
        !PYTHONPATH=. TORCH_HOME=$(pwd) python3 bin/predict.py model.path=$(pwd)/big-lama indir=$path_name outdir=/content/inpainting_img dataset.img_suffix=.jpeg > /dev/null
    elif '.jpg' in fname:
        !PYTHONPATH=. TORCH_HOME=$(pwd) python3 bin/predict.py model.path=$(pwd)/big-lama indir=$path_name outdir=/content/inpainting_img  dataset.img_suffix=.jpg > /dev/null
    elif '.png' in fname:
        !PYTHONPATH=. TORCH_HOME=$(pwd) python3 bin/predict.py model.path=$(pwd)/big-lama indir=$path_name outdir=/content/inpainting_img  dataset.img_suffix=.png > /dev/null
    else:
        print(f'Error: unknown suffix .{fname.split(".")[-1]} use [.png, .jpeg, .jpg]')

  "See the documentation of nn.Upsample for details.".format(mode)


inpainting /content/predict_img/scene_m_0002
100% 1/1 [00:02<00:00,  2.33s/it]
inpainting /content/predict_img/scene_m_0003
100% 1/1 [00:02<00:00,  2.27s/it]
inpainting /content/predict_img/scene_m_0021
100% 1/1 [00:02<00:00,  2.33s/it]
inpainting /content/predict_img/scene_m_0024
100% 1/1 [00:02<00:00,  2.27s/it]
inpainting /content/predict_img/scene_m_0026
100% 1/1 [00:02<00:00,  2.33s/it]
The fence is thick.
inpainting /content/predict_img/scene_m_0033
100% 1/1 [00:02<00:00,  2.24s/it]
inpainting /content/predict_img/scene_m_0046
100% 1/1 [00:02<00:00,  2.31s/it]
inpainting /content/predict_img/scene_o_0001
100% 1/1 [00:02<00:00,  2.27s/it]
The fence is thick.
inpainting /content/predict_img/scene_o_0019
100% 1/1 [00:02<00:00,  2.26s/it]
inpainting /content/predict_img/scene_o_0022
100% 1/1 [00:02<00:00,  2.31s/it]
inpainting /content/predict_img/scene_u_0019
100% 1/1 [00:02<00:00,  2.28s/it]
inpainting /content/predict_img/scene_u_0020
100% 1/1 [00:02<00:00,  2.32s/it]
inpainting /
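
The edge step in the loop above relies on an 8-neighbor Laplacian kernel, which returns zero on flat regions and a positive/negative pair across a step in depth. The sketch below demonstrates this on a made-up step image using NumPy only. `filter3x3_same` is a hypothetical helper: it applies the kernel with zero padding, which for this symmetric kernel is assumed to match a same-size convolution.

```python
import numpy as np

# 8-neighbor Laplacian kernel, as defined in the loop above
KERNEL = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]], dtype=np.float64)


def filter3x3_same(img, kernel):
    """Apply a 3x3 kernel with zero padding and return an output of the same size."""
    padded = np.pad(img.astype(np.float64), 1, mode="constant")
    out = np.zeros(img.shape, dtype=np.float64)
    rows, cols = img.shape
    for dy in range(3):
        for dx in range(3):
            # Accumulate the shifted image weighted by one kernel coefficient
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out


# Step edge: the left three columns are 0, the right three columns are 10
step = np.zeros((5, 6))
step[:, 3:] = 10
response = filter3x3_same(step, KERNEL)
print(response[2])  # middle row, away from the top and bottom borders
```

The last column also shows a non-zero response because the zero padding acts like an artificial edge at the image border.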

In [None]:
# Zip the results for download (in Google Colab, open the file browser on the left and click the three vertical dots next to the file to save it)
!zip -r /content/inpainting_img.zip /content/inpainting_img
!zip -r /content/predict_img.zip /content/predict_img

  adding: content/inpainting_img/ (stored 0%)
  adding: content/inpainting_img/scene_m_0026_input_base_mask_img.png (deflated 1%)
  adding: content/inpainting_img/scene_m_0033_input_base_mask_img.png (deflated 2%)
  adding: content/inpainting_img/scene_u_0082_input_base_mask_img.png (deflated 2%)
  adding: content/inpainting_img/scene_o_0019_input_base_mask_img.png (deflated 1%)
  adding: content/inpainting_img/scene_u_0060_input_base_mask_img.png (deflated 1%)
  adding: content/inpainting_img/scene_u_0084_input_base_mask_img.png (deflated 1%)
  adding: content/inpainting_img/scene_m_0002_input_base_mask_img.png (deflated 2%)
  adding: content/inpainting_img/scene_u_0020_input_base_mask_img.png (deflated 1%)
  adding: content/inpainting_img/scene_m_0024_input_base_mask_img.png (deflated 3%)
  adding: content/inpainting_img/scene_m_0046_input_base_mask_img.png (deflated 1%)
  adding: content/inpainting_img/scene_m_0003_input_base_mask_img.png (deflated 2%)
  adding: content/inpainting_i
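
Outside a notebook shell, the same archiving step can be done from Python with the standard library. The sketch below is an illustration under assumptions: it zips a throwaway temporary folder that stands in for `/content/inpainting_img`, and the file written into it is placeholder data.

```python
import os
import shutil
import tempfile
import zipfile

# Throwaway directory standing in for /content/inpainting_img (assumption for the demo)
work_dir = tempfile.mkdtemp()
src_dir = os.path.join(work_dir, "inpainting_img")
os.makedirs(src_dir)
with open(os.path.join(src_dir, "example_mask.png"), "wb") as f:
    f.write(b"placeholder bytes, not a real image")

# Writes <work_dir>/inpainting_img.zip containing the inpainting_img/ folder
archive_path = shutil.make_archive(
    os.path.join(work_dir, "inpainting_img"), "zip",
    root_dir=work_dir, base_dir="inpainting_img",
)

with zipfile.ZipFile(archive_path) as zf:
    print(zf.namelist())
```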