# Make things disappear with XMem and FGT

Sources:
- [ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model: https://github.com/hkchengrex/XMem
- [ECCV 2022] Flow-Guided Transformer for Video Inpainting: https://github.com/hitachinsk/fgt

In [3]:
import os
from os.path import exists as path_exists

In [1]:
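try:
    import torch
    import torchvision
except ImportError:
    !pip install torch==1.10.1
    !pip install torchvision==0.11.2
    # Import after installing so that later cells can use torch directly.
    import torch
    import torchvision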

In [2]:
!nvidia-smi

if torch.cuda.is_available():
    print('Using GPU')
    device = 'cuda'
else:
    print('CUDA not available. Please connect to a GPU instance if possible.')
    device = 'cpu'

/usr/bin/sh: 1: nvidia-smi: not found
CUDA not available. Please connect to a GPU instance if possible.


## Load video from YouTube and split into frames
- Source: https://huggingface.co/spaces/YiYiXu/it-happened-one-frame-2

In [25]:
if not path_exists('helper.py'):
    # Save the Space's app.py as helper.py so that `from helper import ...` works.
    !wget -O helper.py https://huggingface.co/spaces/YiYiXu/it-happened-one-frame-2/raw/main/app.py

In [26]:
try:
    import youtube_dl
except ImportError:
    !pip install youtube_dl

In [27]:
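# Note: this import runs app.py's top-level code, including a large download
# and a Gradio launch (see the output below).
from helper import vid2frames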

100%|████████████████████████████████████████| 338M/338M [00:01<00:00, 181MiB/s]
Running on local URL:  http://127.0.0.1:7860/
KeyboardInterrupt

In [None]:
youtube_url = 'https://youtu.be/KOnfiFOCwH0' # Trump leaves Argentinean president alone on stage at G20

In [None]:
skip_frames, path_frames = vid2frames(youtube_url)

## Predict Video Segmentation Mask
### `Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model`
- Source: https://colab.research.google.com/drive/1RXK5QsUo2-CnOiy5AOSjoZggPVHOPh1m?usp=sharing#scrollTo=MWGdN7XCSYSm

### Get the XMem code and install prerequisites

In [5]:
if not path_exists('XMem'):
    !git clone https://github.com/hkchengrex/XMem.git
    !pip install -r XMem/requirements.txt

In [6]:
try:
    import cv2
    import numpy
except ImportError:
    !pip install opencv-python
    !pip install -U numpy

In [None]:
try:
    import detectron2
except ImportError:
    # Alternatively, with CUDA_VERSION and TORCH_VERSION set to match the local install:
    # !pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/$CUDA_VERSION/torch$TORCH_VERSION/index.html -q
    raise ImportError(
        "Please install detectron2. Check "
        "`https://detectron2.readthedocs.io/en/latest/tutorials/install.html` "
        "for installation details."
    )
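The wheel index URL in the cell above depends on `$CUDA_VERSION` and `$TORCH_VERSION`, which this notebook never sets. Below is a minimal sketch of one way to derive them from a torch version string such as `torch.__version__`. The helper name `detectron2_wheel_index` is hypothetical, and the URL pattern is simply the one from the cell above. Pre-built wheels exist only for some CUDA/torch combinations, so the resulting URL may not resolve; the detectron2 installation page lists the supported ones.

```python
def detectron2_wheel_index(torch_version, cuda_tag=None):
    """Build the wheel index URL from a version string such as '1.10.1+cu113'.

    Hypothetical helper: it only fills in the URL pattern used in this notebook.
    """
    # '1.10.1+cu113' -> base '1.10.1', local tag 'cu113'
    base, _, local = torch_version.partition('+')
    cuda_tag = cuda_tag or local
    if not cuda_tag:
        raise ValueError('No CUDA tag in the torch version; pass cuda_tag explicitly.')
    # Only major.minor is used in the URL, e.g. '1.10'
    torch_short = '.'.join(base.split('.')[:2])
    return ('https://dl.fbaipublicfiles.com/detectron2/wheels/'
            f'{cuda_tag}/torch{torch_short}/index.html')


print(detectron2_wheel_index('1.10.1+cu113'))
# → https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
```

In a notebook the result could be passed to pip via `!pip install detectron2 -f {url} -q`, where `url` holds the returned string.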

### Download the pre-trained model

In [8]:
if not path_exists('XMem/saves/XMem.pth'):
    !wget -P ./XMem/saves/ https://github.com/hkchengrex/XMem/releases/download/v1.0/XMem.pth
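The `if not path_exists(...)` plus `!wget` pattern used in the cells above only works inside IPython. Below is a rough pure-Python equivalent that uses only the standard library; `download_if_missing` is a hypothetical helper and not part of XMem or FGT.

```python
import os
import urllib.request


def download_if_missing(url, dest):
    """Download url to dest unless dest already exists.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(dest):
        return False
    parent = os.path.dirname(dest)
    if parent:
        # Create the target directory, similar to what `wget -P` does.
        os.makedirs(parent, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return True


# Example call (not executed here), mirroring the cell above:
# download_if_missing(
#     'https://github.com/hkchengrex/XMem/releases/download/v1.0/XMem.pth',
#     'XMem/saves/XMem.pth')
```

Returning a boolean makes it easy to log whether a cached file was reused.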

### Basic setup

In [None]:
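import os
import sys
from os import path
from argparse import ArgumentParser
import shutil
# The XMem packages (inference, model) live inside the cloned repository.
sys.path.insert(0, 'XMem')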

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
import numpy as np
from PIL import Image

from inference.data.test_datasets import LongTestDataset, DAVISTestDataset, YouTubeVOSTestDataset
from inference.data.mask_mapper import MaskMapper
from model.network import XMem
from inference.inference_core import InferenceCore

from progressbar import progressbar

torch.set_grad_enabled(False)

# default configuration
config = {
    'top_k': 30,
    'mem_every': 5,
    'deep_update_every': -1,
    'enable_long_term': True,
    'enable_long_term_count_usage': True,
    'num_prototypes': 128,
    'min_mid_term_frames': 5,
    'max_mid_term_frames': 10,
    'max_long_term_elements': 10000,
}

network = XMem(config, 'XMem/saves/XMem.pth').eval().to(device)

ModuleNotFoundError: No module named 'clip'

### Preview the video and first-frame annotation
The first frame mask is a PNG with a color palette.
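A palette (indexed) PNG stores one small integer per pixel plus a lookup table that maps each integer to a color. For segmentation masks of this kind the integers typically act as object ids, with 0 as background. The sketch below builds a toy mask with Pillow to illustrate the idea; the image size, ids, and colors are made up and do not come from `first_frame.png`.

```python
from PIL import Image

# Build a tiny 4x2 indexed ("P" mode) mask: 0 = background, 1 and 2 = objects.
mask = Image.new('P', (4, 2), 0)

# Palette: one RGB triplet per index (black, red, green), padded to 256 entries.
palette = [0, 0, 0, 255, 0, 0, 0, 255, 0]
mask.putpalette(palette + [0] * (768 - len(palette)))

mask.putpixel((1, 0), 1)  # one pixel belonging to object 1
mask.putpixel((2, 1), 2)  # one pixel belonging to object 2

# In "P" mode each pixel is stored as a palette index, so the raw bytes are the ids.
object_ids = sorted(set(mask.tobytes()) - {0})
print(object_ids)  # → [1, 2]

# Converting to RGB applies the palette and yields the display colors.
print(mask.convert('RGB').getpixel((1, 0)))  # → (255, 0, 0)
```

Reading the indices rather than the RGB colors keeps object ids stable even if the palette changes.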

In [None]:
from IPython.display import HTML
from base64 import b64encode
# video_name should hold the path to an MP4 file.
with open(video_name, 'rb') as f:
    data_url = "data:video/mp4;base64," + b64encode(f.read()).decode()
HTML("""
<video width=400 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

In [None]:
import IPython.display
IPython.display.Image('first_frame.png', width=400)