# vilio application of attribute scoring
- Environment: Colab Pro

## Reference
- [SCORE_REPRO.md](https://github.com/Muennighoff/vilio/blob/master/SCORE_REPRO.md)

## Prerequisite (Google Drive)
- Add `kaggle.json` to `MyDrive/vilio`
- Add the Hateful Memes Challenge data as `MyDrive/vilio/hateful_memes.zip`
- Add the confounders (`confounders.parquet`) to `MyDrive/annotation`
    - See `benign_confounder.ipynb` for annotation details

## Protocol
### Prerequisite
- Depending on which model you want to run, comment the relevant code in or out in the `init / data`, `additional installation`, and `inference` subsections

### inference-only
- Run up to `additional installation > check gpu version`
  - `Run all` is fine; execution stops automatically at that point
- Run the remaining cells
  - If the runtime is stopped in `check gpu version`, rerun them



## init

In [None]:
import time
t0 = time.time()

In [None]:
# mount
from google.colab import drive
drive.mount('/content/drive')

In [None]:
model = "V"  # one of ["O", "U", "V", "E"]

In [None]:
if model in ["O","U","V"]:
  data_path = "/vilio/data/"
else:  # E
  data_path = "/vilio/ernie-vil/data/hm/"
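
The `else` branch above treats every value other than `O`, `U`, or `V` as `E`, so a typo in `model` would silently select the ERNIE-ViL path. A minimal sketch of a stricter lookup is shown below; `DATA_PATHS` and `resolve_data_path` are hypothetical names introduced for illustration, and the paths are the two used in the cell above.

```python
# Hypothetical helper; the mapping mirrors the two branches of the cell above.
DATA_PATHS = {
    "O": "/vilio/data/",
    "U": "/vilio/data/",
    "V": "/vilio/data/",
    "E": "/vilio/ernie-vil/data/hm/",
}


def resolve_data_path(model):
    """Return the data directory for a model code, rejecting unknown codes."""
    try:
        return DATA_PATHS[model]
    except KeyError:
        raise ValueError(
            f"unknown model {model!r}; expected one of {sorted(DATA_PATHS)}"
        ) from None


data_path = resolve_data_path("V")
```

With this variant, a misspelled code raises a `ValueError` at the top of the notebook instead of surfacing later as a missing-file error.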

### installation

In [None]:
# vilio basics
!cd ../; git clone -b master https://github.com/Muennighoff/vilio.git
# !cd ../vilio/py-bottom-up-attention; pip install -r requirements.txt
# !pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
# !cd ../vilio/py-bottom-up-attention; python setup.py build develop

In [None]:
# setup clone-anonymous-github
!git clone https://github.com/fedebotu/clone-anonymous-github
!cd clone-anonymous-github; pip install -r requirements.txt

In [None]:
# main repo
!cd clone-anonymous-github; python src/download.py \
    --url https://anonymous.4open.science/r/MemesModalityEvaluation-2540 \
    --save_dir /

In [None]:
# replace / add required files
!cd /MemesModalityEvaluation-2540/shell; cp vilio_overwrite_scripts.sh /content
!bash ./vilio_overwrite_scripts.sh
!rm -r /MemesModalityEvaluation-2540

In [None]:
# Kaggle CLI, used to download the lmdb features
!pip install -q kaggle

In [None]:
# for EPS figure export
!pip install kaleido
!apt-get install -y poppler-utils

### kaggle setup
- [reference](https://www.kaggle.com/general/74235)

In [None]:
!mkdir -p ~/.kaggle
!cp /content/drive/MyDrive/vilio/kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets list
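
If `kaggle datasets list` fails, the cause is often the credentials file itself. The sketch below checks the copied file from Python before calling the CLI. `check_kaggle_credentials` is a hypothetical helper, and it assumes a POSIX file system (as on Colab) and the usual `username` / `key` fields in `kaggle.json`.

```python
import json
import os
import stat


def check_kaggle_credentials(path):
    """Return a list of problems found with a kaggle.json file (empty if none)."""
    if not os.path.isfile(path):
        return [f"{path} does not exist"]
    problems = []
    # The file should be private to the owner, as set by `chmod 600` above.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        problems.append(f"permissions {oct(mode)} allow group/other access")
    try:
        with open(path) as f:
            creds = json.load(f)
    except ValueError:
        return problems + ["file is not valid JSON"]
    for field in ("username", "key"):
        if not (isinstance(creds, dict) and creds.get(field)):
            problems.append(f"missing field {field!r}")
    return problems


print(check_kaggle_credentials(os.path.expanduser("~/.kaggle/kaggle.json")))
```

An empty list means the file exists, is private to the owner, and contains both fields; it does not verify that the key is still valid on Kaggle's side.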

### data
- [kaggle reference](https://qiita.com/k_ikasumipowder/items/1c20d8b68dbc94ab2633)

In [None]:
%%capture
# hateful memes from MyDrive
!mkdir tmp_data
!unzip /content/drive/MyDrive/vilio/hateful_memes.zip -d tmp_data


In [None]:
# hateful memes from MyDrive
!cp -r tmp_data/hateful_memes/* $data_path

In [None]:
# kaggle lmdb
!kaggle datasets download muennighoff/hmfeatureszipfin

In [None]:
# kaggle lmdb
!unzip hmfeatureszipfin.zip -d $data_path

In [None]:
## delete the extracted copy to save disk space
!rm -r tmp_data

In [None]:
# kaggle lmdb
## delete the downloaded archive to save disk space
!rm hmfeatureszipfin.zip

In [None]:
# confounders
# !mkdir /vilio/data
!cp /content/drive/MyDrive/annotation/confounders.parquet /vilio/data
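
Because the unzip and copy steps above run quietly, a missing file tends to show up only at inference time. The following sketch lists expected entries that are absent from a directory. `missing_entries` is a hypothetical helper, and the names in `expected` other than `confounders.parquet` are assumptions about the archive contents that should be adjusted to what the archives actually contain.

```python
import os


def missing_entries(data_path, expected):
    """Return the names in `expected` that are absent from `data_path`."""
    return [
        name
        for name in expected
        if not os.path.exists(os.path.join(data_path, name))
    ]


# Assumed names ("img", "train.jsonl"); only confounders.parquet is taken
# from the copy step above.
expected = ["img", "train.jsonl", "confounders.parquet"]
print(missing_entries("/vilio/data", expected))
```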

## additional installation

### for torch 1.6.0
- [cuda 10.2](https://gist.github.com/tzvsi/222b3b22a847004a729744f89fe31255)
- [python 3.6](https://stackoverflow.com/questions/66775948/downgrade-python-version-from-3-7-to-3-6-in-google-colab)

#### python version

In [None]:
%%bash

# MINICONDA_INSTALLER_SCRIPT=Miniconda3-4.5.4-Linux-x86_64.sh
MINICONDA_INSTALLER_SCRIPT=Miniconda3-py37_23.1.0-1-Linux-x86_64.sh
MINICONDA_PREFIX=/usr/local
wget https://repo.continuum.io/miniconda/$MINICONDA_INSTALLER_SCRIPT
chmod +x $MINICONDA_INSTALLER_SCRIPT
./$MINICONDA_INSTALLER_SCRIPT -b -f -p $MINICONDA_PREFIX

In [None]:
import sys
# _ = (sys.path.append("/usr/local/lib/python3.6/site-packages"))
_ = (sys.path.append("/usr/local/lib/python3.7/site-packages"))

In [None]:
!python --version
!python3 --version
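
Installing Miniconda under `/usr/local` changes what `!python` resolves to, but the notebook kernel is an already running process and may still be on a different Python version. A small sketch for comparing the two is given below; `parse_python_version` and `same_minor_version` are hypothetical helpers, not part of vilio.

```python
import re
import sys


def parse_python_version(text):
    """Turn the output of `python --version` (e.g. "Python 3.7.16") into a tuple."""
    match = re.search(r"Python (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def same_minor_version(shell_output, kernel=sys.version_info):
    """True if the shell interpreter matches the kernel's major.minor version."""
    return parse_python_version(shell_output) == tuple(kernel[:2])
```

In a notebook cell, the text printed by `!python --version` can be passed to `same_minor_version`. A `False` result would suggest that packages appended to `sys.path` from the Miniconda `site-packages` directory were built for a different interpreter than the kernel.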

#### cuda 10.2

In [None]:
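# Note: each `!` line runs in its own subshell, so a `cd` here would not persist;
# the commands below run in the current working directory.
!mkdir -p install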
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
!mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
!apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
!add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
!apt-get update
!apt-get -y install cuda-10-2

In [None]:
!ls -d /usr/local/cuda-*
!which nvcc

In [None]:
import os
# Prepend the CUDA 10.2 directories; fall back to '' if a variable is unset
p = os.environ.get('PATH', '')
ld = os.environ.get('LD_LIBRARY_PATH', '')
os.environ['PATH'] = f"/usr/local/cuda-10.2/bin:{p}"
os.environ['LD_LIBRARY_PATH'] = f"/usr/local/cuda-10.2/lib64:{ld}"
!nvcc --version
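
The cell above is run again after the runtime restart, and rerunning it within one session prepends the same directory a second time. A minimal sketch of an idempotent variant follows; `prepend_env_path` is a hypothetical helper, and the CUDA directories are the ones used above.

```python
import os


def prepend_env_path(var, directory, env=os.environ):
    """Prepend `directory` to the path-list variable `var`.

    Empty entries and earlier copies of `directory` are dropped, so calling
    this repeatedly leaves the variable unchanged after the first call.
    """
    current = env.get(var, "")
    parts = [p for p in current.split(os.pathsep) if p and p != directory]
    env[var] = os.pathsep.join([directory] + parts)
    return env[var]


prepend_env_path("PATH", "/usr/local/cuda-10.2/bin")
prepend_env_path("LD_LIBRARY_PATH", "/usr/local/cuda-10.2/lib64")
```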

#### O/U/V

In [None]:
%%writefile new_req.txt
sacremoses==0.0.43
pandas==1.1.3
regex==2020.4.4
h5py==2.10.0
filelock==3.0.10
scipy==1.4.1
sentencepiece~=0.1.91
matplotlib==3.2.1
tensorflow==2.3.1
tqdm==4.45.0
numpy==1.18.1
six==1.14.0
packaging==20.1
wandb==0.10.8
psutil==5.7.0
requests==2.23.0
pytorch_lightning==1.0.4
ImageHash==4.1.0
tokenizers~=0.9.2
transformers==3.5.1 # Required due to some imports in the files under src/vilio/transformers
torchvision==0.7.0
jieba==0.42.1
botocore==1.19.8
spacy==2.3.2
boto3==1.16.8
comet_ml==3.2.5
dataclasses==0.6
fairseq==0.9.0
ftfy==5.8
fugashi==1.0.5
ipadic==1.0.0
lmdb==1.0.0
Pillow==8.0.1
py3nvml==0.2.6
pydantic==1.7.2
pythainlp==2.2.4
PyYAML==5.3.1
scikit_learn==0.23.2
tensorboardX==2.1
timeout_decorator==0.4.1
torchcontrib==0.0.2

torch==1.6.0

In [None]:
!pip install -r new_req.txt
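
With this many pins, it can help to read them back programmatically, for example to compare against the output of `pip freeze` after installation. The sketch below parses requirement lines of the form used in `new_req.txt`. `parse_pins` is a hypothetical helper that only handles the `==` and `~=` operators appearing in that file, not the full requirements syntax.

```python
import re

# Matches "name<op>version" for the two operators used in new_req.txt.
_PIN = re.compile(r"^([A-Za-z0-9_.\-]+)\s*(==|~=)\s*([^\s#]+)")


def parse_pins(text):
    """Map package name -> (operator, version) for each pinned line."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop inline comments
        if not line or line.startswith("#"):
            continue
        match = _PIN.match(line)
        if match:
            name, op, version = match.groups()
            pins[name.lower()] = (op, version)
    return pins


sample = """
transformers==3.5.1 # Required due to some imports
sentencepiece~=0.1.91

torch==1.6.0
"""
print(parse_pins(sample))
```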

### for this module

In [None]:
%%writefile grad_req.txt
einops==0.6.0
dask==2022.2.0
plotly==5.13.1
pyarrow==11.0.0
kaleido==0.2.1
ipykernel==5.5.6
cloudpickle==2.2.1
IPython==7.34.0
transformers==3.5.1
imgkit==1.2.3

In [None]:
!pip install --no-deps -r grad_req.txt

In [None]:
!apt-get update && apt-get install -y wkhtmltopdf && apt-get clean

## inference

### download extracted features

In [None]:
model = "V"  # one of ["O", "U", "V", "E"]; matches the checkpoints downloaded below

In [None]:
# redefine data_path in case the session was restarted
if model in ["O","U","V"]:
  data_path = "/vilio/data/"
else:  # E
  data_path = "/vilio/ernie-vil/data/hm/"

In [None]:
# tsv features
!kaggle datasets download muennighoff/hmtsvfeats

In [None]:
# tsv features
!unzip hmtsvfeats.zip -d $data_path
## delete the downloaded archive to save disk space
!rm hmtsvfeats.zip

In [None]:
# check list of data
!echo $data_path
!echo =========================
!ls $data_path

### download pretrained/fine-tuned model

In [None]:
model_dir = "/vilio/ckpt"
!mkdir $model_dir
# # O
# !mkdir $model_dir/vilioo36;mkdir $model_dir/vilioo50;mkdir $model_dir/vilioov50
# !kaggle datasets download muennighoff/vilioo36
# !unzip -j vilioo36.zip -d $model_dir/vilioo36
# !rm vilioo36.zip
# !kaggle datasets download muennighoff/vilioo50
# !unzip -j vilioo50.zip -d $model_dir/vilioo50
# !rm vilioo50.zip
# !kaggle datasets download muennighoff/vilioov50
# !unzip -j vilioov50.zip -d $model_dir/vilioov50
# !rm vilioov50.zip
# # U
# !mkdir $model_dir/viliou36;mkdir $model_dir/viliou50;mkdir $model_dir/viliou72
# !kaggle datasets download muennighoff/viliou36
# !unzip -j viliou36.zip -d $model_dir/viliou36
# !rm viliou36.zip
# !kaggle datasets download muennighoff/viliou50
# !unzip -j viliou50.zip -d $model_dir/viliou50
# !rm viliou50.zip
# !kaggle datasets download muennighoff/viliou72
# !unzip -j viliou72.zip -d $model_dir/viliou72
# !rm viliou72.zip
# V
!mkdir $model_dir/viliov45;mkdir $model_dir/viliov90;mkdir $model_dir/viliov135
!kaggle datasets download muennighoff/viliov45
!unzip -j viliov45.zip -d $model_dir/viliov45
!rm viliov45.zip
!kaggle datasets download muennighoff/viliov90
!unzip -j viliov90.zip -d $model_dir/viliov90
!rm viliov90.zip
!kaggle datasets download muennighoff/viliov135
!unzip -j viliov135.zip -d $model_dir/viliov135
!rm viliov135.zip

# ES
# !kaggle datasets download muennighoff/vilioe36
# !unzip -j vilioe36.zip vilioes/step_2500train/* -d $model_dir/step_2500train
# !unzip -j vilioe36.zip vilioes/step_2500traindev/* -d $model_dir/step_2500traindev
# !rm vilioe36.zip

In [None]:
!echo $model_dir
!echo =========================
!cd $model_dir;ls
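
The inference scripts further down are passed one `LASTtrain.pth` per checkpoint directory, so it may be worth confirming that each archive unpacked to that file. A minimal sketch, assuming the directory names used in the download cell above; `checkpoint_paths` and `missing_checkpoints` are hypothetical helpers.

```python
import os


def checkpoint_paths(model_dir, names, filename="LASTtrain.pth"):
    """Build {name: path} for each checkpoint directory under `model_dir`."""
    return {name: os.path.join(model_dir, name, filename) for name in names}


def missing_checkpoints(paths):
    """Return the names whose checkpoint file does not exist."""
    return [name for name, path in paths.items() if not os.path.isfile(path)]


paths = checkpoint_paths("/vilio/ckpt", ["viliov45", "viliov90", "viliov135"])
print(missing_checkpoints(paths))
```

An empty list means all three V checkpoints are in place; for O or U, the list of names would be swapped for the corresponding directories.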

In [None]:
print(f"Installation finished in {time.time()-t0:.1f} seconds")

In [None]:
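# Kill the kernel process so execution stops here; the runtime restarts
# with all variables cleared (files on disk are kept)
import os
os.kill(os.getpid(), 9)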

### check [gpu version](https://blog.paperspace.com/alternative-to-google-colab-pro/)

In [None]:
# https://stackoverflow.com/questions/64526139/how-does-one-get-the-model-of-the-gpu-in-python-and-save-it-as-a-string
import subprocess
import os
def get_mdl():
    """Return the model name of the first GPU listed by `nvidia-smi -L`."""
    output = subprocess.check_output("nvidia-smi -L", shell=True).decode("ascii")
    # Each line looks like "GPU 0: <model> (UUID: ...)"; use the first one only
    first_line = output.splitlines()[0]
    _, rest = first_line.split(":", 1)
    name, _ = rest.split("(", 1)
    return name.strip()

In [None]:
import time
gpu_type = get_mdl()
if "A100" in gpu_type:
  print(f"{gpu_type} detected, killing runtime. Rerun this section from the beginning.")
  time.sleep(5)
  os.kill(os.getpid(), 9)
else:
  print(f"GPU {gpu_type}: proceeding to next section")
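
The check above depends on the text layout of `nvidia-smi -L`, which prints one line per GPU in the form `GPU 0: <model> (UUID: ...)`. The sketch below separates the parsing from the subprocess call so it can be tried on a sample string without a GPU. `parse_gpu_names` is a hypothetical helper, and the UUID in the sample is a placeholder.

```python
import re


def parse_gpu_names(listing):
    """Extract GPU model names from the text printed by `nvidia-smi -L`."""
    names = []
    for line in listing.splitlines():
        match = re.match(r"GPU \d+:\s*(.+?)\s*\(UUID:", line)
        if match:
            names.append(match.group(1))
    return names


sample = "GPU 0: Tesla T4 (UUID: GPU-00000000-0000-0000-0000-000000000000)\n"
print(parse_gpu_names(sample))
```

Returning a list means the A100 test can be written as `any("A100" in name for name in names)`, which also covers runtimes with more than one GPU.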

In [None]:
!ls -d /usr/local/cuda-*
!which nvcc

In [None]:
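# Prepend the CUDA 10.2 directories again after the restart; fall back to '' if unset
p = os.environ.get('PATH', '')
ld = os.environ.get('LD_LIBRARY_PATH', '')
os.environ['PATH'] = f"/usr/local/cuda-10.2/bin:{p}"
os.environ['LD_LIBRARY_PATH'] = f"/usr/local/cuda-10.2/lib64:{ld}"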
!nvcc --version

### inference

In [None]:
model_dir = "/vilio/ckpt"

In [None]:
# !echo $model_dir
# # # O
# # !cd /vilio; bash ./bash/inference/O/hm_O.sh \
# #     $model_dir/vilioov50/LASTtrain.pth \
# #     $model_dir/vilioo50/LASTtrain.pth \
# #     $model_dir/vilioo36/LASTtrain.pth
# # !cd /vilio; bash ./bash/inference/O/hm_O_correct_label.sh \
# #     $model_dir/vilioov50/LASTtrain.pth \
# #     $model_dir/vilioo50/LASTtrain.pth \
# #     $model_dir/vilioo36/LASTtrain.pth
# # !cd /vilio/bash; bash ./micace_evaluator.sh \
# #     /content/drive/MyDrive/vilio/export/O O
# # !cd /vilio/bash; bash ./micace_evaluator.sh \
# #     /content/drive/MyDrive/vilio/export/correct_label/O O
# # # U
# # !cd /vilio; bash ./bash/inference/U/hm_U.sh \
# #     $model_dir/viliou72/LASTtrain.pth \
# #     $model_dir/viliou50/LASTtrain.pth \
# #     $model_dir/viliou36/LASTtrain.pth
# # !cd /vilio; bash ./bash/inference/U/hm_U_correct_label.sh \
# #     $model_dir/viliou72/LASTtrain.pth \
# #     $model_dir/viliou50/LASTtrain.pth \
# #     $model_dir/viliou36/LASTtrain.pth
# # !cd /vilio/bash; bash ./micace_evaluator.sh \
# #     /content/drive/MyDrive/vilio/export/U U
# # !cd /vilio/bash; bash ./micace_evaluator.sh \
# #     /content/drive/MyDrive/vilio/export/correct_label/U U
# V
!cd /vilio; bash ./bash/inference/V/hm_V.sh \
    $model_dir/viliov45/LASTtrain.pth \
    $model_dir/viliov90/LASTtrain.pth \
    $model_dir/viliov135/LASTtrain.pth
!cd /vilio; bash ./bash/inference/V/hm_V_correct_label.sh \
    $model_dir/viliov45/LASTtrain.pth \
    $model_dir/viliov90/LASTtrain.pth \
    $model_dir/viliov135/LASTtrain.pth
# # !cd /vilio/bash; bash ./micace_evaluator.sh \
# #     /content/drive/MyDrive/vilio/export/V V
# # !cd /vilio/bash; bash ./micace_evaluator.sh \
# #     /content/drive/MyDrive/vilio/export/correct_label/V V
# # ES
# # !cd /vilio/ernie-vil; bash ./bash/inference/ES/hm_ES36.sh \
# #                             /vilio/ckpt/step_2500train \
# #                             /vilio/ckpt/step_2500traindev
# # !cd /vilio/bash; bash ./micace_evaluator.sh \
# #     /content/drive/MyDrive/vilio/export/ES ES

# EOS