In [None]:
# Install condacolab
!pip install -q condacolab
import condacolab
condacolab.install()

In [None]:
# Create a Python 3.7 environment
!conda create -n scene_graph_env python=3.7 -y

# Install PyTorch 1.4.0 and torchvision 0.5.0 using pip
!conda run -n scene_graph_env pip install torch==1.4.0 torchvision==0.5.0 -f https://download.pytorch.org/whl/cu101/torch_stable.html

# Verify the installation
!conda run -n scene_graph_env python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); import torchvision; print(f'torchvision version: {torchvision.__version__}')"

In [None]:
# Install other required packages
!conda run -n scene_graph_env pip install ninja yacs cython matplotlib tqdm opencv-python overrides scipy h5py ipython gdown

# Hardware and OS
To run this repo, you need 1 to 8 GPUs with CUDA. If you don't have an Nvidia graphics card, no problem: this notebook is intended to run on Google Colab.

Don't forget to enable the GPU in **Google Colab** ([how to](https://jovianlin.io/pytorch-with-gpu-in-google-colab/)). If you do not enable the GPU from the beginning, you will have to restart the setup from the beginning.

If you are using a **Kaggle kernel**, do not forget to enable internet access.

**OS:** It is almost impossible to run this repo on Windows (believe me, I tried hard) because many of its dependencies are made for Linux. It is better to continue with Linux.

# Environment setup
Before installing anything related to this project, you first need a version of PyTorch that is compatible with your GPU setup.

The two-line script below checks whether your environment is OK.

In [None]:
! wget --quiet "https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py"
! python collect_env.py

Collecting environment information...
PyTorch version: 1.5.0+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.105
GPU models and configuration: 
GPU 0: Tesla K80
GPU 1: Tesla K80

Nvidia driver version: 430.26
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip3] numpy==1.18.4
[pip3] torch==1.5.0+cu101
[pip3] torchvision==0.6.0+cu101
[conda] mkl                       2018.0.1             h19d6760_4  
[conda] mkl-service               1.1.2            py36h17a0993_4  
[conda] numpy                     1.18.4                   pypi_0    pypi
[conda] numpydoc                  0.7.0            py36h18f165f_0  
[conda] torch                     1.5.0+cu101              pypi_0    pypi
[conda] torchvision               0.6.0+cu101              pypi_0    p

- First, check that your computer detects your graphics card: ``GPU 0: ...``
- Second, you need a driver to use your GPU for any application: ``Nvidia driver version: ...``
- Third, you need some libraries to run any data science algorithm on your GPU: ``cuDNN version: ...`` and ``CUDA runtime version: ...``
- Fourth, you need to install PyTorch after everything above is in place, so that PyTorch is built against the CUDA library: ``PyTorch version: 1.5.0+cu101`` and ``CUDA used to build PyTorch: 10.1``
- Finally, if you are on Windows, you might also need ``Microsoft Visual C++ 14.0``

If any of these are missing, the project setup will probably fail somewhere. Below are the instructions to fix your environment.
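
The checklist above can be automated with a small parser. The sketch below is only an illustration: `find_env_problems` is a hypothetical helper, not part of the repo, and it assumes the report uses the field labels shown in the sample output above.

```python
def find_env_problems(report):
    """Return a list of problems found in the text printed by collect_env.py."""
    # Turn every "label: value" line into a dictionary entry.
    fields = {}
    for line in report.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()

    def missing(label):
        # Placeholders treated as "not found" (the exact wording is an assumption).
        return fields.get(label, "") in ("", "None", "N/A", "No CUDA", "Could not collect")

    problems = []
    if "GPU 0" not in fields:
        problems.append("no GPU detected")
    for label in ("Nvidia driver version", "CUDA runtime version", "cuDNN version"):
        if missing(label):
            problems.append(f"{label} not found")
    if missing("CUDA used to build PyTorch"):
        problems.append("PyTorch was not built with CUDA")
    # The sample output prints "Yes"; "True" is accepted as well.
    if fields.get("Is CUDA available") not in ("Yes", "True"):
        problems.append("PyTorch cannot use CUDA")
    return problems


# Usage: pass the text printed by `python collect_env.py`.
sample = """Is CUDA available: Yes
CUDA used to build PyTorch: 10.1
CUDA runtime version: 10.1.105
GPU 0: Tesla K80
Nvidia driver version: 430.26
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5"""
print(find_env_problems(sample))  # prints []
```

An empty list means every item of the checklist was found in the report.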

## Setup


In [None]:
! apt-get update
! pip install --upgrade pip

! apt-get -y install linux-headers-$(uname -r)
! apt-get -y install cmake

! conda update -y conda
! conda update -y conda-build

In [None]:
!gcc --version

gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



## Driver



In [None]:
# this command must work; otherwise, check the symlink /usr/local/cuda
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105


## CUDA
Note that each version of CUDA requires a minimum driver version.


CUDA toolkit: https://developer.nvidia.com/cuda-10.1-download-archive-base?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal

For example, you would install something like this:
```
wget https://developer.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.105_418.39_linux.run
sh cuda_10.1.105_418.39_linux.run
```
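
The driver requirement can be checked with a simple version comparison. This is a minimal sketch: the helper names are hypothetical, and the minimum of `418.39` is an assumption taken from the driver version embedded in the runfile name above.

```python
def parse_version(text):
    """Convert a dotted version string such as '430.26' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


# Assumption: the driver bundled with the CUDA 10.1.105 runfile is the minimum.
MIN_DRIVER_FOR_CUDA_10_1 = "418.39"


def driver_is_new_enough(driver_version, minimum=MIN_DRIVER_FOR_CUDA_10_1):
    """Return True when the installed driver is at least the minimum version."""
    return parse_version(driver_version) >= parse_version(minimum)


print(driver_is_new_enough("430.26"))  # True  (the driver in the sample output)
print(driver_is_new_enough("410.48"))  # False (too old for this toolkit)
```

Comparing tuples of integers avoids the pitfalls of comparing version strings character by character.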

## libnccl
Nvidia NCCL can be downloaded from: https://developer.nvidia.com/nccl/nccl-download (you need to create a free account)

## libcudnn
Do not download ``libcudnn-dev``. Download the runtime package (``libcudnn7``) instead.

Then install it using this command: ``dpkg -i libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb``


## Pytorch
Get the install command that fits your hardware from this website: https://pytorch.org/get-started/locally/
For example, you should run something like the line below.

In [None]:
! pip install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

In [None]:
# now run the command below to see if Pytorch detects your GPUs
import torch
torch.cuda.is_available()

True

# Install the requirements
[github source](https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch)

## Python libraries
Installing the remaining packages might take 5 to 10 minutes.

In [None]:
! pip install ipython scipy h5py

In [None]:
! pip install ninja yacs cython matplotlib tqdm opencv-python overrides

ends with `Successfully installed ninja-1.9.0.post1 overrides-3.0.0 yacs-0.1.7`

## install PyCOCO tools (cocoapi)

In [None]:
! git clone https://github.com/cocodataset/cocoapi.git

Cloning into 'cocoapi'...
remote: Enumerating objects: 975, done.
remote: Total 975 (delta 0), reused 0 (delta 0), pack-reused 975
Receiving objects: 100% (975/975), 11.72 MiB | 3.13 MiB/s, done.
Resolving deltas: 100% (576/576), done.
Checking connectivity... done.


In [None]:
! cd cocoapi/PythonAPI; python setup.py build_ext install

ends with

```
Finished processing dependencies for pycocotools==2.0
```



## install apex
It is a PyTorch extension for easy mixed-precision and distributed training.

### method 1

In [None]:
! git clone https://github.com/NVIDIA/apex.git
! cd apex ; python setup.py install --cuda_ext --cpp_ext

### method 2
https://stackoverflow.com/questions/57284345/how-to-install-nvidia-apex-on-google-colab

In [None]:
%%writefile setup.sh

git clone https://github.com/NVIDIA/apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex

Overwriting setup.sh


In [None]:
!sh setup.sh

In [None]:
!pip install -v --no-cache-dir ./apex

ends with
```
Successfully installed apex-0.1
Cleaning up...
Removed build tracker '/tmp/pip-req-tracker-yl9p9317'
```

"A Python-only build" omits:
* Fused kernels **required** to use apex.optimizers.FusedAdam.
* Fused kernels **required** to use apex.normalization.FusedLayerNorm.
* Fused kernels that improve the performance and numerical stability of apex.parallel.SyncBatchNorm.
* Fused kernels that improve the performance of apex.parallel.DistributedDataParallel and apex.amp. DistributedDataParallel, amp, and SyncBatchNorm will still be usable, but they may be slower.
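
To find out which kind of build you ended up with, you can check whether the compiled extension modules are importable. This is only a sketch: the module names in `APEX_EXTENSIONS` are an assumption about what the `--cpp_ext` and `--cuda_ext` options produce and may differ between apex versions.

```python
import importlib.util

# Assumption: extension modules produced by --cpp_ext / --cuda_ext.
# Adjust the list if your apex version names them differently.
APEX_EXTENSIONS = ("apex_C", "amp_C", "syncbn", "fused_layer_norm_cuda")


def module_available(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def apex_extension_status(names=APEX_EXTENSIONS):
    """Map each expected extension module to whether it is installed."""
    return {name: module_available(name) for name in names}


for name, found in apex_extension_status().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If every module is reported as missing, you most likely have the Python-only build.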

AMP: Automatic Mixed Precision: https://nvidia.github.io/apex/amp.html

## install PyTorch Detection (Scene-Graph-Benchmark.pytorch)
We clone the repository into a directory named `Scene` instead of `Scene-Graph-Benchmark.pytorch`
because *ninja* cannot handle certain characters in the directory name
([source](https://stackoverflow.com/questions/54569963/error-building-depfile-has-multiple-output-paths-ninja-build-stopped-subcomm))

In [None]:
! git clone https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch.git Scene

Cloning into 'Scene-Graph-Benchmark.pytorch'...
remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 462 (delta 1), reused 0 (delta 0), pack-reused 451
Receiving objects: 100% (462/462), 26.17 MiB | 12.90 MiB/s, done.
Resolving deltas: 100% (150/150), done.
Checking connectivity... done.


In [None]:
# you can also rename it using this cmd
# ! mv Scene-Graph-Benchmark.pytorch Scene

In [None]:
! cd Scene; python setup.py build develop

running build
running build_py
running build_ext
building 'maskrcnn_benchmark._C' extension
Emitting ninja build file /root/Scene/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/local/cuda/bin/nvcc -DWITH_CUDA -I/root/Scene/maskrcnn_benchmark/csrc -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /root/Scene/maskrcnn_benchmark/csrc/cuda/deform_pool_cuda.cu -o /root/Scene/build/temp.linux-x86_64-3.7/root/Scene/maskrcnn_benchmark/csrc/cuda/deform_pool_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options 

ends with:


```
Installed /content/Scene
Processing dependencies for maskrcnn-benchmark==0.1
Finished processing dependencies for maskrcnn-benchmark==0.1
```
Otherwise you can get:
~~~
RuntimeError: Error compiling objects for extension
~~~


The build command above might fail for many reasons:
* ninja is not installed
* the folder name contains unusual characters such as spaces or dots
* other reasons

Follow the instructions above carefully to avoid any problems :)
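
The first two causes can be checked before launching the build. The helpers below are hypothetical, and the set of "safe" characters (letters, digits, `_` and `-`) is a conservative assumption rather than ninja's actual rule.

```python
import re
import shutil
from pathlib import PurePosixPath


def ninja_installed():
    """Return True if a `ninja` executable is found on the PATH."""
    return shutil.which("ninja") is not None


def risky_path_parts(path):
    """Return the path components that contain anything other than
    letters, digits, '_' or '-' (for example spaces or dots)."""
    parts = PurePosixPath(path).parts
    return [
        part for part in parts
        if part != "/" and not re.fullmatch(r"[A-Za-z0-9_-]+", part)
    ]


print("ninja installed:", ninja_installed())
print(risky_path_parts("/content/Scene"))
print(risky_path_parts("/content/Scene-Graph-Benchmark.pytorch"))
```

An empty list for your clone directory means the path contains none of the characters flagged here.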

### Additional instructions for Windows:
You might encounter this issue: https://github.com/facebookresearch/maskrcnn-benchmark/issues/547
In that case, you might need to run this first:
```
set "VS150COMNTOOLS=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build"
set CMAKE_GENERATOR=Visual Studio 16 2019 Win64
set DISTUTILS_USE_SDK=1
call "%VS150COMNTOOLS%\vcvarsall.bat" x64 -vcvars_ver=14.11
python setup.py build develop
call "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat"  x64 -vcvars_ver=14.0
```

# DATASET
## VG images
Download the VG images: part 1 (9 GB) and part 2 (5 GB).

In [None]:
! wget https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip

In [None]:
! wget https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip

--2020-06-04 13:58:39--  https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip
Resolving cs.stanford.edu (cs.stanford.edu)... 171.64.64.64
Connecting to cs.stanford.edu (cs.stanford.edu)|171.64.64.64|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5471658058 (5.1G) [application/zip]
Saving to: ‘images2.zip’


2020-06-04 14:28:57 (2.87 MB/s) - ‘images2.zip’ saved [5471658058/5471658058]



Please extract these images to the directory `datasets/vg/VG_100K/`.

If you want to use another directory, please set it in `DATASETS['VG_stanford_filtered']['img_dir']` of `maskrcnn_benchmark/config/paths_catalog.py`
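
The extraction step can be scripted with the standard library. This is a minimal sketch: `extract_images_flat` is a hypothetical helper, and it assumes the archives store their `.jpg` files inside a folder, which is why the folder structure is dropped and every image is written directly into the target directory.

```python
import shutil
import zipfile
from pathlib import Path


def extract_images_flat(zip_path, target_dir):
    """Copy every .jpg from the archive directly into target_dir,
    ignoring the folders stored inside the archive. Returns the file count."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    count = 0
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            name = Path(member.filename).name
            if member.is_dir() or not name.lower().endswith(".jpg"):
                continue
            # Stream the file to disk so large images are not held in memory.
            with archive.open(member) as src, open(target / name, "wb") as dst:
                shutil.copyfileobj(src, dst)
            count += 1
    return count


# Usage (paths relative to the cloned repository):
# extract_images_flat("images.zip", "datasets/vg/VG_100K")
# extract_images_flat("images2.zip", "datasets/vg/VG_100K")
```

Both archives go to the same directory, so all images end up side by side in `VG_100K`.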

## SGG model

Download the [scene graphs](https://onedrive.live.com/embed?cid=22376FFAD72C4B64&resid=22376FFAD72C4B64%21779871&authkey=AA33n7BRpB1xa3I) to `Scene/datasets/vg/VG-SGG-with-attri.h5` (144 MB). The code in this section does this automatically.

Alternatively, you can edit the path in `DATASETS['VG_stanford_filtered_with_attribute']['roidb_file']` of `maskrcnn_benchmark/config/paths_catalog.py`

In [None]:
!pip install gdown

Collecting gdown
  Downloading gdown-3.11.0.tar.gz (8.6 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting filelock
  Downloading filelock-3.0.12-py3-none-any.whl (7.6 kB)
Building wheels for collected packages: gdown
  Building wheel for gdown (PEP 517) ... done
  Created wheel for gdown: filename=gdown-3.11.0-py3-none-any.whl size=9619 sha256=184ea18d44683eac553a1b7705ea72a9ec3cf61445acecbfd08fdcf3912a97bb
  Stored in directory: /root/.cache/pip/wheels/05/e6/10/9cbfea8dcf9fde0f406da1e4c71d5c3cf3c99e0502d7f08ac6
Successfully built gdown
Installing collected packages: filelock, gdown
Successfully installed filelock-3.0.12 gdown-3.11.0


In [None]:
# download the scene graphs to the location given above
! mkdir -p /content/Scene/datasets/vg
! gdown "https://drive.google.com/uc?id=1h2XzeQgJNYgg3q66t1oujofbWvBIYMXG" -O /content/Scene/datasets/vg/VG-SGG-with-attri.h5
# mirror link for manual download https://onedrive.live.com/embed?cid=22376FFAD72C4B64&resid=22376FFAD72C4B64%21779871&authkey=AA33n7BRpB1xa3I


In [None]:
! ls -l --block-size=M "/content/Scene/datasets/vg"
# file size should be 144M

## Pretrained models

Do not name any directory `checkpoints` (with an ``s``), because you will not be able to browse it in Jupyter Notebook.

In [None]:
%%bash
mkdir -p Scene/checkpoint/pretrained_faster_rcnn/
cd Scene/checkpoint/pretrained_faster_rcnn/
gdown "https://drive.google.com/uc?id=1GoUdVlwZ8ekS7w_aWJ-tcXsx-ULCCjyI" -O log.txt
gdown "https://drive.google.com/uc?id=1Pj8gfFBouqaKzJVkOV6wsY8GU60z6Nrb" -O config.yml
gdown "https://drive.google.com/uc?id=1TRT3uX0tbqvIfNeL3bRGzVeqKS8SHtFa" -O model_final.pth
gdown "https://drive.google.com/uc?id=1Y1SnKGeQCBqGmIpUa8izy2EvUYLyPc89" -O VG_stanford_filtered_wth_attribute_train_statistics.cache
gdown "https://drive.google.com/uc?id=1_aRGThcciCvg0gFLr9EkhLIE92vEitfP" -O labels.json
gdown "https://drive.google.com/uc?id=1q6w_tZzhKTx70hgmQ-7Rnlt4EM60MmXp" -O last_checkpoint

If the links above are dead, you can download and extract the files manually:

- [download the Faster R-CNN model](https://onedrive.live.com/embed?cid=22376FFAD72C4B64&resid=22376FFAD72C4B64%21779870&authkey=AH5CPVb9g5E67iQ),
- extract all the files to the directory `/home/username/checkpoints/pretrained_faster_rcnn`.
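
Whichever route you use, it is worth confirming that all the files arrived. The sketch below uses a hypothetical helper, `missing_files`; the expected names are the ones written by the `gdown` commands above, and treating an empty file as missing is an assumption about how a failed download might look.

```python
from pathlib import Path

# File names taken from the gdown commands above.
EXPECTED_FILES = (
    "log.txt",
    "config.yml",
    "model_final.pth",
    "VG_stanford_filtered_wth_attribute_train_statistics.cache",
    "labels.json",
    "last_checkpoint",
)


def missing_files(directory, expected=EXPECTED_FILES):
    """Return the expected files that are absent or empty in `directory`."""
    folder = Path(directory)
    missing = []
    for name in expected:
        path = folder / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


print(missing_files("Scene/checkpoint/pretrained_faster_rcnn"))
```

An empty list means every file is present and non-empty.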

To train your own Faster R-CNN model, please follow [the next section](https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch#pretrained-models).


# Setup done!
# Training and Testing
## Settings
The default settings are under

`configs/e2e_relation_X_101_32_8_FPN_1x.yaml` ([see on github](https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch/blob/db02790a60bb9b9f7c270352820968b2f2089469/configs/e2e_relation_X_101_32_8_FPN_1x.yaml#L74))
and
`maskrcnn_benchmark/config/defaults.py` (todo find link in github)

The priority is in this order `command > yaml > defaults.py`
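
The precedence rule can be pictured as three dictionaries merged in order, where later sources overwrite earlier ones. This is only an illustration with plain dictionaries and made-up values; it is not the repository's own configuration code.

```python
def resolve_settings(defaults, yaml_values, command_values):
    """Merge the three sources; later ones override earlier ones."""
    merged = dict(defaults)        # lowest priority: defaults.py
    merged.update(yaml_values)     # overrides defaults: the yaml file
    merged.update(command_values)  # highest priority: command line
    return merged


# Hypothetical values, chosen only to show which source wins.
defaults = {"DTYPE": "float32", "SOLVER.MAX_ITER": 40000, "SOLVER.IMS_PER_BATCH": 16}
yaml_values = {"SOLVER.MAX_ITER": 45000, "SOLVER.IMS_PER_BATCH": 12}
command_values = {"SOLVER.MAX_ITER": 50000}

print(resolve_settings(defaults, yaml_values, command_values))
# {'DTYPE': 'float32', 'SOLVER.MAX_ITER': 50000, 'SOLVER.IMS_PER_BATCH': 12}
```

`DTYPE` keeps its default, `SOLVER.IMS_PER_BATCH` comes from the yaml file, and `SOLVER.MAX_ITER` comes from the command line.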

* For Predicate Classification (PredCls), we need to set:
```
MODEL.ROI_RELATION_HEAD.USE_GT_BOX True
MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True
```

* For Unbiased-Causal-TDE Model:
```
MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor
```

## Example 1 : (PredCls, Motif Model)
### Training Example 1 : (PredCls, Motif Model)

`CUDA_VISIBLE_DEVICES=0,1` <-- use GPUs 0 and 1

`python -m torch.distributed.launch` <-- run the script across multiple GPUs

`--master_port 10025` <-- the value itself is not important, just use a free port

`--nproc_per_node=2` <-- [{num_gpus}](https://docs.fast.ai/distributed.html): this must match the number of GPUs specified above

`tools/relation_train_net.py` <-- the script launched by `torch.distributed.launch`; it trains the model or resumes a previous training run

`--config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml"` <-- config file for everything not specified on the command line

`MODEL.ROI_RELATION_HEAD.USE_GT_BOX True` <-- ground truth bounding boxes are provided as input to the model

`MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True` <-- ground truth object labels are provided as input to the model

`MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor` <-- selects the relation predictor, here the Motif model

`SOLVER.IMS_PER_BATCH 12` <-- **this number must be divisible by the number of GPUs (2) used.**

`TEST.IMS_PER_BATCH 2` <-- **must be equal to the number of GPUs (2) used.**

`DTYPE "float16"` <-- enables mixed-precision training through apex AMP

`SOLVER.MAX_ITER 50000` <-- maximum number of training iterations (not epochs); early stopping is also implemented

`SOLVER.VAL_PERIOD 2000` <-- run validation every 2000 iterations

`SOLVER.CHECKPOINT_PERIOD 2000` <-- create a checkpoint every 2000 iterations (1 hour or more)

`GLOVE_DIR /home/kaihua/glove` <-- directory where the pretrained word embeddings will be downloaded and stored

`MODEL.PRETRAINED_DETECTOR_CKPT /home/kaihua/checkpoint/pretrained_faster_rcnn/model_final.pth` <-- path to the pretrained Faster R-CNN detector checkpoint

`OUTPUT_DIR /home/kaihua/checkpoints/motif-precls-exmp` <-- where the model is saved. If the directory is not empty, training resumes automatically, so you can, for example, stop training and restart with more GPUs
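
The two batch-size rules above are easy to get wrong when the number of GPUs changes. The following sketch restates them as a small check; `check_batch_sizes` is a hypothetical helper that encodes the rules exactly as written in this notebook, and the repository's own checks may differ.

```python
def check_batch_sizes(num_gpus, train_ims_per_batch, test_ims_per_batch):
    """Return a list of violated rules (empty when the settings are consistent)."""
    errors = []
    if num_gpus < 1:
        return ["at least one GPU is required"]
    if train_ims_per_batch % num_gpus != 0:
        errors.append("SOLVER.IMS_PER_BATCH must be divisible by the number of GPUs")
    if test_ims_per_batch != num_gpus:
        errors.append("TEST.IMS_PER_BATCH must be equal to the number of GPUs")
    return errors


print(check_batch_sizes(num_gpus=2, train_ims_per_batch=12, test_ims_per_batch=2))  # []
print(check_batch_sizes(num_gpus=8, train_ims_per_batch=12, test_ims_per_batch=2))  # both rules violated
```

With 8 GPUs, a training batch of 12 is not divisible by 8 and a test batch of 2 does not match, so both messages are returned.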



In [None]:
! cd "Scene/"; CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 10025 --nproc_per_node=2 \
tools/relation_train_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" \
MODEL.ROI_RELATION_HEAD.USE_GT_BOX True \
MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True \
MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor \
SOLVER.IMS_PER_BATCH 12 TEST.IMS_PER_BATCH 2 DTYPE "float16" \
SOLVER.MAX_ITER 50000 SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 \
GLOVE_DIR glove/ \
MODEL.PRETRAINED_DETECTOR_CKPT checkpoint/pretrained_faster_rcnn/model_final.pth \
OUTPUT_DIR checkpoint/motif-precls-exmp

### Test Example 1 : (PredCls, Motif Model)
It is better to use only one GPU for testing:
~~~
CUDA_VISIBLE_DEVICES=0
python -m torch.distributed.launch
--master_port 10027
--nproc_per_node=1
~~~

`tools/relation_test_net.py` <-- the script is the main line that changes compared with training

~~~
--config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml"
MODEL.ROI_RELATION_HEAD.USE_GT_BOX True
MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True
MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor
~~~
`TEST.IMS_PER_BATCH 1` <----------------------------- must be equal to nproc_per_node
~~~
DTYPE "float16"
GLOVE_DIR /home/kaihua/glove
MODEL.PRETRAINED_DETECTOR_CKPT checkpoints/motif-precls-exmp
OUTPUT_DIR checkpoints/motif-precls-exmp
~~~

In [None]:
# evaluation
! cd "Scene/"; CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools/relation_test_net.py \
--config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" \
MODEL.ROI_RELATION_HEAD.USE_GT_BOX True \
MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True \
MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor \
TEST.IMS_PER_BATCH 1 DTYPE "float16" \
GLOVE_DIR glove/ \
MODEL.PRETRAINED_DETECTOR_CKPT checkpoint/pretrained_faster_rcnn/model_final.pth \
OUTPUT_DIR checkpoint/motif-precls-exmp

## Train and test Example 2 : (SGCls, Causal, TDE, SUM Fusion, MOTIFS Model)

In [None]:
# replace the /home/kaihua/... paths with your own directories
! cd "Scene/"; python -m torch.distributed.launch --master_port 10026 --nproc_per_node=2 \
tools/relation_train_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" \
MODEL.ROI_RELATION_HEAD.USE_GT_BOX True \
MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False \
MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor \
MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE none \
MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum \
MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs \
SOLVER.IMS_PER_BATCH 12 TEST.IMS_PER_BATCH 2 DTYPE "float16" \
SOLVER.MAX_ITER 50000 SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 \
GLOVE_DIR /home/kaihua/glove \
MODEL.PRETRAINED_DETECTOR_CKPT /home/kaihua/checkpoints/pretrained_faster_rcnn/model_final.pth \
OUTPUT_DIR /home/kaihua/checkpoints/causal-motifs-sgcls-exmp

In [None]:
# evaluation
! cd "Scene/"; CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10028 --nproc_per_node=1 \
tools/relation_test_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" \
MODEL.ROI_RELATION_HEAD.USE_GT_BOX True \
MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False \
MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor \
MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE TDE \
MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum \
MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs \
TEST.IMS_PER_BATCH 1 DTYPE "float16" \
GLOVE_DIR /home/kaihua/glove \
MODEL.PRETRAINED_DETECTOR_CKPT /home/kaihua/checkpoints/causal-motifs-sgcls-exmp \
OUTPUT_DIR /home/kaihua/checkpoints/causal-motifs-sgcls-exmp
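
Long commands like the ones above are easy to break with a missing line continuation. One option is to assemble the command in Python from a dictionary of options. This is a sketch: `build_launch_command` is a hypothetical helper, not part of the repo, and it only builds the string without running anything.

```python
import shlex


def build_launch_command(script, config_file, options, nproc_per_node=1, master_port=10025):
    """Build a torch.distributed.launch command line as a single string."""
    args = [
        "python", "-m", "torch.distributed.launch",
        "--master_port", str(master_port),
        f"--nproc_per_node={nproc_per_node}",
        script,
        "--config-file", config_file,
    ]
    # Config overrides are passed as alternating KEY VALUE pairs.
    for key, value in options.items():
        args += [key, str(value)]
    # Quote each argument so paths with special characters stay intact.
    return " ".join(shlex.quote(arg) for arg in args)


options = {
    "MODEL.ROI_RELATION_HEAD.USE_GT_BOX": True,
    "MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL": False,
    "MODEL.ROI_RELATION_HEAD.PREDICTOR": "CausalAnalysisPredictor",
    "TEST.IMS_PER_BATCH": 1,
    "DTYPE": "float16",
}
command = build_launch_command(
    "tools/relation_test_net.py",
    "configs/e2e_relation_X_101_32_8_FPN_1x.yaml",
    options,
    nproc_per_node=1,
    master_port=10028,
)
print(command)
```

In a notebook cell, the resulting string can then be run with `!{command}` after changing into the `Scene` directory.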