# Experiment

## Graph Classification
 * dataset: MUTAG, PTC_MR, PROTEINS, NCI1, DD, IMDB-BINARY, IMDB-MULTI, REDDIT-BINARY, COLLAB
 * model: 6L with #fc-neuron=128,256 and drop-rate=0.5,0.8,0.9
     - should be changed manually in `run-ego-cnn.py`
 * [default] preprocess: relabel graph (set `-s`) + include node/edge labels
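
The grid implied by the dataset and model bullets above can be enumerated as follows. This is only a bookkeeping sketch: the values are still changed manually in `run-ego-cnn.py`, and the names `DATASETS`, `FC_NEURONS`, `DROP_RATES`, and `experiment_grid` are hypothetical, not part of the repository.

```python
from itertools import product

# Values copied from the list above; the variable names are illustrative only.
DATASETS = ["MUTAG", "PTC_MR", "PROTEINS", "NCI1", "DD",
            "IMDB-BINARY", "IMDB-MULTI", "REDDIT-BINARY", "COLLAB"]
FC_NEURONS = [128, 256]
DROP_RATES = [0.5, 0.8, 0.9]


def experiment_grid(datasets=DATASETS, fc_neurons=FC_NEURONS, drop_rates=DROP_RATES):
    """Return one dict per (dataset, #fc-neuron, drop-rate) combination."""
    return [
        {"dataset": d, "fc_neurons": n, "drop_rate": r}
        for d, n, r in product(datasets, fc_neurons, drop_rates)
    ]


if __name__ == "__main__":
    runs = experiment_grid()
    print(len(runs), "runs; first:", runs[0])
```

Keeping the list in one place makes it easier to check which combinations already have a `*_cv_hist.pkl` result.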

## Scale-Free Regularizer
 * dataset: REDDIT-BINARY
 * model: 6L, 2L, 6L-SF with #fc-neuron=256 and drop-rate=0.8
 * preprocess
     * do not relabel graph (unset `-s`)

## Visualizing Critical Structures
 * dataset: Alkane-vs-Alcohol, Asym-vs-Sym, REDDIT-BINARY
 * model: 6L, 6LAttention with #fc-neuron=256 and drop-rate=0.8
 * preprocess
     * must not relabel graph (unset `-s`)
     * must not include node/edge labels (hardcode in related files)

## Detailed Choices (all options)
 * [v]: default setting, used in the submission to ICML'19
 * [switch]: switching between the options is implemented and controlled by command-line arguments/flags
 * [unused]: experiments that were not chosen for presentation in the paper

### Preprocess Settings
 * [switch] Choice 1. relabeling graphs?
     * [v] don't do anything => for visualization experiments
     * [v] sort all vertices by degree => for other experiments
 * Choice 2. selection of neighbors
     * [v] 1-hop neighbors
     * BFS selection up to k
     * [unused] consecutive nodes in 1D list for Convolution
 * Choice 3. ordering of neighbors
    * 3-1. according to which criterion?
         * degree
         * [v] 1-WL: label all nodes by running the Weisfeiler-Lehman algorithm for 1 iteration, then count the #occurrences of each 1-WL label
    * 3-2. sorting order?
         * larger value first
         * [v] smaller value first
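
The default neighbor ordering (Choice 3: 1-WL criterion, smaller value first) can be sketched as below. This is a minimal illustration, not the code in `preprocess.py`. It assumes the graph is a plain adjacency dict, that occurrences are counted within a single graph, and that ties are broken by node id. The names `wl_one_iteration` and `order_neighbors` are hypothetical.

```python
from collections import Counter


def wl_one_iteration(adj, labels=None):
    """One round of 1-WL: pair each node's label with the sorted labels of its neighbors."""
    if labels is None:
        # Assumption: unlabeled graphs start from a uniform label.
        labels = {v: 0 for v in adj}
    return {v: (labels[v], tuple(sorted(labels[u] for u in adj[v]))) for v in adj}


def order_neighbors(adj, labels=None, ascending=True):
    """Sort each node's neighbors by how often their 1-WL label occurs in the graph."""
    wl = wl_one_iteration(adj, labels)
    counts = Counter(wl.values())
    sign = 1 if ascending else -1
    # Ties are broken by node id (an assumption made for determinism).
    return {v: sorted(adj[v], key=lambda u: (sign * counts[wl[u]], u)) for v in adj}


if __name__ == "__main__":
    # Star centered at 0 with leaves 1, 2, 3, plus an extra edge 3-4.
    adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3]}
    print(order_neighbors(adj)[0])                   # smaller count first
    print(order_neighbors(adj, ascending=False)[0])  # larger count first
```

With uniform initial labels, one 1-WL iteration separates nodes only by degree, so the count criterion differs from the plain degree criterion in that it ranks by how rare a node's degree class is, not by the degree itself.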
         
### Model Settings
 * Choice 1. Patchy-San layer
    * 1-1. [switch] k: the size of neighborhood
        * [v] k=10
    * 1-2. consider node/edge labels?
        * [v] no => for visualization experiments
        * [v] yes if it's provided => for other experiments
 * Choice 2. Ego-Convolution
     * 2-1. [switch] k: # of nodes in a receptive field (self + neighbors)
        * k=9
        * [v] k=17
 * Choice 3. Classification Layers
     * 3-1. # of layers: 2 fc layers
     * 3-2. # of neurons between the 2 fc layers
         * [v] 128: mostly better
         * 256: for social network datasets
 * Choice 4. Regularization
     * 4-1 Dropout rate
         * 0.5
         * [v] 0.8
         * 0.9
     * 4-2 Batch Normalization
         * [v] yes
         * no
 * [unused] Choice 5. Convolution
     * 5-1. k: # of consecutive nodes in a receptive field
         * same as Ego-Convolution
     * 5-2. stride: #nodes to skip in 1-D list
         * stride=1
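
The [v] defaults listed under Model Settings can be collected in one small, validated record. This is a sketch only: `ModelSettings` and `ALLOWED` are hypothetical names, and the real values are set by flags and manual edits in the scripts.

```python
from dataclasses import dataclass

# Options copied from the Model Settings list; names are illustrative only.
ALLOWED = {
    "ego_conv_k": {9, 17},
    "fc_neurons": {128, 256},
    "drop_rate": {0.5, 0.8, 0.9},
}


@dataclass(frozen=True)
class ModelSettings:
    patchy_san_k: int = 10    # Choice 1-1, [v] k=10
    use_labels: bool = True   # Choice 1-2, False for visualization experiments
    ego_conv_k: int = 17      # Choice 2-1, [v] 17
    fc_layers: int = 2        # Choice 3-1
    fc_neurons: int = 128     # Choice 3-2, 256 for social network datasets
    drop_rate: float = 0.8    # Choice 4-1
    batch_norm: bool = True   # Choice 4-2

    def __post_init__(self):
        # Reject values that are not among the options listed in this document.
        for name, options in ALLOWED.items():
            value = getattr(self, name)
            if value not in options:
                raise ValueError(f"{name}={value!r} is not one of {sorted(options)}")


if __name__ == "__main__":
    print(ModelSettings())
    print(ModelSettings(fc_neurons=256, use_labels=False))
```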
         
### Visualization Settings
 * Choice 0. preprocess
     * [v] 0-1. must not relabel graphs [do not set `-s` for `preprocess.py`]
     * [v] 0-2. must not include node/edge labels [do not let Patchy-San include it]
 * Choice 1. freeze the embedding layers while retraining the visualization model?
     * [v] yes, freeze the Patchy-San and Ego-Convolution layers
 * Choice 2. Classification layers
     * [v] 1 layer => for REDDIT-BINARY, Asym-vs-Sym
     * [v] 2 layers => for Alkane-vs-Alcohol

### Plot Settings
 * Choice 1. threshold to choose important neighborhoods
     * 0.7
     * [v] 0.8
     * 0.9
 * Choice 2. threshold to identify edges as critical in a neighborhood
     * [v] > 0
     * 0.5
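
How the two thresholds interact can be sketched as follows. The sketch assumes that neighborhood importance scores are already normalized to [0, 1], that a neighborhood is kept when its score is at least the threshold, and that each kept neighborhood comes with per-edge weights. None of this is confirmed by the notes above, and `select_critical` is a hypothetical helper.

```python
def select_critical(neigh_scores, edge_weights, neigh_thr=0.8, edge_thr=0.0):
    """Pick important neighborhoods, then the critical edges inside them.

    neigh_scores: {center_node: importance score, assumed to lie in [0, 1]}
    edge_weights: {center_node: {(u, v): weight}} for each neighborhood
    """
    # Choice 1: keep neighborhoods whose score reaches the threshold (assumed >=).
    important = sorted(c for c, s in neigh_scores.items() if s >= neigh_thr)
    # Choice 2: inside kept neighborhoods, an edge is critical if its weight exceeds edge_thr.
    critical = set()
    for center in important:
        for (u, v), w in edge_weights.get(center, {}).items():
            if w > edge_thr:
                critical.add((min(u, v), max(u, v)))  # store undirected edges once
    return important, critical


if __name__ == "__main__":
    scores = {0: 0.95, 1: 0.40, 2: 0.80}
    weights = {
        0: {(0, 1): 0.3, (0, 2): 0.0},
        1: {(1, 0): 0.9},
        2: {(2, 0): -0.1, (3, 2): 0.2},
    }
    print(select_critical(scores, weights))
```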

### Graph Visualization Tool - [Gephi 0.9.0](https://gephi.org/users/download/)
 * Force Atlas 2 gives better separation of nodes => less manual node dragging
 * By default, the properties of nodes/edges are not loaded, except the edge size
     * node-color, node-size, and edge-color must be set manually
     * click "Appearance" -> node/edge -> "color"/"size" bar -> "Partition" by color/size
 * To generate figures for the paper, draw critical nodes/edges in `#696969`, normal nodes in `#FFFFFF`, and normal edges in `#000000`.
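
One way to make the "Partition" step easier is to export a `critical` column alongside each node and edge, so Gephi has an attribute to partition by. The sketch below builds node and edge tables as CSV text using the `Id` and `Source`/`Target` column names that Gephi's spreadsheet import recognizes. `gephi_tables` is a hypothetical helper, not the export code used for the paper, and the `color` column only records the target hex value for reference; colors still have to be applied in "Appearance".

```python
import csv
import io

# Color scheme from the note above.
CRITICAL, NORMAL_NODE, NORMAL_EDGE = "#696969", "#FFFFFF", "#000000"


def gephi_tables(nodes, edges, critical_nodes, critical_edges):
    """Return (nodes_csv, edges_csv) text with a 0/1 'critical' flag per row."""
    crit_nodes = set(critical_nodes)
    crit_edges = {frozenset(e) for e in critical_edges}  # undirected comparison

    node_buf = io.StringIO()
    writer = csv.writer(node_buf, lineterminator="\n")
    writer.writerow(["Id", "critical", "color"])
    for v in nodes:
        flag = v in crit_nodes
        writer.writerow([v, int(flag), CRITICAL if flag else NORMAL_NODE])

    edge_buf = io.StringIO()
    writer = csv.writer(edge_buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "critical", "color"])
    for u, v in edges:
        flag = frozenset((u, v)) in crit_edges
        writer.writerow([u, v, int(flag), CRITICAL if flag else NORMAL_EDGE])

    return node_buf.getvalue(), edge_buf.getvalue()


if __name__ == "__main__":
    nodes_csv, edges_csv = gephi_tables([0, 1, 2], [(0, 1), (1, 2)], {0, 1}, [(1, 0)])
    print(nodes_csv)
    print(edges_csv)
```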

In [None]:
"""Monitor the experiment(10-CV) performance"""
import os
import _pickle as cPickle
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import networkx as nx
import tensorflow as tf
#os.environ["CUDA_VISIBLE_DEVICES"]='0'
%matplotlib inline

def show_cv(name, size=None, fold=10):
    fname = '{}_cv_hist.pkl'.format(name)
    if os.path.exists(fname):
        # Use a context manager so the file handle is closed after loading.
        with open(fname, 'rb') as f:
            hist = cPickle.load(f)
        tacc = hist['test_acc']
        if len(tacc)==fold:
            tacc = np.mean(tacc)
            print('{}: test={:.3f}, val={:.3f} takes {:.3f}s'.format(name, tacc, np.mean(hist['cv_vacc']), np.mean(np.array(hist['time']))))
        else:
            print('{}: test={}, val={:.3f} takes {:.3f}s'.format(name, tacc, np.mean(hist['cv_vacc']), np.mean(np.array(hist['time']))))