There are two types of experiments:
- comparing the MSE of the two-stage polynomial interpolation estimator under different clustering strategies for a particular network; see Figures 4, 11, 12, and 14
- comparing the bias, variance, or MSE of the two-stage polynomial interpolation estimator against other estimators for the Total Treatment Effect (TTE); see Figures 2, 8, 9, 10, and 13

The files `run_compare_clusterings_experiment.py` and `run_compare_estimators_experiment.py` are to be used with the real-world networks.
The Lattice network has its own folder and its own script for running the comparing clusterings experiment (see `compare_clusters.py` in the Lattice folder). Note that we have not implemented a script to run the "compare estimators" experiment for a Lattice network.


# Comparing Clusterings Experiment

The `run_compare_clusterings_experiment.py` script requires two files, each unique to a real-world network: a configuration file called `compare_clusterings.json` and a data file called `data.pkl`. The script generates data and saves it into a file called `compare_clusterings.pkl` in the Experiments folder corresponding to the network.

For information on what the `data.pkl` file contains, please refer to the Jupyter notebook titled "preparing_network_data."

The JSON file `compare_clusterings.json` has the following format:
```
{ 
    "name" : "compare_clusterings", 
    "network" : "Amazon",
    "input" : "Network/data.pkl",
    "vary" : {
        "nc" : [250],
        "beta" : [2,3]
    },
    "fix" : {
        "p" : 0.1
    },
    "replications" : 1000
}
```

- The "name" parameter says which experiment this file is for.
- The "network" parameter says which network this file is for.
- The "input" parameter holds the path to the `data.pkl` file.
- The "vary" parameter contains a dictionary of parameters the experiment varies over. For example, to run experiments with $\beta=1$ and with $50$, $100$, and $250$ clusters, you would have:
    - "nc" : [50, 100, 250]
    - "beta" : [1]
- The "fix" parameter contains a dictionary of values for parameters that remain fixed throughout the experiment. For example, we may want the overall treatment budget $p$ to be fixed at the value 0.1.
- The "replications" parameter is an integer giving how many replications of the randomized design to run.
    - For example, the expected value is estimated by averaging the results over this number of replications, and the experimental standard deviation is the square root of the experimental variance over all replications.
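
To make the roles of "vary", "fix", and "replications" concrete, here is a minimal sketch of how such a configuration could be expanded and summarized. The helper names `expand_grid` and `summarize` are hypothetical and are not part of the experiment scripts, which may iterate over parameters differently. The sketch also assumes the population form of the standard deviation; the scripts may use the sample form instead.

```python
import itertools
import statistics


def expand_grid(vary, fix):
    """Yield one parameter dict per combination of varied values, merged with the fixed values.

    Hypothetical helper: illustrates the config semantics, not the scripts' actual loop.
    """
    names = list(vary)
    for values in itertools.product(*(vary[name] for name in names)):
        params = dict(fix)                 # start from the fixed parameters
        params.update(zip(names, values))  # add one value per varied parameter
        yield params


def summarize(estimates):
    """Return (mean, standard deviation) of per-replication estimates.

    Assumption: population standard deviation (divide by the number of replications).
    """
    return statistics.fmean(estimates), statistics.pstdev(estimates)


config = {
    "vary": {"nc": [50, 100, 250], "beta": [1]},
    "fix": {"p": 0.1},
    "replications": 1000,
}

grid = list(expand_grid(config["vary"], config["fix"]))
for params in grid:
    print(params)
```

With the values above, the grid has three entries, one per number of clusters, each carrying `beta = 1` and `p = 0.1`.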

# Demo: Amazon Network

For the purposes of demonstration, files that include "DEMO" in their filename are distinct from the files used to generate data/figures/tables for the paper.

In [None]:
import json
import pickle
import experiment_python_scripts.run_compare_clusterings_experiment as cluster_exp

# Path to the .json file describing the experiment
my_path = "Amazon/Experiments/DEMO/compare_clusterings_DEMO.json"
with open(my_path, "rb") as jf:
    j = json.load(jf)

exp_name = j["name"]
network_folder = j["network"]
in_file = j["input"]

print("Loading Graph")

# Load the pickled (G, Cls) pair from the network's data file
with open(network_folder + "/" + in_file, "rb") as nf:
    G, Cls = pickle.load(nf)

fixed = j["fix"]
varied = j["vary"]
r = j["replications"]

data = cluster_exp.run_experiment(G, Cls, fixed, varied, r)

# Save the results next to the DEMO configuration file
out_file = network_folder + "/Experiments/DEMO/" + exp_name + "_DEMO.pkl"
print(f"Writing output to {out_file}")
with open(out_file, "wb") as of:
    pickle.dump(data, of)

Loading Graph
n: 19828
Preparing Clusterings with 250 Clusters

beta = 2
Clustering: feature


[Parallel(n_jobs=-2)]: Using backend LokyBackend with 7 concurrent workers.
[Parallel(n_jobs=-2)]: Done   3 out of  10 | elapsed:   11.3s remaining:   26.5s
[Parallel(n_jobs=-2)]: Done   5 out of  10 | elapsed:   11.5s remaining:   11.5s
[Parallel(n_jobs=-2)]: Done   7 out of  10 | elapsed:   11.5s remaining:    4.9s
[Parallel(n_jobs=-2)]: Done  10 out of  10 | elapsed:   15.7s finished


Clustering: graph


[Parallel(n_jobs=-2)]: Using backend LokyBackend with 7 concurrent workers.
[Parallel(n_jobs=-2)]: Done   3 out of  10 | elapsed:    9.5s remaining:   22.2s
[Parallel(n_jobs=-2)]: Done   5 out of  10 | elapsed:    9.6s remaining:    9.6s
[Parallel(n_jobs=-2)]: Done   7 out of  10 | elapsed:    9.6s remaining:    4.1s
[Parallel(n_jobs=-2)]: Done  10 out of  10 | elapsed:   13.5s finished


Clustering: none


[Parallel(n_jobs=-2)]: Using backend LokyBackend with 7 concurrent workers.
[Parallel(n_jobs=-2)]: Done   3 out of  10 | elapsed:    9.4s remaining:   22.0s
[Parallel(n_jobs=-2)]: Done   5 out of  10 | elapsed:    9.5s remaining:    9.5s
[Parallel(n_jobs=-2)]: Done   7 out of  10 | elapsed:    9.5s remaining:    4.1s


Writing output to Amazon/Experiments/compare_clusterings_DEMO.pkl


[Parallel(n_jobs=-2)]: Done  10 out of  10 | elapsed:   13.4s finished


This created a file called `compare_clusterings_DEMO.pkl` in the `Experiments/DEMO` subfolder of the Amazon folder. The file contains data that can be used to plot figures such as Figures 4, 11, 12, and 14 in the paper. To see plotting demos, refer to the Jupyter notebook "figures_and_tables.ipynb".
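
Before plotting, it can help to confirm that the output file reads back correctly. The sketch below is a minimal example; the helpers `load_results` and `describe` are hypothetical, and the layout of the saved object is not documented here, so the code only reports its type and top-level size or keys. The path is the one built by `out_file` in the demo cell.

```python
import os
import pickle


def load_results(path):
    """Read back an object that was written with pickle.dump."""
    with open(path, "rb") as f:
        return pickle.load(f)


def describe(obj):
    """Return a one-line summary of a loaded object without assuming its layout."""
    if isinstance(obj, dict):
        return f"dict with keys {list(obj)}"
    if isinstance(obj, (list, tuple)):
        return f"{type(obj).__name__} of length {len(obj)}"
    return type(obj).__name__


# Path assumed from the demo cell's out_file; skipped if the demo has not been run.
demo_path = "Amazon/Experiments/DEMO/compare_clusterings_DEMO.pkl"
if os.path.exists(demo_path):
    print(describe(load_results(demo_path)))
```

Only unpickle files you generated yourself or otherwise trust, since loading a pickle can execute arbitrary code.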

The process for the other real-world networks (BlogCatalog and Email) is the same; just pay attention to file names and directories.
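
As a rough illustration of what changes between networks, the sketch below builds the configuration and output paths for each one. It assumes, without confirmation, that BlogCatalog and Email mirror the Amazon folder layout used in the demo; `demo_paths` is a hypothetical helper, so check the actual file names in each folder before relying on it.

```python
NETWORKS = ["Amazon", "BlogCatalog", "Email"]


def demo_paths(network, experiment="compare_clusterings"):
    """Return (config_path, output_path) for a network's DEMO experiment.

    Assumption: every network folder follows the Amazon demo layout,
    <network>/Experiments/DEMO/<experiment>_DEMO.{json,pkl}.
    """
    base = f"{network}/Experiments/DEMO/{experiment}_DEMO"
    return base + ".json", base + ".pkl"


for network in NETWORKS:
    config_path, output_path = demo_paths(network)
    print(f"{network}: read {config_path}, write {output_path}")
```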

# Comparing Estimators Experiment

For the purposes of demonstration, files that include "DEMO" in their filename are distinct from the files used to generate data/figures/tables for the paper.

In [None]:
import json
import pickle
import experiment_python_scripts.run_compare_estimators_experiment as estimator_exp

# Path to the .json file describing the experiment
my_path = "Amazon/Experiments/DEMO/compare_estimators_DEMO.json"
with open(my_path, "rb") as jf:
    j = json.load(jf)

exp_name = j["name"]
network_folder = j["network"]
in_file = j["input"]

print("Loading Graph")

# Load the pickled (G, Cls) pair from the network's data file
with open(network_folder + "/" + in_file, "rb") as nf:
    G, Cls = pickle.load(nf)

fixed = j["fix"]
varied = j["vary"]
r = j["replications"]
gamma = j["gamma"]  # extra key read only by the compare estimators experiment

data = estimator_exp.run_experiment(G, Cls, fixed, varied, r, gamma)

# Save the results next to the DEMO configuration file
out_file = network_folder + "/Experiments/DEMO/" + exp_name + "_DEMO.pkl"
print(f"Writing output to {out_file}")
with open(out_file, "wb") as of:
    pickle.dump(data, of)

Loading Graph
beta = 2
nc = 250, q = 0.5


[Parallel(n_jobs=-2)]: Using backend LokyBackend with 7 concurrent workers.
[Parallel(n_jobs=-2)]: Done   5 out of  16 | elapsed:  3.5min remaining:  7.8min
[Parallel(n_jobs=-2)]: Done   7 out of  16 | elapsed:  3.5min remaining:  4.5min
[Parallel(n_jobs=-2)]: Done   9 out of  16 | elapsed:  6.1min remaining:  4.7min
[Parallel(n_jobs=-2)]: Done  11 out of  16 | elapsed:  6.4min remaining:  2.9min
[Parallel(n_jobs=-2)]: Done  13 out of  16 | elapsed:  6.7min remaining:  1.5min


Writing output to Amazon/Experiments/compare_estimators_DEMO.pkl


[Parallel(n_jobs=-2)]: Done  16 out of  16 | elapsed:  7.9min finished


This created a file called `compare_estimators_DEMO.pkl` in the `Experiments/DEMO` subfolder of the Amazon folder. The file contains data that can be used to plot figures such as Figures 2, 8, 9, 10, and 13 in the paper. To see plotting demos, refer to the Jupyter notebook "figures_and_tables.ipynb".
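
The estimators demo reads one key, "gamma", in addition to the keys documented for the clusterings configuration. Because a long run fails late if a key is missing, a quick check before starting can save time. The sketch below is one possible check; `missing_keys` is a hypothetical helper, and the example dictionary reuses the clusterings configuration shown earlier rather than a real estimators configuration.

```python
# Keys read by both demo cells in this notebook
COMMON_KEYS = {"name", "network", "input", "vary", "fix", "replications"}


def missing_keys(config, extra=()):
    """Return the sorted list of required keys that are absent from config."""
    required = COMMON_KEYS | set(extra)
    return sorted(required - set(config))


# The clusterings configuration from this document, which has no "gamma" entry
example = {
    "name": "compare_clusterings",
    "network": "Amazon",
    "input": "Network/data.pkl",
    "vary": {"nc": [250], "beta": [2, 3]},
    "fix": {"p": 0.1},
    "replications": 1000,
}

print(missing_keys(example))                    # check for the clusterings experiment
print(missing_keys(example, extra=("gamma",)))  # check for the estimators experiment
```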