This folder contains a minimal working example (MWE) for ACNet.
We first describe the core code for ACNet. Then, we describe how to run one (sub-)experiment for each of the three subsections in Section 5, which should be sufficient for most purposes.
Code for the other datasets is described in the final section.
Plotting of graphs was done by separately sampling and evaluating pdfs in ACNet, followed by calling pgfgen. These components are not included in this MWE.
The main files are `main.py` and `dirac_phi.py`.
`main.py`
This contains the code that builds the computational graph, Newton's root-finding method, the bisection method (used for root finding during conditional sampling), and the implementation of gradients of inverses.
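For intuition, here is a minimal sketch of these root-finding pieces for a strictly decreasing generator phi. The naming and simplifications are ours, not the exact code in `main.py`:

```python
import torch

def newton_inverse(phi, y, t0, n_iters=50):
    """Newton's method for solving phi(t) = y: t <- t - (phi(t) - y) / phi'(t)."""
    t = t0
    for _ in range(n_iters):
        t = t.detach().requires_grad_(True)
        f = phi(t) - y
        (df,) = torch.autograd.grad(f.sum(), t)  # elementwise phi'(t)
        t = t - f / df
    return t.detach()

def bisection_inverse(phi, y, lo, hi, n_iters=100):
    """Bisection on a bracketing interval [lo, hi], assuming phi is
    strictly decreasing (as an Archimedean generator is)."""
    for _ in range(n_iters):
        mid = 0.5 * (lo + hi)
        go_right = phi(mid) > y              # decreasing phi: root lies above mid
        lo = torch.where(go_right, mid, lo)
        hi = torch.where(go_right, hi, mid)
    return 0.5 * (lo + hi)

# Gradients of inverses follow from the inverse function theorem:
# if t = phi^{-1}(y), then dt/dy = 1 / phi'(t), which can be wired into
# autograd with a custom torch.autograd.Function.
```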
`dirac_phi.py`
This contains the code defining the specific structure of `phi_NN`.
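This README does not spell out the architecture, but for flavor, one standard way to parameterize a completely monotone generator is as a finite mixture of exponentials, i.e. the Laplace transform of a discrete ("Dirac") positive measure. This is a hedged sketch only; the actual `phi_NN` in `dirac_phi.py` may differ:

```python
import torch
import torch.nn as nn

class MixtureOfExponentialsPhi(nn.Module):
    """phi(t) = sum_i w_i * exp(-s_i * t), with w_i >= 0, sum_i w_i = 1, s_i > 0.
    Any such phi is completely monotone with phi(0) = 1, making it a valid
    Archimedean generator. (Illustrative only; dirac_phi.py may differ.)"""
    def __init__(self, n_components=10):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_components))  # -> weights w_i
        self.log_s = nn.Parameter(torch.randn(n_components))   # -> rates s_i

    def forward(self, t):
        w = torch.softmax(self.logits, dim=0)  # nonnegative, sums to 1
        s = torch.exp(self.log_s)              # strictly positive
        # t: (batch,) -> (batch, 1); broadcast against (n_components,)
        return (w * torch.exp(-s * t.unsqueeze(-1))).sum(-1)
```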
Dependencies:
- pytorch
- numpy
- matplotlib
- scipy
- optional: sacred, used for monitoring and keeping track of experiments
- optional: scikit-learn, used only to download the Boston housing dataset
Other files are:
`phi_listing.py`
Contains definitions of phi for the other commonly used copulas. Used to generate synthetic data.
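For instance, in the Laplace-transform convention the Clayton generator is phi(t) = (1 + t)^(-1/theta). A hedged sketch of what one listing entry could look like (function names are ours):

```python
def clayton_phi(t, theta):
    """Clayton generator in the Laplace-transform convention:
    phi(t) = (1 + t)^(-1/theta), theta > 0. The bivariate copula is then
    C(u, v) = phi(phi^{-1}(u) + phi^{-1}(v)) = (u^-theta + v^-theta - 1)^(-1/theta)."""
    return (1.0 + t) ** (-1.0 / theta)

def clayton_phi_inv(y, theta):
    """Inverse generator: phi^{-1}(y) = y^(-theta) - 1."""
    return y ** (-theta) - 1.0
```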
The rest of the files contain boilerplate code and helper functions.
We assume that all dependencies are installed correctly.
To generate synthetic data drawn from the Clayton distribution, first run:
python -m gen_scripts.clayton -F gen_clayton
This will generate synthetic data in the pickle file `claytonX.p`. If this is your first run, X=1.
Next, navigate to `train_scripts/clayton.py`. In the `cfg()` function, modify `data_name` accordingly; if this is your first run, the default should be fine (a sketch of this config appears below). Then, run:
python -m train_scripts.clayton -F learn_clayton
The experiment should run, albeit with slightly different values from ours due to small differences in randomness. The network should converge after around 10k (training) iterations.
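For reference, `cfg()` is a sacred config function; it might look roughly like the following, where only `data_name` is confirmed by this README and the remaining fields are illustrative placeholders:

```python
from sacred import Experiment

ex = Experiment('learn_clayton')

@ex.config
def cfg():
    # Point this at the pickle produced by gen_scripts.clayton.
    data_name = 'clayton1.p'  # the only field named in this README
    # Purely illustrative placeholders:
    seed = 0
    n_iters = 10000
```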
At fixed intervals, samples will be drawn from the learned network and plotted in the `/sample_figs` folder.
The Boston housing dataset will be downloaded automatically using scikit-learn. Simply run
python -m train_scripts.boston.train -F boston_housing
to run the experiment. Sampled points will be found in the appropriate subfolder of `/sample_figs`.
Note that since the dataset is fairly small, results may vary significantly between runs. You may change the train/test split in `train_scripts/boston/train.py` directly to obtain different splits; in our experiments we used 5 different seeds and averaged the results. Typically, convergence occurs at around 10k epochs. Because the dataset is so small, in some settings the test loss will be better than the training loss; this is not a bug.
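As a hedged sketch of the kind of seeded split described above (our own code, not the contents of `train_scripts/boston/train.py`; note that `load_boston` was removed in scikit-learn 1.2, so an older version is required):

```python
import numpy as np
from sklearn.datasets import load_boston  # removed in scikit-learn >= 1.2

def split_boston(seed, train_frac=0.8):
    """Shuffle under a fixed seed, then split into train/test sets.
    Averaging results over several seeds (we used 5) smooths the variance
    caused by the small dataset. train_frac is an illustrative default."""
    X, y = load_boston(return_X_y=True)
    rng = np.random.RandomState(seed)
    perm = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    tr, te = perm[:n_train], perm[n_train:]
    return (X[tr], y[tr]), (X[te], y[te])
```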
Generate the data from the Clayton copula as shown in the first section. Simply run
python -m train_scripts.clayton_noisy -F learn_clayton_noisy
As before, samples from the learned distribution will be periodically saved in `/sample_figs`.
Note that the training loss reported is a log probability, not a log likelihood. The test loss, however, is based on log likelihoods, in order to facilitate comparison with the non-noisy case (Experiment 1).
To change the magnitude of the noise, modify the variable `width_noise` in `train_scripts/clayton_noisy.py`. This corresponds to \lambda in the paper.
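For intuition on the log-probability objective: with noise of half-width \lambda, each observation (u, v) can be treated as a rectangle, whose probability under the copula follows from the CDF by inclusion-exclusion. A minimal sketch, assuming a bivariate copula CDF `C(u, v)` on torch tensors (names are ours, not the script's):

```python
def rectangle_log_prob(C, u, v, lam, eps=1e-12):
    """log P[(U, V) in [u-lam, u+lam] x [v-lam, v+lam]] by inclusion-exclusion:
    P = C(u2, v2) - C(u1, v2) - C(u2, v1) + C(u1, v1)."""
    u1, u2 = (u - lam).clamp(0.0, 1.0), (u + lam).clamp(0.0, 1.0)
    v1, v2 = (v - lam).clamp(0.0, 1.0), (v + lam).clamp(0.0, 1.0)
    p = C(u2, v2) - C(u1, v2) - C(u2, v1) + C(u1, v1)
    return (p + eps).log()  # eps guards against log(0)
```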
The same steps may be followed for the other synthetic copulas (replace `clayton` with `frank` or `joe` where appropriate).
The INTC-MSFT dataset may be obtained here. The data was analyzed in Chapter 5 of McNeil, Frey and Embrechts (2005).
For convenience, processed data is included in the folder `/data/rdj`. Training using ACNet may be done by running:
python -m train_scripts.rdj.train -F learn_rdj
The script `train_scripts/rdj/train.py` contains the same network structure used before. The seeds used are also included there to aid reproduction. You can adjust the proportion of randomly generated noise datapoints in the configuration.
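A hedged sketch of what mixing in such noise datapoints might look like (illustrative only; the actual mechanism lives in the script's configuration):

```python
import numpy as np

def mix_in_noise(data, noise_prop, seed=0):
    """Append uniform pseudo-observations so that a fraction `noise_prop`
    of the combined dataset is noise. Illustrative, not the script's code."""
    rng = np.random.RandomState(seed)
    # Solve n_noise / (len(data) + n_noise) = noise_prop for n_noise.
    n_noise = int(noise_prop / (1.0 - noise_prop) * len(data))
    noise = rng.uniform(size=(n_noise, data.shape[1]))
    return np.concatenate([data, noise], axis=0)
```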
To train other parametric copulas using the dataset, run
python -m train_scripts.rdj.train_with_frank -F learn_rdj_with_frank
Replace `frank` with `gumbel` or `clayton` as appropriate.
The data was obtained from Yahoo Finance and is included in the `/data` folder.
Training is done in the same manner as before:
python -m train_scripts.stocks.train -F learn_stocks
and training using other copulas is done by
python -m train_scripts.stocks.train_with_frank -F learn_stocks_with_frank
where `frank` may be replaced by `clayton` or `gumbel`.
The gas dataset may be downloaded here.
You should download the data and organize it so that the files `batchX.dat` lie in `/data/gas`.
Training is done in the same way; note that here we learn with d=3:
python -m train_scripts.gas.train -F learn_gas
and
python -m train_scripts.gas.train_with_frank -F learn_gas_with_frank
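For reference on the d=3 case: an Archimedean copula extends to d dimensions by summing d inverse-generator terms, C(u_1, ..., u_d) = phi(phi^{-1}(u_1) + ... + phi^{-1}(u_d)). A minimal sketch, assuming this Archimedean form and reusing the hypothetical `clayton_phi`/`clayton_phi_inv` from the listing sketch above:

```python
def archimedean_cdf(phi, phi_inv, u):
    """C(u_1, ..., u_d) = phi(sum_i phi^{-1}(u_i)); `u` has shape (batch, d).
    For the gas experiment, d = 3."""
    return phi(phi_inv(u).sum(-1))

# e.g., with the Clayton generator sketched earlier (theta = 2.0 is arbitrary):
# cdf = archimedean_cdf(lambda t: clayton_phi(t, 2.0),
#                       lambda y: clayton_phi_inv(y, 2.0), u)
```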