This code accompanies the RankFromSets SDSS submission.
To view the above visualization in a browser, please download this HTML file.
These experiments were conducted on a Red Hat Linux cluster with NVIDIA P100 GPUs.
Create the Python environment with the Anaconda package manager:
conda env create -f environment.yml
The {train, valid, test}.tsv files are observations of user-item interactions. The item_attributes_csr.npz file is a compressed sparse row (CSR) format matrix of shape (n_items, n_attributes). For example, if the data is documents in a bag-of-words format, each row is a document and the attributes are the words.
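As a rough illustration of the item attribute format, the sketch below builds a toy CSR matrix, saves it, and reads it back. It assumes the .npz file was written with scipy.sparse.save_npz, which the repository does not confirm here; the toy values and the temporary path are made up for the example.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

# Toy stand-in for item_attributes_csr.npz (hypothetical values):
# 3 items (rows) by 4 attributes (columns), e.g. word counts per document.
toy = sp.csr_matrix(np.array([
    [1, 0, 2, 0],
    [0, 0, 0, 3],
    [0, 1, 0, 0],
]))

# Assumption: the real file is in the format produced by scipy.sparse.save_npz.
path = os.path.join(tempfile.mkdtemp(), "item_attributes_csr.npz")
sp.save_npz(path, toy)

item_attributes = sp.load_npz(path)
n_items, n_attributes = item_attributes.shape

# Attributes of item 0: column indices of the nonzero entries and their values.
row0 = item_attributes[0]
attribute_ids = row0.indices
attribute_counts = row0.data
print(n_items, n_attributes, attribute_ids, attribute_counts)
```

With real data, only the load_npz call and the row lookup would be needed; each row then gives the set of attributes for one item.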
We omit the raw arXiv data and food tracking data because they contain private user data.
This follows the reproducibility supplement example of a square kernel.
Generate data in /tmp/dat/simulation_%d, where %d is the replication number (1 to 30):
export DAT=/tmp
python build_simulation_dataset.py
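To check that all replications were written, a small sketch like the following can list the expected directories. It assumes the directories sit under $DAT/dat/simulation_%d, as the path above suggests; the variable names are illustrative and not part of the repository.

```python
import os

# Assumption: DAT is the root exported above, with data under DAT/dat/simulation_%d.
dat = os.environ.get("DAT", "/tmp")

# One directory per replication, numbered 1 to 30 inclusive.
replication_dirs = [
    os.path.join(dat, "dat", "simulation_%d" % i) for i in range(1, 31)
]

# Report any replication directory that does not exist yet.
missing = [d for d in replication_dirs if not os.path.isdir(d)]
print("%d of %d replication directories missing" % (len(missing), len(replication_dirs)))
```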
Launch the best-performing parameter settings with the SLURM manager for the inner product, deep, and residual regression functions:
PYTHONPATH=. python experiment/arxiv/grid.py
We find that large batch sizes significantly improve performance. See config.yml for the best-performing hyperparameters.