Running on Reddit dataset is extremely slow #14

cai-lw · 2019-04-02T06:56:16Z

I downloaded the processed Reddit dataset from #8 (comment), and then ran train_batch_multiRank_inductive_reddit_Mixlayers_sampleA.py with default parameters. It takes about 10 minutes for a single epoch. However, the paper reported 638.6 seconds for the WHOLE training process. I am ~200x slower than your reported speed.

I am running on an AWS m5.2xlarge instance with the same CPU spec as your machine (8 vCPUs = 4 cores, 8 threads, 2.5GHz). All dependencies were installed with pip.

matenure · 2019-04-04T00:32:05Z

The default call, main(None), does not do any sampling.
Change the "None" to 100 or 200.
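
To illustrate why that argument matters so much for speed, here is a minimal sketch. It is not the repository's code: `pick_support_nodes`, its parameters, and the graph size are hypothetical and only mimic the behavior described above, where `None` means "no sampling" and an integer means "keep this many sampled nodes".

```python
import random

def pick_support_nodes(node_ids, rank, seed=0):
    """Hypothetical helper: choose which nodes one layer aggregates over.

    rank=None mirrors the default described above (no sampling), so every
    node is used; an integer rank keeps only that many sampled nodes.
    """
    if rank is None or rank >= len(node_ids):
        return list(node_ids)
    # Uniform sampling without replacement, seeded for reproducibility.
    return random.Random(seed).sample(node_ids, rank)

nodes = list(range(10_000))  # made-up graph size, for illustration only
full = pick_support_nodes(nodes, None)    # analogous to main(None)
sampled = pick_support_nodes(nodes, 100)  # analogous to main(100)
print(len(full), len(sampled))  # 10000 100
```

With `None`, each layer touches every node, so the per-batch cost grows with the whole graph; with 100 or 200, it is bounded by the sample size.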

cai-lw · 2019-04-04T00:39:26Z

@matenure It works. Thank you.
Could you change the default behavior of this code, or explain in the README how to change it? The README says it is "the final model", but it isn't, since it doesn't do any sampling.

matenure · 2019-04-08T05:07:20Z

Thanks. Your update has been merged.

cai-lw mentioned this issue Apr 4, 2019

Add note about sampling of the Reddit model #15

Merged
