Running on Reddit dataset is extremely slow #14

Open
cai-lw opened this issue Apr 2, 2019 · 3 comments

Comments

@cai-lw
Contributor

cai-lw commented Apr 2, 2019

I downloaded the processed Reddit dataset from #8 (comment) and ran train_batch_multiRank_inductive_reddit_Mixlayers_sampleA.py with the default parameters. A single epoch takes about 10 minutes, whereas the paper reports 638.6 seconds for the WHOLE training process. That makes my run roughly 200x slower than your reported speed.

I am running on an AWS m5.2xlarge instance with the same CPU spec as your machine (8 vCPUs = 4 cores, 8 threads, 2.5GHz). All dependencies were installed with pip.

@matenure
Owner

matenure commented Apr 4, 2019

The default call, main(None), does not do any sampling.
Change the "None" to 100 or 200.
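
For readers landing here, a minimal sketch of why `None` is so much slower, assuming the argument to `main` is the number of nodes sampled per layer (the thread does not spell this out). `sample_layer_nodes` is a hypothetical helper written for illustration, not a function from this repository, and uniform sampling is used only as a stand-in for whatever sampling scheme the script applies.

```python
import random


def sample_layer_nodes(num_nodes, rank, rng):
    """Pick the node indices one layer will aggregate over.

    Hypothetical helper, not from the repo. Assumption: rank=None means
    "no sampling" (every node is used), and an integer rank draws that
    many distinct nodes uniformly at random.
    """
    if rank is None or rank >= num_nodes:
        return list(range(num_nodes))
    return rng.sample(range(num_nodes), rank)


rng = random.Random(0)
full = sample_layer_nodes(10_000, None, rng)    # analogous to main(None)
sampled = sample_layer_nodes(10_000, 100, rng)  # analogous to main(100)
print(len(full), len(sampled))  # 10000 100
```

With `None` every batch touches all nodes in each layer, while a value such as 100 or 200 caps the per-layer work at that many nodes, which is where the speedup would come from under this assumption.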

@cai-lw
Contributor Author

cai-lw commented Apr 4, 2019

@matenure It works. Thank you.
Could you change the default behavior of this code, or explain in the README how to change it? The README calls it "the final model", but it isn't, since it doesn't do any sampling.

@matenure
Owner

matenure commented Apr 8, 2019

Thanks. Your update has been merged.
