Why running SGD with multiple hosts return segfault? #39

Closed
WyjAloneSmile opened this issue Jun 24, 2019 · 6 comments
Comments

@WyjAloneSmile

WyjAloneSmile commented Jun 24, 2019

I have a bipartite graph with 11 nodes and 10 edges. The edge list of this graph is:
1 6 1
2 6 1
2 7 1
3 7 1
3 8 1
4 8 1
4 9 1
5 9 1
5 10 1
0 10 1
I converted it to .gr with -edgelist2gr.
When I ran SGD on one host, the program ran successfully, but when I ran it on two hosts, it segfaulted:
[0] InitializeGraph::go called
[1] InitializeGraph::go called
yhrun: error: cn7420: task 1: Segmentation fault

The segfault occurs when the program runs
_graph.sync<writeSource, readAny, Reduce_set_latent_vector>("InitializeGraph");
How can I solve it?
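
For anyone reproducing this, the edge list above can be sanity-checked before conversion. The following is a minimal Python sketch, not part of the project: the `parse_edges` helper is hypothetical, and it only confirms the counts stated in the report (11 nodes, 10 edges) and that no node appears as both a source and a destination.

```python
# Edge list copied from the report: "source destination weight" per line.
EDGE_LIST = """\
1 6 1
2 6 1
2 7 1
3 7 1
3 8 1
4 8 1
4 9 1
5 9 1
5 10 1
0 10 1
"""


def parse_edges(text):
    """Hypothetical helper: parse whitespace-separated edge lines."""
    edges = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        src, dst, weight = line.split()
        edges.append((int(src), int(dst), float(weight)))
    return edges


edges = parse_edges(EDGE_LIST)
sources = {src for src, _, _ in edges}
targets = {dst for _, dst, _ in edges}
nodes = sources | targets

# Node count, edge count, and whether the two sides are disjoint.
print(len(nodes), len(edges), sources.isdisjoint(targets))  # 11 10 True
```

The check says nothing about the segfault itself; it only rules out a malformed input file.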

@WyjAloneSmile WyjAloneSmile changed the title How to change bipartite graph to gr? How to convert bipartite graph to gr? Jun 24, 2019
@WyjAloneSmile WyjAloneSmile changed the title How to convert bipartite graph to gr? Why running SGD with multiple hosts return segfault? Jun 24, 2019
@gurbinder533
Contributor

Hello,
We have been able to reproduce this error and are working on the fix. We will push the changes as soon as possible and let you know.

@Yang-YiFan

Yang-YiFan commented Jul 17, 2019

I've also encountered a similar issue in matrix completion. The program segfaults on both gr and sgr format input, even with one thread.
Any idea why I can't run SGD on one thread?

@l-hoang
Member

l-hoang commented Oct 18, 2019

Hello.

@WyjAloneSmile Could you give me the command you used to convert your edgelist to the gr format, as well as the command line you used to run sgd?

@Yang-YiFan Please open a separate issue with more details, such as the command line and graph used, and I can take a look at your problem.

Thank you.

@WyjAloneSmile
Author

convert edgelist to gr: ./graph-convert input_graph output_graph -edgelist2gr -edgeType=int32
run SGD: yhrun -N 2 -n 2 -c 1 ./sgd input_gr -runs=1 -t=1 -DECAY_RATE=0.5 -LAMBDA=0.001 -LEARNING_RATE=0.001 -maxIterations=1000

@l-hoang
Member

l-hoang commented Oct 21, 2019

Hello @WyjAloneSmile.

Please try using "-edgeType=float64" instead of "-edgeType=int32". The distributed SGD graph expects a double for its datatype.

I am not able to reproduce the exact error you are seeing, but using int32 does cause the program to hang on my end. Using float64 causes it to run to completion correctly.

Let me know how it goes.

Thanks,
Loc
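
To illustrate why the edge type matters, here is a minimal Python sketch of the size mismatch between int32 and float64 edge data. It is an illustration under assumptions, not the project's code: it assumes the edge weights are stored as a contiguous little-endian array, and it does not model the actual .gr file layout or how sgd reads it.

```python
import struct

# The ten edge weights from the edge list in this thread (all 1).
weights = [1] * 10

# Payload written for each edge type (little-endian is an assumption).
as_int32 = struct.pack("<10i", *weights)
as_float64 = struct.pack("<10d", *[float(w) for w in weights])

# int32 uses 4 bytes per edge, float64 uses 8.
print(len(as_int32), len(as_float64))  # 40 80

# A reader that expects doubles finds only five 8-byte values in the
# int32 payload, and none of them decodes to the intended weight 1.0.
misread = struct.unpack("<5d", as_int32)
print(len(misread), any(v == 1.0 for v in misread))  # 5 False
```

A consumer that expects ten doubles would either run past the end of the int32 data or work with meaningless values, which is consistent with the hang or crash reported here, though the exact failure mode depends on the reader.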

@l-hoang
Member

l-hoang commented Dec 4, 2019

I'll be closing this issue unless there are still problems. In that case, please reopen it.
