How can I handle a graph with more than 50k nodes? #13

YunseobShin · 2018-12-19T11:46:52Z

Traceback (most recent call last):
  File "/home/lib/python3.5/site-packages/joblib/externals/loky/backend/queues.py", line 157, in _feed
    send_bytes(obj)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

Hi. I really appreciate your library. I get even more accurate results with your code than with the author's.

However, I have a problem: I can't learn embeddings for graphs with more than about 50,000 nodes.

I suspect the joblib module used for "parallel_generate_walks" has a limit for large datasets.

Is the code limited this way by design?

shuttle1987 · 2019-01-14T13:57:30Z

Once this bug is fixed upstream in Python (https://bugs.python.org/issue17560), you may find the problem goes away without having to modify your code. However, at such large sizes there may be better ways of handling the shared state in the first place.
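
For context, the struct.error in the traceback comes from the line `header = struct.pack("!i", n)`, which packs the message length as a signed 32-bit integer. A single pickled payload of 2 GiB or more therefore cannot be sent between processes on that Python version. The following is only a minimal sketch of that limit using the standard library; the helper names `pack_length_header` and `fits_in_header` are hypothetical and are not part of multiprocessing or joblib.

```python
import struct

# Largest value a signed 32-bit integer can hold (the bound shown in the traceback).
INT32_MAX = 2**31 - 1


def pack_length_header(n):
    """Pack a payload length the same way the traceback line does: struct.pack("!i", n)."""
    return struct.pack("!i", n)


def fits_in_header(n):
    """Return True if a payload of n bytes has a length that fits the 4-byte header."""
    try:
        pack_length_header(n)
        return True
    except struct.error:
        # Raised when n is outside -2147483648 <= n <= 2147483647.
        return False
```

With these helpers, `fits_in_header(INT32_MAX)` succeeds while `fits_in_header(INT32_MAX + 1)` hits the same struct.error as in the report, which is consistent with the failure appearing only once the shared data grows very large.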

eliorc · 2019-01-15T09:37:40Z

In the latest version, it is now possible to use memmapping during parallel execution via the temp_folder parameter of the Node2Vec constructor. Just pass it a location with enough free space and it should work.
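
As a rough sketch of the suggestion above, assuming the node2vec and networkx packages are installed: the helper names `has_enough_space` and `fit_with_memmapping`, and the hyperparameter values, are illustrative choices and not something taken from the library or this thread. Only `temp_folder` is the parameter mentioned above.

```python
import os
import shutil


def has_enough_space(folder, required_bytes):
    """Return True if the filesystem holding `folder` has at least `required_bytes` free."""
    return shutil.disk_usage(folder).free >= required_bytes


def fit_with_memmapping(graph, temp_folder, workers=4):
    """Generate walks in parallel, sharing data through `temp_folder`, then fit a model.

    `graph` is expected to be a networkx graph.
    """
    # Imported lazily so this file still loads where node2vec is not installed.
    from node2vec import Node2Vec

    os.makedirs(temp_folder, exist_ok=True)
    node2vec = Node2Vec(
        graph,
        dimensions=64,       # illustrative values, tune for your graph
        walk_length=30,
        num_walks=10,
        workers=workers,
        temp_folder=temp_folder,  # location joblib uses for memmapped data
    )
    return node2vec.fit(window=10, min_count=1)
```

A call might look like `fit_with_memmapping(graph, "/path/with/space")`, where the path is a placeholder for any directory on a disk with plenty of room; checking it first with `has_enough_space` is optional.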

YunseobShin closed this as completed Dec 19, 2018
