Big vertex fixes #137

cerquide · 2014-04-21T00:52:48Z

As reported here, I am interested in extending graphlab so that it can be used with vertex ids larger than the standard ones. This pull request does not change vertex_id_type itself; it makes the minor code changes needed so that anyone interested can use a larger vertex id space.

Description of the changes

The current version of graphlab is not prepared for larger vertex id types. Using a larger vertex id type requires configuring the project to compile with C++11. Slight modifications have been made to the following files:

graphlab/graph/graph_basic_types.hpp: The type for local vertex id should be decoupled from the larger type (currently they are enforced to be the same type). We do this by introducing a new intermediate type called standard_vertex_id_type. For regular users this is just an implementation detail.
graphlab/engine/distributed_chandy_misra.hpp: The computation of sequentialization keys was missing a cast to unsigned char.
graphlab/graph/distributed_graph.hpp and graphlab/graph/ingress/distributed_ingress_base.hpp: num_in_edges and num_out_edges should have type size_t.
graphlab/graph/graph_hash.hpp: three casts from vertex_id_type to size_t were missing.
Two variables have been introduced into CMakeLists.txt to control the usage of an extended vertex id type.
Some casts have been added to several files in apps and tests.
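
The decoupling described for graph_basic_types.hpp can be pictured roughly as follows. This is only a minimal sketch, not the actual contents of the header: the `sketch` namespace and the `to_index` helper are hypothetical, the choice of `uint64_t` as the standard type is an assumption, and `lvid_type` is assumed to be the name of the local vertex id type.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {  // hypothetical namespace, for illustration only

// The "standard" id type. uint64_t is an assumption for this sketch;
// the description says the standard type may also be uint32_t.
typedef std::uint64_t standard_vertex_id_type;

// Global vertex ids: use the externally supplied type when one is configured.
#ifdef EXTERNAL_VERTEX_ID_TYPE
typedef EXTERNAL_VERTEX_ID_TYPE vertex_id_type;
#else
typedef standard_vertex_id_type vertex_id_type;
#endif

// Local vertex ids keep the standard width, whatever the global id type is.
typedef standard_vertex_id_type lvid_type;

// Hypothetical helper showing the kind of explicit cast a larger id type
// needs wherever an id is used as a size_t (for example when hashing).
inline std::size_t to_index(const vertex_id_type& vid) {
  return static_cast<std::size_t>(vid);
}

}  // namespace sketch
```

With no external type configured, `vertex_id_type` and `lvid_type` are the same type and the cast is a no-op, which is consistent with the changes having no effect for standard ids.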

To the best of my knowledge, these changes have no effect when the vertex_id_type is kept "standard" (that is, either uint32_t or uint64_t), with or without C++11. I am therefore submitting them for inclusion in the graphlab project.

Steps to use a different `vertex_id_type` (assuming that the changes in this pull request are accepted into the project)

There are two ways to do it.

Simple way

Select one of the types included in graphlab/util/multiprecision_vertex_id_types.hpp. They are boost::multiprecision::int128_t, boost::multiprecision::int256_t, boost::multiprecision::int512_t and boost::multiprecision::int1024_t.
Configure the project as follows:

./configure --c++11 -D EXTERNAL_VERTEX_ID_TYPE_INCLUDE="'<graphlab/util/multiprecision_vertex_id_types.hpp>'" -D EXTERNAL_VERTEX_ID_TYPE=boost::multiprecision::int128_t

Complex way. Implementing another type

What follows is a step-by-step guide to using other types as vertex_id_type.

Decide on the type that you would prefer for your vertex id. Constraints on the type are:
1. It should be an arithmetic type implementing +, *, ^ and so on. Currently I have only tested with boost::multiprecision::int128_t, and it fulfills all the needs in this sense.
2. It should be castable into common types such as size_t, int, unsigned char, and so on. In our case this is also provided by the boost multiprecision library.
3. It should be graphlab serializable (for boost::multiprecision::int128_t, I used out of place serialization, see graphlab/util/multiprecision_vertex_id_types.hpp for examples).
4. There should exist a function size_t hash_value(const my_large_id_type& x) in the same namespace where my_large_id_type is defined (see graphlab/util/multiprecision_vertex_id_types.hpp, for examples).
Configure the project as follows:

./configure --c++11 -D EXTERNAL_VERTEX_ID_TYPE_INCLUDE="'[path to your include file here]'" -D EXTERNAL_VERTEX_ID_TYPE=[your vertex_id_type class here]
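
To make constraints 1, 2 and 4 more concrete, here is a minimal sketch of what a hand-written id type could look like. Everything in it is hypothetical (the `my_ids` namespace, the two-word layout, the hash mixing) and it has not been tested against graphlab. It implements only a small subset of the arithmetic operators, and it leaves out graphlab serialization (constraint 3), for which graphlab/util/multiprecision_vertex_id_types.hpp is the reference.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

namespace my_ids {  // hypothetical namespace holding the id type

// Hypothetical 128-bit id stored as two 64-bit words.
struct my_large_id_type {
  std::uint64_t hi;
  std::uint64_t lo;

  my_large_id_type() : hi(0), lo(0) {}
  my_large_id_type(std::uint64_t v) : hi(0), lo(v) {}
  my_large_id_type(std::uint64_t h, std::uint64_t l) : hi(h), lo(l) {}

  // Constraint 2: explicit conversions (C++11) that keep only the low bits.
  explicit operator std::size_t() const {
    return static_cast<std::size_t>(lo);
  }
  explicit operator unsigned char() const {
    return static_cast<unsigned char>(lo);
  }
};

inline bool operator==(const my_large_id_type& a, const my_large_id_type& b) {
  return a.hi == b.hi && a.lo == b.lo;
}

// Constraint 1 (partial): addition with carry from the low word.
inline my_large_id_type operator+(const my_large_id_type& a,
                                  const my_large_id_type& b) {
  const std::uint64_t lo = a.lo + b.lo;
  const std::uint64_t carry = (lo < a.lo) ? 1 : 0;
  return my_large_id_type(a.hi + b.hi + carry, lo);
}

// Constraint 1 (partial): bitwise xor, word by word.
inline my_large_id_type operator^(const my_large_id_type& a,
                                  const my_large_id_type& b) {
  return my_large_id_type(a.hi ^ b.hi, a.lo ^ b.lo);
}

// Constraint 4: hash_value lives in the same namespace as the type,
// so an unqualified call finds it through argument-dependent lookup.
inline std::size_t hash_value(const my_large_id_type& x) {
  std::size_t seed = std::hash<std::uint64_t>()(x.hi);
  seed ^= std::hash<std::uint64_t>()(x.lo) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  return seed;
}

}  // namespace my_ids
```

The conversions are marked explicit so that narrowing an id to size_t or unsigned char only happens where a cast is written out, which is the kind of cast this pull request adds to the graphlab sources.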


cerquide · 2014-04-21T21:34:08Z

I have also modified all the code in the toolkits to work with larger vertex_ids. However, I would like to hear your opinion on the pull request before making it bigger.

dbickson · 2014-04-22T05:13:40Z

Hi Jesus,
Let us take a look (it will take a couple of days) and we will get back to
you ASAP. We highly appreciate your work!!


cerquide · 2014-04-22T07:01:44Z

Thanks Danny. For now, just take a look at the main idea of how the extension is done. I have identified some issues that still need to be fixed.


cerquide · 2014-04-22T11:18:07Z

Making the whole of graphlab compatible with large vertex_ids is a big change that touches many parts of the code. Although I have tried to minimize the impact, it is still a major change, so adding it to the graphlab master branch right now does not seem the right way to go. On the other hand, I think it could be very interesting to other people who, like me, need to read the graph from a database that already has its own ids. Thus, to me the best option right now is to create a branch where the work is committed.

This raises questions about the development process that you expect for the graphlab project.
Do you plan to have a single branch?
How would major changes such as this make it into the project?
How do you decide which functionality should be on the next release?
Does the project have any defined set of tests that can be automatically run? If so, how?

These questions are important to clarify if your idea is to have people contributing to the code base. That said, given the amount of movement in the code base right now, a simple model such as "Danny will decide" could be the most appropriate.

Sorry for wandering ;)

dbickson · 2014-04-22T13:58:21Z

Hi
Those are great questions and I think they will be way more visible in our
user forum. We are setting up an internal discussion for deciding what is
the best way to help you in what you need. Please repost at the forum and
we promise to get back with some advice.


cerquide added 4 commits April 21, 2014 02:19

Modifications needed to use big_vertices

7ad3231

Modifications needed to use big_vertices. Minor fix

8fcca55

Added the preprocessor variable EXTERNAL_VERTEX_ID_TYPE

f93925f

Small changes to allow apps, tests, and graphlab to use a larger vertex_id_type

4b5775b

Some fixes so that distributed_graph_test can run. Identified and reported a bug in boost::multiprecision. Provided a fix so that graphlab is not affected.

0a7f0a8
