Big vertex fixes #137

Open
wants to merge 5 commits into
base: master

cerquide
Contributor

As reported here, I am interested in extending graphlab so that it can be used with vertex ids larger than the standard ones. This pull request does not change vertex_id_type itself. Instead, it makes the minor code changes needed so that anyone who wants a larger vertex id space can use one.

Description of the changes

The current version of graphlab is not prepared for larger vertex id types. Using a larger vertex id type requires configuring the project to compile with C++11. Slight modifications have been made to the following files:

  1. graphlab/graph/graph_basic_types.hpp: The type for local vertex ids should be decoupled from the larger type (currently they are forced to be the same type). We do this by introducing a new intermediate type called standard_vertex_id_type. For regular users this is just an implementation detail.
  2. graphlab/engine/distributed_chandy_misra.hpp: The assessment of sequentialization keys was missing a cast to unsigned char.
  3. graphlab/graph/distributed_graph.hpp and graphlab/graph/ingress/distributed_ingress_base.hpp: num_in_edges and num_out_edges should have type size_t.
  4. graphlab/graph/graph_hash.hpp: Three casts from vertex_id_type to size_t were missing.
  5. Two variables have been introduced into CMakeLists.txt to control the usage of an extended vertex id type.
  6. Some casts have been added to several files in apps and tests.
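A minimal sketch of the ideas in items 1, 3 and 4, written as a standalone example rather than as the actual graphlab code. The namespace `decouple_sketch`, the struct `vertex_record` and the helper `bucket_of` are hypothetical names introduced for illustration, and `uint64_t` only stands in for a genuinely larger id type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace decouple_sketch {  // hypothetical namespace, not part of graphlab

// Assumption: a 32-bit "standard" id for illustration. According to the
// description, the standard type is either uint32_t or uint64_t.
typedef std::uint32_t standard_vertex_id_type;

// Global ids may be wider than the standard type. uint64_t is a stand-in
// here for a large type such as a 128-bit integer.
typedef std::uint64_t vertex_id_type;

// Local ids index per-machine arrays, so they keep the standard width
// instead of following vertex_id_type.
typedef standard_vertex_id_type lvid_type;

// Edge counts are sizes, not ids, so they use size_t (item 3).
struct vertex_record {
  vertex_id_type gvid;
  std::size_t num_in_edges;
  std::size_t num_out_edges;
};

// Explicit conversion from the id type to size_t, in the spirit of item 4.
// The modulo is taken in the wide type first, so no id bits are dropped
// before the reduction.
inline std::size_t bucket_of(const vertex_id_type& vid, std::size_t nbuckets) {
  assert(nbuckets > 0);
  return static_cast<std::size_t>(vid % nbuckets);
}

}  // namespace decouple_sketch
```

With these definitions, `bucket_of(10, 4)` yields 2, and widening `vertex_id_type` leaves `lvid_type` and the edge counters untouched.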

To the best of my knowledge, these changes have no effect when vertex_id_type is kept "standard" (that is, either uint32_t or uint64_t), with or without C++11. I am therefore submitting them for inclusion in the graphlab project.

Steps to use a different vertex_id_type (assuming that the changes in this pull request are accepted into the project)

There are two ways to do it.

Simple way

  1. Select one of the types included in graphlab/util/multiprecision_vertex_id_types.hpp. They are boost::multiprecision::int128_t, boost::multiprecision::int256_t, boost::multiprecision::int512_t and boost::multiprecision::int1024_t.

  2. Configure the project as follows:

    ./configure --c++11 -D EXTERNAL_VERTEX_ID_TYPE_INCLUDE="'<graphlab/util/multiprecision_vertex_id_types.hpp>'" -D EXTERNAL_VERTEX_ID_TYPE=boost::multiprecision::int128_t
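One plausible way a header could consume the two definitions passed to configure is sketched below. This is an assumption about the mechanism, not a copy of the code in the pull request: the namespace `vid_config_sketch` and the flag `uses_external_vertex_id` are hypothetical, and the fallback to `uint64_t` is chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// If the build defines EXTERNAL_VERTEX_ID_TYPE_INCLUDE as <some/header.hpp>
// or "some/header.hpp", this computed include pulls in the header that
// declares the external id type.
#ifdef EXTERNAL_VERTEX_ID_TYPE_INCLUDE
#include EXTERNAL_VERTEX_ID_TYPE_INCLUDE
#endif

namespace vid_config_sketch {  // hypothetical namespace

#ifdef EXTERNAL_VERTEX_ID_TYPE
// The build supplied a type name, for example
// boost::multiprecision::int128_t.
typedef EXTERNAL_VERTEX_ID_TYPE vertex_id_type;
const bool uses_external_vertex_id = true;
#else
// Nothing supplied: fall back to a built-in type (assumed default).
typedef std::uint64_t vertex_id_type;
const bool uses_external_vertex_id = false;
#endif

}  // namespace vid_config_sketch
```

When neither macro is defined, the sketch compiles to the built-in default, which matches the claim that regular builds are unaffected.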

Complex way: implementing another type

Here is a step-by-step guide to using another type as vertex_id_type.

  1. Decide on the type that you would prefer for your vertex id. Constraints on the type are:

    1. It should be an arithmetic type implementing +, *, ^ and so on. Currently I have only tested with boost::multiprecision::int128_t, and it fulfills all the needs in this sense.
    2. It should be castable into common types such as size_t, int, unsigned char, .... In our case this is also provided by the boost multiprecision library.
    3. It should be graphlab serializable (for boost::multiprecision::int128_t, I used out of place serialization, see graphlab/util/multiprecision_vertex_id_types.hpp for examples).
    4. There should exist a function size_t hash_value(const my_large_id_type& x) in the same namespace where my_large_id_type is defined (see graphlab/util/multiprecision_vertex_id_types.hpp, for examples).
  2. Configure the project as follows:

    ./configure --c++11 -D EXTERNAL_VERTEX_ID_TYPE_INCLUDE="'[path to your include file here]'" -D EXTERNAL_VERTEX_ID_TYPE=[your vertex_id_type class here]
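To make the constraints above more concrete, here is a rough sketch of what a hand-written id type might look like. Everything in it is hypothetical: the namespace `my_ids`, the toy 128-bit struct `wide_id` and the hash mixing constant are illustrative choices. Multiplication, the remaining operators, a cast to int and graphlab serialization are left out for brevity, so this is not a complete vertex_id_type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace my_ids {  // hypothetical namespace for the user-defined type

// Toy 128-bit unsigned id built from two 64-bit halves.
struct wide_id {
  std::uint64_t hi, lo;
  wide_id() : hi(0), lo(0) {}
  wide_id(std::uint64_t h, std::uint64_t l) : hi(h), lo(l) {}

  // Constraint 2: explicit casts to built-in types (C++11 feature).
  // Both keep only the low-order bits of the id.
  explicit operator std::size_t() const {
    return static_cast<std::size_t>(lo);
  }
  explicit operator unsigned char() const {
    return static_cast<unsigned char>(lo & 0xFFu);
  }
};

inline bool operator==(const wide_id& a, const wide_id& b) {
  return a.hi == b.hi && a.lo == b.lo;
}

// Constraint 1 (partial): addition with carry from the low half.
inline wide_id operator+(const wide_id& a, const wide_id& b) {
  const std::uint64_t lo = a.lo + b.lo;             // wraps modulo 2^64
  const std::uint64_t carry = (lo < a.lo) ? 1 : 0;  // wrapped means carry
  return wide_id(a.hi + b.hi + carry, lo);
}

// Constraint 1 (partial): bitwise xor on both halves.
inline wide_id operator^(const wide_id& a, const wide_id& b) {
  return wide_id(a.hi ^ b.hi, a.lo ^ b.lo);
}

// Constraint 4: hash_value lives in the same namespace as the type, so an
// unqualified call finds it through argument-dependent lookup.
inline std::size_t hash_value(const wide_id& x) {
  return static_cast<std::size_t>(x.lo ^ (x.hi * 0x9E3779B97F4A7C15ULL));
}

}  // namespace my_ids
```

Because `hash_value` sits next to `wide_id`, generic code can call `hash_value(id)` without naming `my_ids`, which is presumably why the function is required to be in the type's own namespace.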

@cerquide
Contributor Author

I have also modified all the code in the toolkits to work with larger vertex_ids. However, I would like to hear your opinion on the pull request before making it bigger.

@dbickson
Contributor

Hi Jesus,
Let us take a look (it will take a couple of days) and we will get back to
you ASAP. We highly appreciate your work!!

(In reply to the comment by Jesús Cerquides of Tue, Apr 22, 2014, 12:34 AM.)

@cerquide
Contributor Author

Thanks Danny. For now, just take a look at the main idea of how to do the extension. I have identified some issues that still need to be fixed.

…orted a bug in boost::multiprecision. Provided a fix so that graphlab is not affected.
@cerquide
Contributor Author

Making the whole of graphlab compatible with large vertex_ids is a big change that impacts different parts of the code. Although I have tried to minimize the impact, it is still a major change. As such, adding it to the graphlab master branch right now does not seem the right way to go. On the other hand, I think it could be very interesting to other people who, like me, need to read the graph from a database that already has its own ids. Thus, to me the best option right now is to create a branch where the work is committed.

This raises questions about the development process that you expect for the graphlab project.
Do you plan to have a single branch?
How would major changes such as this make it into the project?
How do you decide which functionality should be on the next release?
Does the project have any defined set of tests that can be automatically run? If so, how?

These questions need clear answers if your idea is to have people contributing to the code base. By the way, given the amount of movement in the code base right now, a simple model such as "Danny will decide" could be the most appropriate.

Sorry for wandering ;)

@dbickson
Contributor

Hi
Those are great questions and I think they will be way more visible in our
user forum. We are setting up an internal discussion for deciding what is
the best way to help you in what you need. Please repost at the forum and
we promise to get back with some advice.

(In reply to the comment by Jesús Cerquides of Tue, Apr 22, 2014, 2:18 PM.)
