Binary encoding of (directed) Bipartite Graphs #168

Open
wires opened this Issue Sep 24, 2016 · 4 comments

Comments

Projects
None yet
4 participants
@wires

wires commented Sep 24, 2016

This note describes an encoding of (finite) bipartite graphs.

http://mathworld.wolfram.com/BipartiteGraph.html

Motivation

Bi-partite graphs are ubiquitous in mathematics and computer science.

Matching on a graph is a very natural application (customer ↔ taxi), recommendations, clustering.
Lots of cool things.

And, of course, my personal motivation, http://statebox.github.io but more on that later ;^)

/cc @repatica @epost @whyrusleeping @jbenet @diasdavid

Details

A bi-partite graph is a graph where the nodes can be partitioned in two
disjoint sets, say A and B.

The edges only run between the two sets, so A=>B and B=>A only, no B=>B or A=>A. Hence, (directed) bi-partited graphs are often depicted as such:

screen shot 2016-09-24 at 12 30 31

For clarity and nothing else, I used nrs for one partition and letters for the other.

Proposed encoding

We now describe the graph by listing the adjacent edges of each node of one partition. So lets focus on the diamons and list them out.

screen shot 2016-09-24 at 12 30 43

Now in the directed case, each node a pair of sets of edges, namely the in- and outgoing edges. When we list this out we get (JSON example):

{ a: [ [1]  , [2] ]
, b: [ [1,2], []  ]
, c: [ []   , [2] ]
}

Now I do not want to name the nodes a, b, c, just be able to refer to them...
So instead we pick some order and infer their identifiers from the position in the array:

[ [ [1]  , [2] ]
, [ [1,2], []  ]
, [ []   , [2] ]
]

Result

Whats now left is:

List of lists (pairs) of lists of numbers

Note that there is an upperbound to these numbers, namely the size of
the other set. So, you can minize the upper bound by picking the
partition with the most nodes.

Not only that but we can further simplify.

  • Let the number 0 denote [
  • Let the number 1 denote ]
  • Counting the nodes starting at 2 (so 0=>2, 1=>3, ...).

Our example now becomes (indented for clarity):

0
 0 0 2 1
   0 3 1 1
 0 0 2 3 1
   0 1 1
 0 0 1
   0 3 1 1
1

Thus we are left with

List of numbers smaller than some upper bound.

This naturally leads to encoding as a list of varints, see protcol buffers for more information. Compression algorithms should work really well on this.

We now have a compact binary encoding of bipartite graphs and a natural decoding into JSON.

Further notes:

  • Why not n-partite? Well 1) the edge set becomes more involved when n goes up and 2) they are used less often.
  • What about undirected? Well, in that case you can leave out the pair altogether and stick to [[n]]. The parser can detect this by looking at how deep the lists are nested.
  • How to add data to this? Well, you can just we use a labeling
    function, along the lines of
{ labels: { vertices: { A: { 0: { someProp: 'someValue' },
                             1: { someProp: 'someOtherValue' } }
                      , B: { 0: { someOtherProp: 3.1415927 } } }
          , edges: { AB: {} , BA: {} }
          }
}

Opinions?!

@whyrusleeping

This comment has been minimized.

Show comment
Hide comment
@whyrusleeping

whyrusleeping Sep 25, 2016

Member

I'll read through and respond properly to this later, but for now: cc @nicola @BrendanBenshoof

Member

whyrusleeping commented Sep 25, 2016

I'll read through and respond properly to this later, but for now: cc @nicola @BrendanBenshoof

@nicola

This comment has been minimized.

Show comment
Hide comment
@nicola

nicola Sep 25, 2016

Member

Hey @wires thanks for joining in the conversation! I will give a deeper read soon.
It sounds very similar to a conversation we had during one of the weekly IPLD meetings (I suggest you to join in (see ipfs/pm#201))

Here are more info about application-level cycles (but your formalization sounds like a great path forward), will give a proper feedback soon.

Member

nicola commented Sep 25, 2016

Hey @wires thanks for joining in the conversation! I will give a deeper read soon.
It sounds very similar to a conversation we had during one of the weekly IPLD meetings (I suggest you to join in (see ipfs/pm#201))

Here are more info about application-level cycles (but your formalization sounds like a great path forward), will give a proper feedback soon.

@wires

This comment has been minimized.

Show comment
Hide comment
@wires

wires Sep 30, 2016

Hi, thanks! I'll be looking more into this over the coming days and see if it can be aligned better with merkle-dag / IPLD specs. To see how I'm using this in statebox to express processes, have a look at https://github.com/statebox/tbpt-viewer ; but this is a bit thin on docs at the moment.

wires commented Sep 30, 2016

Hi, thanks! I'll be looking more into this over the coming days and see if it can be aligned better with merkle-dag / IPLD specs. To see how I'm using this in statebox to express processes, have a look at https://github.com/statebox/tbpt-viewer ; but this is a bit thin on docs at the moment.

@wires

This comment has been minimized.

Show comment
Hide comment
@wires

wires Sep 30, 2016

Oh, and indeed, bipartite graphs to express cycles seem a good fit... I'll digest more!

wires commented Sep 30, 2016

Oh, and indeed, bipartite graphs to express cycles seem a good fit... I'll digest more!

@diasdavid diasdavid added the IPLD label Aug 28, 2017

@bhaugen bhaugen referenced this issue in valueflows/vf-apps-traversing-the-flows Jul 1, 2018

Open

Sources for flow traversal algorithm #1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment