Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Edge labels #17

Closed
snowleopard opened this issue Jul 12, 2017 · 54 comments
Closed

Edge labels #17

snowleopard opened this issue Jul 12, 2017 · 54 comments
Labels

Comments

@snowleopard
Copy link
Owner

snowleopard commented Jul 12, 2017

The current implementation has no support for edge labels and this is a serious limitation. Let's use this issue to discuss possible designs.

As an example, consider a graph with vertices 1, 2, 3 and labelled edges 1 -{a}-> 2 and 2 -{b}-> 3.

The paper mentions this approach:

type LabelledGraph b a = b -> Graph a

g :: LabelledGraph Char Int
g 'a' = 1 * 2
g 'b' = 2 * 3
g _   = empty

Here b stands for the type of labels that can be attached not only to edges, but also vertices and arbitrary subgraphs. Function application g x acts as a projection: it extracts the subgraph labelled with x.

A similar, but somewhat simpler approach is to use lists of pairs instead of functions:

type LabelledGraph b a = [(b, Graph a)]

g :: LabelledGraph Char Int
g = [('a', 1 * 2), ('b', 2 * 3)]

To obtain a projection, we filter the list by the edge label and overlay the graphs. The advantage of this approach compared to the above is that it's more explicit, and we can easily implement equality checking.

A very different approach is to add a new labelling operator [b] to the algebra. Here to obtain a nice algebraic structure the type of labels b need to form an idempotent semiring. For example, this works very well for Boolean labels:

  • [0]a = empty
  • [1]a = a
  • [x]a + [y]a = [x | y]a
  • [x][y]a = [x & y]a
  • [x](a + b) = [x]a + [x]b
  • [x](a * b) = [x]a * [x]b

This should also work well for numeric labels, e.g. usually a distance label x on an edge corresponds to the interval [x; infinity) in terms of possible time it takes to traverse the edge, and such intervals, I believe form an idempotent semiring.

The cost is a new graph construction primitive:

data G b a = Empty
           | Vertex a
           | Overlay (G b a) (G b a)
           | Connect (G b a) (G b a)
           | Label b (G b a)

g :: G Char Int
g = Overlay (Label 'a' $ Vertex 1 `Connect` Vertex 2) (Label 'b' $ Vertex 2 `Connect` Vertex 3)

I haven't yet tried any of these approaches on a serious example.

I'm pretty sure there are other approaches and the design space is quite large. Do you have any suggestions?

@jaesharp
Copy link

jaesharp commented Jul 21, 2017

Perhaps an implementation based on line graphs (where the nodes are representative of edges and cliques uniquely represent nodes in the underlying edge-to-vertex dual graph). You might use multiparameter typeclass instances of the nodes to represent valid connection types overloading the type hierarchy for validity checking and label specification -- out of band, so speak. You could then treat the concrete linegraph as an edge 'schema' connected with a surjective relation of cliques to concrete edge-to-vertex dual graph nodes. Still, I'm just brainstorming.

@snowleopard
Copy link
Owner Author

snowleopard commented Jul 21, 2017

Thanks @justinlynn!

Line graphs are cool, but so far I couldn't come up with a good way to represent them algebraically (without allowing some of their forbidden subgraphs). I also couldn't figure out how to efficiently (and algebraically) obtain the line graph of a given graph. So, I am a bit stuck in this direction, but I agree, this is a potential option to consider when dealing with edge labels.

@jaesharp
Copy link

@snowleopard Excellent point -- now that you say it, I can't yet see that either. I suppose I've been coming at it from more of a data structure perspective and looking at how to cram the construction into the haskell typechecker while still maintaining invariants. Thanks heaps for taking a look and taking the time to provide some excellent feedback. Cheers!

@alpmestan
Copy link

Hello. I'm loving alga so far and am looking into ways to make this labelled edge story nice. Here are a few thoughts:

  • Maybe we could have the edge labelling part be an extension to the core graph algebra, so:

    -- mirrors your general `Label` primitive, but we could
    -- look into just a function that takes two `Vertex g`s and
    -- basically labels `Connect v1 v2`. We could have both with
    -- default definitions in at least one direction, maybe?
    class Graph g => LabelledGraph g where
        label :: b -> g -> g

    This way, we keep the core algebra intact, and each interpretation is free to implement (or not)
    the label combinator. This however does not really help in finding one or more implementations
    that just "fall out" nicely, especially not ones that e.g allow us to just label edges, not arbitrary
    subgraphs.

  • I have been looking into providing an adjacency matrix style interpretation with labelled edges. The point would be to have the ability to reuse Russell O'Connor's approach to solving a whole bunch of problems using just the semiring hammer. It however seems to be like a particularly unfriendly representation for alga-style graphs, as you possibly just keep discovering new vertices as you interpret a graph expression, which possibly requires that you resize the matrix unless using a sparse representation, etc, in addition to requiring that your vertex type implements Ix and Bounded, which is a bit annoying when all you have is "a", "b", "c" and you'd just like to have some graph involving those 3 vertices. If we do find a nice way to make this work, then going from labelled to unlabelled edges is just about using the standard Bool semiring instead of some other lbl semiring.

  • In the paper, you say:

    The above looks remarkably close to a semiring, with the only oddity being the shared identity of the two operations. The lack of the annihilating zero element (i.e. x → 0 = 0) and the following decomposition law is what makes the algebra of graphs different

    But the post I linked to above (and several papers/books) do admit at least one encoding of (labelled or unlabelled) graphs that is a semiring. I saw in some comments on your blog that other people asked about that kind of thing, but have you considered augmenting the algebra to make it basically one with two distinct identities (so that 0 absorbs →) ? So a graph would be semiring + decomposition law. What would change? Have you investigated this?

More thoughts might come later.

@snowleopard
Copy link
Owner Author

@alpmestan Hello, and thank you for sharing your thoughts! Let me answer point-by-point.

  • I like your suggestion about adding a type class with an additional graph labelling method. I'd probably prefer not to use the term 'label' here to avoid confusion with edge labels. Perhaps, we can call such graphs tagged graphs? We'll also need to add an associated type:

    class Graph g => TaggedGraph g where
        type Tag g
        tag :: Tag g -> g -> g

    Tagged graphs have been discussed here: Remove GraphKL from the API #19 (comment).

  • I agree, matrices seem to be quite unfriendly for Alga graphs. One possible workaround is to keep graph expressions in the tree form (i.e. using the deep embedding from Algebra.Graph) until they actually need to be consumed as a matrix. It seems to me that there is no need to mix construction of graphs (using the graph algebra) and their analysis (using matrices, semirings etc.). Then there will be a function like Graph a -> Matrix a that takes the whole expression, finds out all the vertices in it and builds the corresponding matrix, avoiding intermediate ones. Do you think this will work for your use case?

  • Yes, one can add annihilating zero into the algebra of graphs, but this zero is very unusual. Just as unusual as the identity (empty graph). Zero is just the fully connected graph. Indeed we have 0 + x = 0 and 0 * x = 0, i.e. the fully connected graph annihilates both overlay and connect! But this is still not a semiring, it's a different algebraic structure. It's probably not a satisfactory answer for you, but I don't know how to marry the algebra of graphs with traditional semiring-based graph analysis. Perhaps, they are just different tools useful in different circumstances.

@alpmestan
Copy link

alpmestan commented Nov 3, 2017

Tagged graphs as you introduce them in your last comment could work well indeed. I can certainly imagine this construct being useful. I however do think I need another solution for graphs with labelled edges, as I basically want a representation that'll only allow me to "tag" Connect (Vertex _) (Vertex _)-shaped graphs.

Regarding your suggestion of constructing the matrix "at the last minute", once we have described the entire graph and right before we run some algorithm on it, this would definitely work, I think. That's one possible approach. I have been thinking, since my last comment, about another route that doesn't go through the deep embedding. It is probably less efficient, but it is quite cute IMO.

So, having an adjacency matrix and growing it all the time is a no-go. And I want something that doesn't restrict me to Ix+Bounded-compliant vertex types (I want to be able to have String vertices for example). But we know about other representations of linear maps! Conal (here, here) and Ed (here) have talked about this. By putting this together with Russell's approach, it occured to me that I might be able to model a graph as a linear map from a free module generated by our set of vertices to itself, where the free modules are over an arbitrary semiring (Bool for unlabelled edges, something fancier otherwise). I believe that the set can in theory be infinite (but countable?), as long as any vector we ever consider is only a finite linear combination of elements from our set. The "adjacency linear map" representation of a graph could in addition to storing the linear map, store the set of vertices that we're actually dealing with (or that we've seen so far, when building the graph), so that we would only ever call (and memoize) the restriction of the linear map to just those vertices when using that representation. I have written a bit of code down but it's too early for me to know whether this will work in the end, and to know whether I can provide sane asymptotics for the construction and use of that representation.

Most likely, this is all fun but terribly inefficient and we'll just go with your suggestion of going through Graph and converting to an actual Matrix at the end. :)

Re: the algebra, it's interesting that this 0 would be the fully connected graph. That's a really weird thing to work with. It does feel like their ought to be a way to reconcile the two, but I don't see any.

@snowleopard
Copy link
Owner Author

I basically want a representation that'll only allow me to "tag" Connect (Vertex _) (Vertex _)-shaped graphs.

@alpmestan Yes, I see. Thinking about this led me to the following type class:

class (Graph g, Monoid (Label g)) => LabelledGraph g where
    type Label g 
    labelledEdge :: Label g -> Vertex g -> Vertex g -> g

With the following two laws:

  • connect (vertex x) (vertex y) == labelledEdge mempty x y
  • overlay (labelledEdge a x y) (labelledEdge b x y) == labelledEdge (a <> b) x y

It doesn't look very nice though.

But we know about other representations of linear maps! [..]

That's very cool, I like this idea! I'm very curious to see what you come up with -- please share the code when/if you manage to make this approach work :)

It does feel like their ought to be a way to reconcile the two, but I don't see any.

Same here!

@boggle
Copy link

boggle commented Dec 12, 2017

Late to the thread... putting labels on connect is what I was thinking about, too. Does it have to be a monoid though? Shouldn't a semiring be enough? Or do you think unlabeled edges are important? With "SET" as the semiring, this directly gives labeled multigraphs which is what I would want most of the time.

@snowleopard
Copy link
Owner Author

@boggle You're not late, I think we're still looking for the right approach.

Does it have to be a monoid though? Shouldn't a semiring be enough?

Good point, a semiring should be sufficient for combining edges. My previous attempts to extend the theory with labelled edges used monoid labels, but it's interesting to see what happens with a semiring.

With "SET" as the semiring, this directly gives labeled multigraphs which is what I would want most of the time.

Could you clarify what you mean by "SET" here? Plain sets do have the identity element (the empty set), so I guess you mean something else here.

@boggle
Copy link

boggle commented Dec 13, 2017

I meant the the normal "Set a" monoid. Except the monoid constraint isn't needed here. Actually the kind of graphs I'm interested in right now would only need "NonEmptySet a" which of course is just a semiring. Layering is possible, too, if the type of edge labels is a monoid, then "connect" can be used without specifying a label.

@snowleopard
Copy link
Owner Author

@boggle I see, thanks for clarifying!

Just to make sure I fully understand your use-case: you are looking for a graph type G a b where a is the type of vertices, and b is the type of labels that can be placed on edges, where multiple edges are allowed between a pair of vertices. Alternatively, we can think of this as having at most one edge between a pair of vertices, which is labelled by NonEmptySet b.

Interesting!

@boggle
Copy link

boggle commented Dec 14, 2017

Yes, that's what I had in mind. And by the way you've put it, I'm realizing if one actually works with a monoid bound, one could get a function "connected a b g" this gives mempty if a b are not connected in a b, the label otherwise. That might turn out to feel really natural and wouldn't require moving to NonEmptySet. It raisese the question though what "connect a b mempty" would mean - perhaps this is just a no-op.

@snowleopard
Copy link
Owner Author

It raisese the question though what "connect a b mempty" would mean - perhaps this is just a no-op.

I think connect a b mempty should be the same as overlay a b in this case.

@boggle
Copy link

boggle commented Dec 14, 2017

Yes, I agree.

This made me now wonder for a moment if that would allow to not even have overlay, i.e. overlay becomes just a shorthand for connect a b mempty thus reducing the number of operators in the algebra.

@snowleopard
Copy link
Owner Author

snowleopard commented Dec 14, 2017

would allow to not even have overlay, i.e. overlay becomes just a shorthand for connect a b mempty

@boggle That's true! We've been experimenting with so-called 'conditional connect' which allows us to have the following encoding of overlay and connect for edge-unlabelled graphs:

data Graph a = Empty | Vertex a | ConditionalConnect Bool (Graph a) (Graph a)

overlay :: Graph a -> Graph a -> Graph a
overlay = ConditionalConnect False

connect :: Graph a -> Graph a -> Graph a
connect = ConditionalConnect True

This case is not particularly interesting, since it's isomorphic to the datatype with two separate constructors Overlay and Connect, but when we take this idea to edge-labelled graphs this seems to become more interesting, as you point out:

data Graph b a = Empty | Vertex a | LabelledConnect b (Graph a) (Graph a)

overlay :: Monoid b => Graph b a -> Graph b a -> Graph b a
overlay = LabelledConnect mempty

But there is more! If b is a lattice, then we can have the following symmetric encoding:

data Graph b a = Empty | Vertex a | LabelledConnect b (Graph a) (Graph a)

overlay :: BoundedLattice b => Graph b a -> Graph b a -> Graph b a
overlay = LabelledConnect bottom

connect :: BoundedLattice b => Graph b a -> Graph b a -> Graph b a
connect = LabelledConnect top

This brings us back to the 'conditional connect' since Boolean algebra is a bounded lattice with top = True and bottom = False.

Could this be the encoding we've been looking for?

@alpmestan
Copy link

You could probably play the same trick with Semiring b, just to tie it all up to what I mentionned earlier in this thread. Possibly other structures? But yes, promising encoding, definitely.

@snowleopard
Copy link
Owner Author

snowleopard commented Dec 15, 2017

@alpmestan I used to think we need semirings, but I'm now leaning towards something lattice-like.

Let's try to list all requirements. Below I use x -{a}-> y to denote LabelledConnect.

  • We need 0 and 1 for defining overlay and connect, as above.
  • We'd like to compose edge labels as overlay (x -{a}-> y) (x -{b}-> y) = x -{a \/ b}-> y. This leads to 0 \/ x = x, x \/ x = x and 1 \/ x = 1, as otherwise overlay would lose its usual laws.

This looks like a join-semilattice (https://ncatlab.org/nlab/show/semilattice) with two bounds (0 and 1).

I guess we don't need the meet operator /\ for edge-labelled graphs, which is a bit strange. If I remember correctly, /\ is required for tagged graphs discussed here.

@alpmestan
Copy link

My semiring obsession is almost cured, don't worry. So with your formulation, 0 is _|_, OK, but 1 is T (top) ? In which case we can't "go past 1"? I might be misunderstanding what you said though.

@boggle
Copy link

boggle commented Dec 15, 2017

This is interesting! On the one hand it is really elegant to have neutral elements for both overlay and connect, on the other hand, I don't quite see the need to have a full lattice with two bounds and both join and meet.

My use case would still be representing multigraphs by labeling edges with sets of tags. The empty set/lower bound would serve to represent overlay; any other set X would represent labeled connect but of course would label edges in the new graph with the union of X with any pre-existing edge label.

Using a full lattice certainly works (and I think the commutativity of join is a good requirement for edge labeling). For representing multigraphs using tag sets as labels this would require to artificially introduce an upper bound 1 (Perhaps by switching to Maybe (Set a)). I'm worried about the interpretation of that though: Such a bound would stand for the universe of all possible tags and thus imply that there are potentially infinitely many edges between a pair of nodes connected with 1 in the represented multigraph.

Regular connect could just be recovered by having a default label. I don't see the need for this to be the top/1. @snowleopard could you please explain why that's needed?

@snowleopard
Copy link
Owner Author

snowleopard commented Dec 16, 2017

So with your formulation, 0 is _|_, OK, but 1 is T (top) ? In which case we can't "go past 1"?

@alpmestan Yes, that's right. The idea is that x * y corresponds to maximum possible connection between x and y, whereas x + y corresponds to minimum possible connection (i.e. no connection).

This duality between overlay and connect exists in the unlabelled version of the algebra, and I think it may be useful to preserve it, but I admit that the justification is not particularly strong.

On the one hand it is really elegant to have neutral elements for both overlay and connect, on the other hand, I don't quite see the need to have a full lattice with two bounds and both join and meet.

@boggle Yes, I agree. Ideally we'd like a better justification (than just elegance) for having a full lattice. Here are just a couple of random thoughts on this:

  • As I mentioned above, I'd like to keep (and perhaps even reinforce) the duality between + and *.
  • It may be useful to have forall k. x + y <= labelledConnect k x y <= x * y.

Alas, not convincing enough.

Such a bound would stand for the universe of all possible tags and thus imply that there are potentially infinitely many edges between a pair of nodes connected with 1 in the represented multigraph.

This was my intention, but now that you mentioned Set a I realise that we probably don't want to deal with infinite sets. Going to Maybe (Set a) doesn't look attractive.

Regular connect could just be recovered by having a default label.

You may be right. I'm a bit uncomfortable about picking a default label, which feels rather arbitrary, but I guess one can argue that all possible choices lead to isomorphic results... which seems to bring us to initial and final objects: for sets, the empty set is the initial object (and we'll use it for +) and singleton sets are (isomorphic) final objects. Any singleton set appears to be a good candidate for *. So, we've started with one sort of duality (bottom-top), and came to another one (initial-final). Interesting indeed :)

For Hask category, we would have chosen Void and () labels for + and *. Is that useful in any sense? Probably not, because the composition (.) doesn't seem to interact with + in the right way.
Would the composition of labels a and b in this case be Either a b? Doesn't seem to be idempotent.

@boggle
Copy link

boggle commented Dec 18, 2017

I've been thinking a bit more about the full lattice encoding for the Set case. What I like about this is that it leads to a information accumulation vs consolidation point of view:

  • overlay is essentially join in the lattice between the labels provided by both arguments ("union edge labels", "collect all available information"). This would treat disconnected edges missing in one argument graph/or to be created as being labeled with BOT.
  • one could define connect to be the inverse operation that is the meet between the labels provided by both arguments ("intersect edge labels", "consolidate available information"). This would treat disconnected edges missing in one argument graph/or to be created as being labeled with TOP.
  • this could perhaps be extended to vertex labels ("keep all nodes" vs "only keep shared nodes")

This seems a very useful picture and it reveals overlay to just be union/join and connect to just be intersect/meet.
Then again one could try to define these operations just atop another encoding (like the initial/final one).
It seems we're at a cross-roads with two possible encodings with different trade-offs.
What is the right way to proceed?

@snowleopard
Copy link
Owner Author

@boggle We can avoid making the choice and simply require the user to provide a semi-lattice, plus a default label that will be used for connect. For example:

-- We require additional laws for the underlying monoid: <> must be commutative and idempotent
class Monoid a => Label a where
    defaultLabel :: a

overlay :: Label b => LabelledGraph b a -> LabelledGraph b a -> LabelledGraph b a
overlay = labelledConnect mempty

connect :: Label b => LabelledGraph b a -> LabelledGraph b a -> LabelledGraph b a
connect = labelledConnect defaultLabel

This may be a temporary solution while we are still looking for the right existing abstraction. I believe this permits both interpretations (full lattice and initial-final).

The accumulation vs consolidation point of view is indeed useful for many applications, but I think the basic abstraction should be as general as possible. One can of course build the accumulation vs consolidation machinery on top of the basic one (another graph-like typeclass with an additional connect-like method?).

@boggle
Copy link

boggle commented Dec 20, 2017

Ah! That is of course possible and indeed allows exploring this space more gradually.

Might also be insightful to try to define how to encode an edge labeled graph into a node labeled graph which should be possible with a newtype, using either to distinguish between nodes used for nodes in the labeled graph, and nodes used for edges in the labeled graph, and defining connect accordingly.

@snowleopard
Copy link
Owner Author

Might also be insightful to try to define how to encode an edge labeled graph into a node labeled graph

@boggle Indeed. We can do Graph (Either b a) with vertices labelled by a and edges labelled by b. The main difficulty here is that we'll need to somehow constrain connect to disallow direct vertex-vertex and edge-edge connections, i.e. the resulting graph must be bipartite -- I've just opened #38 for this.

@anfelor
Copy link
Contributor

anfelor commented Mar 15, 2018

Hey all,

I am currently considering whether to participate in the next GSoC and so I am trying to familiarize myself with this project. There are a couple of things in this thread I didn't understand, maybe you could help my out?

  1. If Labels are a BoundedLattice, a * b would be a-{1}->b, right? But then x \/ 1 = 1 would mean that a-{x}->b + a * b = a * b , e. g. labels get deleted by unlabeled connections. This doesn't seem right to me.

  2. To have edge labels and node labels in the same graph, I am not sure it suffices to make the graph bipartite. If we define 1-{a}->2 = 1 * a + a * 2 and (1 + 2 + 3)-{a}->{2 + 3 + 4) = 1*a + 2*a + 3*a + a*2 + a*3 + a*4 , what would be the meaning of a*2, e.g. an edge from nowhere to 2? Or star a [1..10]?

  3. Using a monoid seems to be the most elegant solution, but it makes me wonder how to define something like the bipartite newtype wrapper over overlay and connect. Isn't it impossible to pattern match on mempty? So can one actually distinguish between connect and overlay in such a labeled graph?

  4. I haven't seen much from graph theory , but wouldn't labels be mostly interesting as weights for flow problems? Because then <> from Label being idempotent would mean that a-{2}->b + a-{3}->b = a-{5}->b but a-{3}->b + a-{3}->b = a-{3}->b. Maybe one would need a different type class for weights?

Thank you for your time!

@anfelor
Copy link
Contributor

anfelor commented Mar 15, 2018

  1. Or maybe this isn't a problem because one could use something like this?
newtype Distinct a = Distinct a

instance Eq Distinct where _ == _ = False
instance Monoid a => Monoid (Distinct a) where
  ...

Then <> @(Distinct a) would be idempotent even if <> @a isn't?

@snowleopard
Copy link
Owner Author

@anfelor Hey, thanks for joining the conversation!

There are a couple of things in this thread I didn't understand

This thread has become quite confusing, because so many approaches have been intertwined in discussions. My current understanding is as follows (I am planning to push some code with this soon):

We label edges with elements from a semiring (0, 1, |+|, |*|) and define the following labelled graph datatype LG e a (I'm shortening everything here for brevity):

data LG e a = E | V a | C e (LG e a) (LG e a)

overlay :: Semiring e => LG e a -> LG e a -> LG e a
overlay x y = C 0 x y

connect :: Semiring e => LG e a -> LG e a -> LG e a
connect x y = C 1 x y

Then my understanding is that for LG e a to actually satisfy graph laws, we need some additional constraint on the semiring e. My guess is something like Tropical e should work well:

instance Tropical e => Graph (LG e a) where
    Vertex (LG e a) = a
    ...

Here is a cool paper about semirings and tropical semirings and labelled graphs:

http://stedolan.net/research/semirings.pdf

@anfelor Now, that we've fixed a concrete vocabulary, do you still have the same questions? Or are some of them resolved?

For example, I think your point (1) is no longer a concern, since 1 will actually mean we've connected two vertices with a path of length zero -- of course, adding any additional parallel edges doesn't matter in this case, since the shortest length is still zero.

@anfelor
Copy link
Contributor

anfelor commented Mar 28, 2018

I'm not sure what you mean: for directed graphs you surely have a -> b /= b -> a, but a -> b == b -> a holds for undirected graphs, and may potentially be used in simplification of undirected graph expressions.

I meant it as forall. Graph g => a, b in g => a -{0}->b == b-{0}->a, but forall. Graph g => a, b in g => x /= 0 => a-{x}->b /= b-{x}->a in the general case. So if we don't distinguish between zero and other elements we might end up not being able to use this law in code.

@snowleopard
Copy link
Owner Author

I see! Yes, without distinguishing 0 we seem to lose the operator + (cannot distinguish it from ->) as well as all its laws.

@boggle
Copy link

boggle commented Apr 5, 2018

I keep coming back to think about graphs with edge weights in terms of information loss/gain (where disconnected means "no information") and what that interpretation would require from connect/overlay.

This leads me to believe that its perhaps not good to try to unify two pieces of information here:

  • wether two nodes a, b are connected
  • with which weight/label a, b are connected

As an experiment, let's assume the algebra tracks both overlay and connect but uses only a commutative monoid for weights like (Nat, 0, max).

data LG e a = E | V a | O (LG e a) (LG e a) | C e (LG e a) (LG e a)

overlay :: Monoid e => LG e a -> LG e a -> LG e a
overlay x y = O x y 

connect :: Monoid e => LG e a -> LG e a -> LG e a
connect x y = C mempty x y

This leads to a definition of weight where overlay and connect both combine any weights found but connect will ensure nodes are at least "weakly connected" (with a small weight) while overlay will not.

weight :: (Eq a, Monoid e) => LG e a -> a -> a -> Maybe e
weight (O g h) a b = (weight g a b) <> (weight h a b)
weight (C w g h) a b = (weight g a b) <> (weight h a b) <> simple
  where
    simple = if (contains g a) && (contains h b) then (Just w) else None
    contains :: Eq a => LG e a -> a -> Bool
    contains E _ = False
    contains (V a) x = x == a
    contains (O g h) x = contains g x || contains h x
    contains (C _ g h) x = contains g x || contains h x
weight _ _ _ = None

So rather than having two dual monoids (like in a semiring), the option monoid and the edge label monoid are nested.
Is it possible to achieve a similar definition for weight by using a semiring?
I'd expect at least the following behaviour:

  • overlay combines weights (does not loose edge label information) but never connects unconnected nodes
  • connect ensures nodes are connected but also combines weights (does not loose edge label information, too)

This was written with the monoid (0, max) in mind (i.e. we start from no information and move to more information).
It's also interesting to consider the dual (+Inf, min) (i.e. we start from maximum information and want to consolidate/intersect it).

@snowleopard
Copy link
Owner Author

snowleopard commented Apr 10, 2018

@boggle As far as I understand, you want O x y to be different from C 0 x y, but I don't see how this can be useful. Since you are using max as your operation, the weight represents some kind of 'connectedness' between vertices, for example, the width of the road between two cities. In this setting, the width of 0 is semantically indistinguishable from a non-existent road.

Can you give an example where the difference between O x y and C 0 x y matters?

@anfelor
Copy link
Contributor

anfelor commented Apr 11, 2018

@boggle Furthermore, in a semiring 0 doesn't need to be 0 and 1 doesn't need to be 1 (of the underlying structure). Take for example the longest distance semiring in the excellent "Fun with semirings" paper:

data LongestDistance = LDistance Int | LUnreachable | LInfinite
instance Semiring LongestDistance where
  zero = Unreachable
  one = LDistance 0
  x @+ LUnreachable = x
  LUnreachable @+ x = x
  LDistance x @+ LDistance y = LDistance (max x y)
  ...

We would then define overlay (connect l a b) (connect k a b) = connect (l + k) a b. Furthermore overlay would connect nodes with the LUnreachable tag, which is sematically equivalent to no connection at all. I think this should be a strict superset of the (Nat, 0, max) monoid, while extending it to include a sensible operation for matrix multiplication. Would that work for you?

@anfelor
Copy link
Contributor

anfelor commented Apr 11, 2018

@boggle The set of laws @snowleopard and I agreed on so far (I believe) are as follows:
A Graph is a quadruple (G, S, -{}->, e) with S an idempotent semiring and the following laws:

  1. forall. a,b,c in G and r,s in S:
  2. a -{r}-> e = e -{r}-> a = a
  3. a -{0}-> b = b -{0}-> a
  4. (a -{r}-> b) -{s}-> c = (a -{r}-> b) -{0}-> (a -{s}-> c) -{0}-> (b -{s}-> c)
  5. a -{r}-> (b -{s}-> c) = (a -{r}-> b) -{0}-> (a -{r}-> c) -{0}-> (b -{s}-> c)
  6. (a -{r}-> b) -{0}-> (a -{s}-> b) = a -{r+s}-> b

These imply:

  1. (a -{r}-> b) -{r}-> c = (a -{r}-> b) -{0}-> (a -{r}-> c) -{0}-> (b -{r}-> c) = a -{r}-> (b -{r}-> c) (Associativity of -{}->)
  2. a -{r}-> b = (a -{r}-> b) -{s}-> e = (a -{r}-> b) -{0}-> (a -{s}-> e) -{0}-> (b -{s}-> e) = (a -{r}-> b) -{0}-> a -{0}-> b (Absorption)
  3. (a -{0}-> b) -{r}-> c = (a -{0}-> b) -{0}-> (a -{r}-> c) -{0}-> (b -{r}-> c) = (a -{r}-> c) -{0}-> a -{0}-> c -{0}-> (b -{r}-> c) -{0}-> b -{0}-> c = (a -{r}-> c) -{0}-> (b -{r}-> c) (-{r}-> distributes over -{0}->)
  4. a = (a -{r}-> e) -{s}-> e = (a -{r}-> e) -{0}-> (a -{s}-> e) -{0}-> (e -{s}-> e) = a -{0}-> a (Idempotence)

That means that by using the bool semiring (0=False, 1=True, +=or, *=and) we can always recover the original algebra of graphs.

We can extend the above set of laws by demanding that every element in our semiring except 0 has a multiplicative inverse (^-1) and extend the 2. law to:

  1. a -{r}-> b = b -{r^-1}-> a with 0^-1 := 0

This fits well with some semirings like the above mentioned LongestDistance semiring:

instance Semiring LongestDistance where
  zero = LUnreachable
  one = LDistance 0
  x @+ LUnreachable = x
  LUnreachable @+ x = x
  LDistance x @+ LDistance y = LDistance (max x y) -- or alternatively 
    | (abs x) < (abs y) = LDistance y
    | otherwise = LDistance x

  ...

  x @. LUnreachable = LUnreachable
  LUnreachable @. x = LUnreachable
  LDistance x @. LDistance y = LDistance (x + y)

  inv LUnreachable = LUnreachable -- we don't want partial functions here => 0 has an inverse
  inv (LDistance x) = LDistance (-x)

Do you think that could be a useful addition? I think it could possibly help with augmented graphs; I will comment here when I have something presentable. I wish we could extend that to a ring, but unfortunately there are no idempotent rings so that is impossible without changing the algebra of graphs.

@snowleopard
Copy link
Owner Author

@anfelor Yes, this sounds good. Have you formally proved the theorems? I haven't yet, but plan to do this in the Agda repository: https://github.com/algebraic-graphs/agda.

By the way, I haven't thought about multiplicative inverses, that's pretty cool!

To have a fixed common vocabulary, I pushed a branch with a draft implementation of edge-labelled graphs: https://github.com/snowleopard/alga/blob/edge-labels/src/Algebra/Graph/Labelled.hs.

Note that I chose the class name Dioid as it is often used to refer to idempotent semirings, not only in the context of path-finding: https://ncatlab.org/nlab/show/idempotent+semiring. It is also much shorter to type compared to, say, IdempotentSemiring or TropicalSemiring.

@anfelor
Copy link
Contributor

anfelor commented Apr 11, 2018

Have you formally proved the theorems?

No, I haven't yet, but I could do it if you want.

I pushed a branch with a draft implementation
TODO: Prove the C.Graph laws

  1. + := -{0}->. Commutativity is 1.2 and Associativity 2.1 above
  2. -> := -{1}->. Identity is 1.1 and Associativity is 2.1 above
  3. -> distributes over + is 2.3 above
  4. Decomposition is either of 1.3 or 1.4 with r=s=1

I chose the class name Dioid

Sounds good!

Related: -{s}-> looks pretty ugly, maybe we should switch to a different notation? <s> or something like that (({s}, e, <s>) forms a monoid for every element s)? The law 1.4 would become:
a <r> (b <s> c) = (a <r> b) <0> (a <r> c) <0> (b <s> c)

@snowleopard
Copy link
Owner Author

snowleopard commented Apr 11, 2018

No, I haven't yet, but I could do it if you want.

@anfelor Sure, if you are familiar with Agda, feel free to send a PR to the Agda repository. Alternatively, if you'd like to prove it using your favourite prover, just share the link to the proof. A pen-and-paper proof is fine too, although it's harder to check :-)

Your proof sketch for Graph laws looks correct.

Related: -{s}-> looks pretty ugly, maybe we should switch to a different notation?

By x -{s}-> y I actually meant an arrow between x and y with s written on top of it, as in some of my earlier papers, e.g. see page 6 here: https://www.staff.ncl.ac.uk/andrey.mokhov/algebra.pdf.

This notation worked pretty well. In Latex, it is x\overset{s}{\longrightarrow}u. Is this OK?

For ASCII, your <s> notation is pretty neat, I like it. The only potential issue is that it is symmetric, but the operation is not commutative. But then <> is not commutative either and noone is bothered by this :-)

@anfelor
Copy link
Contributor

anfelor commented Apr 11, 2018

feel free to send a PR to the Agda repository

I will do that tomorrow or friday.

In Latex, it is x\overset{s}{\longrightarrow}u. Is this OK?

I prefer the shorter a\xrightarrow{s}b, but to each his own :)

@snowleopard
Copy link
Owner Author

@anfelor I've been reviewing the definition and theorems of labelled algebraic graphs and realised that we never use distributivity of dioids (in fact, I don't think we even use the dioid multiplication at all). Perhaps, we should relax the constraint and talk about a semilattice with both least and greatest bounds? It feels a bit awkward, but is most general.

It's interesting that in order to introduce transitively-closed graphs, we will need the multiplication in order to define the following axiom:

b /= empty => a <s> b + b <t> c = a <s> b + b <t> c + a <s * t> c

We'll also need multiplication and distributivity to talk about the notions of reachability, paths, etc.

@anfelor
Copy link
Contributor

anfelor commented Aug 4, 2018

@snowleopard Do you mean something like this?

data LabelledGraph {S eq} (d : Semilattice S eq) (A : Set) : Set where ...
connect, overlay, ... on LabelledGraph

data StructuredGraph {D eq} (d : Dioid D eq) (A : Set) : Set where ...
reachable, transitive-closure, ... on StructuredGraph

embed ::  {A D S eqd eqs} {d : Dioid D eqd} {s : Semilattice S eqs } -> StructuredGraph d A -> LabelledGraph s A

But I don't see yet how to embed a semilattice into a dioid. In Haskell:

class Semilattice a where
  high :: a -- For all x: x `join` high = high
  low :: a  -- For all x: x `join` low = x
  join :: a -> a -> a -- associative, idempotent, commutative
  
class Dioid a where
  zero :: a
  one :: a
  (+) :: a -> a -> a
  (*) :: a -> a -> a
  
instance (Dioid a) => Semilattice a where
  join = (+)
  low = zero
  high = ??

Now, high = one works for most dioids we have considered so far but not for all, for example the longest distance dioid on page 4 of the semirings paper. Do we add one + x = one as a dioid law and thus rule out this case?

PS: I like your implementation of x -<e>- y, I didn't know such syntax was possible in Haskell.

@anfelor
Copy link
Contributor

anfelor commented Aug 5, 2018

Sorry, messed up the link at first; I was referring to http://stedolan.net/research/semirings.pdf

@snowleopard
Copy link
Owner Author

Do you mean something like this?

@anfelor Yes: low corresponds to the lack of information about connectivity and high to knowing everything about it. For path-like interpretations, low means unreachable, and high means 0 distance. For set-like representations, low corresponds to the empty set and high to the full set. Perhaps, we don't necessarily need to require high, because it may sometimes be not defined. For example, if we consider graphs whose vertices are types, or in Haskell terms TypeReps, then there is no sensible notion of "the set of all functions" between any two types, but we could still create graphs Int + Bool and Int -<even>- Bool, allowing us to represent diagrams from the Hask category (although the representation will probably be untyped, i.e. we won't be able to typecheck our diagrams). I guess there is just no sensible definition of connect at all in this case.

Now, high = one works for most dioids we have considered so far but not for all, for example the longest distance dioid on page 4 of the semirings paper. Do we add one + x = one as a dioid law and thus rule out this case?

Aha, interesting! If we define zero = low = LUnreachable then zero + x = x and everything is fine, but then if also define one = high = Distance 0 then we no longer have the law one + x = one -- for example, Distance 0 + Infinite = Infinite. So, I guess this is another example where we might not want to dictate that the semilattice necessarily fixes a particular upper bound.

We could do the following, assuming Semilattice e => Dioid e and low = zero:

overlay :: Semilattice e => Graph e a -> Graph e a -> Graph e a
overlay = LConnect low

connect :: Dioid e => Graph e a -> Graph e a -> Graph e a
connect = LConnect one

It's a bit less symmetric, but perhaps most general. Does this sound better to you?

PS: I like your implementation of x -<e>- y, I didn't know such syntax was possible in Haskell.

Cool :) It does look nice, especially with the Hasklig font I'm using. The only (minor) issue is that -< clashes with the Arrow notation.

@anfelor
Copy link
Contributor

anfelor commented Aug 7, 2018

It's a bit less symmetric, but perhaps most general. Does this sound better to you?

It sounds like a relatively easy change to make with potential benefits to users and much better than changing the dioid laws. It would also improve the usefulness of alga as a general graph representation.

Yes: low corresponds to the lack of information about connectivity and high to knowing everything about it.

Unfortunately, the semilattice class above has a name clash with monadic join and for meet-semilattices the relationship is reversed by convention, eg low = one. But maybe use a join-semilattice with a different name?

@snowleopard
Copy link
Owner Author

@anfelor What about using the following class hierarchy?

-- Associative, commutative and idempotent |+| with identity 'zero'
class Semilattice a where
    zero :: a
    (|+|) :: a -> a -> a

-- Associative |*| that distributes over |+| with identity 'one'
class Semilattice a => Dioid a where
    one :: a
    (|*|) :: a -> a -> a

@ghost
Copy link

ghost commented Aug 13, 2018

Hi. I'm jumping into the conversation late but I wanted clarification on what is wrong with the naive labelled connect operations. I don't see how it makes sense to talk about associativity of two different operations. What makes sense to me is asking whether they commute, and they don't:

(a <x> b) <y> (c <x> d) /= (a <y> b) <x> (c <y> d)

However, they still seem to satisfy some decomposition laws:

a <x> (b <y> c) = a <x> b + a <x> c + b <y> c
(a <x> b) <y> c = a <x> b + a <y> c + b <y> c

These laws seem intuitive to me and I think they hold for labelled graphs in general although I haven't proved it. In particular, the two ways of decomposing the following expression agree:

(a <x> b) <y> (c <z> d)

@snowleopard
Copy link
Owner Author

Hi @danharaj, welcome!

I agree with you: different labels lead to different operations, so associativity makes little sense (but when the labels are the same, associativity does hold). Also, your 4-way decomposition is very nice, I'll use it in the examples/tests, thank you!

But I don't understand this remark:

what is wrong with the naive labelled connect operations

What do you mean by "naive labelled connect operations"?

As far as I'm concerned, there is nothing wrong with our current design, and it does look rather straightforward to me, one could even call it naive :-) Essentially, we just add labelled connect operations and describe how they work by postulating a few axioms. We also establish a link with unlabelled graphs by noting that overlay and connect correspond to two specific labels.

@ghost
Copy link

ghost commented Aug 13, 2018

What do you mean by "naive labelled connect operations"?

Sorry, I got tripped up reading this thread; it seemed like the approach where you have one connect operator per edge label was inadequate and an alternative formulation was sought. I call it naive because it's the very first thing one would try and hope it works :) I see now that this is a special case of the formulation you all have been discussing via the lattice of subsets of edge labels.

@snowleopard
Copy link
Owner Author

Aha, I see :) This thread did get pretty long and confusing. I guess I better open a new one, describing the chosen design, just to figure out the remaining bits. Will do that in a few days.

@snowleopard
Copy link
Owner Author

I gave a Haskell eXchange 2018 talk about edge-labelled algebraic graphs:

https://skillsmatter.com/skillscasts/12361-labelled-algebraic-graphs

It mostly builds on the abstractions we discussed in this issue, but in the end I decided to go for a different hierarchy of type classes for edge labels: Monoid => Semiring => Dioid. A working implementation can be found in Algebra.Graph.Label and Algebra.Graph.Labelled modules in the master branch.

I intend to write a blog post about this, providing further details.

@anfelor Any comments? Do we need to modify your Agda proofs? I guess all results you proved for dioids still hold, but we might have some other interesting theorems for the case when the type of labels is not a dioid, but something weaker -- a monoid or a semiring.

snowleopard added a commit that referenced this issue Nov 29, 2018
This completes most of the work on adding edge labels #17.
@snowleopard
Copy link
Owner Author

I think we can finally close this 🎉 I'm quite happy with the final design. There are a few unpolished spots, but I'll be opening separate issues for them if need be.

Many thanks to everyone for an amazing discussion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants