Benchmark hasEdge #84

nobrakal · 2018-06-21T11:34:57Z

Hi,

This is just for the record:

I saw: https://github.com/snowleopard/alga/blob/master/src/Algebra/Graph.hs#L449 :

-- TODO: Benchmark to see if this implementation is faster than the default
-- implementation provided by the ToGraph type class.

So here is the benchmarks:

Alga use the the ToGraph definition, AlgaOld is the current implementation:

hasEdge

Description: Test if the given edge is in the graph (with arguments both in the graph and not in the graph (where applicable))

Mesh

	1	10	100	1000
Alga	30.38 ns	783.9 ns	11.84 μs	270.4 μs
AlgaOld	385.7 ns	2.330 μs	19.89 μs	219.1 μs

Clique

	1	10	100	1000
Alga	30.28 ns	2.733 μs	1.758 ms	870.8 ms
AlgaOld	384.7 ns	6.709 μs	588.7 μs	94.03 ms

SUMMARY:

Alga was the fastest 14 times
AlgaOld was the fastest 12 times

ABSTRACT:
(Based on an average of the ratio between largest benchmarks)

AlgaOld was 5.30 times faster than Alga

As the ToGraph option is faster on small graphs, I tried a combination of the two:

 hasEdge :: Ord a => a -> a -> Graph a -> Bool
hasEdge s t = (\g -> case foldg e v o c g of (_, _, r) -> r) . induce (`elem` [s, t])
   where
     e                             = (False   , False   , False                 )
     v x                           = (x == s  , x == t  , False                 )
     o (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt,             xst || yst)
     c (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt, xs && yt || xst || yst)

Which sadly is not better:

hasEdge

Description: Test if the given edge is in the graph (with arguments both in the graph and not in the graph (where applicable))

Mesh

	1	10	100	1000
Alga	61.87 ns	1.545 μs	18.95 μs	210.9 μs
AlgaOld	362.6 ns	2.288 μs	19.88 μs	215.3 μs

Clique

	1	10	100	1000
Alga	62.05 ns	5.046 μs	541.8 μs	123.9 ms
AlgaOld	362.1 ns	6.668 μs	560.4 μs	105.9 ms

SUMMARY:

Alga was the fastest 10 times
AlgaOld was the fastest 3 times

There was 13 ex-aequo

ABSTRACT:
(Based on an average of the ratio between largest benchmarks)

AlgaOld was 1.07 times faster than Alga

The text was updated successfully, but these errors were encountered:

snowleopard · 2018-06-21T16:03:18Z

Interesting, thanks! I don't fully understand what you benchmarked, can you show the definition you used for the Alga (new)?

Could it be that we should specialise ToGraph code to Graph or simply copy the code?

nobrakal · 2018-06-21T21:45:21Z

Sorry if I wasn't clear.

For the FIRST table, I used (copy/pasted from ToGraph)

hasEdge s t g = case foldg e v o c g of (_, _, r) -> r
      where
        e                             = (False   , False   , False                 )
        v x                           = (x == s  , x == t  , False                 )
        o (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt,             xst || yst)
        c (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt, xs && yt || xst || yst)

For the SECOND, I used

hasEdge :: Ord a => a -> a -> Graph a -> Bool
hasEdge s t = (\g -> case foldg e v o c g of (_, _, r) -> r) . induce (`elem` [s, t])
   where
     e                             = (False   , False   , False                 )
     v x                           = (x == s  , x == t  , False                 )
     o (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt,             xst || yst)
     c (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt, xs && yt || xst || yst)

snowleopard · 2018-06-22T21:21:19Z

Wow, that's pretty surprising to me. So, essentially, on clique we get a 10x slowdown, and the only difference for clique is the term xs && yt.

What about making the third field (or maybe all of them) of the tuple strict? Perhaps, we are accumulating huge unevaluated trees of Boolean expressions in the process?

snowleopard · 2018-06-22T21:25:44Z

One simple tweak could probably be as follows:

hasEdge s t g = case foldg e v o c g of (_, _, r) -> r
      where
        e                             = (False   , False   , False)
        v x                           = (x == s  , x == t  , False)
        o (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt, xst || yst)
        c (_, _, True)  (_, _, _)     = (True, True, True)
        c (_, _, _)     (_, _, True)  = (True, True, True)
        c (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt, xs && yt)

This essentially forces the third field for the connect case. But we could make it more elegant I'm sure.

nobrakal · 2018-06-26T09:42:42Z

Sorry for taking so long to answer. Your implementation is faster but not enough:

Clique

	1000
Alga	692.3 ms
AlgaOld	94.69 ms

Note that this is certainly the fault of edges. I have forget it, but I am running benchmarks using edges, and here not clique. For example, with the clique from alga:

Clique

	1000
ToGraph	58.46 μs
ForcingValues	61.72 μs
Actual	48.52 μs

Anyway, one advantage of the ToGraph function is that it require only an Eq instance, and not an Ord one like the actual alga is requiring (because of the use of isSubgraphOf

snowleopard · 2018-06-26T16:03:33Z

Note that this is certainly the fault of edges.

I don't understand what you mean: both algorithms are working on exactly the same input, right? It's not the fault of the input that one algorithm takes 7x longer than another :)

nobrakal · 2018-06-26T17:19:27Z

Mmh, yes your are right, I wanted to say to we must not forget that we are only using a very big stack of overlayed edges from edges :)

nobrakal · 2018-06-27T11:26:09Z

I spent some time trying to improve your implementation, and I think that it is maybe not worth spending time here, the actual implementation is faster everywhere (on real graphs made using edges and on using alga's clique).

The only improvement that can be made I think is something like:

hasEdge :: Eq a => a -> a -> Graph a -> Bool
hasEdge u v = hasEdge' . induce (`elem` [u, v])
  where
    hasEdge' (Overlay x y) = hasEdge' x || hasEdge' y
    hasEdge' (Connect x y) = let !isInLeft = hasVertex u x
                                 !isInRight = hasVertex v y
                              in (isInLeft && isInRight) || (isInLeft && hasEdge' x) || (hasEdge' y && isInRight)

    hasEdge' _ = False

Because it drop the Ord Constraint.

It can also be great to define hasLoop, which will be faster than hasEdge x x:

hasLoop x = hasLoop' . induce (==x)
  where
    hasLoop' (Overlay x y) = hasLoop' x || hasloop' y
    hasLoop' Connect{} = True
    hasLoop' _ = False

Which I think is working because induce remove Empty leaves.

snowleopard · 2018-06-27T15:23:29Z

@nobrakal I don't think your hasLoop is correct, it will return True for Connect Empty Empty :) Unless, of course, you rely on the feature of (some) implementations of induce that do constant folding to eliminate Empty whenever possible.

nobrakal · 2018-06-27T15:32:23Z

Yep totally, I was thinking with Algebra.Graph.induce which throw away empty leaves

snowleopard · 2018-06-27T15:38:57Z

@nobrakal Right, I see. I don't mind adding hasLoop into the API, so feel free to do that, but consistently across various modules.

So, what's the conclusion regarding hasEdge? I'm confused by so many different versions we discussed :) What is the most efficient implementation?

nobrakal · 2018-06-27T16:24:02Z

I don't mind adding hasLoop into the API, so feel free to do that, but consistently across various modules.

You don't mind to do it yourself but will merge if I do it ? I prefer to ask ^^

So, what's the conclusion regarding hasEdge? I'm confused by so many different versions we discussed :)

I was also ^^.

The actual one is the fastest, with graphs constructed with edges or with clique.
But a somewhat better version is:

hasEdge :: Eq a => a -> a -> Graph a -> Bool hasEdge u v = hasEdge' . induce (`elem` [u, v])
  where 
    hasEdge' (Overlay x y) = hasEdge' x || hasEdge' y 
    hasEdge' (Connect x y) = 
      let !isInLeft = hasVertex u x 
           !isInRight = hasVertex v y 
       in (isInLeft && isInRight) || (isInLeft && hasEdge' x) || (hasEdge' y && isInRight)
    hasEdge' _ = False

This improve a bit the time of the function (it is about 1.07 times faster), and mostly drop the Ord instance required on vertices because of isSubgraphOf in the current implementation. It is now only requiring Eq.
The "naive" implementation of hasEdge' works because the induced subgraph is very tiny thanks to the absence of Empty leaves

snowleopard · 2018-06-27T16:30:14Z

You don't mind to do it yourself but will merge if I do it ? I prefer to ask ^^

I don't mind either, but I won't find time to do it myself in the next couple of weeks, so feel free to do it :-)

OK, let's switch to your improved implementation -- it's not immediately obvious it is correct but it actually looks quite nice. Please send a PR!

nobrakal · 2018-06-28T11:15:31Z

I was going to send a PR, and while verifying my results, I benchmarked to hasEdge' function alone. It is quite better with graphs constructed with edges, but slower with clique: Here is the detailed benchmarks:

hasEdge :: Eq a => a -> a -> Graph a -> Bool
hasEdge u v = hasEdge'
  where
    hasEdge' (Overlay x y) = hasEdge' x || hasEdge' y
    hasEdge' (Connect x y) =
      let !isInLeft = hasVertex u x
          !isInRight = hasVertex v y
       in (isInLeft && isInRight) || (isInLeft && hasEdge' x) || (isInRight && hasEdge' y)
    hasEdge' _ = False

So without the suffixed induce. It is really faster hand drop the Ord instance:

### Clique
#### 1000
##### (0,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 62.98 ns | 1.000 |
| AlgaOld || 96.58 ms | 0.966 |
+---------++----------+-------+

##### (292,820)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 9.132 ms | 1.000 |
| AlgaOld || 96.28 ms | 0.984 |
+---------++----------+-------+

##### (997,999)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 20.61 ms | 1.000 |
| AlgaOld || 98.32 ms | 0.980 |
+---------++----------+-------+

##### (0,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 19.01 ms | 0.999 |
| AlgaOld || 92.89 ms | 0.982 |
+---------++----------+-------+

##### (1,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 26.18 ms | 0.999 |
| AlgaOld || 88.50 ms | 0.975 |
+---------++----------+-------+

##### (1,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 18.12 ms | 0.997 |
| AlgaOld || 91.48 ms | 0.983 |
+---------++----------+-------+

The first three tested edges are edges in the graph, the last three are not in the graph.
Basically, in addition to being faster to find edges not in the graph, it is really faster to find edges "in the front" of the graph.

Sadly this is "less" true for graph constructed using clique, since it is slower for edges in the graph but "at the end", but faster in other cases:

### Clique
#### 1000
##### (0,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 62.20 ns | 1.000 |
| AlgaOld || 46.53 μs | 1.000 |
+---------++----------+-------+

##### (292,820)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 52.41 μs | 1.000 |
|    Alga || 2.486 ms | 1.000 |
+---------++----------+-------+

##### (997,999)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 52.42 μs | 1.000 |
|    Alga || 5.969 ms | 0.999 |
+---------++----------+-------+

##### (0,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 12.29 μs | 1.000 |
| AlgaOld || 46.69 μs | 1.000 |
+---------++----------+-------+

##### (1,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 12.20 μs | 1.000 |
| AlgaOld || 45.93 μs | 0.999 |
+---------++----------+-------+

##### (1,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 12.34 μs | 0.998 |
| AlgaOld || 47.13 μs | 1.000 |
+---------++----------+-------+

snowleopard · 2018-06-29T22:36:37Z

Wow, what happens here?

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 52.42 μs | 1.000 |
|    Alga || 5.969 ms | 0.999 |
+---------++----------+-------+

It's 100 times slower here! Could you investigate why?

nobrakal · 2018-06-30T10:03:26Z

The clique is in the form Connect 0 (Connect 1 (Connect 2)), so because I run hasVertex for every Connect node, this take very long time if we search an edge "at then end".

That is why it works great with #80 representation ^^
I suspect that it will be great fo "balanced" graphs, with similar of nodes on each part of a "Connect"...

snowleopard · 2018-06-30T10:09:21Z

Ah, of course -- the version that calls hasVertex actually has quadratic complexity. That's why I used triples in my implementation -- to memoize the invocations of hasVertex.

nobrakal · 2018-06-30T17:24:42Z

I tried to do some memoization, but nothing great :p

So do you think we can approach the time of the version using hasVertex but using a better memoization ? Or should we just stick to the previous version (here #84 (comment) ).

Or even propose the two versions ? The hasVertex solution is still faster with graph constructed with edges and by far

snowleopard · 2018-06-30T17:34:34Z

So do you think we can approach the time of the version using hasVertex but using a better memoization ?

Well, you are comparing an O(n) implementation against an O(n^2) one. Of course the latter will always lose on large benchmarks.

Or even propose the two versions ?

I don't think there is any point in including a quadratic version into the library when there is a linear implementation. Think why the quadratic version manages to be faster on some examples -- perhaps that will help you figure out how to improve the linear version too!

nobrakal · 2018-07-01T15:23:40Z

Think why the quadratic version manages to be faster on some examples

It runs hasVertex only whenConnect is encountered, and with a graph constructed with edges, Connect is always like Connect (Vertex a) (Vertex b). So hasVertex is fast :)

I spent again some time trying to find a better version, but didn't find anything, so maybe just dropping the Ord instance can be great for now (as discussed in #84 (comment) )

snowleopard · 2018-07-02T00:29:05Z

@nobrakal Sorry, I'm not sure which implementation you refer to. Perhaps, just create a PR and we'll discuss it there :)

nobrakal · 2018-08-31T15:34:32Z

I think this is closed with your super version of hasEdge ?

snowleopard · 2018-08-31T16:52:49Z

@nobrakal Yes, thank you. I think we can close this.

This was referenced Jun 28, 2018

Has loop #86

Merged

Thought: We sometimes don't need to distinct Overlay and Connect #80

Open

nobrakal mentioned this issue Jul 2, 2018

Better hasEdge for Algrebra.Graph #88

Merged

snowleopard closed this as completed Aug 31, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Benchmark hasEdge #84

Benchmark hasEdge #84

nobrakal commented Jun 21, 2018 •

edited

Loading

snowleopard commented Jun 21, 2018

nobrakal commented Jun 21, 2018

snowleopard commented Jun 22, 2018

snowleopard commented Jun 22, 2018 •

edited

Loading

nobrakal commented Jun 26, 2018 •

edited

Loading

snowleopard commented Jun 26, 2018 •

edited

Loading

nobrakal commented Jun 26, 2018

nobrakal commented Jun 27, 2018 •

edited

Loading

snowleopard commented Jun 27, 2018

nobrakal commented Jun 27, 2018 •

edited

Loading

snowleopard commented Jun 27, 2018

nobrakal commented Jun 27, 2018

snowleopard commented Jun 27, 2018

nobrakal commented Jun 28, 2018

snowleopard commented Jun 29, 2018

nobrakal commented Jun 30, 2018

snowleopard commented Jun 30, 2018

nobrakal commented Jun 30, 2018

snowleopard commented Jun 30, 2018 •

edited

Loading

nobrakal commented Jul 1, 2018

snowleopard commented Jul 2, 2018

nobrakal commented Aug 31, 2018

snowleopard commented Aug 31, 2018

Benchmark hasEdge #84

Benchmark hasEdge #84

Comments

nobrakal commented Jun 21, 2018 • edited Loading

hasEdge

Mesh

Clique

hasEdge

Mesh

Clique

snowleopard commented Jun 21, 2018

nobrakal commented Jun 21, 2018

snowleopard commented Jun 22, 2018

snowleopard commented Jun 22, 2018 • edited Loading

nobrakal commented Jun 26, 2018 • edited Loading

Clique

Clique

snowleopard commented Jun 26, 2018 • edited Loading

nobrakal commented Jun 26, 2018

nobrakal commented Jun 27, 2018 • edited Loading

snowleopard commented Jun 27, 2018

nobrakal commented Jun 27, 2018 • edited Loading

snowleopard commented Jun 27, 2018

nobrakal commented Jun 27, 2018

snowleopard commented Jun 27, 2018

nobrakal commented Jun 28, 2018

snowleopard commented Jun 29, 2018

nobrakal commented Jun 30, 2018

snowleopard commented Jun 30, 2018

nobrakal commented Jun 30, 2018

snowleopard commented Jun 30, 2018 • edited Loading

nobrakal commented Jul 1, 2018

snowleopard commented Jul 2, 2018

nobrakal commented Aug 31, 2018

snowleopard commented Aug 31, 2018

nobrakal commented Jun 21, 2018 •

edited

Loading

snowleopard commented Jun 22, 2018 •

edited

Loading

nobrakal commented Jun 26, 2018 •

edited

Loading

snowleopard commented Jun 26, 2018 •

edited

Loading

nobrakal commented Jun 27, 2018 •

edited

Loading

nobrakal commented Jun 27, 2018 •

edited

Loading

snowleopard commented Jun 30, 2018 •

edited

Loading