Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overlays without the empty leaf at the end #103

Merged
merged 25 commits into from
Aug 10, 2018

Conversation

nobrakal
Copy link
Contributor

Hi,

As discussed in #99

This is not applying for connects because it will needs to use foldl and I don't think this is well optimized after (anyway, we don't use connects in the library code).

Note that this will drop performances of hasEdge because of a switch from the better case to the worse case (as well as hasVertex).

The rewrite rules seems to not interfere with other rules (like fusion of two maps after overlays).

@snowleopard
Copy link
Owner

Note that this will drop performances of hasEdge because of a switch from the better case to the worse case (as well as hasVertex).

This is rather annoying. So far I think this is the only aspect of benchmarking that feels to be done wrong. What do people do when benchmarking functions like elem in lists? Do they have similar issues?

@snowleopard
Copy link
Owner

This is not applying for connects because it will needs to use foldl and I don't think this is well optimized after (anyway, we don't use connects in the library code).

If possible, I'd like to have a solution that covers both overlays and connects. Instead of foldl we could also use foldr1 for connects, just like we do in non-empty graphs. Is foldr1 slower than foldr?

@nobrakal
Copy link
Contributor Author

This is rather annoying. So far I think this is the only aspect of benchmarking that feels to be done wrong. What do people do when benchmarking functions like elem in lists? Do they have similar issues?

I don't think there is a possible solution: we need to test a rather small finite set of values: thus we have to chose them.

For example: https://github.com/haskell-perf/sequences#indexing

The sequences have a size of 10005 and the index 10,100,1000 and 10000 are searched.

If possible, I'd like to have a solution that covers both overlays and connects. Instead of foldl we could also use foldr1 for connects, just like we do in non-empty graphs. Is foldr1 slower than foldr?

Yes foldr1 is slower:

foldr1 :: (a -> a -> a) -> t a -> a
foldr1 f xs = fromMaybe (errorWithoutStackTrace "foldr1: empty structure")
                (foldr mf Nothing xs)
  where
    mf x m = Just (case m of
                     Nothing -> x
                     Just y  -> f x y)

All values are encapsulated into Maybe

Indeed this seems to be a problem, since there is for example no better version for foldr1 for Data.List.NonEmpty...

@snowleopard
Copy link
Owner

For example: https://github.com/haskell-perf/sequences#indexing
The sequences have a size of 10005 and the index 10,100,1000 and 10000 are searched.

Aha, interesting. Maybe in benchmarks we should also list several points, separately, and omit the very first and last elements, which give rather unstable results. As for the regression suite, I think we should measure the worst-case time -- what do you think?

Indeed this seems to be a problem, since there is for example no better version for foldr1 for Data.List.NonEmpty...

Yes, indeed. Not sure what the right solution is here.

@nobrakal
Copy link
Contributor Author

Aha, interesting. Maybe in benchmarks we should also list several points, separately, and omit the very first and last elements, which give rather unstable results.

I will drop these very first and very lasts points you are right. I am already using several points (3 in the graphs and 1 outside for hasVertex for example).

As for the regression suite, I think we should measure the worst-case time -- what do you think?

The regression suite if for me just a warning, and thus needs to benchmark as many cases as possible. Let us after explain the result.

For example, if we take this foolish hasVertex:

hasVertex :: Eq a => a -> Graph a -> Bool
hasVertex x g = (foldg False (==x) (||) (||) g) && (not $ foldg True (/=x) (&&) (&&) g)

The worst-case runtime (a vertex not in the graph) is the same for the actual hasVertex and this one. Of course, if we test for actual vertices in the graph, this function takes 2 times longer than the actual one.

Yes, indeed. Not sure what the right solution is here.

We can leave connects as it is since the empty leaf is normally not a problem. It only bother us when we have to scale overlays.

@snowleopard
Copy link
Owner

I will drop these very first and very lasts points you are right.

👍

The regression suite if for me just a warning, and thus needs to benchmark as many cases as possible.

Yes, I think you are right. As long as false positives are not too frequent, we can live with them.

We can leave connects as it is since the empty leaf is normally not a problem.

This looks strangely asymmetric. Furthermore, NonEmpty.overlays has the same issue. So far we really only have a workaround for Algebra.Graph.overlays, which looks accidental. It's interesting to find a systematic solution that works for all these cases.

@nobrakal
Copy link
Contributor Author

nobrakal commented Aug 1, 2018

Ok this one was very tricky ^^.

The idea is to use:

foldr1f :: (a -> a -> a) -> (b -> a) -> b -> [b] -> a
foldr1f k f = go
  where
    go y ys =
      case ys of
        []     -> f y
        (x:xs) -> f y `k` go x xs

which is a slightly modified foldr without an empty case and allowing fusion:

overlays :: [Graph a] -> Graph a
overlays [] = empty
overlays (x:xs) = foldr1fId overlay x xs
{-# INLINE [0] overlays #-}

{-# RULES
"overlays/map"
  forall f xs.
    overlays (map f xs) =
      case xs of
        [] -> empty
        (y:ys) -> foldr1f overlay f y ys
 #-}

(where foldr1fId k = foldr1f k id)

Strangely, even with the rewrite rules firing, the time for edges (I use edges to test the fusion) was not good as the actual one. The problem came from uncurry edge part of edges. Using a version that uses directly the tuple solved the problem.

Note also that foldr1fId is mandatory. Using foldr1f k id in overlays was not working (and I don't know why).

@@ -303,7 +306,7 @@ vertices = overlays . map vertex
-- 'edgeCount' . edges == 'length' . 'Data.List.nub'
-- @
edges :: [(a, a)] -> Graph a
edges = overlays . map (uncurry edge)
edges = overlays . map edge'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems to be unrelated to the rest of the PR? Do you really need this? Presumable, fusion doesn't care if the function you map is edge' or uncurry edge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I was too happy to find something here ^^. I will open a separate issue

@snowleopard
Copy link
Owner

snowleopard commented Aug 1, 2018

Interesting, this looks like a step in the right direction :)

foldr1f looks a bit like foldMap, where you pass a map to Monoid and the Monoid dictionary explicitly and can therefore modify both on the fly. Although, I guess foldr1f only needs a Semigroup.

A couple of things are unclear to me:

  • Instead of passing b and [b] separately, could you pass NonEmpty? Semantically this looks identical, but perhaps, performance-wise your approach is faster.

  • Why have separate rewrite rules for overlays and connects -- could we use just one foldr1fId/map?

  • Do you need to export foldr1f from the Internal module, or will we always only need foldr1fId?

@nobrakal
Copy link
Contributor Author

nobrakal commented Aug 1, 2018

Instead of passing b and [b] separately, could you pass NonEmpty?

Of course, this is optimized away after!

Why have separate rewrite rules for overlays and connects -- could we use just one foldr1fId/map?

(In the new commit) I don't know how all of this will be inlined and I canno't restrain the inling of maybe or nonEmpty.

Do you need to export foldr1f from the Internal module, or will we always only need foldr1fId?

Since I use foldr1f in my rewrite rule in Algebra.Graph, Algebra.Graph.Internal has to export it.

New benchs:

Using [("Mesh",4),("Clique",4)] as graphs

## edges

### Mesh
#### 1000

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 102.6 μs | 1.000 |
|    Alga || 105.0 μs | 0.991 |
+---------++----------+-------+

### Clique
#### 1000

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 25.29 ms | 0.999 |
| AlgaOld || 25.94 ms | 0.995 |
+---------++----------+-------+


SUMMARY:


 There was 2 ex-aequo


ABSTRACT:
(Based on an average of the ratio between largest benchmarks)

 * Alga was 1.00 times faster than AlgaOld

@snowleopard
Copy link
Owner

Since I use foldr1f in my rewrite rule in Algebra.Graph, Algebra.Graph.Internal has to export it.

Indeed -- I didn't notice that the rewrite rules are a bit different: they use overlay and connect internally :)

Does performance stay the same if we add separate new functions overlaysMap and connectsMap and simpler rewrite rules? I prefer all rewrite rules to be very simple, since they are not normal Haskell code and are not highlighted, checked by linters etc.

@nobrakal
Copy link
Contributor Author

nobrakal commented Aug 2, 2018

First of all, apologies for the benchmarks in my last comment, it is faulty, please ignore it (I was using an other branch of my repo, with the same edges).

Now two things:

  • The interesting part is:
connects :: [Graph a] -> Graph a
connects = concatg connect

concatg :: (Graph a -> Graph a -> Graph a) -> [Graph a] -> Graph a
concatg combine = maybe empty (foldr1fId combine) . nonEmpty
{-# INLINE [0] concatg #-}

{-# RULES
 "concatg/map"
  forall c f xs.
    concatg c (map f xs) = concatgMap c f xs
  #-}

-- | Utilitary function for rewrite rules of 'overlays' and 'connects'
concatgMap :: (Graph a -> Graph a -> Graph a) -> (b -> Graph a) -> [b] -> Graph a
concatgMap combine f = maybe empty (foldr1f combine f) . nonEmpty

So there is now only one (and pretty simple) rewrite rules which is better !

  • There was still a little drop (quoting the last run of the regression suite: edges: 1.24 (bad)). So I have investigated, and found that GHC created (with an other name)
edge' :: (a,a) -> Graph a
edge' = uncurry edge
{-# INLINE edge' #-}

but of course without the INLINE pragma. So the problem is that edge' is inlined by default in the current edges but not in the new one. So adding the function by hand and the pragma solved the problem.

edges' does not improve anything alone, but I think it allows more things to happen when used with the rewrite rules:

Comparing this actual PR with master:

Using [("Mesh",4),("Clique",4)] as graphs

## edges

### Mesh
#### 1000

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 103.1 μs | 0.999 |
| AlgaOld || 121.7 μs | 0.999 |
+---------++----------+-------+

### Clique
#### 1000

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 25.30 ms | 0.994 |
| AlgaOld || 27.77 ms | 0.991 |
+---------++----------+-------+


SUMMARY:

 * Alga was the fastest 1 times

 There was 1 ex-aequo


ABSTRACT:
(Based on an average of the ratio between largest benchmarks)

 * Alga was 1.14 times faster than AlgaOld

@snowleopard
Copy link
Owner

So there is now only one (and pretty simple) rewrite rules which is better !

That's great!

So I have investigated, and found that GHC created (with an other name)

Could this be common expression elimination, mentioned by Simon in your ticket? If we use uncurry edge multiple times, perhaps GHC tries to factor it out. But then it doesn't inline it? Sounds like a bug to me...

@nobrakal
Copy link
Contributor Author

nobrakal commented Aug 3, 2018

Could this be common expression elimination, mentioned by Simon in your ticket?

I think yes.
Here is the -ddump-simpl concerned part of this PR without edge':

-- RHS size: {terms: 13, types: 18, coercions: 0, joins: 0/0}
g1_rhwJ :: forall a. (a, a) -> Graph a
[GblId, Arity=1, Caf=NoCafRefs, Str=<L,U(1*U,1*U)>m4]
g1_rhwJ
  = \ (@ a_ad7M) (p_ifyt :: (a_ad7M, a_ad7M)) ->
      Algebra.Graph.Connect
        @ a_ad7M
        (Algebra.Graph.Vertex
           @ a_ad7M (case p_ifyt of { (x_ifyw, ds_ifyx) -> x_ifyw }))
        (Algebra.Graph.Vertex
           @ a_ad7M (case p_ifyt of { (ds_ifyB, y_ifyC) -> y_ifyC }))

Rec {
-- RHS size: {terms: 15, types: 24, coercions: 0, joins: 0/0}
Algebra.Graph.path1 [InlPrag=NOUSERINLINE[0], Occ=LoopBreaker]
  :: forall a. (a, a) -> [(a, a)] -> Graph a
[GblId, Arity=2, Caf=NoCafRefs, Str=<L,U(U,U)><S,1*U>]
Algebra.Graph.path1
  = \ (@ a_ad7M)
      (ww2_s8re :: (a_ad7M, a_ad7M))
      (ww3_s8rf :: [(a_ad7M, a_ad7M)]) ->
      case ww3_s8rf of {
        [] -> g1_rhwJ @ a_ad7M ww2_s8re;
        : x_a4RW xs_a4RX ->
          Algebra.Graph.Overlay
            @ a_ad7M
            (g1_rhwJ @ a_ad7M ww2_s8re)
            (Algebra.Graph.path1 @ a_ad7M x_a4RW xs_a4RX)
      }
end Rec }

-- RHS size: {terms: 10, types: 19, coercions: 0, joins: 0/0}
edges :: forall a. [(a, a)] -> Graph a
[GblId,
 Arity=1,
 Caf=NoCafRefs,
 Str=<S,1*U>,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True, Guidance=IF_ARGS [30] 50 10}]
edges
  = \ (@ a_ad7M) (x_X7Kg :: [(a_ad7M, a_ad7M)]) ->
      case x_X7Kg of {
        [] -> Algebra.Graph.Empty @ a_ad7M;
        : a1_afx9 as_afxa -> Algebra.Graph.path1 @ a_ad7M a1_afx9 as_afxa
      }

So g1_rhwJ is the generated function corresponding to uncurry edge.

Now with edge'

Rec {
-- RHS size: {terms: 33, types: 48, coercions: 0, joins: 0/0}
Algebra.Graph.path1 [InlPrag=NOUSERINLINE[0], Occ=LoopBreaker]
  :: forall a. (a, a) -> [(a, a)] -> Graph a
[GblId, Arity=2, Caf=NoCafRefs, Str=<L,U(1*U,1*U)><S,1*U>]
Algebra.Graph.path1
  = \ (@ a_a9cz)
      (ww2_idnB :: (a_a9cz, a_a9cz))
      (ww3_idnC :: [(a_a9cz, a_a9cz)]) ->
      case ww3_idnC of {
        [] ->
          Algebra.Graph.Connect
            @ a_a9cz
            (Algebra.Graph.Vertex
               @ a_a9cz (case ww2_idnB of { (x_ic6L, ds_ic6M) -> x_ic6L }))
            (Algebra.Graph.Vertex
               @ a_a9cz (case ww2_idnB of { (ds_ic6Q, y_ic6R) -> y_ic6R }));
        : x_idnH xs_idnI ->
          Algebra.Graph.Overlay
            @ a_a9cz
            (Algebra.Graph.Connect
               @ a_a9cz
               (Algebra.Graph.Vertex
                  @ a_a9cz (case ww2_idnB of { (x1_ic6L, ds_ic6M) -> x1_ic6L }))
               (Algebra.Graph.Vertex
                  @ a_a9cz (case ww2_idnB of { (ds_ic6Q, y_ic6R) -> y_ic6R })))
            (Algebra.Graph.path1 @ a_a9cz x_idnH xs_idnI)
      }
end Rec }

-- RHS size: {terms: 10, types: 19, coercions: 0, joins: 0/0}
edges :: forall a. [(a, a)] -> Graph a
[GblId,
 Arity=1,
 Caf=NoCafRefs,
 Str=<S,1*U>,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True, Guidance=IF_ARGS [30] 50 10}]
edges
  = \ (@ a_a901) (x_ic5N :: [(a_a901, a_a901)]) ->
      case x_ic5N of {
        [] -> Algebra.Graph.Empty @ a_a901;
        : a1_ic6Z as_ic70 -> Algebra.Graph.path1 @ a_a901 a1_ic6Z as_ic70
      }

Here edge' got inlined :)

Sounds like a bug to me...

I don't know if we always want uncurry edge to be always inlined...

@snowleopard
Copy link
Owner

I don't know if we always want uncurry edge to be always inlined...

Well, the programmer (we) decided to inline this function, manually, but GHC effectively undid the inlining by factoring it out into a new function, which reduced the performance of our program.

How is this not a bug? I guess this might happen quite often with CSE, so what shall programmers do to prevent this? Introduce functions like edge' with an INLINE pragma everywhere? This sounds like a terrible solution.

@nobrakal
Copy link
Contributor Author

nobrakal commented Aug 3, 2018

Well, the programmer (we) decided to inline this function,

Do we ?

overlays . map (uncurry edge)

uncurry edge is not inlined here, or I am missing something ?

Here GHC only factorized up uncurry edge and after chose not to inline it once the rewrite rule was applied because uncurry edge is now concatgMap.

@snowleopard
Copy link
Owner

Do we ?

Yes, we wrote it inlined as uncurry edge instead of factored edge' and edge' = uncurry edge :)

uncurry edge is not inlined here, or I am missing something ?

It is inlined! Compared to the same expression with edge'.

because uncurry edge is now concatgMap

I don't understand what you mean here, but it looks like we are mixing multiple separate concerns.

Let me double check: you are saying that simply by going from

... overlays . map (uncurry edge) ...

to

... overlays . map edge' ...

edge' :: (a,a) -> Graph a
edge' = uncurry edge
{-# INLINE edge' #-}

we get a performance increase. Is this correct?

@nobrakal
Copy link
Contributor Author

nobrakal commented Aug 3, 2018

Is this correct?

Yes, but remember the rewrite rule !

overlays . map (uncurry edge)

is changed in

maybe empty (foldr1f overlay edge') . nonEmpty

Without the rewrite rule there is no change in performances :)

@nobrakal
Copy link
Contributor Author

nobrakal commented Aug 4, 2018

@snowleopard The problem is mainly on little graphs, and my solution wasn't one :(

I removed the INLINE pragma and all seems fine... I don't know what happened here !

Can we use the same approach in non-empty graphs too?

I tried but sadly, NonEmpty.map has no INLINE pragma, that is it can be inlined at any time and particularly before the rewrite rules has a chance to fire. Moreover NonEmpty.edges is already implemented without NonEmpty.overlays, so it will be an improvement only for connects composed with a map.

Does it make sense for us to add newtypes Overlay and Connect for graphs?

That could be great, Sum and Product seems natural for the algebra :)
mconcat is overlays and connects!

But a problem comes with foldMap since:

    foldMap :: Monoid m => (a -> m) -> t a -> m
    {-# INLINE foldMap #-}
    -- This INLINE allows more list functions to fuse. See Trac #9848.
    foldMap f = foldr (mappend . f) mempty

So it add an empty leaf at the end... And I don't know if there is way to use rewrite rules since there is an INLINE pragma!

@snowleopard
Copy link
Owner

snowleopard commented Aug 4, 2018

I tried but sadly, NonEmpty.map has no INLINE pragma, that is it can be inlined at any time and particularly before the rewrite rules has a chance to fire. Moreover NonEmpty.edges is already implemented without NonEmpty.overlays, so it will be an improvement only for connects composed with a map.

Which NonEmpty.map are you referring to? I don't understand what the problem is. If the issue is with the NonEmpty argument of foldr1f then perhaps we should revert back to passing the head and the list separately as arguments instead of using NonEmpty.

You seem to have found a generally useful primitive, and I'd like to apply it uniformly everywhere where we do folding of lists with graphs and need to avoid empty leaves. Let's figure out how to uniformly handle both Algebra.Graph and Algebra.Graph.NonEmpty, without losing performance compared to conventional solutions using foldr.

But a problem comes with foldMap since:

I didn't mean we should be using the standard foldMap -- as you rightly point out, it adds an unnecessary mempty leave. But I think it may be possible to change the current implementation of foldr1f into something more foldMap-like where we pass the combining function via the monoid (or even better semigroup) dictionary.

@nobrakal
Copy link
Contributor Author

nobrakal commented Aug 4, 2018

Which NonEmpty.map are you referring to? I don't understand what the problem is. If the issue is with the NonEmpty argument of foldr1f then perhaps we should revert back to passing the head and the list separately as arguments instead of using NonEmpty.

About Data.List.NonEmpty.map. The rewrite rule target an expression like:

concatg c (map f xs)

But because Data.List.NonEmpty.map has no pragma concerning its inlining, it can be inlined at any time, and thus prevent the rule to fire. That is not the case with Ghc.Base.map which have {-# NOINLINE [0] map #-}

Let's figure out how to uniformly handle both Algebra.Graph and Algebra.Graph.NonEmpty, without losing performance compared to conventional solutions using foldr.

Since edges is already not implemented in term of overlays, the implementation can be switched to use foldr1f. But a pretty solution like in Algebra.Graph seems complicated

But I think it may be possible to change the current implementation of foldr1f into something more foldMap-like

Totally, I will update the PR soon :)

@nobrakal
Copy link
Contributor Author

nobrakal commented Aug 4, 2018

Ok, this don't build anymore with old GHC, but I wanted advices:

newtype Overlaying a = Overlaying {getOverlaying :: Graph a}
    deriving (Foldable, Functor, Show, Traversable)

instance Semigroup (Overlaying a) where
    (<>) = coerce (overlay :: Graph a -> Graph a -> Graph a)
    stimes = stimesIdempotentMonoid

instance Monoid (Overlaying a) where
    mempty = Overlaying empty

newtype Connecting a = Connecting {getConnecting :: Graph a}
    deriving (Foldable, Functor, Show, Traversable)

instance Semigroup (Connecting a) where
    (<>) = coerce (connect :: Graph a -> Graph a -> Graph a)
    stimes = stimesMonoid

instance Monoid (Connecting a) where
     mempty = Connecting empty

-- [...]

edges :: [(a, a)] -> Graph a
edges = overlays . map (uncurry edge)

overlays :: [Graph a] -> Graph a
overlays = getOverlaying . sconcatM . coerce
{-# INLINE [0] overlays #-}

connects :: [Graph a] -> Graph a
connects = getConnecting . sconcatM . coerce
{-# INLINE [0] connects #-}

{-# RULES
 "overlays/map" forall f xs. overlays (map f xs) = getOverlaying (sconcatMap (coerce . f) xs);
 "connects/map" forall f xs. connects (map f xs) = getConnecting (sconcatMap (coerce . f) xs)
  #-}

sconcatM :: Monoid m => [m] -> m
sconcatM = maybe mempty sconcat . nonEmpty

-- | Utilitary function for rewrite rules of 'overlays' and 'connects'
sconcatMap :: Monoid m => (b -> m) -> [b] -> m
sconcatMap f = maybe mempty (sconcatf f) . nonEmpty

-- | Function allowing fusion between 'sconcat' and a composed 'map'
sconcatf :: Semigroup a => (b -> a) -> NonEmpty b -> a
sconcatf f (a :| as) = go a as
  where
    go b (c:cs) = f b <> go c cs
    go b []     = f b
{-# INLINABLE sconcatf #-}

Is this approximately what you had in mind ?

The rewrite rules are becoming a bit more complicated sadly.

@snowleopard
Copy link
Owner

But because Data.List.NonEmpty.map has no pragma concerning its inlining, it can be inlined at any time, and thus prevent the rule to fire. That is not the case with Ghc.Base.map which have {-# NOINLINE [0] map #-}

Interesting. Do you think this may be a mistake in Data.List.NonEmpty? Or is it intentional?

What I don't understand is where the Data.List.NonEmpty.map will appear. Referring to the latest code, I think this is how we will rewrite overlays1, connects1, vertices1 and edges1:

overlays1 :: NonEmpty (NonEmptyGraph a) -> NonEmptyGraph a
overlays1 = getOverlaying . sconcat . coerce 

connects1 :: NonEmpty (NonEmptyGraph a) -> NonEmptyGraph a
connects1 = getConnecting . sconcat . coerce

vertices1 :: NonEmpty a -> NonEmptyGraph a
vertices1 = getOverlaying . sconcatf vertex . coerce

edges1 :: NonEmpty (a, a) -> NonEmptyGraph a
edges1 = getOverlaying . sconcatf (uncurry edge) . coerce

There is no Data.List.NonEmpty.map!

But a pretty solution like in Algebra.Graph seems complicated

I guess you mean we can't write simply vertices1 = overlays1 . Data.List.NonEmpty.map vertex, right? Indeed. Here, if map is inlined too early we will not be able to apply our rule to rewrite this into sconcatf. What about trying fmap instead of Data.List.NonEmpty.map?

Is this approximately what you had in mind ?

Yes! I would prefer to later rename Overlaying and Connecting to Overlay and Connect, respectively, but we can't do this while we are exporting Overlay and Connect constructors. Also, I wonder if it's possible to define these newtypes somehow generally, so we don't need to redefine them for AdjacencyMap etc.?

@snowleopard
Copy link
Owner

By the way, are you sure these rules can match NonEmpty.map?

 "overlays/map" forall f xs. overlays (map f xs) = getOverlaying (sconcatMap (coerce . f) xs);
 "connects/map" forall f xs. connects (map f xs) = getConnecting (sconcatMap (coerce . f) xs)

Presumably, the map here refers to Data.List.map, so it will never match Data.List.NonEmpty.map.

@nobrakal
Copy link
Contributor Author

nobrakal commented Aug 4, 2018

Interesting. Do you think this may be a mistake in Data.List.NonEmpty? Or is it intentional?

I don't know at all :) It does not seem to be intentional... There is no comment concerning the inlining.

What I don't understand is where the Data.List.NonEmpty.map will appear [...]

See the last paragraph.

I guess you mean we can't write simply vertices1 = overlays1 . Data.List.NonEmpty.map vertex, right? Indeed. Here, if map is inlined too early we will not be able to apply our rule to rewrite this into sconcatf. What about trying fmap instead of Data.List.NonEmpty.map?

fmap is even worse, rewrite rules doesn't work well with class function !

I would prefer to later rename Overlaying and Connecting to Overlay and Connect,

Totally agree there are very bad names ^^

I wonder if it's possible to define these newtypes somehow generally

Good question, maybe with the Graph class ? I will investigate.

By the way, are you sure these rules can match NonEmpty.map?

I think they can't because NonEmpty.map will be inlined... So you are right, we can implement edges1 as you said. But we cannot provide a general rule that will automatically merge overlays1 and a NonEmpty.map, as we are trying to do with Algebra.Graph

@snowleopard
Copy link
Owner

I don't know at all :) It does not seem to be intentional... There is no comment concerning the inlining.

I suggest you try to find this out :) Perhaps, just raise a ticket, explaining the problem. You could use the classic map fusion example -- one can't write a rule for map f . map g = map (f . g) for non-empty lists without the {-# NOINLINE [0] map #-} pragma.

Good question, maybe with the Graph class ? I will investigate.

Yes, it might work!

@nobrakal
Copy link
Contributor Author

nobrakal commented Aug 4, 2018

Yes, it might work!

Indeed something like:

newtype Overlaying a = Overlaying {getOverlaying :: a}

instance Graph a => Semigroup (Overlaying a) where
    (<>) = coerce (overlay :: a -> a -> a)
    stimes = stimesIdempotentMonoid

instance Graph a => Monoid (Overlaying a) where
    mempty = Overlaying empty

-- [...]

 overlays :: Graph g => [g] -> g
 overlays = getOverlaying . sconcatM . coerce

typecheck. Thus it might certainly work ^^.

I never played with the Graph class. How it is interacting with other modules ? Can we use definitions in Algebra.Graph.Class in other modules ?

@snowleopard
Copy link
Owner

@nobrakal Nice!

Can we use definitions in Algebra.Graph.Class in other modules ?

That's a problem, actually. I did quite a lot of work to decouple Algebra.Graph from the type class, so I would prefer not to re-introduce a dependency on it...

For now, I suggest we use the simple version, without any newtypes. But when we make the Graph data type abstract (#95), we will come back to this idea.

@nobrakal
Copy link
Contributor Author

nobrakal commented Aug 8, 2018

@snowleopard I think I found the solution, and without rewrite rules :)

The main function is

foldr1Safe :: (a -> a -> a) -> [a] -> Maybe a
foldr1Safe f = foldr mf Nothing
  where
    mf x m = Just (case m of
                        Nothing -> x
                        Just y  -> f x y)
{-# INLINE foldr1Safe #-}

So, this is how is implemented foldr1 but without the fromMaybe (error ....) part.

We now have

edges :: [(a, a)] -> Graph a
edges = overlays . map (uncurry edge)

overlays :: [Graph a] -> Graph a
overlays = concatg overlay

concatg :: (Graph a -> Graph a -> Graph a) -> [Graph a] -> Graph a
concatg f = fromMaybe empty . foldr1Safe f

And for non-empty graphs:

edges1 :: NonEmpty (a, a) -> NonEmptyGraph a
edges1  = overlays1 . fmap (uncurry edge)

overlays1 :: NonEmpty (NonEmptyGraph a) -> NonEmptyGraph a
overlays1 (x :| xs) = maybe x (overlay x) $ foldr1Safe overlay xs

Benchs:

edges: 0.93 (OK)
edges1: 1.03 (OK)

So what happened ?

foldr1 is just a foldr with values encapsulated in Maybe, so it merge perfectly.

The only trick is to inline foldr1Safe so, for a non-empty list, the fmap can be merged.
(because map f ~(a :| as) = f a :| fmap f as).

@snowleopard
Copy link
Owner

@nobrakal This is awesome! Much cleaner solution :-)

One inconsistency is that you use concatg in one module, but not in the other. Perhaps you need to add concatg1 for consistency?

@nobrakal
Copy link
Contributor Author

One inconsistency is that you use concatg in one module, but not in the other. Perhaps you need to add concatg1 for consistency?

You are totally right, I have added a couple of comments too, to remind the standard equivalent behind concatg and concatg1 !

@snowleopard
Copy link
Owner

@nobrakal Great! I'm happy to merge this PR if you think it's ready.

@nobrakal
Copy link
Contributor Author

For me all is green :)

@snowleopard snowleopard merged commit 8a5c9d9 into snowleopard:master Aug 10, 2018
@snowleopard
Copy link
Owner

Merged -- thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants