New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quadratic performance issues #32

Open
jwaldmann opened this Issue Feb 5, 2016 · 14 comments

Comments

Projects
None yet
4 participants
@jwaldmann

jwaldmann commented Feb 5, 2016

Try this:

import Text.PrettyPrint
import System.Environment
main = do
  [ s ] <- getArgs
  print $ iterate ( \ x -> fsep [ text "a" , x <+> text "b" ] ) empty !! read s

on my machine:

input | runtime in sec

100   |  0.1
200   |  1.1
300   |  4.4
400   | 11.4

This testcase is simplified from https://ghc.haskell.org/trac/ghc/ticket/7666
I think this bug (gut feeling - some quadratic behaviour) has been sitting there for a long time.
I do think this is serious. I think it also hurts haddock.

pretty-1.1.3.2 for ghc-8.0.0.20160111 and pretty-1.1.2.0 for ghc-7.10.3

ghc-mirror pushed a commit to ghc/ghc that referenced this issue Feb 6, 2016

@thomie

This comment has been minimized.

Show comment
Hide comment
@thomie

thomie Feb 6, 2016

Member

Changing $! to $ in the function beside helps quite a bit:

input before (s) after (s)
100 0.19 0.12
200 1.7 0.35
400 17 1.3

This is with ghc-7.10.3 -O and pretty-1.1.3.2.

git bisect points at 6e01b2e and ghc/ghc@ac88f11.

I tested the same change in GHC's copy of pretty (compiler/utils/Pretty), hoping it would make GHC massively faster! Unfortunately, it just had a negative effect on T3294:

  tests/alloc/T3294     2679798176  + 4.21% 2792570824  bytes
Member

thomie commented Feb 6, 2016

Changing $! to $ in the function beside helps quite a bit:

input before (s) after (s)
100 0.19 0.12
200 1.7 0.35
400 17 1.3

This is with ghc-7.10.3 -O and pretty-1.1.3.2.

git bisect points at 6e01b2e and ghc/ghc@ac88f11.

I tested the same change in GHC's copy of pretty (compiler/utils/Pretty), hoping it would make GHC massively faster! Unfortunately, it just had a negative effect on T3294:

  tests/alloc/T3294     2679798176  + 4.21% 2792570824  bytes
@jwaldmann

This comment has been minimized.

Show comment
Hide comment
@jwaldmann

jwaldmann Feb 6, 2016

Interesting. Well, the bug seems to hit only for certain kinds of nesting. But I have a few applications where I think it matters. And I have seen horrendous ghc and/or haddock runtimes for modules with hundreds of identifiers, e.g., in OpenGLRaw, and I have a hunch that this is related. You could try your patched ghc on these.

I think 1second runtime for outputting 2k chars is still way high. For a similar test case with wl-pprint, it is more like 10 millisecs (but the combinators have different behaviour)

jwaldmann commented Feb 6, 2016

Interesting. Well, the bug seems to hit only for certain kinds of nesting. But I have a few applications where I think it matters. And I have seen horrendous ghc and/or haddock runtimes for modules with hundreds of identifiers, e.g., in OpenGLRaw, and I have a hunch that this is related. You could try your patched ghc on these.

I think 1second runtime for outputting 2k chars is still way high. For a similar test case with wl-pprint, it is more like 10 millisecs (but the combinators have different behaviour)

@ndmitchell

This comment has been minimized.

Show comment
Hide comment
@ndmitchell

ndmitchell Jun 1, 2016

Contributor

I started debugging a stack overflow in Hoogle, see ndmitchell/hoogle#167, and it brought me to exactly this location. Specifically, given the operation:

import Text.PrettyPrint.Annotated.HughesPJ
main = print $ hsep $ replicate 1000 $ text "neil"

This takes an unbounded amount of stack. If I replace:

- beside (TextBeside t p)    g q   = TextBeside t $! rest
+ beside (TextBeside t p)    g q   = TextBeside t $ rest

Then it takes a constant amount of stack. It turns an O(n) memory algorithm into an O(1) algorithm, so seems like it should be considered a fix. @dterei, any thoughts?

Contributor

ndmitchell commented Jun 1, 2016

I started debugging a stack overflow in Hoogle, see ndmitchell/hoogle#167, and it brought me to exactly this location. Specifically, given the operation:

import Text.PrettyPrint.Annotated.HughesPJ
main = print $ hsep $ replicate 1000 $ text "neil"

This takes an unbounded amount of stack. If I replace:

- beside (TextBeside t p)    g q   = TextBeside t $! rest
+ beside (TextBeside t p)    g q   = TextBeside t $ rest

Then it takes a constant amount of stack. It turns an O(n) memory algorithm into an O(1) algorithm, so seems like it should be considered a fix. @dterei, any thoughts?

@dterei

This comment has been minimized.

Show comment
Hide comment
@dterei

dterei Jun 1, 2016

Member

Sure. Do you mind submitting a PR with a test-case please?

Member

dterei commented Jun 1, 2016

Sure. Do you mind submitting a PR with a test-case please?

@ndmitchell

This comment has been minimized.

Show comment
Hide comment
@ndmitchell

ndmitchell Jun 1, 2016

Contributor

Sure. @thomie, I intend to remove that one bang, do you have any others I should include in the ticket?

I found an even better test case, take 10 $ render $ hsep $ repeat $ text "neil" works with the new formulation and loops with the old.

Contributor

ndmitchell commented Jun 1, 2016

Sure. @thomie, I intend to remove that one bang, do you have any others I should include in the ticket?

I found an even better test case, take 10 $ render $ hsep $ repeat $ text "neil" works with the new formulation and loops with the old.

@thomie

This comment has been minimized.

Show comment
Hide comment
@thomie

thomie Jun 1, 2016

Member

Fine by me.

In case you know the answer, it would be great if you could add a comment or Note explaining why the other $!s are necessary (or a test even). They were all together added in 6e01b2e.

Member

thomie commented Jun 1, 2016

Fine by me.

In case you know the answer, it would be great if you could add a comment or Note explaining why the other $!s are necessary (or a test even). They were all together added in 6e01b2e.

@ndmitchell

This comment has been minimized.

Show comment
Hide comment
@ndmitchell

ndmitchell Jun 1, 2016

Contributor

To be clear, I can demonstrate that one particular bang is harmful - I have no opinion on the others - but my guess is someone had bangs and thought they added performance so sprayed them with a machine gun. It's a standard Haskell optimisation without benchmarking trick 😞

Contributor

ndmitchell commented Jun 1, 2016

To be clear, I can demonstrate that one particular bang is harmful - I have no opinion on the others - but my guess is someone had bangs and thought they added performance so sprayed them with a machine gun. It's a standard Haskell optimisation without benchmarking trick 😞

@jwaldmann

This comment has been minimized.

Show comment
Hide comment
@jwaldmann

jwaldmann Jun 2, 2016

I am glad that this issue is getting some attention.

The worrisome thing is that my benchmark above is basically an iteration of a random (!) small context
(\ x -> ... ). So even if behaviour improves for this particular case, others should be tested (as ndm did) - and it's probably best to enumerate contexts in a smallcheck style. Let me see if I find the time to do that.

jwaldmann commented Jun 2, 2016

I am glad that this issue is getting some attention.

The worrisome thing is that my benchmark above is basically an iteration of a random (!) small context
(\ x -> ... ). So even if behaviour improves for this particular case, others should be tested (as ndm did) - and it's probably best to enumerate contexts in a smallcheck style. Let me see if I find the time to do that.

@ndmitchell

This comment has been minimized.

Show comment
Hide comment
@ndmitchell

ndmitchell Jun 2, 2016

Contributor

@jwaldmann - so I have one specific place which causes a more discrete regression (overly strict, different space complexity) - so is worth fixing independent of any performance measurements. Is it worth waiting for your benchmarks or shall I pull request it today?

Contributor

ndmitchell commented Jun 2, 2016

@jwaldmann - so I have one specific place which causes a more discrete regression (overly strict, different space complexity) - so is worth fixing independent of any performance measurements. Is it worth waiting for your benchmarks or shall I pull request it today?

@jwaldmann

This comment has been minimized.

Show comment
Hide comment
@jwaldmann

jwaldmann Jun 2, 2016

Don't wait for me. I made this for testing: https://github.com/jwaldmann/pretty-test

jwaldmann commented Jun 2, 2016

Don't wait for me. I made this for testing: https://github.com/jwaldmann/pretty-test

@ndmitchell

This comment has been minimized.

Show comment
Hide comment
@ndmitchell

ndmitchell Jun 2, 2016

Contributor

Pull request for my piece as #35.

Contributor

ndmitchell commented Jun 2, 2016

Pull request for my piece as #35.

@jwaldmann

This comment has been minimized.

Show comment
Hide comment
@jwaldmann

jwaldmann Jun 2, 2016

For reference, here's the "winner" (most expensive context with depth 3 -- according to SmallCheck).
I write it in the form that you can execute in ghci:

import Text.PrettyPrint.HughesPJ

:set +s
length $ render $ iterate ( \ hole ->  sep [text  "l", cat [hole], text  "l"] ) (text  "l") !! 1000

4001
(10.42 secs, 9,684,556,640 bytes)

This is quite consistent for ghc-7.6.3, 7.8.4, 7.10.3, while ghc-8.0.1 is better by one second.

jwaldmann commented Jun 2, 2016

For reference, here's the "winner" (most expensive context with depth 3 -- according to SmallCheck).
I write it in the form that you can execute in ghci:

import Text.PrettyPrint.HughesPJ

:set +s
length $ render $ iterate ( \ hole ->  sep [text  "l", cat [hole], text  "l"] ) (text  "l") !! 1000

4001
(10.42 secs, 9,684,556,640 bytes)

This is quite consistent for ghc-7.6.3, 7.8.4, 7.10.3, while ghc-8.0.1 is better by one second.

@jwaldmann

This comment has been minimized.

Show comment
Hide comment
@jwaldmann

jwaldmann Jun 2, 2016

What does the original prettyprint paper ( http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.8777 ) say about (asymptotic) performance?

There are no exact claims or proofs, but end of Section 9 has "seems to grow slightly faster than linearly" and "far superior to O(n^2)".

Really? Here is the runtimes for rendering iterations of context mentioned previously (x axis: iterations, y axis: time in seconds, average over 3 executions). (forgive the crude rendering)

d-1000

Looks accidentally quadratic ?

I think that for a (pretty)printing library, anything more than linear is unacceptable , and as long as the behaviour is not fixed, "dangerous" combinators must be red-flagged in the API docs.

What are these dangerous combinators? Everything that has an "either .. or" ? (sep : either hsep or vcat, etc.)

Detection of non-linearity in pretty's code would really make a nice test case for automated analysis of runtime complexity of programs. I can advertise this at http://cl-informatik.uibk.ac.at/users/tpowell/lca but don't expect immediate results ...

Two technical points here: linear, quadratic, etc. - in what parameter exactly? Size of input? Size of output?

  • the natural measure is the number of API calls (the size of the abstract syntax tree). This is to include calls to, and processing of empty, which produces no output, but has a cost.
  • there are cases where output size is more than linear of input size (because of nesting, progressive amounts of whitespace could be produced). This does not happen for the context I presented above (everything starts in column 0). But then, since we render for a fixed page width, there cannot be too much whitespace in output? Or, we'd could whitespace in some extra way.

jwaldmann commented Jun 2, 2016

What does the original prettyprint paper ( http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.8777 ) say about (asymptotic) performance?

There are no exact claims or proofs, but end of Section 9 has "seems to grow slightly faster than linearly" and "far superior to O(n^2)".

Really? Here is the runtimes for rendering iterations of context mentioned previously (x axis: iterations, y axis: time in seconds, average over 3 executions). (forgive the crude rendering)

d-1000

Looks accidentally quadratic ?

I think that for a (pretty)printing library, anything more than linear is unacceptable , and as long as the behaviour is not fixed, "dangerous" combinators must be red-flagged in the API docs.

What are these dangerous combinators? Everything that has an "either .. or" ? (sep : either hsep or vcat, etc.)

Detection of non-linearity in pretty's code would really make a nice test case for automated analysis of runtime complexity of programs. I can advertise this at http://cl-informatik.uibk.ac.at/users/tpowell/lca but don't expect immediate results ...

Two technical points here: linear, quadratic, etc. - in what parameter exactly? Size of input? Size of output?

  • the natural measure is the number of API calls (the size of the abstract syntax tree). This is to include calls to, and processing of empty, which produces no output, but has a cost.
  • there are cases where output size is more than linear of input size (because of nesting, progressive amounts of whitespace could be produced). This does not happen for the context I presented above (everything starts in column 0). But then, since we render for a fixed page width, there cannot be too much whitespace in output? Or, we'd could whitespace in some extra way.
@ndmitchell

This comment has been minimized.

Show comment
Hide comment
@ndmitchell

ndmitchell Jun 2, 2016

Contributor

From what I've read of the code I would be shocked if it wasn't quadratic. It seems to regularly be traversing the entire tree to ensure global properties, whereas I would expect it to consider all Doc values in some normal form and then use smart constructors to guarantee that without traversing downwards.

Contributor

ndmitchell commented Jun 2, 2016

From what I've read of the code I would be shocked if it wasn't quadratic. It seems to regularly be traversing the entire tree to ensure global properties, whereas I would expect it to consider all Doc values in some normal form and then use smart constructors to guarantee that without traversing downwards.

@dterei dterei changed the title from performance bug ( > 10 seconds for < 2k output chars) to Quadratic performance issues Jun 2, 2016

ghc-mirror pushed a commit to ghc/ghc that referenced this issue Jul 17, 2016

Pretty: remove a harmful $! (#12227)
This is backport of [1] for GHC's copy of Pretty. See Note [Differences
between libraries/pretty and compiler/utils/Pretty.hs].

[1] http://git.haskell.org/packages/pretty.git/commit/bbe9270c5f849a5bb74c9166a5f4202cfb0dba22
    haskell/pretty#32
    haskell/pretty#35

Reviewers: bgamari, austin

Reviewed By: austin

Differential Revision: https://phabricator.haskell.org/D2397

GHC Trac Issues: #12227

bgamari added a commit to bgamari/ghc that referenced this issue Aug 25, 2016

Pretty: remove a harmful $! (#12227)
This is backport of [1] for GHC's copy of Pretty. See Note [Differences
between libraries/pretty and compiler/utils/Pretty.hs].

[1] http://git.haskell.org/packages/pretty.git/commit/bbe9270c5f849a5bb74c9166a5f4202cfb0dba22
    haskell/pretty#32
    haskell/pretty#35

Reviewers: bgamari, austin

Reviewed By: austin

Differential Revision: https://phabricator.haskell.org/D2397

GHC Trac Issues: #12227

(cherry picked from commit 89a8be7)

bgamari added a commit to bgamari/ghc that referenced this issue Aug 25, 2016

Pretty: remove a harmful $! (#12227)
This is backport of [1] for GHC's copy of Pretty. See Note [Differences
between libraries/pretty and compiler/utils/Pretty.hs].

[1] http://git.haskell.org/packages/pretty.git/commit/bbe9270c5f849a5bb74c9166a5f4202cfb0dba22
    haskell/pretty#32
    haskell/pretty#35

Reviewers: bgamari, austin

Reviewed By: austin

Differential Revision: https://phabricator.haskell.org/D2397

GHC Trac Issues: #12227

(cherry picked from commit 89a8be7)

quchen added a commit to quchen/prettyprinter that referenced this issue Jun 9, 2017

Add performance test for nested fillSep
The pretty package had an issue here; luckily, prettyprinter does not.
See haskell/pretty#32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment