Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up man page generation #1892

Merged
merged 1 commit into from Nov 12, 2018
Merged

Speed up man page generation #1892

merged 1 commit into from Nov 12, 2018

Conversation

bmwiedemann
Copy link
Contributor

@bmwiedemann bmwiedemann commented Oct 9, 2018

Speed up man page generation by a factor of 3 (from 18 to 6 seconds in the single-CPU case)
by not building every man page and html page several times (e.g. 24x)

Makefile Outdated
@@ -59,7 +59,7 @@ man-pages: $(HELP_ALL:=.ronn) $(HELP_ALL) $(HELP_ALL:=.txt)
groff -Wall -mtty-char -mandoc -Tutf8 -rLL=$(TEXT_WIDTH)n $< | col -b >$@

%.1: %.1.ronn bin/ronn
bin/ronn --organization=GITHUB --manual="Hub Manual" share/man/man1/*.ronn
bin/ronn --organization=GITHUB --manual="Hub Manual" $<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi thank you for trying to help out with this! But I don't get this optimization. Before, this patch, this command reports taking under 4.5s:

git clean -fdx share; time make man-pages

After your patch, it takes more than 12s.

So, I'm seeing 3x slowdown instead of 3x speedup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was testing in openSUSE with hub-2.3.0-pre10

There we do make bin/hub man-pages
I think, that might make a difference, because git only contains one .ronn file as input, but I saw more. Those extras are generated in some step.

strace shows hub-remote.1.ronn being created by

["/bin/sh", "-c", "bin/hub help hub-remote --plain-text | script/format-ronn hub-remote share/man/man1/hub-remote.1.ronn"]
["script/format-ronn", "hub-remote", "share/man/man1/hub-remote.1.ronn"]

I'm wondering though, where your slowdown comes from, because it should be exactly the same when you just have 1 .ronn file from git as input.
Can you reproduce the slowdown?
Do you get 24 entries under /usr/share/man/man1/hub*.1* in both cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe your measurement/cleanup did not account for the one-time creation of bin/hub ?

Copy link
Member

@mislav mislav Oct 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've ensured that make bin/hub is already run before measurement. Your optimization should apply only to the man-pages step, so including bin/hub build step in the measurement doesn't make sense.

What do you get with the following script on your branch?

make bin/hub
git clean -fdxq share; time make man-pages >/dev/null 2>&1
git checkout HEAD^ -- Makefile
git clean -fdxq share; time make man-pages >/dev/null 2>&1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to simulate it, because I cannot get it to build from git

> make
script/build -o bin/hub
main.go:8:2: cannot find package "github.com/github/hub/commands" in any of:
        /usr/lib64/go/1.10/src/github.com/github/hub/commands (from $GOROOT)
        /home/abuild/go/src/github.com/github/hub/commands (from $GOPATH)
        /usr/share/go/1.10/contrib/src/github.com/github/hub/commands
main.go:9:2: cannot find package "github.com/github/hub/github" in any of:

git bisect run bash -c "make clean ; make
says, build was broken by commit 00e1b99

I noticed that we sed bin/ronn to ronn, using the system's ruby2.5-rubygem-ronn-0.7.3 but OTOH 0.7.3 is also the latest.

cp -a Makefile.unpatched Makefile ; rm -f share/man/man1/* ; cp -a orig/* share/man/man1/ ; time make man-pages >/dev/null 2>&1

real    0m13.063s
user    0m13.020s
sys     0m1.235s

cp -a Makefile.patched Makefile ; rm -f share/man/man1/* ; cp -a orig/* share/man/man1/ ; time make man-pages >/dev/null 2>&1
        
real    0m4.131s
user    0m3.987s
sys     0m0.451s

Using a local revert of 00e1b99, I even got your lines to work.

git checkout make
git clean -fdxq share; time make man-pages >/dev/null 2>&1

real    0m4.329s
user    0m4.173s
sys     0m0.477s
git checkout master
git clean -fdxq share; time make man-pages >/dev/null 2>&1

real    0m1.230s
user    0m1.323s
sys     0m0.227s

There seems to also be a downside content-wise in rendering each page indiviually:

-<p><a class="man-ref" href="hub-am.1.html">hub-am<span class="s">(1)</span></a>, <a class="man-ref" href="hub.1.html">hub<span class="s">(1)</span></a>, <span class="man-ref">git-apply<span class="s">(1)</span></span></p>
+<p><span class="man-ref">hub-am<span class="s">(1)</span></span>, <span class="man-ref">hub<span class="s">(1)</span></span>, <span class="man-ref">git-apply<span class="s">(1)</span></span></p>

but then I found that parallelism plays a role there:

git checkout master
git clean -fdxq ; mkdir -p bin; ln -sf /usr/bin/ronn bin/ronn ; time make -j2 man-pages >/dev/null 2>&1 ; ls share/man/man1/|wc -l

real    0m8.037s
user    0m15.778s
sys     0m1.555s
100

> git checkout make
Switched to branch 'make'
> git clean -fdxq ; mkdir -p bin; ln -sf /usr/bin/ronn bin/ronn ; time make -j2 man-pages >/dev/null 2>&1 ; ls share/man/man1/|wc -l

real    0m2.633s
user    0m4.913s
sys     0m0.578s
100

@bmwiedemann
Copy link
Contributor Author

I think, both the original code and my current patch are not doing everything right.
What we really want is to have hub generate all .ronn files and only when all those are done, then (in 1 ronn call) render those to man pages and html files (with cross-links between each other)

@bmwiedemann
Copy link
Contributor Author

I pushed a new version, that works nicely with any level of parallelism. The code might be a bit less straight-forward, though.

I was wondering, if we could drop the $(HELP_ALL:=.ronn) after man-pages:
so that the intermediate files get deleted by make.

Copy link
Member

@mislav mislav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. to support hub building from git checkout, you need to check it out to
your $GOPATH/src/github.com/github/hub.

Makefile Outdated

%.txt: %.ronn
groff -Wall -mtty-char -mandoc -Tutf8 -rLL=$(TEXT_WIDTH)n $< | col -b >$@

%.1: %.1.ronn bin/ronn
%.1: %.1.ronn bin/ronn $(HELP_ALL:=.ronn)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this is not as straightforward as before.

What's the overall benefit of this change? On my computer, I don't see an improvement in your branch runing make man-pages, and with running make -j2 man-pages I see a 1s improvement. I'm not sure if supporting the parallelism case is worth it.

How often are you building man-pages? Is the current speed unacceptable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the patch was not meant so much for speedup but rather for correctness.
The old Makefile told ronn to render all manpages - once per input file, so if you have 24 input .ronn files, it could run ronn 24 times, writing every output file 24 times. That is what I dislike.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only see it run one time, since a single bin/ronn invocation is used to pass in all share/man/man1/*.ronn files

$ git clean -fdxq share; time make -d man-pages 2>/dev/null | grep ^bin/ronn
bin/ronn --organization=GITHUB --manual="Hub Manual" share/man/man1/*.ronn

real    0m3.109s
user    0m2.102s
sys     0m2.024s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try

git clean -fdxq share ; rm -f bin/hub ; time make -j2 man-pages 2>/dev/null | grep ^bin/ronn

It seems to make a difference that bin/hub is getting a new timestamp in the process and parallelism is involved via -j2.

with -j1 I get 1 ronn call and

real    0m1.752s
user    0m2.077s
sys     0m0.337s

but with -j2 I get 25 ronn calls and

real    0m8.042s
user    0m15.800s
sys     0m1.508s

Copy link
Member

@mislav mislav Oct 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try

git clean -fdxq share ; rm -f bin/hub ; time make -j2 man-pages 2>/dev/null | grep ^bin/ronn

Ah yes, thank you for that. I can see the problem now. Is this something that people will likely encounter, though? I see this as an edge case that might not be trivial nor worth it for us to fix. We've already spent a disproportionate amount of time discussing and debugging this when compared to the actual impact it has on our users.

If you have a solution that's straightforward in the Makefile, I would gladly accept it though. Thanks for your patience.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen it in our package builds (because people think, using parallelism on today's 32-core systems is a good thing)
https://build.opensuse.org/package/live_build_log/openSUSE:Factory/hub/standard/x86_64

I can experiment some more with make to come up with something nicer.

@bmwiedemann
Copy link
Contributor Author

bmwiedemann commented Oct 13, 2018

When looking at the bison example I would think that this should do the trick, but it does not.

$(HELP_ALL): bin/ronn $(HELP_ALL:=.ronn)
        bin/ronn --organization=GITHUB --manual="Hub Manual" share/man/man1/*.ronn

May be a bug in GNU make?

@bmwiedemann
Copy link
Contributor Author

Filed a GNU make bug with a minimal reproducer based on what hub did.

Also pushed a workaround where man-pages are generated in a non-parallel make run.

by a factor of 3
Without this patch, using make -j
built every man page and html page 25 times

Now, man pages are always generated in a single ronn call
after all .ronn files are made.

Workarounds GNU make bug https://savannah.gnu.org/bugs/?54854
@bmwiedemann
Copy link
Contributor Author

Further refined the workaround with the help of my SUSE colleagues. Looks much nicer now. Can you please test/review/merge this?

@bmwiedemann
Copy link
Contributor Author

ping. anything left to do?

@mislav mislav merged commit 66399a4 into github:master Nov 12, 2018
@mislav
Copy link
Member

mislav commented Nov 12, 2018

@bmwiedemann Thank you for your contribution and assistance!

@bmwiedemann bmwiedemann deleted the make branch November 29, 2018 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants