
Too many invocations of ar #4550

Closed
sergv opened this issue Jun 4, 2017 · 9 comments

sergv (Collaborator) commented Jun 4, 2017

tl;dr version: Cabal currently uses a slow approach to creating .a archives, repeatedly calling ar on chunks of the object files produced by ghc. It should instead pass all the object files to a single invocation of ar via an @file argument.

Long version: building the skylighting package on Debian Linux with ghc 8.0.2 and --enable-split-objs produces around 150,000 object files for a non-profiling build (with profiling enabled, it's double that). These object files are produced reasonably fast, but then Cabal takes a long time to combine them into an .a archive. To do that, Cabal splits all 150,000 object files into chunks of around 300 each and runs ar qc target.a file1.o file2.o ... file330.o for each chunk, incrementally updating the archive. This process takes time comparable to producing all the object files in the first place (if not more), not least because on each invocation ar must read the current archive, append 300 new files, and write it back out, which is pretty much quadratic complexity! The process manager confirms this: each successive ar invocation uses around 5 MB more memory than the previous one. Throw in a slow spinning HDD holding the object files and the target archive, and the whole process takes a really long time to complete.
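The chunking behaviour described above can be sketched as follows (a minimal illustration, not Cabal's actual code; the function name and `limit` value are made up):

```python
# Sketch of splitting file arguments into chunks that each fit under a
# command-line length limit; each chunk then becomes one "ar qc" call.
def chunk_by_arg_limit(files, limit=30000):
    chunks, cur, cur_len = [], [], 0
    for f in files:
        extra = len(f) + 1  # +1 for the separating space
        if cur and cur_len + extra > limit:
            chunks.append(cur)
            cur, cur_len = [], 0
        cur.append(f)
        cur_len += extra
    if cur:
        chunks.append(cur)
    return chunks
```

Because `ar qc` rereads and rewrites the whole archive on every call, the total work grows quadratically with the number of chunks.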

It seems possible to avoid the overhead of chunking and significantly speed up production of the .a archive by using only one ar invocation and passing all object files in one go. Of course, all 150,000 object files would not fit on the command line, but fortunately ar supports @file options, which Cabal could use to pass all the object files.
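A minimal demonstration of the proposed single-invocation approach, assuming a GNU ar on PATH (empty placeholder .o files stand in for real object files; ar archives members without inspecting their contents):

```python
# Archive many object files with ONE ar invocation via a response (@file)
# argument instead of chunked incremental "ar qc" calls.
import pathlib
import subprocess
import tempfile

tmp = pathlib.Path(tempfile.mkdtemp())
# Empty placeholder "object files" are enough for ar itself.
for i in range(500):
    (tmp / f"file{i:03}.o").touch()

# One path per line in the response file.
rsp = tmp / "objs.txt"
rsp.write_text("\n".join(str(p) for p in sorted(tmp.glob("*.o"))) + "\n")

# Single invocation: GNU ar expands @objs.txt into its contents.
subprocess.run(["ar", "qc", str(tmp / "libdemo.a"), f"@{rsp}"], check=True)

members = subprocess.run(["ar", "t", str(tmp / "libdemo.a")],
                         capture_output=True, text=True, check=True)
print(len(members.stdout.splitlines()))  # 500
```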

Also, I think this way of passing object files should become the default, because ar has had this option for quite some time, so it should work with a reasonable range of older ghc and binutils versions, while still older versions can fall back to the current mechanism. I managed to find an ar man page from 2007 that already claims to support the @file option: http://www.thelinuxblog.com/linux-man-pages/1/ar, which should cover all GHCs down to 6.12. I'm not sure about older versions, though.

So, the question is: is invoking ar with an @file argument a reasonable thing to add to Cabal-the-library?

23Skidoo (Member) commented Jun 4, 2017

Yep, using response files for this makes sense. I suggest keeping the old code path and adding a version check. We can dig through the binutils repo history to find the precise version in which response-file support was added. I'm also not sure that BSD/macOS ar supports response files at all; we'll probably have to stick to the old code path on those platforms. Also, we should find the commit in which command-line splitting was added to Cabal and check that there are no additional reasons to do things that way besides the length limit. If you're going to implement this, please also add a test case.

hvr (Member) commented Jun 4, 2017

...there's a reason that ghc --info records (since GHC 7.2) the information

 ,("ar supports at file","YES")

In general you can't assume that any of the C toolchain tools supports response files, and I know of at least one platform where none of the official tools support response files...

Otoh, if ghc --info says that the ar executable (recorded in the "ar command" property) supports at files, then Cabal should be fine to use them.
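A sketch of the detection hvr describes. It relies on the observation that `ghc --info` prints its key/value pairs in Haskell `show` syntax, which also happens to parse as a Python literal; the function name and sample string below are illustrative, not Cabal's actual code:

```python
# Sketch: gate response-file use on GHC's recorded toolchain information.
import ast

def ar_supports_response_files(ghc_info_output: str) -> bool:
    # `ghc --info` prints a list of (key, value) string pairs.
    pairs = dict(ast.literal_eval(ghc_info_output))
    return pairs.get("ar supports at file") == "YES"

# Abbreviated sample of what `ghc --info` prints (the key exists in GHC >= 7.2):
sample = '[("ar command","/usr/bin/ar")\n ,("ar supports at file","YES")\n ]'
print(ar_supports_response_files(sample))  # True
```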

ezyang (Contributor) commented Jun 10, 2017

@sergv Go for it!

sergv (Collaborator, Author) commented Jun 11, 2017

@23Skidoo The line splitting was added in a6e3925, apparently for the sole purpose of staying under the command-line length limit.

@hvr Thanks, the --info check should be the best way to detect whether ar supports @file arguments.

However, it seems to me that a user may specify a custom ar executable that does not support @file arguments. Is this scenario rare enough to disregard, or should cabal account for it?

23Skidoo (Member) commented:

Since we'll still have the old code path, it makes sense to provide a way to force it.

sergv (Collaborator, Author) commented Jul 8, 2017

@23Skidoo It seems that ld is also invoked via multiStageProgramInvocation in order to build the ghci library. Is it a good idea to make ld use response files as well?

The only issue is that ghc --info does not record that information. However, the ld version should generally match the ar version, so "ar supports at file" should be a good proxy for whether ld supports them too.

ezyang (Contributor) commented Jul 20, 2017

If you want to use a heuristic like that (and I'm not saying it's a bad heuristic), you have to be prepared to deal with the case when it goes wrong. If the ld does not actually support response files, what are you going to do?

sergv (Collaborator, Author) commented Jul 20, 2017

I think the solution should be the same as for an ar that does not support response files. However, instead of an --ar-does-not-support-response-files argument to configure and install, there should be an argument named --disable-response-files that controls whether cabal uses response files at all, for both ar and ld.

sergv (Collaborator, Author) commented Jul 25, 2017

With #4596 merged, I guess we can close this issue.

@sergv sergv closed this as completed Jul 25, 2017