the package indexes are very slow #281

Closed
bos opened this Issue · 7 comments

1 participant

@bos
Haskell member

(Imported from Trac #288, reported by @dcoutts on 2008-06-07)

In a large run, e.g. trying to make a plan to install 560 packages from Hackage:

$ cabal install --dry-run $(cat pkgs)
it turns out (according to the GHC profile) that 91% of the time is spent reading the indexes of installed and available packages.

The GHC package index is a couple of massive text files in Read/Show format, so it takes forever to read. The available package index is the tarball of .cabal files, and our .cabal file parser is really slow.

For smaller runs it's not so bad:

$ cabal install --dry-run xmonad
since we only have to inspect the subset of the available package index that makes up xmonad's transitive deps (any versions thereof), which lets us avoid forcing most of the index.

http://hackage.haskell.org/trac/ghc/ticket/2089

might help us, but then again maybe not if we still have to parse the result of calling ghc-pkg since that will give us another text format.

For our own package index, perhaps we should be generating a cache in some other format when we download the package index.

@bos
Haskell member

(Imported comment by @dcoutts on 2008-06-07)

Partially fixed for common cases with:

Sat Jun  7 15:39:13 BST 2008  Duncan Coutts <duncan@haskell.org>
  * Only inspect the needed parts of the installed and available indexes
  The available package index is loaded lazily so if we can avoid
  forcing all the packages then we can save a huge amount of slow text
  parsing. So select out the maximal subset of the index that we could
  ever need based on the names of the packages we want to install. For
  the common case of installing just one or two packages this cuts
  down the number of packages we look at by a couple orders of
  magnitude. This does not help with the installed index which is read
  strictly, though most people do not (yet) have hundreds of installed
  packages, so that's less of an immediate problem.
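
A minimal sketch of the name-based selection described in the patch above, not the actual cabal-install code. It assumes a hypothetical index type mapping each package name to the dependency names of all its versions; `PkgName` and `neededNames` are illustrative names only. Because `Data.Map` is lazy in its values, entries for packages outside the closure are never forced.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type PkgName = String

-- | Hypothetical index: package name -> dependency names (over all versions).
-- Data.Map is value-lazy, so an entry is only evaluated if we look inside it.
type Index = M.Map PkgName [PkgName]

-- | The maximal set of package names we could ever need for the given targets:
-- the transitive closure of the dependency relation, by name only.
neededNames :: Index -> [PkgName] -> S.Set PkgName
neededNames index = go S.empty
  where
    go seen [] = seen
    go seen (p : ps)
      | p `S.member` seen = go seen ps
      | otherwise         = go (S.insert p seen) (M.findWithDefault [] p index ++ ps)
```

For a single target such as xmonad, only the entries reachable from it are inspected; the rest of the index stays as unevaluated thunks.
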
@bos
Haskell member

(Imported comment by @dcoutts on 2008-06-07)

For the installed package index, we should parse the output of ghc-pkg dump lazily. See #311.

All it needs is to split on "\n---" as we do now, but then instead of directly parsing, we should extract only the name and version fields and then parse the rest lazily.
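
A rough sketch of that idea, under stated assumptions: records are separated by lines starting with `---`, the `name:` and `version:` fields each sit on a single line, and `slowParse` is only a stand-in for the real (expensive) parser. The names `Entry`, `quickField` and `readIndex` are hypothetical. The `entryFields` component is an ordinary lazy field, so the full parse happens only when something demands it.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- | One installed-package record. Name and version come from a cheap scan;
-- the full field list is a lazy thunk, built only on demand.
data Entry = Entry
  { entryName    :: String
  , entryVersion :: String
  , entryFields  :: [(String, String)]
  }

-- | Split the dump into records on separator lines starting with "---".
splitRecords :: String -> [String]
splitRecords = map unlines . go . lines
  where
    go [] = []
    go ls = case break ("---" `isPrefixOf`) ls of
      (rec, [])       -> [rec]
      (rec, _ : rest) -> rec : go rest

trim :: String -> String
trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- | Cheaply extract one top-level field by matching a "key:" line prefix.
quickField :: String -> String -> String
quickField key rec =
  case [trim (drop (length prefix) l) | l <- lines rec, prefix `isPrefixOf` l] of
    (v : _) -> v
    []      -> ""
  where
    prefix = key ++ ":"

-- | Stand-in for the real parser: "key: value" lines, skipping continuations.
slowParse :: String -> [(String, String)]
slowParse rec =
  [ (k, trim (drop 1 v))
  | l@(c : _) <- lines rec
  , not (isSpace c)
  , let (k, v) = break (== ':') l
  , not (null v)
  ]

readIndex :: String -> [Entry]
readIndex = map toEntry . filter (any (not . isSpace)) . splitRecords
  where
    toEntry rec = Entry (quickField "name" rec) (quickField "version" rec) (slowParse rec)
```

Looking up packages by name and version then touches only the two cheap fields; `entryFields` is evaluated just for the records that are actually used.
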

@bos
Haskell member

(Imported comment by @dcoutts on 2008-08-12)

The installed package index is now respectably fast. The available index is still slow.

Here's one approach: instead of generating a full cache in an alternative format, generate an index of the tarball that maps each package id to an (offset, length) pair in the uncompressed .tar file. Then we can load the tarball lazily. Ideally we would just mmap it, extract the (offset, length) slices and parse them lazily. That would avoid loading the full 20+ megabytes of the index.
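
A minimal sketch of such an offset index, as an illustration rather than the eventual implementation. It assumes plain tar headers (512-byte blocks, entry name in the first 100 bytes, octal size at byte 124) and ignores long-name extensions and the ustar prefix field; it keys on the entry path rather than a parsed package id. `Slice`, `buildIndex` and `slice` are hypothetical names. A strict `ByteString` stands in for the mmapped file, and `take`/`drop` on it share the underlying buffer instead of copying.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import Data.Char (digitToInt, isOctDigit)

-- | (offset of the entry's data in the .tar file, length in bytes)
type Slice = (Int, Int)

-- | Walk the tar headers and record where each entry's data lives.
buildIndex :: B.ByteString -> M.Map FilePath Slice
buildIndex tar = go 0 M.empty
  where
    go off acc
      | off + 512 > B.length tar = acc
      | B.all (== '\0') header   = acc  -- an all-zero block marks the end
      | otherwise                = go next (M.insert name (dataOff, size) acc)
      where
        header  = B.take 512 (B.drop off tar)
        name    = B.unpack (B.takeWhile (/= '\0') (B.take 100 header))
        size    = parseOctal (B.take 12 (B.drop 124 header))
        dataOff = off + 512
        -- entry data is padded up to a multiple of 512 bytes
        next    = dataOff + ((size + 511) `div` 512) * 512

-- | Parse the octal size field, tolerating leading spaces and a NUL terminator.
parseOctal :: B.ByteString -> Int
parseOctal =
  B.foldl' (\n c -> n * 8 + digitToInt c) 0
    . B.takeWhile isOctDigit
    . B.dropWhile (== ' ')

-- | Extract one entry's bytes without touching the rest of the archive.
slice :: B.ByteString -> Slice -> B.ByteString
slice tar (off, len) = B.take len (B.drop off tar)
```

With the index saved next to the tarball, reading one .cabal file becomes a map lookup followed by `slice`, and only that slice needs to be parsed.
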

@bos
Haskell member

(Imported comment by @aslatter on 2010-04-23)

Is there anything that would make the tar-index work done for documentation serving in hackage-server not work for this?

One thing I can think of is that we might not always have a full package-id, but I don't know anything about how we make an install plan.

@bos
Haskell member

(Imported comment by @aslatter on 2010-04-25)

Experience report:

Taking the tar-index from hackage-server wasn't too hard. It doesn't scale to hackage-sized tarballs, though - it is only able to store the offsets for about half of the .cabal files in 00-index.tar.

@bos
Haskell member

(Imported comment by @dcoutts on 2010-07-14)

Replying to @aslatter:

Experience report: Taking the tar-index from hackage-server wasn't too hard. It doesn't scale to hackage-sized tarballs, though - it is only able to store the offsets for about half of the .cabal files in 00-index.tar.

It should be straightforward to extend the size of the types used to cope with bigger tarballs. The only cost will be a bigger index. The reason for the limitations in the hackage code is simply to save space by keeping the indexes very compact.

@bos
Haskell member

(Imported comment by @kosmikus on 2010-07-14)

This is done:

Sun Oct 23 23:32:53 CEST 2011  Duncan Coutts <duncan@community.haskell.org>

  • Add a source package index cache to speed up reading, e.g. about 3x faster for cabal info pkgname

Some timings using a list of all packages from Hackage (~4000 packages):

  • with cabal-install-0.10.2: more than 8 minutes (didn't wait for it to finish)
  • with cabal-install-0.13.3: 2 minutes 44 seconds
  • with cabal-install-0.13.3 and modular solver: 49 seconds

Of course, this package list results in an error.

@bos bos closed this