Home

ahmadia edited this page Oct 22, 2014 · 17 revisions

Hashdist

We want to solve the problem of distributing scientific software

Many attempts already exist, mostly in the form of source or binary distributions shipping a self-contained silo of Python and non-Python software: EPD, SAGE, Python(x,y), FEMHub, qsnake, python-hpcmp, and numerous in-house solutions in research institutions. A common pattern currently is that when none of the existing options fits, one creates a new one. This is particularly the case in HPC, where customization for a specific cluster is vital.

However, the reality is perhaps that for most users, none of the current distributions fits their use case. A common pattern is to pick the one that comes closest, then manually patch and install software into it, to the point that one no longer dares to upgrade to the next distribution release.

While the lack of sophistication in current distributions makes it easy to just roll your own, it also means that it is difficult to create a single distribution that is flexible enough for all users. Sophistication must be added, but carefully.

The stateless distribution

We believe current distributions lack the following important features:

  • A stateless approach to the software stack. Current solutions are all full of state: the distribution is the result of whichever chain of package install commands has been run, and custom software installs blend with and overwrite the state of the software distribution. One should be able to declare the distribution one wants, and then build it.
  • Reliable caching, so that a full rebuild of the entire software stack is fast. This makes the stateless approach practical: uninstalling or upgrading software becomes a matter of changing the declaration and doing a full rebuild.
  • A branchable software stack. Jumping between different customizations of your software stack, going back in time, or replicating somebody else's environment should be as simple (and quick!) as jumping around in git branches.
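
The three features above can be illustrated with a minimal Python sketch. It is not Hashdist's actual API or file format: the spec layout, the function names `artifact_id` and `build_stack`, and the package versions are all hypothetical, and the "build" is a placeholder string. The idea shown is that each package is identified by a hash of its own declaration plus the hashes of its dependencies, so a full rebuild only does real work for packages whose hash is not yet in the cache.

```python
import copy
import hashlib
import json


def artifact_id(name, specs, memo):
    """Hash a package's declaration together with the ids of its dependencies.

    Assumes the dependency graph has no cycles.
    """
    if name not in memo:
        spec = specs[name]
        payload = {
            "name": name,
            "version": spec["version"],
            "deps": [artifact_id(dep, specs, memo)
                     for dep in sorted(spec.get("deps", []))],
        }
        canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
        memo[name] = hashlib.sha256(canonical).hexdigest()
    return memo[name]


def build_stack(specs, cache):
    """'Rebuild' the whole declared stack, reusing any cached artifact.

    Returns (profile, rebuilt): the name -> id mapping that defines the
    stack, and the names that actually had to be built this time.
    """
    memo, profile, rebuilt = {}, {}, []
    for name in sorted(specs):
        key = artifact_id(name, specs, memo)
        if key not in cache:
            # Placeholder for a real build step (hypothetical).
            cache[key] = "built {} {}".format(name, spec_version(specs, name))
            rebuilt.append(name)
        profile[name] = key
    return profile, rebuilt


def spec_version(specs, name):
    """Small helper: look up the declared version of a package."""
    return specs[name]["version"]


# Illustrative declaration; package versions are made up for the example.
stack = {
    "zlib": {"version": "1.0"},
    "python": {"version": "2.7", "deps": ["zlib"]},
    "numpy": {"version": "1.9", "deps": ["python"]},
    "hdf5": {"version": "1.8", "deps": ["zlib"]},
}

cache = {}
profile_a, rebuilt_a = build_stack(stack, cache)      # cold cache: builds all

upgraded = copy.deepcopy(stack)
upgraded["numpy"]["version"] = "1.10"                 # change the declaration
profile_b, rebuilt_b = build_stack(upgraded, cache)   # only numpy is built

profile_c, rebuilt_c = build_stack(stack, cache)      # "switch back": no builds
```

In this sketch, upgrading is just editing the declaration and rebuilding, and switching back to the earlier declaration is free because every artifact it needs is still in the cache. The `profile` mapping plays the role of a branch: two users with the same declaration get the same ids.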

Step 1

We do not intend to make this happen simply by providing the 15th source distribution to rule them all. Each distribution was created for a specific use case, and we cannot hope to understand them all well enough, at least not yet.

Instead, we want to develop standalone backend tools that can improve the state of the art in Python software distribution in general. We ourselves will focus on refactoring two different non-root source distributions, qsnake and python-hpcmp (which use different package formats), so that they share a common tool stack.

Step 2

Once a handful of existing source distributions have become more sophisticated, one can start thinking about creating a new distribution that is flexible enough to fit all the existing use cases, and that works for all the users for whom the current solutions do not.

Status

HashDist is currently in active beta. See http://hashdist.github.io/ for more information.