Minimal rebuilds for add-source dependencies #1121

Closed
23Skidoo opened this Issue Nov 17, 2012 · 8 comments

@23Skidoo
Member

Right now there is no way to tell whether an add-source dependency needs to be rebuilt, so sandbox-build conservatively reinstalls all add-source dependencies. This, in turn, triggers the reinstallation of all their reverse dependencies, which can make the build unnecessarily long. We should only reinstall those add-source dependencies that have been modified since the last time we installed them.
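The cost described above comes from the reverse-dependency closure: every reinstalled package drags in everything that transitively depends on it. A minimal sketch, assuming the sandbox's package graph is available as a plain mapping from each package to its direct dependencies (`rebuild_set`, `reverse_edges` and the package names are hypothetical), compares the conservative "reinstall everything" set with the minimal set when only one add-source dep changed:

```python
from collections import deque


def reverse_edges(deps):
    """Invert {package: [direct dependencies]} into {package: [direct dependents]}."""
    rdeps = {pkg: [] for pkg in deps}
    for pkg, direct in deps.items():
        for d in direct:
            rdeps.setdefault(d, []).append(pkg)
    return rdeps


def rebuild_set(deps, modified):
    """Modified packages plus everything that transitively depends on them."""
    rdeps = reverse_edges(deps)
    result = set(modified)
    queue = deque(modified)
    while queue:  # breadth-first walk over reverse-dependency edges
        pkg = queue.popleft()
        for dependent in rdeps.get(pkg, []):
            if dependent not in result:
                result.add(dependent)
                queue.append(dependent)
    return result


# Hypothetical sandbox: lib-a and lib-b are add-source deps, my-app is the user's package.
deps = {
    "lib-a": [],
    "lib-b": [],
    "lib-c": ["lib-a"],
    "my-app": ["lib-b", "lib-c"],
}

print(sorted(rebuild_set(deps, {"lib-a", "lib-b"})))  # ['lib-a', 'lib-b', 'lib-c', 'my-app']
print(sorted(rebuild_set(deps, {"lib-b"})))           # ['lib-b', 'my-app']
```

Treating every add-source dep as modified is exactly the conservative behaviour; knowing that only `lib-b` changed lets `lib-a` and `lib-c` be skipped.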

As an interim measure there should at least be a flag to disable reinstalling add-source dependencies.

23Skidoo was assigned Nov 17, 2012
@23Skidoo
Member

I think we can get this in 1.18.

@23Skidoo
Member

My current plan for this:

  • Add a new status file .cabal-sandbox/sandbox-timestamp (or maybe just use the directory timestamp)
  • Keep a copy of all add-source deps inside .cabal-sandbox/add-source-deps instead of building in-place
  • In reconfigure, if we're inside a sandbox, check if any add-source deps have source files with a newer timestamp than sandbox-timestamp, and if so, update & reinstall our copies + update sandbox-timestamp.
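The timestamp check in the plan above can be sketched as follows, assuming each add-source dep's list of source files is already known (for example, extracted from its .cabal file) and that a missing sandbox-timestamp means nothing has been installed yet. `modified_deps` and `update_timestamp` are hypothetical helpers, not cabal-install functions:

```python
import os
import tempfile
from pathlib import Path


def modified_deps(timestamp_file, dep_sources):
    """Add-source deps with at least one source file newer than timestamp_file.

    dep_sources maps a dependency name to the list of its source files.
    Assumption: if the timestamp file does not exist yet, every dep counts as modified.
    """
    try:
        stamp = os.path.getmtime(timestamp_file)
    except FileNotFoundError:
        return set(dep_sources)
    return {
        dep
        for dep, files in dep_sources.items()
        if any(os.path.getmtime(f) > stamp for f in files)
    }


def update_timestamp(timestamp_file):
    """Create or touch the sandbox-wide timestamp after reinstalling."""
    Path(timestamp_file).touch()


with tempfile.TemporaryDirectory() as root:
    stamp = os.path.join(root, "sandbox-timestamp")
    a = os.path.join(root, "A.hs")
    b = os.path.join(root, "B.hs")
    for f in (stamp, a, b):
        Path(f).touch()
    # Pin modification times: A.hs is older than the stamp, B.hs is newer.
    os.utime(a, (1000, 1000))
    os.utime(stamp, (2000, 2000))
    os.utime(b, (3000, 3000))
    changed = modified_deps(stamp, {"dep-a": [a], "dep-b": [b]})
    print(changed)  # {'dep-b'}
    update_timestamp(stamp)  # after reinstalling dep-b, the stamp is "now"
    changed_after = modified_deps(stamp, {"dep-a": [a], "dep-b": [b]})
    print(changed_after)  # set()
```

Because the whole sandbox shares one timestamp, the check needs only the dependency's source list, not per-library build artifacts.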

This will also make it easy to implement add-source --snapshot (update our copy only when the user tells us to).

@tibbe
Member
tibbe commented Apr 22, 2013

I'm not quite sure I follow how this would work. I think it might be sufficiently complicated that a short design doc is in order.

@23Skidoo
Member

@tibbe This is the same idea as checking whether the library/exe timestamp is older than any of the sources, except that we'll be using a single timestamp for the whole sandbox instead of one for each library. Since the whole sandbox is updated as a single unit, I think that this will work. This will make the timestamp check much easier (all you need to know is the list of the dependency's source files, and this info can be extracted from the .cabal file).

One complication with keeping copies of add-source deps in the sandbox is that files can be deleted/added in the source, but we can start with rebuilding from scratch each time the dependency is updated and later implement smarter updating.
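The "rebuild from scratch" starting point mentioned above can be sketched like this: detect that files were added or deleted by comparing the relative file sets of the source tree and the sandbox copy, and refresh the copy by deleting and recopying it. Modified contents are left to the timestamp check. The helper names and the `add-source-deps/<name>` layout are assumptions taken from the plan, not existing cabal-install code:

```python
import os
import shutil
import tempfile
from pathlib import Path


def relative_files(root):
    """Set of file paths under root, relative to root."""
    return {
        os.path.relpath(os.path.join(d, f), root)
        for d, _, files in os.walk(root)
        for f in files
    }


def files_added_or_deleted(src, copy):
    """True if the source tree gained or lost files since the copy was made."""
    return relative_files(src) != relative_files(copy)


def refresh_copy_from_scratch(src, copy):
    """Simplest update strategy: drop the old copy and copy the source again."""
    if os.path.exists(copy):
        shutil.rmtree(copy)
    shutil.copytree(src, copy)  # also creates missing parent directories


with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "dep")
    copy = os.path.join(root, "add-source-deps", "dep")
    os.makedirs(src)
    Path(src, "A.hs").write_text("module A where\n")
    refresh_copy_from_scratch(src, copy)
    before = files_added_or_deleted(src, copy)  # False: trees match
    Path(src, "B.hs").write_text("module B where\n")
    after = files_added_or_deleted(src, copy)  # True: B.hs was added
    refresh_copy_from_scratch(src, copy)
    synced = not files_added_or_deleted(src, copy)  # True again
```

A smarter update would copy only the changed files and delete the removed ones, but wiping and recopying is trivially correct.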

@tibbe
Member
tibbe commented Apr 23, 2013

Here is a question I'd like you to think about:

In the future, we'd like to move away from a model where building is focused on the package in the current directory. Instead we expect that users (e.g. a company) will have a large source tree (e.g. in git) containing hundreds of packages. In such a system we'd like to focus less on the "current" package and more on the whole tree. Here's a mocked-up interaction:

$ git clone https://mycompany.com/all-the-source-code ~/src/my-project
$ cd ~/src/my-project
# Not inside any package directory but in the root of the workspace:
$ cabal sandbox init
$ ls
package1 package2 package3 subdir1 subdir2 ...
$ $EDITOR package1/Library.hs
$ $EDITOR subdir1/package3/ExeName.hs
# Depends on package1:
$ cabal build subdir1/package3:exe-name

In such a system (with potentially tens of thousands of source files), do we really want to copy all the source files into some new directory? I don't think so, as it might bring a host of problems with keeping things in sync (in addition to potentially being slow).

@23Skidoo
Member

> In such a system (with potentially tens of thousands of source files), do we really want to copy all the source files into some new directory? I don't think so, as it might bring a host of problems with keeping things in sync (in addition to potentially being slow).

It should be easy to extend this scheme to support in-place building (just add another type of build tree reference). Timestamp check should still work. With multi-package source trees we can use in-place building by default.
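One way to read "another type of build tree reference" is a tag on each add-source entry that decides where the dependency is built from, while the timestamp check keeps looking at the user's original sources in both cases. A minimal sketch with hypothetical names (`RefKind`, `BuildTreeRef`, `build_dir`) and the directory layout from the plan above:

```python
import os
from dataclasses import dataclass
from enum import Enum


class RefKind(Enum):
    INPLACE = "inplace"  # build directly from the user's source tree
    COPY = "copy"        # build from a snapshot kept inside the sandbox


@dataclass(frozen=True)
class BuildTreeRef:
    name: str
    source_dir: str  # what the timestamp check inspects, for either kind
    kind: RefKind


def build_dir(ref, sandbox_dir):
    """Directory the dependency is actually built from."""
    if ref.kind is RefKind.INPLACE:
        return ref.source_dir
    return os.path.join(sandbox_dir, "add-source-deps", ref.name)


sandbox = os.path.join("proj", ".cabal-sandbox")
inplace = BuildTreeRef("lib-a", os.path.join("src", "lib-a"), RefKind.INPLACE)
copied = BuildTreeRef("lib-b", os.path.join("src", "lib-b"), RefKind.COPY)
print(build_dir(inplace, sandbox))  # src/lib-a on POSIX
print(build_dir(copied, sandbox))   # proj/.cabal-sandbox/add-source-deps/lib-b on POSIX
```

Keeping the kind per reference is what would let a config setting or an `--inplace`/`--copy` choice coexist in one sandbox.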

The motivation for copying by default is that it'll avoid surprising behaviour such as the one described in #1281, but we can still have add-source --inplace for those who want it. This can even be a config file setting.

Additionally, copying will make the sandbox relocatable - if you copy the project directory to another computer, you'll still have a snapshot of all dependencies (though it won't be possible to update them).

In any case, copying is not essential to this scheme; minimal rebuilds should still be implementable without it. I think I'll implement both --inplace and --copy so that we can experiment and see what works best by default.

@tibbe
Member
tibbe commented Apr 23, 2013

> Additionally, copying will make the sandbox relocatable - if you copy the project directory to another computer, you'll still have a snapshot of all dependencies (though it won't be possible to update them).

I think this is a non-goal. I don't want people to think in terms of sandboxes. They're just a mechanism to get hermetic builds.

> In any case, copying is not essential to this scheme, minimal rebuilds should be still implementable without it. I think I'll implement both --inplace and --copy so that we can experiment and see what works best by default.

We can try this for now, but remember the above statement. Sandboxes are not the goal and neither is lots of add-source usage. For now we will have users add-source their dependencies, but in the future we might just walk the directory tree from the current directory to find them (using add-source to add hundreds of packages after you've cloned the company git repo is not something users will want to do).
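Walking the directory tree to discover packages, as suggested above, could look roughly like this: treat every directory containing a `.cabal` file as a package and prune directories that should never hold user packages. The skip list and `find_packages` are assumptions for illustration; the tree mirrors the mocked-up workspace earlier in the thread:

```python
import os
import tempfile
from pathlib import Path

# Assumption: these directories never contain packages the user wants built.
SKIP_DIRS = {".git", "dist", ".cabal-sandbox"}


def find_packages(root):
    """Directories (relative to root) that contain a .cabal file."""
    found = []
    for d, subdirs, files in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        subdirs[:] = sorted(s for s in subdirs if s not in SKIP_DIRS)
        if any(f.endswith(".cabal") for f in files):
            found.append(os.path.relpath(d, root))
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    for rel in ("package1/package1.cabal",
                "package2/package2.cabal",
                "subdir1/package3/package3.cabal",
                ".cabal-sandbox/packages/old.cabal"):  # must be ignored
        path = Path(root, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    packages = find_packages(root)
    print(packages)  # ['package1', 'package2', 'subdir1/package3'] on POSIX
```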

In other words, let's implement this mechanism but remember that it's just that.

@23Skidoo
Member

Fixed by #1292.

23Skidoo closed this Apr 26, 2013