This repository has been archived by the owner on Sep 19, 2018. It is now read-only.

Artifact cleanup logic #22

Open
chenglou opened this issue May 16, 2016 · 8 comments

Comments

@chenglou

chenglou commented May 16, 2016

This has been discussed in chat so I'll summarize here. Correct me if anything's wrong here.

The current way Jenga handles artifacts means that if they're not colocated with the source, I have to pass the artifacts callback to Env and indicate which files can be considered artifacts. The cleaning itself is delayed, i.e. it only happens when, say, a glob_listing is called on the artifact directory. When the artifacts are alongside the source, this hasn't been a problem, since globbing on the source directory will clean the artifacts anyway.

Another problem is that empty, stale folders aren't cleaned (I guess because of version control logic?). This would be troublesome if one does subdirs on a stale artifact directory. Fortunately I don't do that, but still.

@Nick-Chapman

Currently, jenga never removes directories. This has nothing to do with version control logic (I don't understand what this means!)

@chenglou
Author

Sorry bad guess then. I thought you were deferring to hg to remove stale artifacts, and that hg ignores empty folders.

@Nick-Chapman

Ah, now I understand what you meant by "version control logic". Yes, hg
does clean up empty directories. But this is not the reason jenga doesn't;
it just doesn't know how.

Also, jenga may have rules which create artifacts in a directory without
having any deps. (For example, we have a ".merlin" target in our standard
build rules.) Such an artifact will never be considered stale, and so jenga
will not remove it, and so hg will not get an empty directory to remove.
Even more annoying, if you have jenga running in polling mode and try to
remove the artifact by hand, jenga will immediately try to recreate the
artifact that it just noticed went missing!




@jordwalke

jordwalke commented Jul 20, 2016

Why is it that, when the output directory is separate from the source directory, globbing the source directory does not detect that a source has been removed, and therefore that its old artifact in the separate build directory should be removed? (If I understand correctly, this is what Cheng said is happening.)

I would imagine it should work like this:

  • Jenga tracks that artifact _build/a.cmo was generated from input file src/a.ml, and therefore when globbing the src directory, if it sees that a.ml is no longer present, it should remove _build/a.cmo before building anything.

But if I understand correctly, it does not work like that. Is there a reason?

@lpw25
Member

lpw25 commented Jul 20, 2016

It doesn't work like that because the build is the other way around. You don't try to build everything you know how to build which might be out-of-date, only those things which your build target depends on. Globbing the src directory is not at all dependent on the existence of _build/a.cmo so there is no reason to consider removing it.
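To illustrate the "other way around" point, here is a minimal sketch of a demand-driven traversal. It is not Jenga's implementation: the rule table, the `all` target and the `demanded` helper are hypothetical names chosen for the example. The build only visits what the requested target transitively depends on, so a leftover `_build/a.cmo` is never looked at.

```python
# Hypothetical rule table: target -> list of dependencies.
# This is an illustrative model, not Jenga's real data structures.
RULES = {
    "all": ["_build/b.cmo"],
    "_build/b.cmo": ["src/b.ml"],
}

# Files currently on disk; _build/a.cmo is left over from a deleted src/a.ml.
ON_DISK = {"src/b.ml", "_build/b.cmo", "_build/a.cmo"}


def demanded(target, rules):
    """Return every path reachable from `target` by following dependencies."""
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(rules.get(node, []))
    return seen


needed = demanded("all", RULES)
# Files on disk that the build never visits, and so never considers removing.
never_visited = sorted(ON_DISK - needed)
print(never_visited)  # ['_build/a.cmo']
```

Under this model nothing reachable from the target refers to `_build/a.cmo`, which is why globbing `src` alone gives the build no reason to touch it.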

I suspect one issue is that you are thinking of stale artefact deletion as a mechanism similar to make clean, but that is not what it is for. It is only there to make it easier to produce deterministic builds in the presence of awkward tools like ocamlc, which make it difficult to specify exact dependencies (e.g. you can only pass -I foo to have it use foo/bar.cmi; you cannot tell it directly to use only foo/bar.cmi). For cleaning it is much better to use your version control tools.

(As a side-note, the tracking of artefacts should not depend on tracking "that artifact _build/a.cmo was generated from input file src/a.ml" but on the knowledge that _build/a.cmo is not a source file and has no rule to produce it. Otherwise your artefact handling will not work if the jenga database is removed.)

@lpw25
Member

lpw25 commented Jul 20, 2016

I should make clear that I'm not saying the way Jenga currently handles stale artefacts is correct (the need to glob the output directory is still clearly a hack), I was just explaining why it doesn't work as you were suggesting.

@jordwalke

jordwalke commented Jul 20, 2016

Okay.

And yes, the case that you described is exactly the one that we ran into (I was merely simplifying the problem in this thread for the sake of discussion).

So how should Jenga handle this case where artifacts are not automatically removed when the files that caused them to be produced are removed?

It seems that Jenga should allow us to express (or infer when possible) that files are generated as a result of running rules on other files (that were the result of a glob). Is version control the ideal way to deal with this (what would that even look like), or should Jenga understand this natively?

@lpw25
Member

lpw25 commented Jul 20, 2016

> So how should Jenga handle this case where artifacts are not automatically removed when the files that caused them to be produced are removed?

Well I think the basic idea of Jenga's approach is correct: when your build depends on foo, but foo isn't a source file and has no rule to build it, then you should remove foo. And the user should specify both the build rules and the source files, as it does now.
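The rule stated above can be sketched as a small predicate. This is an assumption-laden illustration, not Jenga's code: `is_stale` and `files_to_remove` are hypothetical helpers, and the user-supplied source set and rule targets are modelled as plain sets of paths.

```python
def is_stale(path, sources, rule_targets):
    """A file is stale if it is neither a declared source nor the target of a rule."""
    return path not in sources and path not in rule_targets


def files_to_remove(demanded_paths, on_disk, sources, rule_targets):
    """Of the files the build depends on, list those that exist but are stale."""
    return sorted(
        p for p in demanded_paths
        if p in on_disk and is_stale(p, sources, rule_targets)
    )


# Example inputs (hypothetical project layout).
sources = {"src/b.ml"}
rule_targets = {"_build/b.cmo"}
on_disk = {"src/b.ml", "_build/b.cmo", "_build/a.cmo"}
demanded_paths = {"_build/a.cmo", "_build/b.cmo"}

to_remove = files_to_remove(demanded_paths, on_disk, sources, rule_targets)
print(to_remove)  # ['_build/a.cmo']
```

Note that the decision uses only the declared sources and the rules, both of which the user specifies; nothing here consults a record of earlier builds.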

The only weird bit is that it also removes some stale files when they are not depended on, so globbing on *.ml will remove the stale *.cmi files even though there is no real dependency between those two things. And worse, our build actually relies on this behaviour.

For those things which rely on this weird behaviour (e.g. ocamlc) I would rather we specified that we need certain directories to be "cleaned" as a dependency for running the command. So one of the dependencies of a rule that ran ocamlc -I foo would be along the lines of Clean("foo", Glob "*.cmi").
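A rough sketch of what satisfying such a dependency could look like follows. `Clean` and `Glob` here are hypothetical Python stand-ins for the proposed constructors, and `satisfy_clean` is an invented helper; none of this is an existing Jenga API. The assumption is that "cleaning" means deleting files in the directory that match the glob and that no rule produces.

```python
import fnmatch
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Glob:
    pattern: str


@dataclass(frozen=True)
class Clean:
    directory: str
    glob: Glob


def satisfy_clean(dep, rule_targets):
    """Delete files in dep.directory that match the glob and have no rule.

    Returns the names of the removed files.
    """
    removed = []
    for name in sorted(os.listdir(dep.directory)):
        path = os.path.join(dep.directory, name)
        if not os.path.isfile(path):
            continue
        if fnmatch.fnmatch(name, dep.glob.pattern) and path not in rule_targets:
            os.remove(path)
            removed.append(name)
    return removed


# Usage: a throwaway directory stands in for "foo".
with tempfile.TemporaryDirectory() as foo:
    for name in ("bar.cmi", "old.cmi", "bar.ml"):
        open(os.path.join(foo, name), "w").close()
    # Assume only bar.cmi still has a rule that produces it.
    rule_targets = {os.path.join(foo, "bar.cmi")}
    removed = satisfy_clean(Clean(foo, Glob("*.cmi")), rule_targets)
    remaining = sorted(os.listdir(foo))

print(removed)    # ['old.cmi']
print(remaining)  # ['bar.cmi', 'bar.ml']
```

Making the clean-up an explicit dependency of the ocamlc rule would mean the deletion happens because the command needs it, rather than as a side effect of an unrelated glob.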

> It seems that Jenga should allow us to express (or infer when possible) that files are generated as a result of running rules on other files (that were the result of a glob).

I don't follow what you mean here. But assuming it was related to my side-note, I should try to clarify what I meant. The build rules (i.e. "file foo should be produced by running command bar which depends on baz") are stored in the jengaroot and Jenga always has them available, whereas information about what was done by previous runs of Jenga (i.e. "file foo has digest 123 and was produced by running command bar when baz had digest 456") is stored in a database which may or may not exist. So to be deterministic, the state of the system after running jenga should only depend on the build rules. The information about what was done previously is merely used as an optimisation, not to affect the output.

So when you said that Jenga should observe that "_build/a.cmo was generated from input file src/a.ml" I merely wanted to draw your attention to the subtle distinction between that and "_build/a.cmo is not a source file and there is a rule to create _build/a.cmo from input file src/a.ml". The two pieces of information are very similar, but it is important that Jenga bases its decisions on the second one not the first.
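The distinction can be made concrete with a toy model. This is a simplified sketch with hypothetical names (`build`, `db`): it ignores demand-driven traversal and uses the command string in place of real digests. Staleness is decided from the rules and the source set alone, while the optional database only decides which commands can be skipped.

```python
def build(rules, sources, on_disk, db=None):
    """Toy build step.

    rules:   target -> command string (stands in for the jengaroot)
    sources: set of source paths
    on_disk: set of paths present before the build
    db:      optional cache, target -> command that last produced it

    Returns (final set of files, list of targets whose command was run).
    """
    db = {} if db is None else db
    # Staleness comes from the rules and sources only, never from db.
    stale = {p for p in on_disk if p not in sources and p not in rules}
    final = set(on_disk) - stale
    ran = []
    for target, command in sorted(rules.items()):
        # The cache can only save work; it cannot change what ends up on disk.
        if target not in final or db.get(target) != command:
            ran.append(target)
            db[target] = command
        final.add(target)
    return final, ran


rules = {"_build/b.cmo": "ocamlc -c src/b.ml"}
sources = {"src/b.ml"}
on_disk = {"src/b.ml", "_build/b.cmo", "_build/a.cmo"}

# Same inputs, once with no database and once with a warm one.
cold_state, cold_ran = build(rules, sources, on_disk, db=None)
warm_state, warm_ran = build(rules, sources, on_disk, db=dict(rules))

print(sorted(cold_state))  # ['_build/b.cmo', 'src/b.ml']
print(cold_state == warm_state)  # True
print(cold_ran, warm_ran)  # ['_build/b.cmo'] []
```

With or without the database the final state is the same and `_build/a.cmo` is gone; only the amount of work differs.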
