This repository has been archived by the owner on Nov 10, 2023. It is now read-only.

Artifact is not re-created when it was manually removed #342

Closed
davido opened this issue Jun 13, 2015 · 9 comments

Comments

@davido
Contributor

davido commented Jun 13, 2015

So I disabled watchman for now, and have a simple java_library() rule with jar outcome, as described in #341:

$ buck targets --json plugins/foo:foo
Not using buckd because watchman isn't installed.
[+] PARSING BUCK FILES...0,1s
[
{
  "annotationProcessorDeps" : [ ],
  "annotationProcessorOnly" : null,
  "annotationProcessorParams" : [ ],
  "annotationProcessors" : [ ],
  "buck.base_path" : "plugins/foo",
  "deps" : [ ],
  "exportedDeps" : [ ],
  "extraArguments" : [ ],
  "javac" : null,
  "javacJar" : null,
  "name" : "foo",
  "postprocessClassesCommands" : [ ],
  "proguardConfig" : null,
  "providedDeps" : [ ],
  "resources" : [ ],
  "resourcesRoot" : null,
  "source" : null,
  "srcs" : [ "src/main/java/org/ostrovsky/buck/Main.java" ],
  "target" : null,
  "type" : "java_library",
  "visibility" : [ "PUBLIC" ]
}
]

$ buck targets --show_output plugins/foo:foo
Not using buckd because watchman isn't installed.
[+] PARSING BUCK FILES...0,1s
//plugins/foo:foo buck-out/gen/plugins/foo/lib__foo__output/foo.jar

Running it creates the artifact buck-out/gen/plugins/foo/lib__foo__output/foo.jar:

$ buck build plugins/foo:foo
Not using buckd because watchman isn't installed.
[-] PROCESSING BUCK FILES...FINISHED 0,0s
[-] BUILDING...FINISHED 0,2s (1/1 JOBS)

$ ls -all buck-out/gen/plugins/foo/lib__foo__output/foo.jar
-rw-r--r-- 1 davido users 1589 Jun 13 06:32 buck-out/gen/plugins/foo/lib__foo__output/foo.jar

However, when this artifact is removed for some reason, I would expect the next Buck invocation to re-create it, but that doesn't happen:

$ rm buck-out/gen/plugins/foo/lib__foo__output/foo.jar

$ buck build plugins/foo:foo
Not using buckd because watchman isn't installed.
[-] PROCESSING BUCK FILES...FINISHED 0,0s
[-] BUILDING...FINISHED 0,2s (1/1 JOBS)

$ ls -all buck-out/gen/plugins/foo/lib__foo__output/foo.jar
ls: cannot access buck-out/gen/plugins/foo/lib__foo__output/foo.jar: No such file or directory

Even when I remove the cache entry for this target, it's not re-created:

$ buck targets --show-rulekey plugins/foo:foo
Not using buckd because watchman isn't installed.
[+] PARSING BUCK FILES...0,1s
//plugins/foo:foo 77c8818e718a4f5d1356ed0347e9fd09cfdac92b

$ rm -rf ./buck-out/cache/77c8818e718a4f5d1356ed0347e9fd09cfdac92b

$ buck build plugins/foo:foo
Not using buckd because watchman isn't installed.
[-] PROCESSING BUCK FILES...FINISHED 0,0s
[-] BUILDING...FINISHED 0,2s (1/1 JOBS)

$ ls -all buck-out/gen/plugins/foo/lib__foo__output/foo.jar
ls: cannot access buck-out/gen/plugins/foo/lib__foo__output/foo.jar: No such file or directory

The only known way to recover from here is to run buck clean:

$ buck clean
Not using buckd because watchman isn't installed.

$ buck build plugins/foo:foo
Not using buckd because watchman isn't installed.
[-] PROCESSING BUCK FILES...FINISHED 0,0s
[-] BUILDING...FINISHED 0,6s (1/1 JOBS)

$ ls -all buck-out/gen/plugins/foo/lib__foo__output/foo.jar
-rw-r--r-- 1 davido users 1589 Jun 13 06:40 buck-out/gen/plugins/foo/lib__foo__output/foo.jar
@Coneko

Coneko commented Jun 13, 2015

There is some metadata associated with each rule saved in buck-out. Buck always assumes the metadata and the built artefacts in buck-out are in a consistent state, so modifying the files in buck-out manually will break Buck in weird ways.

Why are you deleting the jar file? Is there a particular use case?
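
The consistency assumption described above can be illustrated with a toy model. This is only a sketch, not Buck's actual implementation: the metadata file, the SHA-1 over input contents, and all function names are assumptions made for illustration. The point is that the up-to-date check consults the recorded metadata and never looks at the output file, so a manually deleted artifact goes unnoticed.

```python
import hashlib
import os
import tempfile


def rule_key(src_paths):
    """Hash the contents of the declared inputs (a stand-in for a rule key)."""
    digest = hashlib.sha1()
    for path in sorted(src_paths):
        with open(path, "rb") as handle:
            digest.update(handle.read())
    return digest.hexdigest()


def build(src_paths, output_path, metadata_path):
    """Rebuild only if the recorded key differs; the output is never inspected."""
    key = rule_key(src_paths)
    if os.path.exists(metadata_path):
        with open(metadata_path) as handle:
            if handle.read() == key:
                return "up-to-date"  # trusts the metadata, ignores output_path
    with open(output_path, "w") as out:
        out.write("artifact built from key " + key)
    with open(metadata_path, "w") as meta:
        meta.write(key)
    return "built"


def demo():
    """Replay the reported sequence: build, delete the output, build again."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "Main.java")
        out = os.path.join(root, "foo.jar")
        meta = os.path.join(root, "foo.metadata")  # hypothetical metadata file
        with open(src, "w") as handle:
            handle.write("class Main {}")
        first = build([src], out, meta)
        os.remove(out)  # manual removal of the artifact
        second = build([src], out, meta)
        missing_after_second = not os.path.exists(out)
        os.remove(meta)  # wiping the metadata, roughly the effect of a clean
        third = build([src], out, meta)
        return first, second, missing_after_second, third, os.path.exists(out)
```

In this model the second build reports success while the artifact stays missing, and only removing the metadata as well brings it back, which mirrors the behaviour in the report.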

@davido
Contributor Author

davido commented Jun 15, 2015

> Why are you deleting the jar file? Is there a particular use case?

Let's say we have some genrule() with name bar that depends on foo.jar, with cmd = do_some magic and out = '__bar__'. Let's assume I executed it once: buck build bar.

The magic Python script do_some magic had some side effect(s); for example, it deployed some stuff to the local Maven repository. Now, for some reason, this side effect was undone. What is the supported way to re-execute only the do_some magic Python script by re-invoking buck build bar? None of these attempts work:

  • rm buck-out/<...>/bar
  • invalidate rule_key for bar rule

Probably buck clean or rm -rf buck-out or rm -rf buck-cache followed by buck build bar would work. But shouldn't there be a way to somehow re-execute one single rule, with very little (or without any) impact on already built rules and Buck cache?

@Coneko

Coneko commented Jun 15, 2015

Unfortunately there isn't.

buck build --no-cache bar lets you ignore the buck-cache directory, but there is no way of ignoring the buck-out directory.

Buck assumes all rules declare all their inputs and only depend on them. We're about to introduce more advanced caching that absolutely depends on this. Having side effects in rules will always cause problems for Buck.

@davido
Contributor Author

davido commented Jun 15, 2015

Thanks for the clarification.

@davido davido closed this as completed Jun 15, 2015
@davido
Contributor Author

davido commented Jun 17, 2015

I added a Python script to overcome this design limitation and replay the Buck command manually: [1]. One improvement for this specific use case would be to add support for a '--resolve-macros' option to the

  buck targets --json --resolve-macros //foo:bar

invocation.

@shs96c any comments on this requirement? @sdwilsh suggested to ask you before filing a feature request ;-)
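
A minimal sketch of what such macro resolution could look like when done outside Buck. The `$(location <target>)` macro form is taken from the commit message quoted at the end of this thread, and the `name` and `buck.base_path` keys appear in the `buck targets --json` output shown earlier. That a genrule's JSON carries a `cmd` key, and that the target-to-output mapping is already known, are assumptions; the function names are hypothetical.

```python
import json
import re

# Matches $(location //some:target).
LOCATION_MACRO = re.compile(r"\$\(location\s+([^)\s]+)\)")


def resolve_location_macros(cmd, outputs):
    """Replace each $(location <target>) with the path recorded in `outputs`."""
    def substitute(match):
        target = match.group(1)
        if target not in outputs:
            raise KeyError("no known output for " + target)
        return outputs[target]
    return LOCATION_MACRO.sub(substitute, cmd)


def resolved_commands(targets_json, outputs):
    """Map each rule that has a "cmd" attribute to its macro-free command."""
    resolved = {}
    for rule in json.loads(targets_json):
        cmd = rule.get("cmd")  # assumed key; rules without it are skipped
        if cmd is None:
            continue
        name = "//{}:{}".format(rule["buck.base_path"], rule["name"])
        resolved[name] = resolve_location_macros(cmd, outputs)
    return resolved
```

The `outputs` mapping could be filled from `buck targets --show_output`, which prints one target and output path per line.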

@Coneko

Coneko commented Jun 18, 2015

I don't think that script is a good idea: what it's working around is not a design limitation as much as it's a design prerequisite.

If you need to do side effects together with the build it's better to do them at the end of the build, outside of Buck, rather than during the build inside the rule executions.
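
One way to read this advice is a small wrapper that runs the build first and performs the side effect afterwards. The sketch below only uses `buck build` and `buck targets --show_output` as they appear earlier in this thread; the function names and the `publish` callback are hypothetical, and the target must be given in its fully qualified `//path:name` form so that it matches the printed result line.

```python
import subprocess


def parse_show_output(text):
    """Return {target: output path} from `buck targets --show_output` text."""
    outputs = {}
    for line in text.splitlines():
        parts = line.split()
        # Result lines look like
        # "//plugins/foo:foo buck-out/gen/plugins/foo/lib__foo__output/foo.jar";
        # status lines such as "[+] PARSING BUCK FILES..." are skipped.
        if len(parts) == 2 and parts[0].startswith("//"):
            outputs[parts[0]] = parts[1]
    return outputs


def run_buck(args):
    """Run buck with the given arguments and return its standard output."""
    result = subprocess.run(["buck"] + args, check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout


def build_then_publish(target, publish, buck=run_buck):
    """Build first, then hand the artifact path to `publish`, outside the build."""
    buck(["build", target])
    outputs = parse_show_output(buck(["targets", "--show_output", target]))
    return publish(outputs[target])
```

Because the side effect lives in `publish`, it can be re-run at any time without touching buck-out or the cache.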

@davido
Contributor Author

davido commented Jun 18, 2015

I see your point. In this specific case, we are using python_binary() to deploy the built plugin API to a remote Maven repository or to install it in the local one. As every Buck rule must have some output, we are using a fake file in the buck-out directory.

But let's say that someone would like to contribute a built-in Buck rule for Maven deployment. With fetch_file we do have a built-in rule for fetching from Maven, so why not push_file for pushing to Maven?

So let's assume you are somehow able to configure it to push to either a local or a remote Maven repository. Let's call the outcome of this operation (the file ends up under "$HOME/.m2/repository/...", in a Google storage bucket, or wherever) a "side effect", because it lands outside the buck-out directory (inside it, it wouldn't be a side effect), right?

I'm curious how this push_file built-in rule could be implemented so that it conforms to the Buck philosophy:

  • some metadata associated with each rule saved in buck-out
  • rule outcome is cached in Buck cache
  • every side effect breaks Buck in weird ways

So that a simple use case like this can be done from within Buck:

  $ buck build push_file_foo_to_local_maven_repository
  $ rm -rf $HOME/.m2/repository/.../foo.jar
  $ buck build push_file_foo_to_local_maven_repository

So that the last buck build invocation will re-push the same (cached and not re-built) artifact to the local maven repository (in this case).

Or are you saying that push_file can't be a built-in rule in Buck and must always be implemented outside of Buck?

@Coneko

Coneko commented Jun 19, 2015

It either has to be implemented outside of Buck, or as a separate command.

You mention fetch_file; I assume that is the custom rule you implemented in Python? Buck's is called remote_file, and it only downloads artefacts when you run buck fetch, not during the build.

I can see a buck publish command, that publishes some artefact to a remote repository.

There's also been some discussion of adding an attribute for maven coordinates on prebuilt_jar, or adding a prebuilt_java_library with such an attribute, to better integrate fetching and building.

In that vein, you could add a maven coordinate to a java_library, and use the hypothetical buck publish command to publish it. It would build the library and publish it to maven or something, I don't know, just throwing darts at the board here.

I think the main thing here is we absolutely do not want side effects during the build.

We can however introduce new operations that depend on building that have side effects. Or new operations that have side effects that have to be run manually before building, and whose effects must be controlled explicitly by the user, so that they can check the results into source control, or do anything they want with them.

But the build itself must remain pure.
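
The re-push use case raised earlier in the thread fits this split if the publishing step is a separate, explicitly run operation. The following is a rough sketch under loud assumptions: a plain directory stands in for the local Maven repository, a file copy stands in for the real deployment, and `publish_if_missing` is a hypothetical name. It shows that such a step can check its own destination and repeat the side effect while the built artifact stays untouched.

```python
import os
import shutil
import tempfile


def publish_if_missing(artifact, repo_dir):
    """Copy `artifact` into `repo_dir` unless a file of that name is already there."""
    destination = os.path.join(repo_dir, os.path.basename(artifact))
    if os.path.exists(destination):
        return False  # already published, nothing to do
    os.makedirs(repo_dir, exist_ok=True)
    shutil.copy2(artifact, destination)
    return True  # side effect performed


def demo():
    """Publish, publish again, undo the side effect, publish once more."""
    with tempfile.TemporaryDirectory() as root:
        artifact = os.path.join(root, "foo.jar")  # stands in for the buck-out artifact
        repo = os.path.join(root, "repository")   # stands in for the local repository
        with open(artifact, "wb") as handle:
            handle.write(b"jar bytes")
        first = publish_if_missing(artifact, repo)
        second = publish_if_missing(artifact, repo)
        os.remove(os.path.join(repo, "foo.jar"))  # the side effect is undone
        third = publish_if_missing(artifact, repo)
        return first, second, third
```

Since the check is made against the destination rather than against build metadata, removing the published copy is enough to trigger a re-push on the next run.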

@shs96c
Contributor

shs96c commented Jun 24, 2015

There's someone working on maven interop right now (we should see some diffs land soon). We'll probably either extend "buck install" or add a new command to handle pushing artifacts to maven central.

If you're interested in hacking on it, I think the first step would be to add an optional "maven_coord", "binary_sha1" and "source_sha1" to "prebuilt_jar" and allow us to (optionally) download the jar by adding "remote_file" rules into the action graph if necessary.

There's still some thinking to be done on minimizing the number of artifacts that we'd push to a maven repo --- naively, you can imagine one artifact per buck target, but that would be a nightmare to maintain, and I'd not want to have to handle versioning :)

Conceptually, Buck is a build tool rather than an SDLC tool. More concretely, "buck build" is just for the assembly of artifacts: side effects break the functional model it's based on. Other phases of an artifact's lifecycle are handled by separate commands, which do support side effects. For example, deploying, running and testing are handled by "buck install", "buck run" and "buck test" respectively.

uwolfer pushed a commit to gerrit-review/gerrit that referenced this issue Jul 2, 2015
Buck makes extensive use of caching and stores metadata in the buck-out
directory, so by design it's not possible to re-trigger the
execution of a custom rule without wiping out the whole buck-out
directory. See also the discussion on this issue: [1].

The implementation of Maven deployment as a custom build rule with
a side effect is the wrong approach to start with. It was only done as a
workaround, because Buck doesn't offer an `install` or `publish` command
that is allowed to have side effects, unlike the `build` command.
Having side effects with the `build` command breaks Buck's model.

As a workaround for now, add a standalone Python script that re-uses the Buck
api_{deploy|install} targets, resolves $(location <target>) macros and
executes the deployment by calling the mvn.py utility directly:

  $ tools/maven/api.py {deploy,install}

Dry run mode is supported as well:

  $ tools/maven/api.py -n {deploy,install}

[1] facebook/buck#342

Change-Id: I7fb86ad6967a1fa1e7ac842ba5e0e8cf0103b773