Skip to content

Commit

Permalink
Working getInputs override
Browse files Browse the repository at this point in the history
  • Loading branch information
fmeum committed Jul 28, 2022
1 parent 2f7d965 commit a2fdd4a
Show file tree
Hide file tree
Showing 9 changed files with 289 additions and 22 deletions.
10 changes: 6 additions & 4 deletions scripts/bootstrap/compile.sh
Expand Up @@ -335,8 +335,9 @@ if [ "${PLATFORM}" = "windows" ]; then
# We don't rely on runfiles trees on Windows
cat <<'EOF' >${ARCHIVE_DIR}/build-runfiles${EXE_EXT}
#!/bin/sh
mkdir -p $2
cp $1 $2/MANIFEST
# Skip over --allow_relative.
mkdir -p $3
cp $2 $3/MANIFEST
EOF
else
cat <<'EOF' >${ARCHIVE_DIR}/build-runfiles${EXE_EXT}
Expand All @@ -350,8 +351,9 @@ else
# bootstrap version of Bazel, but we'd still need a shell wrapper around it, so
# it's not clear whether that would be a win over a few lines of Lovecraftian
# code)
MANIFEST="$1"
TREE="$2"
# Skip over --allow_relative.
MANIFEST="$2"
TREE="$3"
rm -fr "$TREE"
mkdir -p "$TREE"
Expand Down
1 change: 1 addition & 0 deletions src/main/java/com/google/devtools/build/lib/analysis/BUILD
Expand Up @@ -249,6 +249,7 @@ java_library(
"starlark/StarlarkRuleContext.java",
"starlark/StarlarkRuleTransitionProvider.java",
"starlark/StarlarkTransition.java",
"starlark/UnresolvedSymlinkAction.java",
"test/AnalysisTestActionBuilder.java",
"test/BaselineCoverageAction.java",
"test/CoverageCommon.java",
Expand Down
Expand Up @@ -13,16 +13,19 @@
// limitations under the License.
package com.google.devtools.build.lib.analysis;

import static com.google.common.collect.ImmutableList.toImmutableList;
import static java.nio.charset.StandardCharsets.ISO_8859_1;
import static java.nio.charset.StandardCharsets.UTF_8;

import com.google.common.annotations.VisibleForTesting;
import com.google.common.collect.ImmutableList;
import com.google.devtools.build.lib.actions.ActionExecutionContext;
import com.google.devtools.build.lib.actions.ActionKeyContext;
import com.google.devtools.build.lib.actions.ActionOwner;
import com.google.devtools.build.lib.actions.Artifact;
import com.google.devtools.build.lib.analysis.actions.AbstractFileWriteAction;
import com.google.devtools.build.lib.analysis.actions.DeterministicWriter;
import com.google.devtools.build.lib.collect.nestedset.NestedSet;
import com.google.devtools.build.lib.collect.nestedset.NestedSetBuilder;
import com.google.devtools.build.lib.collect.nestedset.Order;
import com.google.devtools.build.lib.concurrent.ThreadSafety.Immutable;
Expand Down Expand Up @@ -99,6 +102,8 @@ void writeEntry(

private final boolean remotableSourceManifestActions;

private NestedSet<Artifact> symlinkArtifacts = null;

/**
* Creates a new AbstractSourceManifestAction instance using latin1 encoding to write the manifest
* file and with a specified root path for manifest entries.
Expand Down Expand Up @@ -135,6 +140,17 @@ public SourceManifestAction(
this.remotableSourceManifestActions = remotableSourceManifestActions;
}

@Override
public synchronized NestedSet<Artifact> getInputs() {
if (symlinkArtifacts == null) {
ImmutableList<Artifact> symlinks = runfiles.getArtifacts().toList().stream()

This comment has been minimized.

Copy link
@alexjski

alexjski Aug 15, 2022

toList is certainly something we would want to avoid given that will expand the nested set.

This comment has been minimized.

Copy link
@fmeum

fmeum Aug 15, 2022

Author Owner

Do you see a different way to achieve the goal of having SourceManifestFileAction depend on a subset of the runfiles? Ultimately, I need it to be able to read the symlinks in order to write the proper paths into the manifest.

This comment has been minimized.

Copy link
@alexjski

alexjski Aug 15, 2022

One thing which comes to my mind is that we could maybe split the symlink artifacts at the level of runfiles (that would spill to the API though, so not great)? Then we would have a nested set of those which we could add as inputs in the constructor.

Another thing I could dream is a nested set with lazy filtering (the filter happens during to{List,Set}) which doesn't expand until execution.

That may actually not be such a big deal, since we typically expand the runfiles::artifacts right after anyway (to compute the action key or write the file itself)--could be totally not worth to defer it by couple instructions. The case where that could matter is access to actions.input using the ActionAPI, which would leak the nested set expansion to the analysis phase, but not sure if this is an important use case.

For now, I would recommend you against doing either of those things since I think it would make most sense to talk to @lberki about this feature at a higher level first.

This comment has been minimized.

Copy link
@lberki

lberki Aug 24, 2022

Do I understand correctly that the change to getInputs() is necessary because while regular artifacts are represented in runfiles trees by a symlink to their execpath, unresolved symlinks are represented by the actual symlink and the former does not need the artifact to actually exist, but the latter does (since it needs to do readlink() on it)?

I am not too worried about backwards compatibility because unresolved symlinks are officially experimental so we don't need to worry about keeping behavior backwards compatible when they are enabled. However, one can add a whole nested set of artifacts to Runfiles using addTransitiveArtifacts(), so I don't think separating unresolved symlinks and regular artifacts can be done without a lot of pain (either due to a migration or due to not being able to treat unresolved symlinks the same as regular artifacts). So let's not do that.

To work around that the above dependency, I could imagine digging the generating action of the unresolved symlink out from ActionExecutionContext and thus avoiding the need for this dependency, but that would constrain our future options because then the target of the symlink would need to be determined in the analysis phase, so that's probably a bad option, too.

There has been a lot of talk about adding some sort of filtering to NestedSet, but it's a complicated and contentious question so I'd rather we not make that a dependency of productionizing unresolved symlinks. It's also not obvious how we can make it more efficient than just vanilla toList(). So let's not go in that direction, either.

I do agree with @alexjski that iterating over nested sets is not a great idea, but considering all the bad options above, it looks like the least bad option if we can make it so that the common case (where there are no unresolved symlinks) does not get worse. This change at least adds a cache for the set of unresolved symlinks so it's about as good as can be done without intrusive changes to the action execution machinery, and if it shows up on the performance dashboards, we can think again.

Then next best would be digging the target of the symlink out of ActionExecutionContext() using getGeneratingAction() and an ugly downcast from Action to UnresolvedSymlinkAction then we'll deal with unresolved symlinks whose target is only known in the execution phase when we'll need to.

And if that doesn't work, either, I'd take a closer look at the semantics of runfiles trees and their input and output manifests. We might be able to get away with some careful tweaks (e.g. adding the functionality "this is an unresolved symlink, do readlink() when creating the actual tree") to them (handwave-handwave, I haven't thought this through very well)

So I recommend this course of action:

  1. @alexjski puts this change through our performance test battery. If passes, submit.
  2. If not, use ActionExecutionContext to dig out the targets of unresolved symlinks when writing the manifest, thereby avoiding the need for unresolved symlinks to be in getInputs()
  3. If that does not work, think some more.

WDYT?

This comment has been minimized.

Copy link
@fmeum

fmeum Aug 24, 2022

Author Owner

Do I understand correctly that the change to getInputs() is necessary because while regular artifacts are represented in runfiles trees by a symlink to their execpath, unresolved symlinks are represented by the actual symlink and the former does not need the artifact to actually exist, but the latter does (since it needs to do readlink() on it)?

Yes, precisely.

I am not too worried about backwards compatibility because unresolved symlinks are officially experimental so we don't need to worry about keeping behavior backwards compatible when they are enabled. However, one can add a whole nested set of artifacts to Runfiles using addTransitiveArtifacts(), so I don't think separating unresolved symlinks and regular artifacts can be done without a lot of pain (either due to a migration or due to not being able to treat unresolved symlinks the same as regular artifacts). So let's not do that.

Went there and felt the pain - I agree this is not an option.

  1. @alexjski puts this change through our performance test battery. If passes, submit.

Sounds good, thanks. @alexjski Are you fine with trying this commit or should I turn it into a PR?

  1. If not, use ActionExecutionContext to dig out the targets of unresolved symlinks when writing the manifest, thereby avoiding the need for unresolved symlinks to be in getInputs()

This is a very interesting idea, but it also feels so wrong. I will give it a try if performance tests show that the alternative doesn't work.

This comment has been minimized.

Copy link
@lberki

lberki Aug 24, 2022

This is a very interesting idea, but it also feels so wrong. I will give it a try if performance tests show that the alternative doesn't work.

Yeah, I am only suggesting it out of desperation. In an ideal world, one would sit down and think through the complex interplay of input and output manifests and runfiles trees and local and remote actions because it feels possible to make things work much simpler, but that's a way too big chunk of work to foist on you and this ActionExecutionContext hack is not too limiting as an alternative. All it would need is a nicely worded comment explaining how that solution came to be and why alternatives don't work.

This comment has been minimized.

Copy link
@lberki

lberki Sep 7, 2022

Update. I ran our internal benchmarks myself (build :gws_and_dev_fileset with their blazerc) and the official opinion of the benchmarking suite is that this is even an improvement. However, the standard deviations for build times are quite large and it's surprising that this change can improve things so I'll poke at it a bit more before declaring victory.

This comment has been minimized.

Copy link
@lberki

lberki Sep 9, 2022

Update: I finally got the benchmark suite to pass and I have good news: this change is clean, it can go in as far as performance is concerned. Can you create a pull request from this?

(I'll also add a few comments now in order to spare a review round)

This comment has been minimized.

Copy link
@lberki

lberki Sep 9, 2022

Please add a comment here that this linearizes a nested set but that its performance is measured to be okay and that there are no obviously faster approaches (mention the "reaching into the action graph at the cost of execution phase not being able to affect the target of the symlink" and the "input discovery" approach as alternatives)

This comment has been minimized.

Copy link
@alexjski

alexjski Sep 9, 2022

Pasting one comment from above to make sure you took that into account:

The case where that could matter is access to actions.input using the ActionAPI, which would leak the nested set expansion to the analysis phase.

What that means is that well meaning bzl code may now end up materializing nested sets during analysis phase. Typically, when you ask for ActionApi::inputs that is a nested set, hence safe to take (as long as you don't toList it), here asking for that nested set actually materializes a potentially much bigger one.

This comment has been minimized.

Copy link
@lberki

lberki Sep 9, 2022

Good observation! On the philosophical side, I don't mind the nested set expansion too much because unfortunately we do it all the time, both in Starlark and in Java so this only increases the badness by a tiny bit. It would be different if we were in the middle of a concerted effort to remove nested set expansion from the analysis phase, but AFAICT we don't have such intentions and so we'd be holding Fabian to a standard higher than we hold ourselves to.

The only flaw in that argument is that following your idea of using input discovery would change the result of .inputs() which is technically an incompatible change so eliminating nested set expansion from the analysis phase would need to be an incompatible change but practically, a very tiny incompatible change compared to everything else we'd need to do to achieve that, so meh.

This comment has been minimized.

Copy link
@fmeum

fmeum Sep 9, 2022

Author Owner

It may very well be a terrible idea, but couldn't we override getStarlarkInputs to return an empty depset to prevent any accidental performance regressions? Would you expect people to be interested in the (newly added) inputs of a SourceManifestAction? If they are, they could probably always get them from the runfiles themselves.

This comment has been minimized.

Copy link
@alexjski

alexjski Sep 9, 2022

That's great, didn't realize those are separate methods. You can just emit a starlark error like "not implemented".

This comment has been minimized.

Copy link
@fmeum

fmeum Sep 9, 2022

Author Owner

Returning an empty depset might work better as it

  1. is backwards compatible;
  2. is very close to the correct answer (most inputs won't be symlinks)
  3. preserves the ability for generic code to safely inspect actions and their inputs, since a Starlark "not implemented" error would be fatal and impossible to detect without knowing upfront that SourceManifestAction throws it.

What do you think?

This comment has been minimized.

Copy link
@fmeum

fmeum Sep 11, 2022

Author Owner

While working on the PR, I started to wonder whether using discoverInputs and updateInputs would be cleaner than the ad-hoc caching logic in getInputs?
@lberki Is that what you referred to as the "input discovery" alternative above? Would it have a downside?

This comment has been minimized.

Copy link
@lberki

lberki Sep 14, 2022

Yep, that's the alternative I mentioned above. I didn't think it was preferable to what you have here for two reasons:

  1. It's complicated: It's not an especially straightforward part of Bazel because there are all sorts of odd quirks that one needs to know to be able to write correct code. If we went this way, I'd also ask for test cases to verify the incremental correctness of runfiles trees with respect to changes to input files to make sure that they are updated exactly when they should be.
  2. It's "lying": we already know what the input files are by the end of the analysis phase, so why pretend we don't? I could imagine performance trumping this argument, but my measurements show that it's a no-op, as far as performance is concerned.

(2) is also applicable for @alexjski 's suggestion of overriding getStarlarkInputs(): sure it's possible, but why complicate the life of some future developer?

TL;DR: I prefer the simplest approach which is just outright telling about the set of necessary inputs.

This comment has been minimized.

Copy link
@fmeum

fmeum Sep 14, 2022

Author Owner

Just to clarify: @alexjski just expressed concerns about inputs performance, the suggestion to override getStarlarkInputs() was mine - it's certainly not necessary if you think it's overly complicated. I included it in bazelbuild#16272, but can also drop it.

This comment has been minimized.

Copy link
@lberki

lberki Sep 14, 2022

Ah, my bad; I remembered it wrong and was too lazy to scroll up to verify. I think I'd slightly prefer not to lie, if @alexjski can live with that.

This comment has been minimized.

Copy link
@alexjski

alexjski Sep 15, 2022

I don't have a strong opinion, my slight preference was to fail that starlark API for this action so we don't lie and prevent users from shooting themselves in a foot by sneakily materializing nested sets.

TBH, seems like there are no great options here. I also have a slight preference of (2) over (1), but either could work for me.

This comment has been minimized.

Copy link
@lberki

lberki Sep 15, 2022

I hereby decree that we should not lie then. Failing is appealing, but it looks weird to special-case this one single action, so meh.

.filter(Artifact::isSymlink)
.collect(toImmutableList());
symlinkArtifacts = NestedSetBuilder.wrap(Order.STABLE_ORDER, symlinks);
}
return symlinkArtifacts;
}

@VisibleForTesting
public void writeOutputFile(OutputStream out, EventHandler eventHandler) throws IOException {
writeFile(out, runfiles.getRunfilesInputs(eventHandler, getOwner().getLocation()));
Expand Down Expand Up @@ -227,7 +243,11 @@ public void writeEntry(Writer manifestWriter, PathFragment rootRelativePath, Art
// This trailing whitespace is REQUIRED to process the single entry line correctly.
manifestWriter.append(' ');
if (symlink != null) {
manifestWriter.append(symlink.getPath().getPathString());
if (symlink.isSymlink()) {
manifestWriter.append(symlink.getPath().readSymbolicLink().getPathString());
} else {
manifestWriter.append(symlink.getPath().getPathString());
}
}
manifestWriter.append('\n');
}
Expand Down
Expand Up @@ -36,7 +36,10 @@
import java.io.IOException;
import javax.annotation.Nullable;

/** Action to create a symbolic link. */
/**
* Action to create a symlink to a known-to-exist target with alias semantics similar to a true copy
* of the input (if any).
*/
public final class SymlinkAction extends AbstractAction {
private static final String GUID = "7f4fab4d-d0a7-4f0f-8649-1d0337a21fee";

Expand Down Expand Up @@ -152,13 +155,6 @@ public static SymlinkAction toFileset(
owner, execPath, primaryInput, primaryOutput, progressMessage, TargetType.FILESET);
}

public static SymlinkAction createUnresolved(
ActionOwner owner, Artifact primaryOutput, PathFragment targetPath, String progressMessage) {
Preconditions.checkArgument(primaryOutput.isSymlink());
return new SymlinkAction(
owner, targetPath, null, primaryOutput, progressMessage, TargetType.OTHER);
}

/**
* Creates a new SymlinkAction instance, where an input artifact is not present. This is useful
* when dealing with special cases where input paths that are outside the exec root directory
Expand Down
Expand Up @@ -273,7 +273,7 @@ public void symlink(
? (String) progressMessageUnchecked
: "Creating symlink " + outputArtifact.getExecPathString();

SymlinkAction action;
Action action;
if (targetFile != Starlark.NONE) {
Artifact inputArtifact = (Artifact) targetFile;
if (outputArtifact.isSymlink()) {
Expand Down Expand Up @@ -324,10 +324,10 @@ public void symlink(
}

action =
SymlinkAction.createUnresolved(
new UnresolvedSymlinkAction(
ruleContext.getActionOwner(),
outputArtifact,
PathFragment.create((String) targetPath),
(String) targetPath,
progressMessage);
}
registerAction(action);
Expand Down
@@ -0,0 +1,112 @@
// Copyright 2022 The Bazel Authors. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package com.google.devtools.build.lib.analysis.starlark;

import com.google.common.collect.ImmutableSet;
import com.google.devtools.build.lib.actions.AbstractAction;
import com.google.devtools.build.lib.actions.ActionExecutionContext;
import com.google.devtools.build.lib.actions.ActionExecutionException;
import com.google.devtools.build.lib.actions.ActionKeyContext;
import com.google.devtools.build.lib.actions.ActionOwner;
import com.google.devtools.build.lib.actions.ActionResult;
import com.google.devtools.build.lib.actions.Artifact;
import com.google.devtools.build.lib.actions.Artifact.ArtifactExpander;
import com.google.devtools.build.lib.analysis.actions.SymlinkAction;
import com.google.devtools.build.lib.collect.nestedset.NestedSetBuilder;
import com.google.devtools.build.lib.collect.nestedset.Order;
import com.google.devtools.build.lib.server.FailureDetails;
import com.google.devtools.build.lib.server.FailureDetails.FailureDetail;
import com.google.devtools.build.lib.server.FailureDetails.SymlinkAction.Code;
import com.google.devtools.build.lib.util.DetailedExitCode;
import com.google.devtools.build.lib.util.Fingerprint;
import com.google.devtools.build.lib.vfs.Path;
import com.google.devtools.build.lib.vfs.PathFragment;
import java.io.IOException;
import javax.annotation.Nullable;

/**
* Action to create a possibly unresolved symbolic link to a raw path.
*
* To create a symlink to a known-to-exist target with alias semantics similar to a true copy of the
* input, use {@link SymlinkAction} instead.
*/
public final class UnresolvedSymlinkAction extends AbstractAction {

This comment has been minimized.

Copy link
@lberki

lberki Sep 9, 2022

Why does this have to be split out to a separate action?

This comment has been minimized.

Copy link
@fmeum

fmeum Sep 9, 2022

Author Owner

It doesn't have to be from a technical point of view. I found all the different cases covered by SymlinkAction rather confusing, which probably contributed to unresolved symlinks accidentally being turned into absolute symlinks. Since all existing cases except for unresolved symlinks a) use absolute paths as symlink targets and b) have the semantics of an alias rather than an actual symlink, I felt that a different action would be much easier to reason about. But if you feel that this isn't necessary, let me know and I can try to fix this right in SymlinkAction.

This comment has been minimized.

Copy link
@lberki

lberki Sep 9, 2022

I see. I don't have strong feelings either way. I thought that there was a deeper reason for it, but if not, meh. I'm fine with both alternatives.

private static final String GUID = "0f302651-602c-404b-881c-58913193cfe7";

private final String target;
private final String progressMessage;

public UnresolvedSymlinkAction(
ActionOwner owner,
Artifact primaryOutput,
String target,
String progressMessage) {
super(
owner,
NestedSetBuilder.emptySet(Order.STABLE_ORDER),
ImmutableSet.of(primaryOutput));
this.target = target;
this.progressMessage = progressMessage;
}

@Override
public ActionResult execute(ActionExecutionContext actionExecutionContext)
throws ActionExecutionException {

Path outputPath = actionExecutionContext.getInputPath(getPrimaryOutput());
try {
// TODO: PathFragment#create normalizes the symlink target, which may change how it resolves
// when combined with directory symlinks. Ideally, Bazel's file system abstraction would
// offer a way to create symlinks without any preprocessing of the target.
outputPath.createSymbolicLink(PathFragment.create(target));
} catch (IOException e) {
String message =
String.format(
"failed to create symbolic link '%s' to '%s' due to I/O error: %s",
getPrimaryOutput().getExecPathString(), target, e.getMessage());
DetailedExitCode code = createDetailedExitCode(message, Code.LINK_CREATION_IO_EXCEPTION);
throw new ActionExecutionException(message, e, this, false, code);
}

return ActionResult.EMPTY;
}

@Override
protected void computeKey(
ActionKeyContext actionKeyContext,
@Nullable ArtifactExpander artifactExpander,
Fingerprint fp) {
fp.addString(GUID);
fp.addString(target);
}

@Override
public String getMnemonic() {
return "UnresolvedSymlink";
}

@Override
protected String getRawProgressMessage() {
return progressMessage;
}

private static DetailedExitCode createDetailedExitCode(String message, Code detailedCode) {
return DetailedExitCode.of(
FailureDetail.newBuilder()
.setMessage(message)
.setSymlinkAction(FailureDetails.SymlinkAction.newBuilder().setCode(detailedCode))
.build());
}
}
Expand Up @@ -180,8 +180,8 @@ Command createCommand(Path execRoot, BinTools binTools, Map<String, String> shel
Preconditions.checkNotNull(shellEnvironment);
List<String> args = Lists.newArrayList();
args.add(binTools.getEmbeddedPath(BUILD_RUNFILES).asFragment().getPathString());
args.add("--allow_relative");
if (filesetTree) {
args.add("--allow_relative");
args.add("--use_metadata");
}
args.add(inputManifest.relativeTo(execRoot).getPathString());
Expand Down
Expand Up @@ -48,10 +48,11 @@ public void checkCreatedSpawn() {
assertThat(command.getEnvironment()).isEmpty();
assertThat(command.getWorkingDirectory()).isEqualTo(execRoot.getPathFile());
ImmutableList<String> commandLine = command.getArguments();
assertThat(commandLine).hasSize(3);
assertThat(commandLine).hasSize(4);
assertThat(commandLine.get(0)).endsWith(SymlinkTreeHelper.BUILD_RUNFILES);
assertThat(commandLine.get(1)).isEqualTo("input_manifest");
assertThat(commandLine.get(2)).isEqualTo("output/MANIFEST");
assertThat(commandLine.get(1)).isEqualTo("--allow_relative");
assertThat(commandLine.get(2)).isEqualTo("input_manifest");
assertThat(commandLine.get(3)).isEqualTo("output/MANIFEST");
}

@Test
Expand Down

0 comments on commit a2fdd4a

Please sign in to comment.