
Enable multiple exporters #2760

Closed

Conversation

@a-palchikov (Contributor) commented Mar 30, 2022

fixes #1555
This is a follow-up on #1788.

Multiple exporters

This PR enables buildkit to output to multiple destinations:

$ buildctl build \
	    --frontend=dockerfile.v0 \
	    --local context=. \
	    --local dockerfile=. \
	    --export-cache type=inline \
	    --output type=local,dest=dump \
	    --output type=local,dest=dump2 \
	    --output type=image,name=localhost:5000/out:test \
	    --output type=image,name=localhost:5000/out2:latest \
	    --output type=tar,dest=out.tar \
	    --output type=tar,dest=out2.tar \
	    --output type=oci,dest=oci.tar \
	    --output type=docker,dest=docker.tar \
	    --metadata-file metadata.json

No attempt is made at this point to avoid metadata collisions between multiple exporter instances.

@tonistiigi (Member)

Have not reviewed this yet, but making the returned metadata into an array of objects with exporter types does not seem right. I think all exporters would just add their keys to the metadata. Exporters should be careful not to include keys that another exporter might use for a different value.

@a-palchikov (Contributor, author)

> Have not reviewed this yet, but making the returned metadata into an array of objects with exporter types does not seem right. I think all exporters would just add their keys to the metadata. Exporters should be careful not to include keys that another exporter might use for a different value.

The reasoning behind an array is the freedom it provides regarding the use of multiple instances of the same exporter(s). But if the mental model of the export is a set of unique exporters, i.e. no exporters can reuse keys and each should be careful about overwriting existing keys, then I understand that a single map is a simpler and more appealing approach.

@a-palchikov (Contributor, author)

@tonistiigi I updated the implementation to pack multiple exporter responses into a single map.

@tonistiigi (Member)

@a-palchikov Check the CI results.

@a-palchikov a-palchikov force-pushed the dima/enable-multi-exporters branch 2 times, most recently from 664635e to 1a24044 Compare April 12, 2022 21:08
@a-palchikov (Contributor, author)

@tonistiigi Let me know if there's anything I need to do on my side.

@tonistiigi tonistiigi self-requested a review April 13, 2022 21:18
@tonistiigi (Member) left a comment


Some tests still look red in CI.

message SolveResponse {
-	map<string, string> ExporterResponse = 1;
+	ExporterResponse ExporterResponse = 1;
Member:

Is this backward compatible at the proto level?

Contributor Author:

Even if it is, I'll go back to the previous message layout; there is no longer a need for this.

map<string, string> Response = 1;
// Name identifies the exporter. It is empty
// for the common exporter metadata (e.g. cache exporter)
string Name = 2;
Member:

Afaics this is unused?

Contributor Author:

Yes - will remove.

@@ -1893,6 +1895,74 @@ func testUser(t *testing.T, sb integration.Sandbox) {
checkAllReleasable(t, c, sb, true)
}

func testPreventsMultipleExportWithSameExporter(t *testing.T, sb integration.Sandbox) {
integration.SkipIfDockerd(t, sb, "multiple exporters")
Member:

Multiple local exporters should be allowed in dockerd.

}

func testMultipleExporters(t *testing.T, sb integration.Sandbox) {
integration.SkipIfDockerd(t, sb, "multiple exporters")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skip should not be required here, but the image exporter name needs to change to "moby" when running under Dockerd. E.g. look at testBuildInfoExporter().

Attrs: map[string]string{
"name": strings.Join([]string{target1, target2}, ","),
},
},
Member:

Add tar exporters here as well for good measure.

Contributor Author:

Oh, I just realized that this could be a bigger issue with the multiple file-sender implementations like the local and tar exporters. There needs to be some sort of fan-out for the multiple file exporters at the gRPC level.

Resolved review threads (outdated): control/control.go, exporter/containerimage/exptypes/types.go, solver/llbsolver/solver.go (two threads)
exporterResponse, err = e.Export(ctx, inp, j.SessionID)
return err
if err := inBuilderContext(ctx, j, exp.Name(), "", func(ctx context.Context, _ session.Group) error {
resp, err := exp.Export(ctx, inp, j.SessionID)
Member:

I wonder if these should run in parallel if multiple are set.

Contributor Author:

I will give it a try and see if it can be made to run in parallel. Hopefully there's no mutable state to share.

@a-palchikov (Contributor, author) commented Apr 27, 2022

My recent changes had another undesired ripple effect: the propagation of compression to the inline cache exporter. I traced the changes to this PR and now have a question about this behavior in the presence of multiple exporters. The behavior could be extended as follows:

  • inline cache exporter inherits the compression from the exporter if only one exporter is used
  • inline cache exporter inherits compression configuration from some exporter (e.g. the first configured) if multiple exporters have been specified. This is going to be confusing.
  • inline cache exporter inherits compression configuration from an image exporter (OCI, image).

Has support for explicit compression configuration for the inline cache exporter been considered? (sorry if I'm out of the loop here):

     --output type=image,name=registry:5000/image:1,push=true,compression=zstd \
     --export-cache type=inline,compression=zstd

I know this was possibly a valid choice with just one export target, but now that we want multiple exporters at a time, I think it's valid to assume they might have different compression requirements.

@sipsma sipsma mentioned this pull request Apr 28, 2022
@a-palchikov a-palchikov marked this pull request as draft May 2, 2022 13:54
@tonistiigi (Member) commented May 10, 2022

> Has support for explicit compression configuration for the inline cache exporter been considered? (sorry if I'm out of the loop here):

Inline cache export should not define a compression because it shouldn't create any new layers. Its job is to match the compression of the exported image, so there is always a 1-to-1 match between an image exporter and the inline cache exporter. If there are multiple image exporters with multiple compressions, then the inline cache also needs to run multiple times, with each result connected to the correct image export.

Did this appear from making the exporters run in parallel, or is it just a new aspect you didn't think about before? If the former, we could leave that to a follow-up.

FYI @ktock

@a-palchikov (Contributor, author)

No, it was something I did not think about before. Thanks for explaining; I get it now. Also sorry it is taking this much time. I have actually started looking into a way to make multiple client exporters (file and directory) work together after realizing this and thinking about your suggestion.

Currently, it is not possible to run two or more local exporters (i.e. with different destinations) or write multiple tarballs due to the limitation of a single FileSendServer. I'm running with the idea of implementing a generic FileSendServer for multiple sinks (multiple files or multiple directories), and we'll see if that pans out.

var exporters []exporter
ids := make(map[string]int)
for _, exp := range opt.Exports {
	if id, ok := ids[exp.Type]; !ok {
Contributor Author:

A naive way to auto-id the exporters. Potentially, an id could also be provided from configuration.

}

func (e *imageExporterInstance) ExportImage(ctx context.Context, src exporter.Source, inlineCache exptypes.InlineCache, sessionID string) (map[string]string, error) {
	meta := make(map[string][]byte)
Contributor Author:

A copy of the metadata, since the exporters are executed concurrently.

@@ -113,10 +124,29 @@ func syncTargetDiffCopy(ds grpc.ServerStream, dest string) error {
return true
}
}(),
}))
})
for _, dest := range dests[1:] {
Contributor Author:

This runs regardless of the previous error, as fsutil.Receive seems to always fail with context.Cancelled (even on master), and I have not looked into this yet.

}

func CopyToCaller(ctx context.Context, fs fsutil.FS, c session.Caller, progress func(int, bool)) error {
method := session.MethodURL(_FSSend_serviceDesc.ServiceName, "diffcopy")
Contributor Author:

I chose to add another gRPC service to sync directories for the local exporter, since it was impossible (well, complicated) to make both the file exporters and the local exporter run on the same protocol.

I think the FileSync service may have already set a precedent for this, but I might be wrong.

Member:

We shouldn't need another service here.

We can do what we use for multiplexing input directories, using the grpc header. So in the filesend handler, we should get the header value (and use something similar to the DirSource interface).

@a-palchikov (Contributor, author)

I also left the metadata conflict resolution up for discussion and potentially another PR.

@a-palchikov a-palchikov marked this pull request as ready for review June 28, 2022 19:01
@AkihiroSuda (Member)

Needs rebase

@a-palchikov a-palchikov force-pushed the dima/enable-multi-exporters branch from 1fea6d7 to 27a1987 Compare May 1, 2023 15:53
@a-palchikov a-palchikov requested a review from crazy-max May 8, 2023 10:40
@tonistiigi tonistiigi added this to the v0.12.0 milestone May 22, 2023
@jedevc (Member) left a comment


@a-palchikov sorry for the massively delayed review.

I've had a glance through, and I think the protocol changes make sense. I've just got a couple of notes around the local exporting; I think we can simplify some things there.

It's been a while, so I'm happy to help carry this; otherwise, I'll definitely stay on top of reviews and feedback for this. Really sorry for the delays again ❤️

Comment on lines 207 to 202
src := *inp
src.Metadata = meta
Member:

If we're making a copy, we should overwrite inp with the copy so we don't accidentally use it later.

Contributor Author:

Done. I also updated the inputs to be values instead of references, to give a stronger cue that they're not supposed to relay values back to the caller.

r.mu.Unlock()
}

type mutableResult struct {
Member:

Hm, this is a good point.

I'd rather we rework the Result type though (making all of the public types into private ones), instead of having wrapper types around it.


}
return inBuilderContext(ctx, job, exp.Name(), job.SessionID+"-export", func(ctx context.Context, _ session.Group) (err error) {
var dref exporter.DescriptorReference
resps[i], dref, err = imageExporter.ExportImage(ctx, inp, *inlineCache, sessionID)
Member:

We can avoid the interface juggling by passing a function that runs the runInlineCacheExporter function. Exporters that don't support inline cache just don't need to call that function.

Contributor Author:

The interface helps determine whether it's necessary to run the inline cache exporter in the first place. If that were a simple function, or placed into Source as additional metadata before the corresponding exporter's Export is entered, such an optimization wouldn't be possible. I'm OK dropping it if, in your view, it really simplifies things by a lot.

Contributor Author:

On the Result type and the need to handle concurrent updates: I see the appeal of the simplicity of a single thread-safe type, but I also don't see how to do this with a small set of changes. The change would be massive, penetrating a lot of places due to the potential API changes; e.g. instead of allowing direct attribute access, it would have getters/setters to exercise greater control (and that concerns the metadata of all attributes).

This also makes me wonder whether there isn't a strong case for a two-type hierarchy: a simple Result that is not safe for concurrent use (and suits the majority of use-cases), plus a thread-safe wrapper for the cases where there is a legitimate need to populate the result from multiple goroutines. So far, I could only find a single place where that was necessary. Please let me know if I'm wrong here.

With that said, can we also reach some sort of intermediate ground and delay this refactoring to a dedicated issue?

Attrs: exp.Attrs,
})
} else if localExporter == nil {
// TODO(dima): different options per local exporter?
Member:

I think if we follow my suggestion below for multiplexing between multiple output directories, this issue should go away? We then shouldn't need to do the de-duplication.

@a-palchikov (Contributor, author)

@jedevc No worries - I'll look into your suggestions, thanks!

@tonistiigi tonistiigi modified the milestones: v0.12.0, v0.13 Jul 12, 2023
@molinch commented Jul 18, 2023

Do you already have a clue if this feature will land in a release, or in a preview release?

@a-palchikov (Contributor, author)

> Do you already have a clue if this feature will land in a release, or in a preview release?

I'm working on the review comments, albeit slowly. Sorry for the delay.

@a-palchikov a-palchikov marked this pull request as draft July 22, 2023 18:49
fahedouch and others added 7 commits August 2, 2023 12:49
In order to support multiple local exporters, a new gRPC API has been added to distinguish the origin of the message packet.

Also, I felt that the solver/result package was slightly confusing with
its use of the mutex to protect some metadata access patterns while not
providing enough protection against potential data races. Instead, where
necessary to protect against data races, the result values are explicitly
duplicated (including the metadata).

Signed-off-by: fahed dorgaa <fahed.dorgaa@gmail.com>
Signed-off-by: a-palchikov <deemok@gmail.com>
…h the implementations

Signed-off-by: a-palchikov <deemok@gmail.com>
remains as to what to do to possibly different options per local
exporter since it's currently replicated on the client but the
attestation files are generated on the server.

Signed-off-by: a-palchikov <deemok@gmail.com>
Signed-off-by: a-palchikov <deemok@gmail.com>
Signed-off-by: a-palchikov <deemok@gmail.com>
 * assign the input source in exporters to avoid re-using the stale
   reference
 * remove the FSSend gRPC service - instead rely on gRPC metadata to
   communicate the functional protocol

Signed-off-by: a-palchikov <deemok@gmail.com>
thread-safe Result.
Address review comments:
 * Re-use the input variable in exporters to avoid accidental re-use of
   stale values

Signed-off-by: a-palchikov <deemok@gmail.com>
@molinch commented Sep 6, 2023

> Do you already have a clue if this feature will land in a release, or in a preview release?

> I'm working on the review comments, albeit slowly. Sorry for the delay.

No worries, thanks a lot for the feedback! I see that some of it was merged, but not all yet, so I guess nothing can be tested as of now. Correct?

@a-palchikov (Contributor, author)

> No worries, thanks a lot for the feedback! I see that some of it was merged, but not all yet, so I guess nothing can be tested as of now. Correct?

There's actually a replacement PR for this - I'll close this one to avoid confusion.

@a-palchikov a-palchikov closed this Sep 6, 2023
Development

Successfully merging this pull request may close these issues.

support multiple exporters for build