add multipart file upload #268

benzolium · 2024-05-25T12:18:20Z

Resolves #154

@pkqk this a draft. Please take a look. Any suggestions or remarks?

I'm currently testing it manually.
Though I'm using almost the same approach for 6 month on production. 🙂

benzolium · 2024-05-26T04:30:44Z

There is a problem with logging.
My code:

bramble/instrumentation.go

Line 58 in 3dc62ad

log.WithFields(log.Fields(e.fields)).Debug(e.name)

breaks tests in instrumentation_test.go:
However, code from the head:

bramble/instrumentation.go

Line 55 in 949badb

Info(e.name)

just logs the whole requests body with log level info. Which is kinda not ok in case of large files (it even caused me some OOMs).

For now I've just reverted my code with changes to log level, but it is an issue to address.

pkqk · 2024-05-26T22:24:41Z

Hi @benzolium,

This looks cool, I'll give it a test when I have a chance. Is it possible you could add an example server to the examples folder to show it in use?

benzolium · 2024-05-27T08:53:07Z

@pkqk sure. Done.

I've also added one more test case and updated docs with headers plugin config: to make file upload work we must pass Content-Type headers, so I've updated gateway.json config.

benzolium · 2024-05-27T09:02:00Z

There’s still a problem with logging the whole body. For large files it will be a big issue.

@pkqk i cannot just remove logging of body completely because there is some instrumentation logic relying on that (there are even tests for having body in logs). So what should I do? Remove body from logging by checking content type header?

pkqk · 2024-05-28T02:55:00Z

It's probably worth removing the request body from the logging by default, it was in there initially as it was useful during early development.

For now you could stop it logging the body on a multipart/form-data request with something like:

diff --git a/middleware.go b/middleware.go
index a370351..f74f874 100644
--- a/middleware.go
+++ b/middleware.go
@@ -104,15 +104,20 @@ func addRequestBody(e *event, r *http.Request, buf bytes.Buffer) {
 	contentType := r.Header.Get("Content-Type")
 	e.addField("request.content-type", contentType)
 
-	if r.Method != http.MethodHead &&
-		r.Method != http.MethodGet &&
-		contentType == "application/json" {
-		var payload interface{}
-		if err := json.Unmarshal(buf.Bytes(), &payload); err == nil {
-			e.addField("request.body", &payload)
-		} else {
+	if r.Method != http.MethodHead && r.Method != http.MethodGet {
+		switch contentType {
+		case "application/json":
+			var payload interface{}
+			if err := json.Unmarshal(buf.Bytes(), &payload); err == nil {
+				e.addField("request.body", &payload)
+			} else {
+				e.addField("request.body", buf.String())
+				e.addField("request.error", err)
+			}
+		case "multipart/form-data":
+			e.addField("request.body", fmt.Sprintf("%d bytes", len(buf.Bytes())))
+		default:
 			e.addField("request.body", buf.String())
-			e.addField("request.error", err)
 		}
 	} else {
 		e.addField("request.body", buf.String())

client.go

pkqk · 2024-05-29T01:33:56Z

@benzolium I tried testing the example service but I'm getting these errors

{
  "errors": [
    {
      "message": "unexpected response code: 422 Unprocessable Entity",
      "locations": [
        {
          "line": 2,
          "column": 2
        },
        {
          "line": 3,
          "column": 2
        }
      ],
      "extensions": {
        "selectionSet": "{ uploadGizmoFile(upload: $fileA) uploadGadgetFile(upload: {upload:$fileB}) }"
      }
    }
  ],
  "data": null
}

This is the curl command I'm using

curl --url http://localhost:8082/query \
  --form 'operations={"query":"mutation upload($fileA: Upload!, $fileB: Upload!) {\n\tuploadGizmoFile(upload: $fileA)\n\tuploadGadgetFile(upload: {\n\t\tupload: $fileB\n\t})\n}","variables":{"fileA":null,"fileB":null}}' \
  --form 'map={
 "a": ["variables.fileA"],
 "b": ["variables.fileB"]
}' \
  --form a=@a.txt \
  --form b=@b.txt

benzolium · 2024-05-29T05:27:25Z

@pkqk good catch. Fixed. I was populating files map incorrectly.

client.go

The named structs don't serve a lot of purpose when they are only used in one location. The variables map is being modified in place which is not very visible. This can lead to unexpected bugs.

Bring the stack helper struct definition into the function Remove the unused key field and preset the path to start with "variables".

pkqk · 2024-05-30T03:11:07Z

@benzolium I had a bunch more suggestions that were a bit too big to do through the review system. If you'd like to look at my fork

Also looking through the spec examples, I noticed the current file map approach won't handle arrays of Files, i.e making a file map with variables.fileList.0, variables.fileList.1 etc. So you might need a case to handle []graphql.Upload and []*graphql.Upload in the loop.

Instead of always doing so if the upstream request was. Only services taking an upload need a multipart request, as we can't guarantee all services support the mulitpart transport.

CreateFormFile sets it to application/octet-stream without the option of specifying it, so if the upstream payload had a more specific type that would be lost.

benzolium · 2024-05-30T05:23:12Z

@pkqk thanks for suggestions! I'd definitely apply some of them.

However there are several things I'd like to discuss:

mime.ParseMediaType instead of just HasPrefix (mime is doing much more than checking prefix and content type always starts with type/subtype as of RFC)
combining case graphql.Upload, *graphql.Upload removes some duplicated code, however it requires costly type assertion (which we are already doing)
isMultipart is deciding if downstream request should be a multipart one, but:
- user has already decided that request is multipart with content-type
- I cannot imagine the case where it is possible for user to sent a multipart request and some downstream service should receive post request
- isMultipart iterates variables one more time which is kind of overhead

Should we open a PR to my fork and discuss suggestions there? Or maybe I should apply other suggestions and we could discuss remaining here?

As for array files, I'll look into that. Haven't ever used files array so overlooked.

pkqk · 2024-05-30T21:47:00Z

Hi @benzolium, good questions, performance wise, we are dealing with a lot of json parsing and serialisation which is going to be an order of magnitude slower than any of the checks we do here, so I wouldn't consider optimising these until it proves to be an issue.

parsing the media type using the mime library gives us a few more robustness checks, such as normalising the typename, we could do it with strings.HasPrefix("type/subtype", strings.ToLower(strings.TrimSpace(contentType))) but I think using the mime library is clearer.
I'm not sure of the overhead of the type assertion, but I would expect the compiler to be able to skip it as it's already done it once and the variable value hasn't changed, so I think an optimiser would be able to unroll the case statement.
The multipart request type isn't part of the base graphql transport, so it's possible another service doesn't support it, so if the request was being federated to two services, one that supported multipart and one that didn't then it could break.
iterating over the map twice is more overhead, but O(n) + O(n) is still O(n), only with a small increase in the constant factor.

In general I'd go for clarity and slightly less efficient code than optimising before it's necessary.

pkqk · 2024-06-04T01:20:09Z

@benzolium I can either PR into your branch, or you can cherry pick the ones that you agree with and discuss the other ones here.

benzolium · 2024-06-05T04:31:14Z

Hey @pkqk! Thanks for the answers and for the ping. You can surely PR into my branch.

Sorry for the delay. I’m a little but overwhelmed with work now.

pkqk · 2024-06-05T04:35:14Z

There's no rush @benzolium, looking forward to merging this in but I want us to get it polished and working well before we do.

Multipart suggestions

benzolium · 2024-06-13T19:23:35Z

Hey @pkqk! I've merged your suggestions and added a case for list of files.

benzolium added 2 commits May 25, 2024 15:13

add multipart file upload

b52dec2

test variables mutated body

3dc62ad

benzolium added 2 commits May 26, 2024 07:33

revert instrumentation log level

10c9a9d

test multipart request

9e5d3ac

benzolium marked this pull request as ready for review May 26, 2024 08:21

benzolium requested a review from a team as a code owner May 26, 2024 08:21

benzolium marked this pull request as draft May 26, 2024 08:21

benzolium added 2 commits May 27, 2024 11:47

add headers docs; more client test cases

53c639d

add file upload example service

5e7894c

pkqk reviewed May 28, 2024

View reviewed changes

client.go Outdated Show resolved Hide resolved

benzolium added 2 commits May 28, 2024 07:46

skip logging body for multipart

6d45359

address comments: inverse check content type stmt

65e479f

benzolium marked this pull request as ready for review May 28, 2024 04:55

fix multiple files map

4ccf31e

fix multiple files map

f76bae7

pkqk reviewed May 29, 2024

View reviewed changes

client.go Outdated Show resolved Hide resolved

pkqk reviewed May 29, 2024

View reviewed changes

client.go Outdated Show resolved Hide resolved

pkqk reviewed May 29, 2024

View reviewed changes

client.go Outdated Show resolved Hide resolved

pkqk added 5 commits May 30, 2024 10:16

Use json.NewEncoder to serialize json

5315038

Return file info in example service

c0c4a08

Use multiple return values over a struct

f91e00c

Use multiple return values building multipart body

6939986

The named structs don't serve a lot of purpose when they are only used in one location. The variables map is being modified in place which is not very visible. This can lead to unexpected bugs.

Simplify variable map walk

66fbaf5

Bring the stack helper struct definition into the function Remove the unused key field and preset the path to start with "variables".

Use mime package to parse content type

afe5211

pkqk added 3 commits May 30, 2024 15:26

Determine if multipart request is needed

c77ce1b

Instead of always doing so if the upstream request was. Only services taking an upload need a multipart request, as we can't guarantee all services support the mulitpart transport.

Reduce duplication in file map creation

dac343a

Preserve content-type of uploaded files

16c7212

CreateFormFile sets it to application/octet-stream without the option of specifying it, so if the upstream payload had a more specific type that would be lost.

benzolium and others added 2 commits June 13, 2024 21:17

Merge pull request #1 from pkqk/multipart-suggestions

6dc4170

Multipart suggestions

process arrays of files

3964ed0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add multipart file upload #268

add multipart file upload #268

benzolium commented May 25, 2024

benzolium commented May 26, 2024 •

edited

Loading

pkqk commented May 26, 2024

benzolium commented May 27, 2024

benzolium commented May 27, 2024

pkqk commented May 28, 2024

pkqk commented May 29, 2024 •

edited

Loading

benzolium commented May 29, 2024

pkqk commented May 30, 2024 •

edited

Loading

benzolium commented May 30, 2024

pkqk commented May 30, 2024

pkqk commented Jun 4, 2024

benzolium commented Jun 5, 2024

pkqk commented Jun 5, 2024

benzolium commented Jun 13, 2024

add multipart file upload #268

Are you sure you want to change the base?

add multipart file upload #268

Conversation

benzolium commented May 25, 2024

benzolium commented May 26, 2024 • edited Loading

pkqk commented May 26, 2024

benzolium commented May 27, 2024

benzolium commented May 27, 2024

pkqk commented May 28, 2024

pkqk commented May 29, 2024 • edited Loading

benzolium commented May 29, 2024

pkqk commented May 30, 2024 • edited Loading

benzolium commented May 30, 2024

pkqk commented May 30, 2024

pkqk commented Jun 4, 2024

benzolium commented Jun 5, 2024

pkqk commented Jun 5, 2024

benzolium commented Jun 13, 2024

benzolium commented May 26, 2024 •

edited

Loading

pkqk commented May 29, 2024 •

edited

Loading

pkqk commented May 30, 2024 •

edited

Loading