Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

add multipart file upload #268

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

benzolium
Copy link
Contributor

Resolves #154

@pkqk this a draft. Please take a look. Any suggestions or remarks?

I'm currently testing it manually.
Though I'm using almost the same approach for 6 month on production. 馃檪

@benzolium
Copy link
Contributor Author

benzolium commented May 26, 2024

There is a problem with logging.
My code:

log.WithFields(log.Fields(e.fields)).Debug(e.name)

breaks tests in instrumentation_test.go:
However, code from the head:
Info(e.name)

just logs the whole requests body with log level info. Which is kinda not ok in case of large files (it even caused me some OOMs).

For now I've just reverted my code with changes to log level, but it is an issue to address.

@benzolium benzolium marked this pull request as ready for review May 26, 2024 08:21
@benzolium benzolium requested a review from a team as a code owner May 26, 2024 08:21
@benzolium benzolium marked this pull request as draft May 26, 2024 08:21
@pkqk
Copy link
Member

pkqk commented May 26, 2024

Hi @benzolium,

This looks cool, I'll give it a test when I have a chance. Is it possible you could add an example server to the examples folder to show it in use?

@benzolium
Copy link
Contributor Author

@pkqk sure. Done.

I've also added one more test case and updated docs with headers plugin config: to make file upload work we must pass Content-Type headers, so I've updated gateway.json config.

@benzolium
Copy link
Contributor Author

There鈥檚 still a problem with logging the whole body. For large files it will be a big issue.

@pkqk i cannot just remove logging of body completely because there is some instrumentation logic relying on that (there are even tests for having body in logs). So what should I do? Remove body from logging by checking content type header?

@pkqk
Copy link
Member

pkqk commented May 28, 2024

It's probably worth removing the request body from the logging by default, it was in there initially as it was useful during early development.

For now you could stop it logging the body on a multipart/form-data request with something like:

diff --git a/middleware.go b/middleware.go
index a370351..f74f874 100644
--- a/middleware.go
+++ b/middleware.go
@@ -104,15 +104,20 @@ func addRequestBody(e *event, r *http.Request, buf bytes.Buffer) {
 	contentType := r.Header.Get("Content-Type")
 	e.addField("request.content-type", contentType)
 
-	if r.Method != http.MethodHead &&
-		r.Method != http.MethodGet &&
-		contentType == "application/json" {
-		var payload interface{}
-		if err := json.Unmarshal(buf.Bytes(), &payload); err == nil {
-			e.addField("request.body", &payload)
-		} else {
+	if r.Method != http.MethodHead && r.Method != http.MethodGet {
+		switch contentType {
+		case "application/json":
+			var payload interface{}
+			if err := json.Unmarshal(buf.Bytes(), &payload); err == nil {
+				e.addField("request.body", &payload)
+			} else {
+				e.addField("request.body", buf.String())
+				e.addField("request.error", err)
+			}
+		case "multipart/form-data":
+			e.addField("request.body", fmt.Sprintf("%d bytes", len(buf.Bytes())))
+		default:
 			e.addField("request.body", buf.String())
-			e.addField("request.error", err)
 		}
 	} else {
 		e.addField("request.body", buf.String())

client.go Outdated Show resolved Hide resolved
@benzolium benzolium marked this pull request as ready for review May 28, 2024 04:55
@pkqk
Copy link
Member

pkqk commented May 29, 2024

@benzolium I tried testing the example service but I'm getting these errors

{
  "errors": [
    {
      "message": "unexpected response code: 422 Unprocessable Entity",
      "locations": [
        {
          "line": 2,
          "column": 2
        },
        {
          "line": 3,
          "column": 2
        }
      ],
      "extensions": {
        "selectionSet": "{ uploadGizmoFile(upload: $fileA) uploadGadgetFile(upload: {upload:$fileB}) }"
      }
    }
  ],
  "data": null
}

This is the curl command I'm using

curl --url http://localhost:8082/query \
  --form 'operations={"query":"mutation upload($fileA: Upload!, $fileB: Upload!) {\n\tuploadGizmoFile(upload: $fileA)\n\tuploadGadgetFile(upload: {\n\t\tupload: $fileB\n\t})\n}","variables":{"fileA":null,"fileB":null}}' \
  --form 'map={
 "a": ["variables.fileA"],
 "b": ["variables.fileB"]
}' \
  --form a=@a.txt \
  --form b=@b.txt

@benzolium
Copy link
Contributor Author

@pkqk good catch. Fixed. I was populating files map incorrectly.

client.go Outdated Show resolved Hide resolved
client.go Outdated Show resolved Hide resolved
client.go Outdated Show resolved Hide resolved
pkqk added 5 commits May 30, 2024 10:16
The named structs don't serve a lot of purpose when they are only used
in one location.

The variables map is being modified in place which is not very visible.
This can lead to unexpected bugs.
Bring the stack helper struct definition into the function
Remove the unused key field and preset the path to start with
"variables".
@pkqk
Copy link
Member

pkqk commented May 30, 2024

@benzolium I had a bunch more suggestions that were a bit too big to do through the review system. If you'd like to look at my fork

Also looking through the spec examples, I noticed the current file map approach won't handle arrays of Files, i.e making a file map with variables.fileList.0, variables.fileList.1 etc. So you might need a case to handle []graphql.Upload and []*graphql.Upload in the loop.

pkqk added 3 commits May 30, 2024 15:26
Instead of always doing so if the upstream request was. Only services
taking an upload need a multipart request, as we can't guarantee all
services support the mulitpart transport.
CreateFormFile sets it to application/octet-stream without the option of
specifying it, so if the upstream payload had a more specific type
that would be lost.
@benzolium
Copy link
Contributor Author

@pkqk thanks for suggestions! I'd definitely apply some of them.

However there are several things I'd like to discuss:

  • mime.ParseMediaType instead of just HasPrefix (mime is doing much more than checking prefix and content type always starts with type/subtype as of RFC)
  • combining case graphql.Upload, *graphql.Upload removes some duplicated code, however it requires costly type assertion (which we are already doing)
  • isMultipart is deciding if downstream request should be a multipart one, but:
    • user has already decided that request is multipart with content-type
    • I cannot imagine the case where it is possible for user to sent a multipart request and some downstream service should receive post request
    • isMultipart iterates variables one more time which is kind of overhead

Should we open a PR to my fork and discuss suggestions there? Or maybe I should apply other suggestions and we could discuss remaining here?

As for array files, I'll look into that. Haven't ever used files array so overlooked.

@pkqk
Copy link
Member

pkqk commented May 30, 2024

Hi @benzolium, good questions, performance wise, we are dealing with a lot of json parsing and serialisation which is going to be an order of magnitude slower than any of the checks we do here, so I wouldn't consider optimising these until it proves to be an issue.

  • parsing the media type using the mime library gives us a few more robustness checks, such as normalising the typename, we could do it with strings.HasPrefix("type/subtype", strings.ToLower(strings.TrimSpace(contentType))) but I think using the mime library is clearer.
  • I'm not sure of the overhead of the type assertion, but I would expect the compiler to be able to skip it as it's already done it once and the variable value hasn't changed, so I think an optimiser would be able to unroll the case statement.
  • The multipart request type isn't part of the base graphql transport, so it's possible another service doesn't support it, so if the request was being federated to two services, one that supported multipart and one that didn't then it could break.
  • iterating over the map twice is more overhead, but O(n) + O(n) is still O(n), only with a small increase in the constant factor.

In general I'd go for clarity and slightly less efficient code than optimising before it's necessary.

@pkqk
Copy link
Member

pkqk commented Jun 4, 2024

@benzolium I can either PR into your branch, or you can cherry pick the ones that you agree with and discuss the other ones here.

@benzolium
Copy link
Contributor Author

Hey @pkqk! Thanks for the answers and for the ping. You can surely PR into my branch.

Sorry for the delay. I鈥檓 a little but overwhelmed with work now.

@pkqk
Copy link
Member

pkqk commented Jun 5, 2024

There's no rush @benzolium, looking forward to merging this in but I want us to get it polished and working well before we do.

@benzolium
Copy link
Contributor Author

Hey @pkqk! I've merged your suggestions and added a case for list of files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request syntax error while trying to upload file
2 participants