Handling Binary File Uploads #500

Open
davejohnston opened this issue Dec 12, 2017 · 24 comments

Comments

@davejohnston

Hi,

I have a protobuf that defines one field

message FileUpload {
bytes fileContents = 1;
}

I want to be able to upload a file and have the contents of that upload stored in fileContents. When sending JSON data, I understand that the keys map to gRPC message fields. But in the case of a binary upload there is no field name, so how can I map the contents of the request body into a message?

rpc UploadFile(FileUpload) returns (google.protobuf.Empty) {
    option (google.api.http) = {
        post: "/v1/files"
        body: "*"
    };
}

If I use curl to make a request like this:
curl -X POST --data-binary "@/tmp/test_file.txt" http://localhost:9090/v1/files

The HTTP request looks like this:

POST /v1/cmd/secretfiles HTTP/1.1
Host: localhost:9090
User-Agent: curl/7.54.0
Accept: */*
Content-Length: 1675
Content-Type: application/x-www-form-urlencoded
Expect: 100-continue

uEH2KxtiW5Tpuj7bU5hmPO1h4QZuCcrrNDA/OvNyYmw0U02Uy/mR5iWnS6tMdPmZ
iJBmN06MZ2Khx5rV+rqvwF9CUGYzja/dDCm+2wIDAQABAoIBAB050/iJ0YpyGtig
hRPQ7IetKx8HRfJYLSlYu9+eo4/e7EGAfm1dZWz9pJO0kcnUCn3iOKZUDqdGqIz0
9mXR0wK8DaWgYFf0Wx9d8/EEOdUUrQ9Eh82CCWk94fbCbC/b1NdZ90DaUhIZ1J7y
keCiyuPj5rClhrAdf/GYy8bEXp2W9+zi5vH2dTi3JDe2rImh+urEjCYThV3dxoms
HrwX+nWrwNVXfAIJ3R8ojRoiFQtckYgSytyyYdGaxvaZf4rw3CWK+4U1b6NiC4XP
IDVpBfWSXsbPvliA4A35F/15Let/ASw4YZiUryrsYBVNMBMjiydULFvNX0WQVVJE
@achew22
Collaborator

achew22 commented Dec 14, 2017

@davejohnston, thanks for the inquiry. Unfortunately, at the moment we don't support file uploads and have no plans to. You can see some discussion of the topic here. If you are interested in giving it a go and documenting it, I would love to have some notes on strategies that work or don't work for this problem. Wanna give it a try?

@ChinaHDJ1

2017.... why no close

@rosspatil

I actually managed it by adding custom routes to the runtime ServeMux.

@berkant

berkant commented Jan 24, 2021

@rosspatil How did you do it? Can you share an example?

@ghevge

ghevge commented Mar 15, 2021

@rosspatil can you share some details on how you did it? I'm also looking for a solution. Thanks!

@purebadger

I did it like so with a custom route:

...
mux := runtime.NewServeMux()
...
// Attachment upload from http/s handled manually
if err := mux.HandlePath("POST", "/v1/publisher/attach/{type}/{identifier}", h.AttachmentHandler); err != nil {
    panic(err)
}

and the h.AttachmentHandler is something like:

func (r *RequestHandler) AttachmentHandler(w http.ResponseWriter, rq *http.Request, params map[string]string) {
        ...
	f, header, err := rq.FormFile("attachment")
	if err != nil {
		zap.L().Warn("Failed getting file from upload", zap.Error(err))
		writeErr(http.StatusInternalServerError, err.Error(), w)
		return
	}
	defer f.Close()

	// Get type, identifier from params
	ofType, ok := params["type"]
	if !ok {
		writeErr(http.StatusBadRequest, "Missing 'type' param", w)
		return
	}

	identifier, ok := params["identifier"]
	if !ok {
		writeErr(http.StatusBadRequest, "Missing 'identifier' param", w)
		return
	}

	err = r.store.Attach(ofType, identifier, header.Filename, f)
	if err != nil {
		writeErr(http.StatusInternalServerError, err.Error(), w)
		return
	}

	w.WriteHeader(http.StatusOK)
}
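
For the raw `--data-binary` request in the original question (no multipart form, so no form key), the same custom-route idea can read the request body directly. The following is only a minimal sketch under stated assumptions, not part of the gateway's API: `RawUploadHandler`, `saveContents` and `maxUploadBytes` are hypothetical names, and the 1 MiB cap is an arbitrary choice. It uses only the standard library, but the handler has the three-argument signature that `mux.HandlePath` expects.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxUploadBytes is an assumed upload limit chosen for illustration (1 MiB).
const maxUploadBytes = 1 << 20

// saveContents is a hypothetical stand-in for whatever consumes the bytes,
// for example a gRPC client call that fills a bytes field.
func saveContents(contents []byte) error {
	return nil
}

// RawUploadHandler reads the whole request body as opaque bytes.
// Its signature matches what runtime.ServeMux.HandlePath accepts.
func RawUploadHandler(w http.ResponseWriter, r *http.Request, pathParams map[string]string) {
	// Cap the body so a single request cannot exhaust memory.
	body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, maxUploadBytes))
	if err != nil {
		var tooLarge *http.MaxBytesError
		if errors.As(err, &tooLarge) {
			http.Error(w, "upload too large", http.StatusRequestEntityTooLarge)
			return
		}
		http.Error(w, "failed to read request body", http.StatusBadRequest)
		return
	}
	if err := saveContents(body); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	fmt.Fprintf(w, "stored %d bytes", len(body))
}

// demoRawUpload sends a body of the given size through the handler
// in-process and returns the status code and response body.
func demoRawUpload(size int) (int, string) {
	payload := strings.NewReader(strings.Repeat("x", size))
	req := httptest.NewRequest(http.MethodPost, "/v1/files", payload)
	rec := httptest.NewRecorder()
	RawUploadHandler(rec, req, nil)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := demoRawUpload(5)
	fmt.Println(code, body)
	code, _ = demoRawUpload(maxUploadBytes + 1)
	fmt.Println(code)
}
```

A handler like this could be registered with something like `mux.HandlePath("POST", "/v1/files", RawUploadHandler)`. Reading the whole body into memory is only reasonable for small files; larger uploads would need streaming instead.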

@johanbrandhorst
Collaborator

Hi @purebadger, would you be willing to contribute your solution to our docs to help users in the future?

@purebadger

Sure, how do I do this?

@johanbrandhorst
Collaborator

I think a new file in https://github.com/grpc-ecosystem/grpc-gateway/tree/master/docs/docs/mapping would be a good place to start. You can follow the general format provided by the other docs pages, and check out https://grpc-ecosystem.github.io/grpc-gateway/ to see what it looks like online.

Let me know if you need any more pointers!

@rosspatil

rosspatil commented Mar 25, 2021

@0xbkt @ghevge
Hi Guys,

Hope this example will help you to achieve binary file uploads -

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"

	"github.com/grpc-ecosystem/grpc-gateway/v2/runtime"
)

func main() {
	mux := runtime.NewServeMux()
	// Register custom route for POST /upload
	err := mux.HandlePath("POST", "/upload", UploadFile)
	if err != nil {
		panic(err)
	}
	http.ListenAndServe(":8080", mux)
}

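func UploadFile(w http.ResponseWriter, r *http.Request, pathParams map[string]string) {
	// FormFile parses the multipart form on first use and returns the
	// uploaded part stored under the "file" form key.
	multipartFile, fileHeader, err := r.FormFile("file")
	if err != nil {
		http.Error(w, fmt.Sprintf("failed to get file 'file': %v", err), http.StatusBadRequest)
		return
	}
	// Release the uploaded file once the handler returns.
	defer multipartFile.Close()

	fmt.Println("filename", fileHeader.Filename)
	fmt.Println("size", fileHeader.Size)
	fmt.Println("file-header", fileHeader.Header)

	// Copy the upload into memory; report failures to the client
	// instead of panicking inside the handler.
	buffer := &bytes.Buffer{}
	if _, err := io.Copy(buffer, multipartFile); err != nil {
		http.Error(w, fmt.Sprintf("failed to read file: %v", err), http.StatusInternalServerError)
		return
	}
	// process buffer for further usage
	// ...

	w.WriteHeader(http.StatusOK)
}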
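
A handler like the one above expects a multipart form, which is what `curl -F "file=@/tmp/test_file.txt" http://localhost:8080/upload` sends (unlike `--data-binary`, which sends the raw bytes). As a rough sketch of how such a handler might be exercised in-process, the following builds a multipart request with `mime/multipart` and runs it through a recorder. `uploadHandler`, `newMultipartRequest` and `demoMultipart` are hypothetical names, and the handler here is a simplified stand-in rather than the exact code above.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

// uploadHandler is a simplified stand-in for a multipart upload handler.
// It replies with "<filename>:<size>" so the result is easy to check.
func uploadHandler(w http.ResponseWriter, r *http.Request, _ map[string]string) {
	f, hdr, err := r.FormFile("file")
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	defer f.Close()

	n, err := io.Copy(io.Discard, f)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	fmt.Fprintf(w, "%s:%d", hdr.Filename, n)
}

// newMultipartRequest builds a POST request carrying one file part
// under the given form field name.
func newMultipartRequest(url, field, filename string, contents []byte) (*http.Request, error) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	part, err := mw.CreateFormFile(field, filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(contents); err != nil {
		return nil, err
	}
	// Close writes the trailing boundary; without it the form is invalid.
	if err := mw.Close(); err != nil {
		return nil, err
	}
	req := httptest.NewRequest(http.MethodPost, url, &body)
	req.Header.Set("Content-Type", mw.FormDataContentType())
	return req, nil
}

// demoMultipart uploads a 5-byte file under the given field name and
// returns the status code and response body.
func demoMultipart(field string) (int, string) {
	req, err := newMultipartRequest("/upload", field, "test_file.txt", []byte("hello"))
	if err != nil {
		return 0, err.Error()
	}
	rec := httptest.NewRecorder()
	uploadHandler(rec, req, nil)
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(demoMultipart("file"))
	fmt.Println(demoMultipart("wrong"))
}
```

The form field name must match on both sides: a request that uses a different field name than the handler passes to `FormFile` is rejected, here with a 400.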

@rosspatil

@johanbrandhorst If you don't mind, should I raise a PR adding the above example to the docs section -> https://github.com/grpc-ecosystem/grpc-gateway/tree/master/docs/docs/operations for binary file uploads?

@johanbrandhorst
Collaborator

Hi Ross, thanks for sharing your solution. Let's give @purebadger a little more time to make their contribution if they wish; I did invite them first 🙂.

@rosspatil

Hi @johanbrandhorst, Ok no problem. Let me know if you want me to add it. Thanks 🙂

@jonathanbp
Contributor

@jonathanbp = @purebadger ;)

jonathanbp pushed a commit to jonathanbp/grpc-gateway that referenced this issue Mar 26, 2021
Update docs/docs/mapping/binary_file_uploads.md

Co-authored-by: Johan Brandhorst-Satzkorn <johan.brandhorst@gmail.com>

Fix example up from PR comments
jonathanbp pushed a commit to jonathanbp/grpc-gateway that referenced this issue Mar 26, 2021
jonathanbp added a commit to jonathanbp/grpc-gateway that referenced this issue Mar 26, 2021
johanbrandhorst pushed a commit that referenced this issue Mar 27, 2021
* Add doc with info about binary upload a custom route, for #500

* Add missing return statement
@ghevge

ghevge commented May 3, 2021

Hi Folks,
Envoy added support for HttpBody uploads last year: envoyproxy/envoy#10110

I was able to manually modify the swagger.json (adding the formData param) and upload the content through Envoy from the Swagger UI.

Any chance protoc-gen-grpc-gateway could be fixed to auto-generate this param entry for file uploads?

Thanks

"/v4/projects/{projectId}/.upload": {
    "post": {
        "summary": "Upload a project .zip file",
        "description": "Upload a project .zip file",
        "operationId": "FileUpload",
        "responses": {
            "200": {
                "description": "Upload a project .zip file",
                "schema": {
                    "type": "array",
                    "$ref": "#/definitions/api.FileUploadResponse"
                }
            },
            "400": {
                "description": "Request contains invalid arguments",
                "schema": {}
            },
            "401": {
                "description": "Request could not be authorized",
                "schema": {}
            },
            "404": {
                "description": "No content found",
                "schema": {}
            },
            "500": {
                "description": "Internal server error",
                "schema": {}
            },
            "default": {
                "description": "An unexpected error response.",
                "schema": {
                    "$ref": "#/definitions/grpc.gateway.runtime.Error"
                }
            }
        },
        "parameters": [{
                "name": "projectId",
                "in": "path",
                "required": true,
                "type": "string"
            }, {
                "name": "body",
                "in": "formData",
                "type": "file",
                "required": true
            }
        ],
        "tags": ["Projects"]
    }
}

  // File upload
  //
  // File upload
  rpc FileUpload (FileUploadRequest) returns (FileUploadResponse) {
    option (google.api.http) = {
      post : "/v4/projects/{projectId}/.upload"
      body : "content"
    };
    option (grpc.gateway.protoc_gen_swagger.options.openapiv2_operation) = {
      description: "Upload a project .zip file";
      summary: "Upload a project .zip file";
      tags: "Projects";
      responses: {
        key: "200"
        value: {
          description: "Upload a project .zip file";
          schema: {
            json_schema: {
              type: ARRAY;
              ref: '#/definitions/api.FileUploadResponse'
            }
          }
        }
      }
      responses: {
        key: "401"
        value: {
          description: "Request could not be authorized";
        }
      }
      responses: {
        key: "400"
        value: {
          description: "Request contains invalid arguments";
        }
      }
      responses: {
        key: "404"
        value: {
          description: "No content found";
        }
      }
      responses: {
        key: "500"
        value: {
          description: "Internal server error";
        }
      }
    };
  }
  
  
message FileUploadRequest {
  string projectId = 1;
  google.api.HttpBody content = 2;
}

message FileUploadResponse {
  string status = 1;
}

@johanbrandhorst
Collaborator

That's super cool! What exactly did you have to change, just the body parameter? I'd love to have this feature.

@ghevge

ghevge commented May 5, 2021

@johanbrandhorst yes just the body param section

@johanbrandhorst
Collaborator

Hm, I can't see any obvious way in which we could infer that it should use formData as the in type. Parameters aren't currently configurable on the operation since it's inferred from the message: https://github.com/grpc-ecosystem/grpc-gateway/blob/master/protoc-gen-openapiv2/options/openapiv2.proto#L154-L155. You can see all the places where we set this explicitly, based on the annotations used:



I don't know how we'd be able to tell that something should use formData instead.

Also, this feature is asking us to support something that only works with Envoy as a proxy, right? I'd sooner we were able to actually support Envoy's behaviour in the gateway mux.

@zoulux
Copy link

zoulux commented Mar 11, 2023

mark

@zaakn

zaakn commented Mar 29, 2023

After reading the source code, I found a way, but it's a bit inelegant.

idea:

  • application/octet-stream is the content type used for raw binary streams
  • the incoming proto.Message struct must have a field named data, in either a custom message or google.api.HttpBody
  • fields such as size, checksum, etc. are passed in HTTP headers
  • runtime.WithMetadata() picks up the HTTP headers and saves them into the gRPC metadata
  • runtime.WithMarshalerOption() registers a marshaler that reads the HTTP body and saves the raw []byte into the data field

proto file:

message Stuff {
	string md5 = 1;
	string type = 2;
}

service XXX {
	rpc CreateStuff(stream google.api.HttpBody) returns(Stuff) {
		option (google.api.http) = {
			post: "/stuffs"
			body: "*"
		};	
	}
}

grpc server:

func (s *XXXServer) CreateStuff(css pb.XXX_CreateStuffServer) (err error) {
	ctx := css.Context()

	contentType := ""
	if meta, ok := metadata.FromIncomingContext(ctx); ok {
		if ct := meta.Get("X-Content-Type"); len(ct) > 0 {
			contentType = ct[0]
		}
	}

	hasher := md5.New()
	for {
		body, err := css.Recv()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		hasher.Write(body.Data)
	}

	return css.SendAndClose(&pb.Stuff{
		Md5:  hex.EncodeToString(hasher.Sum(nil)),
		Type: contentType,
	})
}

gateway server:

type RawBinaryUnmarshaler runtime.HTTPBodyMarshaler

func NewRawBinaryUnmarshaler() *RawBinaryUnmarshaler {
	return &RawBinaryUnmarshaler{
		// just the built-in default marshaler,
		// to handle the outgoing marshalling
		Marshaler: &runtime.JSONPb{
			MarshalOptions: protojson.MarshalOptions{
				EmitUnpopulated: true,
			},
			UnmarshalOptions: protojson.UnmarshalOptions{
				DiscardUnknown: true,
			},
		},
	}
}

func (m *RawBinaryUnmarshaler) NewDecoder(r io.Reader) runtime.Decoder {
	return &BinaryDecoder{"Data", r}
}

type BinaryDecoder struct {
	fieldName string
	r         io.Reader
}

func (d *BinaryDecoder) fn() string {
	if d.fieldName == "" {
		return "Data"
	}
	return d.fieldName
}

var typeOfBytes = reflect.TypeOf([]byte(nil))

func (d *BinaryDecoder) Decode(v interface{}) error {
	rv := reflect.ValueOf(v).Elem() // assert it must be a pointer
	if rv.Kind() != reflect.Struct {
		return d
	}

	data := rv.FieldByName(d.fn())
	if !data.CanSet() || data.Type() != typeOfBytes {
		return d
	}

	// if only `google.api.HttpBody` is used, the above reflect
	// actions can also be changed to the type assertion:
	// httpBody, ok := v.(*httpbody.HttpBody)

	p, err := io.ReadAll(d.r)
	if err != nil {
		return err
	}
	if len(p) == 0 {
		return io.EOF
	}

	data.SetBytes(p)

	return err
}

func (d *BinaryDecoder) Error() string {
	d.r = nil
	return "cannot set: " + d.fn()
}

func HeaderToMetadata(ctx context.Context, r *http.Request) metadata.MD {
	md := metadata.New(nil)

	setter := func(k string, newKey ...func(old string) string) {
		if v := r.Header.Values(k); len(v) > 0 {
			if len(newKey) > 0 {
				k = newKey[0](k)
			}
			md.Set(k, v...)
		}
	}
	setter("X-Content-Type")
	setter("Content-Length", func(old string) string {
		return "X-" + old
	})

	return md
}

// ...

muxOpts := []runtime.ServeMuxOption{
	runtime.WithMetadata(HeaderToMetadata),
	runtime.WithMarshalerOption(
		"application/octet-stream",
		NewRawBinaryUnmarshaler(),
	),
}

mux := runtime.NewServeMux(muxOpts...)
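
The core of the decoder above is the reflection step that copies the raw body into a `[]byte` field. The following standalone sketch isolates that step using only the standard library, so it can be run without the gateway. `payload`, `setBytesField` and `demoDecode` are hypothetical names, and `payload` is only a stand-in for a generated message such as google.api.HttpBody, not the real type.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"reflect"
	"strings"
)

// payload is a hypothetical stand-in for a generated message with a
// bytes field named Data.
type payload struct {
	ContentType string
	Data        []byte
}

var bytesType = reflect.TypeOf([]byte(nil))

// setBytesField reads everything from r and stores it in the named
// []byte field of the struct that v points to. It returns io.EOF when
// the reader is already drained, which is how a streaming caller would
// learn that there are no more messages.
func setBytesField(v interface{}, field string, r io.Reader) error {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Ptr || rv.IsNil() || rv.Elem().Kind() != reflect.Struct {
		return errors.New("target must be a non-nil pointer to a struct")
	}
	f := rv.Elem().FieldByName(field)
	if !f.IsValid() || !f.CanSet() || f.Type() != bytesType {
		return fmt.Errorf("cannot set field %q", field)
	}
	p, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	if len(p) == 0 {
		return io.EOF
	}
	f.SetBytes(p)
	return nil
}

// demoDecode decodes input into a payload and reports the decoded
// bytes and whether the decoder signalled io.EOF.
func demoDecode(input string) (string, bool) {
	var msg payload
	err := setBytesField(&msg, "Data", strings.NewReader(input))
	return string(msg.Data), err == io.EOF
}

func main() {
	fmt.Println(demoDecode("abc"))
	fmt.Println(demoDecode(""))
}
```

Because the whole body is read in a single call, the upload arrives as one message; a second call on the same reader finds nothing left and returns io.EOF.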

test by curl:

[root@trial /tmp]# dd if=/dev/urandom of=./1MB.bin bs=1K count=1024 
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.014323 s, 73.2 MB/s
[root@trial /tmp]# md5sum 1MB.bin 
e39b990d8cd32d01b1fed6ef16954c6d  1MB.bin
[root@trial /tmp]# curl -v -X POST http://localhost:8080/v1/stuffs --data-binary "@/tmp/1MB.bin" -H "Content-Type: application/octet-stream" -H "X-Content-Type: application/what-ever"
* About to connect() to localhost port 8080 (#0)
*   Trying ::1...
* Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> POST /v1/stuffs HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8080
> Accept: */*
> Content-Type: application/octet-stream
> X-Content-Type: application/what-ever
> Content-Length: 1048576
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Wed, 29 Mar 2023 18:26:12 GMT
< Content-Length: 73
< 
* Connection #0 to host localhost left intact
{"md5":"e39b990d8cd32d01b1fed6ef16954c6d","type":"application/what-ever"}

@emcfarlane
Contributor

👋 hello, I've solved this in my gRPC-transcoding project https://github.com/emcfarlane/larking by letting the handler access the underlying reader/writer stream. The API is:

func AsHTTPBodyReader(stream grpc.ServerStream, msg proto.Message) (io.Reader, error)
func AsHTTPBodyWriter(stream grpc.ServerStream, msg proto.Message) (io.Writer, error)

These functions assert that the stream is a stream of google.api.HttpBody and correctly unmarshal the first payloads.

So if you have an API like:

import "google/api/httpbody.proto";

service Files {
  rpc LargeUploadDownload(stream UploadFileRequest)
      returns (stream google.api.HttpBody) {
    option (google.api.http) = {
      post : "/files/large/{filename}"
      body : "file"
    };
  }
}
message UploadFileRequest {
  string filename = 1;
  google.api.HttpBody file = 2;
}

You can use the AsHTTPBody methods to access the reader and writer of the http request without chunking into streams of messages. Like:

// LargeUploadDownload echoes the request body as the response body with contentType.
func (s *asHTTPBodyServer) LargeUploadDownload(stream testpb.Files_LargeUploadDownloadServer) error {
	var req testpb.UploadFileRequest
	r, err := larking.AsHTTPBodyReader(stream, &req)
	if err != nil {
		return err
	}
	log.Printf("got %s!", req.Filename)

	rsp := httpbody.HttpBody{
		ContentType: req.File.GetContentType(),
	}
	w, err := larking.AsHTTPBodyWriter(stream, &rsp)
	if err != nil {
		return err
	}

	// Stream the request body straight back as the response body.
	_, err = io.Copy(w, r)
	return err
}

@black-06

Hey guys, I wrote a plugin for grpc-gateway. It supports file upload and download, and the API is defined directly in the gRPC proto file.

https://github.com/black-06/grpc-gateway-file

I am looking forward to your suggestions.

@vtolstov

repo is not avail (github says 404)

@black-06

repo is not avail (github says 404)

Sorry, I set it to private by mistake. You should be able to see it now.
