Add multipart/form-data support #160

e00E · 2017-07-09T12:36:22Z

Inspired by voider1's implementation in #143. Fixes #4.

The most important feature of this version is that the request body is implemented as Read so that it does not need to be stored in memory when files are sent.
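
To illustrate the idea (this is a minimal sketch, not the code in this PR): each part's header sits in a small in-memory buffer, the part's value stays behind a `Read`, and the pieces are chained so that file contents are only pulled in as the consumer reads. The names `Part` and `body_reader` and the caller-supplied boundary are assumptions made up for this example.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical part: a field name plus a value that is read lazily.
pub struct Part {
    pub name: String,
    pub value: Box<dyn Read>,
}

/// Chains all parts into a single reader. Nothing is read from the
/// part values until the returned reader itself is read.
pub fn body_reader(boundary: &str, parts: Vec<Part>) -> Box<dyn Read> {
    let mut body: Box<dyn Read> = Box::new(io::empty());
    for part in parts {
        // Only the short header is buffered in memory.
        let header = format!(
            "--{}\r\nContent-Disposition: form-data; name=\"{}\"\r\n\r\n",
            boundary, part.name
        );
        body = Box::new(
            body.chain(Cursor::new(header.into_bytes()))
                .chain(part.value)
                .chain(Cursor::new(b"\r\n".to_vec())),
        );
    }
    // Closing delimiter of the multipart body.
    let footer = format!("--{}--\r\n", boundary);
    Box::new(body.chain(Cursor::new(footer.into_bytes())))
}
```

A `std::fs::File` can be boxed as a part value directly, so a large upload never has to be loaded up front.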

Additionally the API is different. I just went with what felt right to me because I had to change most things from voider1 anyways:
In this version there is no additional MultipartRequestBuilder struct, instead there is a multipart method (like json or form) which sets the body of the request appropriately.

There are still many things marked as TODO in comments in the code where I wasn't sure what the best way to proceed was.

I also need to add tests; so far I have only written the code but never run it.

We can probably cherry-pick the best ideas from both pull requests. I also looked into using existing multipart crates, but to me it seems like they can't easily be used because none of them implement Read.

e00E · 2017-07-09T22:49:21Z

I have added documentation and tests. The implementation seems to work but I did not test it with a real webserver yet.

There are still some things marked todo in the code where I think I need input from someone else.

e00E · 2017-07-11T08:28:14Z

For this comment by @seanmonstar :

It might be nice if the params method worked similar to the form method on the regular build. That is, that instead of Params, this were generic over Serialize.

pub fn params<T: Serialize>(&mut self, params: &T) -> ::Result<&mut MultipartRequestBuilder> {
}

To make that easier, it'd require some MultipartSerializer::new(boundary).serialize(params)...

Alternatively, we could look at the FormData API in web browsers. You call append a bunch of times, adding form fields. Is it better for users to call req.param("username", "sean").param("password", "pass"), or to be generic over Serialize? I lean towards being generic, as it is the most flexible...

To me it seems we would need a MultipartSerializer with the complexity of serde_urlencoded's serializer which is more than double the lines of code of this pull request in its current state. I do not feel like the payoff is big enough to justify this.

seanmonstar · 2017-07-11T22:34:27Z

Wow, awesome work! I haven't reviewed yet, will dig in in a moment.

I do not feel like the payoff is big enough to justify this.

I do, but I don't expect you to write it all. I'll see if I can make a serde_multipart crate myself this week.

e00E · 2017-07-12T07:15:27Z

I think I will try to write it unless you really want to do it yourself. I'm itching to write more Rust at the moment, and with the urlencoded crate to guide me it should be possible, but it might take longer than if you wrote it.

e00E · 2017-07-12T08:22:17Z

Is it a correct assumption that the point of a multipart serializer is to make it easier to add parameters to a multipart request?
In that case it wouldn't actually serialize into a string or bytes, would it, but instead into a MultipartRequest (as I called it in my pull request). Also, no deserialization is needed, so the code should be simpler than I initially thought.

e00E · 2017-07-12T10:06:36Z

From the other multipart PR and because it is marked todo in my code.

If the name is not valid UTF-8, the RFC suggests to use percent-encoding of the bytes.

Rereading the RFC these are the relevant sections:
First:

Within this specification, "percent-encoding" (as defined in [RFC3986]) is offered as a possible way of encoding characters in file names that are otherwise disallowed, including non-ASCII characters, spaces, control characters, and so forth. The encoding is created replacing each non-ASCII or disallowed character with a sequence, where each byte of the UTF-8 encoding of the character is represented by a percent-sign (%) followed by the (case-insensitive) hexadecimal of that byte.

Second:

The handling of non-ASCII field names has changed -- the method described in RFC 2047 is no longer recommended; instead, it is suggested that senders send UTF-8 field names directly and that file names be sent directly in the form-charset.

I interpret this as: there may be restrictions on filenames, but these are application-specific and not given by the RFC. Percent-encoding is a way of encoding non-ASCII characters, not non-UTF-8 characters.

Since these restrictions are vague I think we should just use utf8 for filenames if possible and either send no filename or a lossy converted filename if the file path cannot be converted to utf8 cleanly.
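
The lossy fallback could look roughly like this, using only the standard library. The helper name `lossy_file_name` is made up for illustration; a `None` result stands for "send no filename".

```rust
use std::path::Path;

/// Returns the final path component as UTF-8. Invalid sequences are
/// replaced with U+FFFD; `None` means the path has no file name at all.
pub fn lossy_file_name(path: &Path) -> Option<String> {
    path.file_name()
        .map(|name| name.to_string_lossy().into_owned())
}
```

Whether a lossy name or no name is the better choice for a given server is still an open question in this thread; the sketch only shows that both are cheap to produce.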

e00E · 2017-07-12T10:21:37Z

I found this issue in python requests psf/requests#2117 but it did not make things clearer...

And https://stackoverflow.com/a/28283207 which suggests just using utf8 because we encode the rest of the form like that already (the name parameter).

BW155 · 2017-07-15T20:49:37Z

I am very happy to see some progress on this feature after all this time.

e00E · 2017-07-19T15:40:09Z

@seanmonstar
Serializer is done.
I am still not sure about non utf8 filenames see #160 (comment) and the next comment.

seanmonstar · 2017-07-19T23:54:15Z

The handling of non-ASCII field names has changed

It seems this is talking specifically about the name field, not filename. The spec still explicitly states that for filenames, percent encoding is an option. However, it does state that the filename isn't mandatory for cases where it's not available or private. Non-UTF8 probably doesn't fit that, but it should be fine. The server can invent a name.

Slightly related, it's worth keeping in mind that names might have some bad ASCII characters that reqwest should probably look out for. If someone were to naively send unsanitized user data, an attacker could try to tamper with the request by injecting \r\n or similar.

e00E · 2017-07-20T06:13:26Z

Yes, I was thinking about the last part too. Specifically, I left a comment in the code about what would happen if name or filename contained a ". Since both fields are surrounded with quotation marks in the request, stray quotation marks in the field itself should be the only problem, right?
I did not find this in the multipart RFC but I think what we have to do is turn " into \" and \ into \\.

e00E · 2017-07-20T07:18:23Z

What I wrote above is not right, at least not accepted by httpbin. Googling around more, there really seems to be no correct way to do it.
From what I can tell by testing with httpbin you can put anything in name and filename except " and \r and \n and there is no way to escape those.

What python requests does is this: if any non-ASCII or disallowed ASCII characters appear, it switches to an encoding described in https://stackoverflow.com/a/20592910 .
For example keystart"\r\nkeyend turns into name*=utf-8''keystart%22%0D%0Akeyend; filename*=utf-8''keystart%22%0D%0Akeyend.

This seems like a sensible approach because it will keep legal ascii only fields exactly as the user intended and is reasonable for other fields for which there is no single correct solution.
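
A rough sketch of that rule, not necessarily what the PR ends up doing: plain quoting is kept for ASCII values without `"`, `\r` or `\n`, and everything else falls back to the `name*=utf-8''...` form with percent-encoded bytes. The set of characters left unescaped follows the `attr-char` rule of RFC 5987, which is an assumption on my part about what the fallback should preserve.

```rust
/// Formats one Content-Disposition parameter such as `name` or `filename`.
pub fn format_parameter(name: &str, value: &str) -> String {
    // Legal for plain quoting: ASCII only, and none of `"`, CR, LF.
    let legal = value
        .bytes()
        .all(|b| b.is_ascii() && b != b'"' && b != b'\r' && b != b'\n');
    if legal {
        return format!("{}=\"{}\"", name, value);
    }
    let mut encoded = String::new();
    for b in value.bytes() {
        // attr-char from RFC 5987 may stay unescaped; all other bytes
        // (including each byte of a multi-byte UTF-8 character) are escaped.
        if b.is_ascii_alphanumeric() || b"!#$&+-.^_`|~".contains(&b) {
            encoded.push(b as char);
        } else {
            encoded.push_str(&format!("%{:02X}", b));
        }
    }
    format!("{}*=utf-8''{}", name, encoded)
}
```

With this, `format_parameter("name", "keystart\"\r\nkeyend")` produces the `name*=utf-8''keystart%22%0D%0Akeyend` form shown above, while a value like `foo` is left as `name="foo"`.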

seanmonstar · 2017-07-22T00:01:03Z

Seems reasonable to me.

e00E · 2017-07-22T09:38:48Z

Done. You can review now @seanmonstar . There are still a few things marked TODO in the code where I am not sure if this is the right approach but all the features discussed here are implemented.

seanmonstar

Thanks @e00E for all this work! The comments I left are for mostly minor things, I think the bulk of the work is great!

seanmonstar · 2017-07-25T18:29:13Z

src/lib.rs

@@ -1,4 +1,3 @@
-#![deny(warnings)]


We'll want to put this back :D

seanmonstar · 2017-07-25T18:32:17Z

src/multipart/multipart.rs

+}
+
+// TODO: MultipartField cannot derive debug because value is not Debug
+// Not sure how to best resolve this...


It's OK if a Debug output cannot show everything. This can have something like this:

f.debug_struct("MultipartField")
    .field("name", &self.name)
    // etc
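
Spelled out as a complete impl, that suggestion might look like the following. The struct here is a hypothetical stand-in with only three fields, not the actual type from this PR.

```rust
use std::fmt;
use std::io::Read;

/// Hypothetical stand-in for the PR's field type; `value` is not `Debug`.
pub struct MultipartField {
    pub name: String,
    pub file_name: Option<String>,
    pub value: Box<dyn Read>,
}

impl fmt::Debug for MultipartField {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `value` is deliberately left out of the output.
        f.debug_struct("MultipartField")
            .field("name", &self.name)
            .field("file_name", &self.file_name)
            .finish()
    }
}
```

Formatting such a field with `{:?}` then prints the name and file name and simply omits the reader.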

seanmonstar · 2017-07-25T18:45:57Z

src/request.rs

+        {
+            let mut req = self.request_mut();
+            // TODO: I tried to define the mimetype in code only, without parse() but could not
+            // find a way to set the boundary parameter. Is there a way to do that?


No, there isn't a way yet. The boundary has to be parsed to look for illegal characters. Perhaps Mime could grow a set_param method.

seanmonstar · 2017-07-25T18:50:04Z

src/multipart/multipart.rs

+                Some(ref mime) => {
+                    format!(
+                        // TODO: Apparently I still have to write out Content-Type here?!
+                        // I thought header would format itself that way on its own


No, Display of ContentType will not write the header name. The way the full header is output depends on the HTTP version (either Content-Type: foo or content-type = foo).

seanmonstar · 2017-07-25T18:51:06Z

src/multipart/multipart.rs

+            format_parameter("name", self.name.as_ref()),
+            match self.file_name {
+                Some(ref file_name) => format!("; {}", format_parameter("filename", file_name)),
+                None => "".to_string(),


Could be just String::new().

seanmonstar · 2017-07-25T19:33:58Z

src/multipart/multipart.rs

+        }
+
+        format!(
+            // TODO: I would use hyper's ContentDisposition header here, but it doesnt seem to have


Hm, looking at the ContentDisposition header in hyper, it could use some help. It'd be a breaking change for hyper, so can't be changed for this pull request, unfortunately.

seanmonstar · 2017-07-25T19:35:10Z

src/multipart/multipart.rs

+                        // TODO: Apparently I still have to write out Content-Type here?!
+                        // I thought header would format itself that way on its own
+                        "\r\nContent-Type: {}",
+                        ::header::ContentType(mime.clone())


Actually, the Display of ContentType just forwards to Mime, so the header isn't needed. That would remove the need for the clone, also.

seanmonstar · 2017-07-25T19:36:02Z

src/multipart/multipart.rs

+}
+
+// TODO: RequestReader cannot derive debug because active_reader is not Debug
+// Not sure how to best resolve this...


If the type is not public, the impl won't be needed. Is the lint triggering here?

RequestReader has to be public because it is returned in the public reader method on MultipartRequest. The only alternative I see is making reader return Box<Read>.

Ah, I thought the lint only triggered if the type was publicly exported out of the crate, not just the module. If needed, then a simple f.debug_struct("RequestReader").finish() could be good enough.

Actually, would the name MultipartReader fit better (I know it's just internal...)?

seanmonstar · 2017-07-25T19:40:02Z

src/request.rs

@@ -277,6 +277,47 @@ impl RequestBuilder {
        Ok(self)
    }

+    /// Sends a multipart/formdata body.
+    ///
+    /// ```no_run


By wrapping the example in similar stuff as the rest of the examples in the crate, the no_run can be removed, and all the unwraps can be replaced with ?s.

# fn run() -> Result<(), Box<::std::error::Error>> {
let response = client.post("https://httpbin.org/post")?
    .multipart(...)
    .send()?;
# }
# fn main() {}

seanmonstar · 2017-07-25T19:43:00Z

src/request.rs

+    ///
+    /// See [`to_multipart`](fn.to_multipart.html), [`MultipartRequest`](struct.MultipartRequest.html)
+    /// and [`MultipartField`](struct.MultipartField.html) for more examples.
+    pub fn multipart(&mut self, multipart: MultipartRequest) -> &mut RequestBuilder {


I wonder if there's a way to make this method be as convenient as form and json, possibly allowing a user to send very simple forms without importing more types.

Could it be generic over T: Serialize? That probably makes it difficult to use the MultipartField builder pieces to adjust extra pieces, like the file name or mime...

Well, actually, if the MultipartField were to implement Serialize, then someone could use that directly, and then call req.multipart(vec![field1, field2])...

I suppose there could be an issue if you built a HashMap<String, MultipartField>, cause then the question is which name to use...

The point of this signature is to allow the builder and Serialize via to_multipart.

If it took Serialize directly then like you say how would the builder methods be called.

So, if a user has made use of a Serialize thing, I suspect they wouldn't need to use the builder to add more fields. They could have added them to their previous thing, right?

And for customizing parameters by using MultipartField, likewise, those could be just be added to a thing that implements Serialize, right?

let bike = vec![
    MultipartField::param("gears", "21"),
    MultipartField::param("color", "blue"),
    MultipartField::file("photo", "/usr/foo/photo.png")
        .mime(mime::IMAGE_PNG),
];
client.post(url)?
    .multipart(bike)?
    .send()?

Could something like this work? I should do another look at ser.rs...

I don't know Serde well enough to know if that is possible. I feel it does not work because the thing that implements Serialize could be a list, struct or hashmap. How would MultipartField be added to a struct or hashmap?

I gave this some more thought. How are nested values serialized? Does it serialize to a thing of name = "products[bike][name]" or something? I'm not sure if that is usual, or just something that some individuals do.

Anywho, if nested fields don't work, then we could instead make the serializer look for certain structure patterns, kind of like requests does. Either with use of a specific order in tuples, or by blessing certain names... Examples:

#[derive(Serialize)]
struct Bike {
    gears: usize,
    color: String,
    photo: (PathBuf, Mime),
}

let form = &[
    ("photo", (some_path, some_mime)),
];

The field names of MultipartField could be special too, such that someone can use those builders instead of tuples...

I gave this some more thought. How are nested values serialized? Does it serialize to a thing of name = "products[bike][name]" or something? I'm not sure if that is usual, or just something that some individuals do.

I don't quite understand this part.

Anywho, if nested fields don't work, then we could instead make the serializer look for certain structure patterns, kind of like requests does. Either with use of a specific order in tuples, or by blessing certain names... Examples:

What you describe seems possible, but are you sure it is not too convoluted for what it brings? Not only would it make the serializer a lot more complex (at least I think it would, but again this was my first time implementing a Serde trait), it would also make it unclear which serializable things are valid and what kind of multipart values they lead to, since the description of that only exists in the serializer implementation and then has to be communicated to the user through the documentation. It is not type checked and might fail or do unexpected things when run.

The current serializer makes it easy to add simple parameters. For more complex things it might even be harder to use them with serialize because the builder methods are very clear in what they do whereas what exactly the serializer might do is harder to understand.

Sorry, I've been busy the past few days trying to get the http crate ready for preview. I'm going to download this branch locally and try some things with the serializer, instead of asking you to spend your time trying out crazy ideas I spout out.

e00E · 2017-07-25T23:12:23Z

src/multipart/multipart.rs

+                    // TODO: Should we instead cache the computed header for when it is used again
+                    // when sending the request? This would use more memory as all headers would be
+                    // held in memory as opposed to only the current one but it would only compute
+                    // each header once.


Oh, I'm not worried either way. I suppose for this to use a whole lot of memory, there would need to be a lot of params, which is unlikely, and the memory needed for these strings is probably tiny in comparison. So, I suppose caching them would be an improvement.

BW155 · 2017-08-06T10:38:44Z

What's the status on this feature? I really would like to use it.

e00E · 2017-08-06T12:11:46Z

It is working and nearly done. Only some discussion on the ergonomics of Serialize is remaining where seanmonstar wanted to try something.

seanmonstar · 2017-08-11T23:56:21Z

Ok, I've done some exploring. It turns out, going down the serde path in this case was probably a dumb idea. We not only need to specify how the name, value, filename, and media types can be given as arguments, but we also need to specify how those things get written. Serde is not meant for that. Sorry for pushing on that idea.

It seems that reqwest should provide a Multipart map thingy that users configure, and then pass to RequestBuilder.multipart, as you've been suggesting all along. The question now is what exactly that API looks like. I think there are 3 solutions that are nice, and maybe not even mutually exclusive, but of course, good API design means there aren't 5 ways to do the same thing.

1. A MultipartField builder, as is currently implemented in this PR. It allows an API like this:

MultipartRequest::new()
    .field(MultipartField::param("foo", "bar"))
    .field(MultipartField::file("photo", "path").mime(png))

2. Making MultipartField a trait instead. Multipart::field can be generic over T: MultipartField. Example usage could look like this:
```
Multipart::new()
    .field("foo", "bar")
    .field("photo", (File::open("path")?, mime::IMAGE_PNG))
```
The trait could be implemented for String, Vec<u8>, File, and some tuples, like (File, Mime), (String, Mime), etc. If the tuples seem confusing, specific types can be created instead, like reqwest::multipart::FileField or reqwest::multipart::DataField and so on. This is the sort of OO design that you see in places like Java and C#.

The trait would have methods for getting the value, media type, filename...

3. Have only a single type, Multipart, with a larger method to set all the properties of a field, and convenience methods for defining common fields. Example usage:
```
Multipart::new()
    .data("foo", "bar")
    .file("photo", file, file_name)
    .field(name, value, file_name, mime)
```

The thing I like about options 2 and 3 is that in the common case, it's only 1 import, Multipart. Methods can be used for configuring it.

I actually kind of think some sort of mix of 2 and 3 might be best, since a downside I see of 3 is that the "larger set-everything method" can't ever change once stabilized. If we wanted to add another thing to customize a field, it'd require a new method, field2 or something sad like that. A trait like in option 2, however, can grow new methods. This would allow reqwest to eventually let you specify individual headers to accompany a multipart value, as the spec says.
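
To make the "a trait can grow new methods" point concrete, here is a small self-contained sketch of the trait-based option. Everything in it is hypothetical: the trait and struct names mirror the ones used in this comment, media types are plain strings instead of `Mime`, and only two implementations are shown.

```rust
use std::io::{Cursor, Read};

/// Hypothetical trait: anything that can become a field value.
pub trait MultipartField {
    fn into_reader(self) -> Box<dyn Read>;
    /// Has a default body, so further methods like this one can be
    /// added later without breaking existing implementors.
    fn media_type(&self) -> Option<String> {
        None
    }
}

impl<'a> MultipartField for &'a str {
    fn into_reader(self) -> Box<dyn Read> {
        Box::new(Cursor::new(self.as_bytes().to_vec()))
    }
}

/// A (reader, media type) tuple, e.g. an opened file plus its type.
impl<R: Read + 'static> MultipartField for (R, String) {
    fn into_reader(self) -> Box<dyn Read> {
        Box::new(self.0)
    }
    fn media_type(&self) -> Option<String> {
        Some(self.1.clone())
    }
}

pub struct Entry {
    pub name: String,
    pub media_type: Option<String>,
    pub value: Box<dyn Read>,
}

pub struct Multipart {
    pub fields: Vec<Entry>,
}

impl Multipart {
    pub fn new() -> Multipart {
        Multipart { fields: Vec::new() }
    }

    /// Generic over the trait, so the common case needs only one import.
    pub fn field<T: MultipartField>(mut self, name: &str, value: T) -> Multipart {
        let media_type = value.media_type();
        self.fields.push(Entry {
            name: name.to_string(),
            media_type,
            value: value.into_reader(),
        });
        self
    }
}
```

A call chain such as `Multipart::new().field("foo", "bar").field("photo", (file, "image/png".to_string()))` then type-checks with only `Multipart` imported, which is the ergonomic argument made for options 2 and 3.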

What do you think of all this?

e00E · 2017-08-14T15:14:30Z

In the trait version with tuples I dislike the ambiguity introduced. For example a tuple<string,string> could be a normal key value or a key and path to file. If specific types are used then the negative of the builder version that additional imports are needed applies again (but there it would only be 1 type, while it would be multiple here) so making it a trait in that case does not seem to improve anything.

What I like about the builder (and I might be biased because I created it) is that it is very simple. The code is simple, the intentions of the code are clear and there is no indirection. Is having to import MultipartField really a negative and does removing it make up for the complexities introduced elsewhere?

This is not a strong opinion and you have more experience designing rust apis than I do, so I won't have negative feelings letting you decide.

Ok, I've done some exploring. It turns out, going down the serde path in this case was probably a dumb idea. We not only need to specify how the name, value, filename, and media types can be given as arguments, but we also need to specify how those things get written. Serde is not meant for that. Sorry for pushing on that idea.

Should I remove the existing serde related code then?

seanmonstar · 2017-08-17T17:26:14Z

Sorry, my time has been sucked up with trying to get the http crate ready for release by this weekend (RustConf)!

It probably makes sense to remove the serde stuff (again, so sorry!), since it can be confusing how it works, especially if someone tries to serialize a file or something. As for the builder, I think we can start by mixing the builder you made and adding some convenience methods to MultipartRequest, so that hopefully in most cases, people can use the one type, and we will leave the MultipartField as a way to customize all the knobs. If in the future a trait way seems better, well, MultipartField can always implement the trait!

Don't worry, I'll pull your branch locally and make these changes myself (in fact, I'm half way done). I really do appreciate all the work you've put in here, and I want to get your contribution merged.

seanmonstar · 2017-08-18T17:16:16Z

Ok, nearly finished. I think the only question I have left is about naming and namespaces. With both a form-like struct and fields, is it better to hang them off the top level namespace, or keep them in a module? I can see benefits to both, so I'd welcome others' thoughts.

reqwest::Multipart and reqwest::MultipartField?
reqwest::multipart::Form and reqwest::multipart::Part (or Field)?

e00E · 2017-08-18T18:02:29Z

I do not see a practical difference but I subjectively prefer the module version because it groups the related types logically. Maybe you want to use reqwest::multipart:: but not all the other reqwest types.

BW155 · 2017-08-20T14:46:58Z

For me, it looks like reqwest::Multipart would look better because you may get confused if you just have Part or Form in your code.

seanmonstar · 2017-08-21T21:26:41Z

Ok, I've opened #190 that has this PR, plus the tweaks I mentioned.

e00E added 11 commits July 9, 2017 14:25

Add multipart/form-data support

6c853b4

Add tests for byte reader

3da6739

Prevent warnings

5943117

Use hyper::header instead of hardcoded string in multipart header

7dbbf0e

Fix copy paste error in comment

91db1b6

Make comment about header clearer

35359d7

Prevent user from choosing boundary.

1e7b42d

This is an implementation and does not need to be exposed. The boundary still has to be retrievable to set the multipart header, but not changeable.

Add ergonomic builder style methods to MultipartRequest

316fd0d

Also ran rustfmt on multipart.rs

Add default mime type for files

fe11f75

Redesign Multipart API to make it more builder style.

c30eb9e

Add much more documentation to multipart

ecfbc55

e00E force-pushed the master branch from 9d1154f to ecfbc55 Compare July 9, 2017 19:18

e00E added 2 commits July 9, 2017 21:19

Comment cleanup

a0eb6d1

Fix MultipartRequest Reader implemenentation and add test for it

6d8449e

e00E added 2 commits July 11, 2017 09:41

Replace BytesReader with std::io::Cursor

0f24411

This should definitely have been used all along, I just did not know it existed.

Better type for value in MultipartField::param

d29ab15

e00E added 2 commits July 11, 2017 22:49

Merge branch 'master' into master

ccc31db

Merge branch 'master' into master

38989a0

e00E added 2 commits July 12, 2017 11:41

Make MultipartRequest compute its own size if possible

a09ac92

This allows body::sized to be used when making a multipart request.

Merge branch 'master' of https://github.com/e00E/reqwest

0a1d824

e00E added 2 commits July 13, 2017 19:52

Merge remote-tracking branch 'upstream/master'

1012c79

Merge remote-tracking branch 'upstream/master'

83ba313

cargo fmt on my stuff

861acfb

e00E added 2 commits July 22, 2017 11:31

Handle parameters with illegal characters

1f61c86

Remove superfluous comment

ff83824

seanmonstar requested changes Jul 25, 2017

e00E added 4 commits July 25, 2017 23:24

Merge branch 'master' of https://github.com/seanmonstar/reqwest

2f71105

PR review

bddd51f

deny warnings to line 1

c759d08

PR review

1fb2e72

e00E commented Jul 25, 2017

PR review

ce994fb

tinaun mentioned this pull request Jul 27, 2017

Updated dependencies SpaceManiac/discord-rs#143

Closed

e00E added 2 commits July 28, 2017 12:14

Remove unwraps in doc example

36cadf7

Merge branch 'master' of https://github.com/seanmonstar/reqwest

5e30c5a

seanmonstar mentioned this pull request Aug 21, 2017

Multipart support #190

Merged

seanmonstar closed this in #190 Aug 22, 2017
