Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multipart/form-data support #160

Closed
wants to merge 32 commits into from
Closed

Add multipart/form-data support #160

wants to merge 32 commits into from

Conversation

e00E
Copy link
Contributor

@e00E e00E commented Jul 9, 2017

Inspired by voider1's implementation in #143 . Fix #4

The most important feature of this version is that the request body is implemented as Read so that it does not need to be stored in memory when files are sent.

Additionally the API is different. I just went with what felt right to me because I had to change most things from voider1 anyways:
In this version there is no additional MultipartRequestBuilder struct, instead there is a multipart method (like json or form) which sets the body of the request appropriately.

There are still many things marked as TODO in comments in the code where I wasnt sure what the best way to proceed was.

I also need to add tests, so far I have only written but never ran it.

We can probably cherry pick the best ideas from both pull requests. I also looked into using existing multipart crates but to me it seems like they cant easily be used none of them implement Read.

@e00E
Copy link
Contributor Author

e00E commented Jul 9, 2017

I have added documentation and tests. The implementation seems to work but I did not test it with a real webserver yet.

There are still some things marked todo in the code where I think I need input from someone else.

e00E added 2 commits July 11, 2017 09:41
This should definitely have been used all along, I just did not know it
existed.
@e00E
Copy link
Contributor Author

e00E commented Jul 11, 2017

For this comment by @seanmonstar :

It might be nice if the params method worked similar to the form method on the regular build. That is, that instead of Params, this were generic over Serialize.

pub fn params<T: Serialize>(&mut self, params: &T) -> ::Result<&mut MultipartRequestBuilder> {
}

To make that easier, it'd require some MultipartSerializer::new(boundary).serialize(params)...

Alternatively, we could look at the FormData API in web browsers. You call append a bunch of times, adding form fields. Is it better for users call req.param("username", "sean").param("password", "pass"), or to be generic over Serialize? I lean towards being generic, as it is the most flexible...

To me it seems we would need a MultipartSerializer with the complexity of serde_urlencoded's serializer which is more than double the lines of code of this pull request in its current state. I do not feel like the payoff is big enough to justify this.

@seanmonstar
Copy link
Owner

Wow, awesome work! I haven't review yet, will dig in in a moment.

I do not feel like the payoff is big enough to justify this.

I do, but I don't expect you to write it all. I'll see if I can make a serde_multipart crate myself this week.

@e00E
Copy link
Contributor Author

e00E commented Jul 12, 2017

I I think I will try to write it unless you really want to do it yourself. Im itching to write more Rust at the moment and with the urlencoded crate to guide me it should be possible, but it might take longer than if you wrote it.

@e00E
Copy link
Contributor Author

e00E commented Jul 12, 2017

Is it a correct assumption that the point of a multipart serializer is to make it easier to add parameters to a multipart request?
In that case it wouldnt actually serialize into a string or bytes would it, but instead into a MultipartRequest (how I called it in my pull request). Also no deserialization is needed so the code should be simpler than I initially thought.

@e00E
Copy link
Contributor Author

e00E commented Jul 12, 2017

From the other multipart PR and because it is marked todo in my code.

If the name is not valid UTF-8, the RFC suggests to use percent-encoding of the bytes.

Rereading the RFC these are the relevant sections:
First:

Within this specification, "percent-encoding" (as defined in [RFC3986]) is offered as a possible way of encoding characters in file names that are otherwise disallowed, including non-ASCII characters, spaces, control characters, and so forth. The encoding is created replacing each non-ASCII or disallowed character with a sequence, where each byte of the UTF-8 encoding of the character is represented by a percent-sign (%) followed by the (case-insensitive) hexadecimal of that byte.

Second:

The handling of non-ASCII field names has changed -- the method described in RFC 2047 is no longer recommended; instead, it is suggested that senders send UTF-8 field names directly and that file names be sent directly in the form-charset.

I interpret this as they may be restrictions filenames but these are application specific and not given by the RFC. The percent encoding is a way of encoding non ascii characters not non utf8 characters.

Since these restrictions are vague I think we should just use utf8 for filenames if possible and either send no filename or a lossy converted filename if the file path cannot be converted to utf8 cleanly.

@e00E
Copy link
Contributor Author

e00E commented Jul 12, 2017

I found this issue in python requests psf/requests#2117 but it did not make things clearer...

And https://stackoverflow.com/a/28283207 which suggests just using utf8 because we encode the rest of the form like that already (the name parameter).

@BW155
Copy link

BW155 commented Jul 15, 2017

I am very happy to see some progress after time with this feature.

@e00E
Copy link
Contributor Author

e00E commented Jul 19, 2017

@seanmonstar
Serializer is done.
I am still not sure about non utf8 filenames see #160 (comment) and the next comment.

@seanmonstar
Copy link
Owner

The handling of non-ASCII field names has changed

It seems this is talking specifically about the name field, not filename. The spec still explicitly states that for filenames, percent encoding is an option. However, it does state that it isn't mandatory or cases where it's not available or private. Non-UTF8 probably doesn't fit that, but it should be fine. The server can invent a name.

Slightly related, it's worth keeping in mind that names might have some bad ASCII characters that reqwest should probably look out for. If a someone were to naively send unsanitized user data, they could try to somehow attack the request by injecting \r\n or similar.

@e00E
Copy link
Contributor Author

e00E commented Jul 20, 2017

Yes I was thinking about the last part too. Specifically I left a comment in the code about what would happen if name or filename contained a ". Since both fields are surrounded with quotation marks in the request, stray quotation marks in the field it self should be the only problem, right?
I did not find this in the multipart RFC but I think what we have to do is turn " into \" and \ into \\.

@e00E
Copy link
Contributor Author

e00E commented Jul 20, 2017

What I wrote above is not right, at least not accepted by httpbin. Googling around more, there really seems to be no correct way to do it.
From what I can tell by testing with httpbin you can put anything in name and filename except " and \r and \n and there is no way to escape those.

What python requests does is is if any non ascii or disallowed ascii characters appear it switches to an encoding described in https://stackoverflow.com/a/20592910 .
For example keystart"\r\nkeyend turns into name*=utf-8''keystart%22%0D%0Akeyend; filename*=utf-8''keystart%22%0D%0Akeyend.

This seems like a sensible approach because it will keep legal ascii only fields exactly as the user intended and is reasonable for other fields for which there is no single correct solution.

@seanmonstar
Copy link
Owner

Seems reasonable to me.

@e00E
Copy link
Contributor Author

e00E commented Jul 22, 2017

Done. You can review now @seanmonstar . There are still a few things marked TODO in the code where I am not sure if this is the right approach but all the features discussed here are implemented.

Copy link
Owner

@seanmonstar seanmonstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @e00E for all this work! The comments I left are for mostly minor things, I think the bulk of the work is great!

src/lib.rs Outdated
@@ -1,4 +1,3 @@
#![deny(warnings)]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want to put this back :D

}

// TODO: MultipartField cannot derive debug because value is not Debug
// Not sure how to best resolve this...
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK if a Debug output cannot show everything. This can have something like this:

f.debug_struct("MultipartField")
    .field("name", &self.name)
    // etc

src/request.rs Outdated
{
let mut req = self.request_mut();
// TODO: I tried to define the mimetype in code only, without parse() but could not
// find a way to set the boundary parameter. Is there a way to do that?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there isn't a way yet. The boundary has to parsed to look for illegal characters. Perhaps Mime could grow a set_param method.

Some(ref mime) => {
format!(
// TODO: Apparently I still have to write out Content-Type here?!
// I thought header would format itself that way on its own
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, Display of ContentType will not write the header name. The way the full header is output depends on the HTTP version (either Content-Type: foo or content-type = foo).

format_parameter("name", self.name.as_ref()),
match self.file_name {
Some(ref file_name) => format!("; {}", format_parameter("filename", file_name)),
None => "".to_string(),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be just String::new().

}

format!(
// TODO: I would use hyper's ContentDisposition header here, but it doesnt seem to have
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, looking at the ContentDisposition header in hyper, it could use some help. It'd be a breaking change for hyper, so can't be changed for this pull request, unfortunately.

// TODO: Apparently I still have to write out Content-Type here?!
// I thought header would format itself that way on its own
"\r\nContent-Type: {}",
::header::ContentType(mime.clone())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the Display of ContentType just forwards to Mime, so the header isn't needed. That would remove the need for the clone, also.

}

// TODO: RequestReader cannot derive debug because active_reader is not Debug
// Not sure how to best resolve this...
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the type is not public, the impl won't be needed. Is the lint triggering here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RequestReader has to be public because it is returned in the public reader method on MultipartRequest. The only alternative I see is making reader return Box<Read>.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I thought the lint only triggered if the type was publicly exported out of the crate, not just the module. If needed, then a simple f.debug_struct("RequestReader").finish() could be good enough.


Actually, would the name MultipartReader fit better (I know it's just internal...)?

src/request.rs Outdated
@@ -277,6 +277,47 @@ impl RequestBuilder {
Ok(self)
}

/// Sends a multipart/formdata body.
///
/// ```no_run
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By wrapping the example in similar stuff as the rest of the examples in the crate, the no_run can be removed, and all the unwraps can be replaced with ?s.

# fn run() -> Result<(), Box<::std::error::Error> {
let response = client.post("https://httpbin.org/post")?
    .multipart(...)
    .send()?;
# }
# fn main() {}

src/request.rs Outdated
///
/// See [`to_multipart`](fn.to_multipart.html), [`MultipartRequest`](struct.MultipartRequest.html)
/// and [`MultipartField`](struct.MultipartField.html) for more examples.
pub fn multipart(&mut self, multipart: MultipartRequest) -> &mut RequestBuilder {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's a way to make this method be as convenient as form and json, possibly allowing a user to send very simple forms without importing more types.

Could it be generic over T: Serialize? That probably makes it difficult to use the MultipartField builder pieces to adjust extra pieces, like the file name or mime...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, actually, if the MultipartField were to implement Serialize, then someone could use that directly, and then call req.multipart(vec![field1, field2])...

I suppose there could be an issue if you built a HashMap<String, MultipartField>, cause then the question is which name to use...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of this signature is to allow the builder and Serialize via to_multipart.

If it took Serialize directly then like you say how would the builder methods be called.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if a user has made use of a Serialize thing, I suspect they wouldn't need to use the builder to add more fields. They could have added them to their previous thing, right?

And for customizing parameters by using MultipartField, likewise, those could be just be added to a thing that implements Serialize, right?

let bike = vec![
    MultipartField::param("gears", "21"),
    MultipartField::param("color", "blue"),
    MultipartField::file("photo", "/usr/foo/photo.png")
        .mime(mime::IMAGE_PNG),
];

client.post(url)?
    .multipart(bike)?
    .send()?

Could something like this work? I should do another look at ser.rs...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont know Serde well enough to know if that is possible. I feel it does not work because the thing implements Serialize could be a list, struct or hashmap. How would MultipartField be added to a struct or hashmap?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this some more thought. How are nested values serialized? Does it serialize to a thing of name = "products[bike][name]" or something? I'm not sure if that is usual, or just something that some individuals do.

Anywho, if nested fields don't work, then we could instead make the serialize look certain structure patterns, kind of like requests does. Either with use of a specific order in tuples, or by blessing certain names... Examples:

#[derive(Serialize)]
struct Bike {
    gears: usize,
    color: String,
    photo: (PathBuf, Mime)
}
let form = &[
    ("photo", (some_path, some_mime)),
];

The field names of MultipartField could be special too, such that someone can use those builders instead of tuples...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this some more thought. How are nested values serialized? Does it serialize to a thing of name = "products[bike][name]" or something? I'm not sure if that is usual, or just something that some individuals do.

I dont quite understand this part.

Anywho, if nested fields don't work, then we could instead make the serialize look certain structure patterns, kind of like requests does. Either with use of a specific order in tuples, or by blessing certain names... Examples:

What you describe seems possible but are you sure it is not too convoluted for what it brings? Not only would it make the serializer a lot more complex (at least I think it would but again this was my first time implementing a Serde trait) but it also makes it unclear what kind of serializeable things are valid and lead to what kind of multipart values since the description of that only exists in the serializer implementation and then has to be communicated to the user over the documentation. It is not type checked and might fail or do unexpected things when run.

The current serializer makes it easy to add simple parameters. For more complex things it might even be harder to use them with serialize because the builder methods are very clear in what they do whereas what exactly the serializer might do is harder to understand.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I've been busy the past few days trying to get the http crate ready for preview. I'm going to download this branch locally and try some things with the serializer, instead of asking you to spend your time trying out crazy ideas I spout out.

// TODO: Should we instead cache the computed header for when it is used again
// when sending the request? This would use more memory as all headers would be
// held in memory as opposed to only the current one but it would only compute
// each header once.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'm not worried either way. I suppose for this to user a whole lot of memory, there would need to be a lot of params, which is unlikely, and the memory needed for these strings is probably tiny in comparison. So, I suppose caching them would be an improvement.

@BW155
Copy link

BW155 commented Aug 6, 2017

What's the status on this feature? I really would like to use it.

@e00E
Copy link
Contributor Author

e00E commented Aug 6, 2017

It is working and nearly done. Only some discussion on the ergonomics of Serialize is remaining where seanmonstar wanted to try something.

@seanmonstar
Copy link
Owner

Ok, I've done some exploring. It turns out, going down the serde path in this case was probably a dumb idea. We not only need to specify how the name, value, filename, and media types can be given as arguments, but we also need to specify how those things get written. Serde is not meant for that. Sorry for pushing on that idea.


It seems that reqwest should provide a Multipart map thingy that users configure, and then pass to RequestBuilder.multipart, as you've been suggesting all along. The question now is what exactly that API looks like. I think there are 3 solutions that are nice, and maybe not even mutually exclusive, but of course, good API design means there aren't 5 ways to do the same thing.

  1. A MultipartField builder, as is currently implemented in this PR. It allows an API like this:

    MultipartRequest::new()
        .field(MultipartField::param("foo", "bar"))
        .field(MutlipartField::file("photo", "path").mime(png))
  2. Making MultipartField a trait instead. Multipart::field can be generic over T: MultipartField. Example usage could look like this:

    Multipart::new()
        .field("foo", "bar")
        .field("photo", (File::open("path")?, mime::IMAGE_PNG))

    The trait could be implemented for String, Vec<u8>, File, and some tuples, like (File, Mime), (String, Mime), etc. If the tuples seem confusing, specific types can be created instead, like reqwest::multipart::FileField or reqwest::mutlipart::DataField and so on. This is the sort of OO design that you see in places like Java and C#.

    The trait would have methods for getting the value, media type, filename...

  3. Have only a single type, Multipart, with a larger method to set all the properties of a field, and convenience methods for defining common fields. Example usage:

    Mutlipart::new()
        .data("foo", "bar")
        .file("photo", file, file_name)
        .field(name, value, file_name, mime)

The thing I like about options 2 and 3 is that in the common case, it's only 1 import, Multipart. Methods can be used for configuring it.

I actually kind of think perhaps some sort of mix of 2 and 3, since a downside I see of 3 is that the "larger set-everything method" can't ever change once stabilized. If we wanted to add another thing to customize a field, it'd require a new method, field2 or something sad like that. A trait like in option 2 however can grow new methods. This would allow reqwest to eventually allow you to specify individual headers to accompany a multipart value, as the spec says.

What do you think of all this?

@e00E
Copy link
Contributor Author

e00E commented Aug 14, 2017

In the trait version with tuples I dislike the ambiguity introduced. For example a tuple<string,string> could be a normal key value or a key and path to file. If specific types are used then the negative of the builder version that additional imports are needed applies again (but there it would only be 1 type, while it would be multiple here) so making it a trait in that case does not seem to improve anything.

What I like about the builder (and I might be biased because I created it) is that it is very simple. The code is simple, the intentions of the code are clear and there is no indirection. Is having to import MultipartField really a negative and does removing it make up for the complexities introduced elsewhere?

This is not a strong opinion and you have more experience designing rust apis than I do, so I won't have negative feelings letting you decide.

Ok, I've done some exploring. It turns out, going down the serde path in this case was probably a dumb idea. We not only need to specify how the name, value, filename, and media types can be given as arguments, but we also need to specify how those things get written. Serde is not meant for that. Sorry for pushing on that idea.

Should I remove the existing serde related code then?

@seanmonstar
Copy link
Owner

Sorry, my time has been sucked up with trying to get the http crate ready for release by this weekend (RustConf)!

It probably makes sense to remove the serde stuff (again, so sorry!), since it can be confusing how it works, especially if someone tries to serialize a file or something. As for the builder, I think we can start with by mixing the builder you made and adding some convenience methods to MultipartRequest, so that hopefully in most cases, people can use the one type, and we will leave the MultipartField as a way to customize all the knobs. If in the future a trait way seems better, well, MultipartField can always implement the trait!

Don't worry, I'll pull your branch locally and make these changes myself (in fact, I'm half way done). I really do appreciate all the work you've put in here, and I want to get your contribution merged.

@seanmonstar
Copy link
Owner

Ok, nearly finished. I think the only question I have left is about naming and namespaces. With both a form-like struct and fields, is it better to hang them off the top level namespace, or keep them in a module? I can see benefits to both, so I'd welcome other's thoughts.

  • reqwest::Multipart and reqwest::MultipartField?
  • reqwest::multipart::Form and reqwest::multipart::Part (or Field)?

@e00E
Copy link
Contributor Author

e00E commented Aug 18, 2017

I do not see a practical difference but I subjectively prefer the module version because it groups the related types logically. Maybe you want to use reqwest::multipart:: but not all the other reqwest types.

@BW155
Copy link

BW155 commented Aug 20, 2017

For me, it looks like reqwest::Multipart would look better because you may get confused if you just have Part or Form in your code.

@seanmonstar seanmonstar mentioned this pull request Aug 21, 2017
@seanmonstar
Copy link
Owner

Ok, I've opened #190 that has this PR, plus the tweaks I mentioned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants