Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-octect multipart files lose their filename #662

Closed
schloerke opened this issue Aug 25, 2020 · 8 comments · Fixed by #663
Closed

Non-octect multipart files lose their filename #662

schloerke opened this issue Aug 25, 2020 · 8 comments · Fixed by #663

Comments

@schloerke
Copy link
Collaborator

schloerke commented Aug 25, 2020

file.bin

-----------------------------90908882332870323642673870272
Content-Disposition: form-data; name="sample_name"; filename="sample.tsv"
Content-Type: text/tab-separated-values

x	y	z
0.118413259	0.708780115	0.220421733
0.7442446	0.539953502	0.02358455
0.452216508	0.434335382	0.290591105
0.177870351	0.329585854	0.867134948
0.328333594	0.564684198	0.455520456
0.680873191	0.57619709	0.015501929
0.87209177	0.4487502	0.008787638
0.50966466	0.958683547	0.251500478
0.937677985	0.457557468	0.19860169
0.352366596	0.49283506	0.161614863
0.370598572	0.366986682	0.15623896

-----------------------------90908882332870323642673870272
Content-Disposition: form-data; name="sample_name"; filename="has_name.bin"
Content-Type: application/octet-stream

a
-----------------------------90908882332870323642673870272
Content-Disposition: form-data; name="sample_name"
Content-Type: text/tab-separated-values

x	y	z
0.118413259	0.708780115	0.220421733
0.7442446	0.539953502	0.02358455
0.452216508	0.434335382	0.290591105
0.177870351	0.329585854	0.867134948
0.328333594	0.564684198	0.455520456
0.680873191	0.57619709	0.015501929
0.87209177	0.4487502	0.008787638
0.50966466	0.958683547	0.251500478
0.937677985	0.457557468	0.19860169
0.352366596	0.49283506	0.161614863
0.370598572	0.366986682	0.15623896

-----------------------------90908882332870323642673870272
Content-Disposition: form-data; name="sample_name"; filename="sample3.tsv"
Content-Type: text/tab-separated-values

x	y	z
0.118413259	0.708780115	0.220421733
0.7442446	0.539953502	0.02358455
0.452216508	0.434335382	0.290591105
0.177870351	0.329585854	0.867134948
0.328333594	0.564684198	0.455520456
0.680873191	0.57619709	0.015501929
0.87209177	0.4487502	0.008787638
0.50966466	0.958683547	0.251500478
0.937677985	0.457557468	0.19860169
0.352366596	0.49283506	0.161614863
0.370598572	0.366986682	0.15623896

-----------------------------90908882332870323642673870272--

Test Code

bin_file <- "file.bin"
body <- readBin(bin_file, what = "raw", n = file.info(bin_file)$size)
parsed_body <- parse_body(body, "multipart/form-data; boundary=---------------------------90908882332870323642673870272", make_parser(c("multi", "tsv")))
❯❯ str(parsed_body)
List of 1
 $ sample_name:List of 10
  ..$ x           : num [1:11] 0.118 0.744 0.452 0.178 0.328 ...
  ..$ y           : num [1:11] 0.709 0.54 0.434 0.33 0.565 ...
  ..$ z           : num [1:11] 0.2204 0.0236 0.2906 0.8671 0.4555 ...
  ..$ has_name.bin: raw 61
  ..$ x           : num [1:11] 0.118 0.744 0.452 0.178 0.328 ...
  ..$ y           : num [1:11] 0.709 0.54 0.434 0.33 0.565 ...
  ..$ z           : num [1:11] 0.2204 0.0236 0.2906 0.8671 0.4555 ...
  ..$ x           : num [1:11] 0.118 0.744 0.452 0.178 0.328 ...
  ..$ y           : num [1:11] 0.709 0.54 0.434 0.33 0.565 ...
  ..$ z           : num [1:11] 0.2204 0.0236 0.2906 0.8671 0.4555 ...

I would expect a nested structure like the output below to preserve the file information / separate out the chunks, rather than merge the chunks.

❯❯ str(parsed_body)
List of 1
 $ sample_name:List of 4
  ..$ sample.tsv
    ..$ x: num [1:11] 0.118 0.744 0.452 0.178 0.328 ...
    ..$ y: num [1:11] 0.709 0.54 0.434 0.33 0.565 ...
    ..$ z: num [1:11] 0.2204 0.0236 0.2906 0.8671 0.4555 ...
  ..$ has_name.bin: List of 1
    ..$ : raw 61
  ..$
    ..$ x: num [1:11] 0.118 0.744 0.452 0.178 0.328 ...
    ..$ y: num [1:11] 0.709 0.54 0.434 0.33 0.565 ...
    ..$ z: num [1:11] 0.2204 0.0236 0.2906 0.8671 0.4555 ...
  ..$ sample2.tsv
    ..$ x: num [1:11] 0.118 0.744 0.452 0.178 0.328 ...
    ..$ y: num [1:11] 0.709 0.54 0.434 0.33 0.565 ...
    ..$ z: num [1:11] 0.2204 0.0236 0.2906 0.8671 0.4555 ...

cc @nischalshrestha

@schloerke
Copy link
Collaborator Author

@meztez Do you think we should preserve the file names within each group of multipart data?

The ones that seem odd to me are the unnamed multipart item, such as the last two items.

Ex:

str(parsed_body)
  # named group with all filenames
  $ NAME1
    # named file
    $ FILE1
      $ CONTENT1a
      $ CONTENT1b
      $ CONTENT1c
    # named file
    $ FILE2
      $ CONTENT2a
      $ CONTENT2b
      $ CONTENT2c

  # named group with some file names
  $ NAME2
    # unnamed file
    $ 
      $ CONTENT1a
      $ CONTENT1b
      $ CONTENT1c
    # named file
    $ FILE2
      $ CONTENT2a
      $ CONTENT2b
      $ CONTENT2c
    # unnamed file
    $ 
      $ CONTENT3a
      $ CONTENT3b
      $ CONTENT3c

  # named group with all unnamed files
  $ NAME3
    # unnamed file
    $ 
      $ CONTENT1a
      $ CONTENT1b
      $ CONTENT1c
    # unnamed file
    $ 
      $ CONTENT2a
      $ CONTENT2b
      $ CONTENT2c
    # unnamed file
    $ 
      $ CONTENT3a
      $ CONTENT3b
      $ CONTENT3c

  # unnamed group with file name
  $ 
    $ FILE1
      $ CONTENT1a
      $ CONTENT1b
      $ CONTENT1c

  # unnamed group with no file names
  $ 
    $ 
      $ CONTENT1a
      $ CONTENT1b
      $ CONTENT1c

@meztez
Copy link
Collaborator

meztez commented Aug 26, 2020

@schloerke Maybe, yes? For your last two examples, I believe user should not expect plumber to turn garbage into gold.

@schloerke
Copy link
Collaborator Author

Reference: Returning Values from Forms: multipart/form-data https://tools.ietf.org/html/rfc7578

Q: Are names required?
A:

https://tools.ietf.org/html/rfc7578#section-4.2
Each part MUST contain a Content-Disposition header field [RFC2183]
where the disposition type is "form-data". The Content-Disposition
header field MUST also contain an additional parameter of "name";

... so no need to worry about the last two situations. Yay!

Q: How should duplicate names be handled?
A:

https://tools.ietf.org/html/rfc7578#section-5.2
Form parts with identical field names MUST NOT be
coalesced.

This makes me want to rethink our return structure. 🤔🤔

@meztez
Copy link
Collaborator

meztez commented Aug 26, 2020

See also section https://tools.ietf.org/html/rfc7578#section-4.3

@schloerke
Copy link
Collaborator Author

To match widely deployed implementations, multiple files MUST be sent
by supplying each file in a separate part but all with the same
"name" parameter.

Yup! And should still not be coalesced.

@schloerke schloerke mentioned this issue Aug 27, 2020
3 tasks
@meztez
Copy link
Collaborator

meztez commented Aug 27, 2020

Oh, I get it. Yeah, so plumber should never coalesce multipart that use the same name as you have shown above for tsv files. But would still coerce multiple files into a single named list?

@schloerke
Copy link
Collaborator Author

@meztez Ah! Good idea. I was trying to be too general.

Given #663, the upgraded webutils::parse_multipart() output will be stored in ret$body.

Then we could have ret$argsBody store a shape that is user friendly. 😄

@schloerke
Copy link
Collaborator Author

Latest #663 format

Example 1

> str(req$body)
List of 5
 $ files:List of 6
  ..$ value              : raw [1:1117] 89 50 4e 47 ...
  ..$ content_disposition: chr "form-data"
  ..$ content_type       : chr "image/png"
  ..$ name               : chr "files"
  ..$ filename           : chr "avatar2-small.png"
  ..$ parsed             : raw [1:1117] 89 50 4e 47 ...
 $ files:List of 6
  ..$ value              : raw 61
  ..$ content_disposition: chr "form-data"
  ..$ content_type       : chr "application/octet-stream"
  ..$ name               : chr "files"
  ..$ filename           : chr "text1.bin"
  ..$ parsed             : raw 61
 $ files:List of 6
  ..$ value              : raw 62
  ..$ content_disposition: chr "form-data"
  ..$ content_type       : chr "application/octet-stream"
  ..$ name               : chr "files"
  ..$ filename           : chr "text2.bin"
  ..$ parsed             : raw 62
 $ files:List of 6
  ..$ value              : raw 63
  ..$ content_disposition: chr "form-data"
  ..$ content_type       : chr "application/octet-stream"
  ..$ name               : chr "files"
  ..$ filename           : chr "text3.bin"
  ..$ parsed             : raw 63
 $ dt   :List of 4
  ..$ value              : raw [1:2] 7b 7d
  ..$ content_disposition: chr "form-data"
  ..$ name               : chr "dt"
  ..$ parsed             : Named list()
 - attr(*, "class")= chr "plumber_multipart"

> str(req$argsBody)
List of 2
 $ files:List of 4
  ..$ avatar2-small.png: raw [1:1117] 89 50 4e 47 ...
  ..$ text1.bin        : raw 61
  ..$ text2.bin        : raw 62
  ..$ text3.bin        : raw 63
 $ dt   : Named list()

Example 2

> str(req$body)
List of 4
 $ json:List of 4
  ..$ value              : raw [1:59] 7b 0a 20 20 ...
  ..$ content_disposition: chr "form-data"
  ..$ name               : chr "json"
  ..$ parsed             :List of 3
  .. ..$ a: int 2
  .. ..$ b: int 4
  .. ..$ c:List of 2
  .. .. ..$ w: int 3
  .. .. ..$ t: int 5
 $ img1:List of 6
  ..$ value              : raw [1:1117] 89 50 4e 47 ...
  ..$ content_disposition: chr "form-data"
  ..$ content_type       : chr "image/png"
  ..$ name               : chr "img1"
  ..$ filename           : chr "avatar2-small.png"
  ..$ parsed             : raw [1:1117] 89 50 4e 47 ...
 $ img2:List of 6
  ..$ value              : raw [1:1537] 89 50 4e 47 ...
  ..$ content_disposition: chr "form-data"
  ..$ content_type       : chr "image/png"
  ..$ name               : chr "img2"
  ..$ filename           : chr "ragnarok_small.png"
  ..$ parsed             : raw [1:1537] 89 50 4e 47 ...
 $ rds :List of 6
  ..$ value              : raw [1:211] 1f 8b 08 00 ...
  ..$ content_disposition: chr "form-data"
  ..$ content_type       : chr "application/rds"
  ..$ name               : chr "rds"
  ..$ filename           : chr "women.rds"
  ..$ parsed             :'data.frame':	15 obs. of  2 variables:
  .. ..$ height: num [1:15] 58 59 60 61 62 63 64 65 66 67 ...
  .. ..$ weight: num [1:15] 115 117 120 123 126 129 132 135 139 142 ...
 - attr(*, "class")= chr "plumber_multipart"

> str(req$argsBody)
List of 4
 $ json:List of 3
  ..$ a: int 2
  ..$ b: int 4
  ..$ c:List of 2
  .. ..$ w: int 3
  .. ..$ t: int 5
 $ img1:List of 1
  ..$ avatar2-small.png: raw [1:1117] 89 50 4e 47 ...
 $ img2:List of 1
  ..$ ragnarok_small.png: raw [1:1537] 89 50 4e 47 ...
 $ rds :List of 1
  ..$ women.rds:'data.frame':	15 obs. of  2 variables:
  .. ..$ height: num [1:15] 58 59 60 61 62 63 64 65 66 67 ...
  .. ..$ weight: num [1:15] 115 117 120 123 126 129 132 135 139 142 ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants