Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multipart part's Content-Disposition with non-ascii filename fails to parse #5809

Closed
adamw opened this issue Jan 4, 2022 · 14 comments
Closed

Comments

@adamw
Copy link

adamw commented Jan 4, 2022

Using http4s 0.23.7.

Given the a simple service which parses the body as a multipart, which:

import cats.effect._
import org.http4s.blaze.server.BlazeServerBuilder
import org.http4s.dsl.io._
import org.http4s.headers.`Content-Disposition`
import org.http4s.server.Router
import org.http4s.{EntityDecoder, HttpRoutes, multipart}

import scala.concurrent.ExecutionContext

object Test2 extends IOApp {
  val service: HttpRoutes[IO] = HttpRoutes.of[IO] {
    case r @ POST -> Root / "file" =>
      implicitly[EntityDecoder[IO, multipart.Multipart[IO]]]
        .decode(r, strict = false)
        .value
        .flatMap {
          case Left(_) => BadRequest()
          case Right(mp) =>
            println("Got parts: " + mp.parts)
            mp.parts.foreach { part =>
              println("Part name: " + part.name)
              println("Part content disposition: " + part.headers.getWithWarnings[`Content-Disposition`])
            }
            Ok()
        }
    case _ =>
      BadRequest()
  }

  implicit val ec: ExecutionContext = scala.concurrent.ExecutionContext.Implicits.global

  override def run(args: List[String]): IO[ExitCode] = {
    BlazeServerBuilder[IO]
      .withExecutionContext(ec)
      .bindHttp(8080, "localhost")
      .withHttpApp(Router("/" -> service).orNotFound)
      .resource
      .use { _ => IO.never }
      .as(ExitCode.Success)
  }
}

And invocation:

curl -v -X 'POST' 'http://localhost:8080/file' -H 'accept: text/plain' -H 'Content-Type: multipart/form-data' -F 'file=@中文文件名.json;type=application/json'

The result on stdout is:

Got parts: Vector(Part(Headers(Content-Disposition: form-data; name="file"; filename="中文文件名.json", Content-Type: application/json),Stream(..)))
Part name: None
Part content disposition: Some(Left(NonEmptyList(org.http4s.ParseFailure: Invalid Content-Disposition header: Error(34,NonEmptyList(InRange(34,","))))))
@aeons
Copy link
Member

aeons commented Jan 4, 2022

According to the spec, the filename parameter must be encoded using ISO-8859-1. If characters outside that are required, it should be set in the filename* parameter.

So this is a question of how strictly we should follow the spec.

@counter2015
Copy link

After some searching, I found these

#312

#1767

It seems works well in http4s 0.21.19, any changed after that ?

@ritschwumm
Copy link

isn't filename* deprecated by now? i think i read about this "somewhere on the internet" :/

@adamw
Copy link
Author

adamw commented Jan 4, 2022

@aeons I'm getting the same when sending through postman, though then the content-disposition header is a bit different. Still there's a parsing error, though:

Got parts: Vector(Part(Headers(Content-Disposition: form-data; name="file"; filename="中文文件名.json"; filename*=UTF-8''%E4%B8%AD%E6%96%87%E6%96%87%E4%BB%B6%E5%90%8D.json, Content-Type: application/json),Stream(..)))
Part name: None
Part content disposition: Some(Left(NonEmptyList(org.http4s.ParseFailure: Invalid Content-Disposition header: Error(34,NonEmptyList(InRange(34,","))))))

maybe if filename* is present, the parsing rules should be relaxed?
I'm doing a simple request using postman, no customisations or anything like that.

@counter2015
Copy link

counter2015 commented Feb 18, 2022

@aeons In spec

Many user agent implementations predating this specification do not understand the "filename*" parameter. Therefore, when both "filename" and "filename*" are present in a single header field value, recipients SHOULD pick "filename*" and ignore "filename". This way, senders can avoid special-casing specific user agents by sending both the more expressive "filename*" parameter, and the "filename" parameter as fallback for legacy recipients (see Section 5 for an example).

When sending request from Postman, it generate filename* correctly, but the parser failed to parse it.

import org.http4s.headers.`Content-Disposition`.parse

val str = """form-data; name="b"; filename="中文文件名.json"; filename*=UTF-8''%E4%B8%AD%E6%96%87%E6%96%87%E4%BB%B6%E5%90%8D.json"""

parse(str)

val res0: org.http4s.ParseResult[org.http4s.headers.Content-Disposition] = Left(org.http4s.ParseFailure: Invalid Content-Disposition header: Error(31,NonEmptyList(InRange(31,","))))

maybe if filename* is present, the parsing rules should be relaxed?

Accroding the spec, should we use filename* instead of filename when it occurs ?

Above code works in http4s 0.21.31

parse(str)

val res0: org.http4s.ParseResult[org.http4s.headers.Content-Disposition] = Right(Content-Disposition: form-data; name="b"; filename="中文文件名.json"; filename*="UTF-8''%E4%B8%AD%E6%96%87%E6%96%87%E4%BB%B6%E5%90%8D.json")

I'm not familiar with the parsing process , it seems that the parser implement is not the same as 0.23

  def CONTENT_DISPOSITION(value: String): ParseResult[`Content-Disposition`] =
    new Http4sHeaderParser[`Content-Disposition`](value) {
      def entry =
        rule {
          Token ~ zeroOrMore(";" ~ OptWS ~ Parameter) ~ EOL ~> {
            (token: String, params: collection.Seq[(String, String)]) =>
              `Content-Disposition`(token, params.toMap)
          }
        }
    }.parse

see also: akka/akka-http#1240

@counter2015
Copy link

isn't filename* deprecated by now? i think i read about this "somewhere on the internet" :/

I cant find it, could you provide any link? @ritschwumm

@counter2015
Copy link

counter2015 commented Sep 22, 2022

It seems the Content-Disposition header string is generate from here

implicit val headerInstance: Header[`Content-Disposition`, Header.Single] =
Header.createRendered(
ci"Content-Disposition",
v =>
new Renderable {
// https://datatracker.ietf.org/doc/html/rfc8187#section-3.2.1
private val attrChar = CharPredicate.AlphaNum ++ attrExtraSafeChars
// Adapted from https://github.com/akka/akka-http/blob/b071bd67547714bd8bed2ccd8170fbbc6c2dbd77/akka-http-core/src/main/scala/akka/http/scaladsl/model/headers/headers.scala#L468-L492
def render(writer: Writer): writer.type = {
val renderExtFilename =
v.parameters.get(ci"filename").exists(!safeChars.matchesAll(_))
val withExtParams =
if (renderExtFilename && !v.parameters.contains(ci"filename*"))
v.parameters + (ci"filename*" -> v.parameters(ci"filename"))
else v.parameters
val withExtParamsSorted =
if (withExtParams.contains(ci"filename") && withExtParams.contains(ci"filename*"))
TreeMap[CIString, String]() ++ withExtParams
else withExtParams
writer.append(v.dispositionType)
withExtParamsSorted.foreach {
case (k, v) if k == ci"filename" =>
writer << "; " << k << '=' << '"'
writer.eligibleOnly(v, keep = safeChars, placeholder = '?') << '"'
case (k @ ci"${_}*", v) =>
writer << "; " << k << '=' << "UTF-8''" << Uri.encode(
toEncode = v,
toSkip = attrChar,
)
case (k, v) => writer << "; " << k << "=\"" << v << '"'
}
writer
}
},
parse,
)
}

The original request seem like

curl --location --request POST 'http://localhost:8082/file' \
--form 'file=@"/Users/counter/Desktop/中文文件名.json"'

It wiil be convert to

Content-Disposition: form-data; name="file"; filename="中文文件名.json"; filename*=UTF-8''%E4%B8%AD%E6%96%87%E6%96%87%E4%BB%B6%E5%90%8D.json, Content-Type: application/json

Well, it seem this idea is inspired by akka-http, so I try to reproduce this issue on akka-http(10.2.10)

import akka.actor.typed.ActorSystem
import akka.actor.typed.scaladsl.Behaviors
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._

import scala.io.StdIn

object Test extends App {

  implicit val system = ActorSystem(Behaviors.empty, "my-system")
  // needed for the future flatMap/onComplete in the end
  implicit val executionContext = system.executionContext

  val route =
    path("file") {
      post {
        fileUpload("file") {
          case (metadata, byteSource) =>

            complete(s"Filename: ${metadata.fileName}")
        }
      }
    }

  val bindingFuture = Http().newServerAt("localhost", 8082).bind(route)

  println(s"Server now online. Please navigate to http://localhost:8082/hello\nPress RETURN to stop...")
  StdIn.readLine() // let it run until user presses return
  bindingFuture
    .flatMap(_.unbind()) // trigger unbinding from the port
    .onComplete(_ => system.terminate()) // and shutdown when done
}

it seems works well

$ curl --location --request POST 'http://localhost:8082/file' \
--form 'file=@"/Users/counter/Desktop/中文文件名.json"' -vvvv

Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 127.0.0.1:8082...
* Connected to localhost (127.0.0.1) port 8082 (#0)
> POST /file HTTP/1.1
> Host: localhost:8082
> User-Agent: curl/7.79.1
> Accept: */*
> Content-Length: 211
> Content-Type: multipart/form-data; boundary=------------------------c0013cc1164d7df3
> 
* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: akka-http/10.2.10
< Date: Thu, 22 Sep 2022 08:46:08 GMT
< Content-Type: text/plain; charset=UTF-8
< Content-Length: 30
< 
* Connection #0 to host localhost left intact

Filename: 中文文件名.json

The header parser result seem like this

  import akka.http.scaladsl.model.headers.{`Content-Disposition` => cd}

  val res = cd.parseFromValueString("""form-data; name="file"; filename="中文文件名.json"""")

  // Right(Content-Disposition: form-data; filename="?????.json"; filename*=UTF-8''%E4%B8%AD%E6%96%87%E6%96%87%E4%BB%B6%E5%90%8D.json; name="file")
  println(res)

And the code

https://github.com/akka/akka-http/blob/465fb573459707fd1d5e2203800d96775f0bf797/akka-http-core/src/main/scala/akka/http/impl/model/parser/ContentDispositionHeader.scala#L22-L31

  // http://tools.ietf.org/html/rfc6266#section-4.1
  def `content-disposition` = rule {
    `disposition-type` ~ zeroOrMore(ws(';') ~ `disposition-parm`) ~ EOI ~> { p =>
      val all = TreeMap(p: _*)
      // https://tools.ietf.org/html/rfc6266#section-4.3
      // when both "filename" and "filename*" are present in a single header field value,
      //   recipients SHOULD pick "filename*" and ignore "filename"
      all.get("filename*").map(fExt => all - "filename*" + ("filename" -> fExt)) getOrElse all
    } ~> (`Content-Disposition`(_, _))
  }

@aeons do you have time to take a look at it?

@danicheg
Copy link
Member

@counter2015 some time ago, I proposed these changes you described – #6475. But we have ended up with this #6485.

@counter2015
Copy link

@danicheg That sounds good! Was this commit merged into series/0.23 ?

@armanbilge
Copy link
Member

@counter2015 yes, it was first released in 0.23.13.
https://github.com/http4s/http4s/releases/tag/v0.23.13

@counter2015
Copy link

counter2015 commented Sep 27, 2022

Using http4s 0.23.7.

Given the a simple service which parses the body as a multipart, which:

import cats.effect._
import org.http4s.blaze.server.BlazeServerBuilder
import org.http4s.dsl.io._
import org.http4s.headers.`Content-Disposition`
import org.http4s.server.Router
import org.http4s.{EntityDecoder, HttpRoutes, multipart}

import scala.concurrent.ExecutionContext

object Test2 extends IOApp {
  val service: HttpRoutes[IO] = HttpRoutes.of[IO] {
    case r @ POST -> Root / "file" =>
      implicitly[EntityDecoder[IO, multipart.Multipart[IO]]]
        .decode(r, strict = false)
        .value
        .flatMap {
          case Left(_) => BadRequest()
          case Right(mp) =>
            println("Got parts: " + mp.parts)
            mp.parts.foreach { part =>
              println("Part name: " + part.name)
              println("Part content disposition: " + part.headers.getWithWarnings[`Content-Disposition`])
            }
            Ok()
        }
    case _ =>
      BadRequest()
  }

  implicit val ec: ExecutionContext = scala.concurrent.ExecutionContext.Implicits.global

  override def run(args: List[String]): IO[ExitCode] = {
    BlazeServerBuilder[IO]
      .withExecutionContext(ec)
      .bindHttp(8080, "localhost")
      .withHttpApp(Router("/" -> service).orNotFound)
      .resource
      .use { _ => IO.never }
      .as(ExitCode.Success)
  }
}

And invocation:

curl -v -X 'POST' 'http://localhost:8080/file' -H 'accept: text/plain' -H 'Content-Type: multipart/form-data' -F 'file=@中文文件名.json;type=application/json'

The result on stdout is:

Got parts: Vector(Part(Headers(Content-Disposition: form-data; name="file"; filename="中文文件名.json", Content-Type: application/json),Stream(..)))
Part name: None
Part content disposition: Some(Left(NonEmptyList(org.http4s.ParseFailure: Invalid Content-Disposition header: Error(34,NonEmptyList(InRange(34,","))))))

I just test code above on this version, it cant pass on latest version.

val http4s = "0.23.16"
val http4sBlaze = "0.23.12"


libraryDependencies ++= Seq(
  "org.http4s" %% "http4s-dsl" % http4s,
  "org.http4s" %% "http4s-blaze-server" % http4sBlaze,
  "org.http4s" %% "http4s-blaze-client" % http4sBlaze
)

it seems cant parse correctly, it's a regression here?

the request is generate from postman(without any customisations )

$ curl --location --request POST 'http://localhost:8080/file' \
--header 'Cookie: csrftoken=T1j0VnnNBI2HXYWjwBXR3j47wV5cGrQdRuI5Zu2BAFSJSYBDMQLY2eliy3j0Yqqn' \
--form 'file=@"/Users/counter/Desktop/中文文件名.json"'

And the server print this message to console

Got parts: Vector(Part(Headers(Content-Disposition: form-data; name="file"; filename="中文文件名.json"; filename*=UTF-8''%E4%B8%AD%E6%96%87%E6%96%87%E4%BB%B6%E5%90%8D.json, Content-Type: application/json),Stream(..)))
Part name: None
Part content disposition: Some(Left(NonEmptyList(org.http4s.ParseFailure: Invalid Content-Disposition header: form-data; name="file"; filename="中文文件名.json"; filename*=UTF-8''%E4%B8%AD%E6%96%87%E6%96%87%E4%BB%B6%E5%90%8D.json
                                  ^
expectation:
* must be char: '"')))

And on http4s 0.23.13, it complains that

Got parts: Vector(Part(Headers(Content-Disposition: form-data; name="file"; filename="中文文件名.json"; filename*=UTF-8''%E4%B8%AD%E6%96%87%E6%96%87%E4%BB%B6%E5%90%8D.json, Content-Type: application/json),Stream(..)))
Part name: None
Part content disposition: Some(Left(NonEmptyList(org.http4s.ParseFailure: Invalid Content-Disposition header: Error(34,NonEmptyList(InRange(34,","))))))

@segery
Copy link

segery commented Jan 12, 2023

Issue persists in 0.23.17

@counter2015
Copy link

I'm trying to invest the exception cause.

The parser implement is adpated from akka-http

The solution of akka to handle no-ascii character in filename is replace it to placeholder character.

https://github.com/akka/akka-http/blob/5a18cec28f72448c8573ee9efd43324099ab250e/akka-http-core/src/main/scala/akka/http/scaladsl/model/headers/headers.scala#L474-L501

In http4s, it has the same logic too, but seems not work fine.

implicit val headerInstance: Header[`Content-Disposition`, Header.Single] =
Header.createRendered(
ci"Content-Disposition",
v =>
new Renderable {
// https://datatracker.ietf.org/doc/html/rfc8187#section-3.2.1
private val attrChar = CharPredicate.AlphaNum ++ attrExtraSafeChars
// Adapted from https://github.com/akka/akka-http/blob/b071bd67547714bd8bed2ccd8170fbbc6c2dbd77/akka-http-core/src/main/scala/akka/http/scaladsl/model/headers/headers.scala#L468-L492
def render(writer: Writer): writer.type = {
val renderExtFilename =
v.parameters.get(ci"filename").exists(!safeChars.matchesAll(_))
val withExtParams =
if (renderExtFilename && !v.parameters.contains(ci"filename*"))
v.parameters + (ci"filename*" -> v.parameters(ci"filename"))
else v.parameters
val withExtParamsSorted =
if (withExtParams.contains(ci"filename") && withExtParams.contains(ci"filename*"))
TreeMap[CIString, String]() ++ withExtParams
else withExtParams
writer.append(v.dispositionType)
withExtParamsSorted.foreach {
case (k, v) if k == ci"filename" =>
writer << "; " << k << '=' << '"'
writer.eligibleOnly(v, keep = safeChars, placeholder = '?') << '"'
case (k @ ci"${_}*", v) =>
writer << "; " << k << '=' << "UTF-8''" << Uri.encode(
toEncode = v,
toSkip = attrChar,
)
case (k, v) => writer << "; " << k << "=\"" << v << '"'
}
writer
}
},
parse,
)
}

in line writer.eligibleOnly(v, keep = safeChars, placeholder = '?') << '"', the parser should replace all no-ascii character in filename field, but it cause exception ahead, I try to debug it, but cant find a effictive to debug line by line in lambda inside a implict val.

IMO, the ideal result of parser could be following:

  test("Content-Disposition header should work when `filename` contains no-ascii character but has `filename*` field") {
    val headerValue = """attachment; filename="中文文件名.json"; filename*=UTF-8''%E4%B8%AD%E6%96%87%E6%96%87%E4%BB%B6%E5%90%8D.json"""
    val parsedHeader = `Content-Disposition`.parse(headerValue)
    assertEquals(parsedHeader.map(_.filename), Right(Some("中文文件名.json")))
  }

but this would be very annoying to implement: we should first check the filename* and then handle filename with utf8 encode.

as a fallback, playframework choose parser result like this https://github.com/playframework/playframework/pull/8314/files

 "support UTF-8 encoded filenames in Content-Disposition headers" in {
      val tempFile: Path = JFiles.createTempFile("ScalaResultsHandlingSpec", "txt")
      try {
        withServer {
          import scala.concurrent.ExecutionContext.Implicits.global
          implicit val mimeTypes: FileMimeTypes = new DefaultFileMimeTypes(FileMimeTypesConfiguration())
          Results.Ok.sendFile(
            tempFile.toFile,
            fileName = _ => "测 试.tmp"
          )
        } { port =>
          val response = BasicHttpClient.makeRequests(port)(
            BasicRequest("GET", "/", "HTTP/1.1", Map(), "")
          ).head

          response.status must_== 200
          response.body must beLeft("")
          response.headers.get(CONTENT_DISPOSITION) must beSome(s"""inline; filename="? ?.tmp"; filename*=utf-8''%e6%b5%8b%20%e8%af%95.tmp""")
        }
      } finally {
        tempFile.toFile.delete()
      }
    }

@rossabaker
Copy link
Member

I hadn't seen this issue before #7419, but there's an extra complication here. filename* is defined for Content-Disposition as a response header, but explicitly forbidden in multipart, with no unambiguous alternative. I don't know how many multipart implementations use it anyway, but the user agents I'm dealing with do not, and there must be a solution that works in its absence.

@hamnis hamnis closed this as completed in d319c9b Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants