Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File upload combinator #133

Closed
alpmestan opened this issue Jun 19, 2015 · 76 comments
Closed

File upload combinator #133

alpmestan opened this issue Jun 19, 2015 · 76 comments

Comments

@alpmestan
Copy link
Contributor

Just creating this issue to put some code I have written up here for discussion.

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeSynonymInstances #-}
module Files where

import Control.Monad.IO.Class
import Control.Monad.Trans.Either
import Control.Monad.Trans.Resource
import Data.ByteString.Lazy (ByteString)
import Network.Wai
import Network.Wai.Handler.Warp (run)
import Network.Wai.Parse
import Servant
import Servant.Server.Internal

-- Backends for file upload: in memory or in /tmp ?

data Mem
data Tmp

class KnownBackend b where
  type Storage b :: *

  withBackend :: Proxy b -> (BackEnd (Storage b) -> IO r) -> IO r

instance KnownBackend Mem where
  type Storage Mem = ByteString

  withBackend Proxy f = f lbsBackEnd

instance KnownBackend Tmp where
  type Storage Tmp = FilePath

  withBackend Proxy f = runResourceT . withInternalState $ \s ->
    f (tempFileBackEnd s)

-- * Files combinator, to get all of the uploaded files

data Files b

instance (KnownBackend b, HasServer api) => HasServer (Files b :> api) where
  type ServerT (Files b :> api) m =
    [File (Storage b)] -> ServerT api m

  route Proxy subserver req respond = withBackend pb $ \b -> do
    (_, files) <- parseRequestBody b req
    route (Proxy :: Proxy api) (subserver files) req respond

    where pb = Proxy :: Proxy b

type FilesMem = Files Mem
type FilesTmp = Files Tmp

-- test

type API = "files" :> FilesTmp :> Post '[JSON] ()
      :<|> Raw

api :: Proxy API
api = Proxy

server :: Server API
server = filesHandler :<|> serveDirectory "."

  where filesHandler :: [File FilePath] -> EitherT ServantErr IO ()
        filesHandler = liftIO . mapM_ ppFile

        ppFile :: File FilePath -> IO ()
        ppFile (name, fileinfo) = do
          putStrLn $ "Input name: " ++ show name
          putStrLn $ "File name: " ++ show (fileName fileinfo)
          putStrLn $ "Content type: " ++ show (fileContentType fileinfo)
          putStrLn $ "------- Content --------"
          readFile (fileContent fileinfo) >>= putStrLn
          putStrLn $ "------------------------"

app :: Application
app = serve api server

f :: IO ()
f = run 8083 app

along with this HTML file:

<form action="/files" method="post" enctype="multipart/form-data">
    Select a file: <input type="file" name="blah" />
    Select another one: <input type="file" name="foo" />
    <hr />
    <input type="submit" value="Upload" />
</form>

served through serveDirectory. Thoughts, comments?

@jhickner
Copy link

This is great! 👍

@alpmestan
Copy link
Contributor Author

One problem with this is that we can't target individual files by specifying the input name associated to them, with, say, a data File backend (inputname :: Symbol) combinator, but this is kind of on purpose... Let me explain.

  • With the current implementation I'm just calling out to some existing wai-code that reads the request body and decodes it. If we were to add a File combinator, we would either be decoding the request body and getting the file for every File mentionned, or would have to do something similar to what we do with ReqBody and memoize the decoded body. This is getting much trickier to handle.
  • If we have a File using the memory backend and another using /tmp, this would probably mess things up or do a lot of unnecessary work.

This is why I'm not sure we should include a File combinator. Shall I just put a PR together with just Files, using the code above and documenting it?

@alpmestan
Copy link
Contributor Author

Here's a version that doesn't "forget" about the inputs that were sent along with the files in the request body (remember, this is multipart/form-data):

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeSynonymInstances #-}
module Files where

import Control.Monad.IO.Class
import Control.Monad.Trans.Either
import Control.Monad.Trans.Resource
import Data.ByteString.Lazy (ByteString)
import Network.Wai
import Network.Wai.Handler.Warp (run)
import Network.Wai.Parse
import Servant
import Servant.Server.Internal

-- Backends for file upload: in memory or in /tmp ?

data Mem
data Tmp

class KnownBackend b where
  type Storage b :: *

  withBackend :: Proxy b -> (BackEnd (Storage b) -> IO r) -> IO r

instance KnownBackend Mem where
  type Storage Mem = ByteString

  withBackend Proxy f = f lbsBackEnd

instance KnownBackend Tmp where
  type Storage Tmp = FilePath

  withBackend Proxy f = runResourceT . withInternalState $ \s ->
    f (tempFileBackEnd s)

-- * Files combinator, to get all of the uploaded files

data Files b

type MultiPartData b = ([Param], [File (Storage b)]) 

instance (KnownBackend b, HasServer api) => HasServer (Files b :> api) where
  type ServerT (Files b :> api) m =
    MultiPartData b -> ServerT api m

  route Proxy subserver req respond = withBackend pb $ \b -> do
    dat <- parseRequestBody b req
    route (Proxy :: Proxy api) (subserver dat) req respond

    where pb = Proxy :: Proxy b

type FilesMem = Files Mem
type FilesTmp = Files Tmp

-- test
type API = "files" :> FilesTmp :> Post '[JSON] ()
      :<|> Raw

api :: Proxy API
api = Proxy

server :: Server API
server = filesHandler :<|> serveDirectory "."

  where filesHandler :: MultiPartData Tmp -> EitherT ServantErr IO ()
        filesHandler (inputs, files) = do
          liftIO $ mapM_ ppFile files
          liftIO $ mapM_ print inputs

        ppFile :: File FilePath -> IO ()
        ppFile (name, fileinfo) = do
          putStrLn $ "Input name: " ++ show name
          putStrLn $ "File name: " ++ show (fileName fileinfo)
          putStrLn $ "Content type: " ++ show (fileContentType fileinfo)
          putStrLn $ "------- Content --------"
          readFile (fileContent fileinfo) >>= putStrLn
          putStrLn $ "------------------------"

app :: Application
app = serve api server

Now, this is in fact all equivalent to a new content type, for multipart/form-data. However, the fact that wai-extra forces us to do some IO to parse all of this (see parseRequestBody) gets in the way.

Thoughts? @jkarni any idea to make this a content type?

@mwotton
Copy link

mwotton commented Aug 4, 2015

fwiw, i'm using this now and while it works from curl, it doesn't in the browser for large files. Weirdly, if i use netcat to collect the request as raw text then use netcat to dump it into the server, it works fine - doing it normally gets 405 Method Not Allowed and some chunks of text from the input logged by Network.Wai.Middleware.RequestLogger.

@alpmestan
Copy link
Contributor Author

Interesting. I can upload files just fine in some app I'm working on. Do you think you could share a minimal app (I assume the one you're working on is for work, hence closed-source?) that has this problem? I could start from there and investigate what is going on. So just the HTML of your upload form along with the haskell app that receives the files.

@mwotton
Copy link

mwotton commented Aug 5, 2015

yep, i'll try to get that to you today.

@mwotton
Copy link

mwotton commented Aug 5, 2015

i think this is actually a warp thing, i get it when i use scotty too.

@codedmart
Copy link
Contributor

@mwotton what version of warp?

@mwotton
Copy link

mwotton commented Aug 5, 2015

3.1.0. testing with the latest now.

@codedmart
Copy link
Contributor

@mwotton there was an issue with 3.1.0. So try the lastest or downgrade to 3.0.*.

@mwotton
Copy link

mwotton commented Aug 5, 2015

ah, thanks, @codedmart - i'll try that.

@mwotton
Copy link

mwotton commented Aug 5, 2015

yep, 3.1.2 fixes it. wish i'd known that before i rewrote the app :)

@alpmestan
Copy link
Contributor Author

Ah, good to know!

@3noch
Copy link

3noch commented Oct 26, 2015

Perhaps you could release this as a package?

@alpmestan
Copy link
Contributor Author

Well, we would have to write instances for servant-client, servant-docs etc if we want to release this properly. It's quite a task...

@3noch
Copy link

3noch commented Oct 26, 2015

Make the version < 1. Something is better than nothing, IMHO. That is to say, this is awesome and I would like to use it.

@jkarni
Copy link
Member

jkarni commented Oct 26, 2015

In order to make this a content-type, we'd need to change the content-type machinery to allow IO. I think that makes sense.

And if we do that, we may not need instances for all the packages.

@alpmestan
Copy link
Contributor Author

@3noch You can use this! Simply drop the code in a module in your project, and you can use it =) I've put this combinator to work in several apps for work this way.

@3noch
Copy link

3noch commented Oct 26, 2015

@alpmestan Of course, I just wish I had found it on hackage instead of a GitHub issue. Also, I could submit a PR to a repo if it existed. 😀

@alpmestan
Copy link
Contributor Author

Yeah we should probably put some page together with combinators like this one and some instances. On the github wiki or in the servant-examples package or the site or something?

@bb010g
Copy link

bb010g commented Nov 27, 2015

👍

@i-am-the-slime
Copy link

Is this still the best way to achieve file upload?

@alpmestan
Copy link
Contributor Author

I'm afraid so, but the work done by @fizruk here could probably be merged eventually, once the issues raised there are fixed. I'm sure if one or more people give this PR some love, that can happen :)

@i-am-the-slime
Copy link

I have it working. I am trying to capture the text and upload at the same time. How do I do that:
:<|> "plants" :> Capture "plant_id" Text :> "pictures" :> Capture "picture_id" Text :> FilesTmp :> Post '[JSON] () ? This does not seem to work.

@codedmart
Copy link
Contributor

@i-am-the-slime Did you look at this version: #133 (comment). It handles the form data and the files.

@i-am-the-slime
Copy link

Thanks @codedmart it works. However for a test I now seem to need a HasClient instance. How and where should I define that?

@alexanderkjeldaas
Copy link
Contributor

Is there a version of this that works with 0.5?

@soenkehahn
Copy link
Contributor

@schell: Yes, that's correct.

@gdeest
Copy link
Contributor

gdeest commented Aug 2, 2016

I tried to update the combinator above for Servant 0.8. As far as I can tell, the change is not that simple as the route method in the HasServer class does not have access to the request object anymore.

Any suggestion as to how I might proceed ?

@gdeest
Copy link
Contributor

gdeest commented Aug 2, 2016

Nevermind, looking at Servant.Server.Internal gave me the answer. I still hope this combinator will be officially included in Servant, though !

@alpmestan
Copy link
Contributor Author

Well, it's "just" a matter of writing all interpretations for it =) The annoying thing is that it basically just is a content type, but one that needs IO to decode from, and it doesn't feel right. Those are the 2 main reasons this isn't shipped in servant today.

@bb010g
Copy link

bb010g commented Aug 4, 2016

There's no way around IO without avoiding parseRequestBody though, right?

@alpmestan
Copy link
Contributor Author

right. and we do want some IO because files get created using that functionand the /tmp backend.

@soenkehahn
Copy link
Contributor

Well, it's "just" a matter of writing all interpretations for it =)

I think this is the wrong approach. If we only allow new combinators to pop up when they provide instances for all core interpretations, we're basically asking people to write code that they don't even want to use themselves. Which we should strongly avoid.

@gaeldeest (or anyone else): Have you considered publishing this combinator as a separate package? Then you could just include the one (servant-server) interpretation that you care about right now. And people wouldn't have to copy the combinator from this issue discussion into their code to use it.

Re: Allowing combinators to do IO: Even if most combinators don't want (and shouldn't) use IO I think we should still allow IO if there's one good use-case for that. (Which file-upload is.) So I'd be in favor of changing that.

@relrod
Copy link

relrod commented Aug 17, 2016

+1, please publish this as a package or get it merged into servant and worry about the other instances later. Pretty please!

@jkarni
Copy link
Member

jkarni commented Aug 18, 2016

Having played around with @fizruk 's branch, I now think allowing unrender to do IO is the wrong approach, for the following reasons:

  • We're skating on lazy-IO thin ice, since we have very little control or understanding of how much of the request body will be in memory at any point
    • We can't use wai-extra, since even it's more internal functions expect an IO strict bytestring, and if we convert our lazy bytestring to strict I'm pretty sure we'd be putting everything into memory.
    • The content-type doesn't necessarily get to decide the storage (e.g., it should be possible to keep even mutlipart-formdata in memory, and other content-types in disk).
    • Doing IO in mimeUnrender is anyhow ugly conceptually.

So I'm jumping ships and saying that the approach @alpmestan outlined is better.

@fizruk
Copy link
Member

fizruk commented Aug 19, 2016

@jkarni I am not sure which of @alpmestan's approaches you refer to.
It appears to me that in this thread he hoped #343 will be merged eventually.

What are the alternatives to IO-enabled mimeUnrender?

@soenkehahn
Copy link
Contributor

@fizruk: I'm also slightly confused, but I think @jkarni is referring to the topmost comment here in this issue. I haven't tried it myself, but it looks like it works without #343.

@alpmestan
Copy link
Contributor Author

The alternative is to just have a Files combinator and not pretend we support the multipart content type. Basically provide just enough to enable users to have file upload. @jkarni correct me if I'm wrong.

@codedmart
Copy link
Contributor

Multipart needs to be supported though.

@alpmestan
Copy link
Contributor Author

alpmestan commented Aug 20, 2016

Right, I didn't mean that we shouldn't support multipart at some point. But I think people really just want the code shown in this ticket to be available on hackage and ready to use in any servant app, for now. Later on, when someone feels brave enough, we could have proper multipart support but this means reimplementing the code from wai-extra, even though there's still the question then of how to do it without IO.

@codedmart
Copy link
Contributor

@alpmestan As long as I have the ability to still handle mutlipart myself then I am good with whatever.

@alexanderkjeldaas
Copy link
Contributor

Will the solution discussed here support the servant-client case where I want to do:

  $ curl -XPOST 'https://somewhere/foo' \
   -i -L \
   -H "Authorization: Bearer $TOKEN" \
   -H "Content-Type: audio/wav" \
   --data-binary "@sample.wav"

@bollmann
Copy link

bollmann commented Oct 12, 2016

Hi,

I've been playing with @alpmestan's approach for handling file uploads using the suggested Files combinator. A minimal-complete example that is (partially) working for me on servant-8.1 is the following slight variation of the code snippets already posted in this thread:

{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE OverloadedStrings     #-}
{-# LANGUAGE DataKinds             #-}
{-# LANGUAGE TypeOperators         #-}
{-# LANGUAGE ScopedTypeVariables   #-}
{-# LANGUAGE TypeFamilies          #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE RecordWildCards       #-}
module Main where

import           Control.Monad.Trans.Resource
import           Control.Monad.Except

import           Data.Monoid
import qualified Data.ByteString.Lazy as LS
import           Data.Text (Text)
import qualified Data.Text as Text
import qualified Data.Text.Encoding as Text

import           Lucid

import           Network.Wai.Parse
import           Network.Wai.Handler.Warp hiding (FileInfo)
import           Servant
import           Servant.HTML.Lucid
import           Servant.Server.Internal
import           System.Directory
import           System.FilePath

data Mem
data Tmp

class KnownBackend b where
  type Storage b :: * -- associated type family

  withBackend :: Proxy b -> (BackEnd (Storage b) -> IO r) -> IO r

instance KnownBackend Mem where
  type Storage Mem = LS.ByteString

  withBackend Proxy f = f lbsBackEnd

instance KnownBackend Tmp where
  type Storage Tmp = FilePath

  withBackend Proxy f = runResourceT . withInternalState $ \s ->
    f (tempFileBackEnd s)

data Files b

type MultiPartData b = ([Param], [File (Storage b)])

instance (KnownBackend b, HasServer sublayout config)
      => HasServer (Files b :> sublayout) config where

  type ServerT (Files b :> sublayout) m =
    IO (MultiPartData b) -> ServerT sublayout m

  route Proxy config subserver =
    route psub config (addBodyCheck subserver check)
    where
      psub  = Proxy :: Proxy sublayout
      pbak  = Proxy :: Proxy b
      check = withRequest $ \request -> return $ withBackend pbak $
        \backend -> parseRequestBody backend request

type API = "form" :> Get '[HTML] (Html ())
      :<|> "upload" :> Files Mem :> Post '[PlainText] Text

-- | Some form to upload an image and a video file.
formHandler :: ExceptT ServantErr IO (Html ())
formHandler = return $
  html_ $ do
    head_ (title_ "upload files test!")
    body_ $ do
      h1_ "upload files test"
      form_ [ action_ "/upload"
            , method_ "POST"
            , enctype_ "multipart/form-data" ] $ do
        input_ [ type_ "file", name_ "media" ]
        input_ [ type_ "submit"
               , name_ "send"
               , value_ "Submit Data!" ]

-- | Handle the uploaded image and video files.
uploadHandler :: IO (MultiPartData Mem) -> ExceptT ServantErr IO Text
uploadHandler multipart = do
  liftIO $ putStrLn "handling file upload..."
  (params, files) <- liftIO multipart

  -- when using `MultiPartData Mem`, use `wrFile` to write the files from memory to disk.
  -- when using `MultiPartData Tmp`, use `cpFile` to copy the temporarily uploaded files 
  -- to some other location.
  liftIO $ mapM_ wrFile files
  return $
    "params:\n" <> Text.intercalate "\n" (map ppParam params) <> "\n" <>
    "files:\n"  <> Text.intercalate "\n" (map ppFile files)
  where
    ppParam (name, val) =
      "  name  = " <> Text.decodeUtf8 name <> "\n" <>
      "  value = " <> Text.decodeUtf8 val  <> "\n"

    ppFile (paramName, FileInfo{..}) =
      "  parameter name = "  <> Text.decodeUtf8 paramName       <> "\n" <>
      "  fileName = "        <> Text.decodeUtf8 fileName        <> "\n" <>
      "  fileContentType = " <> Text.decodeUtf8 fileContentType <> "\n" <>
      "  fileContent = "     <> "..."                           <> "\n"
    wrFile (_, FileInfo{..}) = LS.writeFile newFileName fileContent
      where newFileName = (Text.unpack . Text.decodeUtf8) fileName <.> "copy"
    cpFile (_, FileInfo{..}) = copyFile oldFileName newFileName
      where
        oldFileName = fileContent
        newFileName = (Text.unpack . Text.decodeUtf8) fileName <.> "copy"

server :: Server API
server = formHandler :<|> uploadHandler

main :: IO ()
main = run 8888 (serve (Proxy :: Proxy API) server)

However, the above snippet exhibits the following two issues:

  1. While the code works for MultiPartData Mem (and using the wrFile function in uploadHandler), it doesn't seem to work for MultiPartData Tmp (and using the cpFile helper in uploadHandler). For the latter, a file seems to get uploaded temporarily to the /tmp directory, but by the time cpFile in uploadHandler wants to copy it to some other location, the file doesn't exist anymore, thus causing cpFile to fail with a source file does not exist exception.
  2. Even in the MultipartData Mem setting, the above snippet seems only able to upload files up to a certain size. For example, uploading video files with sizes between 50-100MB just silently fails (not even hitting the uploadHandler at all).

Has anyone observed either one of these issues? And if so, does anyone know what is happening here and, at best, how to resolve and overcome any of the issues?

Thoughts on this would be great!

@alpmestan
Copy link
Contributor Author

In my first shot at file upload, withBackend used to wrap the execution of the entire multipart decoding + disk writing + handler chain. If you take a look at the code you can see that the file is registered for deletion, and it's not exactly clear to me what the guarantees are wrt the lifetime of the uploaded file. Ideally, we would need a saner implementation of multipart that would be a good old content type or at least an implementation that either:

  • lets the handler decide when/if the uploaded file from /tmp should be deleted
  • deletes the file in /tmp at the end of the execution, if it's still there, in which case a user really should copy or move the said file inside the handler ; otherwise the content is lost.

In your case, AFAICT, we're only guaranteed that the tmp file is there during the decoding and writing to /tmp. I really wish we had a solid multipart encoding/decoding library; this would allow:

  • proper support for /tmp backend that doesn't have the flaws I've mentionned above
  • file upload support in servant-client! we could hand a few FilePath or (Filename, FileContent) or something like that to the client function and it would do everything for us "automagically"

There's a possibility that I am misunderstanding the issue, but that looks like the most plausible explanation to me.

@rimmington
Copy link

@alpmestan @bollmann The temp files are deleted at the end of runResourceT, which is why the implementation in #133 (comment) uses the MultiPartDataT type. I'm currently doing something similar.

@bollmann
Copy link

@alpmestan, @rimmington: Thanks for your replies! Indeed, using the MultiPartDataT data allows one to use the Tmp backend as well, which solves my first problem. Furthermore, my second problem was unrelated to the Files combinator, but rather had to to with my improper use of BasicAuth, so nvm. :-)

@alpmestan
Copy link
Contributor Author

alpmestan commented Oct 16, 2016

For the record, I've been working on packaging up the multipart/form-data-powered upload. It does a little bit more than all the code we've written in this ticket and doesn't have the issue reported by @bollmann, without exposing a continuation. I still have a few things to add there and have to think about making this as nice and simple to use as possible, but I'll drop a comment here once it's ready. The repo's here.

Hopefully we'll soon be able to close one of (if not the) oldest open issues in the tracker =)

@pmiddend
Copy link

pmiddend commented Jan 6, 2017

Any updates on this feature? I'd really like to have file uploads in servant!

@alpmestan
Copy link
Contributor Author

Well, I'd appreciate feedback on https://github.com/haskell-servant/servant-multipart if anyone has got some time for looking at it. I might get back to it soon and add support for in-memory handling of file upload and cut a first release. It requires these patches for servant, which I yet have to wrap up and add tests for. If anyone wants to give some feedback or even help with these tasks, that'd be very much appreciated :)

@phadej phadej modified the milestone: 0.10 Jan 13, 2017
@alpmestan
Copy link
Contributor Author

For the record, servant got the necessary patches and https://github.com/haskell-servant/servant-multipart should be ready for use. It's missing support for in-memory uploads but could be released as it stands.

@jkarni
Copy link
Member

jkarni commented Mar 1, 2017

It looks good! I'd say release it! Though maybe it should be split into servant-multipart and servant-multipart-server (so that afterwards we can add -client, -docs, etc. in a backwards compatible way)?

As for the feedback - it's well documented and the API seems very nice! I'd mention something about ReqBody not being usable in conjunction with this on the same path. Also, tests :)

Primarily, though, release it!

@alpmestan
Copy link
Contributor Author

Uploaded as it is on hackage! Here

I hope nobody here minds that I close this issue, after having started it almost 2 years ago. Any multipart/upload related discussion should now happen on the issue tracker for servant-multipart, over here. Thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests