
Blind Archive support. #20

Open
ghost opened this issue Jul 29, 2016 · 18 comments

Comments

ghost commented Jul 29, 2016

I wanted to create archives through a Handle, independent of the zip module's own file handling. I believe what I have is working currently. If you want, I can create a pull request; if you want to see the code, it is in my repo.

{-# LANGUAGE BangPatterns      #-}
{-# LANGUAGE OverloadedStrings #-}

import Codec.Archive.Zip
import Path (parseRelFile)
import System.Directory (removeFile)
import System.IO

main :: IO ()
main = do
  let rubbishFileName = "rubbishfile"
  h <- openFile rubbishFileName ReadWriteMode
  removeFile rubbishFileName  -- unlink the name; the open Handle keeps the data alive
  hSetBinaryMode h True

  !leftovers <- createBlindArchive h $ do
    setArchiveComment "This archive is just a test"
    parseRelFile "./lmn/foo" >>= mkEntrySelector >>= addEntry Store "this is the file content"

  hSeek h AbsoluteSeek 0  -- rewind so the archive can be read back

  arch <- openFile "archive.zip" ReadWriteMode
  hSetBinaryMode arch True

  hGetContents h >>= hPutStr arch

  hClose arch
  print leftovers

I can safely write data to the archive without actually exposing it in the filesystem unless I want to. The hPutStr could just as well write to a socket, or to a conduit feeding an httpd service, etc.

ghost commented Jul 29, 2016

For more possibilities, such as passing handles to and from other processes that may have privileged access to archives the current process lacks, see: http://blog.varunajayasiri.com/passing-file-descriptors-between-processes-using-sendmsg-and-recvmsg

ghost commented Jul 29, 2016

Todo: add blindCopyEntry so an open Handle to another archive can be solicited for information.
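
For reference, a sketch of what that interface might look like, modeled on the library's existing copyEntry but reading through an already-open Handle. The name and signature are only a proposal, not implemented code:

import Codec.Archive.Zip (EntrySelector, ZipArchive)
import System.IO (Handle)

-- Proposed, not implemented: copy an entry out of another archive,
-- reached through an open Handle, into the archive being edited.
blindCopyEntry
  :: Handle         -- open Handle to the source archive
  -> EntrySelector  -- entry to copy from the source
  -> EntrySelector  -- name it should get in the target archive
  -> ZipArchive ()
blindCopyEntry = error "proposal sketch only"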

mrkkrp commented Jul 30, 2016

And where are the archive contents until you write them into a file? In memory?


I'm not sure what is going on in your example, but the approach seems hackish.

ghost commented Aug 1, 2016

No hack at all. The contents are in the filesystem. As long as the handle is held open by at least one thread, the data remains in the filesystem; no memory is involved. It is no different from any other open file, except that, because of the unlink (the removeFile), there is no directory reference to the file.

As soon as the Handle is closed, or the thread/process exits, the file contents are freed by the filesystem. No cleanup is necessary.

This leaves one free to create an archive on the fly in a blind/anonymous file. The file can be read from or written to by any process or thread that has access to the Handle, which includes passing the Handle to other processes on the OS via sockets.

There is nothing new or 'hackish' about this idiom. It has been around for decades.

There are other applications for Handle passing via OS sockets that need not involve unlinking the file from the directory structure: for example, a server can hand restricted archives to an unprivileged process by making the Handle available over an OS socket, with no copy of the data required.
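
As a minimal sketch of the whole lifecycle (the helper name withAnonymousFile is mine, purely illustrative, not part of any library):

import Control.Exception (bracket)
import System.Directory (removeFile)
import System.IO

-- Illustrative helper: the file's data exists on disk only while the
-- Handle is open; hClose (or thread/process death) lets the filesystem
-- reclaim it, so no explicit cleanup is ever needed.
withAnonymousFile :: FilePath -> (Handle -> IO a) -> IO a
withAnonymousFile path = bracket acquire hClose
  where
    acquire = do
      h <- openFile path ReadWriteMode
      removeFile path      -- unlink the directory entry; the inode survives
      hSetBinaryMode h True
      pure h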

ghost commented Aug 1, 2016

mrkkrp commented Aug 1, 2016

Thank you, I'll look into that.

ghost commented Aug 1, 2016

ghost commented Aug 1, 2016

Haskell has had support for Handle/fd passing via sockets for many years.

https://hackage.haskell.org/package/network-2.6.3.1/docs/Network-Socket.html#g:10
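
A rough sketch of how that fits together (the helper names are mine; note that handleToFd closes the original Haskell-side Handle as a side effect, which is fine when the descriptor lives on in another process):

import Network.Socket (Socket, recvFd, sendFd)
import System.IO (Handle)
import System.Posix.IO (fdToHandle, handleToFd)
import System.Posix.Types (Fd (..))

-- Ship a Handle's underlying file descriptor over a Unix domain socket.
sendHandle :: Socket -> Handle -> IO ()
sendHandle sock h = do
  Fd fd <- handleToFd h  -- NB: this closes the Haskell-side Handle
  sendFd sock fd

-- Receive a file descriptor and wrap it back up as a Handle.
recvHandle :: Socket -> IO Handle
recvHandle sock = do
  fd <- recvFd sock
  fdToHandle (Fd fd)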

ghost commented Aug 1, 2016

What follows is a working piece of code that uses createBlindArchive to create an archive from database documents and then serves the archive for download via Yesod. Once hClose runs, the archive file vanishes from the filesystem. Had an exception prevented hClose from being reached, the archive file and its contents would still have vanished as soon as the thread died.

{-# LANGUAGE OverloadedStrings #-}

import qualified Blaze.ByteString.Builder as BB
import Codec.Archive.Zip
import Control.Monad (forM_)
import Control.Monad.IO.Class (liftIO)
import Data.ByteString (ByteString)
import Data.Conduit (Flush (..), Source, await, yield, (=$=))
import Data.Conduit.Binary (sourceHandle)
import Data.Time (UTCTime)
import Path (parseRelFile)
import System.Directory (removeFile)
import System.IO
import Yesod.Core

data Document = Document
  { documentName :: FilePath
  , cronos       :: UTCTime
  }

download :: FilePath -> [(Document, ByteString)] -> HandlerT site IO TypedContent
download archivePath documents = do
  h <- liftIO $ do
    h <- openFile archivePath ReadWriteMode
    removeFile archivePath  -- unlink: the archive never appears in the directory tree
    hSetBinaryMode h True

    createBlindArchive h $ do
      setArchiveComment "This archive was created by Me!"
      forM_ documents $ \(doc, payload) -> do
        es <- mkEntrySelector =<< parseRelFile (documentName doc)
        setModTime (cronos doc) es
        addEntry Store payload es
    hSeek h AbsoluteSeek 0  -- rewind so streaming starts at the first byte
    pure h

  respondSource "application/zip" (handleToBuild h)

-- Stream the finished archive from the Handle, closing it (and thereby
-- freeing the unlinked file) once the source is exhausted.
handleToBuild :: Handle -> Source (HandlerT site IO) (Flush BB.Builder)
handleToBuild h = sourceHandle h =$= lumps
  where
    lumps =
      maybeM (liftIO $ hClose h)
             (\b -> yield (Chunk (BB.insertByteString b)) *> lumps)
        =<< await

maybeM :: Applicative m => m b -> (a -> m b) -> Maybe a -> m b
maybeM _             action (Just a) = action a
maybeM defaultAction _      Nothing  = defaultAction

mrkkrp commented Aug 4, 2016

OK, you can go ahead with the PR, but please preserve backward compatibility in the API.

ghost commented Aug 4, 2016

Absolutely! I already have the code and it passes all of the prior tests.

ghost commented Aug 7, 2016

Would you like me to delay the PR until I add a set of tests to the test suite, or just get the working code to you first?

mrkkrp commented Aug 7, 2016

@robertLeeGDM, let's first see what you've got.

ghost commented Jan 25, 2018

I thought this approach was about equal to the direct conduit approach of zip-stream, but I am realizing that this blind handle might solve the problem of computing the content length for populating an HTTP header before streaming the zip (sz <- liftIO (IO.hSeek h IO.SeekFromEnd 0 >> IO.hTell h), before seeking back to 0).
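
Spelled out as a small helper (a sketch; the name archiveSize is illustrative):

import System.IO

-- After createBlindArchive has finished writing, the end-of-file
-- position is the exact byte count of the archive: seek to the end,
-- record the position for a Content-Length header, then rewind so the
-- streaming source starts from byte 0.
archiveSize :: Handle -> IO Integer
archiveSize h = do
  sz <- hSeek h SeekFromEnd 0 >> hTell h
  hSeek h AbsoluteSeek 0
  pure sz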

ghost commented Jan 25, 2018 via email

ghost commented Jan 25, 2018 via email

ghost commented Jan 30, 2018

Memory usage is great in tests. I would emphasize to future users that the filesystem where the handle is created must have enough space for the largest zip file they expect to produce.

In the long run, an approach that doesn't use a filesystem at all, even a blind one, is probably more compatible with serving streaming zips from a web application. The drawback here is that for larger zip files users have to wait a long time before the download actually starts.

UPDATE:

  • http://gruffcode.com/2010/10/28/detecting-the-file-download-dialog-in-the-browser/ - to offset the delay before download starts
  • alternatively, one might be able to switch the HTTP response to chunked transfer encoding, avoiding the need to provide a computed content length for OS X browser downloads, but this seems like a worse user experience, as the download progress indicator can't provide any information

Update 2:
After a few months in production, one of our users' Chrome browsers gives up when the initial response takes too long, so I started implementing an async + browser-poll approach. My ideal would be to speed up zip generation and keep everything synchronous, but I am not sure whether I am constrained by the speed of writing buffers to disk. I haven't explored chunked transfer encoding yet.

@ghost
Copy link
Author

ghost commented Feb 5, 2018

We are stuck with the fact that zip was not created with streaming in mind. Zip is its own worst enemy when it comes to that.
