
Blind Archive support. #20

Open
ghost opened this issue Jul 29, 2016 · 18 comments

Comments

ghost commented Jul 29, 2016

I wanted to create archives through a Handle, independent of the zip module's own file handling. I believe what I have is working currently. If you want, I can create a pull request; if you want to see the code, it is in my repo.

{-# LANGUAGE BangPatterns      #-}
{-# LANGUAGE OverloadedStrings #-}

import Codec.Archive.Zip
import Path (parseRelFile)
import System.Directory (removeFile)
import System.IO

main :: IO ()
main = do
  let rubbishFileName = "rubbishfile"
  h <- openFile rubbishFileName ReadWriteMode
  removeFile rubbishFileName  -- unlink the name; the open Handle keeps the data alive
  hSetBinaryMode h True

  !leftovers <- createBlindArchive h $ do
    setArchiveComment "This archive is just a test"
    parseRelFile "./lmn/foo" >>= mkEntrySelector >>= addEntry Store "this is the file content"

  hSeek h AbsoluteSeek 0  -- rewind so the archive can be read back

  arch <- openFile "archive.zip" ReadWriteMode
  hSetBinaryMode arch True

  hGetContents h >>= hPutStr arch

  hClose arch
  print leftovers

I can safely write data to the archive without actually exposing it in the filesystem unless I want to. The hPutStr could just as well write to a socket, or to a conduit feeding an httpd service, etc.

ghost commented Jul 29, 2016

For more possibilities, such as passing handles to and from other processes that may have privileged access to archives the current process lacks, see: http://blog.varunajayasiri.com/passing-file-descriptors-between-processes-using-sendmsg-and-recvmsg

ghost commented Jul 29, 2016

Todo: add blindCopyEntry so an open Handle to another archive can be solicited for information.
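
For reference, a sketch of what that interface might look like, modeled on the library's existing copyEntry but reading through an already-open Handle. The name and signature are only a proposal, not implemented code:

import Codec.Archive.Zip (EntrySelector, ZipArchive)
import System.IO (Handle)

-- Proposed, not implemented: copy an entry out of another archive,
-- reached through an open Handle, into the archive being edited.
blindCopyEntry
  :: Handle         -- open Handle to the source archive
  -> EntrySelector  -- entry to copy from the source
  -> EntrySelector  -- name it should get in the target archive
  -> ZipArchive ()
blindCopyEntry = error "proposal sketch only"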

mrkkrp commented Jul 30, 2016

And where are the archive contents until you write them into a file? In memory?


I'm not sure what is going on in your example, but the approach seems hackish.

ghost commented Aug 1, 2016

No hack at all. The contents are in the filesystem. As long as the handle is held open by at least one thread, the data remains in the filesystem; no memory is involved. It is no different from any other open file, except that, because of the unlink (the removeFile), there is no directory reference to the file.

As soon as the Handle is closed, or the thread/process exits, the file contents are freed by the filesystem. No cleanup is necessary.

This leaves one free to create an archive on the fly in a blind/anonymous file. The file can be read from or written to by any process or thread that has access to the Handle, which includes passing the Handle to other processes on the OS via sockets.

There is nothing new or 'hackish' about this idiom. It has been around for decades.

There are other applications for Handle passing via OS sockets that need not involve unlinking the file from the directory structure: for example, a server can hand restricted archives to an unprivileged process by making the Handle available over an OS socket, with no copy of the data required.
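
As a minimal sketch of the whole lifecycle (the helper name withAnonymousFile is mine, purely illustrative, not part of any library):

import Control.Exception (bracket)
import System.Directory (removeFile)
import System.IO

-- Illustrative helper: the file's data exists on disk only while the
-- Handle is open; hClose (or thread/process death) lets the filesystem
-- reclaim it, so no explicit cleanup is ever needed.
withAnonymousFile :: FilePath -> (Handle -> IO a) -> IO a
withAnonymousFile path = bracket acquire hClose
  where
    acquire = do
      h <- openFile path ReadWriteMode
      removeFile path      -- unlink the directory entry; the inode survives
      hSetBinaryMode h True
      pure h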

ghost commented Aug 1, 2016

mrkkrp commented Aug 1, 2016

Thank you, I'll look into that.

ghost commented Aug 1, 2016

ghost commented Aug 1, 2016

Haskell has had support for Handle/fd passing via sockets for many years.

https://hackage.haskell.org/package/network-2.6.3.1/docs/Network-Socket.html#g:10
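
A rough sketch of how that fits together (the helper names are mine; note that handleToFd closes the original Haskell-side Handle as a side effect, which is fine when the descriptor lives on in another process):

import Network.Socket (Socket, recvFd, sendFd)
import System.IO (Handle)
import System.Posix.IO (fdToHandle, handleToFd)
import System.Posix.Types (Fd (..))

-- Ship a Handle's underlying file descriptor over a Unix domain socket.
sendHandle :: Socket -> Handle -> IO ()
sendHandle sock h = do
  Fd fd <- handleToFd h  -- NB: this closes the Haskell-side Handle
  sendFd sock fd

-- Receive a file descriptor and wrap it back up as a Handle.
recvHandle :: Socket -> IO Handle
recvHandle sock = do
  fd <- recvFd sock
  fdToHandle (Fd fd)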

ghost commented Aug 1, 2016

What follows is a working piece of code that uses createBlindArchive to create an archive from database documents and then serves the archive for download via Yesod. Once hClose runs, the archive file vanishes from the filesystem. Had an exception prevented hClose from being reached, the archive file and its contents would still have vanished as soon as the thread died.

{-# LANGUAGE OverloadedStrings #-}

import qualified Blaze.ByteString.Builder as BB
import Codec.Archive.Zip
import Control.Monad (forM_)
import Control.Monad.IO.Class (liftIO)
import Data.ByteString (ByteString)
import Data.Conduit (Flush (..), Source, await, yield, (=$=))
import Data.Conduit.Binary (sourceHandle)
import Data.Time (UTCTime)
import Path (parseRelFile)
import System.Directory (removeFile)
import System.IO
import Yesod.Core

data Document = Document
  { documentName :: FilePath
  , cronos       :: UTCTime
  }

download :: FilePath -> [(Document, ByteString)] -> HandlerT site IO TypedContent
download archivePath documents = do
  h <- liftIO $ do
    h <- openFile archivePath ReadWriteMode
    removeFile archivePath  -- unlink: the archive never appears in the directory tree
    hSetBinaryMode h True

    createBlindArchive h $ do
      setArchiveComment "This archive was created by Me!"
      forM_ documents $ \(doc, payload) -> do
        es <- mkEntrySelector =<< parseRelFile (documentName doc)
        setModTime (cronos doc) es
        addEntry Store payload es
    hSeek h AbsoluteSeek 0  -- rewind so streaming starts at the first byte
    pure h

  respondSource "application/zip" (handleToBuild h)

-- Stream the finished archive from the Handle, closing it (and thereby
-- freeing the unlinked file) once the source is exhausted.
handleToBuild :: Handle -> Source (HandlerT site IO) (Flush BB.Builder)
handleToBuild h = sourceHandle h =$= lumps
  where
    lumps =
      maybeM (liftIO $ hClose h)
             (\b -> yield (Chunk (BB.insertByteString b)) *> lumps)
        =<< await

maybeM :: Applicative m => m b -> (a -> m b) -> Maybe a -> m b
maybeM _             action (Just a) = action a
maybeM defaultAction _      Nothing  = defaultAction

mrkkrp commented Aug 4, 2016

OK, you can go ahead with the PR, but please preserve backward compatibility in the API.

ghost commented Aug 4, 2016

Absolutely! I already have the code and it passes all of the prior tests.

ghost commented Aug 7, 2016

Would you like me to delay the PR until I add a set of tests to the test suite, or just get the working code to you first?

mrkkrp commented Aug 7, 2016

@robertLeeGDM, let's first see what you've got.

ghost commented Jan 25, 2018

I thought this approach was about equal to the direct conduit approach of zip-stream, but I am realizing that this blind handle might solve the problem of computing the content length for populating an HTTP header before streaming the zip (sz <- liftIO (IO.hSeek h IO.SeekFromEnd 0 >> IO.hTell h), before seeking back to 0).
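
Spelled out as a small helper (a sketch; the name archiveSize is illustrative):

import System.IO

-- After createBlindArchive has finished writing, the end-of-file
-- position is the exact byte count of the archive: seek to the end,
-- record the position for a Content-Length header, then rewind so the
-- streaming source starts from byte 0.
archiveSize :: Handle -> IO Integer
archiveSize h = do
  sz <- hSeek h SeekFromEnd 0 >> hTell h
  hSeek h AbsoluteSeek 0
  pure sz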

ghost commented Jan 25, 2018 via email

ghost commented Jan 25, 2018 via email

ghost commented Jan 30, 2018

Memory usage is great in tests. I would emphasize to future users that the filesystem where the handle is created must have enough space for the largest zip file they expect to produce.

In the long run, an approach that doesn't use a filesystem at all, even a blind one, is probably more compatible with serving streaming zips from a web application. The drawback here is that for larger zip files users have to wait a long time before the download actually starts.

UPDATE:

  • http://gruffcode.com/2010/10/28/detecting-the-file-download-dialog-in-the-browser/ - to offset the delay before download starts
  • alternatively, one might be able to switch the HTTP response to chunked transfer encoding, avoiding the need to provide a computed content length for OS X browser downloads, but this seems like a worse user experience, as the download progress indicator can't provide any information

Update 2:
After a few months in production, one of our users' Chrome browsers gives up when the initial response takes too long, so I started implementing an async + browser-poll approach. My ideal would be to speed up zip generation and keep everything synchronous, but I am not sure whether I am constrained by the speed of writing buffers to disk. I haven't explored chunked transfer encoding yet.

@ghost
Copy link
Author

ghost commented Feb 5, 2018

We are stuck with the fact that zip was not created with streaming in mind. Zip is its own worst enemy when it comes to that.
