This repository has been archived by the owner. It is now read-only.

Is there currently a way to act on parts of a text file as they are downloaded? #141

Closed
whereswaldon opened this issue Jun 30, 2016 · 8 comments

Comments

@whereswaldon

whereswaldon commented Jun 30, 2016

I'm interested in streaming large text files and performing analysis on the text chunks as they come in. How would I go about doing this via IPFS?

@Ghoughpteighbteau

Ghoughpteighbteau commented Jul 5, 2016

I'm no expert here, but it really comes down to what you expect from a stream. There are examples out there of streaming video over IPFS via HLS. That's basically just chunking files up into component parts and then downloading them in sequence. I'm pretty sure IPFS doesn't support "streaming" as it were because it has to validate that the content matches its address.

I'm pretty sure this is not what you're trying to do.

If you have the ability to chunk your file up like HLS, then problem solved, chunk your file up and create your own inventory of chunks, download and analyze like that.
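
A minimal sketch of that chunk-and-inventory idea, in Python. The function names and the fixed chunk size are hypothetical, and the SHA-256 digest is only a stand-in for whatever hash `ipfs add` would print for each chunk; it is not an IPFS hash.

```python
import hashlib
from typing import Iterator, List, Tuple


def split_into_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive fixed-size slices of data (the last may be shorter)."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def build_inventory(data: bytes, chunk_size: int) -> List[Tuple[int, str, int]]:
    """Return (index, sha256 hex digest, length) for each chunk, in order.

    Assumption: the digest is a placeholder for the per-chunk hash you
    would record after adding each chunk to IPFS.
    """
    return [
        (index, hashlib.sha256(chunk).hexdigest(), len(chunk))
        for index, chunk in enumerate(split_into_chunks(data, chunk_size))
    ]
```

A consumer would then walk the inventory by index, fetch each chunk, and analyze it before moving to the next.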

If you don't, then you're going to have to analyze the chunks out of order, which means reconstructing the text document yourself.

~ $ ipfs object get QmZTWdd1ZXvCeAqa4qGmeWqth38mpkDtUGxFBKC2XCMmNM | jq
{
  "Links": [],
  "Data": "\b\u0002\u0012�\u0010## Some basics\n\nTo get started, we need to make sure ipfs has been initialized,\nif you havent done this yet:\n```\n$ ipfs init\n```\n\nNow, run the daemon:\n```\n$ ipfs daemon\n```\n\nNow that we have the daemon up, lets have some fun.\n\nBasic work with files in ipfs:\n```\n$ echo \"welcome to ipfs!\" > hello\n$ ipfs add hello\n```\n\nThat should have printed out something along the lines of:\n```\nadded qmxzzpcazv6tw1tvicf9poare9kkb1fwmzbvamytdwvshe hello\n```\n\nThat means that the file was successfully added into the ipfs datastore,\nand may be accessed through ipfs now.\n\nTo check, try:\n```\n$ ipfs cat qmxzzpcazv6tw1tvicf9poare9kkb1fwmzbvamytdwvshe\n```\n(Note: if your files hash was different in the first step, use your\nhash instead of mine)\n\n\nIf all went well, you should see the text from your file printed out to you!\n\nNow, lets try out a directory.\n```\n$ mkdir foo\n$ mkdir foo/bar\n$ echo \"hello\" > foo/bar/baz\n$ echo \"hello\" > foo/baz\n$ ipfs add -r foo\n```\n\nView all the things!\n```\n$ ipfs ls <hash foo>\n$ ipfs ls <hash foo>/bar\n$ ipfs cat <hash foo>/bar/baz\n$ ipfs cat <hash foo>/baz\n```\n\nSo, that lets you explore the ipfs filesystem pretty much in the same way you\nwould explore a standard unix filesystem (like ext4 or zfs). Now, lets do a few\nslightly more interesting things. `ipfs refs` will allow you to view blocks that\nare associated with a given hash. Lets try it out with the `foo` directory \nstructure we just made.\n\n```\n$ ipfs refs <hash foo>\n$ ipfs refs -r <hash foo>\n```\nNote that the `-r` option output not just the direct children of foo, but all\nof its decendants all the way down. 
`ipfs refs` has a few other really\ninteresting options, to learn more about them, run `ipfs refs --help`.\n\n\nAs you have seen `ipfs cat` is a great command to quickly retrieve and view\nfiles, but if the file you are requesting contains binary data (such as an image\nor movie) `ipfs get` might be more appropriate:\n```\n$ ipfs get -o cats.png <hashofcatpic>\n```\n\nThis will create a file named 'cats.png' that contains the data from the\ngiven hash.\n\nBy [whyrusleeping](http://github.com/whyrusleeping)\n\u0018�\u0010"
}

You'll have to manually follow the links and put the chunks in order, and deal with any multi-byte Unicode characters that end up split across chunk boundaries.
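
One way to handle the Unicode boundary problem, assuming the text is UTF-8 and the chunks have already been put in order, is an incremental decoder that holds back an incomplete character until the next chunk arrives. This is a rough sketch; `decode_chunks` is a hypothetical helper, not part of IPFS.

```python
import codecs
from typing import Iterable, Iterator


def decode_chunks(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode in-order byte chunks as UTF-8, buffering any partial character."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Final flush: raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

This only works when the chunks are fed in file order; chunks analyzed out of order would need their edge bytes stitched together separately.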

If I'm insane someone let me know. 😗

@Kubuxu

Kubuxu commented Jul 5, 2016

You can also use the files API.

ipfs files cp /ipfs/QM...AAA /myfile.txt
ipfs files read --offset=N --count=M /myfile.txt

Then you will read M bytes at offset N.
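
A hedged Python sketch of stepping through a file with those two commands, one window at a time. The `--offset` and `--count` flags are the ones shown above; the helper names are hypothetical, and the assumption that a read at the end of the file returns empty output rather than an error has not been verified here.

```python
import subprocess
from typing import Callable, Iterator


def read_range_via_cli(path: str, offset: int, count: int) -> bytes:
    """Run `ipfs files read --offset=N --count=M PATH` and return its stdout."""
    result = subprocess.run(
        ["ipfs", "files", "read", f"--offset={offset}", f"--count={count}", path],
        check=True,
        capture_output=True,
    )
    return result.stdout


def iter_windows(
    path: str,
    window: int,
    read_range: Callable[[str, int, int], bytes] = read_range_via_cli,
) -> Iterator[bytes]:
    """Yield successive windows of up to `window` bytes, stopping at a short or empty read."""
    offset = 0
    while True:
        data = read_range(path, offset, window)
        if not data:
            return
        yield data
        if len(data) < window:
            return
        offset += len(data)
```

Passing a different `read_range` callable makes it possible to try the loop without a running daemon.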

@whereswaldon
Author

whereswaldon commented Jul 5, 2016

@Ghoughpteighbteau Thanks, especially for the tip on HLS. I didn't realize that the video streaming wasn't inherently baked-in to IPFS.

@Kubuxu This only works after you have downloaded the file though, yes? Is there a way to do the same thing while the download is going?

@Ghoughpteighbteau

Ghoughpteighbteau commented Jul 5, 2016

The problem, I think, is that IPFS downloads sort of like BitTorrent: it chunks files up and constructs an acyclic graph of them. The data could theoretically be requested in order by manually traversing the graph, but that's on you. IPFS, like BitTorrent, is just going to grab any chunk whenever it's available. I think you have to write your own code on top of IPFS's plumbing if you want the chunks in sequence.
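
The manual traversal could look roughly like this in Python. It is a sketch over a hypothetical in-memory model where each node is a `(data, links)` pair and only leaf nodes carry file data; `get_node` stands in for whatever plumbing call fetches a node by its hash.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical node model: (data bytes, ordered list of child keys).
Node = Tuple[bytes, List[str]]


def iter_leaves_in_order(root: str, get_node: Callable[[str], Node]) -> Iterator[bytes]:
    """Depth-first, left-to-right walk that yields leaf data in file order."""
    stack = [root]
    while stack:
        key = stack.pop()
        data, links = get_node(key)
        if links:
            # Push children reversed so the leftmost child is popped first.
            stack.extend(reversed(links))
        else:
            yield data
```

Because the walk fetches one node at a time, each leaf can be analyzed as soon as it is yielded, before the rest of the graph has been retrieved.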

Now that I've written all that, I guess it is possible. :shipit:

@Kubuxu

Kubuxu commented Jul 5, 2016

@whereswaldon It will not download the whole file, only the chunks you access with ipfs files read. ipfs files cp is a zero-cost operation; it doesn't perform any download.

@whereswaldon
Author

whereswaldon commented Jul 5, 2016

@Ghoughpteighbteau I actually don't care about the order of the chunks for my particular use case. I just want to act on them as they arrive. I'm potentially working with gigabytes of text, and I'd like to start processing chunks as soon as they arrive.
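
A sketch of what order-independent processing might look like in Python, assuming the chunk keys are already known and that `fetch` (a hypothetical callable) retrieves one chunk's bytes. The byte histogram is just an example of an analysis whose result does not depend on arrival order.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def count_bytes_as_they_arrive(
    keys: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 4,
) -> Counter:
    """Fetch chunks concurrently and fold each into a byte histogram on arrival."""
    totals: Counter = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, key) for key in keys]
        for future in as_completed(futures):
            # Counter updates commute, so arrival order does not change the result.
            totals.update(future.result())
    return totals
```

Analyses that look at words or lines would still need to handle tokens that straddle two chunks.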

@Kubuxu Oh, okay. I'm not sure that this fits my particular use case, since doing that would require many sequential requests, but it's an excellent example of how to stream data where order matters. Since I don't care much about order, I think the other approach is somewhat more promising. Thanks for bringing this up, though. I wonder now whether you could build a streaming service just on top of that functionality.

@whyrusleeping

whyrusleeping commented Jul 5, 2016

We are going to have a rudimentary pub-sub mechanism soon that will allow 'live' streaming of data. Most of the code for this is there, but we're not solid on the API yet, so it hasn't been merged.

@flyingzumwalt
Contributor

flyingzumwalt commented May 23, 2017


5 participants