
coreunix, mfs, and unixfs are closely coupled to merkledag #5488

Open
rob-deutsch opened this issue Sep 19, 2018 · 13 comments

@rob-deutsch
Contributor

rob-deutsch commented Sep 19, 2018

Background

I have been hacking away on IPFS, trying to build an application on top of it, but I've found it quite challenging. In this issue I'd like to share my observations and get feedback.

I'm trying to build a filesystem-like application, so I'm trying to utilise as much of go-ipfs/core/coreunix, go-mfs, go-unixfs, and go-merkledag as possible. It's desirable for all these packages to be utilised, because they are handy building blocks for other IPFS app devs.

My observation

My most challenging requirement is to use my own custom versions of go-merkledag.ProtoNode and go-merkledag.RawNode. This is not as easy as it should be.

What I'm seeing is that go-ipfs and its dependencies are tightly coupled to the go-merkledag.ProtoNode and go-merkledag.RawNode types, and this is preventing me from using my own custom types.

I've seen this occur in two ways:

  1. Many places in the codebase specifically build ProtoNodes and RawNodes. It's tricky to have them build anything else.

  2. Both ProtoNode and RawNode are often wrapped in the restrictive go-ipld-format.Node type, and are often cast back. This is done either by explicitly casting to a type, or by a switch statement that errors if the ipld.Node isn't a ProtoNode or RawNode.

How I'm handling it

I've spent many hours digging through the codebase, trying to implement various solutions, to no significant success. 30 minutes ago I realised what my 'best' option was: use the existing IPFS codebase to generate ipld.Node interfaces, then pass them to my code, which will implement the same not-ideal type casting to ProtoNodes and RawNodes, which I can then translate into my own nodes...

This is really not ideal, because I'll end up having to reinvent a lot of wheels, such as the way that coreunix.Adder (which I'll use to build my ipld.Nodes) interacts with a Pinner, Blockstore, and maybe even DAGService.

Any thoughts?

@magik6k
Member

magik6k commented Sep 19, 2018

Hopefully in the near future, go-ipfs/core/coreunix will be mostly replaced by coreapi.Unixfs() (from core/coreapi).

Can you tell us a bit more about your use case and why you need to dive into those lower-level layers? (Just a lack of a unified interface?)

@schomatis
Contributor

Agreed with pretty much everything you say, also interested in knowing a bit more about your use case and the code you're building.

@rob-deutsch
Contributor Author

rob-deutsch commented Sep 20, 2018

Thanks for the info on coreapi.Unixfs(). I just took a look at the code in master, which seems to be a work in progress. What's the best way to get up to speed on it? Is there a specific branch or issue to take a look at? (I searched but didn't come up with much.)

The use case is a tad tricky to explain concisely, because there are two ways it can be seen: 1) the thing I'm trying to build, and 2) the exact personal problem I'm trying to solve. But I'll give it a shot...

The short story

I want to build private AWS S3-style buckets on top of IPFS. I'm naming these 'vaults'. They're basically directories of files, and they're 'private' because they're encrypted.

Motivation

I like to share files between my 3 computers (cloud server, laptop, and phone). The cloud server obtains the files, but I want to have the files available on my laptop and phone.

My current setup is: the cloud server obtains the files and caches them, I periodically rsync them to my laptop, and I delete them from the cloud server. If I ever need to get them onto my phone it's a PITA.

There are two annoyances with getting them onto my phone: 1) they could be on my cloud server (if I haven't rsync'd yet) or my laptop, and 2) regardless of where they are, getting them to my phone is a pain.

IPFS could solve this natively, but these are private files, and I don't want other peers to read them. I could achieve this by restricting which peers my nodes will send the blocks to, but that's a pain. So let's use encryption.

The solution

I COULD achieve this by just encrypting the file before making them available on IPFS, but there are a few practical problems with this (e.g. hiding the filenames) and I want something that acts with minimal effort on my part (e.g. as transparent as possible).

So my idea was to reuse everything already in IPFS, but encrypt the blocks with AES256 in my Blockstore and when I send them out over Bitswap.

The implementation

To achieve this, I want all of my IPFS nodes to have a store of secrets. Each secret will be a tuple of (friendly name, secret, fingerprint of secret). In this example, let's say all 3 of my computers have a single secret like (myvault, aes256 key, sha256 of aes256 key).

Every block would then be encrypted into something like the following format (details TBD):

fingerprint of secret aes key , aes256(nonce , length of raw block , multicodec of block , raw block)

That way I can distribute my files between my 3 computers without worrying about anyone else getting a hold of the blocks. Want to request them from one of my nodes? Fine, go ahead, I don't care. They'll do that, but they'll also be contributing to the broader DHT etc.

How the implementation has gone so far

Adding to the go-ipfs code such that it could handle these encrypted blocks was easy(ish).

Adding to the go-ipfs code such that it would generate these encrypted blocks is very tough. coreunix.Adder just steams ahead creating ProtoNodes and RawNodes, taking their CIDs, and pushing them out to the network.

I want to yell at coreunix.Adder: "just tell me the raw data you want in the raw block, let me tell you what CID you should use to get that raw data (because it's actually going to come from an encrypted block), and then of course let me push it out to the BlockService".

@magik6k
Member

magik6k commented Sep 20, 2018

Have you tried setting custom fileAdder.CidBuilder? It seems to be what you want:

type Builder interface {
	Sum(data []byte) (Cid, error)
	GetCodec() uint64
	WithCodec(uint64) Builder
}

Example implementation: https://github.com/ipfs/go-cidutil/blob/master/inline.go

Integrating the read part will likely be much trickier, involving lots of poking in go-unixfs.

@schomatis
Contributor

I COULD achieve this by just encrypting the file before making them available on IPFS, but there are a few practical problems with this (e.g. hiding the filenames) and I want something that acts with minimal effort on my part (e.g. as transparent as possible).

Also, maybe this encryption layer can be of use: https://github.com/jbenet/ipfs-senc.

@rob-deutsch
Contributor Author

rob-deutsch commented Sep 20, 2018

Have you tried setting custom fileAdder.CidBuilder?

I have considered this, but it doesn't work. Firstly, in addition to a custom fileAdder.CidBuilder, I'll also need a custom BlockService/DAGService. This can be done, but the problem is that they both somehow need to know either:

a) A nonce that's added inside the encrypted block
b) An IV at the beginning of the block

The read part was actually easy to implement. I've already done it. It just required some additions to go-ipld-format, go-merkledag and an additional package I named go-ipld-aes. I made it so that go-unixfs just thinks it's dealing with regular ProtoNode/RawNode. Of course, this too would be much more elegant if parts of the codebase were decoupled from go-merkledag.

Also, maybe this encryption layer can be of use: https://github.com/jbenet/ipfs-senc.

I didn't know about ipfs-senc. Thanks!

Unfortunately, it's not what I want. It tars an entire directory into a single file.

I want to retain all of the cool DAG functionality of "normal" IPFS. I want to achieve this by just encrypting individual blocks. The main one I need in my use case is the ability to treat it as a folder that I can add files to without deleting the old files.

@schomatis
Contributor

I want to retain all of the cool DAG functionality of "normal" IPFS. I want to achieve this by just encrypting individual blocks. The main one I need in my use case is the ability to treat it as a folder that I can add files to without deleting the old files.

I don't fully understand (but you don't need to answer this) why this needs to be implemented at the block level and not at the UnixFS/MFS layers adding some kind of encrypted file type. Proto/raw nodes (seems to me) are more about how we cut a file up for convenience of transport and storage but I would encrypt the source (file) instead of the bit streams generated from it.

@rob-deutsch
Contributor Author

It could be done at that level, but I don't think it's the right way to do it.

I'm not entirely sure what type of implementation you've got in mind, but the biggest issue I see is "how do you also encrypt the file names?"

@schomatis
Contributor

but the biggest issue I see is "how do you also encrypt the file names?"

Good point, that is stored at the DAG level; you'd also need to implement your own type of MFS directory that would store the names of its files as part of its content instead of relying on lower layers. But yes, your project sounds more like an encrypted volume ("vault" as you call it) and the current code is not prepared for it. I would be interested in taking a look at your encrypting implementation (if you can share that part of your code).

@rob-deutsch
Contributor Author

@schomatis, do you mean my decryption implementation (built into go-ipld-format) or how I plan to actually encrypt blocks?

Also, is the plan to keep IPFS using ProtoNode and RawNode, or is it planned to move everything to cbor nodes?

@schomatis
Contributor

@schomatis, do you mean my decryption implementation (built into go-ipld-format) or how I plan to actually encrypt blocks?

Both; I used the term encrypt to mean encrypt/decrypt.

Also, is the plan to keep IPFS using ProtoNode and RawNode, or is it planned to move everything to cbor nodes?

I think those nodes won't be deprecated, but I can't say for sure.

@rob-deutsch
Contributor Author

rob-deutsch commented Sep 24, 2018

My POC is available here: rob-deutsch/go-merkledag/tree/poc/decrypt.

It's not too much code, so it's all in a single commit.

I can give the following summary:

  1. Previously, dagService.Get() called ipld.Decode() directly to turn a block into an ipld.Node. I've modified dagService.Get() so that it first checks whether the block is encrypted (multicodec 0x1337 for testing purposes), and decrypts it if required before it's passed to ipld.Decode().

  2. The decryption function func DecryptBlock(rawData []byte, repo keyStore) (multicodec uint64, plaintext []byte, err error) was added in a file called aes.go. The expectation is that the block's CIDv1 will have the 0x1337 multicodec, so the first part of the decrypted block is the multicodec of the decrypted payload (e.g. dag-pb or raw).

  3. All encrypted blocks are prepended with an SHA-256 hash of the encryption key. This is basically a fingerprint that we can use to determine if we have the encryption key. go-ipfs actually passes go-merkledag an interface built on top of the Repo which implements:

type keyStore interface {
	GetByHash(mh.Multihash) ([]byte, error)
}

@schomatis schomatis added the status/deferred Conscious decision to pause or backlog label Dec 13, 2018
@schomatis
Contributor

Moving to the backlog, I don't think there's anything we can do here at the moment.

@Stebalien Stebalien removed the status/deferred Conscious decision to pause or backlog label Dec 18, 2018
@momack2 momack2 added this to Inbox in ipfs/go-ipfs May 9, 2019