Encryption layer for IPLD #64
I’ve thought about this a bit and a few quick things to note:
So, an encryption program would do something like:
In the short term, this would be something like:
{
  type: 'encrypted',
  crypto: { toPublicKey, fromPublicKey, algorithm, settings },
  links: [ CID ], /* optional, some blocks will not contain links */
  data: Buffer /* original block data after encryption */
}
This is just a sketch, there’s probably something a bit more elegant we can do with the schema stuff @warpfork has done. But the place I’d like to get in the future once we can take advantage of WebAssembly is something like this:
{
  crypto: [ CID /* link to the WebAssembly program */, [ toPublicKey, fromPublicKey ]],
  data: Buffer,
  links: [] /* optional */
} |
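For illustration, here is a minimal sketch of how the short-term envelope above could be produced and opened. It is not the proposal's implementation: the choice of X25519 key agreement plus AES-256-GCM, the `algorithm` label, the layout of `settings`, and the helper names `sharedKey`, `encryptBlock` and `decryptBlock` are all assumptions made for this example, and it uses only Node's built-in `crypto` module.

```javascript
const crypto = require('crypto')

// Derive a shared AES-256 key from one party's private key and the other's public key.
function sharedKey(privateKey, publicKey) {
  const secret = crypto.diffieHellman({ privateKey, publicKey })
  return crypto.createHash('sha256').update(secret).digest()
}

// Public keys travel in the plain-text header as DER bytes (an assumption of this sketch).
const exportKey = key => key.export({ type: 'spki', format: 'der' })
const importKey = der => crypto.createPublicKey({ key: der, type: 'spki', format: 'der' })

// Build the { type, crypto, links, data } envelope; only `data` is encrypted.
function encryptBlock(data, links, from, toPublicKey) {
  const nonce = crypto.randomBytes(12)
  const key = sharedKey(from.privateKey, toPublicKey)
  const cipher = crypto.createCipheriv('aes-256-gcm', key, nonce)
  const body = Buffer.concat([cipher.update(data), cipher.final()])
  return {
    type: 'encrypted',
    crypto: {
      toPublicKey: exportKey(toPublicKey),
      fromPublicKey: exportKey(from.publicKey),
      algorithm: 'x25519-aes-256-gcm', // hypothetical label
      settings: { nonce, tag: cipher.getAuthTag() }
    },
    links,
    data: body
  }
}

// Everything needed to pick a key and decrypt is read from the plain-text header.
function decryptBlock(block, recipientPrivateKey) {
  const { fromPublicKey, settings } = block.crypto
  const key = sharedKey(recipientPrivateKey, importKey(fromPublicKey))
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, settings.nonce)
  decipher.setAuthTag(settings.tag)
  return Buffer.concat([decipher.update(block.data), decipher.final()])
}

const alice = crypto.generateKeyPairSync('x25519')
const bob = crypto.generateKeyPairSync('x25519')
const block = encryptBlock(Buffer.from('hello'), [], alice, bob.publicKey)
const plain = decryptBlock(block, bob.privateKey)
```

The `links` array stays in plain text here, which matches the idea that a block can be replicated by following links without the payload being readable.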
That is actually the goal. I want two layers of encryption / access:
This way you could elect a specific replicator to replicate the data without it accessing the data itself, and without having to build a second graph of data blocks.
I can see only one advantage of doing it this way, which is that it would not reveal the order in which data blocks were added. But I'm not sure that in itself provides enough benefit to justify having to sync the graph along with the whole list.
👍 That sounds great!
What about the keys needed to do the actual decryption? |
Never mind. In my head I was still thinking of two layers of encryption which is not what you're suggesting so this is probably irrelevant. |
I'm also realizing here that I'm biased towards the use case I've been thinking of, that is, a linked data feed, which is more of a linked list than a tree. That is why I'm not concerned with link names: they always just point to the tail of the list. If you do consider a graph, then concealing link names starts to matter. |
Yup. Also, keep in mind that node decryption is atomic, the decryptor is only ever concerned with a single block. This means that, even with a single layer of encryption, the links (both plain text and encrypted) are the same and they link to, presumably, blocks that are also encrypted, but the traverser doesn’t even know they are encrypted until it hits the next block. In other words, there will be no references in either encrypted or unencrypted data to the original unencrypted CIDs. The only thing this method allows someone to see without decryption keys is the shape of the graph. With enough modeling you could actually start to make assertions about the data just from the shape. However, this is easily overcome if we continue to do everything in IPLD in a block agnostic way (using only paths and selectors) because an encryption program could take a graph and produce a new graph at the block layer with identical graph information as far as IPLD paths and selectors are concerned, effectively obfuscating the shape of the data from the shape of the visible graph to replicators. |
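To make the "shape of the graph" point concrete, here is a small sketch of what a party without any keys can still do: walk the plain-text links and enumerate every reachable block. The in-memory `store`, the string identifiers standing in for CIDs, and the `reachable` helper are assumptions made for this example only.

```javascript
// In-memory block store; plain strings stand in for CIDs (an assumption of this sketch).
// `data` is opaque ciphertext as far as the walker is concerned.
const store = new Map([
  ['cid-root', { links: ['cid-a', 'cid-b'], data: Buffer.from('ciphertext-0') }],
  ['cid-a', { links: ['cid-c'], data: Buffer.from('ciphertext-1') }],
  ['cid-b', { links: [], data: Buffer.from('ciphertext-2') }],
  ['cid-c', { links: [], data: Buffer.from('ciphertext-3') }]
])

// Breadth-first walk that only ever reads `links`, never `data`.
function reachable(store, root) {
  const seen = new Set()
  const queue = [root]
  while (queue.length) {
    const cid = queue.shift()
    if (seen.has(cid)) continue
    seen.add(cid)
    const block = store.get(cid)
    if (!block) continue // block not available locally
    for (const link of block.links || []) queue.push(link)
  }
  return [...seen]
}

const order = reachable(store, 'cid-root')
// → ['cid-root', 'cid-a', 'cid-b', 'cid-c']
```

The walker learns the fan-out and depth of the graph, which is exactly the information the comment above suggests hiding by re-chunking the graph at the block layer.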
I made some progress on my encrypted data feeds, attempting to incorporate the suggestions made here. There are a few things I learned in the process that I would like to share and get feedback on:
/cc @vmx |
@Gozala the double encryption here is so that the data is completely obscured to the public but a replicator can access the links that need to be replicated but can't access the unencrypted data, right? |
Exactly!
Signatures allow consumers to verify that the feed is updated by its author (the owner of the feed's private key) and that the feed is linear (does not fork). This is important in a context where the feed represents the OPs of a CRDT (which is how I intend to use it with https://github.com/automerge/hypermerge). |
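As a rough sketch of the two checks described above (authorship and linearity), the following uses Ed25519 signatures from Node's built-in `crypto` module. The entry layout (`previous`, `size`, `content`), the JSON encoding, and the helper names `append` and `verify` are assumptions for this example, not the feed format used in the linked project.

```javascript
const crypto = require('crypto')

const digest = bytes => crypto.createHash('sha256').update(bytes).digest('hex')

// Append a signed entry that points at the current head of the feed.
function append(feed, content, privateKey) {
  const entry = { previous: feed.head, size: feed.size + 1, content }
  const bytes = Buffer.from(JSON.stringify(entry))
  const signature = crypto.sign(null, bytes, privateKey) // Ed25519 takes no digest name
  return {
    head: digest(bytes),
    size: entry.size,
    records: [...feed.records, { bytes, signature }]
  }
}

// A feed is valid if every entry is signed by the author and extends the previous one.
function verify(records, publicKey) {
  let head = null
  let size = 0
  for (const { bytes, signature } of records) {
    if (!crypto.verify(null, bytes, publicKey, signature)) return false
    const entry = JSON.parse(bytes.toString())
    // A fork or a gap shows up as a mismatched `previous` or `size`.
    if (entry.previous !== head || entry.size !== size + 1) return false
    head = digest(bytes)
    size = entry.size
  }
  return true
}

const author = crypto.generateKeyPairSync('ed25519')
const empty = { head: null, size: 0, records: [] }
const one = append(empty, 'first', author.privateKey)
const two = append(one, 'second', author.privateKey)
const forked = append(one, 'conflicting second', author.privateKey)
```

Here `two.records` verifies, while a list that contains both `two`'s and `forked`'s second entry does not, because the third record no longer extends the head.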
A few more thoughts:
In the example below, one could access the last message of the feed through the following path:
// Assume a promise-based API instead of a callback-based one.
// `nacl`, `multihashing`, `CID`, `resolver`, `ipld` and `dag` are assumed to be in scope.
const Secretbox = {
  multicodec: "dag-secretbox",
  util: {
    // Encrypt the message bytes with the given nonce and key
    async serialize({ message, nonce, key }) {
      return nacl.secretbox(message, nonce, key)
    },
    // Decrypt a box; the nonce and key come from the path, not from the block
    async deserialize(box, [nonce, key]) {
      return {
        message: nacl.secretbox.open(box, nonce, key),
        nonce,
        key
      }
    },
    async cid(node, options = {}) {
      const hashAlg = options.hashAlg || resolver.defaultHashAlg
      const version = typeof options.version === 'undefined' ? 1 : options.version
      const box = await Secretbox.util.serialize(node)
      const hash = await multihashing(box, hashAlg)
      return new CID(version, Secretbox.multicodec, hash)
    }
  },
  resolver: {
    async resolve(blob, path) {
      const [root, ...params] = path.split("/")
      switch (params.length) {
        case 0:
          return ["/nonce?/key?"]
        case 1:
          return ["/key?"]
        case 2: {
          // Both path segments are present: nonce first, then key
          const [nonce, key] = params
          return {
            value: await Secretbox.util.deserialize(blob, [nonce, key]),
            remainderPath: ""
          }
        }
        default:
          throw new Error('path out of scope')
      }
    },
    async tree(blob) {
      return ["/nonce?/key?"]
    }
  }
}
ipld.support.add(Secretbox.multicodec, Secretbox.resolver, Secretbox.util)
const publish = async (feed, data) => {
  // dag.inline encodes a node with the given codec and prefixes it with codec info
  const inlineMessage = await dag.inline({
    previous: feed.head,
    size: feed.size + 1,
    content: data
  }, "dag-cbor")
  // Inner layer: only subscribers can decrypt the message
  const message = await dag.put({
    nonce: feed.subscriber.nonce,
    key: feed.subscriber.secretKey,
    message: inlineMessage
  }, "dag-secretbox")
  const inlineBlock = await dag.inline({
    links: [feed.headCID, message],
    message
  }, "dag-cbor")
  // Outer layer: replicators can decrypt this to read the links, but not the message
  const block = await dag.put({
    nonce: feed.replicator.nonce,
    key: feed.replicator.secretKey,
    message: inlineBlock
  }, "dag-secretbox")
  // The author signs the outer encrypted block
  const signature = feed.author.sign(block)
  const head = await dag.put({ block, signature }, "dag-cbor")
  return { ...feed, head, size: feed.size + 1 }
}
const last = async (feed, n) => {
  // Each dag-secretbox hop consumes /nonce/key path segments
  const replicator = `${feed.replicator.nonce}/${feed.replicator.secretKey}`
  const subscriber = `${feed.subscriber.nonce}/${feed.subscriber.secretKey}`
  const path = `/block/${replicator}/message/${subscriber}/content`
  return await dag.get(feed.head, path)
} |
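The two layers in the example above can be tried out in isolation with a self-contained sketch. It swaps `nacl.secretbox` for AES-256-GCM from Node's built-in `crypto` module and uses JSON instead of dag-cbor. The helpers `seal`, `open` and `publishEntry`, and the string standing in for a CID, are assumptions made for this example.

```javascript
const crypto = require('crypto')

// Encrypt a JSON-serializable value into a JSON-friendly box.
function seal(key, value) {
  const nonce = crypto.randomBytes(12)
  const cipher = crypto.createCipheriv('aes-256-gcm', key, nonce)
  const plain = Buffer.from(JSON.stringify(value))
  const body = Buffer.concat([cipher.update(plain), cipher.final()])
  return {
    nonce: nonce.toString('base64'),
    tag: cipher.getAuthTag().toString('base64'),
    body: body.toString('base64')
  }
}

// Decrypt a box; throws if the key is wrong or the box was tampered with.
function open(key, box) {
  const nonce = Buffer.from(box.nonce, 'base64')
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, nonce)
  decipher.setAuthTag(Buffer.from(box.tag, 'base64'))
  const body = Buffer.from(box.body, 'base64')
  const plain = Buffer.concat([decipher.update(body), decipher.final()])
  return JSON.parse(plain.toString())
}

// Inner layer for subscribers, outer layer for replicators.
function publishEntry(content, links, subscriberKey, replicatorKey) {
  const message = seal(subscriberKey, { content })
  return seal(replicatorKey, { links, message })
}

const subscriberKey = crypto.randomBytes(32)
const replicatorKey = crypto.randomBytes(32)
const block = publishEntry('hello', ['cid-of-previous-entry'], subscriberKey, replicatorKey)

// A replicator opens the outer layer and sees the links plus an opaque message.
const outer = open(replicatorKey, block)
// A subscriber holding both keys can go one layer further.
const inner = open(subscriberKey, outer.message)
```

With only `replicatorKey`, `open(replicatorKey, outer.message)` throws, which is the property the double encryption is meant to provide.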
Nice, it might make sense to step back from the current selector conceptualisation and use the IR-style that's developing @ ipld/specs#95. It's got enough expressiveness to build in the kinds of parameters needed to transparently traverse encrypted blocks, including IVs/nonces and whatever else might be needed for a given encryption scheme.
Traversal involving encryption boxed blocks would just skip through them transparently. Whether or not there is a need to have a human-readable form of this and what that would look like could be deferred till later. |
I have not had a chance to look at the spec yet, but generally you can’t always defer human-readability: without it as a design constraint you may end up with a solution that doesn’t necessarily permit it, or where it feels like a clunky afterthought. I’ll read through the spec when I get a chance and provide more constructive feedback afterwards. |
I would argue that you do still want to encode the data with a specific codec. You want to put enough information in the block that a decryption program can figure out what key it needs to decrypt it. There isn’t enough information in the CID to do this. One of the principles in IPLD is to be “self describing.” By this, we mean that data should carry all the information necessary to interpret it without outside knowledge. If you had a block without a codec, effectively a Let me try saying this another way, in terms of layers. The “Block” is basically the lowest layer in the stack. It’s just a chunk of binary data, a matching hash, and a reference to a codec in order to interpret it. It’s important to note that even at the lowest layer we’ve encoded enough information in the Block to interpret it up to a point. If there is more information we need in order to further interpret the data then it should live in that decoded data. As you go a layer up the stack, for this encryption case I’d say we should just move directly to the IPLD Data Model, we have a set of types we support when decoding the block using a given codec. This is where I would implement encryption, and this is also where I think you need to make sure that enough information is encoded in plain text to know:
From there, you can build a self-describing encryption format on the IPLD Data Model rather than at the Block layer. |
@mikeal I think you may be misunderstanding what I was trying to say in the quoted message. I do agree with the proposed layering, and I agree that dag-secretbox should encode the info it needs to decode the message. What I think you're missing from my message is the following: there will be codecs that are more like "transcoders", if you will. Such a codec takes data in some format, say encoded in "dag-cbor", and encrypts it. The problem is that there is no standard way to pass in encoded data without losing information about the format. Sure, you can do it out of band, meaning my "dag-secretbox" may take node blocks like |
Why not just require that it be the same decoder? The reason the CID has all this information is so that you can link from one block to another and know how to interpret it. If the data is already in the block then just require it be using the same encoder, it’s not as though you’re pointing to an external reference. If a block is encoded in
let container = {
  _encryption: { nonce, publicKey, algo },
  _data: encrypt(dagCbor.encode({ foo: "i'm secret encrypted data" }), nonce, algo, privateKey)
}
let buffer = dagCbor.encode(container)
let block = new Block.from(buffer, 'dag-cbor') // or something, we are still debating this API
An implementation of a path traversal would have code in it that looked like this:
const decryptNode = async (node, format) => {
  let decrypt = findDecryptor(node._encryption.algo)
  let key = findPrivateKey(node._encryption.publicKey)
  let decode = findDecode(format)
  return decode(await decrypt(node._data, key, node._encryption.nonce))
}
const resolve = async (path, block) => {
  if (!Array.isArray(path)) path = path.split('/').filter(x => x)
  let node = await block.decode() // still discussing this API, but the more I look at it the more i like it
  if (node._encryption) {
    node = await decryptNode(node, block.format)
  }
  while (path.length) {
    // Consume one path segment per iteration
    const p = path.shift()
    if (node[p] === undefined) throw new Error('Not Found')
    node = node[p]
    if (CID.isCID(node)) return { value: node, remaining: path.join('/') }
  }
  return { value: node }
} |
Because then I need dag-cbor-secretbox, dag-pb-secretbox, etc... |
OK, so you're creating a requirement that the wrapper be encoded in the same encoding as the data that was encrypted. You could do that, but I think that is a bad requirement to have. What if the message at hand is a git object or something even more exotic? It seems strange to force the wrapper to have the same encoding. |
It's not that it's not doable. I'm already doing it by using multicodec and prefixing the encoded bytes before encryption (which also hides the format, something your proposed solution doesn't do), and on decode I find the corresponding decoder to decode the decrypted bytes. However, that introduces incidental complexity: dag-secretbox needs to know the format of the message, hence my argument that it would be better if it did not have to. That would be trivial to achieve by allowing inline links, and all the codecs would become free of that concern. An additional benefit is that it would allow freedom of data layout in the block, so you could actually represent things like this in an IPLD block Where messages can be in arbitrary format. |
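The "prefix the encoded bytes with their format, then look up the decoder on the way back" step can be sketched as follows. Real multicodec prefixes are varint-encoded codes from the multicodec table. This example instead uses a hypothetical one-byte length followed by a format name, and the `decoders` registry, `tagBytes` and `untagBytes` are likewise assumptions. Encryption is left out: in the scheme described above, the tagged bytes are what would be encrypted.

```javascript
// Hypothetical decoder registry keyed by format name.
const decoders = new Map([
  ['json', bytes => JSON.parse(bytes.toString('utf8'))],
  ['text', bytes => bytes.toString('utf8')]
])

// Prefix encoded bytes with the name of the format they were encoded in.
function tagBytes(format, bytes) {
  const name = Buffer.from(format, 'utf8')
  if (name.length > 255) throw new Error('format name too long')
  return Buffer.concat([Buffer.from([name.length]), name, bytes])
}

// Read the prefix, find the matching decoder, and decode the remainder.
function untagBytes(tagged) {
  const length = tagged[0]
  const format = tagged.subarray(1, 1 + length).toString('utf8')
  const decode = decoders.get(format)
  if (!decode) throw new Error(`no decoder registered for ${format}`)
  return { format, value: decode(tagged.subarray(1 + length)) }
}

const tagged = tagBytes('json', Buffer.from(JSON.stringify({ foo: 'secret' })))
const result = untagBytes(tagged)
```

Because the format travels inside the bytes that get encrypted, an observer of the ciphertext learns nothing about it, but the encrypting codec has to do this bookkeeping itself, which is the incidental complexity the comment above argues against.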
@mikeal it's also worth mentioning that your proposed solution works with one layer of encryption, but if you have multiple layers then you have a problem. |
Why? They are just normal dag nodes with the “secretbox” information encoded into them. My point is, anything that is a valid We only need to know special information about the encrypted payload when we read the data in block, and that happens at a layer above the Block level. We can modify the Selector and Path specifications to be aware of information we encode at the data model layer. We already have to do this for I think this is hard to see right now because of the current state of IPLD. We have a lot of working code at the Block level and for very basic path resolution but we’ve just built a basic selector engine and haven’t implemented any of the dynamic support for collections I’m mentioning above, this is all just planned. So, I can see why you’d want to do this at the Block layer in order to get something working in the short term.
ok, then: while (node._encryption) {
node = await decryptNode(node, block.format)
} |
True, I guess this exposes a flaw in our mental model when it comes to supporting content addressed data that doesn’t support the Data Model. We have been assuming that when linking to systems that already exist we would have to use a reference that is publicly available in order to potentially do content discovery in another system. It hadn’t really occurred to me that you would take data from another system, encrypt it in an IPLD system and then move that data around in the IPLD system. It also doesn’t help that most of our use cases for this have been blockchains where any encryption of the underlying data is already done underneath the data we’re getting a reference to. Let’s explore this a little further. Is the fact that you’re encoding git data sensitive as well? In other words, if we were to encode a CID, would we also have to encrypt the CID? |
Also, if the solution to this ends up being “we encrypt another CID for the encrypted block” then we need to rope in an encryption expert because some of the bytes are going to be rather predictable. |
There is a more detailed elaboration available, but here is the summary: I want to provide a generic secure message feed library, meaning application code decides what the messages are (and what their corresponding format is). Furthermore, the feed attempts to have several layers of access:
To accomplish this there are multiple layers of encryption:
Note that at the feed implementation layer I do not want to know what the messages are or what the format is, I just want them to be Furthermore, it implements another codec like SSB private-box so that a message in the feed can be directed at specific friends (meaning arbitrary followers can't read it, or know who it is for, or how many recipients that message has; the image in the previous comment is a visualization of that). It's also worth noting that a private-box message should ideally also be in an arbitrary format. This all works out really nicely with the idea of "inline-links", because the same linked data doesn't need to be stored in a separate block, but rather gets inlined into the target block; that is, format+encodedbytes are added. |
Let’s explore this a little further. Is the fact that you’re encoding git data sensitive as well? In other words, if we were to encode a CID, would we also have to encrypt the CID? Not sure if I fully understand this, but assuming I do, that is what the feed abstraction does (Textile does the same thing BTW): CIDs to the encrypted blocks are concealed to the topmost layer, so an adversary can't traverse the graph. |
So, the format is not encrypted? Or at least, not encrypted at layer these encodedBytes are stored, but it may be inside another encrypted container. |
It is, this is exactly what I'm doing today: https://github.com/Gozala/ipdf/blob/499fce4b048bb6a5d39a2060bd27792dab496e74/src/feed.js#L242-L256 Having to know the format, encoding, prefixing is all incidental complexity. Ideally there would be something like |
It is worth mentioning that if in the above case |
It is also worth pointing out that this would enable encrypting not only a single message in a single format but, say, multiple messages in different formats (just like you can link to multiple blocks encoded in different formats), transparently and without introducing further complexity. Without inline links you'd have to encrypt each individual message and then pack them together from the outside. However, that's not great, because you'll end up either revealing the number of messages or having to encrypt yet again, not to mention that it would constrain the structure of your nodes. Inline links address all that in a way that fits naturally (at least to me) into the existing IPLD model. |
I get that, I think the thing I didn’t quite understand until today was that the format may be different. We’ve actually been working hard to remove the distinction between a link and an inline value as far as reads go. Specs like |
If it doesn’t have a CID, and the entire thing exists inside another block, I don’t think we should call it a “link.” I’m not even sure if “inline Block” is the right term, it makes sense to me now, but I worry about confusing new developers. We can bike shed the terminology later, I think I understand the use case enough now. I’m going to think on this a little more and then write up a larger new issue that can hopefully cover all the places this touches. The impetus for a lot of this seems to be leveraging the same multicodec parsing engine, which makes me really wish WebAssembly was a little farther along. If we could implement the decoder in WebAssembly then we could just reference it directly by a link rather than a multicodec reference. That would expand this out of the “inline Block” metaphor, because we wouldn’t have to leverage the same decoding engine and it would become a much more robust parameterized envelope. |
I might be missing context here, but it appears to me that what I'm suggesting is aligned with that. In fact, I also want to be able to remove the distinction between linked blocks and nested blocks, and have the freedom to choose how blocks are arranged in memory (single blob vs many linked blobs). It's just that your use case graph seems homomorphic while mine is polymorphic. It appears to me that we share the same goal and are just stuck on the metaphors we use to describe it. |
@Gozala I didn’t mean to suggest these were out of alignment, I was just iterating through my own process in understanding this use case and I was less inclined early on to extend the Block concept to it, but it all makes sense now. |
Hi. What is the latest on this? |
@matheus23 mind if we resume the convo here?
Personally, I've been thinking about this from the perspective of IPLD ADLs. One could use a node builder to construct an encrypted DAG, then use something like an IPLD URL with the decryption key in it, or using the new Tagged Pointers spec once it comes out. |
I'm not familiar with node builders. Maybe I can put this more generally: I haven't worked with go-ipld-prime at all so far. I guess I'm missing out on a bunch of IPLD ideas because of that. My practical IPLD experience is based on working with JS and rust libraries mostly.
I don't really like that idea. It would be bad if it's a query param, since then the decryption key would be sent to gateways. This concern is invalid if you're running your own gateway locally, but - as a pattern - I think it can be harmful. If it's possible, people will send their decryption keys to gateways that wouldn't actually want it. On the other hand, if you don't send the decryption key to the gateway, how is your data decrypted? Well, ideally in the frontend. For that, the server would need to serve some HTML with some script that automatically decrypts the block you're looking at and knows how to look for further blocks & how to piece them together. This would need good IPLD (unixfs?) libraries for browsers. |
What if we encrypt the IPLD data in each node with a different symmetric key, and keep a side tree with the encryption keys linked to each node? Another approach might be similar to the cryptree that WNFS uses for IPFS file encryption: https://whitepaper.fission.codes/file-system/partitions/private-directories/concepts/cryptree |
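A minimal sketch of the "one symmetric key per node plus a side tree of keys" idea might look like the following. It is loosely inspired by the cryptree concept and is not the WNFS or Peergos scheme. String paths stand in for CIDs, AES-256-GCM comes from Node's built-in `crypto` module, and `seal`, `open`, `encryptTree`, `readNode` and `childKey` are hypothetical helpers.

```javascript
const crypto = require('crypto')

function seal(key, plaintext) {
  const nonce = crypto.randomBytes(12)
  const cipher = crypto.createCipheriv('aes-256-gcm', key, nonce)
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()])
  return { nonce, tag: cipher.getAuthTag(), body }
}

function open(key, { nonce, tag, body }) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, nonce)
  decipher.setAuthTag(tag)
  return Buffer.concat([decipher.update(body), decipher.final()])
}

// Encrypt `node` and its descendants, each with its own random key.
// `keys` is the side tree: per node, the child keys sealed under that node's key.
function encryptTree(node, store, keys, path = 'root') {
  const key = crypto.randomBytes(32)
  store.set(path, seal(key, Buffer.from(node.data)))
  const childKeys = {}
  for (const [name, child] of Object.entries(node.children || {})) {
    childKeys[name] = encryptTree(child, store, keys, `${path}/${name}`).toString('base64')
  }
  keys.set(path, seal(key, Buffer.from(JSON.stringify(childKeys))))
  return key
}

const readNode = (store, path, key) => open(key, store.get(path)).toString()

// Holding a node's key is enough to recover the key of any of its children.
function childKey(keys, path, key, name) {
  const table = JSON.parse(open(key, keys.get(path)).toString())
  return Buffer.from(table[name], 'base64')
}

const store = new Map()
const keys = new Map()
const tree = { data: 'top', children: { a: { data: 'leaf a' }, b: { data: 'leaf b' } } }
const rootKey = encryptTree(tree, store, keys)
const keyA = childKey(keys, 'root', rootKey, 'a')
```

Sharing `keyA` grants access to `root/a` and anything below it, while `root` and `root/b` stay unreadable. The cost is the extra key tree that has to be replicated alongside the data.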
That's pretty much what https://peergos.org/ have been doing with their capabilities. 😁 Their protocol diverges from regular IPFS/IPLD a bit however. |
This somewhat relates to #63, as it could be an alternative, or one could be enabled by the other. At the moment with IPLD all the links are public, even if the content they link to isn't. However, as I pointed out in #63, a case could be made that one might want to conceal links and make them available only to selected participants (with whom the corresponding keys were shared).
I think it is important to consider this in relation to GraphSync and IPLD Selectors. It would be a shame if peers participating in an exchange, who happen to have the shared key for concealed links, were required to do multiple round-trips for data exchange, as that would defeat the benefit of GraphSync.