Research Challenges

Stephen Oliver edited this page Aug 12, 2016 · 1 revision
Clone this wiki locally

Stuff that we may need some help from competent theorists on. See also Papers.

Table of Contents

Research challenges

Swapping

Swapping is the location changing algorithm. It is used to build optimal routes in a restricted topology.

Without a good, working, fast, secure, reliable swapping algorithm, Darknet is unlikely to work well, and as the security page and others explain, without darknet Freenet's security and long term viability are doubtful. Also, if good results can be achieved they may have wider applications beyond Freenet.

Swapping security

Swapping currently is vulnerable to the Pitch Black attack. Basically bogus swap requests can be used to drag nodes' locations towards specific points in the keyspace, causing routing to degenerate. A form of this was happening naturally but that part has been resolved by some random resetting. We have a solution for this, which has been simulated by thesnark (Michael Grube). We are still discussing the details, and waiting for more simulations, but it seems to work. See this bug and the devl thread Pitch Black Attack - Analysis, Code, Etc.".

Swapping speed

The time swapping takes to settle probably does not scale well with increasing network size. Some of the papers may have looked into this. Vilhelm Verendel's work proposes some ways to improve on this. These appear to need further work.

Real world topologies, Sybil, etc

It is an open question whether a small world topology can actually be achieved for a real world darknet. This is a techno-social issue, we simply don't know.

It is planned to connect to and route through friend-of-a-friend connections. This should substantially improve performance on pure darknet. But this has not been studied in detail. However there is published work saying taking FOAF locations into account in greedy routing helps with performance, so hopefully it will be a gain. It'd be nice to be sure.

What happens when two networks, growing organically, meet one another? Some work has been done on this, possibly by Vilhelm Verendel. It looks like it isn't catastrophic for routing performance, and the two networks will gradually merge as they get more links. But presumably there is a lot of location movement resulting in nodes changing their locations and losing existing data? This likely needs further work.

The other side of the coin is classic darknet Sybil: An attacker can get a handful of connections, but can create any number of nodes behind those connections. Can he thus control a large part of the keyspace? The above work suggests probably not, but again this needs looking into in detail.

Alternative approaches

X-Vine is a proposed means to route on darknet. Its performance is superior in some ways to Freenet's; it guarantees that a route can be found, for example, and doesn't take time for swapping to settle. It assumes that there is a single growing darknet; we probably deal better with multiple darknets merging. Also its routes are not necessarily shorter; on an ideal network, we can achieve very short paths according to Oskar's work. Toad posted an analysis here: http://permalink.gmane.org/gmane.network.freenet.devel/29157

General security

General assumptions

Darknet: Pure friend-to-friend topology. Connections only between friends, and possibly to friend-of-a-friend's. Hard to identify. Locations are swapped, see above.

Opennet: Similar to a DHT. Connections are set up by the network (on a successful request, the least recently useful node is dropped and replaced with a node close to the data source). Locations are fixed.

Both produce a small world topology, and are currently a single network for routing purposes.

The best attack we know of at the moment, the Mobile Attacker Source Tracing attack, involves an attacker recognizing keys and using their location and the fact that the request reached his key to determine roughly where the originator is on the keyspace. This is fairly powerful. On Opennet, it is extremely dangerous because they can simply announce (bootstrap) to that location. However it has not been quantified or implemented yet.

Files are split into CHKs (32KB each, key = hash), which use random encryption keys and are not predictable in advance, and SSKs, which *are* predictable in advance (key ~= pubkey + document name/version). Hence when uploading a file, the attacker cannot move towards the originator until the top block (an SSK) has been uploaded. Ideally we'd like to be able to reinsert CHKs safely, and in any case SSKs are used for forums and so on. So really we need some sort of tunnels...

Generally uploaders of content are assumed to be more valuable than downloaders. Most of the above applies only to uploaders. Downloaders are much easier to trace since all the keys can be known in advance!

More expensive attacks include e.g. connecting to every node on Opennet (sadly this is likely to remain cheap enough to be a plausible attack). Currently this would allow tracing most uploaders *and* downloaders. Obviously on Darknet it's impossible.

Tunnels

We can tunnel only the predictable keys to provide a very low-cost defence against MAST, even on opennet. To protect against more powerful attackers we would need to tunnel all keys uploaded, or even downloaded; this could be configurable. Most Freenet traffic is long-term downloads/uploads so Mixminion-style tunnels may be possible, but backup routes would be needed due to node uptimes. The real problem is peer selection however...

Our needs for tunnels are different for Opennet vs for Darknet:

On darknet, we can use the social topology to choose nodes with reasonable safety. We will probably implement something like Pisces, although it will need some tweaking given we probably won't implement X-Vine. Note also that darknet tunnels will probably have many hops (5-15), just like on Pisces, even though we can use friend-of-a-friend connections in tunnels. This is partly because we cannot safely create new connections on darknet: Darknet is a fixed topology, this is important for unblockability. See above re Sybil on darknet.

On opennet, Freenet is much more like a classic DHT, and it appears that secure peer selection on a DHT is a rather difficult problem, although one that has had considerable attention given to it. On the other hand we could create new connections if we need them. Sybil is probably intractable.

Other proposals:

  • Routing between three random nodes chosen from the network as a whole. This would cost a lot of performance, but if it can be used only for the predictable inserts mentioned above, could be affordable. However, peer selection may be difficult.
  • Tunnels constructed by the intersection of several partially-randomly-routed "anchor" requests, each carrying part of a shared secret. (There is published work on this but I don't have the reference).
  • Short, local tunnels based on known local topology ("premix routing"). This has been discussed extensively and is probably intractable, but in any case would be expensive and only provide protection against *nearby* attackers.

Can opennet be secured?

In opennet Freenet nodes connect to strangers: Nodes which aren’t checked manually.

Several senior people believe opennet is fundamentally insecure and cannot be made secure. This may or may not be true. The main known weaknesses are:

  • If an attacker can connect directly to a node he can get a pretty good idea what it is doing. Mixnets can help with this, but even with mixnets Sybil is far too easy.
  • The bootstrapping process for opennet ("announcement") allows an attacker to get connected quickly to any location. This is useful in e.g. the mobile attacker source tracing attack described above. Solving this requires solving Sybil...
  • Basically it all comes down to Sybil: It's likely to be relatively cheap to create thousands of nodes. There isn't much we can do about this: Even a complex semi-centralised solution to prevent announcing many nodes on one IP address would only provide limited protection. Hashcash and CAPTCHAs are both ineffective and too costly for normal users. Autonomous System Numbers are an interesting option: We can limit the number of peers of any node from a single AS, or we can throttle the number of announcements overall from a single AS, on the theory that an attacker would hire a data centre connected to a single ISP, but this would require considerable manual supervision and probably have to be fairly centralised.
  • Announcement itself, and the pre-bootstrapping process ("seed nodes"), can also be DoS'ed. Seed nodes can also be malicious, in which case they may be able to "capture" newly announced nodes, but there are at least solutions to this (e.g. require that peers be visible from more than one seednode).

Non-tunneling options

One proposal was a blind/reveal mechanism whereby encrypted data is inserted (distributed across the keyspace), and then a separate mechanism inserts the decryption keys for the data (to the same set of locations), causing the data to be decrypted and reinserted by the nodes that received the first encrypted inserts. Back of an envelope math suggests that this might achieve a reasonable degree of security, albeit at a considerable cost. The main advantage is that the "reveal" inserts can be very small, so it may be possible to protect them very thoroughly with tunnels or even more expensive options such as Xor trees etc. Whereas any other use of tunnels, other than just for the top block, will have the basic performance/security tradeoff of how many tunnels to use for a single upload, and the fact that nodes go offline so we constantly need to set up new tunnels.

Currently data is not cached for ~ 2 hops from the originator on an insert, for security reasons. We will route randomly during these few hops; we could even do some cheap tunneling, e.g. based on the random-rendezvous mentioned above, with no performance cost.

How unblockable can Freenet actually be?

Even on darknet, there are ways to block Freenet, for instance, traffic flow analysis, blocking all peer to peer traffic by IP address, or some forms of packet inspection. How realistic is this threat? This also has some bearing on the question above about mixnets. It is also related to the delay tolerant networking stuff below - most good steganographic transports would likely have latency or uptime issues.

Delay tolerant networking, passive requests, and subscriptions

Publish/subscribe and passive requests

Passive requests, or publish/subscribe, have been planned for a long time. Passive requests are simply pub/sub on a single key; pub/sub would be a more general stream. These have not been simulated, and there may be difficulties with load management. However, they could substantially improve performance for e.g. forums, as well as possibly allowing for new applications.

Long term requests: Delay tolerant networking

The main motivations for some DTN-style functionality are:

  • Darknets will likely have uptime issues. Your friends are not necessarily all online at the same time as you, and there may not, in the worst case, be a route from you to the node storing the data you want at the specific instant when your node requests it. It would be best if the request could be accepted by a node, forwarded when possible, and then the data returned later. Big downloads are queued anyway due to speed, and this would also work well with publish/subscribe, so this would fit acceptably with Freenet's UI.
  • Steganographic transports may not have 100% uptime or millisecond latency. And in fact, even in areas that are not all that hostile, exchanging data close-up when you meet up with your friends might be interesting (e.g. using a smartphone with some fast local wireless connection, or exchanging USB sticks); think Haggle, but with darknet (friend to friend), rather than with local broadcasts.
How much of this can we do? How much is it useful to do within Freenet's darknet security model?

Client applications

How to make a scalable but spam proof forums tool!

Freetalk, the WebOfTrust back-end, and FMS, do not scale completely. Passive requests will help; their early incarnations ("ultra-lightweight passive requests") already help; numerous tweaks and optimisations will help (e.g. changing polling frequencies depending on usage etc). But fundamentally they rely on polling other users' outboxes, which will not scale better than linearly, maybe attuned if the frequency of interesting posts follows a power-law distribution. Is it possible to improve on this? Do we need new semantics? Is the web of trust a viable solution or do we need something else? Can we divide up the work somehow? Even if we can't make it scale better than O(n), can we make it viable for plausible sized forum systems?

Re new semantics, a linear but significant optimisation has been proposed using programmable keys, which allow many users to share a single sequence of keys, while still having to sign the message with their own key (which has been signed by the moderator), so that the moderator (or higher level democratic mechanisms built on this) can still get rid of spammers by changing his key and re-signing the non-spammers.

We should be able to poll only the users of a specific forum, rather than all users of the tool, provided users can announce easily. Announcement is a bit of a problem; currently we use CAPTCHAs to announce to one user, and then rely on them propagating the newbies, this is one of the reasons we end up polling everyone. Adapting this so that CAPTCHA solutions are visible (after a short delay) to all users is possible. And see next section.

Basically this boils down to scalable, spam-proof, distributed, anonymous keyword search. Which is probably intractable in principle, but there may be useful approximations. That's also fundamental to things like searching for content on Freenet.

What can replace CAPTCHAs?

This is a problem for the whole internet. The particular difficulty with Freenet is we rely *ONLY* on CAPTCHAs, for introducing new identities; we can't block IP ranges.

Ideally we would like some sort of "scarce keys". Something that any given node can only insert every X period, depending on how many connections they have. This could replace CAPTCHAs. There has been some work on this.

How to make sufficiently-scalable real-time chat

Freenet can in theory provide fairly low latency, and real time chat (despite some security tradeoffs) is potentially very popular. But preventing spam without having to poll way too many queues could be difficult.

Social stuff

On the other hand, models like Twitter can be implemented fairly easily; you don't follow everyone, you only follow people you follow (Sone supports this but in practice everyone follows everyone at the moment, but because it's a small community). However, real-time searching is a key part of Twitter in practice...

Distributed searching

Current Freenet search facilities are fairly poor - that is, they are relatively slow (but there is reason to think this can be improved), they rely on a central index maintainer spidering the freesite web, they have relatively poor ranking support etc. Distributed search could make a lot of difference, but fully distributed approaches proposed so far cannot deal with spam effectively.

Sandboxing or verifying plugins and javascript

Freenet has unique requirements for sandboxing. For instance, being able to time requests can tell you where you are on the network *AND* what the user has recently downloaded. It may however be possible to build a VM or restricted API that is useful, possibly with some measures to limit this.

Alternatively, distributed source management, as a form of code review as well as collaborative development, with multiple signoffs etc, integrated with deployment, might really shine.

General performance

Load management

Load management has been a bane of Freenet for a very long time. Some progress has been made recently, but it largely hasn't been simulated, except for the most basic mathematical modelling by ArneBab.

Time/cost-sensitive routing

Some theoretical work by Oskar and others suggests that taking expected performance into account as well as the node's location (greedy routing) can work. A form of this was implemented in 0.5, but it didn't work well (largely due to not being thought through properly). Also, it may be useful sometimes to exhaust the local network first if there are few/slow/expensive links to the wider world.

General performance

Many more possibilities!

Category:Theory