Censorship resistance on IPFS #281

Open
dmbb opened this issue Feb 27, 2018 · 19 comments


@dmbb

dmbb commented Feb 27, 2018

Hey everyone!

I'm Diogo and I'm pursuing a PhD at Instituto Superior Técnico, Universidade de Lisboa. Since my MSc, I've been working on Internet censorship circumvention. In particular, I've explored ways to transmit censored data by piggybacking it on top of multimedia protocols, which a censor may refrain from blocking for social or economic reasons.

I'm interested in IPFS because it allows data to be replicated around the network, making it harder for censors to block a given piece of information. In particular, it came to my attention that IPFS was successfully used during the referendum in Catalonia to prevent the Spanish government from blocking voting-related information from citizens.

Although users were able to browse data in an uncensored way in the above episode, in my understanding there are other challenges facing the adoption of IPFS for censorship-resistance purposes. For instance, IPFS's bootstrapping process is either tied to a set of well-known nodes, which could be blocked by a knowledgeable censor, or to a peer discovery protocol, which may be identified (and then blocked) by a censor's traffic analysis techniques.

Indeed, this issue also applies to other overlay networks such as Tor. I'm opening this issue to get your opinion on the major research challenges IPFS faces in providing Internet censorship resistance. Are these challenges similar to the ones faced by Tor? Are there different design decisions that lead to fundamentally different approaches?

Thanks for building such an amazing project. I thank you in advance for any comments you may have about directions for fighting censorship with IPFS.

@Stebalien
Member

Stebalien commented Mar 3, 2018

A lot of the challenges concerning censorship will look a lot like the ones faced by Tor. However, we have some advantages and disadvantages.

Advantages:

  • Not a one-trick pony. That is, it's not only for privacy/anti-censorship. If IPFS can gain a critical mass, it'll be hard to block it entirely without economic repercussions (especially if people start doing software distribution over IPFS).
  • The ability to bootstrap off of other nodes on the same LAN using local discovery. Currently, two nodes on the same LAN can (well, should, this area needs a bit of love) find each other and connect even if one or both can't connect to the rest of the network. IPFS can even operate in entirely isolated networks this way.

Disadvantages:

  • No privacy/anonymity. This is a pretty big issue where censorship is concerned.
  • Readily enumerable. In Tor, it's possible to enumerate all the relays. In IPFS, it's possible to enumerate all known nodes (without much difficulty). Unfortunately, IPFS needs this to function properly.

However, we're always working on improving IPFS. We're working on:

  • A Tor transport (eventually). However, we want to be very sure that we're doing everything right before we finish/release this feature. Unfortunately, this will negate the two advantages we have over Tor.
  • QUIC support with TLS 1.3. Due to how TLS 1.3/QUIC work, traffic between IPFS nodes using this transport should be hard to distinguish from HTTPS traffic (although we may need to randomize the port to make this an effective anti-censorship measure). Also, unfortunately, the IPFS node will still respond as an IPFS node, so this only really helps against passive attackers.

@Kubuxu
Member

Kubuxu commented Mar 4, 2018

For instance, IPFS's bootstrapping process is either tied to a set of well-known nodes, which could be blocked by a knowledgeable censor, or to a peer discovery protocol, which may be identified (and then blocked) by a censor's traffic analysis techniques.

The bootstrapping process can also be significantly improved by using previously known nodes for bootstrapping. The primary risk there is that at least one of the nodes used for bootstrapping needs to be well behaved (and give you access to the rest of the network through DHT-based discovery).
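A minimal sketch of that bootstrap order, assuming a hypothetical `try_connect` callback and placeholder peer names (this is not the go-ipfs/js-ipfs API): previously seen peers are tried first, shuffled, and the well-known defaults are only a fallback. It does not check that the peer it reaches is well behaved; that is exactly the risk described above.

```python
import random
from typing import Callable, Optional


def bootstrap(cached_peers: list[str],
              default_peers: list[str],
              try_connect: Callable[[str], bool],
              rng: Optional[random.Random] = None) -> Optional[str]:
    """Return the first peer we manage to connect to, or None.

    Hypothetical sketch: cached peers go first (shuffled so we don't always
    hit the same one), then the well-known bootstrap list as a fallback.
    """
    if rng is None:
        rng = random.Random()
    cached = list(cached_peers)
    rng.shuffle(cached)
    # Skip defaults we already tried from the cache.
    fallback = [p for p in default_peers if p not in cached]
    for peer in cached + fallback:
        if try_connect(peer):
            return peer
    return None
```

If a censor blocks every default node, a client that remembered even one reachable peer can still get in; from there, DHT-based discovery takes over.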

IPFS also has great properties for hybrid sneakernets. Use a sneakernet to get data into the network (imagine a campus network), and then use IPFS internally to access, duplicate, and generally spread the data (with local bootstrap peers if local discovery doesn't work).
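Content addressing is what makes the sneakernet path safe: a block carried in on a USB stick doesn't need to be trusted, because its address is the hash of its bytes. A simplified sketch, assuming a hex SHA-256 digest as a stand-in for a real CID (real CIDs also carry multihash/multicodec prefixes) and a plain dict as the local store:

```python
import hashlib


def content_key(data: bytes) -> str:
    # Simplified stand-in for a CID: hex SHA-256 of the raw bytes.
    return hashlib.sha256(data).hexdigest()


def import_blocks(blocks: dict[str, bytes], store: dict[str, bytes]) -> list[str]:
    """Copy blocks from an untrusted medium into the local store.

    Blocks whose bytes don't hash to their claimed key are rejected,
    so a tampered copy can't poison the local network.
    Returns the keys that were rejected.
    """
    rejected = []
    for key, data in blocks.items():
        if content_key(data) == key:
            store[key] = data
        else:
            rejected.append(key)
    return rejected
```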

Disadvantages mentioned by @Stebalien still apply.

@dmbb
Author

dmbb commented Mar 5, 2018

Thank you so much for your answers. Since you mentioned it, the transport layer is another thing I'm intrigued about. What exactly does IPFS traffic look like on the network at the moment? Is there a default transport for IPFS connections?

QUIC support with TLS 1.3. Due to how TLS 1.3/QUIC work, traffic between IPFS nodes using this transport should be hard to distinguish from HTTPS traffic (although we may need to randomize the port to make this an effective anti-censorship measure). Also, unfortunately, the IPFS node will still respond as an IPFS node, so this only really helps against passive attackers.

@Stebalien even when using TLS 1.3/QUIC, I'm assuming no further effort is made to obfuscate traffic patterns. Let's say a client wishes to download some file. How easy would it be for a passive adversary to fingerprint the downloaded content when data is downloaded simultaneously from multiple peers? It seems this kind of analysis would be harder to perform in IPFS than in Tor, for instance.

Also, if TLS/QUIC gets deployed, wouldn't it make more sense to use a well-known port like 443 to prevent a censor from blindly blocking TLS-like traffic on uncommon ports? What's the rationale for running this transport over random ports?

@okdistribute

okdistribute commented Mar 5, 2018

Where's the evidence that it was used in Catalonia? From what I understood, domains were blocked, so they had to move and rehost the website on a different domain almost every day.

@dmbb
Author

dmbb commented Mar 5, 2018

@Karissa I found this discussion highlighted on Hacker News, concerning the following article. This was also discussed on Twitter.

If you happen to have more concrete information about this episode, I'd be happy to learn. I think websites were also constantly rehosted on different domains, in addition to being available through IPFS.

@ghost
Copy link

ghost commented Mar 5, 2018

@Karissa They built the official voting info website in a completely static fashion, which made it easy to distribute it with p2p technologies. They then had the president tweet out the URL after the canonical website was blocked: https://twitter.com/KRLS/status/911482634789953536

That tweet is what got gateway.ipfs.io blocked the next day, while, funnily enough, ipfs.io continued to work.

This obviously still went through HTTP, but it's trivial to replace https://ipfs.io/ipfs with ipfs://, and ipfs-companion helps with it.
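A small sketch of that rewrite, assuming the usual `/ipfs/<cid>/<path>` (or `/ipns/<name>/<path>`) gateway layout. It can either emit an `ipfs://` URI or point the same path at another gateway; `gw.example.org` in the usage below is a hypothetical host. This is not ipfs-companion's code, just the idea.

```python
from typing import Optional
from urllib.parse import urlsplit


def rewrite_gateway_url(url: str, gateway: Optional[str] = None) -> str:
    """Rewrite https://<gateway>/ipfs/<cid>/<path> to ipfs://<cid>/<path>,
    or to the same path on another gateway host if one is given.

    Query strings and fragments are dropped in this sketch.
    """
    segments = urlsplit(url).path.lstrip("/").split("/", 2)
    if len(segments) < 2 or segments[0] not in ("ipfs", "ipns") or not segments[1]:
        raise ValueError(f"not an /ipfs/ or /ipns/ gateway path: {url!r}")
    namespace, root = segments[0], segments[1]
    rest = "/" + segments[2] if len(segments) == 3 else ""
    if gateway is None:
        return f"{namespace}://{root}{rest}"
    return f"https://{gateway}/{namespace}/{root}{rest}"
```

Because the content identifier travels with the path, blocking one gateway hostname doesn't block the content itself; any other gateway, or a local node, can serve the same address.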

(Sorry about those ugly lines in the screenshot, that's my crappy screen grab tool ;)).

@okdistribute

okdistribute commented Mar 5, 2018

Gotcha. I was there through the week of Oct 1 and no one seemed to be using the IPFS client. They had a huge WhatsApp group, and a new link would get sent out every time the old website got banned. I bet the IPFS link worked for one of those rounds, though. Pretty cool.

@Stebalien
Member

Stebalien commented Mar 5, 2018

Is there a default transport for IPFS connections?

Yes. We currently use a TLS-like protocol we call secio (but are working on switching to plain TLS). However, we have to negotiate the security protocol in the clear so IPFS connections are currently readily identifiable.
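To make "negotiated in the clear" concrete, here is a sketch of the opening bytes as I understand libp2p's multistream-select framing (each message is an unsigned varint length, then the protocol ID plus a trailing newline); check the exact IDs and framing against the multistream-select spec. The classifier is a deliberately naive stand-in for what a passive DPI box could do.

```python
def uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used by multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def ms_message(protocol_id: str) -> bytes:
    """Frame one multistream-select message: varint length, then id + newline."""
    body = protocol_id.encode() + b"\n"
    return uvarint(len(body)) + body


def looks_like_libp2p(first_bytes: bytes) -> bool:
    """Naive passive fingerprint: does the stream open with the multistream header?"""
    return first_bytes.startswith(ms_message("/multistream/1.0.0"))


# What a connection proposing secio sends before any encryption starts.
handshake = ms_message("/multistream/1.0.0") + ms_message("/secio/1.0.0")
```

Since these protocol IDs precede any encryption, a single prefix match is enough to flag the connection.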

even when using TLS 1.3/QUIC, I'm assuming no further effort is made to obfuscate traffic patterns. Let's say a client wishes to download some file. How easy would it be for a passive adversary to fingerprint the downloaded content when data is downloaded simultaneously from multiple peers? It seems this kind of analysis would be harder to perform in IPFS than in Tor, for instance.

Compared with Tor, passive adversaries learn significantly less about what a user might be downloading, because IPFS uses content addressing instead of location addressing. That means where a user goes to download data is decoupled from what the user is downloading. However, the two are still correlated.

Unfortunately, it's trivial for an active adversary to learn this kind of information. All they have to do is connect to a node and wait for it to ask for a file.
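A toy model of why that works, assuming a node announces each want to every connected peer (roughly how Bitswap's wantlist behaved at the time; treat this as a simplification, not the real protocol). `Peer` and `Node` here are hypothetical classes.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    logged_wants: list[str] = field(default_factory=list)

    def on_want(self, cid: str) -> None:
        # An honest peer would check its blockstore; a spy just records the request.
        self.logged_wants.append(cid)


@dataclass
class Node:
    peers: list[Peer] = field(default_factory=list)

    def connect(self, peer: Peer) -> None:
        self.peers.append(peer)

    def want(self, cid: str) -> None:
        # Simplified model: the want is broadcast to every connected peer,
        # so whoever is connected learns which content is being requested.
        for peer in self.peers:
            peer.on_want(cid)
```

The adversary needs no special position in the network; being one of the connected peers is enough.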

Also, if TLS/QUIC gets deployed, wouldn't it make more sense to use a well-known port like 443 to prevent a censor from blindly blocking TLS-like traffic on uncommon ports? What's the rationale for running this transport over random ports?

So, the problem with 443 is that it's a privileged port (below 1024), so users would have to run the daemon as root (not recommended). The next best thing is a random port (or, maybe, a common HTTP alternative port like 8080, 8181, 8888, etc.).
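A sketch of that port choice, assuming a hypothetical per-node configuration step (not an actual go-ipfs option): either one of the HTTP-alternative ports mentioned above, or a random port from the IANA dynamic range, so there is no single well-known IPFS port for a censor to block wholesale.

```python
import random

# The HTTP-alternative ports mentioned above.
COMMON_ALT_PORTS = (8080, 8181, 8888)


def is_privileged(port: int) -> bool:
    """On most Unix-like systems, binding below 1024 needs root (or a capability)."""
    return port < 1024


def pick_listen_port(rng: random.Random, prefer_common: bool = False) -> int:
    """Pick a per-node listening port for the QUIC transport (hypothetical helper)."""
    if prefer_common:
        return rng.choice(COMMON_ALT_PORTS)
    # IANA dynamic/private range; never privileged.
    return rng.randint(49152, 65535)
```

The trade-off is the one raised in the question: a random port avoids a single blockable port number, but a censor can still treat "TLS-like traffic on an odd port" as suspicious.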

@mitra42

mitra42 commented Mar 6, 2018

If the question is avoiding censorship rather than its companion (avoiding surveillance) then I'm more concerned about the single-point-of-failure issues. With the Catalonia example it was trivially easy to block https://ipfs.io which essentially meant that anyone without extreme tech skills couldn't access it.

I'm assuming that most people are going to be using unmodified browsers. We are building a version of our front-end to run in the browser (loadable from anywhere), but the connections are still a single point of failure: websocket-star has to go directly to a known gateway server, because that server currently has to be primed (e.g. via an HTTP HEAD call) to know about the file. This makes it easily blockable.

It's unclear to me whether there are fixes in the works for that problem (e.g. what I think is called websocket-relay?).

If I understand it correctly, part of the issue is that putting a DHT in the browser requires WebRTC, which crashes browsers when they open lots of connections. It would be great if Firefox and Chrome fixed that issue, but that doesn't sound likely. I've also been unable to get a clear answer as to whether WebRTC, and the DHT built on it, could be tuned to open far fewer connections. IMHO, even if a browser opened only 10 connections it could form a workable DHT, since there would still be plenty of well-connected (Go or Node.js) nodes.

Is there anyone on this thread who's been thinking about those single-point-of-failure issues?

@RangerMauve

RangerMauve commented Mar 6, 2018

@mitra42 The websocket-star example is going to be fixed once libp2p/js-libp2p-websocket-star#43 lands.

Relay mode in general is going to fix issues with browsers not being able to participate in the network. But this will only work if there are many relay nodes out there and they are easy to discover. (It would be nice if relay hop were on by default.)

@RangerMauve

RangerMauve commented Mar 6, 2018

Other single points of failure are the bootstrap nodes for the DHT. As far as I know, no IPFS client caches healthy DHT nodes; they currently rely on the bootstrap nodes and mDNS to find peers.

Though I don't think that will be hard to fix.

@mitra42
Copy link

mitra42 commented Mar 7, 2018

That will be great. Once a browser can connect to any node (or any node in a certain subset) and access IPFS docs uploaded to any other node, I think we'll have what most people expect; I've seen a number of reported "bugs" that appear to be just people not understanding that you can't currently access any IPFS file from anywhere. Once we have that, it becomes easy to surface a large number of connection points and avoid single points of failure. I'm also betting we can set up ways to distribute lists of bootstrap nodes in applications and so on. It doesn't sound like we can do much until that patch lands.

@Illasera
Copy link

Illasera commented Jun 11, 2018

@Kubuxu :

The bootstrapping process can also be significantly improved by using previously known nodes for bootstrapping. The primary risk there is that at least one of the nodes used for bootstrapping needs to be well behaved (and give you access to the rest of the network through DHT-based discovery).


That can lead to compromised nodes: instead of hiding what the user is transferring, a malicious bootstrap node can keep records of every IP accessing such data, marking everyone.


My suggestion is migrating hosts: an ownerless, trustless system, described as follows:

1.) The owner of the content uploads a file. Once enough copies have been made (how many is enough is up for discussion; I strongly suspect it will require an equation based on statistical analysis), the original owner no longer streams the data, nor will they be aware that it existed to begin with.

2.) New users become the owners of the data and its new hosts, giving copies to enough people. (How many is enough in this case? Not too many: we can NOT have exponential growth*. The system must instill doubt about whether any current owner is the real owner.)
*Exponential growth can and will cause scaling issues and is best avoided.

3.)Repeat stage 2.

It's like a package that keeps changing hands all the time: nobody can know who owns it or where it came from, since everyone owns it and yet nobody is the true owner.

Main issue: how do we enforce synchronization (knowing when to migrate the hosts, and making sure there are enough hosts to stream the data and ensure fast service)?
It must be a trustless, automated method.

Side notes:
1.) For this method to work, the choice of new hosts MUST have a solid RNG element: TRUE randomness (or as close as we can get to it) must be present.
2.) There can't be any caching when using this method.

Disadvantages:

  • Could cause clusters of data to appear.
  • Could cause the data stream to be lost.
  • Unreliable.
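One round of the handoff in steps 1 to 3 could be sketched like this, assuming a fixed replication factor `k` (the "how many is enough" question is left open) and a hypothetical list of candidate peers. `secrets.SystemRandom` draws from OS entropy, which is about as close to "true" randomness as software normally gets. The sketch deliberately does not solve the synchronization problem above; it just shows that the replica count stays at `k` instead of growing exponentially.

```python
import secrets


def handoff(current_hosts: set[str], candidates: list[str], k: int) -> set[str]:
    """One migration round of the (hypothetical) migrating-hosts scheme.

    The current hosts hand the data to k new hosts chosen with a CSPRNG and
    then drop their own copies, so no host keeps the data across rounds.
    """
    pool = [c for c in candidates if c not in current_hosts]
    if len(pool) < k:
        raise ValueError("not enough candidate hosts for this round")
    rng = secrets.SystemRandom()  # OS entropy source
    return set(rng.sample(pool, k))
```

Usage: start from `{"origin"}` and call `handoff` repeatedly; after the first round the origin is no longer among the hosts.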

@hadifarnoud

hadifarnoud commented Aug 17, 2018

Is it possible to access IPFS files through another domain? Governments can block access to the ipfs.io domain and therefore block the whole thing.

@mitra42

mitra42 commented Aug 19, 2018

As far as I'm aware, there are two cases:
a) access to IPFS files with the IPFS protocol, e.g. from JS-IPFS running in an application in your browser;
b) access to IPFS files through an HTTP/IPFS gateway.

The IPFS protocol still has single points of failure/censorship, though it would be good to get an update on the status since some of those reported above may have been fixed.

In theory anyone could set up an HTTP/IPFS gateway and provide access to any files, and that might work in many circumstances, but:
1: That gateway itself could be blocked if it became widely known, and if it isn't widely known then it's hard for it to have an effect.
2: We definitely hit problems when adding files at one location and retrieving them through a gateway elsewhere, because that gateway couldn't find them. Some of those problems might have been fixed by now, and some might be peculiar to the Archive's particular setup (using an early version of urlstore that wasn't announcing to the DHT, plus scaling issues in the DHT).

There is also interesting information in some of the links above.

@Stebalien
Member

Stebalien commented Aug 22, 2018

The IPFS protocol still has single points of failure/censorship, though it would be good to get an update on the status since some of those reported above may have been fixed.

Currently, if you can't connect to any bootstrap nodes (or nodes on your local network advertised over mDNS), you won't be able to join the network. However, you can add custom bootstrap nodes. We're also working on persisting peer-store information which will allow us to try to connect to nodes we've seen previously.
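A minimal sketch of persisting peer information, assuming a hypothetical JSON file that maps peer addresses to last-seen Unix timestamps (this is not go-ipfs's actual peerstore format; the addresses in the usage are documentation-range placeholders). On startup, recently seen peers are returned newest first so they can be tried before the bootstrap list.

```python
import json
import time
from pathlib import Path
from typing import Optional


def save_peers(path: Path, peers: dict[str, float]) -> None:
    """Persist a mapping of peer address -> last-seen Unix time."""
    path.write_text(json.dumps(peers))


def load_peers(path: Path, max_age_s: float, now: Optional[float] = None) -> list[str]:
    """Return peers seen within max_age_s seconds, most recently seen first."""
    if not path.exists():
        return []
    now = time.time() if now is None else now
    peers = json.loads(path.read_text())
    fresh = {addr: seen for addr, seen in peers.items() if now - seen <= max_age_s}
    return sorted(fresh, key=fresh.get, reverse=True)
```

Dropping stale entries keeps the list from filling up with peers whose addresses have long since changed.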

access to IPFS files through an HTTP/IPFS gateway.

We also have a gateway that uses a JavaScript service worker here: https://js.ipfs.io/ (scroll down). Once enabled, you'll be able to visit, e.g., https://js.ipfs.io/ipfs/QmYNQJoKGNHTpPxCBPh9KkDpaExgd2duMa3aF6ytMpHdao/index.html, and load it through js-ipfs.


We're also working with some people at Mozilla on better browser integration (see libdweb and ipfs-companion).

You can currently install the ipfs-companion and enable the "js-ipfs" internal node to use IPFS without installing any local applications and without relying on any public gateways. Once we can get the libdweb APIs merged into Firefox itself, you'll even be able to visit addresses like ipfs://... or ipns://ipfs.io.

@TheZ4ro

TheZ4ro commented Nov 28, 2019

I think that although TLS 1.3 and QUIC traffic can pass for standard protocol traffic, it can still be easily distinguished by the latest AI-based traffic analysis. Whether it is the DHT or TLS, it is not difficult to identify (especially through traffic-feature identification). This line of work has been described in several recent patents from China. It turns out that the most effective approach is not ever-stronger encryption, but obfuscating the data stream. In simple terms: we can easily spot people wearing masks, and masked people are more likely to attract attention. But if you get a facelift, the likelihood of being noticed drops significantly. I have tried disguising the data stream as a public, ordinary-looking video stream, especially by sending a standard, decodable video at the start of the session. In that case, basically all monitoring could be easily defeated.
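One small ingredient of that kind of shaping is hiding message sizes, which are a classic traffic feature. The sketch below uses fixed-size framing with length headers and random padding, a simpler technique than mimicking a real video stream: it does nothing about timing or codec structure, and it is not something IPFS does. `FRAME_SIZE` is an arbitrary, hypothetical value.

```python
import os
import struct

FRAME_SIZE = 1200  # hypothetical; every frame on the wire has exactly this size
HEADER = 2         # big-endian uint16 holding the real payload length


def to_frames(payload: bytes) -> list[bytes]:
    """Split payload into fixed-size frames: length header + data + random padding."""
    room = FRAME_SIZE - HEADER
    frames = []
    # max(..., 1) so an empty payload still yields one (all-padding) frame.
    for i in range(0, max(len(payload), 1), room):
        chunk = payload[i:i + room]
        padding = os.urandom(room - len(chunk))
        frames.append(struct.pack(">H", len(chunk)) + chunk + padding)
    return frames


def from_frames(frames: list[bytes]) -> bytes:
    """Reassemble the payload, discarding the padding."""
    out = bytearray()
    for frame in frames:
        (n,) = struct.unpack(">H", frame[:HEADER])
        out += frame[HEADER:HEADER + n]
    return bytes(out)
```

In a real design the frames would be encrypted after padding, and sent at a steady rate, so that neither sizes nor bursts leak what is being transferred.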

@bertrandfalguiere
Copy link

bertrandfalguiere commented Nov 28, 2019

Some ideas on the "let bootstrapping not be a single point of failure" front: ipfs/kubo#3908 (comment)

@jbshirk
Copy link

jbshirk commented Nov 28, 2019
