Poking a Hole in the Wall: Efficient Censorship-Resistant Internet Communications by Parasitizing on WebRTC (CCS 2020) #55

Open
wkrp opened this issue Dec 4, 2020 · 6 comments
Labels
reading group summaries and discussions of research papers and other publications

Comments

@wkrp
Member

wkrp commented Dec 4, 2020

Poking a Hole in the Wall: Efficient Censorship-Resistant Internet Communications by Parasitizing on WebRTC
Diogo Barradas, Nuno Santos, Luís Rodrigues, Vítor Nunes
https://censorbib.nymity.ch/#Barradas2020a
https://github.com/dmbb/Protozoa

This paper presents a censorship circumvention design called Protozoa. Protozoa belongs to the class of systems that use what the authors call "multimedia covert streaming": disguising a channel to look like the transmission of an audio or video stream. Past such systems have either mimicked the surface-level features of an encrypted media stream (e.g. SkypeMorph), which gives rise to dead-parrot attacks; or they have encoded data into the audio/video signal in a way that survives media compression (e.g. CovertCast), which comes with a loss of efficiency and the challenge of matching packet size and timing features. The main innovation of Protozoa is that while it tunnels through a genuine video streaming application, it doesn't actually exchange properly encoded video streams. Instead, it takes an input video stream (such as the webcam video) as a carrier, scoops out its encoded video bitstream, and replaces it with covert data. The recipient extracts the covert data and throws away the video stream container. This is all done without modifying the sizes or timing of video stream packets, so the traffic characteristics of Protozoa are identical to those of the carrier video. Overall encryption of the media stream prevents an observer from seeing that any traffic replacement has happened. The design, which the authors call "encoded media tunneling," allows for both higher performance and better resistance to traffic analysis. Encoded media tunneling in some ways resembles Slitheen, which also uses an independent carrier traffic generator and opportunistically replaces part of the traffic with covert data.
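The replacement step can be illustrated with a minimal sketch. This is not Protozoa's actual framing: the 2-byte length prefix and the names `embed` and `extract` are assumptions made for illustration, and the encryption that SRTP provides in the real system is left out. The point is only that the output payload always has the same length as the carrier frame payload it replaces.

```python
import os
import struct

# Hypothetical framing: a 2-byte big-endian count of covert bytes in this frame.
HEADER = struct.Struct("!H")


def embed(frame_payload: bytes, covert_queue: bytearray) -> bytes:
    """Replace an encoded frame payload with covert data of the same size."""
    capacity = min(len(frame_payload) - HEADER.size, 0xFFFF)
    if capacity < 0:
        # Too small to hold even the header: send the carrier bytes unchanged.
        return frame_payload
    chunk = bytes(covert_queue[:capacity])
    del covert_queue[:len(chunk)]
    # Fill whatever is left with random chaff so the length never changes.
    filler = os.urandom(len(frame_payload) - HEADER.size - len(chunk))
    return HEADER.pack(len(chunk)) + chunk + filler


def extract(tunnel_payload: bytes) -> bytes:
    """Recover the covert bytes and discard the chaff."""
    if len(tunnel_payload) < HEADER.size:
        return b""
    (count,) = HEADER.unpack_from(tunnel_payload)
    return tunnel_payload[HEADER.size:HEADER.size + count]
```

A sender would call `embed` once per outgoing frame and the receiver `extract` once per incoming frame; in both directions the carrier video alone decides how many bytes go on the wire.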

The authors build a prototype of the system using a version of Chromium that is modified to permit hooking the video transport layer and replacing the video bitstream. They do most of their testing with Whereby, a WebRTC video conferencing service. The client and proxy first share a meeting room identifier out of band. Both parties then enter the meeting room in the modified Chromium and start a meeting. Protozoa takes over the video stream and starts replacing content. Using an established service like Whereby has the advantage that most concerns about WebRTC fingerprinting do not apply: the WebRTC stack comes from a browser, and the browser automatically uses the service's own signaling servers and STUN/TURN servers. The authors build a data set of synthetic traffic and evaluate detectability using a machine learning classifier. Protozoa-tunnelled traffic is barely more detectable than random chance, which is expected, given how it works.

What makes Whereby a suitable media streaming service is that it establishes a direct peer-to-peer WebRTC connection between the two meeting participants, and both peers know to extract data from the video stream and not treat it like actual video. Protozoa would not work with services that intercept the media stream at a middlebox and attempt to decode it, as Discord is reported to do. Reliability and retransmission of data within the (potentially unreliable) media tunnel is handled by the OS kernels at both ends, as with a VPN. The system doesn't have any inherent defense against insider attack or proxy enumeration; as with other covert tunnels, you need to take care that the IP addresses of Protozoa proxies do not become known to censors.

Thanks to the authors for commenting on a draft of this summary.

@wkrp wkrp added the reading group summaries and discussions of research papers and other publications label Dec 4, 2020
@klzgrad

klzgrad commented Dec 5, 2020

I agree with and commend the general idea of layering circumvention traffic on top of a widely used software stack that is repurposed as the transport (or "parasitization," though I'm not sure whether that term feels a bit negative). It is much easier and much more efficient to parrot a software stack by modifying a small part of it than by reimplementing it.

Some comments that I can think of,

  • It is still parroting, and the risk of introducing detectable behavioral differences grows with the amount of code patched on top of the original code base. Some code audit or minimization of patch size would be helpful in ensuring parrots are not introduced in practice. I'd be particularly focused on auditing uncommonly traversed code paths and error handling behaviors, which can be artificially probed but are not well represented by statistical analysis.
  • The architecture here transports IP packets over WebRTC(TLS/TCP), which suffers from the well-known TCP-over-TCP issue and is worse off under packet loss. But I can understand this is easier to implement, otherwise a mux layer would be required.
  • The choice of mimicking WebRTC traffic puts an upper limit on the possible throughput of circumvention traffic as WebRTC is commonly configured for mid-to-low bitrate video streams. This is a limitation of this direction.
  • There is some room to improve usability by minimizing the modified WebRTC stack to a standalone program instead of a full browser instance.
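The throughput ceiling in the third point can be put as simple arithmetic: the covert channel cannot carry more than the carrier video's bitrate, minus whatever per-frame framing the tunnel adds. The sketch below assumes a hypothetical fixed per-frame overhead; the function name and the numbers in the usage line are illustrative, not measurements from the paper.

```python
def covert_capacity_bps(video_bitrate_bps: int,
                        frames_per_second: int,
                        overhead_bytes_per_frame: int) -> int:
    """Upper bound on covert throughput for a given carrier video."""
    overhead_bps = frames_per_second * overhead_bytes_per_frame * 8
    return max(0, video_bitrate_bps - overhead_bps)


# Hypothetical carrier: 1.5 Mbit/s video at 30 frames/s, 2 bytes of framing per frame.
print(covert_capacity_bps(1_500_000, 30, 2))
```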

@wkrp
Member Author

wkrp commented Dec 7, 2020

  • The architecture here transports IP packets over WebRTC(TLS/TCP), which suffers from the well-known TCP-over-TCP issue and is worse off under packet loss. But I can understand this is easier to implement, otherwise a mux layer would be required.

I think you are mistaken on this point. WebRTC is always UDP—it has to be, in order for peer-to-peer connections through NAT to work. Media channels use SRTP. Data channels are SCTP internally, but encapsulated in DTLS, so still UDP. Protozoa uses media channels, which are unreliable, so it's not much different than sending encrypted datagrams with some extra overhead. The mux already exists: it's the Linux kernel at both ends handling reordering and retransmission of the IP datagrams tunneled through Protozoa.
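As a toy illustration of that division of labor (not Protozoa code), the sketch below models the media tunnel as a channel that silently drops datagrams and puts a stop-and-wait sender on top of it, standing in for the TCP implementations in the end hosts' kernels. The function names, loss rate, and seed are all assumptions chosen for the example.

```python
import random


def lossy_send(datagram, rng, loss_rate):
    """Model the unreliable tunnel: deliver the datagram or silently drop it."""
    return datagram if rng.random() >= loss_rate else None


def reliable_transfer(segments, loss_rate=0.3, seed=1, max_tries=50):
    """Resend each segment until it is acknowledged (stop-and-wait)."""
    rng = random.Random(seed)
    received = []
    transmissions = 0
    for seq, data in enumerate(segments):
        for _ in range(max_tries):
            transmissions += 1
            arrived = lossy_send((seq, data), rng, loss_rate)
            if arrived is None:
                continue  # segment lost in the tunnel; sender times out and retries
            if len(received) == seq:
                received.append(arrived[1])  # first copy; later duplicates are ignored
            if lossy_send(seq, rng, loss_rate) is not None:
                break  # acknowledgement made it back
        else:
            raise RuntimeError("gave up after max_tries")
    return received, transmissions
```

The tunnel itself never retransmits anything; every recovery decision is made by the endpoints, which is why there is no reliable-over-reliable stacking.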

@klzgrad

klzgrad commented Dec 9, 2020

@wkrp I stand corrected. Thank you for pointing this out.

@yafeng-Soong

I have a question about §5 (the evaluation section) of this paper. When testing the system's ability to resist traffic analysis attacks, the authors only tested the case where the covert data was YouTube video, the same as the carrier video. The authors then concluded that the system has very good traffic analysis resistance. However, one video stream is not obviously different from another. So I think the authors should also test the cases where the covert data is web pages, files, and emails, to make the conclusion more convincing.
I would be very glad if someone could answer my question. Thanks for helping!

@wkrp
Member Author

wkrp commented May 28, 2021

When testing the system's ability to resist traffic analysis attacks, the authors only tested the case where the covert data was YouTube video, the same as the carrier video. The authors then concluded that the system has very good traffic analysis resistance. However, one video stream is not obviously different from another. So I think the authors should also test the cases where the covert data is web pages, files, and emails, to make the conclusion more convincing.

That's the beauty of the design: the type of the covert data does not matter. That's because the traffic characteristics of the carrier video are independent of the traffic characteristics of the covert data. The size and timing of the carrier's SRTP packets are never modified from what they would be naturally: the only modification is that EFBP payload data is replaced with encrypted covert data of the same size. If there is no covert data available to send at the moment, the payload is replaced with encrypted padding instead. (Section 4.2: "Protozoa will continuously stream video until the termination of the covert session, even when there is no covert traffic to be transmitted; in this case, dummy payload (chaff) is sent.")
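A small simulation makes the independence argument concrete. It is only a sketch under simplifying assumptions: frames are modeled by payload size alone, timing is reduced to send order, and the function name `observe` and the sample numbers are made up for illustration. Whatever covert workload is queued, the sizes an observer sees are exactly the carrier's sizes; only the split between covert bytes and chaff changes, and that split is hidden by encryption.

```python
def observe(carrier_sizes, covert_arrivals):
    """Return (wire_sizes, covert_per_frame, chaff_per_frame).

    carrier_sizes: payload size in bytes of each carrier frame, in send order.
    covert_arrivals: covert bytes that become available before each frame.
    """
    backlog = 0
    wire, covert, chaff = [], [], []
    for size, arriving in zip(carrier_sizes, covert_arrivals):
        backlog += arriving
        carried = min(backlog, size)
        backlog -= carried
        wire.append(size)             # dictated by the carrier alone
        covert.append(carried)        # covert bytes placed in this frame
        chaff.append(size - carried)  # padding that fills the remainder
    return wire, covert, chaff


carrier = [1200, 300, 950, 1100]             # hypothetical frame payload sizes
idle = observe(carrier, [0, 0, 0, 0])        # nothing to send: all chaff
bulk = observe(carrier, [5000, 0, 0, 0])     # a large download: all covert data
trickle = observe(carrier, [100, 0, 400, 0]) # light, bursty browsing
print(idle[0] == bulk[0] == trickle[0] == carrier)
```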

You can compare to a similar design in Slitheen (Section 3.1):

If the resource has a leaf content type, the station will replace the response body with data from the downstream queue pertaining to the Slitheen ID of the flow and change the content type of the resource to "slitheen". It then re-encrypts the modified record, recomputes the TCP checksum, and sends the packet on its way. If there is a shortage of downstream data, the station will replace the resource with garbage bytes, padding the response body to the expected length.

By replacing the leaf resources of valid HTTP requests, Slitheen perfectly imitates an access to an overt site. Regardless of advances in website fingerprinting techniques, a censor will be unable to distinguish between a Slitheen decoy routing session and a regular access to the overt site based on packet sequence patterns such as packet lengths, directionalities, and timings.

You are right, though, that in other cases it is important to consider whether traffic characteristics of the covert traffic affect the cover channel in detectable ways. The argument advanced in Facet (for video) and Mailet (for text-based social media) is that no single cover channel is appropriate for all covert traffic; one should choose the cover channel to match the characteristics of the covert traffic you want to transmit.

⋯it is unclear whether there can be a single cover protocol that can handle arbitrary Internet content. An alternative strategy is to develop a small number of systems, each of which is difficult to detect or block when carrying a specific type of content.

@yafeng-Soong

Thanks for patiently answering my questions! Respect!
