Using proxy and scraping services for hiding servers? #336

Open
mmmray opened this issue Feb 28, 2024 · 6 comments

Comments

@mmmray

mmmray commented Feb 28, 2024

This idea is a bit out there, and I lack the networking knowledge to determine whether it is doable at all.

There are a few companies out there that provide access to a proxy network of "residential IPs", which are basically comparable to botnets. You get a SOCKS5 proxy endpoint, and the TCP connections you establish through that proxy enter the public network through what I assume is somebody's malware-infected mobile phone or home computer. In other words, botnet as a service. These services have relatively high, but not prohibitive, per-gigabyte prices, and are mostly used to bypass detection when scraping websites for data.
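For reference, the SOCKS5 handshake such an endpoint expects is defined in RFC 1928. Below is a minimal Python sketch of how an exit server could open a TCP connection through one. It assumes the proxy accepts the "no authentication" method; commercial residential-proxy services usually require username/password authentication (RFC 1929), which this sketch omits. The function names are hypothetical helpers, not any provider's API.

```python
import socket
import struct

SOCKS_VERSION = 5
NO_AUTH = 0x00
CMD_CONNECT = 0x01
ATYP_IPV4, ATYP_DOMAIN, ATYP_IPV6 = 0x01, 0x03, 0x04


def greeting() -> bytes:
    # VER, NMETHODS, METHODS: offer only "no authentication required".
    return bytes([SOCKS_VERSION, 1, NO_AUTH])


def connect_request(host: str, port: int) -> bytes:
    # VER, CMD, RSV, ATYP, length-prefixed domain name, big-endian port.
    name = host.encode("idna")
    if not 0 < len(name) <= 255:
        raise ValueError("domain name must be 1..255 bytes")
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0x00, ATYP_DOMAIN, len(name)])
    return header + name + struct.pack("!H", port)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("proxy closed the connection")
        buf += chunk
    return buf


def connect_via_socks5(proxy_host: str, proxy_port: int,
                       host: str, port: int) -> socket.socket:
    sock = socket.create_connection((proxy_host, proxy_port))
    try:
        sock.sendall(greeting())
        ver, method = _recv_exact(sock, 2)
        if ver != SOCKS_VERSION or method != NO_AUTH:
            raise ConnectionError("proxy did not accept the no-auth method")
        sock.sendall(connect_request(host, port))
        ver, rep, _rsv, atyp = _recv_exact(sock, 4)
        if ver != SOCKS_VERSION or rep != 0x00:
            raise ConnectionError(f"SOCKS5 CONNECT failed, REP={rep}")
        # Discard BND.ADDR and BND.PORT from the reply.
        if atyp == ATYP_IPV4:
            _recv_exact(sock, 4 + 2)
        elif atyp == ATYP_IPV6:
            _recv_exact(sock, 16 + 2)
        elif atyp == ATYP_DOMAIN:
            (n,) = _recv_exact(sock, 1)
            _recv_exact(sock, n + 2)
        return sock
    except Exception:
        sock.close()
        raise
```

After `connect_via_socks5` returns, the socket carries the tunnelled TCP stream, so the exit server can run its TLS (or website-mimicking) session to the target over it unchanged.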

These companies advertise a pool of "millions of organic IPs". Now I wonder whether this kind of service could be used in the following setup:

  1. There is a middlebox in a censored country. It masquerades as a website and has two hidden endpoints: one for clients to connect to, and one for establishing reverse tunnels to servers outside the GFW.
  2. An exit server in a friendly country attempts to establish connections to that middlebox, concealed as web traffic. Proxy networks are used to hide the source IP in order to make the (client IP, server IP) distribution seem organic, as if foreign visitors were browsing the website.
  3. Clients connect to the middlebox, and their packets travel from the middlebox through the proxy network to the exit server.
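A rough sketch of steps 1 and 3 from the middlebox's side, using Python's asyncio streams. It assumes plain TCP on both hidden endpoints (no TLS, no website camouflage) and that the exit server keeps a pool of idle connections dialed in through the proxy network; the `Middlebox` class and its method names are hypothetical. It only shows the reverse-tunnel pairing: each client connection is spliced onto the next waiting tunnel connection.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes in one direction until EOF, then half-close the destination."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        try:
            writer.write_eof()
        except (OSError, RuntimeError):
            pass  # the other side is already gone


class Middlebox:
    """Pairs each client connection with an idle reverse-tunnel connection."""

    def __init__(self) -> None:
        self.idle_tunnels: asyncio.Queue = asyncio.Queue()

    async def on_tunnel(self, reader, writer) -> None:
        # The exit server dialed in; park the connection until a client needs
        # it, keeping this handler alive in the meantime.
        done = asyncio.Event()
        await self.idle_tunnels.put((reader, writer, done))
        await done.wait()

    async def on_client(self, c_reader, c_writer) -> None:
        # Take the next idle tunnel and splice it to the client both ways.
        t_reader, t_writer, done = await self.idle_tunnels.get()
        await asyncio.gather(pipe(c_reader, t_writer), pipe(t_reader, c_writer),
                             return_exceptions=True)
        c_writer.close()
        t_writer.close()
        done.set()


async def demo() -> tuple:
    box = Middlebox()
    tunnel_srv = await asyncio.start_server(box.on_tunnel, "127.0.0.1", 0)
    client_srv = await asyncio.start_server(box.on_client, "127.0.0.1", 0)
    tunnel_port = tunnel_srv.sockets[0].getsockname()[1]
    client_port = client_srv.sockets[0].getsockname()[1]

    # Exit server side: dial in and wait for forwarded client bytes.
    t_reader, t_writer = await asyncio.open_connection("127.0.0.1", tunnel_port)
    # Censored client side: connect to the middlebox's client endpoint.
    c_reader, c_writer = await asyncio.open_connection("127.0.0.1", client_port)

    c_writer.write(b"ping")
    await c_writer.drain()
    received_by_exit = await t_reader.readexactly(4)
    t_writer.write(b"pong")
    await t_writer.drain()
    received_by_client = await c_reader.readexactly(4)

    c_writer.close()
    t_writer.close()
    tunnel_srv.close()
    client_srv.close()
    return received_by_exit, received_by_client
```

A real deployment would also need TLS and the website facade on both endpoints, plus some framing so the exit server learns where each tunnelled connection should go; both are left out here.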

The idea of middleboxes is certainly not new. What I wonder is: generally speaking, what are the challenges in getting the traffic from the middlebox across the firewall, and am I correct in assuming that making that traffic seem organic is one of them?

@gaukas

gaukas commented Mar 1, 2024

That's an interesting idea. So, from a censor's perspective, your model is effectively equivalent to:

  • Multiple residential IPs from outside of the censorship perimeter connecting to a server inside the perimeter.
  • Multiple residential IPs from within the perimeter connecting to the same server.

While this model may seem fairly common (Baidu.com?), there are still a few discrepancies censors may notice, mainly via traffic shaping detection:

  • TLS-over-TLS pattern: the most controversial traffic shaping problem in the circumvention community. See xue-usenix2024.
  • Flow direction: HTTP round trips are usually asymmetric, in that a relatively small request triggers a relatively large response. Without very smart padding/fragmentation, the direction will look reversed on the leg crossing the censorship perimeter, since the exit server acts as the HTTP client yet sends most of the data, i.e., a small response (HTTP server -> HTTP client) triggering a large request.
  • Connection TTL/Timing: HTTP connections are likely short-lived and start exchanging data immediately once established.
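To make the flow-direction and timing points concrete, here is a hypothetical Python sketch of per-connection features a censor could compute on the leg crossing the perimeter. The `Packet` record, the feature names, and the thresholds in `looks_unlike_browsing` are illustrative assumptions, not taken from any real censor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    t: float           # seconds since the TCP handshake completed
    from_client: bool  # True if sent by the side that opened the connection
    size: int          # TCP payload bytes (0 for pure ACKs)


def flow_features(packets: List[Packet]) -> dict:
    """Summarize one TCP connection as byte counts per direction plus timing."""
    up = sum(p.size for p in packets if p.from_client)
    down = sum(p.size for p in packets if not p.from_client)
    payload_times = [p.t for p in packets if p.size > 0]
    times = [p.t for p in packets]
    return {
        "up_bytes": up,
        "down_bytes": down,
        # Browsing usually gives a ratio well below 1 (small requests, big responses).
        "up_down_ratio": up / down if down else float("inf"),
        # Delay between the handshake and the first byte of payload.
        "first_payload_delay": min(payload_times) if payload_times else None,
        "duration": max(times) - min(times) if times else 0.0,
    }


def looks_unlike_browsing(f: dict, max_ratio: float = 1.0,
                          max_first_payload_delay: float = 1.0) -> bool:
    """Flag a reversed byte ratio or a connection that sat idle before use."""
    reversed_direction = f["up_down_ratio"] > max_ratio
    delay = f["first_payload_delay"]
    idle_start = delay is not None and delay > max_first_payload_delay
    return reversed_direction or idle_start
```

For example, a pooled tunnel connection that sits idle for 40 s and then carries 1,000,000 bytes from the exit server (the HTTP client) against 2,000 bytes back has an `up_down_ratio` of 500.0 and trips both checks, whereas a 400-byte request answered by a 50 KB response does not.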

Please feel free to point out any mistakes I've made. I also believe there are more of the common challenges the circumvention community is currently facing that apply here.

As the points above suggest, web browsing might not be an ideal traffic source to mimic. There could be better candidates, such as online gaming or video streaming/conferencing.

@mmmray
Author

mmmray commented Mar 1, 2024

You're right, there are quite a few features used by some censors that this solution does not cover. I was mainly focused on Iran, where it seems to me the main censorship issue today is per-IP bandwidth limits rather than things like TLS-in-TLS detection.
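If the binding constraint is a per-IP bandwidth limit, much of the proxy pool's value lies in spreading bytes across many source IPs. A hypothetical Python sketch, assuming the limit behaves like a fixed byte budget per IP per time window (the thread does not describe how the throttling actually works) and that each chunk of tunnel traffic can be sent over a connection from any IP in the pool. `IpBudgetPool` is an illustrative name, and the addresses in the usage are documentation addresses.

```python
from typing import Dict, Iterable, Optional


class IpBudgetPool:
    """Spread tunnel traffic over many source IPs under a per-IP byte budget."""

    def __init__(self, ips: Iterable[str], budget_bytes: int) -> None:
        self.budget = budget_bytes
        self.used: Dict[str, int] = {ip: 0 for ip in ips}

    def pick(self, nbytes: int) -> Optional[str]:
        """Return the least-used IP that can still take nbytes, or None."""
        ip = min(self.used, key=self.used.__getitem__)
        if self.used[ip] + nbytes > self.budget:
            # If the least-used IP cannot fit the chunk, no other IP can either.
            return None
        self.used[ip] += nbytes
        return ip

    def new_window(self) -> None:
        """Start a new accounting window (e.g. once per second)."""
        for ip in self.used:
            self.used[ip] = 0
```

Picking the least-used IP keeps per-IP volumes close to each other, which is also closer to the "many foreign visitors, each moderate" picture the middlebox is trying to present.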

@gaukas

gaukas commented Mar 1, 2024

To be fair, overall this is still not a bad idea, since the strongest advantages the circumvention community has against censors are variety and agility. To avoid falling into an endless cat-and-mouse cycle, I believe it is crucial to introduce more novel designs and approaches.

@klzgrad

klzgrad commented Mar 8, 2024

"Connection TTL/Timing: HTTP connections are likely short-lived"

Many are short-lived, but many are also long-lived, e.g., HTTP/2 connections.

The problem is that if a proxy tunnel connection multiplexes several H/2 connections, the tunnel connection will be even longer-lived than each individual H/2 connection. And this cannot be shortened without breaking the payload connections.

Given H/2 payload connections, the only way to shorten the tunnel connection's lifetime is connection migration, which is only available in H/3 (or somewhere else I forget, though wkrp may have mentioned it somewhere here). Overall, though, this dimension is quite difficult to parrot from an engineering standpoint.
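The multiplexing point can be made concrete with interval merging. A minimal Python sketch, assuming the tunnel opens with the first payload connection and can only close once no payload connection is alive (real tunnels may also be kept open while idle, which only makes them longer): each merged interval is one tunnel lifetime, and it is never shorter than any payload connection it carries. `tunnel_lifetimes` is a hypothetical helper name.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def tunnel_lifetimes(payloads: List[Interval]) -> List[Interval]:
    """Merge (start, end) times of payload connections sharing one tunnel."""
    merged: List[List[float]] = []
    for start, end in sorted(payloads):
        if merged and start <= merged[-1][1]:
            # Overlaps the current tunnel session, so the tunnel must stay open longer.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # No payload alive in between: the tunnel could close and reopen here.
            merged.append([start, end])
    return [(s, e) for s, e in merged]
```

With payload connections at (0, 120), (60, 300), (290, 600) and (900, 950), the function returns [(0, 600), (900, 950)]: the first tunnel session lasts 600 s, nearly twice the longest single payload connection (310 s).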

@ValdikSS

@mmmray
Author

mmmray commented Mar 21, 2024

The proxy services I have in mind do not allow listening for inbound connections on a port, hence the need for a middlebox that pretends to be a website with organic traffic. This UPnProxy vulnerability makes it sound like a middlebox might not be necessary at all, meaning that clients could connect directly to a bunch of IPs?


4 participants