💡 Feature request: PROXY/SOCKS support #841

Open
unclearParadigm opened this issue Jul 15, 2023 · 12 comments
Labels
enhancement New feature that aren't already in Reddit

Comments

@unclearParadigm

As an instance maintainer running 6 instances behind a load balancer with 6 public IP addresses, I'm able to operate a public instance with only 30% rate-limiting from Reddit's API side throughout a day, while other instances are constantly 429 rate-limited. That realization led me to experiment a little with piping libreddit's traffic through the TOR network, and it seems to work well enough. For containers, it works reasonably well with David Personette's Tor Proxy. Setups with proxychains also work quite well. However, the experience is currently not as great as it could be. If libreddit supported a PROXY configuration and/or a SOCKS proxy, it'd be easy to spin up a TOR proxy on a host system and route libreddit's traffic directly through the TOR network, without hacks that force all of the host's traffic through TOR.

Suggestion

make a (socks-)proxy settable through well established env-vars

HTTP_PROXY
HTTPS_PROXY
http_proxy
https_proxy
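
A minimal sketch of how these variables might be resolved in Rust, using only the standard library. The function names and the precedence order are assumptions made for illustration (tools disagree on whether upper- or lower-case wins, and falling back to `HTTP_PROXY` for HTTPS targets is a choice, not a standard); nothing here is libreddit's actual code.

```rust
use std::env;

/// Variable names checked for HTTPS targets, highest precedence first.
/// ASSUMPTION: this order is illustrative; it is not taken from libreddit.
const HTTPS_PROXY_VARS: [&str; 4] = ["HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"];

/// Returns the first non-empty proxy URL found through `lookup`.
/// Taking the lookup as a closure keeps the logic testable without
/// touching the real process environment.
fn proxy_from_lookup<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    HTTPS_PROXY_VARS
        .iter()
        .filter_map(|&name| lookup(name))
        .map(|value| value.trim().to_string())
        .find(|value| !value.is_empty())
}

/// Convenience wrapper that reads the real process environment.
fn proxy_from_env() -> Option<String> {
    proxy_from_lookup(|name| env::var(name).ok())
}
```

The resulting URL would still have to be handed to whatever connector the HTTP client uses; that part depends on the stack and is not shown.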

However...

  • I'm not 100% sure whether this counts as "abuse" of the TOR network, as quite some traffic might be generated.
  • I'm not sure whether the Rust stack libreddit is using supports them (even though I believe it should).

Any other/further thoughts on that?

@unclearParadigm unclearParadigm added the enhancement New feature that aren't already in Reddit label Jul 15, 2023
@ghost

ghost commented Jul 15, 2023

Side note: it may require a semi-large rework, but we could utilize Arti (GitLab Repo; lib.rs page) instead of a proxy to a localhost TOR server. That would require less setup: a TOR localhost proxy requires you to run TOR in the background, while Arti implements all the logic itself, so you don't need anything except the libreddit program to be running.

@stulto

stulto commented Jul 16, 2023

However...

* I'm not 100% sure if that does not count as "abuse" of the TOR network as quite some traffic might be generated.

I feel if the instance also hosted TOR guard and relay nodes this shouldn't be an issue, and would actually help other instances/peers on the network.

Though there's still the problem of exit nodes... maybe have 2/3 of connections go through the TOR network (1/3 through exit nodes and the other 1/3 through Reddit's hidden service) and the other 1/3 go through clearnet. That way there'd be equal compensation for the use of exit nodes, and adequate compensation for the guard and relay nodes.

Though it might be smart to make this a support layer that libreddit can sit atop instead of something baked directly into the project (as it would facilitate reuse in other privacy-respecting frontends). Say a session-based connection scrambler, where every session is randomly distributed over TOR, (maybe I2P,) and clearnet, over a predetermined list of server names ("{old|www}.reddit.com" have their own hidden services "{old|www}.redditto...wfj4ooad.onion" which could also be used for scraping (I'm of course assuming that old.reddit.com and www.reddit.com would have independent rate limits as they're distinct servers, but that may be wrong)).
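
The session-based distribution described above could be sketched roughly as follows. The `Route` enum and `route_for_session` are hypothetical names invented for this illustration, and hashing the session ID is a swapped-in stand-in for "randomly distributed": it spreads sessions about evenly over the three paths while keeping each session on one path.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// The three network paths mentioned in the comment above.
/// ASSUMPTION: hypothetical type, not part of libreddit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Route {
    TorExit,      // via a TOR exit node to the clearnet site
    OnionService, // via Reddit's hidden service
    Clearnet,     // direct connection
}

const ROUTES: [Route; 3] = [Route::TorExit, Route::OnionService, Route::Clearnet];

/// Picks a route for a session by hashing its ID, so the same session
/// always maps to the same route and sessions split roughly into thirds.
fn route_for_session(session_id: &str) -> Route {
    let mut hasher = DefaultHasher::new();
    session_id.hash(&mut hasher);
    let index = (hasher.finish() % ROUTES.len() as u64) as usize;
    ROUTES[index]
}
```

Weighting the split differently, or adding I2P, would only change the `ROUTES` table; how each route is actually dialed is out of scope for this sketch.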

Just a thought.

@seychelles111

Hey i've utilized an idea to "educationally-use"-IPV6 #845

imo use an ipv6 per user.... maybe strong server would be needed ... idk

@seychelles111

Time to route different /64 IPv6 addresses - or limit by /48 addresses

@ghost

ghost commented Aug 13, 2023

I kind-of dirty patched TOR routing into libreddit with arti-hyper. It works, but it has a considerable delay because it creates a new TOR client for every request. That works great for anonymity and dodging 429s, but not so great for avoiding delays, so we'll want to store a global client and only reload when we hit a 429 in production probably.

src/client.rs:

diff --git a/src/client.rs b/src/client.rs
index 4c174cd..ca3eef6 100644
--- a/src/client.rs
+++ b/src/client.rs
@@ -1,24 +1,20 @@
+use arti_client::*;
+use arti_hyper::*;
 use cached::proc_macro::cached;
 use futures_lite::{future::Boxed, FutureExt};
-use hyper::client::HttpConnector;
-use hyper::{body, body::Buf, client, header, Body, Client, Method, Request, Response, Uri};
-use hyper_rustls::HttpsConnector;
+use hyper::{body, body::Buf, client, header, Body, Method, Request, Response, Uri};
 use libflate::gzip;
-use once_cell::sync::Lazy;
 use percent_encoding::{percent_encode, CONTROLS};
 use serde_json::Value;
 use std::{io, result::Result};
+use tls_api::{TlsConnector as TlsConnectorTrait, TlsConnectorBuilder};
+use tls_api_native_tls::TlsConnector;
 
 use crate::dbg_msg;
 use crate::server::RequestExt;
 
 const REDDIT_URL_BASE: &str = "https://www.reddit.com";
 
-static CLIENT: Lazy<Client<HttpsConnector<HttpConnector>>> = Lazy::new(|| {
-	let https = hyper_rustls::HttpsConnectorBuilder::new().with_native_roots().https_only().enable_http1().build();
-	client::Client::builder().build(https)
-});
-
 /// Gets the canonical path for a resource on Reddit. This is accomplished by
 /// making a `HEAD` request to Reddit at the path given in `path`.
 ///
@@ -75,7 +71,12 @@ async fn stream(url: &str, req: &Request<Body>) -> Result<Response<Body>, String
 	let uri = url.parse::<Uri>().map_err(|_| "Couldn't parse URL".to_string())?;
 
 	// Build the hyper client from the HTTPS connector.
-	let client: client::Client<_, hyper::Body> = CLIENT.clone();
+	let client: client::Client<_, hyper::Body> = {
+		let tor_client = TorClient::builder().bootstrap_behavior(BootstrapBehavior::OnDemand).create_unbootstrapped().unwrap();
+		let tls_connector = TlsConnector::builder().unwrap().build().unwrap();
+		let tor_connector = ArtiHttpConnector::new(tor_client, tls_connector);
+		hyper::Client::builder().build(tor_connector)
+	};
 
 	let mut builder = Request::get(uri);
 
@@ -129,7 +130,12 @@ fn request(method: &'static Method, path: String, redirect: bool, quarantine: bo
 	let url = format!("{}{}", REDDIT_URL_BASE, path);
 
 	// Construct the hyper client from the HTTPS connector.
-	let client: client::Client<_, hyper::Body> = CLIENT.clone();
+	let client: client::Client<_, hyper::Body> = {
+		let tor_client = TorClient::builder().bootstrap_behavior(BootstrapBehavior::OnDemand).create_unbootstrapped().unwrap();
+		let tls_connector = TlsConnector::builder().unwrap().build().unwrap();
+		let tor_connector = ArtiHttpConnector::new(tor_client, tls_connector);
+		hyper::Client::builder().build(tor_connector)
+	};
 
 	// Build request to Reddit. When making a GET, request gzip compression.
 	// (Reddit doesn't do brotli yet.)

Then I just added arti_client, arti_hyper, tls_api, and tls_api_native_tls to Cargo.toml.
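
The "store a global client and only reload when we hit a 429" idea from the start of this comment might look something like the sketch below. It is deliberately generic over the client type so it compiles without Arti; `ReloadableClient` and `observe_status` are hypothetical names, and in practice the `build` closure would be where the `TorClient` and `ArtiHttpConnector` from the patch get constructed.

```rust
use std::sync::{Arc, RwLock};

/// Holds one shared client and rebuilds it only on request.
/// ASSUMPTION: hypothetical helper, not part of libreddit or Arti.
struct ReloadableClient<C> {
    current: RwLock<Arc<C>>,
    build: Box<dyn Fn() -> C + Send + Sync>,
}

impl<C> ReloadableClient<C> {
    /// Builds the first client eagerly and keeps the factory for reloads.
    fn new(build: impl Fn() -> C + Send + Sync + 'static) -> Self {
        let first = Arc::new(build());
        Self { current: RwLock::new(first), build: Box::new(build) }
    }

    /// Returns a cheap handle to the current client.
    fn get(&self) -> Arc<C> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Feed every response status in here. On 429 the client is rebuilt
    /// and `true` is returned; any other status leaves it untouched.
    fn observe_status(&self, status: u16) -> bool {
        if status != 429 {
            return false;
        }
        let fresh = Arc::new((self.build)());
        *self.current.write().unwrap() = fresh;
        true
    }
}
```

Requests that are already in flight keep their old `Arc` handle, so a reload does not interrupt them. Wrapping one instance in a `once_cell::sync::Lazy` static, as the removed `CLIENT` did, would make it global.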

@artemislena
Contributor

T.: There's https://git.spec.cat/Nyaaori/libreddit also which uses Arti, too. Doesn't seem particularly slow either; we're using't for lr.artemislena.eu currently.

@ghost

ghost commented Sep 15, 2023

@artemislena looks great :) thanks for sharing. maybe you could set up a github mirror and open a pull request here so your improvements are available to more people?

@avincent98144

avincent98144 commented Sep 15, 2023 via email

@avincent98144

avincent98144 commented Sep 15, 2023 via email

@ghost

ghost commented Sep 15, 2023

@avincent98144 ...can you elaborate? i can't really tell what you're trying to say. if you meant to say Tanith's links lead nowhere, that's not true (at least not for me):

[two screenshots]

@avincent98144

avincent98144 commented Sep 15, 2023 via email

@artemislena
Contributor

artemislena commented Sep 16, 2023

T.: It's not our Forgejo instance, we didn't make the fork, n we don't got enough experience in Rust programming (or enough interest in programming in general) for doing this ^^; I mean sure we could open a PR but we can't provide any support on't, beyond on how ya host't; for the container it's recommended mounting /data in a persistent location (owned by UID 1000 in the container) for faster startup.
@avincent98144 Idk, the Forgejo link works fine for us, but ya can use https://send.artemislena.eu/download/08239cf210c1cd8f/#Qx11ppk5NmwGySBcsx-IRg for downloading a tarballa the repo (link's gonna expire after 100 downloads or 7 days).
