Proxy support #176

Closed
raphCode opened this issue Mar 13, 2022 · 8 comments
Labels
downloader enhancement New feature or request good first issue Good for newcomers

Comments

@raphCode
Contributor

raphCode commented Mar 13, 2022

I tried using a proxy to download via another IP, but couldn't get it to work via the http_proxy or HTTP_PROXY environment variables:

First, check the "normal" public IP. Then set a proxy; I tried some from this list until one worked: https://freeproxylists.net/
Check that the proxy works with the curl command (it should return the proxy IP).
Lastly, run suckit with the proxy and observe that the IP in the downloaded webpage is still the "normal" IP, not the proxy's.

curl ifconfig.me
prox=175.144.112.239:80
http_proxy=$prox curl ifconfig.me
http_proxy=$prox suckit -v http://ifconfig.me

Still, something is done with the proxy, since an invalid proxy IP leads to a timeout or connection failure, and the latency is higher than in a non-proxy run.


Besides this bug, a feature idea would be to let suckit accept multiple proxies and split the requests across them. This could further speed up downloading, since less traffic is issued from a single IP.
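
To make the multiple-proxy idea concrete, here is a minimal sketch of round-robin selection. `ProxyPool` and `pick` are hypothetical names and nothing like this exists in suckit today; the sketch only shows how requests could be spread across several proxies from multiple worker threads.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical helper (not part of suckit): hands out proxies
/// from a fixed list in round-robin order.
pub struct ProxyPool {
    proxies: Vec<String>,
    next: AtomicUsize,
}

impl ProxyPool {
    /// Returns `None` for an empty list, since there is nothing to rotate through.
    pub fn new(proxies: Vec<String>) -> Option<Self> {
        if proxies.is_empty() {
            None
        } else {
            Some(Self {
                proxies,
                next: AtomicUsize::new(0),
            })
        }
    }

    /// Picks the proxy for the next request. The atomic counter makes this
    /// safe to call from several worker threads through a shared reference.
    pub fn pick(&self) -> &str {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.proxies.len();
        &self.proxies[i]
    }
}
```

Each worker would call `pick()` before issuing a request and route that request through the returned address. A real implementation would probably also need to skip proxies that stop responding, which this sketch leaves out.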

@Skallwar
Owner

Reqwest seems to support this. https://docs.rs/reqwest/latest/reqwest/struct.Proxy.html

We could read the environment variable before creating the Downloader
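
A rough sketch of what reading the environment before creating the Downloader could look like, using only the standard library. `ProxyConfig` and the function names are hypothetical, and checking the lowercase variable before the uppercase one is an assumption. The resulting strings could then be handed to `reqwest::Proxy::http` and `reqwest::Proxy::https` when building the client.

```rust
use std::env;

/// Proxy settings gathered from the environment
/// (hypothetical type, not part of suckit).
#[derive(Debug, Default, PartialEq)]
pub struct ProxyConfig {
    pub http: Option<String>,
    pub https: Option<String>,
}

/// Returns the first non-empty value among the given variable names.
fn first_set<F>(names: &[&str], lookup: &F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    names
        .iter()
        .filter_map(|&name| lookup(name))
        .find(|value| !value.trim().is_empty())
}

/// Builds the config from an arbitrary lookup function,
/// which keeps the logic testable without touching the real environment.
pub fn proxy_config_from<F>(lookup: F) -> ProxyConfig
where
    F: Fn(&str) -> Option<String>,
{
    ProxyConfig {
        // Assumption: the lowercase variable wins over the uppercase one.
        http: first_set(&["http_proxy", "HTTP_PROXY"], &lookup),
        https: first_set(&["https_proxy", "HTTPS_PROXY"], &lookup),
    }
}

/// Reads the real process environment.
pub fn proxy_config_from_env() -> ProxyConfig {
    proxy_config_from(|name: &str| env::var(name).ok())
}
```

Calling `proxy_config_from_env()` once at startup would give the Downloader both values, and a missing or empty variable simply ends up as `None`.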

@Skallwar Skallwar reopened this Mar 16, 2022
@Skallwar Skallwar added enhancement New feature or request good first issue Good for newcomers downloader labels Mar 16, 2022
@raphCode
Contributor Author

What I find strange is that the environment variable already seems to be read and processed somehow:

  • the download takes significantly longer with a proxy
  • bad proxy strings lead to failed downloads
  • running strings on the binary shows that it contains the strings http_proxy and HTTP_PROXY

However, the downloaded page for the ifconfig.me website does not show the proxy IP.

@Skallwar
Owner

Skallwar commented Mar 17, 2022

My knowledge of proxies is quite limited, but I agree with you, something is not right.

@Skallwar
Owner

From what I read here, it should "just work"™

@Skallwar
Owner

Skallwar commented Mar 17, 2022

This works for me (with some warnings and retries for the proxy connection):

https_proxy=147.135.134.57:9300 suckit https://ifconfig.me
https_proxy=147.135.134.57:9300 suckit http://ifconfig.me

The IP I get is not my real IP.

Note that I'm using https_proxy and not http_proxy. If I use your http_proxy together with a random https_proxy, it does not work. I think that, for some reason, we are making https requests even when the URL specifies http.
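
For illustration, this is the usual convention followed by tools such as curl: the proxy is chosen by the scheme of the request URL, so http_proxy never applies to an https request. The function below is a hypothetical sketch of that rule, not suckit's or reqwest's actual code. If the request ends up being made over https, as suspected above, only https_proxy would take effect.

```rust
/// Returns the proxy that applies to `url`, based only on the URL scheme.
/// Hypothetical sketch of the common convention, not suckit's actual logic.
pub fn proxy_for<'a>(
    url: &str,
    http_proxy: Option<&'a str>,
    https_proxy: Option<&'a str>,
) -> Option<&'a str> {
    // Everything before "://" is the scheme; URLs without one match nothing.
    let (scheme, _rest) = url.split_once("://")?;
    match scheme.to_ascii_lowercase().as_str() {
        "http" => http_proxy,
        "https" => https_proxy,
        _ => None,
    }
}
```

With only http_proxy set, `proxy_for("https://ifconfig.me", ...)` yields `None`, meaning the request would go out directly from the real IP.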

@raphCode
Contributor Author

raphCode commented Mar 21, 2022

Nice catch, I can confirm it works when both https_proxy and http_proxy are set.
It seems suckit retrieves the content via https and makes additional http requests for something else.
If I had to guess, there is some code that resolves URLs, which is responsible for the http requests. (I remember a unit test which tries to resolve an invalid lwn.net URL and looks for a redirect.)

My public server IP got blocked from scraping a particular website, so I can tell it needs both kinds of proxies to circumvent the block.


For future readers:
For multithreaded downloading via proxies to work, the constants MAX_EMPTY_RECEIVES and SLEEP_MILLIS may need to be adjusted upwards. Otherwise all worker threads exit prematurely: they receive no work within the time interval because of the increased proxy latency.
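
To show why those two constants matter, here is a minimal sketch of a worker loop that gives up after a number of consecutive empty receives. It uses `std::sync::mpsc` for simplicity. The constant values and the loop structure are assumptions for illustration and may not match how suckit actually uses MAX_EMPTY_RECEIVES and SLEEP_MILLIS.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

// Illustrative values only; the real constants in suckit may differ.
const MAX_EMPTY_RECEIVES: usize = 10;
const SLEEP_MILLIS: u64 = 100;

/// Handles items until the channel stays empty for `max_empty`
/// consecutive waits of `sleep` each, then returns how many items were handled.
pub fn worker_loop<T>(
    rx: &Receiver<T>,
    max_empty: usize,
    sleep: Duration,
    mut handle: impl FnMut(T),
) -> usize {
    let mut handled = 0;
    let mut empty = 0;
    while empty < max_empty {
        match rx.recv_timeout(sleep) {
            Ok(item) => {
                handle(item);
                handled += 1;
                empty = 0; // work arrived, so reset the idle counter
            }
            Err(RecvTimeoutError::Timeout) => empty += 1,
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    handled
}

/// Longest idle period a worker tolerates before exiting.
pub fn idle_budget(max_empty: usize, sleep: Duration) -> Duration {
    sleep * max_empty as u32
}

/// Idle budget with the illustrative constants above.
pub fn default_idle_budget() -> Duration {
    idle_budget(MAX_EMPTY_RECEIVES, Duration::from_millis(SLEEP_MILLIS))
}
```

With these example values a worker exits after about one second without new work. If a single request through a slow proxy takes longer than that to produce new URLs, every worker can run out of patience at the same time, which is why raising either constant helps.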

@Skallwar
Owner

My public server IP got blocked from scraping a particular website

Typical SuckIT

Should we close this?

@raphCode
Contributor Author

raphCode commented Mar 22, 2022

As far as I am concerned, yes.
Unless you want to keep it open for the multiple-proxy feature. That was just an idea, not something I would contribute personally.
