Revert back to http connection #957

Closed
tleydxdy opened this issue Dec 10, 2019 · 23 comments
Labels
question Further information is requested

Comments

@tleydxdy
Contributor

tleydxdy commented Dec 10, 2019

Now that using QUIC will be banned too, I think we should go back to using HTTP, since it has fewer dependencies and is less fragile (on Alpine at least). Or offer it as a build option?

@unixfox
Member

unixfox commented Dec 10, 2019

How do you know that the QUIC workaround is banned by Google? I agree that it may fix #917, but if the workaround still works, why revert to a state where almost every Invidious instance would stop working due to the Google captcha?

@haizrul

haizrul commented Dec 10, 2019

Using https://anti-captcha.com will solve the captcha problem.

@tleydxdy
Contributor Author

@unixfox because people are being banned?

@unixfox
Member

unixfox commented Dec 10, 2019

@tleydxdy I haven't experienced any ban at all (my instance is yewtu.be).
The only thing I did was block all the API endpoints except for the ones that the web interface uses. This most likely reduced the number of requests to Google, and I haven't experienced any ban yet.
I'm pretty sure that if Invidious weren't using the QUIC workaround, my instance would have been banned a long time ago (based on my experience dealing with reCAPTCHA on Searx).

@haizrul

haizrul commented Jan 8, 2020

> @tleydxdy I haven't experienced any ban at all (my instance is yewtu.be).
> The only thing I did was block all the API endpoints except for the ones that the web interface uses. This most likely reduced the number of requests to Google, and I haven't experienced any ban yet.
> I'm pretty sure that if Invidious weren't using the QUIC workaround, my instance would have been banned a long time ago (based on my experience dealing with reCAPTCHA on Searx).

Hi sir, may I know how to block all the API endpoints except for web uses like you said? I want to implement it on my instance too. Please help.

@unixfox
Member

unixfox commented Jan 8, 2020

> Hi sir, may I know how to block all the API endpoints except for web uses like you said? I want to implement it on my instance too. Please help.

I just used the status directive of Caddy like this:

status 403 {
       /api/v1/videos
       /api/v1/channels
       /api/v1/search
       /api/v1/mixes
}
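
For readers not using Caddy, the effect of the rule above can be sketched in Python. This is only an illustration of the matching logic, assuming the listed paths are matched as prefixes; `BLOCKED_PREFIXES` and `status_for` are hypothetical names, not part of Caddy or Invidious.

```python
# Hypothetical names; the four prefixes are the ones from the Caddy rule above.
BLOCKED_PREFIXES = (
    "/api/v1/videos",
    "/api/v1/channels",
    "/api/v1/search",
    "/api/v1/mixes",
)


def status_for(path):
    """Return 403 for a blocked API path, or None to let the request through."""
    # Assumption: a request is blocked when its path starts with a listed prefix.
    if path.startswith(BLOCKED_PREFIXES):
        return 403
    return None


print(status_for("/api/v1/videos/abc"))  # a blocked API call
print(status_for("/watch"))              # a regular web interface page
```

Any API path not in the list, and every regular page, falls through to the normal proxy handling.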

@haizrul

haizrul commented Jan 8, 2020

I use Debian 10, can you advise which file I should edit?

@unixfox
Member

unixfox commented Jan 8, 2020

If you installed the Caddy webserver with this script: https://github.com/sayem314/Caddy-Web-Server-Installer
Then it's in the /etc/Caddyfile

@haizrul

haizrul commented Jan 8, 2020

> If you installed the Caddy webserver with this script: https://github.com/sayem314/Caddy-Web-Server-Installer
> Then it's in the /etc/Caddyfile

OK sir, I will try it. Thanks a lot for the help! 👍

omarroth added the question (Further information is requested) label on Jan 9, 2020
@omarroth
Contributor

omarroth commented Jan 20, 2020

There are two different kinds of CAPTCHAs:


The first is similar to one of the reported errors in TeamNewPipe/NewPipe#2924, and looks like this:
[screenshot of the first CAPTCHA type]

(For reference, the "submit" button makes a POST request to https://www.youtube.com/das_captcha, with the result of the CAPTCHA as "g-captcha-response" IIRC).

After a successful POST YouTube returns a new cookie goojf that the client can then use for subsequent requests.


The second one is more generic and looks like this:
[screenshot of the second CAPTCHA type]

After a successful POST (to https://www.google.com/sorry/index...) you receive a GOOGLE_ABUSE_EXEMPTION cookie that is valid for around 6 hours (the cookie itself has an expires value or similar that you can use).


The goojf cookie provided by the first does not consistently prevent future captchas, and is not practical to bypass using something like anti-captcha (see #886). This captcha is completely bypassed when using QUIC. This is also why you will never see this type of CAPTCHA when using Chrome (except on first load), since all subsequent requests use QUIC.

The GOOGLE_ABUSE_EXEMPTION cookie will consistently prevent captchas from appearing until it expires. This is the captcha that is actually being bypassed when using anti-captcha.
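
Since the GOOGLE_ABUSE_EXEMPTION cookie carries its own expiry, a client could check that value rather than hard-coding six hours. A minimal Python sketch, assuming the expiry arrives as a standard `expires` attribute in a `Set-Cookie` header; the header value in the example is made up, and `exemption_expiry` and `exemption_is_valid` are hypothetical helpers:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from http.cookies import SimpleCookie

COOKIE_NAME = "GOOGLE_ABUSE_EXEMPTION"


def exemption_expiry(set_cookie_header):
    """Return the cookie's expiry as an aware datetime, or None if absent."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    if morsel is None or not morsel["expires"]:
        return None
    return parsedate_to_datetime(morsel["expires"])


def exemption_is_valid(set_cookie_header, now):
    """True while the exemption cookie has not yet expired at time `now`."""
    expiry = exemption_expiry(set_cookie_header)
    return expiry is not None and now < expiry


# Made-up header value, for illustration only.
header = "GOOGLE_ABUSE_EXEMPTION=placeholder; path=/; expires=Wed, 22 Jan 2020 18:00:00 GMT"
print(exemption_expiry(header))
print(exemption_is_valid(header, datetime.now(timezone.utc)))
```

Once `exemption_is_valid` turns false, the client would need to solve a new captcha to obtain a fresh cookie.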

@unixfox
Member

unixfox commented Jan 22, 2020

@omarroth
Do you plan to support the GOOGLE_ABUSE_EXEMPTION cookie for anti-recaptcha? My instance is not blocked for viewing videos, only for channels.
When Invidious fetches the channel info, it gets the second type of block that you described, the one with "/sorry/index".

As a result, the automatic captcha solving doesn't work, because Invidious doesn't check whether the instance is only partially blocked, for example only when fetching channels.

@omarroth
Contributor

omarroth commented Jan 22, 2020

> Do you plan to support the cookie GOOGLE_ABUSE_EXEMPTION for anti-recaptcha?

This is the only cookie that is currently supported.

For clarification, what does e.g.

$ curl -sD - -o /dev/null 'https://www.youtube.com/browse_ajax?continuation=4qmFsgI8EhhVQ2EzamdoSUxCa3BiTW03bnBoeGlCcUEaIEVnWjJhV1JsYjNNd0FqZ0JZQUZxQUxnQkFDQUFlZ0V4&gl=US&hl=en'

return for you? (you may also need to specify curl -4 or curl -6).

@unixfox
Member

unixfox commented Jan 22, 2020

That's strange, because the automatic anti-recaptcha never activates. I thought the anti-recaptcha was only designed for watching videos, according to the source code: https://github.com/omarroth/invidious/blob/master/src/invidious/helpers/jobs.cr#L239

I'm on my phone, but the curl command should return the same second page with "our systems have detected...".

Every time I fetch a channel I get a JSON::ParseException like the one described in #963.

@unixfox
Member

unixfox commented Jan 23, 2020

My bad, you are right @omarroth, it does indeed support the GOOGLE_ABUSE_EXEMPTION cookie.
But as you can see, it only checks whether the instance is blocked for video loading: https://github.com/omarroth/invidious/blob/master/src/invidious/helpers/jobs.cr#L239.
I modified the URL to /browse_ajax?continuation=4qmFsgI8EhhVQ2EzamdoSUxCa3BiTW03bnBoeGlCcUEaIEVnWjJhV1JsYjNNd0FqZ0JZQUZxQUxnQkFDQUFlZ0V4&gl=US&hl=en and the anticaptcha worked.

Can you add that new URL to the source code, or come up with a way to detect when a request made by Invidious is redirected to /sorry/index and then trigger the bypass_captcha function?
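
The detection asked for here could look roughly like the following Python sketch. It assumes the block shows up as an HTTP redirect whose Location header points at a /sorry/ path on a google.com host; `is_sorry_redirect` is a hypothetical helper, not Invidious code (Invidious itself is written in Crystal).

```python
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_sorry_redirect(status, location):
    """Guess whether a response is a redirect to Google's /sorry/ captcha page."""
    if status not in REDIRECT_CODES or not location:
        return False
    parts = urlsplit(location)
    host = parts.hostname or ""
    # Accept google.com and its subdomains only, not look-alike hosts.
    on_google = host == "google.com" or host.endswith(".google.com")
    return on_google and parts.path.startswith("/sorry/")


print(is_sorry_redirect(302, "https://www.google.com/sorry/index?continue=x"))
print(is_sorry_redirect(302, "https://www.youtube.com/watch?v=abc"))
```

A check like this could run on every outgoing request, so that a partial block (channels only, for example) also triggers the captcha bypass.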

@Perflyst
Contributor

I see similar behavior, but with video information such as comments, likes, etc.

@artths

artths commented Mar 11, 2020

> After a successful POST (to https://www.google.com/sorry/index...) you receive a GOOGLE_ABUSE_EXEMPTION cookie that is valid for around 6 hours (the cookie itself has an expires value or similar that you can use).

I'm trying to implement anti-captcha for NewPipe. Currently I receive the second type of captcha ("https://www.google.com/sorry/index...") and try to POST three params: "q", "continue" and "g-recaptcha-response", but I never receive the GOOGLE_ABUSE_EXEMPTION cookie or any redirect URL. What am I doing wrong?

@unixfox
Member

unixfox commented Mar 11, 2020

What's the error message given by Google? Also what's the status code when doing a request? If it's a 400 status code then there is something wrong in your code.

@artths

artths commented Mar 11, 2020

It stays on the same page with the same URL "https://www.google.com/sorry/index.." and a 429 status code, as if I hadn't posted at all.

@unixfox
Member

unixfox commented Mar 11, 2020

Is your request a POST request?
Also, is your request body form-encoded (like a query string), with a Content-Type header of application/x-www-form-urlencoded?
It is also preferable to specify the referrer.
Here is an example of a body made by a browser:

g-recaptcha-response: 03AERD8Xp5eQ8xX4nwTMr3_8OzfFyoU4IDcMW6ealj6gUNVsCSmB2AlZDuXtKkjIoCICyO5ZBK_mFfGKaXOjGqkHNvVkXhHmAPNCsU2FRip2hweFGYSVrgRzVRyeVKStSFM5WkLfxMXlp_2L-Liu6JCPo_LS_-0yJqA1zyAN6diQRyqEduU7qp6Lo0MhciuTj0SlAxzV2WDaIgubS_pd9x8gqfsCa6rEJ2y8tVyD-m_k1TJmcrUQlpsuRMnRfsM2BFggApYZ8TGTC5y-breO3IlnMsxKMa9-g6jt3IBVHE3BZ8mMcdTdp1A0En7_fkeZvpUM7BKTtwVu9Y4fc-9G5aeDRp6D8RseAN-rEng9S6lA_g91EhGqaaw33vZt4S0HQMbMqVeCoVCrdGtpevIUrEfjSrv7RjSUVC8WQzRmwAc4R4KDIqC_DQ_tGf5dBpY9HMihJvhP-twAdRTPWsDUDlrirpdL19bWimHg
q: EhAgAQZ8JmAEJQABAAAAAAmCGLqqo_MFIhkA8aeDS6ASm_qRFdynMgfJqm_jtxy0t4GDMgFy
continue: https://www.google.com/search?q=test

The best way to know if your request is correct or incorrect is to use a proxy like mitmproxy and compare your request with a request made in a browser.

EDIT: Here is example code from one of my projects: https://github.com/unixfox/proxy-sorry-google-recaptcha/blob/master/anticaptcha.js#L53. I hope this will help you.
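
As a language-neutral illustration of the checklist above (POST method, form-encoded body, Content-Type and Referer headers), here is a hedged Python sketch that only builds the request and does not send it. `build_captcha_post` is a hypothetical helper; the URL, token and `q` value in the usage lines are placeholders, not real values.

```python
import urllib.request
from urllib.parse import urlencode


def build_captcha_post(post_url, token, q, continue_url, referer):
    """Build (but do not send) a form-encoded POST with the three fields."""
    body = urlencode({
        "g-recaptcha-response": token,
        "q": q,
        "continue": continue_url,
    }).encode("ascii")
    return urllib.request.Request(
        post_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Referer": referer,
        },
    )


# Placeholder values, for illustration only.
request = build_captcha_post(
    "https://example.invalid/sorry",
    "TOKEN",
    "QVALUE",
    "https://www.google.com/search?q=test",
    "https://example.invalid/sorry",
)
print(request.get_method(), request.data.decode("ascii"))
```

Printing the encoded body makes it easy to compare against what a browser sends when inspected through a proxy such as mitmproxy.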

@artths

artths commented Mar 11, 2020

Yes, I do POST with okhttp3.

            FormBody.Builder formBodyBuilder = new FormBody.Builder();

            for (Map.Entry<String, String> entry : mCaptchaInputs.entrySet()) {
                formBodyBuilder.add(entry.getKey(), entry.getValue());
            }

            okhttp3.Request request = new okhttp3.Request.Builder()
                    .url(mCaptchaPostUrl)
                    .addHeader("User-Agent", USER_AGENT)
                    .addHeader("Accept-Language", "en-GB, en;q=0.9")
                    .addHeader("Content-Type", "application/x-www-form-urlencoded")
                    .addHeader("X-YouTube-Client-Name", "1")
                    .addHeader("X-YouTube-Client-Version", "2.20200214.04.00")
                    .post(formBodyBuilder.build())
                    .build();

            okhttp3.Response response = client.newCall(request).execute();

I can confirm that the "q" value I post is the same as the one in the page.

I see omarroth closes the previous connection just before the POST. I parse the page, close the connection, wait for the captcha task and then POST. Could that be the reason?

@artths

artths commented Mar 11, 2020

LOL. In case anybody needs this:
It was okhttp's automatic redirect handling. I was getting a cookie and a redirect to the original URL. A regular browser would set the cookie and then follow the redirect, but I was following the redirect without setting the cookie, so I got redirected again to a new captcha page.
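
The same pitfall exists in other HTTP clients that follow redirects automatically. A hedged Python sketch of one way around it, using only the standard library: refuse to follow the redirect, then read the cookie and target from the redirect response itself. `NoRedirect` and `post_without_redirect` are hypothetical names, and this is not the okhttp fix used in NewPipe.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the 3xx response can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib surface the 3xx response as an HTTPError.
        return None


def post_without_redirect(url, body):
    """POST `body` (bytes) and return (status, Set-Cookie values, Location)."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, data=body, method="POST")
    try:
        with opener.open(request, timeout=10) as response:
            status, headers = response.status, response.headers
    except urllib.error.HTTPError as err:
        # Redirects (and other non-2xx responses) arrive here with headers intact.
        status, headers = err.code, err.headers
    return status, headers.get_all("Set-Cookie") or [], headers.get("Location")
```

The caller can then store the returned cookie and request the `Location` URL itself. An alternative would be to keep automatic redirects and attach a cookie jar (`http.cookiejar.CookieJar` with `urllib.request.HTTPCookieProcessor`), so that the cookie set on the redirect response is sent with the follow-up request.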

@unixfox
Member

unixfox commented Mar 11, 2020

I had the same issue with got, that's why I had to set methodRewriting to false here: https://github.com/unixfox/proxy-sorry-google-recaptcha/blob/master/anticaptcha.js#L68.
[screenshot]

@artths

artths commented Mar 11, 2020

Oh. I wish I understood JS well. In any case, thank you for the help!


6 participants