HTTP Error 404 from urllib during WebDriver initialization #439
Comments
It is an issue with undetected_chromedriver. The following pull request fixed the issue for me: ultrafunkamsterdam/undetected-chromedriver#1427
@djuelg Any luck?
Hi folks, I get exactly the same error. However, I'm also new to Linux. @vincentvonu what exactly should I do to get the pull request onto my machine (Ubuntu 22.04)? Would be awesome if you could explain it to a dummy :-) Thanks!
So far, I haven't had time to try vincentvonu's approach. The easiest solution for now would probably be to put the deb file of an older Chrome version into your local flathunter checkout and then edit the Dockerfile to install the local deb file instead of the one from the PPA.
You have to replace the file patcher.py in the undetected_chromedriver lib with the one from the pull request I've linked. If you have created a virtualenv for your flathunter instance, you should find this file under: /home/[user]/.local/share/virtualenvs/[name of your flathunter venv]/lib/python/site-packages/undetected_chromedriver/ (this is the Debian path; Ubuntu might be slightly different). Hope that helps!
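If the exact site-packages path is unclear, the file can be located from inside the virtualenv instead of guessing the directory. The following is a minimal sketch, not part of flathunter or undetected_chromedriver; the helper name `find_patcher` is hypothetical, and it only assumes that the installed package ships a `patcher.py` at its top level, as described above.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_patcher(package: str = "undetected_chromedriver") -> Optional[Path]:
    """Return the path of <package>/patcher.py, or None if it cannot be found.

    Hypothetical helper: it looks the package up without importing it.
    """
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    # Plain modules and missing packages have no package directory to search.
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / "patcher.py"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    path = find_patcher()
    if path is None:
        print("undetected_chromedriver was not found in this environment")
    else:
        print(f"Replace this file with the one from the pull request: {path}")
```

Run it with the Python interpreter of the flathunter virtualenv (for example via `pipenv run python`), otherwise it reports on a different environment than the one flathunter uses.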
Hi all. This is a patch for the Dockerfile, which basically pulls an old version of Chrome. A bit hacky, but it works.
diff --git a/Dockerfile b/Dockerfile
index 793656d..60b3444 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -9,6 +9,11 @@ RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
RUN apt-get -y update
RUN apt-get install -y google-chrome-stable
+# Check available versions here: https://www.ubuntuupdates.org/package/google_chrome/stable/main/base/google-chrome-stable
+ARG CHROME_VERSION="114.0.5735.90-1"
+RUN wget --no-verbose -O /tmp/chrome.deb https://dl.google.com/linux/chrome/deb/pool/main/g/google-chrome-stable/google-chrome-stable_${CHROME_VERSION}_amd64.deb \
+ && dpkg -i /tmp/chrome.deb \
+ && rm /tmp/chrome.deb
# Upgrade pip, install pipenv
RUN pip install --upgrade pip
In my opinion, the installation of Chrome should be version-pinned so stuff like this doesn't happen.
I've merged the PR from dependabot that updates |
Thanks a lot @vincentvonu. I tried this approach, but somehow got an error. Fixed it by simply getting Chrome 114.
As I commented on another ticket, if you have both Chromium and Chrome installed at different versions, I think the version detection currently gets confused. This is not uncommon on Linux, as the Google Chrome releases are often a little while ahead of the open-source Chromium packages. So if you are seeing this error, check for a version mismatch between Chrome and Chromium.
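One way to check for such a mismatch is to ask every installed browser binary for its version and compare the major numbers. This is a rough sketch under stated assumptions: the binary names in `CANDIDATES` are common Linux names but may differ on your distribution, and the helper names `major_version` and `installed_majors` are made up for illustration. It is not how flathunter or undetected_chromedriver detects the version.

```python
import re
import shutil
import subprocess
from typing import Dict, Optional

# Assumed binary names; adjust them for your distribution.
CANDIDATES = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser")


def major_version(version_output: str) -> Optional[int]:
    """Extract the major version from text such as 'Google Chrome 114.0.5735.90'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def installed_majors() -> Dict[str, int]:
    """Map each candidate binary found on PATH to its major version."""
    found: Dict[str, int] = {}
    for name in CANDIDATES:
        path = shutil.which(name)
        if path is None:
            continue
        try:
            result = subprocess.run(
                [path, "--version"],
                capture_output=True,
                text=True,
                timeout=10,
                check=False,
            )
        except (OSError, subprocess.TimeoutExpired):
            continue
        major = major_version(result.stdout)
        if major is not None:
            found[name] = major
    return found


if __name__ == "__main__":
    majors = installed_majors()
    print(majors)
    if len(set(majors.values())) > 1:
        print("Installed browsers report different major versions.")
```

If the script reports different major versions, removing one of the browsers or pinning both to the same release is the simplest way to rule this cause out.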
Okay. I take it back. This seems to be 100% a problem with |
You're welcome! That's why I think version pinning is the right move: things are easier to troubleshoot if everyone has the same version, and you can just bump to the latest working Chrome version. But I understand that not everyone has Docker, so it's a bit hard to enforce.
Hey, any idea how to apply the fix that seems to work in local Docker images to an App Engine deployment on Google Cloud? I'm struggling with this.
Hi @paulwelzel, The Google App Engine deployment doesn't support crawling Immobilienscout because we can't install chrome in the App Engine environment. For Google Cloud Run, you can apply the same dockerfile changes to the |
Just merged #454, which bumps undetected-chromedriver to a version that should support the latest Chrome. Please retest and check if this resolves your issues - thanks!
Also merged #460, which fixes a bug in flathunter's use of the new undetected-chromedriver. Closing this for now - please re-open if you're still having trouble.
Hello all,
Using the current main version, I seem to have the same issue as #257, but for different reasons. The script crashes while trying to initialize the Chrome WebDriver for the crawler. I'm getting the following output:
The script crashes only if the immobilienscout24 URL is part of the search. The other three URLs work just fine.
I added the following information to the logger output, which shows that the version parsing is not the problem (as it was in #257):
I tested other immobilienscout24 URLs, which also didn't work.
The script is running via docker-compose on an Ubuntu 20.04.6 host.