-
Notifications
You must be signed in to change notification settings - Fork 52
Add webdriver-manager for managing chrome drivers
#165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -466,7 +466,7 @@ If used, `min_indexed_level` is ignored. | |||||
|
|
||||||
| When `js_render` is set to `true`, the scraper will use ChromeDriver. This is needed for pages that are rendered with JavaScript, for example, pages generated with React, Vue, or applications that are running in development mode: `autoreload` `watch`. | ||||||
|
|
||||||
| After installing ChromeDriver, provide the path to the bin using the following environment variable `CHROMEDRIVER_PATH` (default value is `/usr/bin/chromedriver`). | ||||||
| After installing ChromeDriver, provide the path to the bin using the following environment variable `CHROMEDRIVER_PATH`. If the variable is not set, the scraper will automatically download and use a compatible version of ChromeDriver. | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
|
||||||
| The default value of `js_render` is `false`. | ||||||
|
|
||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -1,8 +1,11 @@ | ||||||
| import re | ||||||
| import os | ||||||
| import sys | ||||||
| from distutils.util import strtobool | ||||||
| from selenium import webdriver | ||||||
|
|
||||||
| from selenium.webdriver.chrome.options import Options | ||||||
| from webdriver_manager.chrome import ChromeDriverManager | ||||||
| from ..custom_downloader_middleware import CustomDownloaderMiddleware | ||||||
| from ..js_executor import JsExecutor | ||||||
|
|
||||||
|
|
@@ -26,12 +29,42 @@ def init(config_original_content, js_render, user_agent): | |||||
| chrome_options.add_argument('--headless') | ||||||
| chrome_options.add_argument('user-agent={0}'.format(user_agent)) | ||||||
|
|
||||||
| CHROMEDRIVER_PATH = os.environ.get('CHROMEDRIVER_PATH', | ||||||
| "/usr/bin/chromedriver") | ||||||
| if not os.path.isfile(CHROMEDRIVER_PATH): | ||||||
| raise Exception( | ||||||
| "Env CHROMEDRIVER_PATH='{}' is not a path to a file".format( | ||||||
| CHROMEDRIVER_PATH)) | ||||||
| CHROMEDRIVER_PATH = os.environ.get('CHROMEDRIVER_PATH', '') | ||||||
| if not CHROMEDRIVER_PATH or not os.path.isfile(CHROMEDRIVER_PATH): | ||||||
| print("Could not find ChromeDriver.") | ||||||
| print("Either the Env CHROMEDRIVER_PATH='{}' path is incorrect or " | ||||||
| "ChromeDriver is not installed.".format(CHROMEDRIVER_PATH)) | ||||||
| print("Do you want to automatically download ChromeDriver?") | ||||||
| while(True): | ||||||
| user_input = input("[Y/n]: ") | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||
| try: | ||||||
| yes = strtobool(user_input) | ||||||
| break | ||||||
| except ValueError: | ||||||
| print("Please enter a valid input.") | ||||||
| continue | ||||||
| if yes: | ||||||
| try: | ||||||
| CHROMEDRIVER_PATH = ChromeDriverManager().install() | ||||||
|
|
||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
I tested locally using docker, and I wasn't able to use the driver because the downloaded driver wasn't executable. I'm not sure if this is the best way to handle this, but it worked here in the docker env. |
||||||
| except Exception as e: | ||||||
| print("Could not download ChromeDriver. " | ||||||
| "Please install ChromeDriver manually.") | ||||||
| print(e) | ||||||
| if sys.platform == "linux" or sys.platform == "darwin": | ||||||
| os.system('read -s -n 1 -p "Press any key to continue..."') | ||||||
| if sys.platform == "win32": | ||||||
| os.system('pause') | ||||||
|
Comment on lines
+54
to
+57
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you could remove these lines :) |
||||||
| sys.exit(1) | ||||||
| else: | ||||||
| print("Please install ChromeDriver and set the CHROMEDRIVER_PATH " | ||||||
| "environment variable or remove the render_js option.") | ||||||
| if sys.platform == "linux" or sys.platform == "darwin": | ||||||
| os.system('read -s -n 1 -p "Press any key to continue..."') | ||||||
| if sys.platform == "win32": | ||||||
| os.system('pause') | ||||||
|
Comment on lines
+62
to
+65
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And these also. After using the option "n" when I was asked, I received the message and a shell error. Analyzing a little bit the lines, they don't add much value to your implementation, since the idea is to finish the execution flow. |
||||||
| sys.exit(1) | ||||||
|
|
||||||
| driver = webdriver.Chrome( | ||||||
| CHROMEDRIVER_PATH, | ||||||
| options=chrome_options) | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,6 +19,7 @@ def test_extra_facets_should_be_set_from_start_urls_variables_browser(self, | |
| monkeypatch): | ||
| monkeypatch.setattr("selenium.webdriver.chrome", | ||
| lambda x: MockedInit()) | ||
| monkeypatch.setattr('builtins.input', lambda _: "y") | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about creating an option "auto-yes" like |
||
|
|
||
| c = config({ | ||
| "start_urls": [ | ||
|
|
@@ -43,6 +44,7 @@ def test_extra_facets_should_be_set_from_start_urls_variables_with_two_start_url | |
| self, monkeypatch): | ||
| monkeypatch.setattr("selenium.webdriver.chrome", | ||
| lambda x: MockedInit()) | ||
| monkeypatch.setattr('builtins.input', lambda _: "y") | ||
|
|
||
| c = config({ | ||
| "js-render": True, | ||
|
|
@@ -74,6 +76,7 @@ def test_extra_facets_should_be_set_from_start_urls_variables_with_multiple_tags | |
| self, monkeypatch): | ||
| monkeypatch.setattr("selenium.webdriver.chrome", | ||
| lambda x: MockedInit()) | ||
| monkeypatch.setattr('builtins.input', lambda _: "y") | ||
|
|
||
| c = config({ | ||
| "start_urls": [ | ||
|
|
||

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could update to
webdriver-manager = "==3.5.2"🎉