Skip to content

Conversation

@kinshukdua
Copy link

Fixes #139
The library webdriver-manager is used to automatically download the correct version of ChromeDriver in case no path is passed in the environment variable.
The library will also automatically upgrade the drivers in case Chrome is updated. To maintain backwards compatibility or to use a specific version of the drivers, the user can always override it by setting the environment variable.
Reopening this PR after creating a new branch.
I'm on Windows and this seems to be working perfectly for me with the given config and I'm getting the desired output from scraping.
When running the script multiple times, it will suggest you to download the drivers every time, however it will use the cached version (default cache range is 1 day, can be increased) and won't download it again. The driver is by default stored in a directory like this ~/.wdm/chrome/drivers/chromedriver/linux/94.0.4606.61/chromedriver, it is a possible to find out if the file exists however it will be tedious as figuring out the chrome version requires using these command-

        OSType.LINUX: linux_browser_apps_to_cmd('google-chrome', 'google-chrome-stable'),
        OSType.MAC: r'/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --version',
        OSType.WIN: r'reg query "HKEY_CURRENT_USER\Software\Google\Chrome\BLBeacon" /v version'

@bidoubiwa
Copy link
Contributor

Hello @kinshukdua I will not have time to review and merge your PR before the end of the hacktoberfest. So I'm adding the hacktoberfest-accepted label for it to be validated. Your PR is amazing, really sorry I cannot give it the time right now it deserves.

If you are participating in Hacktoberfest, and you would like to receive a small gift from MeiliSearch too, please complete this form.

@kinshukdua
Copy link
Author

@bidoubiwa Thank you! That is okay, I totally understand. I had a super pleasant experience contributing to Meilisearch. I hope to continue contributing in the future.

@brunoocasali brunoocasali self-requested a review February 3, 2022 14:39
@brunoocasali brunoocasali added the enhancement New feature or request label Feb 9, 2022
Copy link
Member

@brunoocasali brunoocasali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @kinshukdua, sorry for this huge delay 😢

I tried your PR locally (I ran the scrapper locally, but not the tests) and I have some trouble running it because my docker environment doesn't contain all the required stuff to be possible to run successfully!

I left some comments in your PR, I hope they could help you to get this merged!

I'm really happy with your contribution, thanks a lot for sending this PR to us ❤️

Comment on lines +54 to +57
if sys.platform == "linux" or sys.platform == "darwin":
os.system('read -s -n 1 -p "Press any key to continue..."')
if sys.platform == "win32":
os.system('pause')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could remove these lines :)

Comment on lines +62 to +65
if sys.platform == "linux" or sys.platform == "darwin":
os.system('read -s -n 1 -p "Press any key to continue..."')
if sys.platform == "win32":
os.system('pause')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And these also.

After using the option "n" when I was asked, I received the message and a shell error. Analyzing a little bit the lines, they don't add much value to your implementation, since the idea is to finish the execution flow.

Please install ChromeDriver and set the CHROMEDRIVER_PATH environment variable or remove the render_js option.
/bin/sh: 1: read: Illegal option -s

monkeypatch):
monkeypatch.setattr("selenium.webdriver.chrome",
lambda x: MockedInit())
monkeypatch.setattr('builtins.input', lambda _: "y")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about creating an option "auto-yes" like apt-get install -y? Because this will be needed when we will run in CI or automated environments.

"ChromeDriver is not installed.".format(CHROMEDRIVER_PATH))
print("Do you want to automatically download ChromeDriver?")
while(True):
user_input = input("[Y/n]: ")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use only "[y/n]: " because the uppercased Y means for Linux users: press enter and the default choice will be "Yes".

Like this:
image

When `js_render` is set to `true`, the scraper will use ChromeDriver. This is needed for pages that are rendered with JavaScript, for example, pages generated with React, Vue, or applications that are running in development mode: `autoreload` `watch`.

After installing ChromeDriver, provide the path to the bin using the following environment variable `CHROMEDRIVER_PATH` (default value is `/usr/bin/chromedriver`).
After installing ChromeDriver, provide the path to the bin using the following environment variable `CHROMEDRIVER_PATH`. If the variable is not set, the scraper will automatically download and use a compatible version of ChromeDriver.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
After installing ChromeDriver, provide the path to the bin using the following environment variable `CHROMEDRIVER_PATH`. If the variable is not set, the scraper will automatically download and use a compatible version of ChromeDriver.
After installing ChromeDriver, provide the path to the bin using the following environment variable `CHROMEDRIVER_PATH`. If the variable is not set, the scraper will ask for permission to download and use a compatible version of ChromeDriver.

if yes:
try:
CHROMEDRIVER_PATH = ChromeDriverManager().install()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
os.chmod(CHROMEDRIVER_PATH, 0o777)

I tested locally using docker, and I wasn't able to use the driver because the downloaded driver wasn't executable. I'm not sure if this is the best way to handle this, but it worked here in the docker env.

meilisearch = "==0.16.1"
requests-iap = "==0.2.0"
python-keycloak-client = "==0.2.3"
webdriver-manager = "==3.4.2"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could update to webdriver-manager = "==3.5.2" 🎉

bors bot added a commit that referenced this pull request Feb 10, 2022
184: Add libnss3 package to Dockerfile r=brunoocasali a=brunoocasali

Following the discussions about this issue #139 and after running this #165 locally, I had some trouble using `chrome_webdriver` because my `Dockerfile` didn't have this package.

This is not a fix for the mentioned issue, is just a small part of it!

Co-authored-by: Bruno Casali <brunoocasali@gmail.com>
@alallema alallema added the stale Pull request or issue that has recieve no activity for a long time. label Oct 27, 2022
@alallema
Copy link
Contributor

alallema commented Aug 3, 2023

This PR was opened a long time ago and hasn't been changed in over a year, so I'm closing it.

@alallema alallema closed this Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request stale Pull request or issue that has recieve no activity for a long time.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Exception: Env CHROMEDRIVER_PATH='/usr/bin/chromedriver' is not a path to a file

5 participants