Add webdriver-manager for managing chrome drivers
#161
Conversation
bidoubiwa
left a comment
I'm wondering if we should force the ChromeDriver download on the user. @sanders41 do you have any opinion on the subject?
There are a couple of concerns I would have. What happens if a user already has The second concern is at many companies, software like
Thanks a lot @sanders41. I 100% agree. So what I suggest is:
What do you think?
@sanders41 If a user already has
@bidoubiwa your plan sounds good to me. I think not forcing the download prevents the issues I can think of, while keeping it as an option gives users who want it an easier path.
@kinshukdua since we answered at the same moment, I'm not sure you saw my comment just above yours!
@bidoubiwa I did. I agree with your suggestion and will add a new commit for it. It just has to be a CLI prompt, right?
Yes!
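The CLI prompt agreed on above could look roughly like the following minimal sketch. The function name `confirm_driver_download`, the prompt wording, and the injectable `ask` parameter are illustrative assumptions, not code taken from this PR.

```python
from typing import Callable


def confirm_driver_download(ask: Callable[[str], str] = input) -> bool:
    """Ask on the command line whether ChromeDriver may be downloaded.

    Hypothetical helper: the name and the prompt text are assumptions.
    Only an explicit "y" or "yes" counts as consent, so pressing Enter
    never triggers a download the user did not ask for.
    """
    answer = ask("ChromeDriver was not found. Download it automatically? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to "no" matches the concern raised earlier in the thread: the download stays opt-in, and the `ask` parameter lets tests supply an answer without blocking on standard input.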
@bidoubiwa I've added the requested changes. Please review and let me know if anything needs to be changed.
Any idea why the tests are failing?
@bidoubiwa They were failing because during testing, the BrowserHandler is called without
I will be reviewing this PR this week :)
@bidoubiwa Thanks! |
bidoubiwa
left a comment
Okay, so I found a few problems!
- When running the script twice, it suggests downloading the ChromeDriver both times.
- In both cases, the scraping fails:
> Docs-Scraper: https://docs.meilisearch.com/learn/what_is_meilisearch/sdks.html 0 records)
> Docs-Scraper: https://docs.meilisearch.com/learn/getting_started/installation.html 0 records)
> Docs-Scraper: https://docs.meilisearch.com/learn/core_concepts/ 0 records)
> Docs-Scraper: https://docs.meilisearch.com/learn/getting_started/quick_start.html 0 records)
- I tried on the main branch: it still works with js_render and renders correctly:
> Docs-Scraper: https://docs.meilisearch.com/create/how_to/gcp.html 74 records)
> Docs-Scraper: https://docs.meilisearch.com/create/how_to/aws.html 83 records)
> Docs-Scraper: https://docs.meilisearch.com/learn/what_is_meilisearch/sdks.html 43 records)
> Docs-Scraper: https://docs.meilisearch.com/learn/getting_started/installation.html 37 records)
The problem is that, because you created your PR from your main branch, you will have a hard time going back to this main branch.
I would suggest you reopen your PR from a dedicated branch by following these steps. If you prefer not to, I suggest you re-clone this project in another directory.
Here is a test config file to see if your PR works:
{
  "index_uid": "docs",
  "sitemap_urls": ["https://docs.meilisearch.com/sitemap.xml"],
  "start_urls": ["https://docs.meilisearch.com"],
  "js_render": true,
  "selectors": {
    "lvl0": {
      "selector": ".sidebar-heading.open",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": ".theme-default-content h1",
    "lvl2": ".theme-default-content h2",
    "lvl3": ".theme-default-content h3",
    "lvl4": ".theme-default-content h4",
    "lvl5": ".theme-default-content h5",
    "text": ".theme-default-content p, .theme-default-content li"
  },
  "strip_chars": " .,;:#",
  "scrap_start_urls": true,
  "custom_settings": {
    "synonyms": {
      "relevancy": ["relevant", "relevance"],
      "relevant": ["relevancy", "relevance"],
      "relevance": ["relevancy", "relevant"]
    }
  }
}
Hi @bidoubiwa, I've opened a new PR, #165, on a new branch.
Closed in favor of #165.
Fixes #139
The library webdriver-manager is used to automatically download the correct version of ChromeDriver in case no path is passed in the environment variable. The library will also automatically upgrade the drivers in case Chrome is updated. To maintain backwards compatibility or to use a specific version of the drivers, the user can always override it by setting the environment variable.
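The lookup order described above (explicit path first, automatic download as a fallback) can be sketched as follows. This is an illustration under stated assumptions, not the PR's actual code: the function name `resolve_chromedriver_path`, the `install` parameter, and the variable name `CHROMEDRIVER_PATH` are hypothetical, since the description does not name the environment variable. The only library call used is `ChromeDriverManager().install()` from webdriver-manager, which returns the path of the driver binary.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_chromedriver_path(
    env: Mapping[str, str] = os.environ,
    env_var: str = "CHROMEDRIVER_PATH",  # assumed name; not stated in the PR text
    install: Optional[Callable[[], str]] = None,
) -> str:
    """Return the ChromeDriver path, preferring the user's override."""
    override = env.get(env_var)
    if override:
        # An explicit path always wins, which keeps existing setups working.
        return override
    if install is None:
        # Imported lazily so the override branch works without the dependency.
        from webdriver_manager.chrome import ChromeDriverManager

        install = ChromeDriverManager().install
    # Downloads a driver, or reuses a cached one, and returns its path.
    return install()
```

Passing `install` explicitly is a design choice for testability: callers and tests can substitute a stub instead of reaching the network.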