Exception: Env CHROMEDRIVER_PATH='/usr/bin/chromedriver' is not a path to a file #139
Hey @ssaylo, thanks for your feedback :) Could you show me your docs-scraper configuration file and your docker command?
@bidoubiwa
{
"index_uid": "trantor_docs",
"start_urls": [
{
"url": "https://trantor-docs-dev.app.terminus.io/"
}
],
"selectors": {
"lvl0": ".ant-page-header-heading-title",
"lvl1": ".ant-card-body h1",
"lvl2": ".ant-card-body h2",
"lvl3": ".ant-card-body h3",
"lvl4": ".ant-card-body h4",
"lvl5": ".ant-card-body h5",
"lvl6": ".ant-card-body h6",
"text": ".ant-card-body p, ant-card-body a, .ant-card-body li, .ant-card-body td, .ant-card-body code span, .antd-card-body code, .antd-card-body pre, .antd-card-body strong, .antd-card-body a, .antd-card-body"
},
"js_render": true,
"js_wait": 1
}
and this is my docker command:
docker run -t --rm --network=host \
-e MEILISEARCH_HOST_URL=127.0.0.1:80 \
-e MEILISEARCH_API_KEY=myMasterKey \
-e CHROMEDRIVER_PATH=/usr/local/bin/chromedriver \
-v /Users/ssaylo/Company/docs-scraper/config.json:/docs-scraper/config.json \
getmeili/docs-scraper:v0.9.5 pipenv run ./docs_scraper config.json
docker run -t --rm \
-e MEILISEARCH_HOST_URL=127.0.0.1:80 \
-e MEILISEARCH_API_KEY=myMasterKey \
-v /Users/ssaylo/Company/docs-scraper/config.json:/docs-scraper/config.json \
getmeili/docs-scraper:latest pipenv run ./docs_scraper config.json
This problem has bothered me for a whole day and I don't know how to resolve it. Thank you for your help.
In your config file you are using the following options:
Which requires a downloaded
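The link between the config and the driver can be sketched in Python. This is a minimal illustration, not docs-scraper's actual code: the helper name `needs_chromedriver` is hypothetical, and it assumes only what this thread states, namely that `"js_render": true` requires a chromedriver binary to be available.

```python
import json


def needs_chromedriver(config_text: str) -> bool:
    """Return True when the config enables JavaScript rendering.

    Hypothetical helper: it assumes a missing "js_render" key means False.
    """
    config = json.loads(config_text)
    return bool(config.get("js_render", False))


# A trimmed-down version of the config posted above.
example = '{"index_uid": "trantor_docs", "js_render": true, "js_wait": 1}'
print(needs_chromedriver(example))
```

Running a check like this before starting the container would show early that the image has to ship a driver.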
Docker won't be able to use the chrome driver from your local environment. You could install it into the container by updating the RUN instruction:
RUN apt-get update -y \
&& apt-get install -y python3-pip \
&& apt-get install -y chromium
Then build the container.
@sanders41 I appreciate it.
@sanders41 The solution you suggested didn't work because it needs chromedriver...
FROM algolia/docsearch-scraper-base
WORKDIR /docs-scraper
COPY . .
ENV LC_ALL C.UTF-8
ENV LANG C.UTF-8
RUN apt-get update -y \
&& apt-get install -y python3-pip
RUN pip3 install pipenv
RUN pipenv --python 3.6 install
or you could add these to your Dockerfile:
# Install selenium
ENV LC_ALL C
ENV DEBIAN_FRONTEND noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN true
RUN useradd -d /home/seleuser -m seleuser
RUN chown -R seleuser /home/seleuser
RUN chgrp -R seleuser /home/seleuser
RUN apt-get update -y && apt-get install -yq \
software-properties-common\
python3.7
RUN add-apt-repository -y ppa:openjdk-r/ppa
RUN apt-get update -y && apt-get install -yq \
curl \
wget \
sudo \
gnupg \
&& curl -sL https://deb.nodesource.com/setup_8.x | sudo bash -
RUN apt-get update -y && apt-get install -yq \
nodejs -yq
RUN apt-get update -y && apt-get install -yq \
unzip \
xvfb \
libxi6 \
libgconf-2-4 \
default-jdk
RUN curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add
RUN echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list
RUN apt-get update -y && apt-get install -yq \
google-chrome-stable=85.0.4183.102-1 \
unzip
RUN wget -q https://chromedriver.storage.googleapis.com/85.0.4183.83/chromedriver_linux64.zip
RUN unzip chromedriver_linux64.zip
RUN mv chromedriver /usr/bin/chromedriver
RUN chown root:root /usr/bin/chromedriver
RUN chmod +x /usr/bin/chromedriver
RUN wget -q https://selenium-release.storage.googleapis.com/3.13/selenium-server-standalone-3.13.0.jar
RUN wget -q http://www.java2s.com/Code/JarDownload/testng/testng-6.8.7.jar.zip
RUN unzip testng-6.8.7.jar.zip
# Install DocSearch dependencies
COPY Pipfile .
COPY Pipfile.lock .
ENV LC_ALL C.UTF-8
ENV LANG C.UTF-8
ENV PIPENV_HIDE_EMOJIS 1
RUN apt-get update -y && apt-get install -yq \
python3-pip
RUN pip3 install pipenv
RUN pipenv install --python 3.6
I'm surprised chromium didn't work; I've used it in place of Chrome for chromedriver before without issue. Maybe setting it as an executable made the difference. Either way, glad you got it working. @bidoubiwa would it be worth adding chromedriver to the container by default? JS-rendered pages seem to be more and more common, though maybe not so much for docs specifically. I guess it's really an ease-of-use vs. container-size question.
Sorry for the late answer, it flew under the radar 🙈 Chromedriver is 16 MB, and it also needs to match the user's Chrome version: if my Chrome is at 9.X and my chromedriver is at 9.Y, it throws an error. Alternatively, we can use the algolia base image in our Dockerfile, since this is after all based on their repo (@curquiza what do you think?), or pin this issue for future users. Additionally, we should add some documentation to the ##js-wait part of the docs.
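The version constraint described above can be checked with a few lines of Python. This is a sketch under a stated assumption: it treats two versions as compatible when their first three components (major.minor.build) match, which is my reading of ChromeDriver's versioning scheme and not something this thread confirms. The function name `same_build` is hypothetical.

```python
def same_build(chrome_version: str, driver_version: str) -> bool:
    """Compare the first three dot-separated components (major.minor.build).

    Assumption: the fourth (patch) component may differ without breaking
    compatibility.
    """
    return chrome_version.split(".")[:3] == driver_version.split(".")[:3]


# Versions taken from the Dockerfile posted earlier in this thread.
print(same_build("85.0.4183.102", "85.0.4183.83"))

# A Chrome build from a different major release is treated as a mismatch.
print(same_build("86.0.4240.22", "85.0.4183.83"))
```

Pinning both versions in the Dockerfile, as the posted one does, keeps this check stable across rebuilds.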
Hello,
Hey people, I will look into it during this week :D
184: Add libnss3 package to Dockerfile r=brunoocasali a=brunoocasali Following the discussions about this issue #139 and after running this #165 locally, I had some trouble using `chrome_webdriver` because my `Dockerfile` didn't have this package. This is not a fix for the mentioned issue; it is just a small part of it! Co-authored-by: Bruno Casali <brunoocasali@gmail.com>
Hey guys.
I'm having trouble using the docker scraper. I tried it on Mac and Ubuntu, and every time I run the docker image, it tells me that chromedriver is not a file.
But I can run it manually.
Can you help me resolve this? Is chromedriver expected in the docker image or locally?
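To make the error in the title concrete, here is a minimal Python sketch of the kind of check that would produce it. It is not taken from the docs-scraper source: the function name `check_chromedriver_path` is hypothetical, and the default path is an assumption copied from the error message. The point is that the path is resolved inside the container's filesystem, so a driver installed on the host does not count.

```python
import os


def check_chromedriver_path(env=os.environ) -> str:
    """Return the driver path, or raise if it does not point to a file.

    Hypothetical helper; the default path is an assumption taken from the
    error message in the issue title.
    """
    path = env.get("CHROMEDRIVER_PATH", "/usr/bin/chromedriver")
    if not os.path.isfile(path):
        raise Exception(f"Env CHROMEDRIVER_PATH='{path}' is not a path to a file")
    return path


if __name__ == "__main__":
    try:
        print(check_chromedriver_path())
    except Exception as err:
        # Expected when no driver exists at the resolved path.
        print(err)
```

Passing `-e CHROMEDRIVER_PATH=...` to `docker run` only changes which path is checked; the file still has to exist in the image.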