Serverless Scraper with AWS Lambda Function and Selenium WebDriver
Special Thanks to
- Configure ~/.aws/config
$ aws configure
- git clone
$ git clone https://github.com/gkzz/serverless-scraper.git \
&& cd serverless-scraper
- Install chromedriver and headless-chrome
$ . install.sh
- Install serverless framework and modify config/.env
$ sudo npm install -g serverless
$ cd selenium-layer
$ npm init
$ sudo npm install --save serverless-dotenv-plugin
$ cat config/.env.tmpl > config/.env
$ pip install -t selenium/python/lib/python3.7/site-packages selenium
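The `-t` target in the pip command above follows the directory layout that Lambda layers use for Python packages (`python/lib/python3.7/site-packages`). The sketch below derives that path from a Python version instead of hard-coding it. It assumes the layer root is the `selenium` directory used in the command above. The helper name `layer_site_packages` is hypothetical and not part of this repository.

```python
import sys


def layer_site_packages(version=None, layer_root="selenium"):
    """Return the pip target directory for a Python Lambda layer.

    version: (major, minor) tuple; defaults to the running interpreter.
    layer_root: directory packaged as the layer (assumed to be "selenium").
    """
    major, minor = (version or sys.version_info)[:2]
    return f"{layer_root}/python/lib/python{major}.{minor}/site-packages"


if __name__ == "__main__":
    # Prints the directory to pass to `pip install -t`.
    print(layer_site_packages((3, 7)))
```

If the local interpreter is not Python 3.7, pass the version explicitly so that the directory name matches the Lambda runtime rather than the local machine.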
- Deploy Chrome/Selenium WebDriver!
$ serverless print
$ serverless deploy
- Prepare to deploy the Python program with the sls command
$ cd ../lambda
$ npm init
$ sudo npm install --save serverless-python-requirements \
> && sudo npm install --save serverless-dotenv-plugin \
> && sudo npm install --save serverless-offline
$ cat config/.env.tmpl > config/.env
- Deploy it!
$ serverless print
$ serverless deploy
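For orientation, here is a minimal sketch of what a handler that uses the deployed layer might look like. It is not the repository's actual `handler.py`. The `/opt/...` paths, the `CHROMEDRIVER_PATH` and `CHROME_BINARY_PATH` environment variable names, and the default URL are all assumptions made for illustration. The constructor call uses the Selenium 3.x style (`executable_path=`), which newer Selenium releases have replaced with a `Service` object.

```python
import os

# Assumed locations: Lambda mounts layer contents under /opt. The file names
# are guesses about what the driver layer contains, not confirmed by the repo.
CHROMEDRIVER_PATH = os.environ.get("CHROMEDRIVER_PATH", "/opt/chromedriver")
CHROME_BINARY_PATH = os.environ.get("CHROME_BINARY_PATH", "/opt/headless-chromium")


def build_chrome_args(window_size="1280,1696"):
    """Return Chrome flags often used when running headless in a container."""
    return [
        "--headless",
        "--no-sandbox",
        "--disable-gpu",
        "--single-process",
        "--disable-dev-shm-usage",
        f"--window-size={window_size}",
    ]


def handler(event, context):
    # Imported lazily so this module still loads where selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.binary_location = CHROME_BINARY_PATH
    for arg in build_chrome_args():
        options.add_argument(arg)

    # Selenium 3.x style constructor (an assumption about the installed version).
    driver = webdriver.Chrome(executable_path=CHROMEDRIVER_PATH, options=options)
    try:
        # "url" is a hypothetical event field; example.com is a placeholder.
        driver.get(event.get("url", "https://example.com"))
        return {"title": driver.title}
    finally:
        # Always release the browser process, even if the page load fails.
        driver.quit()
```

Keeping the flag list in its own function makes it easy to adjust flags without touching the driver setup.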
- serverless
  - Framework Core: 1.60.4
  - Plugin: 3.2.6
  - SDK: 2.2.1
  - Components Core: 1.1.2
  - Components CLI: 1.4.0
- ChromeDriver 2.40
- Headless Chrome v1.0.0-45
- Amazon Linux
- Python 3.7
gkz@local ~/serverless-chrome (master) $ tree -L 2
.
├── install.sh
├── lambda
│   ├── config
│   ├── handler.py
│   ├── node_modules
│   ├── package.json
│   ├── package-lock.json
│   └── serverless.yml
├── LICENSE
├── README.md
└── selenium-layer
    ├── config
    ├── driver
    ├── node_modules
    ├── package.json
    ├── package-lock.json
    ├── selenium
    └── serverless.yml
8 directories, 10 files
selenium.common.exceptions.WebDriverException: Message: chrome not reachable
- Check the following issues
  - adieuadieu/serverless-chrome/issues/133
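Before digging into the linked issue, it can help to rule out one simple possible cause: a browser or driver binary that is missing or not executable in the Lambda environment. The sketch below does not reproduce any fix from that issue. The function name `find_unusable_binaries` and the `/opt/...` paths are assumptions for illustration.

```python
import os


def find_unusable_binaries(paths):
    """Return the subset of paths that are missing or not executable."""
    return [
        p for p in paths
        if not (os.path.isfile(p) and os.access(p, os.X_OK))
    ]


if __name__ == "__main__":
    # Assumed layer locations; adjust to wherever the binaries actually live.
    candidates = ["/opt/chromedriver", "/opt/headless-chromium"]
    print("unusable binaries:", find_unusable_binaries(candidates))
```

An empty list only shows that the files exist and are executable. A version mismatch between ChromeDriver and Headless Chrome would still need to be checked separately.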
Copyright (c) 2020 gkz
Licensed under the MIT license.
Unless attributed otherwise, everything is under the MIT license. Some material is not mine and lacks attribution because I no longer remember where it came from. I apologize for that.