A crawler tool that automatically collects Vietnam COVID-19 data and pushes it to Google Sheets.
- Python - 3.9 or newer
- Pipenv
- Google Drive API
- Google Sheets
- Azure Virtual Machine
- Tableau
- Crontab
- Basic Linux knowledge
Install pipenv:

$ pip install pipenv

Then create the folder you want to use as the project folder and move into it. With the Pipfile in the project folder, install all dependencies:

$ pipenv install
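If you are writing the Pipfile from scratch, a minimal sketch might look like the following. The `requests` and `gspread` packages are assumptions for illustration; list whatever the crawler source actually imports.

```toml
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[packages]
# assumed dependencies -- match these to the crawler's actual imports
requests = "*"
gspread = "*"

[requires]
python_version = "3.9"
```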
- Go to Google Cloud Platform and create a new project.
- Go to API & Services -> Enable APIs & Services. Enable the "Google Drive API" and the "Google Sheets API".
- Click IAM & Admin, go to Service Accounts -> create a new service account.
- Move to the Keys tab -> ADD KEY -> Create new key -> JSON type. NOTE: keep this JSON key file; it is used to connect to Google Sheets.
- Create a new sheet and name it "csv-to-gg-sheet" (rename it as you wish, but change the name in the source code too).
- Add one worksheet named "covid_cases" to store daily COVID cases and one worksheet named "covid_death" to store daily COVID deaths. *Note: if you rename the worksheets, change the names in the source code too.
- Publish both worksheets to the web in CSV format and save the share links for use in the script.
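The sheet-access steps above can be sketched in Python with the gspread client library (an assumed dependency, not named in this README; the key-file name is a placeholder, and the sheet and worksheet names match the steps above):

```python
import csv
import io

SPREADSHEET_NAME = "csv-to-gg-sheet"  # change here if you renamed the sheet

def open_worksheets(key_path="service_account.json"):
    """Authorize with the service-account JSON key and return the
    covid_cases and covid_death worksheets."""
    import gspread  # assumed client library: pipenv install gspread
    client = gspread.service_account(filename=key_path)
    book = client.open(SPREADSHEET_NAME)
    return book.worksheet("covid_cases"), book.worksheet("covid_death")

def parse_published_csv(text):
    """Parse the CSV text served by a worksheet's publish-to-web link
    into a list of rows."""
    return list(csv.reader(io.StringIO(text)))
```

The publish-to-web links are what Tableau (and `parse_published_csv`) read from, while `open_worksheets` is for writing crawled rows into the sheet.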
- Go to Azure, create a new Virtual Machine, and save its ssh_key.pem.
- Connect to the Virtual Machine through the SSH protocol and install Python.
- Push the source code, Pipfile, and JSON key file to the remote machine (ssh_key.pem stays on your local machine; it is what you connect with).
- Install all dependencies with pipenv, as in the pipenv step above.
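The push step can be done with scp from your local machine; the user name, VM address, and file names below are placeholders for illustration:

```
scp -i ssh_key.pem crawler.py Pipfile service_account.json azureuser@<vm-public-ip>:~/covid-crawler/
```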
- Download Tableau, use the Google Sheet as a data source, and build your visualization.
- Publish the visualization to Tableau Public; it will refresh automatically (once per day) when your Google Sheet data changes.
- On the Azure Virtual Machine, run

$ crontab -e

and set a cron job to run the source code on whatever schedule you wish.
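For example, a crontab entry that runs the crawler every day at 07:00 might look like this. The directory and script name `crawler.py` are placeholders; substitute your own, and use the pipenv path reported by `which pipenv` on the VM.

```
0 7 * * * cd /home/azureuser/covid-crawler && pipenv run python crawler.py >> crawl.log 2>&1
```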