This is a Scrapy project with a spider that scrapes DevBG for all available Python job offers.
All data is stored in an SQLite3 database for consistency. Each record consists of position, company, location, posting date, and a link to the offer.
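For reference, a minimal sketch of what an SQLite item pipeline for these fields could look like (the database file, table, and column names here are illustrative assumptions, not necessarily the ones used in this project):

```python
# pipelines.py -- illustrative sketch, not necessarily the project's actual pipeline
import sqlite3

class SQLitePipeline:
    def open_spider(self, spider):
        self.conn = sqlite3.connect("jobs.db")  # hypothetical database file name
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS offers (
                   position TEXT, company TEXT, location TEXT,
                   posting_date TEXT, link TEXT UNIQUE)"""
        )

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # INSERT OR IGNORE keeps the table consistent if the same link shows up again
        self.conn.execute(
            "INSERT OR IGNORE INTO offers VALUES (?, ?, ?, ?, ?)",
            (item["position"], item["company"], item["location"],
             item["posting_date"], item["link"]),
        )
        return item
```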
DeltaFetch is used to skip items that were already scraped in previous runs.
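It is enabled through the Scrapy settings; a typical scrapy-deltafetch configuration looks roughly like this (the exact values in this project's settings.py may differ):

```python
# settings.py -- typical scrapy-deltafetch configuration (illustrative)
SPIDER_MIDDLEWARES = {
    "scrapy_deltafetch.DeltaFetch": 100,
}
DELTAFETCH_ENABLED = True
# DELTAFETCH_RESET = True  # uncomment to forget previously seen requests
```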
After each scrape, an email with all unsent offers from the last 2 days is sent.
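A minimal sketch of how such an email could be sent with the standard library, assuming the credentials come from the config.py created in the setup steps below and that Gmail's SMTP server is used (both are assumptions, not necessarily what this project does):

```python
# mailer sketch -- assumes the config package from the setup steps and Gmail SMTP
import smtplib
from email.message import EmailMessage

from config.config import EMAIL_USER, EMAIL_PASS  # hypothetical import path

def send_offers(offers):
    """Send one email listing the given offers (a list of dicts)."""
    msg = EmailMessage()
    msg["Subject"] = "New Python job offers"
    msg["From"] = EMAIL_USER
    msg["To"] = EMAIL_USER
    body = "\n".join(
        f"{o['position']} at {o['company']} - {o['link']}" for o in offers
    )
    msg.set_content(body or "No new offers.")

    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
        smtp.login(EMAIL_USER, EMAIL_PASS)
        smtp.send_message(msg)
```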
Scraper runs are scheduled with cron.
The scraper is meant to be hosted on a server; I run it on an AWS EC2 instance.
- Clone the repo

      git clone https://github.com/ivo-bass/scrapers.git
- Create a virtual environment

      cd scrapers
      python3 -m venv venv
- Activate the virtual environment

      source venv/bin/activate
- Install the requirements

      pip install -r requirements.txt
- Create a config file for the email sender credentials

      cd devBG/devBG
      mkdir config && cd config && touch __init__.py
      echo "
      EMAIL_USER = 'change_this_to_your_email_address'
      EMAIL_PASS = 'change_this_to_your_email_password'
      " > config.py
      cd ../../..
- Execute the scraping (results will be available in the database)

      cd devBG/devBG
      scrapy crawl job
- Extract results to CSV or JSON (if needed). An email will be sent on each execution.

      scrapy crawl job -O results.csv

  or

      scrapy crawl job -O results.json
- Install cron if it is not present
- Set the cron task
- Open the cron task scheduler

      crontab -e
- Set the runtime. You can generate the time code HERE. (A sketch of what autorun.sh might contain is shown after this list.)

      * * * * * sh /path/to/script/autorun.sh >> autorun.log
- Check if the task is set

      crontab -l
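For reference, autorun.sh (the script invoked by cron above) might look roughly like this; the clone location and virtual environment path are assumptions to adjust for your own setup:

```bash
#!/bin/sh
# autorun.sh -- illustrative sketch; adjust the paths to your own checkout
cd /home/ubuntu/scrapers || exit 1   # assumed clone location
. venv/bin/activate                  # activate the virtualenv created during setup
cd devBG/devBG
scrapy crawl job                     # scrapes, stores to the DB, and sends the email
```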
Enjoy!