A local web app to scrape and manage job listings from configurable search URLs.
- Ruby >= 2.0.0
- Gems:
  - bundle
  - sinatra
  - nokogiri
  - sqlite3
  - chronic
  - whenever
- Modules:
  - open-uri
  - openssl
  - yaml
  - json
- If you don't have the bundle gem installed, run
  $ gem install bundle
- Run
  $ bundle install
  in the base directory to install any gem dependencies you may be missing.
- Navigate to the base directory and run
  $ whenever --update-crontab
  to update your crontab with the automatic scraper (optional).
- Run
  $ ruby jobhound.rb
  This will automatically start a local web server and navigate to http://localhost:4567 with JobHound running.
- Configure scraping sources by modifying config:
  - base_url - The base URL for your scrape source. This is used for rebuilding partial URLs.
  - search_url - The specific search page you wish to scrape listings from.
  - listing_url_regex - A regex pattern and replacement for the listing URL during aggregation.
  - date_posted_regex - A regex pattern and replacement for the date posted during aggregation.
  - entry_css_path - The CSS selector for individual entries. All other paths are relative to this path.
  - url_css_path - The CSS selector for the full listing URL, relative to entry_css_path.
  - title_css_path - The CSS selector for the listing title, relative to entry_css_path.
  - summary_css_path - The CSS selector for the listing summary, relative to entry_css_path.
  - employer_css_path - The CSS selector for the listing employer, relative to entry_css_path.
  - location_css_path - The CSS selector for the listing location, relative to entry_css_path.
  - date_posted_css_path - The CSS selector for the listing post date, relative to entry_css_path.
- Go to the "Jobs" page and hit the "Scrape Listings" button. This may take a minute, depending on how many sources you are scraping from.
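
For the optional crontab step, whenever builds cron entries from a schedule file, by default config/schedule.rb. The following is only a sketch of what such a file could contain. The interval, the /path/to/jobhound directory, and the scrape.rb script name are placeholders, not taken from JobHound, so check the schedule file that ships with the project before relying on it.

```ruby
# config/schedule.rb (hypothetical contents)
# The interval, path, and script name below are assumptions for illustration.
every 1.day, at: '6:00 am' do
  # `command` schedules a plain shell command in the generated crontab entry.
  command 'cd /path/to/jobhound && ruby scrape.rb'
end
```

Running `whenever` with no arguments prints the resulting cron lines without touching your crontab, which is a convenient way to check the schedule before running `whenever --update-crontab`.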
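
To make the base_url and regex settings more concrete, here is a minimal sketch of how a partial listing URL might be rebuilt and cleaned up. The key names come from the list above. Everything else is an assumption for illustration: the example.com values, the [pattern, replacement] pair layout, and the helper names listing_url and date_posted. JobHound's actual config format and internals may differ.

```ruby
require 'uri'

# Hypothetical source settings. Key names match the README; the values and
# the [pattern, replacement] layout are assumptions.
SOURCE = {
  'base_url'          => 'https://jobs.example.com',
  'search_url'        => 'https://jobs.example.com/search?q=ruby',
  'listing_url_regex' => [/\?.*\z/, ''],        # drop the query string
  'date_posted_regex' => [/\APosted\s+/i, '']   # strip a leading "Posted "
}

# Rebuild a partial URL against base_url, then apply the pattern/replacement.
def listing_url(source, href)
  pattern, replacement = source['listing_url_regex']
  URI.join(source['base_url'], href).to_s.sub(pattern, replacement)
end

# Trim the scraped date text and apply the pattern/replacement.
def date_posted(source, text)
  pattern, replacement = source['date_posted_regex']
  text.strip.sub(pattern, replacement)
end

puts listing_url(SOURCE, '/listing/123?ref=search')
# => https://jobs.example.com/listing/123
puts date_posted(SOURCE, ' Posted 3 days ago ')
# => 3 days ago
```

In this sketch the cleaned date string would then be handed to a date parser; the dependency list includes chronic, which can parse phrases like "3 days ago", though whether JobHound uses it that way is an assumption.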