Web>CSV Scrape Bot

Basic useful feature list:

  • Scrapes a specific page for data using cURL
  • Uses a function to extract values that sit between configured "before" and "after" string pairs
  • Saves the values to a local CSV file
  • Echoes the current timestamp and the current data values
  • Can be run directly from the CLI
  • Can easily be set up as a cron job that runs at a fixed interval
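
The fetch, extract and save steps listed above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the function names fetchPage, extractBetween and appendCsvRow, the example URL, the markers and the data.csv file name are all assumptions made for the example.

```php
<?php
// Hypothetical helpers: names and parameters are illustrative only.

/** Fetch a page body with cURL; returns null if the request fails. */
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // give up after 15 seconds
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

/** Return the trimmed text between the first $before marker and the next $after marker. */
function extractBetween(string $html, string $before, string $after): ?string
{
    $start = strpos($html, $before);
    if ($start === false) {
        return null; // "before" marker not found
    }
    $start += strlen($before);
    $end = strpos($html, $after, $start);
    if ($end === false) {
        return null; // "after" marker not found
    }
    return trim(substr($html, $start, $end - $start));
}

/** Append one row to a CSV file, creating the file if it does not exist. */
function appendCsvRow(string $path, array $row): void
{
    $fh = fopen($path, 'a');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path for appending");
    }
    fputcsv($fh, $row, ',', '"', '\\');
    fclose($fh);
}

// Example usage (placeholder URL and markers):
// $html  = fetchPage('https://example.com/');
// $value = $html === null ? null : extractBetween($html, '<span id="count">', '</span>');
// if ($value !== null) {
//     $now = date('c');
//     appendCsvRow(__DIR__ . '/data.csv', [$now, $value]);
//     echo $now, ' ', $value, PHP_EOL;
// }
```

Returning null when a marker is missing lets the caller skip writing a row if the page layout changes, instead of appending an empty or wrong value to the CSV.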

Single execute from CLI

You can start the script directly from the command line with php -f index.php. This runs the script once and adds one new value to the CSV. Your PHP installation must include the command-line build (PHP CLI); run php -v to check, and the output should mention (cli).

Additionally, check the permissions on the script files, for example chmod 755 index.php

Cron Job as a timer

Cron jobs are an easy way to run timed or recurring tasks like this on a server. We add a crontab entry, give it the path to our shell script as the command, and set the schedule.

EDITOR="nano" crontab -e # Open the cronTab in nano editor
*/10 * * * * /usr/bin/somedirectory/shell # Add the values

This job runs every 10 minutes. You can read more about setting up a cron job here: AskUbuntu Q&A

As with the single run, if the job does not run as root, check the permissions on the script files: chmod 755 index.php

Stuff used to make this:

About

Set of PHP Cron Jobs for loosely timed scraping specific content off webpages and popular Social Networks.
