Configuring Automatic Data Retrieval

Ethan Romba edited this page Jan 4, 2014 · 1 revision

BookSwap uses cron tasks to automatically populate its database with course and textbook information from the following sources:

  • Campus bookstore websites
  • Amazon Product Advertising API

Follow the steps below to configure these data-retrieval operations for your site.

Retrieving Data from Your Bookstore Website

BookSwap currently supports retrieving courses, textbook requirements, and bookstore prices from Barnes & Noble College (BNCollege) bookstore websites only. (Instructions for retrieving this data from other types of websites will be added at a later date.)

BNCollege websites provide dropdown menus for searching for books by term, department, course, etc. BookSwap retrieves course data in JSON format from the same endpoints that are used to populate these dropdowns. Textbook requirements are then retrieved by scraping the corresponding search results.

1. Set up the cron task

Bookstore data retrieval is handled by the update_bookstore_data() method of the Cron controller (application/controllers/cron.php). We recommend using the following cron task to execute this method once per minute:

* * * * * CI_ENV="production" php /path/to/index.php cron update_bookstore_data > /dev/null

2. Configure your bookstore website parameters

The bn_college.php configuration file defines the parameters that allow BookSwap to retrieve data for your particular campus:

  • subdomain is the subdomain of your BNCollege website
  • store_id is the storeId URL parameter used in BNCollege requests
  • campus_id is the campusId URL parameter used in BNCollege requests

You can find the correct values for these parameters by examining your browser's network traffic while interacting with your BNCollege website's search dropdowns. Look for an XHR request that looks like this:

http://example.bncollege.com/webapp/wcs/stores/servlet/SomeResource?campusId=12345012&storeId=12345 ...

For the example above, the correct configuration parameters would be as follows:

$config['subdomain'] = 'example';
$config['store_id']  = '12345';
$config['campus_id'] = '12345012';
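
If you would rather not pick the values out of the URL by hand, the mapping can be sketched in a few lines of PHP. This is only an illustration: the function name bn_college_params_from_url() is hypothetical and not part of BookSwap, and the URL is the example request shown above with its trailing parameters omitted.

```php
<?php
// Hypothetical helper (not part of BookSwap): derive the three bn_college.php
// values from a request URL copied out of the browser's network panel.
function bn_college_params_from_url(string $url): array
{
    $host  = (string) parse_url($url, PHP_URL_HOST);   // e.g. example.bncollege.com
    $query = (string) parse_url($url, PHP_URL_QUERY);  // e.g. campusId=...&storeId=...
    parse_str($query, $args);                          // query string -> array

    return [
        'subdomain' => explode('.', $host)[0],   // first label of the host name
        'store_id'  => $args['storeId'] ?? null, // null if the parameter is absent
        'campus_id' => $args['campusId'] ?? null,
    ];
}

$params = bn_college_params_from_url(
    'http://example.bncollege.com/webapp/wcs/stores/servlet/SomeResource?campusId=12345012&storeId=12345'
);
print_r($params);
```

The values are kept as strings, matching the quoted values in the configuration above.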

3. Configure the update frequency

How often courses and textbook requirements are updated is controlled by two configuration parameters in bookswap.php:

  • bookstore_data_ttl defines the number of seconds between when the last scrape of the bookstore website finishes and the next scrape begins.
  • bookstore_requests_per_minute defines the number of requests to make to the BNCollege website each time the update_bookstore_data() cron task is run (once per minute with the recommended schedule).

NOTE: Please be considerate of the BNCollege servers when configuring the bookstore_requests_per_minute parameter. It is possible for your server's IP address to be blocked by BNCollege if you set this parameter too high!
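
As a rough illustration, the two parameters might be set as follows. The numbers are assumptions chosen for the example, not BookSwap's defaults or recommended values, and the $config[...] form is assumed by analogy with bn_college.php.

```php
// bookswap.php (illustrative values only, not project defaults)

// Wait 24 hours (86400 seconds) after one scrape finishes before starting the next.
$config['bookstore_data_ttl'] = 86400;

// Make at most 10 requests on each run of update_bookstore_data().
// With the once-per-minute cron task, that is about 600 requests per hour.
$config['bookstore_requests_per_minute'] = 10;
```

A lower request count makes each full scrape take longer but keeps the load on the BNCollege servers down.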

Retrieving Data from Amazon

BookSwap retrieves textbook metadata (e.g. edition and publication date), cover images, and Amazon prices from the Amazon Product Advertising API.

1. Create the necessary Amazon accounts

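Amazon data retrieval requires two different accounts:

  • An Amazon Web Services (AWS) account, which is used to create the access key
  • An Amazon Associates account, which provides the Associate Tag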

2. Configure your Amazon API credentials

Create a new access key using the AWS Management Console. Then update the following configuration parameters in amazon_api.php:

  • access_key_id: Your Access Key ID
  • secret_access_key: Your Secret Access Key
  • associate_tag: Your Associate Tag (from your Amazon Associate Account)

All three parameters must be specified for Amazon data retrieval to function correctly.
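
A filled-in amazon_api.php might look roughly like the fragment below. The parameter names come from the list above; the $config[...] form is assumed by analogy with bn_college.php, and the values are placeholders to be replaced with your own credentials.

```php
// amazon_api.php (placeholder values; substitute your own credentials)
$config['access_key_id']     = 'YOUR_ACCESS_KEY_ID';     // from the AWS Management Console
$config['secret_access_key'] = 'YOUR_SECRET_ACCESS_KEY'; // keep this value private
$config['associate_tag']     = 'YOUR_ASSOCIATE_TAG';     // from your Amazon Associate Account
```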

3. Set up the cron task

Amazon data retrieval is handled by the update_amazon_data() method of the Cron controller (application/controllers/cron.php). We recommend executing this method once every few minutes. For example, the cron task below will execute this method every 2 minutes:

*/2 * * * * CI_ENV="production" php /path/to/index.php cron update_amazon_data > /dev/null

4. Configure the update frequency

The amazon_requests_per_cron configuration parameter in bookswap.php defines the number of requests to make to the Amazon Product Advertising API each time the update_amazon_data() cron task is run.

Each request updates 10 books in the database, and BookSwap will automatically throttle requests to 1 per second to accommodate Amazon's request limits.
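
To get a feel for what a given amazon_requests_per_cron value means in practice, the arithmetic can be sketched as below. The function amazon_update_estimate() is hypothetical and not part of BookSwap. It assumes the figures stated above (10 books per request, 1 request per second) and treats the throttle as a lower bound on how long one run takes. Whether a run that outlasts the cron interval causes problems is not covered here, so the fits_in_interval flag is only a cautious sanity check.

```php
<?php
// Hypothetical helper (not part of BookSwap): rough throughput estimate for
// a given amazon_requests_per_cron value and cron interval.
function amazon_update_estimate(int $requestsPerCron, int $cronIntervalSeconds): array
{
    $booksPerRequest   = 10; // each request updates 10 books
    $secondsPerRequest = 1;  // requests are throttled to 1 per second

    // Lower bound on the duration of one run, from the throttle alone.
    $minRunSeconds = $requestsPerCron * $secondsPerRequest;
    $runsPerHour   = intdiv(3600, $cronIntervalSeconds);

    return [
        'books_per_run'    => $requestsPerCron * $booksPerRequest,
        'min_run_seconds'  => $minRunSeconds,
        'fits_in_interval' => $minRunSeconds < $cronIntervalSeconds,
        'books_per_hour'   => $runsPerHour * $requestsPerCron * $booksPerRequest,
    ];
}

// Example: 60 requests per run with the every-2-minutes cron task (120 seconds).
$estimate = amazon_update_estimate(60, 120);
print_r($estimate);
```

With these example inputs, each run updates 600 books and takes at least about a minute, which fits within the 2-minute interval.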