Configuring Automatic Data Retrieval
BookSwap uses cron tasks to automatically populate its database with course and textbook information from the following sources:
- Campus bookstore websites
- Amazon Product Advertising API
Follow the steps below to configure these data-retrieval operations for your site.
BookSwap currently supports retrieving courses, textbook requirements, and bookstore prices from Barnes & Noble College (BNCollege) bookstore websites only. (Instructions for retrieving this data from other types of websites will be added at a later date.)
BNCollege websites provide dropdown menus for searching for books by term, department, course, etc. BookSwap retrieves course data in JSON format from the same endpoints that are used to populate these dropdowns. Textbook requirements are then retrieved by scraping the corresponding search results.
Bookstore data retrieval is handled by the update_bookstore_data() method of the Cron controller (application/controllers/cron.php). We recommend using the following cron task to execute this method once per minute:
* * * * * CI_ENV="production" php /path/to/index.php cron update_bookstore_data > /dev/null
The bn_college.php configuration file defines the parameters that allow BookSwap to retrieve data for your particular campus:
- subdomain is the subdomain of your BNCollege website
- store_id is the storeId URL parameter used in BNCollege requests
- campus_id is the campusId URL parameter used in BNCollege requests
You can find the correct values for these parameters by examining your browser's network traffic while interacting with your BNCollege website's search dropdowns. Look for an XHR request that looks like this:
http://example.bncollege.com/webapp/wcs/stores/servlet/SomeResource?campusId=12345012&storeId=12345 ...
For the example above, the correct configuration parameters would be as follows:
$config['subdomain'] = 'example';
$config['store_id'] = '12345';
$config['campus_id'] = '12345012';
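If you would rather not pick the values out of the URL by hand, the mapping can be sketched in a few lines of Python. This is only an illustration and not part of BookSwap: the helper name bncollege_params is hypothetical, and it assumes the copied request URL carries both the storeId and campusId query parameters, as in the example above.

```python
from urllib.parse import urlparse, parse_qs


def bncollege_params(url):
    """Map a BNCollege request URL to the three bn_college.php values."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    return {
        # First label of the host name, e.g. "example" in example.bncollege.com
        "subdomain": parts.hostname.split(".")[0],
        # parse_qs returns a list per key; take the first value of each
        "store_id": query["storeId"][0],
        "campus_id": query["campusId"][0],
    }


# Example URL from this page, without the elided trailing parameters
url = (
    "http://example.bncollege.com/webapp/wcs/stores/servlet/"
    "SomeResource?campusId=12345012&storeId=12345"
)
print(bncollege_params(url))
```

The printed values are the ones to copy into the three $config entries in bn_college.php.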
Two configuration parameters in bookswap.php control how often courses and textbook requirements are updated:
- bookstore_data_ttl defines the number of seconds between the end of one scrape of the bookstore website and the start of the next.
- bookstore_requests_per_minute defines the number of requests to make to the BNCollege website each time the update_bookstore_data() cron task is run.
NOTE: Please be considerate of the BNCollege servers when configuring the bookstore_requests_per_minute parameter. It is possible for your server's IP address to be blocked by BNCollege if you set this parameter too high!
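To get a feel for how these two parameters interact, the following Python sketch estimates how long one full refresh cycle takes. It is a rough estimate and not BookSwap code. It assumes the cron task runs once per minute as recommended and that every run makes exactly bookstore_requests_per_minute requests. The total number of requests a full scrape needs depends on your campus, so the figure of 3000 below is purely hypothetical.

```python
import math


def refresh_cycle_seconds(total_requests, requests_per_minute, ttl_seconds):
    """Estimate the time from the start of one scrape to the start of the next."""
    # One cron run per minute, each making requests_per_minute requests
    scrape_minutes = math.ceil(total_requests / requests_per_minute)
    # The TTL is counted from the end of the scrape, so the two add up
    return scrape_minutes * 60 + ttl_seconds


# Hypothetical numbers: 3000 requests per full scrape, 10 requests per run,
# and a bookstore_data_ttl of one day (86400 seconds)
cycle = refresh_cycle_seconds(3000, 10, 86400)
print(cycle / 3600, "hours per refresh cycle")
```

Lowering bookstore_requests_per_minute lengthens the scrape itself, so a gentler request rate can be offset with a shorter bookstore_data_ttl if the data needs to stay fresh.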
BookSwap retrieves textbook metadata (e.g. edition and publication date), cover images, and Amazon prices from the Amazon Product Advertising API.
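Amazon data retrieval requires two different accounts:
- An Amazon Web Services (AWS) account, which provides the access key
- An Amazon Associates account, which provides the Associate Tag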
Create a new access key using the AWS Management Console. Then update the following configuration parameters in amazon_api.php:
- access_key_id: Your Access Key ID
- secret_access_key: Your Secret Access Key
- associate_tag: Your Associate Tag (from your Amazon Associate Account)
All three parameters must be specified for Amazon data retrieval to function correctly.
Amazon data retrieval is handled by the update_amazon_data() method of the Cron controller (application/controllers/cron.php). We recommend executing this method once every few minutes. For example, the cron task below will execute this method every 2 minutes:
*/2 * * * * CI_ENV="production" php /path/to/index.php cron update_amazon_data > /dev/null
The amazon_requests_per_cron configuration parameter in bookswap.php defines the number of requests to make to the Amazon Product Advertising API each time the update_amazon_data() cron task is run.
Each request updates 10 books in the database, and BookSwap will automatically throttle requests to 1 per second to accommodate Amazon's request limits.
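The figures above are enough for a back-of-the-envelope check when choosing amazon_requests_per_cron. The Python sketch below is illustrative only, and its function names are hypothetical. It treats 10 books per request as an upper bound and uses the 1-request-per-second throttle to estimate the shortest time a single run can take. Keeping that time below the cron interval is a suggestion made here, since this page does not say how overlapping runs are handled.

```python
def amazon_books_per_hour(requests_per_cron, cron_interval_minutes,
                          books_per_request=10):
    """Upper bound on books refreshed per hour."""
    runs_per_hour = 60 // cron_interval_minutes
    return runs_per_hour * requests_per_cron * books_per_request


def min_run_seconds(requests_per_cron, requests_per_second=1):
    """Shortest possible duration of one run under the request throttle."""
    return requests_per_cron / requests_per_second


# Hypothetical setting: 30 requests per run with the 2-minute cron task above
print(amazon_books_per_hour(30, 2))  # upper bound on books per hour
print(min_run_seconds(30))           # seconds per run, well under 120
```

With these hypothetical settings a run needs at least 30 seconds, which leaves ample headroom inside the 2-minute interval.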