Aggregates news data from omgili.com and publishes it to a Redis list.
It first scans 'feed.omgili.com' for available .zip files. Each file is downloaded and unzipped, and its contents (news stories in XML format) are pushed to a Redis list called 'NEWS_XML'. To avoid pushing duplicate data to the server, files that were downloaded on a previous run are skipped.
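The scan-and-publish steps above can be sketched in plain Ruby. This sketch makes two assumptions: the feed index is an HTML page that lists its .zip files as `href="..."` links, and stories are appended to the list with `rpush`. `zip_links`, `publish`, and `FakeRedis` are hypothetical names, and the unzip step is omitted because it needs a zip library. Any client that responds to `rpush(key, value)`, such as a redis-rb `Redis` instance, can be passed in place of the in-memory stand-in.

```ruby
# Minimal sketch of the aggregation flow using only the Ruby standard library.
# Hypothetical assumptions (not taken from omgili-aggregator itself):
#   - the feed index is HTML whose .zip files appear as href="..." links
#   - unzipping is left out; a real run would use a zip library
#   - `client` is any object with an `rpush(key, value)` method

NEWS_LIST = 'NEWS_XML'

# Return the unique .zip file names linked from an index page's HTML.
def zip_links(index_html)
  index_html.scan(/href="([^"]+\.zip)"/i).flatten.uniq
end

# Append each XML document to the Redis list; return how many were pushed.
def publish(client, xml_documents)
  xml_documents.each { |xml| client.rpush(NEWS_LIST, xml) }
  xml_documents.size
end

# In-memory stand-in for a Redis client, for trying the sketch without a server.
class FakeRedis
  attr_reader :lists

  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  # Mirrors Redis RPUSH: append to the list and return its new length.
  def rpush(key, value)
    @lists[key] << value
    @lists[key].size
  end
end

html = '<a href="a1.zip">a1</a> <a href="b2.zip">b2</a> <a href="readme.txt">x</a>'
p zip_links(html) # prints ["a1.zip", "b2.zip"]

redis = FakeRedis.new
publish(redis, ['<doc>one</doc>', '<doc>two</doc>'])
p redis.lists['NEWS_XML'].size # prints 2
```

Keeping the client as a parameter lets the same `publish` method run against a real server or the stand-in.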
To install omgili-aggregator, navigate to the 'omgili-aggregator' directory and run the following command:
bundle install
Then, from the same directory, start omgili-aggregator with the following command:
ruby lib/omgili_aggregator.rb
The behavior of omgili-aggregator can be configured by editing the 'config.yaml' file in the 'omgili-aggregator' directory.
Possible configurations include:
- Redis server connection settings
- Number of threads the program will use
- A limit on the amount of data downloaded from omgili
Please see the 'config.yaml' file for more details and instructions.
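The following sketch shows one way settings like these could be loaded with Ruby's standard `YAML.safe_load` and merged over defaults. The key names (`redis_host`, `redis_port`, `threads`, `download_limit`), their default values, and the assumption that the download limit counts files are all hypothetical; 'config.yaml' itself documents the real options.

```ruby
require 'yaml'

# Hypothetical keys and defaults; the real names are defined in config.yaml.
DEFAULTS = {
  'redis_host' => 'localhost',
  'redis_port' => 6379,
  'threads' => 4,
  'download_limit' => nil # nil means "no limit" in this sketch
}.freeze

# Parse YAML text and merge it over the defaults (an empty file gives defaults).
def load_config(yaml_text)
  loaded = YAML.safe_load(yaml_text) || {}
  raise ArgumentError, 'config must be a mapping' unless loaded.is_a?(Hash)

  DEFAULTS.merge(loaded)
end

# Hypothetical limit semantics: keep at most `limit` files, or all when nil.
def apply_limit(files, limit)
  limit ? files.first(limit) : files
end

config = load_config(<<~CONFIG)
  redis_host: 127.0.0.1
  threads: 8
  download_limit: 2
CONFIG

p config['threads']    # prints 8
p config['redis_port'] # prints 6379
p apply_limit(%w[a.zip b.zip c.zip], config['download_limit']) # prints ["a.zip", "b.zip"]
```

Merging over defaults means a key left out of the file simply keeps its default value.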
The first time omgili-aggregator runs, it generates a file called 'previous_downloads.csv' that records which files have been downloaded. On every later run, the files currently available at feed.omgili.com are compared against this record so that only new files are downloaded. Do not delete or modify this file in any way: doing so can cause duplicate data to be downloaded and pushed to the Redis server.
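A minimal sketch of this duplicate check follows. It assumes, hypothetically, that 'previous_downloads.csv' holds a single column of file names, one per line. `read_previous`, `new_files`, and `record_downloads` are illustrative names, not the tool's own. The first call in the example also shows why the file must be kept: with no record, every available file is treated as new.

```ruby
require 'set'
require 'tmpdir'

# Hypothetical format: one previously downloaded file name per line.
def read_previous(path)
  return Set.new unless File.exist?(path)

  File.readlines(path, chomp: true).reject(&:empty?).to_set
end

# File names available on the feed that have not been downloaded before.
def new_files(available, previous)
  available.reject { |name| previous.include?(name) }
end

# Append newly downloaded names so later runs skip them.
def record_downloads(path, names)
  File.open(path, 'a') { |file| names.each { |name| file.puts(name) } }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'previous_downloads.csv')

  first = new_files(%w[a.zip b.zip], read_previous(path)) # no record yet
  record_downloads(path, first)
  second = new_files(%w[a.zip b.zip c.zip], read_previous(path))

  p first  # prints ["a.zip", "b.zip"]
  p second # prints ["c.zip"]
end
```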
This library is tested with Minitest. To run the full test suite, run the following command from the 'omgili-aggregator' directory:
rake
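A Minitest test case for a duplicate check like the one described above might look like the following. The `new_files` helper is a hypothetical stand-in defined inline so the file runs on its own; it is not taken from the omgili-aggregator source. When the file is run directly with `ruby`, `minitest/autorun` executes the tests as the script exits.

```ruby
require 'minitest/autorun'

# Hypothetical helper under test: names in `available` not seen in `previous`.
def new_files(available, previous)
  available.reject { |name| previous.include?(name) }
end

class NewFilesTest < Minitest::Test
  def test_skips_previously_downloaded_files
    assert_equal ['c.zip'], new_files(%w[a.zip c.zip], ['a.zip'])
  end

  def test_everything_is_new_without_history
    assert_equal %w[a.zip b.zip], new_files(%w[a.zip b.zip], [])
  end
end
```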
All Ruby scripts were analyzed with RuboCop and found to have no offenses. RuboCop enforces many of the guidelines in the community Ruby Style Guide and also checks ABC size metrics.
- John Walker