
Comparing changes

  • 2 commits
  • 2 files changed
  • 0 commit comments
  • 1 contributor
Showing with 17 additions and 90 deletions.
  1. +13 −86 README.markdown
  2. +4 −4 src/preview.py
99 README.markdown
@@ -2,38 +2,7 @@
Install dependencies
- sudo apt-get install build-essential erlang erlang-dev python python-dev ipython python-pymongo nginx git-core mongodb autoconf automake libtool pkg-config
-
-Raise limit on number of file descriptors (under large workloads disco runs out)
-
- sudo bash -c "echo 'fs.file-max = 1000000' >> /etc/sysctl.conf"
- sysctl -p
- sudo bash -c "echo '
- * soft nofile 800000
- * hard nofile 800000
- root soft nofile 800000
- root hard nofile 800000
- ' >> /etc/security/limits.conf"
- sudo reboot now
-
-Build disco (with my changes)
-
- git clone git://github.com/jamii/disco.git
- cd disco
- git checkout -b deploy origin/deploy
- make
- sudo make install
- sudo make install-core
-
-Setup disco
-
- ssh-keygen # choose default options
- cat .ssh/id_rsa.pub >> .ssh/authorized_keys
- sudo disco start
- # in disco web config (localhost:8989) click configure and:
- # change node name from 'localhost' to hostname
- # change number of workers to 2 * number of cores
- # click status and you should now see that the background of the hostname has changed from red to black
+ sudo apt-get install build-essential python python-dev ipython python-pymongo git-core mongodb autoconf automake libtool pkg-config
Setup py-leveldb
@@ -59,75 +28,33 @@ Setup mongodb
sudo chown mongodb:mongodb /mnt/var/log/mongodb
sudo service mongodb start
-Setup nginx
-
- sudo mkdir /usr/local/nginx # for some reason the ubuntu package does not create this
- sudo cp nginx.conf /etc/nginx/sites-available/springer-recommendations.conf
- sudo ln -s /etc/nginx/sites-available/springer-recommendations.conf /etc/nginx/sites-enabled/springer-recommendations.conf
- sudo rm /etc/nginx/sites-enabled/default
- sudo service nginx restart
-
(cron setup will be here)
# Operation
-The recommender uses dumps of the mongodb log database, stored in the disco distributed filesystem. DDFS is a tag based filesystem - each file can have multiple tags and each tag can refer to multiple files. Tags consist of ':' separated alphanumeric strings eg 'live:downloads', 'test:regression:downloads'
-
-To add a log file to ddfs under the tag 'live:downloads':
+The recommender pulls log records from mongodb and stores the results in leveldb.
- sudo ddfs chunk live:downloads ./log_file
+To run the recommender:
-To remove all log files under the tag 'live:downloads':
+ ./build --db_name=Mongo3-backup --collection_name=LogsRaw --build_name=live
- sudo ddfs rm live:downloads
+The build_name determines where intermediate data and results are stored. Successive calls to build with the same build_name will reuse the intermediate data where possible. To help reuse, specify a start_date. All logs which were inserted before the start_date will be assumed to have been handled in a previous build.
-To build the download histograms and recommendations using all files under the tag 'live:downloads':
+ ./build --db_name=Mongo3-backup --collection_name=LogsRaw --build_name=live --start_date=2012-02-18
- cd springer-recommendations/src
- nohup python -c 'import main; main.build_all(input=["live:downloads"], build_name="live")'
- # nginx link will be here
-
-You can watch the progress in the disco web config.
-
-Results are written to /mnt/var/springer-recommendations. The build name determines the directory the results will be stored in and the mongodb database that will be used for intermediate results. This makes it easy to run tests without messing with production data.
+Results are written to /mnt/var/springer-recommendations/$build_name.
# Testing
Unit tests compare results for a simple dataset to hand-solved results:
- springer-recommendations/src
- python -c 'import test; test.unit_base()'
+ ./test-unit
Regression tests are used to catch changes in output between different versions of the code or different system configurations eg
- sudo ddfs chunk test:downloads ./some_small_log_file
- cd springer-recommendations/src
+ cd springer-recommendations
git checkout master
- python -c 'import main; main.build_all(input=["test:downloads"], build_name="regression-master")'
- git checkout some_branch
- python -c 'import main; main.build_all(input=["test:downloads"], build_name="regression-branch")'
- python -c 'import test; test.regression("regression-master", "regression-branch")
-
-The regression test walks the directory trees for each build in parallel and stops at the first difference encountered eg:
-
- Filenames do not match:
- /mnt/var/springer-recommendations/test1/histograms/daily/10.1007/BF00379779
- /mnt/var/springer-recommendations/test2/histograms/daily/10.1007/BF02247133
- Traceback (most recent call last):
- File "<string>", line 1, in <module>
- File "test.py", line 42, in regression
- raise RegressionError()
- test.RegressionError
-
-# API
-
-All results are written to the file-system under their build name and served by nginx. For example, if you wanted results from the 'live' build:
-
- ubuntu@domU-12-31-39-16-CC-12:~$ curl localhost:80/springer-recommendations/live/histograms/monthly/10.1007/s10853-009-4131-2
- {"counts": [["2011-01-01", 1], ["2011-02-01", 1], ["2011-03-01", 1], ["2011-04-01", 2], ["2011-05-01", 3]], "start_date": "2011-01-07", "end_date": "2011-05-19"}
-
- ubuntu@domU-12-31-39-16-CC-12:~$ curl localhost:80/springer-recommendations/live/histograms/daily/10.1007/s10853-009-4131-2
- {"counts": [], "start_date": "2011-12-30", "end_date": "2011-05-19"}
-
- ubuntu@domU-12-31-39-16-CC-12:~$ curl localhost:80/springer-recommendations/live/recommendations/10.1007/s10853-009-4131-2
- [[0.1258741258741259, "10.1007/s10853-010-4213-1"], [0.11538461538461539, "10.1023/B:JMSC.0000048768.52085.63"], [0.11538461538461539, "10.1023/B:JMSC.0000048767.92292.df"], [0.11538461538461539, "10.1023/B:JMSC.0000047544.44078.ca"], [0.11538461538461539, "10.1023/B:JMSC.0000045664.59279.e4"]]
+ ./build --db_name=Mongo3-backup --collection_name=LogsRaw --build_name=test-master
+ git checkout some-branch
+ ./build --db_name=Mongo3-backup --collection_name=LogsRaw --build_name=test-some-branch
+ ./test-regression --old_build=test-master --new_build=test-some-branch
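
Note on the regression test: the README text removed in this change described the old test as walking both build directory trees in parallel and stopping at the first difference. The diff does not show how `./test-regression` compares builds, so the following is only a minimal sketch of that directory-walking idea, assuming each build is a plain directory tree. `relative_files` and the "Contents do not match" message are hypothetical. `RegressionError` and the "Filenames do not match" message follow the removed example output.

```python
import itertools
import os


class RegressionError(Exception):
    """Raised at the first difference found between two build trees."""


def relative_files(root):
    """Yield the path of every file under root, relative to root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)


def regression(old_root, new_root):
    """Compare two build trees file by file; raise on the first difference."""
    old_files = sorted(relative_files(old_root))
    new_files = sorted(relative_files(new_root))
    # zip_longest pads the shorter listing with None, so a missing or
    # extra file shows up as a filename mismatch.
    for old, new in itertools.zip_longest(old_files, new_files):
        if old != new:
            raise RegressionError(
                "Filenames do not match:\n%s\n%s" % (old, new))
        old_path = os.path.join(old_root, old)
        new_path = os.path.join(new_root, new)
        with open(old_path, "rb") as f_old, open(new_path, "rb") as f_new:
            if f_old.read() != f_new.read():
                raise RegressionError("Contents do not match: %s" % old)
```

Sorting both listings before comparing them makes the result independent of the order in which the filesystem returns directory entries.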
8 src/preview.py
@@ -6,7 +6,7 @@
import json
import util
-import mr
+import db
keys = json.load(open('keys'))
@@ -20,7 +20,7 @@ def metadata(doi):
conn.close()
if status == 200:
- meta = util.encode(json.loads(data)) # !!! metadata encoding?
+ meta = json.loads(data)
if meta['records']:
return meta
else:
@@ -41,9 +41,9 @@ def recommendations(build_name, doi):
print title(doi)
print link(doi)
print
- dois = mr.get_result(build_name, 'recommendations', doi)
+ scores = db.SingleValue(build_name, 'scores', 'r').get(doi)
print '-' * 40
- for (score, doi) in dois:
+ for (score, doi) in scores:
print score
print title(doi)
print link(doi)
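
Note on the loop in `recommendations`: it still binds each recommended item to the name `doi`, which shadows the function's `doi` argument. Below is a minimal sketch of the same iteration with distinct names, assuming `scores` is an iterable of `(score, doi)` pairs as the diff suggests. `format_recommendations` and the sample DOIs are hypothetical, and the `title`/`link` lookups are left out.

```python
def format_recommendations(doi, scores):
    """Return display lines for the items recommended for `doi`.

    `scores` is assumed to be an iterable of (score, recommended_doi) pairs.
    """
    lines = [doi, "-" * 40]
    for score, recommended_doi in scores:
        # A distinct loop variable keeps the `doi` argument intact.
        lines.append("%s %s" % (score, recommended_doi))
    return lines


# Hypothetical sample data, not real recommender output.
example = format_recommendations(
    "10.1000/a", [(0.5, "10.1000/b"), (0.25, "10.1000/c")])
```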
