'People who downloaded this paper also downloaded...'

Generates 'people who read this also read...'-style recommendations for papers, based on the Jaccard similarity between the sets of users who read each paper. The code scales to large datasets (1B rows) in limited memory by using external sorting and locality-sensitive hashing. Built for Springer.
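
As a rough illustration of the similarity measure named above, here is a minimal sketch of Jaccard similarity between two readership sets. The function name `jaccard` and the sample user ids are hypothetical. This is the naive in-memory definition, not the external-sorting and hashing approach the project uses to scale.

```python
def jaccard(readers_a, readers_b):
    """Return |A & B| / |A | B| for two collections of reader ids."""
    a, b = set(readers_a), set(readers_b)
    union = a | b
    if not union:
        # Two empty readership sets: treat the similarity as 0 (an assumption).
        return 0.0
    return len(a & b) / len(union)


# Two papers that share two of four distinct readers.
paper_x = {"user1", "user2", "user3"}
paper_y = {"user2", "user3", "user4"}
print(jaccard(paper_x, paper_y))
```

A score of 1.0 means the two papers were read by exactly the same users; a score of 0.0 means they have no readers in common.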

Installation (on Ubuntu 11.04)

Install dependencies

sudo apt-get install build-essential python python-dev python-pip ipython git-core
sudo pip install ujson

Setup springer-recommendations

git clone git://github.com/jamii/springer-recommendations.git
sudo mkdir -p /mnt/var/springer-recommendations
sudo chown $USER:$USER /mnt/var/springer-recommendations

Operation

The recommendations engine reads a newline-separated list of input filenames on stdin and prints a list of recommendations on stdout.

Each input file should contain a newline-separated list of JSON-encoded [user, doi] pairs. The user field may be any string that uniquely identifies a reader, e.g. an IP address or session ID.

["1yud2mlgpalm2cqeyyz0o44n","10.1007\/s10526-004-6592-1"]
["q4lprrkmbr3gpvosjao0dzwm","10.1007\/978-3-540-69934-7_13"]
["3jc2hnohgreyhvlurpg3m1sn","10.1007\/978-3-8348-8229-5_14"]
["uigkldnerjvgghxvjp2ptm0i","10.1007\/s00125-009-1355-2"]
["mmnqkjwawkcz4tqjcxfam4jz","10.1007\/978-3-8274-2313-9_4"]
["e3ie31mmad2epuxno1gpidmx","10.1007\/s10549-012-2108-3"]
["1adokad3mbbg0aaexcl1yb3a","10.1007\/978-3-8349-6622-3_5"]
["fzfrjqgnizgprfxstcal12fu","10.1007\/978-3-7643-8777-8_1"]
["ihcnriijo040rchrgbytvlpg","10.1007\/BF00309663"]
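
To make the input format concrete, the following sketch parses such lines and groups readers by DOI. It assumes Python 3 and the standard `json` module rather than `ujson`, and the helper name `readers_by_doi` is hypothetical. It holds everything in memory, so it only suits small samples.

```python
import json
from collections import defaultdict


def readers_by_doi(lines):
    """Map each DOI to the set of users that appear with it."""
    readers = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        user, doi = json.loads(line)  # the JSON escape "\/" decodes to "/"
        readers[doi].add(user)
    return readers


sample = [
    r'["1yud2mlgpalm2cqeyyz0o44n","10.1007\/s10526-004-6592-1"]',
    r'["ihcnriijo040rchrgbytvlpg","10.1007\/BF00309663"]',
]
for doi, users in readers_by_doi(sample).items():
    print(doi, sorted(users))
```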

For each DOI in the logs, the output contains one line listing the related DOIs, each paired with its Jaccard similarity to the first DOI on that line.

["10.2478\/s11532-009-0129-5",[["10.1007\/978-1-61737-985-7_11",0.24],["10.1007\/BF01011432",0.56],["10.1007\/BF01524716",0.11],["10.1007\/BF02458601",0.87],["10.1007\/s002140050205",0.97]]]
["10.2478\/s11532-010-0087-y",[["10.1007\/BF02660070",1.0],["10.1007\/BF02988680",1.0],["10.1007\/s00709-010-0225-6",1.0],["10.1007\/s00709-010-0233-6",1.0],["10.1023\/A:1022137619834",1.0]]]
["10.2478\/s11534-010-0072-2",[["10.2478\/s11534-011-0014-7",1.0]]]
["10.2478\/s11534-011-0014-7",[["10.2478\/s11534-010-0072-2",1.0]]]
["10.2478\/s11535-011-0006-z",[["10.1007\/BF02532915",1.0],["10.1023\/A:1013623806248",1.0],["10.1134\/S1019331608020019",1.0]]]
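
A consumer of this output might want the related DOIs ordered by score. The sketch below reads one output line and returns the highest-scoring entries. The helper name `top_recommendations` and its parameter `n` are hypothetical, and it assumes Python 3 with the standard `json` module.

```python
import json


def top_recommendations(line, n=3):
    """Return (doi, [(related_doi, score), ...]) with the n best scores first."""
    doi, related = json.loads(line)
    ranked = sorted(related, key=lambda pair: pair[1], reverse=True)
    return doi, [(other, score) for other, score in ranked[:n]]


line = (
    r'["10.2478\/s11532-009-0129-5",[["10.1007\/978-1-61737-985-7_11",0.24],'
    r'["10.1007\/BF01011432",0.56],["10.1007\/BF01524716",0.11],'
    r'["10.1007\/BF02458601",0.87],["10.1007\/s002140050205",0.97]]]'
)
doi, best = top_recommendations(line, n=2)
print(doi, best)
```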

Example usage:

find /mnt/var/springer-recommendations/logs-*.json | nohup python springer-recommendations/src/recommendations.py > recommendations.json 2> recommendations.log &