
ODE-II / WP5 / Recommender

Basic recommender system for venues in Amsterdam, with a time constraint. Developed for use within the Dutch ODE II project.

Built on a Node.js server and a Redis database.

Setup node.js/redis

To install the software and start the server on port 8005, run ./install.sh. To test the server response, run ./test_response.sh.

To add a small set of dummy data (88 KB), run:

gunzip -c data_sample.csv.gz > data_sample.csv
./post_data_sample.sh

To add a large set of dummy data (32 MB), run:

wget --no-check-certificate "https://www.dropbox.com/s/8or7yiua66c8r56/AmsterdamCardSimulatedData.csv?dl=1" -O data_sample.csv
./post_data_sample.sh

Tested on CentOS 6.5 and Linux Mint 15 (Olivia). Installation probably requires the standard build tools to be present.

Feed Training Data

The training data is assumed to be available as user/venue/time records. Each visit is sent to the recommender individually, as a JSON object, via POST.

POST item to http://HOST:PORT/train/:

{"item": {
  "user_id": "some id",
  "place_id": "some id",
  "timestamp": "ISO 8601 datetime"
 }}
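A minimal Python sketch of how a client might build this payload for a single visit. The function name build_train_payload is hypothetical and not part of this repository; it only shows the expected nesting under "item".

```python
import json


def build_train_payload(user_id: str, place_id: str, timestamp: str) -> str:
    """Serialise one visit in the {"item": {...}} shape that /train/ expects."""
    item = {"user_id": user_id, "place_id": place_id, "timestamp": timestamp}
    # json.dumps takes care of quoting and escaping, which hand-built strings often get wrong.
    return json.dumps({"item": item})


payload = build_train_payload("uid", "pid", "2014-07-03T00:00:00Z")
```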

Neither user_id nor place_id may contain a colon (:).
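The reason for this restriction is not stated here; one plausible explanation (an assumption) is the common Redis convention of using ":" to separate the parts of a key name. A hypothetical client-side check that flags offending fields before anything is posted:

```python
def invalid_id_fields(item: dict) -> list:
    """Return the ID fields of a training item that contain a colon.

    Hypothetical client-side helper: running it before posting catches bad
    items early instead of sending them to the server.
    """
    return [
        field
        for field in ("user_id", "place_id")
        if ":" in str(item.get(field, ""))
    ]
```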

The recommender assumes that the input data is presented in chronological order. Because the data set is most likely a one-time dump, it must be sorted by timestamp before posting, with the most recent item posted last. If the data is presented out of order, the system silently produces less meaningful or even meaningless recommendations.

Only the date is used; any hour/minute information is discarded. The extracted date is returned in the response.
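Both requirements can be handled on the client side before posting. The sketch below, with hypothetical helper names, sorts items oldest first and extracts the date part the way the example responses show it ("2014-07-03T00:00:00Z" becomes "2014-07-03"). How the server treats non-UTC offsets is not documented, so extract_date simply keeps the date as written.

```python
from datetime import datetime


def parse_timestamp(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2014-07-03T00:00:00Z"."""
    # Before Python 3.11, datetime.fromisoformat() rejects a trailing "Z",
    # so spell UTC out as an explicit offset.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    return datetime.fromisoformat(ts)


def sort_chronologically(items: list) -> list:
    """Order training items oldest first, so the most recent is posted last.

    Assumes all timestamps either carry a UTC offset or all lack one; Python
    refuses to compare offset-aware and naive datetimes.
    """
    return sorted(items, key=lambda item: parse_timestamp(item["timestamp"]))


def extract_date(ts: str) -> str:
    """Keep only the calendar date, without any time-zone conversion."""
    return parse_timestamp(ts).date().isoformat()
```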

Each training triple is assumed to be unique and correct: items are not checked for duplicates and cannot be deleted.
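Since duplicates cannot be removed once posted, it may be worth filtering them out beforehand. A minimal sketch with a hypothetical helper that keys on the full (user_id, place_id, timestamp) triple; because the server only keeps the date, keying on the date part instead is a reasonable alternative if repeated same-day visits should count once.

```python
def drop_duplicate_visits(items: list) -> list:
    """Drop repeated (user_id, place_id, timestamp) triples, keeping first-seen order.

    Hypothetical client-side helper: the server does not deduplicate, so
    filtering here keeps a repeated record from being counted twice.
    """
    seen = set()
    unique = []
    for item in items:
        key = (item["user_id"], item["place_id"], item["timestamp"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```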

N.B. Relations between venue visits are treated as symmetric: the order in which venues are visited is not taken into account.
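To illustrate what symmetric means here, the sketch below counts co-visits in both directions. The rule that two venues are related when the same user visits both on the same date is an assumption made for this example only; the README does not define the relation, and this is not the repository's implementation.

```python
from collections import defaultdict
from itertools import combinations


def covisit_counts(visits):
    """Build symmetric co-visit counts from (user_id, place_id, date) triples.

    ASSUMPTION: two venues count as related when the same user visits both
    on the same date. This only illustrates why the result is symmetric.
    """
    venues_per_user_day = defaultdict(set)
    for user_id, place_id, date in visits:
        venues_per_user_day[(user_id, date)].add(place_id)

    counts = defaultdict(lambda: defaultdict(int))
    for venues in venues_per_user_day.values():
        # Each unordered pair is counted in both directions, so the order of
        # the visits within a day plays no role.
        for a, b in combinations(sorted(venues), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def top_related(counts, place_id, n=10):
    """Return up to n (place_id, value) pairs, highest value first."""
    related = counts.get(place_id, {})
    return sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```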

Request Recommendation

Recommendations are supplied for a place_id that appears in the training input. They are returned as an ordered list of (place_id, value) pairs, with the highest-ranked venue first. The value determines the order/rank, but no guarantee is given about its absolute interpretation.

A GET request to http://HOST:PORT/recommend/place_id returns a JSON object:

{"status": "accept/error",
 "msg": "possible error message",
 "place_id": <requested place_id>,
 "places": [
    { "place_id": "some id", "value": somevalue },
    { "place_id": "some id", "value": somevalue },
    ...]
}
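A minimal Python sketch for consuming this reply; parse_recommendations is a hypothetical helper that turns the body into ordered (place_id, value) tuples and raises on an error status.

```python
import json


def parse_recommendations(body: str) -> list:
    """Turn a recommendation reply into an ordered list of (place_id, value) tuples."""
    reply = json.loads(body)
    if reply.get("status") != "accept":
        # On error the reply carries an explanation in "msg".
        raise RuntimeError(reply.get("msg") or "recommendation request failed")
    # The list is already ranked; only the relative order of the values matters.
    return [(p["place_id"], p["value"]) for p in reply.get("places", [])]
```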

Three optional query parameters can be added to the GET request, e.g. ?normalised&names&all:

  • all: instead of the default top 10 recommendations, return all venues in order.
  • names: include venue names in the reply, next to the ids.
  • normalised: rerank the results using values normalised by the total number of visits per venue.
    • normalised=count: rerank by raw count.
    • normalised= (empty value): rerank only the top 25 by popular vote.
    • normalised=[X]: rerank only the top [X] by popular vote.
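The exact normalisation formula is not documented. The sketch below assumes that "popular vote" means ranking by raw co-visit count, and that the normalised value is that count divided by the candidate venue's total number of visits. Under these assumptions, normalised= corresponds to top=25 and normalised=[X] to top=X; normalised=count is not modelled. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def rerank_normalised(
    raw_counts: Dict[str, int],
    total_visits: Dict[str, int],
    top: Optional[int] = 25,
) -> List[Tuple[str, float]]:
    """Rerank candidates by co-visit count divided by total visits per venue.

    ASSUMPTION: an illustration of the ?normalised description, not the
    server's actual formula. Pass top=None to rerank every candidate.
    """
    # "Popular vote": order candidates by raw co-visit count first.
    by_count = sorted(raw_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if top is not None:
        by_count = by_count[:top]
    # Skip venues with no known total, which would otherwise divide by zero.
    normalised = [
        (place_id, count / total_visits[place_id])
        for place_id, count in by_count
        if total_visits.get(place_id)
    ]
    return sorted(normalised, key=lambda kv: (-kv[1], kv[0]))
```

Restricting the rerank to the most popular candidates keeps rarely visited venues, whose normalised values can be inflated by a single co-visit, from jumping to the top.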

Draft (not implemented): the recommender has a time component and therefore needs to know the current date. Because the dataset is most likely historical, an additional argument ?today=<sometimestamp> may need to be added. The date is currently hard-coded as 2014-07-03.

Example query+output

This output is generated by ./test_response.sh.

Example responses for POST to /train/, run as:
curl -X POST -d "$DATA" --header "Content-Type: application/json" http://localhost:$PORT/train
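For clients written in Python, an equivalent request can be built with the standard library. This is a sketch with a hypothetical helper name; it builds the request without sending it, and defaults to port 8005, the port used by ./install.sh.

```python
import json
import urllib.request


def make_train_request(item: dict, host: str = "localhost", port: int = 8005) -> urllib.request.Request:
    """Build (but do not send) a POST equivalent to the curl call above."""
    body = json.dumps({"item": item}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/train",
        data=body,
        # Without this header the server replies {"status":"error","msg":"No item found."}
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen() sends it to a running server; the body of the reply is one of the JSON objects shown in the examples that follow.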

Correct Training

Item as JSON:

DATA='{"item":{"user_id":"uid","place_id":"pid","timestamp":"2014-07-03T00:00:00Z"}}'
# Response:
{"status":"accept","msg":"","item":{"user_id":"uid","place_id":"pid","timestamp":"2014-07-03"}}

Item with extra content:

DATA='{"extra":"extra","item":{"user_id":"uid","place_id":"pid","timestamp":"2014-07-03T00:00:00Z","extra":"extra"}}'
# Response:
{"status":"accept","msg":"","item":{"user_id":"uid","place_id":"pid","timestamp":"2014-07-03"}}

Incorrect Training

Item without --header "Content-Type: application/json":

DATA='{"extra":"extra","item":{"user_id":"uid","place_id":"pid","timestamp":"2014-07-03T00:00:00Z","extra":"extra"}}'
# Response:
{"status":"error","msg":"No item found."}

Missing item:

DATA='{"item_err":{"user_id":"uid","place_id":"pid","timestamp":"2014-07-03T00:00:00Z"}}'
# Response:
{"status":"error","msg":"No item found."}

Missing user_id:

DATA='{"item":{"user_id_err":"uid","place_id":"pid","timestamp":"2014-07-03T00:00:00Z"}}'
# Response:
{"status":"error","msg":"No item.user_id found."}

Missing place_id:

DATA='{"item":{"user_id":"uid","place_id_err":"pid","timestamp":"2014-07-03T00:00:00Z"}}'
# Response:
{"status":"error","msg":"No item.place_id found."}

Missing timestamp:

DATA='{"item":{"user_id":"uid","place_id":"pid","timestamp_err":"2014-07-03T00:00:00Z"}}'
# Response:
{"status":"error","msg":"No item.timestamp found."}

Incorrect timestamp:

DATA='{"item":{"user_id":"uid","place_id":"pid","timestamp":"2014-13-34T25:00:00Z"}}'
# Response:
{"status":"error","msg":"Unparseable item.timestamp supplied."}
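The error messages above can also be anticipated on the client side. The hypothetical precheck below returns the documented message for a malformed payload, or None if it looks acceptable. It is an approximation: the order in which missing fields are reported is assumed, Python's ISO 8601 parser may not accept exactly the same strings as the server, and a missing Content-Type header cannot be detected this way.

```python
from datetime import datetime
from typing import Optional


def precheck(payload: dict) -> Optional[str]:
    """Return the documented error message for a bad payload, or None."""
    item = payload.get("item")
    if not isinstance(item, dict):
        return "No item found."
    for field in ("user_id", "place_id", "timestamp"):
        if field not in item:
            return f"No item.{field} found."
    ts = item["timestamp"]
    if not isinstance(ts, str):
        return "Unparseable item.timestamp supplied."
    if ts.endswith("Z"):  # fromisoformat() only accepts "Z" from Python 3.11 on
        ts = ts[:-1] + "+00:00"
    try:
        datetime.fromisoformat(ts)
    except ValueError:
        return "Unparseable item.timestamp supplied."
    # Extra keys are ignored, matching the "Item with extra content" example.
    return None
```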

Recommendations

Missing place_id:

curl -X GET http://localhost:8005/
# Response:
{"status":"error","msg":"Please specify a place_id."}

(Some more data is added first, to produce a dummy recommendation.)

Recommendation for venue2:

curl -X GET http://localhost:8005/venue2
# Response:
{"status":"accept","msg":"top ten results, descending","place_id":"venue2","places":[{"place_id":"venue3","value":2},{"place_id":"venue1","value":1},{"place_id":"venue4","value":1}]}