Overview

This is a Java library for interfacing with the Sugestio recommendation service. Data is submitted as XML. Users, items and consumptions can be transmitted in bulk. The library makes use of the Concurrency API and the Jersey RESTful web service client. A zip file with JAR and all dependencies can be downloaded here (JDK <= 8) or here (JDK 9-15).

About Sugestio

Sugestio is a scalable and fault-tolerant service that brings the power of web personalisation to all developers. The RESTful web service provides an easy-to-use interface and a set of developer libraries that enable you to enrich your content portals, e-commerce sites and other content-based websites.

Access credentials and the Sandbox

To access the Sugestio service, you need an account name and a secret key. To run the examples from the tutorial, you can use the following credentials:

  • account name: sandbox
  • secret key: demo

The Sandbox is a read-only account. You can use these credentials to experiment with the service. The Sandbox can give personal recommendations for users 1 through 5, and similar items for items 1 through 5.

When you are ready to work with real data, you may apply for a developer account through the Sugestio website.

About this library

Features

The following API features are implemented:

  • get personalized recommendations for a given user
  • get items that are similar to a given item
  • advanced recommendation filters
  • (bulk) submit user activity (consumptions): clicks, purchases, ratings, ...
  • (bulk) submit item metadata: description, location, tags, categories, ...
  • (bulk) submit user metadata: gender, location, birthday, ...
  • (bulk) delete consumptions, user metadata, item metadata
  • get performance data (analytics): precision, recall, ...

Requirements

This library is based on Jersey -- an Open Source implementation of the JAX-RS 1.1 (JSR 311) API for building RESTful web services. The most recent version of the client requires Jersey 1.1.5.1 and the Jersey OAuth filter. All dependencies can be found inside this zip file.

Tutorial and sample code

This tutorial first explains how you can use the basic functions of the SugestioClient to integrate the recommendation service into your existing workflow. In the advanced tutorial, we create a new application from scratch that uses the more advanced, multi-threaded functions of the client to quickly import a large data set.

Basic tutorial

For this basic tutorial, we use the example of a library that wants to recommend books to its members. The library already has an IT system for managing its book catalog and its member database. In this tutorial, we add recommendation service calls at various points in the existing workflow. Whenever a new book is added to the catalog, we submit its metadata to the recommendation service. We also provide the service with some demographic information on our members, such as age and gender, that can be helpful in determining what kind of books they like. Finally, we log which books are borrowed by each member, so the service can get to know their actual tastes.

We start by creating a new instance of the SugestioClient class. For this basic tutorial, we use the constructor that takes two arguments -- the account name and the secret key.

SugestioClient client = new SugestioClient("sandbox", "demo");

Now we are ready to submit data to the service. Let's begin by submitting the metadata of a book. The Item constructor takes a single argument -- a unique identifier for this book. Generally speaking, if you use a relational database system such as MySQL to store your application data, this identifier can be the auto-generated primary key from your Items table. In the specific case of our library application, the ISBN number of the book is a perfect candidate for a unique identifier.

Item book = new Item("0151446474");

Additional information like keywords, author and genre can all be useful when the recommendation service tries to determine whether this book will match a person's interests.

book.addTag("history");
book.addTag("murder");
book.addTag("mystery");
book.addTag("whodunnit");
book.addCreator("Eco, Umberto");
book.addCategory("Fiction");

Finally, we submit the book to the service and we shut down the client library.

client.addItem(book);
client.shutdown();

We also keep some basic demographic information on our members. Like the Item class from before, the User constructor takes a single argument -- a unique identifier for this library member. Here we use a numerical ID that might have been automatically generated in the SQL backend of the library IT system. For the Gender field, we can make use of an Enumeration type.

User member = new User("1407");
member.setGender(User.Gender.M);
member.setBirthday(1975, 7, 12);

SugestioClient client = new SugestioClient("sandbox", "demo");
client.addUser(member);
client.shutdown();

Finally, we keep a record of loans. The Consumption constructor takes two arguments: the member ID and the book ID. Let's say user 1407 is checking out "The Name of the Rose", which has ISBN 0151446474:

Consumption loan = new Consumption("1407", "0151446474");

SugestioClient client = new SugestioClient("sandbox", "demo");
client.addConsumption(loan);
client.shutdown();

Recommending new books to user 1407 is easy. Each recommendation object has an Item ID associated with it. In our case, this ID is the ISBN number.

SugestioClient client = new SugestioClient("sandbox", "demo");
List<Recommendation> recommendations = client.getRecommendations("1407");

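// printLine() and Book are not defined in this snippet; they stand in for
// the library application's own output routine and catalog class
printLine("You might also like...");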

for (Recommendation recommendation : recommendations) {
	Book book = recommendation.getItem();
	printLine(book.getTitle());
}

client.shutdown();

Advanced tutorial

The recommendation service can receive user, item and consumption data in bulk. The load-balanced setup is also able to process many requests simultaneously. The Java client takes advantage of this by using multiple threads for uploading data. In this tutorial, we import the MovieLens data set containing 100,000 ratings.

The file u.data contains 100,000 lines of text, each containing four tab-separated fields: user ID, movie ID, a five-star rating and a timestamp.

Let's create a method parseLine() that takes such a text line as input and converts it into a Consumption object. Here, we use two additional properties of the Consumption class. The Type property tells us what kind of interaction there was between the user and the item. In this case, the user gave a rating. As with the Gender property of the User class, the argument is an Enumeration. The Detail property tells us more about the rating system so that the recommendation algorithm can properly interpret and normalize the value. Here we have a five-star rating system, and the user gave a rating between 1 and 5.

private Consumption parseLine(String line) throws Exception {
	
	// line structure: userid (tab) movieid (tab) rating (tab) timestamp
	String[] parts = line.split("\t");
	
	Consumption consumption = new Consumption(parts[0], parts[1]);
	consumption.setType(Consumption.Type.RATING);
	
	Double value = Double.parseDouble(parts[2]);
	consumption.setDetail(new StarRating(5, 1, value));
	
	return consumption;
}
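The parseLine() method above assumes every line is well formed. As a minimal sketch of a check that could run first, the following self-contained class validates the four tab-separated fields and the 1 to 5 rating range. The class name RatingLineSketch, the method splitRatingLine() and the sample line are hypothetical and not part of the Sugestio library.

```java
public class RatingLineSketch {

    // Split one u.data line into {userId, movieId, rating, timestamp}.
    // Throws IllegalArgumentException if the line does not have four fields
    // or if the rating is not a number between 1 and 5.
    static String[] splitRatingLine(String line) {
        String[] parts = line.split("\t");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 fields but found " + parts.length);
        }
        double rating;
        try {
            rating = Double.parseDouble(parts[2]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("rating is not a number: " + parts[2]);
        }
        if (rating < 1 || rating > 5) {
            throw new IllegalArgumentException("rating out of range: " + rating);
        }
        return parts;
    }

    public static void main(String[] args) {
        // sample line made up for illustration
        String[] parts = splitRatingLine("196\t242\t3\t881250949");
        System.out.println(parts[0] + " rated " + parts[1] + " with " + parts[2]);
    }
}
```

Rejecting a bad line before building the Consumption keeps one malformed record from aborting a large import with an unhelpful ArrayIndexOutOfBoundsException.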

Now, let's look at the rest of the program. This time, we create a SugestioClient object using the four-argument constructor. We have a lot of data to transmit, so we submit 500 consumptions in each web service request. We submit a maximum of 10 requests concurrently. When we use the addConsumptions() method of the SugestioClient, it transparently divides the list of consumptions into blocks of 500. A maximum of 10 such blocks are submitted concurrently. For example, if we pass a list containing 10,000 consumptions to addConsumptions, a total of 20 web service requests will be made, but no more than ten at the same time. If we pass it a list containing just 100 consumptions, only a single service call is made.

// requires: java.io.BufferedReader, java.io.FileReader, java.util.ArrayList, java.util.List
public void importMovieLens() throws Exception {
	
	// 500 consumptions per request, 10 threads
	SugestioClient client = new SugestioClient("sandbox", "demo", 500, 10);

	List<Consumption> buffer = new ArrayList<Consumption>();
	BufferedReader br = new BufferedReader(new FileReader("D:/data/ml-data_0/u.data"));
	String line = null;
	
	try {
		while ((line = br.readLine()) != null) {

			Consumption consumption = parseLine(line);
			buffer.add(consumption);
			
			if (buffer.size() == 10000) {
				client.addConsumptions(buffer); // 10,000 / 500 = 20 requests
				buffer.clear();
			}
		}

		// transmit any remaining data
		if (buffer.size() > 0) {
			client.addConsumptions(buffer);
			buffer.clear();
		}
	} finally {
		// release the file handle and the client's threads even if an error occurs
		br.close();
		client.shutdown();
	}
}
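The block-splitting behaviour described above (blocks of 500, at most 10 in flight) can be pictured with plain java.util.concurrent classes. This is only a sketch of the general technique, not the library's actual implementation. The class BlockSubmitSketch and its methods are hypothetical, and the "request" is a stand-in that makes no network call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockSubmitSketch {

    // Split a list into consecutive blocks of at most blockSize elements.
    static <T> List<List<T>> partition(List<T> items, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<List<T>> blocks = new ArrayList<List<T>>();
        for (int start = 0; start < items.size(); start += blockSize) {
            int end = Math.min(start + blockSize, items.size());
            // copy the sublist so later changes to the source list do not affect the block
            blocks.add(new ArrayList<T>(items.subList(start, end)));
        }
        return blocks;
    }

    // Run one task per block on a pool of maxThreads workers and return
    // the number of blocks processed. The fixed pool size caps concurrency.
    static <T> int submitAll(List<T> items, int blockSize, int maxThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<Integer>> pending = new ArrayList<Future<Integer>>();
            for (final List<T> block : partition(items, blockSize)) {
                pending.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        return block.size(); // stand-in for one web service request
                    }
                }));
            }
            int sent = 0;
            for (Future<Integer> f : pending) {
                f.get(); // wait for this block to finish
                sent++;
            }
            return sent;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> data = new ArrayList<Integer>();
        for (int i = 0; i < 10000; i++) {
            data.add(i);
        }
        System.out.println(submitAll(data, 500, 10));                 // 20
        System.out.println(submitAll(data.subList(0, 100), 500, 10)); // 1
    }
}
```

Each block is copied before it is handed to a worker, so the caller can safely clear or reuse its buffer while requests are still running.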

The RecommendationFilter is a helper class for restricting the recommendations according to a number of parameters. The code fragment below illustrates how to retrieve only the top five recommendations for user 1 that belong to category A, but not category B.

public List<Recommendation> getRecommendations() throws Exception {	
	String userId = "1";		
	RecommendationFilter filter = new RecommendationFilter();
	filter.setLimit(5);
	filter.inCategory("A", true);
	filter.inCategory("B", false);		
	return this.client.getRecommendations(userId, filter);
}
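To make the intent of such a filter concrete, here is a small local illustration of the selection it describes: keep ranked items that are in category A, drop those in category B, and stop after five. It does not show how the service evaluates filters. The Candidate class and the select() method are hypothetical stand-ins, and the input order is assumed to be the ranking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FilterSemanticsSketch {

    // Hypothetical stand-in for a ranked recommendation; not a library class.
    static class Candidate {
        final String itemId;
        final Set<String> categories;

        Candidate(String itemId, String... categories) {
            this.itemId = itemId;
            this.categories = new HashSet<String>(Arrays.asList(categories));
        }
    }

    // Walk the candidates in rank order and keep those that belong to
    // `required` but not to `excluded`, until `limit` items are collected.
    static List<String> select(List<Candidate> ranked, String required, String excluded, int limit) {
        List<String> result = new ArrayList<String>();
        for (Candidate c : ranked) {
            if (result.size() >= limit) {
                break;
            }
            if (c.categories.contains(required) && !c.categories.contains(excluded)) {
                result.add(c.itemId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Candidate> ranked = Arrays.asList(
            new Candidate("i1", "A"),
            new Candidate("i2", "A", "B"),
            new Candidate("i3", "C"),
            new Candidate("i4", "A", "C"));
        System.out.println(select(ranked, "A", "B", 5)); // [i1, i4]
    }
}
```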

Error handling

Single object submissions

Successfully submitting a single user, item or consumption returns a SugestioResult object. This object contains the HTTP verb, the resource URL, the HTTP status code, and any human-readable text message. A printReport() method is provided for convenience. If there was a problem with the request, a SugestioException is raised. Consider the following example code:

try {

	Consumption c1 = new Consumption("u1", "i1");
	Consumption c2 = new Consumption("u1", null);
	SugestioResult<String> result;
	
	result = client.addConsumption(c1);			
	result.printReport();
	result = client.addConsumption(c2);			
	result.printReport();
	
} catch (SugestioException e) {
	e.getSugestioResult().printReport();
}

Consumption 1 will be submitted successfully. The output will be as follows:

POST http://api.sugestio.com/sites/sandbox/consumptions.xml
	202 Accepted

Consumption 2 is missing a valid itemId, and an exception will be raised. The output is as follows:

POST http://api.sugestio.com/sites/sandbox/consumptions.xml
	Client side problem:
	400 Bad Request: Submitted consumption data is missing required attribute itemid.

Server-side problems (with an HTTP status code in the 5xx range) are caught the same way.
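Because client-side (4xx) and server-side (5xx) problems arrive through the same exception, the calling code may want to tell them apart before deciding whether a retry makes sense. The sketch below classifies a plain integer status code. How to read the status code from a SugestioResult is not shown in this document, so the helper takes an int. The names StatusClassSketch, Outcome, classify() and worthRetrying() are hypothetical.

```java
public class StatusClassSketch {

    enum Outcome { SUCCESS, CLIENT_ERROR, SERVER_ERROR, OTHER }

    // Map an HTTP status code onto a coarse outcome.
    static Outcome classify(int httpStatus) {
        if (httpStatus >= 200 && httpStatus < 300) {
            return Outcome.SUCCESS;
        }
        if (httpStatus >= 400 && httpStatus < 500) {
            return Outcome.CLIENT_ERROR;
        }
        if (httpStatus >= 500 && httpStatus < 600) {
            return Outcome.SERVER_ERROR;
        }
        return Outcome.OTHER;
    }

    // Resending an unchanged request only helps if the server was at fault.
    static boolean worthRetrying(int httpStatus) {
        return classify(httpStatus) == Outcome.SERVER_ERROR;
    }

    public static void main(String[] args) {
        System.out.println(classify(202)); // SUCCESS
        System.out.println(classify(400)); // CLIENT_ERROR
        System.out.println(worthRetrying(503)); // true
    }
}
```

A 400 response such as the missing itemid above will fail again if resent unchanged, so it is better logged and fixed than retried.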

Bulk submissions

Consumption data will often be submitted in bulk for increased throughput. Bulk submissions do not raise an exception, because the submission may be split up into (e.g.) 10 requests, of which 9 could be successful and one could fail due to a client-side or server-side issue. Rather, it is up to the developer to inspect the result object and take the appropriate action. Here, the addConsumptions method returns a map in which each key is a sublist of consumptions and the corresponding value is the web service response for that sublist.

List<Consumption> consumptions = new ArrayList<Consumption>();
// ... populate the consumptions list

Map<List<Consumption>, SugestioResult<String>> results;
results = client.addConsumptions(consumptions);

for (Map.Entry<List<Consumption>, SugestioResult<String>> entry : results.entrySet()) {
	if (!entry.getValue().isOK()) {			
		System.err.println("Failed to submit these consumptions: ");
		for (Consumption c : entry.getKey()) {
			System.err.println("\t" + c.getId());
		}
	}
}
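One possible follow-up to the inspection loop above is to gather every consumption from the failed blocks into a single list that can be logged or resubmitted later. The following self-contained sketch shows that idea with stand-in types. The Result interface models only an isOK() check, and the names FailedBlocksSketch, result() and collectFailed() are hypothetical, not library API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FailedBlocksSketch {

    // Hypothetical stand-in for a per-request result; only isOK() is modelled.
    interface Result {
        boolean isOK();
    }

    // Convenience factory for the demo below.
    static Result result(final boolean ok) {
        return new Result() {
            public boolean isOK() {
                return ok;
            }
        };
    }

    // Flatten all blocks whose result is not OK into one list.
    static <T> List<T> collectFailed(Map<List<T>, Result> results) {
        List<T> failed = new ArrayList<T>();
        for (Map.Entry<List<T>, Result> entry : results.entrySet()) {
            if (!entry.getValue().isOK()) {
                failed.addAll(entry.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<List<String>, Result> results = new LinkedHashMap<List<String>, Result>();
        results.put(Arrays.asList("c1", "c2"), result(true));
        results.put(Arrays.asList("c3", "c4"), result(false));
        System.out.println(collectFailed(results)); // [c3, c4]
    }
}
```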