Patrik Schmidt edited this page Nov 12, 2013 · 13 revisions

TODOS

  • do some Tomcat/Maven Quickstart
    • heavily based on TDD
  • finishing testsuites
  • merge Utils and make use of MUID service
  • implement API calls for NormalizedLikeRetrieval

DONE

  • Review implementation of lucene indexing of common preferences (like-button)
  • define API requests for HaveInCommons service
    • later for likebutton in general (if necessary)
    • see NormalizedLikeRetrieval
  • defining/finalizing LikeButtonAPI
    • clarify unlike function

current objectives

  • define likebutton protocol
  • implement data queries
  • get familiar with the current Java development style
    • build/deployment
    • TDD

future objectives

  • get Mahout integrated using HBase as data store
    • figuring out whether it is better to control Mahout via POJO or to invoke it via bash
      • understanding the mahout bash script
    • do some number crunching based on the wikipedia dataset
      • writing simple data integration
      • (sharded) file output
      • get data into hadoop and read by recommender engine
      • perhaps writing the results back to hadoop or eventually into hbase
  • define security tests

notes

lucene

Combining Lucene queries with a logical AND (BooleanClause.Occur.MUST) to build intersections, i.e. to count the documents that match both terms

public static double wikipediaDistance(String term0, String term1) throws ParseException, IOException {

	Query query0 = parser.parse(term0);
	Query query1 = parser.parse(term1);

	BooleanQuery combiQuery0 = new BooleanQuery();
	combiQuery0.add(query0, BooleanClause.Occur.MUST);
	TopDocs results0 = searcher.search(combiQuery0, 1);

	BooleanQuery combiQuery1 = new BooleanQuery();
	combiQuery1.add(query1, BooleanClause.Occur.MUST);
	TopDocs results1 = searcher.search(combiQuery1, 1);

	BooleanQuery query0AND1 = new BooleanQuery();
	query0AND1.add(combiQuery0, BooleanClause.Occur.MUST);
	query0AND1.add(combiQuery1, BooleanClause.Occur.MUST);

	TopDocs results0AND1 = searcher.search(query0AND1, 1);

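	// Without hits for either term, or without a shared document, report no relation.
	if (results0.totalHits < 1 || results1.totalHits < 1 || results0AND1.totalHits < 1) {
		return 0;
	}

	// Work on the natural logarithms of the hit counts.
	double log0, log1, logCommon, maxlog, minlog;
	log0 = Math.log(results0.totalHits);
	log1 = Math.log(results1.totalHits);
	logCommon = Math.log(results0AND1.totalHits);
	maxlog = Math.max(log0, log1);
	minlog = Math.min(log0, log1);

	// Terms that always occur together score 1; the score drops as they share fewer documents.
	return 1 - 0.5 * (maxlog - logCommon) / (Math.log(reader.numDocs()) - minlog);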

}
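
To make the arithmetic easier to check without an index, here is a minimal sketch that applies the same formula to plain hit counts. The class name CooccurrenceScore and the method score are hypothetical and not part of the project; the counts passed in main are made-up illustration values.

```java
/** Hypothetical helper: the score from wikipediaDistance, computed from plain hit counts. */
final class CooccurrenceScore {

    /**
     * @param hits0      number of documents matching the first term
     * @param hits1      number of documents matching the second term
     * @param hitsCommon number of documents matching both terms
     * @param numDocs    total number of documents in the index
     */
    static double score(long hits0, long hits1, long hitsCommon, long numDocs) {
        // Same guard as the Lucene version: no hits means no measurable relation.
        if (hits0 < 1 || hits1 < 1 || hitsCommon < 1) {
            return 0;
        }
        double log0 = Math.log(hits0);
        double log1 = Math.log(hits1);
        double logCommon = Math.log(hitsCommon);
        double maxlog = Math.max(log0, log1);
        double minlog = Math.min(log0, log1);
        return 1 - 0.5 * (maxlog - logCommon) / (Math.log(numDocs) - minlog);
    }

    public static void main(String[] args) {
        // Made-up counts, only to show how the score reacts.
        System.out.println(score(100, 100, 100, 10_000)); // both terms always occur together
        System.out.println(score(100, 50, 10, 10_000));   // partial overlap
        System.out.println(score(100, 50, 0, 10_000));    // no shared document
    }
}
```

If the rarer term matches every document in the index, the denominator becomes zero; neither this sketch nor the Lucene version above guards against that case.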

mahout

example Data set

http://www.grouplens.org/node/73

interaction

format of ratings.dat: UserID::MovieID::Rating::Timestamp

convert it to the following representation: userid,itemid,rating
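
The conversion can be sketched per line as below. This is only an illustration under the assumption that every input line has the four "::"-separated fields described above; the class name RatingsLineConverter and the sample line are hypothetical.

```java
/** Hypothetical helper: turns "UserID::MovieID::Rating::Timestamp" into "userid,itemid,rating". */
final class RatingsLineConverter {

    static String toCsv(String line) {
        String[] parts = line.trim().split("::");
        if (parts.length < 3) {
            throw new IllegalArgumentException("Expected UserID::MovieID::Rating[::Timestamp], got: " + line);
        }
        // Keep user, item and rating; the timestamp is dropped.
        return parts[0] + "," + parts[1] + "," + parts[2];
    }

    public static void main(String[] args) {
        // Sample line made up for illustration.
        System.out.println(toCsv("1::1193::5::978300760")); // prints 1,1193,5
    }
}
```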

$ bin/mahout recommenditembased --input ratings.dat --usersFile user.dat --numRecommendations 2 \
    --output output/ --similarityClassname SIMILARITY_PEARSON_CORRELATION
  • usersFile: users for which you want to calculate recommendations

  • input: linked data between users and items

    DataModel model = new FileDataModel(new File("data.txt"));
    Recommender recommender = new SlopeOneRecommender(model);
    Recommender cachingRecommender = new CachingRecommender(recommender);

Hadoop

interaction

import java.io.File;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;

public class HDFSHelloWorld {

  public static final String theFilename = "hello.txt";
  public static final String message = "Hello, world!\n";

  public static void main (String [] args) throws IOException {

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path filenamePath = new Path(theFilename);

    try {
      if (fs.exists(filenamePath)) {
        // remove the file first (non-recursive; the single-argument delete is deprecated)
        fs.delete(filenamePath, false);
      }

      FSDataOutputStream out = fs.create(filenamePath);
      out.writeUTF(message);
      out.close();

      FSDataInputStream in = fs.open(filenamePath);
      String messageIn = in.readUTF();
      System.out.print(messageIn);
      in.close();
    } catch (IOException ioe) {
      System.err.println("IOException during operation: " + ioe.toString());
      System.exit(1);
    }
  }
}
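
FSDataOutputStream and FSDataInputStream extend java.io.DataOutputStream and DataInputStream, so the writeUTF/readUTF round trip can be tried without a cluster. The following is a minimal sketch using in-memory streams only; the class name LocalUtfRoundTrip and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper: the writeUTF/readUTF round trip from HDFSHelloWorld, kept in memory. */
final class LocalUtfRoundTrip {

    /** Encodes the message with writeUTF and returns the raw bytes. */
    static byte[] encode(String message) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Decodes bytes that were produced by writeUTF. */
    static String decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encode("Hello, world!\n");
        // writeUTF puts a 2-byte length in front of the encoded characters.
        System.out.println(bytes.length + " bytes written");
        System.out.print(decode(bytes));
    }
}
```

Because of the 2-byte length prefix, a file written with writeUTF is not a plain text file; it has to be read back with readUTF.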

Links

http://www.ibm.com/developerworks/java/library/j-mahout/

http://girlincomputerscience.blogspot.de/2010/11/apache-mahout.html

Itembased Collaborative Filtering
