Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Type||Name||Latest commit message||Commit time|
|Failed to load latest commit information.|
What is this: ============ This is an example showing how to use parallel matrix factorization using Graphlab's command line tools. Objective: ========= We want to build a recommendation engine that recommends items to users, but we have a very large number of items where traditional approaches do not scale. Solution: ========= Parallel Matrix Factorization. (Alternating Least Squares) A technique used to condense a large sparse matrix into a much smaller one while still preserving the original. Sparse in this case means not many users have rated items We are essentially saying: "Can we take this big/huge matrix and make some smaller ones that approximate the original" Think File Compression. We want a smaller file...it uses less resources and is easier to work with. Don't Think File Compression because we can't get back to the original matrix, we only have an approximation. We will be using a custom built command line tool that is packaged with Graphlab library for matrix factorization called pmf. Inputs: A large sparse (many user,item pairs but few rankings) matrix in matrix market format. USER<TAB>ITEM<TAB>RANK Additional Information with Examples http://bickson.blogspot.com/search/label/Matrix%20Market Parameters: --scheduler - this is how tasks will be computed this give us parallelism and control over how to compute things round_robin -- cycle over users and items max_iterations -- number of times block_size -- should be 1 --matrix-market - tell graphlab that we are giving it a matrix in 'matrix market format' so it knows how to read it --lambda - this is a weight that will be applied to your matrix vectors, optimal value is found through trial and error. prevents overfitting. --npus - the number of cpus/cores to use in the computation --D - the demension of the factorized matricies. too small results in fast run time but less accuracy. Too Large results in slow runtime but more accurate. Typical values are 20 -> 50. --maxval - maximum value of allowed rating --minval - minimum value of allowed rating Outputs: The output are two matrices U and V. In matrix U the rows are feature vectors for users where the first row is user 1, second user 2 ... In matrix V the rows are feature vectors for movies where the first row is movie 1, second movie 2 ... To get a prediction for user 1 movie 1 simply do a dot product on the first two rows of the matrix files U and V. Value: ====== We now have much smaller matricies to work with and build recommendations from. Large Scale Problem on a Single Compter -- efficient use of resources (Multicore) Fast Computation -- Results faster and at scale cost:latency ratio is low Additional resources: GraphLab pmf website: http://graphlab.org/pmf.html Matrix market format description incuding examples: http://bickson.blogspot.com/search/label/Matrix%20Market Danny Bickson has a great blog discussing large scale machine learning with some great examples, he is a principal scientist in CMUs Machine Learning Department and a big contributor to the Graphlab project. http://bickson.blogspot.com