Skip to content

Graph-based recommender for positive only feedback dataset using Gremlin graph query language

Notifications You must be signed in to change notification settings

faberchri/Graph-based-recommender

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 

Repository files navigation

Graph-based Recommender

Graph-based recommender for BBC positive only feedback dataset using Gremlin graph query language. See the excellent blog post of M. Rodriguez to get an idea of the basic principles of a graph-based recommender system.

How-To-Run

  1. Download Titan Graph Database v0.4.4 (all): http://thinkaurelius.github.io/titan/

  2. Set the Java VM args for groovy/gremlin. The complete BBC 6.7 Mio ratings dataset needs at least 35Gb (!) of RAM. If you want to serialize the graph to disk you also need to increase the stack size. Hence:

    JAVA_OPTIONS="-Xms1g -Xmx35g -Xss100m"
    export JAVA_OPTIONS
  3. Pass main.groovy to gremlin.sh in script mode (-e):

    titan-all-0.4.4/bin/gremlin.sh -e path/to/Graph-based-recommender/main.groovy -p dir/for/optional/graph/serialization -u some,user,ids,to,process path/to/BBC/dataset path/to/output/directory

Graph Structure

Graphs queried for recommendations have the structure depicted below.

  • Vertices in red: User
  • Vertices in cyan: Shows
  • Vertices in purple: Show Attributes (e.g. 'genres/comedy' or 'service/bbcfour')
  • User -> Show edge: 'watched'
  • Show -> Attribute edge: 'show has attribute'

If a user watches the same show more than once the graph contains multiple 'watched' edges between the user and the show.

Hard-coded Sample Graph

Recommendation Strategies

The recommendations are determined by the implementation of IRecommendationStrategy:

interface IRecommendationStrategy{
	List<Vertex> recommendShowsToUser(Vertex user);
}
  • RankedCollaborativeFilteringRecommendationStrategy: Pure collaborative filtering. Recommended are shows that have not been watched by the current user but by users that watched at least one of the shows that the current user watched. The retrieved shows are ranked by the number of shortest paths (length == 3) from the current user to the new shows.

    class RankedCollaborativeFilteringRecommendationStrategy implements IRecommendationStrategy{
        List recommendShowsToUser(Vertex user){
            // find all path from current user to new shows in one other user distance, count number of paths to new show, order descending by counts 
            def showsWatchedByCurrentUser = [];
            // gremlin code follows
            return user.out('watched').aggregate(showsWatchedByCurrentUser).in('watched').out('watched').except(showsWatchedByCurrentUser).groupCount.cap.orderMap(T.decr).toList();
        }
    }

    If we apply this strategy on the sample graph with:

    titan-all-0.4.4/bin/gremlin.sh -e graphRecommender/main.groovy -t -u U1,U6,U12 . .

    We get:

    Your input arguments: [-t, -u, U1,U6,U12, ., .]
    -- Test graph loaded: tinkergraph[vertices:24 edges:36] --
    -- Some basic graph properties:
    ---- Number of vertices: 24
    ---- Number of vertices of type 'Show': 7
    ---- Number of vertices of type 'User': 14
    ---- Number of vertices of type 'Attribute': 3
    ---- Number of edges: 36
    ---- Number of edges of type 'hasAttribute': 11
    ---- Number of edges of type 'watched': 25
    ---------------------------------
    Calculating recommendations for 3 users
    -- Start recommendation calculation --
    -- Start recommendation calculation for user U1 (1/3) --
    ### Watched shows (5): ###
    S6 - <NA> - <NA>
    S7 - <NA> - <NA>
    S6 - <NA> - <NA>
    S1 - <NA> - <NA>
    S1 - <NA> - <NA>
    ### Recommendations (3, best first): ###
    1 - S5 - <NA> - <NA>
    2 - S2 - <NA> - <NA>
    3 - S3 - <NA> - <NA>
    -- Recommendation calculation for user U1 completed (1/3) --
    -- Start recommendation calculation for user U6 (2/3) --
    ### Watched shows (1): ###
    S5 - <NA> - <NA>
    ### Recommendations (1, best first): ###
    1 - S6 - <NA> - <NA>
    -- Recommendation calculation for user U6 completed (2/3) --
    -- Start recommendation calculation for user U12 (3/3) --
    ### Watched shows (3): ###
    S2 - <NA> - <NA>
    S3 - <NA> - <NA>
    S1 - <NA> - <NA>
    ### Recommendations (2, best first): ###
    1 - S6 - <NA> - <NA>
    2 - S7 - <NA> - <NA>
    -- Recommendation calculation for user U12 completed (3/3) --
    -- Recommendation calculation completed --
    Calculating recommendations for 3 users took 0.153 seconds (0.051 s per user).
    Application terminated without exceptions.
  • ItemAttributesConsideringRankedCollaborativeFilteringRecommendationStrategy: Collaborative filtering with mixed in metadata / attributes similarity. Recommended are shows that have not been watched by the current user but by users that watched at least one of the shows that the current user watched OR shows that have a common attribute. The retrieved shows are ranked by the number of shortest paths (length == 3) from the current user to the new shows, either passing through an other user or a shared attribute.

    class ItemAttributesConsideringRankedCollaborativeFilteringRecommendationStrategy implements IRecommendationStrategy{
        List recommendShowsToUser(Vertex user){
            // find all path from current user to new shows in one other user or an attribute distance, count number of paths to new show, order descending by counts 
            def showsWatchedByCurrentUser = [];
            return user.out('watched').aggregate(showsWatchedByCurrentUser).both('watched','hasAttribute').both('watched', 'hasAttribute').except(showsWatchedByCurrentUser).groupCount.cap.orderMap(T.decr).toList();
        }
    }

    If we apply this strategy on the sample graph with:

    titan-all-0.4.4/bin/gremlin.sh -e graphRecommender/main.groovy -t -u U1,U6,U12 . .

    We get:

    Your input arguments: [-t, -u, U1,U6,U12, ., .]
    -- Test graph loaded: tinkergraph[vertices:24 edges:36] --
    -- Some basic graph properties:
    ---- Number of vertices: 24
    ---- Number of vertices of type 'Show': 7
    ---- Number of vertices of type 'User': 14
    ---- Number of vertices of type 'Attribute': 3
    ---- Number of edges: 36
    ---- Number of edges of type 'hasAttribute': 11
    ---- Number of edges of type 'watched': 25
    ---------------------------------
    Calculating recommendations for 3 users
    -- Start recommendation calculation --
    -- Start recommendation calculation for user U1 (1/3) --
    ### Watched shows (5): ###
    S6 - <NA> - <NA>
    S7 - <NA> - <NA>
    S6 - <NA> - <NA>
    S1 - <NA> - <NA>
    S1 - <NA> - <NA>
    ### Recommendations (4, best first): ###
    1 - S5 - <NA> - <NA>
    2 - S2 - <NA> - <NA>
    3 - S3 - <NA> - <NA>
    4 - S4 - <NA> - <NA>
    -- Recommendation calculation for user U1 completed (1/3) --
    -- Start recommendation calculation for user U6 (2/3) --
    ### Watched shows (1): ###
    S5 - <NA> - <NA>
    ### Recommendations (5, best first): ###
    1 - S6 - <NA> - <NA>
    2 - S7 - <NA> - <NA>
    3 - S3 - <NA> - <NA>
    4 - S4 - <NA> - <NA>
    5 - S2 - <NA> - <NA>
    -- Recommendation calculation for user U6 completed (2/3) --
    -- Start recommendation calculation for user U12 (3/3) --
    ### Watched shows (3): ###
    S2 - <NA> - <NA>
    S3 - <NA> - <NA>
    S1 - <NA> - <NA>
    ### Recommendations (4, best first): ###
    1 - S7 - <NA> - <NA>
    2 - S6 - <NA> - <NA>
    3 - S4 - <NA> - <NA>
    4 - S5 - <NA> - <NA>
    -- Recommendation calculation for user U12 completed (3/3) --
    -- Recommendation calculation completed --
    Calculating recommendations for 3 users took 0.156 seconds (0.052 s per user).
    Application terminated without exceptions.
  • RankedItemAttributesRecommendationStrategy: Item attributes / metadata only based recommendations.

    class RankedItemAttributesRecommendationStrategy implements IRecommendationStrategy{
        List recommendShowsToUser(Vertex user){
            // get attributes of all users shows, get unwatched shows with same attributes, rank shows by number of paths to the show and append to end of result list, loop starting from previously found shows
            def l=[]; def a=[]; def s=[];
            user.out('watched').as('x').except(s).aggregate(s).out('hasAttribute').except(a).aggregate(a).in('hasAttribute').except(s).gather{def counts=it.countBy{it}; counts=counts.sort{i, j -> j.value <=> i.value}; l.addAll(counts.keySet()); return it}.scatter.loop('x'){it.loops < 20}.iterate();
            return l
        }
    }

    If we apply this strategy on the sample graph with:

    titan-all-0.4.4/bin/gremlin.sh -e graphRecommender/main.groovy -t -u U1,U6,U12 . .

    We get:

    Your input arguments: [-t, -u, U1,U6,U12, ., .]
    -- Test graph loaded: tinkergraph[vertices:24 edges:36] --
    -- Some basic graph properties:
    ---- Number of vertices: 24
    ---- Number of vertices of type 'Show': 7
    ---- Number of vertices of type 'User': 14
    ---- Number of vertices of type 'Attribute': 3
    ---- Number of edges: 36
    ---- Number of edges of type 'hasAttribute': 11
    ---- Number of edges of type 'watched': 25
    ---------------------------------
    Calculating recommendations for 3 users
    -- Start recommendation calculation --
    -- Start recommendation calculation for user U1 (1/3) --
    ### Watched shows (5): ###
    S6 - <NA> - <NA>
    S7 - <NA> - <NA>
    S6 - <NA> - <NA>
    S1 - <NA> - <NA>
    S1 - <NA> - <NA>
    ### Recommendations (4, best first): ###
    1 - S5 - <NA> - <NA>
    2 - S2 - <NA> - <NA>
    3 - S3 - <NA> - <NA>
    4 - S4 - <NA> - <NA>
    -- Recommendation calculation for user U1 completed (1/3) --
    -- Start recommendation calculation for user U6 (2/3) --
    ### Watched shows (1): ###
    S5 - <NA> - <NA>
    ### Recommendations (5, best first): ###
    1 - S6 - <NA> - <NA>
    2 - S7 - <NA> - <NA>
    3 - S3 - <NA> - <NA>
    4 - S4 - <NA> - <NA>
    5 - S2 - <NA> - <NA>
    -- Recommendation calculation for user U6 completed (2/3) --
    -- Start recommendation calculation for user U12 (3/3) --
    ### Watched shows (3): ###
    S2 - <NA> - <NA>
    S3 - <NA> - <NA>
    S1 - <NA> - <NA>
    ### Recommendations (4, best first): ###
    1 - S7 - <NA> - <NA>
    2 - S6 - <NA> - <NA>
    3 - S4 - <NA> - <NA>
    4 - S5 - <NA> - <NA>
    -- Recommendation calculation for user U12 completed (3/3) --
    -- Recommendation calculation completed --
    Calculating recommendations for 3 users took 0.177 seconds (0.059 s per user).
    Application terminated without exceptions.

About

Graph-based recommender for positive only feedback dataset using Gremlin graph query language

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages