# Better Recommendations Using Graph Analytics

Recommendations are big business. Amazon reports that 35% of its revenue comes from recommendations. Even more surprisingly, Netflix and YouTube report that 75% and 70%, respectively, of what people watch on their platforms comes from recommendations. That means the majority of what we buy, watch, or even listen to is shaped by algorithms working quietly in the background.

We will use Neo4j Graph Analytics for Snowflake to build our recommendations. Graph-powered recommendations go deeper than traditional methods because they model user behavior directly as relationships between products.

In our example, we will look at co-purchasing behavior in data sampled from Instacart. We will see that simply looking at the items most frequently purchased together isn't enough to build a good recommendation and, interestingly, may lead us to recommend products that customers were already planning to buy without our intervention.

So how do we build a good recommendation engine? What techniques power these systems, and how can you start applying them yourself? Well, follow along to find out!

In [None]:
USE DATABASE RETAIL_RECS;
USE SCHEMA PUBLIC;

First, let's create a co-purchase table to see which items are currently purchased together.

In [None]:
USE ROLE ACCOUNTADMIN;

In [None]:
select * from products

In [None]:
-- One row per unordered product pair that appeared in the same order
CREATE OR REPLACE TABLE COPURCHASE AS
WITH DISTINCT_LINES AS (
  SELECT DISTINCT order_id, product_id
  FROM BASKETS
),
PAIRS AS (
  SELECT
    LEAST(b1.product_id, b2.product_id)  AS product_id_a,
    GREATEST(b1.product_id, b2.product_id) AS product_id_b
  FROM DISTINCT_LINES b1
  JOIN DISTINCT_LINES b2
    ON b1.order_id = b2.order_id
   AND b1.product_id < b2.product_id            -- avoid self & duplicates
)
SELECT
  product_id_a,
  product_id_b,
  COUNT(*)::FLOAT AS co_count                   -- how many orders had both
FROM PAIRS
GROUP BY 1,2;

In [None]:
-- Same pair counts as the previous cell, rebuilt with product names attached
-- (CREATE OR REPLACE overwrites the earlier COPURCHASE table)
CREATE OR REPLACE TABLE COPURCHASE AS
WITH DISTINCT_LINES AS (
  SELECT DISTINCT order_id, product_id
  FROM BASKETS
),
PAIRS AS (
  SELECT
    LEAST(b1.product_id, b2.product_id)   AS product_id_a,
    GREATEST(b1.product_id, b2.product_id) AS product_id_b
  FROM DISTINCT_LINES b1
  JOIN DISTINCT_LINES b2
    ON b1.order_id = b2.order_id
   AND b1.product_id < b2.product_id   -- avoid self & duplicates
)
SELECT
  p.product_id_a,
  pa.product_name AS product_name_a,
  p.product_id_b,
  pb.product_name AS product_name_b,
  COUNT(*)::FLOAT AS co_count
FROM PAIRS p
JOIN PRODUCTS pa
  ON p.product_id_a = pa.product_id
JOIN PRODUCTS pb
  ON p.product_id_b = pb.product_id
GROUP BY
  p.product_id_a,
  pa.product_name,
  p.product_id_b,
  pb.product_name;

In [None]:
select * from copurchase order by co_count desc;
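
To make the pair-counting logic in the `COPURCHASE` query easier to follow, here is a minimal Python sketch of the same idea on toy data. The function name `copurchase_counts` and the sample rows are hypothetical and exist only for illustration; the real work happens in the SQL above.

```python
from collections import Counter
from itertools import combinations


def copurchase_counts(basket_lines):
    """Count, for each unordered product pair, how many orders contain both."""
    # Deduplicate (order_id, product_id) rows, like the DISTINCT_LINES CTE.
    orders = {}
    for order_id, product_id in set(basket_lines):
        orders.setdefault(order_id, set()).add(product_id)

    counts = Counter()
    for products in orders.values():
        # Sorting first means each pair is emitted once, smaller id first,
        # which plays the role of the b1.product_id < b2.product_id filter.
        for a, b in combinations(sorted(products), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy data: (order_id, product_id) rows, with one duplicate line.
lines = [(1, 10), (1, 20), (1, 20), (2, 10), (2, 20), (2, 30), (3, 30)]
counts = copurchase_counts(lines)
print(counts.most_common())
```

Products 10 and 20 share two orders, so that pair comes out on top, just as the most frequently co-purchased pair does when sorting by `co_count`.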

## Cleaning our Data
Next, we are going to put our data into two tables: one for nodes and one for relationships. We will use these tables later to run a graph algorithm!

In [None]:
-- products as nodes; add numeric properties if useful
CREATE OR REPLACE VIEW PRODUCTS_NODES AS
SELECT
  product_id            AS nodeId
FROM PRODUCTS;

In [None]:
CREATE OR REPLACE TABLE COPURCHASE_EDGES AS
SELECT
  a AS SOURCENODEID,
  b AS TARGETNODEID,
  -- Jaccard index: orders containing both products / orders containing either
  co_count / NULLIF(pa.cnt + pb.cnt - co_count, 0) AS WEIGHT
FROM (
  SELECT
    LEAST(b1.product_id, b2.product_id)  AS a,
    GREATEST(b1.product_id, b2.product_id) AS b,
    COUNT(*)::FLOAT AS co_count
  FROM (
    SELECT DISTINCT order_id, product_id FROM BASKETS
  ) b1
  JOIN (
    SELECT DISTINCT order_id, product_id FROM BASKETS
  ) b2
    ON b1.order_id = b2.order_id
   AND b1.product_id < b2.product_id
  GROUP BY 1,2
) pc
JOIN (
  SELECT product_id, COUNT(*)::FLOAT AS cnt
  FROM (SELECT DISTINCT order_id, product_id FROM BASKETS)
  GROUP BY 1
) pa ON pa.product_id = pc.a
JOIN (
  SELECT product_id, COUNT(*)::FLOAT AS cnt
  FROM (SELECT DISTINCT order_id, product_id FROM BASKETS)
  GROUP BY 1
) pb ON pb.product_id = pc.b;
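
The `WEIGHT` column above divides the number of orders containing both products by the number of orders containing either one. A small Python sketch, using made-up counts, shows why this matters more than the raw co-purchase count. The function name and all numbers here are assumptions chosen for illustration.

```python
def jaccard_weight(co_count, count_a, count_b):
    """Orders containing both products divided by orders containing either."""
    union = count_a + count_b - co_count
    # Mirror NULLIF(..., 0): return None instead of dividing by zero.
    return co_count / union if union else None


# Hypothetical: a very popular item (1000 orders) and a niche item (20 orders)
# that appear together in 15 orders.
popular_pair = jaccard_weight(15, 1000, 20)

# Hypothetical: two niche items (20 orders each) that also share 15 orders.
niche_pair = jaccard_weight(15, 20, 20)

print(popular_pair, niche_pair)
```

Both pairs have the same raw count of 15, but the weight for the niche pair is far higher because the popular item shows up in many orders regardless of what else is in the basket.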

## Granting Permissions
Next, we will grant the necessary permissions for our app to run.

In [None]:
-- Create a consumer role for users and admins of the GDS application
CREATE ROLE IF NOT EXISTS gds_user_role;
CREATE ROLE IF NOT EXISTS gds_admin_role;
GRANT APPLICATION ROLE neo4j_graph_analytics.app_user TO ROLE gds_user_role;
GRANT APPLICATION ROLE neo4j_graph_analytics.app_admin TO ROLE gds_admin_role;

CREATE DATABASE ROLE IF NOT EXISTS gds_db_role;
GRANT DATABASE ROLE gds_db_role TO ROLE gds_user_role;
GRANT DATABASE ROLE gds_db_role TO APPLICATION neo4j_graph_analytics;

-- Grant access to consumer data
GRANT USAGE ON DATABASE RETAIL_RECS TO ROLE gds_user_role;
GRANT USAGE ON SCHEMA RETAIL_RECS.PUBLIC TO ROLE gds_user_role;

-- Required to read tabular data into a graph
GRANT SELECT ON ALL TABLES IN DATABASE RETAIL_RECS TO DATABASE ROLE gds_db_role;

-- Ensure the consumer role has access to created tables/views
GRANT ALL PRIVILEGES ON FUTURE TABLES IN SCHEMA RETAIL_RECS.PUBLIC TO DATABASE ROLE gds_db_role;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA RETAIL_RECS.PUBLIC TO DATABASE ROLE gds_db_role;
GRANT CREATE TABLE ON SCHEMA RETAIL_RECS.PUBLIC TO DATABASE ROLE gds_db_role;
GRANT CREATE VIEW ON SCHEMA RETAIL_RECS.PUBLIC TO DATABASE ROLE gds_db_role;
GRANT ALL PRIVILEGES ON FUTURE VIEWS IN SCHEMA RETAIL_RECS.PUBLIC TO DATABASE ROLE gds_db_role;
GRANT ALL PRIVILEGES ON ALL VIEWS IN SCHEMA RETAIL_RECS.PUBLIC TO DATABASE ROLE gds_db_role;

-- Compute and warehouse access
GRANT USAGE ON WAREHOUSE NEO4J_GRAPH_ANALYTICS_APP_WAREHOUSE TO APPLICATION neo4j_graph_analytics;

In [None]:
use role gds_user_role;


## Running Node Similarity
Next, we will run nodeSimilarity to see which products tend to be purchased together.

In [None]:
CALL Neo4j_Graph_Analytics.graph.node_similarity('CPU_X64_XS', {
  'defaultTablePrefix': 'RETAIL_RECS.PUBLIC',
  'project': {
    'nodeTables': ['PRODUCTS_NODES'],
    'relationshipTables': {
      'COPURCHASE_EDGES': {
        'sourceTable': 'PRODUCTS_NODES',
        'targetTable': 'PRODUCTS_NODES'
      }
    }
  },
  'compute': {
    'mutateProperty': 'score',
    'mutateRelationshipType': 'SIMILAR',
    'topK': 10,
    'similarityMetric': 'JACCARD'
  },
  'write': [{
    'outputTable': 'PRODUCT_SIMILARITY_JACCARD',
    'sourceLabel': 'PRODUCTS_NODES',
    'targetLabel': 'PRODUCTS_NODES',
    'relationshipType': 'SIMILAR',
    'relationshipProperty': 'score'
  }]
});
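
For intuition about what a Jaccard-based node similarity computes, the following is a toy Python sketch and not the library's implementation. It assumes similarity is measured between the sets of nodes that each source node points to, and it ignores relationship weights and any degree or score cutoffs. The function name and the sample edges are hypothetical.

```python
from collections import defaultdict


def node_similarity(edges, top_k=10):
    """Toy Jaccard node similarity over the target sets of source nodes."""
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)

    results = {}
    for a in neighbors:
        scored = []
        for b in neighbors:
            if a == b:
                continue
            shared = neighbors[a] & neighbors[b]
            if shared:
                union = neighbors[a] | neighbors[b]
                scored.append((b, len(shared) / len(union)))
        # Keep only the top_k most similar nodes, highest score first.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        results[a] = scored[:top_k]
    return results


# Hypothetical toy graph: A and B share two neighbors, B and C share one.
edges = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("B", "Z"), ("C", "Z")]
results = node_similarity(edges)
print(results)
```

Here `A` and `B` end up most similar to each other because two of the three distinct neighbors between them are shared, while `A` and `C` share nothing and produce no pair at all.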

Next, let's look at the least similar items in our table:

In [None]:
SELECT
  p1.product_name         AS source_product_name,
  p2.product_name         AS target_product_name,
  s.SCORE                 AS similarity_score
FROM PRODUCT_SIMILARITY_JACCARD AS s
JOIN PRODUCTS AS p1
  ON p1.product_id = s.SOURCENODEID
JOIN PRODUCTS AS p2
  ON p2.product_id = s.TARGETNODEID
ORDER BY s.SCORE ASC
LIMIT 5;

A sample basket at checkout: cauliflower, peanut butter cereal, and, of course, bananas.

In [None]:
WITH sample_basket AS (
  SELECT *
  FROM PRODUCTS
  WHERE product_id IN (
  34, -- peanut butter cereal
  13176, -- organic bananas
  5618 -- cauliflower
  )
),
similarities AS (
  SELECT
    s.SOURCENODEID AS product_id_source,
    p1.product_name AS product_source_name,
    s.TARGETNODEID AS product_id_target,
    p2.product_name AS product_target_name,
    s.SCORE AS similarity_score
  FROM PRODUCT_SIMILARITY_JACCARD s
  JOIN PRODUCTS p1 ON p1.product_id = s.SOURCENODEID
  JOIN PRODUCTS p2 ON p2.product_id = s.TARGETNODEID
),
ranked AS (
  SELECT
    sb.product_name AS basket_product,
    sim.product_target_name AS similar_product,
    sim.similarity_score,
    ROW_NUMBER() OVER (
      PARTITION BY sb.product_id
      ORDER BY sim.similarity_score DESC
    ) AS rank_num
  FROM sample_basket sb
  JOIN similarities sim
    ON sb.product_id = sim.product_id_source
)
SELECT
  basket_product,
  similar_product,
  similarity_score
FROM ranked
WHERE rank_num <= 2
ORDER BY similarity_score DESC;
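
The ranking step in the query above (`ROW_NUMBER()` partitioned by basket item, then keeping ranks 1 and 2) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the product ids, and the scores are all made up and are not results from the dataset.

```python
from collections import defaultdict


def top_n_per_basket_item(basket_ids, similarity_rows, n=2):
    """Keep the n highest-scoring similar products for each basket item."""
    by_source = defaultdict(list)
    for source, target, score in similarity_rows:
        if source in basket_ids:
            by_source[source].append((target, score))

    picks = []
    for source, candidates in by_source.items():
        # Equivalent of ORDER BY similarity_score DESC within one partition.
        candidates.sort(key=lambda c: c[1], reverse=True)
        for target, score in candidates[:n]:
            picks.append((source, target, score))

    # Final ordering across all basket items, highest score first.
    picks.sort(key=lambda p: p[2], reverse=True)
    return picks


# Hypothetical (source, target, score) rows; ids and scores are invented.
rows = [(1, 101, 0.30), (1, 102, 0.10), (1, 103, 0.20),
        (2, 201, 0.05), (3, 301, 0.40)]
picks = top_n_per_basket_item({1, 2}, rows)
print(picks)
```

Product 3 is not in the basket, so its row is ignored, and product 1 contributes only its two strongest matches.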