## Content-based recommendation system

This is an example of a content-based recommendation system for a newspaper website. The idea is to recommend popular articles in the same category as the article the user is currently reading. What counts as a category, though, is an interesting question to explore.

Once we have a query that lists the top articles in each category, we could run it in BigQuery periodically (perhaps once an hour) and store the result in an in-memory hashmap in our web application. Serving a recommendation is then a simple lookup in that hashmap. One interesting consideration is whether to recommend articles the user has already read.
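
The caching idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RecommendationCache`, `fetch_top_articles`, `ttl_seconds`, and the `exclude_read` switch are hypothetical names, not part of this notebook or of any library, and the stub returns a few hard-coded ids where a real application would run the BigQuery query. The sketch refreshes lazily on lookup once the data is older than an hour, and optionally filters out articles the user has already read.

```python
import time
from typing import Callable, Dict, Iterable, List


class RecommendationCache:
    """In-memory map of category -> top article ids, reloaded when stale."""

    def __init__(self,
                 fetch: Callable[[], Dict[str, List[str]]],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch        # hypothetical: would wrap the BigQuery query
        self._ttl = ttl_seconds    # assumed refresh interval (once an hour)
        self._clock = clock
        self._data: Dict[str, List[str]] = {}
        self._loaded_at = None

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._data = self._fetch()
            self._loaded_at = now

    def recommend(self, category: str,
                  already_read: Iterable[str] = (),
                  exclude_read: bool = True) -> List[str]:
        """Return the cached top articles for a category, in ranked order."""
        self._refresh_if_stale()
        articles = self._data.get(category, [])
        if not exclude_read:
            return list(articles)
        seen = set(already_read)
        return [a for a in articles if a not in seen]


def fetch_top_articles() -> Dict[str, List[str]]:
    # Stand-in for the periodic query; a real version would query BigQuery.
    return {"Lifestyle": ["299826775", "299925700", "299935287"],
            "News": ["299410466", "299836255"]}


cache = RecommendationCache(fetch_top_articles)
print(cache.recommend("Lifestyle", already_read={"299925700"}))
# prints ['299826775', '299935287']
```

Refreshing on lookup keeps the sketch free of background threads; a scheduled job that swaps in a new dictionary would work just as well. Whether to keep `exclude_read` on is the product decision raised above.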


In [10]:
import google.datalab.bigquery as bq

# Rank articles by total engagement within each category
# and keep the top five per category.
query = """
#standardSQL
WITH
  content_engagement AS (
  SELECT
    STRUCT(contentId AS id,
      SUM(session_duration) AS engagement) AS content,
    MAX(Category) AS category
  FROM (
    SELECT
      fullVisitorID,
      (
      SELECT
        MAX(IF(index=10,
            value,
            NULL))
      FROM
        UNNEST(hits.customDimensions)) AS contentId,
      (
      SELECT
        MAX(IF(index=7,
            value,
            NULL))
      FROM
        UNNEST(hits.customDimensions)) AS category,
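      # hits.time is in milliseconds since the start of the visit; the gap to the
      # visitor's next page hit approximates the time spent on this page
      (LEAD(hits.time, 1) OVER (PARTITION BY fullVisitorId ORDER BY hits.time ASC) - hits.time) AS session_duration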
    FROM
      `cloud-training-demos.GA360_test.ga_sessions_sample`,
      UNNEST(hits) AS hits
    WHERE
      # only include hits on pages
      hits.type = "PAGE"
    GROUP BY
      fullVisitorId,
      contentId,
      category,
      hits.time
    HAVING
      category IS NOT NULL)
  GROUP BY
    contentId
  HAVING
    content.engagement IS NOT NULL ),
  content_for_category AS (
  SELECT
    category,
    ARRAY_AGG(content) AS article,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY content.engagement DESC) AS article_order
  FROM
    content_engagement
  GROUP BY
    category,
    content.engagement )
SELECT
  category,
  article.id,
  article.engagement
FROM
  content_for_category,
  UNNEST(article) AS article
WHERE
  article_order <= 5
"""
df = bq.Query(query).execute().result().to_dataframe()
df

| | category | id | engagement |
|---|---|---|---|
| 0 | Lifestyle | 299826775 | 1098966254 |
| 1 | Lifestyle | 299925700 | 740148877 |
| 2 | Lifestyle | 299935287 | 680781009 |
| 3 | Lifestyle | 299826767 | 618072785 |
| 4 | Lifestyle | 299907275 | 460224162 |
| 5 | News | 299410466 | 1859062884 |
| 6 | News | 299836255 | 1377806702 |
| 7 | News | 299816215 | 1193002783 |
| 8 | News | 299972800 | 811595627 |
| 9 | News | 299933565 | 748322723 |


Anyone currently on a Lifestyle page would be recommended these five articles:

In [13]:
df[df.category == 'Lifestyle'].id

0    299826775
1    299925700
2    299935287
3    299826767
4    299907275
Name: id, dtype: object
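
To serve these lookups from the in-memory hashmap described at the top, the query result has to be turned into a mapping from category to a ranked list of article ids. A minimal sketch follows, assuming the rows arrive as `(category, id, engagement)` tuples; `build_category_map` and `top_n` are hypothetical names introduced here, and the sample rows are copied from the output above.

```python
from collections import defaultdict


def build_category_map(rows, top_n=5):
    """Group (category, id, engagement) rows into category -> ranked ids."""
    by_category = defaultdict(list)
    for category, article_id, engagement in rows:
        by_category[category].append((engagement, article_id))
    return {
        category: [
            article_id
            for _, article_id in sorted(items, key=lambda item: item[0],
                                        reverse=True)[:top_n]
        ]
        for category, items in by_category.items()
    }


# A few rows taken from the query output shown earlier.
rows = [
    ("Lifestyle", "299826775", 1098966254),
    ("Lifestyle", "299925700", 740148877),
    ("News", "299410466", 1859062884),
    ("News", "299836255", 1377806702),
]

top_articles = build_category_map(rows)
print(top_articles["Lifestyle"])
# prints ['299826775', '299925700']
```

With the dataframe from this notebook, the rows could be produced with `df[['category', 'id', 'engagement']].itertuples(index=False, name=None)`.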

Copyright 2018 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.