# Predict Visitor Purchases with a Classification Model in BQML

### Objectives

In this notebook, you will learn to perform the following tasks:

- Query and explore the ecommerce dataset

- Create a training and evaluation dataset to be used for batch prediction

- Create a classification (logistic regression) model in BQML

- Evaluate the performance of your machine learning model

- Predict and rank the probability that a visitor will make a purchase

In [1]:
# BigQuery Setup
# Importing Libraries and Credentials
import pandas as pd
import numpy as np
import seaborn as sns
from google.cloud import bigquery
from google.oauth2 import service_account
# ignore warnings
from warnings import filterwarnings
filterwarnings("ignore")


%load_ext google.cloud.bigquery

credentials = service_account.Credentials.from_service_account_file('/Users/ssamilozkan/Desktop/BigQuery/config.json')

project_id = 'dbt-bigquery-setup-369911'
client = bigquery.Client(credentials=credentials, project=project_id)
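
The `%%bigquery` cells that follow could also be run through the `client` object created above. A minimal sketch, assuming a hypothetical helper name `run_query`; it relies only on `Client.query()` and `QueryJob.to_dataframe()` from the `google-cloud-bigquery` library:

```python
def run_query(client, sql):
    """Run a SQL string with a BigQuery client and return a pandas DataFrame.

    `run_query` is a hypothetical helper name, not part of the original notebook.
    `client` is expected to be a google.cloud.bigquery.Client.
    """
    query_job = client.query(sql)      # submits the query and returns a QueryJob
    return query_job.to_dataframe()    # waits for completion and downloads the rows


# Example usage (requires valid credentials, so it is left commented out):
# df = run_query(client, "SELECT COUNT(DISTINCT fullVisitorId) AS total_visitors "
#                        "FROM `data-to-insights.ecommerce.web_analytics`")
```

Keeping the query in a plain function like this may be convenient when the same SQL has to run outside a notebook, where cell magics are not available.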

### Explore ecommerce data

Scenario: Your data analyst team exported the Google Analytics logs for an ecommerce website into BigQuery and created a new table of all the raw ecommerce visitor session data for you to explore. Using this data, you'll try to answer a few questions.

1. Out of all the visitors to the website, what percentage made a purchase?



In [3]:
%%bigquery
WITH visitors AS(
SELECT
COUNT(DISTINCT fullVisitorId) AS total_visitors
FROM `data-to-insights.ecommerce.web_analytics`
),
purchasers AS(
SELECT
COUNT(DISTINCT fullVisitorId) AS total_purchasers
FROM `data-to-insights.ecommerce.web_analytics`
WHERE totals.transactions IS NOT NULL
)
SELECT
  total_visitors,
  total_purchasers,
  total_purchasers / total_visitors AS conversion_rate
FROM visitors, purchasers

| | total_visitors | total_purchasers | conversion_rate |
|---|---|---|---|
| 0 | 741721 | 20015 | 0.026985 |
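
The `conversion_rate` column is a fraction. As a quick check, it can be reproduced and expressed as a percentage from the two counts in the result above. A minimal sketch (the variable names are illustrative):

```python
# Counts taken from the query result above
total_visitors = 741721
total_purchasers = 20015

# Same ratio the SQL computes: purchasers divided by all visitors
conversion_rate = total_purchasers / total_visitors

print(f"{conversion_rate:.6f}")   # fraction, as in the query output
print(f"{conversion_rate:.2%}")   # the same value as a percentage
```

In other words, roughly 2.7% of visitors made a purchase.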


2. What are the top 5 selling products?

In [4]:
%%bigquery
#standardSQL
SELECT
  p.v2ProductName,
  p.v2ProductCategory,
  SUM(p.productQuantity) AS units_sold,
  ROUND(SUM(p.localProductRevenue/1000000),2) AS revenue
FROM `data-to-insights.ecommerce.web_analytics`,
UNNEST(hits) AS h,
UNNEST(h.product) AS p
GROUP BY 1, 2
ORDER BY revenue DESC
LIMIT 5;

| | v2ProductName | v2ProductCategory | units_sold | revenue |
|---|---|---|---|---|
| 0 | Nest® Learning Thermostat 3rd Gen-USA - Stainl... | Nest-USA | 17651 | 870976.95 |
| 1 | Nest® Cam Outdoor Security Camera - USA | Nest-USA | 16930 | 684034.55 |
| 2 | Nest® Cam Indoor Security Camera - USA | Nest-USA | 14155 | 548104.47 |
| 3 | Nest® Protect Smoke + CO White Wired Alarm-USA | Nest-USA | 6394 | 178937.6 |
| 4 | Nest® Protect Smoke + CO White Battery Alarm-USA | Nest-USA | 6340 | 178572.4 |
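
The query divides `localProductRevenue` by 1,000,000 because the Google Analytics export stores revenue as the amount multiplied by 10^6. A minimal sketch of the same conversion; the helper name `micros_to_currency` and the sample input are hypothetical:

```python
def micros_to_currency(micros, digits=2):
    """Convert a revenue value stored in micro-units to currency units.

    Hypothetical helper mirroring ROUND(SUM(localProductRevenue / 1000000), 2).
    """
    return round(micros / 1_000_000, digits)


# Illustrative input only: 19,990,000 micro-units is 19.99 in local currency
print(micros_to_currency(19_990_000))
```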


3. How many visitors bought on subsequent visits to the website?

In [5]:
%%bigquery
# visitors who bought on a return visit (could have bought on the first visit as well)
WITH all_visitor_stats AS (
SELECT
  fullvisitorid, # 741,721 unique visitors
  IF(COUNTIF(totals.transactions > 0 AND totals.newVisits IS NULL) > 0, 1, 0) AS will_buy_on_return_visit
  FROM `data-to-insights.ecommerce.web_analytics`
  GROUP BY fullvisitorid
)
SELECT
  COUNT(DISTINCT fullvisitorid) AS total_visitors,
  will_buy_on_return_visit
FROM all_visitor_stats
GROUP BY will_buy_on_return_visit

| | total_visitors | will_buy_on_return_visit |
|---|---|---|
| 0 | 729848 | 0 |
| 1 | 11873 | 1 |
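
To make the `COUNTIF` logic concrete: a visitor is flagged when at least one of their sessions has a transaction and is not a first visit (`totals.newVisits IS NULL`). A minimal pure-Python sketch on made-up sessions; the sample rows and the function name `flag_return_buyers` are hypothetical:

```python
from collections import defaultdict


def flag_return_buyers(sessions):
    """Return {visitor_id: 1 or 0}, mirroring will_buy_on_return_visit.

    Each session is a (visitor_id, transactions, new_visits) tuple where
    transactions and new_visits may be None, like NULL in BigQuery.
    """
    flags = defaultdict(int)
    for visitor_id, transactions, new_visits in sessions:
        bought = transactions is not None and transactions > 0
        is_return_visit = new_visits is None
        # Make sure every visitor appears, then raise the flag if this
        # session is a purchase on a return visit.
        flags[visitor_id] = max(flags[visitor_id], int(bought and is_return_visit))
    return dict(flags)


# Made-up sessions for illustration only
sessions = [
    ("A", None, 1),     # first visit, no purchase
    ("A", 1, None),     # return visit with a purchase -> flagged
    ("B", 1, 1),        # purchase on the first visit only -> not flagged
    ("C", None, 1),     # never purchased -> not flagged
]
print(flag_return_buyers(sessions))
```

Visitor "B" shows why the flag differs from a plain "has purchased" indicator: a purchase made only on the first session does not count.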


In [8]:
# return purchasers as a share of all visitors (729,848 + 11,873 = 741,721)
round(11873 / (729848 + 11873) * 100, 2)

1.6

Analyzing the results, you can see that about 1.6% of total visitors (11,873 of 741,721) return and purchase from the website. This includes the subset of visitors who bought on their very first session and then came back and bought again.