# Funnel Queries (funnel models)

In [None]:
import os
from pathlib import Path

from dotenv import find_dotenv, load_dotenv
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

In [None]:
PROJ_ROOT = Path().resolve().parents[3]
env_file_dir = PROJ_ROOT / '.env'
_ = load_dotenv(env_file_dir, verbose=True)

## About

### Objective

Let's say that the *Director of Product* at Greenery comes to us (the head Analytics Engineer) and asks some questions:

1. How are our users moving through the product funnel?
2. Which steps in the funnel have the largest drop-off points?

### Background

For our dataset, the product funnel is defined with three levels:

1. Sessions with any event of type `page_view`
2. Sessions with any event of type `add_to_cart`
3. Sessions with any event of type `checkout`
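The three funnel levels above amount to counting distinct sessions per event type. A minimal Python sketch, assuming a hypothetical list of `(session_id, event_type)` tuples rather than the project's actual `dbt` models: a session counts toward a level if it has *any* event of that type, so repeated events within one session are counted once.

```python
from collections import defaultdict

# Funnel levels in order, matching the event types listed above.
FUNNEL_STEPS = ("page_view", "add_to_cart", "checkout")


def funnel_counts(events):
    """Count distinct sessions with at least one event of each funnel type.

    `events` is a hypothetical iterable of (session_id, event_type) tuples.
    """
    sessions_by_step = defaultdict(set)
    for session_id, event_type in events:
        if event_type in FUNNEL_STEPS:
            sessions_by_step[event_type].add(session_id)
    # Steps with no matching events get an empty set, i.e. a count of 0.
    return {step: len(sessions_by_step[step]) for step in FUNNEL_STEPS}


# Toy data for illustration only (not from the Greenery dataset).
events = [
    ("s1", "page_view"), ("s1", "page_view"), ("s1", "add_to_cart"), ("s1", "checkout"),
    ("s2", "page_view"), ("s2", "add_to_cart"),
    ("s3", "page_view"),
]
print(funnel_counts(events))
# {'page_view': 3, 'add_to_cart': 2, 'checkout': 1}
```

Session `s1` has two `page_view` events but contributes only once to the first level.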

The Director of Product needs to understand how the product funnel is performing in order to set the roadmap for the next quarter. The Product and Engineering teams are asking what their projects will be, and they want to make data-informed decisions.

Thankfully, we can help using our data, and modeling it with `dbt`!

### Constraints

In addition to answering these questions right now, we want to be able to answer them at any time. The *Product and Engineering* teams will want to track how they are improving these metrics on an ongoing basis. As such, we need to think about how we can model the data in a way that allows us to set up reporting for the long-term tracking of our goals.

### Questions

1. Please create any additional `dbt` models needed to help answer these questions from our product team, and put your answers in a `README` in your repo.

### Notes

1. This notebook supports <kbd>Run</kbd> > <kbd>Run All Cells</kbd>.

## User Inputs

In [None]:
min_date = '2021-02-09'
max_date = '2021-02-12'

In [None]:
engine = create_engine(
    URL(
        account=os.getenv("UPLIMIT_SNOWFLAKE_ACCOUNT"),
        user=os.getenv("UPLIMIT_SNOWFLAKE_USER"),
        password=os.getenv("UPLIMIT_SNOWFLAKE_PASS"),
        warehouse=os.getenv("UPLIMIT_SNOWFLAKE_WAREHOUSE"),
        role=os.getenv("UPLIMIT_SNOWFLAKE_ROLE"),
        database=os.getenv("UPLIMIT_SNOWFLAKE_DB_NAME"),
        schema=os.getenv("UPLIMIT_SNOWFLAKE_SCHEMA"),
    )
)

## Connect

Load the JupySQL extension

In [None]:
%load_ext sql

Set the maximum number of rows to be displayed to `None` (shows all rows)

In [None]:
%config SqlMagic.displaylimit = None

Connect to the Snowflake database using the SQLAlchemy engine created above

In [None]:
%sql engine --alias connection

## Queries

### Question 1

**How are our users moving through the product funnel?**

In [None]:
%%sql
WITH daily_user_sessions_named AS (
    SELECT *
    FROM fct_sessions_daily
),
daily_user_sessions_filtered AS (
    SELECT *
    FROM daily_user_sessions_named
    WHERE created_at_date BETWEEN '{{ min_date }}' AND '{{ max_date }}'
),
/* ============= BI TOOL STARTS HERE ============= */
bounce_sessions AS (
    SELECT session_id,
           SUM(num_page_views) AS total_num_page_views
    FROM daily_user_sessions_filtered
    GROUP BY ALL
    HAVING total_num_page_views = 1
),
totals AS (
    SELECT ZEROIFNULL(page_views) AS page_views,
           ZEROIFNULL(add_to_carts) AS add_to_carts,
           ZEROIFNULL(checkouts) AS checkouts,
           ZEROIFNULL(num_bounce_sessions) AS num_bounce_sessions
    FROM (
        SELECT COUNT(DISTINCT(session_id)) AS page_views,
               1 AS row_num
        FROM daily_user_sessions_filtered
        WHERE num_page_views > 0
    ) t1
    LEFT JOIN (
        SELECT COUNT(DISTINCT(session_id)) AS add_to_carts,
               1 AS row_num
        FROM daily_user_sessions_filtered
        WHERE num_add_to_carts > 0
    ) t2 USING (row_num)
    LEFT JOIN (
        SELECT COUNT(DISTINCT(session_id)) AS checkouts,
               1 AS row_num
        FROM daily_user_sessions_filtered
        WHERE num_checkouts > 0
    ) t3 USING (row_num)
    LEFT JOIN (
        SELECT COUNT(DISTINCT(session_id)) AS num_bounce_sessions,
               1 AS row_num
        FROM bounce_sessions
    ) t4 USING (row_num)
),
metrics AS (
    SELECT * EXCLUDE(num_bounce_sessions),
           100*(num_bounce_sessions/page_views) AS bounce_rate,
           100*(add_to_carts/page_views) AS add_to_cart_rate,
           100*(1-checkouts/add_to_carts) AS cart_abandonment_rate,
           100*(checkouts/add_to_carts) AS add_to_cart_conversion_rate,
           100*(checkouts/page_views) AS conversion_rate
    FROM totals
)
SELECT *
FROM metrics
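The `bounce_sessions` CTE sums `num_page_views` per `session_id` before testing for exactly one page view, so if a session can appear on more than one row of `fct_sessions_daily` (an assumption about that model's grain), it is judged on its total. A minimal Python sketch of the same bounce-rate rule, using a hypothetical `bounce_rate` helper and hypothetical `(session_id, num_page_views)` rows:

```python
from collections import Counter


def bounce_rate(rows):
    """Percentage of page-view sessions with exactly one page view in total.

    `rows` is a hypothetical iterable of (session_id, num_page_views) pairs,
    possibly with several rows per session (e.g. one per day).
    """
    total_page_views = Counter()
    for session_id, num_page_views in rows:
        total_page_views[session_id] += num_page_views

    # Denominator mirrors `page_views`: sessions with at least one page view.
    viewed = [s for s, n in total_page_views.items() if n > 0]
    # Numerator mirrors `bounce_sessions`: sessions whose summed page views equal 1.
    bounces = [s for s, n in total_page_views.items() if n == 1]
    if not viewed:
        return 0.0  # sketch-only guard for an empty date range
    return 100 * len(bounces) / len(viewed)


rows = [("s1", 1), ("s2", 3), ("s3", 1), ("s3", 1), ("s4", 0)]
print(round(bounce_rate(rows), 1))  # 33.3
```

Here `s3` has one page view on each of two rows, so it is not a bounce. Returning `0.0` for an empty input is a choice made in this sketch; the SQL above would instead raise a division-by-zero error in Snowflake if no sessions fall in the date range.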

**Observations**

1. For all available sessions between 2021-02-09 and 2021-02-12, there were
   - 578 sessions with a product page view
   - 467 sessions with at least one product added to a shopping cart
   - 361 sessions with a checkout (purchase)
2. The [bounce rate](https://amplitude.com/blog/bounce-rate-calculate-and-average#what-is-bounce-rate) is ~18%. This compares favorably to the [industry standard](https://amplitude.com/blog/bounce-rate-calculate-and-average#what-is-a-good-average-bounce-rate) for e-commerce (20%-45%).
3. The [add-to-cart rate](https://blendcommerce.com/blogs/shopify/add-to-cart-rate) is ~81%, far above the [industry standard](https://dashthis.com/kpi-examples/add-to-cart-rate/) of 10%-20%. This [suggests](https://blendcommerce.com/blogs/shopify/add-to-cart-rate) strong appeal for Greenery's products, high usability of the Greenery website, and a positive impact from Greenery's marketing efforts.
4. The [cart abandonment rate](https://www.geckoboard.com/best-practice/kpi-examples/shopping-cart-abandonment-rate/) is ~23%, well below the industry average of ~70%-75%. This suggests that the Greenery website is highly effective at turning shopping carts into sales revenue.
5. The [add-to-cart conversion rate](https://www.tidio.com/blog/add-to-cart-conversion-rate-statistics/) is ~77%, well above the industry standard of ~10%. This suggests favorable website design, product choice, and customer service for the Greenery platform.
6. The [conversion rate (using sessions)](https://www.shopify.com/ca/blog/ecommerce-conversion-rate#2) is ~62%, roughly 20 times the [industry standard](https://www.toptal.com/external-blogs/growth-collective/ecommerce-conversion-rates) of ~3%. From 3., ~81% of sessions include at least one Greenery product being added to a shopping cart; relative to this, it is encouraging that ~62% of sessions end in a conversion.
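The rates in items 3 to 6 follow directly from the session counts in item 1, using the same formulas as the `metrics` CTE. A quick arithmetic check with a hypothetical `funnel_metrics` helper (the bounce rate is left out because the number of bounce sessions is not listed above):

```python
def funnel_metrics(page_views, add_to_carts, checkouts):
    """Session-based funnel rates (in %), mirroring the `metrics` CTE formulas."""
    return {
        "add_to_cart_rate": 100 * add_to_carts / page_views,
        "cart_abandonment_rate": 100 * (1 - checkouts / add_to_carts),
        "add_to_cart_conversion_rate": 100 * checkouts / add_to_carts,
        "conversion_rate": 100 * checkouts / page_views,
    }


# Session counts from item 1 above.
for name, value in funnel_metrics(578, 467, 361).items():
    print(f"{name}: {value:.0f}%")
# add_to_cart_rate: 81%
# cart_abandonment_rate: 23%
# add_to_cart_conversion_rate: 77%
# conversion_rate: 62%
```

By construction, `cart_abandonment_rate` and `add_to_cart_conversion_rate` always sum to 100%, so they describe the same add-to-cart-to-checkout step from two sides.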

In conclusion, the Greenery platform matches or beats the industry standard on all five of the above metrics. This suggests that, overall, **customers are moving through the product funnel with a high degree of efficiency**.

### Question 2

**Which steps in the funnel have the largest drop-off points?**

In [None]:
%%sql
WITH daily_user_sessions_named AS (
    SELECT *
    FROM fct_sessions_daily
),
daily_user_sessions_filtered AS (
    SELECT *
    FROM daily_user_sessions_named
    WHERE created_at_date BETWEEN '{{ min_date }}' AND '{{ max_date }}'
),
/* ============= BI TOOL STARTS HERE ============= */
overall AS (
    SELECT ZEROIFNULL(page_views) AS page_views,
           ZEROIFNULL(add_to_carts) AS add_to_carts,
           ZEROIFNULL(checkouts) AS checkouts
    FROM (
        SELECT COUNT(DISTINCT(session_id)) AS page_views,
               1 AS row_num
        FROM daily_user_sessions_filtered
        WHERE num_page_views > 0
    ) t1
    LEFT JOIN (
        SELECT COUNT(DISTINCT(session_id)) AS add_to_carts,
               1 AS row_num
        FROM daily_user_sessions_filtered
        WHERE num_add_to_carts > 0
    ) t2 USING (row_num)
    LEFT JOIN (
        SELECT COUNT(DISTINCT(session_id)) AS checkouts,
               1 AS row_num
        FROM daily_user_sessions_filtered
        WHERE num_checkouts > 0
    ) t3 USING (row_num)
),
overall_tidy_dropoffs AS (
    SELECT LOWER(metric) AS metric,
           total,
           dropoff
    FROM (
        SELECT *,
               LAG(total, 1) OVER(ORDER BY total DESC) AS total_previous,
               100*(total_previous-total)/total_previous AS dropoff
        FROM overall
        UNPIVOT (total FOR metric IN (page_views, add_to_carts, checkouts))
    )
)
SELECT *
FROM overall_tidy_dropoffs
ORDER BY total DESC

**Observations**

1. The funnel step with the largest drop-off is *checkout* (relative to add-to-cart) at ~23%, slightly higher than the drop-off at add-to-cart (relative to page views) at ~19%. This small difference suggests there is [probably no issue between](https://segment.com/blog/building-ultimate-funnel-sql/) adding Greenery products to a shopping cart and checking out the cart.
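The `dropoff` column is each step's percentage loss relative to the step before it, which `overall_tidy_dropoffs` finds with `LAG` after ordering the steps by `total`. A minimal Python sketch of the same calculation, applied to the session counts from Question 1. Unlike the SQL, this hypothetical `step_dropoffs` helper takes the step order from the dict's insertion order, so it does not depend on the counts being strictly decreasing:

```python
def step_dropoffs(counts):
    """Percentage drop-off of each funnel step relative to the previous step.

    `counts` maps step name -> distinct sessions, in funnel order.
    The first step has no previous step, so it gets None (NULL from LAG in SQL).
    """
    steps = list(counts.items())
    result = {steps[0][0]: None}
    for (_, previous), (step, total) in zip(steps, steps[1:]):
        # Guard against an empty previous step instead of dividing by zero.
        result[step] = 100 * (previous - total) / previous if previous else None
    return result


dropoffs = step_dropoffs({"page_views": 578, "add_to_carts": 467, "checkouts": 361})
for step, value in dropoffs.items():
    print(step, "n/a" if value is None else f"{value:.0f}%")
# page_views n/a
# add_to_carts 19%
# checkouts 23%
```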

## Disconnect

Close connection

In [None]:
%sql --close connection