### Table:
|visitor_id|page_name|visit_datetime|conversion_flag|
|:--------:|:--------:|:--------:|:--------:|
|123|A|11/1/2019 9:00:00|0|
|123|A|11/1/2019 9:20:00|1|
|123|B|11/1/2019 9:30:00|1|
|...|...|...|...|

### Questions:
* Find average conversion rate of visitors
* Find conversion rate by the first page of visit
* Find conversion rate by the last page of visit
* Find conversion rate by the number of pages a visitor goes to
* Find conversion rate by the page path users take

In [1]:
%load_ext sql

#### * Find average conversion rate of visitors
First, I determine whether each visitor *ultimately* converted by taking `MAX(conversion_flag)` per `visitor_id`. The conversion rate is the number of visitors whose flag ends up as `1` divided by the total number of visitors. The final `SELECT` needs no `GROUP BY` because the CTE already returns one row per `visitor_id`.

In [5]:
%%sql postgresql://postgres:postgrepassword@localhost/
WITH conversions AS (
  SELECT
    visitor_id,
    MAX(conversion_flag) AS converted
  FROM visitor_table
  GROUP BY visitor_id)
SELECT
  COUNT(*) AS "Total Visitor",
  SUM(CASE WHEN converted = '1' THEN 1 ELSE 0 END) AS "Total Converted",
  CONCAT(ROUND((SUM(CASE WHEN converted = '1' THEN 1 ELSE 0 END)/COUNT(*)::DECIMAL)*100,2),'%') AS "Conversion Rate",
  SUM(CASE WHEN converted = '0' THEN 1 ELSE 0 END) AS "Total NOT Converted",
  CONCAT(ROUND((SUM(CASE WHEN converted = '0' THEN 1 ELSE 0 END)/COUNT(*)::DECIMAL)*100,2),'%') AS "Not Converted Rate"
FROM conversions;

1 rows affected.


Total Visitor,Total Converted,Conversion Rate,Total NOT Converted,Not Converted Rate
889,391,43.98%,498,56.02%
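
As a sanity check on the logic, the same two-step calculation can be reproduced on a few made-up rows. This is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for PostgreSQL; the sample rows, the ISO-formatted timestamps, and the integer `conversion_flag` column are assumptions for illustration, not the real data.

```python
import sqlite3

# Hypothetical sample rows (not the real dataset): visitor 123 converts, 456 does not.
rows = [
    (123, "A", "2019-11-01 09:00:00", 0),
    (123, "A", "2019-11-01 09:20:00", 1),
    (123, "B", "2019-11-01 09:30:00", 1),
    (456, "B", "2019-11-01 10:00:00", 0),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE visitor_table ("
    "visitor_id INTEGER, page_name TEXT, visit_datetime TEXT, conversion_flag INTEGER)"
)
con.executemany("INSERT INTO visitor_table VALUES (?, ?, ?, ?)", rows)

# Step 1: one row per visitor. Step 2: a single aggregate over those rows.
total, converted, rate = con.execute(
    """
    WITH conversions AS (
      SELECT visitor_id, MAX(conversion_flag) AS converted
      FROM visitor_table
      GROUP BY visitor_id)
    SELECT
      COUNT(*),
      SUM(converted),
      ROUND(SUM(converted) * 100.0 / COUNT(*), 2)
    FROM conversions
    """
).fetchone()

print(total, converted, rate)
```

With one converting visitor out of two, the rate comes out at 50%, which matches what the CTE-then-aggregate approach should give.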


#### * Find conversion rate by the first page of visit
First, I get each visitor's earliest `visit_datetime` and whether the visitor *ultimately* converted. A second CTE joins this back to the original table on `visitor_id` and the exact `visit_datetime` to pick up the first page. Grouping by that page, I calculate the conversion rate the same way as in the first question.

In [None]:
%%sql postgresql://postgres:postgrepassword@localhost/
WITH conversions AS (
  SELECT
    visitor_id,
    MIN(visit_datetime) AS first_visit_datetime,
    MAX(conversion_flag) AS converted
  FROM visitor_table
  GROUP BY visitor_id
),
first_page AS (
  SELECT
    vt.visitor_id,
    vt.page_name AS first_page
  FROM visitor_table AS vt
  JOIN conversions AS c ON vt.visitor_id = c.visitor_id AND vt.visit_datetime = c.first_visit_datetime
)
SELECT
  fp.first_page,
  COUNT(*) AS "Total Visitor as First Page",
  SUM(CASE WHEN c.converted = '1' THEN 1 ELSE 0 END) AS "Total Converted",
  CONCAT(ROUND((SUM(CASE WHEN c.converted = '1' THEN 1 ELSE 0 END)/COUNT(*)::DECIMAL)*100,2),'%') AS "Conversion Rate",
  SUM(CASE WHEN c.converted = '0' THEN 1 ELSE 0 END) AS "Total NOT Converted",
  CONCAT(ROUND((SUM(CASE WHEN c.converted = '0' THEN 1 ELSE 0 END)/COUNT(*)::DECIMAL)*100,2),'%') AS "Not Converted Rate"
FROM first_page AS fp
JOIN conversions AS c ON fp.visitor_id = c.visitor_id
GROUP BY fp.first_page
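
To illustrate how joining on `MIN(visit_datetime)` picks out the first page, here is a small sketch on invented rows. It runs on Python's built-in `sqlite3` rather than PostgreSQL, and the rows, ISO-formatted timestamps, and integer flag are assumptions made for the example.

```python
import sqlite3

# Hypothetical rows: 123 and 789 start on page A, 456 starts on page B.
rows = [
    (123, "A", "2019-11-01 09:00:00", 0),
    (123, "A", "2019-11-01 09:20:00", 1),
    (123, "B", "2019-11-01 09:30:00", 1),
    (456, "B", "2019-11-01 10:00:00", 0),
    (789, "A", "2019-11-01 11:00:00", 0),
    (789, "B", "2019-11-01 11:05:00", 0),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE visitor_table ("
    "visitor_id INTEGER, page_name TEXT, visit_datetime TEXT, conversion_flag INTEGER)"
)
con.executemany("INSERT INTO visitor_table VALUES (?, ?, ?, ?)", rows)

# Join each visitor's earliest timestamp back to the table to recover the first page.
by_first_page = con.execute(
    """
    WITH conversions AS (
      SELECT
        visitor_id,
        MIN(visit_datetime) AS first_visit_datetime,
        MAX(conversion_flag) AS converted
      FROM visitor_table
      GROUP BY visitor_id)
    SELECT
      vt.page_name,
      COUNT(*),
      SUM(c.converted),
      ROUND(SUM(c.converted) * 100.0 / COUNT(*), 2)
    FROM visitor_table AS vt
    JOIN conversions AS c
      ON vt.visitor_id = c.visitor_id
     AND vt.visit_datetime = c.first_visit_datetime
    GROUP BY vt.page_name
    ORDER BY vt.page_name
    """
).fetchall()

print(by_first_page)
```

Note that this join assumes each visitor has a single row at their earliest timestamp; two rows sharing that timestamp would count the visitor twice.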

#### * Find conversion rate by the last page of visit
Using a similar method to the previous question, I take the `MAX()` of `visit_datetime`, but only over rows where `conversion_flag` is still `0`, and join back to the table to get the page visited at that time. This way I do not capture any navigation *after* `conversion_flag` turns to `1`. The final query starts from the `last_page` CTE, which covers every `visitor_id` only under the assumption that **no visitor** has `conversion_flag` set to `1` from their very first row; such a visitor has no flag-`0` row and is dropped by the join. It is then joined to `conversions` to find out whether each `visitor_id` converted.

In [None]:
%%sql postgresql://postgres:postgrepassword@localhost/
WITH conversions AS (
  SELECT
    visitor_id,
    MAX(CASE WHEN conversion_flag = '0' THEN visit_datetime ELSE NULL END) AS last_visit_datetime,
    MAX(conversion_flag) AS converted
  FROM visitor_table
  GROUP BY visitor_id
),
last_page AS (
  SELECT
    vt.visitor_id,
    vt.page_name AS last_page
  FROM visitor_table AS vt
  JOIN conversions AS c ON vt.visitor_id = c.visitor_id AND vt.visit_datetime = c.last_visit_datetime
)
SELECT
  lp.last_page,
  COUNT(*) AS "Total Visitor as Last Page",
  SUM(CASE WHEN c.converted = '1' THEN 1 ELSE 0 END) AS "Total Converted",
  CONCAT(ROUND((SUM(CASE WHEN c.converted = '1' THEN 1 ELSE 0 END)/COUNT(*)::DECIMAL)*100,2),'%') AS "Conversion Rate",
  SUM(CASE WHEN c.converted = '0' THEN 1 ELSE 0 END) AS "Total NOT Converted",
  CONCAT(ROUND((SUM(CASE WHEN c.converted = '0' THEN 1 ELSE 0 END)/COUNT(*)::DECIMAL)*100,2),'%') AS "Not Converted Rate"
FROM last_page AS lp
JOIN conversions AS c ON lp.visitor_id = c.visitor_id
GROUP BY lp.last_page
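
The assumption about visitors whose flag is `1` from the start can be checked on a toy dataset. The sketch below uses Python's built-in `sqlite3` as a stand-in engine; the rows (including visitor 789, whose only row already has the flag set) are invented for illustration.

```python
import sqlite3

# Hypothetical rows: 789 has no row with conversion_flag = 0.
rows = [
    (123, "A", "2019-11-01 09:00:00", 0),
    (123, "A", "2019-11-01 09:20:00", 1),
    (123, "B", "2019-11-01 09:30:00", 1),
    (456, "B", "2019-11-01 10:00:00", 0),
    (456, "A", "2019-11-01 10:10:00", 0),
    (789, "B", "2019-11-01 11:00:00", 1),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE visitor_table ("
    "visitor_id INTEGER, page_name TEXT, visit_datetime TEXT, conversion_flag INTEGER)"
)
con.executemany("INSERT INTO visitor_table VALUES (?, ?, ?, ?)", rows)

# Last page seen while the flag was still 0; NULL when no such row exists.
by_last_page = con.execute(
    """
    WITH conversions AS (
      SELECT
        visitor_id,
        MAX(CASE WHEN conversion_flag = 0 THEN visit_datetime END) AS last_visit_datetime,
        MAX(conversion_flag) AS converted
      FROM visitor_table
      GROUP BY visitor_id)
    SELECT
      vt.page_name,
      COUNT(*),
      SUM(c.converted),
      ROUND(SUM(c.converted) * 100.0 / COUNT(*), 2)
    FROM visitor_table AS vt
    JOIN conversions AS c
      ON vt.visitor_id = c.visitor_id
     AND vt.visit_datetime = c.last_visit_datetime
    GROUP BY vt.page_name
    ORDER BY vt.page_name
    """
).fetchall()

all_visitors = con.execute(
    "SELECT COUNT(DISTINCT visitor_id) FROM visitor_table"
).fetchone()[0]
counted = sum(row[1] for row in by_last_page)

print(by_last_page, counted, all_visitors)
```

Here only two of the three visitors appear in the result, because a `NULL` timestamp never matches in the join. Comparing the two totals on the real table is a quick way to see whether the assumption holds.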


#### * Find conversion rate by the number of pages a visitor goes to
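As in the previous question, I only look at rows where `conversion_flag` is still `0`. One CTE takes the `COUNT()` of page views per visitor before conversion (page views, not distinct pages; the commented-out `COUNT(DISTINCT page_name)` gives the latter), and a second CTE captures whether each visitor ultimately converted. Joining the two and grouping by the page count gives the conversion rate for each count.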

In [None]:
%%sql postgresql://postgres:postgrepassword@localhost/
WITH
	count_page_visited AS (
		SELECT
			COUNT(page_name) AS total_pages_visited,
-- 			COUNT(DISTINCT page_name) AS total_unique_pages_visited,
			visitor_id
		FROM
			visitor_table
		WHERE
			conversion_flag = '0'
		GROUP BY
			visitor_id),
	conversion AS (
		SELECT
			visitor_id,
			MAX(conversion_flag) AS converted
		FROM
			visitor_table
		GROUP BY
			visitor_id)
SELECT
	cpv.total_pages_visited,
-- 	cpv.total_unique_pages_visited,
	CONCAT(ROUND((SUM(CASE WHEN c.converted = '1' THEN 1 ELSE 0 END)/COUNT(*)::DECIMAL)*100,2),'%') AS Conversion_Rate
FROM
	count_page_visited AS cpv
LEFT JOIN
	conversion AS c ON cpv.visitor_id = c.visitor_id
GROUP BY
	cpv.total_pages_visited
--	cpv.total_unique_pages_visited
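
One detail worth checking in rate expressions like this is where the cast sits. In PostgreSQL, `/` truncates when both operands are integers, and `::` binds more tightly than `*` and `/`, so in an expression of the form `a / b * 100::DECIMAL` only the literal `100` is cast. SQLite's integer division truncates in the same way, so the effect can be sketched with Python's built-in `sqlite3` (SQLite has no `::` operator, hence `CAST`). The rows below are invented for illustration.

```python
import sqlite3

# Hypothetical rows: 123 and 456 have one pre-conversion page view, 789 has two.
rows = [
    (123, "A", "2019-11-01 09:00:00", 0),
    (123, "A", "2019-11-01 09:20:00", 1),
    (123, "B", "2019-11-01 09:30:00", 1),
    (456, "B", "2019-11-01 10:00:00", 0),
    (789, "A", "2019-11-01 11:00:00", 0),
    (789, "B", "2019-11-01 11:05:00", 0),
    (789, "C", "2019-11-01 11:10:00", 1),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE visitor_table ("
    "visitor_id INTEGER, page_name TEXT, visit_datetime TEXT, conversion_flag INTEGER)"
)
con.executemany("INSERT INTO visitor_table VALUES (?, ?, ?, ?)", rows)

# truncated_rate divides two integers first; rate casts the numerator before dividing.
by_page_count = con.execute(
    """
    WITH count_page_visited AS (
      SELECT visitor_id, COUNT(page_name) AS total_pages_visited
      FROM visitor_table
      WHERE conversion_flag = 0
      GROUP BY visitor_id),
    conversion AS (
      SELECT visitor_id, MAX(conversion_flag) AS converted
      FROM visitor_table
      GROUP BY visitor_id)
    SELECT
      cpv.total_pages_visited,
      SUM(c.converted) / COUNT(*) * 100 AS truncated_rate,
      ROUND(CAST(SUM(c.converted) AS REAL) / COUNT(*) * 100, 2) AS rate
    FROM count_page_visited AS cpv
    LEFT JOIN conversion AS c ON cpv.visitor_id = c.visitor_id
    GROUP BY cpv.total_pages_visited
    ORDER BY cpv.total_pages_visited
    """
).fetchall()

print(by_page_count)
```

For the one-page bucket, where one of two visitors converts, the integer version collapses to 0 while the cast version gives 50.0. A rate column that only ever shows 0% or 100% is a sign of this truncation.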


#### * Find conversion rate by the page path users take
This one is a little tricky because `ARRAY_TO_STRING()` and `ARRAY_AGG()` may not be available on other engines; I used PostgreSQL. Again, I capture the path only while `conversion_flag` is still `0`.

In [None]:
%%sql postgresql://postgres:postgrepassword@localhost/
WITH
	path AS (
		SELECT
			ARRAY_TO_STRING(ARRAY_AGG(page_name ORDER BY visit_datetime ASC), ' -> ') AS nav_path,
			visitor_id
		FROM
			visitor_table
		WHERE
			conversion_flag = '0'
		GROUP BY
			visitor_id
	),
	conversion AS (
		SELECT
			visitor_id,
			MAX(conversion_flag) AS converted
		FROM
			visitor_table
		GROUP BY
			visitor_id)
SELECT
	p.nav_path,
	CONCAT(ROUND((SUM(CASE WHEN c.converted = '1' THEN 1 ELSE 0 END)/COUNT(*)::DECIMAL)*100,2),'%') AS Conversion_Rate
FROM
	path AS p
LEFT JOIN
	conversion AS c ON p.visitor_id = c.visitor_id
GROUP BY
	p.nav_path
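
If the ordered array aggregate is not available on another engine, one possible fallback is to let SQL do only the filtering and ordering and build the path string in client code. The sketch below is an assumption-laden illustration of that idea, not a replacement for the query above: it uses Python's built-in `sqlite3`, invented rows, and ISO-formatted timestamps so that text ordering matches time ordering.

```python
import sqlite3
from collections import Counter
from itertools import groupby

# Hypothetical rows: 123 and 456 share the path "A -> B"; only 123 converts.
rows = [
    (123, "A", "2019-11-01 09:00:00", 0),
    (123, "B", "2019-11-01 09:10:00", 0),
    (123, "C", "2019-11-01 09:20:00", 1),
    (456, "A", "2019-11-01 10:00:00", 0),
    (456, "B", "2019-11-01 10:05:00", 0),
    (789, "B", "2019-11-01 11:00:00", 0),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE visitor_table ("
    "visitor_id INTEGER, page_name TEXT, visit_datetime TEXT, conversion_flag INTEGER)"
)
con.executemany("INSERT INTO visitor_table VALUES (?, ?, ?, ?)", rows)

# Pre-conversion page views, ordered so each visitor's rows are contiguous and in visit order.
ordered = con.execute(
    "SELECT visitor_id, page_name FROM visitor_table "
    "WHERE conversion_flag = 0 ORDER BY visitor_id, visit_datetime"
).fetchall()

# Whether each visitor ultimately converted.
converted = dict(
    con.execute(
        "SELECT visitor_id, MAX(conversion_flag) FROM visitor_table GROUP BY visitor_id"
    ).fetchall()
)

# Build one path string per visitor, mirroring the ' -> ' separator used above.
paths = {
    vid: " -> ".join(page for _, page in group)
    for vid, group in groupby(ordered, key=lambda r: r[0])
}

visitors_per_path = Counter(paths.values())
conversions_per_path = Counter(p for vid, p in paths.items() if converted[vid] == 1)
rates = {
    p: round(conversions_per_path[p] * 100 / n, 2)
    for p, n in visitors_per_path.items()
}

print(rates)
```

On these rows the path `A -> B` has two visitors and one conversion, and the path `B` has one visitor and none.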