# Retention curve

Source: https://towardsdatascience.com/twenty-five-sql-practice-exercises-5fc791e24082

From the following user activity table, write a query to return the fraction of users who are retained (show some activity) a given number of days after joining. By convention, users are considered active on their join day (day 0).
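
To make the definition concrete for a single user, here is a minimal Python sketch. The helper name `active_day_numbers` and the dates are hypothetical and not part of the original exercise. It converts activity dates into day offsets from the join date and always includes day 0, following the convention above.

```python
from datetime import date

def active_day_numbers(join_date, activity_dates):
    """Return the sorted day offsets (days since joining) on which a user was active."""
    # Ignore any activity recorded before the join date
    offsets = {(d - join_date).days for d in activity_dates if d >= join_date}
    offsets.add(0)  # by convention, the join day itself counts as activity
    return sorted(offsets)

# Hypothetical user: joined on 1 Jan 2020, returned on 2 Jan and 4 Jan
print(active_day_numbers(date(2020, 1, 1), [date(2020, 1, 2), date(2020, 1, 4)]))  # [0, 1, 3]
```

This user counts as retained on days 0, 1 and 3, but not on day 2.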

In [1]:
%run Question_30_RetentionCurve.ipynb

 * postgresql://fknight:***@localhost/postgres
Done.
Done.
7 rows affected.
7 rows affected.


## The solution is given below

In [4]:
%%sql

-- get join dates for each user

WITH join_dates AS (
SELECT user_id, action_date AS join_date FROM users
WHERE action = 'Join' ),


-- create vector containing all dates in date range

date_vector AS (
SELECT cast(generate_series(min(action_date), max(action_date),
'1 day'::interval) AS date) AS dates FROM users ),

-- cross join to get all possible user-date combinations

all_users_dates AS (
SELECT DISTINCT user_id, d.dates FROM users
CROSS JOIN date_vector d ),

-- left join users table onto all user-date combinations on matching
-- user ID and date (null on days where the user didn't engage), join
-- onto this each user's signup date, exclude user-date combinations
-- falling before the user's signup date


t1 AS (
SELECT a.dates - c.join_date AS day_no, b.user_id FROM all_users_dates a
LEFT JOIN users b
ON a.user_id = b.user_id
AND a.dates = b.action_date
JOIN join_dates c
ON a.user_id = c.user_id
WHERE a.dates - c.join_date >= 0 )

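-- group by days since signup: count(*) is the number of users observable
-- on that day (assumes at most one row per user per date in users),
-- count of non-null user IDs is the number of active users, and their
-- ratio is the retention rate; order by day so the curve reads in sequence

SELECT day_no, count(*) AS n_total,
count(DISTINCT user_id) AS n_active,
round(1.0*count(DISTINCT user_id)/count(*), 2) AS retention
FROM t1 GROUP BY 1
ORDER BY 1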

 * postgresql://fknight:***@localhost/postgres
4 rows affected.


| day_no | n_total | n_active | retention |
|---|---|---|---|
| 0 | 3 | 3 | 1.0 |
| 1 | 3 | 2 | 0.67 |
| 2 | 3 | 1 | 0.33 |
| 3 | 1 | 1 | 1.0 |
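
As a cross-check of the query's logic, the same calculation can be sketched in plain Python. The `rows` below are hypothetical: this section does not show the contents of `users`, so the seven rows are illustrative values chosen to be consistent with the output above. The function name `retention_curve` is also an assumption. The sketch shows why `n_total` drops on day 3: a user enters the denominator for a given day only if that day falls within the observed date range.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical (user_id, action_date, action) rows; not the notebook's actual data
rows = [
    (1, date(2020, 1, 1), "Join"),
    (1, date(2020, 1, 2), "Access"),
    (2, date(2020, 1, 2), "Join"),
    (3, date(2020, 1, 2), "Join"),
    (1, date(2020, 1, 3), "Access"),
    (3, date(2020, 1, 3), "Access"),
    (1, date(2020, 1, 4), "Access"),
]

def retention_curve(rows):
    """Return (day_no, n_total, n_active, retention) tuples ordered by day_no."""
    join_date = {user: day for user, day, action in rows if action == "Join"}
    active = {(user, day) for user, day, _ in rows}  # any action counts as activity
    last_day = max(day for _, day, _ in rows)        # end of the observed date range

    n_total = defaultdict(int)
    n_active = defaultdict(int)
    for user, joined in join_date.items():
        # A user is only observable for day offsets that fall inside the date range
        for day_no in range((last_day - joined).days + 1):
            n_total[day_no] += 1
            if (user, joined + timedelta(days=day_no)) in active:
                n_active[day_no] += 1

    return [
        (d, n_total[d], n_active[d], round(n_active[d] / n_total[d], 2))
        for d in sorted(n_total)
    ]

for row in retention_curve(rows):
    print(row)
```

With these rows, only user 1 joined early enough to be observed three days later, so day 3 has a single eligible user and a retention of 1.0.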
