# Stripe - Subscription Pricing Tiers Customer Retention Analysis

```SQL
CREATE TABLE fct_subscriptions (
    subscription_id integer,
    customer_id integer,
    pricing_tier varchar,
    start_date date,
    end_date date,
    renewal_status varchar
);

INSERT INTO fct_subscriptions (subscription_id, customer_id, pricing_tier, start_date, end_date, renewal_status)
VALUES
    (1, 101, 'Basic', '2024-07-05', '2024-08-05', 'Not Renewed'),
    (2, 102, 'Premium', '2024-07-10', '2024-08-10', 'Renewed'),
    (3, 103, 'Enterprise', '2024-07-15', '2024-08-15', 'Renewed'),
    (4, 101, 'Basic', '2024-08-06', '2024-09-06', 'Renewed'),
    (5, 104, 'Basic', '2024-08-10', '2024-09-10', 'Not Renewed'),
    (6, 105, 'Premium', '2024-08-12', '2024-09-12', 'Not Renewed'),
    (7, 102, 'Premium', '2024-09-01', '2024-10-01', 'Renewed'),
    (8, 106, 'Enterprise', '2024-09-05', '2024-10-05', 'Not Renewed'),
    (9, 107, 'Premium', '2024-07-20', '2024-08-20', 'Not Renewed'),
    (10, 108, 'Basic', '2024-07-22', '2024-08-22', 'Renewed'),
    (11, 109, 'Enterprise', '2024-08-15', '2024-09-15', 'Renewed'),
    (12, 110, 'Premium', '2024-09-10', '2024-10-10', 'Not Renewed'),
    (13, 111, 'Basic', '2024-09-15', '2024-10-15', 'Not Renewed'),
    (14, 103, 'Enterprise', '2024-09-20', '2024-10-20', 'Renewed'),
    (15, 112, 'Premium', '2024-08-25', '2024-09-25', 'Renewed');

SELECT * FROM fct_subscriptions;
```

In [1]:
import pandas as pd
import numpy as np

In [3]:
df_subcription = pd.read_csv('Data/022/fct_subscriptions.csv', parse_dates=['start_date','end_date'])

df_subcription.head()

| | subscription_id | customer_id | pricing_tier | start_date | end_date | renewal_status |
|---|---|---|---|---|---|---|
| 0 | 1 | 101 | Basic | 2024-07-05 | 2024-08-05 | Not Renewed |
| 1 | 2 | 102 | Premium | 2024-07-10 | 2024-08-10 | Renewed |
| 2 | 3 | 103 | Enterprise | 2024-07-15 | 2024-08-15 | Renewed |
| 3 | 4 | 101 | Basic | 2024-08-06 | 2024-09-06 | Renewed |
| 4 | 5 | 104 | Basic | 2024-08-10 | 2024-09-10 | Not Renewed |


# Question 1

### For the third quarter (Q3) of 2024, what is the total number of distinct customers who started a subscription for each pricing tier? This query establishes the baseline subscription counts for evaluating customer retention.

In [8]:
df_q3 = df_subcription[
    (df_subcription['start_date'].between('2024-07-01','2024-09-30'))
]

repuesta1 = df_q3.groupby('pricing_tier')['customer_id'].nunique().reset_index()

repuesta1 = repuesta1.sort_values(by='customer_id',ascending=False)

repuesta1

| | pricing_tier | customer_id |
|---|---|---|
| 2 | Premium | 5 |
| 0 | Basic | 4 |
| 1 | Enterprise | 3 |


```SQL
SELECT
    pricing_tier,
    COUNT(DISTINCT customer_id) AS total_customer
FROM fct_subscriptions
WHERE start_date BETWEEN '2024-07-01' AND '2024-09-30'
GROUP BY pricing_tier
ORDER BY total_customer DESC;
```
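Both the pandas filter and the SQL `BETWEEN` treat the Q3 window as a closed interval. That is safe here because `start_date` is a plain date. If the column ever carried a time of day (an assumption, not the case in this dataset), the upper bound `'2024-09-30'` would typically be read as midnight, so later timestamps on 30 September would drop out. A minimal stdlib sketch of the difference follows; the helper names and sample timestamps are hypothetical.

```python
from datetime import datetime

Q3_START = datetime(2024, 7, 1)
Q3_LAST_DAY = datetime(2024, 9, 30)  # midnight at the start of 30 September
Q4_START = datetime(2024, 10, 1)


def in_q3_closed(ts):
    """Closed interval, mirroring BETWEEN '2024-07-01' AND '2024-09-30'."""
    return Q3_START <= ts <= Q3_LAST_DAY


def in_q3_half_open(ts):
    """Half-open interval: on or after 1 July, strictly before 1 October."""
    return Q3_START <= ts < Q4_START


# Hypothetical start timestamps, not taken from fct_subscriptions
date_only = datetime(2024, 9, 30)          # a pure date, as in this dataset
with_time = datetime(2024, 9, 30, 14, 30)  # same day, with a time of day

print(in_q3_closed(date_only), in_q3_half_open(date_only))
print(in_q3_closed(with_time), in_q3_half_open(with_time))
```

With date-only values the two filters agree, so the results in this notebook are unaffected; the half-open form only matters once timestamps are involved.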

# Question 2

### Using subscriptions that started in the third quarter (Q3) of 2024, what percentage of subscriptions were renewed for each pricing tier? Renewed subscriptions have a renewal status of 'Renewed'. This breakdown will help assess retention effectiveness across tiers.

In [11]:
df_q3 = df_subcription[
    (df_subcription['start_date'].between('2024-07-01','2024-09-30'))
].copy()

df_q3['is_renewed'] = df_q3['renewal_status'] == 'Renewed'

resultado = df_q3.groupby('pricing_tier')['is_renewed'].mean().reset_index()

resultado['renewal_rate'] = resultado['is_renewed'] * 100
resultado = resultado.drop(columns=['is_renewed']).sort_values(by='renewal_rate', ascending=False)

resultado


| | pricing_tier | renewal_rate |
|---|---|---|
| 1 | Enterprise | 75.0 |
| 2 | Premium | 50.0 |
| 0 | Basic | 40.0 |


```SQL
SELECT
    pricing_tier,
    COUNT(CASE WHEN renewal_status = 'Renewed' THEN 1 END) * 100.0 / COUNT(*) AS renewal_rate
FROM fct_subscriptions
WHERE start_date BETWEEN '2024-07-01' AND '2024-09-30'
GROUP BY pricing_tier
ORDER BY renewal_rate DESC;
```
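As a cross-check that does not depend on the CSV file, the query above can be run against the sample rows from the `INSERT` statement using Python's built-in `sqlite3` module. This is only a sketch: the three-column table and the `renewal_rates` helper are simplifications introduced here, and SQLite stores the dates as ISO text, which still compares correctly with `BETWEEN`.

```python
import sqlite3

# (pricing_tier, start_date, renewal_status) copied from the INSERT statement
ROWS = [
    ("Basic", "2024-07-05", "Not Renewed"),
    ("Premium", "2024-07-10", "Renewed"),
    ("Enterprise", "2024-07-15", "Renewed"),
    ("Basic", "2024-08-06", "Renewed"),
    ("Basic", "2024-08-10", "Not Renewed"),
    ("Premium", "2024-08-12", "Not Renewed"),
    ("Premium", "2024-09-01", "Renewed"),
    ("Enterprise", "2024-09-05", "Not Renewed"),
    ("Premium", "2024-07-20", "Not Renewed"),
    ("Basic", "2024-07-22", "Renewed"),
    ("Enterprise", "2024-08-15", "Renewed"),
    ("Premium", "2024-09-10", "Not Renewed"),
    ("Basic", "2024-09-15", "Not Renewed"),
    ("Enterprise", "2024-09-20", "Renewed"),
    ("Premium", "2024-08-25", "Renewed"),
]

QUERY = """
SELECT
    pricing_tier,
    COUNT(CASE WHEN renewal_status = 'Renewed' THEN 1 END) * 100.0 / COUNT(*) AS renewal_rate
FROM fct_subscriptions
WHERE start_date BETWEEN '2024-07-01' AND '2024-09-30'
GROUP BY pricing_tier
ORDER BY renewal_rate DESC;
"""


def renewal_rates(rows):
    """Load the rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fct_subscriptions "
        "(pricing_tier TEXT, start_date TEXT, renewal_status TEXT)"
    )
    conn.executemany("INSERT INTO fct_subscriptions VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


rates = renewal_rates(ROWS)
for tier, rate in rates:
    print(f"{tier}: {rate:.1f}%")
# Enterprise: 75.0%
# Premium: 50.0%
# Basic: 40.0%
```

Note that the denominator is subscriptions, not customers: Basic has 2 renewals out of 5 subscriptions (40%), even though only 4 distinct customers hold them.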

# Question 3

### Based on subscriptions that started in the third quarter (Q3) of 2024, rank the pricing tiers by retention rate. We would like to see both the retention rate and the rank for each tier, so that we can identify which pricing model keeps customers engaged the longest.

In [14]:
df_q3 = df_subcription[
    df_subcription['start_date'].between('2024-07-01','2024-09-30')
].copy()

df_q3['is_renewed'] = df_q3['renewal_status'] == 'Renewed'

reporte = df_q3.groupby('pricing_tier')['is_renewed'].mean().reset_index()
reporte.columns = ['pricing_tier','retention_rate']
reporte['retention_rate'] *= 100

reporte['rank'] = reporte['retention_rate'].rank(ascending=False, method='dense').astype(int)

reporte = reporte.sort_values('rank')

reporte

| | pricing_tier | retention_rate | rank |
|---|---|---|---|
| 1 | Enterprise | 75.0 | 1 |
| 2 | Premium | 50.0 | 2 |
| 0 | Basic | 40.0 | 3 |


```SQL
WITH RetentionStats AS (
    SELECT
        pricing_tier,
        COUNT(CASE WHEN renewal_status = 'Renewed' THEN 1 END) * 100.0 / COUNT(*) AS retention_rate
    FROM fct_subscriptions
    WHERE start_date BETWEEN '2024-07-01' AND '2024-09-30'
    GROUP BY pricing_tier
)
SELECT
    pricing_tier,
    retention_rate,
    RANK() OVER (ORDER BY retention_rate DESC) AS retention_rank
FROM RetentionStats
ORDER BY retention_rank;
```
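The pandas cell ranks with `method='dense'`, while the SQL uses `RANK()`. The two agree on this data because no two tiers share a retention rate, but they diverge on ties: `RANK()` skips the next position (1, 1, 3), whereas dense ranking does not (1, 1, 2). The sketch below illustrates this in plain Python; `rank_min`, `rank_dense`, and the tied rates are hypothetical and only serve the comparison.

```python
def rank_min(values):
    """Like SQL RANK() with descending order: ties share a rank and the next rank is skipped."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def rank_dense(values):
    """Like SQL DENSE_RANK() with descending order: ties share a rank with no gaps."""
    distinct = sorted(set(values), reverse=True)
    return [distinct.index(v) + 1 for v in values]


# Rates from the result table above: Enterprise, Premium, Basic
actual = [75.0, 50.0, 40.0]
print(rank_min(actual), rank_dense(actual))  # [1, 2, 3] [1, 2, 3]

# Hypothetical tie between the first two tiers
tied = [75.0, 75.0, 40.0]
print(rank_min(tied), rank_dense(tied))  # [1, 1, 3] [1, 1, 2]
```

To make the pandas output match `RANK()` exactly when ties occur, `method='min'` is the corresponding option in `Series.rank`; to make the SQL match the pandas cell instead, `DENSE_RANK()` would be the counterpart.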