# Gigs Senior Data Analyst Challenge

Welcome to the Gigs data analyst take-home challenge! This notebook will help you get started with analyzing our connectivity usage data.

## About the Data

You'll be working with three main datasets:
- **Usage Data**: Detailed usage per subscription period (~54K records)
- **Plan Events**: Plan configuration and pricing history
- **Projects**: Project metadata

## Setup Instructions

Run the cells below to set up your environment and load the data into DuckDB.

In [1]:
# Import required libraries
import duckdb
import pandas as pd
from datetime import datetime, timedelta

print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!


In [2]:
# Load JupySQL extension and configure
%load_ext sql

# Configure JupySQL for better output
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False

print("✅ JupySQL configured!")

✅ JupySQL configured!


In [3]:
# Connect to DuckDB
conn = duckdb.connect('gigs-analytics.db')
%sql conn --alias duckdb

print("✅ Connected to DuckDB database: gigs-analytics.db")

✅ Connected to DuckDB database: gigs-analytics.db


In [4]:
%%sql
-- Load data into DuckDB tables
CREATE OR REPLACE TABLE usage_data AS 
SELECT * FROM 'data/usage_by_subscription_period.csv';

CREATE OR REPLACE TABLE plan_events AS 
SELECT * FROM 'data/plan_change_events.csv';

CREATE OR REPLACE TABLE projects AS 
SELECT * FROM 'data/projects.csv';

Unnamed: 0,Count
0,3


In [5]:
%%sql
-- Verify data loading
select 
  'usage_data' as table_name, 
  count(*) as row_count,
  count(distinct subscription_id) as unique_subscriptions
from usage_data
union all
select 
  'plan_events' as table_name, 
  count(*) as row_count,
  count(distinct plan_id) as unique_plans
from plan_events
union all
select 
  'projects' as table_name, 
  count(*) as row_count,
  count(distinct project_id__hashed) as unique_projects
from projects;

Unnamed: 0,table_name,row_count,unique_subscriptions
0,usage_data,53565,8457
1,plan_events,209,36
2,projects,3,3


## Your Analysis Starts Here!

Now you have everything set up. Use the cells below to start your analysis.

### Tips:
- Use `%%sql` for multi-line SQL queries
- Use `%sql variable_name <<` to store results in a Python variable
- Combine SQL with Python/Pandas for advanced analysis
- Feel free to use any visualisation library you feel comfortable with

## Question 1: How much data does a subscription typically consume?

### Query 1: Average and Median Data Consumption Across our Subscription Base

In [229]:
%%sql

WITH monthly_usage AS (
    -- Extract year-month from reporting_date and calculate monthly usage per subscription
    SELECT 
        subscription_id,
        strftime(reporting_date::DATE, '%Y-%m') as report_month,
        MAX(cumulative_data_usage_megabyte) as monthly_data_usage_mb
    FROM usage_data
    WHERE cumulative_data_usage_megabyte IS NOT NULL 
        AND cumulative_data_usage_megabyte > 0
    GROUP BY subscription_id, strftime(reporting_date::DATE, '%Y-%m')
),

active_users_per_month AS (
    -- Count active users (those with data usage > 0) per month
    SELECT 
        report_month,
        COUNT(DISTINCT subscription_id) as active_users,
        AVG(monthly_data_usage_mb) as avg_data_consumption_mb,
        -- Calculate median using QUANTILE
        QUANTILE(monthly_data_usage_mb, 0.5) as median_data_consumption_mb
    FROM monthly_usage
    GROUP BY report_month
)

SELECT 
    ROUND(AVG(active_users)) as avg_active_users_per_month,
    ROUND(AVG(avg_data_consumption_mb)) as avg_data_consumption_mb_per_month,
    ROUND(AVG(median_data_consumption_mb)) as median_data_consumption_mb_per_month
FROM active_users_per_month;



Unnamed: 0,avg_active_users_per_month,avg_data_consumption_mb_per_month,median_data_consumption_mb_per_month
0,2466.0,1885.0,216.0
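
The mean (1,885 MB) sits far above the median (216 MB), which is consistent with a right-skewed distribution where a small number of heavy users pull the average up. As a rough illustration of the same two-step aggregation (maximum cumulative usage per subscription-month, then mean and median per month), here is a minimal Python sketch. The sample rows and the helper names `monthly_usage` and `monthly_stats` are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical rows: (subscription_id, report_month, cumulative_usage_mb)
rows = [
    ("sub_a", "2024-03", 40.0),
    ("sub_a", "2024-03", 120.0),
    ("sub_b", "2024-03", 300.0),
    ("sub_c", "2024-03", 9000.0),
    ("sub_a", "2024-04", 80.0),
    ("sub_b", "2024-04", 0.0),  # zero usage is excluded, as in the query
    ("sub_c", "2024-04", 4000.0),
]


def monthly_usage(rows):
    """Max cumulative usage per (subscription, month), skipping NULL/zero usage."""
    usage = {}
    for sub_id, month, mb in rows:
        if mb is None or mb <= 0:
            continue
        key = (sub_id, month)
        usage[key] = max(usage.get(key, 0.0), mb)
    return usage


def monthly_stats(usage):
    """Mean and median usage per month across active subscriptions."""
    by_month = defaultdict(list)
    for (_, month), mb in usage.items():
        by_month[month].append(mb)
    return {m: (mean(v), median(v)) for m, v in by_month.items()}


stats = monthly_stats(monthly_usage(rows))
for month, (avg_mb, median_mb) in sorted(stats.items()):
    print(month, avg_mb, median_mb)
```

In the toy March data a single 9,000 MB subscription lifts the mean to 3,140 MB while the median stays at 300 MB. Note that `statistics.median` averages the two middle values for even-sized samples, so it may differ slightly from what DuckDB's `QUANTILE` returns.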


### Query 2: How Does Our Subscription Base Divide Among Data Usage Quantity Buckets

In [230]:
%%sql

WITH monthly_usage AS (
    -- Extract year-month from reporting_date and calculate monthly usage per subscription
    SELECT 
        subscription_id,
        strftime(reporting_date::DATE, '%Y-%m') as report_month,
        MAX(cumulative_data_usage_megabyte) as monthly_data_usage_mb
    FROM usage_data
    WHERE cumulative_data_usage_megabyte IS NOT NULL 
        AND cumulative_data_usage_megabyte > 0
    GROUP BY subscription_id, strftime(reporting_date::DATE, '%Y-%m')
),

data_buckets AS (
    SELECT 
        subscription_id,
        monthly_data_usage_mb,
        CASE 
            WHEN monthly_data_usage_mb >= 0 AND monthly_data_usage_mb < 100 THEN '0-100 MB'
            WHEN monthly_data_usage_mb >= 100 AND monthly_data_usage_mb < 500 THEN '100-500 MB'
            WHEN monthly_data_usage_mb >= 500 AND monthly_data_usage_mb < 1000 THEN '500MB-1GB'
            WHEN monthly_data_usage_mb >= 1000 AND monthly_data_usage_mb < 2000 THEN '1-2 GB'
            WHEN monthly_data_usage_mb >= 2000 AND monthly_data_usage_mb < 5000 THEN '2-5 GB'
            WHEN monthly_data_usage_mb >= 5000 THEN '5+ GB'
        END as consumption_bucket
    FROM monthly_usage
)

SELECT 
    consumption_bucket,
    COUNT(DISTINCT subscription_id) as active_subscriptions,
    ROUND(COUNT(DISTINCT subscription_id) * 100.0 / SUM(COUNT(DISTINCT subscription_id)) OVER()) as percentage_of_total
FROM data_buckets
GROUP BY consumption_bucket
ORDER BY 
    CASE consumption_bucket
        WHEN '0-100 MB' THEN 1
        WHEN '100-500 MB' THEN 2
        WHEN '500MB-1GB' THEN 3
        WHEN '1-2 GB' THEN 4
        WHEN '2-5 GB' THEN 5
        WHEN '5+ GB' THEN 6
    END;

Unnamed: 0,consumption_bucket,active_subscriptions,percentage_of_total
0,0-100 MB,4199,28.0
1,100-500 MB,4475,30.0
2,500MB-1GB,2123,14.0
3,1-2 GB,1853,12.0
4,2-5 GB,1166,8.0
5,5+ GB,1125,8.0
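
The `CASE` expression above assigns each subscription-month to a half-open bucket `[lower, upper)`. A minimal Python sketch of the same bucketing, using the edges and labels from the query, could look like this. The `consumption_bucket` helper and the sample values are assumptions made for illustration only.

```python
from bisect import bisect_right
from collections import Counter

# Upper bucket edges in MB and labels, copied from the CASE expression above.
EDGES = [100, 500, 1000, 2000, 5000]
LABELS = ["0-100 MB", "100-500 MB", "500MB-1GB", "1-2 GB", "2-5 GB", "5+ GB"]


def consumption_bucket(usage_mb):
    """Return the label of the half-open bucket [lower, upper) containing usage_mb."""
    return LABELS[bisect_right(EDGES, usage_mb)]


# Hypothetical monthly usage values in MB.
sample = [12.5, 99.9, 100.0, 450.0, 999.0, 1000.0, 4999.9, 5000.0, 20000.0]
counts = Counter(consumption_bucket(mb) for mb in sample)
for label in LABELS:
    print(label, counts[label])
```

One thing to keep in mind when reading the table: the bucket counts add up to 14,941, more than the 8,457 unique subscriptions, because a subscription that lands in different buckets in different months is counted once in each of them.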


### Query 3: What Percentage of Our Subscriptions Move Between Consumption Buckets

In [231]:
%%sql
-- Count how many subscription_ids move between consumption buckets month-to-month

WITH monthly_usage AS (
    -- Extract year-month from reporting_date and calculate monthly usage per subscription
    SELECT 
        subscription_id,
        strftime(reporting_date::DATE, '%Y-%m') as report_month,
        MAX(cumulative_data_usage_megabyte) as monthly_data_usage_mb
    FROM usage_data
    WHERE cumulative_data_usage_megabyte IS NOT NULL 
        AND cumulative_data_usage_megabyte > 0
    GROUP BY subscription_id, strftime(reporting_date::DATE, '%Y-%m')
),

data_buckets AS (
    SELECT 
        subscription_id,
        report_month,
        monthly_data_usage_mb,
        CASE 
            WHEN monthly_data_usage_mb >= 0 AND monthly_data_usage_mb < 100 THEN '0-100 MB'
            WHEN monthly_data_usage_mb >= 100 AND monthly_data_usage_mb < 500 THEN '100-500 MB'
            WHEN monthly_data_usage_mb >= 500 AND monthly_data_usage_mb < 1000 THEN '500MB-1GB'
            WHEN monthly_data_usage_mb >= 1000 AND monthly_data_usage_mb < 2000 THEN '1-2 GB'
            WHEN monthly_data_usage_mb >= 2000 AND monthly_data_usage_mb < 5000 THEN '2-5 GB'
            WHEN monthly_data_usage_mb >= 5000 THEN '5+ GB'
        END as consumption_bucket
    FROM monthly_usage
),

monthly_transitions AS (
    -- Get consecutive months for each subscription with their buckets
    SELECT 
        db1.subscription_id,
        db1.report_month as current_month,
        db1.consumption_bucket as current_bucket,
        db2.report_month as next_month,
        db2.consumption_bucket as next_bucket
    FROM data_buckets db1
    JOIN data_buckets db2 ON db1.subscription_id = db2.subscription_id
    -- Match each month with the following calendar month (handles the December -> January rollover)
    WHERE db2.report_month = strftime(
        CAST(db1.report_month || '-01' AS DATE) + INTERVAL 1 MONTH,
        '%Y-%m'
    )
),

bucket_movements AS (
    -- Categorize the movements
    SELECT 
        subscription_id,
        current_month,
        next_month,
        current_bucket,
        next_bucket,
        CASE 
            WHEN current_bucket = next_bucket THEN 'Stayed Same'
            WHEN (current_bucket = '0-100 MB' AND next_bucket IN ('100-500 MB', '500MB-1GB', '1-2 GB', '2-5 GB', '5+ GB')) OR
                 (current_bucket = '100-500 MB' AND next_bucket IN ('500MB-1GB', '1-2 GB', '2-5 GB', '5+ GB')) OR
                 (current_bucket = '500MB-1GB' AND next_bucket IN ('1-2 GB', '2-5 GB', '5+ GB')) OR
                 (current_bucket = '1-2 GB' AND next_bucket IN ('2-5 GB', '5+ GB')) OR
                 (current_bucket = '2-5 GB' AND next_bucket = '5+ GB')
                 THEN 'Moved Up'
            ELSE 'Moved Down'
        END as movement_type
    FROM monthly_transitions
),

subscription_movement_patterns AS (
    -- Analyze each subscription's overall movement pattern
    SELECT 
        subscription_id,
        CASE 
            WHEN COUNT(CASE WHEN movement_type = 'Moved Up' THEN 1 END) > 0 
                 AND COUNT(CASE WHEN movement_type = 'Moved Down' THEN 1 END) > 0 
                 THEN 'Moved Up and Down'
            WHEN COUNT(CASE WHEN movement_type = 'Moved Up' THEN 1 END) > 0 
                 AND COUNT(CASE WHEN movement_type = 'Moved Down' THEN 1 END) = 0 
                 THEN 'Moved Up'
            WHEN COUNT(CASE WHEN movement_type = 'Moved Down' THEN 1 END) > 0 
                 AND COUNT(CASE WHEN movement_type = 'Moved Up' THEN 1 END) = 0 
                 THEN 'Moved Down'
            ELSE 'Stayed Same'
        END as overall_movement_pattern
    FROM bucket_movements
    GROUP BY subscription_id
)

-- Final summary by movement pattern
SELECT 
    overall_movement_pattern,
    COUNT(*) as subscription_count,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage_of_subscriptions
FROM subscription_movement_patterns
GROUP BY overall_movement_pattern
ORDER BY 
    CASE overall_movement_pattern 
        WHEN 'Stayed Same' THEN 1
        WHEN 'Moved Up' THEN 2  
        WHEN 'Moved Down' THEN 3
        WHEN 'Moved Up and Down' THEN 4
    END;

Unnamed: 0,overall_movement_pattern,subscription_count,percentage_of_subscriptions
0,Stayed Same,781,15.08
1,Moved Up,311,6.0
2,Moved Down,963,18.59
3,Moved Up and Down,3125,60.33
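
The transition logic in this query has two parts: finding the following calendar month for a `YYYY-MM` string, and summarising all of a subscription's month-to-month moves into one pattern. The sketch below mirrors that logic in plain Python on a made-up history. The function names (`next_month`, `movement`, `overall_pattern`) and the sample data are hypothetical.

```python
def next_month(report_month):
    """Return the 'YYYY-MM' string that follows report_month."""
    year, month = map(int, report_month.split("-"))
    if month == 12:
        return f"{year + 1}-01"
    return f"{year}-{month + 1:02d}"


# Bucket order from lowest to highest, matching the labels in the query.
ORDER = ["0-100 MB", "100-500 MB", "500MB-1GB", "1-2 GB", "2-5 GB", "5+ GB"]
RANK = {label: i for i, label in enumerate(ORDER)}


def movement(current_bucket, next_bucket):
    """Classify a single month-to-month transition."""
    diff = RANK[next_bucket] - RANK[current_bucket]
    if diff == 0:
        return "Stayed Same"
    return "Moved Up" if diff > 0 else "Moved Down"


def overall_pattern(buckets_by_month):
    """Summarise all consecutive-month transitions of one subscription."""
    moves = {
        movement(bucket, buckets_by_month[next_month(month)])
        for month, bucket in buckets_by_month.items()
        if next_month(month) in buckets_by_month
    }
    if not moves:
        return None  # no consecutive months, so the self-join yields no rows
    went_up = "Moved Up" in moves
    went_down = "Moved Down" in moves
    if went_up and went_down:
        return "Moved Up and Down"
    if went_up:
        return "Moved Up"
    if went_down:
        return "Moved Down"
    return "Stayed Same"


# Hypothetical history with a gap in 2025-02.
history = {
    "2024-11": "0-100 MB",
    "2024-12": "1-2 GB",
    "2025-01": "100-500 MB",
    "2025-03": "5+ GB",
}
print(overall_pattern(history))
```

Because one upward and one downward move over a subscription's whole history are enough to land in "Moved Up and Down", longer-lived subscriptions are more likely to fall into that group.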


## Question 2: What does usage look like at different plan data allowances?

### Query 1: Data Usage per Level of Data Allowance

In [232]:
%%sql

WITH plan_allowances AS ( select * from (
    SELECT DISTINCT
        project_id__hashed,
        plan_id,
        CASE 
            WHEN is_unlimited_data = 'True' OR is_unlimited_data = true THEN 999999
            ELSE data_allowance_mb
        END as plan_allowance_mb,
        CASE 
            WHEN is_unlimited_data = 'True' OR is_unlimited_data = true THEN 'Unlimited'
            WHEN data_allowance_mb = 1000 THEN '1 GB'
            WHEN data_allowance_mb = 5000 THEN '5 GB'
            ELSE CAST(data_allowance_mb AS VARCHAR) || ' MB'
        END as plan_category,
        is_unlimited_data
    FROM plan_events
    WHERE _is_current_state = true     -- Only current plans
) where plan_allowance_mb <> 0),       -- Drop plans with a zero allowance (NULL allowances are dropped too)

usage_by_allowance AS (
    SELECT 
        u.subscription_id,
        p.plan_category,
        p.plan_allowance_mb,
        p.is_unlimited_data,
        MAX(u.cumulative_data_usage_megabyte) as total_usage_mb,
        CASE 
            WHEN p.is_unlimited_data = 'True' OR p.is_unlimited_data = true THEN NULL
            WHEN p.plan_allowance_mb > 0 THEN 
                ROUND((MAX(u.cumulative_data_usage_megabyte) / p.plan_allowance_mb) * 100, 2)
            ELSE NULL 
        END as usage_percentage
    FROM usage_data u
    INNER JOIN plan_allowances p ON u.plan_id = p.plan_id
    WHERE u.cumulative_data_usage_megabyte IS NOT NULL
    GROUP BY u.subscription_id, p.plan_category, p.plan_allowance_mb, p.is_unlimited_data
)

SELECT 
    plan_category,
    COUNT(*) as total_subscriptions,
    ROUND(AVG(total_usage_mb), 2) as avg_usage_mb,
    ROUND(AVG(usage_percentage), 2) as avg_utilization_rate_pct,
    ROUND(QUANTILE(total_usage_mb, 0.5), 2) as median_usage_mb,
    ROUND(QUANTILE(usage_percentage, 0.5), 2) as median_utilization_rate_pct
    
FROM usage_by_allowance
GROUP BY plan_category
ORDER BY 
    CASE plan_category
        WHEN '1 GB' THEN 1
        WHEN '5 GB' THEN 2  
        WHEN 'Unlimited' THEN 3
        ELSE 4
    END;

Unnamed: 0,plan_category,total_subscriptions,avg_usage_mb,avg_utilization_rate_pct,median_usage_mb,median_utilization_rate_pct
0,1 GB,5963,955.91,95.59,336.32,33.63
1,5 GB,974,2777.71,55.55,877.25,17.54
2,Unlimited,1907,15346.76,,3351.38,


### Query 2: What % of 1 GB Subscriptions are Going Over their Limit

In [233]:
%%sql
WITH plan_allowances AS ( 
    SELECT * FROM (
        SELECT DISTINCT
            project_id__hashed,
            plan_id,
            CASE 
                WHEN is_unlimited_data = 'True' OR is_unlimited_data = true THEN 999999
                ELSE data_allowance_mb
            END as plan_allowance_mb,
            CASE 
                WHEN is_unlimited_data = 'True' OR is_unlimited_data = true THEN 'Unlimited'
                WHEN data_allowance_mb = 1000 THEN '1 GB'
                WHEN data_allowance_mb = 5000 THEN '5 GB'
                ELSE CAST(data_allowance_mb AS VARCHAR) || ' MB'
            END as plan_category,
            is_unlimited_data
        FROM plan_events
        WHERE _is_current_state = true
    ) WHERE plan_allowance_mb <> 0
),
usage_by_allowance AS (
    SELECT 
        u.subscription_id,
        p.plan_category,
        p.plan_allowance_mb,
        p.is_unlimited_data,
        MAX(u.cumulative_data_usage_megabyte) as total_usage_mb,
        CASE 
            WHEN p.is_unlimited_data = 'True' OR p.is_unlimited_data = true THEN NULL
            WHEN p.plan_allowance_mb > 0 THEN 
                ROUND((MAX(u.cumulative_data_usage_megabyte) / p.plan_allowance_mb) * 100, 2)
            ELSE NULL 
        END as usage_percentage
    FROM usage_data u
    INNER JOIN plan_allowances p ON u.plan_id = p.plan_id
    WHERE u.cumulative_data_usage_megabyte IS NOT NULL
    GROUP BY u.subscription_id, p.plan_category, p.plan_allowance_mb, p.is_unlimited_data
),
gb1_plan_analysis AS (
    SELECT 
        subscription_id,
        total_usage_mb,
        plan_allowance_mb,
        usage_percentage,
        CASE 
            WHEN total_usage_mb > plan_allowance_mb THEN 'Over Limit'
            WHEN usage_percentage >= 90 THEN 'Near Limit (90%+)'
            ELSE 'Within Limit'
        END as usage_status
    FROM usage_by_allowance
    WHERE plan_category = '1 GB'
)
SELECT 
    usage_status,
    COUNT(*) as subscription_count,
    ROUND((COUNT(*) * 100.0 / SUM(COUNT(*)) OVER()), 2) as percentage_of_1gb_users,
    ROUND(AVG(total_usage_mb), 2) as avg_usage_mb,
    ROUND(AVG(usage_percentage), 2) as avg_utilization_pct
FROM gb1_plan_analysis
GROUP BY usage_status
ORDER BY 
    CASE usage_status
        WHEN 'Over Limit' THEN 1
        WHEN 'Near Limit (90%+)' THEN 2
        WHEN 'Within Limit' THEN 3
    END;

Unnamed: 0,usage_status,subscription_count,percentage_of_1gb_users,avg_usage_mb,avg_utilization_pct
0,Over Limit,1721,28.86,2668.11,266.81
1,Near Limit (90%+),123,2.06,949.01,94.9
2,Within Limit,4119,69.08,240.72,24.07
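
The status assignment above compares each subscription's maximum cumulative usage with its plan allowance. A small Python sketch of the same rule, assuming a 1,000 MB allowance for the 1 GB plan as in the query, is shown below. The `usage_status` helper and the sample usage figures are hypothetical.

```python
def usage_status(total_usage_mb, plan_allowance_mb):
    """Return (utilization_pct, status) using the thresholds from the query."""
    utilization_pct = round(total_usage_mb * 100 / plan_allowance_mb, 2)
    if total_usage_mb > plan_allowance_mb:
        status = "Over Limit"
    elif utilization_pct >= 90:
        status = "Near Limit (90%+)"
    else:
        status = "Within Limit"
    return utilization_pct, status


# Hypothetical maximum cumulative usage values (MB) for 1 GB subscriptions.
ALLOWANCE_MB = 1000
for usage_mb in (240, 900, 1000, 2500):
    print(usage_mb, usage_status(usage_mb, ALLOWANCE_MB))
```

A subscription that uses exactly its allowance is classed as "Near Limit (90%+)" rather than "Over Limit", because the over-limit check is a strict `>` comparison.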


## Question 3: Do subscriptions typically consume consistent amounts of data throughout their lifetime?

In [234]:
%%sql
WITH subscription_periods AS (
    SELECT 
        subscription_id,
        project_id__hashed,
        subscription_period_number,
        cumulative_data_usage_megabyte,
        LAG(cumulative_data_usage_megabyte) OVER (
            PARTITION BY subscription_id 
            ORDER BY subscription_period_number
        ) as previous_usage
    FROM usage_data 
    WHERE cumulative_data_usage_megabyte > 0
), 
period_usage AS ( 
    SELECT  
        subscription_id, 
        project_id__hashed, 
        subscription_period_number, 
        cumulative_data_usage_megabyte, 
        previous_usage, 
        CASE  
            WHEN previous_usage IS NOT NULL  
            THEN cumulative_data_usage_megabyte - previous_usage 
            ELSE cumulative_data_usage_megabyte  
        END as period_usage_mb 
    FROM subscription_periods 
), 
subscription_variability AS ( 
    SELECT  
        subscription_id, 
        project_id__hashed, 
        COUNT(*) as total_periods, 
        ROUND(QUANTILE(period_usage_mb, 0.5), 2) as median_period_usage,
        ROUND(AVG(period_usage_mb), 2) as avg_period_usage, 
        ROUND(STDDEV(period_usage_mb), 2) as usage_stddev, 
        ROUND(MIN(period_usage_mb), 2) as min_period_usage, 
        ROUND(MAX(period_usage_mb), 2) as max_period_usage, 
        CASE  
            WHEN AVG(period_usage_mb) > 0  
            THEN ROUND((STDDEV(period_usage_mb) / AVG(period_usage_mb)), 2) 
            ELSE NULL  
        END as coefficient_of_variation 
    FROM period_usage 
    WHERE period_usage_mb IS NOT NULL AND period_usage_mb >= 0 
    GROUP BY subscription_id, project_id__hashed 
    HAVING COUNT(*) > 1 
) 
SELECT
    *,
    ROUND((subscription_count / SUM(subscription_count) OVER ()) * 100, 1) AS per_of_sub_count
FROM (
    SELECT
        CASE
            WHEN coefficient_of_variation < 0.5 THEN 'Consistent (CV < 0.5)'
            WHEN coefficient_of_variation < 1.0 THEN 'Moderate Variability (CV 0.5-1.0)'
            WHEN coefficient_of_variation < 2.0 THEN 'High Variability (CV 1.0-2.0)'
            ELSE 'Very High Variability (CV > 2.0)'
        END AS usage_consistency,
        ROUND(QUANTILE(median_period_usage, 0.5), 2) AS median_usage_mb,
        ROUND(AVG(avg_period_usage), 2) AS avg_usage_mb,
        ROUND(QUANTILE(coefficient_of_variation, 0.5), 2) AS median_cv,
        ROUND(AVG(coefficient_of_variation), 2) AS avg_cv,
        COUNT(*) AS subscription_count
    FROM subscription_variability
    WHERE coefficient_of_variation IS NOT NULL
    GROUP BY 1
    ORDER BY avg_cv
);

Unnamed: 0,usage_consistency,median_usage_mb,avg_usage_mb,median_cv,avg_cv,subscription_count,per_of_sub_count
0,Consistent (CV < 0.5),163.38,916.67,0.32,0.29,654,14.7
1,Moderate Variability (CV 0.5-1.0),172.77,943.23,0.78,0.77,2046,45.9
2,High Variability (CV 1.0-2.0),88.47,1224.12,1.24,1.29,1705,38.2
3,Very High Variability (CV > 2.0),97.53,3563.17,2.15,2.23,55,1.2
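
The consistency measure used here is the coefficient of variation (CV): the sample standard deviation of per-period usage divided by its mean. The sketch below reproduces the steps of the query (difference the cumulative series, drop negative values, require at least two periods) on made-up numbers. The function names and sample series are hypothetical, and it assumes, as the query does, that usage accumulates across periods.

```python
from statistics import mean, stdev


def period_usage(cumulative_mb):
    """Turn a cumulative series into per-period usage (first value kept as-is)."""
    usage = []
    previous = None
    for value in cumulative_mb:
        usage.append(value if previous is None else value - previous)
        previous = value
    return usage


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, rounded to 2 decimals."""
    values = [v for v in values if v >= 0]  # negative differences are dropped
    if len(values) < 2:
        return None  # mirrors HAVING COUNT(*) > 1
    avg = mean(values)
    if avg <= 0:
        return None
    return round(stdev(values) / avg, 2)


def consistency_label(cv):
    """Map a CV value to the labels used in the query."""
    if cv < 0.5:
        return "Consistent (CV < 0.5)"
    if cv < 1.0:
        return "Moderate Variability (CV 0.5-1.0)"
    if cv < 2.0:
        return "High Variability (CV 1.0-2.0)"
    return "Very High Variability (CV > 2.0)"


# Hypothetical cumulative usage series (MB) for two subscriptions.
steady = coefficient_of_variation(period_usage([100, 200, 300, 400]))
spiky = coefficient_of_variation(period_usage([10, 20, 500]))
print(steady, consistency_label(steady))
print(spiky, consistency_label(spiky))
```

The steady series has identical usage in every period, so its CV is 0, while a single large period in the second series pushes the CV well above 1.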


## Question 4: Compare retention pattern for the most recently launched project versus the two older ones

In [221]:
%%sql
WITH monthly_usage AS (
    -- Get usage data with month/year grouping and join with projects
    SELECT 
        u.subscription_id,
        p.organization_name as project_name,
        DATE_TRUNC('month', u.reporting_date::date) as usage_month,
        u.cumulative_data_usage_megabyte,
        -- Flag months where usage > 1MB
        CASE WHEN u.cumulative_data_usage_megabyte > 1 THEN 1 ELSE 0 END as has_high_usage
    FROM usage_data u
    JOIN projects p ON u.project_id__hashed = p.project_id__hashed
),

subscription_monthly_counts AS (
    -- Count months with >1MB usage per subscription
    SELECT 
        subscription_id,
        project_name,
        SUM(has_high_usage) as months_with_high_usage
    FROM monthly_usage
    GROUP BY 1,2
),

project_min_dates AS (
    -- Get the minimum reporting date for each project
    SELECT 
        p.organization_name as project_name,
        MIN(u.reporting_date::date) as min_date
    FROM usage_data u
    JOIN projects p ON u.project_id__hashed = p.project_id__hashed
    GROUP BY p.organization_name
),

project_total_subscriptions AS (
    -- Get total distinct subscription count per project
    SELECT 
        p.organization_name as project_name,
        COUNT(DISTINCT u.subscription_id) as total_distinct_subscriptions
    FROM usage_data u
    JOIN projects p ON u.project_id__hashed = p.project_id__hashed
    GROUP BY p.organization_name
),

project_duration AS (
    -- Count total distinct months the project has data for
    SELECT 
        p.organization_name as project_name,
        COUNT(DISTINCT DATE_TRUNC('month', u.reporting_date::date)) as total_months_with_data
    FROM usage_data u
    JOIN projects p ON u.project_id__hashed = p.project_id__hashed
    GROUP BY p.organization_name
)

-- Final result: Count of subscriptions by months of high usage, with percentages and running totals
SELECT 
    smc.project_name,
    smc.months_with_high_usage,
    COUNT(DISTINCT smc.subscription_id) as distinct_subscription_count,
    pts.total_distinct_subscriptions,
    pd.total_months_with_data,
    -- Individual percentage of total subscriptions for this project
    ROUND(
        (COUNT(DISTINCT smc.subscription_id) * 100.0 / pts.total_distinct_subscriptions), 2
    ) as individual_percentage,
    -- Running total percentage within each project
    ROUND(
        (SUM(COUNT(DISTINCT smc.subscription_id)) 
         OVER (PARTITION BY smc.project_name 
               ORDER BY smc.months_with_high_usage 
               ROWS UNBOUNDED PRECEDING) * 100.0 / pts.total_distinct_subscriptions), 2
    ) as running_total_percentage,
    pmd.min_date as project_min_date
FROM subscription_monthly_counts smc
JOIN project_min_dates pmd ON smc.project_name = pmd.project_name
JOIN project_total_subscriptions pts ON smc.project_name = pts.project_name
JOIN project_duration pd ON smc.project_name = pd.project_name
GROUP BY smc.project_name, smc.months_with_high_usage, pts.total_distinct_subscriptions, pd.total_months_with_data, pmd.min_date
ORDER BY pmd.min_date ASC, smc.project_name, smc.months_with_high_usage;

Unnamed: 0,project_name,months_with_high_usage,distinct_subscription_count,total_distinct_subscriptions,total_months_with_data,individual_percentage,running_total_percentage,project_min_date
0,ACME Phone,0.0,710,6737,17,10.54,10.54,2024-02-07
1,ACME Phone,1.0,1040,6737,17,15.44,25.98,2024-02-07
2,ACME Phone,2.0,675,6737,17,10.02,36.0,2024-02-07
3,ACME Phone,3.0,551,6737,17,8.18,44.17,2024-02-07
4,ACME Phone,4.0,445,6737,17,6.61,50.78,2024-02-07
5,ACME Phone,5.0,329,6737,17,4.88,55.66,2024-02-07
6,ACME Phone,6.0,272,6737,17,4.04,59.7,2024-02-07
7,ACME Phone,7.0,264,6737,17,3.92,63.62,2024-02-07
8,ACME Phone,8.0,255,6737,17,3.79,67.4,2024-02-07
9,ACME Phone,9.0,235,6737,17,3.49,70.89,2024-02-07
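
The `running_total_percentage` column reads as "share of the project's subscriptions with at most N months of usage above 1 MB". As a quick check of how the two percentage columns relate, the sketch below recomputes them for the first three ACME Phone rows shown above. The `percentage_table` helper is a hypothetical name; the counts and the total come from the output.

```python
from itertools import accumulate

# First three ACME Phone rows from the output above:
# months_with_high_usage -> distinct_subscription_count
counts_by_months = {0: 710, 1: 1040, 2: 675}
total_subscriptions = 6737


def percentage_table(counts, total):
    """Return (months, individual_pct, running_total_pct) rows ordered by months."""
    months = sorted(counts)
    running = accumulate(counts[m] for m in months)
    return [
        (m, round(counts[m] * 100 / total, 2), round(cumulative * 100 / total, 2))
        for m, cumulative in zip(months, running)
    ]


table = percentage_table(counts_by_months, total_subscriptions)
for row in table:
    print(row)
```

Read this way, roughly 36% of ACME Phone subscriptions recorded usage above 1 MB in two months or fewer.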


### Analysis Results for Question 4

After digging into the data, I believe it is too limited to support a meaningful comparison of retention across the different projects. We have 15 full months of data (17 including partial months) for ACME Phone, so some conclusions can be drawn about it individually. However, Smart Devices has only 2 full months of data and People Mobile only 1. This means we could compare retention in months 2-3 between Smart Devices and ACME Phone, but that is far too small a sample to draw any meaningful conclusion. The single month of complete data for People Mobile means no conclusion can be drawn at all.