<a href="https://colab.research.google.com/github/janilles/dfdapp/blob/master/dfd_pledges.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Drink Free Days app

# PLEDGES ANALYSIS


# Credentials to run the notebook

## Google Drive authentication (optional step)
NOTE: If login credentials are hardcoded into the database connection (code cell below), this step is not necessary. Otherwise:

Install and authenticate [PyDrive](https://pythonhosted.org/PyDrive/index.html) for loading files from Google Drive so that database passwords aren't hardcoded into the notebook.


In [0]:
# added -q for suppressing output
!pip install -U -q PyDrive

# see PyDrive documentation for libraries code snippets
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# authenticate and create the PyDrive client
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)


## Database connection
- Connecting to AWS RDS database with [PyMySQL](https://pymysql.readthedocs.io/en/latest/user/examples.html).
- Returning MySQL queries as Pandas dataframes with the [```read_sql()```](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html) function.
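
As a minimal, self-contained sketch of the query-to-dataframe pattern, the example below swaps the AWS RDS MySQL database for an in-memory SQLite database from the standard library, so it runs without credentials. The `demo_pledges` table and its rows are made up for illustration and are not part of the real schema.

```python
import sqlite3

import pandas as pd

# stand-in for the MySQL connection: an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_pledges (id INTEGER, daycount INTEGER)")
conn.executemany(
    "INSERT INTO demo_pledges VALUES (?, ?)",
    [(1, 3), (2, 5), (3, 3)])
conn.commit()

# the SQL string goes in, a Pandas dataframe comes out
demo = pd.read_sql(
    """
    SELECT daycount, COUNT(*) AS num_of_users
    FROM demo_pledges
    GROUP BY daycount
    ORDER BY daycount
    """,
    con=conn)
conn.close()

print(demo)
```

The same call shape applies to the pymysql connection used in this notebook; only the connection object changes.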

In [0]:
# added -q for suppressing output
!pip install -q -U pymysql

import pymysql
import pandas as pd


In [0]:
# only run if using PyDrive: 'id' is Google Drive file ID
hostnm_file = drive.CreateFile(
    {'id': '1lL_DFWe2F0yNTZRJ3sq5bpJeHgCTqxMn'})
usernm_file = drive.CreateFile(
    {'id': '1l0NedyVzKKhPJ1-_cOqF1VRt_oQyr8OL'})
passwd_file = drive.CreateFile(
    {'id': '1YnGugBHvqjJk0nbTqN-683Agb0vaZKHo'})
dbname_file = drive.CreateFile(
    {'id': '1_mZ3aYMcWdRKKJXRud4sPwnQc8vVgznC'})

# variables used in the connect function below
host_name = hostnm_file.GetContentString()
user_name = usernm_file.GetContentString()
user_passwd = passwd_file.GetContentString()
db_name = dbname_file.GetContentString()


In [0]:
def connect():
    return pymysql.connect(
        host=host_name,
        user=user_name,
        passwd=user_passwd,
        db=db_name,
        autocommit=True
        )

connection = connect()

def sql_to_df(sql):
    return pd.read_sql(sql, con=connection)


# Database tables used in reports (optional step)
Overview of available data and tables used in the MySQL queries below. See [MySQL documentation](https://dev.mysql.com/doc/refman/8.0/en/introduction.html) for MySQL syntax.

In [0]:
# formatting column width of Pandas dataframes
# increase column width so that longer comments don't get truncated
pd.set_option('max_colwidth', 100)


### Pledges table

In [0]:
# run pd.set_option('max_colwidth', 100) if comments column gets truncated
sql_to_df("""
        SELECT
            table_name, column_name, data_type, column_comment
        FROM
            information_schema.columns
        WHERE
            table_name = 'g_apppledges'
        """)


### Days off table (drink free days)

In [0]:
# run pd.set_option('max_colwidth', 100) if comments column gets truncated
sql_to_df("""
        SELECT
            table_name, column_name, data_type, column_comment
        FROM
            information_schema.columns
        WHERE
            table_name = 'g_appdaysoff'
        """)


### App users table

In [0]:
# run pd.set_option('max_colwidth', 100) if comments column gets truncated
sql_to_df("""
        SELECT
            table_name, column_name, data_type, column_comment
        FROM
            information_schema.columns
        WHERE
            table_name = 'g_appusers'
        """)


# Reports
SQL queries are passed as strings to the ```sql_to_df()``` function defined above, which runs them over the pymysql connection.  
See [MySQL documentation](https://dev.mysql.com/doc/refman/8.0/en/introduction.html) for SQL syntax reference.

## Are people meeting their pledges (during campaign)?
Comparison of the number of days pledged and actual drink free days achieved. 

**Method:**  
Drink free days achieved are aggregated by week number, and then the week numbers of pledges are matched to the week numbers of drink free days.

**NB:** 
- The pledges table only holds a week number for a user when a pledge is set or changed, not for every week the user uses the app.
- By default a MySQL week begins on Sunday, so to make weeks begin on Monday use ```WEEK(date, 1)``` instead of ```WEEK(date)```.
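
To see why the week start matters, here is a small Python sketch that uses the standard library's `%U` (Sunday-first) and `%W` (Monday-first) week numbers. These are Python's own conventions and only an analogue of the MySQL modes, not an exact equivalent of `WEEK(date, 1)` in every year; the helper names are hypothetical.

```python
from datetime import date

# three consecutive days around a weekend in 2018
saturday = date(2018, 9, 15)
sunday = date(2018, 9, 16)
monday = date(2018, 9, 17)


def week_sunday_start(d):
    """Week number with Sunday as the first day of the week (%U)."""
    return int(d.strftime("%U"))


def week_monday_start(d):
    """Week number with Monday as the first day of the week (%W)."""
    return int(d.strftime("%W"))


for d in (saturday, sunday, monday):
    print(d.isoformat(), week_sunday_start(d), week_monday_start(d))
```

With Sunday-first weeks, the Sunday is grouped with the following Monday; with Monday-first weeks, it stays with the preceding Saturday. Mixing the two conventions on either side of a join would therefore mismatch every Sunday.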

In [0]:
#@title SQL query { vertical-output: true }
comparison = sql_to_df("""
    SELECT
        O.id,
        WEEK(date, 1) AS week_of_dfds, -- make sure week starts with Monday
        COUNT(DISTINCT date) AS dfds_in_week,
        WEEK(week, 1) AS pledge_week, -- make sure week starts with Monday
        daycount AS days_pledged
    FROM
        g_appdaysoff O
    LEFT JOIN
        g_apppledges P
    ON
        O.id=P.id
    AND
        WEEK(date, 1)=WEEK(week, 1) -- comparing like for like weeks
    WHERE
        YEAR(date) = 2018 -- else week days from other years are pulled in
    AND
        WEEK(date, 1) >= 37 -- campaign start week is 37
    GROUP BY
        id, week_of_dfds
    """)

# comparison.head()


### The quick version (smaller data sample)
Remove missing pledge week values and keep only like for like weeks.

Pledges table only holds weeks for when a pledge was initially set or altered by a user.

In [0]:
#@title Table
# remove missing values
comparison_quick = comparison.dropna()

# change days pledged values from float to integer
comparison_quick = comparison_quick.astype(
    {'days_pledged': 'int'}) 

# group data by days pledged
comparison_quick = round(
    comparison_quick.groupby('days_pledged',
                             as_index=False).agg(
                                {
                                    'dfds_in_week': 'mean',
                                    'id': 'count'
                                }), 1)

# rename columns for clarity
comparison_quick.rename(columns={
                            'dfds_in_week': 'avg_achieved',
                            'id': 'num_of_users'
                        },
                        inplace=True)

# create new column with percentages
comparison_quick['%_of_target'] = round(
    comparison_quick['avg_achieved'] /
    comparison_quick['days_pledged']*100, 1)

comparison_quick
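
The drop, group and percentage steps above are repeated for several dataframes in this notebook. As a sketch only, they could be bundled into one helper; `summarise_by_pledge` is a hypothetical name, the toy data is made up, and the named-aggregation syntax assumes a reasonably recent Pandas version.

```python
import pandas as pd


def summarise_by_pledge(df):
    """Hypothetical helper: average drink free days per pledge size."""
    # drop rows without a pledge and make the pledge an integer
    out = df.dropna(subset=['days_pledged']).astype(
        {'days_pledged': 'int'})

    # one row per pledge size: mean days achieved and row count
    out = (out.groupby('days_pledged', as_index=False)
              .agg(avg_achieved=('dfds_in_week', 'mean'),
                   num_of_users=('id', 'count'))
              .round(1))

    # share of the pledge that was met on average
    out['%_of_target'] = (
        out['avg_achieved'] / out['days_pledged'] * 100).round(1)
    return out


# made-up sample in the same shape as the SQL result
toy = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'dfds_in_week': [1, 2, 3, 5],
    'days_pledged': [2.0, 2.0, 4.0, None],
})

summary = summarise_by_pledge(toy)
print(summary)
```

Unlike the cell above, this sketch drops rows only when `days_pledged` is missing, which is an assumption about what "missing pledge" should mean.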


In [0]:
#@title Chart
import altair as alt

bubble_quick = alt.Chart(
    comparison_quick,
    title='Are people meeting their pledges?').mark_point(
).encode(
    alt.X('days_pledged',
          title='Number of days pledged',
          scale=alt.Scale(domain=[0, 8])),
    alt.Y('%_of_target',
          title='% of pledge met (on average)'),
    size=alt.Size('num_of_users',
                  legend=alt.Legend(title='Number of users')),
    tooltip=['%_of_target',
             'num_of_users']
)

rule_quick = alt.Chart(
    comparison_quick).mark_rule(color='orange').encode(
    y='mean(%_of_target)',
    size=alt.value(2),
    tooltip=['mean(%_of_target)']
)

bubble_quick + rule_quick


### The longer version (larger data sample)

Replacing most missing values for pledges by copying the last set value into the weeks a user is clocking drink free days.

That's needed because the pledges table only holds weeks for when a pledge was initially set or altered.
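
The idea of copying the last set pledge forward can also be expressed without an explicit loop. The sketch below uses a per-user forward fill on made-up data; it is an alternative formulation and assumes each user's rows are in week order.

```python
import pandas as pd

# made-up sample: user 1 pledges in week 37 only,
# user 2 logs a week before setting any pledge
toy = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'week_of_dfds': [37, 38, 39, 37, 38],
    'days_pledged': [3, None, None, None, 5],
})

# carry the last known pledge forward within each user only
toy['days_pledged'] = toy.groupby('id')['days_pledged'].ffill()

print(toy)
```

Grouping by `id` before filling keeps one user's pledge from leaking into the next user's rows, and weeks before a user's first pledge stay empty so they can still be dropped.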

In [0]:
#@title Table
# reduce dataframe size for for loop
comparison = comparison.astype({'week_of_dfds': 'category',
                                'dfds_in_week': 'category',
                                'pledge_week': 'category',
                                'days_pledged': 'category'})

user = comparison['id']
pledge = comparison['days_pledged']

# copy the previous row's pledge into missing values of the same user
# start at 1 so that row 0 is never compared with the last row (index -1)
for i in range(1, len(comparison)):
    if pd.isnull(pledge.iloc[i]) and user.iloc[i] == user.iloc[i-1]:
        pledge.iloc[i] = pledge.iloc[i-1]

# remove rows with NaN in days_pledged
comparison = comparison.loc[
    pd.notna(comparison['days_pledged'])]

# change data types
comparison = comparison.astype(
    {
        'days_pledged': 'int',
        'dfds_in_week': 'int'
    })

# group data by days pledged
comparison = round(
    comparison.groupby('days_pledged',
                       as_index=False).agg(
                                {
                                    'dfds_in_week': 'mean',
                                    'id': 'count'
                                }), 1)

# rename columns for clarity
comparison.rename(columns={
                            'dfds_in_week': 'avg_achieved',
                            'id': 'num_of_users'
                          },
                  inplace=True)

# create new column with percentages
comparison['%_of_target'] = round(
    comparison['avg_achieved'] /
    comparison['days_pledged']*100, 1)

comparison


In [0]:
#@title Chart
# import altair as alt

bubble = alt.Chart(
    comparison,
    title='Are people meeting their pledges?').mark_point(
).encode(
    alt.X('days_pledged',
          title='Number of days pledged',
          scale=alt.Scale(domain=[0, 8])),
    alt.Y('%_of_target',
          title='% of pledge met (on average)'),
    size=alt.Size('num_of_users',
                  legend=alt.Legend(title="Number of users")),
    tooltip=['%_of_target',
             'num_of_users']
)

rule = alt.Chart(comparison).mark_rule(color='orange').encode(
    y='mean(%_of_target)',
    size=alt.value(2),
    tooltip=['mean(%_of_target)']
)

bubble + rule


## Are people meeting their pledges (pre campaign)?
Same as above (quick version) with pre-campaign weeks in 2018.

In [0]:
#@title SQL query
comparisonPRE = sql_to_df("""
    SELECT
        O.id,
        WEEK(date, 1) AS week_of_dfds, -- make sure week starts with Monday
        COUNT(DISTINCT date) AS dfds_in_week,
        WEEK(week, 1) AS pledge_week, -- make sure week starts with Monday
        daycount AS days_pledged
    FROM
        g_appdaysoff O
    LEFT JOIN
        g_apppledges P
    ON
        O.id=P.id
    AND
        WEEK(date, 1)=WEEK(week, 1) -- comparing like for like weeks
    WHERE
        YEAR(date) = 2018 -- else week days from other years are pulled in
    AND
        WEEK(date, 1) < 37 -- campaign start week is 37
    GROUP BY
        id, week_of_dfds
    """)

# comparisonPRE.head()


In [0]:
#@title Table
# remove missing values
comparison_quick2 = comparisonPRE.dropna()

# change days pledged values from float to integer
comparison_quick2 = comparison_quick2.astype({'days_pledged': 'int'}) 

# group data by days pledged
comparison_quick2 = round(
    comparison_quick2.groupby('days_pledged',
                              as_index=False).agg(
                                 {
                                     'dfds_in_week': 'mean',
                                     'id': 'count'
                                 }), 1)

# rename columns for clarity
comparison_quick2.rename(columns={
                             'dfds_in_week': 'avg_achieved',
                             'id': 'num_of_users'
                         },
                         inplace=True)

# create new column with percentages
comparison_quick2['%_of_target'] = round(
    comparison_quick2['avg_achieved'] /
    comparison_quick2['days_pledged']*100, 1)

comparison_quick2


In [0]:
#@title Chart
bubble2 = alt.Chart(
    comparison_quick2,
    title='Are people meeting their pledges?').mark_point(
).encode(
    alt.X('days_pledged',
          title='Number of days pledged',
          scale=alt.Scale(domain=[0, 8])),
    alt.Y('%_of_target',
          title='% of pledge met (on average)'),
    size=alt.Size('num_of_users',
                  legend=alt.Legend(title="Number of users")),
    tooltip=['%_of_target',
             'num_of_users']
)

rule2 = alt.Chart(comparison_quick2).mark_rule(color='orange').encode(
    y='mean(%_of_target)',
    size=alt.value(2),
    tooltip=['mean(%_of_target)']
)

bubble2 + rule2


## Are people increasing their pledges (during campaign)?
Data caveats, assumptions and methodology:
- users can join and not pledge until later
- **NB:** pledges table doesn't hold records for all weeks (just the last change to pledging)
- if people stopped using the app we assume their last pledged number of days remains (for simplicity) i.e. they didn't increase their pledging which is what we're interested in anyway
- dropping those who (in the weeks we're looking at):
 - do not pledge at all 
 - do not change their pledge at all

In [0]:
pledges = sql_to_df("""
    SELECT
        P.id,
        WEEK(week, 1) AS pledge_week,
        daycount AS days_pledged
    FROM
        g_apppledges P
    INNER JOIN
        g_appusers U
    ON
        P.id=U.id
    WHERE
        gender LIKE '%ale' -- exclude empty values in gender
    AND
        age BETWEEN 19 AND 79 -- 18 and 80+ are outliers
    AND
        YEAR(joined) = 2018 -- important for week number calculations
    AND
        WEEK(joined, 1) = 37 -- joined in the first week of campaign
    AND
        WEEK(lastseen, 1) >= 38 -- at least in the next week after launch
    ORDER BY
        id, pledge_week
        """)

pledges.head()


In [0]:
def pledge_columns(df=pledges):
    """
    Create columns for pledge weeks in range.
    """
    for week in range(37, 43):
        col_name = f"pledge_{week}"
        df.loc[df['pledge_week'] == week,
               col_name] = df['days_pledged']
        df.fillna(0, inplace=True)
    return df


In [0]:
result = pledge_columns()

# drop redundant columns
result.drop(columns=['pledge_week',
                     'days_pledged'],
            inplace=True)

# group dataframe by users
# call .sum() on the groupby object because
# there's at most one non-zero value per given week per user
result = result.groupby('id', as_index=False).sum()

result.head()


**Next step explanation:**

Because the pledges table doesn't hold data for all weeks a user is using the app we need to populate the subsequent weeks from the week a pledge was set or changed.

For simplicity, users who stopped using the app are given the last pledge value set for subsequent weeks. An alternative solution would be to only look at users who were last seen in the last (or a later) week of the period we're looking at.

**Question for Jimmy:** Can a pledge be set to zero? It seems not, since there are no zero values in the pledges table.
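
For illustration, the whole reshaping (one column per week, then carrying the last pledge forward) can be sketched with a pivot and a row-wise forward fill. The data is made up, and the sketch assumes at most one pledge row per user per week and that zero means "no pledge yet".

```python
import pandas as pd

# made-up pledge changes: user 1 pledges in weeks 37 and 40,
# user 2 only starts pledging in week 39
toy_long = pd.DataFrame({
    'id': [1, 1, 2],
    'pledge_week': [37, 40, 39],
    'days_pledged': [2, 4, 3],
})

# one row per user, one column per week in the period
wide = toy_long.pivot(index='id',
                      columns='pledge_week',
                      values='days_pledged')
wide = wide.reindex(columns=range(37, 43))

# carry the last pledge forward; weeks before the first pledge become 0
wide = wide.ffill(axis=1).fillna(0)
wide.columns = [f"pledge_{week}" for week in wide.columns]

print(wide)
```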

In [0]:
# range starts at earliest week+1
for week in range(38, 43):
    result.loc[(result[f"pledge_{week}"] == 0) &
               (result[f"pledge_{week-1}"] != 0),
               f"pledge_{week}"] = result[f"pledge_{week-1}"]

result.head()


In [0]:
# increased pledging or started to pledge
# if value in week 37 < avg. value in remaining weeks
increased = result.loc[result['pledge_37'] <
                       result.iloc[:, 2:].sum(axis=1) /
                       len(result.iloc[:, 2:].columns)]

increased.head()


In [0]:
# decreased pledging 
# if value in week 37 > avg. value in remaining weeks
decreased = result.loc[result['pledge_37'] >
                       result.iloc[:, 2:].sum(axis=1) /
                       len(result.iloc[:, 2:].columns)]

decreased.head()
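
The two comparisons above can also be combined into a single label per user. This is only a sketch on made-up data with three weeks instead of six; the `trend` column name is hypothetical.

```python
import numpy as np
import pandas as pd

# made-up sample in the same wide shape as the result dataframe
toy = pd.DataFrame({
    'id': [1, 2, 3],
    'pledge_37': [2, 4, 3],
    'pledge_38': [3, 4, 3],
    'pledge_39': [3, 2, 3],
})

# average pledge over the weeks after the first week
later_avg = (toy.filter(like='pledge_')
                .drop(columns='pledge_37')
                .mean(axis=1))

# first matching condition wins, everything else is 'unchanged'
toy['trend'] = np.select(
    [toy['pledge_37'] < later_avg,
     toy['pledge_37'] > later_avg],
    ['increased', 'decreased'],
    default='unchanged')

print(toy)
```

A label column makes it easy to count all three groups at once with `toy['trend'].value_counts()`.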


### Results

In [0]:
print(f"Total users: {pledges['id'].nunique()}")
      
print(f"Users who increased pledges: {len(increased)} \
({round(len(increased)/pledges['id'].nunique()*100, 1)}%)")

print(f"Users who decreased pledges: {len(decreased)} \
({round(len(decreased)/pledges['id'].nunique()*100, 1)}%)")


### Most common pledge settings and changes among app users

In [0]:
grouped = result.groupby(['pledge_37',
                          'pledge_38',
                          'pledge_39',
                          'pledge_40',
                          'pledge_41',
                          'pledge_42'],
                         as_index=False)['id'].count()

# rename column for clarity
grouped.rename(columns={'id': 'num_of_users'},
               inplace=True)

# create a percentage column
grouped['%_of_total'] = round(
    grouped['num_of_users'] /
    grouped['num_of_users'].sum()*100, 1)

# sort by the most common pledge journey
grouped.sort_values('%_of_total',
                    ascending=False).head(20)


### Melted result in a bar chart
Chart is not normalised - it assumes that nobody stops using the app for the purposes of pledge analysis. See notes under the report header.

In [0]:
# prepare data for stacked bar plot
melted = pd.melt(grouped,
                 id_vars=['num_of_users'],
                 value_vars=['pledge_37',
                             'pledge_38',
                             'pledge_39',
                             'pledge_40',
                             'pledge_41',
                             'pledge_42'],
                 value_name='days_pledged')

melted.head()


In [0]:
import altair as alt

alt.Chart(melted).mark_bar().encode(
    x='variable',
    y='sum(num_of_users)',
    color='days_pledged:N'
)

## User-level pledge journeys (visual data analysis)

In [0]:
# only keep users who either increase or decrease pledges
journeys = result.loc[result['pledge_37'] != result.iloc[:,1:].sum(axis=1) / 6]
# assign the result instead of modifying a filtered slice in place
journeys = journeys.set_index('id')
journeys.head()

In [0]:
journeys.loc[journeys['pledge_37'] == 0].T.plot(legend=False, 
                                                alpha=0.1, 
                                                figsize=(12,8), 
                                                title='User journeys with pledge at 0 in first week');

In [0]:
journeys.loc[journeys['pledge_37'] == 1].T.plot(legend=False, 
                                                alpha=0.3, 
                                                figsize=(12,8), 
                                                title='User journeys with pledge at 1 in first week');

In [0]:
journeys.loc[journeys['pledge_37'] == 2].T.plot(legend=False, 
                                                alpha=0.1, 
                                                figsize=(12,8), 
                                                title='User journeys with pledge at 2 in first week');

In [0]:
journeys.loc[journeys['pledge_37'] == 3].T.plot(legend=False, 
                                                alpha=0.1, 
                                                figsize=(12,8), 
                                                title='User journeys with pledge at 3 in first week');

In [0]:
journeys.loc[journeys['pledge_37'] == 4].T.plot(legend=False, 
                                                alpha=0.1, 
                                                figsize=(12,8), 
                                                title='User journeys with pledge at 4 in first week');

In [0]:
journeys.loc[journeys['pledge_37'] == 5].T.plot(legend=False, 
                                                alpha=0.1, 
                                                figsize=(12,8), 
                                                title='User journeys with pledge at 5 in first week');

In [0]:
journeys.loc[journeys['pledge_37'] == 6].T.plot(legend=False, 
                                                alpha=0.1, 
                                                figsize=(12,8), 
                                                title='User journeys with pledge at 6 in first week');

In [0]:
journeys.loc[journeys['pledge_37'] == 7].T.plot(legend=False, 
                                                alpha=0.1, 
                                                figsize=(12,8), 
                                                title='User journeys with pledge at 7 in first week');

In [0]:
journeys.loc[journeys['pledge_38'] == 0].T.plot(legend=False, 
                                                alpha=0.1, 
                                                figsize=(12,8), 
                                                title='User journeys with pledge at 0 in SECOND week');

# Old reports below - check if they are still necessary 

## Pledges overview in numbers
* number of all users (a.k.a. app downloads)
* number of users who have pledged 
* user conversion (from downloading the app to pledging)
* number of pledges made
* average pledges per pledging user
* average pledges per user across all users (a.k.a. per app download)
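
As a worked example of how these figures relate to each other, the sketch below computes them from three counts. The counts are invented for illustration and the function name is hypothetical.

```python
def pledge_overview(all_users, pledging_users, total_pledges):
    """Hypothetical helper: derive the overview ratios from raw counts."""
    return {
        'User conversion %': round(pledging_users / all_users * 100, 1),
        'Pledges/pledging user': round(total_pledges / pledging_users, 1),
        'Pledges/all users': round(total_pledges / all_users, 1),
    }


# invented counts, not real app data
overview = pledge_overview(all_users=200,
                           pledging_users=50,
                           total_pledges=120)
print(overview)
```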

### Lifetime of the app

In [0]:
# before the new app version June 2018

pledgesOverview = sql_to_df(
    """
    select
        count(distinct g_appusers.id) as 'All users',
        count(distinct g_apppledges.id) as 'Pledging users',
        round(count(distinct g_apppledges.id) / (count(distinct g_appusers.id) /100), 1) as 'User conversion %',
        count(g_apppledges.id) as 'Total pledges',
        round(count(g_apppledges.id) / count(distinct g_apppledges.id), 1) as 'Pledges/pledging user',
        round(count(g_apppledges.id) / count(distinct g_appusers.id), 1) as 'Pledges/all users'
    from
        daysoff.g_apppledges right join daysoff.g_appusers
        on g_apppledges.id=g_appusers.id
    where
        joined < '2018-06-01' -- before the new app version
    """)

pledgesOverview

### Campaign period
Fill in the dates in the 'where' clause of the SQL query below as necessary.

In [0]:
pledgesOverviewCampaign = sql_to_df(
    """
    select
        count(distinct g_appusers.id) as 'All users',
        count(distinct g_apppledges.id) as 'Pledging users',
        round(count(distinct g_apppledges.id) / (count(distinct g_appusers.id) /100), 1) as 'User conversion %',
        count(g_apppledges.id) as 'Total pledges',
        round(count(g_apppledges.id) / count(distinct g_apppledges.id), 1) as 'Pledges/pledging user',
        round(count(g_apppledges.id) / count(distinct g_appusers.id), 1) as 'Pledges/all users'
    from
        daysoff.g_apppledges right join daysoff.g_appusers
        on g_apppledges.id=g_appusers.id
    where
        joined between '2018-09-03' and '2018-09-17'
    """)

pledgesOverviewCampaign

## Pledges on timelines

In [0]:
pledges = sql_to_df(
    """
    select
        id as 'user id', 
        week as 'pledged week', 
        week(week) as 'week number',
        month(week) as 'month number',
        daycount as 'days pledged'
    from
        daysoff.g_apppledges
    """)

pledges.head()

In [0]:
pledges.iloc[:,[3,4]].corr()

In [0]:
# converting strings to datetime 

pledges['pledged week'] = pd.to_datetime(pledges['pledged week'])

### Pledges by week number (2017 and 2018 on the same axis)

In [0]:
pledges_byWeekNumber = pledges.groupby('week number')['user id'].count()

pledges_byWeekNumber.plot(title='Number of pledges by week number (2017 and 2018 on the same axis)');

### Pledges by month (2017 and 2018 on the same axis)

In [0]:
pledges_byMonthNumber = pledges.groupby('month number')['user id'].count()

pledges_byMonthNumber.plot(title='Number of pledges by month (2017 and 2018 on the same axis)');

### Pledges by calendar week

In [0]:
pledges_byCalendarWeek = pledges.groupby('pledged week')['user id'].count()

pledges_byCalendarWeek.plot(title='Number of pledges - app history timeline');

## How many days do people pledge

In [0]:
pledges_daysPledged = pledges.groupby('days pledged')['user id'].count()

pledges_daysPledged.plot(title='How many users pledge how many days');

In [0]:
#@title
pledgesCampaign = sql_to_df(
    """
    select
        g_apppledges.id as 'user id', 
        week as 'pledged week', 
        week(week) as 'week number',
        month(week) as 'month number',
        daycount as 'days pledged'
    from
        daysoff.g_apppledges right join daysoff.g_appusers
        on g_apppledges.id=g_appusers.id
    where
        joined between '2018-09-03' and '2018-09-13'
    """)


In [0]:
#@title
pledges_daysPledged = pledgesCampaign.groupby('days pledged')['user id'].count()

pledges_daysPledged.plot(title='How many users pledge how many days (campaign period)');

## Pledge variation across months
Do people pledge more or fewer days depending on what time of year it is?

In [0]:
pledges_variationMonth = pledges.groupby(['month number', 'days pledged'], as_index=False)['user id'].count()

### Altair Viz charts

For customisation of Altair charts [see documentation](https://altair-viz.github.io/user_guide/customization.html).

For different colour schemes, replace 'scheme name' with a string that matches any of the available [Vega color schemes](https://vega.github.io/vega/docs/schemes/#reference).

```alt.Color('column name', scale=alt.Scale(scheme='scheme name'))```  

In [0]:
import altair as alt

In [0]:
alt.Chart(pledges_variationMonth, title='Days pledged by calendar month').mark_line().encode(
    x='days pledged', 
    y=alt.Y('user id', axis=alt.Axis(title='number of users')), 
    # y='user id',
    color=alt.Color('month number:N', scale=alt.Scale(scheme='tableau20'))
)

In [0]:
#@title
# get totals for each month
pledges_variationMonthsGrouped = pledges_variationMonth.groupby('month number', as_index=False)['user id'].sum()

# get both values for percentage calculation into one table
pledges_variationMonthNorm = pd.merge(pledges_variationMonth, 
                                      pledges_variationMonthsGrouped, 
                                      how='inner', 
                                      on='month number', 
                                      suffixes=('_mth', '_sum'))

# calculate percentages
pledges_variationMonthNorm['percent of total users each month'] = pledges_variationMonthNorm['user id_mth'] / (pledges_variationMonthNorm['user id_sum'] / 100)

# plot
alt.Chart(pledges_variationMonthNorm, title='Days pledged by calendar month - normalised').mark_line().encode(
    x='days pledged',
    y='percent of total users each month',
    color=alt.Color('month number:N', scale=alt.Scale(scheme='tableau20')),
    # opacity=alt.value(0.5),
)
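
The normalisation above (each month's counts as a percentage of that month's total) can also be sketched without the merge, using a group-wise `transform`. The data and the short column names are made up for illustration.

```python
import pandas as pd

# made-up counts of users per month and pledge size
toy = pd.DataFrame({
    'month': [1, 1, 2],
    'days_pledged': [2, 3, 2],
    'users': [1, 3, 5],
})

# total users per month, repeated on every row of that month
month_total = toy.groupby('month')['users'].transform('sum')

# share of the month's users at each pledge size
toy['percent_of_month'] = toy['users'] / month_total * 100

print(toy)
```

Because `transform` returns a series aligned with the original rows, no suffixes or join keys are needed.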


## Pledge days

### Most popular day combinations
1 = Monday ... 0 = Sunday

In [0]:
pledgeDayCombi = sql_to_df(
    """
    select
        count(distinct id) as 'number of users',
        daycount as 'number of days pledged',
        days as 'day combinations'
    from
        g_apppledges 
    group by 
        days, daycount
    order by
        count(distinct id) desc, daycount desc 
    """)

pledgeDayCombi.head(10)

In [0]:
alt.Chart(pledgeDayCombi).mark_bar().encode(
    #y='day combinations',
    #x='number of users'
    x=alt.X('number of users', sort="ascending"),
    y=alt.Y('day combinations', sort="ascending")
)

### Most popular day combinations
1 = Monday ... 0 = Sunday

In [0]:
pledgeDays = sql_to_df(
    """
    select
        count(distinct id) as 'number of users',
        days as 'day combinations'
    from
        g_apppledges 
    group by 
        days
    order by
        count(distinct id) desc
    """)

pledgeDays.head()