<a href="https://colab.research.google.com/github/janilles/dfdapp/blob/master/dfd_pledges.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Drink Free Days app

# PLEDGES ANALYSIS

Questions answered in this notebook:

- What's the conversion from downloading the app to pledging?
- Are people meeting their pledges?
- Are people increasing their pledges?
- What are the most popular pledge day combinations?

# Credentials to run the notebook

## Google Drive authentication (optional step)
NOTE: If login credentials are hardcoded into the database connection (code cell below), this step is not necessary. Otherwise:

Install and authenticate [PyDrive](https://pythonhosted.org/PyDrive/index.html) for loading files from Google Drive so that database passwords aren't hardcoded into the notebook.


In [0]:
# added -q for suppressing output
!pip install -U -q PyDrive

# see PyDrive documentation for libraries code snippets
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# authenticate and create the PyDrive client
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)


## Database connection
- Connecting to AWS RDS database with [PyMySQL](https://pymysql.readthedocs.io/en/latest/user/examples.html).
- Returning MySQL queries as Pandas dataframes with the [```read_sql()```](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html) function.
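
As a rough illustration of the second point, the sketch below passes a SQL string to ```read_sql()``` and gets a dataframe back. It uses an in-memory SQLite database purely as a stand-in so that it runs without credentials; the notebook itself connects through PyMySQL. The table `demo_pledges`, its rows and the helper `demo_sql_to_df` are made up for the example.

```python
import sqlite3
import pandas as pd

# in-memory SQLite stands in for the MySQL database (assumption for the demo)
demo_connection = sqlite3.connect(':memory:')
demo_connection.execute(
    'CREATE TABLE demo_pledges (id INTEGER, daycount INTEGER)')
demo_connection.executemany(
    'INSERT INTO demo_pledges VALUES (?, ?)',
    [(1, 2), (2, 4), (3, 3)])
demo_connection.commit()


def demo_sql_to_df(sql, con=demo_connection):
    """Run a SQL string and return the result as a dataframe."""
    return pd.read_sql(sql, con=con)


demo_df = demo_sql_to_df(
    'SELECT id, daycount FROM demo_pledges '
    'WHERE daycount >= 3 ORDER BY id')
print(demo_df)
```

Keeping the connection in one place and wrapping ```read_sql()``` in a small helper means each report cell only has to supply the query text.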

In [0]:
# added -q for suppressing output
!pip install -q -U pymysql

import pymysql
import pandas as pd


In [0]:
# only run if using PyDrive: 'id' is Google Drive file ID
hostnm_file = drive.CreateFile(
    {'id': '1lL_DFWe2F0yNTZRJ3sq5bpJeHgCTqxMn'})
usernm_file = drive.CreateFile(
    {'id': '1l0NedyVzKKhPJ1-_cOqF1VRt_oQyr8OL'})
passwd_file = drive.CreateFile(
    {'id': '1YnGugBHvqjJk0nbTqN-683Agb0vaZKHo'})
dbname_file = drive.CreateFile(
    {'id': '1_mZ3aYMcWdRKKJXRud4sPwnQc8vVgznC'})

# variables used in the connect function below
host_name = hostnm_file.GetContentString()
user_name = usernm_file.GetContentString()
user_passwd = passwd_file.GetContentString()
db_name = dbname_file.GetContentString()


In [0]:
def connect():
    return pymysql.connect(
        host=host_name,
        port=1313,
        user=user_name,
        passwd=user_passwd,
        db=db_name,
        autocommit=True
    )

connection = connect()

def sql_to_df(sql):
    return pd.read_sql(sql, con=connection)


# Database tables used in reports (optional step)
Overview of available data and tables used in the MySQL queries below. See [MySQL documentation](https://dev.mysql.com/doc/refman/8.0/en/introduction.html) for MySQL syntax.

In [0]:
# formatting column width of Pandas dataframes
# increase column width so that longer comments don't get truncated
pd.set_option('max_colwidth', 100)


### Pledges table

In [0]:
# run pd.set_option('max_colwidth', 100) if comments column gets truncated
sql_to_df("""
        SELECT
            table_name, column_name, data_type, column_comment
        FROM
            information_schema.columns
        WHERE
            table_name = 'g_apppledges'
        """)


### Days off table (drink free days)

In [0]:
# run pd.set_option('max_colwidth', 100) if comments column gets truncated
sql_to_df("""
        SELECT
            table_name, column_name, data_type, column_comment
        FROM
            information_schema.columns
        WHERE
            table_name = 'g_appdaysoff'
        """)


### App users table

In [0]:
# run pd.set_option('max_colwidth', 100) if comments column gets truncated
sql_to_df("""
        SELECT
            table_name, column_name, data_type, column_comment
        FROM
            information_schema.columns
        WHERE
            table_name = 'g_appusers'
        """)


# Reports
SQL queries are passed as strings to the ```sql_to_df()``` function defined in the database connection section above.  
See [MySQL documentation](https://dev.mysql.com/doc/refman/8.0/en/introduction.html) for SQL syntax reference.

## Conversion from downloading the app to pledging
- **NB for SQL query:** Pledges table only ever holds one week for each user
- 'downloading' = installing the app i.e. user ID appearing in the database.
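
A minimal sketch of how the conversion rate in the queries below is derived, using pandas and made-up data instead of the database. The frames `users_toy` and `pledges_toy` are hypothetical stand-ins for `g_appusers` and `g_apppledges`; the merge keeps every user whether or not they pledged, which is what the RIGHT JOIN onto the users table does in the SQL.

```python
import pandas as pd

# made-up stand-ins for g_appusers and g_apppledges
users_toy = pd.DataFrame({'id': [1, 2, 3, 4]})
pledges_toy = pd.DataFrame({'id': [1, 3], 'daycount': [2, 4]})

# keep every user, pledging or not; users without a pledge get NaN
merged = users_toy.merge(pledges_toy, on='id', how='left')

all_users = merged['id'].nunique()
pledging_users = merged.loc[merged['daycount'].notna(), 'id'].nunique()

# same arithmetic as the SQL: pledging users per 100 users
conversion_rate = round(pledging_users / (all_users / 100), 1)

print(all_users, pledging_users, conversion_rate)  # 4 2 50.0
```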

### During the first four weeks of campaign
Only users who downloaded the app at any point in 2018 and last used it in the fourth week of the campaign (week 40) or later.

In [0]:
#@title SQL query to dataframe
conversion_campaign = sql_to_df(
    """
    SELECT
        COUNT(DISTINCT U.id) AS all_users,
        COUNT(DISTINCT P.id) AS of_which_pledging_users,
        ROUND(COUNT(DISTINCT P.id) / (COUNT(DISTINCT U.id) /100), 1) AS conversion_rate,
        CAST(ROUND(SUM(P.sum_days), 1) AS UNSIGNED) AS sum_days_pledged,
        ROUND(AVG(P.avg_days), 1) AS avg_per_user_per_wk
    FROM
        (SELECT
            id,
            AVG(daycount) AS avg_days,
            SUM(daycount) AS sum_days
        FROM
            g_apppledges
        WHERE
            YEAR(week) = 2018
        AND
            WEEK(week, 1) BETWEEN 37 AND 40 -- first 4 weeks of campaign
            
        GROUP BY
            id) P
    RIGHT JOIN 
        g_appusers U
    ON
        P.id=U.id
    WHERE
        YEAR(joined) = 2018
    AND
        WEEK(lastseen, 1) >= 40
    """)

conversion_campaign

### Users who downloaded the app during the first week of campaign
The query takes these users' values from the pledges table for however long they’ve used the app: the pledges subquery has no week filter, and the right join keeps every user who satisfies the “joined week = 37” condition.

In [0]:
#@title SQL query to dataframe
conversion = sql_to_df(
    """
    SELECT
        COUNT(DISTINCT U.id) AS all_users,
        COUNT(DISTINCT P.id) AS of_which_pledging_users,
        ROUND(COUNT(DISTINCT P.id) / (COUNT(DISTINCT U.id) /100), 1) AS conversion_rate,
        CAST(ROUND(SUM(P.sum_days), 1) AS UNSIGNED) AS sum_days_pledged,
        ROUND(AVG(P.avg_days), 1) AS avg_per_user_per_wk
    FROM
        (SELECT
            id,
            AVG(daycount) AS avg_days,
            SUM(daycount) AS sum_days
        FROM
            g_apppledges
        GROUP BY
            id) P
    RIGHT JOIN 
        g_appusers U
    ON
        P.id=U.id
    WHERE
        YEAR(joined) = 2018
    AND
        WEEK(joined, 1) = 37 -- first week of campaign
    """)

conversion

### Users who downloaded the app pre-campaign launch
Downloads of all of 2018 up to launch in week 37.

In [0]:
#@title SQL query to dataframe
conversion_pre = sql_to_df(
    """
    SELECT
        COUNT(DISTINCT U.id) AS all_users,
        COUNT(DISTINCT P.id) AS of_which_pledging_users,
        ROUND(COUNT(DISTINCT P.id) / (COUNT(DISTINCT U.id) /100), 1) AS conversion_rate,
        CAST(ROUND(SUM(P.sum_days), 1) AS UNSIGNED) AS sum_days_pledged,
        ROUND(AVG(P.avg_days), 1) AS avg_per_user_per_wk
    FROM
        (SELECT
            id,
            AVG(daycount) AS avg_days,
            SUM(daycount) AS sum_days
        FROM
            g_apppledges
        GROUP BY
            id) P
    RIGHT JOIN 
        g_appusers U
    ON
        P.id=U.id
    WHERE
        YEAR(joined) = 2018
    AND
        WEEK(joined, 1) < 37 -- before the first week of campaign (week 37)
    
    """)

conversion_pre

### Users who downloaded the app pre-campaign launch AND kept using it at least a week after campaign launch

In [0]:
#@title SQL to dataframe
conversion_pre_post = sql_to_df(
    """
    SELECT
        COUNT(DISTINCT U.id) AS all_users,
        COUNT(DISTINCT P.id) AS of_which_pledging_users,
        ROUND(COUNT(DISTINCT P.id) / (COUNT(DISTINCT U.id) /100), 1) AS conversion_rate,
        CAST(ROUND(SUM(P.sum_days), 1) AS UNSIGNED) AS sum_days_pledged,
        ROUND(AVG(P.avg_days), 1) AS avg_per_user_per_wk
    FROM
        (SELECT
            id,
            AVG(daycount) AS avg_days,
            SUM(daycount) AS sum_days
        FROM
            g_apppledges
        GROUP BY
            id) P
    RIGHT JOIN 
        g_appusers U
    ON
        P.id=U.id
    WHERE
        YEAR(joined) = 2018
    AND
        WEEK(joined, 1) < 37 -- before the first week of campaign (week 37)
    AND
        WEEK(lastseen, 1) >= 38
    
    """)

conversion_pre_post

### Users who downloaded the app pre-campaign launch AND stopped using it pre-campaign launch

In [0]:
#@title SQL to dataframe
conversion_pre_pre = sql_to_df(
    """
    SELECT
        COUNT(DISTINCT U.id) AS all_users,
        COUNT(DISTINCT P.id) AS of_which_pledging_users,
        ROUND(COUNT(DISTINCT P.id) / (COUNT(DISTINCT U.id) /100), 1) AS conversion_rate,
        CAST(ROUND(SUM(P.sum_days), 1) AS UNSIGNED) AS sum_days_pledged,
        ROUND(AVG(P.avg_days), 1) AS avg_per_user_per_wk
    FROM
        (SELECT
            id,
            AVG(daycount) AS avg_days,
            SUM(daycount) AS sum_days
        FROM
            g_apppledges
        GROUP BY
            id) P
    RIGHT JOIN 
        g_appusers U
    ON
        P.id=U.id
    WHERE
        YEAR(joined) = 2018
    AND
        WEEK(joined, 1) < 37 -- before the first week of campaign (week 37)
    AND
        WEEK(lastseen, 1) < 37
    
    """)

conversion_pre_pre

## Are people meeting their pledges (during campaign)?
Comparison of the number of days pledged and actual drink free days achieved. 

**Method:**  
Drink free days achieved are aggregated by week number and then week numbers of pledges are matched to week numbers of drink free days.

**NB:** 
- Pledge table only holds week number for a user when a pledge is set or changed - not for all weeks a user uses the app.
- Default MySQL week begins with Sunday, so to make weeks begin with Monday use: ```WEEK(date, 1)``` instead of ```WEEK(date)```
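
To see why the week start matters, here is a small Python sketch around the campaign launch. It assumes that, for mid-year dates like these, the ISO week number agrees with ```WEEK(date, 1)``` and that `strftime('%U')` (Sunday-first weeks) agrees with the default ```WEEK(date)```; the helper names are made up for the example.

```python
from datetime import date


def monday_first_week(d):
    """ISO week number, weeks start on Monday."""
    return d.isocalendar()[1]


def sunday_first_week(d):
    """Week number with Sunday as the first day of the week."""
    return int(d.strftime('%U'))


sunday_before_launch = date(2018, 9, 9)
launch_monday = date(2018, 9, 10)

# Monday-first: the Sunday still belongs to the previous week
print(monday_first_week(sunday_before_launch),
      monday_first_week(launch_monday))   # 36 37

# Sunday-first: the Sunday and the following Monday share a week
print(sunday_first_week(sunday_before_launch),
      sunday_first_week(launch_monday))   # 36 36
```

With Sunday-first weeks, a drink free day logged on a Sunday would be counted towards the following Monday-to-Saturday stretch, so pledges and drink free days would no longer be compared over the same seven days.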

In [0]:
#@title SQL query { vertical-output: true }
comparison = sql_to_df("""
    SELECT
        O.id,
        WEEK(date, 1) AS week_of_dfds, -- make sure week starts with Monday
        WEEK(week, 1) AS pledge_week, -- make sure week starts with Monday
        COUNT(date) AS dfds_in_week,        
        daycount AS days_pledged_in_week
    FROM
        g_appdaysoff O
    LEFT JOIN -- so that week numbers higher than pledge weeks are included
        g_apppledges P
    ON
        O.id=P.id
    AND
        WEEK(date, 1)=WEEK(week, 1) -- comparing like for like weeks
    WHERE
        YEAR(date) = 2018 -- else week days from other years are pulled in
    AND
        WEEK(date, 1) >= 37 -- campaign start week is 37
    GROUP BY
        id, week_of_dfds
    """)

comparison.head()


### The quick version (smaller data sample)
Remove missing pledge week values and keep only like for like weeks.

Pledges table only holds weeks for when a pledge was initially set or altered by a user.  

The numbers of users across categories do not add up to the total number of users, because users who alter their pledge are counted in more than one category.
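
The double counting can be reproduced with a few made-up rows (the frame `toy` below is hypothetical, with the same column names as the query result). User 1 pledges 2 days in one week and 3 days in another, so they are counted once in each category.

```python
import pandas as pd

# made-up rows: user 1 changes their pledge from 2 to 3 days
toy = pd.DataFrame({
    'id': [1, 1, 2, 3],
    'days_pledged_in_week': [2, 3, 2, 3],
    'dfds_in_week': [2, 1, 1, 3],
})

per_pledge = toy.groupby('days_pledged_in_week', as_index=False).agg(
    {'dfds_in_week': 'mean', 'id': 'nunique'})

users_summed_over_categories = per_pledge['id'].sum()
distinct_users = toy['id'].nunique()

print(per_pledge)
print(users_summed_over_categories, distinct_users)  # 4 3
```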

In [0]:
#@title Table
# remove missing values that are there because of LEFT JOIN
comparison_quick = comparison.dropna(inplace=False)

# change days pledged values from float to integer
comparison_quick = comparison_quick.astype(
    {'days_pledged_in_week': 'int'}) 

# group data by days pledged
comparison_quick = round(
    comparison_quick.groupby('days_pledged_in_week',
                             as_index=False).agg(
                                {
                                    'dfds_in_week': 'mean',
                                    'id': 'nunique'
                                }), 1)

# rename columns for clarity
comparison_quick.rename(columns={
                            'dfds_in_week': 'avg_achieved',
                            'id': 'num_of_users'
                        },
                        inplace=True)

# create new column with percentages
comparison_quick['%_of_target'] = round(
    comparison_quick['avg_achieved'] /
    comparison_quick['days_pledged_in_week']*100, 1)

comparison_quick


In [0]:
#@title Chart
import altair as alt

bubble_quick = alt.Chart(
    comparison_quick,
    title='Are people meeting their pledges?').mark_point(
).encode(
    alt.X('days_pledged_in_week',
          title='Number of days pledged',
          scale=alt.Scale(domain=[0, 8])),
    alt.Y('%_of_target',
          title='% of pledge met (on average)'),
    size=alt.Size('num_of_users',
                  legend=alt.Legend(title='Number of users')),
    tooltip=['%_of_target',
             'num_of_users']
)

rule_quick = alt.Chart(
    comparison_quick).mark_rule(color='orange').encode(
    y='mean(%_of_target)',
    size=alt.value(2),
    tooltip=['mean(%_of_target)']
)

bubble_quick + rule_quick


### The longer version (larger data sample)

Replacing most missing values for pledges by copying the last set value into the weeks a user is clocking drink free days.

That's needed because the pledges table only holds weeks for when a pledge was initially set or altered.

**Q: Does the data frame need sorting by id and week numbers first?**
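
On the question above, a small experiment with made-up data suggests that the order does matter: carrying the last pledge forward only works if each user's rows are adjacent and in week order. The sketch uses pandas' `groupby(...).ffill()` as one possible way to do the fill; the frame `toy` is hypothetical.

```python
import numpy as np
import pandas as pd

# made-up rows, deliberately out of order
toy = pd.DataFrame({
    'id': [1, 2, 1, 2],
    'week_of_dfds': [38, 37, 37, 38],
    'days_pledged_in_week': [np.nan, 3.0, 2.0, np.nan],
})

# filling in the original order: user 1's week 38 comes before
# their week 37 pledge, so there is nothing to carry forward yet
unsorted_fill = toy.groupby('id')['days_pledged_in_week'].ffill()

# sorting by user and week first puts the pledge before the gap
ordered = toy.sort_values(['id', 'week_of_dfds']).reset_index(drop=True)
ordered['days_pledged_in_week'] = (
    ordered.groupby('id')['days_pledged_in_week'].ffill())

print(unsorted_fill.isna().sum())                   # 1
print(ordered['days_pledged_in_week'].isna().sum())  # 0
```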

In [0]:
comparison.head()

In [0]:
# @title Table
# sort so that each user's weeks are adjacent and in order;
# the SQL query above has no ORDER BY, so row order is not guaranteed
comparison_long = comparison.sort_values(
    ['id', 'week_of_dfds']).reset_index(drop=True)

# fill missing days_pledged with the same user's last set value
# (weeks before a user's first pledge stay NaN and are dropped below)
comparison_long['days_pledged_in_week'] = (
    comparison_long.groupby('id')['days_pledged_in_week'].ffill())

# remove rows with NaN in days_pledged
comparison_long = comparison_long.loc[
    pd.notna(comparison_long['days_pledged_in_week'])]

# change data types
comparison_long = comparison_long.astype(
    {
        'days_pledged_in_week': 'int',
        'dfds_in_week': 'int'
    })

# group data by days pledged
comparison_long = round(
    comparison_long.groupby('days_pledged_in_week',
                            as_index=False).agg(
                                   {
                                         'dfds_in_week': 'mean',
                                         'id': 'nunique'
                                    }), 1)

# rename columns for clarity
comparison_long.rename(columns={
                                 'dfds_in_week': 'avg_achieved',
                                 'id': 'num_of_users'
                               },
                       inplace=True)

# create new column with percentages
comparison_long['%_of_target'] = round(
    comparison_long['avg_achieved'] /
    comparison_long['days_pledged_in_week']*100, 1)

comparison_long


In [0]:
# @title Chart
# import altair as alt

bubble = alt.Chart(
    comparison_long,
    title='Are people meeting their pledges?').mark_point(
).encode(
    alt.X('days_pledged_in_week',
          title='Number of days pledged',
          scale=alt.Scale(domain=[0, 8])),
    alt.Y('%_of_target',
          title='% of pledge met (on average)'),
    size=alt.Size('num_of_users',
                  legend=alt.Legend(title="Number of users")),
    tooltip=['%_of_target',
             'num_of_users']
)

rule = alt.Chart(comparison_long).mark_rule(color='orange').encode(
    y='mean(%_of_target)',
    size=alt.value(2),
    tooltip=['mean(%_of_target)']
)

bubble + rule


## Are people meeting their pledges (pre campaign)?
Same as above (quick version) with pre-campaign weeks in 2018.

In [0]:
# @title SQL query
comparisonPRE = sql_to_df("""
    SELECT
        O.id,
        WEEK(date, 1) AS week_of_dfds, -- make sure week starts with Monday
        WEEK(week, 1) AS pledge_week, -- make sure week starts with Monday
        COUNT(date) AS dfds_in_week,        
        daycount AS days_pledged_in_week
    FROM
        g_appdaysoff O
    LEFT JOIN
        g_apppledges P
    ON
        O.id=P.id
    AND
        WEEK(date, 1)=WEEK(week, 1) -- comparing like for like weeks
    WHERE
        YEAR(date) = 2018 -- else week days from other years are pulled in
    AND
        WEEK(date, 1) < 37 -- campaign start week is 37
    GROUP BY
        id, week_of_dfds
    """)

# comparisonPRE.head()


In [0]:
# @title Table
# remove missing values
comparison_quick2 = comparisonPRE.dropna(inplace=False)

# change days pledged values from float to integer
comparison_quick2 = comparison_quick2.astype({'days_pledged_in_week': 'int'}) 

# group data by days pledged
comparison_quick2 = round(
    comparison_quick2.groupby('days_pledged_in_week',
                              as_index=False).agg(
                                 {
                                     'dfds_in_week': 'mean',
                                     'id': 'nunique'
                                 }), 1)

# rename columns for clarity
comparison_quick2.rename(columns={
                             'dfds_in_week': 'avg_achieved',
                             'id': 'num_of_users'
                         },
                         inplace=True)

# create new column with percentages
comparison_quick2['%_of_target'] = round(
    comparison_quick2['avg_achieved'] /
    comparison_quick2['days_pledged_in_week']*100, 1)

comparison_quick2


In [0]:
#@title Chart
bubble2 = alt.Chart(
    comparison_quick2,
    title='Are people meeting their pledges?').mark_point(
).encode(
    alt.X('days_pledged_in_week',
          title='Number of days pledged',
          scale=alt.Scale(domain=[0, 8])),
    alt.Y('%_of_target',
          title='% of pledge met (on average)'),
    size=alt.Size('num_of_users',
                  legend=alt.Legend(title="Number of users")),
    tooltip=['%_of_target',
             'num_of_users']
)

rule2 = alt.Chart(comparison_quick2).mark_rule(color='orange').encode(
    y='mean(%_of_target)',
    size=alt.value(2),
    tooltip=['mean(%_of_target)']
)

bubble2 + rule2


## Are people increasing their pledges (during campaign)?
Data caveats, assumptions and methodology:
- users can join and not pledge until later
- user selection details: see the WHERE conditions in the SQL query
- pledges table doesn't hold records for all weeks (just the last change of pledging)
- if people stopped using the app we assume their last pledged number of days remains (for simplicity), i.e. they didn't increase their pledging, which is what we're interested in anyway
- dropping those who (in the weeks we're looking at):
  - do not pledge at all
  - do not change their pledge at all

In [0]:
#@title SQL query
pledges = sql_to_df("""
    SELECT
        P.id,
        WEEK(week, 1) AS pledge_week,
        daycount AS days_pledged
    FROM
        g_apppledges P
    INNER JOIN
        g_appusers U
    ON
        P.id=U.id
    WHERE
        gender LIKE '%ale' -- exclude empty values in gender
    AND
        age BETWEEN 19 AND 79 -- 18 and 80+ are outliers
    AND
        YEAR(joined) = 2018 -- important for week number calculations
    AND
        WEEK(joined, 1) = 37 -- joined in the first week of campaign
    AND
        WEEK(lastseen, 1) >= 38 -- at least in the next week after launch
    ORDER BY
        id, pledge_week
        """)

# pledges.head()


#### Dataframe manipulation

In [0]:
def pledge_columns(df=pledges):
    """
    Create columns for pledge weeks in range.
    """
    for week in range(37, 43):
        col_name = f"pledge_{week}"
        df.loc[df['pledge_week'] == week,
               col_name] = df['days_pledged']
        df.fillna(0, inplace=True)
    return df

# call the function
pledge_columns();


In [0]:
# drop redundant columns
pledges.drop(columns=['pledge_week',
                     'days_pledged'],
            inplace=True)

# group dataframe by users
# call .sum() on the groupby object because
# there's at most one non-zero value per given week per user
pledges = pledges.groupby('id', as_index=False).sum()

pledges.head()


**Next step explanation:**

Because the pledges table doesn't hold data for all weeks a user is using the app we need to populate the subsequent weeks from the week a pledge was set or changed.

For simplicity, users who stopped using the app are given the last pledge value set for subsequent weeks. An alternative would be to only look at users who were last seen in the last week of the period we're looking at, or later.

Note: Pledge can't be set to zero.
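
The carry-forward rule can be sketched on made-up data as follows. This is an alternative formulation, not the notebook's own loop: because a pledge can't be zero, a zero can be treated as "no record for that week" and filled from the left. The frame `wide` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# made-up wide table: 0 means no pledge record for that week
wide = pd.DataFrame({
    'id': [1, 2],
    'pledge_37': [2, 0],
    'pledge_38': [0, 0],
    'pledge_39': [4, 3],
    'pledge_40': [0, 0],
})

week_cols = [c for c in wide.columns if c.startswith('pledge_')]

# treat zeros as missing, carry the last pledge forward along each row,
# then restore zeros for weeks before a user's first pledge
wide[week_cols] = (wide[week_cols]
                   .replace(0, np.nan)
                   .ffill(axis=1)
                   .fillna(0)
                   .astype(int))

print(wide)
```

User 1 ends up with 2, 2, 4, 4 and user 2 with 0, 0, 3, 3: values are only carried forward, never backward, so weeks before the first pledge stay at zero.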

In [0]:
# range starts at earliest week+1
for week in range(38, 43):
    pledges.loc[(pledges[f"pledge_{week}"] == 0) &
                (pledges[f"pledge_{week-1}"] != 0),
                f"pledge_{week}"] = pledges[f"pledge_{week-1}"]

pledges.head()


In [0]:
# increased pledging or started to pledge
# if value in week 37 < avg. value in remaining weeks
increased = pledges.loc[pledges['pledge_37'] <
                        pledges.iloc[:, 2:].sum(axis=1) /
                        len(pledges.iloc[:, 2:].columns)]

# increased.head()


In [0]:
# decreased pledging 
# if value in week 37 > avg. value in remaining weeks
decreased = pledges.loc[pledges['pledge_37'] >
                        pledges.iloc[:, 2:].sum(axis=1) /
                        len(pledges.iloc[:, 2:].columns)]

# decreased.head()


### Results in counts and percentages

In [0]:
#@title
print(f"Total users: {pledges['id'].nunique()}")
      
print(f"Users who increased pledges: {len(increased)} \
({round(len(increased)/pledges['id'].nunique()*100, 1)}%)")

print(f"Users who decreased pledges: {len(decreased)} \
({round(len(decreased)/pledges['id'].nunique()*100, 1)}%)")


### Most common pledge settings and changes among app users

In [0]:
#@title
# aggregate by pledge columns and add count of users
grouped = pledges.groupby(by=pledges.columns[1:].tolist(),
                as_index=False)['id'].count()

# rename column for clarity
grouped.rename(columns={'id': 'num_of_users'},
               inplace=True)

# create a percentage column
grouped['%_of_total'] = round(
    grouped['num_of_users'] /
    grouped['num_of_users'].sum()*100, 1)

# sort by the most common pledge journey
grouped.sort_values('%_of_total',
                    ascending=False).head(20)


## Pledge days

### Most popular day combinations
1 = Monday ... 0 = Sunday
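
For reading the results, day codes can be mapped to names. The sketch below assumes, hypothetically, that the `days` column stores the codes as characters in a string such as `'135'` or `'1,3,5'`; the actual storage format is not shown in this notebook, so check it against the table before relying on this. `DAY_NAMES` and `decode_days` are made-up helpers.

```python
# mapping as described above: 1 = Monday ... 0 = Sunday
DAY_NAMES = {
    '1': 'Mon', '2': 'Tue', '3': 'Wed', '4': 'Thu',
    '5': 'Fri', '6': 'Sat', '0': 'Sun',
}


def decode_days(days):
    """Translate a string of day codes into day names.

    Characters that are not day codes (e.g. separators) are skipped.
    """
    return [DAY_NAMES[ch] for ch in str(days) if ch in DAY_NAMES]


print(decode_days('135'))    # ['Mon', 'Wed', 'Fri']
print(decode_days('1,2,0'))  # ['Mon', 'Tue', 'Sun']
```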

In [0]:
pledgeDayCombi = sql_to_df(
    """
    SELECT
        COUNT(DISTINCT id) AS number_of_users,
        daycount AS number_of_days_pledged,
        days AS day_combinations
    FROM
        g_apppledges
    WHERE
        days IS NOT NULL
    GROUP BY 
        days, daycount
    ORDER BY
        COUNT(DISTINCT id) DESC, daycount DESC
    """)

# ordering values
pledgeDayCombi.sort_values('number_of_users',
                           ascending=False,
                           inplace=True)

pledgeDayCombi.head(10)


In [0]:
alt.Chart(pledgeDayCombi).mark_bar().encode(
    #y='day combinations',
    #x='number of users'
    x=alt.X('number_of_users', sort="ascending"),
    y=alt.Y('day_combinations', sort="ascending")
)

## User-level pledge journeys (visual data analysis)
Experimental...

In [0]:
# only keep users who either increase or decrease pledges
journeys = pledges.loc[
    pledges['pledge_37'] != pledges.iloc[:,1:].sum(axis=1) / 6]
journeys.set_index('id', inplace=True)
journeys.head()

In [0]:
journeys.loc[
    journeys['pledge_37'] == 0].T.plot(
        legend=False, 
        alpha=0.1, 
        figsize=(12,8), 
        title='User journeys with pledge at 0 in first week');


In [0]:
journeys.loc[
    journeys['pledge_37'] == 1].T.plot(
        legend=False, 
        alpha=0.3, 
        figsize=(12,8), 
        title='User journeys with pledge at 1 in first week');


In [0]:
journeys.loc[
    journeys['pledge_37'] == 2].T.plot(
        legend=False, 
        alpha=0.1, 
        figsize=(12,8), 
        title='User journeys with pledge at 2 in first week');


In [0]:
journeys.loc[
    journeys['pledge_37'] == 3].T.plot(
        legend=False, 
        alpha=0.1, 
        figsize=(12,8), 
        title='User journeys with pledge at 3 in first week');


In [0]:
journeys.loc[
    journeys['pledge_37'] == 4].T.plot(
        legend=False, 
        alpha=0.1, 
        figsize=(12,8), 
        title='User journeys with pledge at 4 in first week');


In [0]:
journeys.loc[
    journeys['pledge_37'] == 5].T.plot(
        legend=False, 
        alpha=0.1, 
        figsize=(12,8), 
        title='User journeys with pledge at 5 in first week');


In [0]:
journeys.loc[
    journeys['pledge_37'] == 6].T.plot(
        legend=False, 
        alpha=0.1, 
        figsize=(12,8), 
        title='User journeys with pledge at 6 in first week');


In [0]:
journeys.loc[
    journeys['pledge_37'] == 7].T.plot(
        legend=False, 
        alpha=0.1, 
        figsize=(12,8), 
        title='User journeys with pledge at 7 in first week');


In [0]:
journeys.loc[
    journeys['pledge_38'] == 0].T.plot(
        legend=False, 
        alpha=0.1, 
        figsize=(12,8), 
        title='User journeys with pledge at 0 in SECOND week');
