[View in Colaboratory](https://colab.research.google.com/github/janilles/dfdapp/blob/master/dfd_user_retention.ipynb)

# Drink Free Days app 
# USER RETENTION
How long are people using the app?

Looking at users who joined in the first week of the campaign.

# Credentials to run the notebook

## Google Drive authentication (optional)
NOTE: If login credentials are hardcoded into the database connection (code cell below), this step is not necessary. Otherwise: 

Install and authenticate [PyDrive](https://pythonhosted.org/PyDrive/index.html) for loading files from Google Drive so that database passwords aren't hardcoded into the notebook.

In [0]:
# added -q for suppressing output
!pip install -U -q PyDrive

# imports follow the code snippets in the PyDrive documentation
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# authenticate and create the PyDrive client
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

## Database connection
- Connecting to the AWS RDS database with [PyMySQL](https://pymysql.readthedocs.io/en/latest/user/examples.html).
- Returning MySQL queries as Pandas dataframes with the [```read_sql()```](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html) function.

In [0]:
# added -q for suppressing output
!pip install -q -U pymysql

import pymysql
import pandas as pd

In [0]:
# before running this cell, comment out the options that don't apply to you

# Jan's file - 'id' is Google Drive file ID
passwd_file = drive.CreateFile({'id': '1YnGugBHvqjJk0nbTqN-683Agb0vaZKHo'}) 

# this variable is used in the connect function below
user_passwd = passwd_file.GetContentString()

# If you're not using Google Drive file but are hardcoding the password
# user_passwd = password as a string
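
If neither a Google Drive file nor a hardcoded string suits, one other way to keep the password out of the notebook is an environment variable. The following is only a sketch under that assumption: the helper `load_password()` and the variable name `DFD_DB_PASSWORD` are hypothetical and not part of the original notebook.

```python
import os


def load_password(env_var="DFD_DB_PASSWORD"):
    """
    Reads the database password from an environment variable.
    The variable name is a made-up example, not something the app defines.
    """
    value = os.environ.get(env_var)
    if value is None:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    # strip stray whitespace such as a trailing newline
    return value.strip()
```

With the variable set, `user_passwd = load_password()` would take the place of the Google Drive lines above.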

In [0]:
def connect():
    """
    Opens a connection to the AWS RDS MySQL database.
    """
    return pymysql.connect(
        host = "df-phereplica3.crqbvr0pveqx.eu-west-1.rds.amazonaws.com",
        # change user name and password as necessary
        user = "jan",
        passwd = user_passwd, # assigned in the cell above
        db = "daysoff",
        autocommit = True
        )

connection = connect()

def sql_to_df(sql):
    """
    Runs a MySQL query and returns the result as a Pandas dataframe.
    """
    return pd.read_sql(sql, con = connection)
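
To try the query-to-dataframe pattern without access to the database, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the MySQL connection. The table `demo_users`, its rows, and the helper `demo_sql_to_df()` are made up for illustration. SQLite syntax also differs from MySQL in places.

```python
import sqlite3

import pandas as pd

# in-memory SQLite database standing in for the MySQL connection (assumption)
demo_connection = sqlite3.connect(":memory:")
demo_connection.execute("CREATE TABLE demo_users (id INTEGER, joined TEXT)")
demo_connection.executemany(
    "INSERT INTO demo_users VALUES (?, ?)",
    [(1, "2018-09-10"), (2, "2018-09-11"), (3, "2018-09-11")],
)


def demo_sql_to_df(sql):
    """
    Same shape as sql_to_df(), but bound to the demo connection.
    """
    return pd.read_sql(sql, con=demo_connection)


demo_df = demo_sql_to_df("""
        SELECT
            joined, COUNT(id) AS user_count
        FROM
            demo_users
        GROUP BY
            joined
        ORDER BY
            joined
        """)
```

`demo_df` is an ordinary dataframe with one row per join date, so the usual Pandas methods apply to it.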

# Database tables (optional)
Overview of available data and tables used in the MySQL queries below. 
See [MySQL documentation](https://dev.mysql.com/doc/refman/8.0/en/introduction.html) for MySQL syntax.

In [0]:
# formatting column width of Pandas dataframes
# increase column width so that longer comments don't get truncated

pd.set_option('max_colwidth',100)

### App users table

In [0]:
# run pd.set_option('max_colwidth',100) if comments column gets truncated

sql_to_df("""
        SELECT
            table_name, column_name, data_type, column_comment
        FROM
            information_schema.columns
        WHERE
            table_name = 'g_appusers'
        """)

# Report generation
- Write MySQL queries as multi-line strings passed to the ```sql_to_df()``` function. See [MySQL documentation](https://dev.mysql.com/doc/refman/8.0/en/introduction.html) for MySQL syntax reference.
- ```sql_to_df()``` returns Pandas dataframes.

## Number of users by day
Looking at users who joined in the first week of the campaign.

In [0]:
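usersByDay = sql_to_df("""
        SELECT
            COUNT(id) AS user_count,
            -- DATEDIFF stays correct when lastseen falls in a later year than joined
            DATEDIFF(lastseen, joined) AS day_count
        FROM
            g_appusers
        WHERE
            -- half-open range covers all of 16 September even if joined stores a time
            joined >= '2018-09-10' AND joined < '2018-09-17'
        GROUP BY
            day_count
        ORDER BY
            day_count
        """)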

usersByDay.head()
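
The query counts users by `day_count`, the number of days between joining and last being seen. As a rough illustration of that grouping, here is the same idea in Pandas on a handful of made-up rows. The frame `toy_users` is an assumption for the example, not real app data.

```python
import pandas as pd

# made-up rows standing in for g_appusers (assumption, not real data)
toy_users = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "joined": pd.to_datetime(
        ["2018-09-10", "2018-09-10", "2018-09-12", "2018-09-14", "2018-09-16"]),
    "lastseen": pd.to_datetime(
        ["2018-09-10", "2018-09-13", "2018-09-12", "2018-09-17", "2018-09-30"]),
})

# days between the join date and the last-seen date, ignoring time of day
toy_users["day_count"] = (
    toy_users["lastseen"].dt.normalize() - toy_users["joined"].dt.normalize()
).dt.days

# one row per day_count with the number of users, sorted by day_count
toy_by_day = (
    toy_users.groupby("day_count")["id"]
    .count()
    .rename("user_count")
    .reset_index()
)
```

Here two users were last seen on the day they joined (`day_count` 0), two after three days, and one after fourteen days.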

In [0]:
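# cumulative number of users last seen on or before each day_count
usersByDay['rolling'] = usersByDay['user_count'].cumsum()

# cumulative share (%) of users last seen on or before each day_count
usersByDay['percent'] = round((usersByDay['rolling']/usersByDay['user_count'].sum())*100, 1)

# remaining share (%) of users last seen after each day_count
usersByDay['percent rev'] = round(100-((usersByDay['rolling']/usersByDay['user_count'].sum())*100), 1)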

usersByDay
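
To make the three derived columns easier to read, here is a small worked sketch on made-up counts. The frame `toy` is an assumption for illustration. Reading `percent rev` as the share of users still active after a given day is an interpretation of the calculation rather than something the notebook states.

```python
import pandas as pd

# made-up counts standing in for the query result (assumption, not real data)
toy = pd.DataFrame({"day_count": [0, 1, 2, 5], "user_count": [50, 20, 20, 10]})

total = toy["user_count"].sum()

# running total of users last seen on or before each day_count
toy["rolling"] = toy["user_count"].cumsum()

# running total as a percentage of all users
toy["percent"] = (toy["rolling"] / total * 100).round(1)

# percentage of users last seen after each day_count
toy["percent rev"] = (100 - toy["rolling"] / total * 100).round(1)
```

With 100 users in total, `rolling` is 50, 70, 90, 100, `percent` matches it, and `percent rev` falls from 50 to 30, 10 and 0.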

In [0]:
# plot against day_count rather than the row index, which skips missing days
usersByDay.plot(x='day_count', y='rolling');

In [0]:
# plot against day_count rather than the row index, which skips missing days
usersByDay.plot(x='day_count', y='user_count');

#### Export result to CSV

In [0]:
from google.colab import files

# usersByDay.to_csv('df.csv')
# files.download('df.csv')
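
Outside Colab, `files.download()` is not available, but `to_csv()` works the same way. The sketch below writes a made-up frame to a temporary file and reads it back. The frame `toy` and the temporary path are assumptions for illustration. `index=False` is optional and simply keeps the row index out of the file.

```python
import os
import tempfile

import pandas as pd

# made-up frame standing in for usersByDay (assumption, not real data)
toy = pd.DataFrame({"day_count": [0, 1, 2], "user_count": [50, 20, 30]})

# write to a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "df.csv")
    # index=False keeps the row index out of the file
    toy.to_csv(csv_path, index=False)
    # read the file back to confirm the round trip
    restored = pd.read_csv(csv_path)
```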