# Funnel Analysis

A [queensai.com](https://www.queensai.com/) original project created by mentor [Samantha Lam](https://www.linkedin.com/in/samanthalam/).

## What is funnel analysis? 

Tomi Mester has an excellent blog post on this: https://data36.com/funnel-analysis/

"Funnel analysis is a powerful analytics method that shows visually the conversion between the most important steps of the user journey." 

One of the most common types of analysis in industry is onboarding funnel analysis, which follows the steps a user takes from registering on a website to the point where they perform a key action that the product offers. Examples of a key action in onboarding funnels could be:
- for a music-streaming service like Spotify, this could be when a user presses play for a song
- for an email provider such as Gmail, this could be when a user sends an email
- for a mobile game like Candy Crush, this could be when a user completes their first level after downloading the app.

**Think of other online services you know and come up with the key action they would want their users to perform to understand what the product does!**


In this project we will go through how we would do this at Mentimeter (https://www.mentimeter.com/). If you don't already know what Mentimeter is, I highly recommend you check it out ;) In essence, Mentimeter is an interactive presentation platform where you can engage the audience with real-time interaction such as polls and word clouds.

The key action we want the registered user to engage in to understand the value of Mentimeter is to present live in front of an audience. We count this as having happened when a user's presentation has received 2 votes or more.

*Note: Each Step corresponds roughly to one week of the course at queensai so the discussion points each week will revolve around the topic in the Step.*

-----------------------


### Week 1 Course Material

What is Python and how to install it
- How to install Python + Jupyter notebook on your laptop! https://www.codecademy.com/articles/install-python3
- More on installing Python https://realpython.com/installing-python/

#### Learn
- What is a variable? It's a box with a name https://www.youtube.com/watch?v=OPBxRcosIaU
- How to use a jupyter notebook: 
 - https://www.dataquest.io/blog/jupyter-notebook-tutorial/
 - https://www.youtube.com/watch?v=1QDvkkdyGw0
- Lessons 1 & 2 https://www.udacity.com/course/introduction-to-python--ud1110

#### Inspiration
- A little bit of fun history https://www.python-course.eu/python3_history_and_philosophy.php 
- Where To Start Learning How To Code https://www.youtube.com/watch?v=-1SmUivH9dQ
- How can I become a good programmer, for beginners https://www.youtube.com/watch?v=2-VKC8g2u1Y

### Step #1: Define the steps of your funnel + understand the data!

Typically in a company you would first need to understand how these steps are tracked in the data ecosystem. The data is often stored in some form of database, and you would need SQL to get it out. The end result can be saved as a CSV file, so let's imagine that we have someone to help us pull the data, but we still need to *define* what is needed.

What do we need to know to start? We want to create an onboarding funnel at Mentimeter so let's define the steps required:
- Registration
- Create a presentation
- Edit a presentation
- Presented live
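
As a small sketch, the steps above could be kept in an ordered Python list so that later code knows which step comes first. The variable name `funnel_steps` is only an illustration and is not part of the project data.

```python
# The four onboarding steps, in the order a user is expected to complete them.
# The name funnel_steps is hypothetical, chosen for this example.
funnel_steps = [
    'Registration',
    'Create a presentation',
    'Edit a presentation',
    'Presented live',
]

# enumerate(..., start=1) numbers the steps from 1 instead of 0.
for position, step in enumerate(funnel_steps, start=1):
    print(position, step)
```

A list keeps its order, which matters here because a funnel only makes sense when the steps are read from first to last.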

So what kind of information do we need for each of these steps for us to understand how a user goes from one step to another? Well, the first thing is that we need an identifier for the user! Without this, there is no way we can know who has completed which step.

How do we represent a user? We use what is called a user ID. A user ID is a unique identifier, commonly used to log on to a website, app, or online service, e.g. it may be a username, account number, or email address. (You can read a bit more about this at: https://techterms.com/definition/user_id). So if it is a username, what data type would that be? What about an account number? Or an email address then?

In [1]:
# User ID is a username
user_id = 'ilovepython'
type(user_id)

str

In [2]:
# User ID is an account number
user_id = 123456
type(user_id)

int

In [3]:
# User ID is an email address
user_id = 'ilovepython@email.com'
type(user_id)

str

In the case of this project, our user IDs are stored as account numbers, in other words, integers.

So we can identify a user, what else do we need? 
The onboarding step and the time at which it happened. The onboarding step tells us what action the user has done, and the timestamp tells us when they did it. Tracking these three pieces of information is one of the most common ways data about a user is recorded: who, what and when.

What would the data type of the onboarding step be? Let's say they are the words 'Registration', 'Create a presentation', 'Edit a presentation', and 'Presented live'. I bet this is straightforward.

In [4]:
# Onboarding step
onboarding_step1 = 'Registration'
onboarding_step2 = 'Create a presentation'
onboarding_step3 = 'Edit a presentation'
onboarding_step4 = 'Presented live'

# Print the type of each of the four steps
print(type(onboarding_step1), type(onboarding_step2), type(onboarding_step3), type(onboarding_step4))

<class 'str'> <class 'str'> <class 'str'> <class 'str'>


How about when something happens, i.e. the time at which the event was created? This one is trickier. How can time be represented in Python?

In [5]:
created_at1 = '2020-09-01 09:10:08'
type(created_at1)

str

In [6]:
created_at2 = '2020-09-01 09:20:08'
type(created_at2)

str

In [7]:
created_at2-created_at1

TypeError: unsupported operand type(s) for -: 'str' and 'str'

A string *looks* nice, but as the error shows, we cannot do arithmetic with it. Python has a data type dedicated to time that gives us useful functionality we don't get from strings (https://docs.python.org/3/library/datetime.html).

In [8]:
import datetime as dt
created_at1 = dt.datetime(2020, 9, 1, 9, 10, 8)
created_at2 = dt.datetime(2021, 9, 1, 10, 20, 8)

print(created_at1, created_at2)

2020-09-01 09:10:08 2021-09-01 10:20:08


In [9]:
time_diff = created_at2-created_at1
time_diff

datetime.timedelta(365, 4200)

In [10]:
print(time_diff.days, time_diff.seconds)

365 4200


So, because of the flexibility this object type gives us, we decide that the created_at data should be stored as a 'datetime.datetime' object.
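
Data files usually give us timestamps as strings, so we need a way to turn them into datetime objects. One way to do this, sketched below, is `datetime.strptime` from the standard library. The constant name `TIMESTAMP_FORMAT` is made up for this example, and the format string assumes timestamps written like '2020-09-01 09:10:08'.

```python
import datetime as dt

# Format matching timestamps such as '2020-09-01 09:10:08' (an assumption about the data).
TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S'

# strptime parses a string into a datetime object using the given format.
created_at1 = dt.datetime.strptime('2020-09-01 09:10:08', TIMESTAMP_FORMAT)
created_at2 = dt.datetime.strptime('2020-09-01 09:20:08', TIMESTAMP_FORMAT)

# Now the subtraction that failed for strings works.
time_diff = created_at2 - created_at1
print(time_diff)                  # 0:10:00
print(time_diff.total_seconds())  # 600.0
```

If a string does not match the format, `strptime` raises a `ValueError`, which is a useful early warning that the data is not what we expected.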

Now that we are clear about the data we need, and what object types they should be, let's import this data file and take a peek at what it looks like.

In [11]:
# Import the data file
import csv
data=[]
with open('onboarding_funnel_data.csv', newline='') as csvfile:
    datareader = csv.reader(csvfile, delimiter=';', quotechar='|')
    for row in datareader:
        data.append(row)

In [12]:
data[0:10]

[['user_id,funnel_step,timestamp,device'],
 ['3695,edit,2020-01-04 22:44:44,desktop'],
 ['10204,live,2020-01-05 00:22:50,desktop'],
 ['10399,live,2020-01-05 04:45:55,desktop'],
 ['7536,edit,2020-01-05 03:10:15,desktop'],
 ['6578,edit,2020-01-04 22:40:19,desktop'],
 ['801,create,2020-01-04 23:14:42,desktop'],
 ['4416,registration,2020-01-05 01:05:31,mobile'],
 ['10501,edit,2020-01-05 00:53:17,desktop'],
 ['1037,create,2020-01-05 06:10:27,desktop']]

Does this look weird?

In [13]:
data[0]

['user_id,funnel_step,timestamp,device']

If you look carefully, you'll notice that each row in the list is a single string, rather than 4 separate strings, one for each column. This happens because we told the reader that the delimiter is a semi-colon (;), while the csv we are importing uses a comma (,). Let's try again!

In [14]:
# Import the data file
import csv
data=[]
with open('onboarding_funnel_data.csv', newline='') as csvfile:
    datareader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in datareader:
        data.append(row)

In [15]:
data[0:10]

[['user_id', 'funnel_step', 'timestamp', 'device'],
 ['3695', 'edit', '2020-01-04 22:44:44', 'desktop'],
 ['10204', 'live', '2020-01-05 00:22:50', 'desktop'],
 ['10399', 'live', '2020-01-05 04:45:55', 'desktop'],
 ['7536', 'edit', '2020-01-05 03:10:15', 'desktop'],
 ['6578', 'edit', '2020-01-04 22:40:19', 'desktop'],
 ['801', 'create', '2020-01-04 23:14:42', 'desktop'],
 ['4416', 'registration', '2020-01-05 01:05:31', 'mobile'],
 ['10501', 'edit', '2020-01-05 00:53:17', 'desktop'],
 ['1037', 'create', '2020-01-05 06:10:27', 'desktop']]

This looks a lot better!
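
Every value is still a string at this point. A possible next step, sketched here under the assumption that we want the types chosen earlier (integer user IDs and datetime timestamps), is to convert each row. The names `sample_data` and `events` are hypothetical, and the rows are copied from the output above so the example runs without the CSV file.

```python
import datetime as dt

# A few rows copied from the output above; in the notebook you would use `data` instead.
sample_data = [
    ['user_id', 'funnel_step', 'timestamp', 'device'],
    ['3695', 'edit', '2020-01-04 22:44:44', 'desktop'],
    ['10204', 'live', '2020-01-05 00:22:50', 'desktop'],
    ['4416', 'registration', '2020-01-05 01:05:31', 'mobile'],
]

# The first row holds the column names, the rest hold the events.
header, rows = sample_data[0], sample_data[1:]

events = []
for user_id, funnel_step, timestamp, device in rows:
    events.append({
        'user_id': int(user_id),  # account numbers are integers
        'funnel_step': funnel_step,
        'timestamp': dt.datetime.strptime(timestamp, '%Y-%m-%d %H:%M:%S'),
        'device': device,
    })

print(header)
print(events[0])
```

Storing each event as a dictionary is just one option; it keeps the column name next to each value, which makes the later steps easier to read.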

--------------------------

### Week 2 Course Material

#### Learn
- Lessons 3 & 4 https://www.udacity.com/course/introduction-to-python--ud1110
- Functions, Sequences, Iterations: Python Programming Bootcamp 2020 | Learn to Code in Python [Tutorial and Exercises] https://www.youtube.com/watch?v=KPuA3Vq4yvY&t=0s
- Python Functions https://www.youtube.com/watch?v=u-OmVr_fT4s
- The first 5 sections, Introduction to DataFrame https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm

#### Inspiration
- How To Stay Motivated When Learning To Code https://www.youtube.com/watch?v=a0wY2TBs3zY
- Dealing with Stress and Anxiety When Learning to Code https://www.youtube.com/watch?v=anfszzl3GpA

### Step #2: More data manipulation + initial visualisation

- Always check the data!
- Let's try out some different ways of visualising the same thing. Which is most effective for understanding?

Once we have imported the dataset and understood what kind of information it contains, we can start cleaning it before moving on to the analysis stage. Data cleaning is important for making sure that the dataset meets our criteria and that the conclusions we draw from it are correct.
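
What counts as clean depends on the dataset, so the following is only a sketch of two common checks: removing exact duplicate rows and removing rows whose step is not part of the funnel. The step names in `VALID_STEPS` are the ones visible in the data preview earlier; the names `VALID_STEPS`, `sample_rows` and `clean_rows` are hypothetical, and two of the sample rows are invented so the checks have something to catch.

```python
# Step names as they appear in the funnel_step column of the preview above.
VALID_STEPS = {'registration', 'create', 'edit', 'live'}

# The first two rows come from the preview; the last two are invented for illustration.
sample_rows = [
    ['3695', 'edit', '2020-01-04 22:44:44', 'desktop'],
    ['801', 'create', '2020-01-04 23:14:42', 'desktop'],
    ['801', 'create', '2020-01-04 23:14:42', 'desktop'],   # exact duplicate (invented)
    ['9999', 'unknown', '2020-01-05 00:00:00', 'mobile'],  # unexpected step (invented)
]

seen = set()
clean_rows = []
for row in sample_rows:
    key = tuple(row)  # lists cannot be stored in a set, tuples can
    if key in seen:
        continue  # skip exact duplicates
    if row[1] not in VALID_STEPS:
        continue  # skip steps that are not part of the funnel
    seen.add(key)
    clean_rows.append(row)

print(len(sample_rows), len(clean_rows))  # 4 2
```

Before dropping rows in a real analysis, it is worth counting how many rows each check removes and asking why they are there in the first place.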


--------------------------

### Week 3 Course Material

Install pandas
- The first 5 sections, from Introduction to DataFrame (includes installation) https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm
- Install Pandas on Windows https://data-flair.training/blogs/install-pandas-on-windows/
- Install Pandas https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html

#### Learn
- Python: Pandas Tutorial, Intro to DataFrames 
 - (video) https://www.youtube.com/watch?v=e60ItwlZTKM
 - (article) https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python
- Intro to Data Analysis / Visualization with Python, Matplotlib and Pandas | Matplotlib Tutorial https://www.youtube.com/watch?v=a9UrKTVEeZA

#### Inspiration
- The beauty of data visualization https://www.youtube.com/watch?v=5Zg-C8AAIGg

### Step #3: More funnels

- Time-delay between the different funnel steps
- Segmentation: which users is this onboarding good for, and who is dropping off?
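
Both ideas can be sketched with the standard library alone. The example below measures the time between registration and creating a presentation for each user, then counts by device the users who reached the live step. All events are invented for illustration, the variable names are hypothetical, and taking the first device seen for each user is a simplifying assumption.

```python
import datetime as dt
from collections import defaultdict

FMT = '%Y-%m-%d %H:%M:%S'

# Invented events for illustration: (user_id, funnel_step, timestamp, device).
events = [
    (1, 'registration', '2020-01-05 09:00:00', 'desktop'),
    (1, 'create',       '2020-01-05 09:05:00', 'desktop'),
    (1, 'live',         '2020-01-06 10:00:00', 'desktop'),
    (2, 'registration', '2020-01-05 12:00:00', 'mobile'),
    (2, 'create',       '2020-01-05 12:30:00', 'mobile'),
]

# first_seen[user_id][step] = earliest time the user reached that step
first_seen = defaultdict(dict)
device_of = {}
for user_id, step, timestamp, device in events:
    when = dt.datetime.strptime(timestamp, FMT)
    if step not in first_seen[user_id] or when < first_seen[user_id][step]:
        first_seen[user_id][step] = when
    device_of.setdefault(user_id, device)  # assumption: keep the first device seen

# Time delay between registration and creating a presentation, per user.
delays = {}
for user_id, steps in first_seen.items():
    if 'registration' in steps and 'create' in steps:
        delays[user_id] = steps['create'] - steps['registration']

# Segmentation: on which devices are the users who reached 'live'?
live_by_device = defaultdict(int)
for user_id, steps in first_seen.items():
    if 'live' in steps:
        live_by_device[device_of[user_id]] += 1

print(delays)
print(dict(live_by_device))
```

In this made-up sample, user 2 never reaches the live step, so they show up in the delays but not in the device counts. That gap is exactly the kind of drop-off the segmentation question is about.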

--------------------------

### Week 4 Course Material

#### Learn
- Refactoring:
 - What is code refactoring? https://www.youtube.com/watch?v=vhYK3pDUijkv (first 3 mins)
 - How to Refactor Code in Python: A Beginner's Guide https://hubpages.com/technology/How-To-Refactor-Code-In-Python-A-Beginners-Guide
 - 4 Simple Ways to Refactor Your Python Code https://medium.com/code-85/4-simple-ways-to-refactor-your-python-code-2f491b767381
 - Python Refactoring https://www.youtube.com/watch?v=KTIl1MugsSY
- What is Data Storytelling? https://www.nugit.co/what-is-data-storytelling/
- 5 Steps for Effective Data Storytelling https://www.qlik.com/us/-/media/files/resource-library/global-us/register/ebooks/EB-5-Steps-for-Effective-Data-Storytelling-EN

#### Inspiration
- Making data mean more through storytelling https://www.youtube.com/watch?v=6xsvGYIxJok
- Storytelling with Data (long) https://www.youtube.com/watch?v=8EMW7io4rSI

### Step #4: Reproducible analysis and actionable insights

- Refactor and reuse
- Combining the funnels together: what are the key take-aways?
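
One way to make the analysis reusable is to wrap the counting logic in functions that can be called on any set of events. The sketch below is an illustration, not the project's reference solution: the function names `funnel_counts` and `conversion_rates` are hypothetical, the events are invented, and each step is counted independently, without checking that a user also completed the earlier steps.

```python
def funnel_counts(events, steps):
    """Count unique users per step. `events` is an iterable of (user_id, funnel_step) pairs."""
    users_per_step = {step: set() for step in steps}
    for user_id, step in events:
        if step in users_per_step:
            users_per_step[step].add(user_id)  # a set counts each user once
    return [(step, len(users_per_step[step])) for step in steps]


def conversion_rates(counts):
    """Share of users at each step, relative to the first step of the funnel."""
    first = counts[0][1]
    return [(step, n / first if first else 0.0) for step, n in counts]


STEPS = ['registration', 'create', 'edit', 'live']

# Invented (user_id, funnel_step) pairs for illustration.
events = [
    (1, 'registration'), (1, 'create'), (1, 'edit'), (1, 'live'),
    (2, 'registration'), (2, 'create'),
    (3, 'registration'),
    (4, 'registration'),
]

counts = funnel_counts(events, STEPS)
rates = conversion_rates(counts)
for (step, n), (_, rate) in zip(counts, rates):
    print(f"{step:<12} {n:>3} {rate:.0%}")
```

Because the functions take the events and the step list as arguments, the same code can be reused for each segment, for example once for desktop users and once for mobile users.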

--------------------------