
<p align="center">
<img src="https://github.com/datacamp/data-analysis-in-sql-live-session/blob/master/assets/datacamp.svg?raw=True" alt = "DataCamp icon" width="50%">
</p>
<br><br>

## **Data Analysis with SQL**

In this webinar, you'll learn how to write advanced queries to calculate core business metrics and KPIs. You'll be able to:

* Use Common Table Expressions to temporarily store a query's results
* Fetch values from different rows using window functions
* Use self-joins to peek into the future

## **The Dataset**


We'll use two tables. The first, `user_sessions`, stores user session data from a social media website. The table's schema is as follows:

- `session_date`: The date on which the user accessed the site
- `user_id`: The user's unique identifier
- `time_spent_in_mins`: How much time the user spent on the site

The second, `user_data`, stores the users' metadata. The table's schema is as follows:

- `user_id`: The user's unique identifier
- `country`: The user's country
- `age`: The user's age


## **Setting up PostgreSQL**

In [0]:
#@title **This block of code will install PostgreSQL**
%%capture
!wget -qO- https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
!echo "deb http://apt.postgresql.org/pub/repos/apt/ bionic-pgdg main" >/etc/apt/sources.list.d/pgdg.list
!apt -qq update
!apt -yq install postgresql-12 postgresql-client-12
!service postgresql start
# make calling psql shorter
!sudo -u postgres psql -c "CREATE USER root WITH SUPERUSER"  
!psql postgres -c "CREATE DATABASE root"  # now just !psql -c "..."
# load SQL extensions
%load_ext sql
%config SqlMagic.feedback=False 
%config SqlMagic.autopandas=True
%sql postgresql+psycopg2://@/postgres

In [0]:
#@title **This will download your data to the local environment**
!wget -q https://github.com/datacamp/data-analysis-in-sql-live-training/raw/master/data/user_data.csv
!wget -q https://github.com/datacamp/data-analysis-in-sql-live-training/raw/master/data/user_metadata.csv

In [3]:
#@title **This will create your tables**
%%sql
-- Make sure to amend your table names, column names, and types
DROP TABLE IF EXISTS user_sessions;
CREATE TABLE user_sessions(
 session_date date,
 user_id int,
 time_spent_in_mins int
);

COPY user_sessions
-- Make sure to point to the correct file and delimiter
FROM '/content/user_data.csv' DELIMITER ',' CSV HEADER;

DROP TABLE IF EXISTS user_data;
CREATE TABLE user_data(
 user_id int,
 country char(3),
 age int
);

COPY user_data
-- Make sure to point to the correct file and delimiter
FROM '/content/user_metadata.csv' DELIMITER ',' CSV HEADER;



Let's start by exploring the tables.

In [0]:
%%sql

-- SELECT first 5 rows from user_sessions


In [0]:
%%sql

-- SELECT first 5 rows from user_data


In [0]:
%%sql

-- Join the two tables together


## **Data overview**

Since you have user demographics, you can start by exploring some basic metrics, like:

- Average age per country
- User count by country
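Both metrics come from a single `GROUP BY country` query. As a minimal sketch, the block below runs that query with Python's built-in `sqlite3` module on a few hypothetical rows rather than the notebook's PostgreSQL tables; the SQL itself should run unchanged in PostgreSQL.

```python
import sqlite3

# In-memory SQLite database; these rows are hypothetical, not the course data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_data (user_id INTEGER, country TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO user_data VALUES (?, ?, ?)",
    [(1, "USA", 30), (2, "USA", 40), (3, "CAN", 25)],
)

# One GROUP BY gives both the average age and the user count per country
overview = conn.execute(
    """
    SELECT country,
           AVG(age) AS avg_age,
           COUNT(DISTINCT user_id) AS user_count
    FROM user_data
    GROUP BY country
    ORDER BY country
    """
).fetchall()
print(overview)  # [('CAN', 25.0, 1), ('USA', 35.0, 2)]
```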

In [0]:
%%sql

-- Get the average age per country


In [0]:
%%sql

-- Get the user count by country 


## **Active users**

The active users KPI counts the active users of a company's app over a certain time period:
- by day (daily active users, or DAU)
- by month (monthly active users, or MAU)

For example, Facebook had 1.76B DAU and 2.6B MAU in March.

Stickiness (DAU / MAU) measures how often users engage with an app on average. Facebook's stickiness for March was `1.76B / 2.6B ~= 0.677`, meaning that, on average, users used Facebook for `67.7% x 30 days ~= 20` days each month.
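The stickiness arithmetic, written out in Python with the March figures quoted above (in billions) and the same 30-day month assumption:

```python
# Figures quoted in the text, in billions of users
dau = 1.76
mau = 2.6

stickiness = dau / mau          # share of the month an average user is active
active_days = stickiness * 30   # assumes a 30-day month, as in the text

print(f"stickiness ~= {stickiness:.3f}, active days ~= {active_days:.0f}")
# stickiness ~= 0.677, active days ~= 20
```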

To get the daily active users, we need to count the number of unique `user_id`s for each `session_date`.
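A minimal sketch of that count, using Python's built-in `sqlite3` module and hypothetical rows in place of the notebook's PostgreSQL `user_sessions` table. The query is standard SQL, so the same `COUNT(DISTINCT user_id) ... GROUP BY session_date` pattern should work in PostgreSQL.

```python
import sqlite3

# In-memory SQLite database; these rows are hypothetical, not the course data
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions (session_date TEXT, user_id INTEGER, time_spent_in_mins INTEGER)"
)
conn.executemany(
    "INSERT INTO user_sessions VALUES (?, ?, ?)",
    [
        ("2020-01-01", 1, 10),
        ("2020-01-01", 1, 5),   # a second session by user 1 on the same day
        ("2020-01-01", 2, 7),
        ("2020-01-02", 2, 3),
    ],
)

# DAU: distinct users per session date, so repeat sessions count once
dau = conn.execute(
    """
    SELECT session_date, COUNT(DISTINCT user_id) AS dau
    FROM user_sessions
    GROUP BY session_date
    ORDER BY session_date
    """
).fetchall()
print(dau)  # [('2020-01-01', 2), ('2020-01-02', 1)]
```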

In [0]:
%%sql

-- Calculate the Daily Active Users (DAU)



This is what the results would look like when visualized:

![Facespace DAU](https://github.com/datacamp/data-analysis-in-sql-live-training/raw/master/assets/facespace_dau.png)

## **Monthly active users**

Usually, reports include MAU, not DAU. How do you convert the session dates to months?

**Enter `DATE_TRUNC`**

`DATE_TRUNC(date_part, date) → TIMESTAMP`: Truncates `date` down to the start of the specified `date_part` (week, month, quarter, year, and so on).

**Examples**
- `DATE_TRUNC('week', DATE '2018-06-12') :: DATE` → `'2018-06-11'`
- `DATE_TRUNC('month', DATE '2018-06-12') :: DATE` → `'2018-06-01'`
- `DATE_TRUNC('quarter', DATE '2018-06-12') :: DATE` → `'2018-04-01'`
- `DATE_TRUNC('year', DATE '2018-06-12') :: DATE` → `'2018-01-01'`

**Note**: `:: DATE` is just to remove the hours, minutes, and seconds.
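Monthly active users are the same distinct-user count, grouped by the truncated month instead of the raw date. The sketch below assumes Python's built-in `sqlite3` module and hypothetical rows; SQLite has no `DATE_TRUNC`, so `strftime('%Y-%m-01', session_date)` stands in for PostgreSQL's `DATE_TRUNC('month', session_date) :: DATE`.

```python
import sqlite3

# In-memory SQLite database; these rows are hypothetical, not the course data
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions (session_date TEXT, user_id INTEGER, time_spent_in_mins INTEGER)"
)
conn.executemany(
    "INSERT INTO user_sessions VALUES (?, ?, ?)",
    [
        ("2020-01-01", 1, 10),
        ("2020-01-15", 1, 5),   # user 1 again in January: still one MAU
        ("2020-01-20", 2, 7),
        ("2020-02-03", 2, 3),
    ],
)

# strftime('%Y-%m-01', ...) plays the role of DATE_TRUNC('month', ...) :: DATE
mau = conn.execute(
    """
    SELECT strftime('%Y-%m-01', session_date) AS session_month,
           COUNT(DISTINCT user_id) AS mau
    FROM user_sessions
    GROUP BY session_month
    ORDER BY session_month
    """
).fetchall()
print(mau)  # [('2020-01-01', 2), ('2020-02-01', 1)]
```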

In [6]:
%%sql

-- Preview DATE_TRUNC on the session dates before calculating the Monthly Active Users (MAU)
SELECT DISTINCT 
  session_date,
  DATE_TRUNC('quarter', session_date) AS session_date_quarter
FROM user_sessions
ORDER BY session_date ASC;




|     | session_date | session_date_quarter      |
|-----|--------------|---------------------------|
| 0   | 2020-01-01   | 2020-01-01 00:00:00+00:00 |
| 1   | 2020-01-02   | 2020-01-01 00:00:00+00:00 |
| 2   | 2020-01-03   | 2020-01-01 00:00:00+00:00 |
| 3   | 2020-01-04   | 2020-01-01 00:00:00+00:00 |
| 4   | 2020-01-05   | 2020-01-01 00:00:00+00:00 |
| ... | ...          | ...                       |
| 147 | 2020-05-27   | 2020-04-01 00:00:00+00:00 |
| 148 | 2020-05-28   | 2020-04-01 00:00:00+00:00 |
| 149 | 2020-05-29   | 2020-04-01 00:00:00+00:00 |
| 150 | 2020-05-30   | 2020-04-01 00:00:00+00:00 |


This is what the results would look like when visualized:

![Facespace MAU](https://github.com/datacamp/data-analysis-in-sql-live-training/raw/master/assets/facespace_mau.png)

## **Q&A**

## **Registration dates**

Let's define the user's registration date as the date of that user's first session.

So, each user's registration date is the minimum session date for that user in the `user_sessions` table.

We'll use these results later on to calculate the growth in registrations.
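A minimal sketch of the registration-date query, assuming Python's built-in `sqlite3` module and hypothetical rows. Dates are stored as ISO-8601 text here, which sorts chronologically, so `MIN()` returns the earliest session; in PostgreSQL the column is a real `DATE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions (session_date TEXT, user_id INTEGER, time_spent_in_mins INTEGER)"
)
# Hypothetical sessions, deliberately not in date order
conn.executemany(
    "INSERT INTO user_sessions VALUES (?, ?, ?)",
    [
        ("2020-01-15", 1, 10),
        ("2020-01-03", 1, 5),
        ("2020-02-10", 2, 7),
        ("2020-03-01", 2, 3),
    ],
)

# Registration date = each user's earliest session date
reg_dates = conn.execute(
    """
    SELECT user_id, MIN(session_date) AS reg_date
    FROM user_sessions
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()
print(reg_dates)  # [(1, '2020-01-03'), (2, '2020-02-10')]
```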

In [0]:
%%sql

-- Get each user's registration date


## **Registrations and Common Table Expressions (CTEs)**

Now that you have each user's registration date, you'll want to store the results somehow to use them in a different query. How do you do that?

**Enter Common Table Expressions (CTEs)**

```sql
WITH cte_name AS (
  ...
)

SELECT *
FROM cte_name;
```

A CTE gives a query's result a temporary name, `cte_name`, that exists only for the duration of the statement, so the outer query can reference it like a table.

Once you store the results of the previous query in a CTE, you can `DATE_TRUNC()` the registration dates and count the unique `user_id`s in each registration month.
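A sketch of both steps in one statement, assuming Python's built-in `sqlite3` module and hypothetical rows. SQLite has no `DATE_TRUNC`, so `strftime('%Y-%m-01', ...)` stands in for `DATE_TRUNC('month', ...)`; the CTE is called `regs` as in the exercise cell, and the `reg_count` column name is an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions (session_date TEXT, user_id INTEGER, time_spent_in_mins INTEGER)"
)
conn.executemany(
    "INSERT INTO user_sessions VALUES (?, ?, ?)",
    [
        ("2020-01-03", 1, 10),
        ("2020-02-01", 1, 4),   # later session: does not change user 1's registration
        ("2020-01-20", 2, 7),
        ("2020-02-05", 3, 6),
        ("2020-03-02", 3, 2),
    ],
)

monthly_regs = conn.execute(
    """
    WITH regs AS (
      -- Step 1: each user's registration date
      SELECT user_id, MIN(session_date) AS reg_date
      FROM user_sessions
      GROUP BY user_id
    )
    -- Step 2: truncate to the month and count users per month
    SELECT strftime('%Y-%m-01', reg_date) AS reg_month,
           COUNT(DISTINCT user_id) AS reg_count
    FROM regs
    GROUP BY reg_month
    ORDER BY reg_month
    """
).fetchall()
print(monthly_regs)  # [('2020-01-01', 2), ('2020-02-01', 1)]
```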

In [0]:
%%sql

-- Store each user's registration date in the regs CTE
-- Calculate the number of registrations per month


## **Growth and window functions**

You now have each month's registrations. How do you calculate growth?

`Growth = (Current month - previous month) / previous month`

For example, if you had 122 registrations last month, and you have 156 registrations this month, your registrations grew by `(156 - 122) / 122 ~= 28%` this month.
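The growth formula as a small Python function, checked against the example figures:

```python
def growth(current: float, previous: float) -> float:
    """Period-over-period growth: (current - previous) / previous."""
    return (current - previous) / previous


rate = growth(156, 122)
print(f"{rate:.0%}")  # 28%
```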

So you need both the previous and the current months' registrations in the same row. How do you do that?

**Window functions**

A window function performs a calculation across a set of rows that are related to the current row. Unlike `GROUP BY`, it doesn't collapse those rows into one; each row keeps its own result.

- `LAG(column_a, 1) OVER (ORDER BY column_b ASC)`: gets the value of `column_a` from the previous row when rows are sorted by `column_b`. The first row has no previous row, so its result is `NULL`.
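A minimal sketch of `LAG` on a hypothetical `monthly_regs` table (an illustrative name, one row per month) using Python's built-in `sqlite3` module. It assumes the linked SQLite library is version 3.25 or newer, the first release with window functions; the `LAG ... OVER (ORDER BY ...)` syntax is the same in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical monthly registration counts
conn.execute("CREATE TABLE monthly_regs (reg_month TEXT, reg_count INTEGER)")
conn.executemany(
    "INSERT INTO monthly_regs VALUES (?, ?)",
    [("2020-01-01", 100), ("2020-02-01", 122), ("2020-03-01", 156)],
)

# LAG puts the previous month's count next to the current month's count
lagged = conn.execute(
    """
    SELECT reg_month,
           LAG(reg_count, 1) OVER (ORDER BY reg_month ASC) AS prev_regs,
           reg_count AS curr_regs
    FROM monthly_regs
    ORDER BY reg_month
    """
).fetchall()
print(lagged)
# [('2020-01-01', None, 100), ('2020-02-01', 100, 122), ('2020-03-01', 122, 156)]
```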

In [0]:
%%sql
-- Fetch the previous and current months' registrations


Store the results in a CTE and apply the formula to get the monthly registration growth rates. You can use `COALESCE(..., 1)` to convert any `NULL` values to 1.
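A sketch of the full growth calculation on a hypothetical `monthly_regs` table (illustrative name, one row per month), using Python's built-in `sqlite3` module (SQLite 3.25+ for `LAG`). Both operands are integers, and integer division truncates in SQLite and PostgreSQL alike, so the numerator is cast to `REAL` first; in PostgreSQL, `:: NUMERIC` does the same job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical monthly registration counts
conn.execute("CREATE TABLE monthly_regs (reg_month TEXT, reg_count INTEGER)")
conn.executemany(
    "INSERT INTO monthly_regs VALUES (?, ?)",
    [("2020-01-01", 100), ("2020-02-01", 122), ("2020-03-01", 156)],
)

rows = conn.execute(
    """
    WITH reg_lag AS (
      SELECT reg_month,
             reg_count AS curr_regs,
             LAG(reg_count, 1) OVER (ORDER BY reg_month ASC) AS prev_regs
      FROM monthly_regs
    )
    SELECT reg_month,
           -- COALESCE turns the first month's NULL into 1; CAST avoids integer division
           ROUND(
             CAST(curr_regs - COALESCE(prev_regs, 1) AS REAL)
             / COALESCE(prev_regs, 1),
             3
           ) AS growth
    FROM reg_lag
    ORDER BY reg_month
    """
).fetchall()
print(rows)  # [('2020-01-01', 99.0), ('2020-02-01', 0.22), ('2020-03-01', 0.279)]
```

With `COALESCE(prev_regs, 1)`, the first month's missing previous value becomes 1, so its growth comes out as 99.0 (9,900%); you'll likely want to exclude that row when reporting.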

In [0]:
%%sql

-- Calculate the monthly growth in registrations


## **Q&A**

## **Retained and resurrected users**

Users can be split into four groups:
- New/registered users just signed up for your platform.
- Retained users were active before and are still active.
- Churned users were active before and no longer are.
- Resurrected users had churned and then returned to your app.

Retention is another core KPI that platforms use to measure how well they keep their users.

The first step to calculating retention is getting each of the months in which each user is active.
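A minimal sketch, assuming Python's built-in `sqlite3` module and hypothetical rows. `SELECT DISTINCT` collapses multiple sessions in the same month into one row, and `strftime('%Y-%m-01', ...)` stands in for PostgreSQL's `DATE_TRUNC('month', ...)`; `active_month` is an illustrative column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions (session_date TEXT, user_id INTEGER, time_spent_in_mins INTEGER)"
)
conn.executemany(
    "INSERT INTO user_sessions VALUES (?, ?, ?)",
    [
        ("2020-01-03", 1, 10),
        ("2020-01-20", 1, 5),   # second January session collapses into one row
        ("2020-02-05", 1, 8),
        ("2020-01-10", 2, 4),
        ("2020-03-01", 2, 6),
    ],
)

# One row per (user, month) in which the user had at least one session
active_months = conn.execute(
    """
    SELECT DISTINCT user_id, strftime('%Y-%m-01', session_date) AS active_month
    FROM user_sessions
    ORDER BY user_id, active_month
    """
).fetchall()
print(active_months)
# [(1, '2020-01-01'), (1, '2020-02-01'), (2, '2020-01-01'), (2, '2020-03-01')]
```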

In [0]:
%%sql

-- Get the months in which each user is active


## **Self-joins**

Now that you have the months in which each user is active, how do you calculate retention?

![Left joins](https://user-images.githubusercontent.com/48436758/83518570-e4ff8c00-a4da-11ea-8a5a-25ea46df2bcc.png)

If you left-join this table to itself on the same user ID, matching each month to the following month, you'll see whether each user is still active in the next month. If there's no match, the joined columns are `NULL` and the user churned; the count of non-`NULL` matches is the count of retained users.

```sql
...
FROM ... AS prev
LEFT JOIN ... AS curr
  ON prev.user_id = curr.user_id
 AND prev.month = (curr.month - INTERVAL '1 MONTH')
...
```
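A runnable sketch of that self-join, assuming Python's built-in `sqlite3` module and hypothetical rows. SQLite has no `INTERVAL` type, so `date(prev.active_month, '+1 month') = curr.active_month` plays the role of the PostgreSQL condition above, and `IS NULL` yields `1`/`0` instead of a `BOOLEAN`. The `active` CTE and `active_month` column are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions (session_date TEXT, user_id INTEGER, time_spent_in_mins INTEGER)"
)
conn.executemany(
    "INSERT INTO user_sessions VALUES (?, ?, ?)",
    [
        ("2020-01-03", 1, 10),
        ("2020-02-05", 1, 8),   # user 1 is active in consecutive months
        ("2020-01-10", 2, 5),
        ("2020-03-01", 2, 6),   # user 2 skips February
    ],
)

churn = conn.execute(
    """
    WITH active AS (
      SELECT DISTINCT user_id, strftime('%Y-%m-01', session_date) AS active_month
      FROM user_sessions
    )
    SELECT prev.user_id,
           prev.active_month,
           curr.user_id IS NULL AS churned_next_month  -- 1 when there is no match next month
    FROM active AS prev
    LEFT JOIN active AS curr
      ON prev.user_id = curr.user_id
     AND date(prev.active_month, '+1 month') = curr.active_month
    ORDER BY prev.user_id, prev.active_month
    """
).fetchall()
print(churn)
# [(1, '2020-01-01', 0), (1, '2020-02-01', 1), (2, '2020-01-01', 1), (2, '2020-03-01', 1)]
```

Rows from the last month in the data always look churned, because there is no following month to match; exclude that month before computing rates.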



In [0]:
%%sql

-- Get whether each user churned in a given month


Store the results in a CTE, then divide the number of `FALSE` values in `churned_next_month` by the number of users active in that month to get the retention rate.
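A sketch of the monthly retention rate, assuming Python's built-in `sqlite3` module and hypothetical rows (`active`, `churn`, and `retention_rate` are illustrative names). Averaging a 1/0 "retained" indicator gives retained users divided by active users, and the last month is filtered out because nobody in it can be matched to a following month. In PostgreSQL, where `churned_next_month` is a `BOOLEAN`, the indicator can be written as `CASE WHEN NOT churned_next_month THEN 1.0 ELSE 0.0 END`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions (session_date TEXT, user_id INTEGER, time_spent_in_mins INTEGER)"
)
# Hypothetical data: users 1-3 active in January, 1-2 in February, only 1 in March
conn.executemany(
    "INSERT INTO user_sessions VALUES (?, ?, ?)",
    [
        ("2020-01-02", 1, 10),
        ("2020-01-05", 2, 5),
        ("2020-01-09", 3, 7),
        ("2020-02-03", 1, 4),
        ("2020-02-04", 2, 6),
        ("2020-03-06", 1, 3),
    ],
)

retention = conn.execute(
    """
    WITH active AS (
      SELECT DISTINCT user_id, strftime('%Y-%m-01', session_date) AS active_month
      FROM user_sessions
    ),
    churn AS (
      SELECT prev.user_id,
             prev.active_month,
             curr.user_id IS NULL AS churned_next_month
      FROM active AS prev
      LEFT JOIN active AS curr
        ON prev.user_id = curr.user_id
       AND date(prev.active_month, '+1 month') = curr.active_month
    )
    SELECT active_month,
           -- share of the month's active users who were NOT churned
           ROUND(AVG(CASE WHEN churned_next_month = 0 THEN 1.0 ELSE 0.0 END), 3)
             AS retention_rate
    FROM churn
    WHERE active_month < (SELECT MAX(active_month) FROM active)  -- drop the last month
    GROUP BY active_month
    ORDER BY active_month
    """
).fetchall()
print(retention)  # [('2020-01-01', 0.667), ('2020-02-01', 0.5)]
```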

In [0]:
%%sql

-- Calculate the retention rate


## **Average age of churners**

Now that you have the retention status of each user, you can look for trends in churn, such as older users churning more.
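A sketch of the comparison for April, assuming Python's built-in `sqlite3` module and hypothetical rows in both tables; it joins the churn flags to `user_data` and averages `age` per flag value. `active` and `churn` are illustrative CTE names, and `date(..., '+1 month')` stands in for PostgreSQL's `INTERVAL '1 MONTH'` arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions (session_date TEXT, user_id INTEGER, time_spent_in_mins INTEGER)"
)
conn.execute("CREATE TABLE user_data (user_id INTEGER, country TEXT, age INTEGER)")
# Hypothetical data: users 1-3 active in April, only user 1 returns in May
conn.executemany(
    "INSERT INTO user_sessions VALUES (?, ?, ?)",
    [
        ("2020-04-02", 1, 10),
        ("2020-04-10", 2, 5),
        ("2020-04-20", 3, 8),
        ("2020-05-05", 1, 7),
    ],
)
conn.executemany(
    "INSERT INTO user_data VALUES (?, ?, ?)",
    [(1, "USA", 25), (2, "USA", 40), (3, "CAN", 50)],
)

age_by_status = conn.execute(
    """
    WITH active AS (
      SELECT DISTINCT user_id, strftime('%Y-%m-01', session_date) AS active_month
      FROM user_sessions
    ),
    churn AS (
      SELECT prev.user_id,
             prev.active_month,
             curr.user_id IS NULL AS churned_next_month
      FROM active AS prev
      LEFT JOIN active AS curr
        ON prev.user_id = curr.user_id
       AND date(prev.active_month, '+1 month') = curr.active_month
    )
    -- 0 = retained into May, 1 = churned after April
    SELECT c.churned_next_month, AVG(u.age) AS avg_age
    FROM churn AS c
    JOIN user_data AS u ON c.user_id = u.user_id
    WHERE c.active_month = '2020-04-01'
    GROUP BY c.churned_next_month
    ORDER BY c.churned_next_month
    """
).fetchall()
print(age_by_status)  # [(0, 25.0), (1, 45.0)]
```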


In [0]:
%%sql

-- Get the average age of churners versus retained users in April


## **Q&A**