# Query 1
Top 5 most popular pieces of content consumed this week

Strategy: 


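- Popularity of a piece of content is measured by the number of `Ads_User_id` entries recorded against it in table `Page_Impression`. Each impression counts once, so repeat views by the same user are counted each time.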


- Return the 5 `Content_id` values with the highest `Ads_User_id` count in table `Page_Impression` since the start of this week.

Assumption:
- There is no missing data.

```sql
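-- {00:00:00 this Monday} is a placeholder for the start of the current week
SELECT Content_id, COUNT(Ads_User_id) AS Popularity  -- one count per impression
FROM Page_Impression
WHERE Timestamp >= {00:00:00 this Monday}
GROUP BY Content_id
ORDER BY Popularity DESC
LIMIT 5;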
```
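
As a rough sketch of how the `{00:00:00 this Monday}` placeholder could be filled in, the snippet below computes the week start in Python and runs the same query against an in-memory SQLite database. The helper names (`week_start`, `top_content`), the column types, the toy rows, and the choice of SQLite are all assumptions made for illustration; the real schema and SQL dialect are not stated in these notes.

```python
import sqlite3
from datetime import datetime, timedelta


def week_start(now: datetime) -> datetime:
    """Return 00:00:00 on the Monday of the week containing `now`."""
    monday = now - timedelta(days=now.weekday())  # weekday(): Monday == 0
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


def top_content(conn: sqlite3.Connection, now: datetime, limit: int = 5):
    """Run Query 1 with the Monday placeholder bound as a parameter."""
    sql = """
        SELECT Content_id, COUNT(Ads_User_id) AS Popularity
        FROM Page_Impression
        WHERE Timestamp >= ?
        GROUP BY Content_id
        ORDER BY Popularity DESC
        LIMIT ?
    """
    # Timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text (an assumption),
    # so string comparison matches chronological order.
    return conn.execute(sql, (week_start(now).isoformat(sep=" "), limit)).fetchall()


# Hypothetical toy data: (Ads_User_id, Content_id, Timestamp)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Page_Impression (Ads_User_id TEXT, Content_id TEXT, Timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO Page_Impression VALUES (?, ?, ?)",
    [
        ("u1", "c1", "2024-01-08 09:00:00"),  # Monday of the current week
        ("u2", "c1", "2024-01-09 10:00:00"),
        ("u1", "c2", "2024-01-10 11:00:00"),
        ("u3", "c3", "2024-01-07 23:59:59"),  # previous Sunday, excluded
    ],
)

result = top_content(conn, datetime(2024, 1, 10, 12, 0, 0))
print(result)
```

Binding the week start as a query parameter keeps the SQL text fixed and leaves the date arithmetic in one small, testable function.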

# Query 2
Number of weekly active users (WAU) for the latest full week (Monday – Sunday). WAU is the number of registered users with more than 60 seconds of total dwell time between Monday and Sunday of that week.

Strategy:


- Link `User_id` to `Dwell_Time` through `OAuth_id`, then sum `Dwell_Time` per user over the latest full week.


- Keep only the `User_id` values whose total `Dwell_Time` exceeds 60 seconds.


- The relationship *between* table `Registered_Users` and table `OAuth_id_Service` is __one to one__, which means each row in `Registered_Users` corresponds to exactly one row in `OAuth_id_Service`, and vice versa.


- Also, *from* table `Ad_Service_Interaction_Data` *to* table `OAuth_id_Service`, the relationship is to exactly __one__, which means each row in `Ad_Service_Interaction_Data` maps to *only one* row in `OAuth_id_Service`.


- This means __1__ `User_id` corresponds to __1__ `OAuth_id`, and __1__ `OAuth_id` corresponds to __1__ `Ads_User_id`. This also means any `User_id` or `OAuth_id` or `Ads_User_id` can identify a unique user. 


- That means we can count the `Ads_User_id` values associated with more than 60 seconds of total `Dwell_Time`. There is no need to pull in table `Registered_Users` or table `OAuth_id_Service`, which speeds up the query and reduces memory usage.
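
The key-chain argument above can be checked on toy data. The sketch below is a minimal illustration, not the production schema: the column layout (for example, `OAuth_id` living in `Registered_Users`), the `UNIQUE` constraints that encode the one-to-one assumption, the sample rows, and the use of SQLite are all assumptions. The week filter is left out to keep the focus on the keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed columns; only the key names come from the notes above.
    CREATE TABLE Registered_Users (User_id TEXT PRIMARY KEY, OAuth_id TEXT UNIQUE);
    CREATE TABLE OAuth_id_Service (OAuth_id TEXT PRIMARY KEY, Ads_User_id TEXT UNIQUE);
    CREATE TABLE Ad_Service_Interaction_Data (
        Ads_User_id TEXT, Timestamp TEXT, Dwell_Time INTEGER
    );

    INSERT INTO Registered_Users VALUES ('r1', 'o1'), ('r2', 'o2');
    INSERT INTO OAuth_id_Service VALUES ('o1', 'a1'), ('o2', 'a2');
    INSERT INTO Ad_Service_Interaction_Data VALUES
        ('a1', '2024-01-08 09:00:00', 40),
        ('a1', '2024-01-09 09:00:00', 30),   -- a1 total: 70 s
        ('a2', '2024-01-08 09:00:00', 20);   -- a2 total: 20 s
""")

# Long route: User_id -> OAuth_id -> Ads_User_id -> Dwell_Time
joined = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT r.User_id
        FROM Registered_Users r
        JOIN OAuth_id_Service o ON o.OAuth_id = r.OAuth_id
        JOIN Ad_Service_Interaction_Data a ON a.Ads_User_id = o.Ads_User_id
        GROUP BY r.User_id
        HAVING SUM(a.Dwell_Time) > 60
    )
""").fetchone()[0]

# Short route: group directly on Ads_User_id, no joins
direct = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT Ads_User_id
        FROM Ad_Service_Interaction_Data
        GROUP BY Ads_User_id
        HAVING SUM(Dwell_Time) > 60
    )
""").fetchone()[0]

print(joined, direct)
```

Both routes give the same count here only because every `Ads_User_id` in the toy data maps to exactly one registered user; if that one-to-one assumption did not hold, the two counts could differ.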

Assumptions:

- There is no missing data.


- `User_id` is the primary key (PK) of table `Registered_Users`, so `User_id` is unique. 


- `OAuth_id` is the PK of table `OAuth_id_Service`. 


- `Dwell_Time` is measured in seconds.


---
## *Question for Mike*:

- The middle 3 tables `Page_Impression`, `Ad_Service_Interaction_Data` and `OAuth_id_Service` have the same key `Ads_User_id`. (Right to middle) From `OAuth_id_Service` to `Ad_Service_Interaction_Data`, the relationship is one to zero or many. (Left to middle) From `Page_Impression` to `Ad_Service_Interaction_Data`, the relationship is many to one. How can this be possible?


 - Is it because table `Page_Impression` uses a composite primary key (`Ads_User_id`, `Timestamp`), while table `Ad_Service_Interaction_Data` uses the single-column primary key `Ads_User_id`?

---

```sql
SELECT COUNT(*) AS Num_WAU
FROM
(
    SELECT Ads_User_id
    FROM Ad_Service_Interaction_Data 
    WHERE Timestamp >= {00:00:00 latest Monday} AND Timestamp <= {23:59:59 latest Sunday}
    GROUP BY Ads_User_id
    -- total dwell time over the week must exceed 60 seconds
    HAVING SUM(Dwell_Time) > 60
) AS Active_Users;
```
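
The snippet below is one possible way to resolve the `{latest Monday}` and `{latest Sunday}` placeholders and to run the WAU count on toy data, summing `Dwell_Time` per user as the strategy describes. The helper names (`latest_full_week`, `count_wau`), the text timestamp format, the sample rows, and the use of SQLite are assumptions. It also uses a half-open window (from Monday 00:00:00 up to, but not including, the following Monday 00:00:00) in place of `<= 23:59:59 Sunday`, so sub-second timestamps late on Sunday are not missed.

```python
import sqlite3
from datetime import datetime, timedelta


def latest_full_week(now: datetime):
    """Return (start, end) of the most recent complete Monday-Sunday week.

    `start` is Monday 00:00:00 (inclusive); `end` is the following
    Monday 00:00:00 (exclusive).
    """
    this_monday = (now - timedelta(days=now.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return this_monday - timedelta(days=7), this_monday


def count_wau(conn: sqlite3.Connection, now: datetime) -> int:
    """Count users whose total dwell time in the latest full week exceeds 60 s."""
    start, end = latest_full_week(now)
    sql = """
        SELECT COUNT(*) AS Num_WAU
        FROM (
            SELECT Ads_User_id
            FROM Ad_Service_Interaction_Data
            WHERE Timestamp >= ? AND Timestamp < ?
            GROUP BY Ads_User_id
            HAVING SUM(Dwell_Time) > 60
        ) AS Active_Users
    """
    params = (start.isoformat(sep=" "), end.isoformat(sep=" "))
    return conn.execute(sql, params).fetchone()[0]


# Hypothetical toy data: (Ads_User_id, Timestamp, Dwell_Time in seconds)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Ad_Service_Interaction_Data "
    "(Ads_User_id TEXT, Timestamp TEXT, Dwell_Time INTEGER)"
)
conn.executemany(
    "INSERT INTO Ad_Service_Interaction_Data VALUES (?, ?, ?)",
    [
        ("a1", "2024-01-02 09:00:00", 40),
        ("a1", "2024-01-03 09:00:00", 30),   # a1 total: 70 s, active
        ("a2", "2024-01-04 09:00:00", 50),   # a2 total: 50 s, not active
        ("a3", "2024-01-08 09:00:00", 300),  # current week, outside the window
        ("a4", "2023-12-31 23:59:59", 500),  # week before, outside the window
    ],
)

wau = count_wau(conn, datetime(2024, 1, 10, 12, 0, 0))
print(wau)
```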

# Query 3
Top 5 pieces of content from each content type consumed this week by only active users (using the above definition)

Strategy: 


- Popularity of a piece of content is measured by the number of `Ads_User_id` entries recorded against it in table `Page_Impression`.


- Keep only the `Ads_User_id` values that belong to active users *this week*, then find the top 5 pieces of content for each content type.

Assumptions:


- ...

```sql
WITH Active_Users AS
(
    -- Users whose total dwell time over the week exceeds 60 seconds
    SELECT Ads_User_id
    FROM Ad_Service_Interaction_Data 
    WHERE Timestamp >= {00:00:00 latest Monday} AND Timestamp <= {23:59:59 latest Sunday}
    GROUP BY Ads_User_id
    HAVING SUM(Dwell_Time) > 60
),
    Popular_Content AS
(
    -- Impressions per piece of content, restricted to active users
    SELECT Content_id, COUNT(Ads_User_id) AS Popularity
    FROM Page_Impression
    WHERE (Timestamp >= {00:00:00 this Monday})
      AND (Ads_User_id IN (SELECT Ads_User_id FROM Active_Users))
    GROUP BY Content_id
)

SELECT *
FROM
(
    SELECT Content_Type, 
           Content_id, 
           Popularity, 
           -- RANK() gives tied items the same rank, so a type can return more than 5 rows
           RANK() OVER (PARTITION BY Content_Type
                        ORDER BY Popularity DESC) AS Content_Rank
    FROM Content_Metadata JOIN Popular_Content USING (Content_id)
) AS Ranked_Content
WHERE Content_Rank <= 5;
```
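
To see the per-type ranking at work, here is a small end-to-end sketch on toy data. The table layouts, the sample rows, the fixed window boundaries, the function name `top_content_by_type`, and the use of SQLite are all assumptions; window functions require SQLite 3.25 or later. An `ORDER BY` is added to the outer query only so that the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed columns; the real schema is not given in these notes.
    CREATE TABLE Ad_Service_Interaction_Data (
        Ads_User_id TEXT, Timestamp TEXT, Dwell_Time INTEGER
    );
    CREATE TABLE Page_Impression (Ads_User_id TEXT, Content_id TEXT, Timestamp TEXT);
    CREATE TABLE Content_Metadata (Content_id TEXT PRIMARY KEY, Content_Type TEXT);

    INSERT INTO Ad_Service_Interaction_Data VALUES
        ('a1', '2024-01-02 09:00:00', 70),   -- active
        ('a2', '2024-01-03 09:00:00', 90),   -- active
        ('a3', '2024-01-03 09:00:00', 10);   -- not active

    INSERT INTO Content_Metadata VALUES
        ('c1', 'article'), ('c2', 'article'), ('c3', 'article'), ('v1', 'video');

    INSERT INTO Page_Impression VALUES
        ('a1', 'c1', '2024-01-08 10:00:00'),
        ('a2', 'c1', '2024-01-09 10:00:00'),
        ('a1', 'c2', '2024-01-09 11:00:00'),
        ('a1', 'v1', '2024-01-09 12:00:00'),
        ('a3', 'c3', '2024-01-09 13:00:00'),  -- inactive user, ignored
        ('a2', 'c2', '2024-01-01 10:00:00');  -- before this Monday, ignored
""")


def top_content_by_type(conn, active_start, active_end, content_start, top_n=5):
    """Rank content within each type by impressions from active users."""
    sql = """
        WITH Active_Users AS (
            SELECT Ads_User_id
            FROM Ad_Service_Interaction_Data
            WHERE Timestamp >= :active_start AND Timestamp < :active_end
            GROUP BY Ads_User_id
            HAVING SUM(Dwell_Time) > 60
        ),
        Popular_Content AS (
            SELECT Content_id, COUNT(Ads_User_id) AS Popularity
            FROM Page_Impression
            WHERE Timestamp >= :content_start
              AND Ads_User_id IN (SELECT Ads_User_id FROM Active_Users)
            GROUP BY Content_id
        )
        SELECT Content_Type, Content_id, Popularity, Content_Rank
        FROM (
            SELECT Content_Type, Content_id, Popularity,
                   RANK() OVER (PARTITION BY Content_Type
                                ORDER BY Popularity DESC) AS Content_Rank
            FROM Content_Metadata JOIN Popular_Content USING (Content_id)
        ) AS Ranked_Content
        WHERE Content_Rank <= :top_n
        ORDER BY Content_Type, Content_Rank
    """
    params = {
        "active_start": active_start,
        "active_end": active_end,
        "content_start": content_start,
        "top_n": top_n,
    }
    return conn.execute(sql, params).fetchall()


rows = top_content_by_type(
    conn,
    active_start="2024-01-01 00:00:00",   # latest full week, start (inclusive)
    active_end="2024-01-08 00:00:00",     # latest full week, end (exclusive)
    content_start="2024-01-08 00:00:00",  # this Monday
)
for row in rows:
    print(row)
```

If exactly five rows per type are required even when popularity ties occur, `ROW_NUMBER()` could be swapped in for `RANK()`, at the cost of breaking ties arbitrarily unless a secondary sort key is added.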