# Reboot - SQL Advanced

Tonight, we will use a Blog SQLite database:

In [None]:
!tree

## 1. Schema ERD

❓ Open the `data/blog.sqlite` in DBeaver, explore the schema and draw it on [kitt.lewagon.com/db](https://kitt.lewagon.com/db).

_TODO: Double click this cell and **paste** a screenshot of the schema for future reference_.

---
## 2. Most liked posts

Complete the code to get **the 3 most liked posts**:

In [2]:
import sqlite3

conn = sqlite3.connect("data/blog.sqlite")
c = conn.cursor()

# TODO: write the query
query = """
SELECT posts.id, posts.title, COUNT(*) as like_count
FROM likes
JOIN posts ON posts.id = likes.post_id 
GROUP BY likes.post_id 
ORDER BY like_count DESC
LIMIT 3
"""

# TODO: Execute the query
results = c.execute(query)
# TODO: Fetch and print the results
results.fetchall()

[(143, 'Half imagine another.', 84),
 (83, 'Side foot leader popular.', 82),
 (99, 'Area paper whatever mean.', 81)]
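
To check the `JOIN` + `GROUP BY` + `ORDER BY` + `LIMIT` pattern by hand, it can help to run it on a tiny in-memory database. The sketch below is only an illustration: the rows are made up (they do not come from `data/blog.sqlite`), and the connection is named `toy` so it does not overwrite the notebook's `conn`.

```python
import sqlite3

# Hypothetical toy data, small enough to verify by eye
toy = sqlite3.connect(":memory:")
toy.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE likes (id INTEGER PRIMARY KEY, post_id INTEGER);
INSERT INTO posts (id, title) VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
INSERT INTO likes (post_id) VALUES (1), (2), (2), (3), (3), (3);
""")

# Same shape as the query above: one row per liked post, sorted by like count
top_posts = toy.execute("""
SELECT posts.id, posts.title, COUNT(*) AS like_count
FROM likes
JOIN posts ON posts.id = likes.post_id
GROUP BY posts.id
ORDER BY like_count DESC
LIMIT 2
""").fetchall()
toy.close()

print(top_posts)  # [(3, 'C', 3), (2, 'B', 2)]
```

Post `D` has no likes, so the inner join never produces a row for it. That is fine for a "most liked" ranking, but it would matter for a question like "which posts have zero likes".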

---

### Pretty Print using _pandas_

The readability of our printed results is not great.

Next week, we will introduce [pandas](https://pandas.pydata.org/), which will greatly improve the experience of exploring data in notebooks.

Execute the following cell:

In [3]:
import pandas as pd

Then run the previous `query` again, this time letting `pd.read_sql_query` fetch the results and display them as a table:

In [4]:
pd.read_sql_query(query, conn)

|   | id | title | like_count |
|---|----|-------|------------|
| 0 | 143 | Half imagine another. | 84 |
| 1 | 83 | Side foot leader popular. | 82 |
| 2 | 99 | Area paper whatever mean. | 81 |


---
## 3. Find the three users who 'liked' the most

In [5]:
pd.read_sql_query("""
SELECT 
	users.id,
	users.first_name,
	users.last_name,
	COUNT(l.id) as likes_count
FROM users
JOIN likes l ON l.user_id = users.id
GROUP BY users.id
ORDER BY likes_count DESC
LIMIT 3
""", conn)

|   | id | first_name | last_name | likes_count |
|---|----|------------|-----------|-------------|
| 0 | 43 | Michael | Allen | 236 |
| 1 | 12 | Donna | Ramirez | 233 |
| 2 | 15 | Barbara | Hurst | 227 |


---
## 4. Find the most liked author

In [6]:
pd.read_sql_query("""
SELECT 
	users.id,
	users.first_name,
	users.last_name,
	COUNT(l.id) as likes_count
FROM users
JOIN posts p ON p.user_id = users.id
JOIN likes l ON l.post_id = p.id
GROUP BY users.id
ORDER BY likes_count DESC
LIMIT 1
""", conn)

|   | id | first_name | last_name | likes_count |
|---|----|------------|-----------|-------------|
| 0 | 57 | Teresa | Moore | 647 |


---
## 5. Who are the authors of the 3 most liked posts?

In [7]:
pd.read_sql_query("""
SELECT 
	users.id,
	users.first_name,
	users.last_name,
	p.title,
	COUNT(l.id) as likes_count
FROM users
JOIN posts p ON p.user_id = users.id
JOIN likes l ON l.post_id = p.id
GROUP BY p.id
ORDER BY likes_count DESC
LIMIT 3
""", conn)

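|   | id | first_name | last_name | title | likes_count |
|---|----|------------|-----------|-------|-------------|
| 0 | 72 | Melissa | Henry | Half imagine another. | 84 |
| 1 | 63 | Cynthia | Raymond | Side foot leader popular. | 82 |
| 2 | 64 | Alexander | Cook | Area paper whatever mean. | 81 |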


---
## 6. How many people liked at least one post?

In [8]:
pd.read_sql_query("""
SELECT COUNT(DISTINCT l.user_id)
FROM likes l 
""", conn)

|   | COUNT(DISTINCT l.user_id) |
|---|---------------------------|
| 0 | 49 |


ℹ️ Possible follow-up question: Who never liked a post?

In [9]:
pd.read_sql_query("""
SELECT u.id, u.first_name || ' ' || u.last_name as name
FROM users u
LEFT JOIN likes l ON l.user_id = u.id
WHERE l.id IS NULL
""", conn)

|   | id | name |
|---|----|------|
| 0 | 50 | Brenda Griffin |
| 1 | 51 | Jennifer Mendez |
| 2 | 52 | Brittany Miller |
| 3 | 53 | Timothy Johnson |
| 4 | 54 | Tyler Wilson |
| 5 | 55 | Melissa Nelson |
| 6 | 56 | Madeline Porter |
| 7 | 57 | Teresa Moore |
| 8 | 58 | Grace Kerr |
| 9 | 59 | Pamela Mason |
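
The `LEFT JOIN ... WHERE l.id IS NULL` query above is an anti-join: it keeps only the users for whom no matching `likes` row exists. The same question can also be phrased with `NOT EXISTS`. The sketch below compares both forms on a made-up in-memory database (the names and rows are hypothetical, and `toy` is used instead of the notebook's `conn`).

```python
import sqlite3

# Hypothetical toy data: user 2 never liked anything
toy = sqlite3.connect(":memory:")
toy.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE likes (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users VALUES
  (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing'), (3, 'Grace', 'Hopper');
INSERT INTO likes (user_id) VALUES (1), (1), (3);
""")

# Form 1: LEFT JOIN, then keep the rows where the join found nothing
left_join = toy.execute("""
SELECT u.id, u.first_name || ' ' || u.last_name AS name
FROM users u
LEFT JOIN likes l ON l.user_id = u.id
WHERE l.id IS NULL
ORDER BY u.id
""").fetchall()

# Form 2: NOT EXISTS with a correlated subquery
not_exists = toy.execute("""
SELECT u.id, u.first_name || ' ' || u.last_name AS name
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM likes l WHERE l.user_id = u.id)
ORDER BY u.id
""").fetchall()
toy.close()

print(left_join)   # [(2, 'Alan Turing')]
print(not_exists)  # [(2, 'Alan Turing')]
```

Both forms return the same rows here. Which one reads better is mostly a matter of taste; `NOT EXISTS` states the intent more directly, while the `LEFT JOIN` form reuses a join you may already have written.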


---
## 7. Compute the cumulative number of likes per day

In [10]:
pd.read_sql_query("""
SELECT 
	l.created_at as liked_at,
	COUNT(l.id) as like_count,
	SUM(COUNT(l.id)) OVER (
		ORDER BY l.created_at
	) AS cumulative_likes
FROM likes l
GROUP BY liked_at
""", conn)

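|   | liked_at | like_count | cumulative_likes |
|---|----------|------------|------------------|
| 0 | 2019-01-01 | 24 | 24 |
| 1 | 2019-01-02 | 34 | 58 |
| 2 | 2019-01-03 | 40 | 98 |
| 3 | 2019-01-04 | 36 | 134 |
| 4 | 2019-01-05 | 27 | 161 |
| 5 | 2019-01-06 | 16 | 177 |
| 6 | 2019-01-07 | 25 | 202 |
| 7 | 2019-01-08 | 23 | 225 |
| 8 | 2019-01-09 | 40 | 265 |
| 9 | 2019-01-10 | 27 | 292 |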
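
The query above nests an aggregate inside a window function: `COUNT(l.id)` is computed per day by the `GROUP BY`, and `SUM(...) OVER (ORDER BY ...)` then adds up those daily counts from the first day to the current one. The sketch below reproduces this on made-up rows and cross-checks the result against a running total computed in Python. It assumes SQLite 3.25 or later (the first version with window functions) and that `created_at` holds plain dates, as in the output above.

```python
import sqlite3
from itertools import accumulate

# Hypothetical toy data: 2 likes, then 1, then 3
toy = sqlite3.connect(":memory:")
toy.executescript("""
CREATE TABLE likes (id INTEGER PRIMARY KEY, created_at TEXT);
INSERT INTO likes (created_at) VALUES
  ('2019-01-01'), ('2019-01-01'),
  ('2019-01-02'),
  ('2019-01-03'), ('2019-01-03'), ('2019-01-03');
""")

rows = toy.execute("""
SELECT
    l.created_at AS liked_at,
    COUNT(l.id) AS like_count,
    SUM(COUNT(l.id)) OVER (ORDER BY l.created_at) AS cumulative_likes
FROM likes l
GROUP BY l.created_at
ORDER BY l.created_at
""").fetchall()
toy.close()

# Cross-check: a running sum of the daily counts, computed in Python
daily_counts = [like_count for _, like_count, _ in rows]
running_total = list(accumulate(daily_counts))

print(rows)           # [('2019-01-01', 2, 2), ('2019-01-02', 1, 3), ('2019-01-03', 3, 6)]
print(running_total)  # [2, 3, 6]
```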


---
## 8. (Optional) Who's the biggest fan of each author?

The biggest fan of an author is the user who liked the largest number of that author's posts.
<br><br>
<details>
    <summary>💡 Click for Hint</summary>
    You might need to use <code>WITH</code>
</details>


In [12]:
pd.read_sql_query("""
WITH author_likers AS (
SELECT 
	u.id AS author_id,
	l.user_id AS liker_id,
	COUNT(l.id) AS likes
FROM users u
JOIN posts p ON p.user_id = u.id
JOIN likes l ON p.id = l.post_id
GROUP BY author_id, liker_id
)
-- SQLite-specific: with a single MAX() aggregate, the bare columns (author, liker)
-- are taken from the row that holds the maximum, so `liker` is the top liker.
SELECT
	authors.first_name || ' ' || authors.last_name AS author,
	likers.first_name || ' ' || likers.last_name AS liker,
	MAX(likes) AS likes
FROM author_likers
JOIN users AS authors ON authors.id = author_likers.author_id
JOIN users AS likers ON likers.id = author_likers.liker_id
GROUP BY author_likers.author_id
""", conn)

|   | author | liker | likes |
|---|--------|-------|-------|
| 0 | Brenda Griffin | Michael Allen | 12 |
| 1 | Jennifer Mendez | Kaylee Ball | 21 |
| 2 | Brittany Miller | Barbara Hurst | 16 |
| 3 | Timothy Johnson | Donna Ramirez | 3 |
| 4 | Tyler Wilson | Scott Thompson | 8 |
| 5 | Melissa Nelson | Sandra Davis | 20 |
| 6 | Madeline Porter | Donna Ramirez | 12 |
| 7 | Teresa Moore | Maria Mccarty | 24 |
| 8 | Grace Kerr | Ashley Brooks | 19 |
| 9 | Pamela Mason | Jessica Walker | 6 |
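
The solution above relies on SQLite's handling of bare columns next to `MAX()`, which many other databases reject or treat differently. A more portable way to pick "the top row per group" is to rank the rows with `ROW_NUMBER()` and keep rank 1. The sketch below shows that variant on made-up data; the rows, the `ranked` CTE, and the `rn` column are illustrative choices, and ties are broken arbitrarily by the lowest `liker_id`. It assumes SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical toy data: Ann wrote posts 10 and 11, Bob wrote post 20
toy = sqlite3.connect(":memory:")
toy.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE likes (id INTEGER PRIMARY KEY, user_id INTEGER, post_id INTEGER);
INSERT INTO users VALUES
  (1, 'Ann', 'Author'), (2, 'Bob', 'Writer'), (3, 'Cy', 'Fan'), (4, 'Di', 'Reader');
INSERT INTO posts VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO likes (user_id, post_id) VALUES
  (3, 10), (3, 11), (4, 10),  -- Ann's posts: Cy liked 2, Di liked 1
  (4, 20);                    -- Bob's post: Di liked 1
""")

biggest_fans = toy.execute("""
WITH author_likers AS (
    SELECT p.user_id AS author_id, l.user_id AS liker_id, COUNT(l.id) AS likes
    FROM posts p
    JOIN likes l ON l.post_id = p.id
    GROUP BY p.user_id, l.user_id
),
ranked AS (
    -- Number each author's likers from most to fewest likes
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY author_id ORDER BY likes DESC, liker_id
    ) AS rn
    FROM author_likers
)
SELECT
    authors.first_name || ' ' || authors.last_name AS author,
    likers.first_name || ' ' || likers.last_name AS liker,
    ranked.likes
FROM ranked
JOIN users AS authors ON authors.id = ranked.author_id
JOIN users AS likers ON likers.id = ranked.liker_id
WHERE ranked.rn = 1
ORDER BY ranked.author_id
""").fetchall()
toy.close()

print(biggest_fans)  # [('Ann Author', 'Cy Fan', 2), ('Bob Writer', 'Di Reader', 1)]
```

Swapping `ROW_NUMBER()` for `RANK()` would keep every liker tied for first place instead of picking just one.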
