# Three in a row

* Source: https://towardsdatascience.com/twenty-five-sql-practice-exercises-5fc791e24082
* Source: https://leetcode.com/problems/human-traffic-of-stadium/solution/
        
The attendance table logs the number of people counted in a crowd on each day an event is held. Write a query that returns the date and visitor count of high-attendance periods, defined as three consecutive entries (not necessarily consecutive dates), each with more than 100 visitors.
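
As a cross-check for the SQL answers, the definition can be sketched in plain Python. This is only an illustration: the rows are copied from the Part A output of this notebook, and the function name `high_attendance_periods` is hypothetical, not part of the exercise.

```python
# Rows copied from the Part A output of this notebook: (event_date, visitors).
rows = [
    ("2020-01-01", 10), ("2020-01-04", 109), ("2020-01-05", 150),
    ("2020-01-06", 99), ("2020-01-07", 145), ("2020-01-08", 1455),
    ("2020-01-11", 199), ("2020-01-12", 188),
]


def high_attendance_periods(entries, threshold=100, run_length=3):
    """Return entries that belong to at least one window of `run_length`
    consecutive entries (in date order) that all exceed `threshold`."""
    ordered = sorted(entries)  # ISO-formatted dates sort correctly as strings
    keep = set()
    for start in range(len(ordered) - run_length + 1):
        window = ordered[start:start + run_length]
        if all(visitors > threshold for _, visitors in window):
            keep.update(range(start, start + run_length))
    return [ordered[i] for i in sorted(keep)]


expected = high_attendance_periods(rows)
for event_date, visitors in expected:
    print(event_date, visitors)
```

The window slides over entries rather than calendar dates, which is why the jump from 2020-01-08 to 2020-01-11 does not break a run.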

In [2]:
%run Question.ipynb

 * postgresql://fknight:***@localhost/postgres
Done.
Done.
8 rows affected.
8 rows affected.


# Part A

To find consecutive entries while ignoring gaps between dates, write a query that numbers the entries sequentially in date order.

## Example answer

In [3]:
%%sql

SELECT *, row_number() OVER (ORDER BY event_date) AS day_num
FROM attendance

 * postgresql://fknight:***@localhost/postgres
8 rows affected.


event_date,visitors,day_num
2020-01-01,10,1
2020-01-04,109,2
2020-01-05,150,3
2020-01-06,99,4
2020-01-07,145,5
2020-01-08,1455,6
2020-01-11,199,7
2020-01-12,188,8
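
The effect of the row numbers can be seen by comparing the gap between dates with the gap between `day_num` values. The sketch below assumes Python's built-in `sqlite3` module (with SQLite 3.25 or later, which supports window functions) as a stand-in for the notebook's PostgreSQL database, and rebuilds the table from the rows printed above; the `TEXT` column type is an assumption.

```python
import sqlite3
from datetime import date

# In-memory SQLite database standing in for the PostgreSQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendance (event_date TEXT, visitors INTEGER)")
conn.executemany("INSERT INTO attendance VALUES (?, ?)", [
    ("2020-01-01", 10), ("2020-01-04", 109), ("2020-01-05", 150),
    ("2020-01-06", 99), ("2020-01-07", 145), ("2020-01-08", 1455),
    ("2020-01-11", 199), ("2020-01-12", 188),
])

numbered = conn.execute("""
    SELECT event_date, visitors,
           row_number() OVER (ORDER BY event_date) AS day_num
    FROM attendance
    ORDER BY day_num
""").fetchall()

# Compare each entry with the next one: calendar gap vs. row-number gap.
date_gaps = []
num_gaps = []
for (d1, _, n1), (d2, _, n2) in zip(numbered, numbered[1:]):
    date_gaps.append((date.fromisoformat(d2) - date.fromisoformat(d1)).days)
    num_gaps.append(n2 - n1)

print(date_gaps)  # [3, 1, 1, 1, 1, 3, 1]
print(num_gaps)   # [1, 1, 1, 1, 1, 1, 1]
```

Adjacent entries always differ by exactly 1 in `day_num`, even where the dates jump by three days.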


# Part B

Using the subquery from Part A, filter out events with 100 or fewer visitors.

```sql
WITH row_numbers as (
    SELECT *, row_number() OVER (ORDER BY event_date) AS day_num
    FROM attendance
)
```

## Example answer

In [5]:
%%sql

WITH row_numbers as (
    SELECT 
        *, 
        row_number() OVER (ORDER BY event_date) AS day_num
    FROM attendance
)

SELECT *
FROM row_numbers
WHERE visitors > 100

 * postgresql://fknight:***@localhost/postgres
6 rows affected.


event_date,visitors,day_num
2020-01-04,109,2
2020-01-05,150,3
2020-01-07,145,5
2020-01-08,1455,6
2020-01-11,199,7
2020-01-12,188,8
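
The order of the two steps matters: the rows are numbered first and filtered second, so the surviving `day_num` values keep a hole wherever a low-attendance entry was removed. The sketch below contrasts this with numbering after filtering. It assumes Python's built-in `sqlite3` module (SQLite 3.25 or later) as a stand-in for PostgreSQL, with the table rebuilt from the rows shown in Part A.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendance (event_date TEXT, visitors INTEGER)")
conn.executemany("INSERT INTO attendance VALUES (?, ?)", [
    ("2020-01-01", 10), ("2020-01-04", 109), ("2020-01-05", 150),
    ("2020-01-06", 99), ("2020-01-07", 145), ("2020-01-08", 1455),
    ("2020-01-11", 199), ("2020-01-12", 188),
])

# Number every entry first, then filter: the numbering keeps its holes.
number_then_filter = [r[0] for r in conn.execute("""
    WITH row_numbers AS (
        SELECT *, row_number() OVER (ORDER BY event_date) AS day_num
        FROM attendance
    )
    SELECT day_num FROM row_numbers
    WHERE visitors > 100
    ORDER BY day_num
""")]

# Filter first, then number: WHERE runs before the window function,
# so the numbering is dense and the removed entries leave no trace.
filter_then_number = [r[0] for r in conn.execute("""
    SELECT row_number() OVER (ORDER BY event_date) AS day_num
    FROM attendance
    WHERE visitors > 100
    ORDER BY day_num
""")]

print(number_then_filter)  # [2, 3, 5, 6, 7, 8]
print(filter_then_number)  # [1, 2, 3, 4, 5, 6]
```

With the second numbering, 2020-01-04, 2020-01-05, and 2020-01-07 would appear as rows 1, 2, and 3 and be mistaken for a run of three, even though the 99-visitor entry on 2020-01-06 sits between them.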


# Part C

Using the subqueries from Parts A & B, write a query that finds every set of three consecutive high-attendance entries.

```sql
WITH row_numbers as (
    SELECT *, row_number() OVER (ORDER BY event_date) AS day_num
    FROM attendance
),

high_attendance AS (
    SELECT *
    FROM row_numbers
    WHERE visitors > 100
)
```

## Example answer

In [8]:
%%sql

WITH row_numbers as (
    SELECT *, row_number() OVER (ORDER BY event_date) AS day_num
    FROM attendance
),

high_attendance AS (
    SELECT *
    FROM row_numbers
    WHERE visitors > 100
)

SELECT 
    a.day_num AS day1, 
    b.day_num AS day2, 
    c.day_num AS day3 
FROM high_attendance a
JOIN high_attendance b
ON a.day_num = b.day_num - 1
JOIN high_attendance c
ON a.day_num = c.day_num - 2 

 * postgresql://fknight:***@localhost/postgres
2 rows affected.


day1,day2,day3
5,6,7
6,7,8
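
Each join narrows the candidates: the first keeps entries whose immediate successor is also a high-attendance entry, and the second keeps only those whose successor two rows later is as well. The sketch below runs both stages so the intermediate pairs are visible. It assumes Python's built-in `sqlite3` module (SQLite 3.25 or later) as a stand-in for PostgreSQL, with the table rebuilt from the rows shown in Part A; the `CTES` string is just a convenience for reusing the subqueries.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendance (event_date TEXT, visitors INTEGER)")
conn.executemany("INSERT INTO attendance VALUES (?, ?)", [
    ("2020-01-01", 10), ("2020-01-04", 109), ("2020-01-05", 150),
    ("2020-01-06", 99), ("2020-01-07", 145), ("2020-01-08", 1455),
    ("2020-01-11", 199), ("2020-01-12", 188),
])

# Subqueries from Parts A and B, shared by both queries below.
CTES = """
    WITH row_numbers AS (
        SELECT *, row_number() OVER (ORDER BY event_date) AS day_num
        FROM attendance
    ),
    high_attendance AS (
        SELECT * FROM row_numbers WHERE visitors > 100
    )
"""

# One self-join: pairs of adjacent high-attendance entries.
pairs = conn.execute(CTES + """
    SELECT a.day_num AS day1, b.day_num AS day2
    FROM high_attendance a
    JOIN high_attendance b ON a.day_num = b.day_num - 1
    ORDER BY day1
""").fetchall()

# Two self-joins: triples of adjacent high-attendance entries.
triples = conn.execute(CTES + """
    SELECT a.day_num AS day1, b.day_num AS day2, c.day_num AS day3
    FROM high_attendance a
    JOIN high_attendance b ON a.day_num = b.day_num - 1
    JOIN high_attendance c ON a.day_num = c.day_num - 2
    ORDER BY day1
""").fetchall()

print(pairs)    # [(2, 3), (5, 6), (6, 7), (7, 8)]
print(triples)  # [(5, 6, 7), (6, 7, 8)]
```

The pair (2, 3) drops out at the second join because row 4 has only 99 visitors and is therefore absent from `high_attendance`.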


# Part D

Using the subqueries from Parts A, B, & C, solve the original problem: return the date and visitor count of every entry that belongs to one of those sets of three.

```sql
WITH row_numbers as (
    SELECT 
        *, 
        row_number() 
            OVER (ORDER BY event_date) 
            AS day_num
    FROM attendance
),

high_attendance AS (
    SELECT *
    FROM row_numbers
    WHERE visitors > 100
),

three_in_a_row AS (
    SELECT 
        a.day_num AS day1, 
        b.day_num AS day2, 
        c.day_num AS day3 
    FROM high_attendance a
    JOIN high_attendance b
    ON a.day_num = b.day_num - 1
    JOIN high_attendance c
    ON a.day_num = c.day_num - 2 
)

```

## Example answer

In [7]:
%%sql

WITH row_numbers as (
    SELECT *, row_number() OVER (ORDER BY event_date) AS day_num
    FROM attendance
),

high_attendance AS (
    SELECT *
    FROM row_numbers
    WHERE visitors > 100
),

three_in_a_row AS (
    SELECT 
        a.day_num AS day1, 
        b.day_num AS day2, 
        c.day_num AS day3 
    FROM high_attendance a
    JOIN high_attendance b
    ON a.day_num = b.day_num - 1
    JOIN high_attendance c
    ON a.day_num = c.day_num - 2 
)

SELECT event_date, visitors
FROM row_numbers
WHERE day_num IN (SELECT day1 FROM three_in_a_row) 
OR day_num IN (SELECT day2 FROM three_in_a_row) 
OR day_num IN (SELECT day3 FROM three_in_a_row)

 * postgresql://fknight:***@localhost/postgres
4 rows affected.


event_date,visitors
2020-01-07,145
2020-01-08,1455
2020-01-11,199
2020-01-12,188


## The solution is given below

In [4]:
%%sql

-- create row numbers to get handle on consecutive days, 
-- since date column has some gaps

WITH t1 AS (
    SELECT *, row_number() OVER (ORDER BY event_date) AS day_num
    FROM attendance 
),

-- filter this to keep only days with > 100 visitors

t2 AS (
    SELECT *
    FROM t1
    WHERE visitors > 100 
),

-- self-join (inner) twice, on row-number offsets of 1 and 2

t3 AS (
    SELECT
        a.day_num AS day1,
        b.day_num AS day2,
        c.day_num AS day3
    FROM t2 a
    JOIN t2 b
    ON a.day_num = b.day_num - 1
    JOIN t2 c
    ON a.day_num = c.day_num - 2 
)

-- pull date and visitor count for consecutive days surfaced in previous table

SELECT event_date, visitors
FROM t1
WHERE day_num IN (SELECT day1 FROM t3) 
OR day_num IN (SELECT day2 FROM t3) 
OR day_num IN (SELECT day3 FROM t3)

 * postgresql://fknight:***@localhost/postgres
4 rows affected.


event_date,visitors
2020-01-07,145
2020-01-08,1455
2020-01-11,199
2020-01-12,188
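
For comparison, the same result can be reached without self-joins by using a second row number over the filtered rows (often called the "gaps and islands" technique). This is an alternative sketch, not the solution given in the sources above. Within a run of adjacent high-attendance entries, `day_num` and the second row number grow together, so their difference is constant and can serve as a run identifier. The names `run_id`, `run_sizes`, and `run_length` are hypothetical, and the code assumes Python's built-in `sqlite3` module (SQLite 3.25 or later) as a stand-in for PostgreSQL.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendance (event_date TEXT, visitors INTEGER)")
conn.executemany("INSERT INTO attendance VALUES (?, ?)", [
    ("2020-01-01", 10), ("2020-01-04", 109), ("2020-01-05", 150),
    ("2020-01-06", 99), ("2020-01-07", 145), ("2020-01-08", 1455),
    ("2020-01-11", 199), ("2020-01-12", 188),
])

result = conn.execute("""
    WITH row_numbers AS (
        SELECT *, row_number() OVER (ORDER BY event_date) AS day_num
        FROM attendance
    ),
    -- day_num minus a dense numbering of the filtered rows is constant
    -- within each run of adjacent high-attendance entries
    high_attendance AS (
        SELECT *,
               day_num - row_number() OVER (ORDER BY event_date) AS run_id
        FROM row_numbers
        WHERE visitors > 100
    ),
    -- attach the size of its run to every entry
    run_sizes AS (
        SELECT *, count(*) OVER (PARTITION BY run_id) AS run_length
        FROM high_attendance
    )
    SELECT event_date, visitors
    FROM run_sizes
    WHERE run_length >= 3
    ORDER BY event_date
""").fetchall()

for event_date, visitors in result:
    print(event_date, visitors)
```

On this data the filtered rows fall into two runs: rows 2 and 3 (length 2, discarded) and rows 5 through 8 (length 4, kept), which gives the same four entries as the self-join solution.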
