# Project Employees III
 
Table: Project

    +-------------+---------+
    | Column Name | Type    |
    +-------------+---------+
    | project_id  | int     |
    | employee_id | int     |
    +-------------+---------+
(project_id, employee_id) is the primary key (combination of columns with unique values) of this table.
employee_id is a foreign key (reference column) to Employee table.

Each row of this table indicates that the employee with employee_id is working on the project with project_id.
 

Table: Employee

    +------------------+---------+
    | Column Name      | Type    |
    +------------------+---------+
    | employee_id      | int     |
    | name             | varchar |
    | experience_years | int     |
    +------------------+---------+
employee_id is the primary key (column with unique values) of this table.
Each row of this table contains information about one employee.
 

Write a solution to report the most experienced employees in each project. In case of a tie, report all employees with the maximum number of experience years.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Project table:

    +-------------+-------------+
    | project_id  | employee_id |
    +-------------+-------------+
    | 1           | 1           |
    | 1           | 2           |
    | 1           | 3           |
    | 2           | 1           |
    | 2           | 4           |
    +-------------+-------------+
    
Employee table:

    +-------------+--------+------------------+
    | employee_id | name   | experience_years |
    +-------------+--------+------------------+
    | 1           | Khaled | 3                |
    | 2           | Ali    | 2                |
    | 3           | John   | 3                |
    | 4           | Doe    | 2                |
    +-------------+--------+------------------+
Output: 

    +-------------+---------------+
    | project_id  | employee_id   |
    +-------------+---------------+
    | 1           | 1             |
    | 1           | 3             |
    | 2           | 1             |
    +-------------+---------------+
Explanation: Both employees with id 1 and 3 have the most experience among the employees of the first project. For the second project, the employee with id 1 has the most experience.

In [None]:
SELECT p.project_id, p.employee_id
FROM Project p
JOIN Employee e ON p.employee_id = e.employee_id
WHERE (p.project_id, e.experience_years) IN (
    SELECT p2.project_id, MAX(e2.experience_years)
    FROM Project p2
    JOIN Employee e2 ON p2.employee_id = e2.employee_id
    GROUP BY p2.project_id
);


Explanation:
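- The subquery computes the maximum experience_years among each project's employees. The outer query then keeps every (project_id, employee_id) pair whose (project_id, experience_years) matches one of those per-project maxima. It does this with the multi-column (row-value) form of IN, which keeps the query concise.

- Complexity: Both the subquery and the outer query join Project to Employee on employee_id, which is Employee's primary key, so each join costs one indexed lookup per Project row. The subquery returns only one row per project. MySQL can typically evaluate it once (for example by materializing it) rather than once per outer row, so the total work grows roughly linearly with the number of Project rows, plus index-lookup cost.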

- Edge Cases: We handle cases where multiple employees have the same maximum experience years in a project by including all matching rows with the WHERE (project_id, experience_years) IN condition. This approach guarantees that if there are ties, they will all be returned.
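To check the query against Example 1, here is a minimal sketch that runs it in Python's built-in sqlite3 module, with SQLite standing in for MySQL. This is an assumption: it needs SQLite 3.15+ for row-value IN and 3.25+ for window functions. The sketch also includes an alternative formulation that is not from the original. That version compares each row with MAX(experience_years) OVER (PARTITION BY project_id) and should return the same rows, ties included.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); both queries use only
# features the two engines share.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project (project_id INT, employee_id INT,
                      PRIMARY KEY (project_id, employee_id));
CREATE TABLE Employee (employee_id INT PRIMARY KEY, name TEXT,
                       experience_years INT);
INSERT INTO Project VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 4);
INSERT INTO Employee VALUES (1, 'Khaled', 3), (2, 'Ali', 2),
                            (3, 'John', 3), (4, 'Doe', 2);
""")

# The query from the notebook cell (row-value IN needs SQLite 3.15+).
IN_QUERY = """
SELECT p.project_id, p.employee_id
FROM Project p
JOIN Employee e ON p.employee_id = e.employee_id
WHERE (p.project_id, e.experience_years) IN (
    SELECT p2.project_id, MAX(e2.experience_years)
    FROM Project p2
    JOIN Employee e2 ON p2.employee_id = e2.employee_id
    GROUP BY p2.project_id
)
"""

# Alternative sketch (not from the original): compare each row with the
# per-project maximum computed by a window function (SQLite 3.25+).
WINDOW_QUERY = """
SELECT project_id, employee_id
FROM (
    SELECT p.project_id, p.employee_id, e.experience_years,
           MAX(e.experience_years) OVER (PARTITION BY p.project_id) AS max_years
    FROM Project p
    JOIN Employee e ON p.employee_id = e.employee_id
) AS ranked
WHERE experience_years = max_years
"""

# Any output order is allowed, so sort before comparing.
rows = sorted(conn.execute(IN_QUERY).fetchall())
window_rows = sorted(conn.execute(WINDOW_QUERY).fetchall())
print(rows)  # [(1, 1), (1, 3), (2, 1)]
```

Both employees 1 and 3 survive for project 1 because they tie at 3 years, which is the tie behavior the problem asks for.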

# Project Employees II
Table: Project

    +-------------+---------+
    | Column Name | Type    |
    +-------------+---------+
    | project_id  | int     |
    | employee_id | int     |
    +-------------+---------+

(project_id, employee_id) is the primary key (combination of columns with unique values) of this table.
employee_id is a foreign key (reference column) to Employee table.
Each row of this table indicates that the employee with employee_id is working on the project with project_id.
 

Table: Employee

    +------------------+---------+
    | Column Name      | Type    |
    +------------------+---------+
    | employee_id      | int     |
    | name             | varchar |
    | experience_years | int     |
    +------------------+---------+
employee_id is the primary key (column with unique values) of this table.
Each row of this table contains information about one employee.
 

Write a solution to report all the projects that have the most employees.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Project table:

    +-------------+-------------+
    | project_id  | employee_id |
    +-------------+-------------+
    | 1           | 1           |
    | 1           | 2           |
    | 1           | 3           |
    | 2           | 1           |
    | 2           | 4           |
    +-------------+-------------+
Employee table:

    +-------------+--------+------------------+
    | employee_id | name   | experience_years |
    +-------------+--------+------------------+
    | 1           | Khaled | 3                |
    | 2           | Ali    | 2                |
    | 3           | John   | 1                |
    | 4           | Doe    | 2                |
    +-------------+--------+------------------+
Output: 

    +-------------+
    | project_id  |
    +-------------+
    | 1           |
    +-------------+
Explanation: The first project has 3 employees while the second one has 2.

In [None]:
SELECT project_id
FROM Project
GROUP BY project_id
HAVING COUNT(employee_id) = (
    SELECT MAX(employee_count)
    FROM (
        SELECT project_id, COUNT(employee_id) AS employee_count
        FROM Project
        GROUP BY project_id
    ) AS ProjectCounts
);


Complexity Analysis

Time Complexity:

    Inner Subquery: The inner subquery scans all n rows of the Project table once to group and count them. This is O(n) with hash-based grouping, or O(n log n) if the engine sorts instead.

    Outer Query: The outer query groups the same n rows again and compares each group's count with the maximum. The scalar subquery is uncorrelated, so it only needs to be evaluated once, not once per group.

    Overall Complexity: Approximately O(n). The primary key (project_id, employee_id) already provides an index whose leading column is project_id, and the engine can use that index for both GROUP BY passes.

Space Complexity:

    Temporary Space: The space complexity is O(m), where m is the number of unique project_ids, as it temporarily stores the ProjectCounts result.

Edge Cases

    - All Projects Have the Same Employee Count: If all projects have the same number of employees, the query returns all project_ids, as they all meet the maximum count condition.
    - Single Project: If there’s only one project, the query simply returns that project’s ID, as it inherently has the maximum count.
    - Projects with Zero Employees: A project with no employees has no rows in the Project table at all. It therefore never forms a group and cannot appear in the result.
    - Ties for Maximum Count: If multiple projects share the maximum employee count, they’ll all be included in the result, as each meets the maximum threshold.
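A minimal sketch for checking the query, assuming Python's built-in sqlite3 module with SQLite standing in for MySQL. It loads the Example 1 Project rows and runs the query unchanged. It then adds one hypothetical row so that two projects tie for the maximum, which exercises the tie case. Only the Project table is needed, because the query never touches Employee.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); only the Project table is needed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project (project_id INT, employee_id INT,
                      PRIMARY KEY (project_id, employee_id));
INSERT INTO Project VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 4);
""")

QUERY = """
SELECT project_id
FROM Project
GROUP BY project_id
HAVING COUNT(employee_id) = (
    SELECT MAX(employee_count)
    FROM (
        SELECT project_id, COUNT(employee_id) AS employee_count
        FROM Project
        GROUP BY project_id
    ) AS ProjectCounts
)
"""

example_result = sorted(conn.execute(QUERY).fetchall())
print(example_result)  # [(1,)]

# Hypothetical extra row: employee 5 joins project 2, tying it with project 1.
conn.execute("INSERT INTO Project VALUES (2, 5)")
tie_result = sorted(conn.execute(QUERY).fetchall())
print(tie_result)  # [(1,), (2,)]
```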


# Article Views I
 
Table: Views

    +---------------+---------+
    | Column Name   | Type    |
    +---------------+---------+
    | article_id    | int     |
    | author_id     | int     |
    | viewer_id     | int     |
    | view_date     | date    |
    +---------------+---------+
There is no primary key (column with unique values) for this table; the table may have duplicate rows.
Each row of this table indicates that some viewer viewed an article (written by some author) on some date. 
Note that equal author_id and viewer_id indicate the same person.
 

Write a solution to find all the authors that viewed at least one of their own articles.

Return the result table sorted by id in ascending order.

The result format is in the following example.

 

Example 1:

Input: 
Views table:
    
    +------------+-----------+-----------+------------+
    | article_id | author_id | viewer_id | view_date  |
    +------------+-----------+-----------+------------+
    | 1          | 3         | 5         | 2019-08-01 |
    | 1          | 3         | 6         | 2019-08-02 |
    | 2          | 7         | 7         | 2019-08-01 |
    | 2          | 7         | 6         | 2019-08-02 |
    | 4          | 7         | 1         | 2019-07-22 |
    | 3          | 4         | 4         | 2019-07-21 |
    | 3          | 4         | 4         | 2019-07-21 |
    +------------+-----------+-----------+------------+

Output: 

    +------+
    | id   |
    +------+
    | 4    |
    | 7    |
    +------+

In [None]:
SELECT DISTINCT author_id AS id
FROM Views
WHERE author_id = viewer_id
ORDER BY id ASC;


Explanation

Problem Breakdown:

    We need to identify authors who have viewed at least one of their own articles. This means we’re looking for records where the author_id is the same as the viewer_id.

    The result should return only unique author_ids in ascending order.

Query Explanation:

    SELECT DISTINCT author_id AS id: We use SELECT DISTINCT to ensure that each author appears only once in the result, regardless of how many times they viewed their own articles. We rename author_id as id to match the required output format.
    
    FROM Views: We retrieve data from the Views table, which contains all article view records, including article_id, author_id, viewer_id, and view_date.
    
    WHERE author_id = viewer_id: This condition filters the rows to include only those where the author_id matches the viewer_id, meaning the author viewed their own article.
    
    ORDER BY id ASC: Finally, we sort the result by id (the author’s ID) in ascending order to meet the problem’s output requirements.
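To see DISTINCT and the alias-based ORDER BY at work, here is a minimal sketch that runs the query on the Example 1 rows with Python's sqlite3 module. It assumes SQLite substitutes for MySQL and that view_date is stored as ISO-formatted text.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); view_date is stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Views (article_id INT, author_id INT, viewer_id INT, view_date TEXT);
INSERT INTO Views VALUES
    (1, 3, 5, '2019-08-01'), (1, 3, 6, '2019-08-02'),
    (2, 7, 7, '2019-08-01'), (2, 7, 6, '2019-08-02'),
    (4, 7, 1, '2019-07-22'), (3, 4, 4, '2019-07-21'),
    (3, 4, 4, '2019-07-21');
""")

QUERY = """
SELECT DISTINCT author_id AS id
FROM Views
WHERE author_id = viewer_id
ORDER BY id ASC
"""

# Author 4 matches twice (a duplicate row), but DISTINCT keeps one copy.
ids = conn.execute(QUERY).fetchall()
print(ids)  # [(4,), (7,)]
```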

Complexity Analysis

Time Complexity:

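    The query scans all n rows of the Views table once to apply author_id = viewer_id. DISTINCT and ORDER BY then deduplicate and sort the matching rows. That step costs at most O(n log n) if the engine sorts every match, or O(n + k log k) if it deduplicates with a hash first, where k is the number of distinct matching authors.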

Space Complexity:

    The space complexity is O(k), where k is the number of distinct author_ids that match the condition, as we store the unique results temporarily.

Edge Cases

    No Matching Rows: If there are no rows where author_id = viewer_id, the result will be an empty table.

    Multiple Views by the Same Author: If an author views their article multiple times, DISTINCT ensures they appear only once in the result.

    Single Record Table: If the Views table has only one row and it meets the author_id = viewer_id condition, the query will return just that author’s ID.

# Consecutive Numbers
 
Table: Logs

    +-------------+---------+
    | Column Name | Type    |
    +-------------+---------+
    | id          | int     |
    | num         | varchar |
    +-------------+---------+
In SQL, id is the primary key for this table.
id is an autoincrement column starting from 1.
 

Find all numbers that appear at least three times consecutively.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Logs table:

    +----+-----+
    | id | num |
    +----+-----+
    | 1  | 1   |
    | 2  | 1   |
    | 3  | 1   |
    | 4  | 2   |
    | 5  | 1   |
    | 6  | 2   |
    | 7  | 2   |
    +----+-----+
Output: 

    +-----------------+
    | ConsecutiveNums |
    +-----------------+
    | 1               |
    +-----------------+
Explanation: 1 is the only number that appears consecutively for at least three times.

Steps and Explanation:
    
    Identify Consecutive Repetitions: We need to check whether each number (num) in the Logs table appears consecutively three or more times. Consecutive means that each appearance follows immediately after the previous one in the order of id.

Create Conditions for Consecutive Rows: For any given row with id = i, we’ll check:

    - if num at i is the same as num at i+1
    - and num at i+1 is the same as num at i+2.
    - If both of these conditions are true, then we have identified a number that appears consecutively three times starting from id = i.

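SQL Query: A three-way self-join of Logs expresses these conditions directly. Each alias stands for one of the three rows, and the WHERE clause enforces two things: the ids are adjacent and the numbers are equal.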



In [None]:
SELECT DISTINCT l1.num AS ConsecutiveNums
FROM Logs l1, Logs l2, Logs l3
WHERE l1.num = l2.num 
  AND l2.num = l3.num 
  AND l1.id = l2.id - 1 
  AND l2.id = l3.id - 1;


Explanation of the SQL Query:
    
    - l1, l2, and l3 are aliases for the Logs table, representing three consecutive rows.
    
    - The WHERE clause checks:
        - l1.num = l2.num and l2.num = l3.num: This ensures that the same number appears in three consecutive rows.
        - l1.id = l2.id - 1 and l2.id = l3.id - 1: This ensures the rows are consecutive based on the id field.
    
    - SELECT DISTINCT l1.num returns the unique numbers that meet the criteria.
    
Edge Cases:

    Fewer than Three Rows: If the table has fewer than three rows, we can’t have any number appearing three times consecutively.

    Non-Consecutive Repetitions: If a number appears multiple times but not in consecutive rows (e.g., with different numbers in between), it should not be included in the result.

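    Multiple Sets of Consecutive Repetitions: A number may appear consecutively in two different sequences (e.g., three times in one part and three times in another part), or in a run of four or more that yields several matching triples. Either way, it is returned only once because of DISTINCT.

    Gaps in id: The query matches rows by id arithmetic (l1.id = l2.id - 1), so it assumes the ids have no gaps. If a row were deleted, the rows on either side of the gap would not be treated as consecutive, even though they are adjacent in id order.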
    
Complexity Analysis:

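    Time Complexity: For each row l1, the engine looks up the rows with id + 1 and id + 2. With B-tree lookups on the primary key id this is roughly O(n log n), or about O(n) with hash joins. Without index or hash access, a naive nested-loop evaluation of the three-way join could degrade toward O(n^3).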

    Space Complexity: O(m) where m is the number of unique numbers that appear consecutively three times, since we're storing these results.
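The self-join matches rows by id arithmetic, so it relies on the ids having no gaps. The sketch below assumes Python's sqlite3 module as a stand-in for MySQL. It runs the notebook query on Example 1 and compares it with an alternative that is not taken from the original: LAG and LEAD window functions (SQLite 3.25+, MySQL 8.0+) that look at neighboring rows in id order. The two agree on the example but diverge on a hypothetical table whose ids skip a value. Which result is "right" depends on whether rows that are adjacent in id order should count as consecutive even when an id is missing.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); num is TEXT, mirroring varchar.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Logs (id INTEGER PRIMARY KEY, num TEXT);
INSERT INTO Logs VALUES (1, '1'), (2, '1'), (3, '1'), (4, '2'),
                        (5, '1'), (6, '2'), (7, '2');
""")

# The self-join from the notebook cell.
SELF_JOIN = """
SELECT DISTINCT l1.num AS ConsecutiveNums
FROM Logs l1, Logs l2, Logs l3
WHERE l1.num = l2.num
  AND l2.num = l3.num
  AND l1.id = l2.id - 1
  AND l2.id = l3.id - 1
"""

# Alternative sketch (not from the original): compare each row with its
# neighbors in id order; LAG/LEAD return NULL at the table edges.
WINDOWED = """
SELECT DISTINCT num AS ConsecutiveNums
FROM (
    SELECT num,
           LAG(num)  OVER (ORDER BY id) AS prev_num,
           LEAD(num) OVER (ORDER BY id) AS next_num
    FROM Logs
) AS neighbors
WHERE num = prev_num AND num = next_num
"""

example_self_join = conn.execute(SELF_JOIN).fetchall()  # [('1',)]
example_windowed = conn.execute(WINDOWED).fetchall()    # [('1',)]

# Hypothetical data with a gap in id: three '5' rows, but id 3 is missing.
conn.executescript("""
DELETE FROM Logs;
INSERT INTO Logs VALUES (1, '5'), (2, '5'), (4, '5');
""")
gap_self_join = conn.execute(SELF_JOIN).fetchall()  # []: ids 2 and 4 are not 1 apart
gap_windowed = conn.execute(WINDOWED).fetchall()    # [('5',)]
```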


# Get Highest Answer Rate Question 

Table: SurveyLog

    +-------------+------+
    | Column Name | Type |
    +-------------+------+
    | id          | int  |
    | action      | ENUM |
    | question_id | int  |
    | answer_id   | int  |
    | q_num       | int  |
    | timestamp   | int  |
    +-------------+------+
- This table may contain duplicate rows.
- action is an ENUM (category) of the type: "show", "answer", or "skip".
- Each row of this table indicates the user with ID = id has taken an action with the question question_id at time timestamp.
- If the action taken by the user is "answer", answer_id will contain the id of that answer, otherwise, it will be null.
- q_num is the numeral order of the question in the current session.
 

The answer rate for a question is the number of times a user answered the question divided by the number of times the question was shown to a user.

Write a solution to report the question that has the highest answer rate. If multiple questions have the same maximum answer rate, report the question with the smallest question_id.

The result format is in the following example.

 

Example 1:

Input: 
SurveyLog table:

    +----+--------+-------------+-----------+-------+-----------+
    | id | action | question_id | answer_id | q_num | timestamp |
    +----+--------+-------------+-----------+-------+-----------+
    | 5  | show   | 285         | null      | 1     | 123       |
    | 5  | answer | 285         | 124124    | 1     | 124       |
    | 5  | show   | 369         | null      | 2     | 125       |
    | 5  | skip   | 369         | null      | 2     | 126       |
    +----+--------+-------------+-----------+-------+-----------+
Output: 

    +------------+
    | survey_log |
    +------------+
    | 285        |
    +------------+
Explanation: 

    Question 285 was shown 1 time and answered 1 time. The answer rate of question 285 is 1.0.
    Question 369 was shown 1 time and was not answered. The answer rate of question 369 is 0.0.
    Question 285 has the highest answer rate.
    
    
Steps and Explanation:

Filter the show and answer Actions:

    We need to count how many times each question was shown and how many times it was answered.
    In the SurveyLog table, a show action means the question was displayed to a user, and an answer action means the user provided an answer to that question.
    
Aggregate Counts by Question:

    We can use conditional aggregation to count show and answer actions for each question_id.
    
Calculate the Answer Rate:

    The answer rate for each question is calculated as the count of answers divided by the count of shows.
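    A question that was shown but never answered simply gets a rate of 0. Division by zero only arises for a question that was answered but never shown. MySQL returns NULL for division by zero in a SELECT, and NULL sorts after every non-NULL value under ORDER BY ... DESC, so such a question is never picked over one with a real rate.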
    
Select the Question with the Highest Answer Rate:

    We need to find the question with the maximum answer rate. If multiple questions have the same answer rate, we return the question with the smallest question_id.

In [None]:
SELECT question_id as survey_log
FROM (
    SELECT 
        question_id,
        SUM(action = 'answer') / SUM(action = 'show') AS answer_rate
    FROM SurveyLog
    GROUP BY question_id
) AS AnswerRates
ORDER BY answer_rate DESC, question_id ASC
LIMIT 1;


Explanation of the SQL Query:

Inner Query (AnswerRates):

    SUM(action = 'answer') counts how many times each question was answered. This works because MySQL treats TRUE as 1 and FALSE as 0.
    SUM(action = 'show') counts how many times each question was shown.
    The answer_rate is calculated by dividing the answer count by the show count for each question_id.
    We GROUP BY question_id to calculate these values per question.
    
Outer Query:

    ORDER BY answer_rate DESC, question_id ASC: This sorts the results by answer rate in descending order so that the highest rate is at the top. If there are ties, it uses question_id in ascending order to select the smallest question_id.
    LIMIT 1 ensures we return only the question with the highest answer rate and, in the case of ties, the smallest question_id.
    
Edge Cases:

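    Questions Never Shown: A question that was answered but never shown has SUM(action = 'show') = 0, so its answer_rate is NULL (MySQL returns NULL for division by zero). Under ORDER BY answer_rate DESC, NULL sorts last, so such a question is never chosen over one with a real rate.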
    No Answers: If a question was shown but never answered, the answer rate will be 0. This question may be included if it has the highest rate in cases where no questions are answered.
    Multiple Questions with the Same Answer Rate: When multiple questions have the same answer rate, this query will pick the question with the smallest question_id.
    
Complexity Analysis:

    Time Complexity: O(n) to scan and group the n rows of SurveyLog (assuming hash-based grouping), plus O(m log m) to sort the m per-question rates for ORDER BY. Because only one row is needed, an engine may reduce ORDER BY ... LIMIT 1 to a single O(m) pass, so the overall cost is roughly O(n).
    Space Complexity: O(m), where m is the number of unique question_ids, as we store these results in temporary memory for sorting and filtering.
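A minimal sketch that runs the query with Python's sqlite3 module. It assumes SQLite stands in for MySQL and that action is plain TEXT rather than an ENUM. One adjustment is needed: MySQL's / always produces a decimal result, while SQLite divides two integers with integer division, so the sketch multiplies by 1.0. The hypothetical extra rows check two behaviors. First, a never-shown question gets a NULL rate and sorts last. Second, a tie on the rate is broken by the smaller question_id.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); action is TEXT instead of ENUM.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SurveyLog (id INT, action TEXT, question_id INT,
                        answer_id INT, q_num INT, timestamp INT);
INSERT INTO SurveyLog VALUES
    (5, 'show',   285, NULL,   1, 123),
    (5, 'answer', 285, 124124, 1, 124),
    (5, 'show',   369, NULL,   2, 125),
    (5, 'skip',   369, NULL,   2, 126);
""")

# Same shape as the MySQL query, multiplied by 1.0 because SQLite,
# unlike MySQL's "/", performs integer division on two integers.
QUERY = """
SELECT question_id AS survey_log
FROM (
    SELECT question_id,
           1.0 * SUM(action = 'answer') / SUM(action = 'show') AS answer_rate
    FROM SurveyLog
    GROUP BY question_id
) AS AnswerRates
ORDER BY answer_rate DESC, question_id ASC
LIMIT 1
"""

example_best = conn.execute(QUERY).fetchall()  # [(285,)]

# Hypothetical: question 100 is answered but never shown, so its rate is
# 1 / 0, which SQLite (like MySQL) evaluates to NULL; NULL sorts last under DESC.
conn.execute("INSERT INTO SurveyLog VALUES (7, 'answer', 100, 555, 1, 200)")
rate_100 = conn.execute(
    "SELECT 1.0 * SUM(action = 'answer') / SUM(action = 'show') "
    "FROM SurveyLog WHERE question_id = 100"
).fetchone()[0]
unshown_best = conn.execute(QUERY).fetchall()  # [(285,)]

# Hypothetical tie: question 50 is shown once and answered once (rate 1.0),
# so the smaller question_id wins the tie with 285.
conn.execute("INSERT INTO SurveyLog VALUES (8, 'show', 50, NULL, 1, 300)")
conn.execute("INSERT INTO SurveyLog VALUES (8, 'answer', 50, 777, 1, 301)")
tie_best = conn.execute(QUERY).fetchall()  # [(50,)]
```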

# Apples & Oranges
 
Table: Sales

    +---------------+---------+
    | Column Name   | Type    |
    +---------------+---------+
    | sale_date     | date    |
    | fruit         | enum    | 
    | sold_num      | int     | 
    +---------------+---------+
(sale_date, fruit) is the primary key (combination of columns with unique values) of this table.
This table contains the sales of "apples" and "oranges" sold each day.
 

Write a solution to report the difference between the number of apples and oranges sold each day.

Return the result table ordered by sale_date.

The result format is in the following example.

 

Example 1:

Input: 
Sales table:

    +------------+------------+-------------+
    | sale_date  | fruit      | sold_num    |
    +------------+------------+-------------+
    | 2020-05-01 | apples     | 10          |
    | 2020-05-01 | oranges    | 8           |
    | 2020-05-02 | apples     | 15          |
    | 2020-05-02 | oranges    | 15          |
    | 2020-05-03 | apples     | 20          |
    | 2020-05-03 | oranges    | 0           |
    | 2020-05-04 | apples     | 15          |
    | 2020-05-04 | oranges    | 16          |
    +------------+------------+-------------+
Output: 

    +------------+--------------+
    | sale_date  | diff         |
    +------------+--------------+
    | 2020-05-01 | 2            |
    | 2020-05-02 | 0            |
    | 2020-05-03 | 20           |
    | 2020-05-04 | -1           |
    +------------+--------------+
    
Explanation: 

    Day 2020-05-01, 10 apples and 8 oranges were sold (Difference 10 - 8 = 2).
    Day 2020-05-02, 15 apples and 15 oranges were sold (Difference 15 - 15 = 0).
    Day 2020-05-03, 20 apples and 0 oranges were sold (Difference 20 - 0 = 20).
    Day 2020-05-04, 15 apples and 16 oranges were sold (Difference 15 - 16 = -1).

Steps and Explanation:

    Aggregate Sales by Date: We'll need to sum the sold_num for apples and oranges separately for each sale_date.
    Calculate the Difference: Once we have the total sold numbers for both fruits for each date, we can calculate the difference by subtracting the total number of oranges sold from the total number of apples sold.
    Order the Results: Finally, we will order the results by sale_date to ensure the output is in the correct chronological order.
    
    
Explanation of the SQL Query:

SUM with CASE:

    SUM(CASE WHEN fruit = 'apples' THEN sold_num ELSE 0 END): This sums sold_num over apple rows only; every other row contributes 0.
    SUM(CASE WHEN fruit = 'oranges' THEN sold_num ELSE 0 END): Similarly, this sums sold_num over orange rows only.

Calculating the Difference:

    The difference is calculated by subtracting the total number of oranges sold from the total number of apples sold for each date.

GROUP BY:

    GROUP BY sale_date: This groups the rows by date, so the query returns one row per date.

ORDER BY:

    ORDER BY sale_date: This sorts the final results chronologically by sale date.
    
Edge Cases:

    Dates with Only One Fruit: If a date has rows for only apples or only oranges, the missing fruit's SUM(CASE ...) is 0. The diff then equals the apples total, or the negated oranges total.

    No Sales Data for a Date: GROUP BY only forms groups from rows that exist, so a date with no rows in Sales does not appear in the output at all.

    Handling Null Values: ELSE 0 only covers rows of the other fruit. Because (sale_date, fruit) is the primary key, each date has at most one row per fruit. If that row's sold_num were NULL, the CASE would yield NULL, SUM would return NULL, and diff would be NULL. Wrapping the column as COALESCE(sold_num, 0) treats a missing count as 0, if that is the desired behavior.

Complexity Analysis:

    Time Complexity: O(n), where n is the number of rows in the Sales table. The query scans through the entire table once to aggregate data.
    Space Complexity: O(d), where d is the number of unique sale_dates, since we are storing results for each distinct date in the output.

In [None]:
SELECT sale_date,
       SUM(CASE WHEN fruit = 'apples' THEN sold_num ELSE 0 END) -
       SUM(CASE WHEN fruit = 'oranges' THEN sold_num ELSE 0 END) AS diff
FROM Sales
GROUP BY sale_date
ORDER BY sale_date;
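A minimal sketch that runs the query on Example 1 with Python's sqlite3 module. It assumes SQLite stands in for MySQL and that dates are stored as ISO text. Two hypothetical dates are added afterwards. One has only oranges. The other has a NULL apple count, which shows that ELSE 0 does not protect against a NULL sold_num, while COALESCE(sold_num, 0) does.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); dates are ISO text, fruit is TEXT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (sale_date TEXT, fruit TEXT, sold_num INT,
                    PRIMARY KEY (sale_date, fruit));
INSERT INTO Sales VALUES
    ('2020-05-01', 'apples', 10), ('2020-05-01', 'oranges', 8),
    ('2020-05-02', 'apples', 15), ('2020-05-02', 'oranges', 15),
    ('2020-05-03', 'apples', 20), ('2020-05-03', 'oranges', 0),
    ('2020-05-04', 'apples', 15), ('2020-05-04', 'oranges', 16);
""")

QUERY = """
SELECT sale_date,
       SUM(CASE WHEN fruit = 'apples' THEN sold_num ELSE 0 END) -
       SUM(CASE WHEN fruit = 'oranges' THEN sold_num ELSE 0 END) AS diff
FROM Sales
GROUP BY sale_date
ORDER BY sale_date
"""

example_rows = conn.execute(QUERY).fetchall()
# [('2020-05-01', 2), ('2020-05-02', 0), ('2020-05-03', 20), ('2020-05-04', -1)]

# Hypothetical extra dates: one with only oranges, one whose apple count is NULL.
conn.execute("INSERT INTO Sales VALUES ('2020-05-05', 'oranges', 4)")
conn.execute("INSERT INTO Sales VALUES ('2020-05-06', 'apples', NULL)")
extra = dict(conn.execute(QUERY).fetchall())
# extra['2020-05-05'] == -4 (missing apples count as 0)
# extra['2020-05-06'] is None (SUM over a lone NULL is NULL)

# Variant with COALESCE, for when a NULL count should be treated as 0.
COALESCED = QUERY.replace("THEN sold_num", "THEN COALESCE(sold_num, 0)")
coalesced = dict(conn.execute(COALESCED).fetchall())
# coalesced['2020-05-06'] == 0
```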


# Winning Candidate
 
Table: Candidate

    +-------------+----------+
    | Column Name | Type     |
    +-------------+----------+
    | id          | int      |
    | name        | varchar  |
    +-------------+----------+
id is the column with unique values for this table.

Each row of this table contains information about the id and the name of a candidate.
 

Table: Vote

    +-------------+------+
    | Column Name | Type |
    +-------------+------+
    | id          | int  |
    | candidateId | int  |
    +-------------+------+
id is an auto-increment primary key (column with unique values).
candidateId is a foreign key (reference column) to id from the Candidate table.

Each row of this table determines the candidate who got the ith vote in the elections.
 

Write a solution to report the name of the winning candidate (i.e., the candidate who got the largest number of votes).

The test cases are generated so that exactly one candidate wins the elections.

The result format is in the following example.

 

Example 1:

Input: 
Candidate table:

    +----+------+
    | id | name |
    +----+------+
    | 1  | A    |
    | 2  | B    |
    | 3  | C    |
    | 4  | D    |
    | 5  | E    |
    +----+------+
Vote table:

    +----+-------------+
    | id | candidateId |
    +----+-------------+
    | 1  | 2           |
    | 2  | 4           |
    | 3  | 3           |
    | 4  | 2           |
    | 5  | 5           |
    +----+-------------+
    
Output: 

    +------+
    | name |
    +------+
    | B    |
    +------+

Explanation: 
    Candidate B has 2 votes. Candidates C, D, and E have 1 vote each.
    The winner is candidate B.
    


In [None]:
SELECT c.name
FROM Candidate c
JOIN (
    SELECT candidateId, COUNT(*) AS vote_count
    FROM Vote
    GROUP BY candidateId
) v ON c.id = v.candidateId
ORDER BY v.vote_count DESC
LIMIT 1;

Steps:

    Count the Votes: The inner query groups the Vote table by candidateId and counts each candidate's votes. The result is then joined with the Candidate table on c.id = v.candidateId to attach each candidate's name.
    Determine the Winner: ORDER BY v.vote_count DESC puts the candidate with the most votes first, and LIMIT 1 keeps only that row. Since the problem guarantees exactly one winner, ties need no special handling. If a tie did occur, LIMIT 1 would return just one of the tied candidates, with no guarantee which.
    Return the Name of the Winning Candidate: Finally, we will return the name of the winning candidate.
    
Edge Cases:

    Exactly One Candidate Wins: The problem specifies that there will always be exactly one winner, so we do not need to account for ties or situations where no votes are cast.
    Candidates with No Votes: Candidates who have not received any votes will not appear in the results from the inner query, so they will not affect the final output.
    No Votes Cast: Although not applicable per the problem's constraints, if there were no votes, the query would return an empty result.
    
Complexity Analysis:

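    Time Complexity: Roughly O(n + k log k). Here n is the number of rows in the Vote table, and k is the number of distinct candidates who received votes (at most m, the number of rows in Candidate). The inner query scans and groups all votes once. Each grouped row is then matched to its candidate through the primary key id, and the k joined rows are sorted by vote_count before LIMIT 1.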
    Space Complexity: O(k), where k is the number of unique candidates. The inner query results need to be stored temporarily before joining with the Candidate table.
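A minimal sketch that runs the query on Example 1 with Python's sqlite3 module, assuming SQLite stands in for MySQL. It also inspects the inner query's per-candidate counts to show where B's two votes come from.

```python
import sqlite3

# SQLite stands in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Candidate (id INT PRIMARY KEY, name TEXT);
CREATE TABLE Vote (id INTEGER PRIMARY KEY, candidateId INT);
INSERT INTO Candidate VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E');
INSERT INTO Vote VALUES (1, 2), (2, 4), (3, 3), (4, 2), (5, 5);
""")

QUERY = """
SELECT c.name
FROM Candidate c
JOIN (
    SELECT candidateId, COUNT(*) AS vote_count
    FROM Vote
    GROUP BY candidateId
) v ON c.id = v.candidateId
ORDER BY v.vote_count DESC
LIMIT 1
"""

# Per-candidate counts from the inner query, for inspection.
counts = dict(conn.execute(
    "SELECT candidateId, COUNT(*) FROM Vote GROUP BY candidateId"
).fetchall())
winner = conn.execute(QUERY).fetchall()
print(winner)  # [('B',)]
```

Candidate A has no votes, so it never appears in the inner query's output and cannot affect the join.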

# Report Contiguous Dates
 
Table: Failed

    +--------------+---------+
    | Column Name  | Type    |
    +--------------+---------+
    | fail_date    | date    |
    +--------------+---------+
    
fail_date is the primary key (column with unique values) for this table.

This table contains the days of failed tasks.
 

Table: Succeeded

    +--------------+---------+
    | Column Name  | Type    |
    +--------------+---------+
    | success_date | date    |
    +--------------+---------+
success_date is the primary key (column with unique values) for this table.

This table contains the days of succeeded tasks.
 

A system is running one task every day. Every task is independent of the previous tasks. The tasks can fail or succeed.

Write a solution to report the period_state for each continuous interval of days in the period from 2019-01-01 to 2019-12-31.

period_state is 'failed' if tasks in this interval failed or 'succeeded' if tasks in this interval succeeded. 
Intervals of days are reported as start_date and end_date.

Return the result table ordered by start_date.

The result format is in the following example.

 

Example 1:

Input: 
Failed table:

    +-------------------+
    | fail_date         |
    +-------------------+
    | 2018-12-28        |
    | 2018-12-29        |
    | 2019-01-04        |
    | 2019-01-05        |
    +-------------------+
    
Succeeded table:

    +-------------------+
    | success_date      |
    +-------------------+
    | 2018-12-30        |
    | 2018-12-31        |
    | 2019-01-01        |
    | 2019-01-02        |
    | 2019-01-03        |
    | 2019-01-06        |
    +-------------------+
    
Output: 

    +--------------+--------------+--------------+
    | period_state | start_date   | end_date     |
    +--------------+--------------+--------------+
    | succeeded    | 2019-01-01   | 2019-01-03   |
    | failed       | 2019-01-04   | 2019-01-05   |
    | succeeded    | 2019-01-06   | 2019-01-06   |
    +--------------+--------------+--------------+
    
Explanation: 

    The report ignored the system state in 2018 as we care about the system in the period 2019-01-01 to 2019-12-31.
    From 2019-01-01 to 2019-01-03 all tasks succeeded and the system state was "succeeded".
    From 2019-01-04 to 2019-01-05 all tasks failed and the system state was "failed".
    From 2019-01-06 to 2019-01-06 all tasks succeeded and the system state was "succeeded".

In [None]:
SELECT stats AS period_state, MIN(day) AS start_date, MAX(day) AS end_date
FROM (
    SELECT 
        day, 
        RANK() OVER (ORDER BY day) AS overall_ranking, 
        stats, 
        rk, 
        (RANK() OVER (ORDER BY day) - rk) AS inv
    FROM (
        SELECT fail_date AS day, 'failed' AS stats, RANK() OVER (ORDER BY fail_date) AS rk
        FROM Failed
        WHERE fail_date BETWEEN '2019-01-01' AND '2019-12-31'
        UNION 
        SELECT success_date AS day, 'succeeded' AS stats, RANK() OVER (ORDER BY success_date) AS rk
        FROM Succeeded
        WHERE success_date BETWEEN '2019-01-01' AND '2019-12-31'
    ) t
) c
GROUP BY inv, stats
ORDER BY start_date;


Breakdown of the SQL Query

Inner Query (Union of Failures and Successes):

    Combines dates from the Failed and Succeeded tables within the specified date range (2019).
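    Uses UNION to merge both tables into one result set with a common structure. UNION ALL would give the same result without the duplicate-elimination step, since a 'failed' row and a 'succeeded' row can never be identical.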
    Assigns ranks to each date within their respective states.

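Ranking and Calculating the Group Key (inv):

    rk numbers the days within their own state, while RANK() OVER (ORDER BY day) numbers all 2019 days together.
    Within a run of consecutive same-state days, both numbers increase by 1 per day, so their difference inv stays constant. It jumps whenever days of the other state intervene. In the example:
    - the succeeded days 2019-01-01 to 2019-01-03 have inv values 1 - 1, 2 - 2, and 3 - 3 (all 0);
    - the failed days 2019-01-04 and 2019-01-05 have 4 - 1 and 5 - 2 (both 3);
    - the succeeded day 2019-01-06 has 6 - 4 = 2.
    Grouping by both inv and stats keeps runs of different states apart, even if their inv values happen to coincide.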

Grouping by State:

    Aggregates the results to summarize the continuous periods of each state, utilizing MIN and MAX to find the range of dates for each state.

Final Output:

    Returns the period state, start date, and end date for each continuous period, ordered by start_date.
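To make the inv computation concrete, the sketch below runs the query on Example 1 with Python's sqlite3 module. It assumes SQLite 3.25 or later, which supports RANK(), stands in for MySQL 8, and that dates are stored as ISO text. It first collects the intermediate rows with both ranks and their difference, then runs the full query from the notebook cell.

```python
import sqlite3

# SQLite 3.25+ stands in for MySQL 8 (assumption); dates are ISO text,
# so string order matches chronological order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Failed (fail_date TEXT PRIMARY KEY);
CREATE TABLE Succeeded (success_date TEXT PRIMARY KEY);
INSERT INTO Failed VALUES ('2018-12-28'), ('2018-12-29'), ('2019-01-04'), ('2019-01-05');
INSERT INTO Succeeded VALUES ('2018-12-30'), ('2018-12-31'), ('2019-01-01'),
                             ('2019-01-02'), ('2019-01-03'), ('2019-01-06');
""")

# Innermost union from the notebook query: 2019 days with a per-state rank rk.
DAYS = """
SELECT fail_date AS day, 'failed' AS stats, RANK() OVER (ORDER BY fail_date) AS rk
FROM Failed
WHERE fail_date BETWEEN '2019-01-01' AND '2019-12-31'
UNION
SELECT success_date AS day, 'succeeded' AS stats, RANK() OVER (ORDER BY success_date) AS rk
FROM Succeeded
WHERE success_date BETWEEN '2019-01-01' AND '2019-12-31'
"""

# Intermediate step: overall rank, per-state rank, and their difference inv.
STEPS = f"""
SELECT day, stats, RANK() OVER (ORDER BY day) AS overall_ranking, rk,
       RANK() OVER (ORDER BY day) - rk AS inv
FROM ({DAYS}) t
ORDER BY day
"""

# Full query from the notebook cell: one output row per (inv, stats) group.
FINAL = f"""
SELECT stats AS period_state, MIN(day) AS start_date, MAX(day) AS end_date
FROM (
    SELECT
        day,
        RANK() OVER (ORDER BY day) AS overall_ranking,
        stats,
        rk,
        (RANK() OVER (ORDER BY day) - rk) AS inv
    FROM ({DAYS}) t
) c
GROUP BY inv, stats
ORDER BY start_date
"""

steps = conn.execute(STEPS).fetchall()
# (day, stats, overall_ranking, rk, inv), e.g. ('2019-01-04', 'failed', 4, 1, 3)
periods = conn.execute(FINAL).fetchall()
print(periods)
```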
    
Edge Cases Considered

    No Data for 2019: If both Failed and Succeeded tables have no entries for the year 2019, the output will be empty, as there are no dates to evaluate.
    Continuous Successes or Failures: If all tasks are either successful or failed for the entire year, the output will consist of a single row capturing the entire range of the year (e.g., all succeeded from 2019-01-01 to 2019-12-31).
    Same Date in Both Tables: Because the system runs one task per day, a date is expected to appear in exactly one table. If a date appeared in both, RANK() OVER (ORDER BY day) would give the two rows the same overall rank. The inv values for later days could then shift, so the intervals would no longer be reliable.
    Missing Days: The ranks count rows, not calendar days. If a date were missing from both tables, same-state days on either side of the gap would still be merged into one interval. The problem's "one task every day" setup rules this case out.

Edge Date Handling:

    The query explicitly filters dates to fall within the year 2019, thus it does not include entries from the previous or following year, avoiding incorrect aggregations.
    
Complexity Analysis

Time Complexity:

    The time complexity of this query is O(N log N), where N is the total number of entries across both the Failed and Succeeded tables. This is primarily due to the RANK() function, which requires sorting the records by date.
    The final grouping and aggregation step (GROUP BY) adds work that is roughly linear in the number of rows it groups.

Space Complexity:

    The space complexity is O(N), as the query needs to store the combined 2019 rows from both tables, together with their calculated ranks and states.

# Page Recommendations II
 
Table: Friendship

    +---------------+---------+
    | Column Name   | Type    |
    +---------------+---------+
    | user1_id      | int     |
    | user2_id      | int     |
    +---------------+---------+

(user1_id, user2_id) is the primary key (combination of columns with unique values) for this table.

Each row of this table indicates that the users user1_id and user2_id are friends.
 

Table: Likes

    +-------------+---------+
    | Column Name | Type    |
    +-------------+---------+
    | user_id     | int     |
    | page_id     | int     |
    +-------------+---------+
(user_id, page_id) is the primary key (combination of columns with unique values) for this table.
Each row of this table indicates that user_id likes page_id.
 

You are implementing a page recommendation system for a social media website. Your system will recommend a page to user_id if the page is liked by at least one friend of user_id and is not liked by user_id.

Write a solution to find all the possible page recommendations for every user. Each recommendation should appear as a row in the result table with these columns:

  - user_id: The ID of the user that your system is making the recommendation to.
  - page_id: The ID of the page that will be recommended to user_id.
  - friends_likes: The number of friends of user_id that like page_id.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Friendship table:

    +----------+----------+
    | user1_id | user2_id |
    +----------+----------+
    | 1        | 2        |
    | 1        | 3        |
    | 1        | 4        |
    | 2        | 3        |
    | 2        | 4        |
    | 2        | 5        |
    | 6        | 1        |
    +----------+----------+
    
Likes table:

    +---------+---------+
    | user_id | page_id |
    +---------+---------+
    | 1       | 88      |
    | 2       | 23      |
    | 3       | 24      |
    | 4       | 56      |
    | 5       | 11      |
    | 6       | 33      |
    | 2       | 77      |
    | 3       | 77      |
    | 6       | 88      |
    +---------+---------+
Output: 

    +---------+---------+---------------+
    | user_id | page_id | friends_likes |
    +---------+---------+---------------+
    | 1       | 77      | 2             |
    | 1       | 23      | 1             |
    | 1       | 24      | 1             |
    | 1       | 56      | 1             |
    | 1       | 33      | 1             |
    | 2       | 24      | 1             |
    | 2       | 56      | 1             |
    | 2       | 11      | 1             |
    | 2       | 88      | 1             |
    | 3       | 88      | 1             |
    | 3       | 23      | 1             |
    | 4       | 88      | 1             |
    | 4       | 77      | 1             |
    | 4       | 23      | 1             |
    | 5       | 77      | 1             |
    | 5       | 23      | 1             |
    +---------+---------+---------------+
    
Explanation: 

Take user 1 as an example:
  - User 1 is friends with users 2, 3, 4, and 6.
  - Recommended pages are 23 (user 2 liked it), 24 (user 3 liked it), 56 (user 4 liked it), 33 (user 6 liked it), and 77 (user 2 and user 3 liked it).
  - Note that page 88 is not recommended because user 1 already liked it.

Another example is user 6:
  - User 6 is friends with user 1.
  - User 1 only liked page 88, but user 6 already liked it. Hence, user 6 has no recommendations.

You can recommend pages for users 2, 3, 4, and 5 using a similar process.
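The same process can be written as a short Python sketch over the example data. It illustrates the recommendation rule only, not the SQL solution; the function `recommend` and the list-of-tuples representation of the two tables are assumptions made for this sketch.

```python
from collections import Counter

def recommend(friendships, likes):
    """Map (user_id, page_id) -> friends_likes for pages a user's friends like but the user does not."""
    friends = {}
    for u1, u2 in friendships:  # a friendship row applies in both directions
        friends.setdefault(u1, set()).add(u2)
        friends.setdefault(u2, set()).add(u1)
    liked = {}
    for user, page in likes:
        liked.setdefault(user, set()).add(page)
    counts = Counter()
    for user, user_friends in friends.items():
        own_pages = liked.get(user, set())
        for friend in user_friends:
            # Pages this friend likes that the user has not liked yet
            for page in liked.get(friend, set()) - own_pages:
                counts[(user, page)] += 1
    return dict(counts)

# Example data from the problem statement
FRIENDSHIP = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (2, 5), (6, 1)]
LIKES = [(1, 88), (2, 23), (3, 24), (4, 56), (5, 11), (6, 33), (2, 77), (3, 77), (6, 88)]

recs = recommend(FRIENDSHIP, LIKES)
print(recs[(1, 77)])                       # 2: users 2 and 3 both like page 77
print(any(user == 6 for user, _ in recs))  # False: user 6 gets no recommendations
```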

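    -- Count, for each user, the friends who like each page the user has not liked yet.
    SELECT user1_id AS user_id, page_id, COUNT(user_id) AS friends_likes
    FROM
    (
        -- Friends listed in user2_id of the user's Friendship rows ...
        SELECT a.user1_id, b.user_id, b.page_id  -- user, one friend of the user, page that friend likes
        FROM Friendship AS a
        JOIN Likes AS b ON a.user2_id = b.user_id
        UNION
        -- ... and friends listed in user1_id (friendship is symmetric)
        SELECT a.user2_id, b.user_id, b.page_id
        FROM Friendship AS a
        JOIN Likes AS b ON a.user1_id = b.user_id
    ) AS friend_likes
    -- Drop pages the user already likes, comparing 'user,page' string keys
    WHERE CONCAT(user1_id, ',', page_id) NOT IN
        (SELECT CONCAT(user_id, ',', page_id) FROM Likes)
    GROUP BY user1_id, page_id;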
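To check the query against the example, here is a sketch that runs an SQLite port of it through Python's built-in `sqlite3` module. Assumption: MySQL's `CONCAT(a, ',', b)` is rewritten as `a || ',' || b`, because `concat()` only exists in recent SQLite releases; the helper `run` and the table definitions are hypothetical test scaffolding.

```python
import sqlite3

# SQLite port of the MySQL query: CONCAT(a, ',', b) is written as a || ',' || b.
QUERY = """
SELECT user1_id AS user_id, page_id, COUNT(user_id) AS friends_likes
FROM (
    SELECT a.user1_id, b.user_id, b.page_id
    FROM Friendship AS a JOIN Likes AS b ON a.user2_id = b.user_id
    UNION
    SELECT a.user2_id, b.user_id, b.page_id
    FROM Friendship AS a JOIN Likes AS b ON a.user1_id = b.user_id
) AS friend_likes
WHERE user1_id || ',' || page_id NOT IN (SELECT user_id || ',' || page_id FROM Likes)
GROUP BY user1_id, page_id
"""

# Example data from the problem statement
FRIENDSHIP = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (2, 5), (6, 1)]
LIKES = [(1, 88), (2, 23), (3, 24), (4, 56), (5, 11), (6, 33), (2, 77), (3, 77), (6, 88)]

def run(friendships, likes):
    """Load both tables into an in-memory database and return the sorted query result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Friendship (user1_id INT, user2_id INT, PRIMARY KEY (user1_id, user2_id))")
    con.execute("CREATE TABLE Likes (user_id INT, page_id INT, PRIMARY KEY (user_id, page_id))")
    con.executemany("INSERT INTO Friendship VALUES (?, ?)", friendships)
    con.executemany("INSERT INTO Likes VALUES (?, ?)", likes)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return sorted(rows)

for row in run(FRIENDSHIP, LIKES):
    print(row)  # (user_id, page_id, friends_likes)
```

The printed rows are the 16 rows of the expected output, and no row is produced for user 6.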


Breakdown of the SQL Query

Selecting the Required Columns:

    The query selects user1_id (aliased as user_id), page_id, and counts the number of friends who liked each page using COUNT(user_id) (aliased as friends_likes). This provides the necessary output structure for the recommendations.
    
Inner Query (Union of Friendships and Likes):

    The inner query performs two JOIN operations between the Friendship table (aliased as a) and the Likes table (aliased as b):
        The first SELECT statement joins where user2_id from the Friendship table matches the user_id from the Likes table, capturing pages liked by friends of user1_id.
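        The second SELECT statement covers the reverse direction: it joins on user1_id, so when the user receiving recommendations appears in user2_id, the friend stored in user1_id and that friend's liked pages are captured (a.user2_id becomes the user column).
    UNION (rather than UNION ALL) removes duplicate (user, friend, page) triples. Duplicates would arise if a friendship were stored in both directions, for example as both (1, 2) and (2, 1); without de-duplication that friend's likes would be counted twice.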
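The effect of UNION can be seen with a small `sqlite3` sketch. The data is hypothetical: the same friendship is stored in both directions, which the problem statement neither guarantees nor rules out. The helper `friend_page_rows` runs only the inner query, with either set operator.

```python
import sqlite3

def friend_page_rows(set_op):
    """Run the inner (user, friend, page) query with the given set operator."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Friendship (user1_id INT, user2_id INT)")
    con.execute("CREATE TABLE Likes (user_id INT, page_id INT)")
    # Hypothetical data: users 1 and 2 are friends, recorded in both directions.
    con.executemany("INSERT INTO Friendship VALUES (?, ?)", [(1, 2), (2, 1)])
    con.execute("INSERT INTO Likes VALUES (2, 77)")
    sql = f"""
        SELECT a.user1_id, b.user_id, b.page_id
        FROM Friendship AS a JOIN Likes AS b ON a.user2_id = b.user_id
        {set_op}
        SELECT a.user2_id, b.user_id, b.page_id
        FROM Friendship AS a JOIN Likes AS b ON a.user1_id = b.user_id
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return sorted(rows)

print(friend_page_rows("UNION ALL"))  # [(1, 2, 77), (1, 2, 77)]: user 2 would be counted twice
print(friend_page_rows("UNION"))      # [(1, 2, 77)]
```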
    
Filtering Out Liked Pages:

    The WHERE clause filters the results to exclude any page that the user (represented by user1_id) has already liked. It does this by checking if the combination of user1_id and page_id is present in the Likes table using a NOT IN subquery.
    CONCAT builds a string key such as '1,88' for each user-page pair, so a single-column NOT IN can compare pairs; the comma separator keeps pairs like (1, 23) and (12, 3) from producing the same key.
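A quick Python check of why the separator matters when pairs are flattened into strings. The helper `pair_key` is hypothetical; it only mimics `CONCAT(user_id, ',', page_id)`.

```python
def pair_key(user_id, page_id, sep=","):
    """Build a string key for a (user, page) pair, like CONCAT(user_id, ',', page_id)."""
    return f"{user_id}{sep}{page_id}"

print(pair_key(1, 23), pair_key(12, 3))          # 1,23 12,3 (distinct keys)
print(pair_key(1, 23, ""), pair_key(12, 3, ""))  # 123 123 (collision without a separator)
```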
    
Grouping and Counting:

    The GROUP BY user1_id, page_id clause aggregates the results by user and page, allowing the COUNT(user_id) to calculate the number of friends who liked each recommended page.
    This step summarizes how many friends of each user liked each page, forming the basis of the recommendations.
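In Python terms, the GROUP BY plus COUNT step is a tally per (user, page) pair. The sketch below assumes the listed rows, a subset of User 1's (user, friend, page) rows from the example after the UNION and the NOT IN filter.

```python
from collections import Counter

# Subset of User 1's rows after the UNION and the NOT IN filter: (user, friend, page)
rows = {(1, 2, 77), (1, 3, 77), (1, 2, 23), (1, 3, 24)}

# GROUP BY user1_id, page_id with COUNT(user_id): one tally per (user, page)
friends_likes = Counter((user, page) for user, _friend, page in rows)
print(sorted(friends_likes.items()))
# [((1, 23), 1), ((1, 24), 1), ((1, 77), 2)]
```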

Example Explanation
Using the provided data:

For User 1:

    Friends: Users 2, 3, 4, and 6
    Recommended pages:
        Page 23 (liked by User 2)
        Page 24 (liked by User 3)
        Page 56 (liked by User 4)
        Page 33 (liked by User 6)
        Page 77 (liked by User 2 and User 3)
    Not recommended: Page 88 (already liked by User 1)

For User 6:

    Friends: User 1
    No recommendations, as User 1's only liked page (88) is already liked by User 6.

Edge Cases Considered

    No Friendships: If there are no entries in the Friendship table, the output will be empty, as no recommendations can be made.
    No Likes: If none of a user's friends likes any page, that user receives no recommendations. A user who likes nothing but has friends with likes still receives every page those friends like.
    Mutual Likes: If a user and a friend both like a page, that page is excluded from the user's recommendations (page 88 is excluded for both User 1 and User 6), so the system suggests only pages the user has not liked.
    Multiple Likes for a Page: If multiple friends like the same page, the count will correctly reflect the total number of friends who liked that page.
    
Complexity Analysis

    Time Complexity: Assuming hash joins, a materialized NOT IN subquery, and hash aggregation, the query runs in roughly O(N + M + K), where N is the number of friendships, M is the number of likes, and K is the number of (user, friend, page) rows produced by the two joins. K can be far larger than N + M, because every like of a user is copied once per friend of that user; both the UNION de-duplication and the GROUP BY process these K rows.
    Space Complexity: The intermediate UNION result needs O(K) space for de-duplication, plus O(M) for the materialized Likes keys. The output has one row per distinct recommended (user, page) pair, which is at most K rows.
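To see how K relates to N + M, here is a sketch that counts the rows the two joins emit before de-duplication. The helper `join_size` and the "hub" data set are hypothetical: one user who likes 1,000 pages and has 1,000 friends who like nothing.

```python
def join_size(friendships, likes):
    """Count the (user, friend, page) rows produced by both joins, before UNION de-duplication."""
    likes_per_user = {}
    for user, _page in likes:
        likes_per_user[user] = likes_per_user.get(user, 0) + 1
    # Row (u1, u2): u1 receives u2's likes, and u2 receives u1's likes.
    return sum(likes_per_user.get(u2, 0) + likes_per_user.get(u1, 0) for u1, u2 in friendships)

# Hypothetical hub: user 0 likes 1,000 pages and is friends with users 1..1000.
friendships = [(0, f) for f in range(1, 1001)]
likes = [(0, p) for p in range(1000)]
print(len(friendships) + len(likes), join_size(friendships, likes))  # 2000 1000000
```

With only 2,000 input rows (N + M), the joins already produce a million rows, which is why K, not N + M, dominates the cost.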