## Loading the Data

In [0]:
%run ./DataLemur_DataLoad

database,tableName,isTemporary
datalemur_q,candidates_skills,False
datalemur_q,emails,False
datalemur_q,items_per_order,False
datalemur_q,job_listings,False
datalemur_q,messages,False
datalemur_q,monthly_cards_issued,False
datalemur_q,page_likes,False
datalemur_q,pages,False
datalemur_q,parts_assembly,False
datalemur_q,pharmacy_sales_p1,False


In [0]:
%sql
show tables;

database,tableName,isTemporary


## Questions in SQL (Easy)

#### Q1 : Data Science Skills [LinkedIn SQL Interview Question]

In [0]:
%sql
-- Given a table of candidates and their skills, you're tasked with finding the candidates best suited for an open Data Science job. You want to find candidates who are proficient in Python, Tableau, and PostgreSQL.

-- Write a query to list the candidates who possess all of the required skills for the job. Sort the output by candidate ID in ascending order.
-- use the table candidates_skills

-- https://datalemur.com/questions/matching-skills

In [0]:
%sql
select candidate_id
from datalemur_q.candidates_skills
where skill in ('Python', 'Tableau', 'PostgreSQL')
group by candidate_id
having count(distinct skill) = 3 -- distinct guards against a skill listed twice for one candidate
order by candidate_id

candidate_id
123
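
The `having` check in this kind of query can count either rows or distinct skills, and the two differ as soon as a candidate lists the same skill twice. Below is a minimal sketch of that difference using Python's built-in `sqlite3` module. The in-memory table and its rows are made up for illustration (they are not the notebook's `datalemur_q.candidates_skills` data), and the helper `matching_candidates` is hypothetical.

```python
import sqlite3

# Hypothetical sample data: candidate 345 lists 'Python' twice and has no PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table candidates_skills (candidate_id int, skill text)")
conn.executemany(
    "insert into candidates_skills values (?, ?)",
    [(123, "Python"), (123, "Tableau"), (123, "PostgreSQL"),
     (345, "Python"), (345, "Python"), (345, "Tableau")],
)

def matching_candidates(count_expr):
    # count_expr is the aggregate that the HAVING clause compares against 3.
    sql = f"""
        select candidate_id
        from candidates_skills
        where skill in ('Python', 'Tableau', 'PostgreSQL')
        group by candidate_id
        having {count_expr} = 3
        order by candidate_id
    """
    return [row[0] for row in conn.execute(sql)]

by_rows = matching_candidates("count(candidate_id)")      # counts rows, duplicates included
by_skills = matching_candidates("count(distinct skill)")  # counts different skills only
print(by_rows, by_skills)
```

With these rows, counting rows also returns candidate 345, while counting distinct skills returns only candidate 123.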


#### Q2 : Histogram of Tweets [Twitter SQL Interview Question]

In [0]:
%sql
-- Assume you're given a table of Twitter tweet data. Write a query to obtain a histogram of tweets posted per user in 2022. Output the tweet count per user as the bucket and the number of Twitter users who fall into that bucket.

-- In other words, group the users by the number of tweets they posted in 2022 and count the number of users in each group.

-- use tweets table for this exercise
-- question from datalemur
-- https://datalemur.com/questions/sql-histogram-tweets

In [0]:
%sql
with tb1 as (
  select user_id, count(*) tweet_bucket
  from datalemur_q.tweets
  where year(from_unixtime(unix_timestamp(tweet_date, 'MM/dd/yyyy HH:mm:ss'))) = '2022'
  group by user_id
)
select tweet_bucket, count(*) users_num from tb1
group by tweet_bucket

tweet_bucket,users_num
1,2
2,1


#### Q3 : Page With No Likes [Facebook SQL Interview Question]

In [0]:
%sql
-- Assume you're given two tables containing data about Facebook Pages and their respective likes (as in "Like a Facebook Page").

-- Write a query to return the IDs of the Facebook pages that have zero likes. The output should be sorted in ascending order based on the page IDs.

-- https://datalemur.com/questions/sql-page-with-no-likes

-- use pages & page_likes table for this task

In [0]:
%sql
with tb1 as (
  select p.page_id, pl.liked_date
  from datalemur_q.pages p
  left join datalemur_q.page_likes pl on (p.page_id = pl.page_id)
  where pl.liked_date is NULL
)
select page_id from tb1 ORDER BY page_id

page_id
20701



#### Q4 : Unfinished Parts [Tesla SQL Interview Question]

In [0]:
%sql
-- Tesla is investigating production bottlenecks and they need your help to extract the relevant data. Write a query to determine which parts have begun the assembly process but are not yet finished.

-- Assumptions:

-- parts_assembly table contains all parts currently in production, each at varying stages of the assembly process.
-- An unfinished part is one that lacks a finish_date.
-- This question is straightforward, so let's approach it with simplicity in both thinking and solution.

-- Effective April 11th 2023, the problem statement and assumptions were updated to enhance clarity.

-- https://datalemur.com/questions/tesla-unfinished-parts
-- use the table parts_assembly for this task

In [0]:
%sql
select part, assembly_step from datalemur_q.parts_assembly where finish_date is null

part,assembly_step
bumper,3
bumper,4


#### Q5 : Laptop vs. Mobile Viewership [New York Times SQL Interview Question]

In [0]:
%sql
-- Assume you're given the table on user viewership categorised by device type where the three types are laptop, tablet, and phone.

-- Write a query that calculates the total viewership for laptops and mobile devices, where mobile is defined as the sum of tablet and phone viewership. Output the total viewership for laptops as laptop_views and the total viewership for mobile devices as mobile_views.

-- use viewership table for this task

-- https://datalemur.com/questions/laptop-mobile-viewership

In [0]:
%sql
-- SELECT 
--   COUNT(*) FILTER (WHERE device_type = 'laptop') AS laptop_views,
--   COUNT(*) FILTER (WHERE device_type IN ('tablet', 'phone'))  AS mobile_views 
-- FROM viewership; -- learn more on how to use this

select sum(case when device_type = 'laptop' then 1 else 0 end) laptop_views,
  sum(case when device_type in ('tablet', 'phone') then 1 else 0 end) mobile_views -- count only tablet and phone rows as mobile
from datalemur_q.viewership

laptop_views,mobile_views
2,3
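
The commented-out `FILTER` variant in the cell above is equivalent to the `CASE` form. The sketch below checks that on made-up rows with Python's built-in `sqlite3` module, which is an assumption made for portability: aggregate `FILTER` needs SQLite 3.30 or newer, and Spark SQL is understood to accept the same clause from version 3.0. The table contents are hypothetical.

```python
import sqlite3

# Hypothetical viewership rows: 2 laptop views, 1 tablet view, 2 phone views.
conn = sqlite3.connect(":memory:")
conn.execute("create table viewership (user_id int, device_type text)")
conn.executemany(
    "insert into viewership values (?, ?)",
    [(1, "laptop"), (2, "tablet"), (3, "laptop"), (4, "phone"), (5, "phone")],
)

# Conditional aggregation written with CASE expressions.
case_form = conn.execute("""
    select sum(case when device_type = 'laptop' then 1 else 0 end),
           sum(case when device_type in ('tablet', 'phone') then 1 else 0 end)
    from viewership
""").fetchone()

# The same aggregation written with the FILTER clause.
filter_form = conn.execute("""
    select count(*) filter (where device_type = 'laptop'),
           count(*) filter (where device_type in ('tablet', 'phone'))
    from viewership
""").fetchone()

print(case_form, filter_form)
```

Both forms return the same pair of counts; `FILTER` keeps the condition next to the aggregate it restricts, which tends to read more clearly when several conditional counts share one query.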


#### Q6 : Average Post Hiatus (Part 1) [Facebook SQL Interview Question]

In [0]:
%sql
-- Given a table of Facebook posts, for each user who posted at least twice in 2021, write a query to find the number of days between each user’s first post of the year and last post of the year in the year 2021. Output the user and number of the days between each user's first and last post.

-- use posts table for this task

-- https://datalemur.com/questions/sql-average-post-hiatus-1

In [0]:
%sql
with tb1 as (
  -- parse the MM/dd/yyyy string into a timestamp once
  select user_id, from_unixtime(unix_timestamp(post_date, 'MM/dd/yyyy HH:mm:ss')) post_ts
  from datalemur_q.posts
)
select user_id, datediff(max(post_ts), min(post_ts)) days_between
from tb1
where year(post_ts) = 2021 -- only posts made in 2021, as the question requires
group by user_id
having count(*) >= 2

user_id,days_between
151652,2
661093,21


#### Q7 : Teams Power Users [Microsoft SQL Interview Question]

In [0]:
%sql
-- Write a query to identify the top 2 power users who sent the highest number of messages on Microsoft Teams in August 2022. Display the IDs of these 2 users along with the total number of messages they sent. Output the results in descending order based on the count of messages.
-- Assumption:
-- No two users have sent the same number of messages in August 2022.

-- https://datalemur.com/questions/teams-power-users
-- use messages table for this task

In [0]:
%sql
select sender_id, count(*) messages_sent from datalemur_q.messages
where date_format(from_unixtime(unix_timestamp(sent_date, 'MM/dd/yyyy HH:mm:ss')), 'yyyy-MM') = '2022-08' -- August 2022 only
group by sender_id
order by messages_sent desc
limit 2

sender_id,messages_sent
3601,2
4500,1


#### Q8 : Duplicate Job Listings [LinkedIn SQL Interview Question]

In [0]:
%sql
-- Assume you're given a table containing the job postings from various companies on the LinkedIn platform. Write a query to retrieve the count of companies that have posted duplicate job listings.

-- Definition:
-- Duplicate job listings are defined as two job listings within the same company that share identical titles and descriptions.

-- https://datalemur.com/questions/duplicate-job-listings
-- use the table job_listings for this task

In [0]:
%sql
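with tb1 as (
  -- one row per (company, title, description) group that occurs more than once
  select company_id
  from datalemur_q.job_listings
  group by company_id, title, description
  having count(*) >= 2
)
select count(distinct company_id) duplicate_companies from tb1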

duplicate_companies
1


#### Q9 : Cities with Completed Trades [Robinhood SQL Interview Question]

In [0]:
%sql
-- Assume you're given the tables containing completed trade orders and user details in a Robinhood trading system.

-- Write a query to retrieve the top 3 cities that have the highest number of completed trade orders, listed in descending order. Output the city name and the corresponding number of completed trade orders.

-- https://datalemur.com/questions/completed-trades
-- use the tables trades & users for this task

In [0]:
%sql
select u.city , count(*) total_orders
from datalemur_q.trades t
  inner join datalemur_q.users u on t.user_id = u.user_id
where t.status = 'Completed'
group by u.city
order by total_orders desc
limit 3

city,total_orders
San Francisco,3
Boston,2
Denver,1


#### Q10 : Average Review Ratings [Amazon SQL Interview Question]

In [0]:
%sql
-- Given the reviews table, write a query to retrieve the average star rating for each product, grouped by month. The output should display the month as a numerical value, the product ID, and the average star rating rounded to 2 decimal places. Sort the output first by month and then by product ID.

-- https://datalemur.com/questions/sql-avg-review-ratings
-- use the table reviews for this task

In [0]:
%sql
select month(from_unixtime(unix_timestamp(submit_date, 'MM/dd/yyyy HH:mm:ss'))) mth, product_id, round(avg(stars), 2) avg_stars
from datalemur_q.reviews
group by month(from_unixtime(unix_timestamp(submit_date, 'MM/dd/yyyy HH:mm:ss'))), product_id
order by mth, product_id


mth,product_id,avg_stars
6,50001,3.5
6,69852,4.0
7,69852,2.5


#### Q11 : App Click-through Rate (CTR) [Facebook SQL Interview Question]

In [0]:
%sql
-- Assume you have an events table on Facebook app analytics.
-- Write a query to calculate the click-through rate (CTR) for the app in 2022 and round the results to 2 decimal places.

-- Definition and note:
-- Percentage of click-through rate (CTR) = 100.0 * number of clicks / number of impressions
-- To avoid integer division, multiply the CTR by 100.0, not 100.

-- https://datalemur.com/questions/click-through-rate
-- use the table events for this task

In [0]:
%sql
-- Note: this query does not restrict events to 2022.
select app_id, round(sum(case when event_type='click' then 1 else 0 end) / sum(case when event_type='impression' then 1 else 0 end) * 100.0, 2) ctr
from datalemur_q.events
group by app_id

app_id,ctr
234,100.0
123,50.0
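
The note about multiplying by 100.0 rather than 100 matters in engines where dividing two integers truncates, such as PostgreSQL or SQLite (Spark SQL's `/` operator is understood to return a fractional result either way). The sketch below shows the truncation with Python's built-in `sqlite3` module; the table rows and the helper `ctr_by_app` are hypothetical and chosen so that the true CTR is not a whole number.

```python
import sqlite3

# Hypothetical events: app 123 has 3 impressions and 1 click, app 234 has 1 of each.
conn = sqlite3.connect(":memory:")
conn.execute("create table events (app_id int, event_type text)")
conn.executemany(
    "insert into events values (?, ?)",
    [(123, "impression"), (123, "impression"), (123, "impression"), (123, "click"),
     (234, "impression"), (234, "click")],
)

def ctr_by_app(multiplier):
    # multiplier is the literal placed in front of the clicks/impressions ratio.
    sql = f"""
        select app_id,
               round({multiplier}
                     * sum(case when event_type = 'click' then 1 else 0 end)
                     / sum(case when event_type = 'impression' then 1 else 0 end), 2)
        from events
        group by app_id
        order by app_id
    """
    return conn.execute(sql).fetchall()

truncated = ctr_by_app("100")   # integer arithmetic: 100 * 1 / 3 is cut down to 33
exact = ctr_by_app("100.0")     # floating-point arithmetic keeps the fraction
print(truncated, exact)
```

For app 123 the integer version yields 33 instead of 33.33, which is the error the note warns about.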


#### Q12 : Second Day Confirmation [TikTok SQL Interview Question]

In [0]:
%sql
-- Assume you're given tables with information about TikTok user sign-ups and confirmations through email and text. New users on TikTok sign up using their email addresses, and upon sign-up each user receives a text message confirmation to activate their account.
-- Write a query to display the user IDs of those who did not confirm their sign-up on the first day, but confirmed on the second day.

-- Definition:
-- action_date refers to the date when users activated their accounts and confirmed their sign-ups through text messages.

-- https://datalemur.com/questions/second-day-confirmation
-- use the tables emails and texts for this task

In [0]:
%sql
select * from datalemur_q.emails

email_id,user_id,signup_date
125,7771,06/14/2022 00:00:00
433,1052,07/09/2022 00:00:00


In [0]:
%sql
select * from datalemur_q.texts

text_id,email_id,signup_action,action_date
6878,125,Confirmed,06/14/2022 00:00:00
6997,433,Not Confirmed,07/09/2022 00:00:00
7000,433,Confirmed,07/10/2022 00:00:00


In [0]:
%sql
select e.user_id 
--e.email_id, e.user_id, from_unixtime(unix_timestamp(e.signup_date, 'MM/dd/yyyy HH:mm:ss')) signup_date,t.signup_action, from_unixtime(unix_timestamp(t.action_date, 'MM/dd/yyyy HH:mm:ss')) action_date
from datalemur_q.emails e
  inner join datalemur_q.texts t on (e.email_id = t.email_id)
where t.signup_action = 'Confirmed' and date_diff(from_unixtime(unix_timestamp(t.action_date, 'MM/dd/yyyy HH:mm:ss')), from_unixtime(unix_timestamp(e.signup_date, 'MM/dd/yyyy HH:mm:ss'))) = 1

user_id
1052


#### Q13 : Cards Issued Difference [JPMorgan Chase SQL Interview Question]

In [0]:
%sql
-- Your team at JPMorgan Chase is preparing to launch a new credit card, and to gain some insights you're analyzing how many credit cards were issued each month.
-- Write a query that outputs the name of each credit card and the difference in the number of issued cards between the month with the highest issuance and the month with the lowest issuance. Arrange the results based on the largest disparity.

-- https://datalemur.com/questions/cards-issued-difference
-- use the table monthly_cards_issued for this task

In [0]:
%sql
select card_name, (max(issued_amount) - min(issued_amount)) difference
from datalemur_q.monthly_cards_issued
group by card_name
order by difference desc

card_name,difference
Chase Freedom Flex,15000
Chase Sapphire Reserve,10000


#### Q14 : Compressed Mean [Alibaba SQL Interview Question]

In [0]:
%sql
-- You're trying to find the mean number of items per order on Alibaba, rounded to 1 decimal place, using a table that includes the count of items in each order (item_count) and the corresponding number of orders for each item count (order_occurrences).

-- https://datalemur.com/questions/alibaba-compressed-mean
-- use the table items_per_order
-- total items / total orders

In [0]:
%sql
select round(sum(item_count*order_occurrences)/sum(order_occurrences), 1) mean -- 1 decimal place, as the question asks
from datalemur_q.items_per_order

mean
2.7
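
The "total items / total orders" formula is a weighted mean: each item count is weighted by how many orders had that count. Here is a small worked sketch with Python's built-in `sqlite3` module. The four rows are assumed sample values for illustration, not the contents of `datalemur_q.items_per_order`.

```python
import sqlite3

# Hypothetical compressed data: (items in an order, number of orders with that many items).
conn = sqlite3.connect(":memory:")
conn.execute("create table items_per_order (item_count int, order_occurrences int)")
conn.executemany(
    "insert into items_per_order values (?, ?)",
    [(1, 500), (2, 1000), (3, 800), (4, 1000)],
)

total_items, total_orders, mean = conn.execute("""
    select sum(item_count * order_occurrences),   -- 1*500 + 2*1000 + 3*800 + 4*1000
           sum(order_occurrences),                -- 500 + 1000 + 800 + 1000
           round(1.0 * sum(item_count * order_occurrences) / sum(order_occurrences), 1)
    from items_per_order
""").fetchone()

print(total_items, total_orders, mean)
```

Here 8900 items across 3300 orders give a mean of about 2.697, which rounds to 2.7. The `1.0 *` factor is only there because SQLite truncates integer division.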


#### Q15 : Pharmacy Analytics (Part 1) [CVS Health SQL Interview Question]

In [0]:
%sql
-- CVS Health is trying to better understand its pharmacy sales and how well different products are selling. Each drug can only be produced by one manufacturer.

-- Write a query to find the top 3 most profitable drugs sold, and how much profit they made. Assume that there are no ties in the profits. Display the result from the highest to the lowest total profit.

-- Definition:
-- cogs stands for the cost of goods sold, which is the direct cost associated with producing the drug
-- total profit = total sales - cost of goods sold

-- https://datalemur.com/questions/top-profitable-drugs
-- use the table pharmacy_sales_p1 for this task

In [0]:
%sql
select drug, (total_sales - cogs) total_profit 
from datalemur_q.pharmacy_sales_p1
order by total_profit desc
limit 3

drug,total_profit
Zyprexa,84576.516
Varicose Relief,80926.66
Surmontil,79815.03


#### Q16 : Pharmacy Analytics (Part 2) [CVS Health SQL Interview Question]

In [0]:
%sql
-- CVS Health is trying to better understand its pharmacy sales and how well different products are selling. Each drug can only be produced by one manufacturer.

-- Write a query to identify the manufacturers associated with drugs that resulted in losses for CVS Health and calculate the total amount of losses incurred.

-- Output the manufacturer's name, the number of drugs associated with losses, and the total losses in absolute value. Display the results sorted in descending order, with the highest losses displayed at the top.

-- https://datalemur.com/questions/non-profitable-drugs
-- use the table pharmacy_sales_p2 for this task

In [0]:
%sql
select manufacturer, count(*) drug_count, abs(sum(total_sales - cogs)) total_loss -- losses reported in absolute value
from datalemur_q.pharmacy_sales_p2
where (total_sales - cogs) < 0
group by manufacturer
order by total_loss desc

manufacturer,drug_count,total_loss
Biogen,1,297324.75
AbbVie,1,221429.25
Eli Lilly,1,221422.25


#### Q17 : Pharmacy Analytics (Part 3) [CVS Health SQL Interview Question]

In [0]:
%sql
-- CVS Health wants to gain a clearer understanding of its pharmacy sales and the performance of various products.

-- Write a query to calculate the total drug sales for each manufacturer. Round the answer to the nearest million and report your results in descending order of total sales. In case of any duplicates, sort them alphabetically by the manufacturer name.

-- Since this data will be displayed on a dashboard viewed by the business stakeholders, please format your results as follows: "$36 million"

-- https://datalemur.com/questions/total-drugs-sales
-- use the table pharmacy_sales_p3 for this task

In [0]:
%sql
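select manufacturer, '$' || cast(round(sum(total_sales) / 1000000) as int) || ' million' sale -- round to the nearest million rather than always rounding up
from datalemur_q.pharmacy_sales_p3
group by manufacturer
order by round(sum(total_sales) / 1000000) desc, manufacturer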

manufacturer,sale
Biogen,$4 million
Eli Lilly,$3 million


## Questions in SQL (Medium)

#### Q1 : User's Third Transaction [Uber SQL Interview Question]

In [0]:
%sql
-- Assume you are given the table below on Uber transactions made by users. Write a query to obtain the third transaction of every user. Output the user id, spend and transaction date.

-- use transactions table for this task
-- https://datalemur.com/questions/sql-third-transaction

In [0]:
%sql
with tb1 as (
  select *, row_number() over (partition by user_id order by transaction_date asc) rn
  from datalemur_q.transactions
) select user_id,spend, transaction_date
from tb1 where rn = 3

user_id,spend,transaction_date
111,89.6,2022-02-05T12:00:00.000+0000


#### Q2 : Sending vs. Opening Snaps [Snapchat SQL Interview Question]

In [0]:
%sql
-- Assume you're given tables with information on Snapchat users, including their ages and time spent sending and opening snaps.

-- Write a query to obtain a breakdown of the time spent sending vs. opening snaps as a percentage of total time spent on these activities grouped by age group. Round the percentage to 2 decimal places in the output.

-- Notes:

-- Calculate the following percentages:
-- time spent sending / (Time spent sending + Time spent opening)
-- Time spent opening / (Time spent sending + Time spent opening)
-- To avoid integer division in percentages, multiply by 100.0 and not 100.

-- use the tables activities, age_breakdown for this task
-- https://datalemur.com/questions/time-spent-snaps

In [0]:
%sql
with tb1 as (
  select b.age_bucket, 
    sum(case when a.activity_type = 'send' then a.time_spent else 0 end) / ( sum(case when a.activity_type = 'send' then a.time_spent else 0 end) + sum(case when a.activity_type = 'open' then a.time_spent else 0 end) )*100.0 send_perc,
    sum(case when a.activity_type = 'open' then a.time_spent else 0 end) / ( sum(case when a.activity_type = 'send' then a.time_spent else 0 end) + sum(case when a.activity_type = 'open' then a.time_spent else 0 end) )*100.0 open_perc
    -- sum() open_perc
  from datalemur_q.activities a
  inner join datalemur_q.age_breakdown b on (a.user_id = b.user_id)
  group by b.age_bucket
) select age_bucket, round(send_perc, 2) send_perc, round(open_perc, 2) open_perc
from tb1
where ((send_perc is not null) or (open_perc is not null))

age_bucket,send_perc,open_perc
31-35,43.75,56.25
26-30,65.4,34.6


#### Q3 : Tweets' Rolling Averages [Twitter SQL Interview Question]

In [0]:
%sql
-- Given a table of tweet data over a specified time period, calculate the 3-day rolling average of tweets for each user. Output the user ID, tweet date, and rolling averages rounded to 2 decimal places.

-- Notes:

-- A rolling average, also known as a moving average or running mean is a time-series technique that examines trends in data over a specified period of time.
-- In this case, we want to determine how the tweet count for each user changes over a 3-day period.

-- https://datalemur.com/questions/rolling-average-tweets

-- use the table tweets_hd for this task

In [0]:
%sql
select *, round(avg(tweet_count) over (partition by user_id order by tweet_date asc rows between 2 preceding and current row), 2) rolling_avg -- partition so each user's window restarts
from datalemur_q.tweets_hd

user_id,tweet_date,tweet_count,rolling_avg
111,2022-06-01T00:00:00.000+0000,2,2.0
111,2022-06-02T00:00:00.000+0000,1,1.5
111,2022-06-03T00:00:00.000+0000,3,2.0
111,2022-06-04T00:00:00.000+0000,4,2.67
111,2022-06-05T00:00:00.000+0000,5,4.0
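
Because the question asks for a rolling average per user, the window needs a `partition by user_id`; with a single user in the table the omission goes unnoticed. The sketch below makes the difference visible with a second, made-up user, using Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer). The rows and the helper `rolling_avg` are hypothetical.

```python
import sqlite3

# Hypothetical data: two users whose tweet dates follow one another.
conn = sqlite3.connect(":memory:")
conn.execute("create table tweets_hd (user_id int, tweet_date text, tweet_count int)")
conn.executemany(
    "insert into tweets_hd values (?, ?, ?)",
    [(111, "2022-06-01", 2), (111, "2022-06-02", 1), (111, "2022-06-03", 3),
     (199, "2022-06-04", 10), (199, "2022-06-05", 20)],
)

def rolling_avg(window):
    # window is the partition/order part of the OVER clause; the 3-row frame is fixed.
    sql = f"""
        select round(avg(tweet_count) over ({window}
                     rows between 2 preceding and current row), 2)
        from tweets_hd
        order by user_id, tweet_date
    """
    return [row[0] for row in conn.execute(sql)]

shared = rolling_avg("order by tweet_date")                         # one window for all users
per_user = rolling_avg("partition by user_id order by tweet_date")  # window restarts per user
print(shared)
print(per_user)
```

Without the partition, user 199's first average mixes in user 111's last two rows (4.67 instead of 10.0).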


#### Q4 : Highest-Grossing Items [Amazon SQL Interview Question]

In [0]:
%sql
-- Assume you're given a table containing data on Amazon customers and their spending on products in different categories. Write a query to identify the top two highest-grossing products within each category in the year 2022. The output should include the category, product, and total spend.

-- https://datalemur.com/questions/sql-highest-grossing
-- use the table product_spend for this task

In [0]:
%sql
with tb1 as (
  select category, product, sum(spend) total_spend
  from datalemur_q.product_spend
  where year(transaction_date) = '2022'
  group by category, product
), tb2 as (
  select *, row_number() over (partition by category order by total_spend desc) rn
  from tb1
) select * from tb2 where rn <= 2


category,product,total_spend,rn
appliance,refrigerator,299.99,1
appliance,washing machine,219.8,2
electronics,vacuum,341.0,1
electronics,wireless headset,249.9,2


#### Q5 : Top 5 Artists [Spotify SQL Interview Question]

In [0]:
%sql
-- Assume there are three Spotify tables: artists, songs, and global_song_rank, which contain information about the artists, songs, and music charts, respectively.

-- Write a query to find the top 5 artists whose songs appear most frequently in the Top 10 of the global_song_rank table. Display the top 5 artist names in ascending order, along with their song appearance ranking.

-- If two or more artists have the same number of song appearances, they should be assigned the same ranking, and the rank numbers should be continuous (i.e. 1, 2, 2, 3, 4, 5). If you've never seen a rank order like this before, do the rank window function tutorial.

-- https://datalemur.com/questions/top-fans-rank

-- use the tables artists, songs, global_song_rank for this task

In [0]:
%sql
with tb1 as (
  -- count how many times each artist's songs appear in the Top 10
  select a.artist_name, count(*) appearances
  from datalemur_q.artists a
  inner join datalemur_q.songs s on (a.artist_id = s.artist_id)
  inner join datalemur_q.global_song_rank g on (s.song_id = g.song_id)
  where g.rank <= 10
  group by a.artist_name
), tb2 as (
  -- dense_rank keeps rank numbers continuous when artists tie
  select artist_name, dense_rank() over (order by appearances desc) artist_rank from tb1
) select artist_name, artist_rank
from tb2
where artist_rank <= 5
order by artist_rank, artist_name

artist_name,artist_rank
Ed Sheeran,1
Drake,2
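
The question's requirement that tied artists share a rank and that rank numbers stay continuous is what separates `dense_rank()` from `rank()`. The sketch below compares the two with Python's built-in `sqlite3` module; the artist names and appearance counts are invented for illustration.

```python
import sqlite3

# Hypothetical appearance counts with a tie between Artist B and Artist C.
conn = sqlite3.connect(":memory:")
conn.execute("create table artist_appearances (artist_name text, appearances int)")
conn.executemany(
    "insert into artist_appearances values (?, ?)",
    [("Artist A", 5), ("Artist B", 4), ("Artist C", 4), ("Artist D", 3)],
)

rows = conn.execute("""
    select artist_name,
           rank() over (order by appearances desc) as rank_with_gaps,
           dense_rank() over (order by appearances desc) as rank_continuous
    from artist_appearances
    order by appearances desc, artist_name
""").fetchall()

for row in rows:
    print(row)
```

After the tie at position 2, `rank()` jumps to 4 for Artist D while `dense_rank()` continues with 3, which is the 1, 2, 2, 3 pattern the question describes.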


In [0]:
%sql
select * from datalemur_q.songs

song_id,artist_id,name
55511,101,Perfect
45202,101,Shape of You
22222,120,One Dance
19960,120,Hotline Bling


In [0]:
%sql
select * from datalemur_q.global_song_rank

day,song_id,rank
1,45202,5
3,45202,2
1,19960,3
9,19960,15
