# Page With No Likes

## Facebook SQL Interview Question

### Question

Assume you're given two tables containing data about Facebook Pages and their respective likes (as in "Like a Facebook Page").

Write a query to return the IDs of the Facebook pages that have zero likes. The output should be sorted in ascending order based on the page IDs.

---

### Table: `pages`

| Column Name | Type     |
|-------------|----------|
| page_id     | integer  |
| page_name   | varchar  |

---

### Example Input for `pages` Table:

| page_id | page_name              |
|---------|------------------------|
| 20001   | SQL Solutions          |
| 20045   | Brain Exercises        |
| 20701   | Tips for Data Analysts |

---

### Table: `page_likes`

| Column Name | Type      |
|-------------|-----------|
| user_id     | integer   |
| page_id     | integer   |
| liked_date  | datetime  |

---

### Example Input for `page_likes` Table:

| user_id | page_id | liked_date           |
|---------|---------|----------------------|
| 111     | 20001   | 04/08/2022 00:00:00  |
| 121     | 20045   | 03/12/2022 00:00:00  |
| 156     | 20001   | 07/25/2022 00:00:00  |

---

### Example Output:

| page_id |
|---------|
| 20701   |

---

### Explanation

Page **20701** is returned because its ID never appears in the `page_likes` table, so it has **zero likes**. Pages 20001 and 20045 each have at least one like.
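
The reasoning in the explanation can be checked without any database engine. The following is a minimal plain-Python sketch, assuming the sample rows from the example tables; the variable names `liked_page_ids` and `pages_with_no_likes` are illustrative and not part of the original question.

```python
# Sample rows copied from the example input tables.
pages = [
    (20001, "SQL Solutions"),
    (20045, "Brain Exercises"),
    (20701, "Tips for Data Analysts"),
]
page_likes = [
    (111, 20001, "04/08/2022 00:00:00"),
    (121, 20045, "03/12/2022 00:00:00"),
    (156, 20001, "07/25/2022 00:00:00"),
]

# Distinct page IDs that received at least one like.
liked_page_ids = {page_id for _, page_id, _ in page_likes}

# Pages whose ID never appears in page_likes, in ascending order.
pages_with_no_likes = sorted(
    page_id for page_id, _ in pages if page_id not in liked_page_ids
)

print(pages_with_no_likes)  # [20701]
```

This is the same "rows on the left with no match on the right" logic that an anti join or a left join with a null filter expresses in SQL.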


In [1]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, TimestampType
from datetime import datetime

# Create Spark session
spark = SparkSession.builder.master('local').appName("FacebookPages").getOrCreate()

# Define the schema for pages table
pages_schema = StructType([
    StructField("page_id", IntegerType(), True),
    StructField("page_name", StringType(), True)
])

# Define the schema for page_likes table
page_likes_schema = StructType([
    StructField("user_id", IntegerType(), True),
    StructField("page_id", IntegerType(), True),
    StructField("liked_date", TimestampType(), True)
])

# Define the data for pages table
pages_data = [
    (20001, "SQL Solutions"),
    (20045, "Brain Exercises"),
    (20701, "Tips for Data Analysts")
]

# Define the data for page_likes table
page_likes_data = [
    (111, 20001, datetime(2022, 4, 8, 0, 0)),
    (121, 20045, datetime(2022, 3, 12, 0, 0)),
    (156, 20001, datetime(2022, 7, 25, 0, 0))
]

# Create Spark DataFrames
pages_df = spark.createDataFrame(pages_data, schema=pages_schema)
page_likes_df = spark.createDataFrame(page_likes_data, schema=page_likes_schema)

# Show the DataFrames
pages_df.show()
page_likes_df.show()

+-------+--------------------+
|page_id|           page_name|
+-------+--------------------+
|  20001|       SQL Solutions|
|  20045|     Brain Exercises|
|  20701|Tips for Data Ana...|
+-------+--------------------+

+-------+-------+-------------------+
|user_id|page_id|         liked_date|
+-------+-------+-------------------+
|    111|  20001|2022-04-08 00:00:00|
|    121|  20045|2022-03-12 00:00:00|
|    156|  20001|2022-07-25 00:00:00|
+-------+-------+-------------------+



In [None]:
# Supported join types include:
#   'inner', 'outer', 'full', 'fullouter', 'full_outer',
#   'leftouter', 'left', 'left_outer',
#   'rightouter', 'right', 'right_outer',
#   'leftsemi', 'left_semi', 'semi',
#   'leftanti', 'left_anti', 'anti',
#   'cross'.


In [17]:
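# Anti join keeps only pages with no matching row in page_likes;
# orderBy returns the IDs in ascending order, as the question requires.
pages_df.join(page_likes_df, ['page_id'], 'anti').select('page_id').orderBy('page_id').show()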

+-------+
|page_id|
+-------+
|  20701|
+-------+
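
To make the `anti` join type used above more concrete, here is a rough plain-Python sketch of what a left semi join and a left anti join return for a list of keys. The helper names `left_semi` and `left_anti` are hypothetical and only mimic the semantics; they are not Spark functions.

```python
def left_semi(left_keys, right_keys):
    """Left keys that have at least one match on the right (no duplication)."""
    right = set(right_keys)
    return [key for key in left_keys if key in right]


def left_anti(left_keys, right_keys):
    """Left keys that have no match at all on the right."""
    right = set(right_keys)
    return [key for key in left_keys if key not in right]


page_ids = [20001, 20045, 20701]   # page_id column of pages
liked_ids = [20001, 20045, 20001]  # page_id column of page_likes

print(left_semi(page_ids, liked_ids))  # [20001, 20045]
print(left_anti(page_ids, liked_ids))  # [20701]
```

Note that page 20001 appears once in the semi-join result even though it has two likes: semi and anti joins filter the left side and never multiply its rows.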



In [5]:
pages_df.createOrReplaceTempView('pages')
page_likes_df.createOrReplaceTempView('page_likes')

In [12]:
%%sql

select page_id
from pages
left join page_likes using (page_id)
where user_id is null
order by page_id



+-------+
|page_id|
+-------+
|  20701|
+-------+
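
The left join with a null filter is one way to phrase this; `NOT EXISTS` is a common alternative. The sketch below runs that variant against an in-memory SQLite database from Python's standard library, which is an assumption made only so the example is self-contained. The notebook itself uses Spark SQL, where an equivalent query should be usable against the `pages` and `page_likes` temp views.

```python
import sqlite3

# In-memory database populated with the sample rows from the example tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (page_id INTEGER, page_name TEXT);
    CREATE TABLE page_likes (user_id INTEGER, page_id INTEGER, liked_date TEXT);
    INSERT INTO pages VALUES
        (20001, 'SQL Solutions'),
        (20045, 'Brain Exercises'),
        (20701, 'Tips for Data Analysts');
    INSERT INTO page_likes VALUES
        (111, 20001, '2022-04-08 00:00:00'),
        (121, 20045, '2022-03-12 00:00:00'),
        (156, 20001, '2022-07-25 00:00:00');
""")

# Keep each page for which no like row exists, sorted by page_id.
query = """
    SELECT p.page_id
    FROM pages AS p
    WHERE NOT EXISTS (
        SELECT 1 FROM page_likes AS pl WHERE pl.page_id = p.page_id
    )
    ORDER BY p.page_id
"""
no_like_ids = [row[0] for row in conn.execute(query)]
print(no_like_ids)  # [20701]
conn.close()
```

One reason to consider this form: the left join version tests `user_id is null`, so it would also return a page whose like row happened to have a null `user_id`. `NOT EXISTS` checks only whether a matching row is present.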

