# Second Day Confirmation  
**TikTok SQL Interview Question**

---

## Question

Assume you're given tables with information about TikTok user sign-ups and confirmations through email and text.  
New users on TikTok sign up using their email addresses, and upon sign-up, each user receives a text message confirmation to activate their account.

Write a query to display the **user IDs of those who did not confirm their sign-up on the first day, but confirmed on the second day**.

---

## Definition

- `action_date` refers to the date when users activated their accounts and confirmed their sign-up through text messages.

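Both columns are datetimes, so "confirmed on the second day" is most naturally read as a calendar-day difference of exactly 1 between `action_date` and `signup_date`, with the time of day ignored. Spark's `datediff` counts days the same way. Below is a minimal plain-Python sketch of that rule. The helper `days_after_signup` is hypothetical and exists only for illustration. Subtracting the raw datetimes instead would report 0 whole days for the late-night case, because only 20 minutes elapse.

```python
from datetime import datetime


def days_after_signup(signup_date: datetime, action_date: datetime) -> int:
    """Hypothetical helper: whole calendar days from sign-up to action (time of day ignored)."""
    return (action_date.date() - signup_date.date()).days


# Late-night sign-up confirmed shortly after midnight: counts as the second day
print(days_after_signup(datetime(2022, 7, 9, 23, 50), datetime(2022, 7, 10, 0, 10)))  # 1
# Confirmation later on the sign-up day itself: still the first day
print(days_after_signup(datetime(2022, 6, 14), datetime(2022, 6, 14, 18, 0)))  # 0
```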
---

## Schema

### `emails` Table:
| Column Name | Type      |
|-------------|-----------|
| email_id    | integer   |
| user_id     | integer   |
| signup_date | datetime  |

### `texts` Table:
| Column Name    | Type      |
|----------------|-----------|
| text_id        | integer   |
| email_id       | integer   |
| signup_action  | string    |
| action_date    | datetime  |

---

## Example Input

### `emails`
| email_id | user_id | signup_date         |
|----------|---------|---------------------|
| 125      | 7771    | 06/14/2022 00:00:00 |
| 433      | 1052    | 07/09/2022 00:00:00 |

### `texts`
| text_id | email_id | signup_action | action_date         |
|---------|----------|----------------|---------------------|
| 6878    | 125      | Confirmed      | 06/14/2022 00:00:00 |
| 6997    | 433      | Not Confirmed  | 07/09/2022 00:00:00 |
| 7000    | 433      | Confirmed      | 07/10/2022 00:00:00 |

---

## Example Output
| user_id |
|---------|
| 1052    |

---

## Explanation

User **1052** signed up on **07/09/2022** and initially did **not confirm** their account.  
However, on the **next day (07/10/2022)**, they **confirmed** their sign-up.

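This matches the condition of confirming on the **second day** (one calendar day after `signup_date`), so their user ID is included in the result. User **7771** is excluded because they confirmed on the sign-up day itself (06/14/2022).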

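To trace the rule by hand on the example input, here is a plain-Python sketch that needs no Spark. It joins each text to its sign-up through `email_id` and keeps `Confirmed` texts dated one calendar day after sign-up. The tuple layouts and variable names are illustrative assumptions, not part of the original schema.

```python
from datetime import datetime

# Example input rows; assumed tuple layouts are noted in the comments
emails = [
    (125, 7771, datetime(2022, 6, 14)),  # (email_id, user_id, signup_date)
    (433, 1052, datetime(2022, 7, 9)),
]
texts = [
    (6878, 125, "Confirmed", datetime(2022, 6, 14)),  # (text_id, email_id, signup_action, action_date)
    (6997, 433, "Not Confirmed", datetime(2022, 7, 9)),
    (7000, 433, "Confirmed", datetime(2022, 7, 10)),
]

# Index sign-ups by email_id so each text can be joined to its user
signups = {email_id: (user_id, signup_date) for email_id, user_id, signup_date in emails}

second_day_users = []
for _text_id, email_id, action, action_date in texts:
    user_id, signup_date = signups[email_id]
    # Calendar-day difference between the text and the sign-up
    days = (action_date.date() - signup_date.date()).days
    if action == "Confirmed" and days == 1:
        second_day_users.append(user_id)

print(second_day_users)  # [1052]
```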

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, TimestampType
from datetime import datetime
from pyspark.sql.functions import *

# Start Spark session
spark = SparkSession.builder.master('local[1]').appName("SecondDayConfirmation").getOrCreate()

# Define schema for emails table
emails_schema = StructType([
    StructField("email_id", IntegerType(), True),
    StructField("user_id", IntegerType(), True),
    StructField("signup_date", TimestampType(), True)
])

# Define data for emails table
emails_data = [
    (125, 7771, datetime(2022, 6, 14, 0, 0, 0)),
    (433, 1052, datetime(2022, 7, 9, 0, 0, 0))
]

# Create emails DataFrame
emails_df = spark.createDataFrame(emails_data, schema=emails_schema)

# Define schema for texts table
texts_schema = StructType([
    StructField("text_id", IntegerType(), True),
    StructField("email_id", IntegerType(), True),
    StructField("signup_action", StringType(), True),
    StructField("action_date", TimestampType(), True)
])

# Define data for texts table
texts_data = [
    (6878, 125, "Confirmed", datetime(2022, 6, 14, 0, 0, 0)),
    (6997, 433, "Not Confirmed", datetime(2022, 7, 9, 0, 0, 0)),
    (7000, 433, "Confirmed", datetime(2022, 7, 10, 0, 0, 0))
]

# Create texts DataFrame
texts_df = spark.createDataFrame(texts_data, schema=texts_schema)

# Show the DataFrames
emails_df.show()
texts_df.show()
```


+--------+-------+-------------------+
|email_id|user_id|        signup_date|
+--------+-------+-------------------+
|     125|   7771|2022-06-14 00:00:00|
|     433|   1052|2022-07-09 00:00:00|
+--------+-------+-------------------+

+-------+--------+-------------+-------------------+
|text_id|email_id|signup_action|        action_date|
+-------+--------+-------------+-------------------+
|   6878|     125|    Confirmed|2022-06-14 00:00:00|
|   6997|     433|Not Confirmed|2022-07-09 00:00:00|
|   7000|     433|    Confirmed|2022-07-10 00:00:00|
+-------+--------+-------------+-------------------+



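```python
from pyspark.sql.functions import col, datediff

# Keep confirmed texts whose action_date is exactly one calendar day after signup_date
(texts_df.where(col("signup_action") == "Confirmed")
    .join(emails_df, on="email_id")
    .where(datediff("action_date", "signup_date") == 1)
    .select("user_id")
    .show())
```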

```text
+-------+
|user_id|
+-------+
|   1052|
+-------+
```



```python
emails_df.createOrReplaceTempView("emails")
texts_df.createOrReplaceTempView("texts")

# Same logic in Spark SQL: confirmed texts dated one calendar day after sign-up
spark.sql("""
    SELECT user_id
    FROM texts
    JOIN emails USING (email_id)
    WHERE signup_action = 'Confirmed'
      AND datediff(action_date, signup_date) = 1
""").show()
```

```text
+-------+
|user_id|
+-------+
|   1052|
+-------+
```

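The queries above keep any user with a `Confirmed` text one day after sign-up, but they never check the other half of the requirement: that the user did not confirm on the first day. The sample data cannot expose the difference. If real data could contain a same-day `Confirmed` text followed by another one the next day, one possible stricter reading adds a `NOT EXISTS` check. The sketch below runs both versions with Python's built-in `sqlite3` module, so it uses SQLite's `date(..., '+1 day')` in place of Spark's `datediff`. User 9999, email 500, and texts 8001/8002 are hypothetical rows added only to show where the two versions disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emails (email_id INTEGER, user_id INTEGER, signup_date TEXT);
    CREATE TABLE texts (text_id INTEGER, email_id INTEGER, signup_action TEXT, action_date TEXT);

    -- Example input rows
    INSERT INTO emails VALUES (125, 7771, '2022-06-14 00:00:00'), (433, 1052, '2022-07-09 00:00:00');
    INSERT INTO texts VALUES
        (6878, 125, 'Confirmed', '2022-06-14 00:00:00'),
        (6997, 433, 'Not Confirmed', '2022-07-09 00:00:00'),
        (7000, 433, 'Confirmed', '2022-07-10 00:00:00');

    -- Hypothetical extra user who confirmed on day 1 AND again on day 2
    INSERT INTO emails VALUES (500, 9999, '2022-08-01 00:00:00');
    INSERT INTO texts VALUES
        (8001, 500, 'Confirmed', '2022-08-01 00:00:00'),
        (8002, 500, 'Confirmed', '2022-08-02 00:00:00');
""")

# Same rule as the Spark solutions: any confirmation on day 2
LENIENT_QUERY = """
    SELECT DISTINCT e.user_id
    FROM emails AS e
    JOIN texts AS t USING (email_id)
    WHERE t.signup_action = 'Confirmed'
      AND date(t.action_date) = date(e.signup_date, '+1 day')
    ORDER BY e.user_id
"""

# Stricter rule: confirmed on day 2 and NOT confirmed on day 1
STRICT_QUERY = """
    SELECT DISTINCT e.user_id
    FROM emails AS e
    JOIN texts AS t ON t.email_id = e.email_id
    WHERE t.signup_action = 'Confirmed'
      AND date(t.action_date) = date(e.signup_date, '+1 day')
      AND NOT EXISTS (
          SELECT 1
          FROM texts AS t1
          WHERE t1.email_id = e.email_id
            AND t1.signup_action = 'Confirmed'
            AND date(t1.action_date) = date(e.signup_date)
      )
    ORDER BY e.user_id
"""

print([row[0] for row in conn.execute(LENIENT_QUERY)])  # [1052, 9999]
print([row[0] for row in conn.execute(STRICT_QUERY)])   # [1052]
```

On the original two users both versions return only 1052; they diverge only for a user like the hypothetical 9999.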
