# Signup Activation Rate  
**TikTok SQL Interview Question**

---

### Question  
New TikTok users sign up with their email addresses and activate their accounts by replying to a confirmation text. Users may receive multiple confirmation texts until they have confirmed their new account.

A senior analyst wants to know the activation rate of the specified users in the `emails` table. Write a query to find the activation rate. Round the rate to 2 decimal places.

---

### Definitions:
- The `emails` table contains user signup details.
- The `texts` table contains the users' activation information.

---

### Assumptions:
- The analyst is interested in the activation rate of specific users in the `emails` table, which may not include all users that could potentially be found in the `texts` table.
- For example, user 123 in the `emails` table may not be in the `texts` table and vice versa.
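
The scoping described above can be sketched in plain Python. The IDs below are hypothetical and chosen only for illustration: one confirmed text points at an `email_id` (999) that is absent from `emails`, so it must not affect the rate, while emails with no text at all still count in the denominator.

```python
# Hypothetical IDs for illustration only; they are not from the question's data.
emails_ids = {101, 102, 103}           # emails the analyst cares about
confirmed_email_ids = {101, 999}       # email_ids with a 'Confirmed' text; 999 is not in emails

# Only emails in the analyst's table count, so intersect before dividing.
activated = emails_ids & confirmed_email_ids
activation_rate = round(len(activated) / len(emails_ids), 2)

print(sorted(activated))   # [101]
print(activation_rate)     # 0.33
```

This is the same effect a left join from `emails` to `texts` has: every row of `emails` is kept, and rows that exist only in `texts` are dropped.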

*Effective April 4th 2023, we added an assumption to the question to provide additional clarity.*

---

### emails Table:
| Column Name | Type     |
|-------------|----------|
| email_id    | integer  |
| user_id     | integer  |
| signup_date | datetime |

#### Example Input:
| email_id | user_id | signup_date         |
|----------|---------|---------------------|
| 125      | 7771    | 06/14/2022 00:00:00 |
| 236      | 6950    | 07/01/2022 00:00:00 |
| 433      | 1052    | 07/09/2022 00:00:00 |

---

### texts Table:
| Column Name    | Type     |
|----------------|----------|
| text_id        | integer  |
| email_id       | integer  |
| signup_action  | varchar  |

#### Example Input:
| text_id | email_id | signup_action |
|---------|----------|----------------|
| 6878    | 125      | Confirmed      |
| 6920    | 236      | Not Confirmed  |
| 6994    | 236      | Confirmed      |

`'Confirmed'` in `signup_action` means the user has activated their account and successfully completed the signup process.

---

### Example Output:
| confirm_rate |
|--------------|
| 0.67         |

---

### Explanation:
67% of users have successfully completed their signup and activated their accounts. The remaining 33% have not yet replied to the text to confirm their signup.
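
As a quick check of the arithmetic, here is a minimal plain-Python sketch using the example input from the tables above. It assumes an email counts as activated if any of its texts has `signup_action` equal to `'Confirmed'`.

```python
# Example input copied from the tables above.
emails = [(125, 7771), (236, 6950), (433, 1052)]   # (email_id, user_id)
texts = [
    (6878, 125, "Confirmed"),
    (6920, 236, "Not Confirmed"),
    (6994, 236, "Confirmed"),
]                                                  # (text_id, email_id, signup_action)

# An email is activated if at least one of its texts is 'Confirmed'.
confirmed_ids = {email_id for _, email_id, action in texts if action == "Confirmed"}
activated = [email_id for email_id, _ in emails if email_id in confirmed_ids]

confirm_rate = round(len(activated) / len(emails), 2)
print(activated)      # [125, 236]
print(confirm_rate)   # 0.67
```

Emails 125 and 236 are confirmed and 433 has no text at all, so the rate is 2 / 3, which rounds to 0.67.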


### PySpark Setup:

```python
from datetime import datetime

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, TimestampType
# Import only what the solutions use. Note: this shadows Python's built-in round().
from pyspark.sql.functions import count, round

# Initialize a local Spark session
spark = SparkSession.builder.master('local[1]').appName("SignupActivationRate").getOrCreate()

# Sample data for the emails table
emails_data = [
    (125, 7771, datetime.strptime("06/14/2022", "%m/%d/%Y")),
    (236, 6950, datetime.strptime("07/01/2022", "%m/%d/%Y")),
    (433, 1052, datetime.strptime("07/09/2022", "%m/%d/%Y"))
]

emails_schema = StructType([
    StructField("email_id", IntegerType(), True),
    StructField("user_id", IntegerType(), True),
    StructField("signup_date", TimestampType(), True)
])

emails_df = spark.createDataFrame(emails_data, schema=emails_schema)

# Sample data for the texts table
texts_data = [
    (6878, 125, "Confirmed"),
    (6920, 236, "Not Confirmed"),
    (6994, 236, "Confirmed")
]

texts_schema = StructType([
    StructField("text_id", IntegerType(), True),
    StructField("email_id", IntegerType(), True),
    StructField("signup_action", StringType(), True)
])

texts_df = spark.createDataFrame(texts_data, schema=texts_schema)

# Show the DataFrames (optional)
emails_df.show()
texts_df.show()
```

```text
+--------+-------+-------------------+
|email_id|user_id|        signup_date|
+--------+-------+-------------------+
|     125|   7771|2022-06-14 00:00:00|
|     236|   6950|2022-07-01 00:00:00|
|     433|   1052|2022-07-09 00:00:00|
+--------+-------+-------------------+

+-------+--------+-------------+
|text_id|email_id|signup_action|
+-------+--------+-------------+
|   6878|     125|    Confirmed|
|   6920|     236|Not Confirmed|
|   6994|     236|    Confirmed|
+-------+--------+-------------+
```



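### Solution 1: PySpark DataFrame API

```python
# Left-join only the 'Confirmed' texts onto emails. Emails without a confirmed
# text keep a NULL texts.email_id, and count() skips NULLs, so the numerator
# counts confirmed emails while the denominator counts all emails.
# Assumes at most one 'Confirmed' text per email_id, as in the sample data.
confirmed_join = (emails_df.email_id == texts_df.email_id) & (texts_df.signup_action == 'Confirmed')

(emails_df
    .join(texts_df, confirmed_join, 'left')
    .agg(round(count(texts_df.email_id) / count(emails_df.email_id), 2).alias('confirm_rate'))
    .show())
```

```text
+------------+
|confirm_rate|
+------------+
|        0.67|
+------------+
```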



### Solution 2: Spark SQL

```python
emails_df.createOrReplaceTempView('emails')
texts_df.createOrReplaceTempView('texts')

# Same logic in Spark SQL: COUNT skips the NULLs produced by the left join.
spark.sql('''
SELECT
  ROUND(COUNT(texts.email_id) / COUNT(emails.email_id), 2) AS confirm_rate
FROM emails
LEFT JOIN texts
  ON emails.email_id = texts.email_id
  AND texts.signup_action = 'Confirmed'
''').show()
```

```text
+------------+
|confirm_rate|
+------------+
|        0.67|
+------------+
```
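
Both solutions rely on each email having at most one `'Confirmed'` text. If that did not hold, the left join would duplicate rows and skew the rate. The sketch below shows one possible guard, `COUNT(DISTINCT ...)`. It is an illustration under stated assumptions, not part of the original solution: it runs on SQLite through Python's built-in `sqlite3` module rather than on Spark, the extra text row 7000 is hypothetical, and the `* 1.0` is there because SQLite performs integer division on integer operands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emails (email_id INTEGER, user_id INTEGER, signup_date TEXT);
CREATE TABLE texts (text_id INTEGER, email_id INTEGER, signup_action TEXT);
INSERT INTO emails VALUES
  (125, 7771, '2022-06-14'), (236, 6950, '2022-07-01'), (433, 1052, '2022-07-09');
INSERT INTO texts VALUES
  (6878, 125, 'Confirmed'), (6920, 236, 'Not Confirmed'), (6994, 236, 'Confirmed'),
  (7000, 236, 'Confirmed');  -- hypothetical duplicate confirmation, not in the source data
""")

JOIN_CLAUSE = """
FROM emails
LEFT JOIN texts
  ON emails.email_id = texts.email_id
  AND texts.signup_action = 'Confirmed'
"""

# Plain COUNT: the duplicate confirmation adds a joined row to both counts (3 / 4).
naive_sql = (
    "SELECT ROUND(COUNT(texts.email_id) * 1.0 / COUNT(emails.email_id), 2)" + JOIN_CLAUSE
)
# COUNT(DISTINCT ...): each email is counted once on both sides (2 / 3).
robust_sql = (
    "SELECT ROUND(COUNT(DISTINCT texts.email_id) * 1.0 / COUNT(DISTINCT emails.email_id), 2)"
    + JOIN_CLAUSE
)

naive_rate = conn.execute(naive_sql).fetchone()[0]
robust_rate = conn.execute(robust_sql).fetchone()[0]
print(naive_rate)    # 0.75
print(robust_rate)   # 0.67
conn.close()
```

With the duplicate row present, the plain counts give 3 / 4, while the distinct counts recover the intended 2 / 3.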

