## 1934. Confirmation Rate
### Table: Signups

| Column Name | Type     |
|-------------|----------|
| user_id     | int      |
| time_stamp  | datetime |

user_id is the column of unique values for this table.  
Each row contains information about the signup time for the user with ID user_id.

---

### Table: Confirmations

| Column Name | Type     |
|-------------|----------|
| user_id     | int      |
| time_stamp  | datetime |
| action      | ENUM     |

(user_id, time_stamp) is the primary key (combination of columns with unique values) for this table.  
user_id is a foreign key (reference column) to the Signups table.  
action is an ENUM (category) of the type ('confirmed', 'timeout').  
Each row of this table indicates that the user with ID user_id requested a confirmation message at time_stamp and that confirmation message was either confirmed ('confirmed') or expired without confirming ('timeout').

---

### Problem Statement

The confirmation rate of a user is the number of 'confirmed' messages divided by the total number of requested confirmation messages. The confirmation rate of a user that did not request any confirmation messages is 0. Round the confirmation rate to two decimal places.

Write a solution to find the confirmation rate of each user.

Return the result table in any order.
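
The definition above can be sketched in plain Python before moving to Spark. This is only an illustration, not the notebook's solution: the helper name `confirmation_rates` and the `(user_id, action)` tuple layout are assumptions made for the example.

```python
from collections import defaultdict


def confirmation_rates(signup_ids, confirmations):
    """Return {user_id: rate} for every signed-up user, rounded to 2 decimals.

    signup_ids: iterable of user ids (hypothetical input shape).
    confirmations: iterable of (user_id, action) pairs.
    """
    totals = defaultdict(int)      # requests per user
    confirmed = defaultdict(int)   # 'confirmed' requests per user
    for user_id, action in confirmations:
        totals[user_id] += 1
        if action == "confirmed":
            confirmed[user_id] += 1

    rates = {}
    for user_id in signup_ids:
        total = totals[user_id]
        # A user with no requests has a rate of 0 by definition.
        rates[user_id] = round(confirmed[user_id] / total, 2) if total else 0.0
    return rates


rates = confirmation_rates(
    [3, 7, 2, 6],
    [
        (3, "timeout"), (3, "timeout"),
        (7, "confirmed"), (7, "confirmed"), (7, "confirmed"),
        (2, "confirmed"), (2, "timeout"),
    ],
)
print(rates)
```

Note that Python's `round` and SQL's `ROUND` can treat ties differently, so this sketch is meant to show the zero-request rule rather than to reproduce the SQL rounding exactly.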

---

### Example 1:

#### Input:

##### Signups table:

| user_id | time_stamp          |
|---------|---------------------|
| 3       | 2020-03-21 10:16:13 |
| 7       | 2020-01-04 13:57:59 |
| 2       | 2020-07-29 23:09:44 |
| 6       | 2020-12-09 10:39:37 |

##### Confirmations table:

| user_id | time_stamp          | action    |
|---------|---------------------|-----------|
| 3       | 2021-01-06 03:30:46 | timeout   |
| 3       | 2021-07-14 14:00:00 | timeout   |
| 7       | 2021-06-12 11:57:29 | confirmed |
| 7       | 2021-06-13 12:58:28 | confirmed |
| 7       | 2021-06-14 13:59:27 | confirmed |
| 2       | 2021-01-22 00:00:00 | confirmed |
| 2       | 2021-02-28 23:59:59 | timeout   |

---

#### Output:

| user_id | confirmation_rate |
|---------|-------------------|
| 6       | 0.00              |
| 3       | 0.00              |
| 7       | 1.00              |
| 2       | 0.50              |

---

### Explanation:

User 6 did not request any confirmation messages. The confirmation rate is 0.  
User 3 made 2 requests and both timed out. The confirmation rate is 0.  
User 7 made 3 requests and all were confirmed. The confirmation rate is 1.  
User 2 made 2 requests where one was confirmed and the other timed out. The confirmation rate is 1 / 2 = 0.5.
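
User 6 is the case that needs care when the two tables are joined. A left join still produces one row for that user, padded with NULLs, so counting joined rows and counting actual requests give different answers. The sketch below shows the difference using Python's built-in `sqlite3` as a stand-in for Spark SQL; the timestamps are omitted and the table layout is simplified for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Signups (user_id INTEGER PRIMARY KEY);
    CREATE TABLE Confirmations (user_id INTEGER, action TEXT);
    INSERT INTO Signups VALUES (3), (7), (2), (6);
    INSERT INTO Confirmations VALUES
        (3, 'timeout'), (3, 'timeout'),
        (7, 'confirmed'), (7, 'confirmed'), (7, 'confirmed'),
        (2, 'confirmed'), (2, 'timeout');
""")

# COUNT(*) counts joined rows, including the NULL-padded row for user 6.
# COUNT(c.user_id) skips NULLs, so it counts real confirmation requests.
counts = conn.execute("""
    SELECT s.user_id,
           COUNT(*)          AS joined_rows,
           COUNT(c.user_id)  AS requests
    FROM Signups s
    LEFT JOIN Confirmations c ON s.user_id = c.user_id
    GROUP BY s.user_id
    ORDER BY s.user_id
""").fetchall()
print(counts)
conn.close()
```

For user 6 the first count is 1 and the second is 0, which is why a denominator built from a column of the right-hand table needs an explicit guard against division by zero.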

In [0]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, TimestampType
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Define schemas
signups_schema = StructType([
    StructField("user_id", IntegerType(), True),
    StructField("time_stamp", TimestampType(), True)
])

confirmations_schema = StructType([
    StructField("user_id", IntegerType(), True),
    StructField("time_stamp", TimestampType(), True),
    StructField("action", StringType(), True)
])

# Sample data
from datetime import datetime

signups_data = [
    (3, datetime(2020, 3, 21, 10, 16, 13)),
    (7, datetime(2020, 1, 4, 13, 57, 59)),
    (2, datetime(2020, 7, 29, 23, 9, 44)),
    (6, datetime(2020, 12, 9, 10, 39, 37))
]

confirmations_data = [
    (3, datetime(2021, 1, 6, 3, 30, 46), "timeout"),
    (3, datetime(2021, 7, 14, 14, 0, 0), "timeout"),
    (7, datetime(2021, 6, 12, 11, 57, 29), "confirmed"),
    (7, datetime(2021, 6, 13, 12, 58, 28), "confirmed"),
    (7, datetime(2021, 6, 14, 13, 59, 27), "confirmed"),
    (2, datetime(2021, 1, 22, 0, 0, 0), "confirmed"),
    (2, datetime(2021, 2, 28, 23, 59, 59), "timeout")
]

# Create DataFrames
signups_df = spark.createDataFrame(signups_data, schema=signups_schema)
confirmations_df = spark.createDataFrame(confirmations_data, schema=confirmations_schema)

# Register temp views
signups_df.createOrReplaceTempView("Signups")
confirmations_df.createOrReplaceTempView("Confirmations")


In [0]:
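from pyspark.sql import functions as F  # avoid star import, which shadows built-in sum/round

# Rename columns up front so the join has no ambiguous names.
s_df = signups_df.selectExpr("user_id as u_id", "time_stamp as s_ts")
c_df = confirmations_df.selectExpr("user_id as c_u_id", "time_stamp as c_ts", "action")

result_df = (
    s_df.join(c_df, F.col("u_id") == F.col("c_u_id"), "left")
    .groupBy("u_id")
    .agg(
        # Count only matched rows: c_u_id is NULL for users with no confirmations.
        F.count("c_u_id").alias("total_request"),
        F.sum(F.when(F.col("action") == "confirmed", 1).otherwise(0)).alias("confirmed_request"),
    )
    .withColumn(
        "confirmation_rate",
        F.round(
            # Users with no requests get a rate of 0 instead of dividing by zero.
            F.when(F.col("total_request") == 0, F.lit(0.0))
            .otherwise(F.col("confirmed_request") / F.col("total_request")),
            2,
        ),
    )
    .selectExpr("u_id as user_id", "confirmation_rate")
)
result_df.display()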

In [0]:
%sql
WITH cte AS (
    SELECT s.user_id,
           -- Count matched rows only, so users without confirmations get 0
           COUNT(c.user_id) AS total_requests,
           SUM(CASE WHEN c.action = 'confirmed' THEN 1 ELSE 0 END) AS confirmed_requests
    FROM Signups s
    LEFT JOIN Confirmations c ON s.user_id = c.user_id
    GROUP BY s.user_id
)
SELECT user_id,
       ROUND(
           COALESCE(confirmed_requests * 1.0 / NULLIF(total_requests, 0), 0), 2
       )  AS confirmation_rate
FROM cte;

In [0]:

# SQL logic
query = """
SELECT 
    s.user_id,
    -- NULLIF turns a zero denominator into NULL; COALESCE maps that NULL to 0
    ROUND(
        COALESCE(
            SUM(CASE WHEN c.action = 'confirmed' THEN 1 ELSE 0 END) * 1.0 /
            NULLIF(COUNT(c.action), 0),
            0
        ), 2
    ) AS confirmation_rate
FROM Signups s
LEFT JOIN Confirmations c
ON s.user_id = c.user_id
GROUP BY s.user_id
"""

result_df = spark.sql(query)
display(result_df)