# App Click-through Rate (CTR)  
**Facebook SQL Interview Question**

---

## Question

Assume you have an `events` table containing Facebook app analytics.  
Write a query to calculate the **click-through rate (CTR)** for each app in **2022** and round the results to **2 decimal places**.

---

## Definition

**Click-through rate (CTR)** is calculated as:

`CTR (%) = 100.0 * Number of clicks / Number of impressions`

> 💡 Note: Use `100.0` to avoid integer division and ensure the result is a float.
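
As a quick illustration of the formula, here is a minimal Python sketch. The helper name `ctr` is hypothetical (it does not appear in the question), and it assumes `impressions` is greater than zero. Python's `/` is already true division, so the `100.0` matters mainly in SQL engines that truncate integer division.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Return the click-through rate as a percentage, rounded to 2 decimals.

    Hypothetical helper for illustration; assumes impressions > 0.
    """
    # Multiplying by 100.0 keeps the arithmetic in floating point,
    # mirroring the SQL advice in the note above.
    return round(100.0 * clicks / impressions, 2)


print(ctr(1, 2))  # 50.0
print(ctr(1, 3))  # 33.33
```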

---

## Schema

### `events` Table:
| Column Name | Type      |
|-------------|-----------|
| app_id      | integer   |
| event_type  | string    |
| timestamp   | datetime  |

---

## Example Input:
| app_id | event_type  | timestamp           |
|--------|-------------|---------------------|
| 123    | impression  | 07/18/2022 11:36:12 |
| 123    | impression  | 07/18/2022 11:37:12 |
| 123    | click       | 07/18/2022 11:37:42 |
| 234    | impression  | 07/18/2022 14:15:12 |
| 234    | click       | 07/18/2022 14:16:12 |

---

## Example Output:
| app_id | ctr    |
|--------|--------|
| 123    | 50.00  |
| 234    | 100.00 |

---

## Explanation

- **App 123**:  
  - Impressions: 2  
  - Clicks: 1  
  - CTR = (1 / 2) * 100.0 = **50.00**

- **App 234**:  
  - Impressions: 1  
  - Clicks: 1  
  - CTR = (1 / 1) * 100.0 = **100.00**
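
The counts above can be reproduced with a small plain-Python sketch over the example rows. The function name `ctr_by_app` and the choice to skip apps with no impressions are assumptions made for illustration, not part of the original question.

```python
from collections import Counter
from datetime import datetime

# Rows copied from the example input: (app_id, event_type, timestamp)
events = [
    (123, "impression", datetime(2022, 7, 18, 11, 36, 12)),
    (123, "impression", datetime(2022, 7, 18, 11, 37, 12)),
    (123, "click", datetime(2022, 7, 18, 11, 37, 42)),
    (234, "impression", datetime(2022, 7, 18, 14, 15, 12)),
    (234, "click", datetime(2022, 7, 18, 14, 16, 12)),
]


def ctr_by_app(rows, year=2022):
    """Count clicks and impressions per app for one year and return CTR percentages."""
    clicks, impressions = Counter(), Counter()
    for app_id, event_type, ts in rows:
        if ts.year != year:
            continue  # ignore events outside the requested year
        if event_type == "click":
            clicks[app_id] += 1
        elif event_type == "impression":
            impressions[app_id] += 1
    # Assumption: apps without impressions are skipped to avoid dividing by zero.
    return {app: round(100.0 * clicks[app] / n, 2) for app, n in impressions.items()}


result = ctr_by_app(events)
print(result)  # {123: 50.0, 234: 100.0}
```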

---


In [2]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, TimestampType
from datetime import datetime
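# Explicit imports instead of `*`; note that `round` and `sum` shadow Python's built-ins here
from pyspark.sql.functions import col, round, sum, when, year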

# Initialize Spark session
spark = SparkSession.builder.master('local[1]').appName("AppCTR").getOrCreate()

# Define schema for events table
events_schema = StructType([
    StructField("app_id", IntegerType(), True),
    StructField("event_type", StringType(), True),
    StructField("timestamp", TimestampType(), True)
])

# Sample data based on the question
events_data = [
    (123, "impression", datetime(2022, 7, 18, 11, 36, 12)),
    (123, "impression", datetime(2022, 7, 18, 11, 37, 12)),
    (123, "click", datetime(2022, 7, 18, 11, 37, 42)),
    (234, "impression", datetime(2022, 7, 18, 14, 15, 12)),
    (234, "click", datetime(2022, 7, 18, 14, 16, 12))
]

# Create the DataFrame
events_df = spark.createDataFrame(events_data, schema=events_schema)

# Show the DataFrame
events_df.show()


+------+----------+-------------------+
|app_id|event_type|          timestamp|
+------+----------+-------------------+
|   123|impression|2022-07-18 11:36:12|
|   123|impression|2022-07-18 11:37:12|
|   123|     click|2022-07-18 11:37:42|
|   234|impression|2022-07-18 14:15:12|
|   234|     click|2022-07-18 14:16:12|
+------+----------+-------------------+



In [13]:
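# Keep only 2022 events (as the question requires), then compute clicks per 100 impressions
events_df.filter(year(col('timestamp')) == 2022)\
    .groupBy('app_id')\
    .agg(
        round(100*sum(when(col('event_type')=='click',1))/sum(when(col('event_type')=='impression',1)),2).alias('ctr'))\
    .show()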

+------+-----+
|app_id|  ctr|
+------+-----+
|   234|100.0|
|   123| 50.0|
+------+-----+



In [15]:
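events_df.createOrReplaceTempView('events_df')
spark.sql(
"""
select
    app_id,
    -- Spark's / operator returns a fractional result, so a plain 100 does not truncate here
    round( 100*sum(case when event_type='click' then 1 end)/sum(case when event_type='impression' then 1 end),2 ) as ctr
    from events_df
    where year(`timestamp`) = 2022  -- restrict to 2022, as the question requires
    group by app_id
"""
).show()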

+------+-----+
|app_id|  ctr|
+------+-----+
|   234|100.0|
|   123| 50.0|
+------+-----+

