## 1113. Reported Posts
### Table: Actions

| Column Name   | Type    |
|---------------|---------|
| user_id       | int     |
| post_id       | int     |
| action_date   | date    |
| action        | enum    |
| extra         | varchar |

**Primary Key**: None — this table may contain duplicate rows.

**Description**:
The `Actions` table records user interactions with posts.  
The `action` column is an ENUM type with values:  
`('view', 'like', 'reaction', 'comment', 'report', 'share')`  
The `extra` column contains optional metadata like report reasons or reaction types.

---

### 🧠 Business Logic:

Write an SQL query that reports, for each report reason, the number of distinct posts reported yesterday. Assume today is 2019-07-05, so "yesterday" is 2019-07-04.
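Because "today" is fixed by the problem statement rather than read from the system clock, "yesterday" resolves to a single constant date. Here is a minimal Python sketch of that date arithmetic. The names `TODAY` and `YESTERDAY` are illustrative only and do not come from the original problem.

```python
from datetime import date, timedelta

# "Today" is pinned by the problem statement, not taken from date.today()
TODAY = date(2019, 7, 5)

# Yesterday is exactly one calendar day earlier
YESTERDAY = TODAY - timedelta(days=1)

print(YESTERDAY.isoformat())  # 2019-07-04
```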


---

### ✅ Expected Output Format:

#### Input:
| user_id | post_id | action_date | action | extra  |
|---------|---------|-------------|--------|--------|
| 1       | 1       | 2019-07-01  | view   | null   |
| 1       | 1       | 2019-07-01  | like   | null   |
| 1       | 1       | 2019-07-01  | share  | null   |
| 2       | 4       | 2019-07-04  | view   | null   |
| 2       | 4       | 2019-07-04  | report | spam   |
| 3       | 4       | 2019-07-04  | view   | null   |
| 3       | 4       | 2019-07-04  | report | spam   |
| 4       | 3       | 2019-07-02  | view   | null   |
| 4       | 3       | 2019-07-02  | report | spam   |
| 5       | 2       | 2019-07-04  | view   | null   |
| 5       | 2       | 2019-07-04  | report | racism |
| 5       | 5       | 2019-07-04  | view   | null   |
| 5       | 5       | 2019-07-04  | report | racism |

#### Output:
| report_reason | report_count |
|---------------|--------------|
| spam          | 1            |
| racism        | 2            |

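**Explanation**:  
- `spam`: post_id 4 was reported on 2019-07-04 by two users (2 and 3) but is counted once. Post_id 3's spam report is dated 2019-07-02, so it is excluded → count = 1  
- `racism`: post_ids 2 and 5 were reported on 2019-07-04 → count = 2  
- Only distinct post_ids are counted per reason, and only reports dated yesterday (2019-07-04) qualify.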
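The explanation above depends on counting distinct posts instead of raw report rows. The following is a plain-Python sketch that runs without Spark. It uses the sample rows from the Input table and contrasts the distinct-post count with the raw row count. The variable names are illustrative assumptions.

```python
from collections import defaultdict

# Sample rows from the Input table: (user_id, post_id, action_date, action, extra)
rows = [
    (1, 1, "2019-07-01", "view", None),
    (1, 1, "2019-07-01", "like", None),
    (1, 1, "2019-07-01", "share", None),
    (2, 4, "2019-07-04", "view", None),
    (2, 4, "2019-07-04", "report", "spam"),
    (3, 4, "2019-07-04", "view", None),
    (3, 4, "2019-07-04", "report", "spam"),
    (4, 3, "2019-07-02", "view", None),
    (4, 3, "2019-07-02", "report", "spam"),
    (5, 2, "2019-07-04", "view", None),
    (5, 2, "2019-07-04", "report", "racism"),
    (5, 5, "2019-07-04", "view", None),
    (5, 5, "2019-07-04", "report", "racism"),
]

YESTERDAY = "2019-07-04"  # today is 2019-07-05 per the problem statement

posts_per_reason = defaultdict(set)    # reason -> set of distinct post_ids
reports_per_reason = defaultdict(int)  # reason -> number of raw report rows

for user_id, post_id, action_date, action, extra in rows:
    # Keep only report actions dated yesterday
    if action == "report" and action_date == YESTERDAY:
        posts_per_reason[extra].add(post_id)
        reports_per_reason[extra] += 1

# Equivalent of COUNT(DISTINCT post_id) ... GROUP BY extra
report_count = {reason: len(posts) for reason, posts in posts_per_reason.items()}

print(report_count)              # {'spam': 1, 'racism': 2}
print(dict(reports_per_reason))  # {'spam': 2, 'racism': 2}
```

The raw row count reports `spam` twice, because users 2 and 3 both reported post 4. Collecting post IDs in a set per reason collapses those repeats, just as `COUNT(DISTINCT post_id)` does.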

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from pyspark.sql.functions import to_date, col

# Reuse the notebook's session if one exists (Databricks predefines `spark`); otherwise create one
spark = SparkSession.builder.getOrCreate()

# Sample data with action_date as an ISO-8601 string
data = [
    (1, 1, "2019-07-01", "view", None),
    (1, 1, "2019-07-01", "like", None),
    (1, 1, "2019-07-01", "share", None),
    (2, 4, "2019-07-04", "view", None),
    (2, 4, "2019-07-04", "report", "spam"),
    (3, 4, "2019-07-04", "view", None),
    (3, 4, "2019-07-04", "report", "spam"),
    (4, 3, "2019-07-02", "view", None),
    (4, 3, "2019-07-02", "report", "spam"),
    (5, 2, "2019-07-04", "view", None),
    (5, 2, "2019-07-04", "report", "racism"),
    (5, 5, "2019-07-04", "view", None),
    (5, 5, "2019-07-04", "report", "racism")
]

# Define schema with action_date as StringType first; it is converted to DateType below
schema = StructType([
    StructField("user_id", IntegerType(), True),
    StructField("post_id", IntegerType(), True),
    StructField("action_date", StringType(), True),  # initially string
    StructField("action", StringType(), True),
    StructField("extra", StringType(), True)
])

# Create DataFrame
df_raw = spark.createDataFrame(data, schema)

# Convert action_date to proper DateType
df = df_raw.withColumn("action_date", to_date(col("action_date"), "yyyy-MM-dd"))

# Register a temp view so the data can be queried with SQL
df.createOrReplaceTempView("Actions")

# Count distinct post_ids reported yesterday (2019-07-04, since today is 2019-07-05)
result = spark.sql("""
    SELECT extra AS report_reason,
           COUNT(DISTINCT post_id) AS report_count
    FROM Actions
    WHERE action = 'report'
      AND action_date = DATE '2019-07-04'
    GROUP BY extra
""")

# Print the result; in a Databricks notebook, display(result) renders it as a table instead
result.show()
```

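```python
from pyspark.sql.functions import col, countDistinct

# Same logic with the DataFrame API; aliases match the expected output columns.
# The string "2019-07-04" is implicitly cast to DATE for the comparison.
(
    df.filter((col("action") == "report") & (col("action_date") == "2019-07-04"))
    .groupBy(col("extra").alias("report_reason"))
    .agg(countDistinct("post_id").alias("report_count"))
    .show()
)
```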
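Both cells above hard-code the date 2019-07-04. The query can instead derive "yesterday" from the fixed "today". In Spark SQL, the built-in `date_sub(DATE '2019-07-05', 1)` expresses this. The sketch below uses Python's built-in `sqlite3` module in place of Spark so that it runs anywhere. SQLite is an assumption made only for portability, and its equivalent is `date('2019-07-05', '-1 day')`. An `ORDER BY` is also added here, which the original query does not have, so the output order is deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for the Spark `Actions` temp view
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Actions (user_id INTEGER, post_id INTEGER, "
    "action_date TEXT, action TEXT, extra TEXT)"
)
conn.executemany(
    "INSERT INTO Actions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2019-07-01", "view", None),
        (1, 1, "2019-07-01", "like", None),
        (1, 1, "2019-07-01", "share", None),
        (2, 4, "2019-07-04", "view", None),
        (2, 4, "2019-07-04", "report", "spam"),
        (3, 4, "2019-07-04", "view", None),
        (3, 4, "2019-07-04", "report", "spam"),
        (4, 3, "2019-07-02", "view", None),
        (4, 3, "2019-07-02", "report", "spam"),
        (5, 2, "2019-07-04", "view", None),
        (5, 2, "2019-07-04", "report", "racism"),
        (5, 5, "2019-07-04", "view", None),
        (5, 5, "2019-07-04", "report", "racism"),
    ],
)

# "Yesterday" is computed from the fixed "today" rather than written out literally
query = """
    SELECT extra AS report_reason,
           COUNT(DISTINCT post_id) AS report_count
    FROM Actions
    WHERE action = 'report'
      AND action_date = date('2019-07-05', '-1 day')
    GROUP BY extra
    ORDER BY report_reason
"""
result = conn.execute(query).fetchall()
print(result)  # [('racism', 2), ('spam', 1)]
conn.close()
```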