## 1050. Actors and Directors Who Cooperated At Least Three Times
### Table: ActorDirector

| Column Name | Type |
|-------------|------|
| actor_id    | int  |
| director_id | int  |
| timestamp   | int  |

**Primary Key**: `timestamp` — each row represents a unique cooperation event between an actor and a director.

**Description**:
Each row in the `ActorDirector` table records a collaboration between an actor and a director at a specific timestamp.
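Because `timestamp` is the primary key, no two rows share a timestamp. As a result, counting rows per `(actor_id, director_id)` pair counts distinct cooperation events. Below is a minimal sketch of checking that property on in-memory rows. It assumes rows are plain `(actor_id, director_id, timestamp)` records. The names `Cooperation` and `timestamps_are_unique` are hypothetical helpers and are not part of the problem.

```python
from typing import Iterable, NamedTuple


class Cooperation(NamedTuple):
    """One ActorDirector row (hypothetical in-memory representation)."""
    actor_id: int
    director_id: int
    timestamp: int


def timestamps_are_unique(rows: Iterable[Cooperation]) -> bool:
    """Return True if no timestamp appears twice, i.e. the primary-key property holds."""
    seen = set()
    for row in rows:
        if row.timestamp in seen:
            return False  # duplicate key: two rows would describe the same event
        seen.add(row.timestamp)
    return True


rows = [Cooperation(1, 1, 0), Cooperation(1, 1, 1), Cooperation(1, 2, 3)]
print(timestamps_are_unique(rows))  # True
```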

---

### 🧠 Business Logic:
Find all `(actor_id, director_id)` pairs where the actor has cooperated with the director **at least three times**.
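The logic is a group-and-filter: bucket rows by pair, count each bucket, and keep buckets of size three or more. Below is a plain-Python sketch of that idea using hash aggregation. It assumes rows are `(actor_id, director_id, timestamp)` tuples. `frequent_pairs` and its `min_count` parameter are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[int, int, int]  # (actor_id, director_id, timestamp)


def frequent_pairs(rows: Iterable[Row], min_count: int = 3) -> List[Tuple[int, int]]:
    # One counter bucket per (actor_id, director_id), analogous to GROUP BY
    counts = Counter((actor_id, director_id) for actor_id, director_id, _ in rows)
    # Keep only buckets meeting the threshold, analogous to HAVING COUNT(*) >= 3
    return sorted(pair for pair, n in counts.items() if n >= min_count)


sample = [(1, 1, 0), (1, 1, 1), (1, 1, 2), (1, 2, 3), (1, 2, 4), (2, 1, 5), (2, 1, 6)]
print(frequent_pairs(sample))  # [(1, 1)]
```

The result is sorted only to make the output deterministic. The problem itself does not require any particular order.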

---

### ✅ Expected Output Format:

#### Input:
| actor_id | director_id | timestamp |
|----------|-------------|-----------|
| 1        | 1           | 0         |
| 1        | 1           | 1         |
| 1        | 1           | 2         |
| 1        | 2           | 3         |
| 1        | 2           | 4         |
| 2        | 1           | 5         |
| 2        | 1           | 6         |

#### Output:
| actor_id | director_id |
|----------|-------------|
| 1        | 1           |

**Explanation**:  
Only the pair `(1, 1)` has collaborated **three times** (timestamps 0, 1, and 2). The pairs `(1, 2)` and `(2, 1)` have collaborated only twice each, so they are excluded.
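The per-pair tally behind this explanation can be reproduced with a sort-based grouping. Sort the rows by pair, then count each run of equal pairs. This is a sketch in plain Python over the example input. `pair_key` and `tally` are illustrative names.

```python
from itertools import groupby
from operator import itemgetter

# Example input rows: (actor_id, director_id, timestamp)
data = [(1, 1, 0), (1, 1, 1), (1, 1, 2), (1, 2, 3), (1, 2, 4), (2, 1, 5), (2, 1, 6)]

pair_key = itemgetter(0, 1)  # extracts (actor_id, director_id)

# groupby only merges adjacent rows, so sort by the pair first
tally = {
    pair: sum(1 for _ in group)
    for pair, group in groupby(sorted(data, key=pair_key), key=pair_key)
}
print(tally)  # {(1, 1): 3, (1, 2): 2, (2, 1): 2}
```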

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType

# Get the active session (Databricks predefines `spark`; getOrCreate() reuses it)
spark = SparkSession.builder.getOrCreate()

# Sample data: (actor_id, director_id, timestamp)
data = [
    (1, 1, 0),
    (1, 1, 1),
    (1, 1, 2),
    (1, 2, 3),
    (1, 2, 4),
    (2, 1, 5),
    (2, 1, 6)
]

# Define schema
schema = StructType([
    StructField("actor_id", IntegerType(), False),
    StructField("director_id", IntegerType(), False),
    StructField("timestamp", IntegerType(), False)
])

# Create DataFrame
df = spark.createDataFrame(data, schema)

# Register a temporary view so the table can be queried with SQL
df.createOrReplaceTempView("ActorDirector")

# Group rows by pair and keep groups with at least 3 collaborations
result = spark.sql("""
    SELECT actor_id, director_id
    FROM ActorDirector
    GROUP BY actor_id, director_id
    HAVING COUNT(*) >= 3
""")

# display() is a Databricks notebook helper; use result.show() elsewhere
display(result)
```

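```sql
-- Step 1: count collaborations per (actor_id, director_id) pair
WITH cte AS (
    SELECT actor_id, director_id, COUNT(*) AS collab
    FROM ActorDirector
    GROUP BY actor_id, director_id
)
-- Step 2: keep only pairs with three or more collaborations
SELECT actor_id, director_id
FROM cte
WHERE collab >= 3
```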
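The CTE version splits the query into two stages. The CTE produces an intermediate relation with one row per pair and its `collab` count. The outer query then filters that relation with an ordinary `WHERE`. Below is a plain-Python sketch mirroring the two stages. It assumes rows are `(actor_id, director_id, timestamp)` tuples. `build_collab_table` and `select_frequent` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_collab_table(rows: List[Tuple[int, int, int]]) -> List[Dict[str, int]]:
    """Stage 1 (the CTE): one output row per (actor_id, director_id) with its count."""
    counts = Counter((a, d) for a, d, _ in rows)
    return [
        {"actor_id": a, "director_id": d, "collab": n}
        for (a, d), n in counts.items()
    ]


def select_frequent(table: List[Dict[str, int]], min_collab: int = 3) -> List[Tuple[int, int]]:
    """Stage 2 (outer query): WHERE collab >= min_collab, then project the ID columns."""
    return [(r["actor_id"], r["director_id"]) for r in table if r["collab"] >= min_collab]


sample = [(1, 1, 0), (1, 1, 1), (1, 1, 2), (1, 2, 3), (1, 2, 4), (2, 1, 5), (2, 1, 6)]
cte = build_collab_table(sample)
print(len(cte))              # 3 pairs, including those below the threshold
print(select_frequent(cte))  # [(1, 1)]
```

The intermediate table keeps every pair, including those below the threshold. This makes the CTE form convenient when you also want to inspect the counts. The `HAVING` form applies the same filter in a single step.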

```sql
-- HAVING filters groups after aggregation, so COUNT(*) can be used directly
SELECT actor_id, director_id
FROM ActorDirector
GROUP BY actor_id, director_id
HAVING COUNT(*) >= 3
```

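```python
from pyspark.sql.functions import col, count

# Count collaborations per pair, keep pairs with >= 3, and project the two ID columns
(
    df.groupBy("actor_id", "director_id")
    .agg(count("*").alias("collab"))
    .filter(col("collab") >= 3)
    .select("actor_id", "director_id")
    .display()  # Databricks notebook method; use .show() outside Databricks
)
```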
