# 🧠 LeetCode 511: Game Play Analysis I (Databricks Edition)

---

## 📘 Problem Statement

### Table: Activity

| Column Name  | Type    |
|--------------|---------|
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |

- `(player_id, event_date)` is the primary key.
- This table shows the activity of players of some games.
- Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.
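The primary-key constraint means a player never has two rows for the same date, so "earliest row per player" is unambiguous. A minimal plain-Python sketch of that uniqueness check follows; `has_unique_key` is a hypothetical helper for illustration and is not part of the original notebook.

```python
from datetime import date

# Illustrative (player_id, event_date) pairs
keys = [
    (1, date(2016, 3, 1)),
    (1, date(2016, 5, 2)),
    (2, date(2017, 6, 25)),
]

def has_unique_key(pairs):
    """Return True when no (player_id, event_date) pair appears more than once."""
    return len(set(pairs)) == len(pairs)

print(has_unique_key(keys))              # True
print(has_unique_key(keys + [keys[0]]))  # False: the first pair is duplicated
```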

---

## 🎯 Objective

Write a query to find the **first login date** for each player.

Return the result table in any order.

---

## 🧾 Example

### Input

**Activity Table**

| player_id | device_id | event_date | games_played |
|-----------|-----------|------------|--------------|
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |

### Output

| player_id | first_login |
|-----------|-------------|
| 1         | 2016-03-01  |
| 2         | 2017-06-25  |
| 3         | 2016-03-02  |
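
As a sanity check on the expected output, the same result can be reproduced without Spark: group the rows by player and keep the smallest date in each group. This is only a sketch in plain Python; `first_login_per_player` is a hypothetical helper name, and the rows are copied from the example input.

```python
from datetime import date

# Rows from the example Activity table:
# (player_id, device_id, event_date, games_played)
activity_rows = [
    (1, 2, date(2016, 3, 1), 5),
    (1, 2, date(2016, 5, 2), 6),
    (2, 3, date(2017, 6, 25), 1),
    (3, 1, date(2016, 3, 2), 0),
    (3, 4, date(2018, 7, 3), 5),
]

def first_login_per_player(rows):
    """Return {player_id: earliest event_date} for the given rows."""
    first_login = {}
    for player_id, _device_id, event_date, _games_played in rows:
        # Keep the date if the player is new or this date is earlier
        if player_id not in first_login or event_date < first_login[player_id]:
            first_login[player_id] = event_date
    return first_login

result = first_login_per_player(activity_rows)
for player_id in sorted(result):
    print(player_id, result[player_id])
```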

---

## 🧱 PySpark DataFrame Creation

```python
from datetime import date
from pyspark.sql import Row

# Sample data (real dates, so event_date is inferred as DateType rather than string)
activity_data = [
    Row(player_id=1, device_id=2, event_date=date(2016, 3, 1), games_played=5),
    Row(player_id=1, device_id=2, event_date=date(2016, 5, 2), games_played=6),
    Row(player_id=2, device_id=3, event_date=date(2017, 6, 25), games_played=1),
    Row(player_id=3, device_id=1, event_date=date(2016, 3, 2), games_played=0),
    Row(player_id=3, device_id=4, event_date=date(2018, 7, 3), games_played=5)
]

# Create DataFrame (`spark` is the SparkSession that Databricks notebooks predefine)
activity_df = spark.createDataFrame(activity_data)

# Register temp view
activity_df.createOrReplaceTempView("Activity")
```

---

## ✅ SQL Solution

```sql
SELECT player_id, MIN(event_date) AS first_login
FROM Activity
GROUP BY player_id;
```

---

## 🧪 PySpark Solution

```python
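from pyspark.sql import functions as F  # namespaced import avoids shadowing Python's built-in min

# Group by player and keep the earliest event_date
result_df = activity_df.groupBy("player_id") \
                       .agg(F.min("event_date").alias("first_login"))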

result_df.show()
```

---

📘 *This notebook is part of DataGym’s SQL-to-PySpark transition series.*


In [0]:
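# Explicit imports instead of wildcards, so Python built-ins such as min and sum are not shadowed
from pyspark.sql.functions import col, row_number
from pyspark.sql.window import Window
from pyspark.sql.types import StructType, StructField, IntegerType, DateType
from datetime import date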



In [0]:
from datetime import date
from pyspark.sql.types import StructType, StructField, IntegerType, DateType

# 1️⃣ Sample Data
data = [
    (1, 2, date(2016, 3, 1), 5),
    (1, 2, date(2016, 5, 2), 6),
    (2, 3, date(2017, 6, 25), 1),
    (3, 1, date(2016, 3, 2), 0),
    (3, 4, date(2018, 7, 3), 5)
]


# 2️⃣ Schema Definition
schema = StructType([
    StructField("player_id", IntegerType(), True),
    StructField("device_id", IntegerType(), True),
    StructField("event_date", DateType(), True),
    StructField("games_played", IntegerType(), True)
])

# 3️⃣ Create DataFrame
df = spark.createDataFrame(data, schema)

# 4️⃣ Register Temp View
df.createOrReplaceTempView("Activity")

# 5️⃣ SQL Query: First Login Date per Player
result = spark.sql("""
    SELECT player_id, MIN(event_date) AS first_login
    FROM Activity
    GROUP BY player_id
""")

# 6️⃣ Show Result
result.show()

In [0]:
df.display()

In [0]:
# Alternative: number each player's rows by date and keep the earliest one
v_window = Window.partitionBy("player_id").orderBy(col("event_date").asc())
df.withColumn("row", row_number().over(v_window)) \
  .filter(col("row") == 1) \
  .select("player_id", col("event_date").alias("first_login")) \
  .display()
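
The window-function cell above numbers each player's rows in date order and keeps row 1. Because `(player_id, event_date)` is the primary key, no two rows of a player tie on date, so the row picked is deterministic and matches the `MIN(event_date)` result. The same idea can be sketched in plain Python with `itertools.groupby`; `first_login_by_ranking` is a hypothetical helper for illustration, not part of the notebook.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# Rows from the example Activity table:
# (player_id, device_id, event_date, games_played)
activity_rows = [
    (1, 2, date(2016, 3, 1), 5),
    (1, 2, date(2016, 5, 2), 6),
    (2, 3, date(2017, 6, 25), 1),
    (3, 1, date(2016, 3, 2), 0),
    (3, 4, date(2018, 7, 3), 5),
]

def first_login_by_ranking(rows):
    """Mimic partitionBy(player_id).orderBy(event_date) followed by row == 1."""
    # Sorting by (player_id, event_date) plays the role of partitioning and ordering
    ordered = sorted(rows, key=itemgetter(0, 2))
    result = []
    # groupby yields one run of consecutive rows per player_id
    for player_id, player_rows in groupby(ordered, key=itemgetter(0)):
        first_row = next(player_rows)  # the row that would receive row number 1
        result.append((player_id, first_row[2]))
    return result

ranked = first_login_by_ranking(activity_rows)
print(ranked)
```

For this problem the `GROUP BY` form is the simpler choice. The ranking form is mainly useful when other columns from the earliest row, such as `device_id`, are also needed.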