## 534. Game Play Analysis III
### Table: Activity

| Column Name  | Type |
|--------------|------|
| player_id    | int  |
| device_id    | int  |
| event_date   | date |
| games_played | int  |

(player_id, event_date) is the primary key of this table.  
This table shows the activity of players of some game.  
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.

---

Write an SQL query that reports, for each player and date, how many games the player has played so far. That is, the total number of games played by the player up to and including that date. Check the example for clarity.

---

### Activity table:

| player_id | device_id | event_date | games_played |
|-----------|-----------|------------|--------------|
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 1         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |

---

### Result table:

| player_id | event_date | games_played_so_far |
|-----------|------------|---------------------|
| 1         | 2016-03-01 | 5                   |
| 1         | 2016-05-02 | 11                  |
| 1         | 2017-06-25 | 12                  |
| 3         | 2016-03-02 | 0                   |
| 3         | 2018-07-03 | 5                   |

---

**Explanation:**  
For the player with id 1, 5 + 6 = 11 games played by 2016-05-02, and 5 + 6 + 1 = 12 games played by 2017-06-25.  
For the player with id 3, 0 + 5 = 5 games played by 2018-07-03.  
Note that for each player we only care about the days when the player logged in.
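
The running total described in the explanation can be sketched in plain Python, independent of Spark. This is only a minimal illustration: the helper name `games_played_so_far` and the tuple layout `(player_id, device_id, event_date, games_played)` are assumptions made for this sketch, not part of the original problem.

```python
from collections import defaultdict
from datetime import date


def games_played_so_far(rows):
    """Return (player_id, event_date, running_total) tuples.

    `rows` is an iterable of (player_id, device_id, event_date, games_played).
    The function name and tuple layout are hypothetical, chosen for this sketch.
    """
    totals = defaultdict(int)  # running total per player
    result = []
    # Sort by player, then date, so each player's rows are visited chronologically
    for player_id, _device_id, event_date, games in sorted(rows, key=lambda r: (r[0], r[2])):
        totals[player_id] += games
        result.append((player_id, event_date, totals[player_id]))
    return result


# Same rows as the example Activity table
activity = [
    (1, 2, date(2016, 3, 1), 5),
    (1, 2, date(2016, 5, 2), 6),
    (1, 3, date(2017, 6, 25), 1),
    (3, 1, date(2016, 3, 2), 0),
    (3, 4, date(2018, 7, 3), 5),
]

running = games_played_so_far(activity)
for row in running:
    print(row)
```

Sorting by `(player_id, event_date)` plays the role that partitioning and ordering play in a window function: each player's rows are accumulated separately and in date order.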

In [0]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, DateType
from pyspark.sql.functions import col, sum as sum_, to_date
from pyspark.sql.window import Window

# Start Spark session
spark = SparkSession.builder.appName("GamesPlayedSoFar").getOrCreate()

# Define schema
schema = StructType([
    StructField("player_id", IntegerType(), True),
    StructField("device_id", IntegerType(), True),
    StructField("event_date", DateType(), True),
    StructField("games_played", IntegerType(), True)
])

# Sample data
from datetime import date
data = [
    (1, 2, date(2016, 3, 1), 5),
    (1, 2, date(2016, 5, 2), 6),
    (1, 3, date(2017, 6, 25), 1),
    (3, 1, date(2016, 3, 2), 0),
    (3, 4, date(2018, 7, 3), 5)
]

# Create DataFrame
df = spark.createDataFrame(data, schema)

# Register a temp view so the %sql cell can query the data as "Activity"
df.createOrReplaceTempView("Activity")


In [0]:
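from pyspark.sql.functions import col, sum as sum_
from pyspark.sql.window import Window

# Window spec: one partition per player, rows ordered by date
win_spec = Window.partitionBy(col("player_id")).orderBy(col("event_date"))

# Running total of games_played within each player's window
c_sum = sum_(col("games_played")).over(win_spec)

# .display() is Databricks-specific; outside Databricks, .show() is the usual equivalent
df.withColumn("games_played_so_far", c_sum) \
  .select("player_id", "event_date", "games_played_so_far") \
  .display()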

In [0]:
%sql
-- Note: the commented-out query fails in Databricks SQL: `.over(...)` is DataFrame API
-- syntax. SQL (Spark SQL and SQL Server alike) expects `SUM(...) OVER (...)` with no dot.
-- SELECT player_id, event_date, sum(games_played).over(partition by player_id order by event_date) AS games_played_so_far FROM Activity
WITH cte AS (
  SELECT player_id,
         event_date,
         SUM(games_played) OVER (PARTITION BY player_id ORDER BY event_date) AS games_played_so_far
  FROM Activity
)
SELECT player_id, event_date, games_played_so_far
FROM cte
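
The window query can also be checked outside Databricks. The sketch below runs the same `SUM(...) OVER (...)` expression against an in-memory SQLite database through Python's standard `sqlite3` module. Using SQLite as a stand-in for Spark SQL is an assumption of this sketch, as is storing `event_date` as ISO-formatted text (which sorts chronologically); window functions require SQLite 3.25 or later.

```python
import sqlite3

# In-memory database standing in for the Databricks temp view (assumption of this sketch)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity ("
    "player_id INTEGER, device_id INTEGER, event_date TEXT, games_played INTEGER, "
    "PRIMARY KEY (player_id, event_date))"
)
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?, ?, ?)",
    [
        (1, 2, "2016-03-01", 5),
        (1, 2, "2016-05-02", 6),
        (1, 3, "2017-06-25", 1),
        (3, 1, "2016-03-02", 0),
        (3, 4, "2018-07-03", 5),
    ],
)

# Running total per player, ordered by date within each player
query = """
SELECT player_id,
       event_date,
       SUM(games_played) OVER (PARTITION BY player_id ORDER BY event_date) AS games_played_so_far
FROM Activity
ORDER BY player_id, event_date
"""
sql_rows = conn.execute(query).fetchall()
for row in sql_rows:
    print(row)

conn.close()
```

The final `ORDER BY` only makes the printed order deterministic; the problem itself does not require a particular row order.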

In [0]:

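# Define window spec with an explicit ROWS frame:
# from the first row of the player's partition up to the current row
win_spec = (
    Window.partitionBy("player_id")
    .orderBy("event_date")
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)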

# Compute cumulative sum
df_result = df.withColumn("games_played_so_far", sum_("games_played").over(win_spec)) \
              .select("player_id", "event_date", "games_played_so_far")

# Display result
display(df_result)
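
The explicit `rowsBetween` frame in the cell above matters only when ordering values tie. When `orderBy` is given without a frame, Spark defaults to a RANGE frame from unbounded preceding to the current row, which treats rows with equal `event_date` as peers and gives them the same total. Because `(player_id, event_date)` is the primary key, there are no ties in `Activity`, so both frames agree on this table. The sketch below illustrates the difference using SQLite (through Python's `sqlite3`) as a stand-in for Spark, with a hypothetical table `Ties` that contains a duplicated date, which the real primary key would forbid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a duplicated event_date, for illustration only
conn.execute("CREATE TABLE Ties (player_id INTEGER, event_date TEXT, games_played INTEGER)")
conn.executemany(
    "INSERT INTO Ties VALUES (?, ?, ?)",
    [
        (1, "2016-03-01", 5),
        (1, "2016-03-01", 6),  # same date as the previous row: a tie
        (1, "2016-05-02", 1),
    ],
)

frame_query = """
SELECT event_date,
       SUM(games_played) OVER (
           PARTITION BY player_id ORDER BY event_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total,
       SUM(games_played) OVER (
           PARTITION BY player_id ORDER BY event_date
           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_total
FROM Ties
ORDER BY event_date, rows_total
"""
frame_rows = conn.execute(frame_query).fetchall()
for row in frame_rows:
    print(row)

conn.close()
```

With the ROWS frame, the two tied rows get different totals, and which one comes first is not guaranteed. With the RANGE frame, both tied rows receive the combined total of 11.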