## 2066 - Account Balance
### Table: Transactions

| Column Name | Type |
|-------------|------|
| account_id  | int  |
| day         | date |
| type        | ENUM |
| amount      | int  |

(account_id, day) is the primary key for this table.  
Each row contains information about one transaction, including the transaction type, the day it occurred on, and the amount.  
type is an ENUM with the values ('Deposit', 'Withdraw').

Write an SQL query to report the balance of each account after each transaction. You may assume that the balance of each account before any transaction is 0 and that the balance never drops below 0.

Return the result table in ascending order by account_id, then by day in case of a tie.

---

### Transactions table:

| account_id | day        | type     | amount |
|------------|------------|----------|--------|
| 1          | 2021-11-07 | Deposit  | 2000   |
| 1          | 2021-11-09 | Withdraw | 1000   |
| 1          | 2021-11-11 | Deposit  | 3000   |
| 2          | 2021-12-07 | Deposit  | 7000   |
| 2          | 2021-12-12 | Withdraw | 7000   |

---

### Output:

| account_id | day        | balance |
|------------|------------|---------|
| 1          | 2021-11-07 | 2000    |
| 1          | 2021-11-09 | 1000    |
| 1          | 2021-11-11 | 4000    |
| 2          | 2021-12-07 | 7000    |
| 2          | 2021-12-12 | 0       |

---

**Explanation:**  
Account 1:  
- Initial balance is 0.  
- 2021-11-07 --> deposit 2000. Balance is 0 + 2000 = 2000.  
- 2021-11-09 --> withdraw 1000. Balance is 2000 - 1000 = 1000.  
- 2021-11-11 --> deposit 3000. Balance is 1000 + 3000 = 4000.  

Account 2:  
- Initial balance is 0.  
- 2021-12-07 --> deposit 7000. Balance is 0 + 7000 = 7000.  
- 2021-12-12 --> withdraw 7000. Balance is 7000 - 7000 = 0.
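
The arithmetic above is a running total per account: deposits add to the balance and withdrawals subtract from it. A minimal plain-Python sketch of that logic, using the sample rows from the Transactions table, might look like the following. The helper name `running_balances` is hypothetical and only for illustration; it is not part of the notebook's Spark code.

```python
from collections import defaultdict
from datetime import date

# Sample rows copied from the Transactions table above:
# (account_id, day, type, amount)
transactions = [
    (1, date(2021, 11, 7), "Deposit", 2000),
    (1, date(2021, 11, 9), "Withdraw", 1000),
    (1, date(2021, 11, 11), "Deposit", 3000),
    (2, date(2021, 12, 7), "Deposit", 7000),
    (2, date(2021, 12, 12), "Withdraw", 7000),
]


def running_balances(rows):
    """Return (account_id, day, balance) tuples sorted by account_id, then day.

    Hypothetical helper for illustration only.
    """
    balance = defaultdict(int)  # every account starts at 0
    result = []
    for account_id, day, tx_type, amount in sorted(rows, key=lambda r: (r[0], r[1])):
        # Deposits add to the balance; withdrawals subtract from it
        balance[account_id] += amount if tx_type == "Deposit" else -amount
        result.append((account_id, day, balance[account_id]))
    return result


for account_id, day, bal in running_balances(transactions):
    print(account_id, day.isoformat(), bal)
```

Sorting by `(account_id, day)` before accumulating plays the same role as partitioning by account and ordering by day in a window function.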

In [0]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DateType
# Note: importing `sum` here shadows Python's built-in sum() for the rest of the notebook
from pyspark.sql.functions import col, when, sum
from pyspark.sql.window import Window
from datetime import datetime

# Start Spark session
spark = SparkSession.builder.appName("TransactionBalance").getOrCreate()

# Define schema
schema = StructType([
    StructField("account_id", IntegerType(), True),
    StructField("day", DateType(), True),
    StructField("type", StringType(), True),
    StructField("amount", IntegerType(), True)
])

# Sample data
raw_data = [
    (1, "2021-11-07", "Deposit", 2000),
    (1, "2021-11-09", "Withdraw", 1000),
    (1, "2021-11-11", "Deposit", 3000),
    (2, "2021-12-07", "Deposit", 7000),
    (2, "2021-12-12", "Withdraw", 7000)
]

# Convert date strings to datetime.date objects to match the DateType column
data = [(aid, datetime.strptime(day, "%Y-%m-%d").date(), ttype, amt) for aid, day, ttype, amt in raw_data]

# Create DataFrame
df = spark.createDataFrame(data, schema)
df.createOrReplaceTempView("Transactions")


In [0]:
# Signed amount: deposits add to the balance, withdrawals subtract from it
df_signed = df.withColumn(
    "signed_amount",
    when(col("type") == "Deposit", col("amount")).otherwise(-col("amount")),
).select("account_id", "day", "signed_amount")

# Running total per account, ordered by day
win_spec = Window.partitionBy("account_id").orderBy("day")
df_2 = (
    df_signed.withColumn("balance", sum("signed_amount").over(win_spec))
    .select("account_id", "day", "balance")
    .orderBy("account_id", "day")
)

# display() returns None, so call it separately to keep df_2 as a DataFrame
df_2.display()

In [0]:
%sql
with cte as (
  -- Signed amount: positive for deposits, negative for withdrawals
  select
    account_id,
    day,
    case
      when type = 'Deposit' then amount
      else -1 * amount
    end as signed_amount
  from Transactions
),
cte_2 as (
  -- Running total per account, ordered by day
  select
    account_id,
    day,
    sum(signed_amount) over (partition by account_id order by day) as balance
  from cte
)
select *
from cte_2
order by account_id, day

In [0]:

# Transform amount based on type
df_transformed = df.withColumn("net_amount", when(col("type") == "Deposit", col("amount")).otherwise(-col("amount")))

# Define window for cumulative sum: per account, from the first row up to the current row
window_spec = Window.partitionBy("account_id").orderBy("day").rowsBetween(Window.unboundedPreceding, Window.currentRow)

# Calculate balance
df_result = df_transformed.withColumn("balance", sum("net_amount").over(window_spec)) \
                          .select("account_id", "day", "balance") \
                          .orderBy("account_id", "day")

# Display result
display(df_result)