## Running total of stock prices

You are given a dataset of daily stock prices. Write a PySpark program that computes the running total (cumulative sum) of prices for each stock symbol, accumulated in date order.


data = [
    ("2024-09-01", "AAPL", 150), ("2024-09-02", "AAPL", 160), ("2024-09-03", "AAPL", 170),
    ("2024-09-01", "GOOGL", 1200), ("2024-09-02", "GOOGL", 1250), ("2024-09-03", "GOOGL", 1300),
]


**Expected output:**

| date       | symbol | price | cumulative_price |
|------------|--------|-------|------------------|
| 2024-09-01 | AAPL   | 150   | 150              |
| 2024-09-02 | AAPL   | 160   | 310              |
| 2024-09-03 | AAPL   | 170   | 480              |
| 2024-09-01 | GOOGL  | 1200  | 1200             |
| 2024-09-02 | GOOGL  | 1250  | 2450             |
| 2024-09-03 | GOOGL  | 1300  | 3750             |
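The expected values in the table can be cross-checked without Spark. The following is a minimal plain-Python sketch, not the PySpark solution itself; the helper name `running_totals` is hypothetical and introduced only for illustration. It groups rows by symbol, sorts each group by date, and accumulates the prices.

```python
from collections import defaultdict
from itertools import accumulate

data = [
    ("2024-09-01", "AAPL", 150), ("2024-09-02", "AAPL", 160), ("2024-09-03", "AAPL", 170),
    ("2024-09-01", "GOOGL", 1200), ("2024-09-02", "GOOGL", 1250), ("2024-09-03", "GOOGL", 1300),
]


def running_totals(rows):
    """Return (date, symbol, price, cumulative_price) tuples, per symbol in date order.

    Hypothetical helper used only to sanity-check the expected table.
    """
    by_symbol = defaultdict(list)
    for date, symbol, price in rows:
        by_symbol[symbol].append((date, price))

    result = []
    for symbol in sorted(by_symbol):
        # ISO-formatted date strings (YYYY-MM-DD) sort chronologically as plain strings.
        ordered = sorted(by_symbol[symbol])
        totals = accumulate(price for _, price in ordered)
        for (date, price), total in zip(ordered, totals):
            result.append((date, symbol, price, total))
    return result


for row in running_totals(data):
    print(row)
```

For the sample data this yields cumulative values of 150, 310, 480 for AAPL and 1200, 2450, 3750 for GOOGL, matching the table.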

In [0]:
# sample data
data = [
    ("2024-09-01", "AAPL", 150), ("2024-09-02", "AAPL", 160), ("2024-09-03", "AAPL", 170),
    ("2024-09-01", "GOOGL", 1200), ("2024-09-02", "GOOGL", 1250), ("2024-09-03", "GOOGL", 1300),
]

columns = ["date", "symbol", "price"]

df = spark.createDataFrame(data, columns)
df.show()

+----------+------+-----+
|      date|symbol|price|
+----------+------+-----+
|2024-09-01|  AAPL|  150|
|2024-09-02|  AAPL|  160|
|2024-09-03|  AAPL|  170|
|2024-09-01| GOOGL| 1200|
|2024-09-02| GOOGL| 1250|
|2024-09-03| GOOGL| 1300|
+----------+------+-----+



In [0]:
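from pyspark.sql.window import Window
from pyspark.sql.functions import col, sum as spark_sum

# Python's built-in sum() does not work on PySpark columns; use pyspark.sql.functions.sum().
# Importing it under an alias avoids shadowing the built-in.

# Order by date within each symbol so the total accumulates chronologically
# (ISO-formatted date strings sort correctly as plain strings).
# With orderBy and no explicit frame, Spark defaults to
# rangeBetween(Window.unboundedPreceding, Window.currentRow), which gives rows tied
# on the ordering key the same total; rowsBetween accumulates strictly row by row.
window = (
    Window.partitionBy("symbol")
    .orderBy("date")
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)

stock_df = df.withColumn("cumulative_price", spark_sum(col("price")).over(window))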
stock_df.show()

+----------+------+-----+----------------+
|      date|symbol|price|cumulative_price|
+----------+------+-----+----------------+
|2024-09-01|  AAPL|  150|             150|
|2024-09-02|  AAPL|  160|             310|
|2024-09-03|  AAPL|  170|             480|
|2024-09-01| GOOGL| 1200|            1200|
|2024-09-02| GOOGL| 1250|            2450|
|2024-09-03| GOOGL| 1300|            3750|
+----------+------+-----+----------------+
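The choice of window frame only matters when rows tie on the ordering key. When a window has an `orderBy` and no explicit frame, Spark uses a range frame from `Window.unboundedPreceding` to `Window.currentRow`, so tied rows all receive the same total. The sketch below emulates both frame types in plain Python to show the difference; the function names and the partition with a duplicated date are hypothetical, made up for illustration, and are not part of the dataset above.

```python
def rows_frame_totals(pairs):
    """Emulate rowsBetween(unboundedPreceding, currentRow) on (key, value) pairs
    that are already sorted by key: each row adds only itself to the total."""
    totals, running = [], 0
    for _, value in pairs:
        running += value
        totals.append(running)
    return totals


def range_frame_totals(pairs):
    """Emulate rangeBetween(unboundedPreceding, currentRow): each row gets the sum
    of every row whose key is less than or equal to its own key, ties included."""
    return [sum(v for k, v in pairs if k <= key) for key, _ in pairs]


# Hypothetical partition in which two rows share the date 2024-09-02.
partition = [
    ("2024-09-01", 150),
    ("2024-09-02", 160),
    ("2024-09-02", 165),
    ("2024-09-03", 170),
]

print(rows_frame_totals(partition))   # [150, 310, 475, 645]
print(range_frame_totals(partition))  # [150, 475, 475, 645]
```

In the sample data every (symbol, date) pair is unique, so both frames produce the same result. Note that with a rows frame, the relative order of tied rows is not guaranteed in Spark unless a tie-breaking column is added to the ordering.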

