## 1308. Running Total for Different Genders
### Table: Scores

| Column Name   | Type    |
|---------------|---------|
| player_name   | varchar |
| gender        | varchar |
| day           | date    |
| score_points  | int     |

(gender, day) is the primary key for this table.  
A competition is held between the female team and the male team.  
Each row of this table indicates that the player player_name, of the given gender, scored score_points points on that day.  
Gender is 'F' if the player is on the female team and 'M' if the player is on the male team.

Write an SQL query to find the running total score for each gender on each day, i.e. the sum of that gender's points over all days up to and including that day.

Order the result table by gender and day.
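The required output is a per-gender prefix sum ordered by day. As a minimal plain-Python sketch of that specification (no Spark; the function name `running_totals` is hypothetical, and days are kept as ISO `YYYY-MM-DD` strings, which sort chronologically), sort by gender and day, then accumulate within each gender:

```python
from itertools import accumulate, groupby
from operator import itemgetter

# Sample rows from the Scores table: (player_name, gender, day, score_points).
# ISO date strings compare chronologically, so plain string ordering is enough.
scores = [
    ("Aron", "F", "2020-01-01", 17),
    ("Alice", "F", "2020-01-07", 23),
    ("Bajrang", "M", "2020-01-07", 7),
    ("Khali", "M", "2019-12-25", 11),
    ("Slaman", "M", "2019-12-30", 13),
    ("Joe", "M", "2019-12-31", 3),
    ("Jose", "M", "2019-12-18", 2),
    ("Priya", "F", "2019-12-31", 23),
    ("Priyanka", "F", "2019-12-30", 17),
]

def running_totals(rows):
    """Return (gender, day, total) tuples ordered by gender, then day (hypothetical helper)."""
    # Sort by (gender, day) so each gender's rows are contiguous and chronological.
    ordered = sorted(rows, key=itemgetter(1, 2))
    result = []
    for gender, group in groupby(ordered, key=itemgetter(1)):
        rows_for_gender = list(group)
        # Prefix sums of score_points within this gender.
        totals = accumulate(row[3] for row in rows_for_gender)
        result.extend((gender, row[2], total) for row, total in zip(rows_for_gender, totals))
    return result

for row in running_totals(scores):
    print(row)
```

This assumes, as the primary key guarantees, at most one row per gender per day; with duplicate days it would emit one row per player rather than one per day.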

---

### Scores table:

| player_name | gender | day        | score_points |
|-------------|--------|------------|--------------|
| Aron        | F      | 2020-01-01 | 17           |
| Alice       | F      | 2020-01-07 | 23           |
| Bajrang     | M      | 2020-01-07 | 7            |
| Khali       | M      | 2019-12-25 | 11           |
| Slaman      | M      | 2019-12-30 | 13           |
| Joe         | M      | 2019-12-31 | 3            |
| Jose        | M      | 2019-12-18 | 2            |
| Priya       | F      | 2019-12-31 | 23           |
| Priyanka    | F      | 2019-12-30 | 17           |

---

### Result table:

| gender | day        | total |
|--------|------------|-------|
| F      | 2019-12-30 | 17    |
| F      | 2019-12-31 | 40    |
| F      | 2020-01-01 | 57    |
| F      | 2020-01-07 | 80    |
| M      | 2019-12-18 | 2     |
| M      | 2019-12-25 | 13    |
| M      | 2019-12-30 | 26    |
| M      | 2019-12-31 | 29    |
| M      | 2020-01-07 | 36    |

---

**Explanation:**  
For the female team:  
On the first day, 2019-12-30, Priyanka scored 17 points, so the team's total is 17.  
On the second day, 2019-12-31, Priya scored 23 points, so the team's total is 40.  
On the third day, 2020-01-01, Aron scored 17 points, so the team's total is 57.  
On the fourth day, 2020-01-07, Alice scored 23 points, so the team's total is 80.  

For the male team:  
On the first day, 2019-12-18, Jose scored 2 points, so the team's total is 2.  
On the second day, 2019-12-25, Khali scored 11 points, so the team's total is 13.  
On the third day, 2019-12-30, Slaman scored 13 points, so the team's total is 26.  
On the fourth day, 2019-12-31, Joe scored 3 points, so the team's total is 29.  
On the fifth day, 2020-01-07, Bajrang scored 7 points, so the team's total is 36.
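Stated as a formula (the symbols are introduced here for illustration: $T(g, d)$ is the total for gender $g$ on day $d$, $s(g, d)$ is that day's score_points, and $d_1 < d_2 < \dots$ are the days on which gender $g$ scored), each total is the previous total plus the current day's points:

$$
T(g, d) = \sum_{d' \le d} s(g, d')
\qquad\Longleftrightarrow\qquad
\begin{aligned}
T(g, d_1) &= s(g, d_1) \\
T(g, d_k) &= T(g, d_{k-1}) + s(g, d_k), \quad k > 1
\end{aligned}
$$

For example, $T(\mathrm{F}, \text{2020-01-01}) = 17 + 23 + 17 = 57$, matching the third female row above.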


```python
from datetime import datetime

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DateType, IntegerType

# Start (or reuse) the Spark session
spark = SparkSession.builder.appName("GenderScoreAggregation").getOrCreate()

# Schema matching the Scores table
schema = StructType([
    StructField("player_name", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("day", DateType(), True),
    StructField("score_points", IntegerType(), True)
])

# Sample data with dates as ISO strings
raw_data = [
    ("Aron", "F", "2020-01-01", 17),
    ("Alice", "F", "2020-01-07", 23),
    ("Bajrang", "M", "2020-01-07", 7),
    ("Khali", "M", "2019-12-25", 11),
    ("Slaman", "M", "2019-12-30", 13),
    ("Joe", "M", "2019-12-31", 3),
    ("Jose", "M", "2019-12-18", 2),
    ("Priya", "F", "2019-12-31", 23),
    ("Priyanka", "F", "2019-12-30", 17)
]

# DateType columns need datetime.date values, so parse the strings first
data = [
    (name, gender, datetime.strptime(day, "%Y-%m-%d").date(), score)
    for name, gender, day, score in raw_data
]

# Preview the parsed rows
for row in data:
    print(row)

# Create the DataFrame and register it so the SQL cells can query "Scores"
df = spark.createDataFrame(data, schema)
df.createOrReplaceTempView("Scores")
```


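```python
from pyspark.sql import Window
from pyspark.sql import functions as F

# Running total per gender: partition by gender, accumulate in day order
win_func = Window.partitionBy("gender").orderBy(F.col("day").asc())
c_sum = F.sum("score_points").over(win_func)

# display() returns None, so keep the result DataFrame in its own variable
result_df = (
    df.withColumn("total", c_sum)
      .select("gender", "day", "total")
      .orderBy("gender", "day")
)
display(result_df)
```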
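With `orderBy` and no explicit frame, a Spark window uses the default growing frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which also includes any rows that tie with the current row on the ordering key. Because (gender, day) is the primary key, no two rows in a partition share a day, so here that default and an explicit ROWS frame (`rowsBetween(Window.unboundedPreceding, Window.currentRow)`) give identical totals. The toy sketch below, in plain Python with made-up keys `d1` to `d3` (hypothetical, not from the Scores table), shows where the two frames would differ if ties existed:

```python
# (order_key, value) rows, already sorted by order_key; two rows tie on "d2".
rows = [("d1", 5), ("d2", 1), ("d2", 2), ("d3", 4)]

def rows_frame(rows):
    """Cumulative sum through the current row's position (ROWS ... CURRENT ROW)."""
    out, running = [], 0
    for _, value in rows:
        running += value
        out.append(running)
    return out

def range_frame(rows):
    """Cumulative sum over every row whose key is <= the current key (RANGE ... CURRENT ROW)."""
    return [sum(v for k, v in rows if k <= key) for key, _ in rows]

print(rows_frame(rows))   # [5, 6, 8, 12]
print(range_frame(rows))  # [5, 8, 8, 12]
```

Only the tied `d2` rows differ: the RANGE frame gives both of them the full total for that key.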



```sql
%sql
-- Running total of score_points per gender, accumulated in day order
SELECT gender,
       day,
       SUM(score_points) OVER (PARTITION BY gender ORDER BY day ASC) AS total
FROM Scores
ORDER BY gender, day;
```

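```python
# Running total via a self-join: for each (gender, day) in s1, sum every
# same-gender row in s2 whose day is on or before s1.day.
# A plain GROUP BY gender, day would return per-day totals, not running totals.
query = """
SELECT s1.gender, s1.day, SUM(s2.score_points) AS total
FROM Scores s1
JOIN Scores s2
  ON s1.gender = s2.gender
 AND s2.day <= s1.day
GROUP BY s1.gender, s1.day
ORDER BY s1.gender, s1.day
"""

# Execute and display
result = spark.sql(query)
display(result)
```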
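The self-join recomputes each total from scratch: for a gender with n distinct days, the join condition matches n(n+1)/2 row pairs before aggregation, whereas the window versions scan each sorted partition once. It also relies on the primary key, since duplicate (gender, day) rows in s1 would each join to the same s2 rows and inflate the sum. A minimal plain-Python sketch of what the join computes (the function name `self_join_totals` is hypothetical; ISO date strings compare chronologically):

```python
# (gender, day, score_points) rows from the sample Scores table
scores = [
    ("F", "2020-01-01", 17), ("F", "2020-01-07", 23), ("M", "2020-01-07", 7),
    ("M", "2019-12-25", 11), ("M", "2019-12-30", 13), ("M", "2019-12-31", 3),
    ("M", "2019-12-18", 2), ("F", "2019-12-31", 23), ("F", "2019-12-30", 17),
]

def self_join_totals(rows):
    """Mimic s1 JOIN s2 ON same gender AND s2.day <= s1.day, summed per (gender, day)."""
    result = []
    for g1, d1, _ in rows:  # each s1 row; (gender, day) is unique, so it is its own group
        total = sum(p2 for g2, d2, p2 in rows if g2 == g1 and d2 <= d1)  # matching s2 rows
        result.append((g1, d1, total))
    return sorted(result)  # ORDER BY gender, day

for row in self_join_totals(scores):
    print(row)
```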