## 1783. Grand Slam Titles
### Table: Players

| Column Name | Type    |
|-------------|---------|
| player_id   | int     |
| player_name | varchar |

player_id is the primary key for this table.  
Each row in this table contains the name and the ID of a tennis player.

---

### Table: Championships

| Column Name | Type |
|-------------|------|
| year        | int  |
| Wimbledon   | int  |
| Fr_open     | int  |
| US_open     | int  |
| Au_open     | int  |

year is the primary key for this table.  
Each row of this table contains the IDs of the players who won each of the four grand slam tournaments in that year.

---

Write an SQL query to report the number of grand slam tournaments won by each player. Do not include the players who did not win any tournament.

Return the result table in any order.

---

### Players table:

| player_id | player_name |
|-----------|-------------|
| 1         | Nadal       |
| 2         | Federer     |
| 3         | Novak       |

### Championships table:

| year | Wimbledon | Fr_open | US_open | Au_open |
|------|-----------|---------|---------|---------|
| 2018 | 1         | 1       | 1       | 1       |
| 2019 | 1         | 1       | 2       | 2       |
| 2020 | 2         | 1       | 2       | 2       |

---

### Result table:

| player_id | player_name | grand_slams_count |
|-----------|-------------|-------------------|
| 2         | Federer     | 5                 |
| 1         | Nadal       | 7                 |

Player 1 (Nadal) won 7 titles: Wimbledon (2018, 2019), Fr_open (2018, 2019, 2020), US_open (2018), and Au_open (2018).  
Player 2 (Federer) won 5 titles: Wimbledon (2020), US_open (2019, 2020), and Au_open (2019, 2020).  
Player 3 (Novak) did not win any tournament, so they are not included in the result table.
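
The counts in the explanation above can be checked by hand with a few lines of plain Python. This is only a sketch for verifying the expected output, not the requested SQL solution; the names `players`, `championships`, `wins`, and `result` are illustrative and do not come from the problem statement.

```python
from collections import Counter

# Sample data copied from the example tables above
players = {1: "Nadal", 2: "Federer", 3: "Novak"}
championships = [
    # (year, Wimbledon, Fr_open, US_open, Au_open)
    (2018, 1, 1, 1, 1),
    (2019, 1, 1, 2, 2),
    (2020, 2, 1, 2, 2),
]

# Ignore the year and count every winner ID across the four tournament columns
wins = Counter(pid for _year, *winners in championships for pid in winners)

# Players without a title never appear in the counter, so they are excluded
result = sorted((pid, players[pid], n) for pid, n in wins.items())
print(result)
```

Flattening the four tournament columns into one stream of winner IDs is the same idea the SQL and PySpark cells below rely on.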

In [0]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from pyspark.sql.functions import col

# Start Spark session
spark = SparkSession.builder.appName("GrandSlamCount").getOrCreate()

# Define Players schema and data
players_schema = StructType([
    StructField("player_id", IntegerType(), True),
    StructField("player_name", StringType(), True)
])

players_data = [
    (1, "Nadal"),
    (2, "Federer"),
    (3, "Novak")
]

players_df = spark.createDataFrame(players_data, players_schema)
players_df.createOrReplaceTempView("Players")

# Define Championships schema and data
champ_schema = StructType([
    StructField("year", IntegerType(), True),
    StructField("Wimbledon", IntegerType(), True),
    StructField("Fr_open", IntegerType(), True),
    StructField("US_open", IntegerType(), True),
    StructField("Au_open", IntegerType(), True)
])

champ_data = [
    (2018, 1, 1, 1, 1),
    (2019, 1, 1, 2, 2),
    (2020, 2, 1, 2, 2)
]

champ_df = spark.createDataFrame(champ_data, champ_schema)
champ_df.createOrReplaceTempView("Championships")


In [0]:
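from pyspark.sql.functions import col, count

# Unpivot: one single-column DataFrame of winner IDs per tournament
df1 = champ_df.selectExpr("Wimbledon as player")
df2 = champ_df.selectExpr("Fr_open as player")
df3 = champ_df.selectExpr("US_open as player")
df4 = champ_df.selectExpr("Au_open as player")

# union() keeps duplicates (UNION ALL semantics), so every title is counted
df_player = df1.union(df2).union(df3).union(df4)

# Count titles per winner, then join to Players to attach the names.
# Players without a title never appear in the counts, so the inner join excludes them.
(df_player.groupBy("player")
    .agg(count("*").alias("won"))
    .join(players_df, col("player_id") == col("player"), "inner")
    .selectExpr("player_id", "player_name", "won as grand_slams_count")
    .display())  # display() is available on Databricks; use .show() elsewhere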


In [0]:
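%sql
-- Intermediate step: unpivot the four tournament columns into one player_id column.
-- UNION ALL keeps duplicates, so each row of the output is one title won.
SELECT Wimbledon AS player_id FROM Championships
UNION ALL
SELECT Fr_open FROM Championships
UNION ALL
SELECT US_open FROM Championships
UNION ALL
SELECT Au_open FROM Championships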

In [0]:

# SQL logic
query = """
SELECT player_id, player_name, COUNT(*) AS grand_slams_count
FROM (
    SELECT Wimbledon AS player_id FROM Championships
    UNION ALL
    SELECT Fr_open FROM Championships
    UNION ALL
    SELECT US_open FROM Championships
    UNION ALL
    SELECT Au_open FROM Championships
) AS all_wins
JOIN Players USING (player_id)
GROUP BY player_id, player_name
"""

# Execute and display
result = spark.sql(query)
display(result)
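
The query above can also be checked without a Spark cluster. The sketch below assumes that Python's built-in `sqlite3` module is an acceptable stand-in for Spark SQL here, since the query uses only `UNION ALL`, `JOIN ... USING`, and `GROUP BY`. The `ORDER BY` clause is added only to make the output deterministic, and the table definitions are an assumption based on the schemas in the problem statement.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the Spark temp views
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Players (player_id INTEGER PRIMARY KEY, player_name TEXT);
CREATE TABLE Championships (
    year INTEGER PRIMARY KEY,
    Wimbledon INTEGER, Fr_open INTEGER, US_open INTEGER, Au_open INTEGER
);
INSERT INTO Players VALUES (1, 'Nadal'), (2, 'Federer'), (3, 'Novak');
INSERT INTO Championships VALUES
    (2018, 1, 1, 1, 1),
    (2019, 1, 1, 2, 2),
    (2020, 2, 1, 2, 2);
""")

# Same logic as the Spark SQL query, plus ORDER BY for a stable row order
query = """
SELECT player_id, player_name, COUNT(*) AS grand_slams_count
FROM (
    SELECT Wimbledon AS player_id FROM Championships
    UNION ALL
    SELECT Fr_open FROM Championships
    UNION ALL
    SELECT US_open FROM Championships
    UNION ALL
    SELECT Au_open FROM Championships
) AS all_wins
JOIN Players USING (player_id)
GROUP BY player_id, player_name
ORDER BY player_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Because the join is an inner join, player 3 (Novak) has no matching row in `all_wins` and is left out, as the problem requires.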