#### Score Board
##### To create a DataFrame from a CSV file, first create a Spark session
  - Create the Spark session with ```SparkSession.builder.appName("BatterStats").getOrCreate()```
  - Read the CSV file with ```spark.read.csv("")```
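##### For a team-wise batter scoreboard, group by team and batter, then aggregate runs_batter and ballno
  - ```groupBy("team", "batter").agg(
    sum("runs_batter").alias("total_runs"),
    count("ballno").alias("total_balls"))```
  - ```sum("runs_batter")``` adds up each batter's runs, and ```count("ballno")``` counts that batter's rows with a non-null ```ballno```.
  - The ```alias``` method sets the name under which each aggregated column appears in the result.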
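
To see what this aggregation computes without a Spark session, here is a minimal plain-Python sketch of the same group-and-aggregate step. The function name ```batter_totals``` and the sample rows are invented for illustration and are not taken from the match file; the sketch also assumes ```runs_batter``` is never null.

```python
from collections import defaultdict

# Hypothetical ball-by-ball rows: (team, batter, runs_batter, ballno).
sample_balls = [
    ("Team A", "Batter X", 4, 1),
    ("Team A", "Batter X", 1, 2),
    ("Team A", "Batter Y", 0, 3),
    ("Team B", "Batter Z", 6, 1),
    ("Team B", "Batter Z", 2, None),  # a null ballno is not counted
]

def batter_totals(rows):
    """Return {(team, batter): (total_runs, total_balls)}."""
    totals = defaultdict(lambda: [0, 0])
    for team, batter, runs, ballno in rows:
        entry = totals[(team, batter)]
        entry[0] += runs          # mirrors sum("runs_batter")
        if ballno is not None:    # mirrors count("ballno"), which skips nulls
            entry[1] += 1
    return {key: tuple(value) for key, value in totals.items()}

for key, value in batter_totals(sample_balls).items():
    print(key, value)
```

Each ```(team, batter)``` pair becomes one output row, which is why the scoreboard has one line per batter.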
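##### Use a window function to partition the DataFrame
  - Here the partition is by team: ```Window.partitionBy("team")```. Because ```row_number()``` needs an ordered window, the code below also orders each partition with ```.orderBy(batter_stats["total_runs"].desc())```, so the highest scorer in each team comes first.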
  - The ```row_number()``` function is a window function that assigns a unique sequential number to each row within its partition. It starts from 1 for the first row in each partition and increments by 1 for subsequent rows.
  - The ```row_number().over(windowSpec)``` expression computes the row number for each row according to the window specification; ```withColumn``` then stores it in the "rank" column.
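
The numbering behaviour described above can be sketched in plain Python. This is an illustration of the semantics only, not how Spark implements it; ```add_row_number``` and the sample rows are hypothetical. Note that tied rows still receive distinct numbers: in this sketch ties keep their input order, whereas Spark does not guarantee any particular order among ties.

```python
from itertools import groupby
from operator import itemgetter

def add_row_number(rows, partition_key, order_key):
    """Number rows 1..n inside each partition, highest order_key first."""
    ranked = []
    # groupby needs its input sorted by the partition key.
    by_partition = sorted(rows, key=itemgetter(partition_key))
    for _, group in groupby(by_partition, key=itemgetter(partition_key)):
        # Descending sort inside the partition; Python's sort is stable,
        # so tied rows keep their relative input order.
        members = sorted(group, key=itemgetter(order_key), reverse=True)
        for position, row in enumerate(members, start=1):
            ranked.append({**row, "rank": position})
    return ranked

# Hypothetical aggregated rows with a tie on total_runs in Team A.
sample_stats = [
    {"team": "Team A", "batter": "Batter P", "total_runs": 30},
    {"team": "Team A", "batter": "Batter Q", "total_runs": 50},
    {"team": "Team A", "batter": "Batter R", "total_runs": 30},
    {"team": "Team B", "batter": "Batter S", "total_runs": 7},
]

for row in add_row_number(sample_stats, "team", "total_runs"):
    print(row["team"], row["batter"], row["total_runs"], row["rank"])
```

The numbering restarts at 1 for Team B, which is what makes a per-team "top N" filter possible.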
##### Filter the DataFrame
  - ```batter_stats_with_rank["rank"] <= 5``` creates a Boolean column expression that is True for rows whose "rank" is less than or equal to 5 and False otherwise. Passing it to ```filter``` keeps only the True rows, that is, the top five batters of each team.
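
As a rough illustration of the mask-then-keep behaviour, here is a plain-Python sketch. The helper ```keep_top``` and the sample rows are hypothetical and only mimic what the comparison and ```filter``` do together.

```python
# Hypothetical ranked rows; the values are illustrative only.
ranked_rows = [
    {"batter": "Batter P", "rank": 1},
    {"batter": "Batter Q", "rank": 5},
    {"batter": "Batter R", "rank": 6},
]

def keep_top(rows, limit=5):
    """Return the Boolean mask and the rows for which it is True."""
    mask = [row["rank"] <= limit for row in rows]            # the comparison
    kept = [row for row, keep in zip(rows, mask) if keep]    # the filter
    return mask, kept

mask, kept = keep_top(ranked_rows)
print(mask)                               # [True, True, False]
print([row["batter"] for row in kept])    # ['Batter P', 'Batter Q']
```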

In [0]:
from pyspark.sql import SparkSession
# Note: importing sum from pyspark shadows Python's built-in sum in this notebook.
from pyspark.sql.functions import sum, count, row_number
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("BatterStats").getOrCreate()
# header=True takes column names from the first row; inferSchema=True detects column types.
df = spark.read.csv("/FileStore/tables/cricket/335982.csv", header=True, inferSchema=True)

# Total runs and balls for each (team, batter) pair.
batter_stats = df.groupBy("team", "batter").agg(
    sum("runs_batter").alias("total_runs"),
    count("ballno").alias("total_balls")
)
# Number the batters within each team, highest total_runs first.
windowSpec = Window.partitionBy("team").orderBy(batter_stats["total_runs"].desc())
batter_stats_with_rank = batter_stats.withColumn("rank", row_number().over(windowSpec))
# Keep the top five batters per team.
top_batters = batter_stats_with_rank.filter(batter_stats_with_rank["rank"] <= 5)
# Collect the distinct team names to the driver.
teams = top_batters.select("team").distinct().collect()

# Print one table per team, without the helper "rank" column.
for team in teams:
    team_name = team[0]
    team_df = top_batters.filter(top_batters.team == team_name).drop("rank")
    print(f"Team: {team_name}")
    team_df.show(truncate=False)


Team: Kolkata Knight Riders
+---------------------+---------------+----------+-----------+
|team                 |batter         |total_runs|total_balls|
+---------------------+---------------+----------+-----------+
|Kolkata Knight Riders|BB McCullum    |158       |77         |
|Kolkata Knight Riders|RT Ponting     |20        |20         |
|Kolkata Knight Riders|DJ Hussey      |12        |12         |
|Kolkata Knight Riders|SC Ganguly     |10        |12         |
|Kolkata Knight Riders|Mohammad Hafeez|5         |3          |
+---------------------+---------------+----------+-----------+

Team: Royal Challengers Bangalore
+---------------------------+----------+----------+-----------+
|team                       |batter    |total_runs|total_balls|
+---------------------------+----------+----------+-----------+
|Royal Challengers Bangalore|P Kumar   |18        |17         |
|Royal Challengers Bangalore|AA Noffke |9         |12         |
|Royal Challengers Bangalore|JH Kallis |8         