### What is agg() in PySpark?
agg() is a method on PySpark DataFrames and grouped data that applies one or more aggregate functions (such as sum, avg, min, max, and count) to columns and returns the results as a new DataFrame.

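It is most often called after groupBy() to compute aggregates per group. Called directly on a DataFrame, it aggregates over all rows and returns a single-row result (df.agg(...) is shorthand for df.groupBy().agg(...)).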

In [0]:
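# Note: importing sum, max, and min this way shadows Python's built-ins of the same name in this notebook
from pyspark.sql.functions import sum, avg, max, min, count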

# Sample data
data = [
    ("A", 100, 10),
    ("A", 200, 20),
    ("B", 300, 30),
    ("B", 400, 40),
    ("C", 500, 50)
]

# Create DataFrame
df = spark.createDataFrame(data, ["Category", "Sales", "Quantity"])

# display() is a Databricks notebook helper; use df.show() in plain PySpark
df.display()

In [0]:
# No groupBy(): aggregate across all rows, producing a single-row DataFrame
df.agg(
    sum("Sales").alias("Total_Sales"),
    avg("Quantity").alias("Average_Quantity")
).display()


In [0]:
# With groupBy(): one output row per distinct Category, several aggregates in one call
df.groupBy("Category").agg(
    sum("Sales").alias("Total_Sales"),
    avg("Quantity").alias("Average_Quantity"),
    max("Sales").alias("Max_Sales"),
    min("Sales").alias("Min_Sales")
).display()


In [0]:
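# Dictionary syntax: {column name: aggregate function name}
# Result columns are auto-named, e.g. sum(Sales) and avg(Quantity)
df.groupBy("Category").agg(
    {"Sales": "sum", "Quantity": "avg"}
).display()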


### ✅ Key Points:

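- agg() works with or without groupBy(): grouped, it returns one row per group; ungrouped, it returns a single row for the whole DataFrame.
- Several aggregate functions can be applied in a single call.
- Use alias() to give result columns readable names.
- It accepts either column expressions (such as sum("Sales")) or a dictionary mapping column names to aggregate function names. The dictionary form cannot alias its results and allows only one aggregation per column.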