**Problem:**  
You have a DataFrame `df` with the following data:

| product_id | category | price  |
|------------|----------|--------|
| 1          | A        | 100.0  |
| 2          | B        | NaN    |
| 3          | A        | 150.0  |
| 4          | C        | 200.0  |
| 5          | B        | NaN    |

**Task:** Fill the missing prices in the `price` column with the average price of their respective categories. If the category is not `A` or `B`, set the price to 0.


In [9]:
import pandas as pd
import numpy as np

products = {
    'product_id': [1, 2, 3, 4, 5],
    'category': ['A', 'B', 'A', 'C', 'B'],
    'price': [100.0, np.nan, 150.0, 200.0, np.nan]
}

df = pd.DataFrame(products)

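# Per-row category mean, aligned with df's index (NaN when a category has no known price)
means_category = df.groupby('category')['price'].transform('mean')

# transform() already returns one value per row, so it can be passed to fillna directly;
# mapping category labels onto this row-indexed Series would match nothing
df['price'] = df['price'].fillna(means_category)

# Prices that are still missing (category B has no known prices) fall back to 0
df['price'] = df['price'].fillna(0)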

print(df)

   product_id category  price
0           1        A  100.0
1           2        B    0.0
2           3        A  150.0
3           4        C  200.0
4           5        B    0.0
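
A category-level lookup is another way to express the same fill. The sketch below assumes that `groupby('category')['price'].mean()` yields one mean per category, indexed by category label, so `map` can translate each row's category into its mean. The helper name `fill_with_category_mean`, its `fallback` parameter, and the extra sixth row (a missing price in category `A`) are assumptions added for illustration and are not part of the original problem.

```python
import numpy as np
import pandas as pd


def fill_with_category_mean(frame, fallback=0.0):
    """Return a copy with missing prices filled by category mean, then by `fallback`."""
    out = frame.copy()
    # One mean per category, indexed by category label (NaN if no price is known)
    category_means = out.groupby('category')['price'].mean()
    # Translate each row's category into its mean; only missing prices are replaced
    out['price'] = out['price'].fillna(out['category'].map(category_means))
    # Categories without any known price are still NaN here, so use the fallback
    out['price'] = out['price'].fillna(fallback)
    return out


# Hypothetical data: the original five rows plus one missing price in category A
df = pd.DataFrame({
    'product_id': [1, 2, 3, 4, 5, 6],
    'category': ['A', 'B', 'A', 'C', 'B', 'A'],
    'price': [100.0, np.nan, 150.0, 200.0, np.nan, np.nan],
})

result = fill_with_category_mean(df)
print(result)
```

In this sketch the added category `A` row receives 125.0, the mean of 100.0 and 150.0, while both category `B` rows fall back to 0 because no `B` price is known.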


**PySpark Problem:**

**Problem:**  
You have a PySpark DataFrame `df` with the following data:

| transaction_id | transaction_date | amount |
|----------------|------------------|--------|
| 1              | 2024-08-01       | 500    |
| 2              | 2024-08-02       | 300    |
| 3              | 2024-08-01       | 200    |
| 4              | 2024-08-03       | 700    |

**Task:** Calculate the total transaction amount for each day, and then find the day with the highest total transaction amount.


In [None]:
from pyspark.sql import SparkSession
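# Explicit imports; note that max and sum here shadow Python's built-ins
from pyspark.sql.functions import col, max, sum
# DateType lives in pyspark.sql.types, not pyspark.sql.functions
from pyspark.sql.types import DateType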

spark = SparkSession.builder.appName("Day 2").getOrCreate()

transactions = [
    (1, '2024-08-01', 500),
    (2, '2024-08-02', 300),
    (3, '2024-08-01', 200),
    (4, '2024-08-03', 700)
]

columns = ["transaction_id", "transaction_date", "amount"]

df = spark.createDataFrame(transactions, columns)

df = df.withColumn("transaction_date", col("transaction_date").cast(DateType()))

daily_totals = df.groupBy("transaction_date").agg(sum("amount").alias("total_amount"))

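# Highest daily total across all days
max_total_amount = daily_totals.agg(max("total_amount")).collect()[0][0]

# Keep every day whose total equals that maximum; ties are possible
# (here 2024-08-01 and 2024-08-03 both total 700)
top_days = daily_totals.filter(col("total_amount") == max_total_amount)

top_days.show()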
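
For a dataset this small, the daily totals can be cross-checked in pandas. This is only a sketch alongside the PySpark solution, not a replacement for it; the name `pdf` is an assumption introduced here. It keeps every day that ties for the highest total instead of returning a single row.

```python
import pandas as pd

# Same sample transactions as in the problem statement
pdf = pd.DataFrame({
    'transaction_id': [1, 2, 3, 4],
    'transaction_date': pd.to_datetime(
        ['2024-08-01', '2024-08-02', '2024-08-01', '2024-08-03']
    ),
    'amount': [500, 300, 200, 700],
})

# Total amount per day (groupby sorts the dates in ascending order)
daily_totals = pdf.groupby('transaction_date')['amount'].sum()

# Keep every day whose total equals the maximum, so ties are not silently dropped
top_days = daily_totals[daily_totals == daily_totals.max()]
print(top_days)
```

With the sample data, 2024-08-01 (500 + 200) and 2024-08-03 both total 700, so two days are returned.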