**Problem:**  
You have a DataFrame `df` with the following data:

| user_id | last_login          | subscription_end_date |
|---------|---------------------|-----------------------|
| 1       | 2024-08-10 10:00:00 | 2024-08-20            |
| 2       | NaT                 | 2024-08-15            |
| 3       | 2024-08-12 12:00:00 | NaT                   |

**Task:** Calculate the number of days between `last_login` and `subscription_end_date` for each user. Fill in missing values in both columns with appropriate defaults (e.g., a difference of 0 days when a date is missing).
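
`last_login` includes a time of day, but `subscription_end_date` is a bare date (midnight). That means the gap for user 1 is not a whole number of days. Here is a quick check of how pandas reports such a gap, using user 1's values from the table:

```python
import pandas as pd

# User 1: subscription ends at midnight on 2024-08-20, last login at 10:00 on 2024-08-10.
delta = pd.Timestamp('2024-08-20') - pd.Timestamp('2024-08-10 10:00:00')

print(delta)                           # 9 days 14:00:00
print(delta.days)                      # 9 (whole days only; the 14 hours are dropped)
print(delta.total_seconds() / 86400)   # fractional days, about 9.58

# .days floors toward negative infinity, so a gap of minus one hour counts as -1 day.
print(pd.Timedelta(hours=-1).days)     # -1
```

`.dt.days` on a Series of timedeltas applies the same rule element-wise, so it reports 9 for user 1. If fractional days matter, `.dt.total_seconds() / 86400` is one alternative.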


```python
import pandas as pd

data = {
    'user_id': [1, 2, 3],
    'last_login': ['2024-08-10 10:00:00', pd.NaT, '2024-08-12 12:00:00'],
    'subscription_end_date': ['2024-08-20', '2024-08-15', pd.NaT]
}
df = pd.DataFrame(data)

# Parse both columns as datetimes; missing entries become NaT.
df['last_login'] = pd.to_datetime(df['last_login'])
df['subscription_end_date'] = pd.to_datetime(df['subscription_end_date'])

# Fill each missing date from the other column, so a row with only one
# known date gets a difference of 0 days.
df['last_login'] = df['last_login'].fillna(df['subscription_end_date'])
df['subscription_end_date'] = df['subscription_end_date'].fillna(df['last_login'])

# .dt.days returns the whole-day component of each timedelta.
df['days_difference'] = (df['subscription_end_date'] - df['last_login']).dt.days

# Rows where both dates were missing still yield NaN here; default them to 0.
df['days_difference'] = df['days_difference'].fillna(0).astype(int)

print(df)
```
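
Filling each date from the other column overwrites the original values, so afterwards you can no longer tell which rows were incomplete. The following is a minimal alternative sketch that assumes the original dates should stay untouched. It computes the difference on the raw columns, where any NaT operand yields NaT, and then defaults only the result to 0. The `had_missing_date` column is a hypothetical addition and is not part of the task:

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 2, 3],
    'last_login': pd.to_datetime(['2024-08-10 10:00:00', None, '2024-08-12 12:00:00']),
    'subscription_end_date': pd.to_datetime(['2024-08-20', '2024-08-15', None]),
})

# Hypothetical flag: record which rows had at least one missing date.
df['had_missing_date'] = df[['last_login', 'subscription_end_date']].isna().any(axis=1)

# A NaT operand makes the difference NaT; .dt.days turns that into NaN,
# which is then defaulted to 0 days. The date columns themselves are not modified.
df['days_difference'] = (
    (df['subscription_end_date'] - df['last_login']).dt.days.fillna(0).astype(int)
)

print(df)
```

Both approaches give the same `days_difference` values on this data (9, 0, 0). The choice depends on whether downstream code needs the original gaps preserved.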

**Problem:**  
You have a PySpark DataFrame `df` with the following columns:

| student_id | course  | grade |
|------------|---------|-------|
| 1          | Math    | A     |
| 2          | Science | B     |
| 1          | Math    | A     |
| 3          | History | C     |
| 2          | Science | B     |

**Task:** Write a PySpark query to count how many distinct students achieved each grade, and display a custom label for each grade (e.g., "Excellent" for A, "Good" for B, etc.).
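
The five rows include two exact duplicates: student 1's Math A and student 2's Science B. As a result, each grade ends up with a single distinct student. This plain-Python cross-check of the expected counts does not use Spark:

```python
from collections import defaultdict

rows = [
    (1, "Math", "A"),
    (2, "Science", "B"),
    (1, "Math", "A"),
    (3, "History", "C"),
    (2, "Science", "B"),
]

# Collect the set of student IDs per grade; the set discards duplicate rows.
students_by_grade = defaultdict(set)
for student_id, _course, grade in rows:
    students_by_grade[grade].add(student_id)

counts = {grade: len(ids) for grade, ids in sorted(students_by_grade.items())}
print(counts)  # {'A': 1, 'B': 1, 'C': 1}
```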


```python
from pyspark.sql import SparkSession
# Import the functions module under an alias instead of `import *`,
# which would shadow Python built-ins such as sum, min, and max.
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("Day_3").getOrCreate()

data = [
    (1, "Math", "A"),
    (2, "Science", "B"),
    (1, "Math", "A"),
    (3, "History", "C"),
    (2, "Science", "B")
]

columns = ["student_id", "course", "grade"]

df = spark.createDataFrame(data, columns)

# countDistinct already ignores repeated student IDs within a grade,
# so a separate dropDuplicates step is not needed.
df_counts = df.groupBy('grade').agg(F.countDistinct('student_id').alias('num_students'))

# Map each letter grade to a descriptive label.
df_labeled = df_counts.withColumn(
    'grade_label',
    F.when(F.col('grade') == 'A', 'Excellent')
    .when(F.col('grade') == 'B', 'Good')
    .when(F.col('grade') == 'C', 'Average')
    .when(F.col('grade') == 'D', 'Below Average')
    .when(F.col('grade') == 'E', 'Poor')
    .when(F.col('grade') == 'F', 'Fail')
    .otherwise('Unknown')
)

# Keep the grade next to its label and sort so the display order is stable.
result_df = df_labeled.select('grade', 'grade_label', 'num_students').orderBy('grade')

result_df.show()
```

