Problem Statement:

Find the 2nd Wednesday of the current month and display it with the columns DayDate, Weekday, and RowNum. This notebook solves the problem twice: first with the PySpark DataFrame API, then with a single Spark SQL query. Both approaches generate all dates for the current month, keep only the Wednesdays, number them in date order, and select the 2nd one.

In [0]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, date_format, expr, row_number
from pyspark.sql.window import Window
from datetime import datetime, timedelta

# Initialize Spark session
spark = SparkSession.builder \
    .appName("Find 2nd Wednesday") \
    .getOrCreate()

# Define the current date and the first and last dates of the current month
now = datetime.now()
start_date = datetime(now.year, now.month, 1)
end_date = (start_date + timedelta(days=31)).replace(day=1) - timedelta(days=1)

# Generate a list of dates for the current month
date_list = [(start_date + timedelta(days=x)).strftime('%Y-%m-%d') for x in range((end_date - start_date).days + 1)]

# Create DataFrame from the list
date_df = spark.createDataFrame([(date,) for date in date_list], ["DayDate"])

# Add a column for the day of the week as a string
date_df = date_df.withColumn("Weekday", date_format(col("DayDate"), "EEEE"))
# Show the DataFrame to verify
date_df.display()

DayDate,Weekday
2024-08-01,Thursday
2024-08-02,Friday
2024-08-03,Saturday
2024-08-04,Sunday
2024-08-05,Monday
2024-08-06,Tuesday
2024-08-07,Wednesday
2024-08-08,Thursday
2024-08-09,Friday
2024-08-10,Saturday


In [0]:
# Add a column for the day-of-week number (Spark's dayofweek: 1 = Sunday, ..., 7 = Saturday)
date_df = date_df.withColumn("WeekDayNum", expr("dayofweek(to_date(DayDate, 'yyyy-MM-dd'))"))

# Filter for Wednesdays (WeekDayNum = 4, since dayofweek function returns 1 for Sunday)
wednesdays_df = date_df.filter(col("WeekDayNum") == 4)  # 4 represents Wednesday

# Rank Wednesdays and select the 2nd one
window_spec = Window.orderBy("DayDate")
wednesdays_ranked_df = wednesdays_df.withColumn("RowNum", row_number().over(window_spec))

# Select the 2nd Wednesday
second_wednesday_df = wednesdays_ranked_df.filter(col("RowNum") == 2)

# Show the result
second_wednesday_df.select("DayDate", "Weekday", "RowNum").display()

DayDate,Weekday,RowNum
2024-08-14,Wednesday,2
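
As a sanity check that does not depend on Spark, the same enumerate, filter, and pick steps can be sketched in plain Python. The helper name `nth_weekday_by_scan` is hypothetical and not part of the notebook; it assumes Python's `date.weekday()` numbering (Monday = 0, so Wednesday = 2), which differs from Spark's `dayofweek`.

```python
import calendar
from datetime import date, timedelta

def nth_weekday_by_scan(year, month, weekday, n):
    """Return the n-th occurrence of `weekday` (Monday = 0) in the given month."""
    days_in_month = calendar.monthrange(year, month)[1]
    first = date(year, month, 1)
    # Enumerate every date in the month, as the DataFrame version does
    all_days = [first + timedelta(days=i) for i in range(days_in_month)]
    # Keep only the requested weekday, already in ascending date order
    matches = [d for d in all_days if d.weekday() == weekday]
    if not 1 <= n <= len(matches):
        raise ValueError(f"month has only {len(matches)} such weekdays")
    # The list index plays the role of RowNum
    return matches[n - 1]

print(nth_weekday_by_scan(2024, 8, calendar.WEDNESDAY, 2))  # 2024-08-14
```

For August 2024 this agrees with the DataFrame result above.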


In [0]:
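# Register the DataFrame built above as a temporary view.
# Optional: the SQL cell below generates its own date series and does not read this view.
date_df.createOrReplaceTempView("DayDate")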

In [0]:
%sql
-- Build all dates for the current month with CTEs, then pick the 2nd Wednesday
WITH DateSeries AS (
    -- Generate all dates for the current month
    SELECT 
        DATE_ADD(DATE_TRUNC('month', CURRENT_DATE()), seq) AS DayDate
    FROM (
        SELECT posexplode(sequence(0, DAY(LAST_DAY(CURRENT_DATE())) - 1)) AS (seq, day)
    )
),
Wednesdays AS (
    -- Keep only Wednesdays and attach the weekday name
    SELECT 
        DayDate,
        CASE DAYOFWEEK(DayDate)
            WHEN 1 THEN 'Sunday'
            WHEN 2 THEN 'Monday'
            WHEN 3 THEN 'Tuesday'
            WHEN 4 THEN 'Wednesday'
            WHEN 5 THEN 'Thursday'
            WHEN 6 THEN 'Friday'
            WHEN 7 THEN 'Saturday'
        END AS Weekday
    FROM DateSeries
    WHERE DAYOFWEEK(DayDate) = 4 -- 4 represents Wednesday (where Sunday = 1)
),
RankedWednesdays AS (
    -- Rank the Wednesdays
    SELECT 
        DayDate,
        Weekday,
        ROW_NUMBER() OVER (ORDER BY DayDate) AS RowNum
    FROM Wednesdays
)
-- Select the 2nd Wednesday
SELECT 
    DayDate,
    Weekday,
    RowNum
FROM RankedWednesdays
WHERE RowNum = 2;


DayDate,Weekday,RowNum
2024-08-14,Wednesday,2


Explanation:

Generate Dates:

DATE_TRUNC('month', CURRENT_DATE()): Gets the first day of the current month.
sequence(0, DAY(LAST_DAY(CURRENT_DATE())) - 1): Creates the integer offsets 0 through N - 1, where N is the number of days in the current month.
DATE_ADD: Adds each offset to the first day of the month to generate every date in the current month.

Identify Wednesdays:

DAYOFWEEK: Determines the weekday number. This function returns 1 for Sunday, 2 for Monday, ..., 7 for Saturday. Thus, 4 represents Wednesday.
CASE: Converts the weekday number to the day name.
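
The numbering is easy to get wrong because Python and Spark disagree on it. The sketch below reproduces the 1 = Sunday convention in plain Python for comparison; `spark_style_dayofweek` and `SPARK_DAY_NAMES` are illustrative names introduced here, not Spark APIs.

```python
from datetime import date

# Same mapping as the CASE expression in the query (1 = Sunday, ..., 7 = Saturday)
SPARK_DAY_NAMES = {
    1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
    5: "Thursday", 6: "Friday", 7: "Saturday",
}

def spark_style_dayofweek(d):
    """Map a date to the 1 = Sunday ... 7 = Saturday numbering."""
    # date.isoweekday() uses Monday = 1 ... Sunday = 7, so shift by one and wrap
    return d.isoweekday() % 7 + 1

d = date(2024, 8, 14)
num = spark_style_dayofweek(d)
print(num, SPARK_DAY_NAMES[num])  # 4 Wednesday
```

Under this numbering, the filter value 4 and the CASE branch `WHEN 4 THEN 'Wednesday'` refer to the same day.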
Rank and Select:

ROW_NUMBER(): Ranks the Wednesdays in ascending order of date.
WHERE RowNum = 2: Filters to select the 2nd Wednesday.
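
Ranking is not the only way to reach the answer. As an alternative technique, not the one the query uses, the n-th weekday can be computed arithmetically from the weekday of the 1st of the month. The function name `nth_weekday_closed_form` is hypothetical, and it assumes Python's Monday = 0 numbering.

```python
from datetime import date, timedelta

def nth_weekday_closed_form(year, month, weekday, n):
    """Return the n-th `weekday` (Monday = 0) of a month without enumerating dates."""
    first = date(year, month, 1)
    # Days from the 1st to the first occurrence of the target weekday (0..6)
    to_first_match = (weekday - first.weekday()) % 7
    # Each later occurrence is exactly one week after the previous one
    result = first + timedelta(days=to_first_match + 7 * (n - 1))
    if result.month != month:
        raise ValueError("the month has no such occurrence")
    return result

# 1 August 2024 is a Thursday (weekday 3), so the offset is (2 - 3) % 7 + 7 = 13 days
print(nth_weekday_closed_form(2024, 8, 2, 2))  # 2024-08-14
```

The ranking approach in the query is easier to read and extends naturally to listing every Wednesday; the arithmetic form avoids generating the full month.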
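Notes:

Day of the Week: Spark's DAYOFWEEK always numbers Sunday as 1 and Saturday as 7, so the filter value 4 and the CASE expression agree. If you switch to a function with a different numbering, such as WEEKDAY (0 = Monday, ..., 6 = Sunday), adjust the filter and the CASE expression together.
Temporary View: The SQL query builds its dates with CTEs and does not read the temporary view registered from the DataFrame, so it can run on its own in a notebook or interactive SQL environment.

This query will return the 2nd Wednesday of the current month, showing the DayDate, the day of the week as Weekday, and the RowNum.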