Problem Statement:
    
You are tasked with analyzing credit card issuance data to find the range of issued amounts for each card, that is, the difference between the maximum and minimum issuance amounts (issued_amount) for each credit card (card_name). Group the data by card name and compute the range twice: once with the PySpark DataFrame API and once with a Spark SQL query.

In [0]:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Define schema
schema = StructType([
    StructField("card_name", StringType(), nullable=False),
    StructField("issued_amount", IntegerType(), nullable=False),
    StructField("issue_month", IntegerType(), nullable=False),
    StructField("issue_year", IntegerType(), nullable=False)
])

# Create data
data = [
    ("Chase Freedom Flex", 55000, 1, 2021),
    ("Chase Freedom Flex", 60000, 2, 2021),
    ("Chase Freedom Flex", 65000, 3, 2021),
    ("Chase Freedom Flex", 70000, 4, 2021),
    ("Chase Sapphire Reserve", 170000, 1, 2021),
    ("Chase Sapphire Reserve", 175000, 2, 2021),
    ("Chase Sapphire Reserve", 180000, 3, 2021),
    ("Chase Sapphire Reserve", 185000, 4, 2021)
]

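# Create DataFrame (assumes an active SparkSession named `spark`, as notebook environments provide)
df = spark.createDataFrame(data, schema=schema)

# Display the DataFrame (display() is a notebook helper, e.g. on Databricks; df.show() is the portable alternative)
df.display()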


card_name,issued_amount,issue_month,issue_year
Chase Freedom Flex,55000,1,2021
Chase Freedom Flex,60000,2,2021
Chase Freedom Flex,65000,3,2021
Chase Freedom Flex,70000,4,2021
Chase Sapphire Reserve,170000,1,2021
Chase Sapphire Reserve,175000,2,2021
Chase Sapphire Reserve,180000,3,2021
Chase Sapphire Reserve,185000,4,2021


In [0]:
from pyspark.sql import functions as F

# Calculate max and min issued_amount for each card_name
cte1 = df.groupBy("card_name").agg(
    F.max("issued_amount").alias("max"),
    F.min("issued_amount").alias("min"),
)

# Calculate the difference (range) between the max and min per card
result = cte1.withColumn("difference", cte1["max"] - cte1["min"])

# Show the result
result.select("card_name", "difference").display()


card_name,difference
Chase Freedom Flex,15000
Chase Sapphire Reserve,15000


In [0]:
# Register DataFrame as a temporary view
df.createOrReplaceTempView("credit_card_issuance")

In [0]:

# Compute the same result with a Spark SQL query against the temporary view
result_sql = spark.sql("""
    WITH cte1 AS (
        SELECT card_name, MAX(issued_amount) AS max, MIN(issued_amount) AS min
        FROM credit_card_issuance
        GROUP BY card_name
    )
    SELECT card_name, (max - min) AS difference
    FROM cte1
""")

# Show the result
result_sql.display()


card_name,difference
Chase Freedom Flex,15000
Chase Sapphire Reserve,15000


Explanation:

CTE (cte1):

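Groups the rows by card_name and computes the maximum and minimum issued_amount within each group.

Final Query:

Subtracts min from max for each card_name and returns the result as difference. Both cards have a range of 15000 (70000 - 55000 for Chase Freedom Flex, 185000 - 170000 for Chase Sapphire Reserve).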
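
As a sanity check on the two steps described above, the same grouping and subtraction can be reproduced in plain Python without a Spark session. This is only a minimal sketch for verifying the expected output on the sample rows; the helper name issued_amount_range is hypothetical and not part of the notebook.

```python
from collections import defaultdict

# Same sample rows as the DataFrame: (card_name, issued_amount, issue_month, issue_year)
rows = [
    ("Chase Freedom Flex", 55000, 1, 2021),
    ("Chase Freedom Flex", 60000, 2, 2021),
    ("Chase Freedom Flex", 65000, 3, 2021),
    ("Chase Freedom Flex", 70000, 4, 2021),
    ("Chase Sapphire Reserve", 170000, 1, 2021),
    ("Chase Sapphire Reserve", 175000, 2, 2021),
    ("Chase Sapphire Reserve", 180000, 3, 2021),
    ("Chase Sapphire Reserve", 185000, 4, 2021),
]


def issued_amount_range(rows):
    """Hypothetical helper: return {card_name: max(issued_amount) - min(issued_amount)}."""
    # Step 1 (the GROUP BY): collect the issued amounts for each card
    amounts_by_card = defaultdict(list)
    for card_name, issued_amount, _month, _year in rows:
        amounts_by_card[card_name].append(issued_amount)
    # Step 2 (the final SELECT): subtract the minimum from the maximum per card
    return {card: max(amounts) - min(amounts) for card, amounts in amounts_by_card.items()}


ranges = issued_amount_range(rows)
for card, difference in ranges.items():
    print(f"{card},{difference}")
```

On the sample data this prints a difference of 15000 for both cards, matching the DataFrame and SQL results.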