Problem Statement:

You are a consultant for a major pizza chain that is running a promotion in which all 3-topping pizzas are sold for a fixed price, and you are trying to understand the costs involved.

Given a list of pizza toppings, consider all possible 3-topping pizzas and print the total cost of each combination. Sort the results by total cost, highest first, and then by pizza toppings in ascending order.

Break ties by listing the ingredients in alphabetical order, starting from the first ingredient, followed by the second and third.
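The ordering rule can be sketched in plain Python as a composite sort key: total cost descending, then the topping names compared position by position. The rows below, including the "Bacon" and "Olives" toppings and the tied total, are hypothetical and exist only to show the tie-break; they are not part of the example dataset.

```python
# Hypothetical rows: (toppings in alphabetical order, total cost).
# The first two rows are tied on cost to demonstrate the tie-break.
rows = [
    (("Chicken", "Pepperoni", "Sausage"), 1.75),
    (("Bacon", "Olives", "Sausage"), 1.75),
    (("Chicken", "Extra Cheese", "Pepperoni"), 1.45),
]

# Negating the cost sorts it descending; the tuple of names is then
# compared element by element (first topping, then second, then third).
ordered = sorted(rows, key=lambda row: (-row[1], row[0]))

for toppings, total in ordered:
    print(",".join(toppings), total)
```

Comparing the names as a tuple, rather than as one concatenated string, keeps the comparison strictly per ingredient.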

In [0]:
# `spark` is assumed to be the SparkSession the notebook provides

# Define the pizza toppings and their costs
data = [
    ("Pepperoni", 0.50),
    ("Sausage", 0.70),
    ("Chicken", 0.55),
    ("Extra Cheese", 0.40),
]

# Create a DataFrame from the input data
toppings_df = spark.createDataFrame(data, ["topping_name", "ingredient_cost"])
toppings_df.display()

topping_name,ingredient_cost
Pepperoni,0.5
Sausage,0.7
Chicken,0.55
Extra Cheese,0.4


In [0]:
# Explicit imports; note that this `round` shadows Python's built-in round
from pyspark.sql.functions import col, concat, lit, round

# Create combinations of three toppings
combinations = (
    toppings_df.alias("p1")
    .join(toppings_df.alias("p2"), col("p1.topping_name") < col("p2.topping_name"))
    .join(toppings_df.alias("p3"), col("p2.topping_name") < col("p3.topping_name"))
    .select(
        concat(
            col("p1.topping_name"),
            lit(","),
            col("p2.topping_name"),
            lit(","),
            col("p3.topping_name"),
        ).alias("pizza"),
        (
            col("p1.ingredient_cost")
            + col("p2.ingredient_cost")
            + col("p3.ingredient_cost")
        ).alias("total_cost"),
    )
)

# Round the total cost to two decimal places
result = combinations.withColumn("total_cost", round(col("total_cost"), 2))
# Order by total cost descending, breaking ties alphabetically by pizza
result.orderBy(col("total_cost").desc(), col("pizza").asc()).display()

pizza,total_cost
"Chicken,Pepperoni,Sausage",1.75
"Chicken,Extra Cheese,Sausage",1.65
"Extra Cheese,Pepperoni,Sausage",1.6
"Chicken,Extra Cheese,Pepperoni",1.45


In [0]:
toppings_df.createOrReplaceTempView("pizza_toppings")

In [0]:
%sql
WITH Toppings AS (
  SELECT
    topping_name,
    ingredient_cost
  FROM
    pizza_toppings
)
SELECT
  CONCAT(
    p1.topping_name,
    ',',
    p2.topping_name,
    ',',
    p3.topping_name
  ) AS pizza,
  ROUND(
    (
      p1.ingredient_cost + p2.ingredient_cost + p3.ingredient_cost
    ),
    2
  ) AS total_cost
FROM
  Toppings AS p1
  INNER JOIN Toppings AS p2 ON p1.topping_name < p2.topping_name
  INNER JOIN Toppings AS p3 ON p2.topping_name < p3.topping_name
ORDER BY
  total_cost DESC,
  pizza ASC;

pizza,total_cost
"Chicken,Pepperoni,Sausage",1.75
"Chicken,Extra Cheese,Sausage",1.65
"Extra Cheese,Pepperoni,Sausage",1.6
"Chicken,Extra Cheese,Pepperoni",1.45


Explanation:

With four toppings available, there are four possible 3-topping combinations. For example, the pizza with Chicken, Pepperoni and Sausage costs $0.55 + $0.50 + $0.70 = $1.75.
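As a quick cross-check of the count and the arithmetic, the same result can be reproduced without Spark. This is a minimal sketch using `itertools.combinations` on the example data; the variable names `costs` and `pizzas` are illustrative and do not appear in the notebook.

```python
from itertools import combinations

# Example toppings and costs from the input table
costs = {
    "Pepperoni": 0.50,
    "Sausage": 0.70,
    "Chicken": 0.55,
    "Extra Cheese": 0.40,
}

# Sorting the names first means every combination comes out alphabetical
pizzas = [
    (",".join(trio), round(sum(costs[t] for t in trio), 2))
    for trio in combinations(sorted(costs), 3)
]

# Highest total cost first, then pizza name ascending
pizzas.sort(key=lambda p: (-p[1], p[0]))

for name, total in pizzas:
    print(name, total)
```

Choosing 3 toppings out of 4 gives 4 combinations, which matches the four rows in the output above.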

Within each pizza, the toppings are listed alphabetically: Chicken comes before Pepperoni, and Pepperoni comes before Sausage.
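The alphabetical order inside each pizza follows from the strict `<` join conditions, which also rule out repeated toppings and reordered duplicates. The sketch below mirrors the two self-joins with nested loops in plain Python; it illustrates the idea and is not the Spark execution itself.

```python
names = ["Pepperoni", "Sausage", "Chicken", "Extra Cheese"]

# Mirror of the self-joins: keep (a, b, c) only when a < b and b < c.
# A topping can never pair with itself, and each set of three toppings
# survives in exactly one order: the alphabetical one.
trios = [
    (a, b, c)
    for a in names
    for b in names
    for c in names
    if a < b and b < c
]

for trio in trios:
    print(",".join(trio))
```

Without the `<` conditions, the same three loops would produce 4 × 4 × 4 = 64 rows, including repeated toppings and every permutation of each pizza.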

The dataset you are querying against may have different input & output - this is just an example!