## 2084 - Drop Type 1 Orders for Customers With Type 0 Orders

### Table: Orders

| Column Name | Type |
|-------------|------|
| order_id    | int  |
| customer_id | int  |
| order_type  | int  |

order_id is the primary key column for this table.  
Each row of this table indicates the ID of an order, the ID of the customer who ordered it, and the order type.  
Each order is of type 0 or type 1.

Write an SQL query to report all the orders based on the following criteria:  
If a customer has at least one order of type 0, do not report any order of type 1 from that customer.  
Otherwise, report all the orders of the customer.

Return the result table in any order.

#### Example 1:

**Input:**  
Orders table:

| order_id | customer_id | order_type |
|----------|-------------|------------|
| 1        | 1           | 0          |
| 2        | 1           | 0          |
| 11       | 2           | 0          |
| 12       | 2           | 1          |
| 21       | 3           | 1          |
| 22       | 3           | 0          |
| 31       | 4           | 1          |
| 32       | 4           | 1          |

**Output:**

| order_id | customer_id | order_type |
|----------|-------------|------------|
| 31       | 4           | 1          |
| 32       | 4           | 1          |
| 1        | 1           | 0          |
| 2        | 1           | 0          |
| 11       | 2           | 0          |
| 22       | 3           | 0          |

**Explanation:**  
Customer 1 has two orders of type 0. We return both of them.  
Customer 2 has one order of type 0 and one order of type 1. We only return the order of type 0.  
Customer 3 has one order of type 0 and one order of type 1. We only return the order of type 0.  
Customer 4 has two orders of type 1. We return both of them.
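
The rule above can be checked without Spark. The following is a minimal plain-Python sketch, not part of the original notebook. The helper name `filter_orders` and the tuple layout `(order_id, customer_id, order_type)` are assumptions made for illustration.

```python
# Sample rows from Example 1: (order_id, customer_id, order_type)
orders = [
    (1, 1, 0), (2, 1, 0),
    (11, 2, 0), (12, 2, 1),
    (21, 3, 1), (22, 3, 0),
    (31, 4, 1), (32, 4, 1),
]


def filter_orders(rows):
    """Drop type 1 orders for customers who have at least one type 0 order."""
    # Customers with at least one type 0 order
    has_type0 = {customer_id for _, customer_id, order_type in rows if order_type == 0}
    # Keep every type 0 order, plus type 1 orders of customers outside that set
    return [row for row in rows if row[2] == 0 or row[1] not in has_type0]


kept = filter_orders(orders)
print(sorted(kept))
```

On the sample data this keeps orders 1, 2, 11, 22, 31 and 32, matching the expected output table up to row order.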

In [0]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType
from pyspark.sql.functions import col

# Start Spark session
spark = SparkSession.builder.appName("FilterOrders").getOrCreate()

# Define schema
schema = StructType([
    StructField("order_id", IntegerType(), True),
    StructField("customer_id", IntegerType(), True),
    StructField("order_type", IntegerType(), True)
])

# Sample data
data = [
    (1, 1, 0),
    (2, 1, 0),
    (11, 2, 0),
    (12, 2, 1),
    (21, 3, 1),
    (22, 3, 0),
    (31, 4, 1),
    (32, 4, 1)
]

# Create DataFrame
df = spark.createDataFrame(data, schema)
df.createOrReplaceTempView("Orders")


In [0]:
%sql
-- Part 1: all orders of customers who never ordered type 0
SELECT order_id, customer_id, order_type
FROM Orders
WHERE customer_id NOT IN (SELECT DISTINCT customer_id FROM Orders WHERE order_type = 0)
UNION
-- Part 2: all orders of customers who never ordered type 1
SELECT order_id, customer_id, order_type
FROM Orders
WHERE customer_id NOT IN (SELECT DISTINCT customer_id FROM Orders WHERE order_type = 1)
UNION
-- Part 3: type 0 orders of customers who ordered both types
SELECT order_id, customer_id, order_type
FROM Orders
WHERE order_type = 0
  AND customer_id IN (SELECT DISTINCT customer_id FROM Orders WHERE order_type = 1)

In [0]:

# Same three-part UNION as the %sql cell, run through spark.sql
query = """
-- Part 1: all orders of customers who never ordered type 0
SELECT order_id, customer_id, order_type
FROM Orders
WHERE customer_id NOT IN (SELECT DISTINCT customer_id FROM Orders WHERE order_type = 0)
UNION
-- Part 2: all orders of customers who never ordered type 1
SELECT order_id, customer_id, order_type
FROM Orders
WHERE customer_id NOT IN (SELECT DISTINCT customer_id FROM Orders WHERE order_type = 1)
UNION
-- Part 3: type 0 orders of customers who ordered both types
SELECT order_id, customer_id, order_type
FROM Orders
WHERE order_type = 0
  AND customer_id IN (SELECT DISTINCT customer_id FROM Orders WHERE order_type = 1)
"""

# Execute and display
result = spark.sql(query)
display(result)

In [0]:
from pyspark.sql.functions import col

# Customers with at least one type 0 order
type0_customers = df.filter(col("order_type") == 0).select("customer_id").distinct()

# Customers with at least one type 1 order
type1_customers = df.filter(col("order_type") == 1).select("customer_id").distinct()

# Part 1: all orders of customers who never ordered type 0
part1 = df.join(type0_customers, "customer_id", "left_anti")

# Part 2: all orders of customers who never ordered type 1
part2 = df.join(type1_customers, "customer_id", "left_anti")

# Part 3: type 0 orders of customers who ordered both types
part3 = df.filter(col("order_type") == 0).join(type1_customers, "customer_id", "inner")

# Combine the parts. DataFrame.union keeps duplicates (UNION ALL semantics), which is
# safe here because the three parts are disjoint. Joining on "customer_id" moves that
# column first, so restore the expected column order before displaying.
result = part1.union(part2).union(part3).select("order_id", "customer_id", "order_type")
display(result)
