# Join Strategies & Performance Tuning with PySpark (DataFrame-only, Serverless-friendly)

**Datasets:**
- `samples.tpch.customer`
- `samples.tpch.orders`
- `samples.tpch.lineitem`

In this notebook you will:
1. Perform star-schema joins
2. Inspect physical plans (`explain`)
3. Use broadcast joins
4. Use caching & reuse
5. Use `repartition` / `coalesce` with **DataFrame-only partition introspection**
6. Enable Adaptive Query Execution (AQE)


In [0]:
from pyspark.sql import functions as F

customer_df = spark.read.table("samples.tpch.customer")
orders_df   = spark.read.table("samples.tpch.orders")
lineitem_df = spark.read.table("samples.tpch.lineitem")

print("Customer count:", customer_df.count())
print("Orders count:", orders_df.count())
print("Lineitem count:", lineitem_df.count())

display(customer_df.limit(5))


Customer count: 750000
Orders count: 7500000
Lineitem count: 29999795


c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment
412445,Customer#000412445,"0QAB3OjYnbP6mA0B,kgf",21,31-421-403-4333,5358.33,BUILDING,arefully blithely regular epi
412446,Customer#000412446,"5u8MSbyiC7J,7PuY4Ivaq1JRbTCMKeNVqg",20,30-487-949-7942,9441.59,MACHINERY,"sleep according to the fluffily even forges. fluffily careful packages after the ironic, silent deposi"
412447,Customer#000412447,HC4ZT62gKPgrjr ceoaZgFOunlUogr7GO,7,17-797-466-6308,7868.75,AUTOMOBILE,aggle blithely among the carefully express excus
412448,Customer#000412448,hJok1MMrDgH,6,16-541-510-4964,6060.98,MACHINERY,ly silent requests boost slyly. express courts sleep according to the fluf
412449,Customer#000412449,"zAt1nZNG01gOhIqgyDtDa S,Y0VSofZJs1dd",14,24-710-983-5536,4973.84,HOUSEHOLD,"refully final theodolites. final, slow excuses sleep quickly! quickly ironic idea"


In [0]:
lineitem_df.show()

+----------+---------+---------+------------+----------+---------------+----------+-----+------------+------------+----------+------------+-------------+-----------------+----------+--------------------+
|l_orderkey|l_partkey|l_suppkey|l_linenumber|l_quantity|l_extendedprice|l_discount|l_tax|l_returnflag|l_linestatus|l_shipdate|l_commitdate|l_receiptdate|   l_shipinstruct|l_shipmode|           l_comment|
+----------+---------+---------+------------+----------+---------------+----------+-----+------------+------------+----------+------------+-------------+-----------------+----------+--------------------+
|  16933317|   275823|    13339|           4|     13.00|       23384.53|      0.06| 0.02|           N|           O|1995-07-19|  1995-07-24|   1995-07-22|DELIVER IN PERSON|      MAIL|longside of the b...|
|  16933317|   757037|     7038|           5|     38.00|       41572.00|      0.02| 0.00|           N|           F|1995-06-06|  1995-08-07|   1995-06-25|      COLLECT COD|      SHIP|th

In [0]:
orders_df.show()

+----------+---------+-------------+------------+-----------+---------------+---------------+--------------+--------------------+
|o_orderkey|o_custkey|o_orderstatus|o_totalprice|o_orderdate|o_orderpriority|        o_clerk|o_shippriority|           o_comment|
+----------+---------+-------------+------------+-----------+---------------+---------------+--------------+--------------------+
|   5611649|   687736|            O|    51905.72| 1996-06-04|          5-LOW|Clerk#000002954|             0|ls! bold, regular...|
|   5611650|   513292|            O|    62845.59| 1997-02-28|          5-LOW|Clerk#000002092|             0|to boost ironical...|
|   5611651|   395308|            F|   226256.25| 1992-03-05|       1-URGENT|Clerk#000004123|             0|its. regular foxe...|
|   5611652|   423847|            O|   141103.54| 1995-08-07|       3-MEDIUM|Clerk#000003975|             0|onic accounts. fu...|
|   5611653|    90844|            O|   157430.11| 1996-08-14|          5-LOW|Clerk#0000047

## 1. Basic Star-Schema Join

We'll join:
- `customer` -> `orders` on `c_custkey = o_custkey`
- `orders` -> `lineitem` on `o_orderkey = l_orderkey`


In [0]:
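# Join customer to orders and preview the result
cust_orders_df = customer_df.join(orders_df, customer_df["c_custkey"] == orders_df["o_custkey"], "inner")
cust_orders_df.show()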

+---------+------------------+--------------------+-----------+---------------+---------+------------+--------------------+----------+---------+-------------+------------+-----------+---------------+---------------+--------------+--------------------+
|c_custkey|            c_name|           c_address|c_nationkey|        c_phone|c_acctbal|c_mktsegment|           c_comment|o_orderkey|o_custkey|o_orderstatus|o_totalprice|o_orderdate|o_orderpriority|        o_clerk|o_shippriority|           o_comment|
+---------+------------------+--------------------+-----------+---------------+---------+------------+--------------------+----------+---------+-------------+------------+-----------+---------------+---------------+--------------+--------------------+
|   715561|Customer#000715561|        ercONarY3you|         23|33-589-619-2261|  7158.20|  AUTOMOBILE|ar warhorses. bli...|  11419110|   715561|            F|   104040.78| 1992-03-09|       1-URGENT|Clerk#000001175|             0|es. final dolp

In [0]:
# Join customer to orders
cust_orders_df = (
    customer_df.alias("c")
    .join(orders_df.alias("o"), F.col("c.c_custkey") == F.col("o.o_custkey"), "inner")
)

# Join the result to lineitem
cust_orders_lineitem_df = (
    cust_orders_df.alias("co")
    .join(lineitem_df.alias("l"), F.col("co.o_orderkey") == F.col("l.l_orderkey"), "inner")
)

display(cust_orders_lineitem_df.select("c_custkey", "o_orderkey", "l_linenumber").limit(10))


c_custkey,o_orderkey,l_linenumber
749555,25246752,1
749555,25246752,2
749555,25246752,3
642439,25247749,1
642439,25247749,2
642439,25247749,3
642439,25247749,4
520025,25254565,1
520025,25254565,2
520025,25254565,3


## 2. Inspect the Physical Plan with `explain`

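With `mode="extended"`, `explain` prints the parsed, analyzed, and optimized logical plans, followed by the physical plan. In the physical plan, look for:
- Join strategies (`BroadcastHashJoin`, `SortMergeJoin`, etc.)
- Shuffle operations (`Exchange` nodes)
- Estimated statistics, which are printed only with `mode="cost"`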
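
When a plan is long, it can help to summarize which join operators it contains. The sketch below captures the text that `explain` prints and lists the join strategy names found in it. The helper names `capture_explain` and `join_strategies_in_plan` are hypothetical, the operator list is an assumed, non-exhaustive set of common Spark join operators, and the capture step assumes `explain` writes through Python's `sys.stdout`.

```python
import contextlib
import io

# Common Spark physical join operators (an assumed, non-exhaustive list).
JOIN_OPERATORS = (
    "BroadcastHashJoin",
    "SortMergeJoin",
    "ShuffledHashJoin",
    "BroadcastNestedLoopJoin",
    "CartesianProduct",
)


def capture_explain(df, mode="simple"):
    """Return the text that df.explain(mode=...) prints to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.explain(mode=mode)
    return buffer.getvalue()


def join_strategies_in_plan(plan_text):
    """List the known join operator names that occur in a plan string."""
    # Substring matching, so prefixed variants of an operator name also count.
    return [op for op in JOIN_OPERATORS if op in plan_text]
```

In this notebook it could be called as `join_strategies_in_plan(capture_explain(cust_orders_lineitem_df))`.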


In [0]:
cust_orders_lineitem_df.explain(mode="extended")


== Parsed Logical Plan ==
'Join Inner, '`==`('co.o_orderkey, 'l.l_orderkey)
:- 'SubqueryAlias co
:  +- 'Join Inner, '`==`('c.c_custkey, 'o.o_custkey)
:     :- 'SubqueryAlias c
:     :  +- 'UnresolvedRelation [samples, tpch, customer], [], false
:     +- 'SubqueryAlias o
:        +- 'UnresolvedRelation [samples, tpch, orders], [], false
+- 'SubqueryAlias l
   +- 'UnresolvedRelation [samples, tpch, lineitem], [], false

== Analyzed Logical Plan ==
c_custkey: bigint, c_name: string, c_address: string, c_nationkey: bigint, c_phone: string, c_acctbal: decimal(18,2), c_mktsegment: string, c_comment: string, o_orderkey: bigint, o_custkey: bigint, o_orderstatus: string, o_totalprice: decimal(18,2), o_orderdate: date, o_orderpriority: string, o_clerk: string, o_shippriority: int, o_comment: string, l_orderkey: bigint, l_partkey: bigint, l_suppkey: bigint, l_linenumber: int, l_quantity: decimal(18,2), l_extendedprice: decimal(18,2), l_discount: decimal(18,2), l_tax: decimal(18,2), ... 8 more fie

## 3. Aggregate Query as a Baseline

Example query:
- Revenue per customer (`c_custkey`)
- Using sum of `l_extendedprice * (1 - l_discount)`
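
To make the revenue expression concrete, here is a small worked example in plain Python. It uses the `l_extendedprice` and `l_discount` values of the two line items of order `16933317` that appear in the `lineitem` preview above; the function name `discounted_revenue` is hypothetical. The product of two values with two decimal places has four decimal places, which is consistent with the four-decimal revenue figures in the query output.

```python
from decimal import Decimal


def discounted_revenue(extended_price, discount):
    """Revenue of one line item: l_extendedprice * (1 - l_discount)."""
    return extended_price * (1 - discount)


# (l_extendedprice, l_discount) for two line items of order 16933317,
# taken from the lineitem preview shown earlier in this notebook.
line_items = [
    (Decimal("23384.53"), Decimal("0.06")),
    (Decimal("41572.00"), Decimal("0.02")),
]

per_line = [discounted_revenue(price, disc) for price, disc in line_items]
subtotal = sum(per_line)

print(per_line[0])  # 21981.4582
print(per_line[1])  # 40740.5600
print(subtotal)     # 62722.0182
```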


In [0]:
baseline_revenue_df = (
    cust_orders_lineitem_df
    .groupBy("c_custkey")
    .agg(
        F.sum(
            F.col("l_extendedprice") * (1 - F.col("l_discount"))
        ).alias("customer_revenue")
    )
)

display(baseline_revenue_df.orderBy(F.col("customer_revenue").desc()).limit(20))


c_custkey,customer_revenue
721180,6882116.9796
299701,6866658.9107
382414,6858878.6429
321256,6821452.8305
484219,6667510.1891
179275,6618420.6473
292987,6574867.7884
253099,6550241.9577
333100,6537584.5063
370441,6519557.7886


## 4. Broadcast Join Optimization

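- If one side of a join is **small enough** to fit in memory on every executor, we can broadcast it.
- Spark copies that side to each executor and joins it against the other side in place, so the large side is not shuffled for this join.

We'll:
- Broadcast the `customer` table when joining to `orders`. The `broadcast()` function is a hint that asks Spark to use this strategy even when the table is larger than the automatic broadcast threshold (`spark.sql.autoBroadcastJoinThreshold`).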
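
The idea behind a broadcast hash join can be sketched in plain Python. This is a toy model, not Spark's implementation: the function name `broadcast_hash_join` and the sample rows are made up for illustration. The small side is hashed once, and each partition of the large side is probed against that hash table without being moved.

```python
def broadcast_hash_join(small_rows, large_partitions, small_key, large_key):
    """Toy broadcast hash join: inner join without moving the large side."""
    # Build phase: hash the small side once. In Spark, this table is what
    # gets copied to every executor.
    hash_table = {}
    for row in small_rows:
        hash_table.setdefault(row[small_key], []).append(row)

    # Probe phase: each large-side partition is joined where it already lives.
    joined_partitions = []
    for partition in large_partitions:
        joined = [
            {**match, **row}
            for row in partition
            for match in hash_table.get(row[large_key], [])
        ]
        joined_partitions.append(joined)
    return joined_partitions


customers = [
    {"c_custkey": 1, "c_name": "Customer#1"},
    {"c_custkey": 2, "c_name": "Customer#2"},
]
# Two partitions of orders; o_custkey 3 has no matching customer.
orders_partitions = [
    [{"o_orderkey": 10, "o_custkey": 1}, {"o_orderkey": 11, "o_custkey": 3}],
    [{"o_orderkey": 12, "o_custkey": 2}, {"o_orderkey": 13, "o_custkey": 1}],
]

result = broadcast_hash_join(customers, orders_partitions, "c_custkey", "o_custkey")
for partition in result:
    print([row["o_orderkey"] for row in partition])
```

The output keeps the partitioning of the large side: the first partition yields order 10 only (order 11 has no match), and the second yields orders 12 and 13.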


In [0]:
from pyspark.sql.functions import broadcast

broadcast_cust_orders_df = (
    broadcast(customer_df.alias("c"))
    .join(orders_df.alias("o"), F.col("c.c_custkey") == F.col("o.o_custkey"), "inner")
)

broadcast_all_df = (
    broadcast_cust_orders_df.alias("co")
    .join(lineitem_df.alias("l"), F.col("co.o_orderkey") == F.col("l.l_orderkey"), "inner")
)

broadcast_all_df.explain(mode="extended")


== Parsed Logical Plan ==
'Join Inner, '`==`('co.o_orderkey, 'l.l_orderkey)
:- 'SubqueryAlias co
:  +- 'Join Inner, '`==`('c.c_custkey, 'o.o_custkey)
:     :- 'UnresolvedHint broadcast
:     :  +- 'SubqueryAlias c
:     :     +- 'UnresolvedRelation [samples, tpch, customer], [], false
:     +- 'SubqueryAlias o
:        +- 'UnresolvedRelation [samples, tpch, orders], [], false
+- 'SubqueryAlias l
   +- 'UnresolvedRelation [samples, tpch, lineitem], [], false

== Analyzed Logical Plan ==
c_custkey: bigint, c_name: string, c_address: string, c_nationkey: bigint, c_phone: string, c_acctbal: decimal(18,2), c_mktsegment: string, c_comment: string, o_orderkey: bigint, o_custkey: bigint, o_orderstatus: string, o_totalprice: decimal(18,2), o_orderdate: date, o_orderpriority: string, o_clerk: string, o_shippriority: int, o_comment: string, l_orderkey: bigint, l_partkey: bigint, l_suppkey: bigint, l_linenumber: int, l_quantity: decimal(18,2), l_extendedprice: decimal(18,2), l_discount: decimal(18

In [0]:
display(broadcast_all_df.limit(10))

c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment,o_orderkey,o_custkey,o_orderstatus,o_totalprice,o_orderdate,o_orderpriority,o_clerk,o_shippriority,o_comment,l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment
749555,Customer#000749555,",,tmYFLbwnxilL4",9,19-876-429-6070,337.66,BUILDING,ly along the blithely express account,25246752,749555,O,42896.38,1996-07-23,5-LOW,Clerk#000001259,0,nts. carefully ironic deposits wake doggedly.,25246752,564501,2035,1,5.0,7827.4,0.0,0.04,N,O,1996-10-05,1996-08-23,1996-10-25,NONE,REG AIR,ans use furiously
749555,Customer#000749555,",,tmYFLbwnxilL4",9,19-876-429-6070,337.66,BUILDING,ly along the blithely express account,25246752,749555,O,42896.38,1996-07-23,5-LOW,Clerk#000001259,0,nts. carefully ironic deposits wake doggedly.,25246752,563907,13908,2,13.0,25621.44,0.0,0.01,N,O,1996-08-04,1996-08-29,1996-08-21,TAKE BACK RETURN,TRUCK,c courts ser
749555,Customer#000749555,",,tmYFLbwnxilL4",9,19-876-429-6070,337.66,BUILDING,ly along the blithely express account,25246752,749555,O,42896.38,1996-07-23,5-LOW,Clerk#000001259,0,nts. carefully ironic deposits wake doggedly.,25246752,583213,8236,3,7.0,9073.33,0.05,0.03,N,O,1996-08-28,1996-10-09,1996-09-05,TAKE BACK RETURN,SHIP,quickly final requests. unusual fox
642439,Customer#000642439,"KIqbfbFsk,",10,20-333-225-2836,9292.43,HOUSEHOLD,he quickly regular deposits. permanently even pinto beans over the blithely ironic pla,25247749,642439,O,241064.3,1997-05-04,3-MEDIUM,Clerk#000000215,0,xes. quickly specia,25247749,725692,25693,1,12.0,20611.92,0.06,0.07,N,O,1997-06-23,1997-06-25,1997-07-05,TAKE BACK RETURN,REG AIR,unts. carefully regular dependencies wak
642439,Customer#000642439,"KIqbfbFsk,",10,20-333-225-2836,9292.43,HOUSEHOLD,he quickly regular deposits. permanently even pinto beans over the blithely ironic pla,25247749,642439,O,241064.3,1997-05-04,3-MEDIUM,Clerk#000000215,0,xes. quickly specia,25247749,695577,20604,2,36.0,56611.44,0.08,0.06,N,O,1997-05-26,1997-07-28,1997-06-23,COLLECT COD,AIR,lites unwind after the p
642439,Customer#000642439,"KIqbfbFsk,",10,20-333-225-2836,9292.43,HOUSEHOLD,he quickly regular deposits. permanently even pinto beans over the blithely ironic pla,25247749,642439,O,241064.3,1997-05-04,3-MEDIUM,Clerk#000000215,0,xes. quickly specia,25247749,559895,22407,3,49.0,95788.63,0.04,0.08,N,O,1997-05-21,1997-06-22,1997-06-03,NONE,MAIL,"ges wake slyly special, bold foxes. accou"
642439,Customer#000642439,"KIqbfbFsk,",10,20-333-225-2836,9292.43,HOUSEHOLD,he quickly regular deposits. permanently even pinto beans over the blithely ironic pla,25247749,642439,O,241064.3,1997-05-04,3-MEDIUM,Clerk#000000215,0,xes. quickly specia,25247749,139515,14520,4,40.0,62180.4,0.02,0.08,N,O,1997-08-12,1997-08-02,1997-09-09,DELIVER IN PERSON,MAIL,egular packages. careful
520025,Customer#000520025,05L0XQVLhEo2B5TqsU75LOeZspVRyqpJu2T,23,33-959-970-3504,4948.39,HOUSEHOLD,nusual deposits. furiously special foxes cajole slyly. blithely final accounts snooze s,25254565,520025,F,163130.29,1994-09-02,2-HIGH,Clerk#000000683,0,inal instructions wake across the ruthlessly even deposits. b,25254565,826465,1498,1,30.0,41742.6,0.0,0.03,R,F,1994-12-08,1994-10-05,1994-12-31,TAKE BACK RETURN,FOB,nside the furiously pending instructions.
520025,Customer#000520025,05L0XQVLhEo2B5TqsU75LOeZspVRyqpJu2T,23,33-959-970-3504,4948.39,HOUSEHOLD,nusual deposits. furiously special foxes cajole slyly. blithely final accounts snooze s,25254565,520025,F,163130.29,1994-09-02,2-HIGH,Clerk#000000683,0,inal instructions wake across the ruthlessly even deposits. b,25254565,382023,32024,2,9.0,9945.09,0.03,0.05,R,F,1994-10-28,1994-10-26,1994-10-29,NONE,FOB,ial packages are quickly bold
520025,Customer#000520025,05L0XQVLhEo2B5TqsU75LOeZspVRyqpJu2T,23,33-959-970-3504,4948.39,HOUSEHOLD,nusual deposits. furiously special foxes cajole slyly. blithely final accounts snooze s,25254565,520025,F,163130.29,1994-09-02,2-HIGH,Clerk#000000683,0,inal instructions wake across the ruthlessly even deposits. b,25254565,650440,25467,3,18.0,25027.38,0.04,0.05,A,F,1994-12-11,1994-11-14,1995-01-05,DELIVER IN PERSON,TRUCK,deas. even packages


## 5. Caching & Reuse

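If you reuse the same intermediate result several times:
- Use `.cache()` or `.persist()` so that Spark does not recompute the joins and re-read the source tables for each downstream query.

The code below is commented out; uncomment it only where your compute supports DataFrame caching.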


In [0]:
# # Cache the heavy join
# broadcast_all_df_cached = broadcast_all_df.cache()

# # Trigger cache materialization
# broadcast_all_df_cached.count()

# # Re-use cached DF for multiple aggregations
# revenue_by_customer_df = (
#     broadcast_all_df_cached
#     .groupBy("c_custkey")
#     .agg(
#         F.sum(
#             F.col("l_extendedprice") * (1 - F.col("l_discount"))
#         ).alias("customer_revenue")
#     )
# )

# revenue_by_nation_df = (
#     broadcast_all_df_cached
#     .groupBy("c_nationkey")
#     .agg(
#         F.sum(
#             F.col("l_extendedprice") * (1 - F.col("l_discount"))
#         ).alias("nation_revenue")
#     )
# )

# display(revenue_by_customer_df.orderBy(F.col("customer_revenue").desc()).limit(10))
# display(revenue_by_nation_df.orderBy(F.col("nation_revenue").desc()).limit(10))


## 6. Repartitioning & Coalescing (DataFrame-only Partition Introspection)

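- Use `repartition()` to **increase** parallelism or to shuffle by keys. It always performs a full shuffle.
- Use `coalesce()` to **decrease** the number of partitions without a full shuffle.
- Instead of `df.rdd.getNumPartitions()`, we count the distinct values of `spark_partition_id()`. This counts only partitions that contain at least one row, so it can under-report when some partitions are empty.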
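
The difference between the two operations, and the empty-partition caveat, can be illustrated with a toy model in plain Python. The function names are hypothetical, and plain modulo stands in for Spark's hash partitioner, which is an assumption made only to keep the example readable.

```python
def hash_repartition(rows, num_partitions, key):
    """Toy repartition(n, key): every row may move; equal keys end up together."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Stand-in for Spark's hash partitioner (assumption: plain modulo).
        partitions[row[key] % num_partitions].append(row)
    return partitions


def coalesce(partitions, num_partitions):
    """Toy coalesce(n): merge whole partitions; rows are never re-hashed."""
    # Like Spark, never increase the partition count.
    num_partitions = min(num_partitions, len(partitions))
    merged = [[] for _ in range(num_partitions)]
    for index, partition in enumerate(partitions):
        merged[index % num_partitions].extend(partition)
    return merged


def count_nonempty(partitions):
    """What counting distinct partition ids over the rows can observe."""
    return sum(1 for partition in partitions if partition)


rows = [{"c_custkey": k} for k in (1, 2, 3, 5, 9, 10)]
repartitioned = hash_repartition(rows, 8, "c_custkey")
coalesced = coalesce(repartitioned, 2)

print(len(repartitioned), count_nonempty(repartitioned))  # 8 4
print(len(coalesced), count_nonempty(coalesced))          # 2 2
```

With only six rows spread over eight partitions, four partitions stay empty, so a row-based count reports 4 rather than 8. On the full joined dataset every partition receives rows, which is why the cells below report the requested counts.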


In [0]:
# Repartition by key used in downstream aggregations
repartitioned_df = broadcast_all_df.repartition(64, "c_custkey")  # 64 is just an example

repartitioned_with_pid = repartitioned_df.withColumn("partition_id", F.spark_partition_id())
num_parts_repart = (
    repartitioned_with_pid
    .select("partition_id")
    .agg(F.countDistinct("partition_id").alias("num_partitions"))
    .collect()[0]["num_partitions"]
)

print("Repartitioned partitions (via spark_partition_id):", num_parts_repart)


Repartitioned partitions (via spark_partition_id): 64


In [0]:
# Coalesce when writing out or for subsequent stages
coalesced_df = repartitioned_df.coalesce(8)
coalesced_with_pid = coalesced_df.withColumn("partition_id", F.spark_partition_id())
num_parts_coal = (
    coalesced_with_pid
    .select("partition_id")
    .agg(F.countDistinct("partition_id").alias("num_partitions"))
    .collect()[0]["num_partitions"]
)

print("Coalesced partitions (via spark_partition_id):", num_parts_coal)


Coalesced partitions (via spark_partition_id): 8


In [0]:
display(coalesced_df.limit(10))

c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment,o_orderkey,o_custkey,o_orderstatus,o_totalprice,o_orderdate,o_orderpriority,o_clerk,o_shippriority,o_comment,l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment
454852,Customer#000454852,"gL6hKYbG4X7,iEHGVZT8u ioBvA1AD",7,17-460-780-6993,2258.7,FURNITURE,beans. furiously special dependencies haggle quickly fluffily final accounts. quickly silent f,14281504,454852,O,50146.65,1997-03-31,4-NOT SPECIFIED,Clerk#000004953,0,heodolites? final pinto beans against the furiously pending accounts integrat,14281504,387220,24742,1,37.0,48366.77,0.04,0.08,N,O,1997-07-21,1997-06-16,1997-08-04,COLLECT COD,FOB,tornis are furiously slyly pending theodoli
647096,Customer#000647096,"rCrd8HI9f,gXX,zkkMsKXFarH77ffi5u",9,19-256-191-7999,-788.2,MACHINERY,arefully even theodolites haggle furiously blit,19755906,647096,O,72262.47,1997-06-23,2-HIGH,Clerk#000004996,0,ickly regular accounts. blithely final instructions are quickly carefull,19755906,685546,35547,1,23.0,35224.73,0.0,0.0,N,O,1997-09-11,1997-08-24,1997-09-29,DELIVER IN PERSON,AIR,accounts. furiously even ideas affix ag
647096,Customer#000647096,"rCrd8HI9f,gXX,zkkMsKXFarH77ffi5u",9,19-256-191-7999,-788.2,MACHINERY,arefully even theodolites haggle furiously blit,19755906,647096,O,72262.47,1997-06-23,2-HIGH,Clerk#000004996,0,ickly regular accounts. blithely final instructions are quickly carefull,19755906,965766,15767,2,22.0,40297.84,0.09,0.01,N,O,1997-09-01,1997-08-06,1997-09-19,TAKE BACK RETURN,TRUCK,rs among the furiously speci
204032,Customer#000204032,Ea4xmAX2d5WvL0A,21,31-371-244-4107,2436.79,BUILDING,the regularly final ideas. final accoun,20197252,204032,F,31128.18,1992-06-11,5-LOW,Clerk#000002613,0,ily unusual accounts nag furiously blithely regular accounts. realms are,20197252,374557,37065,1,13.0,21210.02,0.07,0.04,R,F,1992-08-22,1992-08-24,1992-08-29,COLLECT COD,SHIP,foxes. carefully ironic packages
204032,Customer#000204032,Ea4xmAX2d5WvL0A,21,31-371-244-4107,2436.79,BUILDING,the regularly final ideas. final accoun,20197252,204032,F,31128.18,1992-06-11,5-LOW,Clerk#000002613,0,ily unusual accounts nag furiously blithely regular accounts. realms are,20197252,176155,13665,2,9.0,11080.35,0.07,0.03,A,F,1992-06-19,1992-08-22,1992-07-17,TAKE BACK RETURN,REG AIR,tructions cajole. blith
282370,Customer#000282370,"ucdNAw 4CzhlcxEXMxIbjo,1xLy",18,28-300-145-5993,6916.59,FURNITURE,tructions snooze. packages sleep carefully regular ideas. carefully expre,18373699,282370,O,125030.95,1997-09-02,5-LOW,Clerk#000004692,0,liers about the slyly pending deposits nag slowly among the slyl,18373699,583257,33258,1,27.0,36186.21,0.03,0.01,N,O,1997-12-17,1997-10-24,1998-01-10,NONE,RAIL,ts along the dependencies sleep quickl
282370,Customer#000282370,"ucdNAw 4CzhlcxEXMxIbjo,1xLy",18,28-300-145-5993,6916.59,FURNITURE,tructions snooze. packages sleep carefully regular ideas. carefully expre,18373699,282370,O,125030.95,1997-09-02,5-LOW,Clerk#000004692,0,liers about the slyly pending deposits nag slowly among the slyl,18373699,784485,22031,2,8.0,12555.6,0.1,0.02,N,O,1997-10-31,1997-11-06,1997-11-18,DELIVER IN PERSON,RAIL,ges. bold accounts are. carefully fi
282370,Customer#000282370,"ucdNAw 4CzhlcxEXMxIbjo,1xLy",18,28-300-145-5993,6916.59,FURNITURE,tructions snooze. packages sleep carefully regular ideas. carefully expre,18373699,282370,O,125030.95,1997-09-02,5-LOW,Clerk#000004692,0,liers about the slyly pending deposits nag slowly among the slyl,18373699,824128,36645,3,44.0,46291.52,0.02,0.04,N,O,1997-12-05,1997-10-25,1998-01-04,TAKE BACK RETURN,AIR,s. quickly regular packages nag quickly fin
282370,Customer#000282370,"ucdNAw 4CzhlcxEXMxIbjo,1xLy",18,28-300-145-5993,6916.59,FURNITURE,tructions snooze. packages sleep carefully regular ideas. carefully expre,18373699,282370,O,125030.95,1997-09-02,5-LOW,Clerk#000004692,0,liers about the slyly pending deposits nag slowly among the slyl,18373699,105639,43146,4,19.0,31247.97,0.05,0.04,N,O,1997-12-26,1997-11-12,1998-01-11,DELIVER IN PERSON,TRUCK,sts detect fluffily eve
181666,Customer#000181666,3Rk8sg4XpeIsXHYqAjRAdp3ho7gciBdBnfOL,18,28-997-685-9940,1262.76,HOUSEHOLD,lms are across the carefully e,18452935,181666,O,101722.36,1998-04-11,1-URGENT,Clerk#000001616,0,"f the even, ironic requests. even",18452935,18232,18233,1,26.0,29905.98,0.04,0.07,N,O,1998-05-08,1998-06-23,1998-05-14,DELIVER IN PERSON,TRUCK,ngside of the daringly pending accounts.


## 7. Adaptive Query Execution (AQE)

AQE can:
- Automatically coalesce shuffle partitions
- Change join strategies at runtime
- Split skewed partitions in joins so that a few oversized partitions do not dominate the runtime
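
As an illustration of the first point, the sketch below greedily merges adjacent shuffle partitions until a target size is reached. It is a simplified model, not Spark's actual algorithm: the function name `coalesce_shuffle_partitions` and the sample sizes are made up, and the target plays the role of an advisory size such as `spark.sql.adaptive.advisoryPartitionSizeInBytes`.

```python
def coalesce_shuffle_partitions(partition_sizes, target_size):
    """Greedily merge adjacent shuffle partitions up to target_size bytes."""
    groups = []
    current, current_size = [], 0
    for index, size in enumerate(partition_sizes):
        # Start a new group when adding this partition would exceed the target.
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


MB = 1024 * 1024
sizes = [10 * MB, 5 * MB, 40 * MB, 70 * MB, 3 * MB, 2 * MB]
groups = coalesce_shuffle_partitions(sizes, target_size=64 * MB)

print(groups)  # [[0, 1, 2], [3], [4, 5]]
```

Six shuffle partitions become three tasks: the small neighbors are merged, while the 70 MB partition stays on its own because it already exceeds the target.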

Make sure it's enabled:
