**⭐ 1. What This Pattern Solves**

A right join preserves every row from the right table; it is less common than a left join. A full outer join preserves rows from both sides. Use it when you must detect unmatched rows on either side, for example in reconciliation.

**⭐ 2. SQL Equivalent**

In [0]:
%sql
SELECT * FROM A RIGHT JOIN B ON A.k = B.k;

In [0]:
%sql
SELECT * FROM A FULL OUTER JOIN B ON A.k = B.k;

**⭐ 3. Core Idea**

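Pass how="right" or how="full" to DataFrame.join (PySpark also accepts "outer" and "full_outer" as aliases for a full outer join). A full outer join returns both matched and unmatched rows, which makes it the most direct way to detect keys that are missing on either side.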

**⭐ 4. Template Code (MEMORIZE THIS)**

In [0]:
right_joined = df_left.join(df_right, on=join_keys, how="right")
full_joined  = df_left.join(df_right, on=join_keys, how="full")

**⭐ 5. Detailed Example**

In [0]:
A = spark.createDataFrame([(1,'a'),(2,'b')], ['id','va'])
B = spark.createDataFrame([(2,'B'),(3,'C')], ['id','vb'])

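# Full outer join: ids 1 (A only), 2 (both), and 3 (B only) all appear.
full = A.join(B, on='id', how='full') \
        .select('id','va','vb') \
        .orderBy('id')  # sort so the displayed order is deterministic
full.show()
# Rows: (1, 'a', null), (2, 'b', 'B'), (3, null, 'C')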

**⭐ 6. Mini Practice Problems**

Full join yesterday_orders and today_orders to find added/removed orders.

Right join facts with dim, placing dim on the right, to keep every dimension row and find dimensions that have no facts.

Reconcile bank_statement and ledger with full join and create flags for unmatched rows.

**⭐ 7. Full Data Engineering Problem**

Build a daily reconciliation pipeline: full join bank_statement and payments (the ledger side) on txn_id and produce three outputs: matched, only-in-bank, and only-in-ledger. Attach a reason code to each row and write each output to its own partition for investigation.

**⭐ 8. Time & Space Complexity**

Full and right joins incur the same shuffle cost as other non-broadcast joins. A full outer join can use more memory, because unmatched rows from both sides must be tracked until matching completes.

**⭐ 9. Common Pitfalls**

A full join produces many nulls, so downstream code must handle them explicitly.

Running a full join on high-cardinality keys without filtering first causes a huge shuffle and a large output.

Right joins are easy to confuse with left joins. For clarity, prefer a left join (with the tables swapped) or a full join.