##### How to iterate over and join a list of columns using a list comprehension?

In [0]:
target_cols = ['S.No','Target_ID','Target_version_id','sales_id']

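# Join the pairs into one string; a separate name keeps the target_cols list intact
target_cols_updt_cols = ', '.join([f't.{col} = s.{col}' for col in target_cols])
target_cols_updt_cols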

't.S.No = s.S.No, t.Target_ID = s.Target_ID, t.Target_version_id = s.Target_version_id, t.sales_id = s.sales_id'

In [0]:
target_cols = ['S.No','Target_ID','Target_version_id','sales_id','market_data_id','vehicle_delivery_start_date','vehicle_delivery_end_date','vehicle_delivery_payment_date','vehicle_spread','cluster_monitor','vehicle_price_determination_date','price_status']

In [0]:
# Build a 't.col = s.col' pair for every column in target_cols,
# then join the pairs into a single comma-separated string
target_cols_updt_cols = ', '.join([f't.{col} = s.{col}' for col in target_cols])
target_cols_updt_cols

Out[2]: 't.S.No = s.S.No, t.Target_ID = s.Target_ID, t.Target_version_id = s.Target_version_id, t.sales_id = s.sales_id, t.market_data_id = s.market_data_id, t.vehicle_delivery_start_date = s.vehicle_delivery_start_date, t.vehicle_delivery_end_date = s.vehicle_delivery_end_date, t.vehicle_delivery_payment_date = s.vehicle_delivery_payment_date, t.vehicle_spread = s.vehicle_spread, t.cluster_monitor = s.cluster_monitor, t.vehicle_price_determination_date = s.vehicle_price_determination_date, t.price_status = s.price_status'
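The same pattern can be wrapped in a small helper so the aliases are not hard-coded. This is only a minimal sketch: `build_set_clause`, its parameter names, and the two sample columns are hypothetical and chosen for illustration.

```python
def build_set_clause(columns, target_alias="t", source_alias="s"):
    """Return 'target.col = source.col' pairs joined by ', ' (hypothetical helper)."""
    # A generator expression works here because str.join accepts any iterable of strings
    return ", ".join(
        f"{target_alias}.{col} = {source_alias}.{col}" for col in columns
    )

example_cols = ["Target_ID", "sales_id"]  # sample columns, for illustration only
set_clause = build_set_clause(example_cols)
print(set_clause)  # t.Target_ID = s.Target_ID, t.sales_id = s.sales_id
```

Passing different aliases, for example `build_set_clause(example_cols, "tgt", "src")`, reuses the same logic without editing the f-string.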

In [0]:
# Loop through target_cols and collect each column name unchanged.
# The result is a new list with the same contents as target_cols.
[col for col in target_cols]

Out[3]: ['S.No',
 'Target_ID',
 'Target_version_id',
 'sales_id',
 'market_data_id',
 'vehicle_delivery_start_date',
 'vehicle_delivery_end_date',
 'vehicle_delivery_payment_date',
 'vehicle_spread',
 'cluster_monitor',
 'vehicle_price_determination_date',
 'price_status']

In [0]:
# Attempt to create source column references (note: the f prefix is missing)
['s.{col}' for col in target_cols]

Out[4]: ['s.{col}',
 's.{col}',
 's.{col}',
 's.{col}',
 's.{col}',
 's.{col}',
 's.{col}',
 's.{col}',
 's.{col}',
 's.{col}',
 's.{col}',
 's.{col}']

- This creates a **new list** of **strings**, one for **every column name** in **target_cols**.
- The intent is for each string to start with **"s."** followed by the column name.
- But this code has a **small issue**:
  - `'s.{col}'` is a plain string, not an f-string, so the value of **col** is never substituted.
  - It literally produces: **"s.{col}", "s.{col}", …**
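The difference between the plain string and the f-string can be checked side by side. A minimal sketch, using two sample column names chosen for illustration; `str.format` is shown only as an equivalent alternative, not as something the notebook uses.

```python
cols = ["Target_ID", "sales_id"]  # sample column names, for illustration only

# No f prefix: the braces are kept as literal characters
plain = ["s.{col}" for col in cols]

# f-string: {col} is replaced by the current value of col
with_f = [f"s.{col}" for col in cols]

# str.format produces the same strings as the f-string version
with_format = ["s.{}".format(col) for col in cols]

print(plain)   # ['s.{col}', 's.{col}']
print(with_f)  # ['s.Target_ID', 's.sales_id']
```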

In [0]:
# Create a list of source column references for SQL join/select
[f"s.{col}" for col in target_cols]

Out[5]: ['s.S.No',
 's.Target_ID',
 's.Target_version_id',
 's.sales_id',
 's.market_data_id',
 's.vehicle_delivery_start_date',
 's.vehicle_delivery_end_date',
 's.vehicle_delivery_payment_date',
 's.vehicle_spread',
 's.cluster_monitor',
 's.vehicle_price_determination_date',
 's.price_status']

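- Use this when you want to **prefix columns** with an **alias**:

      df.select([f"s.{col}" for col in target_cols])

- Here `df` must carry the alias **s** (for example, `df.alias("s")`). The call asks for one column per entry in **target_cols**:
  - s.S.No
  - s.Target_ID
  - s.Target_version_id
  - ... and so on through s.price_status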
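One caveat with this list: a name such as `S.No` contains a dot, and Spark normally reads a dot as a separator between an alias (or struct) and a field, so `s.S.No` may not resolve to the intended column. Wrapping the column name in backticks is the usual way to quote it. The sketch below only builds the strings; `qualify` is a hypothetical helper, and it assumes the column names contain no backticks themselves.

```python
def qualify(alias, col):
    """Prefix col with alias and backtick-quote the column name (hypothetical helper)."""
    # Assumes col contains no backtick characters
    return f"{alias}.`{col}`"

sample_cols = ["S.No", "Target_ID"]  # sample columns, for illustration only
select_exprs = [qualify("s", col) for col in sample_cols]
print(select_exprs)  # ['s.`S.No`', 's.`Target_ID`']
```

The resulting list could then be passed to `df.select(...)` on a DataFrame aliased as `s`.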

In [0]:
# Create a list of target column references for SQL join/select
[f"t.{col}" for col in target_cols]

Out[6]: ['t.S.No',
 't.Target_ID',
 't.Target_version_id',
 't.sales_id',
 't.market_data_id',
 't.vehicle_delivery_start_date',
 't.vehicle_delivery_end_date',
 't.vehicle_delivery_payment_date',
 't.vehicle_spread',
 't.cluster_monitor',
 't.vehicle_price_determination_date',
 't.price_status']

In [0]:
# Create a list of SQL assignment expressions for each column in target_cols
[f"t.{col} = s.{col}" for col in target_cols]

Out[7]: ['t.S.No = s.S.No',
 't.Target_ID = s.Target_ID',
 't.Target_version_id = s.Target_version_id',
 't.sales_id = s.sales_id',
 't.market_data_id = s.market_data_id',
 't.vehicle_delivery_start_date = s.vehicle_delivery_start_date',
 't.vehicle_delivery_end_date = s.vehicle_delivery_end_date',
 't.vehicle_delivery_payment_date = s.vehicle_delivery_payment_date',
 't.vehicle_spread = s.vehicle_spread',
 't.cluster_monitor = s.cluster_monitor',
 't.vehicle_price_determination_date = s.vehicle_price_determination_date',
 't.price_status = s.price_status']

- This is a **list comprehension** that creates a **list of SQL expressions**.
- For **every column name** in **target_cols**, it makes a **string** of this form:

      t.columnName = s.columnName

- This form is usually used to build **merge or join** conditions in PySpark:

      merge_condition = " AND ".join([f"t.{col} = s.{col}" for col in target_cols])

  - Joining with `" AND "` creates a single merge condition in which every column of target alias **t** must equal the same column of source alias **s**.
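Putting the two join styles together, the `" AND "` form fits the `ON` condition and the `", "` form fits an `UPDATE SET` clause. The following is a sketch under stated assumptions: `target_table` and `source_table` are hypothetical table names, and the split into key columns and update columns is assumed for illustration rather than taken from the notebook.

```python
key_cols = ["Target_ID", "Target_version_id"]  # assumed match keys
update_cols = ["sales_id", "price_status"]     # assumed columns to update

# ON condition: every key column must match
merge_condition = " AND ".join(f"t.{col} = s.{col}" for col in key_cols)

# SET clause: comma-separated assignments
set_clause = ", ".join(f"t.{col} = s.{col}" for col in update_cols)

# Hypothetical table names; only the string is built, nothing is executed
merge_sql = (
    "MERGE INTO target_table t\n"
    "USING source_table s\n"
    f"ON {merge_condition}\n"
    f"WHEN MATCHED THEN UPDATE SET {set_clause}"
)
print(merge_sql)
```

Keeping the key columns and the updated columns in separate lists makes it explicit which columns identify a row and which ones are overwritten.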