In PySpark, the DataFrame transform() method is a clean way to apply custom DataFrame transformations.
It lets you chain multiple transformations together in a readable, modular way.

I'll explain step by step with sample DataFrame examples.

### 1. What is transform() in PySpark?

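DataFrame.transform() was added in PySpark 3.0.0. Since 3.3.0 it also accepts extra positional and keyword arguments, which are passed through to the function. It is unrelated to the array function pyspark.sql.functions.transform(), which was added in 3.1.0.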

It takes a function as an argument.

That function accepts a DataFrame and returns a DataFrame.

You can chain multiple transforms together to make your code cleaner.

Syntax:

df.transform(func)

Where:

- df → input DataFrame.

- func → custom transformation function.

### 2. Create Sample DataFrame

In [1]:
# Assumes an active SparkSession named `spark`, as most notebook environments provide.
from pyspark.sql.functions import col

data = [
    (1, "Alice", 2000),
    (2, "Bob", 1500),
    (3, "Charlie", 3000),
    (4, "David", 2500),
]

columns = ["id", "name", "salary"]

df = spark.createDataFrame(data, columns)
df.show()



+---+-------+------+
| id|   name|salary|
+---+-------+------+
|  1|  Alice|  2000|
|  2|    Bob|  1500|
|  3|Charlie|  3000|
|  4|  David|  2500|
+---+-------+------+



### 3. Example 1 — Simple Transformation

Let's create a function that adds an updated_salary column holding the salary increased by 10%:

In [2]:
def increase_salary(df):
    return df.withColumn("updated_salary", col("salary") * 1.10)
    
df_transformed = df.transform(increase_salary)
df_transformed.show()



+---+-------+------+------------------+
| id|   name|salary|    updated_salary|
+---+-------+------+------------------+
|  1|  Alice|  2000|            2200.0|
|  2|    Bob|  1500|1650.0000000000002|
|  3|Charlie|  3000|3300.0000000000005|
|  4|  David|  2500|            2750.0|
+---+-------+------+------------------+



### 4. Example 2 — Chain Multiple Transforms

Each transform() call returns a DataFrame, so calls can be chained. The steps run in the order they are written.

In [3]:
def filter_high_salary(df):
    return df.filter(col("salary") > 2000)

def add_bonus(df):
    return df.withColumn("bonus", col("salary") * 0.2)

df_transformed = (
    df
    .transform(filter_high_salary)
    .transform(add_bonus)
)

df_transformed.show()



+---+-------+------+-----+
| id|   name|salary|bonus|
+---+-------+------+-----+
|  3|Charlie|  3000|600.0|
|  4|  David|  2500|500.0|
+---+-------+------+-----+



### 5. Example 3 — Use Lambda with transform()

Instead of defining separate functions, you can use lambda functions:

In [4]:
df_transformed = (
    df
    .transform(lambda df: df.withColumn("tax", col("salary") * 0.05))
    .transform(lambda df: df.withColumn("net_salary", col("salary") - col("tax")))
)

df_transformed.show()



+---+-------+------+-----+----------+
| id|   name|salary|  tax|net_salary|
+---+-------+------+-----+----------+
|  1|  Alice|  2000|100.0|    1900.0|
|  2|    Bob|  1500| 75.0|    1425.0|
|  3|Charlie|  3000|150.0|    2850.0|
|  4|  David|  2500|125.0|    2375.0|
+---+-------+------+-----+----------+



### 6. When to Use transform()

✅ Best use cases:

- When you want clean, chainable transformations.

- When the same transformations need to be reused across multiple DataFrames.

- When a transformation should be applied only under certain conditions, for example behind a configuration flag.

### 7. Final Combined Example

In [5]:
df_final = (
    df
    .transform(lambda df: df.withColumn("updated_salary", col("salary") * 1.10))
    .transform(lambda df: df.filter(col("updated_salary") > 2500))
    .transform(lambda df: df.withColumn("bonus", col("updated_salary") * 0.15))
)

df_final.show()



+---+-------+------+------------------+------------------+
| id|   name|salary|    updated_salary|             bonus|
+---+-------+------+------------------+------------------+
|  3|Charlie|  3000|3300.0000000000005|495.00000000000006|
|  4|  David|  2500|            2750.0|             412.5|
+---+-------+------+------------------+------------------+



### Summary Table
| **Aspect**  | **Without `transform()`** | **With `transform()`**  |
| ----------- | ------------------------- | ----------------------- |
| Code Style  | Procedural, less readable | Functional, chainable   |
| Reusability | Harder to reuse functions | Easy to reuse functions |
| Readability | Cluttered                 | Clean & modular         |
