In PySpark, `DataFrame.replace()` returns a new DataFrame in which specified values have been substituted with other values. It is useful when you want to swap specific values in one or more columns.

This section covers the syntax and common use cases, with examples on a sample PySpark DataFrame.

### Syntax
`DataFrame.replace(to_replace, value=<no value>, subset=None)`

| Parameter       | Description                                                                                                 |
| --------------- | ----------------------------------------------------------------------------------------------------------- |
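| **to\_replace** | The value(s) to replace: a single bool, int, float, or string, a list of such values, or a dict mapping old values to new ones. Matching is exact; regular expressions are not supported. |
| **value**       | The new value(s): a single value, a list of the same length as `to_replace`, or `None` to replace with null. Required unless `to_replace` is a dict, in which case it is ignored. |
| **subset**      | Optional. Specifies the column(s) where replacements should occur. If not provided, applies to all columns whose data type matches the replacement. |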


In [1]:
data = [
    (1, "A", "Pending"),
    (2, "B", "Completed"),
    (3, "C", "Pending"),
    (4, "D", "Cancelled"),
    (5, "E", "Pending")
]

df = spark.createDataFrame(data, ["ID", "Category", "Status"])
df.show()

+---+--------+---------+
| ID|Category|   Status|
+---+--------+---------+
|  1|       A|  Pending|
|  2|       B|Completed|
|  3|       C|  Pending|
|  4|       D|Cancelled|
|  5|       E|  Pending|
+---+--------+---------+



### Example 1 — Replace a Single Value in All Columns

In [2]:
df1 = df.replace("Pending", "In Progress")
df1.show()

+---+--------+-----------+
| ID|Category|     Status|
+---+--------+-----------+
|  1|       A|In Progress|
|  2|       B|  Completed|
|  3|       C|In Progress|
|  4|       D|  Cancelled|
|  5|       E|In Progress|
+---+--------+-----------+



### Example 2 — Replace Multiple Values in a Single Column

In [3]:
df2 = df.replace(
    to_replace=["Pending", "Cancelled"],
    value="Not Completed",
    subset=["Status"]
)
df2.show()

+---+--------+-------------+
| ID|Category|       Status|
+---+--------+-------------+
|  1|       A|Not Completed|
|  2|       B|    Completed|
|  3|       C|Not Completed|
|  4|       D|Not Completed|
|  5|       E|Not Completed|
+---+--------+-------------+



### Example 3 — Replace Different Values with Different Values

In [5]:
df3 = df.replace(
    to_replace={"Pending": "In Progress", "Cancelled": "Terminated"}
)
df3.show()


+---+--------+-----------+
| ID|Category|     Status|
+---+--------+-----------+
|  1|       A|In Progress|
|  2|       B|  Completed|
|  3|       C|In Progress|
|  4|       D| Terminated|
|  5|       E|In Progress|
+---+--------+-----------+



### Example 4 — Replace Different Values in a Specific Column

In [4]:
df4 = df.replace(
    to_replace={"A": "Alpha", "B": "Beta"},
    subset=["Category"]
)
df4.show()


+---+--------+---------+
| ID|Category|   Status|
+---+--------+---------+
|  1|   Alpha|  Pending|
|  2|    Beta|Completed|
|  3|       C|  Pending|
|  4|       D|Cancelled|
|  5|       E|  Pending|
+---+--------+---------+



### Example 5 — Replace Numeric Values

In [6]:
df5 = df.replace(
    to_replace={1: 100, 3: 300, 5: 500},
    subset=["ID"]
)
df5.show()


+---+--------+---------+
| ID|Category|   Status|
+---+--------+---------+
|100|       A|  Pending|
|  2|       B|Completed|
|300|       C|  Pending|
|  4|       D|Cancelled|
|500|       E|  Pending|
+---+--------+---------+



### Summary Table
| Use Case                       | Code Example                             |
| ------------------------------ | ---------------------------------------- |
| Replace single value           | `df.replace("A", "Alpha")`               |
| Replace multiple values (same) | `df.replace(["A","B"], "NewVal")`        |
| Replace different values       | `df.replace({"A":"Alpha","B":"Beta"})`   |
| Replace in specific column     | `df.replace("A", "Alpha", ["Category"])` |
| Replace numbers                | `df.replace({1:100, 2:200}, subset=["ID"])` |


### Key Notes

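- Works on boolean, numeric, and string values; `to_replace` and `value` must be of the same type. The new value may be `None` (null), but `replace()` does not accept `None` as the value to search for, so use `fillna()` to fill existing nulls.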

- Supports single, multiple, and dictionary-based replacements.

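- Can limit replacement to specific columns using `subset`. Columns in `subset` whose data type does not match the replacement are ignored.

- If no `subset` is given, replacement applies to all columns of a matching data type; other columns are left unchanged.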