### Explain PySpark UDF with the help of an example.

- PySpark **UDF (User-Defined Function)** allows you to create custom functions to perform transformations on data in a DataFrame.
- You define a function in Python and then register it as a UDF so that it can be used with PySpark.
- These are particularly useful when you need to perform complex transformations or calculations that aren't readily available in PySpark's built-in functions.

#### Key Points about UDFs:
- You can use a `UDF` to apply any custom logic to a column in a PySpark DataFrame.
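- The Python function must be wrapped with `udf()` before it can be applied to DataFrame columns. To call it from Spark SQL, it must also be registered by name with `spark.udf.register`.
- Python `UDFs` can be slow. Spark's Catalyst optimizer treats them as black boxes, and every row must be serialized between the JVM and a Python worker process. Prefer a built-in function whenever one exists.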

#### Steps to create and use a UDF:
1. Define a Python function.
2. Convert the function into a UDF using `pyspark.sql.functions.udf`.
3. Apply the UDF to a DataFrame column.

**Example:**

Let’s assume we have a DataFrame with a column called "name" and we want to create a UDF to convert the names to uppercase.

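```python
from pyspark.sql import SparkSession

# In notebook environments such as Databricks, `spark` is usually predefined;
# getOrCreate() returns the existing session or starts a new one.
spark = SparkSession.builder.getOrCreate()

# sample data
data = [
    ("Rohish", 1),
    ("Rajesh", 2),
    ("Chetan", 3)
]

columns = ["name", "id"]

df = spark.createDataFrame(data, columns)
df.show()
```

```text
+------+---+
|  name| id|
+------+---+
|Rohish|  1|
|Rajesh|  2|
|Chetan|  3|
+------+---+
```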
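```python
# Step 1: Define a Python function
def convert_to_upper(name):
    # Spark passes NULL values as None; return None instead of raising
    return name.upper() if name is not None else None
```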

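Step 1 produces an ordinary Python function, so you can test its logic directly before involving Spark. This is usually easier than debugging a failure inside an executor. The following is a minimal sketch using the None-safe version of the function.

```python
def convert_to_upper(name):
    # Spark passes SQL NULL to a Python UDF as None, so guard against it
    return name.upper() if name is not None else None

# Plain Python checks: no SparkSession is needed at this stage
print(convert_to_upper("Rohish"))  # ROHISH
print(convert_to_upper(None))      # None
```
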
```python
# Step 2: Wrap the function as a UDF
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# udf() wraps the Python function and tells Spark that it returns a string (StringType())
convert_upper_udf = udf(convert_to_upper, StringType())
```

```python
# Step 3: Apply the UDF to the DataFrame
from pyspark.sql.functions import col

upper_df = df.withColumn("name", convert_upper_udf(col("name")))
upper_df.show()
```

```text
+------+---+
|  name| id|
+------+---+
|ROHISH|  1|
|RAJESH|  2|
|CHETAN|  3|
+------+---+
```

```python
# Same result with PySpark's built-in upper() function, which needs no UDF
from pyspark.sql.functions import col, upper

df.withColumn("name", upper(col("name"))).show()
```