### Renaming columns in a DataFrame

In PySpark, there are several ways to rename columns in a DataFrame. Each method below is shown with sample data and code.

In [0]:
# sample data
data = [
    (1, "Alice", 30),
    (2, "Bob", 25),
    (3, "Charlie", 35)
]

columns = ["id", "name", "age"]

df = spark.createDataFrame(data, columns)
df.show()

+---+-------+---+
| id|   name|age|
+---+-------+---+
|  1|  Alice| 30|
|  2|    Bob| 25|
|  3|Charlie| 35|
+---+-------+---+



**Method 1: Using withColumnRenamed():**

withColumnRenamed() renames a single column per call. It returns a new DataFrame with that column renamed and all other columns unchanged; the original DataFrame is not modified. To rename several columns, chain one call per column.

df.withColumnRenamed(existing_name, new_name)

In [0]:
# Rename a single column:
df1 = df.withColumnRenamed("name", "first_name")
df1.show()

+---+----------+---+
| id|first_name|age|
+---+----------+---+
|  1|     Alice| 30|
|  2|       Bob| 25|
|  3|   Charlie| 35|
+---+----------+---+
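
One thing to watch for: withColumnRenamed() does nothing when the existing column name is not in the schema, so a typo passes silently and the DataFrame comes back unchanged. A minimal sketch of a guard follows. The helper name `rename_strict` is hypothetical (it is not part of PySpark), and its membership check is case-sensitive, which is assumed here to be stricter than Spark's default column resolution.

```python
def rename_strict(df, existing_name, new_name):
    """Rename a column, but fail loudly if it does not exist.

    Hypothetical helper, not a PySpark API. It relies only on
    df.columns and df.withColumnRenamed().
    """
    if existing_name not in df.columns:
        raise ValueError(
            f"Column {existing_name!r} not found; available: {df.columns}"
        )
    return df.withColumnRenamed(existing_name, new_name)


# Usage with the sample DataFrame defined earlier:
# df1 = rename_strict(df, "name", "first_name")   # renamed as before
# rename_strict(df, "nmae", "first_name")         # raises ValueError
```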



In [0]:
# Rename multiple columns (chaining):
df2 = df.withColumnRenamed("name", "first_name") \
        .withColumnRenamed("id", "emp_id")
df2.show()

+------+----------+---+
|emp_id|first_name|age|
+------+----------+---+
|     1|     Alice| 30|
|     2|       Bob| 25|
|     3|   Charlie| 35|
+------+----------+---+
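
When there are many columns to rename, writing out the chain by hand gets tedious. The sketch below folds a dict of old-to-new names into the same chain of withColumnRenamed() calls. The helper name `rename_many` is hypothetical, not a PySpark function. On Spark 3.4 and later, DataFrame.withColumnsRenamed() accepts a dict and should cover the same case directly.

```python
from functools import reduce


def rename_many(df, mapping):
    """Apply withColumnRenamed() once per (old, new) pair in `mapping`.

    Hypothetical helper; equivalent to chaining the calls by hand.
    """
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        mapping.items(),
        df,
    )


# Usage with the sample DataFrame defined earlier:
# df2 = rename_many(df, {"name": "first_name", "id": "emp_id"})
# df2.show()
```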



**Method 2: Using selectExpr():**

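You can rename columns with selectExpr() by writing each rename as a SQL-style alias, such as "name as full_name". Only the columns listed in the call appear in the result, so include every column you want to keep.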

In [0]:
# renaming columns using selectExpr()
df3 = df.selectExpr("id", "name as full_name", "age as years")
df3.show()

+---+---------+-----+
| id|full_name|years|
+---+---------+-----+
|  1|    Alice|   30|
|  2|      Bob|   25|
|  3|  Charlie|   35|
+---+---------+-----+
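
Because selectExpr() takes plain strings, the alias expressions can be generated from a mapping instead of typed out. The sketch below is one way to do that; `alias_exprs` is a hypothetical helper, and it wraps names in backticks on the assumption that no column name itself contains a backtick.

```python
def alias_exprs(columns, mapping):
    """Return one SQL expression per column, aliasing those in `mapping`.

    Columns missing from `mapping` are passed through unchanged, so
    nothing is dropped from the result.
    """
    return [
        f"`{c}` as `{mapping[c]}`" if c in mapping else f"`{c}`"
        for c in columns
    ]


exprs = alias_exprs(["id", "name", "age"], {"name": "full_name", "age": "years"})
print(exprs)
# ['`id`', '`name` as `full_name`', '`age` as `years`']

# Usage with the sample DataFrame defined earlier:
# df3 = df.selectExpr(*alias_exprs(df.columns, {"name": "full_name", "age": "years"}))
```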



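**Method 3: Using toDF():**

You can also rename all the columns in a DataFrame at once by passing the new names to toDF() as positional arguments, one per column and in the existing column order. The number of names must match the number of columns.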

In [0]:
# Renaming DataFrame Columns
df4 = df.toDF("ID", "Full_Name", "Age")
df4.show()

+---+---------+---+
| ID|Full_Name|Age|
+---+---------+---+
|  1|    Alice| 30|
|  2|      Bob| 25|
|  3|  Charlie| 35|
+---+---------+---+
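
toDF() is convenient when the new names are derived from the old ones by a rule, for example normalizing headers to snake_case. The sketch below shows one such rule; `to_snake_case` is a hypothetical helper and its regular expressions are only one reasonable choice. If two columns normalize to the same name, the result would have duplicate column names, so check for that on real data.

```python
import re


def to_snake_case(name):
    """Convert names such as 'Full Name' or 'empAge' to snake_case."""
    # Replace runs of spaces or punctuation with a single underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    # Insert an underscore at lower-to-upper boundaries (camelCase).
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return name.strip("_").lower()


new_names = [to_snake_case(c) for c in ["ID", "Full Name", "empAge"]]
print(new_names)
# ['id', 'full_name', 'emp_age']

# Usage with a DataFrame such as df4 above:
# df4_clean = df4.toDF(*[to_snake_case(c) for c in df4.columns])
```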



**Method 4: Using alias() (for individual columns within select()):**

If you are using select() to choose specific columns, you can rename any of them in the same call with alias(). Columns that are not selected are dropped from the result.

In [0]:
df5 = df.select(df.id, df.name.alias("emp_name"))
df5.show()

+---+--------+
| id|emp_name|
+---+--------+
|  1|   Alice|
|  2|     Bob|
|  3| Charlie|
+---+--------+
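
The same select-and-alias pattern can be driven by a dict when the list of columns is long. In the sketch below, the dict keys are the columns to keep and the values are their output names. `select_as` is a hypothetical helper, not a PySpark API; it uses df[...] indexing so that no extra imports are needed.

```python
def select_as(df, mapping):
    """Select only the columns named in `mapping`, aliased to the mapped names.

    Columns are emitted in the dict's insertion order.
    """
    return df.select(*[df[old].alias(new) for old, new in mapping.items()])


# Usage with the sample DataFrame defined earlier:
# df5 = select_as(df, {"id": "id", "name": "emp_name"})
# df5.show()
```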



#### Conclusion:

There are various ways to rename columns in PySpark DataFrames, and the method you choose depends on your specific use case. Here's a summary of the approaches:

- **withColumnRenamed():** Renames one column per call; chain calls to rename several.

- **selectExpr():** SQL-like renaming, good for multiple columns.

- **toDF():** Rename all columns in one go.

- **alias():** Renaming in a select() query for specific columns.