### Adding a New Column to a DataFrame

Write code to add a new column, `Country`, to the DataFrame using different methods in PySpark. 

The Country column values should be based on the following mapping:
- New York: USA
- London: UK
- Sydney: Australia


In [0]:
# Imports
from pyspark.sql import SparkSession

# Initialize the Spark session
spark = SparkSession.builder.appName("DataFrame Example").getOrCreate()

# Sample data
data = [("John", 28, "New York"), ("Sarah", 24, "London"), ("Michael", 30, "Sydney")]
columns = ["Name", "Age", "City"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

df.show()

+-------+---+--------+
|   Name|Age|    City|
+-------+---+--------+
|   John| 28|New York|
|  Sarah| 24|  London|
|Michael| 30|  Sydney|
+-------+---+--------+



**Using `withColumn()` and `lit()`:** Adding a constant value

In [0]:
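from pyspark.sql.functions import lit

# lit() wraps a Python literal as a Column, so every row gets the same value
country_df = df.withColumn("Country", lit("Unknown"))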
country_df.show()

+-------+---+--------+-------+
|   Name|Age|    City|Country|
+-------+---+--------+-------+
|   John| 28|New York|Unknown|
|  Sarah| 24|  London|Unknown|
|Michael| 30|  Sydney|Unknown|
+-------+---+--------+-------+



**Using a dictionary and `replace()`:** Replacing the `City` values and renaming the column to `Country`

In [0]:
# Define the mapping dictionary
city_to_country = {
    "New York": "USA",
    "London": "UK",
    "Sydney": "Australia"
}

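# replace() swaps the values inside City; the rename then turns City into Country,
# so the original City column is not kept in the result
df_with_country = df.replace(city_to_country, subset=["City"]).withColumnRenamed("City", "Country")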
df_with_country.show()

+-------+---+---------+
|   Name|Age|  Country|
+-------+---+---------+
|   John| 28|      USA|
|  Sarah| 24|       UK|
|Michael| 30|Australia|
+-------+---+---------+



**Using `selectExpr()`**

In [0]:
country_df_2 = df.selectExpr("*", "'Unknown' as Country")
country_df_2.show()

+-------+---+--------+-------+
|   Name|Age|    City|Country|
+-------+---+--------+-------+
|   John| 28|New York|Unknown|
|  Sarah| 24|  London|Unknown|
|Michael| 30|  Sydney|Unknown|
+-------+---+--------+-------+



**Using `select()`**

In [0]:
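from pyspark.sql.functions import lit

# Select all existing columns plus a literal column aliased as Country
country_df_3 = df.select("*", lit("Unknown").alias("Country"))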
country_df_3.show()

+-------+---+--------+-------+
|   Name|Age|    City|Country|
+-------+---+--------+-------+
|   John| 28|New York|Unknown|
|  Sarah| 24|  London|Unknown|
|Michael| 30|  Sydney|Unknown|
+-------+---+--------+-------+



**Using `withColumn()` with `when()` and `otherwise()`:** Adding a column based on conditions

In [0]:
from pyspark.sql.functions import col, when

# Map each city to its country; any other city falls back to "Unknown"
country_df_4 = df.withColumn(
    "Country",
    when(col("City") == "New York", "USA")
    .when(col("City") == "London", "UK")
    .when(col("City") == "Sydney", "Australia")
    .otherwise("Unknown"),
)

country_df_4.show()

+-------+---+--------+---------+
|   Name|Age|    City|  Country|
+-------+---+--------+---------+
|   John| 28|New York|      USA|
|  Sarah| 24|  London|       UK|
|Michael| 30|  Sydney|Australia|
+-------+---+--------+---------+

