## PySpark withColumn() Usage

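PySpark `withColumn()` is a DataFrame transformation that returns a new DataFrame with a column added or replaced. It is commonly used to change the values of an existing column, convert its data type, or create a new column. This notebook also covers the related `withColumnRenamed()` and `drop()` methods.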

In [0]:
dbutils.library.restartPython()  # Removes Python state, but some libraries might not work without calling this command.

#### Load libraries

In [0]:
from pyspark.sql import SparkSession, Row
from pyspark.sql.types import IntegerType, DateType, StringType, StructType, StructField, ArrayType, MapType, DoubleType
from pyspark.sql.functions import lit, col, expr, when

#### Create Spark session

In [0]:
spark = SparkSession.builder.appName('PySpark withColumn() Usage').getOrCreate()

In [0]:
data = [
  ('John', '', 'Smith', '36636', 'M', 2500.0),
  ('Jane', '', 'Doe', '42114', 'F', 500.0),
  ('Richard', 'Laurence', 'Marquette', '97086', 'M', 1500.0),
  ('Israel', '', 'Israeli', '', 'M', 3000.0),
  ('Edward', 'III', '', 'SL4', 'M', 5000.0)
]
 
schema = StructType([
  StructField('firstname', StringType(), True),
  StructField('middlename', StringType(), True),
  StructField('lastname', StringType(), True),
  StructField('zip', StringType(), True),
  StructField('gender', StringType(), True),
  StructField('salary', DoubleType(), True)
])

columns = schema.fieldNames()

df = spark.createDataFrame(data=data, schema=schema)
df.printSchema()
df.show(truncate=False)

#### Change DataType

In [0]:
df.withColumn('salary', col('salary').cast(IntegerType())).show()

#### Update the Value of an Existing Column

In [0]:
df.withColumn('salary', col('salary') * 100).show()

#### Create a Column from an Existing Column

In [0]:
df.withColumn('taxes', col('salary') * 0.2).show()

#### Add a New Column

In [0]:
df.withColumn('country', lit('USA')).show()

#### Rename Column

In [0]:
df.withColumnRenamed('gender', 'sex').show(truncate=False)

#### Drop Column

In [0]:
df.drop('salary').show()

#### The end of the notebook