### ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Replacing the values of a column

In [0]:
from pyspark.sql.functions import lit, concat, col

employee_data = [(10,"Raj","Kumar","1999","100","M",2000),
                 (20,"Rahul","Rajan","2002","200","f",2000),
                 (30,"Raghav","Manish","2010","100",None,2000),
                 (40,"Raja","Singh","2004","100","F",2000),
                 (50,"Rama","Krish","2008","400","M",2000),
                 (60,"Rasul","Kutty","2014","500","M",2000),
                 (70,"Kumar","Chand","2004","600","M",2000)
                ]
employee_schema = ["employee_id","first_name","last_name","doj",
                   "employee_dept_id","gender","salary"]

df = spark.createDataFrame(data=employee_data, schema=employee_schema)
df.printSchema()

root
 |-- employee_id: long (nullable = true)
 |-- first_name: string (nullable = true)
 |-- last_name: string (nullable = true)
 |-- doj: string (nullable = true)
 |-- employee_dept_id: string (nullable = true)
 |-- gender: string (nullable = true)
 |-- salary: long (nullable = true)



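The `.withColumn` method can replace the values of a column: when the name passed to it matches an existing column, that column is overwritten instead of a new one being added.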

#### Same data type

The column must be given the same name as the original column. Here every `first_name` value is replaced with the string literal `'Chile'`.

In [0]:
df.withColumn('first_name',lit('Chile')).show(truncate=False)

+-----------+----------+---------+----+----------------+------+------+
|employee_id|first_name|last_name|doj |employee_dept_id|gender|salary|
+-----------+----------+---------+----+----------------+------+------+
|10         |Chile     |Kumar    |1999|100             |M     |2000  |
|20         |Chile     |Rajan    |2002|200             |f     |2000  |
|30         |Chile     |Manish   |2010|100             |null  |2000  |
|40         |Chile     |Singh    |2004|100             |F     |2000  |
|50         |Chile     |Krish    |2008|400             |M     |2000  |
|60         |Chile     |Kutty    |2014|500             |M     |2000  |
|70         |Chile     |Chand    |2004|600             |M     |2000  |
+-----------+----------+---------+----+----------------+------+------+



#### Different data type

- The column must be given the same name as the original column.
- The values can also be changed to a different data type. In this example, the original data type was STRING and the values are replaced with a value of type INTEGER.

In [0]:
df_modif = df.withColumn('first_name',lit(100))

df_modif.printSchema()

df_modif.show(truncate=False)

root
 |-- employee_id: long (nullable = true)
 |-- first_name: integer (nullable = false)
 |-- last_name: string (nullable = true)
 |-- doj: string (nullable = true)
 |-- employee_dept_id: string (nullable = true)
 |-- gender: string (nullable = true)
 |-- salary: long (nullable = true)

+-----------+----------+---------+----+----------------+------+------+
|employee_id|first_name|last_name|doj |employee_dept_id|gender|salary|
+-----------+----------+---------+----+----------------+------+------+
|10         |100       |Kumar    |1999|100             |M     |2000  |
|20         |100       |Rajan    |2002|200             |f     |2000  |
|30         |100       |Manish   |2010|100             |null  |2000  |
|40         |100       |Singh    |2004|100             |F     |2000  |
|50         |100       |Krish    |2008|400             |M     |2000  |
|60         |100       |Kutty    |2014|500             |M     |2000  |
|70         |100       |Chand    |2004|600             |M     |2000  |
+