In [None]:
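In PySpark, fillna() from the DataFrame class and fill() from DataFrameNaFunctions (reached through df.na) replace NULL/None values in all columns, or in a selected subset of columns, with zero (0), an empty string, a space, or any other constant literal value.
The two methods are aliases of each other. The fill value is only applied to columns whose type matches it: a numeric value fills numeric columns, and a string value fills string columns.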

In [None]:
If you pass fillna() a column name that doesn’t exist in the DataFrame, it will not raise an error.
The call simply has no effect on the DataFrame.

We can also specify different replacement values for different columns by passing a dict to fillna(); in that case the subset argument is ignored.
Example:
df.fillna({"zipcode": 0, "population": 50000}) fills missing values in the “zipcode” column with 0 and in the “population” column with 50000.

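fillna() itself is a lazy, narrow transformation: it adds a projection that Spark evaluates row by row, without a shuffle, so its own cost is usually small even on large DataFrames.
The expensive part is typically recomputing the upstream lineage (for example, re-reading and parsing the source files) every time an action runs. If the filled DataFrame is reused across several actions, cache or persist it so that work happens only once, and unpersist it when you are done.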

In [3]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[1]").appName("SparkByExamples.com").getOrCreate()

filePath="/home/jovyan/work/data/small_zipcode.csv"
df = spark.read.options(header='true', inferSchema='true').csv(filePath)

df.printSchema()
df.show(truncate=False)

# Replace null with 0 in all integer columns (a numeric fill value only affects numeric columns)
df.fillna(value=0).show()
# Replace null with 0 only in the population column
df.fillna(value=0, subset=["population"]).show()
# Same operation through DataFrameNaFunctions: replace null with 0 in all integer columns
df.na.fill(value=0).show()

# Replace null with 0 only in the population column
df.na.fill(value=0, subset=["population"]).show()

# Replace null/None with an empty string in all string columns
df.fillna(value="").show()
df.na.fill(value="").show()

# Different values per column: chain two calls, each restricted to one column...
df.fillna("unknown", ["city"]).fillna("", ["type"]).show()

# ...or pass a dict that maps column names to replacement values
df.fillna({"city": "unknown", "type": ""}).show()

# The same two variants through df.na
df.na.fill("unknown", ["city"]).na.fill("", ["type"]).show()

df.na.fill({"city": "unknown", "type": ""}).show()

root
 |-- id: integer (nullable = true)
 |-- zipcode: integer (nullable = true)
 |-- type: string (nullable = true)
 |-- city: string (nullable = true)
 |-- state: string (nullable = true)
 |-- population: integer (nullable = true)

+---+-------+--------+-------------------+-----+----------+
|id |zipcode|type    |city               |state|population|
+---+-------+--------+-------------------+-----+----------+
|1  |704    |STANDARD|null               |PR   |30100     |
|2  |704    |null    |PASEO COSTA DEL SUR|PR   |null      |
|3  |709    |null    |BDA SAN LUIS       |PR   |3700      |
|4  |76166  |UNIQUE  |CINGULAR WIRELESS  |TX   |84000     |
|5  |76177  |STANDARD|null               |TX   |null      |
+---+-------+--------+-------------------+-----+----------+

+---+-------+--------+-------------------+-----+----------+
| id|zipcode|    type|               city|state|population|
+---+-------+--------+-------------------+-----+----------+
|  1|    704|STANDARD|               null|   P