PySpark flatMap() is an RDD transformation that applies a function to every element, flattens the results,
and returns a new RDD. Unlike map(), which yields exactly one output per input, flatMap() can yield zero or more.
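
To make the difference from map() concrete, here is a minimal pure-Python sketch of "map, then flatten one level". It runs on a plain list rather than an RDD, and the helper name `flat_map` is hypothetical (it is not part of the PySpark API); it only illustrates the semantics.

```python
from itertools import chain

def flat_map(func, items):
    """Apply func to each item, then flatten the results by one level.

    Hypothetical helper for illustration only; not a Spark API.
    """
    return list(chain.from_iterable(func(item) for item in items))

lines = ["Project Gutenberg’s", "Adventures in Wonderland"]

# map-style: one output (a list of words) per input line
mapped = [line.split(" ") for line in lines]

# flatMap-style: the per-line lists are merged into one flat list of words
flattened = flat_map(lambda line: line.split(" "), lines)

print(mapped)
print(flattened)
```

The map-style result keeps one nested list per line, while the flatMap-style result is a single list of words.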

In [1]:
import pyspark
from pyspark.sql import SparkSession

In [2]:
spark = SparkSession.builder.appName("FlatMap-Example").getOrCreate()

In [3]:
data = ["Project Gutenberg’s",
        "Alice’s Adventures in Wonderland",
        "Project Gutenberg’s",
        "Adventures in Wonderland",
        "Project Gutenberg’s"]

In [4]:
rdd = spark.sparkContext.parallelize(data)

In [5]:
for element in rdd.collect():
    print(element)

Project Gutenberg’s
Alice’s Adventures in Wonderland
Project Gutenberg’s
Adventures in Wonderland
Project Gutenberg’s


In [6]:
# Split each line on spaces; flatMap merges the per-line word lists into one RDD of words
rdd2 = rdd.flatMap(lambda x: x.split(" "))
for element in rdd2.collect():
    print(element)

Project
Gutenberg’s
Alice’s
Adventures
in
Wonderland
Project
Gutenberg’s
Adventures
in
Wonderland
Project
Gutenberg’s


A PySpark DataFrame does not have a flatMap() transformation; however, the explode() SQL function
can be used to flatten an array or map column into rows. Below is a complete example.

In [7]:
from pyspark.sql.functions import explode

In [8]:
arrayData = [
        ('James',['Java','Scala'],{'hair':'black','eye':'brown'}),
        ('Michael',['Spark','Java',None],{'hair':'brown','eye':None}),
        ('Robert',['CSharp',''],{'hair':'red','eye':''}),
        ('Washington',None,None),
        ('Jefferson',['1','2'],{})]

In [9]:
df = spark.createDataFrame(data=arrayData,
                        schema = ['name', 'knownLanguages', 'properties'])

In [13]:
df.show()

+----------+--------------+--------------------+
|      name|knownLanguages|          properties|
+----------+--------------+--------------------+
|     James| [Java, Scala]|[eye -> brown, ha...|
|   Michael|[Spark, Java,]|[eye ->, hair -> ...|
|    Robert|    [CSharp, ]|[eye -> , hair ->...|
|Washington|          null|                null|
| Jefferson|        [1, 2]|                  []|
+----------+--------------+--------------------+



In [10]:
df2 = df.select(df.name, explode(df.knownLanguages))

In [11]:
df2.printSchema()

root
 |-- name: string (nullable = true)
 |-- col: string (nullable = true)



In [12]:
df2.show()

+---------+------+
|     name|   col|
+---------+------+
|    James|  Java|
|    James| Scala|
|  Michael| Spark|
|  Michael|  Java|
|  Michael|  null|
|   Robert|CSharp|
|   Robert|      |
|Jefferson|     1|
|Jefferson|     2|
+---------+------+
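
The output above shows how explode() treats missing values: a null element inside an array still produces a row (Michael), while a row whose array is null produces nothing (Washington). The following pure-Python sketch mimics that behaviour on plain tuples. The helper name `explode_rows` is hypothetical and is not a Spark API; it is only meant to make the row-level rule explicit.

```python
def explode_rows(rows):
    """Mimic explode() on (name, array) pairs.

    Hypothetical helper for illustration only; not a Spark API.
    """
    result = []
    for name, values in rows:
        if not values:
            # A None (null) or empty array yields no output rows
            continue
        for value in values:
            # None elements inside the array are kept as their own row
            result.append((name, value))
    return result

rows = [
    ("James", ["Java", "Scala"]),
    ("Michael", ["Spark", "Java", None]),
    ("Washington", None),
]

exploded = explode_rows(rows)
for row in exploded:
    print(row)
```

If rows with a null array should be kept instead of dropped, pyspark.sql.functions also provides explode_outer(), which emits a single row with a null value for them.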

