## PySpark map() Transformation

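PySpark `map()` is an RDD transformation that applies a function (often a `lambda`) to every element of an RDD and returns a new RDD with exactly one output element per input element. A PySpark DataFrame has no `map()` method of its own, so to use it on a DataFrame you first convert the DataFrame to an RDD of `Row` objects with `df.rdd`.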

In [0]:
dbutils.library.restartPython()  # Removes Python state, but some libraries might not work without calling this command.

#### Load libraries

In [0]:
from pyspark.sql import SparkSession, Row
from pyspark.sql.types import IntegerType, DateType, StringType, StructType, StructField, ArrayType, MapType, DoubleType
from pyspark.sql import functions as F  # namespaced import, so Spark's sum/max/min do not shadow the Python built-ins

#### Create Spark session

In [0]:
spark = SparkSession.builder.appName('PySpark map() Transformation').getOrCreate()

#### Example with RDD

In [0]:
data = [
  'Some',
  'random',
  'data'
]

rdd = spark.sparkContext.parallelize(data)

In [0]:
# Pair every word with the number 1; map() is lazy, so nothing runs until collect() is called
rdd2 = rdd.map(lambda x: (x, 1))

for element in rdd2.collect():
  print(element)
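
The cell above pairs each word with `1`. As a rough mental model only, the sketch below mimics that behaviour in plain Python, with no Spark involved. The nested list standing in for an RDD's partitions and the helper name `map_elementwise` are assumptions made for illustration; this is not how Spark implements `map()` internally.

```python
# Plain-Python stand-in for an RDD: a list of partitions (illustrative assumption).
partitions = [['Some', 'random'], ['data']]

def map_elementwise(parts, fn):
    """Apply fn to every element while keeping the partition layout unchanged."""
    return [[fn(x) for x in part] for part in parts]

mapped = map_elementwise(partitions, lambda x: (x, 1))

# Flatten the partitions in order, roughly what collect() gathers on the driver.
flat = [item for part in mapped for item in part]
print(flat)  # [('Some', 1), ('random', 1), ('data', 1)]
```

The point of the sketch is that the output has the same number of elements as the input, in the same order, and the original data is left untouched.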

#### Example with DataFrame

In [0]:
data = [
  ('James', 'Smith', 'M', 30),
  ('Anna', 'Rose', 'F', 36),
  ('Robert', 'Williams', 'M', 21),
]

columns = ['firstname', 'lastname', 'gender', 'salary']

df = spark.createDataFrame(data=data, schema=columns)
df.show()

In [0]:
# Row fields are read by position here (x[0]); you can also use the column name, as in x["firstname"] or x.firstname
rdd2 = df.rdd.map(lambda x: (f'{x[0]},{x[1]}', x[2], x[3] * 10))
df2 = rdd2.toDF(['name', 'gender', 'new_salary'])
df2.show()

In [0]:
# Alternatively, pass a named function to map() instead of a lambda
def func1(x):
    firstName = x.firstname
    lastName = x.lastname
    name = f'{firstName},{lastName}'
    gender = x.gender.lower()
    salary = x.salary * 10
    return (name, gender, salary)

rdd3 = df.rdd.map(func1)  # func1 can be passed directly; no wrapping lambda is needed
df3 = rdd3.toDF(['name', 'gender', 'new_salary'])
df3.show()
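
Because the function handed to `map()` is an ordinary Python function, its logic can be checked on a single record before running it on a cluster. The sketch below is one possible way to do that without Spark. The `Person` namedtuple is a hypothetical stand-in for a `Row` (it supports the same attribute and index access used in this notebook), and `func1` is re-declared here with the same logic only so the snippet is self-contained.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row, used only so the sketch runs without Spark.
Person = namedtuple('Person', ['firstname', 'lastname', 'gender', 'salary'])

def func1(x):
    """Same logic as the notebook's func1: join the names, lower-case gender, scale salary."""
    name = f'{x.firstname},{x.lastname}'
    gender = x.gender.lower()
    salary = x.salary * 10
    return (name, gender, salary)

sample = Person('James', 'Smith', 'M', 30)
result = func1(sample)
print(result)  # ('James,Smith', 'm', 300)
```

Keeping the per-row logic in a named function like this is a design choice that makes the transformation easier to test and reuse than an inline `lambda`.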

#### The end of the notebook