## Using Row on DataFrame and RDD

`pyspark.sql.Row` represents a single record (row) in a DataFrame. You can create a Row object from positional or named arguments, or define a custom Row-like class with fixed field names.

Let's see how to create Row objects with named arguments, how to define a custom Row-like class, and how to use both with a DataFrame and an RDD.

In [0]:
dbutils.library.restartPython() # Removes Python state, but some libraries might not work without calling this command.

#### Load libraries

In [0]:
from pyspark.sql import SparkSession, Row
from pyspark.sql.types import IntegerType, DateType, StringType, StructType, StructField

#### Create Spark session

In [0]:
spark = SparkSession.builder.appName('Row using on DataFrame and RDD').getOrCreate()

#### Create a Row Object

In [0]:
# Positional values only: fields are accessed by index
row=Row('Elijah',36)
print(f'{row[0]}, {row[1]}')

In [0]:
# Named arguments: fields are accessed by name
row=Row(name='Gregory', age=9)
print(f'{row.name}, {row.age}')

#### Create Custom Class from Row

In [0]:
# A Row built from field names acts as a reusable Row-like class
Person = Row('name', 'age')
p1=Person('Elijah',36)
p2=Person('Gregory',9)
print(f'{p1.name}, {p2.name}')

#### Using Row class on PySpark RDD

In [0]:
data = [
  Row(name='John,,Doe',lang=['Python','R','SQL'],country='USA'), 
  Row(name='Richard,Lionheart,',lang=['C#','C','C++'],country='England'),
  Row(name='Oscar,Fingal O\'Flahertie Wills,Wilde',lang=['Kotlin','JavaScript'],country='Ireland')
]

In [0]:
rdd=spark.sparkContext.parallelize(data)
print(rdd.collect())

In [0]:
# Let’s collect the data and access the data using its properties.
collData=rdd.collect()
for row in collData:
    print(f'{row.name}, {row.lang}')

In [0]:
# Same data built with a custom Person class (redefined here with three fields)
Person=Row('name','lang','country')
data = [
  Person('John,,Doe',['Python','R','SQL'],'USA'), 
  Person('Richard,Lionheart,',['C#','C','C++'],'England'),
  Person('Oscar,Fingal O\'Flahertie Wills,Wilde',['Kotlin','JavaScript'],'Ireland')
]
rdd=spark.sparkContext.parallelize(data)
print(rdd.collect())

#### Using Row class on PySpark DataFrame

In [0]:
data = [
  Row(name='John,,Doe',lang=['Python','R','SQL'],country='USA'), 
  Row(name='Richard,Lionheart,',lang=['C#','C','C++'],country='England'),
  Row(name='Oscar,Fingal O\'Flahertie Wills,Wilde',lang=['Kotlin','JavaScript'],country='Ireland')
]

In [0]:
df=spark.createDataFrame(data)
df.printSchema()
df.show()

In [0]:
data = [
  Row('John,,Doe',['Python','R','SQL'],'USA'), 
  Row('Richard,Lionheart,',['C#','C','C++'],'England'),
  Row('Oscar,Fingal O\'Flahertie Wills,Wilde',['Kotlin','JavaScript'],'Ireland')
]

In [0]:
# Apply explicit column names, for example when the Rows were created without field names
columns = ['name','languagesAtSchool','currentState']
df=spark.createDataFrame(data).toDF(*columns)
df.printSchema()

#### Create Nested Struct Using Row Class

In [0]:
data=[
  Row(name='John',prop=Row(hair='black',eye='brown')),
  Row(name='Marie',prop=Row(hair='blond',eye='black'))
]

df=spark.createDataFrame(data)
df.printSchema()

#### The end of the notebook