## PySpark sort() and orderBy() Usage

PySpark `sort()` returns a new DataFrame sorted by the specified column(s).
`orderBy()` is an alias for `sort()`, so the two methods can be used interchangeably.

In [0]:
# Removes Python state, but some libraries might not work without calling this command.
dbutils.library.restartPython()

#### Load libraries

In [0]:
from pyspark.sql import SparkSession, Row
from pyspark.sql.types import IntegerType, DateType, StringType, StructType, StructField, ArrayType, MapType, DoubleType
from pyspark.sql.functions import lit, col, expr, when

#### Create Spark session

In [0]:
spark = SparkSession.builder.appName('PySpark sort() and orderBy() Usage').getOrCreate()

In [0]:
data = [
    ('Sam', 'Software Engineer', 'US', 5000),
    ('Adam', 'Data Scientist', 'US', 6000),
    ('Jonas', 'Sales Person', 'Wales', 5000),
    ('Peter', 'CTO', 'Ireland', 10000),
    ('Ann', 'Data Analyst', 'Australia', 6000),
    ('Ralph', 'CEO', 'Germany', 15000),
    ('Lekhana', 'Advertising', 'England', 4500),
    ('Tomas', 'Marketing', 'Hungary', 4500),
    ('Nick', 'Data Engineer', 'Ireland', 5000),
    ('Wade', 'Data Engineer', 'Scotland', 5500)
]

columns = ['name', 'job', 'country', 'salary']

df = spark.createDataFrame(data = data, schema = columns)

df.printSchema()
df.show(truncate=False)

#### Sorting the data frame by a single column

In [0]:
# Ascending sort on salary; `ascending` takes one flag per sort column.
df.sort(['salary'], ascending=[True]).show(truncate=False)

In [0]:
# Same ascending sort, with the flag written as 1 instead of True.
df.sort(['salary'], ascending=[1]).show(truncate=False)

#### Sorting the data frame by more than one column

In [0]:
# Job descending, then salary ascending; column names match the schema's lowercase spelling.
df.sort(col('job').desc(), col('salary').asc()).show()

In [0]:
# Same ordering using a list of column names: 0 means descending, 1 means ascending.
df.sort(['job', 'salary'], ascending=[0, 1]).show()

#### The end of the notebook