## Overview of Functions

Let us get an overview of different functions that are available to process data in columns.
* While Data Frame APIs work on the Data Frame, at times we might want to apply functions on column values.
* Functions to process column values are available under `org.apache.spark.sql.functions`. These are typically used in `select` or `withColumn` on top of a Data Frame.
* There are approximately 300 pre-defined functions available for us.
* Some of the important functions can be broadly categorized into String Manipulation, Date Manipulation, Numeric Functions and Aggregate Functions.
* String Manipulation Functions
  * Concatenating Strings - `concat`
  * Getting Length - `length`
  * Trimming Strings - `trim`, `rtrim`, `ltrim`
  * Padding Strings - `lpad`, `rpad`
  * Extracting Strings - `split`, `substring`
* Date Manipulation Functions
  * Date Arithmetic - `date_add`, `date_sub`, `datediff`, `add_months`
  * Date Extraction - `dayofmonth`, `month`, `year`
  * Getting Beginning of Period - `trunc`, `date_trunc`
* Numeric Functions - `abs`, `greatest`
* Aggregate Functions - `sum`, `min`, `max`
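
To make the categories above concrete, here is a minimal sketch that applies one or two functions from each group. The `sampleDF` Data Frame, its column names and its values are hypothetical and exist only for this illustration, and the local `SparkSession` is an assumption (in a notebook or `spark-shell`, `spark` is usually already available).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{abs, concat, date_add, length, lit, lpad, max, sum, to_date, year}

// Assumption: a local session for illustration only
val spark = SparkSession.builder.
    master("local[1]").
    appName("functions-overview").
    getOrCreate()
import spark.implicits._

// Hypothetical sample data, not part of the original material
val sampleDF = Seq(
    (1, "Scott", "Tiger", -1000.0, "2021-01-15"),
    (2, "Henry", "Ford", 1250.0, "2021-02-28")
).toDF("id", "first_name", "last_name", "amount", "joined")

// Row-level functions: each one takes Columns and returns a new Column
val resultDF = sampleDF.select(
    $"id",
    concat($"first_name", lit(" "), $"last_name").alias("full_name"),  // string: concatenate
    length($"first_name").alias("first_name_length"),                  // string: length
    lpad($"id".cast("string"), 4, "0").alias("padded_id"),             // string: left pad
    year(to_date($"joined")).alias("joined_year"),                     // date: extraction
    date_add(to_date($"joined"), 7).alias("joined_plus_7"),            // date: arithmetic
    abs($"amount").alias("abs_amount")                                 // numeric
)
resultDF.show()

// Aggregate functions: collapse all rows into a single row
val totalsDF = sampleDF.agg(
    sum($"amount").alias("total_amount"),
    max($"amount").alias("max_amount")
)
totalsDF.show()
```

The row-level functions keep one output row per input row, while the aggregate functions reduce the whole Data Frame to a single row, which is why they are applied through `agg` rather than `select` here.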

### Tasks
Let us perform a task to understand how functions are typically used.

* Project the full name, built by concatenating first name and last name, along with the other fields. The first name and last name columns themselves should be excluded from the output.

In [None]:
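// lit wraps the literal separator as a Column so that it can be passed to concat
import org.apache.spark.sql.functions.{lit, concat}

// Add full_name as a new column, then drop the two source columns
employeesDF.
    withColumn("full_name", concat($"first_name", lit(", "), $"last_name")).
    drop("first_name", "last_name").
    show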

In [None]:
// Alternative using select: the columns to keep are listed explicitly
employeesDF.
    select($"employee_id",
           concat($"first_name", lit(", "), $"last_name").alias("full_name"),
           $"salary",
           $"nationality"
          ).
    show

In [None]:
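// Alternative using selectExpr: the projection is written as SQL-style expressions, so lit is not needed
employeesDF.
    selectExpr("employee_id",
               "concat(first_name, ', ', last_name) AS full_name",
               "salary",
               "nationality"
              ).
    show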

**We will explore most of these functions later, as we get into data processing.**