## Trimming Characters from Strings
Let us go through how to trim unwanted characters using Spark functions.

* We typically use trimming to remove unnecessary characters from fixed-length records.
* Fixed-length records are used extensively on mainframes, and we might have to process them using Spark.
* As part of processing, we might want to remove leading or trailing padding characters, such as `0` in numeric fields and spaces or some other standard padding character in alphanumeric fields.
* At the time of writing, the trim functions in `pyspark.sql.functions` take only the column as an argument and remove leading or trailing spaces. However, we can use `expr` or `selectExpr` to call the Spark SQL trim functions, which can remove leading or trailing spaces or any other such characters.
  * Trim spaces towards left - `ltrim`
  * Trim spaces towards right - `rtrim`
  * Trim spaces on both sides - `trim`
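
Running the Spark functions requires a session, so here is a plain-Python model that pins down the semantics described above. It is a sketch under stated assumptions, not Spark code. The helper names `spark_ltrim`, `spark_rtrim` and `spark_trim` are hypothetical. The model assumes that without a trim string only the space character is removed, as the Spark SQL usage text says. It also assumes that an explicit trim string is treated as a set of characters rather than a substring, much like Python's `str.strip(chars)`.

```python
from typing import Optional

# Hypothetical helpers modelling Spark's trim semantics in plain Python.
# Assumptions: the default trims only ' ' (not tabs or newlines), an explicit
# trim string is a *set* of characters, and None stands in for SQL NULL.

def spark_ltrim(s: Optional[str], trim_str: str = " ") -> Optional[str]:
    """Remove leading characters that appear in trim_str."""
    return None if s is None else s.lstrip(trim_str)

def spark_rtrim(s: Optional[str], trim_str: str = " ") -> Optional[str]:
    """Remove trailing characters that appear in trim_str."""
    return None if s is None else s.rstrip(trim_str)

def spark_trim(s: Optional[str], trim_str: str = " ") -> Optional[str]:
    """Remove characters that appear in trim_str from both ends."""
    return None if s is None else s.strip(trim_str)

if __name__ == "__main__":
    value = "   Hello.    "
    print(repr(spark_ltrim(value)))  # 'Hello.    '
    print(repr(spark_rtrim(value)))  # '   Hello.'
    print(repr(spark_trim(value)))   # 'Hello.'
    # Leading zeros in a numeric field taken from a fixed-length record
    print(spark_ltrim("000123", "0"))  # 123
```

Python's `str.strip()` with no argument removes all whitespace, including tabs and newlines. For that reason, the model passes `' '` explicitly to stay closer to Spark's documented "space characters" behavior.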

### Tasks - Trimming Strings

Let us understand how to use the trim functions to remove spaces on the left, the right, or both sides.
* Create a DataFrame with one column and one record.
* Apply the trim functions to trim spaces.

In [1]:
l = [("   Hello.    ",) ]

In [2]:
df = spark.createDataFrame(l).toDF("dummy")

In [3]:
df.show()

+-------------+
|        dummy|
+-------------+
|   Hello.    |
+-------------+

In [4]:
from pyspark.sql.functions import col, ltrim, rtrim, trim

In [5]:
df.withColumn("ltrim", ltrim(col("dummy"))). \
  withColumn("rtrim", rtrim(col("dummy"))). \
  withColumn("trim", trim(col("dummy"))). \
  show()

+-------------+----------+---------+------+
|        dummy|     ltrim|    rtrim|  trim|
+-------------+----------+---------+------+
|   Hello.    |Hello.    |   Hello.|Hello.|
+-------------+----------+---------+------+

In [6]:
from pyspark.sql.functions import expr

In [7]:
spark.sql('DESCRIBE FUNCTION rtrim').show(truncate=False)

+-----------------------------------------------------------------------------+
|function_desc                                                                |
+-----------------------------------------------------------------------------+
|Function: rtrim                                                              |
|Class: org.apache.spark.sql.catalyst.expressions.StringTrimRight             |
|Usage: 
    rtrim(str) - Removes the trailing space characters from `str`.
  |
+-----------------------------------------------------------------------------+

In [8]:
# If trimStr is not specified, it defaults to a space.
# Here the outer two-argument rtrim passes '.' as trimStr.
df.withColumn("ltrim", expr("ltrim(dummy)")). \
  withColumn("rtrim", expr("rtrim('.', rtrim(dummy))")). \
  withColumn("trim", trim(col("dummy"))). \
  show()

+-------------+----------+--------+------+
|        dummy|     ltrim|   rtrim|  trim|
+-------------+----------+--------+------+
|   Hello.    |Hello.    |   Hello|Hello.|
+-------------+----------+--------+------+

22/02/26 03:15:10 WARN Analyzer$ResolveFunctions: Two-parameter TRIM/LTRIM/RTRIM function signatures are deprecated. Use SQL syntax `TRIM((BOTH | LEADING | TRAILING)? trimStr FROM str)` instead.
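
The nested call in `In [8]` works in two passes. The inner `rtrim(dummy)` strips the trailing spaces, and then the outer two-argument call strips the trailing `.`. Stripping only `.` from the raw value would do nothing, because the value ends in spaces. The plain-Python sketch below models this with `str.rstrip`. The helper name `rtrim_chars` is hypothetical, and the sketch assumes Spark treats the trim string as a set of characters. It also shows that a single pass with the trim set `' .'` gives the same result for this value. In Spark, that single pass could be written with the non-deprecated syntax `trim(TRAILING ' .' FROM dummy)`, which the warning above recommends.

```python
def rtrim_chars(s: str, trim_str: str = " ") -> str:
    """Hypothetical model of Spark's rtrim: drop trailing chars found in trim_str."""
    return s.rstrip(trim_str)

dummy = "   Hello.    "

# Trimming only '.' from the raw value changes nothing: the string ends in spaces.
only_dot = rtrim_chars(dummy, ".")

# Two passes, as in rtrim('.', rtrim(dummy)): spaces first, then the period.
two_pass = rtrim_chars(rtrim_chars(dummy), ".")

# One pass with both characters in the trim set,
# modelling trim(TRAILING ' .' FROM dummy) in Spark SQL.
one_pass = rtrim_chars(dummy, " .")

print(repr(only_dot))  # '   Hello.    '
print(repr(two_pass))  # '   Hello'
print(repr(one_pass))  # '   Hello'
```

The two forms are not always equivalent. For a value like `'Hi. .'`, the two-pass version stops at the inner space and returns `'Hi. '`, while the single pass returns `'Hi'`.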


In [9]:
spark.sql('DESCRIBE FUNCTION trim').show(truncate=False)


In [10]:
df.withColumn("ltrim", expr("trim(LEADING ' ' FROM dummy)")). \
  withColumn("rtrim", expr("trim(TRAILING '.' FROM rtrim(dummy))")). \
  withColumn("trim", expr("trim(BOTH ' ' FROM dummy)")). \
  show()

+-------------+----------+--------+------+
|        dummy|     ltrim|   rtrim|  trim|
+-------------+----------+--------+------+
|   Hello.    |Hello.    |   Hello|Hello.|
+-------------+----------+--------+------+
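
The same `trim(LEADING | TRAILING | BOTH trimStr FROM str)` syntax covers the fixed-length record use case mentioned at the start: strip zero padding from numeric fields and space padding from alphanumeric fields. The plain-Python sketch below models that cleanup for a hypothetical record layout. The field names, offsets, widths, and the helpers `sql_trim` and `parse_record` are invented for illustration. It again assumes Spark's set-of-characters trim semantics. In Spark itself, the equivalent would be expressions such as `trim(LEADING '0' FROM amount)` inside `expr` or `selectExpr`.

```python
# Hypothetical fixed-length layout: (field name, start offset, width).
LAYOUT = [("cust_name", 0, 10), ("amount", 10, 8), ("status", 18, 2)]

# Per-field trim rule, mirroring trim(<mode> <trimStr> FROM <field>).
TRIM_RULES = {
    "cust_name": ("TRAILING", " "),  # alphanumeric, right-padded with spaces
    "amount": ("LEADING", "0"),      # numeric, left-padded with zeros
    "status": ("BOTH", " "),
}

def sql_trim(value: str, mode: str, trim_str: str) -> str:
    """Model of Spark SQL trim(mode trim_str FROM value)."""
    if mode == "LEADING":
        return value.lstrip(trim_str)
    if mode == "TRAILING":
        return value.rstrip(trim_str)
    if mode == "BOTH":
        return value.strip(trim_str)
    raise ValueError(f"unknown trim mode: {mode}")

def parse_record(line: str) -> dict:
    """Slice a fixed-length line into fields and trim each one."""
    parsed = {}
    for name, start, width in LAYOUT:
        mode, trim_str = TRIM_RULES[name]
        parsed[name] = sql_trim(line[start:start + width], mode, trim_str)
    return parsed

record = "ALICE     " + "00012345" + "A "
print(parse_record(record))
# {'cust_name': 'ALICE', 'amount': '12345', 'status': 'A'}
```

A field made up entirely of zeros, such as `'00000000'`, trims to an empty string. For purely numeric fields, a cast (for example `CAST(amount AS INT)` in Spark SQL) may be a better fit, since it yields `0`.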

