### Add leading zeros to a column using PySpark APIs

#### Creating a DataFrame from sample data

In [0]:
fruits=[["Apples",20],["Banana",10],["Mangos",25],["Orange",10],["Papaya",2]]
df=spark.createDataFrame(fruits,["Fruits","quantity"])
df.show()

+------+--------+
|Fruits|quantity|
+------+--------+
|Apples|      20|
|Banana|      10|
|Mangos|      25|
|Orange|      10|
|Papaya|       2|
+------+--------+



### Using concat and substring APIs

In [0]:
from pyspark.sql import functions as f

concat_col = f.concat(f.lit('00'), f.col('quantity'))
substring_col = f.substring(concat_col, -3, 3)

(
  df.withColumn("with_concat", concat_col)
  .withColumn("substring", substring_col)
).show()

+------+--------+-----------+---------+
|Fruits|quantity|with_concat|substring|
+------+--------+-----------+---------+
|Apples|      20|       0020|      020|
|Banana|      10|       0010|      010|
|Mangos|      25|       0025|      025|
|Orange|      10|       0010|      010|
|Papaya|       2|        002|      002|
+------+--------+-----------+---------+



Here we use the 'concat' and 'substring' PySpark APIs to produce the zero-padded output:
- Apply 'concat' to prefix the 'quantity' column with the literal '00'.
- Apply 'substring' with a start position of -3 and a length of 3 to keep only the last three characters.
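
To make the two steps concrete, the following plain-Python sketch mirrors the same logic on ordinary strings, without Spark. The helper name `pad_with_concat_substring` is hypothetical and is only a model of what 'concat' followed by 'substring' does to a single value; it is not part of the PySpark API.

```python
def pad_with_concat_substring(value, width=3):
    """Plain-Python model of concat('00', value) followed by substring(..., -3, 3)."""
    # Step 1: prefix with (width - 1) zeros, which is '00' when width is 3.
    concatenated = "0" * (width - 1) + str(value)
    # Step 2: keep only the last `width` characters.
    return concatenated[-width:]


quantities = [20, 10, 25, 10, 2]
padded = [pad_with_concat_substring(q) for q in quantities]
print(padded)

# A value with more digits than `width` loses its leading digits.
truncated = pad_with_concat_substring(1234)
print(truncated)
```

The last call is worth noting: because the substring step keeps only the final three characters, this approach is only safe when no value has more digits than the target width.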

Reference:
 - https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.functions.concat.html
 - https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.functions.substring.html

### Using format_string API

In [0]:
format_col = f.format_string('%03d',f.col('quantity'))
df.withColumn("new_quantity", format_col).show()

+------+--------+------------+
|Fruits|quantity|new_quantity|
+------+--------+------------+
|Apples|      20|         020|
|Banana|      10|         010|
|Mangos|      25|         025|
|Orange|      10|         010|
|Papaya|       2|         002|
+------+--------+------------+



The 'format_string' API formats the given value printf-style and returns the result as a string column. Here the '%03d' format pads the integer with zeros to a minimum width of three digits.
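
As a quick way to experiment with the format specifier outside Spark, the sketch below applies the same '%03d' pattern with Python's own `%` operator. This is an illustration of the printf-style specifier only, on the assumption that both implementations treat this simple pattern the same way for non-negative integers; Spark's implementation is separate and may differ for other format patterns.

```python
quantities = [20, 10, 25, 10, 2]

# '%03d': integer conversion, zero-filled, minimum width of 3.
formatted = ["%03d" % q for q in quantities]
print(formatted)

# The width is a minimum, so wider values are left intact rather than cut.
wide = "%03d" % 1234
print(wide)
```

Unlike a fixed-length substring, a printf-style width never shortens a value that is already longer than the requested width.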

Reference:
  - https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.functions.format_string.html

### Using lpad API

In [0]:
lpad_col = f.lpad(f.col('quantity'),3,"0")
df.withColumn("new_quantity", lpad_col).show()

+------+--------+------------+
|Fruits|quantity|new_quantity|
+------+--------+------------+
|Apples|      20|         020|
|Banana|      10|         010|
|Mangos|      25|         025|
|Orange|      10|         010|
|Papaya|       2|         002|
+------+--------+------------+



Like in SQL, the 'lpad' API left-pads a string column with the given pad string up to the specified length.
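
The padding rule can be sketched in plain Python as follows. The function `lpad_model` is a hypothetical helper written for illustration, not a PySpark call. It assumes the usual SQL behaviour in which an input longer than the target length is shortened to that length; verify this against your Spark version before relying on it.

```python
def lpad_model(value, length, pad="0"):
    """Plain-Python model of SQL-style left padding (an assumption, not Spark code)."""
    text = str(value)
    if len(text) >= length:
        # Assumed SQL behaviour: longer inputs are cut to `length`, keeping the left side.
        return text[:length]
    # Repeat the pad string and trim it to exactly the number of missing characters.
    missing = length - len(text)
    fill = (pad * missing)[:missing]
    return fill + text


padded = [lpad_model(q, 3) for q in [20, 10, 25, 10, 2]]
print(padded)
print(lpad_model(1234, 3))
print(lpad_model("7", 4, "ab"))
```

Under this model, padding to a fixed length also acts as a length cap, so it suits columns where every value is known to fit within the target width.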

Reference:
  - https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.functions.lpad.html