## PySpark JSON Functions

PySpark JSON functions query or extract elements from a JSON string stored in a DataFrame column (for example by path), convert it to a struct or map type, and convert such columns back to JSON.

`from_json()` – Converts a JSON string into a struct type or map type.  
`to_json()` – Converts a MapType or struct type column to a JSON string.  
`json_tuple()` – Extracts fields from a JSON string and returns them as new columns.  
`get_json_object()` – Extracts a JSON element from a JSON string based on the specified JSON path.  
`schema_of_json()` – Creates a schema string from a JSON string.

In [0]:
dbutils.library.restartPython()  # Removes Python state, but some libraries might not work without calling this command.

#### Load libraries

In [0]:
from pyspark.sql import SparkSession, Row
from pyspark.sql.types import IntegerType, DateType, StringType, StructType, StructField, ArrayType, MapType, DoubleType
from pyspark.sql.functions import lit, col, expr, when, sum, avg, max, min, mean, count, from_json, to_json, json_tuple, get_json_object, schema_of_json

#### Create Spark session

In [0]:
spark = SparkSession.builder.appName('PySpark JSON Functions').getOrCreate()

In [0]:
jsonString = """{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PARC PARQUE","State":"PR"}"""
df = spark.createDataFrame([(1, jsonString)],["id","value"])
df.show(truncate=False)

#### from_json()

Converts a JSON string into a struct type or map type. Here the `value` column is parsed into a `MapType` with string keys and string values.

In [0]:
df2=df.withColumn('value',from_json(df.value,MapType(StringType(),StringType())))
df2.printSchema()
df2.show(truncate=False)

#### to_json()

Converts a MapType or struct type DataFrame column to a JSON string.

In [0]:
df3 = df2.withColumn('value',to_json(col('value')))
df3.printSchema()
df3.show(truncate=False)

#### json_tuple()

Queries or extracts elements from a JSON column and returns the results as new columns.

In [0]:
df4 = (
  df.select(
    col('id'),
    json_tuple(col('value'),'Zipcode','ZipCodeType','City','State')
  )
  .toDF('id','Zipcode','ZipCodeType','City','State')
)
df4.printSchema()
df4.show(truncate=False)

#### get_json_object()

Extracts a JSON element, as a string, from the JSON column based on the specified JSON path.

In [0]:
df.select(
  col('id'),
  get_json_object(col('value'),'$.ZipCodeType').alias('ZipCodeType')
).show(truncate=False)

#### schema_of_json()

Creates a schema string from a JSON string. The JSON string is passed as a literal (`lit(jsonString)`), so the schema is inferred from that sample value.

In [0]:
# spark.range(start, end, step, numPartitions) - Creates a DataFrame with a single LongType column named `id`,
# containing elements from start to end (exclusive), increased by step every element.
# Can be called the same way as Python's built-in range() function.
# If called with a single argument, the argument is interpreted as end, and start is set to 0.
# Here spark.range(1) only provides a one-row DataFrame on which to evaluate schema_of_json().

schemaStr = spark.range(1).select(schema_of_json(lit(jsonString))).collect()[0][0]
print(schemaStr)

#### The end of the notebook