##### 1) Single-line JSON file
- **Single-line JSON** (also known as JSON Lines) stores **one complete JSON object per line**.
- It is **append-friendly**: a new record is added by writing one more line at the end of the file.
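
To make the format concrete, here is a minimal sketch in plain Python (no Spark needed). It writes two records, appends a third, and reads everything back line by line. The file name `singleline_demo.json` and the sample records are hypothetical; only the field names are borrowed from the dataset used in this notebook.

```python
import json
import os
import tempfile

# Hypothetical sample records (field names mirror the notebook's dataset).
records = [
    {"ProductNumber": 101, "City": "Bengaluru", "State": "KA"},
    {"ProductNumber": 102, "City": "New Delhi", "State": "DL"},
]

# Hypothetical demo file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "singleline_demo.json")

# Write one compact JSON object per line.
with open(path, "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Appending a record is just writing one more line; existing lines are untouched.
with open(path, "a", encoding="utf-8") as f:
    f.write(json.dumps({"ProductNumber": 103, "City": "Mumbai", "State": "MH"}) + "\n")

# Every non-empty line parses on its own.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(len(loaded))  # 3
```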

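##### a) Basic way (default options)
- Spark **infers the schema automatically** by scanning the data; inferred columns come back in alphabetical order.
- Each line is **parsed as one JSON object**. `multiLine` defaults to `false`, so the three methods below are equivalent.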

     # Method 01
     spark.read.json('/Volumes/@azureadb/pyspark/training/read_json/singleline01.json')

     # Method 02
     spark.read \
          .format("json") \
          .load("/Volumes/@azureadb/pyspark/training/read_json/singleline01.json")

     # Method 03
     spark.read \
          .format("json") \
          .option("multiLine", False) \
          .load("/Volumes/@azureadb/pyspark/training/read_json/singleline01.json")

In [0]:
# Read JSON file into dataframe
df_wo_head_infer = spark.read.json('/Volumes/@azureadb/pyspark/training/read_json/singleline01.json')
display(df_wo_head_infer)

City,Country,Decommisioned,Description,EstimatedPopulation,Latitude,LocationType,Longitude,ProductEast,ProductNorth,ProductNumber,ProductSouth,Region,RegionType,SalesRegion,State,TaxReturnsFiled,TotalSalary,ZipCodeType,Zipcode
Bengaluru,India,False,Main sales hub for South India,8500000,12.9716,Primary,77.5946,198.75,222.5,101,100,Bangalore Urban,Metro,South,KA,Yes,95000000,STANDARD,560001
New Delhi,India,False,Corporate and government sales region,16700000,28.6139,Primary,77.209,525.25,250.0,102,199,Delhi NCR,Metro,North,DL,Yes,120000000,STANDARD,110001
Mumbai,India,False,Financial capital sales zone,20400000,19.076,Primary,72.8777,980.0,780.0,103,170,Mumbai Metropolitan,Metro,West,MH,Yes,150000000,STANDARD,400001
Chennai,India,False,Main sales hub for South India,8500000,456.9716,Primary,77.5946,150.75,212.5,104,200,Chennai Urban,Metro,South,TN,Yes,75000000,STANDARD,560011
Kolkatta,India,False,Corporate and government sales region,36700670,431.6139,Primary,77.209,125.25,355.0,105,120,Kolkatta NCR,Metro,North,WB,Yes,120000000,STANDARD,110011
Nasik,India,False,Financial capital sales zone,98400000,190.76,Primary,272.8777,140.0,180.0,106,90,Nasik Metropolitan,Metro,West,MH,Yes,1230000000,STANDARD,400021
Mysore,India,False,Main sales hub for South India,8500000,12.9716,Primary,991.5946,190.75,897.5,107,100,Mysore Urban,Metro,South,KA,Yes,95000000,STANDARD,560031
Vellore,India,False,Corporate and government sales region,56700000,28.6139,Primary,277.209,175.25,250.0,108,220,Vellore NCR,Metro,North,TN,Yes,120000000,STANDARD,110041
Vijag,India,False,IT capital sales zone,25400000,19.076,Primary,572.8777,190.0,489.0,109,95,Vijag Metropolitan,Metro,West,AP,Yes,180000000,STANDARD,400051
Amaravathi,India,False,Self capital sales zone,20400000,19.076,Primary,672.8777,124.0,280.0,110,78,Amaravathi Metropolitan,Metro,West,AP,Yes,190000000,STANDARD,400061
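
The alphabetical column order in the output above is a side effect of schema inference. The sketch below imitates the idea in plain Python under simplifying assumptions: whole numbers map to `long`, decimals to `double`, a mix of the two widens to `double`, and any other conflict falls back to `string`. The function `infer_columns` and the sample lines are hypothetical, and Spark's real inference also handles nested types and runs distributed, so treat this as an illustration only.

```python
import json

def infer_columns(lines):
    """Return {column: type_name}, with columns sorted by name."""
    def type_of(value):
        if isinstance(value, bool):   # check bool first: bool is a subclass of int in Python
            return "boolean"
        if isinstance(value, int):
            return "long"
        if isinstance(value, float):
            return "double"
        return "string"               # everything else is treated as string in this sketch

    inferred = {}
    for line in lines:
        for key, value in json.loads(line).items():
            if value is None:
                continue              # a null carries no type information
            new = type_of(value)
            old = inferred.get(key, new)
            if old == new:
                inferred[key] = new
            elif {old, new} == {"long", "double"}:
                inferred[key] = "double"   # widen whole numbers to decimals
            else:
                inferred[key] = "string"   # incompatible types fall back to string
    return dict(sorted(inferred.items()))

# Hypothetical sample lines using a few of the notebook's field names.
sample = [
    '{"Zipcode": 560001, "City": "Bengaluru", "Latitude": 12.9716, "Decommisioned": false}',
    '{"Zipcode": 110001, "City": "New Delhi", "Latitude": 28, "Decommisioned": false}',
]
schema_guess = infer_columns(sample)
print(schema_guess)
```

Because inference has to look at the data before any rows are returned, it costs an extra read compared with supplying the schema up front.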


- **Do NOT** set **multiLine=true** for **single-line JSON**. The default (`multiLine=false`) already reads one JSON object per line, so the option below is **not needed**:
        
        .option("multiLine", "true")
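
The reason can be illustrated without Spark. A multi-line reader treats the whole file as one JSON document, but a single-line JSON file holds many documents back to back. The sketch below uses Python's standard `json` module as an analogy; the two-record `text` is hypothetical, and Spark's exact behaviour may differ by version.

```python
import json

# Hypothetical file content: two records, one per line.
text = (
    '{"ProductNumber": 101, "City": "Bengaluru"}\n'
    '{"ProductNumber": 102, "City": "New Delhi"}\n'
)

# Strict whole-file parsing fails: the second object is trailing data.
try:
    json.loads(text)
    whole_file_ok = True
except json.JSONDecodeError:
    whole_file_ok = False

# A lenient whole-file parse stops after the first object and never sees the rest.
first_obj, end = json.JSONDecoder().raw_decode(text)

# Line-by-line parsing recovers every record.
per_line = [json.loads(line) for line in text.splitlines() if line.strip()]

print(whole_file_ok)            # False
print(first_obj["City"])        # Bengaluru
print(len(per_line))            # 2
```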

##### b) Explicit schema (Best practice)

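- **Faster** than schema inference, because Spark does not need an extra pass over the data to work out the column types.
- **Avoids inference surprises**: column names, datatypes and column order are exactly what the schema declares.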

In [0]:
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DoubleType, BooleanType

# Define custom schema
schema = StructType([
      StructField("ProductNumber", IntegerType(), True),
      StructField("Zipcode", IntegerType(), True),
      StructField("ZipCodeType", StringType(), True),
      StructField("City", StringType(), True),
      StructField("State", StringType(), True),
      StructField("LocationType", StringType(), True),
      StructField("Latitude", DoubleType(), True),
      StructField("Longitude", DoubleType(), True),
      StructField("ProductSouth", IntegerType(), True),
      StructField("ProductNorth", DoubleType(), True),
      StructField("ProductEast", DoubleType(), True),
      StructField("SalesRegion", StringType(), True),
      StructField("Country", StringType(), True),
      StructField("RegionType", StringType(), True),
      StructField("Region", StringType(), True),
      StructField("Decommisioned", BooleanType(), True),
      StructField("TaxReturnsFiled", StringType(), True),
      StructField("EstimatedPopulation", IntegerType(), True),
      StructField("TotalSalary", IntegerType(), True),
      StructField("Description", StringType(), True)
])
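
One thing to check when picking types by hand: `IntegerType` is a 32-bit signed integer, so a column declared this way can only hold values between -2,147,483,648 and 2,147,483,647. Larger values would call for `LongType`. The sketch below is a plain-Python pre-check; the helper `fits_int32` is hypothetical, and the first list reuses a few `TotalSalary` values from the sample data.

```python
# Range of a 32-bit signed integer.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def fits_int32(values):
    """True if every non-null value fits in a 32-bit signed integer."""
    return all(v is None or INT32_MIN <= v <= INT32_MAX for v in values)

# A few TotalSalary values from the sample data, including the largest one.
total_salary = [95000000, 120000000, 150000000, 1230000000]

print(fits_int32(total_salary))      # True
print(fits_int32([3_000_000_000]))   # False: this would need a wider type
```

For the sample file the declared `IntegerType` columns are wide enough; the check matters mostly when the same schema is reused on larger datasets.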

In [0]:
df_with_schema = spark.read.schema(schema).json("/Volumes/@azureadb/pyspark/training/read_json/singleline01.json")
display(df_with_schema)

ProductNumber,Zipcode,ZipCodeType,City,State,LocationType,Latitude,Longitude,ProductSouth,ProductNorth,ProductEast,SalesRegion,Country,RegionType,Region,Decommisioned,TaxReturnsFiled,EstimatedPopulation,TotalSalary,Description
101,560001,STANDARD,Bengaluru,KA,Primary,12.9716,77.5946,100,222.5,198.75,South,India,Metro,Bangalore Urban,False,Yes,8500000,95000000,Main sales hub for South India
102,110001,STANDARD,New Delhi,DL,Primary,28.6139,77.209,199,250.0,525.25,North,India,Metro,Delhi NCR,False,Yes,16700000,120000000,Corporate and government sales region
103,400001,STANDARD,Mumbai,MH,Primary,19.076,72.8777,170,780.0,980.0,West,India,Metro,Mumbai Metropolitan,False,Yes,20400000,150000000,Financial capital sales zone
104,560011,STANDARD,Chennai,TN,Primary,456.9716,77.5946,200,212.5,150.75,South,India,Metro,Chennai Urban,False,Yes,8500000,75000000,Main sales hub for South India
105,110011,STANDARD,Kolkatta,WB,Primary,431.6139,77.209,120,355.0,125.25,North,India,Metro,Kolkatta NCR,False,Yes,36700670,120000000,Corporate and government sales region
106,400021,STANDARD,Nasik,MH,Primary,190.76,272.8777,90,180.0,140.0,West,India,Metro,Nasik Metropolitan,False,Yes,98400000,1230000000,Financial capital sales zone
107,560031,STANDARD,Mysore,KA,Primary,12.9716,991.5946,100,897.5,190.75,South,India,Metro,Mysore Urban,False,Yes,8500000,95000000,Main sales hub for South India
108,110041,STANDARD,Vellore,TN,Primary,28.6139,277.209,220,250.0,175.25,North,India,Metro,Vellore NCR,False,Yes,56700000,120000000,Corporate and government sales region
109,400051,STANDARD,Vijag,AP,Primary,19.076,572.8777,95,489.0,190.0,West,India,Metro,Vijag Metropolitan,False,Yes,25400000,180000000,IT capital sales zone
110,400061,STANDARD,Amaravathi,AP,Primary,19.076,672.8777,78,280.0,124.0,West,India,Metro,Amaravathi Metropolitan,False,Yes,20400000,190000000,Self capital sales zone


##### c) `Single-line JSON` with `NULL` values

In [0]:
df_with_schema_null = spark.read.schema(schema).json("/Volumes/@azureadb/pyspark/training/json/read_json/singleline_null.json")
display(df_with_schema_null)

ProductNumber,Zipcode,ZipCodeType,City,State,LocationType,Latitude,Longitude,ProductSouth,ProductNorth,ProductEast,SalesRegion,Country,RegionType,Region,Decommisioned,TaxReturnsFiled,EstimatedPopulation,TotalSalary,Description
101,,STANDARD,,KA,Primary,12.9716,77.5946,,222.5,198.75,South,India,Metro,Bangalore Urban,False,Yes,8500000.0,95000000.0,Main sales hub for South India
102,110001.0,,New Delhi,DL,Primary,,77.209,199.0,250.0,525.25,North,India,,Delhi NCR,False,Yes,16700000.0,120000000.0,Corporate and government sales region
103,400001.0,STANDARD,Mumbai,,Primary,19.076,72.8777,170.0,780.0,980.0,West,,Metro,Mumbai Metropolitan,False,Yes,20400000.0,150000000.0,Financial capital sales zone
104,560011.0,STANDARD,Chennai,TN,Primary,,77.5946,200.0,212.5,,South,India,Metro,Chennai Urban,False,Yes,8500000.0,75000000.0,Main sales hub for South India
105,110011.0,STANDARD,Kolkatta,WB,Primary,431.6139,77.209,,355.0,125.25,,India,Metro,Kolkatta NCR,False,Yes,36700670.0,120000000.0,Corporate and government sales region
106,,STANDARD,Nasik,MH,Primary,,272.8777,90.0,180.0,140.0,West,India,,Nasik Metropolitan,False,Yes,98400000.0,1230000000.0,Financial capital sales zone
107,560031.0,STANDARD,,KA,Primary,12.9716,,100.0,897.5,190.75,,India,Metro,Mysore Urban,False,Yes,8500000.0,,Main sales hub for South India
108,110041.0,STANDARD,Vellore,TN,Primary,28.6139,277.209,220.0,250.0,175.25,North,India,Metro,Vellore NCR,False,Yes,,120000000.0,Corporate and government sales region
109,400051.0,STANDARD,Vijag,AP,Primary,19.076,572.8777,95.0,489.0,190.0,West,India,Metro,Vijag Metropolitan,False,Yes,25400000.0,180000000.0,IT capital sales zone
110,400061.0,,,AP,Primary,19.076,672.8777,78.0,280.0,124.0,West,India,Metro,Amaravathi Metropolitan,False,Yes,20400000.0,190000000.0,Self capital sales zone
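
The empty cells in the output above are NULLs. The source file is not shown here, so it is unknown whether it uses an explicit `null` or simply omits the key; with a nullable schema, both cases are expected to load as NULL. The plain-Python sketch below counts such gaps per column before loading; the three sample lines and the `columns` list are hypothetical.

```python
import json

# Hypothetical lines: the first has an explicit null, the second omits the key.
lines = [
    '{"ProductNumber": 101, "City": null, "State": "KA"}',
    '{"ProductNumber": 102, "State": "DL"}',
    '{"ProductNumber": 103, "City": "Mumbai", "State": "MH"}',
]
columns = ["ProductNumber", "City", "State"]

rows = [json.loads(line) for line in lines]

# dict.get returns None for both an explicit null and a missing key.
null_counts = {
    col: sum(1 for row in rows if row.get(col) is None)
    for col in columns
}

print(null_counts)  # {'ProductNumber': 0, 'City': 2, 'State': 0}
```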
