# Flattening JSON Files using DuckDB

This [article](https://duckdb.org/2023/03/03/json.html) inspired me to try out DuckDB's JSON handling. Specifically, I wanted to find out whether I could use DuckDB to "flatten" JSON data into a tabular format. The article itself did not help me in that regard, but this feature [request](https://github.com/duckdb/duckdb/issues/2922) led me to discover the unnest() function, which is similar to Snowflake's lateral flatten() function.

In [1]:
import duckdb

In [2]:
# Install the JSON extension first, then load it into the session
duckdb.install_extension('json')
duckdb.load_extension('json')

#### Redacted JSON sample file:

```
{
	"digitalInsuranceBuildData": {
		"vin": "1FBZX2ZMREDACTED",
		"buildDate": "2015-12-20",
		"year": 2016,
		"make": "FORD",
		"model": "TRANSIT",
		"vehicleEngineerDescription": "FORD  TRANSIT T350 WAGON LOW ROOF LONG WB 60/40 CARGO DR",
		"plantName": "KANSAS CITY ASSY",
		"trimLevel": "XLT",
		"dealer": {
			"countryCode": "USA"
		},
		"configuration": {
			"siriusXm": {
				"capable": true
			},
			"sync": {
				"capable": true
			}
		},
		"invoice": {
			"currencyCode": "USD",
			"orders": [
				{
					"orderCode": "X2Z",
					"orderType": "BODY"
				}
			],
			"price": {
				"manufacturerSuggestedRetailPrice": 39180.0
			}
		},
		"features": [
			{
				"featureWersCode": "A22AA",
				"engineerDescription": "LESS D PILLAR ASSIST HANDLE",
				"familyEngineerDescription": "D PILLAR ASSIST HANDLE"
			},
			{
				"featureWersCode": "EN-RM",
				"engineerDescription": "3.7L 4V-DAMB PFI V6 NA GAS",
				"familyEngineerDescription": "ENGINE-CAR/LT TRK",
				"featureGroupType": "ENGINE"
			},
			{
				"featureWersCode": "A4MAA",
				"engineerDescription": "LESS DIESEL PARTICULATE FILTER",
				"familyEngineerDescription": "DIESEL PARTICULATE FILTERS"
			},
			{
				"featureWersCode": "TR-C3",
				"engineerDescription": "6 SPD AUTO TRANS (6R80)",
				"familyEngineerDescription": "TRANSMISSION-CAR/LT TRK",
				"featureGroupType": "TRANSMISSION"
			},
			{
				"featureWersCode": "YZKAB",
				"engineerDescription": "FLEET",
				"familyEngineerDescription": "FLEET"
			},
			{
				"featureWersCode": "DR--B",
				"engineerDescription": "2 WHL L/H REAR DRIVE",
				"familyEngineerDescription": "DRIVE-CAR/LT TRK",
				"featureGroupType": "DRIVETRAIN"
			}
		]
	}
}
```

Using the JSON above as a reference, we can "flatten" the data by using dot notation to reference nested attributes and the [unnest()](https://duckdb.org/docs/sql/query_syntax/unnest) function to expand arrays into rows:

In [12]:
duckdb.sql(
"""
SELECT
    distinct substring(digitalInsuranceBuildData.vin,1,10) as vin10
    -- digitalInsuranceBuildData.vin
    , digitalInsuranceBuildData.year
    , digitalInsuranceBuildData.make
    , digitalInsuranceBuildData.model
    , digitalInsuranceBuildData.trimLevel
    , digitalInsuranceBuildData.vehicleEngineerDescription
    , digitalInsuranceBuildData.plantName
    , digitalInsuranceBuildData.invoice.currencyCode
    , digitalInsuranceBuildData.invoice.price.manufacturerSuggestedRetailPrice
    -- "features" is an array of JSON objects: unnest() expands it so that each element becomes its own row
    , unnest(digitalInsuranceBuildData.features).featureWersCode as featureWersCode
    , unnest(digitalInsuranceBuildData.features).engineerDescription as engineerDescription
    , unnest(digitalInsuranceBuildData.features).familyEngineerDescription as familyEngineerDescription
    , unnest(digitalInsuranceBuildData.features).featureGroupType as featureGroupType
from
    'json/*.json'
order by
    digitalInsuranceBuildData.vin
"""
).df()

| | vin10 | year | make | model | trimLevel | vehicleEngineerDescription | plantName | currencyCode | manufacturerSuggestedRetailPrice | featureWersCode | engineerDescription | familyEngineerDescription | featureGroupType |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1FADP3F21E | 2014 | FORD | FOCUS | SE | FORD FOCUS ZX4 4-DR SEDAN SE | MICHIGAN ASSEMBLY PLANT | USD | 22480.0 | CBFAA | LESS DR ENTRY REMOTE CNTR UNIT | DOOR ENTRY REMOTE CNTR UNIT | |
| 1 | 1FADP3F21E | 2014 | FORD | FOCUS | SE | FORD FOCUS ZX4 4-DR SEDAN SE | MICHIGAN ASSEMBLY PLANT | USD | 22480.0 | YKTAA | LESS MK CB-DVD PLAY/VID SCR | MK CB-DVD PLAYER/VIDEO SCREEN | |
| 2 | 1FADP3F21E | 2014 | FORD | FOCUS | SE | FORD FOCUS ZX4 4-DR SEDAN SE | MICHIGAN ASSEMBLY PLANT | USD | 22480.0 | CLFB5 | FRNT BUMPER-PAINTED BODY COLOR | BUMPER-FRONT | |
| 3 | 1FADP3F21E | 2014 | FORD | FOCUS | SE | FORD FOCUS ZX4 4-DR SEDAN SE | MICHIGAN ASSEMBLY PLANT | USD | 22480.0 | AD2AB | FEATURE CONTENT GROUP 1 | FEATURE CONTENT GROUP | |
| 4 | 1FADP3F21E | 2014 | FORD | FOCUS | SE | FORD FOCUS ZX4 4-DR SEDAN SE | MICHIGAN ASSEMBLY PLANT | USD | 22480.0 | GTDAB | SPEED CONTROL | SPEED CONTROL | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9680 | NM0AE8F70J | 2018 | FORD | TRANSIT CONNECT | TITANIUM | FORD TRANSIT CONNECT WAGON XLT SWB | VALENCIA ASSY | USD | 26930.0 | J3FAD | 12 AUX POWER POINT-REAR | POWER POINT PLUG - REAR | |
| 9681 | NM0AE8F70J | 2018 | FORD | TRANSIT CONNECT | TITANIUM | FORD TRANSIT CONNECT WAGON XLT SWB | VALENCIA ASSY | USD | 26930.0 | JDJAA | LESS REAR FOG LAMPS | LAMPS-REAR FOG | |
| 9682 | NM0AE8F70J | 2018 | FORD | TRANSIT CONNECT | TITANIUM | FORD TRANSIT CONNECT WAGON XLT SWB | VALENCIA ASSY | USD | 26930.0 | JCFAC | HEADER MNT COURTESY/2 MAPLIGHT | HEADER LIGHTS-INTERIOR MAP | |
| 9683 | NM0AE8F70J | 2018 | FORD | TRANSIT CONNECT | TITANIUM | FORD TRANSIT CONNECT WAGON XLT SWB | VALENCIA ASSY | USD | 26930.0 | CAAAC | EXT DOOR HANDLE-BODY COLOR | DOOR HANDLE-EXTERIOR | |
