# INGESTING SEMI-STRUCTURED JSON

> Ingesting semi-structured data like JSON means parsing and transforming complex, nested input into structured Delta tables for analytics in the Lakehouse.

![image.png](attachment:image.png)

> Ingesting semi-structured data, such as JSON files, is a common task for data engineers, especially when dealing with event data, logs, or data from APIs.

- Before we get into how to work with JSON in Databricks, let's first review the basic structure of a JSON file.
- JSON data is made up of JSON objects, which are enclosed in curly brackets.
- Within the curly brackets, JSON objects contain key-value pairs.
- Each key is always a string enclosed in double quotation marks, and each key maps to a value.
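
The structure described above can be seen in a minimal sketch using Python's standard `json` module. The event record and its field names (`event_id`, `event_type`, `user_id`) are made up for illustration and are not taken from any real dataset.

```python
import json

# Hypothetical event record, made up for illustration.
raw = '{"event_id": "e-001", "event_type": "click", "user_id": 42}'

# A JSON object (curly brackets) is parsed into a Python dict.
event = json.loads(raw)

# Every key is a string, and each key maps to exactly one value.
for key, value in event.items():
    print(f"{key!r} -> {value!r}")
```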

![image.png](attachment:image.png)

> The value of a key can be a string, a number, a boolean, an array, another JSON object, or null.
- These objects can be flat, meaning all key-value pairs are at one level, or they can be nested, where values themselves are JSON objects. The complexity depends on how the data is structured in the source.
- Understanding this format is important because it affects how we parse and transform the data during ingestion.
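
To make the value types and the flat-versus-nested distinction concrete, here is a small plain-Python sketch. The record and the `flatten` helper are hypothetical and written only for this illustration; they are not a Databricks API.

```python
import json

# Hypothetical record covering every JSON value type.
raw = """
{
  "name": "sensor-7",
  "reading": 21.5,
  "active": true,
  "tags": ["indoor", "lab"],
  "location": {"site": "A", "floor": 3},
  "note": null
}
"""

record = json.loads(raw)


def flatten(obj, prefix=""):
    """Collapse nested objects into a single level with dotted keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse and extend the key path.
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            # Scalars, arrays, and null are kept as-is.
            flat[path] = value
    return flat


flat_record = flatten(record)
print(flat_record)
```

After flattening, the nested `location` object is replaced by the keys `location.site` and `location.floor`, which is the shape a table column layout needs.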

![image.png](attachment:image.png)

- When working with JSON data, it's common that, after ingestion, one or more columns in your table contain JSON-formatted strings as values.
> So the question becomes: how do you work with columns that store JSON-formatted strings?

- This is a common scenario when JSON isn't fully parsed during ingestion, or when JSON data is embedded within another field, like a log message or a nested structure. 

- We'll explore techniques to parse, extract, and manipulate those JSON strings using SQL or DataFrame operations, so you can flatten or access the nested fields just like regular columns.
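
As a rough illustration of the idea, the sketch below mirrors it in plain Python rather than in SQL or DataFrame operations. The rows, the `payload` column, and the `extract` helper are all hypothetical: each row carries a JSON-formatted string, and selected nested fields are pulled out into ordinary columns.

```python
import json

# Hypothetical table rows: "payload" holds a JSON-formatted string.
rows = [
    {"id": 1, "payload": '{"user": {"name": "Ada", "plan": "pro"}, "clicks": 3}'},
    {"id": 2, "payload": '{"user": {"name": "Lin"}, "clicks": 7}'},
]


def extract(json_string, path):
    """Return the value at a dotted path, or None if any step is missing."""
    current = json.loads(json_string)
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Promote selected nested fields to regular top-level columns.
flattened = [
    {
        "id": row["id"],
        "user_name": extract(row["payload"], "user.name"),
        "user_plan": extract(row["payload"], "user.plan"),
        "clicks": extract(row["payload"], "clicks"),
    }
    for row in rows
]

for row in flattened:
    print(row)
```

Note that this version re-parses the string for every field it extracts, which is one reason a parsed, typed representation tends to be preferable for repeated queries.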

![image.png](attachment:image.png)


![image-2.png](attachment:image-2.png)

> Another method to work with a JSON-formatted string column is to convert the column to a STRUCT data type.

Here are a few key points to remember:

- You can parse JSON data into a STRUCT type by defining a schema.
- The STRUCT enforces the JSON schema, ensuring data types and structure are consistent.
- Querying a STRUCT is more efficient than working with a raw JSON-formatted STRING.
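
The points above can be sketched as a plain-Python analogy. In Spark, this role is typically played by `from_json` with a schema. The code below is not that API; `SCHEMA`, `coerce`, and `parse_with_schema` are hypothetical names, and the choice to raise `TypeError` on a type mismatch is an assumption of this sketch, so Spark's own handling of mismatches may differ.

```python
import json

# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"order_id": int, "customer": str, "total": float}


def coerce(value, expected):
    """Return value as the expected type, or raise TypeError on a mismatch."""
    if value is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) and expected is not bool:
        raise TypeError(f"expected {expected.__name__}, got bool")
    # Allow whole numbers where a float is expected.
    if expected is float and isinstance(value, int):
        return float(value)
    if isinstance(value, expected):
        return value
    raise TypeError(f"expected {expected.__name__}, got {type(value).__name__}")


def parse_with_schema(json_string, schema):
    """Parse a JSON string into a record with exactly the schema's fields.

    Missing fields become None, extra fields are dropped, and malformed
    JSON yields None (assumptions made for this sketch).
    """
    try:
        data = json.loads(json_string)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    return {field: coerce(data.get(field), expected)
            for field, expected in schema.items()}


parsed = parse_with_schema(
    '{"order_id": 7, "customer": "Ada", "total": 20, "extra": true}', SCHEMA
)
print(parsed)
```

Once a record has passed through the schema, every field has a known name and type, so later code can read `parsed["total"]` directly instead of re-parsing and re-checking the raw string each time.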