<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://blog.scholarnest.com/wp-content/uploads/2023/03/scholarnest-academy-scaled.jpg" alt="ScholarNest Academy" style="width: 1400px">
</div>

#####Cleanup previous runs

In [0]:
%run ../utils/cleanup

#####Setup

In [0]:
%sql
CREATE CATALOG IF NOT EXISTS dev;
CREATE DATABASE IF NOT EXISTS dev.demo_db;
CREATE OR REPLACE TABLE dev.demo_db.people_tbl(
  id INT,
  firstName STRING,
  lastName STRING
) USING DELTA;

####Schema Validations

#####Statements
1. INSERT
2. OVERWRITE
3. MERGE
4. DataFrame Append

#####Validation Scenarios
1. Column matching approach
2. New Columns
3. Data Type Mismatch (Not allowed in any case)

#####Schema Validations Summary
1. INSERT &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&ensp;&nbsp;- Column matching by position, New columns not allowed
2. OVERWRITE &emsp;&emsp;&emsp;&emsp;&ensp;- Column matching by position, New columns not allowed
3. MERGE INSERT &emsp;&emsp;&emsp;&nbsp;- Column matching by name, New columns ignored
4. DataFrame Append &emsp;&nbsp;- Column matching by name, New columns not allowed
5. Data Type Mismatch &emsp;- Not allowed in any case
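
The summary above can be read as a small rule table. The sketch below is a plain-Python teaching model of those rules, not Delta Lake's implementation: `Rule`, `RULES`, and `predict` are hypothetical names, column names are compared case-insensitively as an assumption, and data type checks are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    match_by: str        # "position" or "name"
    extra_columns: str   # "error" or "ignored"


# Transcribed from the summary above (hypothetical model, not a Delta API).
RULES = {
    "INSERT": Rule("position", "error"),
    "OVERWRITE": Rule("position", "error"),
    "MERGE INSERT": Rule("name", "ignored"),
    "DATAFRAME APPEND": Rule("name", "error"),
}


def predict(statement, target_cols, source_cols):
    """Return a short description of what the summary says should happen."""
    rule = RULES[statement.upper()]
    if rule.match_by == "position":
        # Only the "extra columns" case from the summary is modelled here.
        if len(source_cols) > len(target_cols):
            return "fails: new columns not allowed"
        pairs = zip(source_cols, target_cols)
        return "ok: " + ", ".join(f"{s}->{t}" for s, t in pairs)

    # Matching by name: every target column needs a same-named source column.
    source_names = {c.lower() for c in source_cols}
    missing = [c for c in target_cols if c.lower() not in source_names]
    if missing:
        return "fails: no source column named " + ", ".join(missing)

    target_names = {c.lower() for c in target_cols}
    extra = [c for c in source_cols if c.lower() not in target_names]
    if extra and rule.extra_columns == "error":
        return "fails: new columns not allowed"
    if extra:
        return "ok: ignored " + ", ".join(extra)
    return "ok"


target = ["id", "firstName", "lastName"]
for stmt, cols in [
    ("INSERT", ["id", "fname", "lname"]),
    ("INSERT", ["id", "fname", "lname", "dob"]),
    ("MERGE INSERT", ["id", "fname", "lname"]),
    ("MERGE INSERT", ["id", "firstName", "lastName", "dob"]),
    ("DATAFRAME APPEND", ["id", "firstName", "lastName", "dob"]),
]:
    print(f"{stmt:17} {cols} -> {predict(stmt, target, cols)}")
```

The five calls mirror the scenarios exercised in the cells that follow, so the printed lines can be compared against what each cell actually does.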

#####1. INSERT - Column matching by position (matching names not mandatory)
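This can silently corrupt your data: values are written by position, so a source column listed in the wrong order lands in the wrong target column whenever the data types are compatible.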

In [0]:
%sql
INSERT INTO dev.demo_db.people_tbl
SELECT id, fname, lname
FROM json.`/mnt/files/dataset_ch7/people.json`
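
One way to reduce the risk of a positional mix-up is to spell out the intended pairing of source and target columns. The helper below is a minimal sketch that builds such a statement as a string; `build_insert` and its `mapping` argument are hypothetical, and it assumes your runtime accepts an explicit column list in `INSERT INTO`. Position still decides where each value goes; the column list and aliases only make the pairing visible and reviewable.

```python
def build_insert(table, source, mapping):
    """Build an INSERT with an explicit column list.

    mapping: target column -> source column, in the order to be written.
    (Hypothetical helper, not part of Spark or Delta Lake.)
    """
    target_list = ", ".join(mapping.keys())
    select_list = ", ".join(
        src if src == tgt else f"{src} AS {tgt}"
        for tgt, src in mapping.items()
    )
    return (
        f"INSERT INTO {table} ({target_list})\n"
        f"SELECT {select_list}\n"
        f"FROM {source}"
    )


sql = build_insert(
    "dev.demo_db.people_tbl",
    "json.`/mnt/files/dataset_ch7/people.json`",
    {"id": "id", "firstName": "fname", "lastName": "lname"},
)
print(sql)
# In a notebook cell the statement could then be run with spark.sql(sql).
```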

#####2. INSERT - New columns not allowed

In [0]:
%sql
INSERT INTO dev.demo_db.people_tbl
SELECT id, fname, lname, dob
FROM json.`/mnt/files/dataset_ch7/people.json`

#####3. OVERWRITE - New columns not allowed

In [0]:
%sql
INSERT OVERWRITE dev.demo_db.people_tbl
SELECT id, fname, lname, dob
FROM json.`/mnt/files/dataset_ch7/people.json`

#####4. MERGE - Column matching by name (matching by position not allowed)

In [0]:
%sql
SELECT id, fname, lname FROM json.`/mnt/files/dataset_ch7/people_2.json`

In [0]:
%sql
-- Fails: INSERT * resolves columns by name, and src has no firstName or lastName
MERGE INTO dev.demo_db.people_tbl tgt
USING (SELECT id, fname, lname FROM json.`/mnt/files/dataset_ch7/people_2.json`) src
ON tgt.id = src.id
WHEN NOT MATCHED THEN
    INSERT *

#####5. MERGE - New columns silently ignored

In [0]:
%sql
SELECT id, fname firstName, lname lastName, dob FROM json.`/mnt/files/dataset_ch7/people_2.json`

In [0]:
%sql
MERGE INTO dev.demo_db.people_tbl tgt
USING (SELECT id, fname firstName, lname lastName, dob FROM json.`/mnt/files/dataset_ch7/people_2.json`) src
ON tgt.id = src.id
WHEN NOT MATCHED THEN
    INSERT *

In [0]:
%sql
SELECT * FROM dev.demo_db.people_tbl

#####6. DataFrame append - Column matching by name (matching by position not allowed)

In [0]:
%python
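# Source columns are named fname/lname, so the by-name match against
# the table's firstName/lastName fails and the append is rejected.
people_schema = "id INT, fname STRING, lname STRING"
people_df = spark.read.format("json").schema(people_schema).load("/mnt/files/dataset_ch7/people_2.json")
people_df.write.format("delta").mode("append").saveAsTable("dev.demo_db.people_tbl")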

#####7. DataFrame append - New columns not allowed

In [0]:
%python
# dob does not exist in the target table, so the append is rejected.
people_schema = "id INT, firstName STRING, lastName STRING, dob STRING"
people_df = spark.read.format("json").schema(people_schema).load("/mnt/files/dataset_ch7/people_2.json")
people_df.write.format("delta").mode("append").saveAsTable("dev.demo_db.people_tbl")
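
Because a DataFrame append matches by name and rejects unknown columns, the DataFrame has to carry exactly the target's column names before the write. The sketch below computes the `selectExpr` expressions that would do that from plain column lists; `align_exprs` and its `renames` argument are hypothetical, names are compared case-insensitively as an assumption, and dropping the extra columns is a deliberate choice here, not something Delta does for you.

```python
def align_exprs(source_cols, target_cols, renames=None):
    """Return selectExpr strings that rename and drop columns to fit the target.

    renames: source column -> target column (hypothetical mapping).
    Raises ValueError when a target column has no source column.
    """
    renames = renames or {}
    # Index each source column by the name it will have after renaming.
    available = {renames.get(c, c).lower(): c for c in source_cols}
    exprs = []
    for tgt in target_cols:
        src = available.get(tgt.lower())
        if src is None:
            raise ValueError(f"no source column for target column {tgt!r}")
        exprs.append(tgt if src == tgt else f"{src} AS {tgt}")
    return exprs


exprs = align_exprs(
    source_cols=["id", "fname", "lname", "dob"],
    target_cols=["id", "firstName", "lastName"],
    renames={"fname": "firstName", "lname": "lastName"},
)
print(exprs)
# In a notebook, the aligned frame could then be appended with something like:
# people_df.selectExpr(*exprs).write.format("delta").mode("append") \
#     .saveAsTable("dev.demo_db.people_tbl")
```

Columns that are not in the target (here `dob`) simply do not appear in the returned expressions, so the drop is explicit in the code rather than a side effect of the write.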

&copy; 2021-2023 ScholarNest Technologies Pvt. Ltd. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
Databricks, Databricks Cloud and the Databricks logo are trademarks of the <a href="https://www.databricks.com/">Databricks Inc</a>.<br/>
<br/>
<a href="https://www.scholarnest.com/privacy/">Privacy Policy</a> | 
<a href="https://www.scholarnest.com/terms/">Terms of Use</a> | <a href="https://www.scholarnest.com/contact/">Contact Us</a>