## Configuration

Add the Delta Lake package, then set spark.sql.extensions and spark.sql.catalog.spark_catalog so that the Spark session loads the Delta Lake SQL extension and catalog.

In [None]:
%%configure -f
{ "conf": {"spark.jars.packages": "io.delta:delta-core_2.12:1.0.1,net.andreinc:mockneat:0.4.8",
           "spark.sql.extensions":"io.delta.sql.DeltaSparkSessionExtension",
           "spark.sql.catalog.spark_catalog":"org.apache.spark.sql.delta.catalog.DeltaCatalog"
          }
}
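
After the session restarts, it can help to confirm that the settings were actually applied. The cell below is a minimal sketch that assumes the notebook exposes the usual `spark` session variable; `deltaKeys` is a name chosen here for illustration.

```scala
// Keys set by the %%configure cell above
val deltaKeys = Seq("spark.sql.extensions", "spark.sql.catalog.spark_catalog")

deltaKeys.foreach { key =>
  // getOption returns None instead of throwing when the key is unset
  val value = spark.conf.getOption(key).getOrElse("<not set>")
  println(s"$key = $value")
}
```

If either key prints as not set, the %%configure cell probably did not run before the session was created.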

## Generate Mock Data Using MockNeat
- Use MockNeat for random data generation
- Generate customer data using the MockNeat library
- Configuration:
   - numberOfRecords1 - number of records to generate

In [None]:
import net.andreinc.mockneat.MockNeat
import net.andreinc.mockneat.abstraction.MockUnit
import net.andreinc.mockneat.types.enums.RandomType
import java.time.LocalDate
import scala.reflect.ClassTag

val mockNeat = MockNeat.threadLocal()

/**
* Customer Business Model
**/
case class Customer(var customerId: Int, var customerName: String, var firstName: String,
                    var lastName: String, var userName: String, var registrationDate: String)
// Configure these values based on your needs.
// The data is generated on the driver, so its size is limited by driver memory.
val DateStart = LocalDate.of(2014, 1, 1)
val DateEnd = LocalDate.of(2016, 1, 1)
// number of mock records to generate
val numberOfRecords1 = 10
val startIndex1 = 1
// `to` is inclusive on both ends, so subtract 1 to get exactly numberOfRecords1 records
val endIndex1 = startIndex1 + numberOfRecords1 - 1

val customerData = (startIndex1 to endIndex1).map(i=>{
    Customer(i,
             mockNeat.names().full().get(),
             mockNeat.names().first().get(),
             mockNeat.names().last().get(),
             mockNeat.users().get(),
             mockNeat.localDates.between(DateStart, DateEnd).mapToString().get())
})
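
The number of generated records depends on how the end index is computed, because Scala's `to` includes both endpoints while `until` excludes the upper one. The following standalone sketch illustrates the difference; `n`, `start`, and the three range names are hypothetical and not part of the notebook.

```scala
val n = 10
val start = 1

// `to` includes both endpoints, so this range has n + 1 elements
val inclusive = start to (start + n)
// Subtracting 1 from the end gives exactly n elements
val exactTo = start to (start + n - 1)
// `until` excludes the upper bound, which also gives exactly n elements
val exactUntil = start until (start + n)

println(inclusive.size)   // 11
println(exactTo.size)     // 10
println(exactUntil.size)  // 10
```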

## Delta Lake Save Data and Print Schema
- Configuration
    - adsl2Path - the path where the Delta Lake data is saved. It can be a full path or a relative path. [More details](https://learn.microsoft.com/en-us/azure/hdinsight/overview-azure-storage#hdinsight-storage-architecture)

In [None]:
// define Delta Lake Path
val adsl2Path = "/tmp/customerdata3"

//create data frame
val df = sc.parallelize(customerData).toDF
df.write.mode("append").format("delta").save(adsl2Path)
// print schema of the dataframe
df.printSchema
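
Each write to a Delta table is recorded as a commit in its transaction log. As a sketch, assuming the write above succeeded and `adsl2Path` is still in scope, the commit history can be inspected like this (`history` is a name chosen here for illustration):

```scala
import io.delta.tables.DeltaTable

// Load the table by path and fetch its commit history as a DataFrame
val history = DeltaTable.forPath(spark, adsl2Path).history()

// Show which operation produced each table version
history.select("version", "timestamp", "operation", "operationParameters")
  .show(truncate = false)
```

If the path already held a Delta table before this notebook ran, earlier versions will appear in the output as well.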

## Missing Schema Enforcement
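The new model drops the userName field and adds an age column. Because the write sets mergeSchema to true, Delta Lake evolves the table schema to include the new column instead of rejecting the append.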

In [None]:
/**
* Customer new Business Model
* removed user name and added age column
**/
case class CustomerNew(var customerId: Int, var customerName: String, var firstName: String,
                    var lastName: String, var registrationDate: String, var age:Int)
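// Configure these values based on your needs.
// The data is generated on the driver, so its size is limited by driver memory.
val DateStart = LocalDate.of(2014, 1, 1)
val DateEnd = LocalDate.of(2016, 1, 1)
// number of mock records to generate; IDs continue after the first batch
val newStartIndex2 = endIndex1+1
val numberOfRecords2 = 10
// `to` is inclusive on both ends, so subtract 1 to get exactly numberOfRecords2 records
val newendIndex2 = newStartIndex2 + numberOfRecords2 - 1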


val customerNewData = (newStartIndex2 to newendIndex2).map(i=>{
    CustomerNew(i,
             mockNeat.names().full().get(),
             mockNeat.names().first().get(),
             mockNeat.names().last().get(),
             mockNeat.localDates.between(DateStart, DateEnd).mapToString().get(),
             mockNeat.ints().range(10, 100).get())
})

// create a DataFrame from the mock data
val df = sc.parallelize(customerNewData).toDF
//save it in delta format
df.write.option("mergeSchema", "true").mode("append").format("delta").save(adsl2Path)
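
To see what happens without mergeSchema, the sketch below tries to append a row that carries a column the table does not have. It assumes the Delta setting spark.databricks.delta.schema.autoMerge.enabled is left at its default (off); `loyaltyTier` and `unexpected` are hypothetical names used only for this illustration.

```scala
import org.apache.spark.sql.functions.lit
import scala.util.{Failure, Success, Try}

// Take one existing row and add a hypothetical column that is not in the table schema
val unexpected = spark.read.format("delta").load(adsl2Path)
  .limit(1)
  .withColumn("loyaltyTier", lit("gold"))

// No mergeSchema option here, so Delta Lake is expected to reject the append
Try(unexpected.write.mode("append").format("delta").save(adsl2Path)) match {
  case Success(_) => println("Append succeeded: the schema was accepted")
  case Failure(e) => println(s"Append rejected: ${e.getClass.getSimpleName}")
}
```

Wrapping the write in Try keeps the notebook running either way, so the outcome can be read from the printed message.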

## Read Delta Format data

In [None]:
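// Read with the Spark DataFrame reader. Use the "delta" format (not "parquet")
// so that the transaction log and the merged schema are honored.
val df = spark.read.format("delta").load(adsl2Path)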
df.show(20)

// Alternatively, read through the DeltaTable API, which reflects the latest table state
import io.delta.tables._
val dt: io.delta.tables.DeltaTable = DeltaTable.forPath(adsl2Path)
dt.toDF.show(20)
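
Because Delta Lake keeps earlier versions of the table, the data can also be read as of a previous commit. This is a minimal sketch that assumes version 0 corresponds to the first write in this notebook, which holds only if the path did not already contain a Delta table; `firstVersion` is a name chosen here for illustration.

```scala
// Time travel: load the table as it was at version 0
val firstVersion = spark.read.format("delta")
  .option("versionAsOf", 0)
  .load(adsl2Path)

// Compare this schema with the current one to see how the table evolved
firstVersion.printSchema
firstVersion.show(5)
```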