## Configuration

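Add the Delta Lake and MockNeat packages, and configure `spark.sql.extensions` and `spark.sql.catalog.spark_catalog` so that the session can create and read Delta tables. The `-f` flag forces the current Livy session to be dropped and recreated with these settings.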

In [None]:
%%configure -f
{ "conf": {"spark.jars.packages": "io.delta:delta-core_2.12:1.0.1,net.andreinc:mockneat:0.4.8",
           "spark.sql.extensions":"io.delta.sql.DeltaSparkSessionExtension",
           "spark.sql.catalog.spark_catalog":"org.apache.spark.sql.delta.catalog.DeltaCatalog"
          }
}
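Outside a sparkmagic notebook (for example, in a job launched with `spark-submit`), the same two Delta settings can be passed through `SparkSession.builder`. The sketch below assumes Spark 3.1 with Scala 2.12 and that the Delta jar is already on the classpath (for example via `--packages io.delta:delta-core_2.12:1.0.1`); the object name `DeltaSessionSketch` and the application name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper: builds a session with the same Delta settings as the %%configure cell
object DeltaSessionSketch {
  def build(master: String): SparkSession =
    SparkSession.builder()
      .master(master)                  // e.g. "local[*]" for a local test run
      .appName("delta-mock-customers") // hypothetical application name
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
              "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()
}
```

On a cluster, drop the `.master(...)` call and let `spark-submit` supply the master URL. Both settings must be in place before the session is created, which is also why the notebook uses `%%configure -f` rather than setting them in a later cell.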

## Generate Mock Data Using MockNeat
- Use the MockNeat library to generate random customer data
- Configuration:
   - `numberOfRecords` - number of records to generate
   - `DateStart`, `DateEnd` - date range from which each `registrationDate` is drawn

In [None]:
import net.andreinc.mockneat.MockNeat
import java.time.LocalDate

// Shared MockNeat instance used by all generators below
val mockNeat = MockNeat.threadLocal()

/**
 * Customer business model
 */
case class Customer(customerId: Int, customerName: String, firstName: String,
                    lastName: String, userName: String, registrationDate: String)

// Adjust these values to your needs.
// The data is generated on the driver, so numberOfRecords is limited by driver memory.
val DateStart = LocalDate.of(2014, 1, 1)
val DateEnd = LocalDate.of(2016, 1, 1)
val numberOfRecords = 10

// Note: customerName is generated independently of firstName and lastName
val customerData = (1 to numberOfRecords).map { i =>
  Customer(i,
           mockNeat.names().full().get(),
           mockNeat.names().first().get(),
           mockNeat.names().last().get(),
           mockNeat.users().get(),
           mockNeat.localDates().between(DateStart, DateEnd).mapToString().get())
}
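If the MockNeat package is unavailable, or the same data set is needed on every run, a standard-library stand-in can produce records of the same shape from a seeded `scala.util.Random`. This is not MockNeat's implementation: the object name `MockCustomersSketch` and the name lists are hypothetical, and each registration date is drawn uniformly from the half-open range `[start, end)`.

```scala
import java.time.LocalDate
import scala.util.Random

// Hypothetical stand-in for the MockNeat generators (not MockNeat's own logic)
object MockCustomersSketch {
  final case class Customer(customerId: Int, customerName: String, firstName: String,
                            lastName: String, userName: String, registrationDate: String)

  // Hypothetical sample values; replace with larger lists as needed
  private val firstNames = Vector("Ana", "Ben", "Chen", "Dara", "Eli")
  private val lastNames  = Vector("Ito", "Khan", "Lopez", "Meyer", "Novak")

  // Uniformly random date in the half-open range [start, end)
  def randomDate(rnd: Random, start: LocalDate, end: LocalDate): LocalDate = {
    val span = end.toEpochDay - start.toEpochDay
    require(span > 0 && span <= Int.MaxValue, "start must be before end")
    start.plusDays(rnd.nextInt(span.toInt).toLong)
  }

  // Generates n customers; the same seed always yields the same records
  def generate(n: Int, start: LocalDate, end: LocalDate, seed: Long): Seq[Customer] = {
    val rnd = new Random(seed)
    (1 to n).map { i =>
      val first = firstNames(rnd.nextInt(firstNames.size))
      val last  = lastNames(rnd.nextInt(lastNames.size))
      Customer(i, s"$first $last", first, last,
               s"${first.toLowerCase}.${last.toLowerCase}$i", // id suffix keeps user names unique
               randomDate(rnd, start, end).toString)          // ISO-8601 date string
    }
  }
}
```

Unlike the MockNeat cell, this sketch derives `customerName` from `firstName` and `lastName`, so the three name fields stay consistent. Its output can be passed to `sc.parallelize(...).toDF()` in the same way as `customerData`.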

## Change the Path Where You Want to Save Your Data
- Configuration:
    - `adsl2Path` - path where the Delta Lake table is saved. It can be a fully qualified URI or a path without a scheme, which is resolved against the cluster's default storage (such as `/tmp/customerdata2`). [More details](https://learn.microsoft.com/en-us/azure/hdinsight/overview-azure-storage#hdinsight-storage-architecture)
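To point `adsl2Path` at a specific ADLS Gen2 account instead of the default storage, the fully qualified ABFS URI follows the pattern `abfs[s]://<container>@<account>.dfs.core.windows.net/<path>`. The helper below is a hypothetical sketch that assembles such a URI; it assumes the cluster already has access to that storage account.

```scala
// Hypothetical helper that assembles an ADLS Gen2 (ABFS) URI
object DeltaPathSketch {
  // abfss:// uses TLS; abfs:// is the non-TLS variant
  def abfsUri(container: String, account: String, path: String, secure: Boolean = true): String = {
    val scheme = if (secure) "abfss" else "abfs"
    // Drop a leading slash so the path joins cleanly after the host name
    s"$scheme://$container@$account.dfs.core.windows.net/${path.stripPrefix("/")}"
  }
}
```

For example, with placeholder names, `DeltaPathSketch.abfsUri("mycontainer", "mystorageaccount", "/tmp/customerdata2")` returns `abfss://mycontainer@mystorageaccount.dfs.core.windows.net/tmp/customerdata2`, which can be assigned to `adsl2Path`.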

In [None]:
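// Needed for .toDF on RDDs of case classes (may already be imported in the notebook session)
import spark.implicits._

// Define the Delta Lake path
val adsl2Path = "/tmp/customerdata2"

// Create a DataFrame from the generated customers
val df = sc.parallelize(customerData).toDF()
// Write it as a Delta table, replacing any data at the path (this creates a new table version)
df.write.mode("overwrite").format("delta").save(adsl2Path)
// Print the schema of the DataFrame
df.printSchema()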
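To confirm the write, the table can be read back with the `delta` format, and earlier snapshots can be loaded with Delta time travel through the `versionAsOf` read option. The helper object below is a hypothetical sketch that assumes a Delta-enabled session such as the one configured at the top of this notebook.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helpers for reading a Delta table back
object DeltaReadBackSketch {
  // Loads the current version of the Delta table at `path`
  def readLatest(spark: SparkSession, path: String): DataFrame =
    spark.read.format("delta").load(path)

  // Loads an earlier snapshot via Delta time travel; version 0 is the first write
  def readVersion(spark: SparkSession, path: String, version: Long): DataFrame =
    spark.read.format("delta").option("versionAsOf", version).load(path)
}
```

In the notebook, `DeltaReadBackSketch.readLatest(spark, adsl2Path).count()` should equal `numberOfRecords`. Because every run of the write cell overwrites the table and adds a version, `readVersion(spark, adsl2Path, 0L)` returns the data from the first run rather than the latest one.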