# Example Objective

This example demonstrates how to use Spark, Iceberg, and Delta Lake together:

- Store product master data in a Spark table, which defaults to Parquet format
- Store sales transactions in a Delta table
- Store return transactions in an Iceberg table

Finally, join the three tables to answer a few business questions.

# Configuration

- Add the Delta Lake connector libraries
- Add the Iceberg connector runtime JAR
- Add the Iceberg and Delta Lake SQL extensions

In this example, Iceberg uses a Hadoop catalog named `demo_iceberg`. The base path of its warehouse directory is set by `spark.sql.catalog.demo_iceberg.warehouse`.

The Delta catalog (set through `spark.sql.catalog.spark_catalog`) adds Delta table support to Spark's built-in catalog and delegates to the built-in catalog for non-Delta tables. Spark's default warehouse location is `/apps/spark/warehouse` in your primary container, and Delta data is stored there as well.

In [None]:
%%configure -f
{ "conf": {"spark.jars.packages": "org.apache.iceberg:iceberg-spark-runtime-3.1_2.12:1.1.0,io.delta:delta-core_2.12:1.0.1,net.andreinc:mockneat:0.4.8",
           "spark.sql.extensions":"org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension",
           "spark.sql.catalog.spark_catalog":"org.apache.spark.sql.delta.catalog.DeltaCatalog",
           "spark.sql.catalog.demo_iceberg":"org.apache.iceberg.spark.SparkCatalog",
           "spark.sql.catalog.demo_iceberg.type":"hadoop",
           "spark.sql.catalog.demo_iceberg.warehouse":"/iceberg/warehouse"
          }
}
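
To confirm that the session picked up these settings, the configured values can be read back in a later cell. This is a minimal sketch: `session` is a local name introduced here for illustration, and it simply refers to the notebook's already running Spark session.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the notebook's running session; no new session is created
val session = SparkSession.builder().getOrCreate()

// Catalog implementation and warehouse path registered under the name demo_iceberg
val icebergCatalogImpl = session.conf.get("spark.sql.catalog.demo_iceberg")
val icebergWarehouse = session.conf.get("spark.sql.catalog.demo_iceberg.warehouse")

// SQL extensions loaded when the session started
val sqlExtensions = session.conf.get("spark.sql.extensions")

println(s"demo_iceberg catalog:   $icebergCatalogImpl")
println(s"demo_iceberg warehouse: $icebergWarehouse")
println(s"SQL extensions:         $sqlExtensions")
```

If a key is missing, `conf.get` throws a `NoSuchElementException`, which usually means the `%%configure` cell did not take effect.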

## Model Objects
- Product: product master information, stored in a Spark table
- Product sales transactions: sold quantity and sale date, stored in a Delta table
- Product return transactions: returned quantity and return date, stored in an Iceberg table

In [None]:
// Product master model
case class Product(id: Int, name: String, price: Float)
// Product sales model
case class ProductSales(productId: Int, soldQty: Int, saleDate: String)
// Product return model
case class ProductReturn(productId: Int, returnQty: Int, returnDate: String)

# Generate Master Data
- Product

In [None]:
import net.andreinc.mockneat.MockNeat
import net.andreinc.mockneat.abstraction.MockUnit
import net.andreinc.mockneat.types.enums.RandomType
import java.text.DecimalFormat
import java.time.LocalDate

// Adjust these constants as needed.
// All data is generated on the driver, so the volume is limited by driver memory.
val dateStart = LocalDate.of(2014, 1, 1)
val dateEnd = LocalDate.of(2015, 1, 1)
val numProducts = 10
val numOfSalesTx = 10000
val numOfReturnTx = 1000

// Used to round generated prices to two decimal places
val df = new DecimalFormat("0.00")
val mockNeat = MockNeat.threadLocal()

val productData = (1 to numProducts).map(i=>{
    Product(i,s"Product${i}",df.format(mockNeat.floats().range(10.0f, 20.0f).get()).toFloat)
})

## Generate Transaction Data
- Product sales transactions
- Product return transactions

In [None]:
import scala.collection.JavaConverters._

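// MockUnit that picks a random index into productData.
// range(lower, upper) excludes the upper bound, so numProducts covers every product.
val prodIndex = mockNeat.ints().range(0, numProducts)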

val productSalesData = (1 to numOfSalesTx).map(i=>{
    ProductSales(productData(prodIndex.get()).id, mockNeat.ints().range(1,5).get(),
                 mockNeat.localDates.between(dateStart, dateEnd).mapToString().get())
})

val productReturnData = (1 to numOfReturnTx).map(i=>{
    ProductReturn(productData(prodIndex.get()).id, mockNeat.ints().range(1,5).get(),
                  mockNeat.localDates.between(dateStart, dateEnd).mapToString().get())
})


# Save Data
- Product: stored in a Spark table
- Product sales transactions: stored in a Delta table
- Product return transactions: stored in an Iceberg table

In [None]:
sc.parallelize(productData).toDF.write.mode("overwrite").saveAsTable("product")
sc.parallelize(productSalesData).toDF.write.format("delta").mode("overwrite").saveAsTable("salestx")
sc.parallelize(productReturnData).toDF.write.format("iceberg").mode("overwrite").saveAsTable("demo_iceberg.returntx")
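
The Delta table keeps a commit log of its own, which can be inspected right after the write. The following is a minimal sketch using the `DeltaTable` API from the Delta Lake library loaded in the configuration; `session` and `salesHistory` are names introduced here for illustration, and the selected columns are a subset of what `history()` returns.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

// Reuses the notebook's running session
val session = SparkSession.builder().getOrCreate()

// Commit history of the salestx Delta table, newest version first
val salesHistory = DeltaTable.forName(session, "salestx").history()

salesHistory.
    select("version", "timestamp", "operation", "operationParameters").
    show(false)
```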

## Iceberg Table Metadata Log Entries 

In [None]:
%%sql
select * from demo_iceberg.returntx.metadata_log_entries

## Iceberg Table History and Application ID

In [None]:
%%sql
select
    h.made_current_at,
    s.operation,
    h.snapshot_id,
    h.is_current_ancestor,
    s.summary['spark.app.id']
from demo_iceberg.returntx.history h
join demo_iceberg.returntx.snapshots s
  on h.snapshot_id = s.snapshot_id
order by made_current_at
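
The `summary` map on each snapshot carries more than the application ID. The sketch below reads two further entries from Scala. It assumes the snapshot summary contains the keys `total-records` and `total-data-files`; if a key is absent, the corresponding column is expected to be null rather than failing under Spark's default (non-ANSI) settings. `session` and `snapshotStats` are illustrative names.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the notebook's running session
val session = SparkSession.builder().getOrCreate()

// One row per snapshot with record and data-file totals taken from the summary map
val snapshotStats = session.sql("""
  SELECT snapshot_id,
         committed_at,
         operation,
         summary['total-records']    AS total_records,
         summary['total-data-files'] AS total_data_files
  FROM demo_iceberg.returntx.snapshots
  ORDER BY committed_at
""")

snapshotStats.show(false)
```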

## Iceberg Data Files

In [None]:
%%sql
SELECT * FROM demo_iceberg.returntx.files

# Read Delta Sales and Iceberg Return Transactions
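- Read sales transactions from the Delta table
- Read return transactions from the Iceberg table
- Read the product master from the Spark Parquet table
- Join the three and register the result as the temporary view `prodsalereturn`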

In [None]:
import org.apache.spark.sql.functions.{col, sum}

val salesTxDF = spark.sql("SELECT * FROM salestx")
val returnTxDF = spark.sql("SELECT * FROM demo_iceberg.returntx")
val productDF = spark.sql("SELECT * FROM product")

// Total the quantities per product and date first, so the join below
// cannot multiply rows when a key has several sales or returns
val dailySalesDF = salesTxDF.
    groupBy(col("productId"), col("saleDate").as("date")).
    agg(sum("soldQty").as("soldQty"))
val dailyReturnsDF = returnTxDF.
    groupBy(col("productId"), col("returnDate").as("date")).
    agg(sum("returnQty").as("returnQty"))

// Full outer join keeps keys that have only sales or only returns;
// joining on column names yields a single non-null productId and date
val salesReturnTxDF = dailySalesDF.
    join(dailyReturnsDF, Seq("productId", "date"), "fullouter").
    na.fill(0).
    withColumn("absSoldQty", col("soldQty") - col("returnQty"))

// Join with the product master to price the net quantity
val productSaleReturnTx = salesReturnTxDF.
    join(productDF, salesReturnTxDF("productId") === productDF("id"), "inner").
    withColumn("salesDollar", col("absSoldQty") * col("price"))
productSaleReturnTx.createOrReplaceTempView("prodsalereturn")
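
The arithmetic behind `absSoldQty` and `salesDollar` can be checked on a tiny data set with plain Scala collections. This is an illustrative sketch only: the `Sale` and `Return` classes, the prices, and the rows are hypothetical and hand-written, and the sketch totals quantities per product and date before combining both sides, treating a missing side as 0.

```scala
// Hypothetical miniature data set, hand-written for illustration only
case class Sale(productId: Int, date: String, soldQty: Int)
case class Return(productId: Int, date: String, returnQty: Int)

val prices = Map(1 -> 10.0, 2 -> 12.5)
val sales = Seq(
  Sale(1, "2014-01-01", 3),
  Sale(1, "2014-01-01", 2),
  Sale(2, "2014-01-02", 4))
val returns = Seq(
  Return(1, "2014-01-01", 1),
  Return(2, "2014-01-03", 2))

// Total quantity per (productId, date) key on each side
val soldByKey = sales.groupBy(s => (s.productId, s.date)).
  map { case (key, rows) => key -> rows.map(_.soldQty).sum }
val returnedByKey = returns.groupBy(r => (r.productId, r.date)).
  map { case (key, rows) => key -> rows.map(_.returnQty).sum }

// Full outer join: take the union of keys and count a missing side as 0
val netRows = (soldByKey.keySet ++ returnedByKey.keySet).toSeq.sorted.map {
  case key @ (productId, date) =>
    val netQty = soldByKey.getOrElse(key, 0) - returnedByKey.getOrElse(key, 0)
    (productId, date, netQty, netQty * prices(productId))
}

netRows.foreach(println)
// (1,2014-01-01,4,40.0)
// (2,2014-01-02,4,50.0)
// (2,2014-01-03,-2,-25.0)
```

The last row shows why a full outer join is used: a date with returns but no sales still contributes a negative net quantity.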

## Summary By Product Sold Qty

In [None]:
%%sql
SELECT name, SUM(absSoldQty) FROM prodsalereturn GROUP BY name

## Summary By Dollar Sales By Date

In [None]:
%%sql
SELECT date, SUM(salesDollar) FROM prodsalereturn GROUP BY date
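
The same kind of summary can also be produced with the DataFrame API, which makes it easy to combine both measures and sort the result. A minimal sketch, assuming the temporary view `prodsalereturn` is registered in the current session; the output column names `netSoldQty` and `netSalesDollar` are chosen here for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

// Reuses the notebook's running session
val session = SparkSession.builder().getOrCreate()

// Net quantity and net dollar sales per product, highest dollar sales first
val summaryByProduct = session.table("prodsalereturn").
    groupBy("name").
    agg(sum("absSoldQty").as("netSoldQty"),
        sum("salesDollar").as("netSalesDollar")).
    orderBy(col("netSalesDollar").desc)

summaryByProduct.show()
```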