
# ![Delta Lake Tiny Logo](https://pages.databricks.com/rs/094-YMS-629/images/delta-lake-tiny-logo.png)  2/ SILVER table: store the content of our events in a structured table

<img style="float:right; height: 230px; margin: 0px 30px 0px 30px" src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/main/images/product/streaming-sessionization/sessionization_silver.png">

**Scala version:** This notebook implements the same logic as [the Python version]($../02-Delta-session-SILVER), but using Scala.

We can create a new silver table containing all our data.

This lets us store all our data in a proper table, with the content of the JSON stored in a columnar format.

Should our message content change, we'll be able to adapt the transformation of this job to always allow SQL queries over this SILVER table.

If we realize our logic was flawed from the beginning, it'll also be easy to start a new cluster and re-process the entire table with a better transformation!


In [0]:
%run ../_resources/00-setup-scala $reset_all_data=false

In [0]:
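// Explicit imports, in case the setup notebook does not already provide them
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger

waitForTable("events_raw") // Wait until the previous table is created to avoid errors if all notebooks are started at once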

// For the sake of the example we'll get the schema from a json row. In a real deployment we could query a schema registry.
val rowExample = """{"user_id": "5ee7ba5f-77b2-47e4-8061-dd89f19626f3", "platform": "other", "event_id": "03c3d410-f01f-4f51-8ee0-7fab9be96855", "event_date": 1669301257, "action": "view", "uri": "https://databricks.com/home.htm"}"""
val json_schema = schema_of_json(rowExample)

val stream = spark
  .readStream
  .table("events_raw")
  // === Our transformation, easy to adapt if our logic changes ===
  .withColumn("json", from_json($"value", json_schema))
  .select("json.*")
  // Drop null events
  .where("event_id is not null and user_id is not null and event_date is not null")
  .withColumn("event_datetime", to_timestamp(from_unixtime($"event_date")))

display(stream)
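
The cell above infers the schema from a sample row. As a hedged alternative, the sketch below declares the schema by hand with a `StructType` and applies the same transformation to a few in-memory rows in batch mode, which makes it easy to see how incomplete or malformed messages are dropped. The names `explicitSchema`, `sampleRows` and `parsedSample` are illustrative and not part of the demo, the field types are assumptions based on the example row, and the notebook's `spark` session is assumed to be available.

```scala
import org.apache.spark.sql.functions.{col, from_json, from_unixtime, to_timestamp}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}
import spark.implicits._

// Hand-written schema (assumption: types mirror the example row above)
val explicitSchema = StructType(Seq(
  StructField("user_id", StringType),
  StructField("platform", StringType),
  StructField("event_id", StringType),
  StructField("event_date", LongType),
  StructField("action", StringType),
  StructField("uri", StringType)
))

// Hypothetical sample: one valid event, one incomplete event, one non-JSON value
val sampleRows = Seq(
  """{"user_id": "u1", "platform": "other", "event_id": "e1", "event_date": 1669301257, "action": "view", "uri": "https://databricks.com/home.htm"}""",
  """{"user_id": "u2", "platform": "ios"}""",
  "not json"
).toDF("value")

// Same steps as the streaming transformation, applied to a batch DataFrame
val parsedSample = sampleRows
  .withColumn("json", from_json(col("value"), explicitSchema))
  .select("json.*")
  .where("event_id is not null and user_id is not null and event_date is not null")
  .withColumn("event_datetime", to_timestamp(from_unixtime(col("event_date"))))

parsedSample.show(false)
```

Only the first sample row should survive the filter, since the other two produce null values for the required fields.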

In [0]:
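stream
  // Tolerate events arriving up to 1 hour late
  .withWatermark("event_datetime", "1 hours")
  // Keep a single row per event_id
  .dropDuplicates("event_id")
  .writeStream
  // Start a micro-batch every 20 seconds
  .trigger(Trigger.ProcessingTime("20 seconds"))
  // Checkpoint used to track progress and recover after a restart
  .option("checkpointLocation", s"$volumeFolder/checkpoints/silver")
  // Let the table schema evolve if new fields appear
  .option("mergeSchema", "true")
  .table("events")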

waitForTable("events")
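
A note on deduplication state: as we understand the Structured Streaming guide, the watermark only lets Spark discard old deduplication state when the event-time column is part of the deduplication key, so `dropDuplicates("event_id")` on its own may keep state for every `event_id` seen so far. The sketch below shows two possible ways to bound that state. It uses the built-in `rate` source as a hypothetical stand-in for the notebook's stream, the names `demoEvents`, `dedupWithinWatermark` and `dedupOnIdAndTime` are illustrative, and the first option assumes Spark 3.5 or later. Nothing is started here; the queries are only defined.

```scala
import org.apache.spark.sql.functions.col

// Hypothetical stand-in for the notebook's stream: the "rate" source emits
// (timestamp, value) rows, from which we derive a repeating event_id.
val demoEvents = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "5")
  .load()
  .withColumnRenamed("timestamp", "event_datetime")
  .withColumn("event_id", (col("value") % 10).cast("string"))

// Option 1 (assumes Spark 3.5+): deduplicate on event_id only, while letting
// the watermark expire state for ids that are older than the delay threshold.
val dedupWithinWatermark = demoEvents
  .withWatermark("event_datetime", "1 hours")
  .dropDuplicatesWithinWatermark("event_id")

// Option 2: include the event-time column in the key. State can then be
// dropped, but only rows with the same id AND the same timestamp are removed.
val dedupOnIdAndTime = demoEvents
  .withWatermark("event_datetime", "1 hours")
  .dropDuplicates("event_id", "event_datetime")
```

Which option fits depends on whether a replayed event carries exactly the same timestamp as the original; if it might not, the first option is likely the closer match to the intent of this notebook.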

In [0]:
%sql
SELECT * FROM events;

Let's make sure we don't have any duplicate or null events (they should have been filtered out):

In [0]:
%sql
SELECT count(*) event_count, event_id FROM events
  GROUP BY event_id
    HAVING event_count > 1 or event_id is null
  ORDER BY event_id;
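
The same check can be written with the DataFrame API, for instance to run it from an automated test. This is a minimal sketch on a hypothetical in-memory sample (`sampleEvents` and `offenders` are illustrative names); on the real table, `spark.table("events")` would take the place of `sampleEvents` and the result is expected to be empty.

```scala
import org.apache.spark.sql.functions.{col, count, lit}
import spark.implicits._

// Hypothetical sample: "b" is duplicated and one event_id is null
val sampleEvents = Seq("a", "b", "b", null).toDF("event_id")

// Keep only the ids that appear more than once, plus the null id
val offenders = sampleEvents
  .groupBy("event_id")
  .agg(count(lit(1)).as("event_count"))
  .where(col("event_count") > 1 || col("event_id").isNull)
  .orderBy("event_id")

offenders.show()
```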

#### Let's display a real-time view of our traffic using our stream, grouped by platform, for the last 2 minutes

In [0]:
spark.readStream.table("events").createOrReplaceTempView("events_stream")

In [0]:
%sql
-- Visualization: bar plot with X=start Y=count (SUM, group by platform)
WITH event_monitoring AS (
  SELECT WINDOW(event_datetime, "10 seconds") w, count(*) c, platform
  FROM events_stream
  WHERE CAST(event_datetime AS INT) > CAST(CURRENT_TIMESTAMP() AS INT) - 120
  GROUP BY w, platform
)
SELECT w.*, c, platform FROM event_monitoring
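
For readers who prefer to stay in Scala, the windowed aggregation above can also be sketched with the DataFrame API. The example below is an approximation under stated assumptions: it uses the built-in `rate` source with a made-up `platform` column as a stand-in for the `events` table, and `demoTraffic` and `eventMonitoring` are illustrative names. The query is only defined, not started; in the notebook it could be passed to `display`.

```scala
import org.apache.spark.sql.functions.{col, count, current_timestamp, lit, when, window}

// Hypothetical stand-in for spark.readStream.table("events")
val demoTraffic = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "5")
  .load()
  .withColumnRenamed("timestamp", "event_datetime")
  .withColumn("platform", when(col("value") % 2 === 0, "ios").otherwise("android"))

val eventMonitoring = demoTraffic
  // Keep only events from the last 120 seconds
  .where(col("event_datetime").cast("long") > current_timestamp().cast("long") - 120)
  // Count events per 10-second window and per platform
  .groupBy(window(col("event_datetime"), "10 seconds").as("w"), col("platform"))
  .agg(count(lit(1)).as("c"))
  // Flatten the window struct, like `w.*` in the SQL version
  .select("w.start", "w.end", "c", "platform")
```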

Let's find our top 10 most active pages, updated in real time with a streaming query:

In [0]:
%sql
-- Visualization: pie with X=URL Y=count (SUM)
select count(*) as count, uri from events_stream group by uri order by count desc limit 10;

In [0]:
stopAllStreams(sleepTime=120)

### We now have our silver table ready to be used!

Let's compute our sessions based on this table with **[a Gold Table]($./03-Delta-session-GOLD-scala)**


**[Go Back]($./01-Delta-session-BRONZE-scala)**