# Defining the Test source

## Adding an abstraction layer for testability 

By defining the ingestion source in an external table, we can easily switch from the production source to a test one.

This lets you replace ingestion from a Kafka server in production with a small CSV file in your tests.

This notebook corresponds to the TEST stream (the **blue** input source on the left).

<img width="1000px" src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/dlt-advanecd/DLT-advanced-unit-test-1.png"/>


## Test source for the customer dataset


This notebook is used in tests only. We generate a fixed test dataset and use it for our unit tests.

Note that the test pipeline has to be run with a full refresh so that all the test data is consumed again.
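As a minimal sketch of what "a fixed test dataset" could look like, the snippet below writes a small, deterministic JSON-lines file and CSV file with the Python standard library. The function name `write_fixed_test_data`, the file names, the `email` field, the sample values and the CSV header row are all assumptions made for illustration; only the column names `id`, `age`, `annual_income` and `spending_core` come from the schema hints used in this notebook. In the demo the files live under a `/Volumes/...` path, whereas the sketch writes to whatever directory it is given.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_fixed_test_data(base_dir):
    """Write a small, deterministic users JSON file and spend CSV file under base_dir."""
    base = Path(base_dir)
    users_dir = base / "users_json"
    spend_dir = base / "spend_csv"
    users_dir.mkdir(parents=True, exist_ok=True)
    spend_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical user records: only `id` is known from the notebook, `email` is made up.
    users = [
        {"id": 1, "email": "alice@example.com"},
        {"id": 2, "email": "bob@example.com"},
        {"id": None, "email": "missing-id@example.com"},  # edge case: missing id
    ]
    users_file = users_dir / "users.json"
    with users_file.open("w", encoding="utf-8") as f:
        for user in users:
            f.write(json.dumps(user) + "\n")  # one JSON object per line

    # Hypothetical spend records using the column names from the schema hints.
    spend_rows = [
        {"id": 1, "age": 34, "annual_income": 52000.0, "spending_core": 61.5},
        {"id": 2, "age": 51, "annual_income": 87000.0, "spending_core": 12.0},
    ]
    spend_file = spend_dir / "spend.csv"
    with spend_file.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=["id", "age", "annual_income", "spending_core"]
        )
        writer.writeheader()  # assumption: the test CSV carries a header row
        writer.writerows(spend_rows)

    return users_file, spend_file


if __name__ == "__main__":
    # Write the files to a throwaway directory and show where they ended up.
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_fixed_test_data(tmp):
            print(path)
```

Because the rows are hard-coded, every test run sees exactly the same input, which keeps the expected results of the unit tests stable.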

In [0]:
import dlt

# Disable the incompatible-view check for this test pipeline.
spark.conf.set("pipelines.incompatibleViewCheck.enabled", "false")

# Test version of the raw user source: Auto Loader reads the fixed JSON test files.
@dlt.view(comment="Raw user data - Test")
def raw_user_data():
  return (
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaHints", "id int")  # read `id` as an int
      .load("/Volumes/main/dbdemos_dlt_unit_test/raw_data/test/users_json/*.json"))

In [0]:
# Test version of the raw spend source: Auto Loader reads the fixed CSV test files.
@dlt.view(comment="Raw spend data - Test")
def raw_spend_data():
  return (
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaHints", "id int, age int, annual_income float, spending_core float")
      .load("/Volumes/main/dbdemos_dlt_unit_test/raw_data/test/spend_csv/*.csv"))