# Unit Tests for Pipeline
This notebook contains the unit tests for the Pipeline class.

## Import dependencies for JUnit Output
We'll install the `unittest-xml-reporting` library from PyPI and restart Python so it can be imported. Alternatively, install it as a cluster library.

In [2]:
dbutils.library.installPyPI('unittest-xml-reporting')
dbutils.library.restartPython()

## Imports

In [4]:
from pyspark.sql import DataFrame, Row

import unittest
import xmlrunner
import uuid
import io

## Call the Pipeline notebook

The next cell will execute the Pipeline notebook. This will create all of the classes and other definitions in the scope of this notebook so we can reference them later.

In [6]:
%run ./pipeline

## Create some mock classes
To promote well-structured, testable code, we mock DBUtils and a FileAccess class so that environment-specific details are abstracted away from the system under test. In this case, we're interested in testing the **Pipeline** class and not whether Databricks can read data from storage.

In [8]:
class MockDbUtilsSecrets:
  def __init__(self, secrets: dict) -> None:
    self.secrets = secrets
    
  def get(self, scope: str, key: str) -> str:
    return self.secrets.get(scope, {}).get(key, '')

class MockDbUtils:
  def __init__(self, secrets: dict) -> None:
    self.secrets = MockDbUtilsSecrets(secrets)
    
class MockFileAccess(FileAccess):
  def read(self, path: str) -> DataFrame:
    self.source = self.spark.createDataFrame([Row(id=1, name='Item 1', value=str(uuid.uuid4()))])
    return self.source.cache()
    
  def write(self, path: str, df: DataFrame) -> None:
    self.target = df
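
`FileAccess` itself is defined in the pipeline notebook and is not shown here. As a rough, Spark-free sketch of the shape the mock above seems to rely on (a constructor taking `spark` and `dbutils`, plus `read` and `write`), something like the following would do. The names `FileAccessSketch` and `InMemoryFileAccess` are hypothetical and exist only for illustration; the real base class may differ.

```python
from typing import Any


class FileAccessSketch:
    """Hypothetical stand-in for the FileAccess base class from ./pipeline."""

    def __init__(self, spark: Any, dbutils: Any) -> None:
        self.spark = spark      # a SparkSession in the real notebook
        self.dbutils = dbutils  # real or mocked DBUtils

    def read(self, path: str) -> Any:
        raise NotImplementedError

    def write(self, path: str, df: Any) -> None:
        raise NotImplementedError


class InMemoryFileAccess(FileAccessSketch):
    """Test double: serves canned rows and records whatever is written."""

    def __init__(self, spark: Any, dbutils: Any, rows: list) -> None:
        super().__init__(spark, dbutils)
        self.rows = rows
        self.written = {}

    def read(self, path: str) -> Any:
        # Ignore the path and return a copy of the canned rows
        return list(self.rows)

    def write(self, path: str, df: Any) -> None:
        # Keep the data in memory so a test can inspect it afterwards
        self.written[path] = df


files = InMemoryFileAccess(spark=None, dbutils=None, rows=[{"id": 1, "name": "Item 1"}])
data = files.read("any/path")
files.write("out/path", data)
```

Because the code under test only talks to the `read`/`write` interface, swapping the storage implementation for a test double requires no change to the pipeline itself.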
    

## Create our tests class
The following class contains the unit tests we execute to prove that the Pipeline class functions as expected.

In [10]:
class PipelineTests(unittest.TestCase):
  @classmethod
  def setUpClass(cls):
    dbutils = MockDbUtils({})  # no secrets are needed for these tests
    files = MockFileAccess(spark, dbutils)
    pipeline = Pipeline(spark, files)
    pipeline.run()    
    cls.source = files.source.collect()
    cls.records = files.target.collect()
    
  def test_record_count(self):
    self.assertEqual(1, len(self.records), 'There should be a single row')
    
  def test_name_transformation(self):
    self.assertEqual('ITEM 1', self.records[0].name, 'The name should be uppercase')
    
  def test_input_and_output_value_matches(self):
    self.assertEqual(self.source[0].value, self.records[0].value, 'The value should match')
    
  def test_record_is_processed(self):
    self.assertEqual(1, self.records[0].processed, 'The record should be processed')
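
Because `setUpClass` runs once per test class, the pipeline above executes a single time and all four tests assert against the same collected rows. A minimal, Spark-free sketch of that behaviour follows; `expensive_run` and `SharedFixtureTests` are hypothetical names, with `expensive_run` standing in for `pipeline.run()` followed by `collect()`.

```python
import unittest

run_count = 0  # counts how often the expensive step executes


def expensive_run() -> list:
    """Hypothetical stand-in for running the pipeline and collecting rows."""
    global run_count
    run_count += 1
    return [{"name": "ITEM 1", "processed": 1}]


class SharedFixtureTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Executed once for the whole class, not once per test method
        cls.records = expensive_run()

    def test_name(self):
        self.assertEqual("ITEM 1", self.records[0]["name"])

    def test_processed(self):
        self.assertEqual(1, self.records[0]["processed"])


suite = unittest.TestLoader().loadTestsFromTestCase(SharedFixtureTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The trade-off is that the tests share state: this is safe here because they only read the collected rows, but a test that mutated `records` could affect the others.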


## Execute the Tests
The following cell creates a TestSuite and executes it to produce JUnit XML output. The output is written to DBFS so that a CI/CD pipeline can collect it.

The process to include in CI/CD is:
* deploy the pipeline, driver and tests notebooks in a folder
* execute the tests notebook
* collect the JUnit XML output
* publish the test results

In [12]:
loader = unittest.TestLoader()
suite = unittest.TestSuite()
tests = loader.loadTestsFromTestCase(PipelineTests)
suite.addTests(tests)

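import os

# Run the suite, capturing the JUnit XML in memory
out = io.BytesIO()
runner = xmlrunner.XMLTestRunner(output=out)
runner.run(suite)
out.seek(0)

# Make sure the target folder exists before writing the report to DBFS
report_path = '/dbfs/tmp/pipeline/junit/TEST-Pipeline.xml'
os.makedirs(os.path.dirname(report_path), exist_ok=True)
with open(report_path, 'wb') as f:
  f.write(out.read())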
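
For the collect and publish steps, a CI/CD job usually needs to decide pass or fail from the XML report. The sketch below shows one way this might be done with the standard library; it assumes the report uses the common JUnit attributes `tests`, `failures`, `errors` and `skipped` on each `testsuite` element. `summarize_junit` and `SAMPLE_REPORT` are hypothetical names, and the sample XML is illustrative rather than real output from this notebook.

```python
import xml.etree.ElementTree as ET

# Illustrative report only; the real file is produced by XMLTestRunner
SAMPLE_REPORT = """
<testsuites>
  <testsuite name="PipelineTests" tests="4" failures="1" errors="0" skipped="0">
    <testcase classname="PipelineTests" name="test_record_count"/>
    <testcase classname="PipelineTests" name="test_name_transformation">
      <failure message="The name should be uppercase"/>
    </testcase>
    <testcase classname="PipelineTests" name="test_input_and_output_value_matches"/>
    <testcase classname="PipelineTests" name="test_record_is_processed"/>
  </testsuite>
</testsuites>
"""


def summarize_junit(xml_text: str) -> dict:
    """Sum the counters of every testsuite element in a JUnit XML report."""
    root = ET.fromstring(xml_text.strip())
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    # iter() also yields the root itself, so a bare <testsuite> root works too
    for suite in root.iter("testsuite"):
        for key in list(totals):
            totals[key] += int(suite.get(key, 0))
    # The run passes only if nothing failed or errored
    totals["passed"] = totals["failures"] == 0 and totals["errors"] == 0
    return totals


summary = summarize_junit(SAMPLE_REPORT)
print(summary)
```

A build step could exit with a non-zero status when `summary["passed"]` is false, which is typically enough for a CI system to mark the run as failed.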

In [13]:
out.seek(0)
print(out.read().decode('utf-8'))

In [14]:
PipelineTests.__dict__