
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# 2.5 Lab - Create and Execute Unit Tests

### Estimated Duration: 15-20 minutes

By the end of this lab, you will have practiced creating and executing unit tests for the modularized functions that were created in the previous lab.

## REQUIRED - SELECT CLASSIC COMPUTE

Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:


1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

2. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

   - Click **More** in the drop-down.

   - In the **Attach to an existing compute resource** window, use the first drop-down to select your unique cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

2. Find the triangle icon to the right of your compute cluster name and click it.

3. Wait a few minutes for the cluster to start.

4. Once the cluster is running, complete the steps above to select your cluster.

## A. View the Functions in the Python File

1. From the **./Course Notebooks/M02 - CI** folder, navigate to the file **[./src_lab/lab_functions/transforms.py]($./src_lab/lab_functions/transforms.py)**. This Python file contains the modularized functions from the previous lab. 

    Confirm that the file contains the `convert_miles_to_km` and `uppercase_columns_names` functions.


**Code in the transforms.py file:**
```python
from pyspark.sql import functions as F


def convert_miles_to_km(df, new_column_name, miles_column):
    return df.withColumn(new_column_name, F.round(F.col(miles_column) * 1.60934, 2))


def uppercase_columns_names(df):
    return df.select([F.col(col).alias(col.upper()) for col in df.columns])
```

## B. Create Unit Tests

Create two unit tests, one for each of the functions in the file above. 

It's typically easier to develop the unit tests within the notebook (or locally) and then move them to a separate **.py** file later to execute them with `pytest`.

1. Start by importing the `transforms` module from the `lab_functions` package in the **src_lab** directory so that its functions are available in this notebook.

**HINT:** The **src_lab** folder is in the same directory as this notebook, so you do not need `sys.path.append()` to extend the Python path. The notebook's current directory is added to the path by default.

In [0]:
from src_lab.lab_functions import transforms
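
The import above works without any path changes because **src_lab** sits next to this notebook. If the package lived elsewhere, its parent folder would need to be on `sys.path` first. The following is a minimal sketch of that case only; the parent-directory location is a hypothetical layout, not the one used in this lab.

```python
import os
import sys

# Hypothetical location: pretend the package lives one level above the working directory.
module_root = os.path.abspath(os.path.join(os.getcwd(), ".."))

# Add it only once so that re-running the cell does not pile up duplicate entries.
if module_root not in sys.path:
    sys.path.append(module_root)
```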

2. Complete the `test_uppercase_columns_function` unit test to test the custom `transforms.uppercase_columns_names()` function. 

    Use the starter code below to help guide you. After you are done, run the unit test function and confirm that it does not raise an error.

**NOTE:** There are a variety of ways to test this function. We will keep it simple for this lab.

**SOLUTION:** The solution can be found in the **[./tests_lab/lab_unit_test_solution.py]($./tests_lab/lab_unit_test_solution.py)** file.

In [0]:
def test_uppercase_columns_function():

    ## Fake DataFrame with random column names
    data = [(1, 5.0, 1, 1, 1, 1)]
    columns = ["id", "trip_distance", "My_Column", "WithNumbers123", "WithSymbolX@#", "With Space"]
    df = spark.createDataFrame(data, columns)

    ## Apply the transforms.uppercase_columns_names function to return the actual column names
    actual_df = transforms.uppercase_columns_names(df)
    actual_columns = actual_df.columns

    ## Create a list of the expected column names
    expected_columns = ['ID', 'TRIP_DISTANCE', 'MY_COLUMN', 'WITHNUMBERS123', 'WITHSYMBOLX@#', "WITH SPACE"]

    ## Perform a test of the actual columns names and expected column names using a simple python assert statement
    assert actual_columns == expected_columns
    print('Test Passed!')

test_uppercase_columns_function()

Test Passed!
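
The expected column names in the test above can be reasoned about with plain Python, because `uppercase_columns_names` aliases each column with `col.upper()`. The sketch below applies `str.upper()` to the same sample names to show that digits, symbols, and spaces pass through unchanged. It does not use Spark and is only an illustration.

```python
# Same sample column names used in the unit test above.
columns = ["id", "trip_distance", "My_Column", "WithNumbers123", "WithSymbolX@#", "With Space"]

# str.upper() only changes cased letters; digits, symbols, and spaces are left as they are.
expected_columns = [name.upper() for name in columns]

print(expected_columns)
```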


3. Complete the unit test `test_convert_miles_to_km_function` to test the custom `transforms.convert_miles_to_km` function. Use the `pyspark.testing.utils.assertDataFrameEqual` function to test the actual DataFrame against the expected DataFrame.

    Use the starter code below to help guide you. After you are done, run the unit test and confirm that it does not raise an error.

**NOTE:** There are a variety of unit tests you can run on the function. This is a simple example that tests the function on positive and null values. We should also test this function on negative values, but we will ignore those for this lab for simplicity.

**HINT:** [pyspark.testing.assertDataFrameEqual](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.testing.assertDataFrameEqual.html)

In [0]:

from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, DoubleType
from pyspark.testing.utils import assertDataFrameEqual


def test_convert_miles_to_km_function():
    # Prepare a DataFrame with sample data
    data = [(1.0,), (5.5,), (None,)]
    schema = StructType([
        StructField("trip_distance_miles", DoubleType(), True)  # Allow null values by setting nullable=True
    ])
    actual_df = spark.createDataFrame(data, schema)


    ## Apply the function on the sample data and store the actual DataFrame
    actual_df = transforms.convert_miles_to_km(df = actual_df, 
                                               new_column_name="trip_distance_km",   ## Name of the new column
                                               miles_column="trip_distance_miles")   ## Name of the source miles column


    ## Create an expected DataFrame with a defined schema using StructField DoubleType for each column
    data = [
        (1.0, 1.61),   # Row with values
        (5.5, 8.85),   # Row with values
        (None, None) # Row with null values
    ]

    ## Define schema
    schema = StructType([
        StructField("trip_distance_miles", DoubleType(), True),
        StructField("trip_distance_km", DoubleType(), True)
    ])

    ## Create expected DataFrame
    expected_df = spark.createDataFrame(data, schema)


    ## Compare the actual and expected DataFrames using assertDataFrameEqual
    assertDataFrameEqual(actual_df, expected_df)
    print('Test Passed!')


## Run the unit test
test_convert_miles_to_km_function()

Test Passed!
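
The expected values `1.61` and `8.85` in the test above come from multiplying the miles by `1.60934` and rounding to two decimals. The sketch below reproduces that arithmetic in plain Python as a sanity check. It uses `ROUND_HALF_UP` to mirror the half-up behavior of Spark's `F.round`, although Spark works on doubles, so results on exact ties could differ. The helper name `miles_to_km` and the negative input are illustrative and not part of the lab files.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

KM_PER_MILE = Decimal("1.60934")  # Same factor as in transforms.py


def miles_to_km(miles: Optional[float]) -> Optional[float]:
    """Plain-Python mirror of the conversion, for checking expected values by hand."""
    if miles is None:
        return None  # A null input stays null, as in the Spark version
    km = Decimal(str(miles)) * KM_PER_MILE
    return float(km.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


# Positive, null, and negative inputs
for miles in [1.0, 5.5, None, -2.0]:
    print(miles, "->", miles_to_km(miles))
```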


## C. Use `pytest` to Execute the Unit Tests

Next, use `pytest` to execute the unit tests. For this portion of the lab, you can do one of the following:

**C1. DURING A LIVE CLASS**
- Use `pytest` to execute the unit tests in the solution Python file that is already provided for you: **./tests_lab/lab_unit_test_solution.py**.

**C2. CHALLENGE (COMPLETE AFTER CLASS)**
- Migrate your unit tests from above into a new **your-file-name.py** file in the **tests_lab/** folder, and then use `pytest` to execute your file. Make sure to add a `pytest` fixture that creates a Spark session and to import the packages the unit tests need.

1. Install the `pytest` package, version 8.3.4.

In [0]:
!pip install pytest==8.3.4

Collecting pytest==8.3.4
  Obtaining dependency information for pytest==8.3.4 from https://files.pythonhosted.org/packages/11/92/76a1c94d3afee238333bc0a42b82935dd8f9cf8ce9e336ff87ee14d9e1cf/pytest-8.3.4-py3-none-any.whl.metadata
  Using cached pytest-8.3.4-py3-none-any.whl.metadata (7.5 kB)
Collecting iniconfig (from pytest==8.3.4)
  Obtaining dependency information for iniconfig from https://files.pythonhosted.org/packages/2c/e1/e6716421ea10d38022b952c159d5161ca1193197fb744506875fbb87ea7b/iniconfig-2.1.0-py3-none-any.whl.metadata
  Using cached iniconfig-2.1.0-py3-none-any.whl.metadata (2.7 kB)
Collecting pluggy<2,>=1.5 (from pytest==8.3.4)
  Obtaining dependency information for pluggy<2,>=1.5 from https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl.metadata
  Using cached pluggy-1.6.0-py3-none-any.whl.metadata (4.8 kB)
Using cached pytest-8.3.4-py3-none-any.whl (343 kB)
Using cached pluggy-1.6.0-py3-


2. If you are creating your own **.py** file for the challenge, you can enable the autoreload extension. It automatically reloads imported modules, so each run picks up the changes you make in the **.py** file. 

    Use the following commands in any notebook cell or Python file to enable the autoreload extension.

    Documentation: [Autoreload for Python modules](https://docs.databricks.com/en/files/workspace-modules.html#autoreload-for-python-modules)

In [0]:
%load_ext autoreload
%autoreload 2

3. Execute `pytest` on the **./tests_lab/lab_unit_test_solution.py** file. Run the cell and confirm both unit tests pass.

**NOTE:** If you are completing the challenge, modify the path to test your specific **.py** file.

In [0]:
import pytest
import sys

sys.dont_write_bytecode = True

retcode = pytest.main(["./tests_lab/lab_unit_test_solution.py", "-v", "-p", "no:cacheprovider"])

assert retcode == 0, "The pytest invocation failed. See the log for details."

platform linux -- Python 3.11.11, pytest-8.3.4, pluggy-1.6.0 -- /local_disk0/.ephemeral_nfs/envs/pythonEnv-4689f5dc-32bc-4a42-9c3e-d36098b3d3f1/bin/python
rootdir: /Workspace/Users/labuser10685686_1750443211@vocareum.com/devops-essentials-for-data-engineering-2.0.4/DevOps Essentials for Data Engineering
configfile: pytest.ini
collecting ... collected 2 items

tests_lab/lab_unit_test_solution.py::test_uppercase_columns_function PASSED [ 50%]
tests_lab/lab_unit_test_solution.py::test_convert_miles_to_km_function PASSED [100%]
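
`pytest.main` returns an exit code, and the cell above treats anything other than `0` as a failure. The sketch below maps pytest's documented exit codes to short messages, which can make a failed assertion easier to interpret. The helper `describe_exit_code` is a hypothetical name introduced here, not part of pytest or the lab files.

```python
# Exit codes documented by pytest (also exposed as the pytest.ExitCode enum).
PYTEST_EXIT_CODES = {
    0: "All tests were collected and passed.",
    1: "Tests were collected and run, but some failed.",
    2: "Test execution was interrupted by the user.",
    3: "An internal error occurred while executing tests.",
    4: "pytest command-line usage error.",
    5: "No tests were collected.",
}


def describe_exit_code(retcode: int) -> str:
    """Return a readable message for a pytest exit code."""
    # int() also accepts the ExitCode enum that pytest.main returns.
    code = int(retcode)
    return PYTEST_EXIT_CODES.get(code, f"Unrecognized exit code: {code}")


print(describe_exit_code(0))
print(describe_exit_code(5))
```

For example, `assert retcode == 0, describe_exit_code(retcode)` would surface the reason directly in the assertion message.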




&copy; 2025 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="blank">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy" target="blank">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use" target="blank">Terms of Use</a> | 
<a href="https://help.databricks.com/" target="blank">Support</a>
