# Data Quality in Databricks Workflows (jobs) with Pydantic

> ⚠️ This is a draft of the agenda of the future meetup

Thursday, February 20, 2025


➡️ [Meetup Announcement](https://www.meetup.com/warsaw-data-engineering/events/306200574/)

This meetup is a continuation of the two past events.

Previously:

1. A Databricks project managed by Databricks Asset Bundles (DAB)
1. The first project used pydantic (a Python library), and the second was a "hello world" Databricks Asset Bundle project with a sample job.

Agenda:

1. **10 minutes** Announcements. Time for crazy ideas for future meetups 👻
    * News (new versions, new features, etc.)
1. **55 minutes** Live coding session, including:
    * A recap of what the previous meetups achieved: a Databricks Asset Bundle (DAB) with a Databricks job running a single notebook that uses a Python library built on Pydantic. We use uv to manage the Python library.
    * The main goal of the meetup: we'll create a UDF for validating records and "arm" it with pydantic. This was supposed to be the main goal of the previous meetup, but it didn't work out, so we'll try again 🤷‍♂️
1. **10 minutes** Q&A and collecting ideas for future editions

Total duration of the meetup: **1h 15min**


## Event Question

What would you like to hear about during the meetup? Throw in an interesting idea for future editions 🙏

1. I try to keep up with what Jacek says and learn something
1. A framework in Python - best practices
1. DAB
1. continue exploring quality with dqx, or dlt publishing to different schemas as a good standard for medallion
1. Career development plans and the professional experience of tech aces
1. Some advanced data quality in DBR; maybe an analysis of using tools like Polars/DuckDB on single-node clusters?
1. If I manage to join, it will be nice to hear the continuation of the previous meetup :)
1. Plugging master data into Databricks transformations
1. Pydantic
1. everything about databricks
1. I want to develop my Databricks skills, and this series of meetups is what I was looking for

# 📢 News

Things worth watching out for...


## New Versions

What has changed in the tooling space we keep an eye on since we last met?

* Databricks CLI
* [uv 0.5.29](https://github.com/astral-sh/uv/releases/tag/0.5.29)
* [MLflow 2.20.2](https://github.com/mlflow/mlflow/releases/tag/v2.20.2)
    * released this 2 days ago with 176 commits to master since this release 🤨
* [awscli 2.24.0](https://github.com/aws/aws-cli/releases/tag/2.24.0)


## DQX by Databricks Labs

https://github.com/databrickslabs/dqx


# 👀 In the spotlight: `uv`

It is one of the regular sections in our schedule until we run out of...interest to dig deeper and learn more.


## uv init


A very recent change added the `--bare` option to `uv init`. Why is this important?

<br>

```shell
uv init --bare
```


* [Working on projects](https://docs.astral.sh/uv/guides/projects/)
* [Creating projects](https://docs.astral.sh/uv/concepts/projects/init/)
* [uv init](https://docs.astral.sh/uv/reference/cli/#uv-init)


# Live Coding Session


## ✅ Create Databricks Project

Databricks Asset Bundles (DAB) enters the scene 🎬

`databricks bundle init default-python`

* Name: `pydantic_workflow` (in `demo` directory)
* Python included
* No DLT pipelines

Learn more:

1. [Databricks Asset Bundles development](https://docs.databricks.com/en/dev-tools/bundles/work-tasks.html)


### databricks.yml and the resources

Review the following:

1. `databricks.yml`
    * Make sure that `workspace/host` section points at the proper Databricks workspace
1. `resources/*.yml` (included in `databricks.yml`)


### Clean Up

> ⚠️ NOTE
>
> This step is not required at such an early stage of the Databricks project's development.
> You may skip it.

Remove the following (unnecessary) files and directories:

1. `rm pytest.ini requirements-dev.txt setup.py`
1. `rm -rf fixtures scratch dist`


### Validate Bundle

`databricks bundle validate`


```
Name: pydantic_workflow
Target: dev
Workspace:
  Host: https://curriculum-dev.cloud.databricks.com
  User: jacek@japila.pl
  Path: /Workspace/Users/jacek@japila.pl/.bundle/pydantic_workflow/dev

Validation OK!
```


### Deploy Bundle

`databricks bundle deploy`


Unless you removed the project files in the Clean Up step (`setup.py` in particular), you should see the following logs while deploying the bundle:

<br>

```
❯ databricks bundle deploy
Building uv_workflows...
Uploading uv_workflows-0.0.1+20250109.152923-py3-none-any.whl...
...
```

This `Building` step is triggered because `setup.py` is in the main directory.

If you removed `setup.py` in the Clean Up step, the deployment fails instead:


```
❯ databricks bundle deploy
Error: no files match pattern: dist/*.whl
  at resources.jobs.pydantic_workflow_job.tasks[1].libraries[0].whl
  in resources/pydantic_workflow.job.yml:35:15
```


That's expected since there's no Python wheel to deploy. The wheel is referenced in `resources/pydantic_workflow.job.yml` (under `libraries` of `main_task`).

An easy fix is to comment out the `main_task` task.

```
❯ databricks bundle deploy
Uploading bundle files to /Workspace/Users/jacek@japila.pl/.bundle/pydantic_workflow/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!
```


### Run Job

`databricks bundle run pydantic_workflow_job`

> ⚠️ Hint
>
> Use auto-completion while typing `databricks bundle` commands (incl. the names of resources).

It should work just fine.

The notebook uses the Python code directly (they're in the same directory). All seems OK. Why bother with `uv`?! 🤔


### PERMISSION_DENIED: You are not authorized to create clusters

```
❯ databricks bundle run pydantic_workflow_job
Run URL: https://curriculum-dev.cloud.databricks.com/?o=3551974319838082#job/52151941639258/run/148305795734637

2025-02-16 19:37:26 "[dev jacek] pydantic_workflow_job" RUNNING
2025-02-16 19:37:28 "[dev jacek] pydantic_workflow_job" INTERNAL_ERROR FAILED Task notebook_task failed with message: Unexpected user error while preparing the cluster for the job. Cause: PERMISSION_DENIED: You are not authorized to create clusters. Please contact your administrator. This caused all downstream tasks to get skipped.
Task notebook_task FAILED:
run failed with error message
 Unexpected user error while preparing the cluster for the job. Cause: PERMISSION_DENIED: You are not authorized to create clusters. Please contact your administrator.


Error: Task notebook_task failed!
Error:
run failed with error message
 Unexpected user error while preparing the cluster for the job. Cause: PERMISSION_DENIED: You are not authorized to create clusters. Please contact your administrator.
Trace:

Error: failed to reach TERMINATED or SKIPPED, got INTERNAL_ERROR: Task notebook_task failed with message: Unexpected user error while preparing the cluster for the job. Cause: PERMISSION_DENIED: You are not authorized to create clusters. Please contact your administrator. This caused all downstream tasks to get skipped.
```


If you run into `PERMISSION_DENIED: You are not authorized to create clusters` (shown above), replace `job_cluster_key` in the job definition file with `existing_cluster_id: [cluster_id]`, where `[cluster_id]` is the ID of an existing cluster you are allowed to use.

Or simply talk to the Databricks admins.

```
❯ databricks bundle run pydantic_workflow_job
Run URL: https://curriculum-dev.cloud.databricks.com/?o=3551974319838082#job/52151941639258/run/379631815214233

2025-02-16 19:43:47 "[dev jacek] pydantic_workflow_job" RUNNING
2025-02-16 19:49:48 "[dev jacek] pydantic_workflow_job" TERMINATED SUCCESS
```


### Motivation

(The Leading Idea of This Meetup Series)

Let's pause for a moment and try to answer the following question:

> The bundle works (deploys and runs), so why use `uv`, `poetry`, or any other Python build tool?! What are we missing?


Possible answers:

1. We want to execute tests before deployment (and other CI/CD-like management tasks to be executed locally)
1. More importantly, this [python_wheel_task](https://docs.databricks.com/api/workspace/jobs/create#tasks-python_wheel_task) in `pydantic_workflow_job` definition could be a separate project (with its own lifecycle, independent of the DAB project)


## ✅ Create uv Project

`uv init --bare`

[Develop a Python wheel file using Databricks Asset Bundles](https://docs.databricks.com/en/dev-tools/bundles/python-wheel.html) (esp. [Step 4: Update the project’s bundle to use Poetry](https://docs.databricks.com/en/dev-tools/bundles/python-wheel.html))

> By default, the bundle template specifies building the Python wheel file using `setuptools` along with the files `setup.py` and `requirements-dev.txt`.

[Databricks Asset Bundle configuration](https://docs.databricks.com/en/dev-tools/bundles/settings.html) (esp. [artifacts mapping](https://docs.databricks.com/en/dev-tools/bundles/settings.html#artifacts))

> The top-level artifacts mapping specifies one or more artifacts that are automatically built during bundle deployments and can be used later in bundle runs.

Learn more:

1. [Working on projects](https://docs.astral.sh/uv/guides/projects/)
1. [Building your package](https://docs.astral.sh/uv/guides/publish/#building-your-package)

```
❯ uv init --bare
Initialized project `pydantic-workflow`
```

### pyproject.toml

Review `pyproject.toml`


### Build Python wheel

`uv build --wheel`

> **build**  Build Python packages into source distributions and wheels

`uv build --help` (esp. `uv build --wheel`)

Learn more in [Building your package](https://docs.astral.sh/uv/guides/publish/#building-your-package)

```
❯ uv build --wheel
Building wheel...
running egg_info
creating src/pydantic_workflow.egg-info
writing src/pydantic_workflow.egg-info/PKG-INFO
writing dependency_links to src/pydantic_workflow.egg-info/dependency_links.txt
writing top-level names to src/pydantic_workflow.egg-info/top_level.txt
writing manifest file 'src/pydantic_workflow.egg-info/SOURCES.txt'
reading manifest file 'src/pydantic_workflow.egg-info/SOURCES.txt'
writing manifest file 'src/pydantic_workflow.egg-info/SOURCES.txt'
running bdist_wheel
running build
running build_py
creating build/lib/pydantic_workflow
copying src/pydantic_workflow/__init__.py -> build/lib/pydantic_workflow
copying src/pydantic_workflow/main.py -> build/lib/pydantic_workflow
running egg_info
writing src/pydantic_workflow.egg-info/PKG-INFO
writing dependency_links to src/pydantic_workflow.egg-info/dependency_links.txt
writing top-level names to src/pydantic_workflow.egg-info/top_level.txt
reading manifest file 'src/pydantic_workflow.egg-info/SOURCES.txt'
writing manifest file 'src/pydantic_workflow.egg-info/SOURCES.txt'
installing to build/bdist.macosx-10.9-x86_64/wheel
running install
running install_lib
creating build/bdist.macosx-10.9-x86_64/wheel
creating build/bdist.macosx-10.9-x86_64/wheel/pydantic_workflow
copying build/lib/pydantic_workflow/__init__.py -> build/bdist.macosx-10.9-x86_64/wheel/./pydantic_workflow
copying build/lib/pydantic_workflow/main.py -> build/bdist.macosx-10.9-x86_64/wheel/./pydantic_workflow
running install_egg_info
Copying src/pydantic_workflow.egg-info to build/bdist.macosx-10.9-x86_64/wheel/./pydantic_workflow-0.1.0-py3.11.egg-info
running install_scripts
creating build/bdist.macosx-10.9-x86_64/wheel/pydantic_workflow-0.1.0.dist-info/WHEEL
creating '/Users/jacek/dev/learn-databricks/demo/pydantic_workflow/dist/.tmp-ssvefrd_/pydantic_workflow-0.1.0-py3-none-any.whl' and adding 'build/bdist.macosx-10.9-x86_64/wheel' to it
adding 'pydantic_workflow/__init__.py'
adding 'pydantic_workflow/main.py'
adding 'pydantic_workflow-0.1.0.dist-info/METADATA'
adding 'pydantic_workflow-0.1.0.dist-info/WHEEL'
adding 'pydantic_workflow-0.1.0.dist-info/top_level.txt'
adding 'pydantic_workflow-0.1.0.dist-info/RECORD'
removing build/bdist.macosx-10.9-x86_64/wheel
Successfully built dist/pydantic_workflow-0.1.0-py3-none-any.whl
```

## ✅ Integrate DAB and uv


Remember the error? Time to fix it in a more professional way 😉 Before, we simply commented out the task that uses the wheel.

<br>

```
❯ databricks bundle deploy
Error: no files match pattern: dist/*.whl
  at resources.jobs.pydantic_workflow_job.tasks[1].libraries[0].whl
  in resources/pydantic_workflow.job.yml:35:15
```


### databricks.yml and artifacts

Add the following `artifacts` section to `databricks.yml`.

<br>

```yaml
artifacts:
  pydantic_workflow_wheel:
    type: whl
    build: uv build --wheel
    path: .
```

Learn more:

1. [artifacts](https://docs.databricks.com/en/dev-tools/bundles/settings.html#artifacts) mapping
1. [Databricks Asset Bundle configuration](https://docs.databricks.com/en/dev-tools/bundles/settings.html)


### resources/pydantic_workflow.job.yml

Uncomment `main_task` in `resources/pydantic_workflow.job.yml` (that uses `libraries` with the wheel).


### Redeploy DAB

Run `databricks bundle deploy` to redeploy the bundle. This time the library is built by `uv` ❤️

With the changes, you should see the `Building pydantic_workflow_wheel...` message while running `databricks bundle deploy`.

```
❯ databricks bundle deploy
Building pydantic_workflow_wheel...
Uploading pydantic_workflow-0.1.0-py3-none-any.whl...
Uploading bundle files to /Workspace/Users/jacek@japila.pl/.bundle/pydantic_workflow/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!
```


### Run Job

`databricks bundle run pydantic_workflow_job`

There should be two tasks executed properly (incl. `main_task` with the uv-managed Python wheel).

```
❯ databricks bundle run pydantic_workflow_job
Run URL: https://curriculum-dev.cloud.databricks.com/?o=3551974319838082#job/52151941639258/run/1079588996512936

2025-02-16 20:43:50 "[dev jacek] pydantic_workflow_job" RUNNING
2025-02-16 20:44:23 "[dev jacek] pydantic_workflow_job" TERMINATED SUCCESS
Output:
=======
Task notebook_task:

=======
Task main_task:
/databricks/python/lib/python3.12/site-packages/databricks/sdk/service/jobs.py:60: SyntaxWarning: invalid escape sequence '\.'
  """The sequence number of this run attempt for a triggered job run. The initial attempt of a run
/databricks/python/lib/python3.12/site-packages/databricks/sdk/service/jobs.py:2570: SyntaxWarning: invalid escape sequence '\.'
  """The sequence number of this run attempt for a triggered job run. The initial attempt of a run
/databricks/python/lib/python3.12/site-packages/databricks/sdk/service/jobs.py:3431: SyntaxWarning: invalid escape sequence '\.'
  """The sequence number of this run attempt for a triggered job run. The initial attempt of a run


+--------------------+---------------------+-------------+-----------+----------+-----------+
|tpep_pickup_datetime|tpep_dropoff_datetime|trip_distance|fare_amount|pickup_zip|dropoff_zip|
+--------------------+---------------------+-------------+-----------+----------+-----------+
| 2016-02-13 21:47:53|  2016-02-13 21:57:15|          1.4|        8.0|     10103|      10110|
| 2016-02-13 18:29:09|  2016-02-13 18:37:23|         1.31|        7.5|     10023|      10023|
| 2016-02-06 19:40:58|  2016-02-06 19:52:32|          1.8|        9.5|     10001|      10018|
| 2016-02-12 19:06:43|  2016-02-12 19:20:54|          2.3|       11.5|     10044|      10111|
| 2016-02-23 10:27:56|  2016-02-23 10:58:33|          2.6|       18.5|     10199|      10022|
+--------------------+---------------------+-------------+-----------+----------+-----------+
only showing top 5 rows
```


### Review

Open up the workspace and review `main_task` definition. There should be our uv-built wheel under **Dependent libraries**.


![](./pydantic_workflow_main_task_python_wheel.png)


> ⚠️ FIXME
>
> The following image is outdated.


![](./uv_workflow_job.png)


### (Optional) Destroy Bundle

This is an optional step to remove everything in the Databricks workspace so we can start afresh.

```
❯ databricks bundle destroy --auto-approve
The following resources will be deleted:
  delete job pydantic_workflow_job

All files and directories at the following location will be deleted: /Workspace/Users/jacek@japila.pl/.bundle/pydantic_workflow/dev

Deleting files...
Destroy complete!
```


## ✅ Add Pydantic to the mix


### Pin Python

`uv python pin`

Pin the Python version to match Databricks Runtime's Python (to avoid errors due to Python mis-configuration).

> ⚠️ Note
>
> [Databricks Runtime 16.2](https://docs.databricks.com/en/release-notes/runtime/16.2.html#system-environment) runs with Python 3.12.3.


Unless done already, install the desired Python version with `uv python install`.

```
❯ uv python install 3.12.3
Installed Python 3.12.3 in 9.11s
 + cpython-3.12.3-macos-x86_64-none
```

```
❯ uv python pin 3.12.3
Pinned `.python-version` to `3.12.3`
```
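To double-check the pin, a small script can compare the interpreter that `uv run` picks up with the version expected on the cluster. This is only a sketch: `EXPECTED_PYTHON` and `matches_runtime` are hypothetical names, and the expected version assumes Databricks Runtime 16.2 as mentioned in the note above.

```python
import sys

# Assumption: the target is Databricks Runtime 16.2, which runs Python 3.12.
# Adjust the tuple for other runtimes.
EXPECTED_PYTHON = (3, 12)


def matches_runtime(version=tuple(sys.version_info[:2]), expected=EXPECTED_PYTHON):
    """Return True when the major.minor part of `version` equals `expected`."""
    return tuple(version[:2]) == tuple(expected)


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    status = "OK" if matches_runtime() else "MISMATCH"
    print(f"local Python {current}: {status}")
```

Saved under any file name and executed with `uv run python <file>`, it reports whether the local interpreter lines up with the runtime.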

### Add Pydantic Dependency

`uv add`

Pydantic is the main dependency.


```
❯ uv add pydantic
Resolved 5 packages in 346ms
Installed 4 packages in 7ms
 + annotated-types==0.7.0
 + pydantic==2.10.6
 + pydantic-core==2.27.2
 + typing-extensions==4.12.2
```

Review `pyproject.toml`.

The `dependencies` list under `[project]` should now include Pydantic:

<br>

```toml
dependencies = [
    "pydantic>=2.10.6",
]
```

### Build Project

`uv build --wheel` for a test run.


### Write Tests

There's no better way to validate our code than the accompanying tests.

I'd not be surprised if you always start a project with tests first (see [Test-driven development](https://en.wikipedia.org/wiki/Test-driven_development)).


### Add Test Dependencies

Add a couple of development (test) dependencies to the project with `uv add --dev`.

```
❯ uv add --dev pyspark pytest
Resolved 12 packages in 17.60s
      Built pyspark==3.5.4
Prepared 6 packages in 10.33s
Installed 6 packages in 24ms
 + iniconfig==2.0.0
 + packaging==24.2
 + pluggy==1.5.0
 + py4j==0.10.9.7
 + pyspark==3.5.4
 + pytest==8.3.4
```


Review `pyproject.toml`.

There should be the following new section:

<br>

```toml
[dependency-groups]
dev = [
    "pyspark>=3.5.4",
    "pytest>=8.3.4",
]
```


### Run Tests

There are no tests yet, but with `pytest` defined as a dev dependency you should still be able to run `uv run pytest`.


```
❯ uv run pytest
================================================= test session starts =================================================
platform darwin -- Python 3.12.3, pytest-8.3.4, pluggy-1.5.0
rootdir: /Users/jacek/dev/learn-databricks/demo/pydantic_workflow
configfile: pyproject.toml
collected 0 items

================================================ no tests ran in 0.00s ================================================
```


It works! 🥳

### Test-Driven Development

Create `tests` directory with the following `test_trip.py` file.

<br>

```py
from pydantic_workflow.trip import Trip


def test_valid_trip():
    Trip(id=10)
```

Execute `uv run pytest` (that should fail as there's no `Trip` class yet and, most likely, the sources live under `src` directory).

```
❯ uv run pytest
================================================= test session starts =================================================
platform darwin -- Python 3.12.3, pytest-8.3.4, pluggy-1.5.0
rootdir: /Users/jacek/dev/learn-databricks/demo/pydantic_workflow
configfile: pyproject.toml
collected 0 items / 1 error

======================================================= ERRORS ========================================================
_________________________________________ ERROR collecting tests/test_trip.py _________________________________________
ImportError while importing test module '/Users/jacek/dev/learn-databricks/demo/pydantic_workflow/tests/test_trip.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../.local/share/uv/python/cpython-3.12.3-macos-x86_64-none/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_trip.py:1: in <module>
    from pydantic_workflow.trip import Trip
E   ModuleNotFoundError: No module named 'pydantic_workflow'
=============================================== short test summary info ===============================================
ERROR tests/test_trip.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================== 1 error in 0.06s ===================================================
```

### ModuleNotFoundError: No module named 'pydantic_workflow'


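A project created with `uv init --bare` has no `[build-system]` table, so uv installs only the project's dependencies into the environment, not the project itself. As a result, the sources under `src` (the layout Databricks Asset Bundles uses) are not importable from the tests.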

Learn more in [Packaged applications](https://docs.astral.sh/uv/concepts/projects/init/#packaged-applications).

Add the following to `pyproject.toml`:

<br>

```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```
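Whether a package is visible to the interpreter can also be checked outside pytest. The following is a minimal diagnostic sketch; `is_importable` is a hypothetical helper, and it only handles top-level package names.

```python
import importlib.util


def is_importable(package: str) -> bool:
    """Return True if a top-level package can be located by the import system."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Run through `uv run python <file>` so the project's environment is used.
    for name in ("pydantic", "pydantic_workflow"):
        print(f"{name}: {'found' if is_importable(name) else 'NOT found'}")
```

If `pydantic_workflow` is reported as not found, the project itself has not been installed into the environment.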


Execute `uv run pytest` again. It should still fail, this time because the `trip` module (with the `Trip` class) is missing. The `src` directory with the sources is not an issue anymore.

```
❯ uv run pytest
      Built pydantic-workflow @ file:///Users/jacek/dev/learn-databricks/demo/pydantic_workflow
Installed 1 package in 3ms
================================================= test session starts =================================================
platform darwin -- Python 3.12.3, pytest-8.3.4, pluggy-1.5.0
rootdir: /Users/jacek/dev/learn-databricks/demo/pydantic_workflow
configfile: pyproject.toml
collected 0 items / 1 error

======================================================= ERRORS ========================================================
_________________________________________ ERROR collecting tests/test_trip.py _________________________________________
ImportError while importing test module '/Users/jacek/dev/learn-databricks/demo/pydantic_workflow/tests/test_trip.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../.local/share/uv/python/cpython-3.12.3-macos-x86_64-none/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_trip.py:1: in <module>
    from pydantic_workflow.trip import Trip
E   ModuleNotFoundError: No module named 'pydantic_workflow.trip'
=============================================== short test summary info ===============================================
ERROR tests/test_trip.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================== 1 error in 0.06s ===================================================
```

### Create Trip Pydantic Model

The `Trip` class will be a Pydantic model.

It will be used to validate incoming records from the pre-installed `samples.nyctaxi.trips` Delta table.


Add the following to `src/pydantic_workflow/trip.py`:

<br>

```py
from pydantic import BaseModel


class Trip(BaseModel):
    id: int
```


Run the tests.

<br>

```
❯ uv run pytest
================================================= test session starts =================================================
platform darwin -- Python 3.12.3, pytest-8.3.4, pluggy-1.5.0
rootdir: /Users/jacek/dev/learn-databricks/demo/pydantic_workflow
configfile: pyproject.toml
collected 1 item

tests/test_trip.py .                                                                                            [100%]

================================================== 1 passed in 0.89s ==================================================
```


It works! 🥳

### Use Pydantic Validation

Let's extend the `Trip` class to match the schema of the trips in the `samples.nyctaxi.trips` table and accept only records whose `pickup_zip` and `dropoff_zip` differ.

Learn more in [Validators](https://docs.pydantic.dev/latest/concepts/validators/) (particularly [After validators](https://docs.pydantic.dev/latest/concepts/validators/#model-after-validator)).

```py
from datetime import datetime

from pydantic import BaseModel, Field, model_validator
from typing_extensions import Self


class Trip(BaseModel):
    id: int
    # default_factory calls datetime.now() for every new instance,
    # not once when the class is defined
    tpep_pickup_datetime: datetime = Field(default_factory=datetime.now)
    tpep_dropoff_datetime: datetime = Field(default_factory=datetime.now)
    trip_distance: float = -1.0
    fare_amount: float = -1.0
    pickup_zip: int = -1
    dropoff_zip: int = -1

    @model_validator(mode='after')
    def enforce_different_zips(self) -> Self:
        # Reject trips that start and end in the same ZIP code
        if self.pickup_zip == self.dropoff_zip:
            raise ValueError('pickup_zip and dropoff_zip must be different')
        return self
```

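The test should fail now: `Trip(id=10)` leaves both ZIP codes at their default of `-1`, so they are equal and the validator rejects the record. That's expected, though! 😜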

Let's fix it.

### Test Invalid Trips

Extend the test to assert that only valid trips are accepted.

```py
from pydantic_workflow.trip import Trip

import pytest


def test_valid_trip():
    Trip(id=10, pickup_zip=10103, dropoff_zip=10110)


def test_invalid_trip():
    with pytest.raises(ValueError):
        Trip(id=10, pickup_zip=10023, dropoff_zip=10023)
```

```
❯ uv run pytest
================================================= test session starts =================================================
platform darwin -- Python 3.12.3, pytest-8.3.4, pluggy-1.5.0
rootdir: /Users/jacek/dev/learn-databricks/demo/pydantic_workflow
configfile: pyproject.toml
collected 2 items

tests/test_trip.py ..                                                                                           [100%]

================================================== 2 passed in 0.09s ==================================================
```


It works! 🥳
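`pytest.raises(ValueError)` catches the failure because Pydantic's `ValidationError` is a subclass of `ValueError`. The sketch below uses a trimmed-down `Trip` (only the two ZIP fields, an assumption made to keep it short) and a hypothetical `validation_messages` helper to show how the individual messages can be read from the exception.

```python
from pydantic import BaseModel, ValidationError, model_validator


class Trip(BaseModel):
    # Trimmed to the two fields the validator needs (assumption for brevity).
    pickup_zip: int
    dropoff_zip: int

    @model_validator(mode="after")
    def enforce_different_zips(self):
        if self.pickup_zip == self.dropoff_zip:
            raise ValueError("pickup_zip and dropoff_zip must be different")
        return self


def validation_messages(**fields) -> list[str]:
    """Return the validation messages for a record, or [] when it is valid."""
    try:
        Trip(**fields)
    except ValidationError as e:
        # Each entry of e.errors() is a dict with keys such as "type" and "msg"
        return [err["msg"] for err in e.errors()]
    return []


print(issubclass(ValidationError, ValueError))
print(validation_messages(pickup_zip=10103, dropoff_zip=10110))
print(validation_messages(pickup_zip=10023, dropoff_zip=10023))
```

Returning messages instead of raising is handy later on, when a record-level check must not abort the processing of a whole dataset.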

### (Optional) Run Job

Redeploy the bundle and run the job to confirm that the changes did not get in our way.

<br>

```
❯ databricks bundle run pydantic_workflow_job
Run URL: https://curriculum-dev.cloud.databricks.com/?o=3551974319838082#job/706677441584698/run/274782797890396

2025-02-16 22:55:55 "[dev jacek] pydantic_workflow_job" RUNNING
2025-02-16 22:56:20 "[dev jacek] pydantic_workflow_job" TERMINATED SUCCESS
Output:
=======
Task notebook_task:

=======
Task main_task:
+--------------------+---------------------+-------------+-----------+----------+-----------+
|tpep_pickup_datetime|tpep_dropoff_datetime|trip_distance|fare_amount|pickup_zip|dropoff_zip|
+--------------------+---------------------+-------------+-----------+----------+-----------+
| 2016-02-13 21:47:53|  2016-02-13 21:57:15|          1.4|        8.0|     10103|      10110|
| 2016-02-13 18:29:09|  2016-02-13 18:37:23|         1.31|        7.5|     10023|      10023|
| 2016-02-06 19:40:58|  2016-02-06 19:52:32|          1.8|        9.5|     10001|      10018|
| 2016-02-12 19:06:43|  2016-02-12 19:20:54|          2.3|       11.5|     10044|      10111|
| 2016-02-23 10:27:56|  2016-02-23 10:58:33|          2.6|       18.5|     10199|      10022|
+--------------------+---------------------+-------------+-----------+----------+-----------+
only showing top 5 rows
```


## Create PySpark UDF

That's the gist of this meetup.

From [Scalar UDFs](https://docs.databricks.com/en/udf/index.html#scalar-udfs):

> Scalar UDFs operate on a single row and return a single value for each row.

Learn more in [What are user-defined functions (UDFs)?](https://docs.databricks.com/en/udf/index.html) (specifically [Scalar UDFs](https://docs.databricks.com/en/udf/index.html#scalar-udfs)).
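One possible shape of such a UDF is sketched below. It is not the final implementation from the session: the trimmed `Trip` model and the `validate_trip` and `with_validation` names are assumptions made for illustration. The validation itself is a plain Python function, and PySpark is imported lazily so that the function can be unit-tested without a Spark session.

```python
from pydantic import BaseModel, ValidationError, model_validator


class Trip(BaseModel):
    # Trimmed to the two fields the validator needs (assumption for brevity).
    pickup_zip: int
    dropoff_zip: int

    @model_validator(mode="after")
    def enforce_different_zips(self):
        if self.pickup_zip == self.dropoff_zip:
            raise ValueError("pickup_zip and dropoff_zip must be different")
        return self


def validate_trip(pickup_zip, dropoff_zip):
    """Return None for a valid record, else a short description of the errors."""
    try:
        Trip(pickup_zip=pickup_zip, dropoff_zip=dropoff_zip)
    except ValidationError as e:
        return "; ".join(err["msg"] for err in e.errors())
    return None


def with_validation(df):
    """Add a `validation_error` column (NULL for valid rows) to a trips DataFrame."""
    # Imported lazily so validate_trip stays testable without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    validate_trip_udf = F.udf(validate_trip, StringType())
    return df.withColumn(
        "validation_error",
        validate_trip_udf(F.col("pickup_zip"), F.col("dropoff_zip")),
    )
```

On a cluster where the wheel (and therefore Pydantic) is installed, something like `with_validation(spark.table("samples.nyctaxi.trips"))` would be one way to try it out. The UDF returns a string rather than raising, so a single invalid record does not fail the whole job.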


# 💡 Ideas for Future Events

1. [Delta Live Tables](https://docs.databricks.com/en/delta-live-tables/index.html) with uv and pydantic
1. Explore more [Pydantic](https://docs.pydantic.dev/latest/) features
1. Create a new DAB template with `uv` as the project management tool (based on `default-python` template). Start from `databricks bundle init --help`.
