# Databricks SDK

This page covers the details of working with the Databricks SDK.

In [27]:
from databricks.sdk import WorkspaceClient

## Workspace client

The most common way to communicate with a Databricks workspace is through `databricks.sdk.WorkspaceClient`. To create it, you must first set up Databricks authentication in one of the following ways:

- By configuring the `~/.databrickscfg` file.
- By defining environment variables. The most common are:
    - `DATABRICKS_HOST`: your Databricks host.
    - `DATABRICKS_TOKEN`: your access token.

A minimal `~/.databrickscfg` file may look like this:

```
[DEFAULT]
host = https://dbc-<unique workspace id>.cloud.databricks.com
token = <here is your token>
```

- The profile name `DEFAULT` is important: it is the profile used when no other profile is specified. You can add profiles under different names and select them explicitly.
- Copy the `host` from the browser's address bar (host only, without the path).
- Generate the `token` in the Databricks UI: Settings -> Developer -> Access tokens -> Manage.
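
To inspect which values a given profile holds, the file can be parsed as a plain INI document. The following is a minimal sketch using the standard-library `configparser`; it is not how the SDK itself reads the file, and the `staging` profile, the host names, and the `read_profile` helper are hypothetical.

```python
import configparser

# Hypothetical file contents; real hosts and tokens will differ.
SAMPLE = """\
[DEFAULT]
host = https://dbc-example.cloud.databricks.com
token = not-a-real-token

[staging]
host = https://dbc-staging-example.cloud.databricks.com
token = another-fake-token
"""


def read_profile(text, profile="DEFAULT"):
    """Return the host and token stored under `profile` in INI-formatted text."""
    # Interpolation is disabled so that a '%' inside a token is kept verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser[profile]
    return {"host": section["host"], "token": section["token"]}


print(read_profile(SAMPLE)["host"])
print(read_profile(SAMPLE, "staging")["host"])
```

To check a real file, read its contents with `pathlib.Path.home() / ".databrickscfg"` and pass the text to `read_profile`.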

**Note.** If you have problems with authentication, check the environment variables. Some tools, such as the VSCode Databricks extension, may define default values for variables starting with `DATABRICKS_...`. Also, check `~/.ipython/profile_default/startup` for startup scripts that can silently change IPython's behavior.
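
A quick way to carry out the check from the note is to list every `DATABRICKS_`-prefixed variable visible to the current process. This is a minimal sketch; the `databricks_env` helper is hypothetical, and masking names that contain `TOKEN` or `SECRET` is an assumption about which values are sensitive.

```python
import os


def databricks_env(environ=None):
    """Return DATABRICKS_* variables, hiding values that look like secrets."""
    environ = os.environ if environ is None else environ
    found = {}
    for name, value in sorted(environ.items()):
        if name.startswith("DATABRICKS_"):
            # Assumption: names containing TOKEN or SECRET hold credentials.
            is_secret = "TOKEN" in name or "SECRET" in name
            found[name] = "***" if is_secret else value
    return found


for name, value in databricks_env().items():
    print(f"{name}={value}")
```

Running it both in a plain terminal and inside the notebook kernel shows whether a tool has injected values into only one of them.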

---

If everything is configured correctly, the following cell should run without any issues:

In [None]:
w = WorkspaceClient()
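
If you keep several profiles, or prefer not to rely on the default lookup, the client can also be created with explicit arguments. The sketch below assumes the `profile`, `host`, and `token` keyword arguments of `WorkspaceClient` and uses `w.current_user.me()` as an inexpensive authentication check; the helper names `client_kwargs` and `connect` are hypothetical.

```python
def client_kwargs(profile=None, host=None, token=None):
    """Keep only the arguments that were explicitly provided."""
    candidates = {"profile": profile, "host": host, "token": token}
    return {key: value for key, value in candidates.items() if value is not None}


def connect(profile=None, host=None, token=None):
    """Create a WorkspaceClient and verify that authentication works."""
    # Imported lazily so that client_kwargs can be used without the SDK installed.
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient(**client_kwargs(profile, host, token))
    # One small request that fails fast if the credentials are wrong.
    print(w.current_user.me().user_name)
    return w
```

For example, `connect(profile="DEFAULT")` selects the profile by name, while `connect()` falls back to the default lookup described above.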

## Spark session

You can get a Spark session that has access to your Databricks workspace by calling `databricks.connect.DatabricksSession.builder.remote().getOrCreate()`.

- You cannot create a `DatabricksSession` if a regular `pyspark` package is installed in the same Python environment. Run this code from a separate environment that does not contain `pyspark`.
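
Whether the current environment is affected can be checked from installed package metadata. This is a minimal sketch that assumes the PyPI distribution names `pyspark` and `databricks-connect`; the helpers `installed_version` and `has_conflict` are hypothetical.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def has_conflict(versions):
    """True when both a standalone pyspark and databricks-connect are present."""
    return (
        versions.get("pyspark") is not None
        and versions.get("databricks-connect") is not None
    )


versions = {name: installed_version(name) for name in ("pyspark", "databricks-connect")}
print(versions)
if has_conflict(versions):
    print("Both packages are installed; use a separate environment for Databricks Connect.")
```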

---

The following cell creates a Spark session attached to a Databricks environment running in "serverless" mode.

In [1]:
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.remote(serverless=True).getOrCreate()

The following cell displays the list of the tables that are available in my Databricks workspace.

In [2]:
spark.sql("SHOW TABLES").show()

+--------+--------------------+-----------+
|database|           tableName|isTemporary|
+--------+--------------------+-----------+
| default|  telco_churn_bronze|      false|
| default|telco_churn_features|      false|
+--------+--------------------+-----------+



## Feature engineering

The `databricks.feature_engineering` module lets you manage the feature store in Databricks.

The `databricks.feature_engineering.FeatureEngineeringClient` object provides the following methods:

| Method                                                                       | Description                                                                                                                           |
| ---------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| `create_table(...)`                                                          | Creates a feature table in Unity Catalog, defining its primary keys, schema, timestamp column, and metadata.                          |
| `create_training_set(...)`                                                   | Joins features (via `FeatureLookup` or `FeatureSpec`) to a DataFrame to form a training set with metadata.                            |
| `log_model(...)`                                                             | Logs an MLflow model together with feature metadata so the required features can be fetched automatically at inference.               |
| `score_batch(...)`                                                           | Runs batch inference: given a model URI and a DataFrame, automatically fetches missing features, joins them, and returns predictions. |
| `create_feature_spec(...)`                                                   | Defines a feature spec (collection of `FeatureLookup`/`FeatureFunction`) for use in training sets or feature serving.                 |
| `create_feature_serving_endpoint(...)`                                       | Creates an endpoint for real-time / online feature serving.                                                                           |
| `get_feature_serving_endpoint(...)` / `delete_feature_serving_endpoint(...)` | Manage (retrieve or delete) feature serving endpoints.                                                                                |
| `publish_table(...)`                                                         | Publishes an offline feature table to an online store for low-latency feature access.                                                 |
| `read_table(...)`                                                            | Reads the contents of a feature table into a Spark DataFrame.                                                                         |
| `write_table(...)`                                                           | Inserts or upserts data into a feature table; supports streaming DataFrames.                                                          |
| `set_feature_table_tag(...)` / `delete_feature_table_tag(...)`               | Manage tags (set or delete) on feature tables for governance and organization.                                                        |
| `drop_online_table(...)`                                                     | Removes a published feature table from an online store.                                                                               |

For more details and examples, see the [Feature engineering](databricks_sdk/feature_engineering.ipynb) page.
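
As a small illustration of two methods from the table, the sketch below upserts rows into an existing feature table and reads the result back. It assumes `write_table` accepts `name`, `df`, and `mode="merge"` and that `read_table` accepts `name`; the `upsert_and_read` helper and the table name in the usage comment are hypothetical.

```python
def upsert_and_read(fe, table_name, df):
    """Upsert `df` into a feature table and return the table as a DataFrame."""
    # mode="merge" updates rows that match on the primary key and inserts the rest.
    fe.write_table(name=table_name, df=df, mode="merge")
    return fe.read_table(name=table_name)


# Usage (requires the databricks-feature-engineering package and a Spark session):
# from databricks.feature_engineering import FeatureEngineeringClient
# features = upsert_and_read(
#     FeatureEngineeringClient(), "<catalog>.default.telco_churn_features", new_rows
# )
```

Passing the client as an argument keeps the helper independent of how the client is created and makes it easy to try with a stand-in object.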

## Serving

Databricks offers many options for serving machine learning models and for managing the served models. Many of these functions can be accessed through the Python SDK. The following table lists some of them:

| Possibility | Description | Python SDK Context / Method Category |
| :--- | :--- | :--- |
| **Create Endpoint** | Programmatically create a new model serving endpoint, including configuration for custom models, Foundation Model APIs, or external models. | `w.serving_endpoints.create()` |
| **Get/List Endpoints** | Retrieve the status, configuration, and metadata for a specific serving endpoint or list all serving endpoints in the workspace. | `w.serving_endpoints.get()`, `w.serving_endpoints.list()` |
| **Update Endpoint Configuration** | Modify an existing serving endpoint's configuration, such as changing the served model version, adjusting traffic split, or changing the workload size. | `w.serving_endpoints.update_config()` |
| **Delete Endpoint** | Remove a serving endpoint. | `w.serving_endpoints.delete()` |
| **Query Endpoint (Scoring)** | Send real-time inference requests to a deployed model serving endpoint, often using an OpenAI-compatible client configured via the SDK for LLMs, or standard methods for custom models. | Handled via the Databricks-configured OpenAI client or other scoring methods (e.g., `mlflow.deployments.get_deploy_client()` in some contexts). |
| **Manage Provisioned Throughput (PT) Endpoints** | Create and manage endpoints specifically configured for Foundation Models with guaranteed performance (Provisioned Throughput). | `w.serving_endpoints.create_provisioned_throughput_endpoint()` (and related PT methods) |
| **Retrieve Build Logs** | Get the build logs for a served model on a serving endpoint, useful for debugging deployment issues. | `w.serving_endpoints.build_logs()` |
| **Configure AI Gateway** | Set configurations related to AI Gateway features like fallbacks, guardrails, inference tables, and usage tracking for the serving endpoint. | Set through the `ai_gateway` parameter of `create()` or, for an existing endpoint, `w.serving_endpoints.put_ai_gateway()`. |
| **Configure Rate Limits** | Apply rate limits to the serving endpoint (though documentation suggests using AI Gateway for newer rate limit management). | Used within the `rate_limits` parameter of `create()`. |
| **Route Optimization** | Enable configuration for route optimization on the serving endpoint for low-latency workloads. | Set through the `route_optimized` parameter of `create()`. |
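
As a starting point for the operations in the table, the sketch below collects the names of the serving endpoints in a workspace. It assumes that `w.serving_endpoints.list()` yields objects with a `name` attribute; the `endpoint_names` helper is hypothetical.

```python
def endpoint_names(w):
    """Return the sorted names of all serving endpoints visible to client `w`."""
    return sorted(endpoint.name for endpoint in w.serving_endpoints.list())


# Usage (requires the databricks-sdk package and configured authentication):
# from databricks.sdk import WorkspaceClient
# print(endpoint_names(WorkspaceClient()))
```

A name returned here can then be passed to calls such as `w.serving_endpoints.get()` or `w.serving_endpoints.delete()`.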

For more details, see the [Serving](databricks_sdk/serving.ipynb) page.