
# Tracking Databricks Units (DBUs)




## Option 1: System Tables (Best Option)

[Instructions](https://docs.databricks.com/en/administration-guide/system-tables/index.html) 

[Sample Notebooks](https://www.databricks.com/resources/demos/tutorials/governance/system-tables)

[Billable Usage Schema](https://docs.databricks.com/en/administration-guide/system-tables/billing.html)
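
As a rough sketch of what Option 1 can look like in practice, the helper below builds a query that sums daily DBUs per SKU. It assumes the billing table is `system.billing.usage` with `usage_date`, `sku_name`, `usage_quantity`, and `usage_unit` columns, as described in the billable usage schema linked above, and that system tables are enabled for your account. The function name `daily_dbu_query` is made up for this example.

```python
from datetime import date


def daily_dbu_query(start: date, end: date) -> str:
    """Build a SQL query summing DBUs per day and SKU (hypothetical helper)."""
    if start > end:
        raise ValueError("start must not be after end")
    # Table and column names are assumptions taken from the billable usage
    # schema; verify them in your workspace before relying on the result.
    return (
        "SELECT usage_date, sku_name, SUM(usage_quantity) AS dbus\n"
        "FROM system.billing.usage\n"
        "WHERE usage_unit = 'DBU'\n"
        f"  AND usage_date BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'\n"
        "GROUP BY usage_date, sku_name\n"
        "ORDER BY usage_date"
    )


query = daily_dbu_query(date(2024, 1, 1), date(2024, 1, 31))
print(query)
# In a Databricks notebook you could then run: display(spark.sql(query))
```

Building the statement as a string keeps the example runnable outside a cluster; inside a notebook the same SQL can go directly into a `%sql` cell.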


## Option 2: View billable usage using the account console 

[Instructions](https://docs.databricks.com/en/administration-guide/account-settings/usage.html)

This includes [instructions on downloading usage data](https://docs.databricks.com/en/administration-guide/account-settings/usage.html#usage-downloads).  


## Option 3: Download usage logs with the Account API

Follow the guided instructions below.  

[GET /api/2.0/accounts/{account_id}/usage/download](https://docs.databricks.com/api/account/billableusage/download)

To authenticate to the Account API, you can use Databricks OAuth tokens for [service principals](https://docs.databricks.com/dev-tools/service-principals.html) or an account admin’s username and password. [Databricks strongly recommends that you use OAuth tokens for service principals.](https://docs.databricks.com/dev-tools/authentication-oauth.html)




### Retrieve your Databricks Account ID

[Instructions](https://docs.databricks.com/en/administration-guide/account-settings/index.html#locate-your-account-id)

In [0]:
from ipywidgets import Text, DatePicker, Checkbox, Layout
style = {'description_width': 'initial'}
layout = Layout(width='75%')

In [0]:
accountId = Text(
  value="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  description="Databricks Account Id",
  disabled=False,
  style=style, 
  layout=layout  
)

display(accountId)



### Identify a scope to store secrets

Option 1: Reuse an existing secret scope to store the secret

Option 2: [Create a scope if one does not exist](https://docs.databricks.com/en/security/secrets/secret-scopes.html)

In [0]:
secretScope = Text(
  value="x",
  description="Secret Scope",
  disabled=False,
  style=style  
)

display(secretScope)


### Create a Service Principal and OAuth credentials

You must be an account admin to create a service principal and manage OAuth credentials for service principals. 

[Instructions](https://docs.databricks.com/en/dev-tools/authentication-oauth.html)

The service principal must be assigned the account admin role.  

The secret will only be revealed once during creation. The client ID is the same as the service principal’s application ID.



### Store the OAuth Client ID and Secret in a Databricks Secret Scope

[Instructions](https://docs.databricks.com/en/security/secrets/secrets.html)

This step requires use of the [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/index.html). 

In [0]:
spOAuthSecretId = Text(
  value="observeServicePrincipalOAuthSecretId",
  description="Observe Service Principal OAuth Secret Id",
  disabled=False,
  style=style,
  layout=layout
)

spOAuthSecretToken = Text(
  value="observeServicePrincipalOAuthSecretToken",
  description = "Observe Service Principal OAuth Secret Token",
  disabled=False,
  style=style,
  layout=layout  
)

display(spOAuthSecretId, spOAuthSecretToken)


### Select Start and End Date

In [0]:
startDate = DatePicker(
    description='Start Date',
    disabled=False
)

endDate = DatePicker(
    description='End Date',
    disabled=False
)

display(startDate, endDate)

In [0]:
import requests
from requests.auth import HTTPBasicAuth

from pyspark.sql import Row 
from pyspark.sql.functions import hash, col
from datetime import date, datetime

import csv

In [0]:
servicePrincipalOAuthSecretClientId = dbutils.secrets.get(secretScope.value, spOAuthSecretId.value)
servicePrincipalOAuthSecretToken = dbutils.secrets.get(secretScope.value, spOAuthSecretToken.value)
basic = HTTPBasicAuth(servicePrincipalOAuthSecretClientId, servicePrincipalOAuthSecretToken)

url = "https://accounts.cloud.databricks.com/oidc/accounts/{myAccountId}/v1/token".format(myAccountId=accountId.value)

data = {'grant_type': 'client_credentials', 'scope': 'all-apis'}
headers = {'Content-Type': 'application/x-www-form-urlencoded'}

In [0]:
r = requests.post(url, data=data, headers=headers, auth=basic)
print(r.status_code)

In [0]:
access_token = r.json()["access_token"]
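
The cell above reads `access_token` without checking whether the token request succeeded. A minimal sketch of a more defensive version follows; `extract_access_token` is a hypothetical helper, and the sample payload is illustrative rather than a captured response. Only the `access_token` field is taken from the notebook.

```python
def extract_access_token(status_code: int, payload: dict) -> str:
    """Return the access token or raise a descriptive error (hypothetical helper)."""
    if status_code != 200:
        raise RuntimeError(f"Token request failed with HTTP {status_code}: {payload}")
    token = payload.get("access_token")
    if not token:
        raise RuntimeError("Token response did not contain an access_token")
    return token


# Stand-in values for illustration. In the notebook this would be called as
# extract_access_token(r.status_code, r.json()).
sample_payload = {"access_token": "example-token", "token_type": "Bearer"}
token = extract_access_token(200, sample_payload)
print(token)
```

Failing early here gives a clearer error than a `KeyError` raised later, when the usage download is attempted.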

In [0]:
queryString = "start_month={startMonth}&end_month={endMonth}&personal_data=false".format(startMonth=str(startDate.value)[0:7], endMonth=str(endDate.value)[0:7])
url = "https://accounts.cloud.databricks.com/api/2.0/accounts/{myAccountId}/usage/download?{queryString}".format(myAccountId=accountId.value, queryString=queryString)
headers = {"Authorization": "Bearer {accessToken}".format(accessToken = access_token)}
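
The query string above takes the first seven characters of each date to get a `YYYY-MM` value, which yields `"None"` if a date picker was left empty. The sketch below builds the same three parameters with explicit validation. The function name `usage_query_string` is hypothetical; the parameter names `start_month`, `end_month`, and `personal_data` come from the cell above.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode


def usage_query_string(start: Optional[date], end: Optional[date],
                       personal_data: bool = False) -> str:
    """Build the usage download query string (hypothetical helper)."""
    if start is None or end is None:
        raise ValueError("Select both a start date and an end date")
    if (start.year, start.month) > (end.year, end.month):
        raise ValueError("Start month must not be after end month")
    params = {
        "start_month": start.strftime("%Y-%m"),
        "end_month": end.strftime("%Y-%m"),
        "personal_data": str(personal_data).lower(),
    }
    return urlencode(params)


qs = usage_query_string(date(2024, 1, 15), date(2024, 3, 2))
print(qs)
```

In the notebook, the call would be `usage_query_string(startDate.value, endDate.value)`.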

In [0]:
r = requests.get(url, headers=headers)
print(r.status_code)

In [0]:
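import io

# Parse the whole response with csv.reader so that quoted fields
# (for example the JSON in clusterCustomTags) are split correctly.
reader = csv.reader(io.StringIO(r.text), delimiter=',', quoting=csv.QUOTE_MINIMAL)
next(reader, None)  # skip the header row

rows = []
for item in reader:
  if len(item) < 10:
    continue  # skip blank or truncated lines
  rows.append(Row(
    workspaceId=item[0], 
    timestamp=item[1], 
    clusterId=item[2], 
    clusterName=item[3], 
    clusterNodeType=item[4],
    clusterOwnerUserId=item[5],
    clusterCustomTags=item[6],
    sku=item[7],
    # Store numeric columns as floats so they can be summed without implicit casts
    dbus=float(item[8]) if item[8] else None,
    machineHours=float(item[9]) if item[9] else None
  ))

df = spark.createDataFrame(rows)

display(df)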
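
An alternative to indexing columns by position is to read them by header name, which still works if the column order changes. The sketch below uses `csv.DictReader` on a made-up two-line sample. The header names are assumed to match the field names used in the cell above, so check them against the header row of your own download.

```python
import csv
import io

# Made-up sample for illustration; real downloads may contain more columns.
SAMPLE = (
    "workspaceId,timestamp,clusterId,clusterName,clusterNodeType,"
    "clusterOwnerUserId,clusterCustomTags,sku,dbus,machineHours\n"
    '123,2024-01-01T00:00:00.000Z,c-1,etl,i3.xlarge,42,'
    '"{""team"":""data, eng""}",ENTERPRISE_JOBS_COMPUTE,1.5,2.0\n'
)


def parse_usage_csv(text: str) -> list:
    """Parse usage CSV text into dicts keyed by header name (hypothetical helper)."""
    records = []
    for record in csv.DictReader(io.StringIO(text)):
        # Convert the numeric columns so they can be aggregated later.
        record["dbus"] = float(record["dbus"])
        record["machineHours"] = float(record["machineHours"])
        records.append(record)
    return records


records = parse_usage_csv(SAMPLE)
print(records[0]["clusterCustomTags"], records[0]["dbus"])
```

The resulting list of dicts can be passed to `spark.createDataFrame(records)` in a notebook. The quoted tags field in the sample contains a comma, which shows why the text is parsed with the `csv` module instead of being split on commas.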



In [0]:
df.createOrReplaceTempView("usage")

In [0]:
%sql

SELECT date_format(date_trunc("DAY", timestamp), "yyyy-MM-dd") as dt, sku, SUM(dbus) as dbus
FROM usage
GROUP BY ALL
ORDER BY dt

In [0]:
# LIST PRICES SHOWN
# https://www.databricks.com/product/pricing

sku_df = spark.createDataFrame(
    [
        ('ENTERPRISE_ALL_PURPOSE_COMPUTE', 0.55),
        ('ENTERPRISE_ALL_PURPOSE_COMPUTE_(PHOTON)', 0.55),
        ('ENTERPRISE_JOBS_COMPUTE', 0.15),
        ('ENTERPRISE_JOBS_COMPUTE_(PHOTON)', 0.15),
        ('ENTERPRISE_DLT_CORE_COMPUTE',0.20),
        ('ENTERPRISE_DLT_PRO_COMPUTE',0.25),
        ('ENTERPRISE_DLT_ADVANCED_COMPUTE',0.36),
        ('ENTERPRISE_DLT_CORE_COMPUTE_(PHOTON)',0.20),
        ('ENTERPRISE_DLT_PRO_COMPUTE_(PHOTON)',0.25),
        ('ENTERPRISE_DLT_ADVANCED_COMPUTE_(PHOTON)',0.36),
        ('ENTERPRISE_SQL_COMPUTE',0.22),
        ('ENTERPRISE_SQL_PRO_COMPUTE_US_EAST_N_VIRGINIA',0.55),
        ('ENTERPRISE_SQL_PRO_COMPUTE_US_EAST_OHIO',0.55),
        ('ENTERPRISE_SERVERLESS_SQL_COMPUTE_US_EAST_N_VIRGINIA',0.70),
        ('ENTERPRISE_SERVERLESS_SQL_COMPUTE_US_EAST_OHIO',0.70),
        ('ENTERPRISE_SERVERLESS_SQL_COMPUTE_US_WEST_OREGON',0.70),
    ],
    ["sku_name", "price"]
)

sku_df.createOrReplaceTempView("sku_prices")

In [0]:
%sql

SELECT *
FROM sku_prices

In [0]:
%sql

SELECT 
  date_format(date_trunc("DAY", usage.timestamp), "yyyy-MM-dd") as dt, 
  usage.sku, 
  SUM(usage.dbus) as dbus, 
  sku_prices.price, 
  -- Aggregate the cost so that GROUP BY ALL groups only by dt, sku, and price
  SUM(usage.dbus * sku_prices.price) as dollarDBU
FROM usage LEFT JOIN sku_prices on usage.sku = sku_prices.sku_name
GROUP BY ALL
ORDER BY dollarDBU
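
The arithmetic behind this final query can be checked on a small scale in plain Python: sum the DBUs per day and SKU, then multiply by the list price for that SKU. The usage rows below are invented for illustration, and `daily_cost` is a hypothetical helper. The two prices are copied from the `sku_prices` table above.

```python
from collections import defaultdict

# Invented usage rows: (day, sku, dbus)
usage_rows = [
    ("2024-01-01", "ENTERPRISE_JOBS_COMPUTE", 10.0),
    ("2024-01-01", "ENTERPRISE_JOBS_COMPUTE", 5.0),
    ("2024-01-01", "ENTERPRISE_SQL_COMPUTE", 4.0),
    ("2024-01-02", "UNLISTED_SKU", 3.0),
]
prices = {"ENTERPRISE_JOBS_COMPUTE": 0.15, "ENTERPRISE_SQL_COMPUTE": 0.22}


def daily_cost(rows, price_by_sku):
    """Sum DBUs per (day, sku) and price them (hypothetical helper)."""
    totals = defaultdict(float)
    for day, sku, dbus in rows:
        totals[(day, sku)] += dbus
    result = {}
    for (day, sku), dbus in totals.items():
        price = price_by_sku.get(sku)
        # Like the LEFT JOIN, SKUs without a price keep their DBUs but get no cost.
        cost = dbus * price if price is not None else None
        result[(day, sku)] = (dbus, cost)
    return result


costs = daily_cost(usage_rows, prices)
for key in sorted(costs):
    print(key, costs[key])
```

SKUs missing from the price table show a cost of `None`, which mirrors the `NULL` rows the `LEFT JOIN` produces and is a quick way to spot SKUs that still need a price.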