

All Purpose Bypass

Maybe save time but definitely save MONEY

About The Project


In Databricks, running code on an All Purpose Cluster costs over 3x as much as running it on a Job Cluster. All Purpose Bypass provides a convenient way to quickly convert your notebook into a job, saving you time and money.

It is perfect for when you know a command block will run for a long time, or when you plan to leave a notebook running overnight.
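The savings add up quickly. As a rough back-of-the-envelope sketch (the per-DBU rates below are hypothetical, chosen only to illustrate the ~3x gap; check your own Databricks pricing for real numbers):

```python
# Hypothetical per-DBU rates, in cents; actual rates depend on your
# cloud, region, and pricing tier.
ALL_PURPOSE_CENTS_PER_DBU = 55
JOBS_CENTS_PER_DBU = 15

def savings_dollars(dbus):
    """Dollars saved by running a workload on a Job Cluster instead."""
    return dbus * (ALL_PURPOSE_CENTS_PER_DBU - JOBS_CENTS_PER_DBU) / 100

print(savings_dollars(1000))  # 400.0 dollars at these hypothetical rates
```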

(back to top)


Prerequisites

Databricks

  • This tool is meant to be used in Databricks workspaces

(back to top)

Installation

Install with pip in your Databricks notebook:

%pip install all_purpose_bypass

(back to top)

Quickstart

from all_purpose_bypass import Bypass

# Databricks API Token (Found in User Settings)
api_token = '###############################'

bypass = Bypass(api_token)
job_id = bypass.create_job()
bypass.run_job(job_id)

>>> Job located at: https://my-workspace.cloud.databricks.com/?#job/571474934623337
>>> Job Running: run_id is 1535015

Default Behavior

By default, Bypass creates a job named after the current notebook and assigns the current user as the owner. The job cluster it creates is a clone of the attached active all purpose cluster. To make jobs more discoverable, a tag of all-purpose-bypass is assigned to every job. If the job already exists, its parameters/options are updated.
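Those defaults can be sketched as plain dictionary construction. The function and field names below are illustrative only, not Bypass's internal API:

```python
# Illustrative sketch of the defaults described above: the job takes the
# notebook's name, is owned by the current user, carries the
# all-purpose-bypass tag, and its cluster spec clones the attached cluster.
def build_default_job_settings(notebook_path, active_cluster_config, user):
    return {
        "name": notebook_path.rsplit("/", 1)[-1],   # named after the notebook
        "owner": user,                              # current user owns the job
        "tags": {"all-purpose-bypass": ""},         # discoverability tag
        "new_cluster": dict(active_cluster_config), # clone of the active cluster
    }
```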


Advanced Usage

Note: It is possible to create cluster compatibility issues. Check the Databricks create cluster page to make sure the options you choose are compatible with each other.

There are a number of arguments you can pass to Bypass to modify the default behavior.

Parameters:

  • new_cluster: pass in your own JSON-like dictionary of cluster configurations
  • spark_version: modify the spark_version of the current active all purpose cluster
  • node_type_id: modify the node_type_id of the current active all purpose cluster
  • aws_attributes: modify the aws_attributes of the current active all purpose cluster
  • autoscale: modify the autoscale settings of the current active all purpose cluster
    • if this parameter is set, do not use num_workers
  • num_workers: modify the num_workers of the current active all purpose cluster
    • if this parameter is set, do not use autoscale
  • libraries: modify the libraries of the current active all purpose cluster
  • clusterId: change the default from the current active all purpose cluster to another existing all purpose cluster
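The autoscale/num_workers constraint above can be expressed as a small check. The helper below is a hypothetical sketch, not part of Bypass; the dictionary fields follow the shape of the Databricks Clusters API, with example values:

```python
# Hypothetical helper: autoscale and num_workers are mutually exclusive,
# so reject any override dictionary that sets both.
def validate_cluster_overrides(overrides):
    if "autoscale" in overrides and "num_workers" in overrides:
        raise ValueError("Pass either autoscale or num_workers, not both")
    return overrides

# Example values only; use settings valid for your workspace.
new_cluster = validate_cluster_overrides({
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "autoscale": {"min_workers": 1, "max_workers": 4},
})
```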

Example:

from all_purpose_bypass import Bypass

# Databricks API Token (Found in User Settings)
api_token = '###############################'

bypass = Bypass(api_token, node_type_id="i3.4xlarge", clusterId="1095-225741-yhdswzetj")
job_id = bypass.create_job()
bypass.run_job(job_id)

>>> Job located at: https://my-workspace.cloud.databricks.com/?#job/571474934623337
>>> Job Running: run_id is 1535015

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)
