Skip to content

mycaule/dbx-sandbox

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

myproject

This is a sample project for Databricks, generated via cookiecutter.

While using this project, you need Python 3.X and pip or conda for package management.

Local environment setup

  1. Instantiate a local Python environment via a tool of your choice. This example is based on conda, but you can use any environment management tool:
conda create -n myproject python=3.9
conda activate myproject
  1. If you don't have JDK installed on your local machine, install it (in this example we use conda-based installation):
conda install -c conda-forge openjdk=11.0.15
  1. Install project locally (this will also install dev requirements):
pip install -e ".[local,test]"

Interactive execution and development on Databricks clusters

  1. dbx expects that cluster for interactive execution supports %pip and %conda magic commands.
  2. Please configure your workflow (and tasks inside it) in conf/deployment.yml file.
  3. To execute the code interactively, provide either --cluster-id or --cluster-name.
dbx execute <workflow-name> \
    --cluster-name="<some-cluster-name>"

Multiple users also can use the same cluster for development. Libraries will be isolated per each user execution context.

Working with notebooks and Repos

To start working with your notebooks from a Repos, do the following steps:

  1. Add your git provider token to your user settings in Databricks
  2. Add your repository to Repos. This could be done via UI, or via CLI command below:
databricks repos create --url <your repo URL> --provider <your-provider>

This command will create your personal repository under /Repos/<username>/myproject. 3. Use git_source in your job definition as described here

CI/CD pipeline settings

Please set the following secrets or environment variables for your CI provider:

  • DATABRICKS_HOST
  • DATABRICKS_TOKEN

Testing and releasing via CI pipeline

  • To trigger the CI pipeline, simply push your code to the repository. If CI provider is correctly set, it shall trigger the general testing pipeline
  • To trigger the release pipeline, get the current version from the myproject/__init__.py file and tag the current code version:
git tag -a v<your-project-version> -m "Release tag for version <your-project-version>"
git push origin --tags

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages