lorenzo-romanelli/jupyter-db-connect

Jupyter to Azure Databricks connector

Connect to Azure Databricks clusters from Jupyter notebooks.

Installation

1. Clone this repo

$ git clone https://github.com/lorenzo-romanelli/jupyter-db-connect.git

2. Create and activate a virtual environment

The virtual environment must use the same Python version that is running on the cluster: either Python 2.7 or Python 3.5.

$ cd jupyter-db-connect
$ virtualenv -p /usr/bin/python3.5 env
$ source env/bin/activate
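
Since a mismatch between the local and cluster Python versions is easy to miss, a quick check from inside the activated environment can help. The following is a minimal sketch; the helper name `python_matches_cluster` is hypothetical, and the cluster version has to be supplied by you (it is not queried from Databricks).

```python
import sys


def python_matches_cluster(cluster_version, local_version=None):
    """Return True when the local major.minor version equals the cluster's.

    cluster_version: tuple such as (3, 5), taken from your cluster settings.
    local_version:   defaults to the interpreter running this code.
    """
    if local_version is None:
        local_version = sys.version_info[:2]
    # Only major and minor are compared; patch releases may differ.
    return tuple(local_version[:2]) == tuple(cluster_version[:2])


# Example: warn if this environment is not Python 3.5 (assumed cluster version).
if not python_matches_cluster((3, 5)):
    print("Local Python %d.%d differs from the assumed cluster version 3.5"
          % sys.version_info[:2])
```

The code avoids f-strings and type hints so that it runs on both Python 2.7 and Python 3.5.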

3. Install requirements

$ pip install -r requirements.txt

Configure your Databricks connection

Run the following command:

$ databricks-connect configure

Then follow the on-screen instructions. You will be asked to fill in some configuration values.
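
One way to double-check what was saved is sketched below. It assumes that the tool writes your answers as JSON to `~/.databricks-connect` and that the keys are named `host`, `token`, `cluster_id`, `org_id` and `port`; both the file location and the key names are assumptions that may differ between databricks-connect versions. The helper names are hypothetical.

```python
import json
import os

# Assumed key names; adjust them to whatever your saved file actually contains.
EXPECTED_KEYS = ("host", "token", "cluster_id", "org_id", "port")


def missing_config_keys(config):
    """Return the expected keys that are absent or empty in a config dict."""
    return [key for key in EXPECTED_KEYS if not config.get(key)]


def load_config(path=None):
    """Load the saved configuration; the default path is an assumption."""
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".databricks-connect")
    with open(path) as handle:
        return json.load(handle)
```

Calling `missing_config_keys(load_config())` then returns an empty list when every assumed value is present.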

Run tests

Run the following command to check that your setup works:

$ databricks-connect test

If the remote Databricks cluster is not running, it will be started automatically, which might take some time.

Enjoy!

Run the following command:

$ jupyter notebook

Then open localhost:8888 in your browser.

Example notebook

The notebook test notebook.ipynb contains example commands that set up your Databricks environment from within Jupyter by defining the SparkContext, dbutils and sqlContext.

On Databricks these objects are defined automatically, but in a local Jupyter notebook you have to define them by hand.
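
As a rough illustration of what defining them by hand can look like, here is a minimal sketch. It is not taken from the example notebook. It assumes that the `pyspark` package installed by databricks-connect provides `pyspark.dbutils.DBUtils`, and the function name `make_databricks_globals` is hypothetical. The argument that `DBUtils` expects may differ between releases, as noted in the comments.

```python
def make_databricks_globals():
    """Build the objects that a Databricks notebook normally predefines."""
    # Imported here so the function can be defined even where pyspark is missing.
    from pyspark.sql import SparkSession, SQLContext
    from pyspark.dbutils import DBUtils  # assumed to ship with databricks-connect

    # With databricks-connect configured, this session talks to the remote cluster.
    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext
    sqlContext = SQLContext(sc)
    # Assumption: this release accepts the SparkContext; others may need DBUtils(spark).
    dbutils = DBUtils(sc)
    return {"spark": spark, "sc": sc, "sqlContext": sqlContext, "dbutils": dbutils}
```

In a notebook cell, `globals().update(make_databricks_globals())` would then expose `sc`, `sqlContext` and `dbutils` under the same names used on Databricks.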
