# Using jupysql in Databricks

[Databricks](https://docs.databricks.com/notebooks/notebooks-code.html#code-languages-in-notebooks) supports several languages in its notebooks, notably Python via `%py` and SQL via `%sql`. The `%sql` magic works natively in Databricks without installing any additional modules. The catch is that it is integrated with PySpark and returns Spark DataFrames rather than regular pandas DataFrames.

# Different languages (Python and SQL) in Databricks notebooks

In [0]:
%py
2 + 4

Out[1]: 6

In [0]:
%sql
select 1+2

(1 + 2)
3


In [0]:
output_spark_dataframe = _sqldf

In [0]:
output_spark_dataframe.head()

Out[4]: Row((1 + 2)=3)

In [0]:
pandas_dataframe = output_spark_dataframe.toPandas()
pandas_dataframe

Unnamed: 0,(1 + 2)
0,3


If your Databricks workspace is connected to your databases, you can query them as well.
For example, I have access to a database called 'datascience' that contains a table called 'test'.
We can access this table with the following query:

In [0]:
%sql
select * from datascience.test limit 2;

A,B
10,a
20,b


# Using jupysql in Databricks

- step 01: install the module
- step 02: load the extension
- step 03: test the sql query

In [0]:
# if you have already installed jupysql, uninstall it first
# (-y skips the confirmation prompt, which cannot be answered from a notebook cell)
#!pip uninstall -y jupysql

In [0]:
# install jupysql directly from the `alias` branch on GitHub
# NOTE: a regular `pip install jupysql` may not work, because Databricks already has its own %sql magic
# !pip install git+https://github.com/ploomber/jupysql@alias

In [0]:
!pip show jupysql

Name: jupysql
Version: 0.5.3.dev0
Summary: Better SQL in Jupyter
Home-page: https://github.com/ploomber/jupysql
Author: Ploomber
Author-email: contact@ploomber.io
License: MIT
Location: /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages
Requires: prettytable, ipython, ploomber-core, ipython-genutils, sqlalchemy, jinja2, sqlparse
Required-by: 


In [0]:
# load the extension

In [0]:
%load_ext sql

There's a new jupysql version available (0.5.3), you're running 0.5.3.dev0. To upgrade: pip install jupysql --upgrade


In [0]:
# ignore the upgrade suggestion; we want to stay on the branch installed above

# jupysql with sqlite

In [0]:
# connect to an in-memory SQLite database

In [0]:
%jupysql sqlite://
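
The URL `sqlite://` with no file path is the SQLAlchemy spelling of an in-memory SQLite database, so nothing is written to disk and the data is gone once the connection closes. As a rough illustration outside of jupysql, here is a minimal sketch using Python's built-in `sqlite3` module, where `:memory:` plays the same role. The table name `demo` and its rows are made up for this example.

```python
import sqlite3

# ":memory:" is how the sqlite3 module names an in-memory database;
# it corresponds to the SQLAlchemy URL "sqlite://" (no file path).
conn = sqlite3.connect(":memory:")

# "demo" is a hypothetical table used only for this illustration
conn.execute("CREATE TABLE demo (A INTEGER, B TEXT)")
conn.executemany("INSERT INTO demo VALUES (?, ?)", [(10, "a"), (20, "b")])
rows = conn.execute("SELECT * FROM demo ORDER BY A").fetchall()
print(rows)

# a second in-memory connection is a separate, empty database
other = sqlite3.connect(":memory:")
tables = other.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)
```

Each in-memory connection is its own database, which is why the DataFrame has to be persisted into the active connection before it can be queried.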

In [0]:
# create test data
import pandas as pd

df = pd.DataFrame({'A': [10,20,30], 'B': list('abc')})
df



Unnamed: 0,A,B
0,10,a
1,20,b
2,30,c


In [0]:
# persist this DataFrame as a table in the SQLite database

In [0]:
%jupysql --persist df

*  sqlite://
Out[19]: 'Persisted df'

In [0]:
# now you can query the persisted table with the SQL magic

In [0]:
%jupysql SELECT * FROM df LIMIT 2;

*  sqlite://
Done.


index,A,B
0,10,a
1,20,b
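
To make the persist-then-query step less of a black box, here is a minimal sketch of a roughly equivalent round trip in plain pandas, without any magics. It assumes that `--persist` behaves much like `DataFrame.to_sql` with the index kept as a column, which would explain the `index` column in the output above. This is an assumption for illustration, not jupysql's actual implementation.

```python
import sqlite3

import pandas as pd

# same test data as in the notebook
df = pd.DataFrame({"A": [10, 20, 30], "B": list("abc")})

# in-memory SQLite database, comparable to the "sqlite://" connection
conn = sqlite3.connect(":memory:")

# write the DataFrame to a table named "df"; index=True stores the
# unnamed pandas index in a column called "index"
df.to_sql("df", conn, index=True)

# read the first two rows back into a pandas DataFrame
result = pd.read_sql_query("SELECT * FROM df LIMIT 2", conn)
print(result)
```

The result here is a regular pandas DataFrame rather than a Spark DataFrame, which is the difference from the native Databricks `%sql` shown at the top.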
