# The dubo Python SDK

- [dubo.ask](#dubo.ask)
- [dubo.chart](#dubo.chart)
- [Using API keys](#Using-API-Keys)

Free-tier requests are rate-limited. If you need a higher rate limit, please contact support@dubo.gg.

```python
import dubo
print("Dubo Version", dubo.__version__)
```

```
Dubo Version 0.2.8
```


## dubo.ask

The dubo library includes convenience functions for running natural-language queries on top of pandas DataFrames.

For example, we can load the US Census data below into a pandas DataFrame and then run `dubo.ask` on it.

```python
import pandas as pd
import dubo


# Grab a subset of data grouped by ZIP code from the 2021 American Community Survey (5-year estimates)
DATA_URL = (
    "https://raw.githubusercontent.com/ajduberstein/"
    "geo_datasets/master/2021_5_yr_acs.csv"
)
census_df = pd.read_csv(DATA_URL)
# If pandas parses ZIP codes as integers, leading zeros are dropped; pad them back to five characters
census_df["zip_code"] = census_df["zip_code"].astype(str).str.zfill(5)
census_df.head()
```

| Unnamed: 0 | tot_pop | elderly_pop | male_pop | female_pop | white_pop | black_pop | native_american_pop | asian_pop | two_or_more_pop | hispanic_pop | ... | pop_35_to_44_years | pop_45_to_54_years | pop_55_to_59_years | pop_60_to_64_years | pop_65_to_74_years | pop_75_to_84_years | pop_85_years_and_over | per_capita_income | median_income_for_workers | zip_code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17126.0 | 3478.0 | 8451.0 | 8675.0 | 15249.0 | 358.0 | 111.0 | 2.0 | 888.0 | 17038.0 | ... | 1967.0 | 2350.0 | 1237.0 | 1282.0 | 1986.0 | 1088.0 | 404.0 | 7587.0 | 12541.0 | 601 |
| 1 | 37895.0 | 7768.0 | 18588.0 | 19307.0 | 35571.0 | 10754.0 | 9157.0 | 46.0 | 12405.0 | 35649.0 | ... | 4680.0 | 5082.0 | 2736.0 | 3130.0 | 4605.0 | 2349.0 | 814.0 | 10699.0 | 14180.0 | 602 |
| 2 | 49136.0 | 11025.0 | 23817.0 | 25319.0 | 39975.0 | 2621.0 | 669.0 | 61.0 | 3750.0 | 48121.0 | ... | 5962.0 | 6312.0 | 3259.0 | 3467.0 | 6225.0 | 3774.0 | 1026.0 | 12280.0 | 17449.0 | 603 |
| 3 | 5751.0 | 1309.0 | 2817.0 | 2934.0 | 3488.0 | 137.0 | 21.0 | 0.0 | 261.0 | 5710.0 | ... | 691.0 | 731.0 | 385.0 | 442.0 | 760.0 | 273.0 | 276.0 | 8574.0 | 15565.0 | 606 |
| 4 | 26153.0 | 5423.0 | 12678.0 | 13475.0 | 24015.0 | 6882.0 | 5659.0 | 30.0 | 8216.0 | 25053.0 | ... | 3295.0 | 3688.0 | 1649.0 | 1944.0 | 3010.0 | 1952.0 | 461.0 | 11638.0 | 16262.0 | 610 |
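Why the `zfill` step? ZIP codes such as 00601 lose their leading zeros when pandas parses the column as integers. The self-contained sketch below uses a tiny inline CSV (built from the first two rows shown above) instead of the real file, to show the effect and two ways around it.

```python
import io

import pandas as pd

# Inline stand-in for a CSV whose ZIP codes keep their leading zeros.
csv_text = "zip_code,tot_pop\n00601,17126\n00602,37895\n"

# Default parsing infers an integer column, so the leading zeros disappear.
as_int = pd.read_csv(io.StringIO(csv_text))
print(as_int["zip_code"].tolist())  # [601, 602]

# Converting to strings and left-padding to five characters restores them.
padded = as_int["zip_code"].astype(str).str.zfill(5)
print(padded.tolist())  # ['00601', '00602']

# If the file itself keeps the zeros, reading the column as text avoids the problem entirely.
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"zip_code": str})
print(as_str["zip_code"].tolist())  # ['00601', '00602']
```

The `dtype` route only helps when the file stores the zeros; if the file already contains `601`, padding after loading is the way to recover the five-digit code.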


```python
dubo.ask(
    "What's the most populous ZIP code in the United States?", census_df, verbose=True
)
```


How does it work? Internally, the library sends your question to our backend, where it is translated to SQL by a combination of OpenAI's GPT-4 and other models. Your DataFrame is loaded into an in-memory [SQLite3 database](https://www.sqlite.org/index.html), and the SQL returned by the server is then executed against that SQLite instance.
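The local half of this flow (load the DataFrame into in-memory SQLite, then run SQL against it) can be sketched with pandas and the standard-library `sqlite3` module. This is a minimal illustration under assumptions, not dubo's actual code: the table name `df` is hypothetical, and the hand-written SQL string stands in for whatever the backend would return.

```python
import sqlite3

import pandas as pd


def run_sql_on_dataframe(df: pd.DataFrame, sql: str, table_name: str = "df") -> pd.DataFrame:
    """Load `df` into a throwaway in-memory SQLite database and run `sql` against it.

    The default `table_name` is a hypothetical choice for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Copy the DataFrame into SQLite; index=False keeps the pandas index out of the table.
        df.to_sql(table_name, conn, index=False)
        # Execute the SQL and return the result set as a new DataFrame.
        return pd.read_sql_query(sql, conn)
    finally:
        conn.close()


# Three rows from census_df.head(), and a hand-written query in place of model-generated SQL.
toy_df = pd.DataFrame(
    {"zip_code": ["00601", "00602", "00603"], "tot_pop": [17126.0, 37895.0, 49136.0]}
)
sql = "SELECT zip_code, tot_pop FROM df ORDER BY tot_pop DESC LIMIT 1"
print(run_sql_on_dataframe(toy_df, sql))
```

Using a fresh `:memory:` database per call keeps each query isolated, at the cost of re-copying the DataFrame every time.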

```python
dubo.ask(
    "What are the ten ZIP codes with the largest Hispanic "
    "populations in the United States?",
    census_df,
    verbose=True,
)
```
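Because the SQL behind each answer is generated by a model, it can be worth spot-checking results directly in pandas. The sketch below uses only the five rows shown by `census_df.head()` above (and three of their columns), so its answers describe that sample, not the full dataset; on the real `census_df` you would ask `nlargest` for 10 rows.

```python
import pandas as pd

# The five ACS rows shown by census_df.head() above, restricted to three columns.
sample_df = pd.DataFrame(
    {
        "zip_code": ["00601", "00602", "00603", "00606", "00610"],
        "tot_pop": [17126.0, 37895.0, 49136.0, 5751.0, 26153.0],
        "hispanic_pop": [17038.0, 35649.0, 48121.0, 5710.0, 25053.0],
    }
)

# "Most populous ZIP code": the row with the largest tot_pop.
most_populous = sample_df.loc[sample_df["tot_pop"].idxmax(), "zip_code"]

# "Largest Hispanic populations": top-N rows by hispanic_pop (N=3 for this small sample).
top_hispanic = sample_df.nlargest(3, "hispanic_pop")[["zip_code", "hispanic_pop"]]

print(most_populous)
print(top_hispanic)
```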

```python
dubo.ask(
    "Where is the wealthiest place in the US that is not majority white?",
    census_df,
    verbose=True,
)
```

## dubo.chart

`dubo.chart` generates charts from a natural-language description, using either [pydeck](https://pydeck.gl/) for maps or [Vega-Altair](https://altair-viz.github.io/gallery/index.html) for other charts.

```python
import dubo

dubo.chart(
    "A scatterplot of male vs female population, with substantial opacity on the dots. "
    "If a dot is more male than female, make it orange.",
    census_df,
    verbose=True,
)
```

We can also specify the chart type explicitly rather than letting dubo infer it, as we do below with a dataset of power plants.

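```python
import pandas as pd

# pandas reads a CSV straight from a URL, so no helper such as Pyodide's open_url is needed
POWER_URL = (
    "https://raw.githubusercontent.com/ajduberstein/"
    "geo_datasets/master/global_power_plant_database.csv"
)
power_df = pd.read_csv(POWER_URL)
power_df.tail()
```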

```python
import dubo

dubo.chart(
    "A scatterplot of powerplants, zoomed out",
    power_df,
    verbose=True,
    specify_chart_type='DECK_GL',
    notebook_display=True
)
```

## Using API Keys

By [contacting us](mailto:founders@dubo.gg), you can connect our product directly to a database and then query against it. The API is modular: you can run the full text-to-SQL pipeline and retrieve the results (for example, in a query UI), generate the SQL without executing it, or retrieve only the tables relevant to a particular query. API access also produces higher-quality SQL than is available in our free library.

The example below operates on the 400+ tables of [MusicBrainz](https://musicbrainz.org/doc/MusicBrainz_Database), a crowd-sourced music catalog used by Spotify and others.

```python
import dubo
from dubo.config import set_dubo_key

# Demo API key
set_dubo_key('pk.f7345174d27f4dbc908afadbaa7d69af')

# Run the full pipeline: generate SQL, execute it, and return the results
dubo.query("How many songs belong to artists that began their careers in New York?")
```

```python
# Just grab the raw SQL
dubo.generate_sql("How many songs belong to artists that began their careers in New York?")
```

```python
# Isolate to the tables that may be relevant for the query
dubo.search_tables("How many songs belong to artists that began their careers in New York?")
```