In [None]:
import micropip

await micropip.install(["pandas", "shapely", "pyproj", "pydeck", "dubo"])

### The dubo Python SDK

- [dubo.ask](#dubo.ask)
- [dubo.chart](#dubo.chart)
- [Using API keys](#Using-API-Keys)

#### Usage notes

- All requests are rate-limited. If you need a higher rate limit, please contact support@dubo.gg.
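Because requests are rate-limited, it can help to retry a failed call after a short pause. The sketch below is a generic exponential-backoff wrapper, not part of the dubo SDK: `call_with_backoff` is a hypothetical helper, and it assumes a rate-limited call surfaces as a Python exception, which this notebook does not confirm.

```python
import time


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt seconds, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Assumption: a rate-limited request raises an exception.
            # On the final attempt, give up and re-raise the error.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A call could then be wrapped as `call_with_backoff(lambda: dubo.ask("...", census_df))`. Catching a narrower exception type would be preferable once you know what the client raises.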

## dubo.ask

The dubo library includes convenience functions for data professionals to run queries on top of CSVs.

For example, we can load the US Census data below into a Pandas DataFrame and then run `dubo.ask` on top of it.

In [None]:
import dubo

dubo.__version__.__version__

In [None]:
import pandas as pd
import dubo


# Grab a subset of data grouped by ZIP code from the 2021 American Community Survey
DATA_URL = (
    "https://raw.githubusercontent.com/ajduberstein/"
    "geo_datasets/master/2021_5_yr_acs.csv"
)
census_df = pd.read_csv(DATA_URL)
census_df['zip_code'] = census_df['zip_code'].apply(lambda x: str(x).zfill(5))
census_df.head()

In [None]:
dubo.ask(
    "What's the most populous ZIP code in the United States?", census_df, verbose=True
)

How does it work? Internally, the library sends your query as a web request to our backend, where it is translated to SQL using a combination of OpenAI's GPT-4 and other models. Your DataFrame is loaded into an in-memory [SQLite3 database](https://www.sqlite.org/index.html), and the SQL returned from the server is then executed against that SQLite instance.
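The local half of that flow can be sketched with the standard-library `sqlite3` module. This is an illustration of the technique, not dubo's actual implementation: the table name `tbl`, the `population` column, the toy rows, and the hard-coded SQL string are all assumptions standing in for the real DataFrame and the SQL the backend would return. With pandas, the load step could be done with `DataFrame.to_sql` instead of manual inserts.

```python
import sqlite3

# Hypothetical toy rows standing in for a DataFrame; the values are made up.
rows = [
    ("00001", 1200),
    ("00002", 5400),
    ("00003", 310),
]

# Load the rows into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (zip_code TEXT, population INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)

# Stand-in for the SQL a text-to-SQL backend might return for
# "What's the most populous ZIP code?"
generated_sql = (
    "SELECT zip_code, population FROM tbl ORDER BY population DESC LIMIT 1"
)

# Execute the generated SQL locally and collect the result.
result = conn.execute(generated_sql).fetchall()
print(result)
conn.close()
```

In this design the data never has to be stored in a server-side database: only the query text goes to the backend, and the generated SQL runs locally.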

In [None]:
dubo.ask(
    "What are the ten ZIP codes with the largest Hispanic "
    "populations in the United States?",
    census_df,
    verbose=True,
)

In [None]:
dubo.ask(
    "Where is the wealthiest place in the US that is not majority white?",
    census_df,
    verbose=True,
)

## dubo.chart

`dubo.chart` generates visualizations, using [pydeck](https://pydeck.gl/) for maps and [Vega-Altair](https://altair-viz.github.io/gallery/index.html) for other charts.

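Our backend makes its best guess as to whether you are requesting a chart or a map, but it occasionally gets this wrong. You can always force the API to return a map or a chart by passing the `specify_chart_type` argument, for example `specify_chart_type='DECK_GL'` for a pydeck map.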

In [None]:
import dubo

dubo.chart(
    "A scatterplot of male vs female population, with substantial opacity on the dots. "
    "If a dot is more male than female, make it orange.",
    census_df,
    verbose=True,
)

We can also specify the chart type explicitly rather than let dubo infer it, as we do with this dataset of power plants.

In [None]:
power_df = pd.read_csv("https://raw.githubusercontent.com/ajduberstein/geo_datasets/master/global_power_plant_database.csv")
power_df.tail()

In [None]:
import dubo

dubo.chart(
    "A scatterplot of powerplants, zoomed out",
    power_df,
    verbose=True,
    specify_chart_type='DECK_GL',
)

## Using API Keys

By [contacting us](mailto:founders@dubo.gg), you can connect our product directly to a database and then query against it. The API is modular: you can run the full text-to-SQL pipeline and get the query results, generate the SQL without executing it, or just retrieve the tables that would be relevant to a particular query. In addition to these design benefits, you also get higher-quality SQL than is available in our free library.

The example below operates on the 400+ tables of [MusicBrainz](https://musicbrainz.org/doc/MusicBrainz_Database), a crowd-sourced music catalog used in Spotify and elsewhere.

In [1]:
import os

os.environ['DUBO_ENV'] = "development"
# os.environ['DUBO_ENV'] = "prod"
dubo_env = os.environ.get('DUBO_ENV', 'Not Set')
print(f'DUBO_ENV is set to: {dubo_env}')


DUBO_ENV is set to: development


In [2]:
import dubo
from dubo.config import get_dubo_key, set_dubo_key, BASE_API_URL

# Demo API key
# dubo.config.set_dubo_key('pk.f7345174d27f4dbc908afadbaa7d69af')
# Dayton's local API key
dubo.config.set_dubo_key('pk.ecebf40f19614c809368e8ee0955f226')

Dubo | Examples: https://dubo.gg/ | Discord: https://discord.gg/Cw7rfpkD | Privacy policy: https://mercator.tech/privacy


In [3]:
dubo.config.get_dubo_key()

'pk.ecebf40f19614c809368e8ee0955f226'

In [4]:
print(BASE_API_URL)

http://localhost:8080/api/v1/dubo


In [None]:
dubo.query("How many songs belong to artists that began their careers in New York?")

In [None]:
# Just grab the raw SQL
dubo.generate_sql("How many songs belong to artists that began their careers in New York?")

In [None]:
# Isolate to the tables that may be relevant for the query
dubo.search_tables("How many songs belong to artists that began their careers in New York?")

In [6]:
import os

file_path = os.path.expanduser('~/mercator/dubo-api/test/handlers/fixtures/documentation.txt')

# Upload the documentation file. create_doc returns a dict (the original
# `doc_id.id = ...` raised AttributeError on a dict); we assume the document
# ID is stored under the "id" key.
with open(file_path, 'rb') as f:
    doc = dubo.create_doc(
        file=f,
        shingle_length=1000,
        step=500,
    )
doc_id = doc['id']
print('doc_id')
print(doc_id)

# Exercise the remaining documentation endpoints with the new ID
print('get_doc')
print(dubo.get_doc(data_source_documentation_id=doc_id))
print('get_all_docs')
print(dubo.get_all_docs())
print('update_doc')
print(dubo.update_doc(data_source_documentation_id=doc_id, file_path=file_path))
print('delete_doc')
print(dubo.delete_doc(data_source_documentation_id=doc_id))
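The `shingle_length` and `step` arguments passed to `dubo.create_doc` suggest that the uploaded document is split into overlapping windows before indexing. The notebook does not say how dubo does this, so the following is only a sketch of the general technique: `shingle` is a hypothetical helper, and measuring windows in characters is an assumption.

```python
def shingle(text: str, shingle_length: int, step: int) -> list[str]:
    """Split text into windows of shingle_length characters, advancing by step."""
    if shingle_length <= 0 or step <= 0:
        raise ValueError("shingle_length and step must be positive")
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + shingle_length])
        # Stop once a window has reached the end of the text.
        if start + shingle_length >= len(text):
            break
    return chunks


sample = "abcdefghij"
chunks = shingle(sample, shingle_length=4, step=2)
print(chunks)
```

With `step` set to half of `shingle_length`, as in the call to `create_doc`, each window overlaps the previous one by half, so a passage that straddles a boundary still appears whole in at least one window.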