<a href="https://colab.research.google.com/github/lestermartin/starburst-dataframes-exploration/blob/main/StarburstPythonOptions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python options when using Trino & Starburst

This notebook shows the primary options available to Python programmers when working with Trino and Starburst.

1.   [Python UDFs](https://trino.io/docs/current/udf/python.html)
2.   [Python client](https://github.com/trinodb/trino-python-client)
3.   [PyStarburst](https://docs.starburst.io/clients/python/pystarburst.html) (ONLY available with Starburst)
4.   [Ibis](https://ibis-project.org/)



## Trino Python user-defined functions

Python UDFs are available in Trino. On the Starburst side they are available only in Starburst Enterprise (SEP), and even there only as a public preview; they are not available in Starburst Galaxy.

Don't try to run UDFs from this notebook. Write a SQL UDF, or its port to a Python UDF, as a SQL statement and run it in the Starburst UI (or the Trino CLI).
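As a rough illustration of porting a SQL UDF to a Python UDF, the sketch below holds both forms as SQL text and defines the handler as ordinary Python so its logic can be checked locally. The function name `doubleup`, the handler name `twice`, and the two constant names are illustrative assumptions, not part of this notebook. The statements follow the inline `WITH FUNCTION` form described in the Trino UDF documentation and are meant to be pasted into the Starburst UI or the Trino CLI.

```python
# Sketch only: `doubleup`, `twice`, and the constant names are hypothetical.

# SQL UDF: the body is a single RETURN expression.
SQL_UDF_QUERY = """
WITH FUNCTION doubleup(x integer)
  RETURNS integer
  RETURN x * 2
SELECT doubleup(21)
"""


# The handler a Python UDF would call. It is plain Python,
# so the logic can be checked here before embedding it in SQL.
def twice(a):
    return a * 2


# Python UDF: same signature, but the body is Python between the $$ markers
# and the `handler` property names the Python function to invoke.
PYTHON_UDF_QUERY = """
WITH FUNCTION doubleup(x integer)
  RETURNS integer
  LANGUAGE PYTHON
  WITH (handler = 'twice')
  AS $$
def twice(a):
    return a * 2
$$
SELECT doubleup(21)
"""

print(twice(21))          # local check of the handler logic
print(PYTHON_UDF_QUERY)   # copy this text into the Starburst UI or Trino CLI
```

Keeping the handler as a normal function makes it easy to unit test before it is wrapped in a UDF definition.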

In [None]:
import getpass

# grab credentials from the notebook user; the connection cells below reuse them
my_host = input("Host name: ")
my_username = input("User name: ")
my_password = getpass.getpass("Password: ")



## Trino Python client

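The Trino Python client exposes a DB-API connection: open a connection with `connect`, run SQL through a cursor, and fetch the rows back into Python.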


In [None]:
%pip install trino

In [None]:
from trino.dbapi import connect
from trino.auth import BasicAuthentication

# sanity check
print('\n Make sure the phrase ** CONNECTION IS GOOD ** displays \n')


# build the connection object with the hostname & creds entered earlier
conn = connect(
    host=my_host,
    port=443,
    user=my_username,
    auth=BasicAuthentication(my_username, my_password),
    http_scheme="https",
    catalog="system",
    schema="runtime",
)
cur = conn.cursor()
cur.execute("SELECT '** CONNECTION IS GOOD **'")
rows = cur.fetchall()
print(rows)

In [None]:
import pandas as pd


cur = conn.cursor()
cur.execute("SELECT * FROM tpch.tiny.nation")
rows = cur.fetchall()
col_name = [desc[0] for desc in cur.description]
pdf = pd.DataFrame(rows, columns=col_name)

pdf
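The column names in the cell above come from `cur.description`, which is part of the DB-API cursor interface rather than anything Trino-specific. As a small sketch of the same idea without pandas, the hypothetical helper `rows_as_dicts` below pairs each row with its column names. To keep it runnable without a cluster, it is demonstrated against an in-memory `sqlite3` cursor, which is only a stand-in; with Trino you would pass the `cur` created above.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Pair each fetched row with the column names from cursor.description."""
    columns = [desc[0] for desc in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in DB-API cursor (assumption: sqlite3 is used purely for illustration).
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE nation (nationkey INTEGER, name TEXT)")
demo_cur.executemany(
    "INSERT INTO nation VALUES (?, ?)",
    [(0, "ALGERIA"), (1, "ARGENTINA")],
)

# Same pattern as the Trino cell: execute, then read rows plus description.
demo_cur.execute("SELECT nationkey, name FROM nation ORDER BY nationkey")
records = rows_as_dicts(demo_cur)
print(records)
# [{'nationkey': 0, 'name': 'ALGERIA'}, {'nationkey': 1, 'name': 'ARGENTINA'}]
```

A list of dicts like `records` can also be handed straight to `pd.DataFrame(records)` if a DataFrame is wanted later.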




## PyStarburst

PyStarburst provides a DataFrame API modeled after PySpark. It also accepts plain SQL through `session.sql("...")`.

In [None]:
%pip install pystarburst

In [None]:
import trino

from pystarburst import Session
from pystarburst import functions as F
from pystarburst.functions import *
from pystarburst.window import Window as W

# PyStarburst setup
session_properties = {
    "host":my_host,
    "port": 443,
    "http_scheme": "https",
    "auth": trino.auth.BasicAuthentication(my_username, my_password)
}
session = Session.builder.configs(session_properties).create()

# validate PyStarburst working
print('\n Make sure the phrase ** CONNECTION IS GOOD ** displays \n')
session.sql("select '** CONNECTION IS GOOD **' as conn_check").collect()

In [None]:
# Example 1 (PyStarburst): For each nation, get the top X customers by acctbal
# among customers with acctbal > 8000 (X = 1 below, via rn <= 1).
# Style: DataFrame API modeled after PySpark; also supports session.sql("...")
# Tables: tpch.tiny.customer, tpch.tiny.nation

# Fully qualified names are explicit and avoid relying on default catalog/schema

from pystarburst.window import Window

customer = session.table("tpch.tiny.customer")
nation = session.table("tpch.tiny.nation") \
            .drop("regionkey", "comment") \
            .rename("name", "nation_name") \
            .rename("nationkey", "n_nationkey")

filtered = customer.select("custkey", "name", "acctbal", "nationkey") \
            .filter(col("acctbal") > 8000.0)
joined = filtered.join(nation, col("nationkey") == nation.n_nationkey) \
            .drop("nationkey", "n_nationkey")

w = Window.partitionBy("nation_name").orderBy(col("acctbal").desc())
ranked = joined.select("*", row_number().over(w).alias("rn"))
top_x = ranked.filter(col("rn") <= 1).sort(col("acctbal").desc(), col("nation_name"))
top_x.show(25)
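Since PyStarburst also accepts SQL through `session.sql("...")`, the DataFrame pipeline above can be approximated as a single query. The sketch below is one possible SQL rendering, not a verified equivalent of what PyStarburst generates; the helper name `top_customers_sql` and its parameters are hypothetical. It only builds the query text, so it runs without a session.

```python
def top_customers_sql(top_x: int = 1, min_acctbal: float = 8000.0) -> str:
    """Build SQL for the top `top_x` customers per nation by acctbal."""
    # Coerce to numbers so only numeric literals are interpolated into the SQL.
    top_x = int(top_x)
    min_acctbal = float(min_acctbal)
    return f"""
SELECT custkey, name, acctbal, nation_name, rn
FROM (
    SELECT c.custkey,
           c.name,
           c.acctbal,
           n.name AS nation_name,
           row_number() OVER (
               PARTITION BY n.name ORDER BY c.acctbal DESC
           ) AS rn
    FROM tpch.tiny.customer AS c
    JOIN tpch.tiny.nation AS n ON c.nationkey = n.nationkey
    WHERE c.acctbal > {min_acctbal}
) AS ranked
WHERE rn <= {top_x}
ORDER BY acctbal DESC, nation_name
"""


query = top_customers_sql(top_x=1)
print(query)

# With the `session` created earlier in this notebook, the query could be run as:
# session.sql(query).show(25)
```

Building the text in a function makes it easy to change X or the balance threshold without editing the query by hand.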

## Ibis

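Ibis is a Python dataframe library that works against multiple backends, Trino among them. The cells below connect it to the same cluster used above.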

In [None]:
# Install the libraries (re-installing trino or pystarburst is harmless)

%pip install trino
%pip install 'ibis-framework[trino]'
%pip install pystarburst

In [None]:
import ibis
from trino.auth import BasicAuthentication

ibis.options.interactive = True


user = my_username
trino_auth_obj = BasicAuthentication(my_username, my_password)
host = my_host
port = 443
http_scheme = "https"
catalog = "tpch"
schema = "tiny"

con = ibis.trino.connect(
    user=user, auth=trino_auth_obj, host=host, port=port, http_scheme=http_scheme, database=catalog, schema=schema
)


print('\n Make sure the phrase ** CONNECTION IS GOOD ** displays \n')
con.sql("select '** CONNECTION IS GOOD **' as conn_check")



In [None]:
nations = con.table("nation")
nations[0:50]