# Simple notebook playing around with Pandas

---
## Config & setup

### Galaxy cluster & user credentials

Run the next cell to enter your connection details. Note that it only collects the values; it does NOT validate them.

In [None]:
import getpass

# prompt the notebook user for the values used when making a connection
host = input("Host name: ")
username = input("User name: ")
password = getpass.getpass("Password: ")
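
Because the prompts do not validate anything, a quick sanity check can catch obvious mistakes before a connection is attempted. The following is a minimal sketch, not part of PyStarburst: the helper name `check_connection_values` and the example host `cluster.example.com` are hypothetical, and the check only looks for empty or whitespace-padded values. It cannot confirm that the cluster actually accepts the credentials.

```python
def check_connection_values(host, username, password):
    """Return a list of obvious problems with the values (empty if none).

    Hypothetical helper for illustration only; it does not contact the cluster.
    """
    problems = []
    if not host.strip():
        problems.append("host is empty")
    elif host != host.strip():
        problems.append("host has leading or trailing whitespace")
    if not username.strip():
        problems.append("username is empty")
    if not password:
        problems.append("password is empty")
    return problems


# Example with made-up values: a padded host name and a missing password.
issues = check_connection_values(" cluster.example.com", "alice", "")
for issue in issues:
    print("Check your input:", issue)
```

In the notebook, this could be called with the `host`, `username`, and `password` variables right after the prompts. An empty list only means nothing looked obviously wrong.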

### Setup PyStarburst session

The next cell should return `[Row(Working='Yes')]` if the session is functional. If an exception is raised,
the most likely cause is an incorrect host name or incorrect credentials.

In [None]:
import trino

from pystarburst import Session
from pystarburst import functions as F
from pystarburst.functions import *
from pystarburst.window import Window as W

# PyStarburst setup
session_properties = {
    "host": host,
    "port": 443,
    "http_scheme": "https",
    "auth": trino.auth.BasicAuthentication(username, password)
}
session = Session.builder.configs(session_properties).create()

# verify that the PyStarburst session works
session.sql("select 'Yes' as Working").collect()

---
## Let's play

In [3]:
# create a Pandas DF from a PyStarburst DF
#  https://pystarburst.eng.starburstdata.net/dataframe.html#pystarburst.dataframe.DataFrame.to_pandas

pandas_df = session.create_dataframe([[1, "a", 1.0], [2, "b", 2.0]]).to_df("id", "value1", "value2").to_pandas()
display(pandas_df)

   id value1  value2
0   1      a     1.0
1   2      b     2.0


In [4]:
# create a PyStarburst DF from a Pandas DF
#  https://pystarburst.eng.starburstdata.net/session.html#pystarburst.session.Session.createDataFrame

pystarburst_df = session.create_dataframe(pandas_df.to_dict('records'))
pystarburst_df.show()

------------------------------
|"id"  |"value1"  |"value2"  |
------------------------------
|1     |a         |1.0       |
|2     |b         |2.0       |
------------------------------
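
The conversion above relies on `to_dict('records')`, which turns each row of the pandas DataFrame into a dictionary keyed by column name. The sketch below shows that intermediate structure using pandas alone, so it runs without a cluster. The DataFrame is rebuilt by hand here (an assumption made only to keep the example self-contained) with the same columns and values as the one above.

```python
import pandas as pd

# Rebuild the small example frame locally; no PyStarburst session is needed.
local_df = pd.DataFrame(
    {"id": [1, 2], "value1": ["a", "b"], "value2": [1.0, 2.0]}
)

# One dictionary per row, with column names as keys.
records = local_df.to_dict("records")

for record in records:
    print(record)
```

Each dictionary corresponds to one row, and the keys match the column names shown in the PyStarburst output above.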

