First, we download some data and insert it into our PostgreSQL database to play with. This code chunk is only meant to populate a database table in this environment; you do not need to have your data in a pandas DataFrame to take advantage of Lux's capabilities.

In [None]:
import pandas as pd
import lux
from sqlalchemy import create_engine

data = pd.read_csv('https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/college.csv')
engine = create_engine("postgresql://testuser:testpass@localhost:5432/testdb")
data.to_sql(name='college', con=engine, if_exists='replace', index=False)

Once we have created a PostgreSQL connection, we can create a LuxDataFrame and connect it to our database. You can then take advantage of Lux's full visualization recommendation system without having to pull the table locally.

In [None]:
sql_df = lux.LuxDataFrame()
lux.config.set_SQL_connection(engine)
sql_df.set_SQL_table("college")
sql_df

Looking at Lux's recommendations, we see that ACTMedian and SATAverage are very strongly correlated. From the Category tab, we see that there are few records where PredominantDegree is "Certificate". In addition, there are not many colleges with "Private For-Profit" as their FundingModel.
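Counts like the ones shown in the Category tab can be computed inside the database, so only a small summary has to travel back to the notebook. The following is a minimal sketch of that idea, not Lux's actual implementation: an in-memory SQLite database stands in for PostgreSQL, the rows are made-up placeholders rather than real college data, and `category_counts` is a hypothetical helper.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE college (Name TEXT, FundingModel TEXT)")

# Made-up placeholder rows, not taken from the college dataset.
conn.executemany(
    "INSERT INTO college VALUES (?, ?)",
    [
        ("A", "Public"),
        ("B", "Public"),
        ("C", "Private Not-For-Profit"),
        ("D", "Private For-Profit"),
        ("E", "Public"),
    ],
)


def category_counts(connection, table, column):
    """Return {category: row count}, aggregated by the database itself."""
    # Identifiers cannot be bound as query parameters, so pass trusted names only.
    query = f'SELECT "{column}", COUNT(*) FROM "{table}" GROUP BY "{column}"'
    return dict(connection.execute(query).fetchall())


counts = category_counts(conn, "college", "FundingModel")
print(counts)
```

Only one row per category comes back from the query, regardless of how many rows the table holds.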

We are interested in picking a college to attend and want to understand the AverageCost of attending different colleges and how that relates to other information in the dataset.

In [None]:
sql_df.intent = ["AverageCost"]
sql_df

We see that a large number of colleges cost around $20,000 per year. Scrolling through the Enhance tab, we also see that Bachelor's degree colleges and colleges in New England and large cities tend to have a higher AverageCost than their counterparts.

We are interested in the trend of AverageCost vs. SATAverage, since there is a rough upward relationship above an AverageCost of $30,000, but below that the trend is less clear.

In [None]:
sql_df.intent = ["AverageCost","SATAverage"]
sql_df

By adding the FundingModel, we see that the cluster of points on the left can clearly be attributed to public colleges, whereas private colleges more or less follow a trend in which colleges with a higher SATAverage tend to have a higher AverageCost.
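To put a number on a trend like the one between SATAverage and AverageCost, one option is Pearson's correlation coefficient, which ranges from -1 to 1. The sketch below is a plain-Python illustration rather than anything Lux does internally; `pearson_r` is a hypothetical helper and the input values are toy numbers, not figures from the college dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy, perfectly linear values chosen only to exercise the function.
sat_scores = [900, 1000, 1100, 1200]
costs = [20000, 25000, 30000, 35000]
r = pearson_r(sat_scores, costs)
print(r)
```

Computing the coefficient separately for each FundingModel group would be one way to check whether the trend holds within private colleges but not within public ones.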

We can also leverage Lux's vis library to quickly create visualizations from our database data.

In [None]:
from lux.vis.Vis import Vis
from lux.vis.Clause import Clause

x_clause = Clause(attribute="AdmissionRate", channel="x")
y_clause = Clause(attribute="AverageCost", channel="y")
color_clause = Clause(attribute="FundingModel", channel="color")

new_vis = Vis([x_clause, y_clause, color_clause], sql_df)
new_vis