# Basic Walkthrough

In [None]:
import pandas as pd
import lux

In [None]:
# Collecting basic usage statistics for Lux (For more information, see: https://tinyurl.com/logging-consent)
lux.logger = True # Remove this line if you do not want your interactions recorded

We first load the [Cars dataset](http://lib.stat.cmu.edu/datasets/), which covers 392 different cars from 1970 to 1982 and records attributes such as Horsepower, MilesPerGal, and Acceleration.

In [None]:
df = pd.read_csv("../data/car.csv")
df["Year"] = pd.to_datetime(df["Year"], format='%Y') # convert the "Year" column to a pandas datetime dtype
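
To see what this conversion does, here is a minimal sketch on a tiny hypothetical frame (the values are made up and are not read from `car.csv`). With `format='%Y'`, each year is parsed as January 1 of that year and the column takes a datetime dtype.

```python
import pandas as pd

# Hypothetical stand-in for the "Year" column; not the real car.csv data.
toy = pd.DataFrame({"Year": [1970, 1971, 1982]})
toy["Year"] = pd.to_datetime(toy["Year"], format="%Y")

year_dtype = str(toy["Year"].dtype)        # a datetime64 dtype
years = toy["Year"].dt.year.tolist()       # the year component of each timestamp
print(year_dtype)
print(years)
```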

When we print out the dataframe, we see the default Pandas display, and we can toggle to a set of recommendations generated by Lux.
Lux returns four sets of visualizations to show an overview of the dataset.


In [None]:
df

Here we spot the scatterplot of `Acceleration` vs. `Horsepower`.
Intuitively, we expect cars with higher horsepower to have higher acceleration, but we are actually seeing the opposite of that trend. 
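
One way to put a number on a trend like this, outside of Lux, is a plain pandas correlation. The sketch below uses a handful of hypothetical values rather than the real dataset, so only the sign of the result is meaningful here.

```python
import pandas as pd

# Hypothetical values chosen to mimic the downward trend seen in the scatterplot.
toy = pd.DataFrame({
    "Horsepower":   [70, 90, 110, 150, 200],
    "Acceleration": [19.0, 17.0, 15.5, 13.0, 10.5],
})

# Pearson correlation between the two columns; a negative value means
# Acceleration tends to fall as Horsepower rises.
r = toy["Horsepower"].corr(toy["Acceleration"])
print(r)
```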

Let's learn more about whether there are additional factors that are affecting this relationship.
Using the `intent` property, we indicate to Lux that we are interested in the attributes `Acceleration` and `Horsepower`.

In [None]:
df.intent = ["Acceleration","Horsepower"]
df

On the left, we see that the Current Visualization corresponds to our specified intent.
On the right, we see different tabs of recommendations: 
- `Enhance` shows what happens when we add an additional variable to the current selection.
- `Filter` adds an additional filter while keeping the selected variables fixed on the X and Y axes.
- `Generalize` removes one of the attributes to display the more general trend.


We can quickly compare the relationship between `Horsepower` and `Acceleration` when an additional factor is in play. For example, we see here that `Displacement` and `Weight` are higher on the top left and lower towards the bottom right, whereas `MilesPerGal` shows the opposite pattern.

We see that there is a strong separation between cars that have 4 cylinders (orange) and cars with 8 cylinders (green), so we are interested in learning more about the attribute `Cylinders`.

We can take a look at this by inspecting the `Series` corresponding to `Cylinders`. Note that Lux not only helps with visualizing dataframes, but also displays visualizations of `Series` objects.

In [None]:
df["Cylinders"]

The Count distribution shows that there are not many cars with 3 or 5 cylinders, so let's clean the data up to remove those.

In [None]:
df[df["Cylinders"]==3].to_pandas()

In [None]:
df[df["Cylinders"]==5].to_pandas()
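
The same counts can also be read off numerically with `value_counts`. This is a minimal sketch on hypothetical cylinder values, and the "at most one car" threshold is only an illustration, not a rule used by Lux.

```python
import pandas as pd

# Hypothetical cylinder values standing in for df["Cylinders"].
cylinders = pd.Series([4, 4, 8, 6, 4, 3, 8, 5, 6, 4], name="Cylinders")

# Number of cars per cylinder count, ordered by cylinder count.
counts = cylinders.value_counts().sort_index()
print(counts)

# Cylinder counts that occur at most once in this toy sample.
rare = counts[counts <= 1].index.tolist()
print(rare)
```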


We can easily clean up the data and update recommendations in Lux, due to the tight integration with Pandas. 
Note that the intent here is kept as what we set earlier (i.e., `Horsepower`, `Acceleration`).

In [None]:
df = df[(df["Cylinders"]!=3) & (df["Cylinders"]!=5)]
df
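
When more than a couple of values need to be excluded, the chained comparison can be written with `isin` instead. The sketch below, on a small hypothetical frame, checks that both forms keep the same rows.

```python
import pandas as pd

# Hypothetical rows; the values are illustrative only.
toy = pd.DataFrame({
    "Cylinders":  [4, 3, 8, 5, 6, 4],
    "Horsepower": [90, 97, 150, 103, 110, 88],
})

# The filter as written above, with chained comparisons.
by_comparison = toy[(toy["Cylinders"] != 3) & (toy["Cylinders"] != 5)]

# An equivalent filter using isin and negation.
by_isin = toy[~toy["Cylinders"].isin([3, 5])]
print(by_isin)
```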

Let's say we find the time series showing the number of cars with different `Cylinders` values over time to be very interesting. In particular, there seems to be a spike in production of 4-cylinder cars in 1982. To dig into this more, we can export it by selecting the visualization and clicking on the export button.

In [None]:
vis = df.exported[0]
vis

We can then print out the visualization code in [Altair](https://altair-viz.github.io/) that generated this chart so that we can further tweak the chart as desired.

In [None]:
print(vis.to_Altair())

In [None]:
import altair as alt
from pandas._libs.tslibs.timestamps import Timestamp
visData = pd.DataFrame({'Year': {0: Timestamp('1970-01-01 00:00:00'), 1: Timestamp('1970-01-01 00:00:00'), 2: Timestamp('1970-01-01 00:00:00'), 3: Timestamp('1971-01-01 00:00:00'), 4: Timestamp('1971-01-01 00:00:00'), 5: Timestamp('1971-01-01 00:00:00'), 6: Timestamp('1972-01-01 00:00:00'), 7: Timestamp('1972-01-01 00:00:00'), 8: Timestamp('1972-01-01 00:00:00'), 9: Timestamp('1973-01-01 00:00:00'), 10: Timestamp('1973-01-01 00:00:00'), 11: Timestamp('1973-01-01 00:00:00'), 12: Timestamp('1974-01-01 00:00:00'), 13: Timestamp('1974-01-01 00:00:00'), 14: Timestamp('1974-01-01 00:00:00'), 15: Timestamp('1975-01-01 00:00:00'), 16: Timestamp('1975-01-01 00:00:00'), 17: Timestamp('1975-01-01 00:00:00'), 18: Timestamp('1976-01-01 00:00:00'), 19: Timestamp('1976-01-01 00:00:00'), 20: Timestamp('1976-01-01 00:00:00'), 21: Timestamp('1977-01-01 00:00:00'), 22: Timestamp('1977-01-01 00:00:00'), 23: Timestamp('1977-01-01 00:00:00'), 24: Timestamp('1978-01-01 00:00:00'), 25: Timestamp('1978-01-01 00:00:00'), 26: Timestamp('1978-01-01 00:00:00'), 27: Timestamp('1979-01-01 00:00:00'), 28: Timestamp('1979-01-01 00:00:00'), 29: Timestamp('1979-01-01 00:00:00'), 30: Timestamp('1980-01-01 00:00:00'), 31: Timestamp('1980-01-01 00:00:00'), 32: Timestamp('1980-01-01 00:00:00'), 33: Timestamp('1982-01-01 00:00:00'), 34: Timestamp('1982-01-01 00:00:00'), 35: Timestamp('1982-01-01 00:00:00')}, 'Cylinders': {0: 8, 1: 6, 2: 4, 3: 8, 4: 6, 5: 4, 6: 8, 7: 6, 8: 4, 9: 8, 10: 6, 11: 4, 12: 8, 13: 6, 14: 4, 15: 6, 16: 4, 17: 8, 18: 4, 19: 6, 20: 8, 21: 4, 22: 6, 23: 8, 24: 8, 25: 4, 26: 6, 27: 4, 28: 6, 29: 8, 30: 6, 31: 8, 32: 4, 33: 4, 34: 8, 35: 6}, 'Record': {0: 18.0, 1: 4.0, 2: 7.0, 3: 7.0, 4: 8.0, 5: 12.0, 6: 13.0, 7: 0.0, 8: 14.0, 9: 20.0, 10: 8.0, 11: 11.0, 12: 5.0, 13: 6.0, 14: 15.0, 15: 12.0, 16: 12.0, 17: 6.0, 18: 15.0, 19: 10.0, 20: 9.0, 21: 14.0, 22: 5.0, 23: 8.0, 24: 6.0, 25: 17.0, 26: 12.0, 27: 12.0, 28: 6.0, 29: 10.0, 30: 2.0, 31: 0.0, 32: 23.0, 33: 47.0, 34: 1.0, 35: 10.0}})

chart = alt.Chart(visData).mark_line().encode(
    y = alt.Y('Record', type= 'quantitative', title='Number of Records'),
    x = alt.X('Year', type = 'temporal'),
)
chart = chart.interactive() # Enable Zooming and Panning
chart = chart.encode(color=alt.Color('Cylinders',type='nominal'))

chart = chart.configure_title(fontWeight=500,fontSize=13,font='Helvetica Neue')
chart = chart.configure_axis(titleFontWeight=500,titleFontSize=11,titleFont='Helvetica Neue',
			labelFontWeight=400,labelFontSize=8,labelFont='Helvetica Neue',labelColor='#505050')
chart = chart.configure_legend(titleFontWeight=500,titleFontSize=10,titleFont='Helvetica Neue',
			labelFontWeight=400,labelFontSize=8,labelFont='Helvetica Neue')
chart = chart.properties(width=160,height=150)

chart

This is a lot of code. Let's look at how much easier it is to specify this visualization intent directly in Lux when we know what we are looking for.

In [None]:
from lux.vis.Vis import Vis
Vis(["Year","Cylinders"],df)
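
For intuition about the data behind this chart, the `Record` column in the exported code above looks like a count of rows per (`Year`, `Cylinders`) pair. A rough pandas equivalent on a hypothetical frame is sketched below. Note that the exported data also contains zero-count rows, which this sketch does not produce.

```python
import pandas as pd

# Hypothetical rows; one row per car.
toy = pd.DataFrame({
    "Year": pd.to_datetime(["1970", "1970", "1970", "1971", "1971"], format="%Y"),
    "Cylinders": [8, 8, 4, 4, 6],
})

# Count the rows in each (Year, Cylinders) group and name the count "Record".
records = (
    toy.groupby(["Year", "Cylinders"])
       .size()
       .reset_index(name="Record")
)
print(records)
```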

# Creating Visualizations 

In Lux, users can specify the particular visualizations they want and render them on their data on demand.

In [None]:
from lux.vis.Vis import Vis

In [None]:
Vis(["Horsepower"],df)

In [None]:
Vis(["Origin","Horsepower"],df)

In [None]:
Vis(["Origin",lux.Clause("Horsepower",aggregation="sum")],df)
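
To make the effect of `aggregation="sum"` concrete, here is a rough pandas analogue on hypothetical data. We assume that the bar chart without an explicit aggregation summarizes by the mean, which is our understanding of the Lux default rather than something shown in this notebook.

```python
import pandas as pd

# Hypothetical rows; the values are illustrative only.
toy = pd.DataFrame({
    "Origin":     ["USA", "USA", "Japan", "Europe", "Japan"],
    "Horsepower": [150, 130, 90, 100, 70],
})

# Total horsepower per origin, analogous to aggregation="sum".
total_hp = toy.groupby("Origin")["Horsepower"].sum()

# Mean horsepower per origin, the assumed default summary.
mean_hp = toy.groupby("Origin")["Horsepower"].mean()
print(total_hp)
print(mean_hp)
```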

You can also work with collections of visualizations via a `VisList` object.

In [None]:
from lux.vis.VisList import VisList

For example, we can create a set of visualizations of `Horsepower` with respect to all other attributes, using the wildcard `"?"` symbol.

In [None]:
VisList(["Horsepower","?"],df)
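
Conceptually, the wildcard stands for "every other attribute". The plain-Python sketch below enumerates the pairs we would expect such a list to cover. It is not how Lux implements `VisList`, and the column list only contains the attribute names mentioned in this walkthrough.

```python
# Attribute names mentioned in this walkthrough (the real file may have more).
columns = [
    "MilesPerGal", "Cylinders", "Displacement", "Horsepower",
    "Weight", "Acceleration", "Year", "Origin",
]

# Pair the fixed attribute with each remaining attribute, one pair per chart.
fixed = "Horsepower"
pairs = [(fixed, other) for other in columns if other != fixed]

for pair in pairs:
    print(pair)
```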

For more support and resources on Lux:
- Sign up for the early-user [mailing list](https://forms.gle/XKv3ejrshkCi3FJE6) to stay tuned for upcoming releases, updates, or user studies. 
- Visit [Read the Docs](https://lux-api.readthedocs.io/en/latest/) for more detailed documentation.
- Clone [lux-binder](https://github.com/lux-org/lux-binder) to try out these [hands-on exercises](https://github.com/lux-org/lux-binder/tree/master/exercise) or [tutorial series](https://github.com/lux-org/lux-binder/tree/master/tutorial) on how to use Lux.
- Join our community [Slack](https://lux-project.slack.com/join/shared_invite/zt-iwg84wfb-fBPaGTBBZfkb9arziy3W~g) to discuss and ask questions.
- Report any bugs, issues, or requests through [GitHub Issues](https://github.com/lux-org/lux/issues).
