In this tutorial, we look at the data types that Lux infers and how to change them. Lux uses the data type it detects for each column to decide which visualizations to recommend. We will use several different datasets along the way.

In [None]:
import pandas as pd
import lux

In [None]:
lux.config.default_display="lux"

In [None]:
df = pd.read_csv("https://github.com/lux-org/lux-datasets/blob/master/data/college.csv?raw=true")
df

To see which data type Lux has inferred for each column, use `df.data_type`. It is a dictionary that maps each column name to its data type.

In [None]:
df.data_type
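Because `df.data_type` is a dictionary, ordinary dictionary operations apply to it. The sketch below groups column names by their type label. The `data_type` dictionary here is a hand-written stand-in with illustrative values, not actual Lux output, and `columns_by_type` is a hypothetical helper rather than part of the Lux API.

```python
from collections import defaultdict

# Hand-written stand-in for df.data_type (illustrative values, not real Lux output).
data_type = {
    "AcceptanceRate": "quantitative",
    "PredominantDegree": "nominal",
    "Year": "temporal",
}


def columns_by_type(data_type):
    """Invert a column -> type mapping into a type -> list-of-columns mapping."""
    grouped = defaultdict(list)
    for column, dtype in data_type.items():
        grouped[dtype].append(column)
    return dict(grouped)


grouped = columns_by_type(data_type)
print(grouped)
```

Assuming the real dictionary has the same shape, passing `df.data_type` to the helper would give a quick overview of how many columns fall under each type.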

## Quantitative
The Quantitative data type is used when a column holds a count or measure of some attribute.
In the example above, the column `AcceptanceRate` is quantitative because it is a measure.
Aggregates such as means and medians are also categorized as quantitative.


In [None]:
df

## Nominal

The Nominal data type is for categorical data. For example, `PredominantDegree` is nominal because it describes an attribute rather than measuring one. In this case, there are three possible values: Associate, Bachelor’s, and Certificate. Lux displays these variables under the Occurrence tab as bar charts showing the number of occurrences of each value (see above).

In [None]:
df

## Temporal

The Temporal data type is used when Lux infers, from either the format of the data or the name of the column, that the column represents time. When we set the intent on a temporal column, we see line charts that reflect how the data changes over time.

In [None]:
df = pd.read_csv("https://github.com/lux-org/lux-datasets/blob/master/data/car.csv?raw=true")
df["Year"] = pd.to_datetime(df["Year"], format="%Y")
df
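To make the idea of detecting time "from the format or the column name" concrete, here is a rough sketch of such a heuristic in plain Python. It is not Lux's actual implementation: the function `looks_temporal`, the list of name hints, and the list of date formats are all assumptions chosen for illustration.

```python
from datetime import datetime

# Hypothetical name hints and date formats; Lux's real rules may differ.
TEMPORAL_NAMES = {"year", "month", "day", "date", "time"}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y")


def looks_temporal(column_name, values):
    """Return True if the column name hints at time, or if every value
    parses with one and the same date format."""
    if column_name.lower() in TEMPORAL_NAMES:
        return True
    if not values:
        return False
    for fmt in DATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(str(value), fmt)
            return True  # every value matched this format
        except ValueError:
            continue  # at least one value failed; try the next format
    return False


print(looks_temporal("Year", [1970, 1982]))
print(looks_temporal("AcceptanceRate", [0.5, 0.8]))
```

A heuristic like this can misfire, for example on a numeric column whose values happen to be four digits long, which is one reason the inferred type can be overridden.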

## ID

The ID data type is chosen for any column that looks like an identifier and shouldn't be plotted, such as a zip code or a user ID. Clicking on the warning sign shows that Lux inferred an attribute as an ID and therefore did not plot it.

In [None]:
df = pd.read_csv("https://github.com/lux-org/lux-datasets/blob/master/data/aug_test.csv?raw=true")
df
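One way to picture what "looks like an ID" could mean is a check on the column name combined with a check that no value repeats. The sketch below is only an assumption about how such a rule might work, not Lux's actual logic, and `looks_like_id` is a hypothetical helper. It would not catch identifiers that repeat across rows, such as zip codes.

```python
def looks_like_id(column_name, values):
    """Flag a column as ID-like when its name ends in "id"
    and every value in it is distinct."""
    name_suggests_id = column_name.lower().endswith("id")
    all_distinct = len(values) > 0 and len(set(values)) == len(values)
    return name_suggests_id and all_distinct


# Made-up values for illustration only.
print(looks_like_id("user_id", [101, 102, 103]))
print(looks_like_id("state", [1, 2, 3]))
```

Requiring both conditions keeps a column of distinct measurements from being flagged just because its values never repeat.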

## Changing the Inferred Data Type

Sometimes Lux infers the wrong data type for a column. To fix this, we can use `df.set_data_type`. Here is an example:

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/communities.csv?raw=true")
df

In [None]:
df.data_type

As you can see, the `state` column is marked as `quantitative`. Thus, the visualizations created with it reflect a `quantitative` variable.

In [None]:
from lux.vis.Vis import Vis

In [None]:
Vis(["state"], df)

However, because the numbers map to actual states, it makes more sense for this column to be marked as `nominal`. We can fix it using the code below:

In [None]:
df.set_data_type({"state":"nominal"})

In [None]:
df.data_type

Here, we see that `data_type` has changed to accommodate our fix. Setting the intent on this column, we see that it behaves like any other `nominal` column.

In [None]:
Vis(["state"], df)
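To see why the distinction matters for a column of numeric codes, the plain-Python sketch below compares a quantitative-style summary with a nominal-style one. The `state_codes` values are made up for illustration and are not taken from communities.csv.

```python
from collections import Counter
from statistics import mean

# Made-up state codes for illustration; not taken from communities.csv.
state_codes = [6, 6, 36, 48, 48, 48]

# Treated as quantitative: the average of the codes is a number
# that need not correspond to any state at all.
average_code = mean(state_codes)

# Treated as nominal: counting rows per code is a meaningful summary.
rows_per_state = Counter(state_codes)

print(average_code)
print(rows_per_state.most_common(1))
```

The count per code is the kind of summary a bar chart of occurrences shows, which is why marking such a column as `nominal` gives more useful visualizations.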