# Exploring Data with Lumen

This notebook explains how you can explore (tabular) data with [Lumen](https://github.com/lumen-org/lumen).
Generally, Lumen allows you to explore 'models'.
Here, the term 'model' is used in the sense of statistical modelling, where a model typically aims to describe
the process that generated the data well enough.
If you only want to look at data, you can do so, but you have to wrap it in a special, very simple model. We will call
these models 'data-models' to emphasize that they are not much more than the
data itself.

Let's get started!

## Overview

We will use `modelbase` as the back-end and `lumen` as the front-end.

### Back-end
The backend serves two purposes:
 1. manage models, watch a specific directory for new models/changed models, and
 2. provide an API to run complex inference queries on models.

You may use the API directly (see also: TODO) to run inference queries. However, in many
cases it may be much more convenient to use the front-end instead.
If you wonder what queries are, think of them as specific questions that you ask the model.
Here are some examples:

  * 'What does the marginal distribution of the variable "age" look like?'
  * 'What is the most likely value for "income" given that a person has "low education"?'
  * 'What do samples drawn from the model look like for the variables "east-west" and "age"?'
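
For a data-model, such queries boil down to simple operations on the data itself. The following is a minimal sketch in plain Python that mimics the three example queries on a tiny, made-up table. It does not use the modelbase API, and the records and helper names (`marginal`, `most_likely`, `draw_samples`) are purely illustrative.

```python
import random
from collections import Counter

# Made-up toy records (not real data), standing in for a data-model.
records = [
    {"age": 25, "education": "low", "income": "low"},
    {"age": 25, "education": "low", "income": "low"},
    {"age": 40, "education": "low", "income": "medium"},
    {"age": 40, "education": "high", "income": "high"},
    {"age": 60, "education": "high", "income": "high"},
]


def marginal(rows, variable):
    """Relative frequency of each value of `variable` (empirical marginal)."""
    counts = Counter(row[variable] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def most_likely(rows, variable, given):
    """Most frequent value of `variable` among the rows that match `given`."""
    matching = [r for r in rows if all(r[k] == v for k, v in given.items())]
    return Counter(r[variable] for r in matching).most_common(1)[0][0]


def draw_samples(rows, variables, n, seed=0):
    """Draw n rows with replacement, restricted to `variables`."""
    rng = random.Random(seed)
    return [{v: row[v] for v in variables} for row in rng.choices(rows, k=n)]


print(marginal(records, "age"))
print(most_likely(records, "income", {"education": "low"}))
print(draw_samples(records, ["age", "education"], n=3))
```

A real model answers the same kinds of questions, but from a fitted distribution rather than by counting rows.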

### Front-end
The front-end gives you a visual interactive interface to configure, run and visualize a wide
range of possibly complex queries.
It does not require any programming from your side. The front-end connects
to an instance of the backend to actually execute any queries.

For pure data exploration, there are two model types available: an 'empirical model' and a kernel-density-estimator (KDE) model.

This script lets you quickly:
 * create an empirical model from a given CSV file or `pd.DataFrame`, and
 * store the model at the correct location for your local Lumen instance to load it.

## Example Data Exploration Workflow

To work with Lumen, we simply have to:
 1. start the back-end,
 2. start the front-end, and
 3. wrap our data into a data-model and save it in the folder `models_path`, which is the folder watched by the backend.

#### Back-End
The backend watches for changes in a folder.
Run the following in a separate console to start the backend and let it watch the models in the specified folder:

```
cd <dir-where-you-cloned-the-backend-source-to>
python3 bin/webservice.py --d jupyter/models_data_exploration_example
```

#### Front-End
By default, the front-end is configured to use a local backend, so all you have to do is run it.
Simply open the `index.html` in its base directory with a browser (preferably Chrome or another Chromium-based browser).

Now the backend and the front-end are ready. Let's start with the modelling workflow and create some models!

## load modelbase back-end

```python
# make sure you run in the correct python environment where mb_modelbase is installed or this will fail!
import mb.modelbase as mbase
```

## prepare data

```python
# we use some standard data about car and car engine properties
from mb.data import mpg
df = mpg.mixed()
df.head()
```

|   | transmission | cylinder | turbo | car_size | year | mpg_city | mpg_highway | displacement |
|---|---|---|---|---|---|---|---|---|
| 0 | manual | few | False | 1 | 1984 | 23.0 | 35.0 | 120.0 |
| 1 | manual | few | False | 1 | 1984 | 25.0 | 36.0 | 91.0 |
| 2 | auto | medium | False | 1 | 1984 | 16.0 | 28.0 | 350.0 |
| 3 | manual | medium | False | 1 | 1984 | 16.0 | 28.0 | 350.0 |
| 4 | auto | medium | True | 1 | 1984 | 19.0 | 28.0 | 181.0 |


```python
# this is where you could prep / modify / analyse your data

# At the moment, Lumen does not support the boolean type directly, but you can simply convert it to a string type...
for col in ['turbo']:
    df[col] = df[col].astype('str')

# ...
```
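
If a table has many boolean columns, listing them by hand gets tedious. Here is a minimal sketch, assuming pandas is installed, that converts every boolean column to strings in one go. The toy DataFrame and the helper name `bools_to_str` are made up for illustration and are not part of Lumen or modelbase.

```python
import pandas as pd


def bools_to_str(frame):
    """Return a copy of `frame` with all boolean columns cast to strings."""
    result = frame.copy()
    # select_dtypes picks the columns by dtype, so no names are hard-coded
    bool_columns = result.select_dtypes(include="bool").columns
    for col in bool_columns:
        result[col] = result[col].astype(str)
    return result


# toy data, only to demonstrate the conversion
toy = pd.DataFrame({"turbo": [True, False], "year": [1984, 1985]})
converted = bools_to_str(toy)
print(converted["turbo"].tolist())
```

Working on a copy keeps the original DataFrame untouched, which is convenient when you re-run notebook cells.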

## wrap DataFrame in data-model

To look at the data in Lumen, just wrap it and save it.

```python
# configuration
modelname = 'mpg_data_model'
output_directory = './models_data_exploration_example/'

# make model and store it
model = mbase.make_empirical_model(modelname, output_directory, df=df)

# very brief summary of your model
print(model.__short_str__())

# this is just some arbitrary query to check that the model is ready to be used
model.aggregate(method='maximum')
```

```
mpg_data_model(#transmission,#cylinder,#turbo,#car_size,±year,±mpg_city,±mpg_highway,±displacement)

['manual',
 'few',
 'False',
 'compact cars',
 1995.0520879445182,
 17.466467463479415,
 23.536741921204072,
 118.16603585657066]
```

The moment a new model is saved to the watched directory, the front-end will show a pop-up informing you about the new
model (data-model). Click on it, and you can start exploring it.


### if it doesn't work

You should make sure that:
 * the model is saved in the correct folder (i.e. where Lumen loads its models from)
 * the backend is actually running!
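
Both checks can be scripted. The sketch below uses only the Python standard library. The helper names are hypothetical, and the host and port of the backend are assumptions that you need to replace with the values of your own setup. No particular file extension for models is assumed; the first helper simply lists all files in the folder.

```python
import socket
from pathlib import Path


def list_model_files(models_path):
    """Return the names of all files in the watched folder, sorted."""
    folder = Path(models_path)
    if not folder.is_dir():
        raise FileNotFoundError(f"model folder does not exist: {folder}")
    return sorted(p.name for p in folder.iterdir() if p.is_file())


def backend_is_reachable(host, port, timeout=2.0):
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `list_model_files('./models_data_exploration_example/')` should include the model you just saved, and `backend_is_reachable('localhost', port)` should return `True` for the port your backend listens on. Note that the second check only tells you that some process accepts connections there, not that it is the Lumen backend.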
