# Midas Tutorial
Midas is a Jupyter notebook library/extension that aids data exploration by providing relevant static visualizations. The key idea of Midas is that **the operations you perform in the interactive visualization space are also reflected in code space**. You will see what this means when you run the code cells below!

## Dataframe Operations
A Midas dataframe is a special dataframe whose syntax follows that of the [data science module](http://data8.org/datascience/) from Data 8. The following are common operations that might be useful for querying:

* SELECT: `df.select(['col_name', 'more_col_name'])`. Note that columns are referenced as strings.
* WHERE: `df.where('col_name', predicate)`. The predicates are functions provided in the [`are`](http://data8.org/datascience/predicates.html) library, such as `are.above(8)` (as opposed to the operator overloading seen in pandas, like `df[df['a']>8]`). If you wish to compare two columns, you can use `.where('col1', predicate, 'col2')`, such as `marbles.where("Price", are.above, "Amount")`.
* GROUP BY: `df.group('col_name', agg_fun)`. The default aggregation for a `group` is count, but you can also supply the aggregation by using existing aggregation functions such as Python's built-in `sum`, `min`, `max` (or any of the `numpy` aggregation methods that work on arrays).
* Apply general methods: `df.apply(map_fun, 'col_name')` applies `map_fun` to every value of the named column and, in the data science module, returns the results as an array. For instance, to derive a new column called "incremented" that is an original column plus 1, you can call `df.append_column('incremented', df.apply(lambda x: x + 1, 'col_name'))`.

The following are useful for data modification:
* `append_column(label, values)` appends a new column. Note that `values` must be created via `make_array` (so that it is numpy compliant).
* `append(array_of_new_values)` appends a new row

Note that you can also access the columns as numpy arrays by using `df['col_name']`, which can be handy to use methods like `np.average(df['col_name'])`.

## Initiate Midas
Import the library and create an instance, `m = Midas()`, which we call the Midas runtime variable. You can only have one Midas instance per notebook.
Then you will see a dashboard-like area pop up on the right. It has three areas: one shows the data (yellow pane), listing the dataframes with their accompanying columns, and the others show the charts.

In [3]:
from midas import Midas
m = Midas()

# other utility libraries
import numpy as np
from datascience import Table, make_array
from datascience.predicates import are

## Load data
Midas takes in data through a few APIs, such as `from_df`, used below, which loads from a pandas dataframe.
Note that you can also use `read_table`, shown in the joins section of this tutorial, to load from a CSV file.

In [1]:
import pandas as pd
import sqlite3
path = "/Users/yifanwu/Dev/diel-db-server-examples/server/sample-data/fire.sqlite"
con = sqlite3.connect(path)
fires_pd_df = pd.read_sql_query("SELECT * from small_fires", con)

In [4]:
fires_df = m.from_df(fires_pd_df)
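
If the sample database is not available locally, the same loading pattern can be tried against an in-memory SQLite database. This is a sketch under assumptions: the rows are invented, and only the `STATE` and `FIRE_SIZE` columns used in this tutorial are included. The final `m.from_df` call is left as a comment because it needs a running Midas instance.

```python
import sqlite3
import pandas as pd

# Build a throwaway in-memory database with a few invented rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE small_fires (STATE TEXT, FIRE_SIZE REAL)")
con.executemany(
    "INSERT INTO small_fires VALUES (?, ?)",
    [("CA", 12.0), ("CA", 3500.0), ("OR", 40.5), ("TX", 0.5)],
)
con.commit()

# Same query as in the cell above, now against the in-memory table.
fires_pd_df = pd.read_sql_query("SELECT * from small_fires", con)
con.close()

# In the notebook, hand the pandas dataframe to Midas:
# fires_df = m.from_df(fires_pd_df)
```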

## Getting distribution from clicking on the columns pane
Go ahead and click on the "STATE" column. After you click, two effects take place:
1. A cell is created that contains the dataframe calls that derive the new filtered values, as well as the visualization calls. The generated cells carry color emoji such as 🟠, which are indicators to help you navigate visually.
2. A chart that visualizes the derived data is created in the pane on the right-hand side.

If the chart has the wrong encoding, or if the grouping query is inaccurate, feel free to modify the queries. The results will be reflected in the chart automatically.

In [5]:
STATE_distribution = fires_df.group('STATE')

In [6]:
# 🟠 02:20 PM 🟠
STATE_distribution.show(shape='bar', x='STATE', y='count')

In [None]:
fires_df.select("fire_size", "duration")

## Cleaning Data and Reactive State
Often, the data requires some trimming and modification for analysis to continue. For instance, from the distribution of fires, you notice that only a couple of fire sizes are extreme outliers, and you decide to ignore these points.

However, you might want to keep the previous visualizations and selections. For this, you can use the `update` method to **synchronize state**, so that the charts directly reflect the result of the changes. In cases where the selections are no longer relevant, such as when the relevant column is deleted, the charts will be deleted, but the cells will remain. You can of course create a new dataframe from which to derive charts, in order to preserve the old ones. Note that you cannot update derived dataframes, so in this tutorial only `fires_df` can be updated.

In [None]:
# For instance, if you decide that the extremely large fire sizes are disruptive to your reasoning process,
#   you can just remove them
fires_df.update(fires_df.where("FIRE_SIZE", are.below(1000)))
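
One way to picture this synchronization is a source table that recomputes its derived views whenever it is updated. The sketch below is a toy model in plain Python and not Midas's implementation; `ReactiveSource` and `derive` are hypothetical names, and the rows are invented.

```python
from collections import Counter


class ReactiveSource:
    """Toy stand-in for a base dataframe whose derived views stay in sync."""

    def __init__(self, rows):
        self.rows = list(rows)
        self._derivations = []  # (function, view holder) pairs

    def derive(self, fn):
        """Register a derivation and return a dict holding its latest result."""
        view = {"value": fn(self.rows)}
        self._derivations.append((fn, view))
        return view

    def update(self, new_rows):
        """Replace the data and recompute every registered derivation."""
        self.rows = list(new_rows)
        for fn, view in self._derivations:
            view["value"] = fn(self.rows)


# Invented rows: (state, fire size).
fires = ReactiveSource([("CA", 12.0), ("CA", 3500.0), ("OR", 40.5)])
state_counts = fires.derive(lambda rows: Counter(state for state, _ in rows))

# Dropping the outlier refreshes the derived counts without re-deriving them.
fires.update([row for row in fires.rows if row[1] < 1000])
```

In this toy model only the source object has an `update` method, which mirrors the rule that derived dataframes cannot be updated.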

## Making selections
All the existing visualizations are equipped with the ability to **select**.

* With scatter plots, you can **brush** select on both the x and y axis.
* With bar charts, you can either brush to select the x axis items or click.
* With line charts, you can brush to select a range on the x axis.

When you perform a selection, you will observe two effects:
1. A cell will be generated with the selections you have made.


In [9]:
# 🔵 02:20 PM 🔵
m.make_selections([{"STATE_distribution": {"STATE": ["CA"]}}])

## Navigating selections

You will see that your selections are shown in the selection pane (blue). You can rename the selections, and click on one to make that selection again.

## Accessing selections programmatically

Access selection in **predicate** form from the Midas runtime variable, `m` (you can assign it other names if you wish).
- most recent selection: `m.current_selection`
- all selections made in the past: `m.selection_history`

To access selection results in **data** form, you have the following options:
- access the selected data of a specific chart through `<chart_name>.filtered_value`


In [None]:
m.current_selection

In [None]:
m.selection_history

In [None]:
STATE_distribution.filtered_value
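
To make the distinction between the two forms concrete, the sketch below applies a selection written in the same shape as the generated `m.make_selections` call to a handful of rows. It is plain Python with invented data; `apply_predicate` is a hypothetical helper and is not part of Midas.

```python
# Predicate form: which values were selected, keyed by chart and column.
selection = [{"STATE_distribution": {"STATE": ["CA"]}}]

# Invented rows standing in for the data behind the chart.
rows = [
    {"STATE": "CA", "count": 2},
    {"STATE": "OR", "count": 1},
    {"STATE": "TX", "count": 1},
]


def apply_predicate(rows, selection, chart_name):
    """Return the rows of one chart that satisfy its selection (data form)."""
    result = rows
    for entry in selection:
        for column, allowed in entry.get(chart_name, {}).items():
            result = [row for row in result if row[column] in allowed]
    return result


# Data form: the rows that the predicate picks out.
filtered = apply_predicate(rows, selection, "STATE_distribution")
```

The predicate form is small and easy to store or replay, while the data form is what you would pass on to further analysis.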

## Creating custom interactive visualizations

You can bind functions that are invoked after each selection event; the function can update a chart, modify state, or print. The function is passed to the Midas instance by calling `m.bind_to_selection(cb)`.

In [None]:
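# The following is an example using this feature:
# we want to see, interactively, the average fire size of the rows selected
average_df = None

def cb(selection, step):
    global average_df
    # apply the selection, then average the selected column
    # (assumes apply_selection returns a dataframe with the same columns)
    selected = fires_df.select('FIRE_SIZE').apply_selection(selection)
    temp_avg = np.average(selected['FIRE_SIZE'])
    if average_df is None:
        # first selection: create the table that accumulates the averages
        average_df = Table().with_columns("step", [step], "temp", [temp_avg])
    else:
        # later selections: append a new row in place
        average_df.append([step, temp_avg])
    return average_df

# register the callback with the Midas instance
m.bind_to_selection(cb)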

## Using Joins for Analysis

When performing analysis we often want to connect different sources of information. For instance, for this analysis, we might be interested in finding out whether the number of fires is related to average rainfall or temperatures.

Even with joins, Midas can help you "link" the relevant tables together, provided that you tell it how the two tables can be joined, using the API `a_df.can_join(another_df, 'column_name')`, where the two dataframes share the same column name.

In [None]:
# load data from a csv file
temperature_df = m.read_table("/Users/yifanwu/Dev/midas/notebook/data/state_temp.csv")
# you can perform basic data cleaning 
state_dict = {"Alabama":"AL", "Alaska":"AK", "Arizona":"AZ", "Arkansas":"AR", "California":"CA", "Colorado":"CO", "Connecticut":"CT", "Delaware":"DE", "Florida":"FL", "Georgia":"GA", "Hawaii":"HI", "Idaho":"ID", "Illinois":"IL", "Indiana":"IN", "Iowa":"IA", "Kansas":"KS", "Kentucky":"KY", "Louisiana":"LA", "Maine":"ME", "Maryland":"MD", "Massachusetts":"MA", "Michigan":"MI", "Minnesota":"MN", "Mississippi":"MS", "Missouri":"MO", "Montana":"MT", "Nebraska":"NE", "Nevada":"NV", "New Hampshire":"NH", "New Jersey":"NJ", "New Mexico":"NM", "New York":"NY", "North Carolina":"NC", "North Dakota":"ND", "Ohio":"OH", "Oklahoma":"OK", "Oregon":"OR", "Pennsylvania":"PA", "Rhode Island":"RI", "South Carolina":"SC", "South Dakota":"SD", "Tennessee":"TN", "Texas":"TX", "Utah":"UT", "Vermont":"VT", "Virginia":"VA", "Washington":"WA", "West Virginia":"WV", "Wisconsin":"WI", "Wyoming":"WY"}
temperature_df.append_column('STATE', temperature_df.apply(lambda x: state_dict[x], 'State'))

In [None]:
# providing Midas with join information.
fires_df.can_join(temperature_df, 'STATE')
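
The effect of declaring a join can be pictured as carrying a selection from one table to the other through the shared column. The following plain-Python sketch uses invented rows and a hypothetical `link_selection` helper; it illustrates the idea only and makes no claim about how Midas implements it.

```python
# Invented rows; both tables share the STATE column.
fires = [
    {"STATE": "CA", "FIRE_SIZE": 12.0},
    {"STATE": "OR", "FIRE_SIZE": 40.5},
    {"STATE": "CA", "FIRE_SIZE": 3.2},
]
temperatures = [
    {"STATE": "CA", "avg_temp": 59.4},
    {"STATE": "OR", "avg_temp": 48.4},
    {"STATE": "TX", "avg_temp": 64.8},
]


def link_selection(selected_rows, other_rows, key):
    """Keep the rows of other_rows whose key value occurs in selected_rows."""
    selected_keys = {row[key] for row in selected_rows}
    return [row for row in other_rows if row[key] in selected_keys]


# Select the California fires, then follow the shared column to temperatures.
ca_fires = [row for row in fires if row["STATE"] == "CA"]
linked = link_selection(ca_fires, temperatures, "STATE")
```

This is why the shared column must hold comparable values in both tables, which is the reason for mapping full state names to abbreviations before declaring the join.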

## Notes on Midas Designs

**Cell generation**: newly generated cells keep being appended to the document in an order that respects the dependencies of their original position. Note a few corner cases:
- If you move the cells out of order, the cells will respect only the order in which they were originally created.
- If a newly generated cell is the same as an existing one, the existing cell is replaced.
