# Information Visualization, Altair Transforms 101
Licia He, Eytan Adar,
Sereen Kallerackal, Dallas Card

School of Information, University of Michigan

## Plan
1. Warmup
2. Layering 
3. Transform/Filter/Aggregate 

## Resources 
*  [Transform documentation](https://altair-viz.github.io/user_guide/transform/index.html)
*  [UW course examples](https://github.com/uwdata/visualization-curriculum/blob/master/altair_data_transformation.ipynb)
* [Reshaping documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html)
* We've also created an additional lecture that shows you how to map ideas in Pandas to Altair (and vice versa)

# Altair Warmup
## Altair Week 2 (part 1)

### Import 

In [None]:
# imports we will use

import altair as alt
import pandas as pd
from vega_datasets import data as vega_data

# grab the data and clean it a bit
movies_url = vega_data.movies.url
movies = pd.read_json(movies_url)
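# normalize column names: "IMDB Rating" -> "IMDB_Rating", and drop parentheses
# (regex=False treats '(' and ')' as literal characters on every pandas version)
movies.columns = (
    movies.columns.str.strip()
    .str.replace(' ', '_', regex=False)
    .str.replace('(', '', regex=False)
    .str.replace(')', '', regex=False)
)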

## Warmup

Last week, we covered the basics of the grammar of graphics. To create a chart, we need to specify three things:

1.   Mark: e.g., point, rect, bar
2.   Data: i.e., variables and types
3.   Encoding: e.g., x, y, color 




Warm up exercise: 

 ![1.1](https://raw.githubusercontent.com/eytanadar/si649public/master/lab4/assets/demo/altair-1.1.png)

In [None]:
#1.1 Exercise: basic bar chart for Major_Genre and avg of Production Budget 


We can stitch multiple charts together by making a [compound chart](https://altair-viz.github.io/user_guide/compound_charts.html). Specifically, we discussed **horizontal concatenation** (`hconcat` or `|`) and **vertical concatenation** (`vconcat` or `&`).

Warmup #2 

 ![1.2](https://raw.githubusercontent.com/eytanadar/si649public/master/lab4/assets/demo/altair-1.2.png)

In [None]:
#1.2 Exercise 2: average vs median 
# basic bar chart for Major_Genre and avg of Production Budget 
# basic bar chart for Major_Genre and median of Production Budget 
# put them side by side 



You can also build more sophisticated visualizations by combining charts both horizontally and vertically. For example, this next plot is known as a scatterplot with marginal histograms. In the middle we are looking at the correlation between Rotten Tomatoes and IMDB scores. On the top and right we see the distributions of each variable separately. In this case we see a weak correlation, and we also note that the IMDB ratings have a roughly normal distribution while the Rotten Tomatoes scores seem more uniform. 

 ![1.3](https://raw.githubusercontent.com/eytanadar/si649public/master/lab4/assets/demo/altair-1.3.png)

In [None]:
# 1.3 

# scatter plot for the middle


In [None]:
# the histogram to put on the right side (notice we remove the axes)

# the histogram to put on the top (notice we remove the axes)


In [None]:
# put the scatter and right together side by side and then put those
# under the top histogram


In [None]:
# 1.4 (really basic interactivity)
# lots of movies on top of each other



In [None]:
# 1.5 
# How do we know what the movies are?



In [None]:
# 1.6 
# Extended tooltips



# Altair Layering
## Altair Transforms (part 2)

Licia He and Eytan Adar

School of Information, University of Michigan

## Plan
1. Warmup
2. Layering (you are here)
3. Transform/Filter/Aggregate 

[Layering](https://altair-viz.github.io/user_guide/compound_charts.html#layered-charts) is a very useful compounding method that allows you to overlay two different charts on the same set of axes. You can layer charts using the "+" operator. 

### Our Goal:

![1.4](https://raw.githubusercontent.com/eytanadar/si649public/master/lab4/assets/demo/altair-1.4.png)

In [None]:
#1.4 copy 1.2, change median to a line 



In [None]:
# 1.4 Shortcut


In [None]:
# 1.4 (one last variant)

# the "base chart" (not for rendering)


Layering is also commonly used to add text annotations. 

### Our goal (ugly version):
![1.5](https://raw.githubusercontent.com/eytanadar/si649public/master/lab4/assets/demo/altair-1.5.png)

In [None]:
#1.5 layering text 



In [None]:
## 1.5 Make it look a little better

#adopt settings from the previous chart


# Altair Transforms
## Altair Week 2 (part 3)
Licia He and Eytan Adar

School of Information, University of Michigan

## Plan
1. Warmup
2. Layering 
3. Transform/Filter/Aggregate (you are here) 

### Why Do We Transform?

Often data doesn't "look" the way we need it to for visualization:
* too much data
* not enough data
* data in the wrong shape
* data isn't aggregated
* data is missing
* etc.

Altair provides 13 different types of transformations. These will allow you to perform basic manipulation of the data without having to use some external tool like Pandas. Although Pandas may be more powerful, the advantage of doing transformations directly in Altair/Vega-Lite is you will be able to deploy your code without needing Python (just Vega-Lite).

## Pandas vs Altair

* You can usually achieve the transform both ways (see extra presentation)
* With Pandas, the transform runs in Python, so a deployed chart needs a Python server behind it (you can't ship the Vega-Lite spec alone)
 * "Fix" is to transform in Pandas and save the intermediate result
 * But problematic for dynamic data
* With Altair, it'll be harder to debug


| Transform                | Description                                                                     |
|:-------------------------|:--------------------------------------------------------------------------------|
| Aggregate Transforms     | Create a new data column by aggregating an existing column.                     |
| Bin Transform            | Create a new data column by binning an existing column.                         |
| Calculate Transform      | Create a new data column using an arithmetic calculation on an existing column. |
| Filter Transform         | Select a subset of data based on a condition.                                   |
| Flatten Transform        | Flatten array data into columns.                                                |
| Fold Transform           | Convert wide-form data into long-form data.                                     |
| Impute Transform         | Impute missing data.                                                            |
| Join Aggregate Transform | Aggregate transform joined to original data.                                    |
| Lookup Transform         | One-sided join of two datasets based on a lookup key.                           |
| Sample Transform         | Random sub-sample of the rows in the dataset.                                   |
| Stack Transform          | Compute stacked version of values.                                              |
| TimeUnit Transform       | Discretize/group a date by a time unit (day, month, year, etc.)                 |
| Window Transform         | Compute a windowed aggregation                                                  |

### 2.1 Aggregate and join aggregate  

Last class we covered the [aggregate transformation](https://altair-viz.github.io/user_guide/transform/aggregate.html). For example, we used `mean`, `max`, `min`, and `sum` to compute summary statistics over groups of data.

In [None]:
#2.1.1 copy of 1.1 
# calculate the average production budget per genre



You can also use the longer form to plot the same chart


In [None]:
#2.1.2 copy 2.1.1 and adjust to longer form 
# this is the explicit form of the code above


The same chart could be made with an explicitly computed aggregation. 


In [None]:
#2.1.3 copy 2.1.2 and change to transform_aggregate 


What happens if you create an aggregated variable but don't use it? 

In [None]:
#2.1.4, copy 2.1.3, aggregate without using it 
# original data is impacted, the mean_production_budget is not available for other 
# transformation, such as filter 


#### Aggregate vs. JoinAggregate
Let's take a look at this mini example 


| Title | Major_Genre | Production_Budget |
|-------|-------|-------|
| A     | x     | 1     |
| B     | x     | 2     |
| C     | y     | 10    |

transform_aggregate will **change the original data structure** and group rows together. 


e.g. running the following transformation 
``` 
.transform_aggregate(
    groupby=['Major_Genre'],
    mean_production_budget='mean(Production_Budget)'
) 
```
will create a table like this 

| Major_Genre | mean_production_budget |
|-------|------------------------|
| x     | 1.5                    |
| y     | 10                     |

If we want to **preserve the original data structure**, we will use joinAggregate. 

e.g. running the following transformation 
``` 
.transform_joinaggregate(
    groupby=['Major_Genre'],
    mean_production_budget='mean(Production_Budget)'
) 
```

will generate the following table 

| Title | Major_Genre | Production_Budget | mean_production_budget |
|-------|-------|-------|------------------------|
| A     | x     | 1     | 1.5                    |
| B     | x     | 2     | 1.5                    |
| C     | y     | 10    | 10                     |

See the [join aggregate transform](https://altair-viz.github.io/user_guide/transform/joinaggregate.html) for more information.


In [None]:
#2.1.5 copy 2.1.4 use join aggregation


### 2.2 Bin Transformation 

We discussed the [bin transformation](https://altair-viz.github.io/user_guide/transform/bin.html#user-guide-bin-transform). Now replicate the following visualization: 

![2.2.1](https://raw.githubusercontent.com/eytanadar/si649public/master/lab4/assets/demo/altair-2.2.1.png)

In [None]:
#2.2.1, EXERCISE: bar chart with binned IMDB_Rating and mean Production_Budget


In [None]:
# 2.2.2 the "long form" way


### 2.3 Calculate Transform 
The [calculate transform](https://altair-viz.github.io/user_guide/transform.html#calculate-transform) allows the user to define new fields in the dataset, calculated from other fields using an expression syntax.

For example, we want to have a column called "**Revenue**" that's equal to the difference between Worldwide_Gross and Production_Budget

In [None]:
## 2.3.1 Revenue vs Major_Genre


In [None]:
## 2.3.2 Revenue vs Major_Genre (Alternative way)


You can chain multiple transformations together. Make a bar chart of mean revenue by Major_Genre.

![2.3.2](https://raw.githubusercontent.com/eytanadar/si649public/master/lab4/assets/demo/altair-2.3.2.png)

In [None]:
#2.3.3 Exercise: bar chart for mean revenue and major_genre 


In [None]:
#2.3.4 Color movies by whether they are cheap or expensive 



### 2.4 Filter Transform 
The [filter transform](https://altair-viz.github.io/user_guide/transform/filter.html#user-guide-filter-transform) removes objects from a data stream based on a provided filter expression, selection, or other filter predicate. 

There are multiple ways of specifying filters. The first way is using a Vega expression. 

In [None]:
#2.4.1 alt.datum: & for and, | for or; comparisons: > < == != 

# What's the mean production budget for movies with more than 
# 500 votes and a rating > 5



In [None]:
#2.4.2



You can also make the same chart using **Field Predicates**. Field predicates overlap somewhat in function with expression predicates, but have the advantage that their contents are validated by the schema. Examples are:

In [None]:
#2.4.3 using predicates: equal, lt, gt, lte, gte 


Here are 2 very useful field predicates: 
* **FieldOneOfPredicate** evaluates whether a field is among a list of specified values.
* **FieldRangePredicate** evaluates whether a continuous field is within a range of values.

In [None]:
# 2.4.3 oneOf and Range:  
# include only Drama and Comedy movies that have ratings between 5 and 7. 


Instead of `!`, `&`, and `|`, you can also use [logical operands](https://altair-viz.github.io/user_guide/transform/filter.html#logical-operands).

In [None]:
#2.4.4 copy 2.4.3, use logical operands


### 2.4.5 Sample Transform 
The [sample transform](https://altair-viz.github.io/user_guide/transform/sample.html) lets you specify the number of rows to randomly choose from the dataset. 

In [None]:
#2.4.5 SAMPLE, run the following code  


### 2.4.6 Window Transformation 
The [window transformation](https://altair-viz.github.io/user_guide/transform/window.html) performs calculations over sorted groups of data objects. These calculations include ranking, lead/lag analysis, and aggregates such as cumulative sums and averages. 

#### Produce the following:

![2.4.7](https://raw.githubusercontent.com/eytanadar/si649public/master/lab4/assets/demo/altair-2.4.7.png)

In [None]:
#2.4.6: Ranking through the Window Transform
