# Data visualization using Altair

## Library import

In [0]:
import pandas as pd

In [0]:
%load_ext google.colab.data_table

In [0]:
import altair as alt

## Data import

In [1]:
# List files in the DATA directory of your Google Drive
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive
%cd My\ Drive/Coursera/Understanding\ clinical\ research/DATA
%ls

Mounted at /gdrive
/gdrive
/gdrive/My Drive/Coursera/Understanding clinical research/DATA
 Altair.csv      'Import Google Sheet.ipynb'   SyrianConflict.xlsx
 correct.gsheet   problematic.gsheet
 data.csv         ProjectData.csv


In [0]:
df = pd.read_csv('Altair.csv')

In [6]:
df

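| Unnamed: 0 | SampleID | Type | Grade | MeasureA | MeasureB | MeasureC |
|---|---|---|---|---|---|---|
| 0 | 1 | I | 2 | 25 | 31.3 | 110.189147 |
| 1 | 2 | II | 4 | 22 | 23.8 | 99.619512 |
| 2 | 3 | I | 3 | 22 | 27.6 | 87.551602 |
| 3 | 4 | II | 2 | 28 | 33.2 | 85.567347 |
| 4 | 5 | II | 4 | 30 | 31.8 | 110.270053 |
| ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | I | 3 | 21 | 25.0 | 103.223743 |
| 96 | 97 | I | 2 | 26 | 26.4 | 101.130020 |
| 97 | 98 | II | 3 | 21 | 24.0 | 117.752421 |
| 98 | 99 | I | 4 | 30 | 36.3 | 101.522800 |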


## Basic visualization

In [0]:
chart = alt.Chart(df)

In [8]:
type(chart)

altair.vegalite.v3.api.Chart

A variety of `mark_` methods exist for expressing the data in visual form.  One such method is the `mark_point` method.  Even with the data loaded in the chart object, calling this method on its own produces what looks like a single point.

In [9]:
alt.Chart(df).mark_point()

There is actually one point per row, but with no encoding specified they are all drawn on top of each other.

The points need an encoding that maps data to an axis.  Below, the values in the `MeasureA` column are encoded _to_ the $x$ axis.

In [10]:
chart.mark_point().encode(x='MeasureA')

In [11]:
df.MeasureA.max()  # Maximum value of the variable

30

Note that an $x$ axis is created and the values in the specified column are displayed as points along this axis.  By default the axis starts at $0$ and extends far enough to cover the maximum value of the variable.

The `encode` method maps the values of a variable to a channel.  The $x$ axis is one such channel, specified by `x`.  Others include `y`, `color`, `shape`, and `size`.

Below, the numerical values of the `MeasureA` variable are plotted as points along the $y$ axis, for each of the two values in the sample space of the `Type` variable.

In [12]:
chart.mark_point().encode(x='Type',
                          y='MeasureA')

The data points still overlap.  It might be more informative to express some statistic of the numerical variable, grouped by the categorical variable above.  `Altair` accepts aggregation functions inside the encoding string.  Below is a plot showing the mean of the numerical variable for each value of the categorical variable.

In [14]:
chart.mark_point().encode(x='Type',
                          y='average(MeasureA)')

Another mark might be better suited for this aggregate.

In [15]:
chart.mark_bar().encode(x='Type',
                        y='average(MeasureA)')

The orientation is controlled by which channel receives which variable.  Swapping `x` and `y` produces horizontal bars.

In [16]:
chart.mark_bar().encode(y='Type',
                        x='average(MeasureA)')

Customizing the plot requires additional arguments.  These are passed through the channel classes `alt.X` and `alt.Y`, which are used in place of the plain `x` and `y` keyword strings.

In [20]:
chart.mark_bar(color='orange').encode(alt.X('Type', title='Treatment type'),
                                      alt.Y('average(MeasureA)', title='Mean for measure A'))

Note that the `Grade` variable is ordinal.  This can be specified using the `:O` notation following the name of the variable.

In [35]:
chart.mark_bar(color='orange').encode(alt.X('Grade:O', title='Grade of disease'),
                                      alt.Y('average(MeasureA)', title='Mean for measure A'),
                                      opacity=alt.value(0.75))