<a href="https://colab.research.google.com/github/odu-cs625-datavis/public-fall24-mcw/blob/main/Customizations_Vega_Altair.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Vega-Altair Customizations**

In this notebook, we'll look at some extra customizations for Vega-Altair charts.

First, we'll load the same datasets that we had in our Marks and Channels tutorial.

In [3]:
!pip install altair==5.4.1



In [4]:
import altair as alt
from vega_datasets import data as vega_data
alt.__version__  # if this doesn't say '5.4.1', restart the runtime

'5.4.1'

In [5]:
data = vega_data.gapminder()
data.head()

Unnamed: 0,year,country,cluster,pop,life_expect,fertility
0,1955,Afghanistan,0,8891209,30.332,7.7
1,1960,Afghanistan,0,9829450,31.997,7.7
2,1965,Afghanistan,0,10997885,34.02,7.7
3,1970,Afghanistan,0,12430623,36.088,7.7
4,1975,Afghanistan,0,14132019,38.438,7.7


In [6]:
data_2000 = data[data['year'] == 2000]
data_2000.head()

Unnamed: 0,year,country,cluster,pop,life_expect,fertility
9,2000,Afghanistan,0,23898198,42.129,7.4792
20,2000,Argentina,3,37497728,74.34,2.35
31,2000,Aruba,3,69539,73.451,2.124
42,2000,Australia,4,19164620,80.37,1.756
53,2000,Austria,1,8113413,78.98,1.382


## **Extra Customizations**

For most of these, we'll start with the same base chart.

In [20]:
alt.Chart(data_2000).mark_point(tooltip=True).encode(
    x = 'fertility:Q',
    y = 'life_expect:Q',
    color = 'cluster:N',
    size = 'pop:Q'
)

The first customization is to remove the gridlines from the chart. To do this, set the `grid` property to `False` within the `axis` argument of both the `x` and `y` encodings. The `axis` argument takes an `altair.Axis` object, which controls how the axis is displayed.

Learn more about [`altair.Axis`](https://altair-viz.github.io/user_guide/generated/core/altair.Axis.html#altair.Axis) and [`altair.X`](https://altair-viz.github.io/user_guide/generated/channels/altair.X.html).

In [8]:
alt.Chart(data_2000).mark_point(tooltip=True).encode(
    x = alt.X('fertility:Q', axis = alt.Axis(grid = False)),
    y = alt.Y('life_expect:Q', axis = alt.Axis(grid = False)),
    color = 'cluster:N',
    size = 'pop:Q'
)

There are several ways to change the appearance of your axes by utilizing `altair.Axis` and modifying its many parameters. View the [documentation](https://altair-viz.github.io/user_guide/generated/core/altair.Axis.html#altair.Axis) for more information.

(Note: In the following cells, I'll wrap the chart expression in parentheses `()` so that it can span multiple lines.)

In [9]:
(
  alt.Chart(data_2000).mark_point(tooltip=True).encode(
    x = alt.X('fertility:Q',
              axis = alt.Axis(grid = False, tickSize = 20, tickWidth = 5,
                              labelAngle = 45, labelColor = "blue",
                              labelFontSize = 20)),
    y = alt.Y('life_expect:Q', axis = alt.Axis(grid = False)),
    color = 'cluster:N',
    size = 'pop:Q'
  )
)

Now let's go back to the chart with only the gridlines removed. Next, we'll change the color palette for the `cluster` attribute. The [Scale Ranges](https://vega.github.io/vega-lite/docs/scale.html#range) documentation indicates that in Vega-Lite, the default color palettes are chosen based on the attribute's type:
* "category" for nominal fields.
* "ordinal" for ordinal fields.
* "heatmap" for quantitative and temporal fields with "rect" marks, and "ramp" for other marks.

Color palettes can be set using the `range` or `scheme` property of [`altair.Scale()`](https://altair-viz.github.io/user_guide/generated/core/altair.Scale.html#altair.Scale). To learn more about `altair.Color()`, see its [documentation](https://altair-viz.github.io/user_guide/generated/channels/altair.Color.html).

See the [Color Schemes](https://vega.github.io/vega/docs/schemes/) documentation for the various schemes that are available and the [Scale > Color Schemes](https://vega.github.io/vega-lite/docs/scale.html#scheme) documentation for examples.

For this example, we'll pick the [`set1`](https://vega.github.io/vega/docs/schemes/#set1) scheme.

In [10]:
alt.Chart(data_2000).mark_point(tooltip=True).encode(
    x = alt.X('fertility:Q', axis = alt.Axis(grid = False)),
    y = alt.Y('life_expect:Q', axis = alt.Axis(grid = False)),
    color = alt.Color('cluster:N', scale = alt.Scale(scheme = 'set1' )),
    size = 'pop:Q'
)

If we want filled circles instead of open circles, there are two ways to get them.

The easiest is to use `mark_circle()` instead of `mark_point()`.

In [11]:
alt.Chart(data_2000).mark_circle(tooltip=True).encode(
    x = alt.X('fertility:Q', axis = alt.Axis(grid = False)),
    y = alt.Y('life_expect:Q', axis = alt.Axis(grid = False)),
    color = alt.Color('cluster:N', scale = alt.Scale(scheme = 'set1' )),
    size = 'pop:Q'
)

Or you can set the `filled` property to `True` in `mark_point()`.

In [21]:
alt.Chart(data_2000).mark_point(filled = True, tooltip=True).encode(
    x = alt.X('fertility:Q', axis = alt.Axis(grid = False)),
    y = alt.Y('life_expect:Q', axis = alt.Axis(grid = False)),
    color = alt.Color('cluster:N', scale = alt.Scale(scheme = 'set1' )),
    size = 'pop:Q'
)

In some of the smaller dots, it's difficult to see the color, so one thing we could add is a black border around the dots to see if that will help them stand out. For this, we can specify the `stroke` parameter within `mark_circle()`.

In [13]:
alt.Chart(data_2000).mark_circle(stroke = 'black',tooltip=True).encode(
    x = alt.X('fertility:Q', axis = alt.Axis(grid = False)),
    y = alt.Y('life_expect:Q', axis = alt.Axis(grid = False)),
    color = alt.Color('cluster:N', scale = alt.Scale(scheme = 'set1' )),
    size = 'pop:Q'
)

Other customizations to the stroke and fill are available. See the [Color Properties](https://altair-viz.github.io/user_guide/marks/index.html#color-properties) and [Stroke Style Properties](https://altair-viz.github.io/user_guide/marks/index.html#stroke-style-properties) sections of the [Mark](https://altair-viz.github.io/user_guide/marks/index.html#marks) documentation.

Below, we'll increase the stroke width and reduce the opacity of the fill.

In [14]:
(
  alt.Chart(data_2000).mark_circle(stroke = 'black',
                                   strokeWidth = 3,
                                   fillOpacity = 0.5,tooltip=True).encode(
    x = alt.X('fertility:Q', axis = alt.Axis(grid = False)),
    y = alt.Y('life_expect:Q', axis = alt.Axis(grid = False)),
    color = alt.Color('cluster:N', scale = alt.Scale(scheme = 'set1')),
    size = 'pop:Q'
  )
)

Next, let's look at adjusting the circle sizes. We can customize the span of sizes used for population by setting the `range` parameter within `altair.Scale()` to an array giving the smallest and largest sizes.

For point-like marks, the size channel controls the mark's area rather than its width. Here we update the size encoding to range from an area of 0 square pixels (for zero values) to 1200 square pixels (for the maximum value in the scale domain).

In [15]:
alt.Chart(data_2000).mark_circle(tooltip=True).encode(
    x = alt.X('fertility:Q', axis = alt.Axis(grid = False)),
    y = alt.Y('life_expect:Q', axis = alt.Axis(grid = False)),
    color = alt.Color('cluster:N'),
    size = alt.Size('pop:Q', scale = alt.Scale(range = [0, 1200] ))
)
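
Because the size channel sets an area rather than a length, a circle's on-screen width grows with the square root of its size value. The short calculation below illustrates this for a few sizes within the `[0, 1200]` range. It is plain arithmetic under that assumption and does not call Altair.

```python
import math

max_size = 1200               # upper end of the size range used above
sizes = [0, 300, 600, 1200]   # example size values within the range

# Width relative to the largest circle: the square root of the area ratio.
relative_widths = [math.sqrt(s / max_size) for s in sizes]

for s, w in zip(sizes, relative_widths):
    print(f"size {s:>4}: {w:.2f} of the largest circle's width")
```

A size of 300 is a quarter of the maximum area but yields a circle half as wide as the largest one, which is why small countries remain visible even with a large upper bound.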

Now let's look at customizing the labels and titles on the chart. The overall chart title is set with the `title` parameter of `altair.Chart()`, and the label for each channel is set with the `title` parameter of that channel (for example, `altair.X()` or `altair.Y()`).

In [16]:
(
  alt.Chart(data_2000,
            title = 'Countries with higher fertility tend to have lower life expectancy')
  .mark_circle(tooltip=True).encode(
    x = alt.X('fertility:Q', axis = alt.Axis(grid = False),
              title = 'Fertility (children per woman)'),
    y = alt.Y('life_expect:Q', axis = alt.Axis(grid = False),
              title = 'Life Expectancy (years)'),
    color = alt.Color('cluster:N'),
    size = alt.Size('pop:Q', scale = alt.Scale(range = [0, 1200] ))
  )
)

To customize the main title further, use `altair.TitleParams`. To add a subtitle, pass the main title as `text` and the subtitle as `subtitle` within `altair.TitleParams`.

In [17]:
(
  alt.Chart(data_2000,
            title = alt.TitleParams(text = 'Countries with higher fertility tend to have lower life expectancy',
                                    subtitle = 'source: gapminder.json'))
  .mark_circle(tooltip=True).encode(
    x = alt.X('fertility:Q', axis = alt.Axis(grid = False),
              title = 'Fertility (children per woman)'),
    y = alt.Y('life_expect:Q', axis = alt.Axis(grid = False),
              title = 'Life Expectancy (years)'),
    color = alt.Color('cluster:N'),
    size = alt.Size('pop:Q', scale = alt.Scale(range = [0, 1200]))
  )
)

Our final customization is to change the chart size. Let's make it wider so that it fills more of the screen. This can be done by setting the `width` and `height` parameters (in pixels) within the `properties()` method.

In [22]:
(
  alt.Chart(data_2000,
            title = alt.TitleParams(text = 'Countries with higher fertility tend to have lower life expectancy',
                                    subtitle = 'source: gapminder.json'))
  .mark_circle(tooltip=True).encode(
    x = alt.X('fertility:Q', axis = alt.Axis(grid = False),
              title = 'Fertility (children per woman)'),
    y = alt.Y('life_expect:Q', axis = alt.Axis(grid = False),
              title = 'Life Expectancy (years)'),
    color = alt.Color('cluster:N'),
    size = alt.Size('pop:Q', scale = alt.Scale(range = [0, 1200]))
  )
  .properties(width = 600, height = 300)
)