# Choosing colors and shapes in ggplot2

Simone Santoni  
2024-10-31

This notebook illustrates how to change the colors and shapes of visual forms in `ggplot2`.

# Notebook setup

## Load libraries

```r
library(ggplot2)  # grammar-of-graphics plotting
library(dplyr)    # data manipulation: filter(), group_by(), summarize()
library(readr)    # read_csv(), used below to load the data
```

## Load data

The toy dataset we’ll use in this notebook is `laptop_price.csv`. It
contains information on the price of laptops, as well as the laptops’
core features. The dataset comes from Kaggle:
https://www.kaggle.com/datasets/muhammetvarl/laptop-price.

```r
# Load the laptop data; adjust the path to where the CSV lives on your machine
df <- read_csv("~/githubRepos/data-viz-smm635/data/laptops/laptop_price.csv")
df  # print the first rows
```

```
Rows: 1303 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): Company, Product, TypeName, ScreenResolution, Cpu, Ram, Memory, Gp...
dbl  (3): laptop_ID, Inches, Price_euros

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# A tibble: 1,303 × 13
   laptop_ID Company Product TypeName Inches ScreenResolution Cpu   Ram   Memory
       <dbl> <chr>   <chr>   <chr>     <dbl> <chr>            <chr> <chr> <chr> 
 1         1 Apple   MacBoo… Ultrabo…   13.3 IPS Panel Retin… Inte… 8GB   128GB…
 2         2 Apple   Macboo… Ultrabo…   13.3 1440x900         Inte… 8GB   128GB…
 3         3 HP      250 G6  Notebook   15.6 Full HD 1920x10… Inte… 8GB   256GB…
 4         4 Apple   MacBoo… Ultrabo…   15.4 IPS Panel Retin… Inte… 16GB  512GB…
 5         5 Apple   MacBoo… Ultrabo…   13.3 IPS Panel Retin… Inte… 8GB   256GB…
 6         6 Acer    Aspire… Notebook   15.6 1366x768         AMD … 4GB   500GB…
 7         7 Apple   MacBoo… Ultrabo…   15.4 IPS Panel Retin… Inte… 16GB  256GB…
 8         8 Apple   Macboo… Ultrabo…   13.3 1440x900         Inte… 8GB   256GB…
 9         9 Asus    ZenBoo… Ultrabo…   14   Full HD 1920x10… Inte… 16GB  512GB…
10        10 Acer    Swift 3 Ultrabo…   14   IPS Panel Full … Inte… 8GB   256GB…
# ℹ 1
```

# Colors

## Visual forms’ inner color, border color, and transparency

In `ggplot2`, you can change a visual form’s default color by passing an
optional parameter to the geometric object (geom) at hand. Let’s
consider a bar chart showing the distribution of laptops across
different screen sizes.
<a href="#fig-base" class="quarto-xref">Figure 1</a> illustrates a chart
whose bars use `ggplot2`’s default color. Setting the optional
parameter `fill` changes the visual form’s inner color (see
<a href="#fig-fill" class="quarto-xref">Figure 2</a>); the optional
parameter `colour` changes the visual form’s border color (see
<a href="#fig-fillandboard" class="quarto-xref">Figure 3</a>). You can
also regulate the transparency of the chosen color through the
optional `alpha` parameter, which ranges from 0 (fully transparent) to
1 (fully opaque); see
<a href="#fig-alpha" class="quarto-xref">Figure 4</a>. The smaller the
value you pass to `alpha`, the more transparent the visual form (see
<a href="#fig-alphaagg" class="quarto-xref">Figure 5</a>).

```r
# Number of laptops per screen size, drawn with the default bar color
p <- ggplot(data = df, mapping = aes(x = factor(Inches)))
p + geom_bar()
```

```r
# Change the bars' inner color
p <- ggplot(data = df, mapping = aes(x = factor(Inches)))
p + geom_bar(fill = "magenta")
```

```r
# Change both the inner color and the border color
p <- ggplot(data = df, mapping = aes(x = factor(Inches)))
p + geom_bar(fill = "magenta", colour = "blue")
```

```r
# Semi-transparent fill
p <- ggplot(data = df, mapping = aes(x = factor(Inches)))
p + geom_bar(fill = "green", alpha = 0.5)
```

```r
# Nearly transparent fill: a smaller alpha means more transparency
p <- ggplot(data = df, mapping = aes(x = factor(Inches)))
p + geom_bar(fill = "green", alpha = 0.1)
```
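A common pitfall is where the constant color goes. It only sets the color when it is passed to the geom *outside* `aes()`. Placed inside `aes()`, `"magenta"` is treated as a one-level categorical variable: it is mapped through the default discrete color scale, drawn in a default hue, and given a legend.

The sketch below shows the difference. It uses `ggplot2`'s built-in `mpg` dataset instead of the laptop data so that it runs on its own. `layer_data()` returns the aesthetics `ggplot2` computed for a layer.

```r
library(ggplot2)

# Setting: constants passed to the geom, outside aes()
p_set <- ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "magenta", colour = "blue", alpha = 0.5)

# Mapping: the same constant inside aes() becomes a one-level variable
p_map <- ggplot(mpg, aes(x = class)) +
  geom_bar(aes(fill = "magenta"))

set_data <- layer_data(p_set)
map_data <- layer_data(p_map)

unique(set_data$fill)   # "magenta": the literal color is used
unique(set_data$alpha)  # 0.5
unique(map_data$fill)   # a default hue color, not "magenta"
```

As a rule of thumb, put a parameter inside `aes()` only when its value should come from a column of the data.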

## Scales

`ggplot2` comes with plenty of [color scales and
palettes](https://ggplot2-book.org/scales-colour#brewer-scales) that can
help distinguish data groups visually. Suppose we want to expand
on the visualization reported in
<a href="#fig-boxplot" class="quarto-xref">Figure 6</a>, which shows
the distribution of laptop prices across screen size groups.
Specifically, we want to add another dimension to
<a href="#fig-boxplot" class="quarto-xref">Figure 6</a> to show how
laptop prices change across screen size and RAM size groups. By default,
`ggplot2` uses the `hue` color scale for a discrete variable such as
`Ram` (see
<a href="#fig-boxplotdefault" class="quarto-xref">Figure 7</a>). To adopt
a non-default color scale, add a `scale_colour_*()` function (equivalently
spelled `scale_color_*()`) to the plot. In
<a href="#fig-boxplotbrewer" class="quarto-xref">Figure 8</a>, I adopt a
color scale for discrete data, namely
[`brewer`](https://ggplot2-book.org/scales-colour#brewer-scales).
*Warning*: always pair discrete (continuous) color scales with
discrete (continuous) variables. Otherwise, `ggplot2` returns an
error, e.g., `Discrete values supplied to continuous scale`.

```r
# Distribution of laptop prices by screen size
p <- ggplot(data = df, mapping = aes(x = factor(Inches), y = Price_euros))
p + geom_boxplot()
```

```r
# Add RAM size as a color dimension (default hue scale)
p <- ggplot(data = df, mapping = aes(x = factor(Inches), y = Price_euros))
p + geom_boxplot(aes(colour = Ram))
```

```r
# Same chart with a ColorBrewer palette for discrete data
p <- ggplot(data = df, mapping = aes(x = factor(Inches), y = Price_euros))
p + geom_boxplot(aes(colour = Ram)) + scale_color_brewer(palette = "Paired")
```
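The pairing rule in the warning above can be checked directly. The sketch below uses the built-in `mpg` dataset so that it runs without the laptop CSV. In `mpg`, `drv` (drive train) is stored as character, so it is discrete; `cty` (city mileage) is numeric, so it is continuous.

Two of the plots pair the scale and the variable correctly:

- `scale_colour_brewer()` for the discrete `drv`.
- `scale_colour_distiller()`, the continuous counterpart built on the same ColorBrewer palettes, for `cty`.

The third plot deliberately mismatches them. The failure only surfaces when the plot is built, so it is caught with `tryCatch()`.

```r
library(ggplot2)

# Discrete variable + discrete scale: works
p_discrete <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  scale_colour_brewer(palette = "Paired")

# Continuous variable + continuous (distiller) scale: works
p_continuous <- ggplot(mpg, aes(x = displ, y = hwy, colour = cty)) +
  geom_point() +
  scale_colour_distiller(palette = "Blues")

# Continuous variable + discrete scale: fails when the plot is built
p_mismatch <- ggplot(mpg, aes(x = displ, y = hwy, colour = cty)) +
  geom_point() +
  scale_colour_brewer(palette = "Paired")

result <- tryCatch(ggplot_build(p_mismatch), error = function(e) e)
inherits(result, "error")  # TRUE
```

This mismatch is the mirror image of the error quoted in the text: a continuous variable meets a discrete scale.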

# Shapes

In data visualization, shapes can play a role similar to colors: they
can represent additional data dimensions. For example,
<a href="#fig-colors" class="quarto-xref">Figure 9</a> and
<a href="#fig-shapes" class="quarto-xref">Figure 10</a> use color and
shape, respectively, to distinguish two data series: Apple laptops and
Lenovo laptops.

```r
# Average price by company and screen size, for Apple and Lenovo only
apple_lenovo <- df |> filter(Company %in% c("Apple", "Lenovo"))
ave <- apple_lenovo |>
  group_by(Company, Inches) |>
  summarize(ave_price = mean(Price_euros), .groups = "drop")  # drop grouping, silence message
```

```r
# Map company to point shape
p <- ggplot(data = ave, mapping = aes(x = factor(Inches), y = ave_price, shape = factor(Company)))
p + geom_point()
```
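The notebook shows code only for the shape-based chart. The block below is a self-contained sketch of both encodings. It uses the built-in `mpg` dataset instead of the laptop data, so that it runs on its own, and two manufacturers stand in for Apple and Lenovo.

It also uses `scale_shape_manual()`, which lets you pick the shape code for each series instead of relying on the default shape scale. The codes 16 and 17 are arbitrary choices for illustration.

```r
library(ggplot2)
library(dplyr)

# Two series, mirroring the Apple/Lenovo comparison
ave_mpg <- mpg |>
  filter(manufacturer %in% c("audi", "toyota")) |>
  group_by(manufacturer, cyl) |>
  summarize(ave_hwy = mean(hwy), .groups = "drop")

# Series distinguished by color (default hue scale)
p_colour <- ggplot(ave_mpg, aes(x = factor(cyl), y = ave_hwy, colour = manufacturer)) +
  geom_point(size = 3)

# Series distinguished by shape, with the shape of each series chosen explicitly
p_shape <- ggplot(ave_mpg, aes(x = factor(cyl), y = ave_hwy, shape = manufacturer)) +
  geom_point(size = 3) +
  scale_shape_manual(values = c(audi = 16, toyota = 17))

p_colour
p_shape
```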

At the same time, one may want to adopt a non-default shape across all
data series. That is the case in
<a href="#fig-changingshapes" class="quarto-xref">Figure 11</a>, in
which shape 5 (an empty diamond) replaces `ggplot2`’s default shape,
while color still distinguishes the two companies. A summary of the
shapes available in `ggplot2` and their underlying numeric codes is
given in the figure labelled `fig-ggplot2shapes`.

```r
# Color distinguishes the companies; every point uses shape 5 (empty diamond)
p <- ggplot(data = ave, mapping = aes(x = factor(Inches), y = ave_price, color = factor(Company)))
p + geom_point(shape = 5)
```
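You can also preview the numeric shape codes by drawing them. The sketch below is not the shapes figure mentioned above; it is a minimal chart, assuming only `ggplot2`, that places codes 0 to 25 on a grid. It uses `scale_shape_identity()`, which treats the stored numbers directly as shapes.

The codes fall into three groups:

- 0 to 14 are outline-only.
- 15 to 20 are solid.
- 21 to 25 are the only shapes whose interior takes `fill` while their border takes `colour`.

```r
library(ggplot2)

# One row per shape code, laid out on a grid with 6 columns
shapes <- data.frame(shape = 0:25)
shapes$x <- shapes$shape %% 6
shapes$y <- -(shapes$shape %/% 6)

p_shapes <- ggplot(shapes, aes(x = x, y = y)) +
  # fill only affects codes 21-25; colour affects every code
  geom_point(aes(shape = shape), size = 5, colour = "black", fill = "skyblue") +
  geom_text(aes(label = shape), nudge_y = -0.35, size = 3) +
  scale_shape_identity() +  # use the numbers in `shape` as-is
  theme_void()

p_shapes
```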