In [None]:
library('tidyverse')

A ggplot2 call always has this form:
```
ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
```

In [None]:
ggplot(data = mpg)

In [None]:
str(mpg)

In [None]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy))

In [None]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = drv, y = class))

## 3.3 Aesthetic mapping

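* An aesthetic is a visual property of the objects in a plot: size, shape, color, alpha (transparency), or x/y location.
* To map an aesthetic to a variable, associate the name of the aesthetic with the name of the variable inside `aes()`.
* ggplot2 automatically assigns a unique level of the aesthetic to each unique value of the variable (this is called scaling).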
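
A minimal sketch of the difference between mapping an aesthetic and setting it by hand. The object names `p_mapped` and `p_set` are illustrative only. `layer_data()` returns the data a layer actually draws, so it can be used to check how many colours end up on each plot.

```r
library(ggplot2)

# Mapped: color goes inside aes(), so ggplot2 picks one colour per class
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Set: color goes outside aes(), so every point gets the same colour
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Number of distinct colours actually drawn by each plot
n_mapped <- length(unique(layer_data(p_mapped)$colour))
n_set <- length(unique(layer_data(p_set)$colour))
```

`n_mapped` should match the number of distinct values in `mpg$class`, while `n_set` should be 1.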

In [None]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue", shape = 1)

### 3.4.1 exercises

In [None]:
str(mpg)

In [None]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = displ))

In [None]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class, alpha = class), size = 2)

In [None]:
# 4:
# For shapes that have a border (like 21), you can colour the inside and
# outside separately. Use the stroke aesthetic to modify the width of the
# border
ggplot(mtcars, aes(wt, mpg)) +
  geom_point(shape = 21, colour = "black", fill = "white", size = 5, stroke = 5)

In [None]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = displ < 5))

## 3.5 Facets

In [None]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)

In [None]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)

In [None]:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)

### 3.5.1

In [None]:
# 1.
str(mpg)
# ggplot(data = mpg) +
#  geom_point(mapping = aes(x = displ, y = hwy)) +
#  facet_grid(. ~ displ)

In [None]:
# 3.
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)

In [None]:
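# 5. facet_wrap() is for one variable (1-d), so nrow/ncol control how the panels
#    wrap. facet_grid() is for two variables (2-d), so the variables set its rows and columns.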
# 6. We don't have much space in a row.

## 3.6 Geometric objects

* A geom is the geometrical object that a plot uses to represent data.
* `geom_smooth()` accepts the `linetype` aesthetic.
* A plot can have multiple geoms. To avoid repeating the mapping in each one, move it to `ggplot()`, where it applies to every layer.
    * Each geom can then add or override mappings locally.

In [None]:
ggplot(data = mpg) + 
  geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv, color = drv)) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = drv, color = drv))

In [None]:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()

In [None]:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)

### 3.6.1 Exercises

In [None]:
# 6.1
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()

In [None]:
ggplot() + 
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))

In [None]:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
    geom_point(mapping = aes(color = drv)) +
    geom_smooth(se = FALSE)

In [None]:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
    geom_point(mapping = aes(color = drv)) +
    geom_smooth(mapping = aes(linetype = drv), se = FALSE)

In [None]:
# 6.6
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(color = "white", size = 4) +
  geom_point(mapping = aes(color = drv))

#      geom_point(color = "white", size = 2) +

## 3.7 Statistical transformations

* A bar plot uses count on the y-axis, but the dataset doesn't contain a count variable. `geom_bar()` computes it with a statistical transformation (stat).
  Other geoms also transform the data:
    * bar charts, histograms, frequency polygons: bin your data and then plot bin counts
    * smoothers: fit a model to the data and plot its predictions
    * boxplots: compute a robust summary of the distribution
* You can generally use geoms and stats interchangeably. For example, you can recreate a bar chart using `stat_count()` instead of `geom_bar()`. This works because every geom has a default stat, and every stat has a default geom.
* There are 3 cases where we might need to use a stat explicitly instead of relying on the default:
    * override the default stat
    * override the default mapping from transformed variables to aesthetics
    * want to draw greater attention to the statistical transformation in your code, e.g. if you use 
* To see a complete list of stats, try the ggplot2 cheatsheet
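
A small sketch of the geom/stat interchangeability described above. It builds the same bar chart with `geom_bar()` and with `stat_count()`, then uses `layer_data()` to pull out the counts each layer computed. The names `p_geom`, `p_stat`, and `manual_counts` are illustrative, not part of ggplot2.

```r
library(ggplot2)

# Same chart two ways: geom_bar() uses stat_count() by default,
# and stat_count() uses the bar geom by default
p_geom <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
p_stat <- ggplot(data = diamonds) +
  stat_count(mapping = aes(x = cut))

# Counts computed by each layer's stat
geom_counts <- layer_data(p_geom)$count
stat_counts <- layer_data(p_stat)$count

# The same counts computed by hand from the raw data
manual_counts <- as.vector(table(diamonds$cut))
```

Both layers should report the same counts as the manual tabulation, which is the "new value" that the stat adds to the data before drawing.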

In [None]:
summary(diamonds)

In [None]:
ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, fill = cut))

In [None]:
demo <- tribble(
  ~cut,         ~freq,
  "Fair",       1610,
  "Good",       4906,
  "Very Good",  12082,
  "Premium",    13791,
  "Ideal",      21551
)
summary(demo)

In [None]:
ggplot(data = demo) +
    geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")

In [None]:
ggplot(data = diamonds) +
    # after_stat(prop) replaces the deprecated ..prop.. notation (ggplot2 >= 3.3.0)
    geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

### 3.7.1 Exercises

In [None]:
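# 1. The default geom associated with stat_summary() is geom_pointrange().
# fun/fun.min/fun.max replace the deprecated fun.y/fun.ymin/fun.ymax (ggplot2 >= 3.3.0)
ggplot(data = diamonds) +
    geom_pointrange(mapping = aes(x = cut, y = depth), stat = "summary",
                    fun.min = min, fun.max = max, fun = median)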

In [None]:
# 2. geom_col() is equivalent to geom_bar(stat = "identity"): bar heights come straight from the data
ggplot(data = demo) +
    geom_col(mapping = aes(x = cut, y = freq))

In [None]:
# 3. See: http://ggplot2.tidyverse.org/reference/index.html#section-layer-geoms

In [None]:
# 4. stat_smooth() computes the variables y, ymin, ymax, and se.
#    Its behaviour is controlled by parameters such as method, formula, and span.
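
One way to check the computed variables is to build a smoothed plot and inspect the layer's data. This is only a sketch: the choice of `method = "loess"`, `formula = y ~ x`, and `span = 0.75` is an assumption made for illustration, and `p_smooth` is a made-up name.

```r
library(ggplot2)

# A loess smoother with its method, formula and span spelled out
p_smooth <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.75)

# layer_data() returns the data frame produced by the stat
smooth_data <- layer_data(p_smooth)

# Which of the expected computed variables are present?
computed <- intersect(c("y", "ymin", "ymax", "se"), names(smooth_data))
```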

In [None]:
# 5. The problem with the following code is that it treats each combination
# of cut and color as one group, so the proportion is always 1 (100%)
# But our intention is to treat all data as one group
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = color, y = after_stat(prop)))
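
The grouping problem can be checked numerically. This sketch assumes ggplot2 >= 3.3.0 for `after_stat()`, and the names `p_grouped` and `p_single` are illustrative. It compares the proportions computed with the default grouping against those computed with `group = 1`.

```r
library(ggplot2)

# Default grouping: every cut/color combination is its own group
p_grouped <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = color, y = after_stat(prop)))

# group = 1: all rows belong to a single group
p_single <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

# Proportions computed by the stat in each case
prop_grouped <- layer_data(p_grouped)$prop
prop_single <- layer_data(p_single)$prop
```

With the default grouping, every entry of `prop_grouped` should be 1. With `group = 1`, the entries of `prop_single` should sum to 1 across the cuts.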