<!DOCTYPE html>
<html>
<head>
<title>Advanced Data Visualization in R</title>
<meta charset="utf-8">
<meta name="author" content="Sara E. Moore" />
<meta name="date" content="2017-03-16" />
<link href="libs/remark-css-0.0.1/example.css" rel="stylesheet" />
<script src="libs/htmlwidgets-0.8/htmlwidgets.js"></script>
<link href="libs/plotlyjs-1.16.3/plotly-htmlwidgets.css" rel="stylesheet" />
<script src="libs/plotlyjs-1.16.3/plotly-latest.min.js"></script>
<script src="libs/plotly-binding-4.5.6/plotly.js"></script>
<script src="libs/jquery-1.12.4/jquery.min.js"></script>
<script src="libs/datatables-binding-0.2/datatables.js"></script>
<link href="libs/dt-core-1.10.12/css/jquery.dataTables.min.css" rel="stylesheet" />
<link href="libs/dt-core-1.10.12/css/jquery.dataTables.extra.css" rel="stylesheet" />
<script src="libs/dt-core-1.10.12/js/jquery.dataTables.min.js"></script>
<link href="libs/leaflet-0.7.7/leaflet.css" rel="stylesheet" />
<script src="libs/leaflet-0.7.7/leaflet.js"></script>
<link href="libs/leafletfix-1.0.0/leafletfix.css" rel="stylesheet" />
<link href="libs/leaflet-label-0.2.2/leaflet.label.css" rel="stylesheet" />
<script src="libs/leaflet-label-0.2.2/leaflet.label.js"></script>
<script src="libs/Proj4Leaflet-0.7.2/proj4-compressed.js"></script>
<script src="libs/Proj4Leaflet-0.7.2/proj4leaflet.js"></script>
<script src="libs/leaflet-binding-1.1.0/leaflet.js"></script>
<script src="libs/leaflet-providers-1.0.27/leaflet-providers.js"></script>
<script src="libs/leaflet-providers-plugin-1.1.0/leaflet-providers-plugin.js"></script>
<link href="libs/bokehjs-0.12.2/bokeh.min.css" rel="stylesheet" />
<link href="libs/bokehjs-0.12.2/loader.css" rel="stylesheet" />
<script src="libs/bokehjs-0.12.2/bokeh.min.js"></script>
<script src="libs/rbokeh-binding-0.5.0/rbokeh.js"></script>
<link rel="stylesheet" href="custom.css" type="text/css" />
</head>
<body>
<textarea id="source">
class: center, middle, inverse, title-slide
# Advanced Data Visualization in R
## Epi Doctoral seminar
### Sara E. Moore
### 16 March 2017
---
# How to access these slides
## (and associated code)
### Via git:
```bash
git clone https://github.com/saraemoore/Rdataviz2017.git
```
### Download directly:
https://github.com/saraemoore/Rdataviz2017/archive/master.zip
### View online:
http://saraemoore.github.io/Rdataviz2017
???
* I won't be showing all the code on the slides today, but it's all available in the R Markdown document used to create these slides in this repository on github, or in the corresponding R script that's also in the github repository.
---
# Prerequisites
You have:
* a working knowledge of R,
* some familiarity with the usage of `ggplot2` (such as what was presented during the [2016 UC Berkeley SCF/D-Lab R Bootcamp](https://github.com/berkeley-scf/r-bootcamp-2016)),
* an interest in creating data visualizations in R, both **static** (mostly using *ggplot2*) and **interactive** (using a variety of packages).
---
class: inverse, middle, center
# `ggplot2` and the Grammar of Graphics
---
# Why `ggplot`?
* It's pretty.
* Its commands are intuitive and "human-readable."
* Nearly any graphic can be created, so you can use it for everything and maintain a consistent style.
* It has (sort of) built-in support for maps.
???
* From what I understand, a lot of you are probably already using ggplot or are interested in learning to use it. But, I'm still going to tell you why it's a good thing to use.
* Of course, it's pretty. Graphics made with ggplot are eye-catching. This is actually pretty important -- it goes a long way when you want anyone to look at your graphs.
* The code you use to call ggplot and create a graphic is fairly intuitive. This is because of the "grammar of graphics" that it adheres to, and we'll get back to that in a bit. This is what the "gg" in "ggplot" stands for -- "grammar of graphics" -- so the philosophy is pretty central to the package.
* If you learn how to use it well, you can make almost any visualization in it, and your reports, presentations, papers, and so on will look more cohesive because you stuck to a particular style throughout.
* Finally, with a little help from other R packages, ggplot is able to interface with geographical maps, if that's important to you.
---
# Why not `ggplot`?
* It's slow.
* It won't do some things.
* There's a steep learning curve.
* ~~`lattice` is better at trellis graphs?~~ [Faceting](https://learnr.wordpress.com/2009/08/26/ggplot2-version-of-figures-in-lattice-multivariate-data-visualization-with-r-final-part/) is just as powerful.
???
* So, what are the arguments against ggplot?
* It's slower than other R graphics systems. This is a fair point. However, it's probably not something you'll notice under everyday use.
* It also won't draw some graphics that you might want it to.
* One example that's often brought up is 3D surface plots. But, do you really want to make a static 3D perspective plot? There are other, arguably better, ways to represent three-dimensional data in 2D, like contour plots and heatmaps. This is kind of the story with most things that ggplot supposedly "can't" do -- it's a principled decision by a designer to limit the use of his product -- artistic license.
* For example, having two y-axes, each using a different scale for a different variable, was almost impossible to do in ggplot until recently -- which may have been for the best because having two y-axes can be misleading.
* Moving x-axis labels to the top, rather than the bottom, is another thing that was difficult until recently.
* ggplot also refuses to use more than six shapes -- citing difficulty in determining which is which -- unless you manually override this by specifying your own shapes.
* ggplot can be difficult to break into. If you force yourself to use it, it will become natural fairly quickly, though. You can also start with `qplot()` aka "quick plot," but I won't be going over that here.
* Some people argue that lattice is better at trellis graphs, which were made popular by Bill Cleveland's 1993 book "Visualizing Data." However, I disagree. I've included a link here to a compilation of a series of blog posts from 2009 in which nearly every graphic in the entire Lattice book by Deepayan Sarkar was recreated in ggplot. You can tweak these examples to make them look even more like lattice output, if you want, but the point is that faceting works just as well.
---
# Tidy data <sup>1</sup>
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.
.pull-left[
![](assets/img/tidydata_tab1and2.png)
]
.pull-right[
![](assets/img/tidydata_tab3.png)
]
.footnote[1: <a name=cite-Wickham_2014></a>[Wickham (2014)](http://www.jstatsoft.org/v59/i10)]
???
* One stumbling block when getting started with ggplot is that your data needs to be in a certain format before you can use it in ggplot effectively. One name for this format is "tidy data." It can also be called long or tall, as opposed to wide, data, but tidy data is a particular type of long data. The general idea is that there should be one row per observation, whatever you're calling a single observation for your purposes, and one column per variable. This means that you're typically going to want to collapse indicators into factors, for example, or you may need to think carefully about what your observational unit is.
* Here, the first two tables display data that is not tidy. The third table displays the same data, but made tidy.
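* A minimal sketch of tidying wide data, assuming the `tidyr` package is installed. The `wide` data frame and its values are made up for illustration and are not the tables shown on the slide:

```r
library(tidyr)

# Hypothetical "wide" data: one row per country, one rate column per year
wide <- data.frame(country = c("A", "B"),
                   rate_2012 = c(10.1, 35.2),
                   rate_2013 = c(9.8, 34.7))

# Gather the year columns so that each row is one country-year observation
tidy <- gather(wide, key = "year", value = "rate", rate_2012, rate_2013)

# Year is a variable, so store it as a value rather than in a column name
tidy$year <- as.integer(sub("rate_", "", tidy$year))
tidy
```

* After this step there is one row per observation and one column per variable (`country`, `year`, `rate`), which is the shape `ggplot` expects.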
---
# The (Layered) Grammar of Graphics <sup>1</sup>
* Move away from using "names" and "chart typologies."
* Instead, use "statements" constructed via grammar.
* Why?
+ An infinite number of unique graphics can be created.
+ The implementation is **DRY** ("don't repeat yourself") not **WET** ("write everything twice" or "we enjoy typing").
> "Good grammar is just the first step in creating a good sentence." &nbsp;<sup>2</sup>
.footnote[1: <a name=cite-Wilkinson_2005></a><a name=cite-Wickham_2009></a><a name=cite-Wickham_2010></a>[Wilkinson (2005)](http://dx.doi.org/10.1007/0-387-28695-0); [Wickham (2009)](http://dx.doi.org/10.1007/978-0-387-98141-3); [Wickham (2010)](http://dx.doi.org/10.1198/jcgs.2009.07098)<br />2: [Wickham (2010)](http://dx.doi.org/10.1198/jcgs.2009.07098)]
???
* As I mentioned previously, one of the best qualities of ggplot is that the series of commands used to generate a graphic is human-readable. This is because ggplot adheres to the so-called "grammar of graphics," first laid out by Leland Wilkinson in 1999. The general idea is, instead of using a "name," "chart typology," or drawing from what Hadley Wickham calls a "big collection of special cases," think more abstractly and use a "statement" to describe a graphic -- and we need a grammar to construct statements. This infinitely expands the number of unique graphics that can be created and it adheres to the **DRY** ("don't repeat yourself") programming principle (as opposed to WET, "write everything twice" or "we enjoy typing").
* Hadley Wickham's quote from his 2010 paper on his take on the grammar of graphics is just pointing out that this is not a recipe for a perfect graphic -- you can still make some pretty poor visualizations with ggplot -- but it is the first step.
---
# Components of the Grammar
Specify a statistical graphic using components of statements:
+ **Data** (`data`),
+ **Stat**istical transformations (`stat_*`: identity, count, mean, etc.),
+ **Geom**etric elements/objects (`geom_*`: points, lines, etc.),
+ **Aes**thetic mappings (`aes` and `aes_*`: color, shape, size, transparency, etc.),
+ **Coord**inate systems (`coord_*`: cartesian, polar, map, etc.),
+ **Guide**s/Legends (`guide_*`, `guides`),
+ **Scale**s/Axes and transformations thereof (`scale_*`),
+ **Facet**ing/conditioning/latticing/trellising (`facet_*`),
+ Tweaking graphical positioning and visual elements (`position_*`, `annotation_*`, `theme`, `element_*`, etc.), and
+ Layering.
???
* The general concepts or classes, which are kind of a mash-up of Wilkinson's and Wickham's philosophies, are listed here. The realizations of these concepts in `ggplot` are in parentheses so we can connect the ideas to the R commands. Also, layering is not a statement itself, but is implied by the order of the other components.
---
# The anatomy of a `ggplot` call
* All arguments to the first function called, `ggplot`, set graph defaults.
* These defaults can be changed for an individual layer (even `data`).
```r
ggplot(data=, aes(x=, y=, ...)) +
geom_????(...) +
...
```
```r
ggplot() +
geom_????(data=, aes(x=, y=, ...), ...) +
...
```
???
* Some guides, including the ggplot book, start you off with the qplot or "quick plot" command. That's nice if you're coming from another graphics framework like base graphics in R. But, in the interest of time, because I'm assuming a little bit of familiarity with ggplot with this group, and because the ggplot command is more powerful than qplot, I'm going to skip right to using the ggplot command here.
* These are a couple of ways in which you could create a very simple plot using ggplot. Any aesthetics or data you provide to the ggplot command, which always gets called first in the "statement," set defaults for the entire graph. You can instead choose to leave the ggplot command empty -- without arguments -- if you'd like to specify individual data and aesthetics for each geometric object. You can even specify defaults AND specify different settings for individual geometric elements. Note that it is best practice to not repeat yourself, so typically you'll set some defaults up front and only change later in the statement any individual elements that you want to change.
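* A concrete sketch of both patterns, using the `mpg` data that ships with `ggplot2`. The choice of columns and the 2008 subset are only for illustration:

```r
library(ggplot2)

# Defaults set once in ggplot(): both layers inherit the data and aesthetics
p1 <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm")

# Same defaults, but the second layer overrides `data` to highlight a subset
p2 <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(colour = "grey70") +
  geom_point(data = subset(mpg, year == 2008), colour = "black")

p1
p2
```

* In `p2` the x and y mappings are written only once, and only the layer that differs from the defaults says so.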
---
# When geoms transform
geom | stat | notable default settings
------------------- | ------------------- | -------------------
`geom_boxplot()` | `stat_boxplot()` | max length of whiskers (beyond hinges) = 1.5*IQR
`geom_count()` | `stat_sum()` |
`geom_bar()` | `stat_count()` |
`geom_histogram()` | `stat_bin()` | 30 bins: binwidth = [range of x]/30
`geom_freqpoly()` | `stat_bin()` | 30 bins: binwidth = [range of x]/30
`geom_dotplot()` | | 30 bins: binwidth = [range of x]/30; Wilkinson's "dot-density" binning method
`geom_bin2d()` | `stat_bin_2d()` | 30 bins for each of x and y
`geom_hex()` | `stat_bin_hex()` | 30 bins for each of x and y (calls `hexbin::hexbin()`)
???
* Some geometric objects are straightforward. Others involve statistical transformations.
* On this slide and the next two is a reference list of the geometric elements in ggplot that "silently" transform data by default. In other words, their default stat is not "identity." Also listed here are the defaults of those transformations. You can certainly always call a "stat" function that creates a "geom", but it's much more common in practice to just call a "geom" and add arguments specifying the "stat".
* The boxplot is a special case where many elements cannot be set by the user, but that's alright because they're what you'd expect them to be. The middle line is the median and the lower and upper hinges represent the lower and upper quartiles, respectively. The one thing you can specify is how far - at a maximum - the whiskers extend beyond the hinges, but the default (1.5 x inter-quartile range) is consistent with John Tukey's boxplot. You can also make a notched boxplot with geom_boxplot, and the notch locations are not specifiable, but I won't go into the settings for that here -- you can find it in the documentation if you're interested.
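* A short sketch of overriding these defaults rather than relying on them silently. It uses the `mpg` data from `ggplot2`; the bin width of 2 and the whisker multiplier of 3 are arbitrary values chosen for illustration:

```r
library(ggplot2)

# Default: stat_bin() picks 30 bins (ggplot2 typically prints a message about this)
h_default <- ggplot(mpg, aes(x = hwy)) + geom_histogram()

# Explicit bin width instead of the default number of bins
h_explicit <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 2)

# Whiskers may extend up to 3 * IQR beyond the hinges instead of 1.5 * IQR
b_default <- ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot()
b_wide <- ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot(coef = 3)

h_explicit
b_wide
```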
---
# When geoms transform
geom | stat | notable default settings
------------------- | ------------------- | -------------------
`geom_density_2d()` | `stat_density_2d()` | bivariate Gaussian kernel; bandwidths (x and y) estimated by `MASS::bandwidth.nrd()` using Scott's "rule of thumb"; 100 grid points for x and y (calls `MASS::kde2d()`)
`geom_contour()` | `stat_contour()` | 10 `pretty()` breakpoints covering [range of z]
`geom_density()` | `stat_density()` | Gaussian kernel; bandwidth estimated by `stats::bw.nrd0()` using Silverman's "rule of thumb" (calls `stats::density()`)
`geom_violin()` | `stat_ydensity()` | Gaussian kernel; bandwidth estimated by `stats::bw.nrd0()` using Silverman's "rule of thumb" (calls `stats::density()`); all violins have same area before trimming tails, tails are trimmed to [range of y]
???
* For one and two dimensional kernel density estimates, by default, the bandwidth is chosen automatically via "rules of thumb," which I won't go into here, but the formulas are available in the R help via the functions I've listed here. For two dimensional density estimates, the kernel cannot be changed to anything other than Gaussian.
* Whenever you allow ggplot to transform your data, you should always know what it's doing -- here I'm careful to state the defaults, and I change them only to demonstrate how it can be done. This is one tricky part about using a "smart" graphics package -- you need to be sure to keep up with what it's doing to your data. There was a biostat job talk a couple of years ago that was really amazing for the most part -- really advanced theory and good discussion -- but it was completely derailed for a good 5 minutes by discussion of a single graphic -- a ggplot violin plot -- which wasn't even demonstrating a main point of the talk. As you can see, violin plots are a bit tricky. The big issues were that there were several geometric objects drawn on the graph, and a couple of them performed transformations on the data, but it wasn't made clear exactly what those transformations were. It's easy to use some of these functions without thinking very hard about them, particularly if you leave them at their default settings, but you really need to be careful.
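* One way to keep track of these transformations is to write the density settings out in the call. A minimal sketch using the `mpg` data from `ggplot2`; the `adjust` and `scale` values in the violin plot are arbitrary illustrative choices:

```r
library(ggplot2)

# Relying on the defaults ...
d_default <- ggplot(mpg, aes(x = hwy)) + geom_density()

# ... versus stating them: Gaussian kernel, bandwidth from stats::bw.nrd0()
d_explicit <- ggplot(mpg, aes(x = hwy)) +
  geom_density(kernel = "gaussian", bw = "nrd0")

# Changing them: half the default bandwidth, untrimmed tails,
# and violin areas scaled by the number of observations per group
v <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_violin(adjust = 0.5, trim = FALSE, scale = "count")

d_explicit
v
```

* The first two plots should be the same. The second just documents what was done to the data.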
---
# When geoms transform
geom | stat | notable default settings
------------------- | ------------------- | -------------------
`geom_smooth()` | `stat_smooth()` | if `\(n<1000\)`, `stats::loess()` with polynomial degree 2, `\(\alpha=0.75\)`, etc.; else, `mgcv::gam()` with penalized cubic regression splines, etc.; 80 evaluation points <br/>
`geom_quantile()` | `stat_quantile()` | 3 quartiles; modified Barrodale & Roberts quantile regression method (calls `quantreg::rq()`)
???
* For loess and gam I only listed the most interesting parameters -- there are more that you can set. Just know that the "et cetera" indicates that any defaults not listed here for these functions are not modified in ggplot's call.
* Note that if a geom function has a default statistical transformation and that stat function has the same geometric object as its default -- which is often the case -- you'll get the same plot no matter which you decide to use (the geom or the stat).
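* A small sketch of that equivalence, using the `mpg` data from `ggplot2`: `geom_bar()` defaults to the "count" stat, and `stat_count()` defaults to the "bar" geom, so both calls compute the same layer data.

```r
library(ggplot2)

# Geom whose default stat is "count"
g <- ggplot(mpg, aes(x = class)) + geom_bar()

# Stat whose default geom is "bar"
s <- ggplot(mpg, aes(x = class)) + stat_count()

# layer_data() returns the data a layer actually draws, after the stat is applied
identical(layer_data(g)$count, layer_data(s)$count)
```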
---
# Other `ggplot` transformations
These `stat_*` functions do not have assumed `geom_*` mappings. See their documentation for details of the transformations applied.
* `stat_ecdf`: Empirical Cumulative Distribution Function
* `stat_ellipse`: Plot data ellipses.
* `stat_function`: Superimpose a function.
* `stat_qq`: Calculation for quantile-quantile plot.
* `stat_spoke`: Convert angle and radius to xend and yend.
* `stat_summary`: Summarise y values at every unique x.
* `stat_summary_bin`: Summarise y values at unique/binned x.
* `stat_summary_hex`: Apply function for 2D hexagonal bins.
* `stat_summary_2d`: Apply function for 2D rectangular bins.
* `stat_unique`: Remove duplicates.
* `stat_identity`
???
* There are many more stat functions that aren't the default stat for any geometric object. However, they can still be called directly and the majority of them perform transformations on your data. Check the documentation for details.
---
# Where to go for help with `ggplot2`
* [RStudio's ggplot2 cheat sheet](https://www.rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf) (updated for ggplot2 v2.1.0, Nov 2016)
* [Hadley Wickham's ggplot2 book (2009)](http://link.springer.com/book/10.1007%2F978-0-387-98141-3), the book's [github repo](https://github.com/hadley/ggplot2-book), or the [companion website](http://ggplot2.org/book/) to the book
* [official documentation](http://docs.ggplot2.org/current/)
* [google group](https://groups.google.com/forum/#!forum/ggplot2)
* [ggplot2 on stackoverflow](http://stackoverflow.com/tags/ggplot2/info)
* The [data visualisation](http://r4ds.had.co.nz/data-visualisation.html) and [graphics for communication](http://r4ds.had.co.nz/graphics-for-communication.html) chapters in Garrett Grolemund and Hadley Wickham's [R for data science](http://r4ds.had.co.nz/).
* Winston Chang's R Graphics Cookbook, [1st edition (2012)](http://shop.oreilly.com/product/0636920023135.do) or forthcoming 2nd edition (2017)
???
Here are a few resources I've found helpful when learning how to do certain tasks in ggplot.
---
class: inverse, middle, center
# `ggplot2` Examples
???
OK, enough setup -- let's run through an example of using ggplot.
---
# The data
* [WHO Global Health Observatory](http://www.who.int/gho/en/)
* Example:
```r
library(WHO)
codes <- get_codes()
who_data <- get_data("WHS9_CBR")
```
* Alternatively, use data that ships with `ggplot2`:
+ [diamonds](http://docs.ggplot2.org/current/diamonds.html),
+ [mpg](http://docs.ggplot2.org/current/mpg.html),
+ movies (now available as a separate R package, [ggplot2movies](https://CRAN.R-project.org/package=ggplot2movies)),
+ etc.
???
* For the majority of the examples in this presentation, I'm going to use data acquired via the WHO R package. This package just downloads data via the World Health Organization's (WHO) Global Health Observatory (GHO) public API.
* If you would rather not go to this trouble, ggplot2 ships with several good datasets that you can use instead -- a few of them are listed here. You can try your hand with one of those if you want some practice.
---
# A simple scatterplot
```r
library(ggplot2)
p = ggplot(data = subset(who_cbdr, !is.na(country)),
aes(x = value.birthsper1000,
y = value.deathsper1000,
color = worldbankincomegroup)) +
geom_point()
p
```
???
* This is a simple example to get us started. If you've used ggplot before, this plot will probably be pretty boring for you, but we'll start here and make things more complicated in a moment.
* Here we have DATA which is the result of merging two datasets downloaded from the World Health Organization's (WHO) Global Health Observatory (GHO).
* For 193 countries, we have both crude birth rate per 1,000 people and crude mortality rate per 1,000 people in a single year -- 2013.
* We also have the World Bank income group and global region for each of these countries.
* Each row in the data is an observation, as we've defined it for this problem -- a single country. Each column is a variable. This data is tidy, at least for how we choose to represent it now -- but there are other ways to structure it.
---
class: fullscreen, middle, center
<img src="index_files/figure-html/simple_scatterplot1-1.png" style="display: block; margin: auto;" />
???
* The crude birth and death rates are ratios. The numerator is the number of live births (or deaths) observed in a population during a reference period and the denominator is the number of person-years lived by the population during the same period. It is expressed as events per 1,000 population.
* For the World Bank income grouping, income is measured using gross national income per capita. Economies are then divided into four income groupings: low, lower-middle, upper-middle, and high, as you can see here.
* There are no implicit or explicit transformations of the data going on, and the only aesthetics are the position of each point in two dimensional space, defined by x and y, and the color of the points, which represent the income group. The geometric objects here are points, which are displayed in ggplot by default as circles.
* All that said, this is not a finished product. We can tell that there is a pattern in the data, but to me, this plot raises more questions than it answers.
* I'm not going to dwell too much on what a good versus bad plot is -- that's a whole separate topic that others can cover better than I can -- but I just wanted to point that out here as a segue to the next few graphics.
---
# Reorder a group aesthetic
```r
levels(who_cbdr$worldbankincomegroup)
```
```
## [1] "Global" "High-income" "Low-income"
## [4] "Lower-middle-income" "Upper-middle-income"
```
```r
# reorder factor levels
who_cbdr$worldbankincomegroup = factor(
who_cbdr$worldbankincomegroup,
levels = levels(who_cbdr$worldbankincomegroup)[c(2, 5:3, 1)],
labels = c("High", "Upper-middle", "Lower-middle", "Low", "Global"))
# recreate plot with modified data
p = p %+% subset(who_cbdr, !is.na(country))
p
```
???
* One of the first things that sticks out at me in the previous plot is the ordering of the groups in the legend. The groups are in alphabetical order, which is the default for ggplot given no other information.
* Let's change the ordering to something more sensible: high, upper-middle, lower-middle, and then low. There's an additional level in the data, Global, which represents summaries and isn't used in this scatterplot. We can put it last.
* Then we'll use this special addition operator provided by ggplot2 that allows us to switch out the original data for the plot with new data. Plot objects store the data, so even though we've changed the data frame in our R session, ggplot won't know about those changes unless we tell it to use the new data.
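* A toy sketch of why the data has to be swapped in. The data frame `df` and its values are hypothetical; `%+%` is used as on the slide, and `layer_data()` is only there to inspect what each plot would draw:

```r
library(ggplot2)

# Hypothetical data with a three-level grouping factor
df <- data.frame(x = 1:3, y = c(2, 4, 8),
                 g = factor(c("b", "a", "c")))
p_old <- ggplot(df, aes(x = x, y = y, colour = g)) + geom_point()

# Reorder the factor levels in the data frame; p_old still holds the old copy
df$g <- factor(df$g, levels = c("c", "b", "a"))

# Swap the updated data into the existing plot
p_new <- p_old %+% df

# The colours assigned to the points differ between the two plots
layer_data(p_old)$colour
layer_data(p_new)$colour
```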
---
class: fullscreen, middle, center
<img src="index_files/figure-html/grouporder_scatterplot1-1.png" style="display: block; margin: auto;" />
???
* As you can see, the legend is now ordered differently, and so the colors assigned to each group have changed, as well. Now the pattern in the data is a little easier to read, since, from left to right, the colors of the majority of the points generally go in the same order as they're displayed in the legend, from high income to low income.
---
# Add labels
```r
p = p +
xlab("Crude birth rate") +
ylab("Crude death rate") +
labs(title = "Global population rates",
subtitle = "2013; per 1,000 population",
caption = "Source: UN World Population Prospects (https://esa.un.org/unpd/wpp/)") +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5),
plot.caption = element_text(size = 8))
p
```
* `ggtitle()` can instead be used to set the title and subtitle.
* `scale_x_*()` and `scale_y_*()` can instead be used to set the x and y axis labels, respectively.
* As of ggplot2 v2.2.0, title(s) are left-aligned by default.
???
* Because it's just good practice and is pretty critical to understanding a data visualization, let's add some labels to each axis and a title, subtitle, and data source caption to the graph.
* There are multiple ways to specify most labels in ggplot. This is just one option that seemed most efficient for this situation. Since we aren't changing other aspects of the x and y axis, I used `xlab()` and `ylab()` instead of the `scale_x_continuous()` and `scale_y_continuous()` functions. Likewise, because we're setting a caption here as well as a title and subtitle, I used `labs()` instead of `ggtitle()`, since `ggtitle()` can't set captions.
* Since newer versions of ggplot left-align titles, we're manually centering them via the `theme()` function.
* Also, subtitles and captions are new as of ggplot2 v2.2.0.
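* The alternatives mentioned above, side by side, as a sketch on the `mpg` data from `ggplot2` (ggplot2 v2.2.0 or later is assumed because of the subtitle). The label text is made up for illustration:

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Option 1: xlab()/ylab() plus ggtitle()
p_a <- base +
  xlab("Engine displacement") + ylab("Highway mileage") +
  ggtitle("Fuel economy", subtitle = "mpg data")

# Option 2: everything in a single labs() call
p_b <- base +
  labs(x = "Engine displacement", y = "Highway mileage",
       title = "Fuel economy", subtitle = "mpg data")

# Option 3: axis names set through the scale functions
p_c <- base +
  scale_x_continuous("Engine displacement") +
  scale_y_continuous("Highway mileage") +
  ggtitle("Fuel economy", subtitle = "mpg data")
```

* All three should render with the same labels. Which one to use mostly depends on what else you need to change in the same call.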
---
class: fullscreen, middle, center
<img src="index_files/figure-html/label_scatterplot1-1.png" style="display: block; margin: auto;" />
???
* OK, so now it's a lot clearer what's going on here, but the label on the legend sticks out like a sore thumb. Let's fix that.
---
# Improve legend and colors
```r
p = p +
scale_colour_brewer("World Bank Income Group",
palette = "Dark2") +
theme(legend.position = "bottom")
p
```
* For discrete/qualitative color schemes: `scale_colour_hue()`, `scale_colour_discrete()`, `scale_colour_brewer()`, `scale_colour_grey()`, `scale_colour_manual()`, `scale_colour_identity()`
* For continuous (sequential or diverging) color schemes: `scale_colour_continuous()`, `scale_colour_distiller()`, `scale_colour_gradient()`, `scale_colour_gradient2()`, `scale_colour_gradientn()`, `scale_colour_date()`, `scale_colour_datetime()`
???
* Because the aesthetic that we want to rename is color, we'll call some `scale_colour_*()` function to rename it. Here, I'm using `scale_colour_brewer()` where the first argument is the new name. I've listed the other `scale_colour_*()` functions here. There are similar functions for each type of aesthetic, like shape, linetype, and so on.
* While we're at it, we can also tweak some visual elements. This isn't really required, but the default colors provided by ggplot aren't necessarily the best ones. For example, you might want to use colors that are more accessible for viewers who are color-blind.
* We can also move the legend around so it doesn't waste so much space where we could instead be displaying data.
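* A sketch of two of the discrete color scales listed above, on the `mpg` data from `ggplot2`. The legend title and the three hex colors passed to `scale_colour_manual()` are illustrative choices, not values from this talk:

```r
library(ggplot2)

p0 <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) + geom_point()

# Rename the legend and switch to a ColorBrewer palette; move the legend below the plot
p_brewer <- p0 +
  scale_colour_brewer("Drive train", palette = "Dark2") +
  theme(legend.position = "bottom")

# Or assign a specific color to each level by name
p_manual <- p0 +
  scale_colour_manual("Drive train",
                      values = c("4" = "#E69F00", "f" = "#56B4E9", "r" = "#009E73"))

p_brewer
p_manual
```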
---
class: fullscreen, middle, center
<img src="index_files/figure-html/legend_scatterplot1-1.png" style="display: block; margin: auto;" />
???
* OK, so our legend is now a little less conspicuous down here at the bottom, it has a sensible title, and we've got a color scheme that's maybe a little easier to see and doesn't scream "ggplot defaults."
---
# R color palette packages
- [RColorBrewer](https://CRAN.R-project.org/package=RColorBrewer) ([palette browser](http://colorbrewer2.org)): custom color palettes with colorblind friendly, print-friendly, and photocopy-friendly options
- [munsell](https://github.com/cwickham/munsell): called by `scales` to generate continuous/gradient palettes in `ggplot2`
- [colorspace](https://CRAN.R-project.org/package=colorspace): color palettes based on the HCL (Hue-Chroma-Luminance) and HSV (Hue-Saturation-Value) systems
- [viridis](https://github.com/sjmgarnier/viridis): color schemes from Matplotlib (Python plotting library)
- [dichromat](https://CRAN.R-project.org/package=dichromat) ([palette browser](http://geography.uoregon.edu/datagraphics/color_scales.htm)): color palettes for color-impaired viewers
- [pals](https://github.com/kwstat/pals): color palettes and palette evaluation tools
- [ggsci](https://github.com/road2stat/ggsci/): color palettes inspired by scientific journals, dataviz libraries, scifi movies, TV shows
- [wesanderson](https://github.com/karthik/wesanderson): self-explanatory ([tumblr inspiration](http://wesandersonpalettes.tumblr.com/))
- [scales](http://cran.r-project.org/web/packages/scales/index.html): additional functions to deal with ggplot `scale`s, including color palettes
- `\(\ldots\)`
???
* If you'd like to explore some other color schemes, here are a few packages that can help you do that.
* The RColorBrewer package is actually what's driving the `scale_colour_brewer()` function that I used here.
* The munsell package was created by Hadley Wickham's sister and is what is used to create the default color schemes in ggplot.
* The dichromat package is pretty cool because it not only provides some color palettes, but it can also show you what any color palette would look like to a person with one of several types of color-blindness.
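* If you want to try this yourself, here's a rough sketch: it lists the ColorBrewer palettes flagged as colorblind-friendly, then uses `dichromat()` to approximate how one palette would look to a viewer with red-green color blindness. The choice of `"Set1"` and of five colors is arbitrary.

```r
library(RColorBrewer)
library(dichromat)

# names of the ColorBrewer palettes flagged as colorblind-friendly
cb_friendly = rownames(brewer.pal.info[brewer.pal.info$colorblind, ])

# a five-color qualitative palette, returned as hex codes
pal = brewer.pal(5, "Set1")

# approximate how those colors appear with deuteranopia (red-green)
pal_deutan = dichromat(pal, type = "deutan")

# quick visual check: original palette in the bottom row,
# simulated palette in the top row
image(cbind(1:5, 6:10), col = c(pal, pal_deutan), axes = FALSE)
```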
---
# Add a summary (or two)
```r
p = p +
  stat_ellipse(type = "t", level = 0.9,
               segments = 80, alpha = 0.5) +
  geom_smooth(aes(color = NULL), method = "loess",
              span = 0.6, se = FALSE,
              color = "grey40", linetype = "longdash")
p
```
* An alternative way to map an aesthetic to only some layers: specify the mapping in each layer but not in the call to `ggplot`.
???
* OK, back to our scatterplot. Another common thing we might want to do with data we've plotted is to summarize it and display that summary over the raw data.
* Here, we're adding two summaries. The first is a 90% data ellipse for each group (computed assuming a multivariate t-distribution), which is really just an ad hoc way of showing where we'd expect most of our data points to lie. The second is a locally weighted scatterplot smoother, or loess regression, with a span of 0.6, meaning each local fit uses about 60% of the points. I've also turned off the confidence band for the regression line because it doesn't seem to add much here and detracts from the rest of the plot.
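* The alternative mentioned on the slide (mapping an aesthetic in individual layers rather than in `ggplot()`) might look something like this. It's a sketch on the built-in `iris` data, not our actual plot; the variable names are stand-ins.

```r
library(ggplot2)

# iris stands in for our data; no color mapping in the ggplot() call
p_layers = ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  # color by group in the points and ellipses only
  geom_point(aes(color = Species)) +
  stat_ellipse(aes(color = Species), type = "t", level = 0.9) +
  # no color mapping here, so one smoother is fit to all points
  geom_smooth(method = "loess", formula = y ~ x, span = 0.6,
              se = FALSE, color = "grey40", linetype = "longdash")
p_layers
```

With this layout there's no need for `aes(color = NULL)` in the smoother, because the color mapping was never inherited in the first place.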
---
class: fullscreen, middle, center
<img src="index_files/figure-html/summary_scatterplot1-1.png" style="display: block; margin: auto;" />
???
* At the risk of making the plot a little "busy," these summaries seem to elucidate the relationship between these variables in this sample.
* The dashed line is our non-parametric regression fit and sheds some light on the pattern across groups.
* The ellipses give us an idea of what's going on within each group, on the whole. They've ordered themselves very nicely along the x-axis in the order of their income group.
---
# Tweak the theme and text
```r
old_theme = theme_set(theme_minimal(base_size = 14))
p
```
* Theme options: `theme_gray()`, `theme_bw()`, `theme_linedraw()`, `theme_light()`, `theme_dark()`, `theme_minimal()`, `theme_classic()`, `theme_void()`
* Alternatively, the call to `theme_minimal()` above could be "added" to the plot `p`. However, this would wipe out previous changes made to `p` via `theme()`.
???
* Now, we might like to change the theme to something a little cleaner. `theme_grey()` is the default ggplot2 theme, with its recognizable grey background, but not everyone loves it. Luckily, there are lots of built-in options to choose from.
* Here, we'll set the theme to something much cleaner since there's already a lot going on in our plot. We can also bump up the font size while we're at it to make the text more readable.
* Like everything in ggplot, there are multiple ways to accomplish this. Here, I'm using `theme_set()` because we've already modified parts of the theme (like the alignment of the title and subtitle and position of the legend). If we were to "add" `theme_minimal()` to our plot object, it would obliterate those theme changes and we'd have to redo them. However, `theme_set()` keeps them intact.
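* To see the difference for yourself, here's a small sketch on the built-in `mtcars` data (a stand-in for our plot). It inspects the plot-level `theme()` settings stored in the plot object after each approach.

```r
library(ggplot2)

# mtcars is a stand-in; the theme() call mimics our earlier tweaks
p_demo = ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point() +
  theme(legend.position = "bottom")

# option 1: adding a complete theme replaces the plot's theme() settings
p_added = p_demo + theme_minimal(base_size = 14)
p_added$theme$legend.position  # no longer "bottom"

# option 2: change the session default; p_demo keeps its own tweaks
old_theme = theme_set(theme_minimal(base_size = 14))
p_demo$theme$legend.position   # still "bottom"
p_demo

# restore the previous default theme when done
theme_set(old_theme)
```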
---
class: fullscreen, middle, center
<img src="index_files/figure-html/theme_scatterplot1-1.png" style="display: block; margin: auto;" />
???
* And here we have a cleaner plot with text that's more legible.
---
# `ggplot2` theme packages
- [ggthemes](https://github.com/jrnold/ggthemes): extra `geom`s, `scale`s, and `theme`s for use with `ggplot2`
- [hrbrthemes](https://github.com/hrbrmstr/hrbrthemes): typography-centric themes for `ggplot2`
- [xkcd](https://CRAN.R-project.org/package=xkcd): `ggplot2` plot theme in the style of [XKCD comics](https://xkcd.com/)
- [ggplot2bdc](https://github.com/briandconnelly/ggplot2bdc): 'clean' and specialized themes + additional useful functions
- [ggthemr](https://github.com/cttobin/ggthemr): themes with predefined color palettes and options to modify other elements
- `\(\ldots\)`
???
* Here are a handful of packages that provide alternative ggplot2 themes.
* XKCD is more of a proof of concept than anything, I guess, but if you ever need it, it's there.
---
# Tweak the theme and text
```r
library(showtext)
sysfonts::font_add_google("Open Sans", "open_sans")
showtext::showtext_auto()
p = p + theme(text = element_text(family = "open_sans"))
p
```
* To use a font installed on the local system, call `font_add()` instead.
* Can instead set font via `base_family` argument of any `theme_*()` function.
* Turn off use of `showtext`: `showtext_auto(FALSE)`
* Alternatively, use `showtext_begin()` and `showtext_end()` to only turn on `showtext` as desired
* See [this blog post](http://statr.me/2014/07/showtext-with-knitr/) for details on using `showtext` with `knitr`
* See [this blog post](http://statr.me/2014/07/showtext-with-knitr/) for details on using `showtext` with `knitr`
???
* At the risk of getting a little obsessive, let's try changing the font in our plot. This is something that may come up when you're preparing plots for publication, as some journals may want you to use Arial or some other specific font in everything you submit, including plots. You can accomplish this via other means, like Adobe Illustrator, but to me, this is much easier.
* Here, I'm using a newer package called `showtext` to apply the font to the plot and a package called `sysfonts` to grab the font I want from Google fonts.
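* If you'd rather go the `base_family` route, a minimal sketch looks like this. It uses the generic `"serif"` family so it runs without downloading anything; with `showtext` active you could pass a registered family name such as `"open_sans"` instead. `mtcars` is just a stand-in dataset.

```r
library(ggplot2)

# a complete theme with both the font size and the font family set
theme_serif = theme_minimal(base_size = 14, base_family = "serif")

# mtcars is a stand-in for our data
p_font = ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  ggtitle("Fuel economy vs. weight") +
  theme_serif
p_font
```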
---
class: fullscreen, middle, center
<img src="index_files/figure-html/font_scatterplot1-1.png" style="display: block; margin: auto;" />
???
* The difference here is subtle but the font has been switched out.
* You can also use fonts installed on your local system, and this package will convert them into shapes so that, if you save your plot as a PDF or similar, the fonts don't have to be embedded to display properly on someone else's computer.
* The author of showtext recommends [`Cairo`](https://CRAN.R-project.org/package=Cairo) graphics devices for raster output due to sub-par antialiasing with default devices. I didn't have problems with the default graphics device, but your mileage may vary.
---
# Font packages
.pull-left[
**Access system/Google/etc. fonts:**
- [showtext](https://github.com/yixuan/showtext): Easily use alternative fonts in R plots. Allows resulting (vector) graphics files to be font-independent.
- [sysfonts](https://github.com/yixuan/sysfonts): Companion package to [showtext](https://github.com/yixuan/showtext). Loads system and Google fonts.
- [extrafont](https://github.com/wch/extrafont): Use system TrueType fonts in R plots. Embeds fonts in resulting (vector) graphics files. A little more work to use but resulting files should retain text editability.
- [tikzDevice](https://github.com/yihui/tikzDevice): `\(\LaTeX\)`-friendly R graphics output
]
.pull-right[
**Emoji/icons/custom images:**
- [emojifont](https://github.com/GuangchuangYu/emojifont)
- [emoGG](https://github.com/dill/emoGG)
- [ggimage](https://github.com/GuangchuangYu/ggimage)
- [ggflags](https://github.com/baptiste/ggflags)
- [rphylopic](https://github.com/sckott/rphylopic)
]
???
* The packages listed here on the left are for changing fonts on regular text and/or making sure that text renders the way you want it to in your output file. `extrafont` is kind of the original package in this arena, and it works great, but it requires a little more work to get it going, so I went with the simpler example for this presentation.
* On the right, you'll see some packages that allow you to use emoji and custom icons or images, mostly as plot markers rather than in place of text. For example, the ggflags package allows you to use country flags as markers in a scatterplot.
---
# Label observations of interest
```r
library(dplyr)
library(ggrepel)
p = p + geom_label_repel(data = subset(who_cbdr,
                                       !is.na(country)) %>%
                           group_by(worldbankincomegroup) %>%
                           top_n(3, abs(scale(value.birthsper1000)) +
                                   abs(scale(value.deathsper1000))),
                         aes(label = country),
                         show.legend = FALSE,
                         size = 3.5,
                         alpha = 0.65,
                         box.padding = unit(0.6, "lines"),
                         point.padding = unit(0.4, "lines"),
                         segment.color = "grey50",
                         segment.alpha = 0.65)
p
```
* For a pure `ggplot2` solution, try `geom_label()` or `geom_text()` with `nudge_x` and/or `nudge_y` arguments
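
A rough sketch of that pure `ggplot2` route, using the built-in `mtcars` data as a stand-in (car models play the role of countries, and the `score` column is a made-up helper that mimics the "most extreme points" selection above, without the grouping):

```r
library(ggplot2)

# mtcars stands in for our data; row names play the role of `country`
cars = transform(mtcars, model = rownames(mtcars))

# score each point by its combined standardized distance from the center
cars$score = abs(as.numeric(scale(cars$wt))) +
  abs(as.numeric(scale(cars$mpg)))

# keep the three most extreme observations for labeling
to_label = head(cars[order(cars$score, decreasing = TRUE), ], 3)

p_labeled = ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point() +
  # nudge_y shifts each label up (in data units) so it clears its point
  geom_text(data = to_label, aes(label = model),
            nudge_y = 1, size = 3.5)
p_labeled
```

Unlike `ggrepel`, nothing here prevents labels from overlapping each other, so the nudge values usually need some trial and error.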
---
class: fullscreen, middle, center
<img src="index_files/figure-html/labelobs_scatterplot1-1.png" style="display: block; margin: auto;" />
---
# Packages for labeling & orientation
- [ggrepel](https://github.com/slowkow/ggrepel): prevent plot labels from overlapping
- [directlabels](https://github.com/tdhock/directlabels) ([docs](http://directlabels.r-forge.r-project.org/)): directly label multicolor plots
- [ggstance](https://github.com/lionel-/ggstance): horizontal versions of common ggplots
???
---
# A heatmap with `ggplot2`
```r
wrap_labels = function(x, width = 15) {
  wrapped = strwrap(x, width = width, simplify = FALSE)
  lapply(wrapped, paste, collapse = "\n")
}
p1 = ggplot(who_yllmajorcause,
            aes(x = ghecauses, y = region, fill = med.pct)) +
  geom_tile() +
  scale_fill_distiller("Percent of total",
                       palette = "Blues",
                       direction = 1,
                       breaks = seq(0, 100, 25),
                       limits = c(0, 100),
                       guide = guide_colorbar(barwidth = 15,
                                              barheight = 1)) +
  scale_x_discrete("Cause Group", expand = c(0, 0),
                   labels = wrap_labels) +
  scale_y_discrete("Region", expand = c(0, 0),
                   labels = wrap_labels) +
  ggtitle("Median percent of years of life lost",
          subtitle = "in 2012 by cause and region") +
  theme_minimal(base_size = 16) +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))
p1
```
???
* Note that there is a package called `gplots` with a function called `heatmap.2()`, which is commonly used to make heatmaps with dendrograms. Despite the similar name, `gplots` is unrelated to `ggplot2`.
* The `expand = c(0, 0)` argument to `scale_x_discrete()` and `scale_y_discrete()` removes the padding inside the plot, so the tiles run all the way to the edges.
* The `labels` argument accepts either a mapping from old labels to new labels or a function that transforms the labels.
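* To make the label-wrapping idea concrete, here's a small standalone sketch of a wrapping function in base R. It's a variant of the slide's `wrap_labels()` that returns a character vector via `vapply()`; the example labels are made up for illustration.

```r
# wrap each label at roughly `width` characters, joining lines with "\n"
wrap_labels = function(x, width = 15) {
  wrapped = strwrap(x, width = width, simplify = FALSE)
  vapply(wrapped, paste, character(1), collapse = "\n")
}

# made-up labels: one long enough to wrap, one short enough to stay put
example_labels = c("Noncommunicable diseases", "Injuries")
wrapped_labels = wrap_labels(example_labels)

# cat() shows the line breaks as they would appear on an axis
cat(wrapped_labels, sep = "\n---\n")
```

A function like this can be handed straight to a scale, as in `scale_x_discrete(labels = wrap_labels)`; ggplot2 calls it with the axis breaks and uses whatever it returns as the labels.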
---
class: fullscreen, middle, center
<img src="index_files/figure-html/heatmap1-1.png" style="display: block; margin: auto;" />
---
# Creating a dendrogram with `ggdendro`
```r
library(tidyr)
library(ggdendro)
who_yllmajorcause_mat = who_yllmajorcause %>%
  spread(region, med.pct) # long --> wide
rownames(who_yllmajorcause_mat) = who_yllmajorcause_mat[,"ghecauses"]
col_idx = which(colnames(who_yllmajorcause_mat)=="ghecauses")
who_yllmajorcause_mat = as.matrix(who_yllmajorcause_mat[, -col_idx])
yll_hc = hclust(dist(t(who_yllmajorcause_mat)), "average")
# plot the raw dendrogram
ggdendrogram(yll_hc, rotate = TRUE)
```
???
* transform the data frame from long to wide using tidyr::spread
* do a little housekeeping to get a matrix that the dist() function will like
* hierarchical clustering (hclust()) on the distances using average or UPGMA agglomeration method
* create a dendrogram and rotate it so that the branches are horizontal, not vertical (corresponding with rows of heatmap)
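* Here's the same pipeline on a tiny made-up dataset, in case it helps to see each step's output. The numbers and the cause/region names are invented, and the sketch uses `tidyr::pivot_wider()`, the newer counterpart to `spread()`, which is an assumption about your installed `tidyr` version.

```r
library(tidyr)

# invented long-format data: one row per cause-region pair
toy_long = data.frame(
  ghecauses = rep(c("Cause A", "Cause B", "Cause C"), times = 3),
  region    = rep(c("Region 1", "Region 2", "Region 3"), each = 3),
  med.pct   = c(60, 30, 10,  55, 35, 10,  20, 60, 20)
)

# long --> wide: one row per cause, one column per region
toy_wide = pivot_wider(toy_long, names_from = region,
                       values_from = med.pct)

# housekeeping: numeric matrix with causes as row names
toy_mat = as.matrix(toy_wide[, -1])
rownames(toy_mat) = toy_wide$ghecauses

# transpose so regions are rows, then cluster with average linkage
toy_hc = hclust(dist(t(toy_mat)), method = "average")

# leaf labels in the order the dendrogram would draw them
toy_hc$labels[toy_hc$order]
```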
---
class: fullscreen, middle, center
<img src="index_files/figure-html/dendro1-1.png" style="display: block; margin: auto;" />
---
# Simplifying the dendrogram
```r
library(grid) # unit
yll_dendro = as.dendrogram(yll_hc)
yll_ddata = dendro_data(yll_dendro)
p2 = ggplot(segment(yll_ddata)) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
  coord_flip() +
  theme_dendro()
p2
```
---
class: fullscreen, middle, center
<img src="index_files/figure-html/simpledendro1-1.png" style="display: block; margin: auto;" />
---
# Heatmap, reordered
```r
# just in case there are extra factor levels, etc., un-factor
old_levels = levels(who_yllmajorcause$region)
who_yllmajorcause$region = as.character(who_yllmajorcause$region)
# re-factor with clustering/dendrogram ordering
new_order = order.dendrogram(yll_dendro)
who_yllmajorcause$region = factor(who_yllmajorcause$region,
                                  old_levels[new_order])
# recreate heatmap with modified data
p1 = p1 %+% who_yllmajorcause
p1
```
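
A small sketch of the same reordering idea on made-up numbers: `labels()` on a dendrogram returns the leaf labels in plotting order, so they can be used directly as factor levels. The matrix and region names below are invented for illustration.

```r
# invented cause-by-region matrix
toy_mat = matrix(c(60, 30, 10,  55, 35, 10,  20, 60, 20), nrow = 3,
                 dimnames = list(c("Cause A", "Cause B", "Cause C"),
                                 c("Region 1", "Region 2", "Region 3")))

# cluster the regions (columns) and convert to a dendrogram
toy_dend = as.dendrogram(hclust(dist(t(toy_mat)), "average"))

# leaf labels in dendrogram order
leaf_order = labels(toy_dend)

# re-factor a region column so its levels follow the dendrogram
region = factor(c("Region 3", "Region 1", "Region 2"))
region_reordered = factor(as.character(region), levels = leaf_order)
levels(region_reordered)
```

Taking the levels from the dendrogram's own labels avoids relying on the original factor levels lining up with the matrix columns.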
---
class: fullscreen, middle, center
<img src="index_files/figure-html/heatmap1ro-1.png" style="display: block; margin: auto;" />
---
# Putting it all together
```r
library(gtable)
### dendrogram grob
# tweak these if the dendrogram doesn't line up:
dendro_margin = grid::unit(c(15, 0, -5, -10), "points")
p2_grob = ggplotGrob(p2 + theme(plot.margin = dendro_margin))
### heatmap grob
p1_grob = ggplotGrob(p1)
# add some space for the dendrogram
g1 = gtable_add_cols(p1_grob, grid::unit(4, "cm"))
# adjust top ("t") and bottom ("b") if no title/subtitle:
g = gtable_add_grob(g1,
                    p2_grob,
                    t = 5, b = 6,
                    l = ncol(g1), r = ncol(g1))
grid.newpage()
grid.draw(g)
```
---
class: fullscreen, middle, center
<img src="index_files/figure-html/heatmapdendro1-1.png" style="display: block; margin: auto;" />
---
# Packages to pair with `ggplot2`
.pull-left[
**Plot wrangling:**
- [grid](https://stat.ethz.ch/R-manual/R-devel/library/grid/html/00Index.html): now a 'base' package (installed with vanilla R). `grid` graphics are the foundation on which `ggplot2` and `lattice` build.
- [gridExtra](http://cran.r-project.org/web/packages/gridExtra/index.html): additional functions to tweak/manipulate grid graphics
- [scales](http://cran.r-project.org/web/packages/scales/index.html): additional functions to deal with `scale`s
- [gtable](http://cran.r-project.org/web/packages/gtable/index.html): use to dismantle/hack underlying table of Grid Graphical Objects (grobs) that make up a ggplot
- [ggsubplot](http://cran.r-project.org/web/packages/ggsubplot/index.html): embed smaller subplots within larger plots
- [cowplot](https://github.com/wilkelab/cowplot): arrange and label multiple plots on a grid, add overlays, etc.
]
.pull-right[
**Data wrangling:**
- [dplyr](http://cran.r-project.org/web/packages/dplyr/index.html): manipulate data
- [tidyr](http://cran.r-project.org/web/packages/tidyr/index.html): restructure data (esp. wide `\(\leftrightarrow\)` long)
- [lubridate](http://cran.r-project.org/web/packages/lubridate/index.html): "makes working with dates fun instead of frustrating"
- and other [tidyverse](https://github.com/tidyverse/) packages
]
---
# Packages to pair with `ggplot2`
## Specialized Geoms/Stats
- [ggdendro](https://github.com/andrie/ggdendro), [dendextend](https://github.com/talgalili/dendextend): dendrograms and tree diagrams with `ggplot2`
- [ggtern](http://www.ggtern.com/): ternary diagrams (as in `vcd::ternaryplot`) and other additional *geom*s for `ggplot2`.
- [GGally](https://github.com/ggobi/ggally): scatterplot matrices (as in `graphics::pairs`), pairwise plot matrices, parallel coordinates plots, survival plots, network plots, etc. with `ggplot2`
- [ggHorizon](https://github.com/thomaskern/ggHorizon): horizon graphs with `ggplot2` ([example 1](http://www.perceptualedge.com/articles/visual_business_intelligence/time_on_the_horizon.pdf), [example 2](http://vis.berkeley.edu/papers/horizon/2009-TimeSeries-CHI.pdf))
- [ggmosaic](https://github.com/haleyjeppson/ggmosaic): Mosaic plots
- [survminer](https://github.com/kassambara/survminer): Survival curves
- [waffle](https://github.com/hrbrmstr/waffle): waffle charts, sometimes referred to as square pie charts
- [slopegraph](https://github.com/leeper/slopegraph): visualization created by Edward Tufte for plotting time series data.
- [ggradar](https://github.com/ricardo-bion/ggradar): radar charts
- [ggbio](http://bioconductor.org/packages/release/bioc/html/ggbio.html): `ggplot2` extensions for the visualization of genomic data
---
# Packages to pair with `ggplot2`
## Specialized Geoms/Stats
- [ggstraw](https://github.com/nacnudus/ggstraw): visualize the difference between two events related to one object
- [ggraph](https://github.com/thomasp85/ggraph): `geom`s, `facet`s, and layouts for networks, graphs, trees, etc.
- [geomnet](https://github.com/sctyner/geomnet): `geom`s and `stat`s for network visualization
- [ggtree](https://guangchuangyu.github.io/ggtree/): visualize and annotate phylogenetic trees
- [ggnetwork](https://github.com/briatte/ggnetwork) ([docs](https://briatte.github.io/ggnetwork/)): `geom`s for network plots
- [ggTimeSeries](https://github.com/Ather-Energy/ggTimeSeries): time series visualizations
- [ggseas](https://github.com/ellisp/ggseas): seasonal adjustment tools
- [plotROC](https://github.com/sachsmc/plotROC): ROC plots using ggplot2. Some interactive functionality.
- [classifierplots](https://github.com/ambiata/classifierplots): visualize classifier performance as grid of diagnostic plots
- [ggExtra](https://github.com/daattali/ggExtra): marginal histograms, etc.
- [ggpmisc](https://bitbucket.org/aphalo/ggpmisc): add equations and parameters from model fits as text or labels, label peaks, valleys, or observations in low density regions, etc.
- [WVPlots](https://github.com/WinVector/WVPlots): "pre-packaged" ggplots
- [ggforce](https://github.com/thomasp85/ggforce), [ggalt](https://github.com/hrbrmstr/ggalt): additional coordinate systems, `geom`s, etc.
---
# Packages to pair with `ggplot2`
## Maps
- [ggmap](https://github.com/dkahle/ggmap): allows visualization of spatial data and models on top of Google Maps, OpenStreetMap, or Stamen Maps using `ggplot2`
- [maps](http://cran.r-project.org/web/packages/maps/index.html): maps with `ggplot2`
- [maptools](http://cran.r-project.org/web/packages/maptools/index.html)
- [sp](http://cran.r-project.org/web/packages/sp/index.html)
- [rgdal](http://cran.r-project.org/web/packages/rgdal/index.html)
- [rworldmap](https://github.com/AndySouth/rworldmap/)
- [RgoogleMaps](http://cran.r-project.org/web/packages/RgoogleMaps/index.html)
- [statebins](https://github.com/hrbrmstr/statebins)
## Et Cetera
#### [Gallery of ggplot2 extensions](http://www.ggplot2-exts.org/gallery/)
???
* The examples for ggsubplot are a little "busy" but it actually can be useful in practice. For example, if you want to zoom in on a particular part of a plot and show it as an inset, or show an inset of what a particular graph would look like under conditions like the null, and so on.
* Note that the munsell color system is what is used by ggplot by default, but you may want to use the same color system to make your own palettes.
---
class: inverse, middle, center
# Interactive graphics in R
---
# Why interactive?
.pull-left[
![](assets/img/economist_quickanddead.png)
]
.pull-right[
- They're pretty, fun, and they engage audiences (see [Hans Rosling's TED talks](https://www.ted.com/speakers/hans_rosling)).
- They allow you to connect with, explore, and discover more about your data -- visually.
- Static graphics are "dead" (according to The Economist).
]
---
# Interactive and animated `ggplot2` graphics
- [ggiraph](https://github.com/davidgohel/ggiraph): make components of ggplot2 graphics **interactive** via additional aesthetics and geoms
- [gganimate](https://github.com/dgrtwo/gganimate): create animated ggplot2 plots
- [plotly](https://github.com/ropensci/plotly) ([docs](https://plot.ly/r/)): convert `ggplot2` graphics `\(\rightarrow\)` interactive graphics via `ggplotly()` OR create interactive graphics directly with `plot_ly()`; these work via the plot.ly [ggplot2 library](https://plot.ly/ggplot2/) and [R library](https://plot.ly/r/), respectively. Can operate entirely locally (meaning your plot doesn't have to be uploaded to plot.ly's servers and shared with the world).
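
As a minimal sketch of the two `plotly` routes (using the built-in `mtcars` data as a stand-in, with arbitrary variable choices):

```r
library(ggplot2)
library(plotly)

# route 1: build a static ggplot, then convert it
p_static = ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()
p_interactive = ggplotly(p_static)
p_interactive

# route 2: build an interactive scatterplot directly
p_direct = plot_ly(mtcars, x = ~wt, y = ~mpg,
                   type = "scatter", mode = "markers")
p_direct
```

Both objects are HTML widgets, so they render in the RStudio viewer, in a browser, or inside an R Markdown document.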
---
# `ggplot2` + `ggiraph`
```r
p_basic = ggplot(data = subset(who_cbdr, !is.na(country)),
aes(x = value.birthsper1000,