# (PART) Data Visualizations {-}
# Introduction to the lattice package
Eubin Park
```{r, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r}
library(lattice)
library(car)
```
The lattice package is a data visualization package created by Deepayan Sarkar. It is an add-on package that improves on the defaults of base R graphics, with an emphasis on displaying **multivariate data** through trellis graphs. Its main strength is handling data in which the relationship between two variables depends on other variables.
The general format of plotting using lattice functions is: **graph_type(formula, data)**. The main workhorse function in the lattice package is `xyplot()`.
## Producing a Plot in Lattice
We can begin by creating the most basic of all plots in lattice: a scatterplot. In lattice, this is done with the `xyplot()` function. In this example, we will use the iris dataset.
```{r}
data(iris)
```
```{r}
xyplot(Sepal.Length ~ Sepal.Width, data = iris,
xlab = "Sepal Width",
ylab = "Sepal Length")
```
This type of scatterplot should be familiar to many. As the call above shows, the basic way to plot with `xyplot()` is through the symbolic formula `y ~ x`, where `x` is the independent variable (drawn on the horizontal axis) and `y` is the dependent variable (drawn on the vertical axis).
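As a small sketch of how the formula maps onto the axes (the object names `p1` and `p2` are only illustrative), the chunk below stores two plots and swaps the sides of the formula. It also shows that `xyplot()` returns an object of class `"trellis"`, which is drawn when it is printed.

```{r}
library(lattice)

# Left-hand side of the formula goes on the y-axis, right-hand side on the x-axis
p1 <- xyplot(Sepal.Length ~ Sepal.Width, data = iris)

# Swapping the two sides of the formula swaps the axes
p2 <- xyplot(Sepal.Width ~ Sepal.Length, data = iris)

# xyplot() returns a "trellis" object; printing it draws the plot
class(p1)
print(p1)
print(p2)
```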
## Plotting by Groups
There are two main ways to plot multivariate data in lattice.
**1. Superposition:**
All data is plotted in the same region of the graph, and groups are distinguished by varying plot features such as color or symbol shape. To use superposition in plots, the **groups** argument must be specified.
**2. Juxtaposition:**
Data is plotted in separate regions of a larger graph. To use juxtaposition in plots, one must specify a **conditioning statement**, such as: `y ~ x | z`, where `z` is the conditioning variable.
The difference between superposition and juxtaposition is shown below:
Superposition:
```{r}
xyplot(Sepal.Length ~ Sepal.Width, data = iris,
       groups = Species,   # use groups argument (evaluated inside `data`)
       auto.key = TRUE,    # legend labelled with the levels of Species
       xlab = "Sepal Width",
       ylab = "Sepal Length")
```
Juxtaposition:
```{r}
xyplot(Sepal.Length ~ Sepal.Width | Species, data=iris, # add conditioning statement
pch=1, col="black",
xlab = "Sepal Width",
ylab = "Sepal Length")
```
As seen above, the only real difference between the two calls is whether one uses the `groups` argument (superposition) or a conditioning statement (juxtaposition), yet lattice produces two very different graphs from this small change.
The juxtaposition plot overcomes several problems with the superposition plot. For example, there is a good deal of over-plotting in the first plot, which makes it difficult to distinguish clear trends within each species. The juxtaposition plot avoids this by giving each species its own panel.
This advantage becomes much more conspicuous when dealing with larger multivariate datasets. In this next example, we will use the quakes dataset.
```{r}
data(quakes)
```
```{r}
# Create shingles
Depth = equal.count(quakes$depth, number = 8, overlap = .1)
# Plot graph using superposition
xyplot(lat ~ long, data = quakes,
groups = Depth,
xlab = "Longitude",
ylab = "Latitude")
```
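In the above example, we created **shingles** to bin the data. A shingle divides a continuous variable into intervals that are allowed to overlap. Here, `equal.count()` builds 8 intervals of `depth` that each hold roughly the same number of observations, with adjacent intervals overlapping by about 10%.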
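To see what a shingle actually contains, it can help to inspect it directly. The following is a minimal sketch that rebuilds the same `Depth` shingle and looks at its intervals; `is.shingle()`, `levels()` and the `plot()` method for shingles are all provided by lattice.

```{r}
library(lattice)

# Same shingle as above: 8 intervals of roughly equal count, overlapping by 10%
Depth <- equal.count(quakes$depth, number = 8, overlap = 0.1)

is.shingle(Depth)   # TRUE
levels(Depth)       # lower and upper bound of each of the 8 intervals
plot(Depth)         # draws the intervals, which makes the overlap visible
```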
```{r}
# Plot graph using juxtaposition
xyplot(lat ~ long | Depth, data = quakes,
xlab = "Longitude",
ylab = "Latitude")
```
In this example, the advantage of juxtaposition is clear: each depth range gets its own panel, so its spatial pattern can be read without interference from the others.
If we want to re-arrange the panels in the above plot, we can use the `layout` argument. This argument takes a vector of up to three values: number of columns, number of rows, and number of pages.
```{r}
# Use layout argument
xyplot(lat ~ long | Depth, data = quakes,
layout = c(3, 3, 1),
xlab = "Longitude",
ylab = "Latitude")
```
However, by using this argument, we have distorted the shapes of the panels. We can fix this using the `aspect` argument, which controls the aspect ratio (height divided by width) of each panel.
```{r}
# Use aspect argument
xyplot(lat ~ long | Depth, data = quakes,
aspect = 1,
layout = c(3, 3, 1),
xlab = "Longitude",
ylab = "Latitude")
```
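Because both axes here are measured in degrees, one alternative worth trying (a suggestion, not something the example above requires) is `aspect = "iso"`, which asks lattice to use the same physical scale for one unit on each axis. The sketch below assumes the same `Depth` shingle as before; `p_iso` is just an illustrative name.

```{r}
library(lattice)

Depth <- equal.count(quakes$depth, number = 8, overlap = 0.1)

p_iso <- xyplot(lat ~ long | Depth, data = quakes,
                aspect = "iso",        # one unit of longitude is as long as one unit of latitude
                layout = c(3, 3, 1),   # columns, rows, pages
                xlab = "Longitude",
                ylab = "Latitude")
print(p_iso)
```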
If we want to fit a regression line in each panel, we can supply a custom function through the `panel` argument of `xyplot()`.
```{r}
# Use the panel function argument
xyplot(lat ~ long | Depth, data = quakes,
       panel = function(x, y, subscripts, ...) {
         panel.points(x, y, ...)   # draw the data in this panel
         panel.lmline(x, y, ...)   # add a least-squares regression line
       })
```
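For this particular case a custom panel function is not strictly needed. As a shorter sketch, the default panel function of `xyplot()` accepts a `type` argument, and `type = c("p", "r")` requests points plus a linear regression line in every panel. The name `p_reg` is only illustrative.

```{r}
library(lattice)

Depth <- equal.count(quakes$depth, number = 8, overlap = 0.1)

# "p" draws the points, "r" adds a linear regression line to each panel
p_reg <- xyplot(lat ~ long | Depth, data = quakes,
                type = c("p", "r"),
                xlab = "Longitude",
                ylab = "Latitude")
print(p_reg)
```

A custom panel function remains the more general tool, since it can combine any of the `panel.*` functions.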
## Histograms and Density Plots
Lattice offers other options too, such as histograms and density plots. In this example, we will use the Duncan dataset from the car package.
```{r}
data(Duncan)
```
To make a histogram with the lattice package, use the `histogram()` function.
```{r}
histogram(~ prestige, data=Duncan,
type="count", # can take 'count', 'percent', or 'density'
nint = 10, # number of bins
endpoints = c(0, 100)
)
```
To make a density plot with the lattice package, use the `densityplot()` function.
```{r}
densityplot(~ prestige, data = Duncan,
            col = "black",
            plot.points = FALSE   # specify whether to draw the data points
)
```
We can even combine histograms and density plots using the panel function argument. While doing so, we can split the data into separate panels, which is useful for multivariate data.
```{r}
b <- with(Duncan, do.breaks(range(income), 3))
xyplot(~income | type, data=Duncan,
xlim = range(b), ylim = c(0, 0.04),
panel = function(x){
panel.histogram(x,
breaks = b,
col="gray80")
panel.densityplot(x,
darg =list(n=100),
col="red",
lwd=1.5,
plot.points=FALSE)
})
```
## Boxplots, Violin Plots, and Dotplots
Some other options offered by the lattice package include boxplots, violin plots, and dotplots. In this example, we will use the ToothGrowth dataset.
To make a boxplot, use the `bwplot()` function. As always with the lattice package, you can use a conditioning statement to create juxtaposed panels.
```{r}
bwplot(len ~ supp | dose, data = ToothGrowth,
layout = c(3, 1),
xlab = "Supplement type", ylab = "Length")
```
To make a violin plot, use the `bwplot()` function and specify the `panel` argument.
```{r}
bwplot(len ~ supp | dose, data = ToothGrowth,
layout = c(3, 1),
panel = panel.violin, # specify panel argument to make violin plot
xlab = "Supplement type", ylab = "Length")
```
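The two displays can also be layered in one panel. The sketch below adapts the pattern shown in the lattice help page for `panel.violin()`: a custom panel function draws the violin first and then a narrow box-and-whisker plot on top. Converting `dose` with `factor()` is a choice made here so that the strips show the dose values; `p_violin` is an illustrative name.

```{r}
library(lattice)

p_violin <- bwplot(len ~ supp | factor(dose), data = ToothGrowth,
                   layout = c(3, 1),
                   panel = function(..., box.ratio) {
                     # violin outline, using the box.ratio passed in by bwplot()
                     panel.violin(..., col = "transparent",
                                  varwidth = FALSE, box.ratio = box.ratio)
                     # narrow box-and-whisker plot drawn on top of the violin
                     panel.bwplot(..., fill = NULL, box.ratio = 0.1)
                   },
                   xlab = "Supplement type", ylab = "Length")
print(p_violin)
```

Capturing `box.ratio` as a named argument keeps it out of `...`, so it can be set separately for the two layers without being supplied twice.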
To make a dotplot, use the `dotplot()` function.
```{r}
dotplot(len ~ supp | dose, data = ToothGrowth,
layout = c(3, 1),
xlab = "Supplement type", ylab = "Length")
```
## Trivariate Plots
One option when displaying trivariate continuous data is to utilize all 3 axes. This can be done with a three-dimensional scatterplot.
In the lattice package, one can create such a plot using the `cloud()` function. This function takes a symbolic formula as its first argument, in the form `z ~ x * y`, where `x`, `y`, and `z` are the three continuous variables and `z` is drawn on the vertical axis.
In this example we will use the quakes dataset again.
```{r}
cloud(depth ~ lat * long, data=quakes)
```
To view the data from a different perspective, you can rotate the plot using the `screen` argument. You can play with this feature until you find the best view of your data.
```{r}
cloud(depth ~ lat * long, data=quakes,
screen = list(z = 105, x = -70))
```
Unfortunately, interactive options are not available in the Lattice package.
## Pros and Cons of Lattice
Now that we have a basic overview of the kinds of things the lattice package can do, let's discuss some of the advantages and disadvantages of this data visualization package.
Pros:
* Very good at allowing one to visualize multivariate data, i.e. comparing how some variable y changes with some variable x across levels of some other variable z
* Many settings are chosen automatically because the entire plot is created at once.
Cons:
* Can be difficult to specify an entire plot in a single function call
* Cannot add more elements to a plot once it has been drawn; the plot itself has to be modified and redrawn.
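On the last point, one way to modify an existing plot without retyping the whole call is the `update()` method that lattice provides for trellis objects. The following is a minimal sketch; the object names `p` and `p_updated` and the title text are made up for illustration.

```{r}
library(lattice)

# Store the plot instead of drawing it straight away
p <- xyplot(Sepal.Length ~ Sepal.Width, data = iris,
            xlab = "Sepal Width", ylab = "Sepal Length")

# update() returns a modified copy of the trellis object; `p` itself is unchanged
p_updated <- update(p, main = "Iris sepal dimensions", aspect = 1)
print(p_updated)
```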