<!DOCTYPE html>
<html >
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>Exercise Solutions and Notes for R for Data Science</title>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
<meta name="description" content="Exercise Solutions and Notes for “R for Data Science”">
<meta name="generator" content="bookdown 0.3.6 and GitBook 2.6.7">
<meta property="og:title" content="Exercise Solutions and Notes for R for Data Science" />
<meta property="og:type" content="book" />
<meta name="github-repo" content="jrnold/e4qf" />
<meta name="twitter:card" content="summary" />
<meta name="twitter:title" content="Exercise Solutions and Notes for R for Data Science" />
<meta name="author" content="Jeffrey B. Arnold">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<link rel="prev" href="workflow-basics.html">
<link rel="next" href="exploratory-data-analysis.html">
<script src="libs/jquery-2.2.3/jquery.min.js"></script>
<link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-bookdown.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-highlight.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-search.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-fontsettings.css" rel="stylesheet" />
<script src="libs/htmlwidgets-0.8/htmlwidgets.js"></script>
<link href="libs/str_view-0.1.0/str_view.css" rel="stylesheet" />
<script src="libs/str_view-binding-1.1.0/str_view.js"></script>
<style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
code > span.dt { color: #902000; } /* DataType */
code > span.dv { color: #40a070; } /* DecVal */
code > span.bn { color: #40a070; } /* BaseN */
code > span.fl { color: #40a070; } /* Float */
code > span.ch { color: #4070a0; } /* Char */
code > span.st { color: #4070a0; } /* String */
code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
code > span.ot { color: #007020; } /* Other */
code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
code > span.fu { color: #06287e; } /* Function */
code > span.er { color: #ff0000; font-weight: bold; } /* Error */
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #880000; } /* Constant */
code > span.sc { color: #4070a0; } /* SpecialChar */
code > span.vs { color: #4070a0; } /* VerbatimString */
code > span.ss { color: #bb6688; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #19177c; } /* Variable */
code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code > span.op { color: #666666; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #bc7a00; } /* Preprocessor */
code > span.at { color: #7d9029; } /* Attribute */
code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
</style>
<link rel="stylesheet" href="r4ds.css" type="text/css" />
</head>
<body>
<div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">
<div class="book-summary">
<nav role="navigation">
<ul class="summary">
<li><strong><a href="./">R for Data Science</a></strong></li>
<li class="divider"></li>
<li class="chapter" data-level="" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i>Welcome</a></li>
<li class="part"><span><b>I Explore</b></span></li>
<li class="chapter" data-level="1" data-path="explore-intro.html"><a href="explore-intro.html"><i class="fa fa-check"></i><b>1</b> Introduction</a></li>
<li class="chapter" data-level="2" data-path="visualize.html"><a href="visualize.html"><i class="fa fa-check"></i><b>2</b> Visualize</a><ul>
<li class="chapter" data-level="2.1" data-path="visualize.html"><a href="visualize.html#introduction"><i class="fa fa-check"></i><b>2.1</b> Introduction</a><ul>
<li class="chapter" data-level="2.1.1" data-path="visualize.html"><a href="visualize.html#prerequisites"><i class="fa fa-check"></i><b>2.1.1</b> Prerequisites</a></li>
<li class="chapter" data-level="2.1.2" data-path="visualize.html"><a href="visualize.html#first-steps"><i class="fa fa-check"></i><b>2.1.2</b> First Steps</a></li>
<li class="chapter" data-level="2.1.3" data-path="visualize.html"><a href="visualize.html#aesthetic-mappings"><i class="fa fa-check"></i><b>2.1.3</b> Aesthetic mappings</a></li>
<li class="chapter" data-level="2.1.4" data-path="visualize.html"><a href="visualize.html#facets"><i class="fa fa-check"></i><b>2.1.4</b> Facets</a></li>
<li class="chapter" data-level="2.1.5" data-path="visualize.html"><a href="visualize.html#geometric-objects"><i class="fa fa-check"></i><b>2.1.5</b> Geometric Objects</a></li>
<li class="chapter" data-level="2.1.6" data-path="visualize.html"><a href="visualize.html#statistical-transformations"><i class="fa fa-check"></i><b>2.1.6</b> Statistical Transformations</a></li>
</ul></li>
<li class="chapter" data-level="2.2" data-path="visualize.html"><a href="visualize.html#position-adjustments"><i class="fa fa-check"></i><b>2.2</b> Position Adjustments</a></li>
<li class="chapter" data-level="2.3" data-path="visualize.html"><a href="visualize.html#coordinate-systems"><i class="fa fa-check"></i><b>2.3</b> Coordinate Systems</a><ul>
<li class="chapter" data-level="2.3.1" data-path="visualize.html"><a href="visualize.html#exercises-3"><i class="fa fa-check"></i><b>2.3.1</b> Exercises</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="3" data-path="workflow-basics.html"><a href="workflow-basics.html"><i class="fa fa-check"></i><b>3</b> Workflow Basics</a><ul>
<li class="chapter" data-level="3.1" data-path="workflow-basics.html"><a href="workflow-basics.html#practice"><i class="fa fa-check"></i><b>3.1</b> Practice</a><ul>
<li class="chapter" data-level="3.1.1" data-path="workflow-basics.html"><a href="workflow-basics.html#exercises-4"><i class="fa fa-check"></i><b>3.1.1</b> Exercises</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="4" data-path="data-transformation.html"><a href="data-transformation.html"><i class="fa fa-check"></i><b>4</b> Data Transformation</a><ul>
<li class="chapter" data-level="4.1" data-path="data-transformation.html"><a href="data-transformation.html#prerequisites-1"><i class="fa fa-check"></i><b>4.1</b> Prerequisites</a></li>
<li class="chapter" data-level="4.2" data-path="data-transformation.html"><a href="data-transformation.html#filter"><i class="fa fa-check"></i><b>4.2</b> Filter</a></li>
<li class="chapter" data-level="4.3" data-path="data-transformation.html"><a href="data-transformation.html#exercises-5"><i class="fa fa-check"></i><b>4.3</b> Exercises</a></li>
<li class="chapter" data-level="4.4" data-path="data-transformation.html"><a href="data-transformation.html#arrange"><i class="fa fa-check"></i><b>4.4</b> Arrange</a><ul>
<li class="chapter" data-level="4.4.1" data-path="data-transformation.html"><a href="data-transformation.html#exercises-6"><i class="fa fa-check"></i><b>4.4.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="4.5" data-path="data-transformation.html"><a href="data-transformation.html#mutate"><i class="fa fa-check"></i><b>4.5</b> Mutate</a><ul>
<li class="chapter" data-level="4.5.1" data-path="data-transformation.html"><a href="data-transformation.html#exercises-7"><i class="fa fa-check"></i><b>4.5.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="4.6" data-path="data-transformation.html"><a href="data-transformation.html#grouped-summaries-with-summarise"><i class="fa fa-check"></i><b>4.6</b> Grouped summaries with <code>summarise()</code></a><ul>
<li class="chapter" data-level="4.6.1" data-path="data-transformation.html"><a href="data-transformation.html#exercises-8"><i class="fa fa-check"></i><b>4.6.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="4.7" data-path="data-transformation.html"><a href="data-transformation.html#grouped-mutates-and-filters"><i class="fa fa-check"></i><b>4.7</b> Grouped mutates and filters</a><ul>
<li class="chapter" data-level="4.7.1" data-path="data-transformation.html"><a href="data-transformation.html#exercises-9"><i class="fa fa-check"></i><b>4.7.1</b> Exercises</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="5" data-path="exploratory-data-analysis.html"><a href="exploratory-data-analysis.html"><i class="fa fa-check"></i><b>5</b> Exploratory Data Analysis</a><ul>
<li class="chapter" data-level="5.1" data-path="exploratory-data-analysis.html"><a href="exploratory-data-analysis.html#introduction-1"><i class="fa fa-check"></i><b>5.1</b> Introduction</a><ul>
<li class="chapter" data-level="5.1.1" data-path="exploratory-data-analysis.html"><a href="exploratory-data-analysis.html#questions"><i class="fa fa-check"></i><b>5.1.1</b> Questions</a></li>
<li class="chapter" data-level="5.1.2" data-path="exploratory-data-analysis.html"><a href="exploratory-data-analysis.html#variation"><i class="fa fa-check"></i><b>5.1.2</b> Variation</a></li>
</ul></li>
<li class="chapter" data-level="5.2" data-path="exploratory-data-analysis.html"><a href="exploratory-data-analysis.html#missing-values"><i class="fa fa-check"></i><b>5.2</b> Missing Values</a><ul>
<li class="chapter" data-level="5.2.1" data-path="exploratory-data-analysis.html"><a href="exploratory-data-analysis.html#exercises-11"><i class="fa fa-check"></i><b>5.2.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="5.3" data-path="exploratory-data-analysis.html"><a href="exploratory-data-analysis.html#covariation"><i class="fa fa-check"></i><b>5.3</b> Covariation</a><ul>
<li class="chapter" data-level="5.3.1" data-path="exploratory-data-analysis.html"><a href="exploratory-data-analysis.html#a-categorical-and-continuous-variable"><i class="fa fa-check"></i><b>5.3.1</b> A categorical and continuous variable</a></li>
<li class="chapter" data-level="5.3.2" data-path="exploratory-data-analysis.html"><a href="exploratory-data-analysis.html#two-categorical-variables"><i class="fa fa-check"></i><b>5.3.2</b> Two categorical variables</a></li>
<li class="chapter" data-level="5.3.3" data-path="exploratory-data-analysis.html"><a href="exploratory-data-analysis.html#two-continuous-variables"><i class="fa fa-check"></i><b>5.3.3</b> Two continuous variables</a></li>
</ul></li>
</ul></li>
<li class="part"><span><b>II Wrangle</b></span></li>
<li class="chapter" data-level="6" data-path="tibbles.html"><a href="tibbles.html"><i class="fa fa-check"></i><b>6</b> Tibbles</a><ul>
<li class="chapter" data-level="6.1" data-path="tibbles.html"><a href="tibbles.html#prerquisites"><i class="fa fa-check"></i><b>6.1</b> Prerequisites</a></li>
<li class="chapter" data-level="6.2" data-path="tibbles.html"><a href="tibbles.html#creating-tibbles"><i class="fa fa-check"></i><b>6.2</b> Creating Tibbles</a></li>
<li class="chapter" data-level="6.3" data-path="tibbles.html"><a href="tibbles.html#tibbles-vs.data.frame"><i class="fa fa-check"></i><b>6.3</b> Tibbles vs. data.frame</a></li>
<li class="chapter" data-level="6.4" data-path="tibbles.html"><a href="tibbles.html#subsetting"><i class="fa fa-check"></i><b>6.4</b> Subsetting</a></li>
<li class="chapter" data-level="6.5" data-path="tibbles.html"><a href="tibbles.html#interacting-with-older-code"><i class="fa fa-check"></i><b>6.5</b> Interacting with older code</a></li>
<li class="chapter" data-level="6.6" data-path="tibbles.html"><a href="tibbles.html#exercises-12"><i class="fa fa-check"></i><b>6.6</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="7" data-path="data-import.html"><a href="data-import.html"><i class="fa fa-check"></i><b>7</b> Data Import</a><ul>
<li class="chapter" data-level="7.1" data-path="data-import.html"><a href="data-import.html#introduction-2"><i class="fa fa-check"></i><b>7.1</b> Introduction</a></li>
<li class="chapter" data-level="7.2" data-path="data-import.html"><a href="data-import.html#getting-started"><i class="fa fa-check"></i><b>7.2</b> Getting started</a><ul>
<li class="chapter" data-level="7.2.1" data-path="data-import.html"><a href="data-import.html#exercises-13"><i class="fa fa-check"></i><b>7.2.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="7.3" data-path="data-import.html"><a href="data-import.html#parsing-a-vector"><i class="fa fa-check"></i><b>7.3</b> Parsing a vector</a><ul>
<li class="chapter" data-level="7.3.1" data-path="data-import.html"><a href="data-import.html#exercises-14"><i class="fa fa-check"></i><b>7.3.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="7.4" data-path="data-import.html"><a href="data-import.html#other-types-of-data"><i class="fa fa-check"></i><b>7.4</b> Other Types of Data</a></li>
</ul></li>
<li class="chapter" data-level="8" data-path="tidy-data.html"><a href="tidy-data.html"><i class="fa fa-check"></i><b>8</b> Tidy Data</a><ul>
<li class="chapter" data-level="8.1" data-path="tidy-data.html"><a href="tidy-data.html#introduction-3"><i class="fa fa-check"></i><b>8.1</b> Introduction</a></li>
<li class="chapter" data-level="8.2" data-path="tidy-data.html"><a href="tidy-data.html#tidy-data-1"><i class="fa fa-check"></i><b>8.2</b> Tidy Data</a><ul>
<li class="chapter" data-level="8.2.1" data-path="tidy-data.html"><a href="tidy-data.html#exercises-15"><i class="fa fa-check"></i><b>8.2.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="8.3" data-path="tidy-data.html"><a href="tidy-data.html#spreading-and-gathering"><i class="fa fa-check"></i><b>8.3</b> Spreading and Gathering</a><ul>
<li class="chapter" data-level="8.3.1" data-path="tidy-data.html"><a href="tidy-data.html#exercises-16"><i class="fa fa-check"></i><b>8.3.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="8.4" data-path="tidy-data.html"><a href="tidy-data.html#separating-and-uniting"><i class="fa fa-check"></i><b>8.4</b> Separating and Uniting</a><ul>
<li class="chapter" data-level="8.4.1" data-path="tidy-data.html"><a href="tidy-data.html#exercises-17"><i class="fa fa-check"></i><b>8.4.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="8.5" data-path="tidy-data.html"><a href="tidy-data.html#missing-values-1"><i class="fa fa-check"></i><b>8.5</b> Missing Values</a><ul>
<li class="chapter" data-level="8.5.1" data-path="tidy-data.html"><a href="tidy-data.html#exercises-18"><i class="fa fa-check"></i><b>8.5.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="8.6" data-path="tidy-data.html"><a href="tidy-data.html#case-study"><i class="fa fa-check"></i><b>8.6</b> Case Study</a><ul>
<li class="chapter" data-level="8.6.1" data-path="tidy-data.html"><a href="tidy-data.html#exercises-19"><i class="fa fa-check"></i><b>8.6.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="8.7" data-path="tidy-data.html"><a href="tidy-data.html#non-tidy-data"><i class="fa fa-check"></i><b>8.7</b> Non-Tidy Data</a></li>
</ul></li>
<li class="chapter" data-level="9" data-path="relational-data.html"><a href="relational-data.html"><i class="fa fa-check"></i><b>9</b> Relational Data</a><ul>
<li class="chapter" data-level="9.1" data-path="relational-data.html"><a href="relational-data.html#prerequisites-2"><i class="fa fa-check"></i><b>9.1</b> Prerequisites</a></li>
<li class="chapter" data-level="9.2" data-path="relational-data.html"><a href="relational-data.html#nycflights13"><i class="fa fa-check"></i><b>9.2</b> nycflights13</a><ul>
<li class="chapter" data-level="9.2.1" data-path="relational-data.html"><a href="relational-data.html#exercises-20"><i class="fa fa-check"></i><b>9.2.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="9.3" data-path="relational-data.html"><a href="relational-data.html#keys"><i class="fa fa-check"></i><b>9.3</b> Keys</a></li>
<li class="chapter" data-level="9.4" data-path="relational-data.html"><a href="relational-data.html#mutating-joins"><i class="fa fa-check"></i><b>9.4</b> Mutating Joins</a><ul>
<li class="chapter" data-level="9.4.1" data-path="relational-data.html"><a href="relational-data.html#exercises-21"><i class="fa fa-check"></i><b>9.4.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="9.5" data-path="relational-data.html"><a href="relational-data.html#filtering-joins"><i class="fa fa-check"></i><b>9.5</b> Filtering Joins</a><ul>
<li class="chapter" data-level="9.5.1" data-path="relational-data.html"><a href="relational-data.html#exercises-22"><i class="fa fa-check"></i><b>9.5.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="9.6" data-path="relational-data.html"><a href="relational-data.html#set-operations"><i class="fa fa-check"></i><b>9.6</b> Set operations</a></li>
</ul></li>
<li class="chapter" data-level="10" data-path="strings.html"><a href="strings.html"><i class="fa fa-check"></i><b>10</b> Strings</a><ul>
<li class="chapter" data-level="10.1" data-path="strings.html"><a href="strings.html#introduction-4"><i class="fa fa-check"></i><b>10.1</b> Introduction</a></li>
<li class="chapter" data-level="10.2" data-path="strings.html"><a href="strings.html#string-basics"><i class="fa fa-check"></i><b>10.2</b> String Basics</a><ul>
<li class="chapter" data-level="10.2.1" data-path="strings.html"><a href="strings.html#exercises-23"><i class="fa fa-check"></i><b>10.2.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="10.3" data-path="strings.html"><a href="strings.html#matching-patterns-and-regular-expressions"><i class="fa fa-check"></i><b>10.3</b> Matching Patterns and Regular Expressions</a><ul>
<li class="chapter" data-level="10.3.1" data-path="strings.html"><a href="strings.html#exercises-24"><i class="fa fa-check"></i><b>10.3.1</b> Exercises</a></li>
<li class="chapter" data-level="10.3.2" data-path="strings.html"><a href="strings.html#repitition"><i class="fa fa-check"></i><b>10.3.2</b> Repetition</a></li>
<li class="chapter" data-level="10.3.3" data-path="strings.html"><a href="strings.html#grouping-and-backreferences"><i class="fa fa-check"></i><b>10.3.3</b> Grouping and backreferences</a></li>
</ul></li>
<li class="chapter" data-level="10.4" data-path="strings.html"><a href="strings.html#tools"><i class="fa fa-check"></i><b>10.4</b> Tools</a><ul>
<li class="chapter" data-level="10.4.1" data-path="strings.html"><a href="strings.html#detect-matches"><i class="fa fa-check"></i><b>10.4.1</b> Detect matches</a></li>
<li class="chapter" data-level="10.4.2" data-path="strings.html"><a href="strings.html#exercises-29"><i class="fa fa-check"></i><b>10.4.2</b> Exercises</a></li>
<li class="chapter" data-level="10.4.3" data-path="strings.html"><a href="strings.html#extract-matches"><i class="fa fa-check"></i><b>10.4.3</b> Extract Matches</a></li>
<li class="chapter" data-level="10.4.4" data-path="strings.html"><a href="strings.html#grouped-matches"><i class="fa fa-check"></i><b>10.4.4</b> Grouped Matches</a></li>
<li class="chapter" data-level="10.4.5" data-path="strings.html"><a href="strings.html#splitting"><i class="fa fa-check"></i><b>10.4.5</b> Splitting</a></li>
</ul></li>
<li class="chapter" data-level="10.5" data-path="strings.html"><a href="strings.html#other-types-of-patterns"><i class="fa fa-check"></i><b>10.5</b> Other types of patterns</a><ul>
<li class="chapter" data-level="10.5.1" data-path="strings.html"><a href="strings.html#exercises-33"><i class="fa fa-check"></i><b>10.5.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="10.6" data-path="strings.html"><a href="strings.html#stringi"><i class="fa fa-check"></i><b>10.6</b> stringi</a><ul>
<li class="chapter" data-level="10.6.1" data-path="strings.html"><a href="strings.html#exercises-34"><i class="fa fa-check"></i><b>10.6.1</b> Exercises</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="11" data-path="factors.html"><a href="factors.html"><i class="fa fa-check"></i><b>11</b> Factors</a><ul>
<li class="chapter" data-level="11.1" data-path="factors.html"><a href="factors.html#introduction-5"><i class="fa fa-check"></i><b>11.1</b> Introduction</a></li>
<li class="chapter" data-level="11.2" data-path="factors.html"><a href="factors.html#creating-factors"><i class="fa fa-check"></i><b>11.2</b> Creating Factors</a></li>
<li class="chapter" data-level="11.3" data-path="factors.html"><a href="factors.html#general-social-survey"><i class="fa fa-check"></i><b>11.3</b> General Social Survey</a><ul>
<li class="chapter" data-level="11.3.1" data-path="factors.html"><a href="factors.html#exercises-35"><i class="fa fa-check"></i><b>11.3.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="11.4" data-path="factors.html"><a href="factors.html#modifying-factor-order"><i class="fa fa-check"></i><b>11.4</b> Modifying factor order</a><ul>
<li class="chapter" data-level="11.4.1" data-path="factors.html"><a href="factors.html#exercises-36"><i class="fa fa-check"></i><b>11.4.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="11.5" data-path="factors.html"><a href="factors.html#modifying-factor-levels"><i class="fa fa-check"></i><b>11.5</b> Modifying factor levels</a><ul>
<li class="chapter" data-level="11.5.1" data-path="factors.html"><a href="factors.html#exercises-37"><i class="fa fa-check"></i><b>11.5.1</b> Exercises</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="12" data-path="dates-and-times.html"><a href="dates-and-times.html"><i class="fa fa-check"></i><b>12</b> Dates and Times</a><ul>
<li class="chapter" data-level="12.1" data-path="dates-and-times.html"><a href="dates-and-times.html#prerequisite"><i class="fa fa-check"></i><b>12.1</b> Prerequisite</a></li>
<li class="chapter" data-level="12.2" data-path="dates-and-times.html"><a href="dates-and-times.html#creating-datetimes"><i class="fa fa-check"></i><b>12.2</b> Creating date/times</a><ul>
<li class="chapter" data-level="12.2.1" data-path="dates-and-times.html"><a href="dates-and-times.html#exercises-38"><i class="fa fa-check"></i><b>12.2.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="12.3" data-path="dates-and-times.html"><a href="dates-and-times.html#date-time-components"><i class="fa fa-check"></i><b>12.3</b> Date-Time Components</a><ul>
<li class="chapter" data-level="12.3.1" data-path="dates-and-times.html"><a href="dates-and-times.html#exercises-39"><i class="fa fa-check"></i><b>12.3.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="12.4" data-path="dates-and-times.html"><a href="dates-and-times.html#time-spans"><i class="fa fa-check"></i><b>12.4</b> Time Spans</a><ul>
<li class="chapter" data-level="12.4.1" data-path="dates-and-times.html"><a href="dates-and-times.html#durations"><i class="fa fa-check"></i><b>12.4.1</b> Durations</a></li>
<li class="chapter" data-level="12.4.2" data-path="dates-and-times.html"><a href="dates-and-times.html#periods"><i class="fa fa-check"></i><b>12.4.2</b> Periods</a></li>
<li class="chapter" data-level="12.4.3" data-path="dates-and-times.html"><a href="dates-and-times.html#intervals"><i class="fa fa-check"></i><b>12.4.3</b> Intervals</a></li>
<li class="chapter" data-level="12.4.4" data-path="dates-and-times.html"><a href="dates-and-times.html#exercises-40"><i class="fa fa-check"></i><b>12.4.4</b> Exercises</a></li>
<li class="chapter" data-level="12.4.5" data-path="dates-and-times.html"><a href="dates-and-times.html#time-zones"><i class="fa fa-check"></i><b>12.4.5</b> Time Zones</a></li>
</ul></li>
</ul></li>
<li class="part"><span><b>III Program</b></span></li>
<li class="chapter" data-level="13" data-path="program-intro.html"><a href="program-intro.html"><i class="fa fa-check"></i><b>13</b> Introduction</a></li>
<li class="chapter" data-level="14" data-path="pipes.html"><a href="pipes.html"><i class="fa fa-check"></i><b>14</b> Pipes</a></li>
<li class="chapter" data-level="15" data-path="vectors.html"><a href="vectors.html"><i class="fa fa-check"></i><b>15</b> Vectors</a><ul>
<li class="chapter" data-level="15.1" data-path="vectors.html"><a href="vectors.html#introduction-6"><i class="fa fa-check"></i><b>15.1</b> Introduction</a></li>
<li class="chapter" data-level="15.2" data-path="vectors.html"><a href="vectors.html#important-types-of-atomic-vector"><i class="fa fa-check"></i><b>15.2</b> Important types of Atomic Vector</a><ul>
<li class="chapter" data-level="15.2.1" data-path="vectors.html"><a href="vectors.html#exercises-41"><i class="fa fa-check"></i><b>15.2.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="15.3" data-path="vectors.html"><a href="vectors.html#using-atomic-vectors"><i class="fa fa-check"></i><b>15.3</b> Using atomic vectors</a></li>
<li class="chapter" data-level="15.4" data-path="vectors.html"><a href="vectors.html#recursive-vectors-lists"><i class="fa fa-check"></i><b>15.4</b> Recursive Vectors (lists)</a><ul>
<li class="chapter" data-level="15.4.1" data-path="vectors.html"><a href="vectors.html#exercises-42"><i class="fa fa-check"></i><b>15.4.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="15.5" data-path="vectors.html"><a href="vectors.html#augmented-vectors"><i class="fa fa-check"></i><b>15.5</b> Augmented Vectors</a><ul>
<li class="chapter" data-level="15.5.1" data-path="vectors.html"><a href="vectors.html#exercises-43"><i class="fa fa-check"></i><b>15.5.1</b> Exercises</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="16" data-path="iteration.html"><a href="iteration.html"><i class="fa fa-check"></i><b>16</b> Iteration</a><ul>
<li class="chapter" data-level="16.1" data-path="iteration.html"><a href="iteration.html#introduction-7"><i class="fa fa-check"></i><b>16.1</b> Introduction</a></li>
<li class="chapter" data-level="16.2" data-path="iteration.html"><a href="iteration.html#for-loops"><i class="fa fa-check"></i><b>16.2</b> For Loops</a><ul>
<li class="chapter" data-level="16.2.1" data-path="iteration.html"><a href="iteration.html#exercises-44"><i class="fa fa-check"></i><b>16.2.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="16.3" data-path="iteration.html"><a href="iteration.html#for-loop-variations"><i class="fa fa-check"></i><b>16.3</b> For loop variations</a><ul>
<li class="chapter" data-level="16.3.1" data-path="iteration.html"><a href="iteration.html#section"><i class="fa fa-check"></i><b>16.3.1</b> </a></li>
</ul></li>
<li class="chapter" data-level="16.4" data-path="iteration.html"><a href="iteration.html#for-loops-vs.functionals"><i class="fa fa-check"></i><b>16.4</b> For loops vs. functionals</a><ul>
<li class="chapter" data-level="16.4.1" data-path="iteration.html"><a href="iteration.html#exercises-45"><i class="fa fa-check"></i><b>16.4.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="16.5" data-path="iteration.html"><a href="iteration.html#the-map-functions"><i class="fa fa-check"></i><b>16.5</b> The map functions</a><ul>
<li class="chapter" data-level="16.5.1" data-path="iteration.html"><a href="iteration.html#shortcuts"><i class="fa fa-check"></i><b>16.5.1</b> Shortcuts</a></li>
<li class="chapter" data-level="16.5.2" data-path="iteration.html"><a href="iteration.html#exercises-46"><i class="fa fa-check"></i><b>16.5.2</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="16.6" data-path="iteration.html"><a href="iteration.html#dealing-with-failure"><i class="fa fa-check"></i><b>16.6</b> Dealing with Failure</a></li>
<li class="chapter" data-level="16.7" data-path="iteration.html"><a href="iteration.html#mapping-over-multiple-arguments"><i class="fa fa-check"></i><b>16.7</b> Mapping over multiple arguments</a></li>
<li class="chapter" data-level="16.8" data-path="iteration.html"><a href="iteration.html#walk"><i class="fa fa-check"></i><b>16.8</b> Walk</a></li>
<li class="chapter" data-level="16.9" data-path="iteration.html"><a href="iteration.html#other-patterns-of-for-loops"><i class="fa fa-check"></i><b>16.9</b> Other patterns of for loops</a><ul>
<li class="chapter" data-level="16.9.1" data-path="iteration.html"><a href="iteration.html#exercises-47"><i class="fa fa-check"></i><b>16.9.1</b> Exercises</a></li>
</ul></li>
</ul></li>
<li class="part"><span><b>IV Model</b></span></li>
<li class="chapter" data-level="17" data-path="model-intro.html"><a href="model-intro.html"><i class="fa fa-check"></i><b>17</b> Introduction</a></li>
<li class="chapter" data-level="18" data-path="model-basics.html"><a href="model-basics.html"><i class="fa fa-check"></i><b>18</b> Model Basics</a><ul>
<li class="chapter" data-level="18.1" data-path="model-basics.html"><a href="model-basics.html#prerequisites-3"><i class="fa fa-check"></i><b>18.1</b> Prerequisites</a></li>
<li class="chapter" data-level="18.2" data-path="model-basics.html"><a href="model-basics.html#a-simple-model"><i class="fa fa-check"></i><b>18.2</b> A simple model</a><ul>
<li class="chapter" data-level="18.2.1" data-path="model-basics.html"><a href="model-basics.html#exercises-48"><i class="fa fa-check"></i><b>18.2.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="18.3" data-path="model-basics.html"><a href="model-basics.html#visualizing-models"><i class="fa fa-check"></i><b>18.3</b> Visualizing Models</a><ul>
<li class="chapter" data-level="18.3.1" data-path="model-basics.html"><a href="model-basics.html#exercises-49"><i class="fa fa-check"></i><b>18.3.1</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="18.4" data-path="model-basics.html"><a href="model-basics.html#formulas-and-model-families"><i class="fa fa-check"></i><b>18.4</b> Formulas and Model Families</a><ul>
<li class="chapter" data-level="18.4.1" data-path="model-basics.html"><a href="model-basics.html#categorical-variables"><i class="fa fa-check"></i><b>18.4.1</b> Categorical Variables</a></li>
<li class="chapter" data-level="18.4.2" data-path="model-basics.html"><a href="model-basics.html#exercises-50"><i class="fa fa-check"></i><b>18.4.2</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="18.5" data-path="model-basics.html"><a href="model-basics.html#missing-values-2"><i class="fa fa-check"></i><b>18.5</b> Missing values</a></li>
<li class="chapter" data-level="18.6" data-path="model-basics.html"><a href="model-basics.html#other-model-families"><i class="fa fa-check"></i><b>18.6</b> Other model families</a></li>
</ul></li>
<li class="part"><span><b>V Communicate</b></span></li>
<li class="chapter" data-level="19" data-path="communicate-intro.html"><a href="communicate-intro.html"><i class="fa fa-check"></i><b>19</b> Introduction</a></li>
<li class="chapter" data-level="20" data-path="r-markdown.html"><a href="r-markdown.html"><i class="fa fa-check"></i><b>20</b> R Markdown</a><ul>
<li class="chapter" data-level="20.1" data-path="r-markdown.html"><a href="r-markdown.html#r-markdown-basics"><i class="fa fa-check"></i><b>20.1</b> R Markdown Basics</a><ul>
<li class="chapter" data-level="20.1.1" data-path="r-markdown.html"><a href="r-markdown.html#exercise"><i class="fa fa-check"></i><b>20.1.1</b> Exercise</a></li>
</ul></li>
<li class="chapter" data-level="20.2" data-path="r-markdown.html"><a href="r-markdown.html#text-formatting-with-r-markdown"><i class="fa fa-check"></i><b>20.2</b> Text formatting with R Markdown</a></li>
</ul></li>
<li class="chapter" data-level="21" data-path="r-markdown-formats.html"><a href="r-markdown-formats.html"><i class="fa fa-check"></i><b>21</b> R Markdown Formats</a></li>
<li class="chapter" data-level="22" data-path="r-markdown-workflow.html"><a href="r-markdown-workflow.html"><i class="fa fa-check"></i><b>22</b> R Markdown Workflow</a></li>
</ul>
</nav>
</div>
<div class="book-body">
<div class="body-inner">
<div class="book-header" role="navigation">
<h1>
<i class="fa fa-circle-o-notch fa-spin"></i><a href="./">Exercise Solutions and Notes for “R for Data Science”</a>
</h1>
</div>
<div class="page-wrapper" tabindex="-1" role="main">
<div class="page-inner">
<section class="normal" id="section-">
<div id="data-transformation" class="section level1">
<h1><span class="header-section-number">4</span> Data Transformation</h1>
<div id="prerequisites-1" class="section level2">
<h2><span class="header-section-number">4.1</span> Prerequisites</h2>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(nycflights13)
<span class="kw">library</span>(tidyverse)</code></pre></div>
</div>
<div id="filter" class="section level2">
<h2><span class="header-section-number">4.2</span> Filter</h2>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">glimpse</span>(flights)
<span class="co">#> Observations: 336,776</span>
<span class="co">#> Variables: 19</span>
<span class="co">#> $ year <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013,...</span>
<span class="co">#> $ month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...</span>
<span class="co">#> $ day <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...</span>
<span class="co">#> $ dep_time <int> 517, 533, 542, 544, 554, 554, 555, 557, 557, 55...</span>
<span class="co">#> $ sched_dep_time <int> 515, 529, 540, 545, 600, 558, 600, 600, 600, 60...</span>
<span class="co">#> $ dep_delay <dbl> 2, 4, 2, -1, -6, -4, -5, -3, -3, -2, -2, -2, -2...</span>
<span class="co">#> $ arr_time <int> 830, 850, 923, 1004, 812, 740, 913, 709, 838, 7...</span>
<span class="co">#> $ sched_arr_time <int> 819, 830, 850, 1022, 837, 728, 854, 723, 846, 7...</span>
<span class="co">#> $ arr_delay <dbl> 11, 20, 33, -18, -25, 12, 19, -14, -8, 8, -2, -...</span>
<span class="co">#> $ carrier <chr> "UA", "UA", "AA", "B6", "DL", "UA", "B6", "EV",...</span>
<span class="co">#> $ flight <int> 1545, 1714, 1141, 725, 461, 1696, 507, 5708, 79...</span>
<span class="co">#> $ tailnum <chr> "N14228", "N24211", "N619AA", "N804JB", "N668DN...</span>
<span class="co">#> $ origin <chr> "EWR", "LGA", "JFK", "JFK", "LGA", "EWR", "EWR"...</span>
<span class="co">#> $ dest <chr> "IAH", "IAH", "MIA", "BQN", "ATL", "ORD", "FLL"...</span>
<span class="co">#> $ air_time <dbl> 227, 227, 160, 183, 116, 150, 158, 53, 140, 138...</span>
<span class="co">#> $ distance <dbl> 1400, 1416, 1089, 1576, 762, 719, 1065, 229, 94...</span>
<span class="co">#> $ hour <dbl> 5, 5, 5, 5, 6, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5,...</span>
<span class="co">#> $ minute <dbl> 15, 29, 40, 45, 0, 58, 0, 0, 0, 0, 0, 0, 0, 0, ...</span>
<span class="co">#> $ time_hour <dttm> 2013-01-01 05:00:00, 2013-01-01 05:00:00, 2013...</span></code></pre></div>
</div>
<div id="exercises-5" class="section level2">
<h2><span class="header-section-number">4.3</span> Exercises</h2>
<ol style="list-style-type: decimal">
<li><p>Find all flights that</p>
<ol style="list-style-type: decimal">
<li>Had an arrival delay of two or more hours</li>
<li>Flew to Houston (IAH or HOU)</li>
<li>Were operated by United, American, or Delta</li>
<li>Departed in summer (July, August, and September)</li>
<li>Arrived more than two hours late, but didn’t leave late</li>
<li>Were delayed by at least an hour, but made up over 30 minutes in flight</li>
<li>Departed between midnight and 6am (inclusive)</li>
</ol></li>
</ol>
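<p><em>Had an arrival delay of two or more hours</em> Since delays are recorded in minutes, “two or more hours” corresponds to <code>arr_delay >= 120</code>. The filter below uses the strict inequality <code>arr_delay > 120</code>, which leaves out flights delayed by exactly 120 minutes:</p>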
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights %>%<span class="st"> </span>
<span class="st"> </span><span class="kw">filter</span>(arr_delay ><span class="st"> </span><span class="dv">120</span>)
<span class="co">#> # A tibble: 10,034 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 1 1 811 630 101 1047</span>
<span class="co">#> 2 2013 1 1 848 1835 853 1001</span>
<span class="co">#> 3 2013 1 1 957 733 144 1056</span>
<span class="co">#> 4 2013 1 1 1114 900 134 1447</span>
<span class="co">#> 5 2013 1 1 1505 1310 115 1638</span>
<span class="co">#> 6 2013 1 1 1525 1340 105 1831</span>
<span class="co">#> # ... with 1.003e+04 more rows, and 12 more variables:</span>
<span class="co">#> # sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,</span>
<span class="co">#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,</span>
<span class="co">#> # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm></span></code></pre></div>
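<p>As a minimal sketch of why the choice between <code>></code> and <code>>=</code> matters here, the toy tibble below (made-up values, not rows from <code>flights</code>) contains one delay of exactly 120 minutes. Only the inclusive comparison keeps it:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Hypothetical arrival delays in minutes; 120 is the boundary case
toy <- tibble(arr_delay = c(90, 120, 121, NA))

strict    <- filter(toy, arr_delay > 120)   # drops the 120-minute row
inclusive <- filter(toy, arr_delay >= 120)  # keeps the 120-minute row

nrow(strict)
#> [1] 1
nrow(inclusive)
#> [1] 2</code></pre></div>
<p>In both cases the row with a missing delay is dropped, because <code>filter()</code> only keeps rows where the condition is <code>TRUE</code>.</p>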
<p><em>Flew to Houston (IAH or HOU)</em>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights %>%
<span class="st"> </span><span class="kw">filter</span>(dest %in%<span class="st"> </span><span class="kw">c</span>(<span class="st">"IAH"</span>, <span class="st">"HOU"</span>))
<span class="co">#> # A tibble: 9,313 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 1 1 517 515 2 830</span>
<span class="co">#> 2 2013 1 1 533 529 4 850</span>
<span class="co">#> 3 2013 1 1 623 627 -4 933</span>
<span class="co">#> 4 2013 1 1 728 732 -4 1041</span>
<span class="co">#> 5 2013 1 1 739 739 0 1104</span>
<span class="co">#> 6 2013 1 1 908 908 0 1228</span>
<span class="co">#> # ... with 9,307 more rows, and 12 more variables: sched_arr_time <int>,</span>
<span class="co">#> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,</span>
<span class="co">#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,</span>
<span class="co">#> # minute <dbl>, time_hour <dttm></span></code></pre></div>
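<p>The <code>%in%</code> operator is a compact alternative to chaining <code>==</code> comparisons with <code>|</code>. A small base R sketch on a made-up vector of destinations shows that the two forms differ only in how they treat missing values:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical destination codes, including a missing one
dest <- c("IAH", "HOU", "ORD", NA)

# %in% never returns NA: a missing value is simply not in the set
dest %in% c("IAH", "HOU")
#> [1]  TRUE  TRUE FALSE FALSE

# == propagates the missing value
dest == "IAH" | dest == "HOU"
#> [1]  TRUE  TRUE FALSE    NA</code></pre></div>
<p>Inside <code>filter()</code> both versions select the same rows, since rows where the condition is <code>NA</code> are dropped.</p>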
<p><em>Were operated by United, American, or Delta</em> The variable <code>carrier</code> identifies the airline, but as a two-character carrier code. We can look the codes up in the <code>airlines</code> dataset.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">airlines
<span class="co">#> # A tibble: 16 × 2</span>
<span class="co">#> carrier name</span>
<span class="co">#> <chr> <chr></span>
<span class="co">#> 1 9E Endeavor Air Inc.</span>
<span class="co">#> 2 AA American Airlines Inc.</span>
<span class="co">#> 3 AS Alaska Airlines Inc.</span>
<span class="co">#> 4 B6 JetBlue Airways</span>
<span class="co">#> 5 DL Delta Air Lines Inc.</span>
<span class="co">#> 6 EV ExpressJet Airlines Inc.</span>
<span class="co">#> # ... with 10 more rows</span></code></pre></div>
<p>Since there are only 16 rows, it’s not even worth filtering. Delta is <code>DL</code>, American is <code>AA</code>, and United is <code>UA</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">filter</span>(flights, carrier %in%<span class="st"> </span><span class="kw">c</span>(<span class="st">"AA"</span>, <span class="st">"DL"</span>, <span class="st">"UA"</span>))
<span class="co">#> # A tibble: 139,504 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 1 1 517 515 2 830</span>
<span class="co">#> 2 2013 1 1 533 529 4 850</span>
<span class="co">#> 3 2013 1 1 542 540 2 923</span>
<span class="co">#> 4 2013 1 1 554 600 -6 812</span>
<span class="co">#> 5 2013 1 1 554 558 -4 740</span>
<span class="co">#> 6 2013 1 1 558 600 -2 753</span>
<span class="co">#> # ... with 1.395e+05 more rows, and 12 more variables:</span>
<span class="co">#> # sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,</span>
<span class="co">#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,</span>
<span class="co">#> # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm></span></code></pre></div>
<p><em>Departed in summer (July, August, and September)</em> The variable <code>month</code> holds the month as an integer, so <code>between()</code> can select months 7 through 9:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">filter</span>(flights, <span class="kw">between</span>(month, <span class="dv">7</span>, <span class="dv">9</span>))
<span class="co">#> # A tibble: 86,326 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 7 1 1 2029 212 236</span>
<span class="co">#> 2 2013 7 1 2 2359 3 344</span>
<span class="co">#> 3 2013 7 1 29 2245 104 151</span>
<span class="co">#> 4 2013 7 1 43 2130 193 322</span>
<span class="co">#> 5 2013 7 1 44 2150 174 300</span>
<span class="co">#> 6 2013 7 1 46 2051 235 304</span>
<span class="co">#> # ... with 8.632e+04 more rows, and 12 more variables:</span>
<span class="co">#> # sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,</span>
<span class="co">#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,</span>
<span class="co">#> # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm></span></code></pre></div>
<p><em>Arrived more than two hours late, but didn’t leave late</em></p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">filter</span>(flights, !<span class="kw">is.na</span>(dep_delay), dep_delay <=<span class="st"> </span><span class="dv">0</span>, arr_delay ><span class="st"> </span><span class="dv">120</span>)
<span class="co">#> # A tibble: 29 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 1 27 1419 1420 -1 1754</span>
<span class="co">#> 2 2013 10 7 1350 1350 0 1736</span>
<span class="co">#> 3 2013 10 7 1357 1359 -2 1858</span>
<span class="co">#> 4 2013 10 16 657 700 -3 1258</span>
<span class="co">#> 5 2013 11 1 658 700 -2 1329</span>
<span class="co">#> 6 2013 3 18 1844 1847 -3 39</span>
<span class="co">#> # ... with 23 more rows, and 12 more variables: sched_arr_time <int>,</span>
<span class="co">#> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,</span>
<span class="co">#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,</span>
<span class="co">#> # minute <dbl>, time_hour <dttm></span></code></pre></div>
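<p><em>Were delayed by at least an hour, but made up over 30 minutes in flight</em> The time made up in flight is <code>dep_delay - arr_delay</code>, so the condition as stated is <code>dep_delay >= 60, dep_delay - arr_delay > 30</code>. The filter below uses the narrower condition of leaving at least an hour late and arriving less than 30 minutes late:</p>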
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">filter</span>(flights, !<span class="kw">is.na</span>(dep_delay), dep_delay >=<span class="st"> </span><span class="dv">60</span>, arr_delay <<span class="st"> </span><span class="dv">30</span>)
<span class="co">#> # A tibble: 206 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 1 3 1850 1745 65 2148</span>
<span class="co">#> 2 2013 1 3 1950 1845 65 2228</span>
<span class="co">#> 3 2013 1 3 2015 1915 60 2135</span>
<span class="co">#> 4 2013 1 6 1019 900 79 1558</span>
<span class="co">#> 5 2013 1 7 1543 1430 73 1758</span>
<span class="co">#> 6 2013 1 11 1020 920 60 1311</span>
<span class="co">#> # ... with 200 more rows, and 12 more variables: sched_arr_time <int>,</span>
<span class="co">#> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,</span>
<span class="co">#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,</span>
<span class="co">#> # minute <dbl>, time_hour <dttm></span></code></pre></div>
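<p>One way to express “made up over 30 minutes in flight” directly is to compare the two delays. The sketch below uses a made-up tibble (<code>toy</code> is hypothetical, not taken from <code>flights</code>) to show how that condition differs from requiring <code>arr_delay < 30</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Hypothetical delays in minutes
toy <- tibble(
  dep_delay = c(60, 120, 90, 45),
  arr_delay = c(25, 80, 70, 0)
)

# Time made up in the air is the departure delay minus the arrival delay
made_up <- filter(toy, dep_delay >= 60, dep_delay - arr_delay > 30)

# Requiring arr_delay < 30 is narrower: it misses the second row,
# which left 120 minutes late and recovered 40 minutes
arrived_lt_30 <- filter(toy, dep_delay >= 60, arr_delay < 30)

nrow(made_up)
#> [1] 2
nrow(arrived_lt_30)
#> [1] 1</code></pre></div>
<p>Any flight that leaves at least 60 minutes late and arrives less than 30 minutes late has made up more than 30 minutes, so the second filter always returns a subset of the first.</p>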
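<p><em>Departed between midnight and 6am (inclusive)</em> Note that <code>dep_time</code> records midnight as <code>2400</code>, not <code>0</code>, so the filter below does not pick up flights that left at exactly midnight:</p>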
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">filter</span>(flights, dep_time >=<span class="st"> </span><span class="dv">0</span>, dep_time <=<span class="st"> </span><span class="dv">600</span>)
<span class="co">#> # A tibble: 9,344 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 1 1 517 515 2 830</span>
<span class="co">#> 2 2013 1 1 533 529 4 850</span>
<span class="co">#> 3 2013 1 1 542 540 2 923</span>
<span class="co">#> 4 2013 1 1 544 545 -1 1004</span>
<span class="co">#> 5 2013 1 1 554 600 -6 812</span>
<span class="co">#> 6 2013 1 1 554 558 -4 740</span>
<span class="co">#> # ... with 9,338 more rows, and 12 more variables: sched_arr_time <int>,</span>
<span class="co">#> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,</span>
<span class="co">#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,</span>
<span class="co">#> # minute <dbl>, time_hour <dttm></span></code></pre></div>
<p>Or, using <code>between()</code> (see the next question):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">filter</span>(flights, <span class="kw">between</span>(dep_time, <span class="dv">0</span>, <span class="dv">600</span>))
<span class="co">#> # A tibble: 9,344 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 1 1 517 515 2 830</span>
<span class="co">#> 2 2013 1 1 533 529 4 850</span>
<span class="co">#> 3 2013 1 1 542 540 2 923</span>
<span class="co">#> 4 2013 1 1 544 545 -1 1004</span>
<span class="co">#> 5 2013 1 1 554 600 -6 812</span>
<span class="co">#> 6 2013 1 1 554 558 -4 740</span>
<span class="co">#> # ... with 9,338 more rows, and 12 more variables: sched_arr_time <int>,</span>
<span class="co">#> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,</span>
<span class="co">#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,</span>
<span class="co">#> # minute <dbl>, time_hour <dttm></span></code></pre></div>
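<p>The toy example below (made-up departure times, not rows from <code>flights</code>) sketches one way to also capture departures at exactly midnight, which <code>flights</code> stores as <code>2400</code> rather than <code>0</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Hypothetical departure times in HHMM form
toy <- tibble(dep_time = c(1, 559, 600, 601, 2359, 2400, NA))

# Either at or before 6am, or exactly midnight (stored as 2400)
early <- filter(toy, dep_time <= 600 | dep_time == 2400)

early$dep_time
#> [1]    1  559  600 2400</code></pre></div>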
<ol start="2" style="list-style-type: decimal">
<li>Another useful dplyr filtering helper is <code>between()</code>. What does it do? Can you use it to simplify the code needed to answer the previous challenges?</li>
</ol>
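<p><code>between(x, left, right)</code> is equivalent to <code>x >= left & x <= right</code>. I already used it above to select the summer months (<code>between(month, 7, 9)</code>) and the early departure times (<code>between(dep_time, 0, 600)</code>).</p>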
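<p>A quick check of that equivalence on a made-up vector, including how a missing value is handled:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Hypothetical values; 7 and 9 are the boundaries
x <- c(5, 7, 8, 9, 10, NA)

# between() is inclusive at both ends and propagates NA
between(x, 7, 9)
#> [1] FALSE  TRUE  TRUE  TRUE FALSE    NA

# Same result as writing out the two comparisons
identical(between(x, 7, 9), x >= 7 & x <= 9)
#> [1] TRUE</code></pre></div>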
<ol start="3" style="list-style-type: decimal">
<li>How many flights have a missing <code>dep_time</code>? What other variables are missing? What might these rows represent?</li>
</ol>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">filter</span>(flights, <span class="kw">is.na</span>(dep_time))
<span class="co">#> # A tibble: 8,255 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 1 1 NA 1630 NA NA</span>
<span class="co">#> 2 2013 1 1 NA 1935 NA NA</span>
<span class="co">#> 3 2013 1 1 NA 1500 NA NA</span>
<span class="co">#> 4 2013 1 1 NA 600 NA NA</span>
<span class="co">#> 5 2013 1 2 NA 1540 NA NA</span>
<span class="co">#> 6 2013 1 2 NA 1620 NA NA</span>
<span class="co">#> # ... with 8,249 more rows, and 12 more variables: sched_arr_time <int>,</span>
<span class="co">#> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,</span>
<span class="co">#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,</span>
<span class="co">#> # minute <dbl>, time_hour <dttm></span></code></pre></div>
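<p>There are 8,255 flights with a missing <code>dep_time</code>. In these rows <code>dep_delay</code> and <code>arr_time</code> are missing as well, so they most likely represent cancelled flights.</p>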
<ol start="4" style="list-style-type: decimal">
<li>Why is <code>NA ^ 0</code> not missing? Why is <code>NA | TRUE</code> not missing? Why is <code>FALSE & NA</code> not missing? Can you figure out the general rule? (<code>NA * 0</code> is a tricky counterexample!)</li>
</ol>
<p><code>NA ^ 0 == 1</code> since for all numeric values <span class="math inline">\(x ^ 0 = 1\)</span>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="ot">NA</span> ^<span class="st"> </span><span class="dv">0</span>
<span class="co">#> [1] 1</span></code></pre></div>
<p><code>NA | TRUE</code> is <code>TRUE</code> because it doesn’t matter whether the missing value is <code>TRUE</code> or <code>FALSE</code>: <span class="math inline">\(x \lor T = T\)</span> for all values of <span class="math inline">\(x\)</span>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="ot">NA</span> |<span class="st"> </span><span class="ot">TRUE</span>
<span class="co">#> [1] TRUE</span></code></pre></div>
<p>Likewise, anything and <code>FALSE</code> is always <code>FALSE</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="ot">NA</span> &<span class="st"> </span><span class="ot">FALSE</span>
<span class="co">#> [1] FALSE</span></code></pre></div>
<p>Because the value of the missing element matters in <code>NA | FALSE</code> and <code>NA & TRUE</code>, these are missing:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="ot">NA</span> |<span class="st"> </span><span class="ot">FALSE</span>
<span class="co">#> [1] NA</span>
<span class="ot">NA</span> &<span class="st"> </span><span class="ot">TRUE</span>
<span class="co">#> [1] NA</span></code></pre></div>
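<p>Since <span class="math inline">\(x \times 0 = 0\)</span> for all finite <span class="math inline">\(x\)</span>, we might expect <code>NA * 0</code> to be <code>0</code>, but that’s not the case. The missing value could stand for <code>Inf</code> or <code>-Inf</code>, and multiplying either by zero gives <code>NaN</code>, so the result depends on the unknown value and is <code>NA</code>:</p>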
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="ot">NA</span> *<span class="st"> </span><span class="dv">0</span>
<span class="co">#> [1] NA</span></code></pre></div>
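<p>The general rule appears to be that an operation involving <code>NA</code> returns a non-missing result only when the result would be the same for every value the <code>NA</code> could stand for. A short base R check of the infinite cases illustrates why multiplication by zero does not qualify while raising to the power zero does:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Multiplying by zero does not give 0 for every numeric value
Inf * 0
#> [1] NaN
-Inf * 0
#> [1] NaN

# Raising to the power zero gives 1 even for infinite values
Inf ^ 0
#> [1] 1</code></pre></div>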
</div>
<div id="arrange" class="section level2">
<h2><span class="header-section-number">4.4</span> Arrange</h2>
<p><code>arrange()</code> always sorts missing values to the end, whether the order is ascending or descending.</p>
<div id="exercises-6" class="section level3">
<h3><span class="header-section-number">4.4.1</span> Exercises</h3>
<ol style="list-style-type: decimal">
<li>How could you use <code>arrange()</code> to sort all missing values to the start? (Hint: use <code>is.na()</code>).</li>
</ol>
<p>This sorts by increasing <code>dep_time</code>, but with all missing values put first.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">arrange</span>(flights, <span class="kw">desc</span>(<span class="kw">is.na</span>(dep_time)), dep_time)
<span class="co">#> # A tibble: 336,776 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 1 1 NA 1630 NA NA</span>
<span class="co">#> 2 2013 1 1 NA 1935 NA NA</span>
<span class="co">#> 3 2013 1 1 NA 1500 NA NA</span>
<span class="co">#> 4 2013 1 1 NA 600 NA NA</span>
<span class="co">#> 5 2013 1 2 NA 1540 NA NA</span>
<span class="co">#> 6 2013 1 2 NA 1620 NA NA</span>
<span class="co">#> # ... with 3.368e+05 more rows, and 12 more variables:</span>
<span class="co">#> # sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,</span>
<span class="co">#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,</span>
<span class="co">#> # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm></span></code></pre></div>
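<p>The same idea on a made-up tibble (<code>toy</code> is hypothetical) makes the ordering easier to see than in the full <code>flights</code> output:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Hypothetical values with one missing entry
toy <- tibble(x = c(3, NA, 1, 2))

# is.na(x) is TRUE for missing values; desc() puts TRUE before FALSE,
# and the second key sorts the remaining rows in ascending order
na_first <- arrange(toy, desc(is.na(x)), x)

na_first$x
#> [1] NA  1  2  3</code></pre></div>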
<ol start="2" style="list-style-type: decimal">
<li>Sort flights to find the most delayed flights. Find the flights that left earliest.</li>
</ol>
<p>The most delayed flights are found by sorting by <code>dep_delay</code> in descending order.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">arrange</span>(flights, <span class="kw">desc</span>(dep_delay))
<span class="co">#> # A tibble: 336,776 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 1 9 641 900 1301 1242</span>
<span class="co">#> 2 2013 6 15 1432 1935 1137 1607</span>
<span class="co">#> 3 2013 1 10 1121 1635 1126 1239</span>
<span class="co">#> 4 2013 9 20 1139 1845 1014 1457</span>
<span class="co">#> 5 2013 7 22 845 1600 1005 1044</span>
<span class="co">#> 6 2013 4 10 1100 1900 960 1342</span>
<span class="co">#> # ... with 3.368e+05 more rows, and 12 more variables:</span>
<span class="co">#> # sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,</span>
<span class="co">#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,</span>
<span class="co">#> # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm></span></code></pre></div>
<p>If we sort <code>dep_delay</code> in ascending order, we get those that left earliest. There was a flight that left 43 minutes early.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">arrange</span>(flights, dep_delay)
<span class="co">#> # A tibble: 336,776 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 12 7 2040 2123 -43 40</span>
<span class="co">#> 2 2013 2 3 2022 2055 -33 2240</span>
<span class="co">#> 3 2013 11 10 1408 1440 -32 1549</span>
<span class="co">#> 4 2013 1 11 1900 1930 -30 2233</span>
<span class="co">#> 5 2013 1 29 1703 1730 -27 1947</span>
<span class="co">#> 6 2013 8 9 729 755 -26 1002</span>
<span class="co">#> # ... with 3.368e+05 more rows, and 12 more variables:</span>
<span class="co">#> # sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,</span>
<span class="co">#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,</span>
<span class="co">#> # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm></span></code></pre></div>
<ol start="3" style="list-style-type: decimal">
<li>Sort flights to find the fastest flights.</li>
</ol>
<p>I assume that “fastest flights” means the flights with the shortest air time, so I sort by <code>air_time</code>. The fastest flights are a couple of flights between EWR and BDL with an air time of 20 minutes.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">arrange</span>(flights, air_time)
<span class="co">#> # A tibble: 336,776 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 1 16 1355 1315 40 1442</span>
<span class="co">#> 2 2013 4 13 537 527 10 622</span>
<span class="co">#> 3 2013 12 6 922 851 31 1021</span>
<span class="co">#> 4 2013 2 3 2153 2129 24 2247</span>
<span class="co">#> 5 2013 2 5 1303 1315 -12 1342</span>
<span class="co">#> 6 2013 2 12 2123 2130 -7 2211</span>
<span class="co">#> # ... with 3.368e+05 more rows, and 12 more variables:</span>
<span class="co">#> # sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,</span>
<span class="co">#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,</span>
<span class="co">#> # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm></span></code></pre></div>
<ol start="4" style="list-style-type: decimal">
<li>Which flights travelled the longest? Which travelled the shortest?</li>
</ol>
<p>I’ll assume that travelled the longest or shortest refers to distance, rather than air time.</p>
<p>The longest flights are the Hawaiian Airlines flights (HA 51) between JFK and HNL (Honolulu) at 4,983 miles.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">arrange</span>(flights, <span class="kw">desc</span>(distance))
<span class="co">#> # A tibble: 336,776 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 1 1 857 900 -3 1516</span>
<span class="co">#> 2 2013 1 2 909 900 9 1525</span>
<span class="co">#> 3 2013 1 3 914 900 14 1504</span>
<span class="co">#> 4 2013 1 4 900 900 0 1516</span>
<span class="co">#> 5 2013 1 5 858 900 -2 1519</span>
<span class="co">#> 6 2013 1 6 1019 900 79 1558</span>
<span class="co">#> # ... with 3.368e+05 more rows, and 12 more variables:</span>
<span class="co">#> # sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,</span>
<span class="co">#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,</span>
<span class="co">#> # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm></span></code></pre></div>
<p>Apart from an EWR to LGA flight that was cancelled, the shortest flights are the Envoy Air flights between EWR and PHL at 80 miles.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">arrange</span>(flights, distance)
<span class="co">#> # A tibble: 336,776 × 19</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 7 27 NA 106 NA NA</span>
<span class="co">#> 2 2013 1 3 2127 2129 -2 2222</span>
<span class="co">#> 3 2013 1 4 1240 1200 40 1333</span>
<span class="co">#> 4 2013 1 4 1829 1615 134 1937</span>
<span class="co">#> 5 2013 1 4 2128 2129 -1 2218</span>
<span class="co">#> 6 2013 1 5 1155 1200 -5 1241</span>
<span class="co">#> # ... with 3.368e+05 more rows, and 12 more variables:</span>
<span class="co">#> # sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,</span>
<span class="co">#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,</span>
<span class="co">#> # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm></span></code></pre></div>
<ol style="list-style-type: decimal">
<li>Brainstorm as many ways as possible to select <code>dep_time</code>, <code>dep_delay</code>, <code>arr_time</code>, and <code>arr_delay</code> from flights.</li>
</ol>
<p>A few ways include:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">select</span>(flights, dep_time, dep_delay, arr_time, arr_delay)
<span class="co">#> # A tibble: 336,776 × 4</span>
<span class="co">#> dep_time dep_delay arr_time arr_delay</span>
<span class="co">#> <int> <dbl> <int> <dbl></span>
<span class="co">#> 1 517 2 830 11</span>
<span class="co">#> 2 533 4 850 20</span>
<span class="co">#> 3 542 2 923 33</span>
<span class="co">#> 4 544 -1 1004 -18</span>
<span class="co">#> 5 554 -6 812 -25</span>
<span class="co">#> 6 554 -4 740 12</span>
<span class="co">#> # ... with 3.368e+05 more rows</span>
<span class="kw">select</span>(flights, <span class="kw">starts_with</span>(<span class="st">"dep_"</span>), <span class="kw">starts_with</span>(<span class="st">"arr_"</span>))
<span class="co">#> # A tibble: 336,776 × 4</span>
<span class="co">#> dep_time dep_delay arr_time arr_delay</span>
<span class="co">#> <int> <dbl> <int> <dbl></span>
<span class="co">#> 1 517 2 830 11</span>
<span class="co">#> 2 533 4 850 20</span>
<span class="co">#> 3 542 2 923 33</span>
<span class="co">#> 4 544 -1 1004 -18</span>
<span class="co">#> 5 554 -6 812 -25</span>
<span class="co">#> 6 554 -4 740 12</span>
<span class="co">#> # ... with 3.368e+05 more rows</span>
<span class="kw">select</span>(flights, <span class="kw">matches</span>(<span class="st">"^(dep|arr)_(time|delay)$"</span>))
<span class="co">#> # A tibble: 336,776 × 4</span>
<span class="co">#> dep_time dep_delay arr_time arr_delay</span>
<span class="co">#> <int> <dbl> <int> <dbl></span>
<span class="co">#> 1 517 2 830 11</span>
<span class="co">#> 2 533 4 850 20</span>
<span class="co">#> 3 542 2 923 33</span>
<span class="co">#> 4 544 -1 1004 -18</span>
<span class="co">#> 5 554 -6 812 -25</span>
<span class="co">#> 6 554 -4 740 12</span>
<span class="co">#> # ... with 3.368e+05 more rows</span></code></pre></div>
<p>Using <code>ends_with()</code> doesn’t work well, since it would also pick up <code>sched_arr_time</code> and <code>sched_dep_time</code>.</p>
<ol start="2" style="list-style-type: decimal">
<li>What happens if you include the name of a variable multiple times in a select() call?</li>
</ol>
<p>It ignores the duplicates, and that variable is only included once. No error, warning, or message is emitted.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">select</span>(flights, year, month, day, year, year)
<span class="co">#> # A tibble: 336,776 × 3</span>
<span class="co">#> year month day</span>
<span class="co">#> <int> <int> <int></span>
<span class="co">#> 1 2013 1 1</span>
<span class="co">#> 2 2013 1 1</span>
<span class="co">#> 3 2013 1 1</span>
<span class="co">#> 4 2013 1 1</span>
<span class="co">#> 5 2013 1 1</span>
<span class="co">#> 6 2013 1 1</span>
<span class="co">#> # ... with 3.368e+05 more rows</span></code></pre></div>
<ol start="3" style="list-style-type: decimal">
<li>What does the <code>one_of()</code> function do? Why might it be helpful in conjunction with this vector?</li>
</ol>
<p>The <code>one_of()</code> function allows you to select variables with a character vector rather than as unquoted variable names. It’s useful because you can then easily pass vectors of column names to <code>select()</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">vars <-<span class="st"> </span><span class="kw">c</span>(<span class="st">"year"</span>, <span class="st">"month"</span>, <span class="st">"day"</span>, <span class="st">"dep_delay"</span>, <span class="st">"arr_delay"</span>)
<span class="kw">select</span>(flights, <span class="kw">one_of</span>(vars))
<span class="co">#> # A tibble: 336,776 × 5</span>
<span class="co">#> year month day dep_delay arr_delay</span>
<span class="co">#> <int> <int> <int> <dbl> <dbl></span>
<span class="co">#> 1 2013 1 1 2 11</span>
<span class="co">#> 2 2013 1 1 4 20</span>
<span class="co">#> 3 2013 1 1 2 33</span>
<span class="co">#> 4 2013 1 1 -1 -18</span>
<span class="co">#> 5 2013 1 1 -6 -25</span>
<span class="co">#> 6 2013 1 1 -4 12</span>
<span class="co">#> # ... with 3.368e+05 more rows</span></code></pre></div>
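<p>In more recent releases of dplyr (which rely on the tidyselect package), <code>one_of()</code> has been superseded by <code>all_of()</code> and <code>any_of()</code>. The following is only a minimal sketch of how the two differ; the data frame <code>df</code> and the name <code>not_a_column</code> are made up for illustration and stand in for <code>flights</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Toy data frame (hypothetical values); stands in for flights
df <- tibble(year = 2013, month = 1, day = 1, dep_delay = 2)
vars <- c("year", "month", "not_a_column")

# any_of() silently skips names that are not present in the data
names(select(df, any_of(vars)))

# all_of() is strict: a missing name is an error, caught here for illustration
tryCatch(select(df, all_of(vars)), error = function(e) "error: missing column")</code></pre></div>
<p>Which helper is appropriate depends on whether a missing column should be treated as a mistake or simply skipped.</p>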
<ol start="4" style="list-style-type: decimal">
<li>Does the result of running the following code surprise you? How do the select helpers deal with case by default? How can you change that default?</li>
</ol>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">select</span>(flights, <span class="kw">contains</span>(<span class="st">"TIME"</span>))
<span class="co">#> # A tibble: 336,776 × 6</span>
<span class="co">#> dep_time sched_dep_time arr_time sched_arr_time air_time</span>
<span class="co">#> <int> <int> <int> <int> <dbl></span>
<span class="co">#> 1 517 515 830 819 227</span>
<span class="co">#> 2 533 529 850 830 227</span>
<span class="co">#> 3 542 540 923 850 160</span>
<span class="co">#> 4 544 545 1004 1022 183</span>
<span class="co">#> 5 554 600 812 837 116</span>
<span class="co">#> 6 554 558 740 728 150</span>
<span class="co">#> # ... with 3.368e+05 more rows, and 1 more variables: time_hour <dttm></span></code></pre></div>
<p>The default behavior for <code>contains()</code> is to ignore case. Yes, it surprises me. Upon reflection, I realized that this is likely the default behavior because <code>dplyr</code> is designed to deal with a variety of data backends, and some database engines don’t differentiate case.</p>
<p>To change the behavior add the argument <code>ignore.case = FALSE</code>. Now no variables are selected.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">select</span>(flights, <span class="kw">contains</span>(<span class="st">"TIME"</span>, <span class="dt">ignore.case =</span> <span class="ot">FALSE</span>))
<span class="co">#> # A tibble: 336,776 × 0</span></code></pre></div>
</div>
</div>
<div id="mutate" class="section level2">
<h2><span class="header-section-number">4.5</span> Mutate</h2>
<div id="exercises-7" class="section level3">
<h3><span class="header-section-number">4.5.1</span> Exercises</h3>
<ol style="list-style-type: decimal">
<li>Currently <code>dep_time</code> and <code>sched_dep_time</code> are convenient to look at, but hard to compute with because they’re not really continuous numbers. Convert them to a more convenient representation of number of minutes since midnight.</li>
</ol>
<p>To get the departure times in minutes since midnight, integer-divide <code>dep_time</code> by 100 to get the hours since midnight, multiply by 60, and add the remainder of <code>dep_time</code> divided by 100.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">mutate</span>(flights,
<span class="dt">dep_time_mins =</span> dep_time %/%<span class="st"> </span><span class="dv">100</span> *<span class="st"> </span><span class="dv">60</span> +<span class="st"> </span>dep_time %%<span class="st"> </span><span class="dv">100</span>,
<span class="dt">sched_dep_time_mins =</span> sched_dep_time %/%<span class="st"> </span><span class="dv">100</span> *<span class="st"> </span><span class="dv">60</span> +<span class="st"> </span>sched_dep_time %%<span class="st"> </span><span class="dv">100</span>) %>%
<span class="st"> </span><span class="kw">select</span>(dep_time, dep_time_mins, sched_dep_time, sched_dep_time_mins)
<span class="co">#> # A tibble: 336,776 × 4</span>
<span class="co">#> dep_time dep_time_mins sched_dep_time sched_dep_time_mins</span>
<span class="co">#> <int> <dbl> <int> <dbl></span>
<span class="co">#> 1 517 317 515 315</span>
<span class="co">#> 2 533 333 529 329</span>
<span class="co">#> 3 542 342 540 340</span>
<span class="co">#> 4 544 344 545 345</span>
<span class="co">#> 5 554 354 600 360</span>
<span class="co">#> 6 554 354 558 358</span>
<span class="co">#> # ... with 3.368e+05 more rows</span></code></pre></div>
<p>This would be done more cleanly by first defining a function and reusing it:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">time2mins <-<span class="st"> </span>function(x) {
x %/%<span class="st"> </span><span class="dv">100</span> *<span class="st"> </span><span class="dv">60</span> +<span class="st"> </span>x %%<span class="st"> </span><span class="dv">100</span>
}
<span class="kw">mutate</span>(flights,
<span class="dt">dep_time_mins =</span> <span class="kw">time2mins</span>(dep_time),
<span class="dt">sched_dep_time_mins =</span> <span class="kw">time2mins</span>(sched_dep_time)) %>%
<span class="st"> </span><span class="kw">select</span>(dep_time, dep_time_mins, sched_dep_time, sched_dep_time_mins)
<span class="co">#> # A tibble: 336,776 × 4</span>
<span class="co">#> dep_time dep_time_mins sched_dep_time sched_dep_time_mins</span>
<span class="co">#> <int> <dbl> <int> <dbl></span>
<span class="co">#> 1 517 317 515 315</span>
<span class="co">#> 2 533 333 529 329</span>
<span class="co">#> 3 542 342 540 340</span>
<span class="co">#> 4 544 344 545 345</span>
<span class="co">#> 5 554 354 600 360</span>
<span class="co">#> 6 554 354 558 358</span>
<span class="co">#> # ... with 3.368e+05 more rows</span></code></pre></div>
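<p>One edge case worth checking: if a time is recorded as <code>2400</code> (midnight), the formula above gives 1440 rather than 0. A possible variant, here called <code>time2mins_wrapped</code> (a name made up for this sketch), wraps the result modulo 1440:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">time2mins_wrapped <- function(x) {
  # hours * 60 + minutes, wrapped so that 2400 maps to 0
  (x %/% 100 * 60 + x %% 100) %% 1440
}
time2mins_wrapped(c(517, 1200, 2359, 2400))
#> [1]  317  720 1439    0</code></pre></div>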
<ol start="2" style="list-style-type: decimal">
<li>Compare <code>air_time</code> with <code>arr_time - dep_time</code>. What do you expect to see? What do you see? What do you need to do to fix it?</li>
</ol>
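<p>Since <code>arr_time</code> and <code>dep_time</code> may be in different time zones, <code>air_time</code> doesn’t equal the difference. We would need to account for time zones in these calculations. Note also that both columns are clock times in HHMM format, so subtracting them directly (as the code below does) does not give a number of minutes; they should first be converted to minutes since midnight, and flights that arrive after midnight need separate handling.</p>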
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">mutate</span>(flights,
<span class="dt">air_time2 =</span> arr_time -<span class="st"> </span>dep_time,
<span class="dt">air_time_diff =</span> air_time2 -<span class="st"> </span>air_time) %>%
<span class="st"> </span><span class="kw">filter</span>(air_time_diff !=<span class="st"> </span><span class="dv">0</span>) %>%
<span class="st"> </span><span class="kw">select</span>(air_time, air_time2, dep_time, arr_time, dest)
<span class="co">#> # A tibble: 326,128 × 5</span>
<span class="co">#> air_time air_time2 dep_time arr_time dest</span>
<span class="co">#> <dbl> <int> <int> <int> <chr></span>
<span class="co">#> 1 227 313 517 830 IAH</span>
<span class="co">#> 2 227 317 533 850 IAH</span>
<span class="co">#> 3 160 381 542 923 MIA</span>
<span class="co">#> 4 183 460 544 1004 BQN</span>
<span class="co">#> 5 116 258 554 812 ATL</span>
<span class="co">#> 6 150 186 554 740 ORD</span>
<span class="co">#> # ... with 3.261e+05 more rows</span></code></pre></div>
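<p>As a rough sketch of a first step towards a fix, the clock times can be converted to minutes since midnight before subtracting, wrapping modulo 1440 so that arrivals after midnight stay positive. This does not correct for time zones, which would need extra information about each airport. The helper name <code>hhmm2mins</code> is made up here, and the vectors are a few values copied from tables on this page.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Same logic as the time2mins() function defined earlier
hhmm2mins <- function(x) x %/% 100 * 60 + x %% 100

dep_time <- c(517, 554, 1939)
arr_time <- c(830, 740, 29)   # the last flight lands after midnight

# Elapsed clock minutes; %% 1440 handles the overnight arrival
(hhmm2mins(arr_time) - hhmm2mins(dep_time)) %% 1440
#> [1] 193 106 290</code></pre></div>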
<ol start="3" style="list-style-type: decimal">
<li>Compare <code>dep_time</code>, <code>sched_dep_time</code>, and <code>dep_delay</code>. How would you expect those three numbers to be related?</li>
</ol>
<p>I’d expect <code>dep_time</code>, <code>sched_dep_time</code>, and <code>dep_delay</code> to be related so that <code>dep_time - sched_dep_time = dep_delay</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">mutate</span>(flights,
<span class="dt">dep_delay2 =</span> dep_time -<span class="st"> </span>sched_dep_time) %>%
<span class="st"> </span><span class="kw">filter</span>(dep_delay2 !=<span class="st"> </span>dep_delay) %>%
<span class="st"> </span><span class="kw">select</span>(dep_time, sched_dep_time, dep_delay, dep_delay2)
<span class="co">#> # A tibble: 99,777 × 4</span>
<span class="co">#> dep_time sched_dep_time dep_delay dep_delay2</span>
<span class="co">#> <int> <int> <dbl> <int></span>
<span class="co">#> 1 554 600 -6 -46</span>
<span class="co">#> 2 555 600 -5 -45</span>
<span class="co">#> 3 557 600 -3 -43</span>
<span class="co">#> 4 557 600 -3 -43</span>
<span class="co">#> 5 558 600 -2 -42</span>
<span class="co">#> 6 558 600 -2 -42</span>
<span class="co">#> # ... with 9.977e+04 more rows</span></code></pre></div>
<p>Oops, I forgot to convert to minutes. I’ll reuse the <code>time2mins</code> function I wrote earlier.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">mutate</span>(flights,
<span class="dt">dep_delay2 =</span> <span class="kw">time2mins</span>(dep_time) -<span class="st"> </span><span class="kw">time2mins</span>(sched_dep_time)) %>%
<span class="st"> </span><span class="kw">filter</span>(dep_delay2 !=<span class="st"> </span>dep_delay) %>%
<span class="st"> </span><span class="kw">select</span>(dep_time, sched_dep_time, dep_delay, dep_delay2)
<span class="co">#> # A tibble: 1,207 × 4</span>
<span class="co">#> dep_time sched_dep_time dep_delay dep_delay2</span>
<span class="co">#> <int> <int> <dbl> <dbl></span>
<span class="co">#> 1 848 1835 853 -587</span>
<span class="co">#> 2 42 2359 43 -1397</span>
<span class="co">#> 3 126 2250 156 -1284</span>
<span class="co">#> 4 32 2359 33 -1407</span>
<span class="co">#> 5 50 2145 185 -1255</span>
<span class="co">#> 6 235 2359 156 -1284</span>
<span class="co">#> # ... with 1,201 more rows</span></code></pre></div>
<p>Well, that solved most of the problems, but these two numbers don’t match because we aren’t accounting for flights where the departure time is the next day from the scheduled departure time.</p>
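<p>One possible way to handle these cases is to add a full day (1440 minutes) whenever the computed difference is implausibly negative. The sketch below assumes that no flight leaves more than two hours early; that cutoff is an assumption made for illustration, not something stated in the data. The input values are copied from the tables above.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">time2mins <- function(x) x %/% 100 * 60 + x %% 100

dep_time       <- c(848, 42, 126, 554)
sched_dep_time <- c(1835, 2359, 2250, 600)

diff <- time2mins(dep_time) - time2mins(sched_dep_time)

# Assumption: a difference below -120 minutes means the flight actually
# left on the day after its scheduled departure, so add one day.
ifelse(diff < -120, diff + 1440, diff)
#> [1] 853  43 156  -6</code></pre></div>
<p>These values agree with the <code>dep_delay</code> entries shown for the same rows (853, 43, 156, and -6).</p>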
<ol start="4" style="list-style-type: decimal">
<li>Find the 10 most delayed flights using a ranking function. How do you want to handle ties? Carefully read the documentation for <code>min_rank()</code>.</li>
</ol>
<p>I’d want to handle ties by taking the minimum of tied values. If three flights have the same value and are the most delayed, we would say they are tied for first, not tied for third or second.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">mutate</span>(flights,
<span class="dt">dep_delay_rank =</span> <span class="kw">min_rank</span>(-dep_delay)) %>%
<span class="st"> </span><span class="kw">arrange</span>(dep_delay_rank) %>%<span class="st"> </span>
<span class="st"> </span><span class="kw">filter</span>(dep_delay_rank <=<span class="st"> </span><span class="dv">10</span>)
<span class="co">#> # A tibble: 10 × 20</span>
<span class="co">#> year month day dep_time sched_dep_time dep_delay arr_time</span>
<span class="co">#> <int> <int> <int> <int> <int> <dbl> <int></span>
<span class="co">#> 1 2013 1 9 641 900 1301 1242</span>
<span class="co">#> 2 2013 6 15 1432 1935 1137 1607</span>
<span class="co">#> 3 2013 1 10 1121 1635 1126 1239</span>
<span class="co">#> 4 2013 9 20 1139 1845 1014 1457</span>
<span class="co">#> 5 2013 7 22 845 1600 1005 1044</span>
<span class="co">#> 6 2013 4 10 1100 1900 960 1342</span>
<span class="co">#> # ... with 4 more rows, and 13 more variables: sched_arr_time <int>,</span>
<span class="co">#> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,</span>
<span class="co">#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,</span>
<span class="co">#> # minute <dbl>, time_hour <dttm>, dep_delay_rank <int></span></code></pre></div>
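<p>To see how the tie-handling choices differ, here is a small sketch comparing three of dplyr’s ranking functions on a made-up vector of delays in which two values tie for the largest:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

delays <- c(10, 30, 30, 20)   # hypothetical delays; two flights tie for worst

min_rank(desc(delays))    # ties share the smallest rank; the next rank is skipped
#> [1] 4 1 1 3
dense_rank(desc(delays))  # ties share a rank; no gaps afterwards
#> [1] 3 1 1 2
row_number(desc(delays))  # ties are broken by position
#> [1] 4 1 2 3</code></pre></div>
<p>With <code>min_rank()</code>, filtering on a rank of at most 10 can return more than 10 rows when there are ties at the cutoff.</p>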
<ol start="5" style="list-style-type: decimal">
<li>What does <code>1:3 + 1:10</code> return? Why?</li>
</ol>
<p>It returns <code>c(1 + 1, 2 + 2, 3 + 3, 1 + 4, 2 + 5, 3 + 6, 1 + 7, 2 + 8, 3 + 9, 1 + 10)</code>. When adding two vectors of different lengths, R recycles the shorter vector’s values to match the length of the longer one. We get a warning since the length of the longer vector is not a multiple of the length of the shorter one (this often, but not necessarily, means we made an error somewhere).</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="dv">1</span>:<span class="dv">3</span> +<span class="st"> </span><span class="dv">1</span>:<span class="dv">10</span>
<span class="co">#> Warning in 1:3 + 1:10: longer object length is not a multiple of shorter</span>
<span class="co">#> object length</span>
<span class="co">#> [1] 2 4 6 5 7 9 8 10 12 11</span></code></pre></div>
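<p>For comparison, a short sketch of two cases where no warning is issued: when the longer length is a multiple of the shorter one, and when the recycling is made explicit with <code>rep_len()</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># 9 is a multiple of 3, so the shorter vector is recycled silently
1:3 + 1:9
#> [1]  2  4  6  5  7  9  8 10 12

# Explicit recycling gives the same values as 1:3 + 1:10, without the warning
rep_len(1:3, 10) + 1:10
#> [1]  2  4  6  5  7  9  8 10 12 11</code></pre></div>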
<ol start="6" style="list-style-type: decimal">
<li>What trigonometric functions does R provide?</li>
</ol>
<p>All the classics: <code>cos</code>, <code>sin</code>, <code>tan</code>, <code>acos</code>, <code>asin</code>, <code>atan</code>, plus a few others that are driven by numerical or computational issues (such as <code>atan2</code>, <code>cospi</code>, <code>sinpi</code>, and <code>tanpi</code>).</p>
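<p>A brief sketch of why some of the extra functions exist; the specific inputs are chosen only for illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">sin(pi)        # not exactly zero, because pi is stored approximately
sinpi(1)       # computes sin(pi * x) and returns exactly 0 here
atan2(1, -1)   # angle of the point (x = -1, y = 1), which is 3 * pi / 4</code></pre></div>
<p>Note that <code>atan2(y, x)</code> takes the y coordinate first, and unlike <code>atan(y / x)</code> it can tell which quadrant the point lies in.</p>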
</div>
</div>
<div id="grouped-summaries-with-summarise" class="section level2">
<h2><span class="header-section-number">4.6</span> Grouped summaries with <code>summarise()</code></h2>
<div id="exercises-8" class="section level3">
<h3><span class="header-section-number">4.6.1</span> Exercises</h3>
<ol style="list-style-type: decimal">
<li>Brainstorm at least 5 different ways to assess the typical delay characteristics of a group of flights. Consider the following scenarios:</li>
</ol>
<ul>
<li><p>A flight is 15 minutes early 50% of the time, and 15 minutes late 50% of the time.</p></li>
<li><p>A flight is always 10 minutes late.</p></li>
<li><p>A flight is 30 minutes early 50% of the time, and 30 minutes late 50% of the time.</p></li>
<li><p>99% of the time a flight is on time. 1% of the time it’s 2 hours late.</p></li>
</ul>
<p>Which is more important: arrival delay or departure delay?</p>
<p>Arrival delay is more important. Arriving early is nice, but it is not as good as arriving late is bad. Variation is worse than consistency; if I know the plane will always arrive 10 minutes late, then I can plan for it arriving as if the actual arrival time was 10 minutes later than the scheduled arrival time.</p>
<p>So I’d try something that calculates the expected time of the flight, and then aggregates over any delays from that time. I would ignore any early arrival times. A better ranking would also consider cancellations, and need a way to convert them to a delay time (perhaps using the arrival time of the next flight to the same destination).</p>
<ol start="2" style="list-style-type: decimal">
<li><p>Come up with another approach that will give you the same output as <code>not_cancelled %>% count(dest)</code> and <code>not_cancelled %>% count(tailnum, wt = distance)</code> (without using <code>count()</code>).</p></li>
<li><p>Our definition of cancelled flights <code>(is.na(dep_delay) | is.na(arr_delay))</code> is slightly suboptimal. Why? Which is the most important column?</p></li>
</ol>
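<p>For exercise 2, one approach is to replace each <code>count()</code> with <code>group_by()</code> followed by <code>summarise()</code>. The sketch below uses a tiny made-up stand-in for <code>not_cancelled</code> (the values are hypothetical) so that it runs on its own; the same two pipelines should apply to the real <code>not_cancelled</code> data frame.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Toy stand-in for not_cancelled (hypothetical values)
not_cancelled <- tibble(
  dest     = c("IAH", "IAH", "MIA"),
  tailnum  = c("N1", "N2", "N1"),
  distance = c(1400, 1416, 1089)
)

# Equivalent to not_cancelled %>% count(dest)
by_dest <- not_cancelled %>%
  group_by(dest) %>%
  summarise(n = n())

# Equivalent to not_cancelled %>% count(tailnum, wt = distance):
# the weight turns the row count into a sum of distance
by_tail <- not_cancelled %>%
  group_by(tailnum) %>%
  summarise(n = sum(distance))</code></pre></div>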
<p>If a flight doesn’t depart, then it won’t arrive. A flight can also depart and not arrive if it crashes; I’m not sure how this data would handle flights that are redirected and land at other airports for whatever reason.</p>
<p>The more important column is <code>arr_delay</code> so we could just use that.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">filter</span>(flights, !<span class="kw">is.na</span>(dep_delay), <span class="kw">is.na</span>(arr_delay)) %>%
<span class="st"> </span><span class="kw">select</span>(dep_time, arr_time, sched_arr_time, dep_delay, arr_delay)
<span class="co">#> # A tibble: 1,175 × 5</span>
<span class="co">#> dep_time arr_time sched_arr_time dep_delay arr_delay</span>
<span class="co">#> <int> <int> <int> <dbl> <dbl></span>
<span class="co">#> 1 1525 1934 1805 -5 NA</span>
<span class="co">#> 2 1528 2002 1647 29 NA</span>
<span class="co">#> 3 1740 2158 2020 -5 NA</span>
<span class="co">#> 4 1807 2251 2103 29 NA</span>
<span class="co">#> 5 1939 29 2151 59 NA</span>
<span class="co">#> 6 1952 2358 2207 22 NA</span>
<span class="co">#> # ... with 1,169 more rows</span></code></pre></div>
<p>Okay, I’m not sure what’s going on in this data. <code>dep_time</code> can be non-missing and <code>arr_delay</code> missing but <code>arr_time</code> not missing. They may be combining different flights?</p>
<ol start="4" style="list-style-type: decimal">
<li>Look at the number of cancelled flights per day. Is there a pattern? Is the proportion of cancelled flights related to the average delay?</li>
</ol>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">cancelled_delayed <-<span class="st"> </span>
<span class="st"> </span>flights %>%
<span class="st"> </span><span class="kw">mutate</span>(<span class="dt">cancelled =</span> (<span class="kw">is.na</span>(arr_delay) |<span class="st"> </span><span class="kw">is.na</span>(dep_delay))) %>%
<span class="st"> </span><span class="kw">group_by</span>(year, month, day) %>%
<span class="st"> </span><span class="kw">summarise</span>(<span class="dt">prop_cancelled =</span> <span class="kw">mean</span>(cancelled),
<span class="dt">avg_dep_delay =</span> <span class="kw">mean</span>(dep_delay, <span class="dt">na.rm =</span> <span class="ot">TRUE</span>))
<span class="kw">ggplot</span>(cancelled_delayed, <span class="kw">aes</span>(<span class="dt">x =</span> avg_dep_delay, prop_cancelled)) +
<span class="st"> </span><span class="kw">geom_point</span>() +
<span class="st"> </span><span class="kw">geom_smooth</span>()
<span class="co">#> `geom_smooth()` using method = 'loess'</span></code></pre></div>
<p><img src="transform_files/figure-html/unnamed-chunk-38-1.png" width="70%" style="display: block; margin: auto;" /></p>
<ol start="5" style="list-style-type: decimal">
<li>Which carrier has the worst delays? Challenge: can you disentangle the effects of bad airports vs. bad carriers? Why/why not? (Hint: think about <code>flights %>% group_by(carrier, dest) %>% summarise(n())</code>)</li>
</ol>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights %>%
<span class="st"> </span><span class="kw">group_by</span>(carrier) %>%
<span class="st"> </span><span class="kw">summarise</span>(<span class="dt">arr_delay =</span> <span class="kw">mean</span>(arr_delay, <span class="dt">na.rm =</span> <span class="ot">TRUE</span>)) %>%
<span class="st"> </span><span class="kw">arrange</span>(<span class="kw">desc</span>(arr_delay))
<span class="co">#> # A tibble: 16 × 2</span>
<span class="co">#> carrier arr_delay</span>
<span class="co">#> <chr> <dbl></span>
<span class="co">#> 1 F9 21.9</span>
<span class="co">#> 2 FL 20.1</span>
<span class="co">#> 3 EV 15.8</span>
<span class="co">#> 4 YV 15.6</span>
<span class="co">#> 5 OO 11.9</span>
<span class="co">#> 6 MQ 10.8</span>
<span class="co">#> # ... with 10 more rows</span></code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">filter</span>(airlines, carrier ==<span class="st"> "F9"</span>)
<span class="co">#> # A tibble: 1 × 2</span>
<span class="co">#> carrier name</span>
<span class="co">#> <chr> <chr></span>
<span class="co">#> 1 F9 Frontier Airlines Inc.</span></code></pre></div>
<p>Frontier Airlines (F9) has the worst delays.</p>
<p>You can get part of the way to disentangling the effects of airports vs. carriers by comparing each flight’s delay to the average delay of its destination airport. However, you’d really want to compare it to the average delay of the destination airport, <em>after</em> removing other flights from the same airline.</p>
<p>538 has done something like this: <a href="http://fivethirtyeight.com/features/the-best-and-worst-airlines-airports-and-flights-summer-2015-update/" class="uri">http://fivethirtyeight.com/features/the-best-and-worst-airlines-airports-and-flights-summer-2015-update/</a>.</p>
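<p>A minimal sketch of the simpler of these two comparisons (each flight’s delay relative to the average delay at its destination), using a made-up four-row data frame rather than <code>flights</code>. It does not implement the refinement of leaving out the carrier’s own flights when computing the destination average.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Toy data (hypothetical carriers, destinations, and delays)
toy <- tibble(
  carrier   = c("AA", "AA", "BB", "BB"),
  dest      = c("X", "Y", "X", "Y"),
  arr_delay = c(10, 40, 20, 50)
)

rel <- toy %>%
  group_by(dest) %>%
  # average delay at each destination, over all carriers
  mutate(dest_avg = mean(arr_delay, na.rm = TRUE)) %>%
  group_by(carrier) %>%
  # how much worse (or better) each carrier is than the destination average
  summarise(rel_delay = mean(arr_delay - dest_avg, na.rm = TRUE))
rel</code></pre></div>
<p>In this toy example destination Y is slow for everyone, so both carriers have large raw delays there, but relative to the destination averages carrier AA comes out at -5 and BB at 5.</p>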
<ol start="6" style="list-style-type: decimal">
<li>For each plane, count the number of flights before the first delay of greater than 1 hour.</li>
</ol>
<p>I think this requires grouped mutate (but I may be wrong):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights %>%
<span class="st"> </span><span class="kw">arrange</span>(tailnum, year, month, day) %>%
<span class="st"> </span><span class="kw">group_by</span>(tailnum) %>%
<span class="st"> </span><span class="kw">mutate</span>(<span class="dt">delay_gt1hr =</span> dep_delay ><span class="st"> </span><span class="dv">60</span>) %>%
<span class="st"> </span><span class="kw">mutate</span>(<span class="dt">before_delay =</span> <span class="kw">cumsum</span>(delay_gt1hr)) %>%
<span class="st"> </span><span class="kw">filter</span>(before_delay <<span class="st"> </span><span class="dv">1</span>) %>%
<span class="st"> </span><span class="kw">count</span>(<span class="dt">sort =</span> <span class="ot">TRUE</span>)
<span class="co">#> # A tibble: 3,755 × 2</span>
<span class="co">#> tailnum n</span>
<span class="co">#> <chr> <int></span>
<span class="co">#> 1 N954UW 206</span>
<span class="co">#> 2 N952UW 163</span>
<span class="co">#> 3 N957UW 142</span>
<span class="co">#> 4 N5FAAA 117</span>
<span class="co">#> 5 N38727 99</span>
<span class="co">#> 6 N3742C 98</span>
<span class="co">#> # ... with 3,749 more rows</span></code></pre></div>
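<p>Two details may be worth handling differently: a missing <code>dep_delay</code> makes <code>cumsum()</code> return <code>NA</code> from that row onwards, and planes whose very first flight is delayed drop out of the result instead of getting a count of 0. The following is only a sketch of one alternative, using <code>cumany()</code> and <code>coalesce()</code> from dplyr on a made-up data frame; it treats a missing delay as “not delayed”, which is an assumption.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Toy data (hypothetical): plane A is first delayed on its 3rd flight,
# plane B on its 1st flight
toy <- tibble(
  tailnum   = c("A", "A", "A", "A", "B", "B"),
  time_hour = as.POSIXct("2013-01-01", tz = "UTC") + 3600 * c(1, 2, 3, 4, 1, 2),
  dep_delay = c(5, NA, 90, 10, 70, 0)
)

before_first_delay <- toy %>%
  arrange(tailnum, time_hour) %>%
  group_by(tailnum) %>%
  # cumany() becomes TRUE at the first long delay and stays TRUE afterwards;
  # coalesce() turns a missing comparison into FALSE (assumption)
  summarise(n_before = sum(!cumany(coalesce(dep_delay > 60, FALSE))))
before_first_delay</code></pre></div>
<p>Here plane A gets a count of 2 and plane B a count of 0.</p>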
<ol start="7" style="list-style-type: decimal">
<li>What does the sort argument to <code>count()</code> do? When might you use it?</li>
</ol>
<p>The sort argument to <code>count()</code> sorts the results in descending order of <code>n</code>. You could use this anytime you would do <code>count()</code> followed by <code>arrange(desc(n))</code>.</p>
</div>
</div>
<div id="grouped-mutates-and-filters" class="section level2">
<h2><span class="header-section-number">4.7</span> Grouped mutates and filters</h2>
<div id="exercises-9" class="section level3">