---
title: "Statistical analysis of NIBLSE Survey 2 Data"
author: "William Morgan (College of Wooster)"
date: "Sep 22, 2023"
output:
  word_document: default
  html_document:
    df_print: paged
  pdf_document: default
---
This R Notebook presents R code for statistical analyses of the 2nd NIBLSE Survey data.
<!--
#### Prerequisites
Clear the Global Environment and load R packages that will be needed later.
-->
```{r echo=FALSE, warning=FALSE, message=FALSE}
rm(list = ls())
library(tidyverse)
library(ggmosaic)
library(infer)
library(moderndive)
```
# Chapter 1: Overview of survey data
This chapter starts with the `Merged_Data_Anonymous` set produced in Chapter 0.
```{r echo=FALSE, warning=FALSE, message=FALSE}
Merged_Data_Anonymous <- read_csv("Merged_Data_Anonymous.csv")
```
### CUREs/SUREs
```{r echo=FALSE}
TeachBioinfo <- Merged_Data_Anonymous %>%
filter(Q3TeachBioinfor != "I do not include bioinformatics in my teaching and do not plan to do so") %>%
filter(Q3TeachBioinfor != "I do not include bioinformatics in my teaching at this time, but do plan to do so")
nCures <- str_subset(TeachBioinfo$`Q24CURE_SURE`, "part") %>% length()
```
Of the `r tally(TeachBioinfo) %>% pull()` instructors who teach any bioinformatics in a life science course, `r nCures` used a CURE or SURE to do so.
### Is more undergraduate bioinformatics content needed at your institution?
```{r echo=FALSE}
MoreBioinfo <- Merged_Data_Anonymous %>% count(Q5MoreCourses) %>%
drop_na(Q5MoreCourses)
nYes <- MoreBioinfo %>% filter(Q5MoreCourses == "Yes") %>% pull()
nMaybe <- MoreBioinfo %>% filter(Q5MoreCourses == "Maybe") %>% pull()
```
Of the `r sum(MoreBioinfo$n)` participants who answered whether more bioinformatics content is needed in undergraduate courses at their institution, `r nYes + nMaybe` (`r round(100 * (nYes + nMaybe) / sum(MoreBioinfo$n))`%) said "Yes" (`r nYes`) or "Maybe" (`r nMaybe`).
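Inline percentages such as the one above combine arithmetic with the pipe. Because `%>%` binds more tightly than `*` and `/`, a trailing `%>% round()` rounds only the last operand, so wrapping the whole expression in `round()` is the safer form. A minimal sketch with made-up counts (`n_yes`, `n_maybe`, and `n_total` are hypothetical, not survey results):

```r
library(magrittr)

# Hypothetical counts, for illustration only
n_yes   <- 40
n_maybe <- 17
n_total <- 90

# The pipe is evaluated first, so only n_total is rounded here
pct_unrounded <- 100 * (n_yes + n_maybe) / n_total %>% round()

# Rounding the whole percentage
pct_rounded <- round(100 * (n_yes + n_maybe) / n_total)

pct_unrounded
pct_rounded
```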
### Barriers
```{r echo=FALSE}
Plans_Teach <- Merged_Data_Anonymous %>% filter(Q3TeachBioinfor != "I do not include bioinformatics in my teaching and do not plan to do so")
Plans_barriers <- Plans_Teach %>%
count(Q7FacedBarriers) %>%
drop_na()
nPlans_barriers <- Plans_barriers %>% filter(Q7FacedBarriers == "Yes") %>% pull()
```
Of the `r sum(Plans_barriers$n)` instructors who do or plan to include bioinformatics content, `r nPlans_barriers` (`r round(100 * nPlans_barriers / sum(Plans_barriers$n))`%) reported facing barriers to integrating bioinformatics into their teaching.
### Count data for each identifier variable
Let's visualize the count of each identifier variable (gender, ethnicity, etc.) using a count table when the number of unique responses (levels) is three or fewer and a bar graph when the number is larger. (Here, we ignore identifier variables encoded numerically in favor of those encoded with character strings.)
```{r echo=FALSE, fig.width=8.0}
library(skimr)
# get the names of variables with a few levels
few_levels <- Merged_Data_Anonymous %>%
dplyr::select(Gender:`Q33 Which of the following represents your highest academic degree?`) %>%
skim_without_charts(where(is.character)) %>%
dplyr::filter(character.n_unique < 4) %>%
pull(skim_variable)
# count the responses for each variable
few_levels %>%
map(function(x) {
Merged_Data_Anonymous %>% count(.data[[x]])
}
)
# get the names of variables with 4-20 levels
many_levels <- Merged_Data_Anonymous %>%
dplyr::select(Gender:`Q33 Which of the following represents your highest academic degree?`) %>%
skim_without_charts(where(is.character)) %>%
dplyr::filter(character.n_unique >= 4 & character.n_unique <= 20) %>%
pull(skim_variable)
# plot the responses for each variable
many_levels %>%
map(function(x) {
Merged_Data_Anonymous %>%
ggplot(aes(x = .data[[x]])) +
geom_bar(fill = "red") +
theme(axis.text.x=element_text(angle=45,hjust=1))
}
)
```
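The table-versus-bar-graph split above depends only on how many distinct responses each character column has. A minimal base-R sketch of that classification, using a made-up data frame (`toy` and its columns are hypothetical) instead of `skimr`, and assuming missing values are not counted as a level:

```r
# Hypothetical survey-like data, for illustration only
toy <- data.frame(
  Gender = c("F", "M", "F", "U", NA),
  Region = c("North", "South", "East", "West", "North"),
  stringsAsFactors = FALSE
)

# Number of distinct non-missing responses per column
n_levels <- sapply(toy, function(x) length(unique(x[!is.na(x)])))

# Fewer than four levels: show a count table; 4-20 levels: draw a bar graph
few_levels  <- names(n_levels)[n_levels < 4]
many_levels <- names(n_levels)[n_levels >= 4 & n_levels <= 20]

few_levels
many_levels
```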
## Is Carnegie classification of your current institution associated with faculty gender proportions?
```{r echo=FALSE, warning=FALSE, message=FALSE}
# make count table & get proportions for each gender
count_table <- Merged_Data_Anonymous %>%
group_by(Gender, BASIC2018_bins_text.Current) %>%
summarise(count = n()) %>%
mutate(proportion = count / sum(count))
# hypothesis test: chi-squared
test_results <- Merged_Data_Anonymous %>%
chisq_test(BASIC2018_bins_text.Current ~ Gender)
```
```{r echo=FALSE, warning=FALSE, message=FALSE}
# view mosaic plot
Merged_Data_Anonymous %>%
ggplot() +
geom_mosaic(aes(x = product(BASIC2018_bins_text.Current),
fill = Gender)) +
labs(x = "Institution type",
y = "Proportion",
subtitle = "Gender composition based on Carnegie classification") +
theme_classic(base_size = 13) +
theme(legend.position = "none") +
theme(axis.text.x=element_text(angle=45,hjust=1))
```
# Associations between identifier variables and survey responses
Are there any associations between particular identifier and response variables? In addition to statistical tests, associations are visualized using mosaic plots, which present the frequency of each explanatory (x-axis) and response (y-axis) variable. (Note: Non-responses are removed from the following analyses.)
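As a rough illustration of the kind of test used throughout this chapter, the sketch below runs a chi-squared test of independence on a small made-up 2 x 2 table with base R's `chisq.test()`. The data frame `toy` and its counts are hypothetical, and the continuity correction is switched off here only to keep the arithmetic easy to follow.

```r
# Hypothetical responses: 50 female and 50 male respondents
toy <- data.frame(
  Gender = rep(c("F", "M"), times = c(50, 50)),
  FacedBarriers = c(rep(c("Yes", "No"), times = c(35, 15)),
                    rep(c("Yes", "No"), times = c(25, 25)))
)

# Contingency table of counts (rows: gender, columns: response)
tab <- table(toy$Gender, toy$FacedBarriers)

# Row proportions: the share of each gender answering No / Yes,
# which is what the tile heights in a mosaic plot represent
row_props <- prop.table(tab, margin = 1)

# Chi-squared test of independence without continuity correction
test <- chisq.test(tab, correct = FALSE)

row_props
test$expected   # counts expected if gender and response were independent
test$statistic
test$p.value
```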
## Do non-male faculty experience more barriers/more severe barriers than male faculty?
```{r echo=FALSE, warning=FALSE, message=FALSE}
# make count table & get proportions for each gender (non-responses removed)
count_table <- Merged_Data_Anonymous %>%
filter(!is.na(Q7FacedBarriers)) %>%
group_by(Gender, Q7FacedBarriers) %>%
summarise(count = n()) %>%
mutate(proportion = count / sum(count))
prop_f <- count_table %>% filter(Gender == "F" & Q7FacedBarriers == "Yes") %>% pull(proportion)
prop_m <- count_table %>% filter(Gender == "M" & Q7FacedBarriers == "Yes") %>% pull(proportion)
prop_u <- count_table %>% filter(Gender == "U" & Q7FacedBarriers == "Yes") %>% pull(proportion)
# hypothesis test: chi-squared
test_results <- Merged_Data_Anonymous %>%
filter(!is.na(Q7FacedBarriers)) %>%
chisq_test(Q7FacedBarriers ~ Gender)
```
There is a significant association between gender and encountering barriers (p-val = `r test_results %>% pull() %>% signif(digits = 2)`). Compared to males, other genders are more likely to report barriers to integrating bioinformatics into their teaching (M = `r prop_m %>% signif(2) * 100`%, F = `r prop_f %>% signif(2) * 100`%, U = `r prop_u %>% signif(2) * 100`%).
```{r echo=FALSE, warning=FALSE, message=FALSE}
# view mosaic plot
Merged_Data_Anonymous %>%
ggplot() +
geom_mosaic(aes(x = product(Gender),
fill = Q7FacedBarriers)) +
labs(x = "Gender",
y = "Proportion",
subtitle = "Have you faced any barriers integrating bioinformatics into your teaching?") +
theme_classic(base_size = 13) +
theme(legend.position = "none")
```
### Are the gender differences due to any particular type of barrier?
```{r echo=FALSE, warning=FALSE, message=FALSE}
# Retrieve "I lack" agree/disagree barrier columns & get agree counts
Barriers_gender_df <- Merged_Data_Anonymous %>%
select(Gender, `I lack expertise in bioinformatics`:`My student population lacks interest in bioinformatics....20`) %>%
pivot_longer(!Gender, names_to = "Barrier", values_to = "Response")
Barriers_gender_table <- Barriers_gender_df %>%
filter(!is.na(Response)) %>%
count(Gender, Barrier, Response)
# hypothesis test: chi-squared
Barriers_char <- unique(Barriers_gender_df$Barrier)
test_results <- Barriers_char %>%
map_df(function(x) {
Barriers_gender_df %>%
filter(Gender != "U") %>%
filter(!is.na(Response)) %>%
filter(Barrier == x) %>%
chisq_test(Response ~ Gender)
}
)
test_results <- test_results %>%
mutate(Barrier = Barriers_char,
adj.p_val = p.adjust(p_value, method = "fdr")
) %>%
select(Barrier, everything())
sig_results <- test_results %>%
filter(adj.p_val < 0.05)
```
Of those respondents who reported facing a barrier to integrating bioinformatics into their teaching, we sought to identify associations between gender and specific barriers. Of the `r nrow(test_results)` queried barriers, `r nrow(sig_results)` (`r sig_results$Barrier`) exhibited a significant difference among genders (adj. p-value = `r sig_results %>% pull %>% signif(2)`).
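Because one test is run per barrier, the p-values are adjusted with `p.adjust(method = "fdr")`, which is R's alias for the Benjamini-Hochberg procedure. A small sketch with made-up p-values (the vector `p_raw` is hypothetical) shows the adjustment via `p.adjust()` and by hand:

```r
# Hypothetical raw p-values from four tests
p_raw <- c(0.01, 0.04, 0.03, 0.20)

# Built-in adjustment ("fdr" is an alias for "BH")
p_fdr <- p.adjust(p_raw, method = "fdr")

# The same adjustment by hand: scale each p-value by n / rank,
# then enforce monotonicity from the largest p-value downwards
n <- length(p_raw)
ord <- order(p_raw, decreasing = TRUE)
by_hand <- numeric(n)
by_hand[ord] <- pmin(1, cummin(p_raw[ord] * n / (n:1)))

p_fdr
by_hand
```

Only adjusted values below the chosen threshold (0.05 in this analysis) would be reported as significant.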
```{r echo=FALSE, warning=FALSE, message=FALSE, fig.height=8}
# view mosaic plot
Barriers_gender_df %>%
filter(!is.na(Response)) %>%
ggplot() +
geom_mosaic(aes(x = product(Gender),
fill = Response)) +
labs(x = "Gender",
y = "Proportion",
subtitle = "Which barriers have you faced integrating bioinformatics into your teaching?"
) +
facet_wrap("Barrier", ncol = 2, strip.position = "bottom") +
theme_classic(base_size = 8) +
theme(legend.position = "none")
```
The difference between genders is more apparent when we explore the severity of each challenge, where the non-responses (NA) were converted to "Not a challenge." (Due to small numbers, U gender was removed to meet the assumptions of statistical testing.)
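A minimal sketch of the recoding described above, assuming hypothetical severity labels (the strings in `severity` are made up and may not match the survey's wording): non-responses are replaced first, then "Not a challenge" is moved to the front of the factor levels so that it is plotted as the baseline category.

```r
library(tidyr)
library(forcats)

# Hypothetical severity responses; NA marks a non-response
severity <- c("Major challenge", NA, "Minor challenge", NA)

# Replace non-responses, then make "Not a challenge" the first factor level
severity <- replace_na(severity, "Not a challenge")
severity <- fct_relevel(factor(severity), "Not a challenge")

levels(severity)
table(severity)
```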
```{r echo=FALSE, warning=FALSE, message=FALSE}
BarriersQ8_gender_df <- Merged_Data_Anonymous %>%
select(Gender, `I lack expertise in bioinformatics.`:`My student population lacks interest in bioinformatics....30`) %>%
filter(Gender != "U") %>%
pivot_longer(!Gender, names_to = "Barrier", values_to = "Response") %>%
mutate(Response = replace_na(Response, "Not a challenge")) %>% # change NAs
mutate(Response = fct_relevel(Response, "Not a challenge")) # ordered by severity
# hypothesis test: chi-squared
BarriersQ8_char <- unique(BarriersQ8_gender_df$Barrier)
test_results <- BarriersQ8_char %>%
map_df(function(x) {
BarriersQ8_gender_df %>%
filter(Barrier == x) %>%
chisq_test(Response ~ Gender)
}
)
test_results <- test_results %>%
mutate(Barrier = BarriersQ8_char,
adj.p_val = p.adjust(p_value, method = "fdr")
) %>%
select(Barrier, everything())
sig_results <- test_results %>%
filter(adj.p_val < 0.05)
# make count table with proportions for significant barriers
count_table <- sig_results$Barrier %>%
map(function(x) {
BarriersQ8_gender_df %>%
filter(Barrier == x) %>%
group_by(Gender, Response) %>%
summarise(count = n()) %>%
mutate(percentage = round(100 * count / sum(count), 1)) %>%
select(-count) %>%
pivot_wider(names_from = Gender, values_from = percentage)
}
)
```
Of those respondents who answered the barrier question (Q7 or Q21, response 1), we sought to identify associations between gender and the severity of each barrier. Of the `r nrow(test_results)` queried barriers, `r nrow(sig_results)` (`r sig_results$Barrier`) exhibited a significant difference among genders (adj. p-value = `r sig_results %>% pull %>% signif(2)`).
```{r echo=FALSE, warning=FALSE, message=FALSE, fig.height=8}
# view mosaic plot
BarriersQ8_gender_df %>%
ggplot() +
geom_mosaic(aes(x = product(Gender),
fill = Response)) +
labs(x = "Gender",
y = "Proportion",
subtitle = "How severe is this barrier to integrating bioinformatics into your teaching?"
) +
facet_wrap("Barrier", ncol = 2, strip.position = "bottom") +
theme_classic(base_size = 8) +
theme(legend.position = "none")
```
## Is there an association between Carnegie classification of current institution and barriers to integrating bioinformatics?
```{r echo=FALSE, warning=FALSE, message=FALSE}
# make count table
count_table <- Merged_Data_Anonymous %>%
group_by(BASIC2018_bins_text.Current,Q7FacedBarriers) %>%
summarise(count = n()) %>%
mutate(proportion = count / sum(count))
prop_bacc <- count_table %>% filter(Q7FacedBarriers == "Yes" & BASIC2018_bins_text.Current == "Baccalaureate Colleges") %>%
pull() %>% signif(3)
prop_Doct <- count_table %>% filter(Q7FacedBarriers == "Yes" & BASIC2018_bins_text.Current == "Doctoral/Professional Universities") %>%
pull() %>% signif(3)
prop_all <- Merged_Data_Anonymous %>%
group_by(Q7FacedBarriers) %>%
summarise(count = n()) %>%
mutate(proportion = count / sum(count)) %>%
filter(Q7FacedBarriers == "Yes") %>%
pull() %>% signif(3)
# flag those with few counts
# hypothesis test: chi-squared
test_results <- Merged_Data_Anonymous %>%
filter(BASIC2018_bins.Current %in% c(1, 3, 4, 5)) %>% # remove categories with too few responses
chisq_test(Q7FacedBarriers ~ BASIC2018_bins_text.Current)
# chisq.test(x=Merged_Data_Anonymous$BASIC2018_bins_text, y=Merged_Data_Anonymous$Q7FacedBarriers)$expected
```
Do faculty at some institution types more frequently report barriers to integrating bioinformatics into their teaching? There is a significant association between Carnegie classification and encountering barriers (p-value = `r test_results %>% pull() %>% signif(digits = 2)`). For example, faculty at baccalaureate colleges reported facing barriers more frequently than average (`r prop_bacc * 100`% vs. `r prop_all * 100`%), while faculty at doctoral institutions reported facing barriers less frequently (`r prop_Doct * 100`%).
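Institution types with very few responses were dropped before testing, since the chi-squared approximation becomes unreliable when expected cell counts are small. One way to check this, sketched here with a made-up table (`tab` is hypothetical, and the threshold of 5 is a common rule of thumb rather than something stated in this analysis), is to inspect the expected counts returned by `chisq.test()`:

```r
# Hypothetical counts of barrier responses by institution type
tab <- matrix(c(30, 20,
                25, 25,
                 2,  1),
              nrow = 3, byrow = TRUE,
              dimnames = list(Type = c("A", "B", "C"),
                              Barriers = c("No", "Yes")))

# Expected counts under independence (a small-count warning is suppressed)
expected <- suppressWarnings(chisq.test(tab))$expected

# Keep only institution types whose expected counts all reach the threshold
keep <- apply(expected, 1, function(r) all(r >= 5))
tab_kept <- tab[keep, , drop = FALSE]

round(expected, 1)
rownames(tab_kept)
```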
```{r echo=FALSE, warning=FALSE, message=FALSE}
# reorder levels for plotting (assign back to the column that is actually plotted)
Merged_Data_Anonymous$BASIC2018_bins_text.Current <- fct_relevel(Merged_Data_Anonymous$BASIC2018_bins_text.Current,
"Doctoral/Professional Universities",
after = Inf)
# view mosaic plot
Merged_Data_Anonymous %>%
filter(BASIC2018_bins.Current %in% c(1, 3, 4, 5)) %>% # remove categories with too few responses
droplevels() %>% # remove unused levels
ggplot() +
geom_mosaic(aes(x = product(BASIC2018_bins_text.Current),
fill = Q7FacedBarriers)) +
labs(x="Institution type",
y = "Proportion",
subtitle = "Have you faced any barriers integrating bioinformatics into your teaching?") +
theme_classic(base_size = 13) +
theme(legend.position = "none") +
theme(axis.text.x=element_text(angle=45,hjust=1))
```
```{r echo=FALSE, warning=FALSE, message=FALSE}
# Retrieve "I lack" agree/disagree barrier columns & get agree counts
Barriers_BASIC2018_df <- Merged_Data_Anonymous %>%
filter(BASIC2018_bins.Current %in% c(1, 3, 4, 5)) %>% # remove institution types with too few responses
droplevels() %>% # remove unused levels
select(BASIC2018_bins_text.Current, `I lack expertise in bioinformatics`:`My student population lacks interest in bioinformatics....20`) %>%
pivot_longer(!BASIC2018_bins_text.Current, names_to = "Barrier", values_to = "Response")
Barriers_BASIC2018_table <- Barriers_BASIC2018_df %>%
filter(!is.na(Response)) %>%
count(BASIC2018_bins_text.Current, Barrier, Response)
# hypothesis test: chi-squared
test_results <- Barriers_char %>%
map_df(function(x) {
Barriers_BASIC2018_df %>%
filter(!is.na(Response)) %>%
filter(Barrier == x) %>%
chisq_test(Response ~ BASIC2018_bins_text.Current)
}
)
test_results <- test_results %>%
mutate(Barrier = Barriers_char,
adj.p_val = p.adjust(p_value, method = "fdr")
) %>%
select(Barrier, everything())
sig_results <- test_results %>%
filter(adj.p_val < 0.05)
```
Of those respondents who reported facing a barrier to integrating bioinformatics into their teaching, we sought to identify associations between institution type and specific barriers. Of the `r nrow(test_results)` queried barriers, `r nrow(sig_results)` (`r sig_results$Barrier`) exhibited a significant difference among institution types (adj. p-value = `r sig_results %>% pull %>% signif(2)`).
```{r echo=FALSE, warning=FALSE, message=FALSE, fig.height=8}
# view mosaic plot
Barriers_BASIC2018_df %>%
filter(!is.na(Response)) %>%
ggplot() +
geom_mosaic(aes(x = product(BASIC2018_bins_text.Current),
fill = Response)) +
labs(x="Institution type",
y = "Proportion",
# subtitle = "Which barriers have you faced integrating bioinformatics into your teaching?"
) +
facet_wrap("Barrier", ncol = 2, strip.position = "bottom") +
theme_classic(base_size = 8) +
theme(legend.position = "none") +
theme(axis.text.x=element_text(angle=45,hjust=1))
```
## Do URM faculty experience more barriers/more severe barriers to integrating bioinformatics than non-URM faculty?
```{r echo=FALSE, warning=FALSE, message=FALSE}
# make count table & get proportions for each
count_table <- Merged_Data_Anonymous %>%
filter(!is.na(Q7FacedBarriers)) %>%
group_by(URM, Q7FacedBarriers) %>%
summarise(count = n()) %>%
mutate(proportion = count / sum(count))
prop_URM <- count_table %>% filter(URM == "URM" & Q7FacedBarriers == "Yes") %>% pull()
n_URM <- count_table %>% filter(URM == "URM") %>% pull(count) %>% sum()
prop_non <- count_table %>% filter(URM == "non-URM" & Q7FacedBarriers == "Yes") %>% pull()
n_non <- count_table %>% filter(URM == "non-URM") %>% pull(count) %>% sum()
# hypothesis test: diff in props
# Calculating the observed statistic
d_hat <- Merged_Data_Anonymous %>%
filter(!is.na(Q7FacedBarriers)) %>%
specify(Q7FacedBarriers ~ URM, success = "Yes") %>%
calculate(stat = "diff in props")
# Then, generating the null distribution,
null_dist <- Merged_Data_Anonymous %>%
filter(!is.na(Q7FacedBarriers)) %>%
specify(Q7FacedBarriers ~ URM, success = "Yes") %>%
hypothesize(null = "independence") %>%
generate(reps = 1000, type = "permute") %>%
calculate(stat = "diff in props")
# Calculating the p-value from the null distribution and observed statistic,
p_value <- null_dist %>%
get_p_value(obs_stat = d_hat, direction = "two-sided")
# test_results <- Merged_Data_Anonymous %>%
# filter(!is.na(Q7FacedBarriers)) %>%
# chisq_test(Q7FacedBarriers ~ URM)
```
Do faculty who identified as an under-represented minority more frequently report barriers to integrating bioinformatics into their teaching? There is insufficient evidence to indicate a significant association between URM faculty status and encountering barriers (p-val = `r p_value$p_value %>% signif(digits = 2)`), although URM faculty are somewhat more likely to report barriers to integrating bioinformatics into their teaching than non-URM faculty (URM = `r prop_URM %>% signif(2) * 100`%, n = `r n_URM`; non-URM = `r prop_non %>% signif(2) * 100`%, n = `r n_non`).
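The permutation test above can be sketched in base R without `infer`. The example uses made-up data (`group`, `barrier`, and the counts are hypothetical): it shuffles the group labels many times and asks how often the shuffled difference in proportions is at least as extreme as the observed one. The two-sided p-value below uses a simple absolute-value comparison, which may differ slightly from how `infer::get_p_value()` computes it.

```r
set.seed(2023)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: 20 URM and 80 non-URM respondents
group   <- rep(c("URM", "non-URM"), times = c(20, 80))
barrier <- c(rep(c("Yes", "No"), times = c(16, 4)),
             rep(c("Yes", "No"), times = c(52, 28)))

# Difference in the proportion answering "Yes" (URM minus non-URM)
diff_in_props <- function(g, b) {
  mean(b[g == "URM"] == "Yes") - mean(b[g == "non-URM"] == "Yes")
}
d_obs <- diff_in_props(group, barrier)

# Null distribution: shuffle the group labels and recompute the statistic
d_null <- replicate(1000, diff_in_props(sample(group), barrier))

# Two-sided p-value: share of shuffles at least as extreme as observed
p_perm <- mean(abs(d_null) >= abs(d_obs))

d_obs
p_perm
```

Setting a seed, as done here, keeps the simulated p-value stable from one run to the next.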
```{r echo=FALSE, warning=FALSE, message=FALSE}
# view mosaic plot
Merged_Data_Anonymous %>%
ggplot() +
geom_mosaic(aes(x = product(URM),
fill = Q7FacedBarriers)) +
labs(x = "URM status",
y = "Proportion",
subtitle = "Have you faced any barriers integrating bioinformatics into your teaching?") +
theme_classic(base_size = 13) +
theme(legend.position = "none")
```
Since the difference wasn't statistically significant, specific barriers were not analyzed for an association with URM status.
The effect of URM status may be more apparent when we explore the severity of each challenge, where the non-responses (NA) were converted to "Not a challenge."
```{r echo=FALSE, warning=FALSE, message=FALSE}
BarriersQ8_URM_df <- Merged_Data_Anonymous %>%
select(URM, `I lack expertise in bioinformatics.`:`My student population lacks interest in bioinformatics....30`) %>%
pivot_longer(!URM, names_to = "Barrier", values_to = "Response") %>%
mutate(Response = replace_na(Response, "Not a challenge")) %>% # change NAs
mutate(Response = fct_relevel(Response, "Not a challenge")) # ordered by severity
# hypothesis test: chi-squared
BarriersQ8_char <- unique(BarriersQ8_URM_df$Barrier)
test_results <- BarriersQ8_char %>%
map_df(function(x) {
BarriersQ8_URM_df %>%
filter(Barrier == x) %>%
chisq_test(Response ~ URM)
}
)
test_results <- test_results %>%
mutate(Barrier = BarriersQ8_char,
adj.p_val = p.adjust(p_value, method = "fdr")
) %>%
select(Barrier, everything())
sig_results <- test_results %>%
filter(adj.p_val < 0.05)
```
Of those respondents who responded to the barrier question (Q7 or Q21, response 1), we sought to identify associations between URM status and the severity of each barrier. Of the `r nrow(test_results)` queried barriers, `r nrow(sig_results)` (`r sig_results$Barrier`) exhibited a significant difference by URM status (adj. p-value = `r sig_results %>% pull %>% signif(2)`).
```{r echo=FALSE, warning=FALSE, message=FALSE, fig.height=8}
# view mosaic plot
BarriersQ8_URM_df %>%
ggplot() +
geom_mosaic(aes(x = product(URM),
fill = Response)) +
labs(x = "URM Status",
y = "Proportion",
# subtitle = "How severe is this barrier to integrating bioinformatics into your teaching?"
) +
facet_wrap("Barrier", ncol = 2, strip.position = "bottom") +
theme_classic(base_size = 8) +
theme(legend.position = "none")
```
## Do faculty at MSIs experience more barriers to integrating bioinformatics?
```{r echo=FALSE, warning=FALSE, message=FALSE}
# create column to encode four categories of institutions:
Merged_Data_Anonymous <- Merged_Data_Anonymous %>%
mutate(MSI_status = 2*MSI.Current + HBCU.Current + HSI.Current)
# coding: MSI (0 = no, 1 = yes); HBCU (1 = yes, 2 = no); HSI (0 = no, 1 = yes)
# non-MSI:   2*0 + 2 + 0 = 2 (453 responses)
# HBCU:      2*1 + 1 + 0 = 3 (18 responses)
# other MSI: 2*1 + 2 + 0 = 4 (27 responses)
# HSI:       2*1 + 2 + 1 = 5 (55 responses)
# make count table & get proportions for each
count_table <- Merged_Data_Anonymous %>%
filter(!is.na(Q7FacedBarriers)) %>%
group_by(MSI_status, Q7FacedBarriers) %>%
summarise(count = n()) %>%
mutate(proportion = count / sum(count))
prop_non <- count_table %>% filter(MSI_status == 2 & Q7FacedBarriers == "Yes") %>% pull()
n_non <- count_table %>% filter(MSI_status == 2) %>% pull(count) %>% sum()
prop_HBCU <- count_table %>% filter(MSI_status == 3 & Q7FacedBarriers == "Yes") %>% pull()
n_HBCU <- count_table %>% filter(MSI_status == 3) %>% pull(count) %>% sum()
prop_other <- count_table %>% filter(MSI_status == 4 & Q7FacedBarriers == "Yes") %>% pull()
n_other <- count_table %>% filter(MSI_status == 4) %>% pull(count) %>% sum()
prop_HSI <- count_table %>% filter(MSI_status == 5 & Q7FacedBarriers == "Yes") %>% pull()
n_HSI <- count_table %>% filter(MSI_status == 5) %>% pull(count) %>% sum()
# hypothesis test: chi-squared
# convert to factor for chi-squared test
Merged_Data_Anonymous <- Merged_Data_Anonymous %>%
mutate(MSI_status = factor(MSI_status,
labels = c("non-MSI", "HBCU",
"Other MSI", "HSI")))
test_results <- Merged_Data_Anonymous %>%
filter(!is.na(Q7FacedBarriers)) %>%
chisq_test(Q7FacedBarriers ~ MSI_status)
```
Do faculty at HBCUs, HSIs, or other minority-serving institutions more frequently report barriers to integrating bioinformatics into their teaching than faculty at non-MSIs? Although faculty at HBCUs and HSIs more frequently reported barriers to integrating bioinformatics into their teaching than other MSI and non-MSI faculty (HBCU = `r prop_HBCU %>% signif(2) * 100`%, n = `r n_HBCU`; HSI = `r prop_HSI %>% signif(2) * 100`%, n = `r n_HSI`; other MSI = `r prop_other %>% signif(2) * 100`%, n = `r n_other`; non-MSI = `r prop_non %>% signif(2) * 100`%, n = `r n_non`), there is insufficient evidence to indicate a significant association between MSI status and encountering barriers (p-val = `r test_results$p_value %>% signif(digits = 2)`).
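The `MSI_status` score relies on how the three indicator columns are coded. Assuming the coding described in the chunk's comments (MSI and HSI as 0 = no, 1 = yes; HBCU as 1 = yes, 2 = no), each combination maps to a distinct value from 2 to 5. A sketch with one made-up row per category (the data frame `toy` is hypothetical):

```r
# One hypothetical institution per category
toy <- data.frame(
  MSI  = c(0, 1, 1, 1),   # 0 = no, 1 = yes (assumed coding)
  HBCU = c(2, 1, 2, 2),   # 1 = yes, 2 = no (assumed coding)
  HSI  = c(0, 0, 0, 1)    # 0 = no, 1 = yes (assumed coding)
)

# Combine the indicators into one score
toy$status <- 2 * toy$MSI + toy$HBCU + toy$HSI

# Label the scores; labels are matched to the sorted values 2, 3, 4, 5
toy$label <- factor(toy$status,
                    labels = c("non-MSI", "HBCU", "Other MSI", "HSI"))

toy
```

Because `factor()` matches labels to the sorted unique values, the labels would be misassigned if any of the four scores were absent from the data.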
```{r echo=FALSE, warning=FALSE, message=FALSE}
# view mosaic plot
Merged_Data_Anonymous %>%
ggplot() +
geom_mosaic(aes(x = product(MSI_status),
fill = Q7FacedBarriers)) +
labs(x = "MSI Institution?",
y = "Proportion",
subtitle = "Have you faced any barriers integrating bioinformatics into your teaching?") +
theme_classic(base_size = 13) +
theme(legend.position = "none") +
theme(axis.text.x=element_text(angle=45,hjust=1))
```
Since the difference wasn't statistically significant, specific barriers were not analyzed for an association with MSI status.
## Is terminal degree year associated with barriers to integrating bioinformatics?
```{r echo=FALSE, warning=FALSE, message=FALSE}
# make count table
count_table <- Merged_Data_Anonymous %>%
group_by(`Q14 In which year did you earn your highest academic degree?`,
Q7FacedBarriers) %>%
summarise(count = n()) %>%
mutate(proportion = count / sum(count))
# hypothesis test: chi-squared after removing decades with <10 responses
Merged_Data_Anonymous <- Merged_Data_Anonymous %>%
filter(`Q14 In which year did you earn your highest academic degree?` %in% c("1980-1989", "1990-1999","2000-2009","2010-2019")) %>%
mutate(Q14DegreeYear = as.factor(`Q14 In which year did you earn your highest academic degree?`))
test_results <- Merged_Data_Anonymous %>%
filter(!is.na(Q7FacedBarriers)) %>%
chisq_test(Q7FacedBarriers ~ Q14DegreeYear)
```
There is a significant association between when the highest degree was awarded and how frequently faculty reported barriers to integrating bioinformatics into their teaching (p-val = `r test_results$p_value %>% signif(digits = 2)`). Faculty were more likely to report encountering a barrier if they received their highest degree more recently.
```{r echo=FALSE, warning=FALSE, message=FALSE}
# view mosaic plot
Merged_Data_Anonymous <- Merged_Data_Anonymous %>%
mutate(Q14DegreeYear = as.factor(`Q14 In which year did you earn your highest academic degree?`))
Merged_Data_Anonymous %>%
filter(!is.na(Q14DegreeYear)) %>%
filter(Q14DegreeYear %in% c("1980-1989", "1990-1999","2000-2009","2010-2019")) %>%
ggplot() +
geom_mosaic(aes(x = product(Q14DegreeYear), fill = Q7FacedBarriers)) +
labs(x="Year awarded highest degree?",
y="Faced barriers integrating bioinformatics?",
caption = "Before 1980, after 2020, and NAs were removed.") +
theme_classic(base_size = 13) +
theme(legend.position = "none") +
theme(axis.text.x=element_text(angle=45,hjust=1))
```
```{r echo=FALSE, warning=FALSE, message=FALSE}
# Retrieve "I lack" agree/disagree barrier columns & get agree counts
Barriers_DegreeYear_df <- Merged_Data_Anonymous %>%
filter(!is.na(Q14DegreeYear)) %>%
filter(Q14DegreeYear %in% c("1980-1989", "1990-1999","2000-2009","2010-2019")) %>%
droplevels() %>% # remove unused levels for chisq_test
select(Q14DegreeYear, `I lack expertise in bioinformatics`:`My student population lacks interest in bioinformatics....20`) %>%
pivot_longer(!Q14DegreeYear, names_to = "Barrier", values_to = "Response")
Barriers_DegreeYear_table <- Barriers_DegreeYear_df %>%
filter(!is.na(Response)) %>%
count(Q14DegreeYear, Barrier, Response)
# hypothesis test: chi-squared
test_results <- Barriers_char %>%
map_df(function(x) {
Barriers_DegreeYear_df %>%
filter(!is.na(Response)) %>%
filter(Barrier == x) %>%
# mutate(Q14DegreeYear = as.factor(Q14DegreeYear),
# Response = as.factor(Response)) %>%
# count(Q14DegreeYear, Barrier, Response)
chisq_test(Response ~ Q14DegreeYear)
}
)
test_results <- test_results %>%
mutate(Barrier = Barriers_char,
adj.p_val = p.adjust(p_value, method = "fdr")
) %>%
select(Barrier, everything())
sig_results <- test_results %>%
filter(adj.p_val < 0.05)
# filter(p_value < 0.05)
```
Was there a specific barrier associated with when the highest degree was awarded? Of those respondents who reported facing a barrier to integrating bioinformatics into their teaching, we sought to identify associations between degree year and specific barriers. Of the `r nrow(test_results)` queried barriers, `r nrow(sig_results)` exhibited a significant difference among the degree years (adj. p-value < 0.05).
```{r echo=FALSE, warning=FALSE, message=FALSE, fig.height=8}
# view mosaic plot
Barriers_DegreeYear_df %>%
filter(!is.na(Response)) %>%
ggplot() +
geom_mosaic(aes(x = product(Q14DegreeYear),
fill = Response)) +
labs(x = "Year when Highest Degree Awarded",
y = "Proportion",
# subtitle = "Have you faced any barriers integrating bioinformatics into your teaching?"
) +
facet_wrap("Barrier", ncol = 2, strip.position = "bottom") +
theme_classic(base_size = 8) +
theme(legend.position = "none")
```
```{r echo=FALSE, warning=FALSE, message=FALSE}
# fix Q12 column to remove internal comma
Merged_Data_Anonymous$`Q12 Which of the following best describes your level of bioinformatics training? Select ALL that apply.` <- Merged_Data_Anonymous$`Q12 Which of the following best describes your level of bioinformatics training? Select ALL that apply.` %>%
str_replace_all(pattern = coll(".,"),
replacement = ".")
# split multiple Q12 responses at commas
Merged_Data_Anonymous2 <- Merged_Data_Anonymous %>%
separate(col = `Q12 Which of the following best describes your level of bioinformatics training? Select ALL that apply.`,
into = c("Q12_1", "Q12_2", "Q12_3", "Q12_4", "Q12_5"),
sep = ",",
remove = FALSE)
# collect multiple Q12 responses in one column
Merged_Data_Anonymous2 <- Merged_Data_Anonymous2 %>%
pivot_longer(cols = starts_with("Q12_"),
names_to = "Position",
names_prefix = "Q12_",
values_to = "Q12Training",
values_drop_na = TRUE)
# truncate for fewer response types
Merged_Data_Anonymous2$Q12Training <- Merged_Data_Anonymous2$Q12Training %>% str_trunc(12)
```
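Because Q12 allows several answers per respondent, the chunk above splits the comma-separated string and stacks the pieces into one column. A compact alternative sketch with `tidyr::separate_rows()`, shown on made-up responses (`toy` and its answer strings are hypothetical), produces a similar one-row-per-answer layout:

```r
library(tidyr)

# Hypothetical multi-select responses, comma-separated
toy <- data.frame(
  id = 1:3,
  Training = c("Self-taught,Workshops", "Graduate course", NA)
)

# One row per selected answer
long <- separate_rows(toy, Training, sep = ",")

# Drop respondents who gave no answer
long <- long[!is.na(long$Training), ]

long
```

As in the chunk above, any commas inside a single answer would need to be removed before splitting.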
```{r echo=FALSE, warning=FALSE, message=FALSE}
# any grad-level training in bioinformatics?
Merged_Data_Anonymous$GradTraining <-
Merged_Data_Anonymous$`Q12 Which of the following best describes your level of bioinformatics training? Select ALL that apply.` %>%
str_replace_all("undergraduate", "undergrad") %>%
str_detect("graduate")
# make count table & get proportions for each
count_table <- Merged_Data_Anonymous %>%
group_by(Q14DegreeYear, GradTraining) %>%
summarise(count = n()) %>%
mutate(proportion = count / sum(count))
# proportion with graduate-level training in each decade (pull the column by name)
prop_1980s <- count_table %>% filter(Q14DegreeYear == "1980-1989", GradTraining) %>% pull(proportion)
prop_1990s <- count_table %>% filter(Q14DegreeYear == "1990-1999", GradTraining) %>% pull(proportion)
prop_2000s <- count_table %>% filter(Q14DegreeYear == "2000-2009", GradTraining) %>% pull(proportion)
prop_2010s <- count_table %>% filter(Q14DegreeYear == "2010-2019", GradTraining) %>% pull(proportion)
# hypothesis test: chi-squared
test_results <- Merged_Data_Anonymous %>%
chisq_test(GradTraining ~ Q14DegreeYear)
```
Do "more experienced" instructors differ in their bioinformatics training from "less experienced" instructors (i.e., those who received their terminal degree more recently)? The overall pattern of bioinformatics training, with multiple responses allowed per respondent, did not differ significantly by year of degree award. However, those who received their highest degree more recently were significantly more likely to have had bioinformatics training in graduate school (1980s = `r prop_1980s %>% signif(2) * 100`%, 1990s = `r prop_1990s %>% signif(2) * 100`%, 2000s = `r prop_2000s %>% signif(2) * 100`%, 2010s = `r prop_2010s %>% signif(2) * 100`%; p-value = `r test_results$p_value %>% signif(2)`).
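The graduate-training flag relies on masking "undergraduate" before searching for "graduate". The sketch below illustrates that step on invented data using base R only; the `toy` data frame and its values are hypothetical, and `chisq.test()` is used here as a stand-in for the `chisq_test()` call in the analysis chunk.

```r
# Hypothetical responses: decade of highest degree and multi-select training text
toy <- data.frame(
  decade = rep(c("1990-1999", "2010-2019"), each = 4),
  training = c("undergraduate courses", "self-taught",
               "graduate courses", "self-taught",
               "graduate courses", "graduate degree",
               "undergraduate courses,graduate courses", "self-taught"),
  stringsAsFactors = FALSE
)

# Mask "undergraduate" first so that only graduate-level items match "graduate"
masked <- gsub("undergraduate", "undergrad", toy$training, fixed = TRUE)
toy$grad_training <- grepl("graduate", masked, fixed = TRUE)

# Proportion with graduate-level training within each decade
prop_by_decade <- tapply(toy$grad_training, toy$decade, mean)
prop_by_decade

# Chi-squared test of association (small-count warnings are expected on toy data)
toy_test <- suppressWarnings(chisq.test(table(toy$decade, toy$grad_training)))
toy_test$p.value
```

Without the masking step, a respondent with only undergraduate coursework would be counted as having graduate-level training.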
```{r echo=FALSE, warning=FALSE, message=FALSE}
# view mosaic plot
Merged_Data_Anonymous2 %>%
ggplot() +
geom_mosaic(aes(x = product(Q14DegreeYear), fill = Q12Training)) +
labs(x="Year awarded highest degree",
y="Bioinformatics training",
caption = "Multiple answers permitted. Before 1980, after 2020, and NAs removed.") +
theme_classic(base_size = 13) +
theme(legend.position = "none") +
theme(axis.text.x=element_text(angle=45,hjust=1))
```
```{r echo=FALSE, warning=FALSE, message=FALSE}
# fix Q3 column to remove internal comma
Merged_Data_Anonymous2$Q3TeachBioinfor <- Merged_Data_Anonymous2$Q3TeachBioinfor %>%
str_replace_all(pattern = coll(".,"),
replacement = ".")
# split multiple Q3 responses at commas
Merged_Data_Anonymous2 <- Merged_Data_Anonymous2 %>%
separate(col = Q3TeachBioinfor,
into = c("Q3_1", "Q3_2", "Q3_3", "Q3_4", "Q3_5"),
sep = ",",
remove = FALSE)
# collect multiple Q3 responses in one column
Merged_Data_Anonymous2 <- Merged_Data_Anonymous2 %>%
pivot_longer(cols = starts_with("Q3_"),
names_to = "Position3",
names_prefix = "Q3_",
values_to = "Q3TeachBioinfo",
values_drop_na = TRUE)
# truncate & filter so only 4 response types (none, some, substantial, dedicated)
Merged_Data_Anonymous2$Q3TeachBioinfo <- Merged_Data_Anonymous2$Q3TeachBioinfo %>% str_trunc(20)
Merged_Data_Anonymous2<- Merged_Data_Anonymous2 %>%
filter(Q3TeachBioinfo != " but do plan to d...")
# hypothesis test: chi-squared
test_results <- Merged_Data_Anonymous2 %>%
chisq_test(Q3TeachBioinfo ~ Q14DegreeYear)
```
Is there an association between instructor experience and bioinformatics teaching duties? The overall pattern of bioinformatics teaching duties, with multiple responses allowed per respondent, did not differ significantly by year of degree award (p-value = `r test_results$p_value %>% signif(2)`).
```{r echo=FALSE, warning=FALSE, message=FALSE, fig.height=8}
# view mosaic plot
Merged_Data_Anonymous2 %>%
ggplot() +
geom_mosaic(aes(x = product(Q14DegreeYear), fill = Q3TeachBioinfo)) +
labs(x="Year awarded highest degree",
y= "Do you teach bioinformatics?",
caption = "Multiple answers permitted. Before 1980, after 2020, and NAs removed.") +
theme_classic(base_size = 13) +
theme(legend.position = "none") +
theme(axis.text.x=element_text(angle=45,hjust=1))
```
## Is terminal degree year associated with NOT integrating bioinformatics?
The frequency of not integrating bioinformatics does not significantly differ between the decades of the terminal degree.
```{r echo=FALSE, warning=FALSE, message=FALSE}
Merged_Data_Anonymous <- Merged_Data_Anonymous %>%
rename(`Decade of Degree` = `Q14 In which year did you earn your highest academic degree?`) %>%
mutate(`Decade of Degree` = case_when(`Decade of Degree` == "2010-2019" ~ "2010s",
`Decade of Degree` == "2000-2009" ~ "2000s",
`Decade of Degree` == "1990-1999" ~ "1990s",
`Decade of Degree` == "1980-1989" ~ "1980s",
is.na(`Decade of Degree`) ~ "Unknown Decade",
TRUE ~ `Decade of Degree`))
# Select `Decade of Degree` and the Q3 teaching response, then flag respondents who do not teach bioinformatics
teaching_df <- Merged_Data_Anonymous %>%
select(`Decade of Degree`,
Q3TeachBioinfor) %>%
mutate(DontTeachBioinfor = str_detect(Q3TeachBioinfor, "not"))
# Keep only decades with more than 30 participants; sparser decades are removed later
Decade_count <- Merged_Data_Anonymous %>%
count(`Decade of Degree`)
Decade_keep <- Decade_count %>%
filter(n >30) %>%
pull(`Decade of Degree`)
# Conduct test for association between degree decade and not teaching bioinformatics
# count "Don't teach bioinformatics" for each decade
NotTeachBioinfo_Decade_count <- teaching_df %>%
filter(`Decade of Degree` %in% Decade_keep) %>%
group_by(`Decade of Degree`) %>%
count(DontTeachBioinfor, name = "count") %>%
mutate(proportion = count / sum(count))
# hypothesis test: chi-squared after removing decades with few responses
teaching_df %>%
filter(`Decade of Degree` %in% Decade_keep) %>%
drop_na() %>%
chisq_test(DontTeachBioinfor ~ `Decade of Degree`)
```
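The effect of dropping sparse decades before testing can be seen on simulated data. This is a base R sketch under stated assumptions: the `toy` data frame, its decade labels, and the randomly generated `dont_teach` flag are all hypothetical, and `chisq.test()` stands in for `chisq_test()`.

```r
# Hypothetical data: three decades, one with too few respondents to keep
set.seed(1)
toy <- data.frame(
  decade = c(rep("1970s", 5), rep("2000s", 40), rep("2010s", 60)),
  dont_teach = sample(c(TRUE, FALSE), 105, replace = TRUE),
  stringsAsFactors = FALSE
)

# Keep only decades with more than 30 respondents
decade_n <- table(toy$decade)
decade_keep <- names(decade_n)[decade_n > 30]
kept <- toy[toy$decade %in% decade_keep, ]

# Proportion not teaching bioinformatics within each retained decade
prop.table(table(kept$decade, kept$dont_teach), margin = 1)

# Chi-squared test of association on the retained decades only
result <- chisq.test(table(kept$decade, kept$dont_teach))
result$p.value
```

Removing decades with few respondents keeps the expected cell counts large enough for the chi-squared approximation to be reasonable.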
```{r }
# plot with TeachBioinfor frequency by decade
NotTeachBioinfo_Decade_count %>%
filter(DontTeachBioinfor == FALSE) %>% # therefore, integrate bioinformatics
ggplot(aes(x=`Decade of Degree`,
y=proportion))+
geom_bar(stat = "Identity")+
labs(y = "percentage of respondents", x= "") +
scale_y_continuous(labels = scales::percent) +
theme_gray(base_size = 20, base_family = "sans") +
theme(line = element_line(colour = "black"),
rect = element_rect(fill = "white", linetype = 0, colour = NA))+
theme(legend.background = element_rect(),
legend.position = "bottom",
legend.title = element_blank()) +
theme(panel.grid.major =
element_line(colour = "grey"),
panel.grid.minor = element_blank(),
strip.background = element_rect())+
theme(axis.title.x=element_blank(),
axis.ticks.x=element_blank()) +
theme(strip.text.x = element_text(size = 18, face = "bold"))+
theme(plot.background = element_rect(fill = "white"))+
theme(panel.background = element_rect(fill = "white"))+
theme(panel.grid.major.y = element_blank())+
theme(axis.line = element_line(colour = "black", linewidth = 0.5))+
coord_flip()+
theme(axis.text = element_text(size = 18)) +
theme(panel.grid.minor=element_blank())
```
## Is level of training associated with NOT integrating bioinformatics? (New 3/2023)
Participants could choose multiple responses for their level of training. For this analysis, each participant was described by their highest level of training in the following order:
1. "At Least Some Coursework" includes "graduate degree", "post-graduate certificate", "graduate courses", "undergraduate degree" (and items below).
2. "At Least Workshops/Bootcamps" includes "workshops or bootcamp" (and items below).
3. "Self-taught Only" includes "self-taught" (and item below).
4. "No Training" includes "no training/experience" only.
5. If no response (NA), then "Unknown Training".
```{r echo=FALSE, warning=FALSE, message=FALSE}
Merged_Data_Anonymous <- Merged_Data_Anonymous %>%
rename(`Bioinformatics Training` = `Q12 Which of the following best describes your level of bioinformatics training? Select ALL that apply.`) %>%
mutate(TrainingGroups = case_when(str_detect(`Bioinformatics Training`, "graduate") ~ "At Least Some Coursework",
str_detect(`Bioinformatics Training`, "workshops") ~ "At Least Workshops/Bootcamps",
str_detect(`Bioinformatics Training`, "self") ~ "Self-taught Only",
str_detect(`Bioinformatics Training`, "no training/experience") ~ "No Training",
is.na(`Bioinformatics Training`) ~ "Unknown Training",
TRUE ~ "Unknown Training"),
)
```
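The ordering of the training groups matters because each respondent is assigned to the first level that matches. The base R sketch below mirrors that first-match logic on invented responses; the `responses` vector and the helper `classify_training()` are hypothetical and are not part of the analysis.

```r
# Hypothetical multi-select responses
responses <- c("self-taught,workshops or bootcamp",
               "graduate courses,self-taught",
               "undergraduate degree",
               "self-taught",
               "no training/experience",
               NA)

# Assign each response to the first (highest) level it matches
classify_training <- function(x) {
  if (is.na(x)) return("Unknown Training")
  if (grepl("graduate", x, fixed = TRUE)) return("At Least Some Coursework")
  if (grepl("workshops", x, fixed = TRUE)) return("At Least Workshops/Bootcamps")
  if (grepl("self", x, fixed = TRUE)) return("Self-taught Only")
  if (grepl("no training/experience", x, fixed = TRUE)) return("No Training")
  "Unknown Training"
}

groups <- vapply(responses, classify_training, character(1), USE.NAMES = FALSE)
data.frame(response = responses, group = groups)
```

Note that the pattern "graduate" also matches "undergraduate" and "post-graduate", which is how all coursework-based options end up in the same group.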
Perform a chi-squared test for association between level of bioinformatics training and not teaching bioinformatics.
```{r echo=FALSE, warning=FALSE, message=FALSE}
# hypothesis test: chi-squared for training group vs. not teaching bioinformatics
teaching_df2 <- Merged_Data_Anonymous %>%
select(TrainingGroups,
Q3TeachBioinfor) %>%
mutate(DontTeachBioinfor = str_detect(Q3TeachBioinfor, "not"))
teaching_df2 %>%
drop_na() %>%
chisq_test(DontTeachBioinfor ~ TrainingGroups)
# count "Don't teach bioinformatics" for each Training Group
NotTeachBioinfo_Training_count <- teaching_df2 %>%
drop_na() %>%
group_by(TrainingGroups) %>%
count(DontTeachBioinfor, name = "count") %>%
mutate(proportion = count / sum(count))
```
```{r echo=FALSE, warning=FALSE, message=FALSE}
NotTeachBioinfo_Training_count %>%
filter(DontTeachBioinfor == FALSE) %>%
ggplot(aes(x=factor(TrainingGroups, levels = c("Unknown Training", "No Training", "Self-taught Only",
"At Least Workshops/Bootcamps","At Least Some Coursework")),
y=proportion)) +
geom_bar(stat = "Identity")+
labs(y = "percentage of respondents", x= "") +
scale_y_continuous(labels = scales::percent) +
theme_gray(base_size = 20, base_family = "sans") +
theme(line = element_line(colour = "black"),
rect = element_rect(fill = "white", linetype = 0, colour = NA))+
theme(legend.background = element_rect(),
legend.position = "bottom",
legend.title = element_blank()) +
theme(panel.grid.major =
element_line(colour = "grey"),
panel.grid.minor = element_blank(),
strip.background = element_rect())+
theme(axis.title.x=element_blank(),
axis.ticks.x=element_blank()) +
theme(strip.text.x = element_text(size = 18, face = "bold"))+
theme(plot.background = element_rect(fill = "white"))+
theme(panel.background = element_rect(fill = "white"))+
theme(panel.grid.major.y = element_blank())+
theme(axis.line = element_line(colour = "black", linewidth = 0.5))+
coord_flip()+
theme(axis.text = element_text(size = 18)) +
theme(panel.grid.minor=element_blank())
```
Having some training in bioinformatics, even if only self-taught or from a short-term workshop/bootcamp, is associated with a significantly higher likelihood of teaching a course that integrates bioinformatics (from 11% to >60%).
# Multiple correspondence analysis (MCA) using FactoMineR
The data set was explored using multiple correspondence analysis (Husson, F., Le, S., Pages, J. 2010. *Exploratory Multivariate Analysis by Example Using R*. Chapman and Hall.).
## Selection of active and supplementary variables
We will use the survey responses regarding barriers as the active variables, and the descriptor variables explored in Chapter 2 (gender, URM, etc.) will be used as supplementary variables. To start, we'll create a data frame only with the active variables, removing duplicate rows in the process.
```{r}
library(FactoMineR)
require(factoextra)
Active_Vars_df <- Merged_Data_Anonymous2 %>%
select(respID, `I lack expertise in bioinformatics.`:`My student population lacks interest in bioinformatics....30`) %>%
distinct()
```
If a respondent did not face a particular barrier, then their response to the corresponding challenge-level question was NA. These NAs are changed to "Not a challenge".
```{r}
Active_Vars_df <- Active_Vars_df %>%
replace(is.na(.), "Not a challenge")
```
We will create another data frame with the desired supplementary variables (removing duplicate rows). Then, we combine the two data frames and move the `respID` column to row names before proceeding with MCA.
```{r}
Supp_Vars_df <- Merged_Data_Anonymous2 %>%
select(respID, Gender, URM, BASIC2018_bins_text.Current, MSI_status, Q14DegreeYear) %>%
distinct()
All_Vars_df <- Active_Vars_df %>% left_join(Supp_Vars_df, by = "respID") %>%
column_to_rownames(var = "respID")
# shorten variable names for readability on MCA plots
All_Vars_df <- All_Vars_df %>%
rename(Expertise = `I lack expertise in bioinformatics.`,
Experience = `I lack experience in teaching bioinformatics....22`,
Time = `I lack time to restructure course(s).`,
Autonomy = `I lack the autonomy to add content to my course(s)....24`,
Space = `I lack space in my course(s) to add content....25`,
Materials = `I lack curricular materials....26`,
My_Tech = `I lack appropriate technical resources (internet access/software/hardware/IT support)....27`,
Student_Tech = `My student population lacks access to appropriate technical resources (internet access/software/hardware/IT support)....28`,
Prereqs = `My student population lacks prerequisite skills`,
Interest = `My student population lacks interest in bioinformatics....30`,
Carnegie = BASIC2018_bins_text.Current,
Degree_Year = `Q14DegreeYear`
)
```
```{r}
# following is tweaked from Help examples
results.MCA <- MCA(All_Vars_df, quali.sup = 11:15)
summary(results.MCA)
plot(results.MCA, invisible = c("var", "quali.sup"), cex=0.7)
plot(results.MCA, invisible = c("ind", "quali.sup"), cex=0.6)
plot(results.MCA, invisible = c("ind", "var"), cex=0.5, max.overlaps = 1000)
plot(results.MCA, invisible = c("quali.sup"), cex=0.8)
dimdesc(results.MCA)
plotellipses(results.MCA, invisible = c("ind"), keepvar = 11)
plotellipses(results.MCA, axes = c(1,2), invisible = c("ind"), keepvar = 12, max.overlaps = 10)
plotellipses(results.MCA, axes = c(1,2), invisible = c("ind"), keepvar = 14, max.overlaps = 10)
plotellipses(results.MCA, axes = c(1,4), invisible = c("ind"), keepvar = 15, max.overlaps = 10)
# following is adapted from FactoMineR YouTube video
plot(results.MCA, invisible = c("quali.sup", "ind"), label = c("var", "quali.sup"), autoLab = "y", cex=0.7)
plot(results.MCA, invisible = "ind", autoLab = "y", selectMod = "cos2 10", xlim = c(-4,4), cex=0.7)
```