Mauricio Garnier-Villarreal 01 February, 2024
- Welcome
- What is R and why should you learn it?
- Prerequisites
- How to use this resource
- The tutorials
- Progress
Welcome to Well Hello Stats. This is a page to learn R for social scientists: a series of tutorials that will teach you how to use R for research in the social sciences. Throughout the tutorials, you will learn how to install and set up R and RStudio, get your data into R, manage your data, and implement some of the most commonly used methods in quantitative social science research using R and RStudio.
There are many great resources out there to learn R. This series of tutorials is set up to teach you the necessary skills with a consistent approach.
R is an open-source statistical programming language that is currently among the most popular languages for research in the social sciences. In comparison to other popular software packages in social scientific research, such as SPSS and Stata, R has several notable advantages:
- R is a programming language, which makes it much more versatile. While R focuses on statistical analysis at heart, it supports a wide range of features, and virtually any tool for data analysis can be implemented.
- The range of things you can do with R is constantly being updated. R is open-source, meaning that anyone can contribute to its development. In particular, people can develop new packages that can easily and safely be installed from within R with a single command. Since many scholars and industry professionals use R, it is likely that any cutting-edge technique you are interested in is already available. You can think of it as an app-store for all your data-analysis needs!
- R is free. For students this may not be a big deal yet, thanks to free or cheap student and university licences, but it can be a big plus in the commercial sector, especially for small businesses and freelancers. It also democratizes access to cutting-edge data analysis methods for people who would otherwise not have access to a proprietary program.
- The use of syntax-based software improves our ability to reproduce and replicate our results and to track down and fix mistakes, and it lets us save and reuse syntax for future projects.
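The "single command" for installing packages is `install.packages()`. As a minimal sketch, the helper below installs only the packages that are not yet available on your computer. The function name `install_if_missing` is hypothetical (it is not part of R or of these tutorials), and the example call uses `stats` and `utils`, which ship with R, so nothing is downloaded.

```r
# Hypothetical helper: install only the packages that are not yet available.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE when a package can be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!installed]
  if (length(to_install) > 0) {
    install.packages(to_install)  # downloads from CRAN; needs an internet connection
  }
  invisible(to_install)
}

# 'stats' and 'utils' come with every R installation, so this call installs nothing
already_there <- install_if_missing(c("stats", "utils"))
length(already_there)
```

For a package used in the tutorials, such as rio, the call would be `install_if_missing("rio")`, followed by `library(rio)` to load it.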
RStudio is the most commonly used editor for working with R. RStudio makes it easy to write and save code (the instructions for the tasks you want R to execute), to view and plot your data, and to manage your workspace (e.g., the code, data files, and output you are working with).
Our goal is to make this series of tutorials self-sufficient. This means that there are no prerequisites in terms of knowledge of working with R and RStudio. We will start from the very beginning, with how to install R and RStudio on your computer, how to set up RStudio for an easy workflow, and the very basics of working with data. If you are familiar with other programming languages or statistical analysis software (like Stata or SPSS), you will be able to learn R even faster.
Importantly, these tutorials are not a substitute for education in quantitative research methods. They do teach you how to implement different methods in R, but they do not cover questions about research design, what the best method might be for the question you are asking, how these methods work and what their assumptions are. Thus, you are responsible for making sure that your analyses are sound.
If you have not worked with R before, it is best to follow the series of tutorials from the beginning. Before we cover specific methods in the social sciences, we start with the installation of R and RStudio, clarify the most important basics for working with R, and teach you how to import data into R.
If you are generally familiar with R, you can skip the tutorials on the R basics. If you want to follow tutorials on several methods, we recommend that you have a brief look at the tutorial on downloading the data from the World Values Survey (WVS). The WVS is used throughout most of the following tutorials, so it is a good idea to download the data set before continuing. If you are also familiar with importing data, you can jump right to the tutorials on methods.
If you are generally familiar with R, already have your own data set, and just want to know how to implement a specific method in R, you can simply jump to the tutorial that covers the method you are interested in.
In these tutorials we start with, and focus on, base R for data manipulation and other work, instead of the tidyverse approach. We think that useRs should first be comfortable with the base R commands; those who wish can later transition to the tidyverse as an extension of R, rather than as the default.
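To give a flavour of what base R data manipulation looks like, here is a minimal sketch using a small made-up data frame. The data and the variable names (`survey`, `item1`, and so on) are hypothetical and are not taken from the WVS or from the tutorials themselves.

```r
# Hypothetical toy data; all names and values are made up for illustration
survey <- data.frame(
  id    = 1:5,
  age   = c(23, 35, 47, 52, 19),
  item1 = c(1, 2, 3, 4, 5),
  item2 = c(5, 4, 3, 2, 1),
  item3 = c(2, 2, 4, 4, 5)
)

# Reverse-code item2, assuming a 1-5 response scale
survey$item2_r <- 6 - survey$item2

# Composite score: the mean of the three items for each respondent
survey$score <- rowMeans(survey[, c("item1", "item2_r", "item3")])

# Select a subset: respondents aged 30 or older, keeping three columns
adults30 <- survey[survey$age >= 30, c("id", "age", "score")]
adults30
```

Everything here uses only functions that come with R (`data.frame()`, `rowMeans()`, and indexing with `$` and `[ , ]`), so no packages need to be installed.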
The tutorials' `.Rmd` and `.md` files are found in the `tutorials` folder. The following section links to the respective `.md` files in a structured format, so that you can navigate the tutorials from this page.
Here you will find links to the respective tutorials, each with a short description. They are ordered by increasing complexity, roughly following the structure of a course.
- Install R and RStudio for Mac: step-by-step instructions to install R and RStudio on the Mac operating system.
- Install R and RStudio for Windows: step-by-step instructions to install R and RStudio on the Windows operating system.
- Setting up RStudio: explanation of RStudio and how to set it up.
- R Basics: basic use of R, data types, data structures, importing data, functions, install and load packages, working directory and saving your work.
- Download the WVS data set: where to request access to the World Values Survey (WVS) data set, used in many of the tutorials.
- Import data sets (long): introduction to data formats, Base R, use of packages foreign, haven and rio, general recommendations. Packages: rio, haven, foreign.
- Import data sets (short): introduction to data formats, use of package rio, general recommendations.
- Data management 1: setting up R session, import data set, recode items, create composite scores, variable calculations, selecting subsets. Packages: rio, car, psych.
- Data management 2: setting up R session, import data set, variable types, use the factor function.
- Descriptive Statistics: setting up R session, import data set, continuous items, categorical items, data frame summary. Packages: rio, summarytools.
- Basic plots: setting up R session, import data set, ggplot2 basics, histogram, scatter plot, bar plot, box plot. Packages: rio, ggplot2.
- Correlation: setting up R session, import data set, scatter plot, Pearson correlation, Spearman correlation, Kendall's tau correlation, extracting the matrices, correlogram, pairs plot. Packages: rio, psych, corrplot, ggplot2, GGally.
- Contingency tables
- Reliability: what is reliability analysis?, preparation, reliability analysis (Cronbach’s alpha, McDonald’s omega). Packages: rio, psych, car.
- t-test: setting up R session, import data set, difference in means, one sample t-test, two sample t-test, paired sample t-test, effect sizes, graphing your results. Packages: rio, effectsize, ggpubr.
- Repeated Measures - ANOVA: setting up R session, import data set, convert data from wide to long format, run repeated measures ANOVA and mixed design RM-ANOVA, effect size, post-hoc pairwise and planned comparisons, plot effects. Packages: rio, dplyr, reshape2, marginaleffects, afex, sjlabelled, effectsize.
- MANOVA: introduction to MANOVA, setting up R session, import data set, MANOVA, effect sizes, homogeneity of variances, HE plots, LDA.
- Linear regression: setting up R session, import data set, simple linear regression, linear regression with a binary predictor, multiple linear regression, standardized solution, assumptions, effect size, plots, interpretation. Packages: rio, psych, effectsize, visreg, rockchalk, ggplot2.
- Moderation with `lm`: what is moderation, setting up R session, import data set, moderation analysis steps, categorical and continuous moderator, main effects, interaction models, effect size, probing, plotting, interpretation. Packages: rio, effectsize, visreg, reghelper.
- Moderation with the PROCESS macro: what is moderation, installing PROCESS macro, setting up R session, import data set, moderation analysis steps, categorical and continuous moderator, main effects, interaction models, effect size, probing, plotting, interpretation. Packages: rio, PROCESS macro.
- Mediation with path analysis: what is mediation, setting up R session, import data set, mediation analysis steps, total effect, indirect effect, NHST methods, recommendations and interpretation. Packages: rio, lavaan, semTools.
- Mediation with the PROCESS macro: what is mediation, installing PROCESS macro, setting up R session, import data set, mediation analysis steps, total effect, indirect effect, NHST methods, recommendations and interpretation. Packages: rio, PROCESS macro.
- LCA with depmixS4 (categorical indicators): latent class analysis, depmixS4 basics, dichotomous indicator example, class enumeration, and interpretation. Packages: rio, depmixS4, sjlabelled, summarytools, ggplot2.
- LCA with tidySEM (categorical indicators): latent class analysis, tidySEM basics, dichotomous indicator example, class enumeration, and interpretation. Packages: rio, tidySEM, sjlabelled, summarytools, ggplot2, tidyr.
- HMM with depmixS4 (categorical indicators): hidden Markov models, depmixS4 basics, dichotomous indicator example, time invariant and time variant HMM, class enumeration, plots. Packages: rio, depmixS4, sjlabelled, summarytools, ggplot2, ggseqplot, tidyr, TraMineR.
- Confirmatory Factor Analysis with continuous indicators: test theory, measurement model, data preparation, CFA, lavaan, estimation, model evaluation, reliability, presenting results. Packages: psych, lavaan, semTools, car, tidyr, ggplot2.
- Exploratory Factor Analysis with continuous indicators: test theory, measurement model, data preparation, EFA, estimation, rotation, factor enumeration, cross-validation. Packages: psych, lavaan, semTools, car, tidyr, ggplot2, patchwork.
As with most things in life, these tutorials are a work in progress, so we will continue updating them and adding new ones.
These tutorials started as a request from the Sociology department, which is transitioning away from proprietary software. But we expect them to be useful beyond the department's needs.
You are welcome to suggest new tutorials, and/or collaborate on one.