-
Notifications
You must be signed in to change notification settings - Fork 0
/
intro.qmd
84 lines (59 loc) · 5.6 KB
/
intro.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
# Introduction {.unnumbered}
```{r}
#| label: libs
#| echo: false
#| message: false
library(igraph)
library(ggraph)
```
## What you will learn
## How this book is organized
## What you won't learn
## Prerequisites
*(This section is taken and slightly adapted from R for Data Science 2e)*
We’ve made a few assumptions about what you already know to get the most out of this book. You should be generally numerically literate, and it’s helpful if you have some basic programming experience already. If you’ve never programmed before, you might find [Hands on Programming](https://rstudio-education.github.io/hopr/) with R by Garrett to be a valuable adjunct to this book.
You need four things to run the code in this book: R, an IDE, a collection of "base" R packages, and a handful of other packages. Packages are the fundamental units of reproducible R code. They include reusable functions, documentation that describes how to use them, and sample data.
### R
To download R, go to CRAN, the **c**omprehensive **R** **a**rchive **n**etwork, <https://cloud.r-project.org>. A new major version of R comes out once a year, and there are 2-3 minor releases each year. It’s a good idea to update regularly. Upgrading can be a bit of a hassle, especially for major versions that require you to re-install all your packages, but putting it off only makes it worse. We recommend R 4.3.0 or later for this book.
### IDE
We make no assumption about your **i**ntegrated **d**evelopment **e**nvironment. RStudio is a popular choice for R programming, which you can download from <https://posit.co/download/rstudio-desktop/>. RStudio is updated a couple of times a year, and it will automatically let you know when a new version is out, so there’s no need to check back. It’s a good idea to upgrade regularly to take advantage of the latest and greatest features. For this book, make sure you have at least RStudio 2022.02.0.
But nothing will stop you to use other environments, such as vscode, emacs, or vi if you are already used to them.
### The "base" Packages
When it comes to performing data science tasks with R, there are many excellent packages (or even package ecosystems) to choose from. There is for instance the `tidyverse`, `data.table`, or `rpolars`. All have there advantages and disadvantages but ultimately, they allow to do similar tasks with varying syntax.
In the R world for networks, there are far less choices and for the most basic network analytic tasks, there are essentially two packages: `igraph` and `sna`.
We will discuss these packages here and motivate why we would recommend `igraph` as the goto package for standard network analytic tasks. But that does not render `sna` and its companion `network` void. On the contrary, there are some essential modelling tasks which can only be done within the realm of `network`.
Both `igraph` and `network` also provide data structures which facilitate to store and process network data. There also once existed the package `graph` but that is not available on CRAN anymore, only via Bioconductor.
The figure below shows how many packages on CRAN rely on those three packages (i.e. they are mentioned in `Depends`, `Imports`, or `Suggests`).
```{r}
#| label: crannet
#| echo: false
#| message: false
sg <- readRDS("data/crannet.RDS")
ggraph(sg, "stress") +
geom_edge_link0(
edge_color = "grey66", edge_width = 0.3,
arrow = arrow(
angle = 15, length = unit(0.15, "inches"),
ends = "last", type = "closed"
)
) +
geom_node_point(shape = 21, aes(fill = col, size = seed)) +
scale_fill_brewer(type = "qual", name = "") +
scale_size_manual(values = c(2, 5), guide = "none") +
guides(fill = guide_legend(override.aes = list(size = 5))) +
theme_void() +
theme(legend.position = "bottom") +
coord_equal(clip = "off")
```
The figure was produced with the help of the `cranet` package.
`igraph` seems to be clearly favored by the R community. So if you install a package for, say, signed network analysis,
changes are high that it depends on the graph structures provided by `igraph`. Besides the data structures,
the package offers a large variety of network analytic methods which are all implemented in C. The methods are well optimized and
also work nicely for large graphs.
The `network` package historically shares some commonalities with `igraphs` data structures. The package itself, though is really only providing the data structure and no analytic methods. The `sna` package ([link](https://cran.r-project.org/package=sna)) implements network analytic tools using the data structures of `network`.
Overall, the syntax and provided methods are very much comparable between `igraph` and `sna` and they are almost interchangeable in this regard. The advantage of igraph is its speed. I have run several benchmark tasks and `igraph` usually comes out on top. That being said, there is no real case to be made against `network`/`sna`. If you are into statistical modelling of networks, then that should actually be the preferred choice since the `ergm` package is build on top of `network`. In this case you probably also
want to look at the meta package `statnet` ([link](http://statnet.org/)) which includes `network`, `sna`, and `ergm` (among other packages).
Note that the package `intergraph` ([link](https://cran.r-project.org/package=intergraph)) can be used to quickly switch representations between `igraph` and `network`.
The `igraph` package will play a major role in the Part on **Descripitve Network Analysis**, while `network` will be more prominent for **Inferential Network Analysis**.
### Other packages
[LIST USED PACKAGES WITH DESCRIPTION]