-
Notifications
You must be signed in to change notification settings - Fork 0
/
fmm_read_bench.Rmd
221 lines (181 loc) · 7.1 KB
/
fmm_read_bench.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
---
title: "Benchmarking Sparse Matrix Market Read Operations"
author: "Rohit Goswami"
date: "2023-11-03"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Benchmarking Sparse Matrix Market Read Operations}
%\VignetteEngine{knitr::rmarkdown}
---
## Introduction
This vignette demonstrates a benchmark comparing the `readMM` function from the
`Matrix` package against the `read_fmm` function from the `fastMatMR` package.
Since `Matrix` does not support reading or writing dense matrices, we focus on
the sparse case.
## Loading Packages
First, we load the necessary packages:
```r
library(Matrix)
library(fastMatMR)
library(microbenchmark)
library(ggplot2)
```
## Benchmarking with Fixed Sparsity
We first benchmark for varying matrix sizes with fixed sparsity.
```r
# Function to create a sparse matrix of given size
create_sparse_matrix <- function(n, sparsity = 0.7) {
mat <- matrix(0, nrow = n, ncol = n)
for (i in 1:n) {
for (j in 1:n) {
if (runif(1) > sparsity) {
mat[i, j] <- rnorm(1)
}
}
}
return(Matrix(mat, sparse = TRUE))
}
# Define a range of matrix sizes
sizes <- c(10, 100, 500, 1000, 2000, 3000)
# Prepare data frame to store results
results_fixed_sparsity <- data.frame()
# Benchmarking
for (n in sizes) {
message("Benchmarking for matrix size: ", n, "x", n)
# Generate a sparse matrix of size n x n
testmat <- create_sparse_matrix(n)
write_fmm(testmat, "sparse.mtx")
# Run the benchmarks, we coerce to a sparse matrix for readMM for fairness
bm <- microbenchmark(
Matrix_readMM = as(readMM("sparse.mtx"), "CsparseMatrix"),
fastMatMR_read_fmm = fmm_to_sparse_Matrix("sparse.mtx"),
times = 10
)
bm$size <- n
results_fixed_sparsity <- rbind(results_fixed_sparsity, bm)
}
#> Benchmarking for matrix size: 10x10
#> Benchmarking for matrix size: 100x100
#> Benchmarking for matrix size: 500x500
#> Benchmarking for matrix size: 1000x1000
#> Benchmarking for matrix size: 2000x2000
#> Benchmarking for matrix size: 3000x3000
```
This is shown visually represented below:
```r
# Plotting
suppressWarnings(print(
ggplot(results_fixed_sparsity, aes(x = size, y = time, color = expr)) +
geom_point() +
geom_smooth(method = "loess") +
ggtitle("Benchmarking reads with fixed sparsity for 70% sparsity") +
xlab("Matrix Size") +
ylab("Time (ns)")
))
#> `geom_smooth()` using formula = 'y ~ x'
```
<div class="figure">
<img src="fixed-sparse-read-1.png" alt="plot of chunk fixed-sparse-read" width="800" />
<p class="caption">plot of chunk fixed-sparse-read</p>
</div>
## Benchmarking with Varying Sparsity
Now, we benchmark for varying sparsity patterns on a large matrix.
```r
# Sparsity levels to test
sparsity_levels <- seq(0.45, 0.95, by = 0.1)
# Prepare data frame to store results
results_varying_sparsity <- data.frame()
# Benchmarking
for (sparsity in sparsity_levels) {
message("Benchmarking for sparsity level: ", sparsity)
# Generate a sparse matrix of size 2000 x 2000 with varying sparsity
testmat <- create_sparse_matrix(2000, sparsity)
write_fmm(testmat, "sparse.mtx")
# Run the benchmarks
bm <- microbenchmark(
Matrix_readMM = as(readMM("sparse.mtx"), "CsparseMatrix"),
fastMatMR_read_fmm = fmm_to_sparse_Matrix("sparse.mtx"),
times = 10
)
bm$sparsity <- sparsity
results_varying_sparsity <- rbind(results_varying_sparsity, bm)
}
#> Benchmarking for sparsity level: 0.45
#> Benchmarking for sparsity level: 0.55
#> Benchmarking for sparsity level: 0.65
#> Benchmarking for sparsity level: 0.75
#> Benchmarking for sparsity level: 0.85
#> Benchmarking for sparsity level: 0.95
```
Now we can plot this:
```r
ggplot(results_varying_sparsity, aes(x = sparsity, y = time, color = expr)) +
geom_point() +
geom_smooth(method = "loess") +
scale_x_log10() +
scale_y_log10() +
ggtitle("Benchmarking reads with varying sparsity for 2000 entries") +
xlab("Sparsity Level (log10)") +
ylab("Time (ns, log10)")
#> `geom_smooth()` using formula = 'y ~ x'
```
<div class="figure">
<img src="varying-sparse-read-1.png" alt="plot of chunk varying-sparse-read" width="800" />
<p class="caption">plot of chunk varying-sparse-read</p>
</div>
## Conclusions
We see that though there are no statistically significant differences in speed
for small matrices, the `fastMatMR` package is significantly faster for large
matrices. This is because the `readMM` function from the `Matrix` reads data
into a triplet form, which gets slower for larger matrices.
## Session Info
This vignette was computed in advance, with the corresponding session info:
```r
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Arch Linux
#>
#> Matrix products: default
#> BLAS: /usr/lib/libblas.so.3.11.0
#> LAPACK: /usr/lib/liblapack.so.3.11.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Iceland
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_3.4.4 microbenchmark_1.4.10 Matrix_1.5-4.1
#> [4] fastMatMR_1.2.5 testthat_3.1.10
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.4 xfun_0.40 htmlwidgets_1.6.2 devtools_2.4.5
#> [5] remotes_2.4.2.1 processx_3.8.2 lattice_0.21-8 callr_3.7.3
#> [9] generics_0.1.3 vctrs_0.6.3 tools_4.3.1 ps_1.7.5
#> [13] parallel_4.3.1 tibble_3.2.1 fansi_1.0.4 highr_0.10
#> [17] pkgconfig_2.0.3 desc_1.4.2 lifecycle_1.0.3 farver_2.1.1
#> [21] compiler_4.3.1 stringr_1.5.0 brio_1.1.3 munsell_0.5.0
#> [25] decor_1.0.2 httpuv_1.6.11 htmltools_0.5.6 usethis_2.2.2
#> [29] later_1.3.1 pillar_1.9.0 crayon_1.5.2 urlchecker_1.0.1
#> [33] ellipsis_0.3.2 cachem_1.0.8 sessioninfo_1.2.2 nlme_3.1-162
#> [37] mime_0.12 commonmark_1.9.0 tidyselect_1.2.0 digest_0.6.33
#> [41] stringi_1.7.12 dplyr_1.1.2 purrr_1.0.2 labeling_0.4.3
#> [45] splines_4.3.1 rprojroot_2.0.3 fastmap_1.1.1 grid_4.3.1
#> [49] colorspace_2.1-0 cli_3.6.1 magrittr_2.0.3 pkgbuild_1.4.2
#> [53] utf8_1.2.3 withr_2.5.0 prettyunits_1.1.1 scales_1.2.1
#> [57] promises_1.2.1 cpp11_0.4.6 roxygen2_7.2.3 memoise_2.0.1
#> [61] shiny_1.7.5 evaluate_0.21 knitr_1.43 miniUI_0.1.1.1
#> [65] mgcv_1.8-42 profvis_0.3.8 rlang_1.1.1 Rcpp_1.0.11
#> [69] xtable_1.8-4 glue_1.6.2 xml2_1.3.5 pkgload_1.3.2.1
#> [73] rstudioapi_0.15.0 R6_2.5.1 fs_1.6.3
```