/
userguide.mdpp
673 lines (524 loc) · 30.7 KB
/
userguide.mdpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
micompm - Multivariate independent comparison of observations
=============================================================
!TOC
## Introduction
### What is micompm?
_micompm_ is a [MATLAB]/[Octave] port of the original [micompr] [R] package for
comparing multivariate samples associated with different groups. It uses
principal component analysis to convert multivariate observations into a set of
linearly uncorrelated statistical measures, which are then compared using a
number of statistical methods. This technique is independent of the
distributional properties of samples and automatically selects features that
best explain their differences, avoiding manual selection of specific points or
summary statistics. The procedure is appropriate for comparing samples of time
series, images, spectrometric measures or similar multivariate observations.
If you use _micompm_, please cite reference [\[1\]][ref1].
### Basic concepts
Given _n_ observations (_m_-dimensional), PCA can be used to obtain their
representation in the principal components (PCs) space (_r_-dimensional). PCs
are ordered by decreasing variance, with the first PCs explaining most of the
variance in the data. At this stage, PCA-reshaped observations associated with
different groups can be compared using statistical methods. More specifically,
hypothesis tests can be used to check if the sample projections on the PC space
are drawn from populations with the same distribution. There are two possible
lines of action:
1. Apply a [MANOVA] test to the samples, where each observation has
_q_-dimensions, corresponding to the first _q_ PCs (dimensions) such that these
explain a user-defined minimum percentage of variance.
2. Apply a univariate test to observations in individual PCs. Possible tests
include the [_t_-test] and the [Mann-Whitney _U_ test] for comparing two
samples, or [ANOVA] and [Kruskal-Wallis test], which are the respective
parametric and non-parametric versions for comparing more than two samples.
The MANOVA test yields a single _p_-value from the simultaneous comparison of
observations along multiple PCs. An equally succinct answer can be obtained
with a univariate test using the [Bonferroni] correction or a similar method
for handling _p_-values from multiple comparisons.
Conclusions concerning whether samples are statistically similar can be drawn
by analyzing the _p_-values produced by the employed statistical tests, which
should be below the typical 1% or 5% when samples are significantly different.
The scatter plot of the first two PC dimensions can also provide visual,
although subjective feedback on sample similarity.
While the procedure is appropriate for comparing multivariate observations with
highly correlated and similar scale dimensions, assessing the similarity of
“systems” with multiple outputs of different scales is also possible. The
simplest approach would be to apply the proposed method to samples of
individual outputs, and analyze the results in a multiple comparison context.
An alternative approach consists in concatenating, observation-wise, all
outputs, centered and scaled to the same order of magnitude, thus reducing a
“system” with k outputs to a “system” with one output. The proposed method
would then be applied to samples composed of concatenated observations
encompassing the existing outputs. This technique is described in detail in
reference [\[1\]][ref1] in the context of comparing simulation outputs.
### Available functions
The typical _micompm_ workflow is based on the following functions:
* [grpoutputs] - Load and group outputs from files containing multiple
observations of the groups to be compared.
* [cmpoutput] - Compares output observations from two or more groups.
* [micomp] - Performs multiple independent comparisons of output observations.
* [micomp_show] - Generate tables and plots of model-independent comparison of
observations.
* [cmpassumptions] - Verify the assumptions for parametric tests applied to
the comparison of output observations.
Note that some of the statistical tests available in [MATLAB] and [Octave],
used in the [cmpoutput] and [cmpassumptions] functions, return slightly
different _p_-values. _micompm_ also provides and uses additional [helper] and
[3rd party] functions.
## The _micompm_ workflow
### Getting started
Clone or download _micompm_. Then, either: 1) launch [MATLAB]/[Octave] directly
from the `micompm` folder; or, 2) within [MATLAB]/[Octave], `cd` into the
`micompm` folder and execute the [startup] script.
Note that _micompm_ requires the [MATLAB Statistics toolbox] (when used with
[MATLAB]) or the [Octave statistics package] (when used with [Octave]).
### Data layout and data file format
The basic data layout used in _micompm_ consists of _n_ x _m_ matrices, each
matrix representing an output, where _n_ is the number of observations and _m_
corresponds to the number of variables or dimensions. Individual observations
in these matrices must be associated with a group. This association is
expressed with a _n_-dimensional integer vector, where its _i_ th value
corresponds to the _i_ th observation (row) of the output matrix. For example:
```
data =
0.5341 0.6815 0.0190 0.5497 46.4804
0.0900 3.3933 0.0495 8.5071 34.8334
0.1117 2.4759 0.0148 5.6056 29.1395
0.6523 0.0037 0.1980 5.6094 19.8188
0.7032 6.0581 0.1055 1.5949 12.6425
0.7911 4.2880 0.0959 3.4867 18.5939
groups =
1 1 1 2 2 2
```
In this example, the `data` matrix contains six five-dimensional
observations. The `groups` vector specifies that the first three observations
(rows) are associated with group 1, while the last three belong to group 2.
The [grpoutputs] function loads and groups outputs from files containing
observations of the groups to be compared. Each individual file represents an
observation, and must be comprised of numerical data with _m_ rows and _g_
columns, where rows correspond to dimensions or variables and columns to
different outputs. The [grpoutputs] function returns two variables: 1) a cell
array containing _g_ output matrices (_n_ x _m_); and, 2) a _n_-dimensional
integer vector defining the group each observation belongs to. In other words,
[grpoutputs] returns data ready to be used in other _micompm_ functions.
### Compare observations from two or more groups
The [cmpoutput] function compares observations from two or more groups and is
at the core of the _micompm_ toolbox. Its prototype is as follows:
```matlab
[npcs, p_mnv, p_par, p_npar, scores, varexp] = cmpoutput(ve, data, groups, summary)
```
The first parameter, `ve`, specifies the percentage of variance which must be
explained by the principal components (PCs) compared in the [MANOVA] test. More
precisely, it determines the number of PCs used in the test. The second
parameter, `data`, is the _n_ x _m_ output matrix containing the data to be
compared, while the third parameter is the _n_-dimensional vector specifying
the `groups` to which the observations in `data` belong to. The last parameter,
`summary`, is optional and defines the number of PCs to show the _p_-values for
in the case of univariate tests. It can be set to 0 in order to completely
suppress the comparison summary. Besides printing this summary, [cmpoutput]
returns the following information:
* `npcs` - Number of principal components which explain `ve` percentage of
variance.
* `p_mnv` - _P_-values for the [MANOVA] test for `npcs` principal components.
* `p_par` - Vector of _p_-values for the parametric test applied to groups
along each principal component ([_t_-test] for 2 groups, [ANOVA] for more than
2 groups).
* `p_npar` - Vector of _p_-values for the non-parametric test applied to groups
along each principal component ([Mann-Whitney _U_ test] for 2 groups,
[Kruskal-Wallis test] for more than 2 groups).
* `scores` - _n_ x (_n_ - 1) matrix containing projections of output data in
the principal components space. Rows correspond to observations, columns to
principal components.
* `varexp` - Percentage of variance explained by each principal component.
<a name="verifyassumptions"></a>
### Verify assumptions for the performed parametric tests
The [cmpoutput] function performs several statistical tests, including the
[_t_-test] (on each PC) and [MANOVA] (on the number of PCs that explain `ve`
percentage of variance). These two tests are parametric, which means they
expect samples to be drawn from distributions with particular characteristics,
namely that: 1) they are drawn from a normally distributed population; and, 2)
they are drawn from populations with equal variances. The [cmpassumptions]
function performs additional tests that verify these assumptions. It is invoked
as follows:
```matlab
[p_unorm, p_mnorm, p_uvar, p_mvar] = cmpassumptions(scores, groups, npcs, summary)
```
The function accepts as arguments the PCA `scores` returned by [cmpoutput],
the `groups` to which the observations in `scores` belong to, and the number of
PCs (for the multivariate comparison with [MANOVA]). The `summary` argument
plays a similar role to [cmpoutput]'s equivalent. [cmpassumptions] returns
_p_-values for the assumptions tests, namely:
* `p_unorm` - Matrix _p_-values from the [Shapiro-Wilk] test for univariate
normality, rows correspond to groups, columns to PCs.
* `p_mnorm` - Vector of _p_-values from the [Royston] test of multivariate
normality (on `npcs`), one _p_-value per group.
* `p_uvar` - Vector of _p_-values from the [Bartlett's] test for equality of
variances, one _p_-value per PC.
* `p_mvar` - _P_-value from the [Box's M] test for the homogeneity of
covariance matrices (on `npcs`).
_P_-values less than the typical 0.05 or 0.01 thresholds may be considered
statistically significant, casting doubt on the respective assumption. However,
as discussed in reference [\[1\]][ref1], analysis of these these _p_-values is
often more elaborate.
### Multiple comparisons and different outputs
The [micomp] function performs multiple comparisons of different outputs,
making use the functionality provided by [grpoutputs] and [cmpoutput]. It is
invoked as follows:
```matlab
c = micomp(nout, ccat, ve, varargin)
```
The `nout` parameter specifies the number of outputs to compare, including an
optional concatenated output. The centering and scaling method for the optional
concatenated output is given in the `ccat` parameter (available options are
'center', 'auto', 'range', 'iqrange', 'vast', 'pareto' or 'level'); if `ccat`
is set to 0 or '', the concatenated output is not generated. The `ve` argument
defines the percentage of variance explained by the _q_ principal components
(i.e. number of dimensions) used in the [MANOVA] test. The remaining arguments,
`varargin`, define the data and the comparisons to be performed.
The [micomp] function returns a struct with several fields containing the
results provided by [cmpoutput] for all comparisons and outputs.
### Visualization and publication quality tables
Plots and publication quality tables summarizing the performed comparisons can
be generated with the [micomp_show] function. The prototype of this function is
simple:
```matlab
[tbl, fids] = micomp_show(type, c, nout, ncomp)
```
The first parameter, `type`, defines the type of output to generate. If set to
0, a LaTeX table is generated, while setting it to 1 yields a plain text table.
Setting `type` to 2 also produces a plain text table, but additionally
generates a number of plots summarizing the performed comparisons. The second
parameter, `c`, is the struct returned by the [micomp] function. `nout` and
`ncomp` are the number of outputs and comparisons, respectively.
The [micomp_show] function returns `tbl`, containing the generated table, and
`fids`, which are the handles of the generated plots, if any.
## Tutorial
The tutorial uses the following dataset, which corresponds to the results
presented in reference [\[1\]][ref1]:
* [![DOI](https://zenodo.org/badge/doi/10.5281/zenodo.46848.svg)](http://dx.doi.org/10.5281/zenodo.46848)
Unpack the dataset to any folder and put the complete path to this folder in
variable `datafolder`:
```matlab
datafolder = 'path/to/dataset';
```
The dataset contains output from several implementations or variants of the
[PPHPC] agent-based model. The [PPHPC] model, discussed in reference
[\[2\]][ref2], is a realization of prototypical predator-prey system with six
outputs:
1. Sheep population
2. Wolves population
3. Quantity of available grass
4. Mean sheep energy
5. Mean wolves energy
6. Mean value of the grass countdown parameter
The following implementations or variants of the [PPHPC] model were used to
generate the data:
1. [Canonical NetLogo][pphpc_netlogo] implementation.
2. Parallel [Java][pphpc_java] implementation.
3. Parallel [Java][pphpc_java] variant with agent shuffling disabled.
4. Parallel [Java][pphpc_java] variant with a slight difference in one the of
simulation parameters.
The first two implementations strictly follow the [PPHPC] conceptual model
[\[2\]][ref2], and should generate statistically similar outputs. Variants 3
and 4 are purposefully misaligned, and should yield outputs with statistically
significant differences from the first two.
The datasets were collected under five different model sizes (100 _x_ 100, 200
_x_ 200, 400 _x_ 400, 800 _x_ 800 and 1600 _x_ 1600) and two distinct
parameterizations (_v1_ and _v2_), as discussed in reference [\[1\]][ref1]. For
the remainder of this tutorial we will focus on model size 400 _x_ 400 and
parameterization _v1_.
### Loading simulation data
The [grpoutputs] function can be used to load the model outputs into the format
required by [cmpoutput]. The idea here is to group outputs from implementation
1, which is the canonical model realization, with outputs from the remaining
implementations. For example, the following command will load data from
implementations 1 and 2, which are supposedly aligned:
```matlab
[o_12, g_12] = grpoutputs('range', [datafolder '/nl_ok'], 'stats400v1*.txt', [datafolder '/j_ex_ok'], 'stats400v1*.txt');
```
The `o_12` variable contains a cell array of seven matrices, each matrix
corresponding to one of the six outputs, plus a seventh concatenated output
(range scaled). Since the data contains 30 runs during 4001 iterations of each
model implementation, individual matrices have 60 rows and 4001 columns. The
seventh matrix, containing the concatenated output, contains 24006 columns
(4000 _x_ 6). In turn, `g_12`, a vector of length 60, specifies the
implementations to which the runs are associated with.
Similarly, outputs from implementations 1 and 3, the latter with a small
realization difference, can be loaded as follows:
```matlab
[o_13, g_13] = grpoutputs('range', [datafolder '/nl_ok'], 'stats400v1*.txt', [datafolder '/j_ex_noshuff'], 'stats400v1*.txt');
```
Finally, the following command groups outputs from implementations 1 and 4:
```matlab
[o_14, g_14] = grpoutputs('range', [datafolder '/nl_ok'], 'stats400v1*.txt', [datafolder '/j_ex_diff'], 'stats400v1*.txt');
```
### Comparing implementation outputs
Simulation outputs can be compared individually with the [cmpoutput] function.
For example, the following instructions compares the first output (sheep
population) of implementations 1 and 2, requesting that 90% of the variance be
explained by the PCs used in the [MANOVA] test:
```matlab
cmpoutput(0.9, o_12{1}, g_12);
```
Note that the first output is specified in the index of the `o_12` cell array.
The command produces the following output:
```
P-values for the parametric test (t-test)
-----------------------------------------
PC01 PC02 PC03 PC04 PC05 PC06 PC07 PC08
0.53 0.0206 0.319 0.763 0.378 0.756 0.308 0.783 ...
P-values for the non-parametric test (Mann-Whitney U test)
----------------------------------------------------------
PC01 PC02 PC03 PC04 PC05 PC06 PC07 PC08
0.483 0.0215 0.464 0.865 0.42 0.684 0.318 0.807 ...
P-value for the MANOVA test (24 PCs, 90.17% of variance explained)
-------------------------------------------------------------------
0.498
```
Since no _p_-values are significant (i.e., <0.01 or <0.05), implementations 1
and 2 seem to be aligned, at least with respect to the first output. In order
to draw more concrete conclusions, all six outputs should be compared.
Nonetheless, comparing only the optional concatenated output (in the 7th
position of the `o_12` cell array) provides a good summary of the overall
alignment of the two implementations:
```matlab
cmpoutput(0.9, o_12{7}, g_12);
```
This command yields:
```
P-values for the parametric test (t-test)
-----------------------------------------
PC01 PC02 PC03 PC04 PC05 PC06 PC07 PC08
0.879 0.0405 0.389 0.524 0.377 0.699 0.326 0.695 ...
P-values for the non-parametric test (Mann-Whitney U test)
----------------------------------------------------------
PC01 PC02 PC03 PC04 PC05 PC06 PC07 PC08
0.935 0.0224 0.395 0.438 0.234 0.438 0.304 0.559 ...
P-value for the MANOVA test (39 PCs, 90.63% of variance explained)
-------------------------------------------------------------------
0.565
```
The second PC _p_-values are slightly significant (<0.05). However, as
discussed in reference [\[1\]][ref1], a few significant _p_-values are to be
expected, and output misalignments are mostly reflected in the first PC
_p_-values. As such, and considering that the _p_-values are generally
non-significant, it is not possible to show that the implementations are
misaligned.
Typically, how does implementation or output misalignment manifests itself?
Comparing an output from implementations 1 and 3, the latter with a small
algorithmic difference with respect to the conceptual model, answers this
question:
```matlab
cmpoutput(0.9, o_13{1}, g_13);
```
In this case, the [cmpoutput] function produces the following summary:
```
P-values for the parametric test (t-test)
-----------------------------------------
PC01 PC02 PC03 PC04 PC05 PC06 PC07 PC08
0.0421 0.128 0.00385 0.767 0.37 0.000117 0.461 0.333 ...
P-values for the non-parametric test (Mann-Whitney U test)
----------------------------------------------------------
PC01 PC02 PC03 PC04 PC05 PC06 PC07 PC08
0.0468 0.105 0.00762 0.784 0.549 0.000168 0.569 0.549 ...
P-value for the MANOVA test (24 PCs, 90.82% of variance explained)
-------------------------------------------------------------------
1.44e-13
```
There are some significant univariate _p_-values, namely for PC01 (<0.05), PC03
and PC06 (both <0.01). However, the multivariate _p_-value, produced by the
[MANOVA] test for the first 24 PCs, is clearly significant. These results
suggest that implementations 1 and 3 generate statistically dissimilar
behaviors with respect to the sheep population output.
Finally, comparing the outputs of implementations 1 and 4 clarifies how
[cmpoutput] behaves when one of the input parameters of the model is modified
(as is the case of implementation 4):
```matlab
cmpoutput(0.8, o_14{2}, g_14);
```
To change things a bit, in this case we compare output 2 (wolves population),
and request that only 80% of the variance is explained by the PCs used in the
[MANOVA] test. The following summary is generated:
```
P-values for the parametric test (t-test)
-----------------------------------------
PC01 PC02 PC03 PC04 PC05 PC06 PC07 PC08
8.67e-67 0.966 0.906 0.8 0.935 0.695 0.854 0.929 ...
P-values for the non-parametric test (Mann-Whitney U test)
----------------------------------------------------------
PC01 PC02 PC03 PC04 PC05 PC06 PC07 PC08
3.02e-11 0.641 0.9 0.762 0.888 0.559 0.819 0.982 ...
P-value for the MANOVA test (2 PCs, 81.82% of variance explained)
------------------------------------------------------------------
9.4e-65
```
Results show highly significant univariate (PC01) and multivariate _p_-values.
Thus, it is possible to conclude that implementations 1 and 4 display
considerably different behavior.
### Assessing distributional assumptions
In the previous section the _p_-values of the [_t_][_t_-test] and [MANOVA]
tests were used to draw conclusions concerning the alignment of simulation
outputs. However, as [already discussed](#verifyassumptions), these tests make
a number of distributional assumptions. In the next example, the
[cmpassumptions] function is used to evaluate the assumptions underlying the
last comparison in the previous section, namely the comparison of the second
output of implementations 1 and 4.
First, [cmpoutput] is invoked again, but this time its return values are kept.
Additionally, the optional `summary` parameter is set to zero, since we do not
require the summary to be displayed:
```matlab
[npcs, p_mnv, p_par, p_npar, scores, varexp] = cmpoutput(0.8, o_14{2}, g_14, 0);
```
Now [cmpassumptions] can be invoked:
```matlab
cmpassumptions(scores, g_14, npcs);
```
For this comparison, [cmpassumptions] generates the following summary:
```
P-values for the Shapiro-Wilk test (univariate normality)
---------------------------------------------------------
PC01 PC02 PC03 PC04 PC05 PC06 PC07 PC08
Group 1 0.498 0.222 0.426 0.707 0.384 0.338 0.631 0.864 ...
Group 2 0.029 0.342 0.415 0.622 0.867 0.843 0.818 0.861 ...
P-values for the Royston test (multivariate normality, 2 dimensions)
--------------------------------------------------------------------
Group 1 0.247
Group 2 0.0564
P-values for Bartlett's test (equality of variances)
----------------------------------------------------
PC01 PC02 PC03 PC04 PC05 PC06 PC07 PC08
0.704 1.52e-07 0.437 0.0029 0.00234 0.694 0.664 0.0893 ...
P-value for Box's M test (homogeneity of covariance matrices on 2 dimensions)
-----------------------------------------------------------------------------
4.93e-06
```
The assumption of normality, the most crucial, seems to be verified. There are
no significant _p_-values in the univariate case ([Shapiro-Wilk] test), at
least for the eight first _p_-values. The same is true for the multivariate
comparison on two PCs (i.e., dimensions) according to the _p_-values yielded by
[Royston]'s test. The assumption of equal variances is not so clear. It stands
in the univariate case for the first PC (the most important), but doubt is cast
in a few less meaningful PCs, as shown by [Bartlett's] test _p_-values.
Multivariate homogeneity of covariance matrices for the first two PCs is not
confirmed by [Box's M] test. However, as discussed in reference [\[1\]][ref1],
this test is highly sensitive, and this assumption is not really critical when
sample size is equal for both groups, which is the case in this comparison.
Summarizing, these results indicate that parametric test assumptions are
essentially verified for the most critical tests.
### Simultaneous comparisons of multiple outputs
The [cmpoutput] function compares one output at a time. However, many “systems”
have more than one output; while outputs can be concatenated (via the
[grpoutputs] function), it may be preferable to have a more general idea of how
the comparison fares for individual outputs. Furthermore, it can also be useful
to perform several comparisons at the same time. The [micomp] function solves
this problem. In the code below, [micomp] is used to perform three simultaneous
comparisons (implementation 1 vs. 2, 1 vs. 3 and 1 vs. 4) of seven outputs (the
six model outputs, plus the additional concatenated output):
```matlab
c = micomp(7, 'range', 0.9, ...
{[datafolder '/nl_ok'], 'stats400v1*.txt', [datafolder '/j_ex_ok'], 'stats400v1*.txt'}, ...
{[datafolder '/nl_ok'], 'stats400v1*.txt', [datafolder '/j_ex_noshuff'], 'stats400v1*.txt'}, ...
{[datafolder '/nl_ok'], 'stats400v1*.txt', [datafolder '/j_ex_diff'], 'stats400v1*.txt'});
```
The variable returned by [micomp], `c`, contains the information provided by
[cmpoutput] for all comparison/output combinations. While this information can
be further processed by the user, _micompm_ also provides the [micomp_show]
function, which generates tables and plots summarizing the performed
comparisons:
```matlab
micomp_show(1, c, 7, 3)
```
The previous command invokes [micomp_show], requesting a plain text table
(first parameter set to 1) summarizing the comparisons contained in `c` (second
parameter) for seven outputs (third paramter) and three comparisons (fourth
parameter), generating the following table:
```
-------------------------------------------------------------------------------------------------------------------------------------------
| Comp. | Test | Output 1 | Output 2 | Output 3 | Output 4 | Output 5 | Output 6 | Output 7 |
-------------------------------------------------------------------------------------------------------------------------------------------
| Comp. 1 | #PCs | 24 | 32 | 25 | 36 | 43 | 26 | 39 |
| | MNV | 0.497962 | 0.190919 | 0.514572 | 0.0501458 | 0.544597 | 0.539895 | 0.565243 |
| | TT | 0.530184 | 0.836141 | 0.804125 | 0.783653 | 0.378008 | 0.804548 | 0.87892 |
| | MW | 0.482517 | 0.78446 | 0.673495 | 0.982307 | 0.149449 | 0.684323 | 0.935192 |
-------------------------------------------------------------------------------------------------------------------------------------------
| Comp. 2 | #PCs | 24 | 29 | 24 | 12 | 43 | 25 | 38 |
| | MNV | 1.43906e-13 | 6.14241e-30 | 1.19777e-08 | 1.29199e-59 | 0.236509 | 2.63812e-08 | 3.07441e-12 |
| | TT | 0.0420774 | 1.98361e-23 | 0.108483 | 3.63094e-67 | 0.0169031 | 0.108666 | 0.389861 |
| | MW | 0.0467558 | 3.01986e-11 | 0.10547 | 3.01986e-11 | 0.0260775 | 0.111987 | 0.395267 |
-------------------------------------------------------------------------------------------------------------------------------------------
| Comp. 3 | #PCs | 17 | 7 | 8 | 13 | 38 | 1 | 31 |
| | MNV | 6.17271e-44 | 6.50151e-78 | 5.98915e-71 | 6.22156e-55 | 3.12626e-24 | NaN | 6.55174e-19 |
| | TT | 3.59397e-39 | 8.6741e-67 | 1.89671e-58 | 5.77936e-62 | 1.74162e-43 | 2.80948e-85 | 9.77023e-25 |
| | MW | 3.01986e-11 | 3.01986e-11 | 3.01986e-11 | 3.01986e-11 | 3.01986e-11 | 3.01986e-11 | 3.33839e-11 |
-------------------------------------------------------------------------------------------------------------------------------------------
```
Setting the first parameter to 2, while returning the same plain text table,
also yields the plots below:
![plots](https://user-images.githubusercontent.com/3018963/30485300-db8cc0ca-9a24-11e7-8152-0b962c03d7d2.png)
Finally, setting the first parameter to zero will return the source code for a
LaTeX table. After compiling the returned code in a LaTeX document, the table
will show as follows:
![table](https://user-images.githubusercontent.com/3018963/30485509-87637f74-9a25-11e7-9985-0c27de1454ff.png)
## Limitations
_micompm_ has the following limitations when compared with the original R
[implementation][micompr]:
* It does not support outputs with different lengths.
* It does not directly provide _p_-values adjusted with the weighted Bonferroni
procedure.
* It is unable to directly apply and compare multiple MANOVA tests to each
output/comparison combination for multiple user-specified variances to explain.
* The generated LaTeX tables are not directly configurable using function
arguments.
<a name="unittests">
## Unit tests
The _micompm_ unit tests require the [MOxUnit] framework. Set the appropriate
path to this framework as specified in the respective instructions, `cd` into
the [tests] folder and execute the following instruction:
```
moxunit_runtests
```
The tests can take a few minutes to run.
## References
<a name="ref1"></a>
[\[1\]][ref1] Fachada N, Lopes VV, Martins RC, Rosa AC. (2017)
Model-independent comparison of simulation output. *Simulation Modelling
Practice and Theory*. 72:131–149. http://dx.doi.org/10.1016/j.simpat.2016.12.013
([arXiv preprint](http://arxiv.org/abs/1509.09174))
<a name="ref2"></a>
[\[2\]][ref2] Fachada N, Lopes VV, Martins RC, Rosa AC. (2015) Towards a
standard model for research in agent-based modeling and simulation. *PeerJ
Computer Science* 1:e36. https://doi.org/10.7717/peerj-cs.36
[ref1]: #ref1
[ref2]: #ref2
[NetLogo]: https://ccl.northwestern.edu/netlogo/
[PPHPC]: https://github.com/fakenmc/pphpc
[pphpc_netlogo]: https://github.com/fakenmc/pphpc/tree/netlogo
[pphpc_java]: https://github.com/fakenmc/pphpc/tree/java
[siunitx]: https://www.ctan.org/pkg/siunitx
[ulem]: https://www.ctan.org/pkg/ulem
[multirow]: https://www.ctan.org/pkg/multirow
[booktabs]: https://www.ctan.org/pkg/booktabs
[startup]: ../startup.m
[grpoutputs]: ../micompm/grpoutputs.m
[cmpoutput]: ../micompm/cmpoutput.m
[micomp]: ../micompm/micomp.m
[micomp_show]: ../micompm/micomp_show.m
[cmpassumptions]: ../micompm/cmpassumptions.m
[helper]: ../helpers
[3rd party]: ``../3rdparty
[micompr]: https://github.com/fakenmc/micompr
[R]: https://www.r-project.org/
[Matlab]: http://www.mathworks.com/products/matlab/
[MATLAB Statistics toolbox]: https://www.mathworks.com/products/statistics.html
[Octave]: https://gnu.org/software/octave/
[Octave statistics package]: https://octave.sourceforge.io/statistics/
[tests]: ../tests
[MOxUnit]: https://github.com/MOxUnit/MOxUnit
[MANOVA]: https://en.wikipedia.org/wiki/Multivariate_analysis_of_variance
[_t_-test]: https://en.wikipedia.org/wiki/Student%27s_t-test
[Mann-Whitney _U_ test]: https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test
[ANOVA]: https://en.wikipedia.org/wiki/Analysis_of_variance
[Kruskal-Wallis test]: https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance
[Bonferroni]: https://en.wikipedia.org/wiki/Bonferroni_correction
[Shapiro-Wilk]: https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test
[Royston]: https://www.jstor.org/stable/2347291
[Box's M]:https://en.wikiversity.org/wiki/Box%27s_M
[Bartlett's]: https://en.wikipedia.org/wiki/Bartlett%27s_test