Skip to content

Commit

Permalink
Merge pull request #14692 from aeslaughter/vpp-stats-ci-14409
Browse files Browse the repository at this point in the history
Statistics and confidence intervals of serial or distributed VPP vectors
  • Loading branch information
loganharbour committed Feb 19, 2020
2 parents 47b9a1b + 8c31cf7 commit 421fe11
Show file tree
Hide file tree
Showing 57 changed files with 1,453 additions and 365 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
*.toc
*.snm
*.csv
*.csv.*
*.dylib
*.so
*.so.*
Expand Down
10 changes: 10 additions & 0 deletions framework/doc/content/bib/moose.bib
Original file line number Diff line number Diff line change
Expand Up @@ -280,3 +280,13 @@ @article{zhao2019
adsurl = {https://ui.adsabs.harvard.edu/abs/2019arXiv190801027Z},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

@article{tibshirani1993introduction,
title={An introduction to the bootstrap},
author={Tibshirani, Robert J and Efron, Bradley},
journal={Monographs on statistics and applied probability},
volume={57},
pages={1--436},
year={1993},
publisher={Chapman and Hall New York}
}
Original file line number Diff line number Diff line change
@@ -1,71 +1,126 @@
# StatisticsVectorPostprocessor

## Short Description

!syntax description /VectorPostprocessors/StatisticsVectorPostprocessor

## Description

The `StatisticsVectorPostprocessor` computes statistical information for each column of another `VectorPostprocessor` (VPP). The results are output in columns corresponding to the column-names of the original VPP. The rows of each column are the statistical measures the `StatisticsVectorPostprocessor` was asked to compute. In addition, the first column is named `stat_type` and contains the unique identifier for the type of statistical measure computed in that row.
The `StatisticsVectorPostprocessor` computes statistical values for each vector of other
`VectorPostprocessor` (VPP) objects. The results are output in vectors that are assigned names
based on the VPP and vector name (object-vector) and each entry in the vector correspoinding
the the desired statistics and optionally confidence level intervals.

The first column, named `stat_type` and contains an unique integer identifier for the type of
statistical measure computed and the confidence levels, if computed.

## Statistics

The statistics to compute are indicated by the
[!param](/VectorPostprocessors/StatisticsVectorPostprocessor/compute) parameter, which can contain
multiple values as listed below. This list also includes the associated numeric identifier
that is included in the `stat_type` vector output the VPP object.

measures are chosen using the `stats` input parameter. Note that multiple
statistical measures can be computed simultaneously by passing in more than one to the `stats` input
parameter. The current statistical measures (and their `stat_type` identifier) the
`StatisticsPostprocessor` can compute are:

- +minimum (0)+

`compute = min`\\
Computes the minimum value for the supplied vectors.

- +maximum (1)+

`compute = max`\\
Computes the maximum value for the supplied vectors.

- +sum (2)+

`compute = sum`\\
Computes the sum ($\Sigma$) of the supplied vectors $\vec{v}$, where $N$ is the length of the vector:

!equation
\Sigma = \sum_{i=1}^N{v_i}

- +mean (3)+

`compute = average`\\
Computes the average ($\bar{v}$) of the supplied vectors $\vec{v}$:

!equation
\bar{v} = \frac{\sum_{i=1}^{N}{v_i}}{N}

- +standard deviation (4)+

`compute = stddev`\\
Computes the standard deviation ($\sigma$) of the supplied vectors $\vec{v}$:

The statistical measures are chosen using the `stats` input parameter. Note that multiple statistical measures can be computed simultaneously by passing in more than one to the `stats` input parameter. The current statistical measures (and their `stat_type` identifier) the `StatisticsPostprocessor` can compute are:
!equation
\sigma = \sqrt{\frac{\sum_{i=1}^{N}{(v_i - \bar{v})^2}}{N-1}}

### Min: 0
- +2-Norm (5)+

`stats = min`
`compute = norm2`\\
Computes the 2-norm, $|v|_2$ of the supplied vectors $\vec{v}$, this is also known as the
Euclidean Norm or the "distance":

The minimum value within the column.
!equation
|v|_2 = \sqrt{\sum_{i=1}^{N}{{v_i}^2}}

### Max: 1
- +standard error (6)+

`stats = max`
`compute = stderr`\\
Computes the standard error ($\sigma_{\bar{v}}$) of the supplied vectors $\vec{v}$:

The maximum value within the column
!equation
\sigma_{\bar{v}} = \frac{\sigma}{\sqrt{N}}

### Sum: 2

`stats = sum`
## Confidence Levels

The sum of the column
Bootstrap confidence level intervals, as defined by [!cite](tibshirani1993introduction), are enabled
by specifying the desired levels using the
[!param](/VectorPostprocessors/StatisticsVectorPostprocessor/ci_levels) parameter and setting
the method of calculation using the
[!param](/VectorPostprocessors/StatisticsVectorPostprocessor/ci_method).
The levels listed should be in the range (0, 0.5]. For example, the levels 0.05, 0.1, and 0.5 provided
result in the computation of the 0.05, 0.1, 0.5, 0.9, and 0.95 confidence level intervals.

\begin{equation}
\Sigma = \sum_{i=1}^N{Vi}
\end{equation}
Enabling the confidence level intervals will compute the intervals for each level and each statistic
and the result will appear in the same output vector as the associated statistic calculation. The
`stat_type` vector will include decimal values where the ones place indicates the statistic
computed and the decimal corresponds with the confidence level.

### Average: 3
The available methods include the following:

`stats = average`
- +precentile+: Percentile bootstrap method as defined in Ch. 13 of [!cite](tibshirani1993introduction).

The average (mean) of the column
## Example 1: Statistics

\begin{equation}
\bar{V} = \frac{\sum_{i=1}^{N}{V_i}}{N}
\end{equation}
The following input file snippet demonstrates how to compute various statistics using the
`StatisticsVectorPostprocessor` object.

### Standard Deviation: 4
!listing statistics_vector_postprocessor.i block=VectorPostprocessors

`stats = stddev`
This block results in the following CSV file for the "stats" block of the input file. Notice
the first column corresponds with the numeric identifier for the statistics being computed.

The standard deviation of the values
!listing statistics_vector_postprocessor/gold/statistics_vector_postprocessor_out_stats_0001.csv

\begin{equation}
\sigma = \sqrt{\frac{\sum_{i=1}^{N}{(V_i - \bar{V})^2}}{N-1}}
\end{equation}

### The 2-Norm (Eucliden Norm): 5
## Example 2: Confidence Levels

`stats = norm2`
The following input file snippet demonstrates how to compute various statistics and
confidence levels using the `StatisticsVectorPostprocessor` object

The 2-norm (also known as the Euclidean Norm or the "distance")
!listing bootstrap_statistics_vector_postprocessor/bootstrap.i block=VectorPostprocessors

\begin{equation}
|V|_2 = \sqrt{\sum_{i=1}^{N}{{V_i}^2}}
\end{equation}
This block results in the following CSV file for the "stats" block of the input file. Notice
the first column corresponds with the numeric identifier for the statistics being computed.

!listing bootstrap_statistics_vector_postprocessor/gold/bootstrap_out_stats_0001.csv

## Important Notes

Note that this VPP only computes on processor 0. This is because that's all that is necessary for output - and the VPP it uses for its values may be doing the same.

!syntax parameters /VectorPostprocessors/StatisticsVectorPostprocessor

Expand Down
4 changes: 2 additions & 2 deletions framework/doc/content/syntax/VectorPostprocessors/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -76,10 +76,10 @@ getScatterVectorPostprocessorValue('the_vpp_parameter_name', 'the_vector_name')
`getScatterVectorPostprocessorValue()` returns a `const ScatterVectorPostprocessorValue &`... which is a single scalar value that you don't index into. Each process receives the "correct" value and can just directly use it.
If the data in a VPP is naturally replicated on all processors a VectorPostprocessor should set `_is_broadcast = true` in its `validParams()` like so:
If the data in a VPP is naturally replicated on all processors a VectorPostprocessor should set `_auto_broadcast = false` in its `validParams()` like so:
```c++
params.set<bool>("_is_broadcast") = true;
params.set<MooseEnum>("_auto_broadcast") = "false";
```

This tells MOOSE that the data is already replicated and there is no need to broadcast it if another object is asking for it to be broadcast.
Expand Down
11 changes: 6 additions & 5 deletions framework/include/outputs/CSV.h
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,9 @@ class CSV : public TableOutput
* Generates a filename pattern for Vectorpostprocessors
* filebase + VPP name + time step + ".csv"
*/
std::string getVectorPostprocessorFileName(const std::string & vpp_name, bool include_time_step);
std::string getVectorPostprocessorFileName(const std::string & vpp_name,
bool include_time_step,
bool is_distributed);

private:
/// Flag for aligning data in .csv file
Expand Down Expand Up @@ -106,13 +108,12 @@ class CSV : public TableOutput
bool _create_latest_symlink;

/// Current list of VPP filenames for creating _LATEST/_FINAL symlinks
// The pair is composed of the complete filename (foo_variable_0001.csv) and the incomplete name
// (foo_variable) to which the _FINAL or _LATEST is to be applied.
std::vector<std::pair<std::string, std::string>> _latest_vpp_filenames;
// The pair is composed of the complete filename (foo_variable_0001.csv), the incomplete name
// (foo_variable) to which the _FINAL or _LATEST is to be applied, and the "is_distributed" flag
std::vector<std::tuple<std::string, std::string, bool>> _latest_vpp_filenames;

/**
* Returns the filename without the time/timestep information.
*/
std::string getVectorPostprocessorFilePrefix(const std::string & vpp_name);
};

3 changes: 2 additions & 1 deletion framework/include/problems/FEProblemBase.h
Original file line number Diff line number Diff line change
Expand Up @@ -917,7 +917,8 @@ class FEProblemBase : public SubProblem, public Restartable
VectorPostprocessorValue & declareVectorPostprocessorVector(const VectorPostprocessorName & name,
const std::string & vector_name,
bool contains_complete_history,
bool is_broadcast);
bool is_broadcast,
bool is_distributed);

/**
* Whether or not the specified VectorPostprocessor has declared any vectors
Expand Down
84 changes: 84 additions & 0 deletions framework/include/utils/BootstrapStatistics.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
//* This file is part of the MOOSE framework
//* https://www.mooseframework.org
//*
//* All rights reserved, see COPYRIGHT for full restrictions
//* https://github.com/idaholab/moose/blob/master/COPYRIGHT
//*
//* Licensed under LGPL 2.1, please see LICENSE for details
//* https://www.gnu.org/licenses/lgpl-2.1.html

#pragma once

#include "MooseTypes.h"
#include "MooseObject.h"
#include <vector>

class MooseEnum;
class MooseEnumItem;
class MooseRandom;

namespace Statistics
{
class Calculator;
class BootstrapCalculator;

/*
* Return available bootstrap statistics calculators.
*/
MooseEnum makeBootstrapCalculatorEnum();

/*
* Create const Bootstrap confidence level interface calculator for use by VectorPostprocessor
* objects.
*/
std::unique_ptr<const BootstrapCalculator> makeBootstrapCalculator(const MooseEnum &,
const libMesh::ParallelObject &,
const std::vector<Real> &,
unsigned int,
unsigned int);

/*
* Base class for computing bootstrap confidence level intervals. These classes follow the same
* design pattern as those Statistics.h.
*/
class BootstrapCalculator : public libMesh::ParallelObject
{
public:
BootstrapCalculator(const libMesh::ParallelObject &,
const std::vector<Real> &,
unsigned int,
unsigned int);
virtual ~BootstrapCalculator() = default;

virtual std::vector<Real>
compute(const std::vector<Real> &, const Calculator &, const bool) const = 0;

protected:
std::vector<Real> shuffle(const std::vector<Real> &, MooseRandom &, const bool) const;

// Confidence levels to compute in range (0, 1)
const std::vector<Real> _levels;

// Number of bootstrap replicates
const unsigned int _replicates;

// Random seed for creating boostrap replicates
const unsigned int _seed;
};

/*
* Implement percentile method of Efron and Tibshirani (2003), Chapter 13.
*/
class Percentile : public BootstrapCalculator
{
public:
Percentile(const libMesh::ParallelObject &,
const std::vector<Real> &,
unsigned int,
unsigned int);

virtual std::vector<Real>
compute(const std::vector<Real> &, const Calculator &, const bool) const override;
};

} // namespace Statistics
16 changes: 15 additions & 1 deletion framework/include/utils/MooseRandom.h
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,21 @@ class MooseRandom
return mts_lrand(&(_states[i].first));
}

/**
* This method returns the next random number (long format) from the specified generator
* within the supplied range.
*
* @param lower lower bounds of value
* @param upper upper bounds of value
* @param i the index of the generator
* @return the next random number in the range [0,max(uinit32_t)) with 32-bit number
*/
inline uint32_t randl(std::size_t i, uint32_t lower, uint32_t upper)
{
mooseAssert(_states.find(i) != _states.end(), "No random state initialized for id: " << i);
return rds_iuniform(&(_states[i].first), lower, upper);
}

/**
* This method saves the current state of all generators which can be restored at a later time
* (i.e. re-generate the same sequence of random numbers of this generator
Expand Down Expand Up @@ -178,4 +193,3 @@ dataLoad(std::istream & stream, MooseRandom & v, void * context)
{
loadHelper(stream, v._states, context);
}

0 comments on commit 421fe11

Please sign in to comment.