2017 January Design Notes Subsample As Parameter
This task is completed; the notes below are kept just in case.
These are design notes: they are sketchy and may be incorrect, so feel free to change them.
Currently we have one special model parameter: the subsample number (a.k.a. member or replica). It is created by the runtime as an integer in the range [0, N-1], where N is the number of subsamples specified as a run option:
model.exe -General.Subsamples 16
The subsample number plays a fundamental role in the calculation of model Output Expressions. It is the only parameter used to calculate average (and CV, SE, SD and all other) output values. For example, if the model runs with 16 subsamples, it produces 16 values for each output accumulator, and the output expression value is the average of those 16 accumulator values across subsamples.
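The aggregation described above can be illustrated with a small C++ sketch. It is not the openM++ implementation: the names `ExprStats` and `aggregate` are hypothetical, and the exact formulas (sample standard deviation with n - 1, CV expressed in percent) are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Summary statistics of one output accumulator across subsamples.
struct ExprStats {
    double mean;  // average across subsamples
    double sd;    // sample standard deviation (n - 1 in the denominator, an assumption)
    double se;    // standard error of the mean: sd / sqrt(n)
    double cv;    // coefficient of variation in percent: 100 * sd / mean (an assumption)
};

// acc[k] is the accumulator value produced by subsample k.
// sd, se and cv are reported as 0 when there is only one subsample.
ExprStats aggregate(const std::vector<double>& acc)
{
    assert(!acc.empty());
    const double n = static_cast<double>(acc.size());

    double sum = 0.0;
    for (double v : acc) sum += v;
    const double mean = sum / n;

    double ss = 0.0;  // sum of squared deviations from the mean
    for (double v : acc) ss += (v - mean) * (v - mean);

    const double sd = (acc.size() > 1) ? std::sqrt(ss / (n - 1.0)) : 0.0;
    const double se = sd / std::sqrt(n);
    const double cv = (mean != 0.0) ? 100.0 * sd / mean : 0.0;
    return {mean, sd, se, cv};
}
```

With 16 subsamples, `acc` would hold 16 values and `aggregate(acc).mean` would be the output expression value.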
It may not always be necessary to have the subsample number as a special parameter; it can be any other model parameter, or set of parameters, which varies between model runs. Output expressions can then be calculated as an average (CV, SD, etc.) across any parameter values. However, such a "demotion of the subsample number" is quite a significant change in the model runtime.
Currently the model run cycle looks like this (extremely simplified):
- start model.exe and connect to the database
- read all model parameters
- create a modeling thread for each model subsample
- run the modeling threads: do the simulation
- write output accumulators for each subsample into the database
- wait until all subsamples are done (wait for exit from all modeling threads)
- calculate output expression values as an average (CV, SE, SD, etc.) of accumulators across subsamples
- report on simulation success and exit from model main
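The cycle above can be sketched in C++ as follows. This is a minimal illustration, not the actual runtime: `simulate` and `runModel` are hypothetical names, the database steps are left out, and the "simulation" is a placeholder formula.

```cpp
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for the simulation: returns one accumulator value
// for the given subsample number. A real model would run its events here.
double simulate(int subNumber)
{
    return 100.0 + subNumber;
}

// Run one modeling thread per subsample, wait for all of them,
// then return the average of the accumulators across subsamples.
double runModel(int subCount)
{
    assert(subCount > 0);
    std::vector<double> acc(subCount, 0.0);  // one accumulator slot per subsample
    std::vector<std::thread> threads;
    threads.reserve(subCount);

    for (int k = 0; k < subCount; k++) {
        // each thread writes only its own slot, so no locking is needed
        threads.emplace_back([&acc, k] { acc[k] = simulate(k); });
    }
    for (std::thread& t : threads) t.join();  // wait until all subsamples are done

    return std::accumulate(acc.begin(), acc.end(), 0.0) / subCount;
}
```

Here the subsample number `k` is the only value that differs between threads, which is exactly the special role the notes propose to generalize.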
If we decide to "demote the subsample", or call it "generalize parameters", then the modeling cycle can look like this:
- use some external utility to create a modeling task and prepare a set of input parameters (see Model Run: How to Run the Model)
- (optional) specify a runtime expression to vary some model parameters, e.g. the subsample number parameter
- run the model until the modeling task is completed (until all input is processed) and write all accumulators into the database
- use some external utility to calculate output expressions as an average (CV, SE, SD, etc.) across any parameter(s)
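The last step, aggregating across an arbitrary parameter rather than only the subsample number, amounts to a group-by. The sketch below is one way such an external utility might do it in memory; `AccRow` and `averageBy` are hypothetical names, and the integer group key is an assumption chosen to keep key comparison exact.

```cpp
#include <cassert>
#include <map>
#include <vector>

// One accumulator value together with the key it should be grouped by.
// The key identifies the parameter values that are held fixed, for example
// the taxation level in tenths of a percent; everything else is averaged out.
struct AccRow {
    int groupKey;
    double accValue;  // accumulator value, e.g. household income
};

// Average accumulator values separately for each distinct group key.
std::map<int, double> averageBy(const std::vector<AccRow>& rows)
{
    std::map<int, double> sum;
    std::map<int, int> count;
    for (const AccRow& r : rows) {
        sum[r.groupKey] += r.accValue;
        count[r.groupKey]++;
    }
    std::map<int, double> avg;
    for (const auto& kv : sum) avg[kv.first] = kv.second / count[kv.first];
    return avg;
}
```

Grouping by run and averaging across subsample number reproduces the current behaviour; choosing a different key gives the generalized one.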
Questions and problems:
- How to specify model parameter generators (how to calculate model parameters at runtime). Now we have ompp code translated into c++ by the omc compiler to compute all derived (model-generated) parameters. It is not dynamic enough: we don't want to, and should not have to, re-compile the model to specify a parameter generator. We also have a primitive subsample number parameter generator, [0, N-1]. Such primitive for-loop generators may be good in many situations, but they are not enough.
  Is it enough to be able to specify for-loop parameter generator(s) in the model runtime and rely on external utilities (i.e. use our R package) to create more complex modeling tasks?
- Output expression calculations. Now we use SQL to calculate averages and, in fact, that SQL allows almost arbitrary calculations, but it aggregates across the subsample number.
  How to generalize the SQL to aggregate across any parameter values, not only the subsample number? Do we need to replace SQL with c++ code in the model runtime? Do we need to create a separate "db_aggregator" utility instead of using the model?
- How to specify parameter generators and output expressions to make them powerful enough while avoiding a re-invention of R (Octave, Matlab, SPSS, SAS)?
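To make the "for-loop generator" idea from the first question concrete, here is a minimal sketch of what such a generator could look like if it were described by data instead of compiled model code. `LoopGenerator` and `generate` are hypothetical names and do not exist in openM++.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Description of a for-loop generator: first value, last value (inclusive) and step.
struct LoopGenerator {
    double from;
    double to;
    double step;
};

// Expand the generator into the list of parameter values.
// The number of values is computed first and each value is derived from the
// integer index, which avoids accumulating floating-point error.
std::vector<double> generate(const LoopGenerator& g)
{
    assert(g.step > 0.0 && g.to >= g.from);
    // small tolerance so that e.g. (80 - 1) / 0.1 is not truncated to 789
    const int n = static_cast<int>(std::floor((g.to - g.from) / g.step + 1e-9)) + 1;
    std::vector<double> values;
    values.reserve(n);
    for (int i = 0; i < n; i++) values.push_back(g.from + i * g.step);
    return values;
}
```

The existing subsample generator corresponds to `generate({0.0, 15.0, 1.0})` for 16 subsamples; anything more elaborate than this would be left to external tools such as R.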
Let's assume some hypothetical model with the following input parameters:
- population by age and sex
- taxation level
- election outcome
- workforce strike longevity
- random generator seed

And the model output value is household income.
Model input parameters can be divided into the following categories:
- "constant": parameters whose values are known and do not change during modeling
  - population: current and projected values are assumed to be well known and fixed for our model
- "variable": parameter(s) which the user wants to change to study their effect on modeling output results
  - taxation level varies from 1% to 80% with a 0.1% step
- "uncertainty": parameters whose values are random
  - election outcome parameter: Bernoulli distribution (binary) with mean = 0.6
  - workforce strike: Poisson distribution with rate = 4
  - random number generator seed
To study the effect of the taxation level, the user runs the model about 800 times (791 values from 1% to 80% in 0.1% steps), each time with a different tax percent input value, and calculates one average household income output value per run. Each output income value is an average of 32 "accumulator" values. Each "accumulator" value is a household income value produced by the model for a specific combination of "uncertainty" parameters:
// create 32 input tuples of uncertainty parameters
//
int setId = database.CreateWorkset(); // input set of uncertainty parameters
bool isBluePartyWin = false;          // election results: win of "blue" or "red" party
double strikeDays = 7.5;              // number of strike days per year
int randomSeed = 12345;               // random number generator seed
for (int k = 0; k < 32; k++) {
    isBluePartyWin = Bernoulli(0.6);
    strikeDays = SumOf_Poisson(4.0);
    randomSeed++;                     // each tuple gets its own seed
    // write "uncertainty" parameters into database input set: tuple number = k
    database.WriteParameters(setId, k, isBluePartyWin, strikeDays, randomSeed);
}
// run the model once for each taxation level from 1.0% to 80.0% in 0.1% steps
// (an integer loop counter avoids floating-point drift in the tax value)
//
for (int i = 10; i <= 800; i++) {
    double tax = i / 10.0;
    model.exe -Parameter.Taxation tax -UncertaintyParameters setId
}
//
// plot output household income depending on taxation level
//
The pseudo code above can be implemented in Perl, R or a shell script. OpenM++ also already supports Modeling Tasks, which allow submitting multiple inputs to the model and varying parameter values, similar to the example above.
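The tuple-generation loop from the pseudo code can also be written with the standard C++ `<random>` library. This is only a sketch under assumptions: `UncertaintyTuple` and `makeTuples` are hypothetical names, strike days are taken as a single Poisson draw with mean 4 (the pseudo code's `SumOf_Poisson` is not defined in these notes), and the tuples are returned in memory instead of being written to the database.

```cpp
#include <cassert>
#include <random>
#include <vector>

// One tuple of "uncertainty" parameters, used to produce a single accumulator.
struct UncertaintyTuple {
    bool isBluePartyWin;  // election outcome
    int strikeDays;       // number of strike days per year
    int randomSeed;       // random number generator seed passed to the model
};

// Create `count` tuples; `generatorSeed` makes the draws reproducible.
std::vector<UncertaintyTuple> makeTuples(int count, unsigned generatorSeed)
{
    assert(count > 0);
    std::mt19937 rng(generatorSeed);
    std::bernoulli_distribution election(0.6);   // P(blue party wins) = 0.6
    std::poisson_distribution<int> strike(4.0);  // mean of 4 strike days

    std::vector<UncertaintyTuple> tuples;
    tuples.reserve(count);
    int seed = 12345;
    for (int k = 0; k < count; k++) {
        ++seed;  // each tuple gets its own model seed, as in the pseudo code
        tuples.push_back({election(rng), strike(rng), seed});
    }
    return tuples;
}
```

Calling `makeTuples(32, someSeed)` yields the 32 tuples of the example; the same `someSeed` always yields the same tuples, which keeps model runs comparable.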
OpenM++ already has most of the components required for our solution.
The following can be done to solve the problem from the example above:
1. Use existing: the R API to create a Modeling Task with the roughly 800 values of the taxation level parameter.
2. Add new: Create tools to generate uncertainty parameters. These can be command-line utilities, GUI tool(s) or part of the model runtime. The last option would allow us to reuse existing c++ code.
3. Add new: Change the database schema in order to store tuples of uncertainty parameters as part of the model run input. Currently the model uses only a single input set of parameters (workset) with a single value of each parameter. We need to change the database schema and the model run initialization (input parameter search in the database) in order to supply all 32 tuples of uncertainty parameters for every model run.
4. Add new: Change parameter memory management in order to provide a unique value of each uncertainty parameter to each modeling thread. Now all parameters have only one copy of values, and it is shared between all subsamples (threads and processes); only the subsample number is unique and not shared between threads (see model run on single computer). With the new runtime we need to make sure that only "constant" and "variable" parameters (like population and taxation level above) are shared, and that "uncertainty" parameters (election outcome, strike, random seed) are unique for each thread.
5. Add new: If the model runs on an MPI cluster, where there are multiple modeling processes, we need to correctly supply unique values of all uncertainty parameters to each process. Now only the subsample number is unique.
6. Add new: Change the database schema, similar to (3) above, for model run parameters. A model run contains a full copy of the input parameters. Today it is only one value for each parameter, and we need to change it in order to store all 32 tuples of uncertainty parameters in the model run results.
7. Use existing: Model Output Expressions for output results aggregation. No changes required. We do not yet have capabilities to compare model run results similar to what ModgenWeb does, but this is outside the scope of this problem.
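The memory-management change described above (shared "constant" and "variable" parameters, a private copy of "uncertainty" parameters per modeling thread) could look roughly like the sketch below. All names (`SharedParams`, `ThreadParams`, `simulate`, `runAll`) are hypothetical, and the income formula is a placeholder with no modeling meaning.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// "Constant" and "variable" parameters: one read-only copy shared by all threads.
struct SharedParams {
    double taxation;  // percent; population by age and sex would also live here
};

// "Uncertainty" parameters: one private copy per modeling thread.
struct ThreadParams {
    int subId;            // uncertainty tuple number
    bool isBluePartyWin;
    int strikeDays;
    int randomSeed;
};

// Placeholder simulation: returns one accumulator value (household income).
double simulate(const SharedParams& shared, const ThreadParams& own)
{
    return 1000.0 * (1.0 - shared.taxation / 100.0) - 10.0 * own.strikeDays;
}

// Run one thread per uncertainty tuple and collect one accumulator per tuple.
std::vector<double> runAll(const SharedParams& shared, const std::vector<ThreadParams>& tuples)
{
    std::vector<double> acc(tuples.size(), 0.0);
    std::vector<std::thread> threads;
    threads.reserve(tuples.size());
    for (std::size_t k = 0; k < tuples.size(); k++) {
        // shared parameters are captured by reference (read-only),
        // the uncertainty tuple is copied into the thread
        threads.emplace_back([&shared, &acc, k, own = tuples[k]] {
            acc[k] = simulate(shared, own);
        });
    }
    for (std::thread& t : threads) t.join();
    return acc;
}
```

The accumulators come back ordered by tuple number, which is what would make them comparable between model runs.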
We can split the implementation into two steps:
- First, make all necessary runtime changes (items 3, 4, 5 and 6 above). That would allow us to run the model with uncertainty parameters created by external tools, for example by R.
- Second, implement "parameter generators" (item 2 above) to make it convenient for the model user.
During this two-step process it is also necessary to implement some compatibility logic to supply the "Subsample" parameter, in order to keep existing models working.
Note:
We should also resolve the ambiguity of the term "subsample", inherited from Modgen. It can be an integer model parameter named "Subsample", in which case it is the same as any other model parameter and needs no special meaning or treatment. It can also be used as an "uncertainty tuple number"; in that case it does not necessarily have to be exposed to modeling code: it can be internal to the model runtime and visible in the database schema as sub_id, used to order accumulator values and make them comparable between model runs.