Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved Simulation API #522

Open
wants to merge 24 commits into
base: master
from

Conversation

@lrennels
Copy link
Collaborator

commented Jun 10, 2019

for Issue #521

PSA on the Forum: https://forum.mimiframework.org/t/psa-simulation-api/69

@grantmcdermott I tagged the MCS users I know of on the forum, but I don't think you've used it yet so I couldn't tag you ... take a look!

@lrennels lrennels changed the title Improved Simulation API WIP Improved Simulation API Jun 10, 2019

@lrennels

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 10, 2019

Some Notes:

  • We have two new types: Simulation has been renamed to SimulationDef, and then we have the new type SimulationInstance returned by run. These are referred to as sim_def and sim_inst as function args, and sd and si as local vars. This all mimics our use of ModelInstance and ModelDef.
  • Moved models and results from SimulationDef out into SimulationInstance which holds a copy of the Simulation (which is mutated a bit with trials etc.) as well a the models vector and results Dictionary.
  • Removed the trials and models_to_run keyword args from the run function, since they do not make as much sense with the new API. If they become desirable we can discuss reimplementing.
  • Changed callback function signatures to take SimulationInstance , not a SimulationDef in their arguments. This way they can access the models vector held by SimulationInstance .
  • SimulationInstance{T} is currently parameterized by <: AbstractSimulationData, but could be parameterized by <:SimulationDef{T1} where T1 <: AbstractSimulationData
  • run now has a keyword argument results_in_memory::Bool = true so that unless it is set to false results will be stored in the results Dictionary, and in the results_output_dir if one is specified.

Post 6/11 Meeting:

  • I have preserved the type for results::Vector{Dict{Tuple, DataFrame}} and optionally add an extra column to the DataFrame which is a String of the tuple representing the scenario. This differs from the original plan to use a 3D matrix, but for now makes less alterations. I added another issue to handle this optimization in the next iteration #525.
  • Since the results of a SimulationInstance are internally stored as a DataFrame, calling getindex on a Simulation instance is defined as
function Base.getindex(sim_inst::SimulationInstance, comp_name::Symbol, datum_name::Symbol; model::Int = 1)

and returns the appropriate DataFrame. This can be further filtered by the user using Query etc. I did not implement getdataframe(sim::SimulationInstance ...) since that seemed redundant.

  • if we have results_in_memory = false, how often do we want to reset_results!? Right now it's after every trial which might be too much overhead?
  • I have implemented streaming of results to CSV files using CSVFiles.savestreaming. The old folder structure has been maintained, but files are now saved as comp_var.csv in anticipation of needing uniqueness in new iterations of Mimi ... so structure is now scen_name\model_index\datum_key.csv. If there is only one model, or no scenario, that level of folder structure is omitted.

6/18:

  • moved trials, current_trial, and current_data to SimulationInstance from SimulationDef, since this is not information that has to do with the def and allows the sim_def attribute of the SimulationInstance to remain unchanged.

@lrennels lrennels referenced this pull request Jun 10, 2019

Open

Simulation API (Consolidated) #521

3 of 3 tasks complete
@rjplevin

This comment has been minimized.

Copy link
Collaborator

commented Jun 10, 2019

I now think SimulationResults is not the best description of what we have here. Two things bothered me as I read through Lisa's (otherwise excellent!) changes:

  1. SimulationResults has a field results. The other elements in the struct (i.e., sim, models) are not "results", so there's a concept/name mismatch here.

  2. This is exemplified by calling the pre_trial_func with SimulationResults. That seems wrong, doesn't it? Sure, the struct provides access to a copy of the Simulation, but see #1. ;~)

Naming is hard, of course. Should we rename the old Simulation => SimulationDef, and the current SimulationResults => SimulationInstance in keeping with the rest of Mimi?

@lrennels

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 10, 2019

Ah interesting I see the issue there, I would be ok with SimulationDef and SimulationInstance, that seems consistent. I also really didn't like the callback functions being called on SimulationResults but also didn't want to stick models into the Simulation, so perhaps this helps to clear up that problem.

@lrennels

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 11, 2019

From 6/11 on Results Saving and Streaming

  • To save results in files, create a tree structure of scenario --> model --> variable like we do now. Right now we have an error here where trials are overwritten, which is a problem in _store_trial_results, so we need to fix that and then implement streaming with CSVFiles.save_streaming(filename, file) and CSVFiles.save_streaming(stream, file) ... see tests for information on how to do this.
  • To save results internally, use a multi-dimensional array with an extra dimension for model or scenario
  • To access results internally we can write getindex and getdataframe to flatten out the above.
@rjplevin

This comment has been minimized.

Copy link
Collaborator

commented on src/mcs/montecarlo.jl in f2cc944 Jun 13, 2019

You might reduce redundancy a bit by distilling the part that depends on scen_name:

pairs = Any[typeof(value) => value, :trialnum => trialnum]
has_scen && push!(pairs, :scen => scen_name)
trial_df = DataFrame(pairs)

This comment has been minimized.

Copy link
Collaborator Author

replied Jun 13, 2019

Good idea!

@rjplevin

This comment has been minimized.

Copy link
Collaborator

commented on src/mcs/montecarlo.jl in f2cc944 Jun 13, 2019

if results_output_dir !== nothing

@rjplevin

This comment has been minimized.

Copy link
Collaborator

commented on src/mcs/montecarlo.jl in f2cc944 Jun 13, 2019

streams = Dict{String, IOStream} assigns the type. You need streams = Dict{String, IOStream}()

@rjplevin

This comment has been minimized.

Copy link
Collaborator

commented on src/mcs/montecarlo.jl in f2cc944 Jun 13, 2019

I think we should rename this _save_trial_results since it's only called from _store_trial_results and can't really be called outside all the context built up here.

lrennels added some commits Jun 13, 2019

@lrennels lrennels requested a review from rjplevin Jun 13, 2019

results_df = DataFrame([typeof(value), Int, String], [datum_name, :trialnum, :scen], 0)
else
results_df = DataFrame([typeof(value), Int], [datum_name, :trialnum], 0)
end
results[datum_key] = results_df
end

This comment has been minimized.

Copy link
@rjplevin

rjplevin Jun 13, 2019

Collaborator

This is probably a matter of taste, but in my experience redundancy always (eventually) creates more work because you have to change multiple things rather than one. I'd write this as follows, which slightly longer but without redundancy:

types = [typeof(value), Int]
names = [datum_name, :trialnum]
if has_scen
   push!(types, String)
   push!(names, :scen)
end
results_df = DataFrame(types, names, 0)
results[datum_key] = results_df
if haskey(streams, filename)
CSVFiles.savestreaming(streams[filename], trial_df)
else
streams[filename] = CSVFiles.savestreaming(filename, trial_df)
end

This comment has been minimized.

Copy link
@rjplevin

rjplevin Jun 13, 2019

Collaborator

This should work without the CSVFiles. before savestreaming. Without specifying CSVFiles, we can optionally save to any recognized file format that supports streaming.

# Initiate the SimulationInstance and set the models and trials for the copied
# sim held within sim_inst
sim_inst = SimulationInstance{typeof(sim_def.data)}(sim_def)
set_models!(sim_inst, models)

This comment has been minimized.

Copy link
@rjplevin

rjplevin Jun 13, 2019

Collaborator

If you define SimulationInstance in terms of a type T, you can avoid this explicit call to typeof(sim_def.data). For example:

julia> struct SimDef{T}
         data::T
         end

julia> struct SimInst{T}
         sim_def::SimDef{T}
         end

julia> sd = SimDef((10, 20))
SimDef{Tuple{Int64,Int64}}((10, 20))

julia> SimInst(sd)
SimInst{Tuple{Int64,Int64}}(SimDef{Tuple{Int64,Int64}}((10, 20)))

This comment has been minimized.

Copy link
@rjplevin

rjplevin Jun 14, 2019

Collaborator

Yes, and I think you can drop the where clause from

sim_def::SimulationDef{T} where T <: AbstractSimulationData

since the constructor enforces this.

This comment has been minimized.

Copy link
@lrennels

lrennels Jun 14, 2019

Author Collaborator

Ah ok, I tried this and got some complaints about my SimulationInstance call to new but I'll play with it a bit more, the elusive {T} syntax ...

This comment has been minimized.

Copy link
@rjplevin

rjplevin Jun 14, 2019

Collaborator

You might need new{T}(args...)

@rjplevin
Copy link
Collaborator

left a comment

A few minor changes, then it should be ready to go.

@lrennels

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 14, 2019

@davidanthoff we will be ready to merge this, and then merge the changes to the models and rebuild Mimi so all dependency tests pass, after CSVFiles savestreaming PR is merged so order of operations:

  1. merge @rjplevin 's savestreaming branch in CSVFiles
  2. merge this PR (I can do that or you can)
  3. merge simapi PRs for models
  4. tag new Mimi release (I can do that or you can), which will rebuild and tag the new docs etc.
@rjplevin

This comment has been minimized.

Copy link
Collaborator

commented Jun 14, 2019

Nice work! I appreciate your attention to detail and process.

@lrennels

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 14, 2019

aka minor but useful OCD

@lrennels lrennels changed the title WIP Improved Simulation API Improved Simulation API Jun 18, 2019

@lrennels lrennels requested a review from rjplevin Jun 19, 2019

@lrennels

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 19, 2019

@rjplevin I just moved trials, current_trial, and current_data to SimulationInstance from SimulationDef. Can you take a look at that particular commit, and specifically the changes to functions and iteration support to make sure it seems like the cleanest/most straightforward way to represent things? We could always add methods that take a sim_inst::SimulationInstance and call the function on the sim_inst.sim_def in some cases if it's cleaner.

@rjplevin

This comment has been minimized.

Copy link
Collaborator

commented Jun 19, 2019

Does this work?

IteratorInterfaceExtensions.getiterator(sim_inst::SimulationInstance) =
   SimIterator{sim_inst.sim_def.nt_type}(sim_inst)

I would expect the T is required, i.e.,

IteratorInterfaceExtensions.getiterator(sim_inst::SimulationInstance{T}) where T = 
    SimIterator{sim_inst.sim_def.nt_type, T}(sim_inst)

Or, we could define an "outside" constructor:

SimIterator(sim_inst::SimulationInstance{T}) where T = 
   SimIterator{sim_inst.sim_def.nt_type, T}(sim_inst)

and then simply:

IteratorInterfaceExtensions.getiterator(sim_inst::SimulationInstance) = SimIterator(sim_inst)

I don't remember why the iterator needs to be parameterized onNT, since this is always available as sim_inst.sim_def.nt_type. Could be an artifact. I'd say leave it for now and we can see about removing it later.

@lrennels

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 19, 2019

@rjplevin hmm neither option works clean off what you wrote, but I can dig a little deeper to see if one does provide a better way to simplify things

@rjplevin

This comment has been minimized.

Copy link
Collaborator

commented Jun 19, 2019

I guess the underlying MCSData type associated with sim_inst is carried through to the SimIterator. I expected the explicit version of this to work in any case. Is this pushed? Hard to say more without seeing the code.

@lrennels

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 19, 2019

The explicit version being the first option withT?

@lrennels

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 19, 2019

Just pushed that one, the failing tests start in test_defmcs.jl

@lrennels

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 20, 2019

Ah ok I think I figured it out, our SimIterator was written like this:

struct SimIterator{NT, T}
    sim_inst::SimulationInstance{T}

    function SimIterator{NT}(sim_inst::SimulationInstance{T}) where {NT <: NamedTuple, T <: AbstractSimulationData}
        return new{NT, T}(sim_inst)
    end
end

So if we add the explicit getiterator with T then we just need to change the constructor to

    function SimIterator{NT, T}(sim_inst::SimulationInstance{T}) where {NT <: NamedTuple, T <: AbstractSimulationData}
        return new{NT, T}(sim_inst)
    end

I did that and now it works!

@rjplevin

This comment has been minimized.

Copy link
Collaborator

commented Jun 20, 2019

Cool.

lrennels added some commits Jun 20, 2019

@lrennels lrennels requested a review from davidanthoff Jun 27, 2019

lrennels added some commits Jun 27, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.