Add an option to solve that will dump the solution. #42

Closed
josiahjohnston wants to merge 4 commits (from the add_dump_option branch)

Conversation

@josiahjohnston
Contributor

Add an option to solve that will dump the solution. I find inspection of small example problems a useful tool during development.

@bmaluenda
Member

I like the idea of dumping results and the instance to the terminal for development.
I have two comments:
- We should eventually write a more precise way of dumping the instance, so that only populated sets and parameters are printed. This would reduce the amount of information dumped and make inspection easier.
- The results.write() method only gives me information on the solver process, not the actual results. The instance.display() function worked for me in that regard: it dumps variable values at the optimum, the objective function's value, and information on the constraints. I tried with both GLPK and Gurobi and had the same issue.

Does it work the same way for you? In that case, maybe it's better to use the display() function to dump variable values as well.

@josiahjohnston
Contributor Author

I have mixed feelings about the first comment. I agree on the need to streamline the process of comprehensively comparing results from two runs; instance.pprint() is adequate for that review, but not optimized for it. I almost always use it in conjunction with diff to highlight the changes relative to a reference model.
Knowing which sets are empty can be an important part of inspection, so I'm a little hesitant to remove them from the dump. If I use a diff tool to compare two results, then the empty sets that they have in common won't be highlighted and won't really slow me down.

Yes, results.write() only gives summary info about the solution, not detailed info on decision variable values. In the past, I've used instance.pprint() rather than instance.display(). I just compared the two, and pprint() provides more details than display(): display() shows variables, the objective function value, and constraints; pprint() shows all that plus sets, parameters, expressions, and BuildChecks (along with the equations for each), as well as a comprehensive list of model components. For brevity, display() is better, but for a comprehensive check, pprint() wins out.

I actually don't like dumping all of that to the screen and would prefer the output to go to a log file. I didn't have time to figure out how to do that cleanly, but I wanted to add a partial solution that was adequate.
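
For reference, a minimal Pyomo sketch contrasting the three dump styles on a toy model (this assumes GLPK is installed; any LP solver would do):

    from pyomo.environ import ConcreteModel, NonNegativeReals, Objective, Var
    from pyomo.opt import SolverFactory

    m = ConcreteModel()
    m.x = Var(within=NonNegativeReals, bounds=(0, 10))
    m.obj = Objective(expr=m.x)  # minimized by default

    results = SolverFactory('glpk').solve(m)

    results.write()  # solver status and summary only
    m.display()      # variable values, objective value, constraints
    m.pprint()       # all of the above, plus sets, params, expressions, etc.

    # display() and pprint() both accept an ostream, so the dump can go to a
    # log file instead of the screen:
    with open('model_dump.txt', 'w') as f:
        m.pprint(ostream=f)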

@mfripp
Member

mfripp commented Aug 23, 2016

Maybe it would be best to change --dump to something like --show-solver-results, and give it the limited functionality of results.write().

Then if people want to run instance.pprint(), they can use a custom reporting module. A very simple reporting module would look like this:

def post_solve(m, outputs_dir):
    # dump the full model instance (sets, params, variables, constraints, ...)
    m.pprint()

Adding this module's name to modules.txt will automatically create the extra output.

Alternatively, we could write a standard reporting module, save_results.py, and have it provide the arguments to control instance.pprint() (possibly saving results to a file), among other things. This module could also hold the code currently in utilities._save_generic_results(), and maybe provide some arguments for controlling that.

On a partially related note:

It would be nice if this function and _save_generic_results() could associate results files with m.options.scenario_name somehow. I usually append the scenario name to the output file, but another option would be to have a separate outputs subdirectory for each scenario (though that takes a little more digging to inspect the results).

Also, eventually I'd like to have a general-purpose tabular reporting system, where users can call a helper function to register any expressions that they want reported, along with the indexes and (optional) filename to use for them. Then the reporting module would create a single output file for each specified filename and/or indexing set, and would automatically create a column for each expression that's been assigned to that file (there would probably also be some standard expressions registered automatically). I know tables like this aren't great for saving results in a database, but lately I find it easiest to pull data directly from the output files into Excel or Pandas, and this works very well.
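
A rough sketch of what that registration interface might look like (every name here is hypothetical and only illustrates the idea; none of this exists in the codebase):

    import csv
    import os
    from pyomo.environ import value

    # hypothetical registry: filename -> list of (column name, indexed expression)
    _registry = {}

    def register_output(filename, column_name, indexed_expression):
        _registry.setdefault(filename, []).append((column_name, indexed_expression))

    def write_registered_outputs(outputs_dir):
        for filename, columns in _registry.items():
            # assume all expressions assigned to one file share an indexing set
            index_set = columns[0][1].index_set()
            with open(os.path.join(outputs_dir, filename), 'w') as f:
                writer = csv.writer(f)
                writer.writerow(['index'] + [name for name, _ in columns])
                for idx in sorted(index_set):
                    writer.writerow([idx] + [value(expr[idx]) for _, expr in columns])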

josiahjohnston and others added 2 commits August 24, 2016 15:20
… of small example problems a useful tool during development.
… verbose dump to most of the examples and updated their output files. A full dump of prior example solutions could be useful for debugging. I didn't dump the 3zone toy results because that dump file was 432K, and I didn't want to clog up the repo.
@josiahjohnston
Contributor Author

I like the features you are describing, Matthias. Maybe we can paste those into a new issue so the idea sticks around after this pull request is closed.

I moved that functionality into a custom export module and updated most of the examples to include a full dump. Any comments on this version?

@mfripp
Member

mfripp commented Aug 25, 2016

A couple of questions about the custom export modules:

At one point I abandoned utilities.save_results() and module.save_results() in favor of a more general post_solve() function. But I see that someone has added code to solve.main() to call save_results(). I think this arrangement is redundant (the two now both call utilities._save_generic_results()). So I would recommend moving the code from module.save_results() to module.post_solve() and dropping utilities.save_results() and utilities._save_results(). The main difference I see is that save_results() passes a results object and post_solve() doesn't. If this object is actually needed outside of solve.solve(), we could add a line to solve.solve() like model.results = results; then other modules could access it when their post_solve() is called. (We could delete the model.results object before solving, to avoid increasing the memory footprint of iterated models.)
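
Concretely, the change might look like this inside solve.solve() (a sketch only; the surrounding loop and variable names are assumed, not actual code from solve.py):

    # sketch; the surrounding structure of solve.solve() is assumed
    results = solver.solve(model)
    model.results = results  # expose solver results to reporting modules

    for module in loaded_modules:
        if hasattr(module, 'post_solve'):
            module.post_solve(model, outputs_dir)

    del model.results  # optionally free memory before re-solving iterated models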

Would it be better to exclude the dump files from the git repository? I usually omit outputs from the repository to save space (and they're making it a little hard to make sense of this commit).

I would also recommend moving the results dump code from solve.main() to solve.post_solve(), if it's not moved into a completely separate reporting module.

@mfripp
Member

mfripp commented Aug 25, 2016

I just copied the general reporting suggestions into a couple of new issues, so we can close this pull request once the dumping code is finalized, without fear of losing them.

@bmaluenda
Member

I think I prefer the alternative of having a single dumping module in the export directory that accepts command-line arguments as options. The user would be able to specify the desired amount of output:

  • Only solver information (results.write())
  • Key information on the model instance (instance.display())
  • Complete information on the instance (instance.pprint())

with each option implying the one before it, in hierarchical order.

Also, I would include an option to additionally print the dump to the terminal. This could be achieved with the Logging class. This feature would be useful for quick visual inspection when working with toy input sets during development.
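
One way to get that tee behavior with Python's standard logging module (a sketch; the function and file names are illustrative, and I'm assuming this is what the Logging class refers to):

    import logging
    import os
    import sys

    def make_dump_logger(outputs_dir, echo_to_screen=False):
        logger = logging.getLogger('dump')
        logger.setLevel(logging.INFO)
        logger.addHandler(logging.FileHandler(os.path.join(outputs_dir, 'dump.log')))
        if echo_to_screen:
            logger.addHandler(logging.StreamHandler(sys.stdout))
        return logger

    # usage: make_dump_logger('outputs', echo_to_screen=True).info(dump_text)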

As for your last commit, I agree that the dump files should be kept out of the repository, so we don't have difficulty reading through the commits (we would have hundreds of changed lines every time we modified an example or added a new one). If we do go with the alternative of having those 2 modules (dump_verbose and dump_concise) in the export directory, then their documentation should be a little more specific, since the two are otherwise identical apart from the function they call.

Regarding the last issue, I noticed the same conflict between save_results() and post_solve(). Would there be any reason to require passing the results object to the output processing function? When writing output files, I have just been querying the instance itself with the write_table() function and custom code. The only use case I can think of would be using the results.write() function if we go with the alternative of having a single module which accepts the options I described at the beginning.

Besides that, I think that the post_solve() name is better, since it may involve more actions than just saving results to files. I have some code to process results and draw & save plots in my exporting module, which strictly speaking is not only "saving results", but also analyzing them ("post-solve processing").

@mfripp
Member

mfripp commented Aug 25, 2016

This sounds like a good plan. We could do this with something like --display-results [1-3].

If we assume this output should always go to stdout or a log file which is capturing stdout, then we don't need to add any more options. If we might want to write the results of these commands to some standard file instead of stdout/logfile, then it becomes trickier. We could handle that by adding a parallel argument like --save-results [1-3]. (This could get confused with the standard code for saving results in data files, but I can't think of a better name for it.)

I think I renamed the callback from save_results() to post_solve() because these functions may save or display results. If they might also be used for more detailed analysis, then that's another step in this direction. post_solve() also gives a nice symmetry with the pre_solve() callback, which is called just before the model is solved, and pre_iterate() and post_iterate() which are called during each iteration of iterated models.

@bmaluenda
Member

I will work on standardizing the post_solve() function as soon as I can.

I also want to add an --output-times option to print processing times for different parts of the code, such as model instantiation, solving, and post-solve processing. It's a quick way of checking timings without having to run a profiler (which could be useful for new users). The PySP module has something similar, and it's handy.
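
At its simplest, that could be a timing wrapper around each phase (an illustrative sketch, not PySP's actual mechanism; create_instance is just an example call):

    import time

    def timed(label, func, *args, **kwargs):
        # run func, print how long it took, and pass its result through
        start = time.time()
        result = func(*args, **kwargs)
        print('{}: {:.2f} s'.format(label, time.time() - start))
        return result

    # usage: instance = timed('model instantiation', model.create_instance, data)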

@josiahjohnston
Contributor Author

I'm ok with migrating from save_results() to post_solve(). I like the clarity of the former name, but see how post_solve() can sound more like a catch-all. At some point, it could be useful to add two separate layers for post-processing and exporting results (numerical data, summaries, figures, etc). If you are working iteratively, you may want to do post-processing between runs to nab some key summary values, but only export results periodically or after final completion.

At the moment, I don't have a good use case for splitting these into separate functions, so I'm fine with a wait-and-see approach to adding complexity. Alternatively, if it's easier to keep the codebase clean by separating post-processing from export (i.e. saving numeric and graphical files in a results directory), I'm fine with that too.

Benjamin, are you volunteering for some of the work on that? If so, thank you! :)
I also like your idea about integrating profiling that is easy to use.

I'll cut the dump files out of the repo. I'd be fine with cutting all of the results except the objective function value if y'all want, although I would put that in a separate commit.

Regarding the interface, I'm cool with merging into one dump module with a command-line option. I'd rather keep the output of instance.pprint() separate from instance.display(); because display() is a subset of pprint(), having 1-3 in additive order isn't so nice for me. I also don't see anyone caring much about the results.write() output by itself. We could do a --dump-level=[1-2], where 1 is instance.display() and 2 is instance.pprint().

I'll post an implementation of that when I get a chance, hopefully later tonight.
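
As a preview, one possible shape for that module (a sketch based only on this thread; the define_arguments/post_solve hooks, argument names, and instance.options attributes are assumptions, not the final switch_mod.export.dump code):

    import os
    import sys

    def define_arguments(argparser):
        argparser.add_argument(
            '--dump-level', type=int, choices=[1, 2], default=1,
            help='1 = instance.display(), 2 = instance.pprint()')
        argparser.add_argument(
            '--dump-to-screen', action='store_true',
            help='also echo the dump to stdout')

    def post_solve(instance, outputs_dir):
        # pick the dump function based on the requested level of detail
        dump = instance.pprint if instance.options.dump_level == 2 else instance.display
        with open(os.path.join(outputs_dir, 'model_dump.txt'), 'w') as f:
            dump(ostream=f)
        if instance.options.dump_to_screen:
            dump(ostream=sys.stdout)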

…le. Command line options control the level of detail and whether the output is also sent to stdout.
@bmaluenda
Member

I like it, seems good to go in my opinion.

As a side note: the copyright header still lists the year 2015. Shouldn't that be changed to 2016 in all scripts? I'm not sure if that is how licenses work.

josiahjohnston added a commit that referenced this pull request Sep 7, 2016
…le. Command line options control the level of detail and whether the output is also sent to stdout. Pull request #42
@josiahjohnston
Contributor Author

OK, I just pushed the simple switch_mod.export.dump solution. Thanks for the reviews and conversation :)

Benjamin, I'm not sure about the copyright header... I've seen people list multiple years or a range of years there, like "Copyright 2015-2016". If we could do "Copyright 2015-present", that would save having to redo it later, but I don't know if that's legit.

@josiahjohnston josiahjohnston deleted the add_dump_option branch September 7, 2016 23:05
staadecker pushed a commit that referenced this pull request Jan 28, 2023
staadecker pushed a commit that referenced this pull request Jan 28, 2023
staadecker pushed a commit that referenced this pull request Jan 29, 2023