Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multirun w/ "output_frequency" and "output_frequency_n" produces incomplete results #116

Closed
nkruskamp opened this issue May 31, 2021 · 15 comments · Fixed by #118
Closed

Comments

@nkruskamp
Copy link
Contributor

I am trying to use pops_multirun with a 10 year simulation.

I only want outputs every 5 years, so I set output_frequency=year and output_frequency_n=5.

However, I still get outputs for all 10 years of the simulation. This is not the expected behavior. I would expect output rasters to have only two layers, not 10.

A second issue is the results are incomplete spatially. See below for example: the gray area at the top of the region should have results. This does not occur when these parameters are left default.

image

@wenzeslaus
Copy link
Member

output_frequency_n is meant to be used with output_frequency="every_n_steps". The documentation is probably incomplete, but it should cover your use case.

@nkruskamp
Copy link
Contributor Author

nkruskamp commented Jun 1, 2021

I'm not sure I understand what you mean.

If I wanted an output every 5 years, what is the correct set of parameters to achieve that? This is what I currently tried:

output_frequency = "year",
output_frequency_n = 5,

The weather data is in weekly time steps, which is controlled by time_step = "week", so that is not related. Correct?

In either case, however, the results I'm getting in the example image are not usable and I'm not sure where that error is occurring.

@wenzeslaus
Copy link
Member

I don't know about the spatial error. That looks serious, but what you need for output is:

output_frequency = "every_n_steps",
output_frequency_n = x,

where x is a number of weeks in a year times 5 assuming you have time_step = "week".

In other words, output_frequency is generally dependent on time_step. Then, monthly, yearly, etc. are special cases. Finally, year is a synonym for yearly, month for monthly. So it works differently than for time step which is creating confusion.

The reason is that this was a reasonable way how to support the different use cases without making things too complicated. Maybe something to revisit, definitively something to document better.

@ChrisJones687
Copy link
Member

ChrisJones687 commented Jun 1, 2021

Also not sure about the spatial error. I will take a look @nfkruska is this happening at all scales and can you point me to the code for running this.

I opened an issue in pops-core for the output_frequncy pops-core #142.

@nkruskamp
Copy link
Contributor Author

nkruskamp commented Jun 2, 2021

@wenzeslaus I misunderstood and thought you were using "every_n_steps" as an alias for year month etc. I get it now! I will try to update my parameters and see what happens.

@ChrisJones687 I will need to do some more testing to 1) correct my use of the parameters and if that doesn't fix the issue, 2) see if scale is playing a role. The code is in the NCSU repo under 7c_pops_multirun.R

@nkruskamp
Copy link
Contributor Author

Actually it looks like every_n_steps isn't an accepted argument here. See here:

if (checks_passed && !(output_frequency %in% list(

But I'm a bit confused then on the logic of how to specify the output_frequency_n if output_frequency != time_step. Is that parameter still given in the measure of time_step?

@ChrisJones687
Copy link
Member

I was planning on adding a modifier for output_frequency and output_frequency_n so that it would do conversions for the user from time steps to other units. There was never a use case for it previously, and I forgot to add a To Do in the code so that I knew to come back to this.

I might do this in the scheduling core and then we don't have to solve it in all interfaces.

@nkruskamp
Copy link
Contributor Author

Where else might the parameter be used? I am trying to reduce the RAM usage on multirun since the extra outputs max out memory quickly.

@wenzeslaus
Copy link
Member

Hm, the rpops and PoPS Core handling of this is really different. "Number of units" in time step is always 1 (set_step_num_units(1)) (aka m below). Thanks to that, time_step in rpops works as every_n_steps in Core.

Perhaps the naming in Core is just bad, but you can already do output daily, weekly, monthly, yearly, each step, and each nth step. What you can't do now in Core is to combine different units in simulation step and output step while outputting only every nth of the whatever unit you used, e.g., you can't say "output every two months" with simulation step "day". However, you can do every nth day, week, month, or year, although that's only if your simulation step is (one) day, week, month, or year, respectively, i.e., the units need to match. If you simulation step is more than one of the time unit, e.g., m days, than your output can still be something meaningful, but you are constrained to n being multiple of m, e.g., simulation step two day with output every four days.

I think it is possible to allow for arbitrary combinations in Core, but it didn't seem worth the trouble so far. What might be sufficient is to allow specifying output frequency in the same units as the simulation units if the n is a multiple of m. For example, you could say "simulation step is 2 days and output frequency/step is 4 days", but saying "simulation step is 2 days and output frequency/step is 5 days" would fail and "simulation step is 30 days and output frequency/step is 2 months" would fail too. However, this seem like a pain to check this ahead of time in R consistently with the Core, but perhaps there should be exception catching code in src/pops.cpp to handle these cases.

@wenzeslaus
Copy link
Member

Where else might the parameter be used?

I'm not sure what you mean, but perhaps my other comment helps.

I am trying to reduce the RAM usage on multirun since the extra outputs max out memory quickly.

Well, it would defeat the whole point of convenience, but the r.pops.spread in GRASS does not store the outputs in memory, it outputs them to disk. Also the weather code there is ready for some potential memory consumption reduction.

@nkruskamp
Copy link
Contributor Author

What you can't do now in Core is to combine different units in simulation step and output step while outputting only every nth of the whatever unit you used

So it sounds like I need to set time_step = "week", output_frequency = "time_step", output_frequency_n = (5 * 52) to get an output every 5 years? That way I'm using a consistent unit throughout?

I see the convenience factor by giving some control over output_frequency separately from time_step but it does seem to introduce an extra step and confusion from the users side. I think from that perspective it's easier for the user to understand everything in a single unit and convert accordingly.

Where else might the parameter be used?

My assumption is in pops_multirun it is used to control how often a raster output is saved. So when I see defaults set as output_frequency = "year" and output_frequency_n = 1, I take that to mean a 10 year simulation with have 10 output layers, 1 for every year. What I was trying to do was set output_frequency_n = 5 so that a raster layer is saved only every 5 years. So my question here means, can this parameter be used to control that same output for other methods like validate where I could have the function save 5 and 10 year validation data. However, right now the validate function saves the output at the end of the whole duration of the simulation, so can I control that to avoid running validate with different duration?

re: GRASS

I'm not sure I can switch my workflow to GRASS right now, I'll need to see how much effort that might take. But currently there is a lot that won't run for me (see my other issue #115 ) that's been limiting on the data side of things. But, saving to disc and doing statistics at the end may need to be considered in the python and R implementations, because I think what is currently happening, especially for multirun, is 100 iterations of data x the output frequency is being held at once in addition to the input data. So for my 10 year simulation, that's 1000 2d arrays held in memory. I could be way off on my understanding of how that is actually working.

@ChrisJones687
Copy link
Member

As soon as a merge the PR it should work with.

time_step = "week", output_frequency = "every_n_steps", output_frequency_n = (5 * 52)

@nkruskamp
Copy link
Contributor Author

Thanks @ChrisJones687 !

@ChrisJones687 ChrisJones687 linked a pull request Jun 3, 2021 that will close this issue
@nkruskamp
Copy link
Contributor Author

This is my bad for opening what should probably be two issues in one, but the incomplete spatial results are still an issue. I changed 4 parameters at once, seasonality and output_frequency (ex below). I have since removed the output_frequency parameters but still get incomplete results. I will need to investigate where that issue is coming from.

  season_month_start = 2,
  season_month_end = 9,
  # output_frequency = "time_step",
  # output_frequency_n = (5 * 52),

@ChrisJones687
Copy link
Member

I have moved the spatial issue to a separate issue #119.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants