Trouble adding boundary penalties in matrix format #92
I am having trouble adding boundary penalties to my conservation problem when using matrix-format data (dgCMatrix). The problem is built with the features as a matrix, and it solves normally before these penalties are added. However, after boundary penalties are added, the problem becomes infeasible.
This is how my problem looks when printed before adding the boundary penalties:
Notice that planning units and features were used in matrix format to build the problem. I am able to solve the problem up to this point.
However, I would like to add boundary penalties.
I built the boundary matrix using the boundary_matrix() function on the raster from which I made the matrix of planning units. I noticed that this function does not ignore NA cells, so I needed to remove rows and columns in the resulting matrix to leave only those that match the positions of non-NA pixels. Like this:
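A minimal sketch of the cropping step described above, assuming `pu_raster` is the planning-unit raster (the object names are illustrative, not from the original post):

```r
# Sketch of cropping a boundary matrix to non-NA planning units
# (`pu_raster` is an assumed placeholder name).
library(raster)
library(prioritizr)

# full boundary matrix, computed from the raster (includes NA cells)
bm <- boundary_matrix(pu_raster)

# indices of cells that are not NA in the planning-unit raster
keep <- which(!is.na(values(pu_raster)))

# keep only the rows/columns that correspond to non-NA planning units
bm <- bm[keep, keep]
```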
I haven't seen too many of these, but the matrix looks good to me. This is a fragment:
However, using this matrix makes the problem infeasible:
Later, I tried modifying the matrix a bit further, following what the function
But the result was exactly the same.
Do you see anything wrong in the way I am calculating the boundary matrix? Or is the problem somewhere else?
Any help is much appreciated. And thanks a lot for all the development work on the package; it has been great to use!
PS: I had to build the problem using a self-constructed dgCMatrix for the features because the number of species was too large: the problem had still not finished building after 3 weeks in raster format.
Thanks for getting in touch. I can't see anything immediately incorrect with the code you've posted, so I'm going to need a bit more information in order to reproduce it and find a fix. Could you please try running your code using the built-in planning unit and feature data (
Hi and thanks a lot for offering the help!
I tried your suggestion and I was able to solve the problem with boundary penalties for the example data using my code to make the
Can you think of any case where adding boundary penalties makes a minimum set objective problem infeasible? If I am thinking about it right, penalties could make this type of problem more expensive, but not infeasible...
I still think that the problem is in my boundary matrix, because the problem can be solved before adding boundary penalties. I have tried converting my planning units from raster to polygon (with
The most surprising thing for me was that adding the penalties with a penalty value of 0 also made the problem infeasible. Shouldn't a 0 penalty be equivalent to not using penalties at all?
PS: I used
No worries. I'm sorry you're having this problem with prioritizr.
Yeah, that's strange. The boundary penalties should never make a feasible problem become infeasible.
What happens when you remove the locked in and locked out constraints but leave in the boundary penalties?
What happens when you run it with 10 species? If you still have the problem with 10 species, would you mind sending me data for the planning units and the first 10 species, so I can track down what the issue is?
Yeah, when you specify a penalty of zero this means that the penalties should not be applied - this is bizarre.
Which version of prioritizr are you running? If you haven't already, could you please try it with the latest version of prioritizr on GitHub?
Ah ok, it looks like we might need to try skipping more things when
Great, and thanks again!
So, I found out new things. After more testing, I no longer think the problem is in the boundary penalties. Instead, it seems related to a combination of the locked-out constraints, the random "id" given to newly declared conservation problems and, weirdly, something changing in some environment (gurobi's options?). Does that make any sense?
The combination failing in my tests is: a conservation problem (with a certain random, internally generated "id") with locked-out constraints that is being solved for the second (or 3rd, 4th, ...) time. It also seems to require that the problem is built using rij_matrix (and that the matrix is large?).
Now, the long explanation: when I run problems for the first time (with or without boundary penalties, but with locked-out constraints), they are always feasible (as I am using targets adjusted to be feasible). As a reminder, it is a min_set_objective problem with binary decisions that uses gurobi as the solver.
But then, if I try solving the same problem again (re-running exactly the same line of code, with nothing else run in between), it is infeasible!
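A sketch of the failure pattern described above (`pu`, `features`, `rij`, and `locked_out` are placeholders for the original data, and the target value is illustrative):

```r
# Sketch of the reproduction described above; the data objects are
# assumed placeholders, not the original inputs.
library(prioritizr)

p <- problem(pu, features, rij_matrix = rij) %>%
     add_min_set_objective() %>%
     add_relative_targets(0.1) %>%
     add_locked_out_constraints(locked_out) %>%
     add_binary_decisions() %>%
     add_gurobi_solver()

s1 <- solve(p)  # first call: feasible, returns a solution
s2 <- solve(p)  # second call on the same object: reported infeasible
```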
There are some differences in gurobi's output between the first solve and the 2nd, 3rd, ... solves. The first few lines are always the same (model description in rows, columns, integer variables, coefficient statistics...). After that they diverge: the first-time output continues with "Found heuristic solution: objective XXXX" and then starts a presolve that takes ~120 s and prints 8-9 lines as it removes more rows and columns. By contrast, the output from the second and later solves starts the presolve right after the "coefficient statistics" lines (with the four "matrix", "objective", "bounds" and "RHS" ranges), so there is no "Found heuristic solution: ..." line. Also, the presolve prints only one line and ends in ~4 s.
In summary, it seems to me that when a conservation problem is built (or read from an RDS object), it is randomly assigned an "id". When this problem is solved the first time, something in gurobi's options or temporary files stores precomputed data for that "id", and later produces the infeasibility when trying to solve a problem with the same "id" again.
Is this possible? Sorry, I do not really know the details of how the package communicates with gurobi.
I have reproduced this problem ~20 times, always with the same result, using slightly different conservation problems. In all my tests I have used the same planning units (for the north of South America, ~55000 PUs). I found the problem using ~13000 species as features, using a different table of ~13000 species (the same species, but future projections of their distributions, so a different table), using a table of ~4000 different species, and using the ~4000 future projections of these. In all those tests, the first solve is feasible, and from the second attempt onward it is infeasible (and eventually R crashes on the 3rd or 4th). However, and this is very weird, I do not see the problem with the example (small) data in the prioritizr package (also built using matrices): it does not become infeasible no matter how many times I solve it. Nonetheless, I am not sure the problem only affects large matrix-built conservation problems; I do not have raster-built problems large enough to test that.
I have run these tests on a Windows computer (32 GB RAM and 20 cores) using R 3.4.4 with gurobi 8 and the latest GitHub version of prioritizr, but I have also repeated some of them on a Mac laptop (8 GB RAM and 4 cores) using R 3.5, gurobi 8, and the same latest version of prioritizr.
I can continue my work for now by re-reading a saved RDS of the problem before every solve... but it is not ideal (the RDS is ~200 MB and takes time to load), and I am really intrigued about what is going on! Moreover, maybe this will affect me or other people at some point... So, if you have any clue...
Thanks very much!!!
Yeah, it's definitely not ideal and sounds like a serious bug. Thank you again for raising this. I can't really debug this without a reproducible example though. Could you please email me a copy of your script so I can see exactly what you're doing?
Also, could you try running the code below to verify if prioritizr is copying internal objects correctly:
Thank you very very much!
I sent you the script to your email.
As per the code you gave me, there is no output from the
For the record, I have been able to verify that the same large problem (>13000 species) does not turn infeasible after a first solve when using raster-built conservation problems.
So, the main point is that the issue, if there is one, does not seem to affect raster-problems.
Thanks again, and I look forward to hearing from you!
Thank you very much for your help @javierfajnolla and for persisting with this. I really appreciate it.
Yeah, the code I posted was testing if constraints were getting added correctly to problem objects, and they were being added correctly, so it didn't throw an error.
Thanks to you, I was able to work out that the issue was occurring due to a bug in the
@javierfajnolla, if you're interested in why you were getting infeasible problems upon attempting to solve the problem a second time, this bug was causing the wrong planning units to be locked out when you tried solving a problem a second time, and because some of these planning units were needed to meet the targets, the problem became infeasible.
Thank you again @javierfajnolla, if we catch up sometime, remind me to buy you a beer!
Awesome, I really like beer!
But there is really no need, it pays off with all the work you all have done with prioritizr. I am happy to help as part of the community.
Thanks for the explanation, the errors make sense now. I am still intrigued about where the locked planning units were changing, given that the conservation problems themselves were not changing when solved the first time... Was it in some files that the package stores somewhere?
Anyway, thanks for the update!
The prioritizr package doesn't save data to files on the computer's disk when building or solving problems; it's all stored in the computer's memory (which is why you can save problem objects using the
The specific lines in the source code causing the bug were 242--244 in R/add_manual_locked_constraints (see here for the diff: 0807865#diff-f3eee206eec8d70d5f3a7bc903bbcc21). Specifically, when prioritizr compiled the problem it sent the locked in/out data to a C++ function (rcpp_apply_locked_constraints; https://github.com/prioritizr/prioritizr/blob/master/src/rcpp_apply_locked_constraints.cpp) which has some code that subtracts one from the zone and planning unit ids (lines 11--12) because C++ uses base-zero indexing. However, because Rcpp lets you access the same objects as the R interpreter, this caused the data stored in the problem to also have one subtracted from the planning unit ids (resulting in the wrong planning units being locked in/out) and one subtracted from the zone ids (occasionally resulting in a segfault that crashed R). Since the locked data are only used once when compiling the problem, we get the correct solution when solving the problem once; but if we try solving the problem again, we would be using planning unit and zone ids that have already had one subtracted from them, and the C++ code would subtract one again.
To fix this, I wrapped the planning unit ids and zone ids in