max-nondef-actions should be sanity checked #9

fdouglis · 2020-11-13T16:45:49Z

I'm new to RDDL, and as I first started with playing with it, I created something that had no actions on the one hand, but max-nondef-actions set very high on the other. I'm not sure what prompted such a silly configuration. When I added actions, I saw that rddlsim blew up --- with more than one action, it could run overnight without getting through the first generation.

It seems like what's happening is that it happily accepts max-nondef-actions > num_possible_actions. I believe there is some code that sanity checks to say if you want concurrent actions but the maximum is 1, you blew it, and if you don't want concurrent actions but the maximum is > 1, you blew it. However, I think it should also make sure that max-nondef-actions is feasible. with it set that high, it simply recursed doing the same work over and over. If I understand it correctly, it should never be greater than the number of possible actions. In addition, there is probably some value where it would be reasonable to warn that someone may be getting carried away.

Here is a trivial rddl file that demonstrates it blowing up. If I set it to 2 for the 2 actions it runs quickly, and if I set max-nondef-actions to 4, say, it runs almost as quickly but there's a brief pause on my system. At 100, as set here, it hangs.

blowup.rddl.txt

The text was updated successfully, but these errors were encountered:

thomaskeller79 · 2020-11-13T17:12:03Z

Hi @fdouglis welcome to RDDL!

Can you be a bit more specific what you mean with "rddlsim blew up" and " If I set it to 2 for the 2 actions it runs quickly"? How did you run rddlsim? Can you provide the command that you used such that the error can be reproduced?

fdouglis · 2020-11-13T18:52:51Z

Thanks @thomaskeller79 for the prompt response!

I used the following command:
java -Xms100M -Xmx500M -classpath ./bin:./lib/commons-math3-3.2.jar:./lib/grappa1_4.jar:./lib/java_cup.jar:./lib/jlex.jar:./lib/xercesImpl.jar:./lib/xml-apis.jar rddl.sim.Simulator ..../blowup.rddl rddl.policy.RandomBoolPolicy blowup1

By "blew up" I meant that when I ran my own, less simplified, rddl file, it ran for many hours without making progress. If I hit ^\ it dumped a backtrace that included the following deep stack:

"main" #1 prio=5 os_prio=0 cpu=3983178.85ms elapsed=3991.69s tid=0x00007f06c0017000 nid=0x1108 runnable [0x00007f06c9fd8000] java.lang.Thread.State: RUNNABLE at rddl.RDDL$PVAR_EXPR.sampleLTerms(RDDL.java:2617) at rddl.RDDL$PVAR_EXPR.sample(RDDL.java:2562) at rddl.RDDL$AGG_EXPR.sample(RDDL.java:2404) at rddl.RDDL$COMP_EXPR.sample(RDDL.java:3513) at rddl.RDDL$QUANT_EXPR.sample(RDDL.java:3137) at rddl.State.checkStateActionConstraints(State.java:403) at rddl.ActionGenerator.buildBooleanActionSet(ActionGenerator.java:81) at rddl.ActionGenerator.buildBooleanActionSet(ActionGenerator.java:86) ... at rddl.ActionGenerator.buildBooleanActionSet(ActionGenerator.java:86)

That last line continues for many lines.

If I set max-nondef-actions=2 in the attached rddl, instead of 100, it runs just fine. My statement about "for the two actions" was probably misplaced -- if you have several variables I guess you could choose to simultaneously set one action term to be true for multiple instances. (I'm sure I'm using the wrong terms here... I mean you might have foo(x1) and foo(x2) both set to true in one round.) The point however is that if I set it to a small number it is ok, and if I set it to a large number -- which is clearly larger than the combinatorial possible set of actions in my original RDDL -- it "works too hard" and doesn't progress. I suspect if I instrumented it with a global variable that counted the number of recursions it would be obvious what's happening.

So you could probably compute the maximum number of possible actions that could be selected and abort if that number is higher, or shrink it down to that maximum (to allow someone to say "unlimited" without it actually trying that number. But in addition, it seems like either it should be sanity-checked to a number that can reasonably run, or maybe be optimized in a different way. Let's say we have one action foo(?x) and 10 values of x. If max-nondef-actions is 10, any or all of foo(x_i) might be true, for 2^10 possible outcomes. But the way it processes it right now, it considers all possible permutations of those outcomes as well, even though the order doesn't matter -- in one descent it sets x1 and then x7, and in a different descent it sets x7 and then x1 ... I think? And again, assuming I'm reading it right (quite possibly otherwise), at the end of ActionGenerator, is it pulling out a possible action and excluding it from other actions that might be set, or passing it down? For the simple test, with many "allowed" actions but few "possible" actions I would expect it to pick something for foo, pick something for bar, and have no more choices -- so even though actions_left is 98, it should realize there is nothing left to do and move on. Even if it wants to pick every possible combination of actions, that would be a fairly small number.

Hope this helps.

thomaskeller79 · 2020-11-18T19:27:46Z

Sorry for taking so long to answer, but I didn't manage to verify this earlier. Now I was finally able to do so, and I can verify that there is something wrong.

I'll try to explain how max-nondef-action should work: the behavior I would expect here is that things get slower when we increase max-nondef-actions from 1 to 6 because the number of ground actions increases in this range: there are 6 different action variables in the grounded instance (foo(x1), foo(x2), bar(y1), bar(y2), bar(y3) and bar(y4)), and hence the following number of ground actions:

7 with max-nondef-actions = 1: one for each action variable where only that action variable is true (i.e., 6), and one where no action variable is true
22 with max-nondef-actions = 2: the same 7 as above, and additionally 6 choose 2 = 15 where exactly 2 action variables are true
42 with max-nondef-actions = 3: the same 22 as above, and additionally 6 choose 3 = 20 where exactly 3 action variables are true
57 with max-nondef-actions = 4: the same 42 as above, and additionally 6 choose 4 = 15 where exactly 4 action variables are true
63 with max-nondef-actions = 5 (the rule should be clear by now)
64 with max-nondef-actions = 6

Values higher than max-nondef-action = 6 do not increase the number of ground actions any further, because it is not possible to make more action variables true than all action variables, and the number of ground actions hence remains the same even for higher values of max-nondef-actions. Note that the order in which the action variables are assigned is irrelevant.

Despite all of this, I can confirm that your command makes the Simulator (or the RandomPolicy, I am not certain what these two tools do) slower and slower, in particular beyond max-nondef-actions=6 even though that shouldn't happen. @ssanner, do you have any idea where this problem comes from? @fdouglis, I am sorry I cannot help you any further than that (well, I can try to help if Scott points us to the right spot).

fdouglis · 2020-11-18T19:34:20Z

Thanks, I appreciate the info. I suppose that in general a sane value of max-nondef-actions will keep things reasonable; the question is whether there is a good way to point out that the value is not so sane, like when I tried effectively infinity.

I eventually got my scenario to run OK, and have been working ever since on getting it to run in Prost. Instead I'm finding there are things that are unimplemented (such as a switch statement on x(?y)); things that it complains about without really helping to find why it is complaining, but which were actually bugs (such as my defining FOO(?x) and then referencing foo(?x) -- rddlsim surprisingly didn't complain); and finally, just random exceptions I'm still trying to track down. Guess it's time to pester that github repo issues list instead! :)

thomaskeller79 · 2020-11-19T07:38:14Z

Setting it to the minimum of the provided value and the number of grounded action variables should resolve this issue (not the issue that a large number makes planning slow, but that cannot be avoided because the number of ground actions is exponential in that value).

Personally, I am not using max-nondef-actions at all, since it is just a special form of an action-precondition anyway and can easily be encoded as one (as in the probabilistic IPC2018 domains).

fdouglis · 2020-11-19T14:54:51Z

Yes, that's just what I was getting at initially -- if one puts a higher value there to mean "unlimited" it should be smart about what to do in a given context, or complain that the number exceeds the maximum legit value, or something.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

max-nondef-actions should be sanity checked #9

max-nondef-actions should be sanity checked #9

fdouglis commented Nov 13, 2020

thomaskeller79 commented Nov 13, 2020

fdouglis commented Nov 13, 2020

thomaskeller79 commented Nov 18, 2020

fdouglis commented Nov 18, 2020

thomaskeller79 commented Nov 19, 2020

fdouglis commented Nov 19, 2020

max-nondef-actions should be sanity checked #9

max-nondef-actions should be sanity checked #9

Comments

fdouglis commented Nov 13, 2020

thomaskeller79 commented Nov 13, 2020

fdouglis commented Nov 13, 2020

thomaskeller79 commented Nov 18, 2020

fdouglis commented Nov 18, 2020

thomaskeller79 commented Nov 19, 2020

fdouglis commented Nov 19, 2020