Skip to content

Debugging sessions

Adrian Palacios edited this page Dec 6, 2017 · 4 revisions

A typical debugging sessions proceeds as follows:

  1. The user evaluates the program forward.
  2. The user inspects looks for an abnormal behaviour.
  3. When the user finds something wrong, she rolls back a process until some event of interest.
  4. The user performs some Manual steps until the bug is found.

From here, we explain each part in more detail.

Evaluate program

Concurrency bugs can either appear since the beginning of the evaluation or after some time. Thus, the user should first exercize the program in order to skip the initialization and look for abnormal behaviours. How many steps should the evaluation move forwards depends on:

  • The program.
  • The bugs that we are looking for.

For instance, in the dining philosophers example (simple version) it is enough to evaluate the program forward for about 600 steps. In contrast, the complex version requires about 2000 steps.

State and Trace inspection

In this part, the user should look for an abnormal behaviour in the program by looking at the State and Trace tabs. Which one should the user look at first depends on many things, but here are some guidelines:

  • If the behaviour appears since the beginning of the program, it is most likely that the bug is local to some process. In that case, you should look at the State tab.
  • On the contrary, if the behaviour is shown after some time, that means it is induced. Thus, the bug depends on the global state, and in this case it is recommended that you check the Trace tab first.

Of course, sometimes it is not easy to say whether the behaviour is induced or not, or perhaps in your program this behaviour is also induced by some local state. Hence, checking the complementary tab sometimes can give us more information about this behaviour than the one we looked first.

For instance, in the simple version of the dining philosophers, the behaviour is shown at the beginning. When we look at the State tab, we can discern without too much effort the reason for its abnormal behaviour. On the contrary, the complex version requires some time to be shown, and only then, one will know there is an abnormal behaviour by carefully checking the Trace tab.

In this part, it might be particularly useful to show the full environment (to know the binding of all the variables) or full histories (to know exactly the performed actions).

When the user finds an abnormal behaviour, she can then proceed to roll back the state up to some event of interest, when it is not clear what it is causing it. If the user can not find anything, perhaps it is because there is nothing to find. In that case, moving the evaluation either forward or backward is recommended. Since rollbacks can tell you causal information, it is also recommended to move your program by using rollbacks, as we explain in the next part.

Rollbacks

Also depending on the kind of behaviour shown, the roll backs you might use can be different:

  • In behaviours that are due only to some concurrent code (i.e., spawn, send, receive), the associated rollbacks (roll spawn, roll send, roll receive) are the most efficient ones.
  • In behaviours that also involve some sequential code, roll var has proven to be quite powerful.

After performing a roll back, the Roll Log tab is focused automatically. The information printed in this tab tells the user about the propagated rollbacks from one process to the dependent ones. That means that this information is all about the causal relations in your system. Hence, sometimes this information can also be useful to diagnose an abnormal behaviour.

Manual control

Once the user has rolled back the state up to a point of interest, she can switch to the manual mode in order to precisely control the evaluation. In this part, moving a few steps forward or backward in the evaluation, and looking at which transitions are enabled, can tell you more things about the behaviour.

However, this procedure is a bit exhausting and it is recommended that you leave it until you are focused on a particular block of code that you suspect is provoking the abnormal behaviour.