Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeslot sequence anomaly #519

Closed
jecollins opened this issue Apr 13, 2012 · 5 comments
Closed

Timeslot sequence anomaly #519

jecollins opened this issue Apr 13, 2012 · 5 comments

Comments

@jecollins
Copy link
Member

I ran a boot session, on the controllable-capacity branch (freshly updated), on my fast desktop machine on campus. In timeslot 255 I got a NPE in CompetitionControlService:

Exception in thread "Thread-7" java.lang.NullPointerException
    at org.powertac.server.CompetitionControlService.activateNextTimeslot(CompetitionControlService.java:641)
    at org.powertac.server.CompetitionControlService.step(CompetitionControlService.java:586)

In the trace log, we see that the timeslot serial number has gone off the rails:

115401 INFO  server.CompetitionControlService: Elapsed time: 29
115401 INFO  server.CompetitionControlService: Wait for tick 255
115472 ERROR server.CompetitionControlService: current timeslot serial is 256, should be 255
115472 INFO  server.CompetitionControlService: Deactivated timeslot 256, start 2009-10-20T16:00:00.000Z
115472 ERROR repo.TimeslotRepo: Timeslot 280: start:2009-10-21T16:00:00.000Z != previous.end:2009-10-21T16:00:00.000Z

On tick 255, we should be deactivating timeslot 255, not 256. On tick 254, it was correct:

114238 INFO  server.CompetitionControlService: Wait for tick 254
115372 INFO  server.CompetitionControlService: Deactivated timeslot 254, start 2009-10-20T14:00:00.000Z
115372 INFO  server.CompetitionControlService: Activated timeslot 278, start 2009-10-21T14:00:00.000Z
115372 INFO  server.CompetitionControlService: step at 2009-10-20T14:00:00.000Z
@jecollins
Copy link
Member Author

It seems that the java process was out to lunch for a spell. This session was run with a timeslot length of 400 msec, but there is a gap of 1167 msec between the end of 253 and the beginning of 254. This causes the definition of "current timeslot" to be incorrect.

@ghost ghost assigned jecollins Apr 13, 2012
@jecollins
Copy link
Member Author

This is fixed by comparing the actual time with the expected time at the beginning of each tick. If it's off by more than a small delta, then the clock is re-based and a sim-resume message is sent to brokers to re-base their clocks.

@jecollins
Copy link
Member Author

There's still a small possibility of an error here, if the delay happens in the context switch after the tick timer goes off. So probably the right way to handle this is to move the clock-correction code into the main server thread.

@jecollins jecollins reopened this May 19, 2012
@jecollins
Copy link
Member Author

Dan Urieli is encountering this problem repeatedly, on a 2 Gb machine. Probably it's hitting swap-based pauses. We need to finish closing the window on this.

@jecollins jecollins reopened this Nov 6, 2012
@jecollins
Copy link
Member Author

Changed handling of timeslot start overrun, by picking next timeslot in sequence regardless of the time, and letting the pause mechanism deal with the overrun.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant