Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

corelog chokes coredevice #986

Closed
jordens opened this issue Apr 23, 2018 · 6 comments
Closed

corelog chokes coredevice #986

jordens opened this issue Apr 23, 2018 · 6 comments

Comments

@jordens
Copy link
Member

jordens commented Apr 23, 2018

from artiq.experiment import *

class TestPause(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("scheduler")

    def run(self):
        for i in range(10):
            print(self.k()*1e-9)

    @kernel
    def k(self):
        self.core.break_realtime()
        t0 = self.core.get_rtio_counter_mu()
        self.scheduler.check_pause()
        return self.core.get_rtio_counter_mu() - t0

On a freshly started coredevice with artiq_run and nothing else running:

(py35) rj@murray:~/src/artiq-setup (master)$ artiq_run repository/test_pause.py
WARNING:artiq.coredevice.comm_kernel:Mismatch between gateware (4.0.dev+849.g4fe09fdd.dirty) and software (4.0.dev+850.gada0a278) versions
0.00244876
0.00230096
0.002301528
0.00260904
0.002303112
0.002300456
0.002300416
0.002300376
0.002609592
0.002606632

Now starting aqctl_corelog:

(py35) rj@murray:~/src/artiq-setup (master)$ aqctl_corelog -p 1068 --bind ::1 10.0.16.118 &
[2] 3890
(py35) rj@murray:~/src/artiq-setup (master)$ INFO:firmware.runtime:ARTIQ runtime starting...
INFO:firmware.runtime:software version 4.0.dev+849.g4fe09fdd
INFO:firmware.runtime:gateware version 4.0.dev+849.g4fe09fdd.dirty
INFO:firmware.runtime:log level set to INFO by default
INFO:firmware.runtime:UART log level set to INFO by default
INFO:firmware.runtime:press 'e' to erase startup and idle kernels...
INFO:firmware.runtime:continuing boot
INFO:firmware.board_artiq.si5324:Si5324 is locked
INFO:firmware.runtime:using MAC address 54-10-ec-aa-09-72
INFO:firmware.runtime:using IP address 10.0.16.118
INFO:firmware.runtime.rtio_mgt:using external startup RTIO clock
INFO:firmware.runtime.mgmt:management interface active
INFO:firmware.runtime.session:accepting network sessions
INFO:firmware.runtime.session:running startup kernel
INFO:firmware.runtime.kern_hwreq:resetting RTIO
INFO:firmware.runtime.session:startup kernel finished
INFO:firmware.runtime.session:no connection, starting idle kernel
INFO:firmware.runtime.session:no idle kernel found
INFO:firmware.runtime.session:new connection from 10.0.16.10:56778
INFO:firmware.runtime.session:no connection, starting idle kernel
INFO:firmware.runtime.session:no idle kernel found
INFO:firmware.runtime.mgmt:new connection from 10.0.16.10:41518
(py35) rj@murray:~/src/artiq-setup (master)$ ps ax | grep aqctl
 3890 pts/4    Sl     0:00 /home/rj/src/conda/envs/py35/bin/python /home/rj/src/conda/envs/py35/bin/aqctl_corelog -p 1068 --bind ::1 10.0.16.118
 4046 pts/4    S+     0:00 grep --color=auto aqctl
(py35) rj@murray:~/src/artiq-setup (master)$ ps ax | grep artiq
 4051 pts/4    S+     0:00 grep --color=auto artiq
(py35) rj@murray:~/src/artiq-setup (master)$

And reattempting the experiment:

(py35) rj@murray:~/src/artiq-setup (master)$ artiq_run repository/test_pause.py
INFO:firmware.runtime.session:new connection from 10.0.16.10:56786
WARNING:artiq.coredevice.comm_kernel:Mismatch between gateware (4.0.dev+849.g4fe09fdd.dirty) and software (4.0.dev+850.gada0a278) versions
0.454824784
0.4547608
0.454760808
0.454776616
0.454777624
0.454917736
0.454914816
0.454761288
0.454760688
0.45493312

Persists after killing aqctl_corelog:

(py35) rj@murray:~/src/artiq-setup (master)$ kill 3890
[2]-  Terminated              aqctl_corelog -p 1068 --bind ::1 10.0.16.118
(py35) rj@murray:~/src/artiq-setup (master)$ artiq_run repository/test_pause.py
WARNING:artiq.coredevice.comm_kernel:Mismatch between gateware (4.0.dev+849.g4fe09fdd.dirty) and software (4.0.dev+850.gada0a278) versions
0.454673616
0.454611728
0.45461164
0.454633296
0.454632704
0.454683088
0.454769352
0.454617472
0.454620032
0.45477808

But after a while (in this case ~5 minutes) it recovers:

(py35) rj@murray:~/src/artiq-setup (master)$ artiq_run repository/test_pause.py
WARNING:artiq.coredevice.comm_kernel:Mismatch between gateware (4.0.dev+849.g4fe09fdd.dirty) and software (4.0.dev+850.gada0a278) versions
0.002502056
0.002502744
0.002680816
0.002681608
0.002683608
0.002680888
0.002681624
0.00268192
0.002680776
0.002683944
  • Maybe or may not be identical or related to Device slowdown after extended run time #979
  • No dashboard, no master, running in the repro above.
  • Experiments submitted through artiq_master and artiq_run are also different. On artiq_master the RPCs take ~5ms while on artiq_run they are around 2.5ms
  • I haven't checked whether the break_realtime() and/or the fact that the scheduler.check_pause() is used as an RPC matter.
@whitequark
Copy link
Contributor

Experiments submitted through artiq_master and artiq_run are also different. On artiq_master the RPCs take ~5ms while on artiq_run they are around 2.5ms

That's probably because of moninj. Moninj, session, and corelog all compete for CPU time, and since we use a round-robin scheduler without any kind of readiness notification mechanism, adding one more thread increases the average latency by poll time plus schedule time plus (in case of corelog) two context switch times.

@jordens jordens modified the milestones: 4.0, 3.7 Apr 23, 2018
@jordens
Copy link
Member Author

jordens commented Apr 23, 2018

And poll time plus schedule time plus two ctx switches are 400k cycles? Or is that the scheduler tick?

@whitequark
Copy link
Contributor

The scheduler is cooperative so it doesn't have ticks. It switches to the next thread once the current one blocks on something.

@whitequark
Copy link
Contributor

400k cycles were spent rotating the log ring buffer in place. I am now observing exact same performance with or without aqctl_corelog.

@sbourdeauducq
Copy link
Member

@whitequark Please fix this in release-3 as well.

@sbourdeauducq sbourdeauducq reopened this May 6, 2018
whitequark added a commit that referenced this issue May 6, 2018
@whitequark
Copy link
Contributor

Ah sorry, didn't notice the milestone.

@jordens jordens moved this from pain to done in dashboard-jordens May 9, 2018
@jordens jordens removed this from done in dashboard-jordens May 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants