Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rigol: Floating point exception #803

Open
penguin42 opened this issue Sep 20, 2023 · 18 comments
Open

Rigol: Floating point exception #803

penguin42 opened this issue Sep 20, 2023 · 18 comments
Labels
bug Something isn't working driver Hardware drivers

Comments

@penguin42
Copy link

Connecting to a DS1102Z-E Rigol via LAN, clicked on the single shot button and got:
Warning: Ran out of data after 0 points
Warning: Attempted to bind an empty buffer
Floating point exception (core dumped)

#0 0x00000000008cc374 in WaveformArea::RasterizeAnalogOrDigitalWaveform(std::shared_ptr, vk::raii::CommandBuffer&, bool)
(this=0x33729f0, channel=std::shared_ptr (use count 3, weak count 0) = {...}, cmdbuf=..., clearPersistence=false)
at /discs/more/git/scopehal-apps/src/ngscopeclient/WaveformArea.cpp:1603
#0 0x00000000008cc374 in WaveformArea::RasterizeAnalogOrDigitalWaveform (this=0x33729f0, channel=..., cmdbuf=..., clearPersistence=false)
at /discs/more/git/scopehal-apps/src/ngscopeclient/WaveformArea.cpp:1603
1603 int64_t offset_samples = (offset - data->m_triggerPhase) / data->m_timescale;

(gdb) p *data
$2 = {_vptr.WaveformBase = 0x9af840 <vtable for UniformWaveform+16>, m_timescale = 0, m_startTimestamp = 16951
70282,
m_startFemtoseconds = 23995876312255, m_triggerPhase = 0, m_flags = 0 '\000', m_revision = 0}

ngscopeclient/scopehal-apps#1 0x00000000008cd42f in WaveformArea::RenderWaveformTextures(vk::raii::CommandBuffer&, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >&, bool)
(this=0x33729f0, cmdbuf=..., chans=std::vector of length 1, capacity 1 = {...}, clearPersistence=clearPersistence@entry=false)
at /discs/more/git/scopehal-apps/src/ngscopeclient/WaveformArea.cpp:1545
ngscopeclient/scopehal-apps#2 0x00000000008db504 in WaveformGroup::RenderWaveformTextures(vk::raii::CommandBuffer&, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >&, bool)
(this=, cmdbuf=..., channels=std::vector of length 1, capacity 1 = {...}, clearPersistence=clearPersistence@entry=false)
at /discs/more/git/scopehal-apps/src/ngscopeclient/WaveformGroup.cpp:164
ngscopeclient/scopehal-apps#3 0x0000000000801436 in MainWindow::RenderWaveformTextures(vk::raii::CommandBuffer&, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >&) (this=, cmdbuf=, channels=)
at /discs/more/git/scopehal-apps/src/ngscopeclient/MainWindow.cpp:434
ngscopeclient/scopehal-apps#4 0x00000000008831cc in Session::RenderWaveformTextures(vk::raii::CommandBuffer&, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >&) (this=, cmdbuf=, channels=)
at /discs/more/git/scopehal-apps/src/ngscopeclient/Session.cpp:2166
ngscopeclient/scopehal-apps#5 0x00000000008e7edc in RenderAllWaveforms(vk::raii::CommandBuffer&, Session*, std::shared_ptr)
(cmdbuf=..., session=0x3714770, queue=std::shared_ptr (use count 5, weak count 0) = {...})
at /discs/more/git/scopehal-apps/src/ngscopeclient/WaveformThread.cpp:167
ngscopeclient/scopehal-apps#6 0x00000000008e8a34 in WaveformThread(Session*, std::atomic) (session=0x3714770, shuttingDown=0x37148a0)
at /discs/more/git/scopehal-apps/src/ngscopeclient/WaveformThread.cpp:144
ngscopeclient/scopehal-apps#7 0x00007fb1950e31f3 in std::execute_native_thread_routine(void
) (__p=0x3d778e0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:104
ngscopeclient/scopehal-apps#8 0x00007fb194dcb897 in start_thread (arg=) at pthread_create.c:444
ngscopeclient/scopehal-apps#9 0x00007fb194e5261c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

@azonenberg
Copy link
Collaborator

Looks like what's happening here is that the scope is doing something the driver doesn't like during acquisition (header_blocksize = 0 in RigolOscilloscope.cpp:848).

The driver fails in a rather unusual way (creates a waveform with zero points, rather than the expected way of sending "no data" which is to set the waveform pointer to null) which then causes a bunch of the GUI logic to choke.

Can you re-run with --debug --trace SCPISocketTransport (assuming you're using the LAN transport) on the command line and upload the output? Probably best to make it an attachment as the dump will likely be large.

@penguin42
Copy link
Author

Hmm, please find attached a log generated by:

./src/ngscopeclient/ngscopeclient --debug --trace SCPISocketTransport > log 2>&1

although it ends in:
[SCPISocketTransport::ReadRawData] Got 250001 bytes
[SCPISocketTransport::SendCommand] [ronnie] Sending WAV:STAR 22250001
[SCPISocketTransport::SendCommand] [ronnie] Sending WAV:STOP 22500000
[SCPISocketTransport::SendCommand] [ronnie] Sending WAV:DATA?
[SCPISocketTransport::ReadRawData] Got 2 bytes
[SCPISocketTransport::ReadRaw

which makes me wonder whether it's complete.

I restarted ngscopeclient and did it again, but it failed much faster, and this time I left out the redirects, so here is where it ended on my terminal:

Connecting to SCPI device at ronnie:5555
[SCPISocketTransport::SendCommand] [ronnie] Sending *IDN?
[SCPISocketTransport::ReadReply] [ronnie] Got RIGOL TECHNOLOGIES,DS1102Z-E,DS1ZE231706403,00.06.02
[SCPISocketTransport::SendCommand] [ronnie] Sending :WAV:FORM BYTE
[SCPISocketTransport::SendCommand] [ronnie] Sending :WAV:MODE RAW
[SCPISocketTransport::SendCommand] [ronnie] Sending :TIM:VERN ON
[SCPISocketTransport::SendCommand] [ronnie] Sending :CHAN1:DISP?
[SCPISocketTransport::ReadReply] [ronnie] Got 1
[SCPISocketTransport::SendCommand] [ronnie] Sending :CHAN2:DISP?
[SCPISocketTransport::ReadReply] [ronnie] Got 0
[SCPISocketTransport::SendCommand] [ronnie] Sending :CHAN1:DISP ON
[SCPISocketTransport::SendCommand] [ronnie] Sending :TRIG:MODE?
[SCPISocketTransport::ReadReply] [ronnie] Got EDGE
[SCPISocketTransport::SendCommand] [ronnie] Sending :TRIG:EDGE:SOUR?
[SCPISocketTransport::ReadReply] [ronnie] Got CHAN1
[SCPISocketTransport::SendCommand] [ronnie] Sending :TRIG:EDGE:LEV?
[SCPISocketTransport::ReadReply] [ronnie] Got 0.000000e+00
[SCPISocketTransport::SendCommand] [ronnie] Sending :TRIG:EDGE:SLOPE?
[SCPISocketTransport::ReadReply] [ronnie] Got POS
[SCPISocketTransport::SendCommand] [ronnie] Sending :TIM:MAIN:OFFS?
[SCPISocketTransport::ReadReply] [ronnie] Got 0.0000000e+00
[SCPISocketTransport::SendCommand] [ronnie] Sending :CHAN1:OFFS?
[SCPISocketTransport::ReadReply] [ronnie] Got 0.000000e+00
[SCPISocketTransport::SendCommand] [ronnie] Sending :CHAN1:RANGE?
[SCPISocketTransport::ReadReply] [ronnie] Got 4.000000e-02
[SCPISocketTransport::SendCommand] [ronnie] Sending :SING
[SCPISocketTransport::SendCommand] [ronnie] Sending *WAI
[SCPISocketTransport::SendCommand] [ronnie] Sending :TRIG:STAT?
[SCPISocketTransport::ReadReply] [ronnie] Got STOP
[SCPISocketTransport::SendCommand] [ronnie] Sending WAV:SOUR CHAN1
[SCPISocketTransport::SendCommand] [ronnie] Sending WAV:PRE?
[SCPISocketTransport::ReadReply] [ronnie] Got 0,2,1200,1,5.000000e-11,-3.000000e-08,0,2.000000e-04,0,127
[SCPISocketTransport::SendCommand] [ronnie] Sending WAV:STAR 1
[SCPISocketTransport::SendCommand] [ronnie] Sending WAV:STOP 1200
[SCPISocketTransport::SendCommand] [ronnie] Sending WAV:DATA?
[SCPISocketTransport::ReadRawData] Got 2 bytes
[SCPISocketTransport::ReadRawData] Got 9 bytes
Warning: Ran out of data after 0 points
Warning: Attempted to bind an empty buffer
Segmentation fault (core dumped)

which makes me wonder if it left the scope in an odd state.

@penguin42
Copy link
Author

log.txt

@penguin42
Copy link
Author

I get the feeling there's something timing going on here;
I broke out tcpdump and got a slightly different failure:

[SCPISocketTransport::SendCommand] [ronnie] Sending :CHAN1:RANGE?
[SCPISocketTransport::ReadReply] [ronnie] Got 4.000000e-02
[SCPISocketTransport::SendCommand]         [ronnie] Sending :SING
[SCPISocketTransport::SendCommand]         [ronnie] Sending *WAI
[SCPISocketTransport::SendCommand] [ronnie] Sending :TRIG:STAT?
[SCPISocketTransport::ReadReply] [ronnie] Got STOP
[SCPISocketTransport::SendCommand]     [ronnie] Sending WAV:SOUR CHAN1
[SCPISocketTransport::SendCommand]     [ronnie] Sending WAV:PRE?
[SCPISocketTransport::ReadReply]     [ronnie] Got 0,2,0,1,0.000000,0.000000,0,0.000000,0,0
Floating point exception (core dumped)

@penguin42
Copy link
Author

Oh, I think this is as simple that it hasn't actually triggered yet - there's no data; and depending on the timing it might be in the middle of the capture.

@penguin42
Copy link
Author

Manually triggering on the scope panel and then throwing ngscopeclient at it and it's working.

@azonenberg
Copy link
Collaborator

Very interesting, because it's supposed to be polling the trigger status and not trying to download data until it's actually triggered.

As a guess, this is a bug where it can't tell the difference between "trigger stopped because the scope just booted and has no data" and "trigger stopped because we armed the trigger and then it triggered".

@azonenberg
Copy link
Collaborator

Moving to scopehal as this is a driver bug, zero point waveforms aren't well defined so it's not surprising the GUI chokes if given one.

@azonenberg azonenberg transferred this issue from ngscopeclient/scopehal-apps Sep 23, 2023
@azonenberg azonenberg added bug Something isn't working driver Hardware drivers labels Sep 23, 2023
@azonenberg
Copy link
Collaborator

So it looks like there's logic in the driver (https://github.com/glscopeclient/scopehal/blob/master/scopehal/RigolOscilloscope.cpp#L705) to work around this problem.

Can you poke around here a bit and figure out if the check isn't triggering or something else is going on?

@penguin42
Copy link
Author

So it looks like there's logic in the driver (https://github.com/glscopeclient/scopehal/blob/master/scopehal/RigolOscilloscope.cpp#L705) to work around this problem.

Can you poke around here a bit and figure out if the check isn't triggering or something else is going on?

I think the problem is that (sometimes, timing) that the :TRIG:STAT? sent immediately after the *WAI hasn't yet got into the wait state;

[SCPISocketTransport::SendCommand] [ronnie] Sending :SING
[SCPISocketTransport::SendCommand] [ronnie] Sending *WAI
[SCPISocketTransport::SendCommand] [ronnie] Sending :TRIG:STAT?
[SCPISocketTransport::ReadReply] [ronnie] Got STOP
<<>
PollTrigger, got: STOP
PollTrigger, else case for: STOP Armed=1 protocol=1 Live=0
PollTrigger: Unsticky Rigol special
[SCPISocketTransport::SendCommand] [ronnie] Sending WAV:SOUR CHAN1
[SCPISocketTransport::SendCommand] [ronnie] Sending WAV:PRE?
[SCPISocketTransport::ReadReply] [ronnie] Got 0,2,0,1,0.000000,0.000000,0,0.000000,0,0
[SCPISocketTransport::SendCommand] [ronnie] Sending WAV:SOUR CHAN2
[SCPISocketTransport::SendCommand] [ronnie] Sending WAV:PRE?
[SCPISocketTransport::ReadReply] [ronnie] Got 0,2,0,1,0.000000,0.000000,0,0.000000,0,0
Warning: 0 timescale (Hack to avoid the divide 0 crash)
Warning: 0 timescale

Typed on the SCPI console:
[SCPISocketTransport::SendCommand] [ronnie] Sending TRIG:STAT?
[SCPISocketTransport::ReadReply] [ronnie] Got WAIT

so when I got around to typing a TRIG:STAT? by hand in the SCPI console it had got to WAIT (I'd turned off the trigger input connected to the probes so it was just sitting waiting; perhaps it's blatantly ignoring your *WAI ?

@azonenberg
Copy link
Collaborator

Interesting, might be a firmware race condition.

How about adding a small time delay after the *WAI before sending TRIG:STAT? Does that make it work reliably? It'll hurt performance but we have hacks like this in the Siglent driver for similar reasons. (Even LeCroy has one or two annoying races but they only come into play for multi-scope setups)

@penguin42
Copy link
Author

I've got back to prodding this a bit.

a) I think actually in single-shot mode, as long as you had a trigger level set that actually did trigger it might have worked.
b) But in repeated mode, it normally hits the problem I hit above.

c) I had a poke at the firmware, I'm not convinced the *WAI really does anything, but I may be wrong.
d) The *CLS below seems mostly to fix the repeated mode, and the *OPC? adds a forced roundtrip that seems to make it more stable.
e) I still occasionally see the 0 timescale warning I added; normally when I'm in a state when I didn't actually get a trigger.

I'm currently running with the following patches

commit 06e2b40f6dfb3389c15a2a7ceeb792014366bdd3
Author: Dr. David Alan Gilbert <dave@treblig.org>
Date:   Mon Nov 20 18:20:47 2023 +0000

    Rigol: Refactor triggering to share code
    
    Factor out the 3 copies of hte trigger setup into a helper.

diff --git a/scopehal/RigolOscilloscope.cpp b/scopehal/RigolOscilloscope.cpp
index fa51c15a..14954dde 100644
--- a/scopehal/RigolOscilloscope.cpp
+++ b/scopehal/RigolOscilloscope.cpp
@@ -714,6 +714,21 @@ Oscilloscope::TriggerMode RigolOscilloscope::PollTrigger()
        }
 }
 
+void RigolOscilloscope::SetTriggerHelper()
+{
+       if(m_protocol == DS_OLD)
+       {
+               m_transport->SendCommand(":TRIG:EDGE:SWE SING");
+               m_transport->SendCommand(":RUN");
+       }
+       else
+       {
+               m_transport->SendCommand(":SING");
+               m_transport->SendCommand("*WAI");
+       }
+       m_triggerArmed = true;
+}
+
 bool RigolOscilloscope::AcquireData()
 {
        //LogDebug("Acquiring data\n");
@@ -905,15 +920,8 @@ bool RigolOscilloscope::AcquireData()
                if(m_protocol == DS_OLD)
                {
                        m_transport->SendCommand(":STOP");
-                       m_transport->SendCommand(":TRIG:EDGE:SWE SING");
-                       m_transport->SendCommand(":RUN");
-               }
-               else
-               {
-                       m_transport->SendCommand(":SING");
-                       m_transport->SendCommand("*WAI");
                }
-               m_triggerArmed = true;
+               SetTriggerHelper();
        }
 
        //LogDebug("Acquisition done\n");
@@ -925,34 +933,14 @@ void RigolOscilloscope::Start()
 {
        //LogDebug("Start single trigger\n");
        lock_guard<recursive_mutex> lock(m_mutex);
-       if(m_protocol == DS_OLD)
-       {
-               m_transport->SendCommand(":TRIG:EDGE:SWE SING");
-               m_transport->SendCommand(":RUN");
-       }
-       else
-       {
-               m_transport->SendCommand(":SING");
-               m_transport->SendCommand("*WAI");
-       }
-       m_triggerArmed = true;
+       SetTriggerHelper();
        m_triggerOneShot = false;
 }
 
 void RigolOscilloscope::StartSingleTrigger()
 {
        lock_guard<recursive_mutex> lock(m_mutex);
-       if(m_protocol == DS_OLD)
-       {
-               m_transport->SendCommand(":TRIG:EDGE:SWE SING");
-               m_transport->SendCommand(":RUN");
-       }
-       else
-       {
-               m_transport->SendCommand(":SING");
-               m_transport->SendCommand("*WAI");
-       }
-       m_triggerArmed = true;
+       SetTriggerHelper();
        m_triggerOneShot = true;
 }
 
diff --git a/scopehal/RigolOscilloscope.h b/scopehal/RigolOscilloscope.h
index 2433f57e..54f778f2 100644
--- a/scopehal/RigolOscilloscope.h
+++ b/scopehal/RigolOscilloscope.h
@@ -138,6 +138,9 @@ protected:
 public:
        static std::string GetDriverNameInternal();
        OSCILLOSCOPE_INITPROC(RigolOscilloscope)
+
+private:
+       void SetTriggerHelper();
 };
 
 #endif

commit f774db9ad3659998780bfdf5025e6a1119c84023 (HEAD)
Author: Dr. David Alan Gilbert <dave@treblig.org>
Date:   Tue Nov 21 00:59:50 2023 +0000

    Rigol: Trigger timing
    
    On my DS1102Z-E, I often get triggering problems.
    The *CLS seems to help repeated triggers work.
    The *OPC? enforces a round trip that also seems to make
    it more solid.

diff --git a/scopehal/RigolOscilloscope.cpp b/scopehal/RigolOscilloscope.cpp
index 14954dde..d67674e2 100644
--- a/scopehal/RigolOscilloscope.cpp
+++ b/scopehal/RigolOscilloscope.cpp
@@ -723,8 +723,10 @@ void RigolOscilloscope::SetTriggerHelper()
        }
        else
        {
+               m_transport->SendCommand("*CLS");
                m_transport->SendCommand(":SING");
                m_transport->SendCommand("*WAI");
+               auto opcResponsea = m_transport->SendCommandImmediateWithReply("*OPC?");
        }
        m_triggerArmed = true;
 }
commit a6ae346be64d9f915b78df2e6c5deffa87cef9b6 (HEAD -> master)
Author: Dr. David Alan Gilbert <dave@treblig.org>
Date:   Mon Nov 20 16:34:58 2023 +0000

    WaveformArea: Survive empty capture
    
    Avoid a divide-by-0 crash on a broken capture with
    a 0 timescale.

diff --git a/src/ngscopeclient/WaveformArea.cpp b/src/ngscopeclient/WaveformArea.cpp
index b8105f1..2fc2132 100644
--- a/src/ngscopeclient/WaveformArea.cpp
+++ b/src/ngscopeclient/WaveformArea.cpp
@@ -1616,6 +1616,12 @@ void WaveformArea::RasterizeAnalogOrDigitalWaveform(
                channel->PrepareToRasterize(0, 0);
                return;
        }
+       if (data->m_timescale == 0.0)
+       {
+               LogWarning("0 timescale\n");
+               return;
+       }
+
        size_t w = m_width;
        size_t h = m_height;
        if(channel->GetStream().GetType() == Stream::STREAM_TYPE_DIGITAL)

@azonenberg
Copy link
Collaborator

I think this might be related to #817 ?

Also, please test with latest (10899c0) which includes a bunch of refactoring and bugfixes in the Rigol driver plus converts it to use the new queued command API. If this does not fix the bug, please rebase your patches on that and let me know how it works.

@azonenberg
Copy link
Collaborator

Also, as a general rule we don't use the "normal" trigger mode in scopehal drivers because of the potential for race conditions (getting channel 1 from one trigger event and channel 2 from another). We normally run single trigger all the time, and "normal" trigger in ngscopeclient is simply a flag that tells the driver to automatically re-arm the trigger once AcquireData() completes.

@penguin42
Copy link
Author

Yeh #817 is probably similar; well on head it doesn't get the div/0 any more, but it's not happy - still lots of blank captures in the "normal" mode.
Adding my *CLS helps that, but I am seeing something (in both modes?) which is I'm often only seeing the channel 2 capture with channel 1 blank; I don't think I've seen that before.

@azonenberg
Copy link
Collaborator

@penguin42 We fixed a bunch more bugs (2e74c6e, 0e26e31, and a few others) that might also have been causing your issue.

Working any better w/ latest?

@penguin42
Copy link
Author

penguin42 commented Dec 2, 2023

at 02b300df83bcd2ed9ed34c573dbedf4e9511121f main and 9564b05 it's better but still giving me some captures where nothing is showing in ch.1:
bad-capture-no-ch1

(Now with matching capture, still doing it)
Note, being a Rigol, the captures take a good few seconds each; it's always the Ch.1 that's AWOL.

@penguin42
Copy link
Author

Observation: This pair of channels is a USB pair; sometimes it captures a quite section, and sometimes it captures a burst of activity. Most (all?) of the cases where I'm missing ch.1 is when there's a burst of activity rather than the background tick

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working driver Hardware drivers
Projects
None yet
Development

No branches or pull requests

2 participants