DAC synchronization across Sayma cards #795

Closed
sbourdeauducq opened this issue Jul 20, 2017 · 48 comments
Labels: area:gateware, area:sawg (smart arbitrary waveform generator, f.k.a. phaser), area:sayma

Comments

@sbourdeauducq
Member

No description provided.

@jordens
Member

jordens commented Nov 2, 2017

depends on m-labs/jesd204b#5

@sbourdeauducq modified the milestones: 4.0, 5.0 (Jan 11, 2018)
@jbqubit
Contributor

jbqubit commented Mar 29, 2018

@sbourdeauducq Has this been tested on the M-Labs setup?

@sbourdeauducq modified the milestones: 5.0, 4.0 (Jul 9, 2018)
@sbourdeauducq
Member Author

The code is there and somewhat works intermittently, but I cannot do anything until sinara-hw/sinara#567 is resolved.

@sbourdeauducq
Member Author

@gkasprow Can you expedite the rework, testing, and shipment of the replacement Sayma?

@gkasprow
Collaborator

gkasprow commented Jul 16, 2018

I'm solving #475. It was caused by at least 3 independent factors. 2 were found and fixed; the last one I'm still trying to identify, but I think I'm very close.

@sbourdeauducq
Member Author

The microtca mess is annoying but it's not blocking other people's developments and experiments, unlike this.

@gkasprow
Collaborator

I solved the issue. I will ship one Sayma AMC ASAP.

@sbourdeauducq
Member Author

I cannot use it without another RTM. Can you ship that as well?

@gkasprow
Collaborator

sure.

@sbourdeauducq removed this from the 4.0 milestone (Oct 8, 2018)
@jbqubit
Contributor

jbqubit commented Mar 27, 2020

In email today with @marmeladapk and @hartytp, @sbourdeauducq said

Sayma v2 DAC synchronization doesn't quite work and I don't understand why.

By this do you mean synchronization between the DAC chips on a single Sayma AMC v2? What debugging steps did you do?

@sbourdeauducq
Member Author

DAC to FPGA. I cannot test between DACs since only one DAC is working on the board I have.

@hartytp
Collaborator

hartytp commented Mar 28, 2020

Thanks for the summary. Is there a write up of any of the symptoms you see? Anyway, it sounds like synchronisation between a single DAC and TTL is not reliable yet.

@jbqubit
Contributor

jbqubit commented Mar 30, 2020

I've kept reminding Creotech to ship more hardware.

If you're not getting what you need out of Creotech or me or Tom or Xilinx to progress on your work, make more noise. Let's get the hardware sorted so the synchronization testing can progress.

DAC -> FPGA synch?

There are unexplained/unpredictable/obscure bugs and I don't have a timeframe.

Do please get on with trying to reproduce these bugs and create Issues. Understood that you may not have a timeline for fixing bugs which you've not seen reproduce. What's your timeline for running tests on the hardware to try to reproduce the bugs?

@hartytp
Collaborator

hartytp commented Mar 30, 2020

Do please get on with trying to reproduce these bugs and create Issues. Understood that you may not have a timeline for fixing bugs which you've not seen reproduce. What's your timeline for running tests on the hardware to try to reproduce the bugs?

Do you mean me or @sbourdeauducq? Right now I don't have a good description of the bugs from @sbourdeauducq. Also, AFAICT @sbourdeauducq isn't having any trouble reproducing these issues, so I don't see that me reproducing them as well would help. I can do it but it will take a non-negligible amount of my time without contributing much clear value to the project.

AFAICT no one has done any real work/testing on Sayma for months now. It would be useful to understand what the issues are in more detail, who is going to work on them and when.

@sbourdeauducq
Member Author

Right now I don't have a good description of the bugs from @sbourdeauducq.

Install the beta firmware (with synchronization) and reboot the board a few times while looking at the log. You'll see the sync errors, unless this is a problem with my board in particular.
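
For tallying such errors across several reboots, a small script can scan captured UART logs. This is only a sketch: it assumes the `[ time] LEVEL(module): message` line format seen in the firmware logs posted in this thread, and the helper names `sync_errors` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Assumed line format, e.g.:
# [     9.925652s] ERROR(satman::jdcg::jesd204sync): failed to align SYSREF at FPGA: ...
LOG_LINE = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)s\]\s+(?P<level>[A-Z]+)\((?P<module>[^)]+)\):\s*(?P<msg>.*)$"
)


def sync_errors(log_text):
    """Return (time, module, message) for every ERROR line in one boot log."""
    errors = []
    for line in log_text.splitlines():
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") == "ERROR":
            errors.append(
                (float(m.group("time")), m.group("module"), m.group("msg").strip())
            )
    return errors


def summarize(boot_logs):
    """Count how many boots produced each distinct error message."""
    counts = Counter()
    for log_text in boot_logs:
        # Use a set so that a message repeated within one boot counts once.
        for msg in {msg for _, _, msg in sync_errors(log_text)}:
            counts[msg] += 1
    return counts
```

Feeding `summarize` one captured log per reboot gives a rough error rate per message, which may be easier to compare between boards than reading the logs by eye.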

@hartytp
Collaborator

hartytp commented Mar 30, 2020

Can you post a log?

Anyway, currently I do not know how the synchronisation process works in any detail. It's been heavily rewritten/modified since I last used Sayma. As I don't think there are any docs, I don't expect that I would understand the log messages without investing a significant amount of time reverse engineering the process from the source code. I can do that, but it would be time consuming and I'm not clear that it would add any value to the project.

I'm not clear on the responsibilities or expected time commitments for Sayma. I naively assume that the plan here is for @sbourdeauducq to investigate the synchronisation issues on hw he already has. But please correct me if that's not the plan/not possible.

@jbqubit
Contributor

jbqubit commented Mar 30, 2020

There are unexplained/unpredictable/obscure bugs and I don't have a timeframe.
Do please get on with trying to reproduce these bugs and create Issues.

This was in reply to @sbourdeauducq. I'd like to know more about the "unexplained/unpredictable/obscure bugs."

@sbourdeauducq
Member Author

sbourdeauducq commented Apr 2, 2020

we know DRTIO works well.

We know that it works well:

  1. on Kasli.
  2. with the old siphaser alignment algorithm, which has since been replaced (cc58318) but not tested as thoroughly as the first version.

@hartytp If you want to help, could you validate, with the latest code, that you get reproducible RTIO clock phases (and to what tolerance?) between AMC and Kasli, and between AMC and RTM? You can check the outputs of the three Si5324s after the message "INFO(board_artiq::si5324::siphaser): calibration successful" on the satellites. If that doesn't work, this should be fixed before further synchronization attempts.

@hartytp
Collaborator

hartytp commented Apr 2, 2020

OK. Will give it a go next time I have access to my lab.

@sbourdeauducq
Member Author

sbourdeauducq commented Apr 2, 2020

Okay so it seems that the synchronization problems are three-fold:

  • The most frequent issue is caused by the calibration routine setting sysref_ddmtd_phase_fpga to 0, and then subsequent sync attempts fail. Likely this is a straightforward bug in the sync algorithms that was exposed by different timings on the v2 hardware. It can be worked around by setting another appropriate value into the flash manually and not running calibration again. Then the synchronization logs often look clean (which suggests siphaser also works, though it still would be good to specifically test it to make sure). I'll look at the DAC output synchronization against Urukul shortly, for those boots where the sync log was clean.
  • Marginal AMC DDMTD jitter on SAWG builds. This was seen before on Sayma v1 (where it was much worse to the point of rendering the DDMTD core completely unusable - VCCINT problems?) and the solution will be to move the DDMTD core back to the RTM side. Most of the time an error does not happen, and it never happens on non-SAWG builds. This is reported in the log like this:
[     8.213517s]  INFO(satman::jdcg::jesd204sync): testing DDMTD stability (raw=true, tolerance=4)...    
[     8.508891s]  INFO(satman::jdcg::jesd204sync):   ...passed, peak-peak jitter: 2                      
[     8.514808s]  INFO(satman::jdcg::jesd204sync): testing DDMTD stability (raw=false, tolerance=1)...   
[     9.916696s] ERROR(satman::jdcg::jesd204sync):   ...excessive peak-peak jitter: 2 (min=74 max=76 center_offset=64)
[     9.925652s] ERROR(satman::jdcg::jesd204sync): failed to align SYSREF at FPGA: excessive DDMTD peak-peak jitter
  • General DRTIO corruption as mentioned above; when that happens, the synchronization code also reports all sorts of errors.
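
The pass/fail lines in the DDMTD log excerpt above are consistent with a plain peak-to-peak comparison against the stated tolerance. The following is a minimal sketch of that check, not the firmware's implementation: the function name `ddmtd_stable` is hypothetical, the individual readings are invented (only min=74 and max=76 come from the log), and counter wrap-around is ignored.

```python
def ddmtd_stable(phases, tolerance):
    """Return (ok, peak_peak) for a list of DDMTD phase readings.

    Assumption: the readings do not wrap around the DDMTD counter range.
    """
    peak_peak = max(phases) - min(phases)
    return peak_peak <= tolerance, peak_peak


# Hypothetical readings spanning min=74..max=76, as in the failing log line.
readings = [74, 75, 76, 75]
print(ddmtd_stable(readings, tolerance=1))  # (False, 2)
print(ddmtd_stable(readings, tolerance=4))  # (True, 2)
```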

@hartytp
Collaborator

hartytp commented Apr 2, 2020

General DRTIO corruption as mentioned above; when that happens, the synchronization code also reports all sorts of errors.

Where is the DRTIO corruption described?

@sbourdeauducq
Member Author

#795 (comment)

@hartytp
Collaborator

hartytp commented Apr 2, 2020

Thanks! Uuurgh, nasty! Any plan for investigating?

@dnadlinger
Collaborator

visible symptom is a storm of broken aux packets

FWIW, we very rarely (once every few weeks between 5 DRTIO links) see bursts of invalid aux packet errors on Kasli <-> Kasli links as well. Might not be related.

@jbqubit
Contributor

jbqubit commented Apr 2, 2020

I'm not clear on the responsibilities or expected time commitments for Sayma.

Contracts spell out who is doing what.

General DRTIO corruption and sysref_ddmtd_phase_fpga are definitely in the M-Labs wheelhouse.

"Marginal AMC DDMTD jitter on SAWG builds." If this never happens on non-SAWG builds it suggests the hardware is working correctly. Given how subtle these configuration-dependent "features" are in Vivado/Ultrascale/GTH I'm not sure that @hartytp can substantially contribute.

It makes most sense for @hartytp to work on testing aspects that are part of his contractual focus and/or tap his unique skills.

@sbourdeauducq
Member Author

There is still this bug where outputs are randomly 180 degrees out of phase (aka "55ns bug" since I've been using a 9MHz waveform to test, but it really is an inversion - which becomes obvious when changing the frequency). This can't be caused by JESD204 sync problems. @jordens Any idea? Can that be inside SAWG e.g. something that isn't reset?
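
The reasoning for telling an inversion apart from a fixed delay can be spelled out numerically. A fixed delay of half a 9 MHz period (about 55.6 ns) gives 180 degrees only at 9 MHz, while a true inversion gives 180 degrees at every frequency. The sketch below assumes nothing about the hardware; the 12 MHz and 15 MHz test frequencies are arbitrary illustrative choices.

```python
def phase_from_delay_deg(freq_hz, delay_s):
    """Phase shift in degrees, wrapped to [0, 360), caused by a fixed time delay."""
    return (360.0 * freq_hz * delay_s) % 360.0


half_period_9mhz = 0.5 / 9e6  # about 55.6 ns

for f in (9e6, 12e6, 15e6):
    delay_phase = phase_from_delay_deg(f, half_period_9mhz)
    print(f"{f / 1e6:4.0f} MHz: fixed delay -> {delay_phase:5.1f} deg, inversion -> 180.0 deg")
```

If the measured offset stays at 180 degrees when the test frequency changes, a delay of this size is ruled out and an inversion is the simpler explanation.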

@jordens
Member

jordens commented Apr 7, 2020

What are you doing exactly, what do you expect and what are you observing?
There is a lot of channel and sample shuffling in JESD and the DAC. I can imagine a lot of situations where this happens, independent of SAWG.
If you think this is really an inversion in SAWG, you should verify that using the envelope and not the phase. I.e. time a change (in amplitude or phase) on two channels. Or use the DC spline.

@sbourdeauducq
Member Author

sbourdeauducq commented Apr 7, 2020

Start this and keep it running:
https://github.com/m-labs/artiq/blob/master/artiq/examples/kasli_sawgmaster/repository/sines_urukul_sayma.py

Reboot Sayma and measure phase difference between Urukul and Sayma after each Sayma reboot. It randomly jumps by 180 degrees instead of remaining constant.

If you think this is really an inversion in SAWG, you should verify that using the envelope and not the phase. I.e. time a change (in amplitude or phase) on two channels. Or use the DC spline.

Can you try doing that? Or try other experiments to find out where the problem is?

@jordens
Member

jordens commented Apr 7, 2020

Start this and keep it running:

Can that be reduced to an MRE?

measure phase difference between Urukul and Sayma

How would I do that?

Can you try doing that? Or try other experiments to find out where the problem is?

Not in the next few days. Probably much more efficient if you continue debugging it on-site.

@sbourdeauducq
Member Author

sbourdeauducq commented Apr 7, 2020

Can that be reduced to an MRE?

I am open to suggestions on how to reduce that code. I don't see any obvious ones.

How would I do that?

The second Red Pitaya that I connected has one channel on Sayma and the other on Urukul.

@sbourdeauducq
Member Author

sbourdeauducq commented Apr 8, 2020

After replacing SAWG with a simple DDS core there are no more inversions and synchronization at 2.4GHz often works. So the issue is either in SAWG or the sines_urukul_sayma.py experiment. @jordens you are the most familiar with SAWG so please look into it.

Probably much more efficient if you continue debugging it on-site.

I'm just also using the Red Pitaya and the remote development infra, and using a physical oscilloscope etc. would not make it more efficient.

@sbourdeauducq
Member Author

I posted this before but here is the script I am using to measure the Urukul/Sayma phase difference.

import socket
import numpy as np
import matplotlib.pyplot as plot
from scipy import signal


class RPSCPI:
    def connect(self, host):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.connect((host, 5000))
        self.sock_f = self.sock.makefile()

    def close(self):
        self.sock_f.close()
        self.sock.close()

    def sendmsg(self, msg):
        # sendall() retries until the whole command has been written
        self.sock.sendall(msg.encode() + b"\r\n")

    def recvmsg(self):
        return self.sock_f.readline().strip()

    def trigger(self):
        self.sendmsg("ACQ:START")
        self.sendmsg("ACQ:TRIG NOW")
        while True:
            self.sendmsg("ACQ:TRIG:STAT?")
            if self.recvmsg() == "TD":
                break

    def get_data(self, channel):
        self.sendmsg("ACQ:SOUR{}:DATA?".format(channel))
        buff_string = self.recvmsg()[1:-1].split(',')
        return np.array(list(map(float, buff_string)))


rp = RPSCPI()
rp.connect("127.0.0.1")
try:
    rp.trigger()
    y1 = rp.get_data(1)
    y2 = rp.get_data(2)

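    # time axis for the 125 MS/s sample rate used here
    t = np.arange(y1.shape[0])/125e6
    y = np.c_[y1, y2].T
    # mix the 9 MHz tone down to DC, then low-pass and decimate by 100 in two stages
    z = signal.decimate(y*np.exp(1j*2*np.pi*9e6*t), q=10, ftype="fir", zero_phase=True)[:, 10:]
    z = signal.decimate(z, q=10, ftype="fir", zero_phase=True)[:, 10:]
    # phase difference between the two channels, in radians
    angle = np.angle(np.mean(z[0]*z[1].conj()))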

    print(angle)

    plot.plot(y1)
    plot.plot(y2)
    plot.show()

finally:
    rp.close()
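
The demodulation idea in the script above can be sanity-checked on synthetic data without any hardware. The following standard-library sketch is not the script's method (it replaces the SciPy decimation filters with a plain average over a whole number of periods), and the name `relative_phase` and the record length are illustrative assumptions. It shows that an inverted copy of a tone reads as a phase difference of pi.

```python
import cmath
import math


def relative_phase(y1, y2, freq_hz, sample_rate_hz):
    """Phase of y1 relative to y2, in radians, at freq_hz via I/Q demodulation."""
    def demod(y):
        # Mix down to DC; averaging over the record acts as the low-pass filter.
        return sum(
            v * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate_hz)
            for n, v in enumerate(y)
        ) / len(y)
    return cmath.phase(demod(y1) * demod(y2).conjugate())


fs, f = 125e6, 9e6
n = 2000  # hypothetical record length: exactly 144 periods of 9 MHz at 125 MS/s
ref = [math.sin(2 * math.pi * f * k / fs) for k in range(n)]
inverted = [-v for v in ref]

print(abs(relative_phase(ref, ref, f, fs)))       # close to 0
print(abs(relative_phase(inverted, ref, f, fs)))  # close to pi
```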

@jordens
Member

jordens commented Apr 8, 2020

So the issue is either in SAWG or the sines_urukul_sayma.py experiment.

We've had plenty of examples in the past where this simple logic was plain wrong.

I'm just also using the Red Pitaya and the remote development infra, and using a physical oscilloscope etc. would not make it more efficient.

That's intentionally ignoring the most important differences.
As I have explained before, some of the unresolved major problems are:

  • long network round trips making interactive work impossible and everything else slow
  • no documentation of the setup, no maintenance of that documentation
  • long round trip delays across time zones for questions and discussion

@sbourdeauducq
Member Author

We've had plenty of examples in the past where this simple logic was plain wrong.

What is your better idea then?

@sbourdeauducq
Member Author

That's intentionally ignoring the most important differences.

The network delays are small compared to the compilation/flashing/boot time of Sayma, and are not really important for grabbing waveforms with the script above. What is the interactive work you want to do?

Most of the information about the setup is in your inbox, if you have questions please ask.

@jbqubit
Contributor

jbqubit commented Aug 22, 2020

@jordens Have you looked into this?

@jordens
Member

jordens commented Dec 15, 2020

Code
from artiq.experiment import *

class SinesUrukulSayma(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.sawgs = [self.get_device("sawg"+str(i)) for i in range(4)]
        self.basemod = self.get_device("basemod_att0")
        self.rfsws = [self.get_device("sawg_sw"+str(i)) for i in range(4)]
        self.ttl = [self.get_device("ttl_mcx"+str(i)) for i in range(4)]

    # DRTIO destinations:
    # 0: local
    # 1: Sayma AMC
    # 2: Sayma RTM
    @kernel
    def drtio_is_up(self):
        for i in range(3):
            if not self.core.get_rtio_destination_status(i):
                return False
        return True

    @kernel
    def run(self):
        f = 50*MHz
        sawg_ftw = self.sawgs[0].frequency0.to_mu(f)

        self.core.reset()

        print("waiting for DRTIO ready...")
        while not self.drtio_is_up():
            pass
        print("OK")

        delay(10*ms)
        self.core.reset()
        delay(10*ms)

        rfsw = self.rfsws[3]
        sawg = self.sawgs[0]
        ttl = self.ttl[0]

        ttl.output()

        delay(10*ms)
        self.basemod.reset()
        delay(10*ms)
        self.basemod.set(3., 3., 3., 3.)
        delay(10*ms)

        rfsw.on()
        delay(1*ms)

        sawg.reset()
        delay(1*us)
        sawg.config.set_clr(1, 1, 1)
        delay(100*us)
        # align to coarse RTIO period
        at_mu(now_mu() & ~0x7)
        sawg.frequency0.set_mu(sawg_ftw)
        delay_mu(1<<8)
        sawg.phase0.set_mu(0)  # sawg_ftw*now_mu() >> 17)
        delay_mu(1<<8)
        sawg.amplitude1.set(.4)
        delay_mu(1<<9)
        sawg.amplitude1.set(.0)
        delay_mu(1<<8)
        ttl.on()
        delay_mu(1<<9)
        ttl.off()
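
The timestamp arithmetic in the kernel above can be illustrated in plain Python. This sketch only demonstrates the bit operations: `align_to_coarse` is a hypothetical helper, and the assumption that a coarse RTIO period spans 8 machine units is read off the `~0x7` mask and its comment rather than from any documentation.

```python
def align_to_coarse(t_mu, fine_bits=3):
    """Round a timestamp in machine units down to a multiple of 2**fine_bits."""
    return t_mu & ~((1 << fine_bits) - 1)


print(align_to_coarse(1003))  # 1000
print(align_to_coarse(1000))  # 1000
print(1 << 8, 1 << 9)         # 256 512, the delays used between the SAWG and TTL events
```

Because every subsequent delay in the kernel is a multiple of 8 machine units, all events stay on the same coarse grid once the cursor has been aligned.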

image

  • 3: Sayma MCX TTL0 out
  • 4: Sayma SAWG0 out 50 MHz, brown traces being infinite persistence
  • Sayma v2 AMC and RTM (note: the one with the broken DAC1)
  • 150 MHz RTIO Satellite behind Kasli v1.1 Master
  • ARTIQ 1ce505c (sayma_amc -V satellite --sfp)
Kasli log
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |
| |  | | |___) | (_) | |___
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017-2020 M-Labs Limited

Bootloader CRC passed
Gateware ident 6.unknown.beta;saymamaster
Initializing SDRAM...
Read leveling scan:
Module 1:
00000000000011111111111000000000
Module 0:
00000000000011111111111000000000
Read leveling: 17+-5 17+-5 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000009s]  INFO(runtime): ARTIQ runtime starting...
[     0.003933s]  INFO(runtime): software ident 6.unknown.beta;saymamaster
[     0.010474s]  INFO(runtime): gateware ident 6.unknown.beta;saymamaster
[     0.017038s]  INFO(runtime): log level set to INFO by default
[     0.022763s]  INFO(runtime): UART log level set to INFO by default
[     0.029142s]  INFO(runtime::rtio_clocking): using internal RTIO clock (by default)
[     0.307714s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
[     7.179313s]  INFO(board_artiq::si5324):   ...locked
[     7.210244s]  INFO(runtime): network addresses: MAC=80-1f-12-4b-9a-70 IPv4=10.34.16.105 IPv6-LL=fe80::821f:12ff:fe4b:9a70 IPv6=no configured address
[     7.227612s]  INFO(board_artiq::drtio_routing): routing table: RoutingTable { 0: 0; 1: 1 0; 2: 1 1 0; 3: 2 0; 4: 2 1 0; }
[     7.242160s]  INFO(runtime::mgmt): management interface active
[     7.256472s]  INFO(runtime::session): accepting network sessions
[     7.271873s]  INFO(runtime::session): running startup kernel
[     7.276358s]  INFO(runtime::session): no startup kernel found
[     7.282160s]  INFO(runtime::session): no connection, starting idle kernel
[     7.288987s]  INFO(runtime::session): no idle kernel found
[     7.294408s]  INFO(runtime::rtio_mgt::drtio): [DEST#0] destination is up
[    10.341471s]  INFO(runtime::session): new connection from 10.34.16.10:39242
[    10.395059s]  INFO(runtime::kern_hwreq): resetting RTIO
[    31.160400s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
[    31.367861s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] remote replied after 1 packets
[    31.590554s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link initialization completed
[    31.597387s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] destination is up
[    31.603746s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] buffer space is 128
[    31.610971s]  INFO(runtime::rtio_mgt::drtio): [DEST#2] destination is up
[    31.617317s]  INFO(runtime::rtio_mgt::drtio): [DEST#2] buffer space is 128
[    31.626890s]  INFO(runtime::kern_hwreq): resetting RTIO
[    31.641872s]  INFO(runtime::session): no connection, starting idle kernel
[    31.647773s]  INFO(runtime::session): no idle kernel found
[    32.024457s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link is down
[    32.029452s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] destination is down
[    32.036324s]  INFO(runtime::rtio_mgt::drtio): [DEST#2] destination is down
[    32.874024s]  INFO(runtime::session): new connection from 10.34.16.10:39268
[    32.927782s]  INFO(runtime::kern_hwreq): resetting RTIO
[    53.295260s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
[    53.502907s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] remote replied after 2 packets
[    53.714975s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link initialization completed
[    53.721903s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] destination is up
[    53.728179s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] buffer space is 128
[    53.735504s]  INFO(runtime::rtio_mgt::drtio): [DEST#2] destination is up
[    53.741750s]  INFO(runtime::rtio_mgt::drtio): [DEST#2] buffer space is 128
Sayma log
 __  __ _ ____         ____
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |
| |  | | |___) | (_) | |___                                                                                                                                                                                                                                                                                                                                                                                                              
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader                                                                                                                                                                                                                                                                                                                                                                                                                         
Copyright (c) 2017-2020 M-Labs Limited

Bootloader CRC passed                                                                                                                                                                                                                                                                                                                                                                                                                    
Gateware ident 6.unknown.beta;satellite.sawg
Initializing SDRAM...
DQS initial delay: 113 taps                                                                                                                                                                                                                                                                                                                                                                                                              
Write leveling scan:
Module 3:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000111100000000000000000000000000000000000000000000                          
Module 2:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101111100000000000000000000000000000000000                          
Module 1:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000                          
Module 0:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001100001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000000000000                          
DQS initial delay: 113 taps
Write leveling: 92 116 133 128 done
Read leveling scan:
Module 3:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 2:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 1:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 0:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Read leveling: 238+-90 234+-96 212+-91 209+-93 done
SDRAM initialized
Memory test passed

Loading slave FPGA gateware...
  magic: 0x5352544d, length: 0x0019de0c
  DONE before loading
  ...done
Booting from flash...
Starting firmware.
[     0.000007s]  INFO(satman): ARTIQ satellite manager starting...
[     0.004747s]  INFO(satman): software ident 6.unknown.beta;satellite.sawg
[     0.011444s]  INFO(satman): gateware ident 6.unknown.beta;satellite.sawg
[     0.271861s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
[     2.193364s]  INFO(board_artiq::si5324):   ...locked
[     2.401645s]  INFO(satman): uplink is up, switching to recovered clock
[     2.434705s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
[     4.058243s]  INFO(board_artiq::si5324):   ...locked
[     7.356846s]  INFO(board_artiq::si5324::siphaser): calibration successful, lead: 194, width: 432 (347deg)
[     7.473460s]  INFO(satman::repeater): [REP#0] link RX became up, pinging
[     8.589086s]  INFO(satman::repeater): [REP#0] remote replied after 12 packets
[     8.623903s]  INFO(satman::jdcg::jdac): DAC-0 initializing...
[     9.222681s]  INFO(satman::jdcg::jdac):   ...done
[     9.225993s]  INFO(satman::jdcg::jdac): DAC-1 initializing...
[     9.785916s] ERROR(satman::jdcg::jdac): JESD DAC basic request failed (dacno=1, reqno=3)
[     9.792615s]  INFO(satman::jdcg::jesd204sync): testing DDMTD stability (raw=true, tolerance=4)...
[    10.240512s]  INFO(satman::jdcg::jesd204sync):   ...passed, peak-peak jitter: 4
[    10.246431s]  INFO(satman::jdcg::jesd204sync): testing DDMTD stability (raw=false, tolerance=1)...
[    10.812547s]  INFO(satman::jdcg::jesd204sync):   ...passed, peak-peak jitter: 1
[    10.818467s]  INFO(satman::jdcg::jesd204sync): testing HMC7043 SYSREF slip against DDMTD...
[    12.128281s]  INFO(satman::jdcg::jesd204sync):   ...passed
[    12.132375s]  INFO(satman::jdcg::jesd204sync): determining SYSREF S/H limits...
[    12.790629s]  INFO(satman::jdcg::jesd204sync):   SYSREF S/H average limits (DDMTD phases): 61 125
[    12.798110s]  INFO(satman::jdcg::jesd204sync):   SYSREF S/H maximum limit deviation: 0 0
[    12.806185s]  INFO(satman::jdcg::jesd204sync):   ...done
[    12.811518s]  INFO(satman::jdcg::jesd204sync): using FPGA SYSREF DDMTD phase target from config: 120
[    12.820599s]  INFO(satman::jdcg::jesd204sync): aligning SYSREF with RTIO clock...
[    14.127290s] ERROR(satman::jdcg::jesd204sync): failed to align SYSREF at FPGA: failed to reach SYSREF DDMTD phase target
[    14.136790s]  INFO(satman::jdcg::jesd204sync): using DAC SYSREF delay from config: 17
[    14.144585s]  INFO(satman::jdcg::jesd204sync): verifying SYSREF margins at DAC-0...
[    14.176512s] ERROR(satman::jdcg::jesd204sync):   rotation at delay=17 is 0 delay steps from target (FAIL)
[    14.184688s] ERROR(satman::jdcg::jesd204sync): failed to align SYSREF at DAC: insufficient SYSREF margin at DAC
[    14.676338s]  INFO(satman): TSC loaded from uplink
[    14.679736s]  INFO(satman::jdcg::jesd204sync): aligning SYSREF with RTIO TSC...
[    14.724329s]  INFO(satman::jdcg::jesd204sync):   ...done
[    14.728250s]  INFO(satman::jdcg::jesd204sync): resynchronizing DAC-0
[    14.735675s]  INFO(satman::jdcg::jesd204sync): resynchronizing DAC-1
[    14.866644s]  INFO(satman): rank: 1
[    14.868742s]  INFO(satman): routing table: RoutingTable { 0: 0; 1: 1 0; 2: 1 1 0; 3: 2 0; 4: 2 1 0; }
[    14.919746s]  INFO(satman): resetting RTIO 

There are two non-determinisms and one problem.

  1. Sometimes (~4e-2) it outputs 300 MHz (600 MS/s) large scale data.
  2. Sometimes (~1e-1) there is a 32 sample delay.
  3. Most of the time the delay is either 0 or 2 samples.

Note that:

  • there is no "180 phase shift"
  • no other category of behavior that I've seen in a couple hundred reboots
  • one RTIO period is 4 samples
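
To put the sample delays above into time and phase units, the arithmetic can be written out as below. It uses only the figures given in this comment (150 MHz RTIO, 4 samples per RTIO period, 50 MHz tone); the 9 MHz column is included only because that tone was used in the earlier Urukul comparison, and the thread itself does not link the two observations. The function name `delay_report` is illustrative.

```python
RTIO_HZ = 150e6                            # RTIO clock from the setup description
SAMPLES_PER_RTIO = 4                       # "one RTIO period is 4 samples"
SAMPLE_RATE = RTIO_HZ * SAMPLES_PER_RTIO   # 600 MS/s


def delay_report(samples, tone_hz):
    """Convert a delay in DAC samples to ns, RTIO periods, and tone phase."""
    delay_s = samples / SAMPLE_RATE
    return {
        "delay_ns": delay_s * 1e9,
        "rtio_periods": samples / SAMPLES_PER_RTIO,
        "phase_deg": (360.0 * tone_hz * delay_s) % 360.0,
    }


for samples in (2, 32):
    for tone_hz in (50e6, 9e6):
        r = delay_report(samples, tone_hz)
        print(f"{samples:2d} samples = {r['delay_ns']:.2f} ns = "
              f"{r['rtio_periods']:.1f} RTIO periods, "
              f"{tone_hz / 1e6:.0f} MHz tone shifts by {r['phase_deg']:.1f} deg")
```

Under these assumptions a 2 sample delay is half an RTIO period (about 3.33 ns), and a 32 sample delay is 8 RTIO periods (about 53.33 ns).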

I'd propose:

  1. Ensure delay determinism across two TTL MCX out on different Saymas (or Kasli-Sayma). This shouldn't factor in the above case though but would cover only DRTIO and exclude JESD and SAWG.
  2. Verify JESD, DAC setup and CDCs by using the simple satellite to output a gated (using a single-bit RTIO channel) fixed 50 MHz square wave (needs to be done from the RTIO CD!). Ensure that delay is deterministic.

@sbourdeauducq closed this as not planned (Feb 2, 2024)

7 participants