DAC testing #141

Open
marmeladapk opened this issue Dec 9, 2019 · 35 comments

@marmeladapk
Member

@jbqubit asked me to write about our problems with getting DAC output. This is currently the only real obstacle to finishing HT3 (DAC output is required for the tests in the crate) and to distributing hardware.

We were using the AD IP core for JESD, and with it we couldn't get a stable JESD link: the initial phases went OK, but it failed at the synchronization phase.
Recently @atcher0 used the Xilinx IP core for JESD, which established the link. Today we got output on all channels (JESD subclass 1, so with synchronization)! The only problem is that some channels have 10x the signal of others; however, we still need to write proper attenuator controls, so this may resolve itself quickly.

In parallel, I was testing whether ARTIQ provides DAC output, using eb271f38 from ARTIQ. Initially I got "DAC sync failed: no sync lock". This was caused by the absence of signal on the HMC7043 SYSREF outputs. After fixing this (pull request m-labs/artiq#1402), I got stuck on "failed to align SYSREF at DAC: no rotation seen" when scanning the DAC SYSREF delay. When I checked the SYSREF input, the signal was changing phase, so I don't know the cause.

Anyway, I used Kasli as SAWGMaster + code from the sines_2sayma example. I used a TestMod, which outputs the signal directly. No signal is present at any output. By the way, heads up @sbourdeauducq: we think the baluns on the TestMods may be broken (or the pin assignment is wrong), because we get output on BaseMods but not on TestMods.

I think we'll work in parallel on the Xilinx IP core (we have an eval license until March), and I can work with @sbourdeauducq on ARTIQ, since we now have the same setup.

@gkasprow
Member

gkasprow commented Dec 9, 2019

Baluns are extremely fragile and can be damaged easily.

@jbqubit
Collaborator

jbqubit commented Dec 9, 2019

@sbourdeauducq Did you receive AFEs from Creotech?

@jbqubit
Collaborator

jbqubit commented Dec 9, 2019

The only problem is that some channels have 10x the signal than others, however we still need to write controls for attenuators properly, so this may solve itself quickly.

@marmeladapk What do you see on TestMod? If your baluns are broken, replace them.

@atcher0

atcher0 commented Dec 9, 2019

Screenshot of DAC outputs (from BaseMod):
[image]

@hartytp
Collaborator

hartytp commented Dec 9, 2019

Cool!

@jbqubit
Collaborator

jbqubit commented Dec 9, 2019

Great!

@sbourdeauducq
Member

Anyway, I used Kasli as SAWGMaster + code from sines_2sayma example.

To check whether the DAC has any output at all, I suggest compiling the AMC gateware with the --without-sawg option instead. It is faster to compile, and the stack is much simpler, with fewer things that can go wrong: every DAC channel outputs a signal at all times (hardcoded in gateware).

Did you receive AFEs from Creotech?

Yes, I did.

@marmeladapk
Member Author

Today, after following @sbourdeauducq's instructions, I used ARTIQ release-5 (without SAWG) and got output on one of the DACs.

[image]

@jbqubit
Collaborator

jbqubit commented Jan 16, 2020

Did the other channels fail or were they not tried?

@sbourdeauducq
Member

They fail: the JESD core for DAC0 systematically fails on all boards with a "not ready" error.

@marmeladapk
Member Author

marmeladapk commented Jan 16, 2020 via email

@jbqubit
Collaborator

jbqubit commented Jan 17, 2020

Recap from chat with @marmeladapk: he sent M-Labs a working example (.bit) that exercises the DACs using Creotech + Xilinx IP. His example produces RF output on all channels on his boards, and it also checks a DAC register to confirm whether the SYSREF signal is present on the DAC side. He prefers to wait before sending replacement hardware to M-Labs until more is known about whether this is a hardware or an ARTIQ configuration issue.

@sbourdeauducq
Member

Trying to narrow down the bug...

The "not ready" error on DAC0 is still present with the JESD core modified like this:

diff --git a/jesd204b/link.py b/jesd204b/link.py
index d3e9ef3..c142956 100644
--- a/jesd204b/link.py
+++ b/jesd204b/link.py
@@ -292,9 +292,7 @@ class JESD204BLinkTX(Module):
             source.data.eq(cgs.source.data),
             source.ctrl.eq(cgs.source.ctrl),
             # start ILAS on first LMFC after jsync is asserted
-            If(jsync & jref_rising,
-                NextState("ILAS")
-            )
+            NextState("USER_DATA")
         )
 
         # Initial Lane Alignment Sequence

But it goes away when it is modified like this:

diff --git a/jesd204b/core.py b/jesd204b/core.py
index 1a6f563..e80bd38 100644
--- a/jesd204b/core.py
+++ b/jesd204b/core.py
@@ -141,7 +141,7 @@ class JESD204BCoreTXControl(Module, AutoCSR):
 
             self.jsync.status.eq(core.jsync_sys),
 
-            self.ready.status.eq(core.ready)
+            self.ready.status.eq(1)
         ]
 
         # restart monitoring

@sbourdeauducq
Member

sbourdeauducq commented Jan 19, 2020

This also makes the error disappear:

diff --git a/jesd204b/core.py b/jesd204b/core.py
index 1a6f563..505f2cc 100644
--- a/jesd204b/core.py
+++ b/jesd204b/core.py
@@ -72,7 +72,7 @@ class JESD204BCoreTX(Module):
             self.submodules += link
             links.append(link)
             self.comb += [
-                link.reset.eq(link_reset),
+                link.reset.eq(0),
                 link.jsync.eq(self.jsync_jesd),
                 link.jref.eq(self.jref)
             ]

link_reset is generated like this:

self.comb += link_reset.eq(~reduce(and_, [phy.transmitter.init.done for phy in phys]))

It could be that some of the GTH transceivers on the DAC0 lanes fail to initialize (marginal operating conditions?).
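A pure-Python sketch (not Migen) of what the `link_reset` expression computes: the link is held in reset unless every transceiver reports init done, so a single lane stuck in initialization is enough to keep the whole DAC link down. The `phy_init_done` list and the `stuck_lanes` helper are hypothetical stand-ins for the per-lane `phy.transmitter.init.done` signals and a diagnostic one might add.

```python
from functools import reduce
from operator import and_


def link_reset(phy_init_done):
    """Mirror of ~reduce(and_, [...]): reset unless all PHYs report done."""
    return not reduce(and_, phy_init_done)


def stuck_lanes(phy_init_done):
    """Indices of lanes whose (hypothetical) init-done flag is still low."""
    return [i for i, done in enumerate(phy_init_done) if not done]


# Eight lanes, one of which (lane 5) never completes initialization.
lanes = [True] * 8
lanes[5] = False
print(link_reset(lanes))   # True: link held in reset
print(stuck_lanes(lanes))  # [5]
```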

@marmeladapk How did you clock the transceivers in your tests with the other JESD core, where the DACs worked? I'm running them from the Si5324 (+HMC830/HMC7043); could excessive jitter be aggravating the problem?

@enjoy-digital

@sbourdeauducq: the AD9154 is able to report CGS/ILAS status; I think you should make the firmware output it (IIRC, this was possible in previous firmware). First check whether CGS is passing, either with the AD9154 registers or by looking at JSYNC, since JSYNC is asserted when CGS is detected on all lanes. Then, if CGS is fine, get the ILAS status from the AD9154. Also, you should be able to use PRBS without the AD9154 fully syncing: only the AD9154 PLL needs to be locked for that. This would help narrow down the issue.
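The CGS-then-ILAS check order could be sketched as below, assuming the AD9154 reports each status as a per-lane bitmask (one bit per lane, 8 lanes). The lane count, the bit ordering and the `diagnose` helper are assumptions for illustration, not the firmware's actual code.

```python
NUM_LANES = 8  # assumption: 8 JESD lanes per DAC, one status bit per lane


def lanes_missing(mask, num_lanes=NUM_LANES):
    """Lanes whose bit is clear in a per-lane status bitmask."""
    return [lane for lane in range(num_lanes) if not (mask >> lane) & 1]


def diagnose(cgs_mask, ilas_mask, num_lanes=NUM_LANES):
    """Check CGS first; only look at ILAS once CGS passes on every lane."""
    missing = lanes_missing(cgs_mask, num_lanes)
    if missing:
        # JSYNC stays deasserted while any lane lacks code group sync.
        return f"CGS failing on lanes {missing}"
    missing = lanes_missing(ilas_mask, num_lanes)
    if missing:
        return f"CGS ok, ILAS failing on lanes {missing}"
    return "CGS and ILAS ok"


print(diagnose(0xFF, 0xFF))  # CGS and ILAS ok
print(diagnose(0xF7, 0x00))  # CGS failing on lanes [3]
print(diagnose(0xFF, 0x7F))  # CGS ok, ILAS failing on lanes [7]
```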

@sbourdeauducq
Member

sbourdeauducq commented Jan 19, 2020

Thanks for the information.
Yes, the DAC diagnostics code is still there.
But it seems to me that the JESD core fails without any involvement of the DAC. The JSYNC line is only used in link.py, and when I ignore it with this patch:

-            If(jsync & jref_rising,
-                NextState("ILAS")
-            )
+            NextState("USER_DATA")
         )

then the problem is still present.

This patch:

-                link.reset.eq(link_reset),
+                link.reset.eq(0),

seems to indicate that the problem is coming from link_reset, which is itself derived from the GTH (the latter, like most Xilinx® hardware blocks, being a notorious troublemaker prone to non-deterministic failures). The JESD code is the same for both DACs, but it fails mostly on DAC0.

Am I missing something?

@enjoy-digital

Indeed, you also have to check that the GTHs are initialized correctly and that the init FSM reaches the READY state on all lanes: https://github.com/m-labs/jesd204b/blob/master/jesd204b/phy/gth_init.py#L111

@sbourdeauducq
Member

sbourdeauducq commented Jan 19, 2020

Using cdr_clk_clean_2 or cdr_clk_clean_3 to clock JESD has no visible impact on the behavior.

@sbourdeauducq
Member

Indeed, you also have to check that the GTH are initialized correctly and init going to READY state on all lanes: https://github.com/m-labs/jesd204b/blob/master/jesd204b/phy/gth_init.py#L111

With this patch, there's still "ready timeout" on DAC0 and not on DAC1:

diff --git a/jesd204b/phy/gth_init.py b/jesd204b/phy/gth_init.py
index e048a0a..8bd3ca4 100644
--- a/jesd204b/phy/gth_init.py
+++ b/jesd204b/phy/gth_init.py
@@ -7,7 +7,7 @@ from migen.genlib.misc import WaitTimer
 
 class GTHInit(Module):
     def __init__(self, sys_clk_freq, rx):
-        self.done = Signal()
+        self.done = Signal(reset=1)
         self.restart = Signal()
 
         # GTH signals
@@ -49,10 +49,11 @@ class GTHInit(Module):
         startup_fsm = ResetInserter()(FSM(reset_state="RESET_ALL"))
         self.submodules += startup_fsm
 
+        done = Signal()
         ready_timer = WaitTimer(1*sys_clk_freq//1000)
         self.submodules += ready_timer
         self.comb += [
-            ready_timer.wait.eq(~self.done & ~startup_fsm.reset),
+            ready_timer.wait.eq(~done & ~startup_fsm.reset),
             startup_fsm.reset.eq(self.restart | ready_timer.done)
         ]
 
@@ -109,5 +110,5 @@ class GTHInit(Module):
             If(Xxphaligndone_rising, NextState("READY"))
         )
         startup_fsm.act("READY",
-            self.done.eq(1)
+            done.eq(1)
         )
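A rough behavioral model in plain Python (not Migen, and not cycle-accurate with respect to `WaitTimer`) of the watchdog in `GTHInit`: the `ready_timer` length is `1*sys_clk_freq//1000`, i.e. 1 ms of sys clock cycles, and whenever the init FSM has not reached `READY` within that window, `ready_timer.done` resets the FSM and initialization starts over. The `init_cycles` parameter is a hypothetical stand-in for how long the transceiver actually takes, and the clock frequencies below are arbitrary example values.

```python
def ready_timeout_cycles(sys_clk_freq):
    """Length of GTHInit's ready_timer: 1 ms expressed in sys clock cycles."""
    return 1 * sys_clk_freq // 1000


def simulate_watchdog(sys_clk_freq, init_cycles, total_cycles):
    """Approximate model of the GTHInit restart watchdog.

    init_cycles: hypothetical number of cycles the init FSM needs to reach
    READY after a reset (None means it never gets there).
    Returns (number of watchdog restarts, whether READY was reached).
    """
    timeout = ready_timeout_cycles(sys_clk_freq)
    restarts = 0
    elapsed = 0  # cycles since the FSM was last reset
    for _ in range(total_cycles):
        if init_cycles is not None and elapsed >= init_cycles:
            return restarts, True  # READY: ready_timer.wait drops, no more restarts
        elapsed += 1
        if elapsed >= timeout:
            # ready_timer.done drives startup_fsm.reset: start over from RESET_ALL
            restarts += 1
            elapsed = 0
    return restarts, False


print(ready_timeout_cycles(125_000_000))           # 125000 (arbitrary example clock)
# A 1 MHz example clock keeps the loop short (timeout = 1000 cycles).
print(simulate_watchdog(1_000_000, None, 5_000))   # (5, False)
print(simulate_watchdog(1_000_000, 400, 5_000))    # (0, True)
print(simulate_watchdog(1_000_000, 1_500, 5_000))  # (5, False)
```

In this model, a transceiver that is merely slower than the timeout looks the same from the outside as one that never initializes: it is reset over and over and never reports done.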

@enjoy-digital

I'm not sure what the patch is doing, but the checks that need to be done are:

  • Are all the GTHs initialized correctly when the clock is provided and restart is pulsed once?
  • If so, check whether CGS is received correctly by the DAC. If not, maybe check with another clock if you have doubts about the one used.

@sbourdeauducq
Member

This patch:

-                link.reset.eq(link_reset),
+                link.reset.eq(0),

seems to indicate that the problem is coming from link_reset

Ignore this. The behavior is non-deterministic and it could just have been chance.

@sbourdeauducq
Member

Also there are multiple layers to this bug. I have added GTH init diagnostics:
m-labs/artiq@eb0ce93
m-labs/jesd204b@ed7dc91

Sometimes it fails with "JESD core PHY not done" (which previously would have been indistinguishable from other problems), and sometimes with the "not ready" error from other causes.

@enjoy-digital

To ease debug, maybe you should disconnect the JSYNC from the restart here:
https://github.com/m-labs/jesd204b/blob/master/jesd204b/core.py#L36 or add a control signal (jsync_enable?) that you could control from the firmware. This way, once the firmware enable the JESD PHYs, link, core, there will be a single attempt to initialize the link, and looking having a "JESD core PHY not done" would really mean the PHY are not initializing correctly and that there is an issue here. Can you use the same clock source that is used for the demo with the Xilinx core?

@sbourdeauducq
Member

What is the purpose of reinitializing the transceivers when JSYNC gets deasserted?

@sbourdeauducq
Copy link
Member

I also don't get why the transceivers should be restarted when self.prbs_config == 0.

@sbourdeauducq
Member

Shouldn't this restart be simply handled in link.py: in the USER_DATA state, if jsync goes to 0, then return to CGS?
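The proposed behavior can be modeled as a three-state machine in plain Python. The CGS to ILAS transition mirrors the `If(jsync & jref_rising, NextState("ILAS"))` condition in `link.py`, and USER_DATA falls back to CGS when JSYNC drops. `ilas_done` is a simplified, hypothetical stand-in for the end of the ILAS sequence; this is a behavioral sketch, not the Migen FSM.

```python
from enum import Enum, auto


class State(Enum):
    CGS = auto()
    ILAS = auto()
    USER_DATA = auto()


def next_state(state, jsync, jref_rising, ilas_done):
    if state is State.CGS:
        # Start ILAS on the first LMFC edge after the receiver asserts JSYNC.
        return State.ILAS if (jsync and jref_rising) else State.CGS
    if state is State.ILAS:
        return State.USER_DATA if ilas_done else State.ILAS
    # USER_DATA: fall back to CGS as soon as the receiver drops JSYNC.
    return State.USER_DATA if jsync else State.CGS


def run(trace, state=State.CGS):
    """Feed (jsync, jref_rising, ilas_done) samples and record each new state."""
    states = []
    for jsync, jref_rising, ilas_done in trace:
        state = next_state(state, jsync, jref_rising, ilas_done)
        states.append(state.name)
    return states


trace = [
    (False, True, False),   # JSYNC low: keep sending CGS
    (True, False, False),   # JSYNC high, waiting for an LMFC edge
    (True, True, False),    # LMFC edge: go to ILAS
    (True, False, True),    # ILAS complete: go to USER_DATA
    (True, False, False),   # stay in USER_DATA
    (False, False, False),  # JSYNC dropped: back to CGS
]
print(run(trace))  # ['CGS', 'CGS', 'ILAS', 'USER_DATA', 'USER_DATA', 'CGS']
```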

@sbourdeauducq
Member

sbourdeauducq commented Jan 20, 2020

m-labs/jesd204b@2d135e3 @enjoy-digital can you review?
With this patch, I have not seen "JESD core PHY not done" so far.

@enjoy-digital

The transceivers were reinitialized because we don't know why the DAC deasserts JSYNC, and it could be caused by the transceivers not operating correctly. The restart was only enabled when PRBS was not used.

Yes, your changes seem fine and should make the issue easier to understand. The "JESD core PHY not done" error was probably due to JESD being constantly reinitialized, with the ready CSR read while the PHY was still initializing.

So now that you know the PHYs are initialized correctly, the next step is to enable debug on the DAC and get the CGS/ILAS status.

@sbourdeauducq
Member

Yes, this separation and the diagnostic make things a lot clearer!
It now seems JSYNC is the issue. Could be #148... sigh

@sbourdeauducq
Member

Changing to LVDS3/LVDS11:

--- a/migen/build/platforms/sinara/sayma_amc2.py
+++ b/migen/build/platforms/sinara/sayma_amc2.py
@@ -258,13 +258,13 @@ _io = [
         IOStandard("LVDS"), Misc("DIFF_TERM_ADV=TERM_100")
     ),
     ("dac_sync", 0,
-        Subsignal("p", Pins("J8")),
-        Subsignal("n", Pins("H8")),
+        Subsignal("p", Pins("L8")),
+        Subsignal("n", Pins("K8")),
         IOStandard("LVDS"), Misc("DIFF_TERM_ADV=TERM_100")
     ),
     ("dac_sync", 1,
-        Subsignal("p", Pins("J13")),
-        Subsignal("n", Pins("H13")),
+        Subsignal("p", Pins("J9")),
+        Subsignal("n", Pins("H9")),
         IOStandard("LVDS"), Misc("DIFF_TERM_ADV=TERM_100")
     ),
     ("dac_jesd", 0,

does not improve things (I'm now getting errors on both DACs).
Let's wait for @gkasprow to tell us on which pins those JSYNC signals really are.

@sbourdeauducq
Member

It turns out that the pins in current Migen are OK.
I'm now printing the DAC status when the error happens: m-labs/artiq@c4e4d67
The DAC looks quite unhappy:

[     2.332374s]  INFO(board_artiq::ad9154): AD9154-0 found
[     2.337605s]  INFO(board_artiq::ad9154): AD9154-1 found
[     2.349938s]  INFO(satman): uplink is up, switching to recovered clock
[     2.384434s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
[     3.990946s]  INFO(board_artiq::si5324):   ...locked
[     6.707777s]  INFO(board_artiq::si5324::siphaser): calibration successful, lead: 94, width: 426 (342deg)
[     7.014068s]  INFO(satman): TSC loaded from uplink
[     7.065047s]  INFO(board_artiq::ad9154): Status of DAC-0
[     7.069018s]  INFO(board_artiq::ad9154): SERDES_PLL_LOCK: 0
[     7.074613s]  INFO(board_artiq::ad9154): 
[     7.078650s]  INFO(board_artiq::ad9154): CODEGRPSYNC: 0x00
[     7.084167s]  INFO(board_artiq::ad9154): FRAMESYNC: 0x00
[     7.089508s]  INFO(board_artiq::ad9154): GOODCHECKSUM: 0x00
[     7.095113s]  INFO(board_artiq::ad9154): INITLANESYNC: 0x00
[     7.100707s]  INFO(board_artiq::ad9154): 
[     7.104745s]  INFO(board_artiq::ad9154): DID_REG: 0x00
[     7.109911s]  INFO(board_artiq::ad9154): BID_REG: 0x00
[     7.115077s]  INFO(board_artiq::ad9154): SCR_L_REG: 0x00
[     7.120418s]  INFO(board_artiq::ad9154): F_REG: 0x00
[     7.125410s]  INFO(board_artiq::ad9154): K_REG: 0x00
[     7.130401s]  INFO(board_artiq::ad9154): M_REG: 0x00
[     7.135392s]  INFO(board_artiq::ad9154): CS_N_REG: 0x00
[     7.140646s]  INFO(board_artiq::ad9154): NP_REG: 0x00
[     7.145725s]  INFO(board_artiq::ad9154): S_REG: 0x00
[     7.150716s]  INFO(board_artiq::ad9154): HD_CF_REG: 0x00
[     7.156057s]  INFO(board_artiq::ad9154): RES1_REG: 0x00
[     7.161311s]  INFO(board_artiq::ad9154): RES2_REG: 0x00
[     7.166622s]  INFO(board_artiq::ad9154): LIDx_REG: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
[     7.174941s]  INFO(board_artiq::ad9154): CHECKSUMx_REG: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
[     7.183697s]  INFO(board_artiq::ad9154): COMPSUMx_REG: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
[     7.192309s]  INFO(board_artiq::ad9154): BADDISPARITY: 0x00

This was with reinitializations disabled (but it seems it doesn't initialize at all anyway):

diff --git a/jesd204b/link.py b/jesd204b/link.py
index 89cebaf..9584d54 100644
--- a/jesd204b/link.py
+++ b/jesd204b/link.py
@@ -312,5 +312,5 @@ class JESD204BLinkTX(Module):
             ilas.reset.eq(1),
             self.ready.eq(1),
             source.eq(inserter.source),
-            If(~jsync, NextState("CGS"))
+            #If(~jsync, NextState("CGS"))
         )

@enjoy-digital

From what I remember, even without JESD at all, SERDES_PLL_LOCK should be 1 with just the AD9154 SPI init and a valid clock input.
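Following that hint, a small Python sketch that parses a status dump like the one above and reports the first stage that has not come up, checking SERDES_PLL_LOCK before CGS and ILAS. The stage list, the expected 0xFF lane masks (8 lanes) and the use of INITLANESYNC as the ILAS indicator are assumptions; the sample lines are copied from the log in this thread.

```python
import re

# Matches "NAME: value [value ...]" lines from the board_artiq::ad9154 logger.
STATUS_RE = re.compile(r"INFO\(board_artiq::ad9154\): (\w+): (.+)$")

# Stages in dependency order. Assumptions: 8 lanes (one bit each, so a fully
# synchronized lane mask reads 0xFF) and INITLANESYNC as the ILAS indicator.
STAGES = [
    ("SERDES_PLL_LOCK", 1),
    ("CODEGRPSYNC", 0xFF),
    ("INITLANESYNC", 0xFF),
]


def parse_status(log):
    """Map each register name in the dump to its list of integer values."""
    status = {}
    for line in log.splitlines():
        match = STATUS_RE.search(line.strip())
        if match:
            status[match.group(1)] = [int(v, 0) for v in match.group(2).split()]
    return status


def first_failure(status):
    """Name of the first stage whose value is missing or not as expected."""
    for name, expected in STAGES:
        values = status.get(name)
        if not values or values[0] != expected:
            return name
    return None


sample = """\
[     7.069018s]  INFO(board_artiq::ad9154): SERDES_PLL_LOCK: 0
[     7.078650s]  INFO(board_artiq::ad9154): CODEGRPSYNC: 0x00
[     7.095113s]  INFO(board_artiq::ad9154): INITLANESYNC: 0x00
[     7.166622s]  INFO(board_artiq::ad9154): LIDx_REG: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
"""
print(first_failure(parse_status(sample)))  # SERDES_PLL_LOCK
```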

@sbourdeauducq
Member

sbourdeauducq commented Jan 20, 2020

Thanks for the hint.
m-labs/artiq@f4a5c45

Now JESD ready passes on both DACs, and DAC0 is working!
[image]
[image]
(ignore the funny voltage readings, scope is configured for 100x probe but connected directly to Sayma)

I don't understand why DAC1 JESD ready passed though.

DAC1 still shows the STPL error, but this could be a one-off problem with my board (since it was already working on the boards at Creotech). @marmeladapk Can you try on your boards?

@sbourdeauducq
Member

SAWG + TTL output
[image]

@marmeladapk
Member Author

I had a moment to test it on DAC0 on one of the boards with a simple sine wave. It seems to work. I didn't try anything more complicated.


7 participants