Sayma PRBS errors when FPGA JESD transceiver clock is GTP_CLK2 #1080

Closed
hartytp opened this issue Jun 22, 2018 · 174 comments

Comments

@hartytp
Collaborator

hartytp commented Jun 22, 2018

Since doing the slave FPGA loading rework and upgrading to the latest master, I've started seeing PRBS errors roughly 100% of the time on boot. I never saw that before. Not sure if it's due to the rework or to changes in the code. IIRC @gkasprow saw this as well...

https://pastebin.com/2X0Y17B6

@gkasprow
Collaborator

Maybe this has something to do with the TXEN pin of the DAC?
I observe this on two boards.

@hartytp
Collaborator Author

hartytp commented Jun 22, 2018

Maybe. But it happens on both DACs and we only altered the TXEN on DAC2...

@gkasprow
Collaborator

That's true, but this was the only modification I made. The 3.3V -> 1.8V conversion uses a 200R resistor that injects current into the 1.8V port of the DAC and FPGA. Theoretically the FPGA has protection diodes, but the DAC may not like voltage peaks of roughly 2.5V (1.8V + 0.7V of the diode). I have no idea how this could affect the second DAC channel in such a bizarre way.
Another question for @sbourdeauducq: is there any signal present on FPGA_CFG_DIN during normal operation? Is the logic level low after configuration?
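
For scale, here is a rough back-of-envelope for the diode clamp mentioned in this comment. It is only a sketch: it assumes an ideal 0.7 V diode drop and stiff 3.3 V and 1.8 V rails, none of which was measured in this thread, and all names are illustrative.

```python
# Back-of-envelope estimate of the current injected through the 200R
# level-shifting resistor once the protection diode clamps the pin.
# Assumptions (not measured): ideal 0.7 V diode drop, stiff supply rails.

V_DRIVE = 3.3     # driving logic level, volts
V_RAIL = 1.8      # I/O rail of the DAC/FPGA pin, volts
V_DIODE = 0.7     # assumed forward drop of the protection diode, volts
R_SERIES = 200.0  # series resistor, ohms


def clamp_voltage(v_rail: float = V_RAIL, v_diode: float = V_DIODE) -> float:
    """Voltage at the pin while the protection diode conducts."""
    return v_rail + v_diode


def injected_current(v_drive: float = V_DRIVE, r: float = R_SERIES) -> float:
    """Current pushed into the 1.8 V rail through the resistor, in amps."""
    return (v_drive - clamp_voltage()) / r


if __name__ == "__main__":
    print(f"pin clamps at about {clamp_voltage():.2f} V")
    print(f"injected current is about {injected_current() * 1e3:.1f} mA")
```

Under these assumptions the pin sits near 2.5 V and the resistor injects a few milliamps into the 1.8 V rail.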

@hartytp
Collaborator Author

hartytp commented Jun 22, 2018

My guess was that it's due to one of the recent ARTIQ commits rather than the HW changes. But, I might be wrong -- I haven't given it too much thought yet.

@gkasprow
Collaborator

The funny thing is that I started seeing PRBS errors on one board a few days ago, while the other was working well. The next day the second board also got the PRBS "sickness".

@sbourdeauducq
Member

Have you tried --without-sawg? I suspect that the corruption runs deeper than just the SDRAM.

@hartytp
Collaborator Author

hartytp commented Jun 23, 2018

Okay. I'll try that and your blinker next to see if we can find some issue with a simpler logic block that we can focus on instead of debugging complex jesd/memory issues.

@hartytp
Collaborator Author

hartytp commented Jun 23, 2018

One data point here: running with the SAWG held in reset, I don't see the "crash kernel" crash. But, I do see a bunch of errors during init (JESD PRBS, can't determine SYSREF margin at FPGA)

@hartytp
Collaborator Author

hartytp commented Jun 23, 2018

Note to self: try this with a no-sawg build. It would be interesting to see if there is a difference between no SAWG and SAWG in reset. If there is, then this seems much more like a Vivado issue than a hardware issue.

@hartytp
Collaborator Author

hartytp commented Jun 25, 2018

I rebuilt the current master without SAWG and still see this: https://hastebin.com/fafokucoqa.sql

I have never seen this until recently (around the time that slave loading was added), but it's been in 100% of my recent builds.

@sbourdeauducq do you see this issue on your board?

@sbourdeauducq
Member

No I don't.

@sbourdeauducq
Member

One change that is likely to have exposed this bug is this:
de7d64d
Try reverting this commit, it is not required for anything other than the DRTIO master with local DACs.

@hartytp
Collaborator Author

hartytp commented Jun 25, 2018

I'm currently building with: f9910ab

@sbourdeauducq
Member

de7d64d can be easily reverted on top of master. It's a very simple change (disable the other 7043 clock output - not required in theory but let's be paranoid - and change dac_refclk back to 0). The sysref phase doesn't have an impact on PRBS.

@hartytp
Collaborator Author

hartytp commented Jun 25, 2018

@hartytp
Collaborator Author

hartytp commented Jun 25, 2018

Running current master with de7d64d reverted does indeed work.

@sbourdeauducq
Member

sbourdeauducq commented Jun 25, 2018

@gkasprow @enjoy-digital any idea why that happens? The GTH quad imbalance is the same in both cases, and the number of crossed quads is within spec.

@enjoy-digital
Contributor

Do you have the same behaviour if you only keep de7d64d but disable the GTP_CLK1 output of HMC7043?

Could it be related to the fact that we now have two reference clocks active and still using QPLLXREFCLKSEL=0b001? (Table 2-8 of UG576)

@sbourdeauducq
Member

> Could it be related to the fact that we now have two reference clocks active and still using QPLLXREFCLKSEL=0b001?

The ARTIQ code is not using the QPLL. Should it?

@enjoy-digital
Contributor

Ah sorry we are using CPLL. Then maybe check CPLLREFCLKSEL & https://github.com/m-labs/jesd204b/blob/master/jesd204b/phy/gth.py#L260. Should we connect GTREFCLK1 and set CPLLREFCLKSEL to 2?
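
For reference while reading the CPLLREFCLKSEL discussion, here is a small lookup sketch. The encodings are written from memory of the CPLL reference clock selection table in UG576 and should be checked against the guide before relying on them; `refclk_port` is a hypothetical helper, not part of the jesd204b code.

```python
# CPLLREFCLKSEL encodings as recalled from the UG576 CPLL reference clock
# selection table. Treat these as an assumption and verify against the guide.
CPLL_REFCLK_SEL = {
    0b001: "GTREFCLK0",
    0b010: "GTREFCLK1",
    0b011: "GTNORTHREFCLK0",
    0b100: "GTNORTHREFCLK1",
    0b101: "GTSOUTHREFCLK0",
    0b110: "GTSOUTHREFCLK1",
    0b111: "GTGREFCLK",
}


def refclk_port(sel: int) -> str:
    """Map a 3-bit CPLLREFCLKSEL value to the port it selects."""
    try:
        return CPLL_REFCLK_SEL[sel]
    except KeyError:
        raise ValueError(f"reserved or invalid CPLLREFCLKSEL value: {sel:#05b}")


if __name__ == "__main__":
    for sel in (1, 2):
        print(f"CPLLREFCLKSEL={sel} selects {refclk_port(sel)}")
```

With this mapping, a setting of 1 selects GTREFCLK0 and 2 selects GTREFCLK1, which is the change being proposed.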

@sbourdeauducq
Member

> Should we connect GTREFCLK1 and set CPLLREFCLKSEL to 2?

How would that help?

@sbourdeauducq
Member

"a single external reference clock with multiple transceivers connected to multiple Quads. The user design connects the IBUFDS_GTE3 output (O) to the GTREFCLK0 ports of the GTHE3/4_COMMON and GTHE3/4_CHANNEL primitives for the GTH transceiver.
In this case, the Xilinx implementation tools make the necessary adjustments to the north/south routing as well as pin swapping necessary to route the reference clocks from one Quad to another when required"

@hartytp
Collaborator Author

hartytp commented Jun 25, 2018

> Do you have the same behaviour if you only keep de7d64d but disable the GTP_CLK1 output of HMC7043?
>
> Could it be related to the fact that we now have two reference clocks active and still using QPLLXREFCLKSEL=0b001? (Table 2-8 of UG576)

@sbourdeauducq do you still want me to do this? I'm a bit short on time atm...

@sbourdeauducq
Member

No, that's unlikely to help. The other clock is either unrouted or routed to the other quad for DRTIO. And GTREFCLK0 is the correct setting as per the transceiver user guide I quoted above.

@hartytp
Collaborator Author

hartytp commented Jun 25, 2018

@gkasprow remind me, did you try turning the HMC7043 GTP_CLK{1,2} outputs back to LVPECL? Did that help the PRBS errors?

@marmeladapk
Contributor

@hartytp It did help with clock amplitude but not with PRBS errors.

@hartytp
Collaborator Author

hartytp commented Jun 26, 2018

@marmeladapk that's what I thought, thanks for confirming. It was a long shot, but I wondered if this was some SI issue related to the low clock amplitudes using LVDS outputs in combination with 200R LVPECL bias resistors. If you've tested that then I won't bother looking at it again.

@hartytp
Collaborator Author

hartytp commented Jun 28, 2018

I still see this error after fixing the Vccint supply (I measure 0.951V at the 0R power resistors): https://hastebin.com/hevawerodo.sql

I believe that @marmeladapk also found that the Vccint rework did not help the PRBS errors...

@marmeladapk
Contributor

Yes, today I got these errors once.

@sbourdeauducq
Member

@gkasprow Can you check SI and jitter on GTP_CLK1 and GTP_CLK2 on a board that exhibits this problem? And generally investigate this?

@hartytp
Collaborator Author

hartytp commented Aug 7, 2018

Okay, good!

So, the question is how the RTM affects the PRBS... Clock SI? Some PI issue? An issue with the DACs themselves?

@sbourdeauducq
Member

Does that RTM work when using GTP_CLK1? We need to make sure this is the same issue.

@gkasprow
Collaborator

gkasprow commented Aug 7, 2018

If you prepare a bitstream I will test immediately.

@gkasprow
Collaborator

gkasprow commented Aug 7, 2018

PRBS errors occur on both DACs:
[ 11.228764s] INFO(board_artiq::ad9154): AD9154-0 found
[ 11.243918s] INFO(board_artiq::ad9154): AD9154-0 initializing...
[ 11.256823s] INFO(board_artiq::ad9154): ...done
[ 11.330706s] INFO(board_artiq::ad9154): AD9154-0 running PRBS test...
[ 12.336232s] WARN(board_artiq::ad9154): PRBS errors on lane0: 00010b
[ 12.341673s] WARN(board_artiq::ad9154): PRBS errors on lane1: 0000fc
[ 12.348273s] WARN(board_artiq::ad9154): PRBS errors on lane2: 000115
[ 12.354871s] WARN(board_artiq::ad9154): PRBS errors on lane3: 0000ee
[ 12.361470s] WARN(board_artiq::ad9154): PRBS errors on lane4: 000162
[ 12.368070s] WARN(board_artiq::ad9154): PRBS errors on lane5: 000142
[ 12.374669s] WARN(board_artiq::ad9154): PRBS errors on lane6: 000122
[ 12.381268s] WARN(board_artiq::ad9154): PRBS errors on lane7: 0000e3
[ 12.387749s] ERROR(board_artiq::ad9154): failed to initialize AD9154-0: PRBS failed
[ 12.395736s] INFO(board_artiq::ad9154): AD9154-1 found
[ 12.410890s] INFO(board_artiq::ad9154): AD9154-1 initializing...
[ 12.423793s] INFO(board_artiq::ad9154): ...done
[ 12.530025s] WARN(board_artiq::ad9154): AD9154-1 config attempt #1 failed (bad SYNC)
[ 12.546856s] INFO(board_artiq::ad9154): AD9154-1 initializing...
[ 12.559759s] INFO(board_artiq::ad9154): ...done
[ 12.633631s] INFO(board_artiq::ad9154): AD9154-1 running PRBS test...
[ 13.639150s] WARN(board_artiq::ad9154): PRBS errors on lane0: 000143
[ 13.644589s] WARN(board_artiq::ad9154): PRBS errors on lane1: 0001a6
[ 13.651188s] WARN(board_artiq::ad9154): PRBS errors on lane2: 000199
[ 13.657787s] WARN(board_artiq::ad9154): PRBS errors on lane3: 00012c
[ 13.664387s] WARN(board_artiq::ad9154): PRBS errors on lane4: 0000f2
[ 13.670986s] WARN(board_artiq::ad9154): PRBS errors on lane5: 000182
[ 13.677585s] WARN(board_artiq::ad9154): PRBS errors on lane6: 00015c
[ 13.684184s] WARN(board_artiq::ad9154): PRBS errors on lane7: 000158
[ 13.690663s] ERROR(board_artiq::ad9154): failed to initialize AD9154-1: PRBS failed
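
To compare boards or builds it can help to turn a log like the one above into numbers. The following is a minimal sketch, not part of ARTIQ: it assumes the per-lane counts are hexadecimal (they contain digits such as `b` and `fc`) and that each lane line belongs to the most recent "running PRBS test" line. `prbs_errors` is a hypothetical helper name.

```python
import re

# Matches firmware lines such as
# "[ 12.336232s] WARN(board_artiq::ad9154): PRBS errors on lane0: 00010b"
LANE_RE = re.compile(r"PRBS errors on lane(\d+): ([0-9a-fA-F]+)")
# Matches "AD9154-0 running PRBS test..." to track which DAC is under test.
DAC_RE = re.compile(r"AD9154-(\d+) running PRBS test")


def prbs_errors(log: str) -> dict:
    """Return {dac: {lane: error_count}}, reading the counts as hex."""
    result = {}
    dac = None
    for line in log.splitlines():
        m = DAC_RE.search(line)
        if m:
            dac = int(m.group(1))
            result.setdefault(dac, {})
            continue
        m = LANE_RE.search(line)
        if m and dac is not None:
            result[dac][int(m.group(1))] = int(m.group(2), 16)
    return result


if __name__ == "__main__":
    sample = "\n".join([
        "[ 11.330706s] INFO(board_artiq::ad9154): AD9154-0 running PRBS test...",
        "[ 12.336232s] WARN(board_artiq::ad9154): PRBS errors on lane0: 00010b",
        "[ 12.341673s] WARN(board_artiq::ad9154): PRBS errors on lane1: 0000fc",
    ])
    for dac, lanes in prbs_errors(sample).items():
        for lane, count in sorted(lanes.items()):
            print(f"DAC{dac} lane{lane}: {count} errors")
```

Read this way, every lane on both DACs reports a few hundred errors, which points at something common to all lanes rather than at a single bad lane.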

@marmeladapk
Contributor

@hartytp I can do this, could you just point me in the general direction? I didn't follow this discussion closely.

@gkasprow
Collaborator

gkasprow commented Aug 7, 2018

OK, I got it!
The input SMA has a short circuit between the hot pin and GND.
With the generator, the HMC managed to lock and the output signal looked good.
But when I connected it to the AMC clock output, the signal disappeared!
Now the HMC locks both from the generator and from the AMC output, and the PRBS errors are gone.
[ 12.380113s] INFO(board_artiq::ad9154): AD9154-0 running STPL test...
[ 12.386902s] INFO(board_artiq::ad9154): c0 errors: 0
[ 12.392107s] INFO(board_artiq::ad9154): c1 errors: 0
[ 12.397316s] INFO(board_artiq::ad9154): c2 errors: 0
[ 12.402526s] INFO(board_artiq::ad9154): c3 errors: 0
[ 12.407450s] INFO(board_artiq::ad9154): ...passed
[ 12.412398s] INFO(board_artiq::ad9154): AD9154-0 SYSREF scan...
[ 12.777513s] INFO(board_artiq::ad9154): sync error-: 481 -> 0
[ 13.172098s] INFO(board_artiq::ad9154): sync error+: 481 -> 482
[ 13.176977s] INFO(board_artiq::ad9154): margins: -33 +36
[ 13.192987s] INFO(board_artiq::ad9154): AD9154-0 initializing...
[ 13.205855s] INFO(board_artiq::ad9154): ...done
[ 13.280061s] INFO(board_artiq::ad9154): AD9154-1 found
[ 13.294265s] INFO(board_artiq::ad9154): AD9154-1 initializing...
[ 13.307170s] INFO(board_artiq::ad9154): ...done
[ 13.381043s] INFO(board_artiq::ad9154): AD9154-1 running PRBS test...
[ 14.387412s] INFO(board_artiq::ad9154): ...passed
[ 14.391071s] INFO(board_artiq::ad9154): AD9154-1 running STPL test...
[ 14.397856s] INFO(board_artiq::ad9154): c0 errors: 0
[ 14.403065s] INFO(board_artiq::ad9154): c1 errors: 0
[ 14.408275s] INFO(board_artiq::ad9154): c2 errors: 0
[ 14.413485s] INFO(board_artiq::ad9154): c3 errors: 0
[ 14.418407s] INFO(board_artiq::ad9154): ...passed

@sbourdeauducq
Member

How does this explain the behavior where CLK1 works but CLK2 doesn't? And why did you measure good clocks on both CLK1 and CLK2?

@hartytp
Collaborator Author

hartytp commented Aug 7, 2018

@marmeladapk

If that looks good, look at the JESD lanes and see if it all looks okay (trigger the scope from the JESD clock). You can try looking at the JESD lanes both with the PRBS pattern and with a square wave (using this patch: #1080 (comment)).

@gkasprow
Collaborator

gkasprow commented Aug 7, 2018

I didn't say that it works with CLK1. It was pure coincidence; maybe they broadcast something else on FM when we did the tests a few months ago :)

@hartytp
Collaborator Author

hartytp commented Aug 7, 2018

@sbourdeauducq are you happy with that description of what needs to be done? #1080 (comment)

@marmeladapk
Contributor

marmeladapk commented Aug 7, 2018

@hartytp Will a --without-sawg build have this problem?

@hartytp
Collaborator Author

hartytp commented Aug 7, 2018

On my board, SAWG doesn't make any difference to this, so I've been testing --without-sawg to speed up my builds.

@hartytp
Collaborator Author

hartytp commented Aug 7, 2018

Remember to check the UART for PRBS errors before testing, as there is no point testing on a working AMC/RTM pair.
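
If the UART output is captured to a file, this pre-check can be scripted. The sketch below assumes the failure line has the same wording as in the logs posted earlier in this thread ("failed to initialize AD9154-N: PRBS failed"). The function names are hypothetical and not part of ARTIQ.

```python
import re

# Failure line as it appears in the boot logs posted in this thread.
FAIL_RE = re.compile(r"failed to initialize AD9154-(\d+): PRBS failed")


def dacs_with_prbs_failure(uart_log: str) -> list:
    """Return the sorted DAC indices whose PRBS test failed."""
    return sorted({int(n) for n in FAIL_RE.findall(uart_log)})


def worth_testing(uart_log: str) -> bool:
    """True if this AMC/RTM pair shows the PRBS problem at all."""
    return bool(dacs_with_prbs_failure(uart_log))


if __name__ == "__main__":
    failing = (
        "[ 12.387749s] ERROR(board_artiq::ad9154): "
        "failed to initialize AD9154-0: PRBS failed\n"
        "[ 13.690663s] ERROR(board_artiq::ad9154): "
        "failed to initialize AD9154-1: PRBS failed\n"
    )
    print("failing DACs:", dacs_with_prbs_failure(failing))
    print("worth testing:", worth_testing(failing))
```

A pair whose log yields an empty list is a working pair and will not reproduce the issue.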

@gkasprow
Collaborator

gkasprow commented Aug 7, 2018

And since the SMA is grounded via 10pF, this has enough impedance to pick up nearby RF. I did some tests:

  • I shorted the SMA pin to its GND.
  • I restarted the board with generator ON
  • HMC7043 generates 150MHz
  • PRBS errors exist

then I removed the short circuit

  • PRBS errors do not appear anymore

then I disabled the generator

  • no HMC lock

then I enabled the generator

  • HMC locks.

The signal from the generator leaks via the non-ideal cable shield and the 10pF capacitor to the LTC chip. Its quality is poor, but it is enough for the HMC830 to lock.

@sbourdeauducq
Member

I cannot say I am "happy" about any of this, but yes, this procedure looks correct and hopefully will turn up something.

@hartytp
Collaborator Author

hartytp commented Aug 7, 2018

@gkasprow that's posted on the wrong issue, I think.

This is already hard to follow without cross-posting!

@gkasprow
Collaborator

gkasprow commented Aug 7, 2018

I've noticed that, too many threads opened...

@hartytp
Collaborator Author

hartytp commented Aug 7, 2018

@gkasprow to make sure we don't waste time, please can you send me the binaries you were using for your tests? I'd like to check that I can reproduce the PRBS issues on my board with your binaries. If I can then I'll post it back to you tomorrow morning.

@gkasprow
Collaborator

gkasprow commented Aug 7, 2018

@marmeladapk please post the binaries I used for the tests here. They are on your computer, under your account.

@hartytp
Collaborator Author

hartytp commented Aug 8, 2018

Thanks for sending me the binaries @marmeladapk.

Using them, I see:

@hartytp
Collaborator Author

hartytp commented Aug 8, 2018

No idea why I'm seeing this new error on DAC0. But, anyway, this does show up the PRBS errors on at least one DAC, so it seems to be a fine binary for testing.

@hartytp
Collaborator Author

hartytp commented Aug 8, 2018

@gkasprow can you remind me where on the AMC I can probe CLK2? I'll double check that before I return the board.

@gkasprow
Collaborator

gkasprow commented Aug 8, 2018

@hartytp
Collaborator Author

hartytp commented Aug 9, 2018

@gkasprow the AMC + RTM have been delivered and signed for in WUT.

@sbourdeauducq
Member

@gkasprow What was the problem? Why was it not found when measuring the clocks?

@gkasprow
Collaborator

The board I got from @hartytp had one of the coupling capacitors missing (mechanically damaged). The board I got from @sbourdeauducq had the SMA input shorted to its shell. Both caused the PRBS issue.
The clocks looked good, but had periods of high jitter which were not picked up by the scope. I should have looked at them with a spectrum analyser...

@jbqubit
Contributor

jbqubit commented Aug 16, 2018

Do the hardware bugs discovered by @gkasprow explain all the reported PRBS errors related to this issue?

@sbourdeauducq
Member

Yes.
