Sayma MiSoC memory test failed #908

jbqubit · 2018-01-26T23:37:24Z

Building .bit from source using

commit 440e19b8f9c8ebfce80402a519796cee7fdd6b06

I see...

$ flterm /dev/ttyUSB2

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 32 59 47 69 59 69 40 48 done
Read delays: 7:00-116 6:00-138 5:27-172 4:38-54 3:57-76 2:67-83 1:98-116 0:109-125 done
SDRAM initialized
Memory test failed (384482/1114624 words incorrect)
Halting.

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 34 56 45 65 55 67 40 47 done
Read delays: 7:00-112 6:01-127 5:15-33 4:26-44 3:65-84 2:78-94 1:90-123 0:103-123 done
SDRAM initialized
Memory test failed (412138/1114624 words incorrect)
Halting.

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 33 55 45 64 55 72 40 47 done
Read delays: 7:00-115 6:00-136 5:10-26 4:36-53 3:61-79 2:76-97 1:88-108 0:96-113 done
SDRAM initialized
Memory test failed (499333/1114624 words incorrect)
Halting.

|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 34 57 48 69 59 73 41 50 done
Read delays: 7:00-117 6:00-134 5:18-40 4:31-47 3:70-86 2:76-96 1:103-120 0:93-109 done
SDRAM initialized
Memory test failed (473026/1114624 words incorrect)
Halting.


whitequark · 2018-01-26T23:44:16Z

Possibly caused by 7429ee4?

cjbe · 2018-01-26T23:52:57Z

The typical valid read region seems to be ~170 LSB on my board, so I don't think that commit (increasing the initial step from 8 LSB to 16 LSB) caused this.

This also looks different from the problem that commit solved for me, where the size of the read window was always the size of the initial step. Here the gaps vary from 16 to 20.
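To make the window sizes easier to compare, here is a minimal sketch that parses a "Read delays:" line from the bootloader output and prints the width of each window. It assumes each field is module:start-end in delay taps (LSB), which is how the lines above read; the function name read_window_widths is made up for this illustration.

```python
import re

def read_window_widths(line):
    """Return {module: (start, end, width)} parsed from a 'Read delays:' line."""
    windows = {}
    # Each field looks like "5:27-172": module index, window start, window end.
    for module, start, end in re.findall(r"(\d+):(\d+)-(\d+)", line):
        windows[int(module)] = (int(start), int(end), int(end) - int(start))
    return windows

# First failing boot from the original report.
line = ("Read delays: 7:00-116 6:00-138 5:27-172 4:38-54 "
        "3:57-76 2:67-83 1:98-116 0:109-125 done")

for module, (start, end, width) in sorted(read_window_widths(line).items()):
    print(f"module {module}: {start}-{end} ({width} LSB wide)")
```

On that line, modules 0 to 4 come out 16 to 19 LSB wide, while modules 5 to 7 report much wider windows.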

sbourdeauducq · 2018-01-27T03:22:10Z

What Vivado version?
We use 2017.4.

jbqubit · 2018-01-29T16:04:21Z

I'm using 2016.2. Will upgrade and try again.

marmeladapk · 2018-01-29T21:22:03Z

I'm using 2017.4 and I also got this issue, though with a build from 25.01. I'm currently building against 0edc34a and will update when it finishes.

marmeladapk · 2018-01-29T22:03:35Z

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 58 89 78 88 56 85 47 48 done
Read delays: 7:00-19 6:07-23 5:53-74 4:60-76 3:111-132 2:117-133 1:125-141 0:133-151 done
SDRAM initialized
Memory test failed (522120/1114624 words incorrect)
Halting.

sbourdeauducq · 2018-01-30T03:06:29Z

Here is everything I built from the current master (with RTM bridge, RTIO and other things disabled to save compilation and RTM yak-shaving time):
http://dl.free.fr/lAFdh3oQV
With those binaries, I verified that SDRAM works fine on both Florent's board and Sayma-1.
Can you try those binaries on your boards?
@marmeladapk You can use the Ethernet TX clock phase adjustment script I posted in the RGMII issue on those binaries.
@marmeladapk If the problem persists, can you use Sayma-2 that I shipped to you to debug Ethernet, since I didn't have SDRAM problems on that one?

enjoy-digital · 2018-01-30T07:02:23Z

Thanks @sbourdeauducq. I'll look at that. The read leveling procedure is probably still not robust enough.

marmeladapk · 2018-01-30T15:30:48Z

@sbourdeauducq I loaded it to check if memory tests are passed:

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 95 130 118 133 96 125 87 88 done
Read delays: 7:37-249 6:61-267 5:110-316 4:131-336 3:167-369 2:173-377 1:195-40e
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000005s]  INFO(runtime): ARTIQ runtime starting...
[     0.003864s]  INFO(runtime): software version 4.0.dev+516.g0edc34a9
[     0.010126s]  INFO(runtime): gateware version 4.0.dev+516.g0edc34a9.dirty
[     0.016910s]  INFO(runtime): log level set to INFO by default
[     0.022630s]  INFO(runtime): UART log level set to INFO by default
[     0.028790s]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     1.028006s]  INFO(runtime): continuing boot
[     1.030975s]  WARN(runtime): using default MAC address 02-00-00-00-76-01; ct
[     1.039568s]  INFO(runtime): using default IP address 192.168.1.60
[     1.054501s]  INFO(runtime::session): accepting network sessions
[     1.059438s]  INFO(runtime::session): running startup kernel
[     1.064959s]  INFO(runtime::session): no startup kernel found
[     1.070665s]  INFO(runtime::session): no connection, starting idle kernel
[     1.077527s]  INFO(runtime::session): no idle kernel found
[     1.084122s]  INFO(runtime::mgmt): management interface active
[     6.274350s]  WARN(runtime): ethernet mac: rx preamble errors: 2
[     7.357698s]  WARN(runtime): ethernet mac: rx preamble errors: 3
[    19.820658s]  WARN(runtime): ethernet mac: rx preamble errors: 4
[    20.128752s]  WARN(runtime): ethernet mac: rx preamble errors: 5
[    20.888642s]  WARN(runtime): ethernet mac: rx preamble errors: 6

So it works. I'll try the script you mentioned later.

sbourdeauducq · 2018-01-30T15:37:21Z

Good. You do however seem to get a large number of Ethernet RX corrupted packets (preamble errors). Is the PHY correctly set in RGMII mode? Does this happen for every packet? You can change the RX phase as well by using this script command instead: set_property CLKOUT0_PHASE <phase> [get_cells crg_ethrx_mmcm]

marmeladapk · 2018-01-30T15:47:06Z

@sbourdeauducq Should I change it in xdc in artiq/artiq_sayma/gateware/top.xdc and rebuild? Will latest artiq pass memory tests?

sbourdeauducq · 2018-01-30T15:52:06Z

No. Please follow the instructions in my comment: #854 (comment) - you just save the script as edit_pll.tcl and run the mentioned vivado command.
There is no bitstream rebuilding and it is a rather quick process. Nothing in the design other than the PLL phase will be changed, the routes etc. will be exactly as before, so yes memory test should be unaffected.

sbourdeauducq · 2018-02-03T12:21:28Z

I also see the problem with the default build (including SAWG) on ARTIQ 4c22d64, migen e554f072905ceeb27c9c179c8c7b785acd1676bc, misoc cb8e314c7515eade46f5bcde4e48903d7ec92490

Initializing SDRAM...
Write leveling: 43 66 49 68 35 56 34 25 done
Read delays: 7:00-121 6:00-141 5:39-55 4:44-60 3:67-85 2:76-92 1:105-121 0:113-129 done
SDRAM initialized
Memory test failed (356593/1114624 words incorrect)

When disabling the SAWG (--without-sawg), the system boots correctly.
@enjoy-digital Can you move forward with JESD SC1 by disabling SAWG (which you want to do anyway to reduce compilation time)? I cannot reproduce the "no output on UART" bug.

enjoy-digital · 2018-02-03T15:52:27Z

@sbourdeauducq: yes i'll continue on Monday.

sbourdeauducq · 2018-02-06T05:52:45Z

@hartytp Are you looking into this?
This definitely worked when I did the SAWG test and posted the scope screenshot. So it should be possible to isolate what code change exactly caused this problem, maybe with the help of tools like git-bisect.
But I suspect this is due to the non-determinism of Vivado compilation, or to plain Vivado bugs. In the first case, this is normally solvable by adding appropriate timing constraints. In the second case, considering how Xilinx technical support has been degrading for the past years, the first option is basically to apply somewhat random non-functional changes to the code, as Xilinx engineers certainly do, and hope that ça tombe en marche, or try various Vivado synthesis options. (Xilinx's answer to the bug invasion is pretty much the usual)
@whitequark's addition of RTM loading gateware is a good suspect for the triggering of this kind of Vivado misbehavior.
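For the bisect idea, a rough sketch of the check that a `git bisect run` script would need: it maps a captured boot log to the exit codes git bisect understands (0 for good, 1 for bad, 125 for "skip this revision"). The function name bisect_exit_code is hypothetical, and building, flashing and capturing the UART output for each revision is board-specific and not shown.

```python
def bisect_exit_code(log_text):
    """Map a captured bootloader log to a `git bisect run` exit code."""
    if "Memory test failed" in log_text:
        return 1      # bad revision: at least one boot failed the memory test
    if "Memory test passed" in log_text:
        return 0      # good revision
    return 125        # inconclusive (e.g. no UART output): ask git bisect to skip

# Example logs, using the bootloader messages quoted in this issue.
good_log = "SDRAM initialized\nMemory test passed\n"
bad_log = "SDRAM initialized\nMemory test failed (356593/1114624 words incorrect)\n"

print(bisect_exit_code(good_log), bisect_exit_code(bad_log), bisect_exit_code(""))
```

A wrapper script would read the log file for the revision under test and call sys.exit() with this value.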

hartytp · 2018-02-06T09:28:43Z

@hartytp Are you looking into this?

I wasn't planning to, no. In general, I'm trying to prioritize things like the HMC830 on Sayma, which seem to be (at least in part) hardware issues. In contrast, the mem test thing is just firmware/gateware, isn't it? As such, it seemed like the standard yak shaving required to get a new board up and running, and not something particular to Sayma. So, I figured that you guys were probably best placed to look into it.

I have a busy week lined up this week, but I might have some time to look into it.

Side note: we've had Sayma for quite a while now, but the ARTIQ tool chain still feels quite hacked and fragile. It would be great to get to the point where Artiq flash can do the RTM as well, the package includes the correct version of JESD204B, etc.

hartytp · 2018-02-06T09:31:52Z

Anyway to be clear, in case I do find time to look into this, your plan is basically to dig through the git history, building various versions of Sayma gateware/firmware with SAWG (at a few hours per build) until we find the point where it stopped working? IIRC, that's a bit complicated by the fact that the tools to build Sayma have changed a bit over time, so it's not always the same instructions to build/flash it, and by the fact that the package doesn't include the right version of JESD204B (also misoc/migen?), so one needs to track the history of several projects to make sure that each build uses the correct version of each. Doesn't sound like fun.

sbourdeauducq · 2018-02-06T09:32:18Z

Doesn't sound like fun.

Yep, standard fare.
Anyway, the first thing I'd try is removing the RTM loading gateware.
Another thing that makes the SDRAM work is removing a lot of peripherals using the patch I posted elsewhere, so there would not be such versioning issues. Just the long Vivado compilation times.

hartytp · 2018-02-06T09:34:22Z

Well, as I said, this seems like standard yak shaving for getting a board up and running, rather than a particular hardware/design issue with Sayma. So, do you mind taking a look at it first? It's likely to be quicker for you since you've probably kept a closer eye on the changes that have been made to ARTIQ over the past weeks.

sbourdeauducq · 2018-02-07T04:05:31Z

That's what I thought - the patch below works around the problem.

diff --git a/artiq/gateware/targets/sayma_amc.py b/artiq/gateware/targets/sayma_amc.py
index c45f8d37a..f6c5b95f6 100755
--- a/artiq/gateware/targets/sayma_amc.py
+++ b/artiq/gateware/targets/sayma_amc.py
@@ -160,9 +160,9 @@ class Standalone(MiniSoC, AMPSoC):
         ]
 
         # RTM bitstream upload
-        rtm_fpga_cfg = platform.request("rtm_fpga_cfg")
-        self.submodules.rtm_fpga_cfg = SlaveFPGA(rtm_fpga_cfg)
-        self.csr_devices.append("rtm_fpga_cfg")
+        #rtm_fpga_cfg = platform.request("rtm_fpga_cfg")
+        #self.submodules.rtm_fpga_cfg = SlaveFPGA(rtm_fpga_cfg)
+        #self.csr_devices.append("rtm_fpga_cfg")
 
         # AMC/RTM serwb
         serwb_pll = serwb.phy.SERWBPLL(125e6, 625e6, vco_div=2)

@whitequark What about using GPIO and bit-banging instead? Hopefully the Vivado trash will behave then.

marmeladapk · 2018-02-08T12:12:03Z

@sbourdeauducq With latest commit (2d4a134) when I compile python3 -m artiq.gateware.targets.sayma_amc --without-sawg I still get memory test failed.

hartytp · 2018-03-12T09:55:07Z

My best guess here is that there is something wrong like a missing/incorrect timing constraint, an output drive not being set optimally, etc. AFAICT, Vivado is struggling to meet timing with the SAWG in place, so I wouldn't be surprised if it's doing something funny here, so it's important to have everything optimally set up and fully constrained. Also, if there is anything we can do to make timing closure easier to achieve (e.g. elsewhere in the design), that might well help...

Could one of you (maybe @jordens for a fresh pair of eyes if he can find the time) double check that you can't see anything else suspicious in the way this is all setup?

hartytp · 2018-03-12T09:55:43Z

As always, if you can think of anything you want me to try/look at then let me know.

sbourdeauducq · 2018-03-12T12:43:58Z

Does it pass on your board (and everything looks good) when SAWG is disabled?

hartytp · 2018-03-12T12:51:54Z

Did last time I checked. Can rebuild and recheck if that helps

hartytp · 2018-03-12T14:04:38Z

@sbourdeauducq Without SAWG https://hastebin.com/qagicibita.sql

No mem test errors.

No errors in the middle of the working region AFAICS.

hartytp · 2018-03-12T14:20:43Z

Looking again at the "bad" eyes, with errors in the working region, AFAICS, the problems are all with the read leveling rather than the write leveling. I wonder if there is something wrong with the IDELAY code. Could we fuzz this by looping an OSERDES to an ISERDES and sweeping the IDELAY for different ODELAYs?

jordens · 2018-03-19T19:03:30Z

Could everyone who currently has access to a board with this issue try the current artiq/misoc/migen with sawg (and otherwise presumed and/or known worst case conditions) and report the console messages? @jbqubit @enjoy-digital @marmeladapk

jordens · 2018-03-20T17:46:03Z

From @enjoy-digital with @hartytp 's board and SAWG. That's just fine.

Gateware ident 4.0.dev+742.g276b0c7f
Initializing SDRAM...
DQS initial delay: 95 taps
Write leveling scan:
Module 3:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 2:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111011110110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 1:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 0:
000000000000000000000000000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
DQS initial delay: 95 taps
Write leveling: 78 88 102 104 done
Read leveling scan:
Module 3:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000011011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 2:
00000000000000000000000000000000000000000000000000000000000000001011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 1:
00000000000000000000000000000000000000000000000001011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110111100100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 0:
00000000000000000000000000000000000000000000000010111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111011100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Read leveling: 3: 160 (146 wide), 2: 150 (162 wide), 1: 132 (157 wide), 0: 126 (151 wide), done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000007s]  INFO(runtime): ARTIQ runtime starting...
[     0.003890s]  INFO(runtime): software version 4.0.dev+742.g276b0c7f
[     0.010157s]  INFO(runtime): gateware version 4.0.dev+742.g276b0c7f
[     0.016425s]  INFO(runtime): log level set to INFO by default
[     0.022137s]  INFO(runtime): UART log level set to INFO by default
[     0.028289s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
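For anyone comparing scans by hand, a minimal sketch that finds the widest run of 1s in a scan string and reports its position, width and centre. It assumes a 1 means the test passed at that delay tap. The bootloader's actual selection logic is not shown in this thread and may differ (for instance in how it treats isolated glitches at the edges), so the name widest_window and the approach are illustrative only.

```python
def widest_window(scan):
    """Return (start, end, width) of the longest run of '1' in a scan string.

    start is inclusive and end is exclusive; returns None if there is no '1'.
    """
    best = None
    run_start = None
    for i, bit in enumerate(scan + "0"):  # sentinel closes a trailing run
        if bit == "1" and run_start is None:
            run_start = i
        elif bit != "1" and run_start is not None:
            if best is None or i - run_start > best[2]:
                best = (run_start, i, i - run_start)
            run_start = None
    return best

# Short synthetic scan: a glitchy leading edge, a clean eye, a glitchy trailing edge.
scan = "0000" + "11" + "0" + "1" * 10 + "0" + "111" + "0000"

start, end, width = widest_window(scan)
print(f"window {start}-{end}, width {width}, centre {start + width // 2}")
```

Pasting one of the "Module N:" lines in place of the synthetic string gives a quick way to compare eye widths between builds.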

enjoy-digital · 2018-03-20T18:04:41Z

@jordens: i'm doing more tests with @hartytp's board:

flashed the board with gateware/bootloader that was failing with @hartytp. I'm able to reproduce memtest errors (4 errors for 500 startups).
flashed the board with last gateware/bootloader (32 bits DRAM), doing startup tests. (no memtest errors on 600 startups for now, i'll stop at 1000 and will have the log if you are interested).

I also regenerated the design with SAWG and the 64 bits DRAM, i'm going to test it when the startup test with the 32 bits DRAM is done.
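To turn a long startup log into numbers like these, a small sketch that counts passed and failed memory tests in concatenated bootloader output. It relies only on the "Memory test passed" and "Memory test failed (N/M words incorrect)" messages seen in this issue; summarize_boots is a name invented for the example.

```python
import re

FAILED = re.compile(r"Memory test failed \((\d+)/(\d+) words incorrect\)")

def summarize_boots(log_text):
    """Count passed/failed memory tests in a concatenated bootloader log."""
    passed = log_text.count("Memory test passed")
    failures = [(int(bad), int(total)) for bad, total in FAILED.findall(log_text)]
    boots = passed + len(failures)
    return {
        "boots": boots,
        "failed": len(failures),
        "failure_rate": len(failures) / boots if boots else 0.0,
        # Largest fraction of incorrect words seen in any failing boot.
        "worst_word_error_fraction": max((b / t for b, t in failures), default=0.0),
    }

sample = """\
Memory test passed
Memory test failed (384482/1114624 words incorrect)
Memory test passed
Memory test passed
"""
print(summarize_boots(sample))
```

For scale, 4 errors in 500 startups is a failure rate of 0.8%.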

jordens · 2018-03-20T19:10:38Z

I'd just do the memtests in a loop. That's much faster and it isolates problems. If those work, then there is little left to check (other than the eye location algorithm but i don't see how that could be improved) on that board.

hartytp · 2018-03-20T21:15:34Z

Great work all! That eye scan looks really good. Given the build-build variations we've seen, it's hard to be 100% sure that this is fixed properly, but that does look extremely encouraging. Will be interested to hear from the other people with Sayma about how this looks on their boards.

I'd be really interested to know which of the recent gateware changes did the trick/what the problem actually was.

cjbe · 2018-03-20T21:17:34Z

@hartytp if I understand correctly, with the gateware you were using @enjoy-digital sees ~1% failures - IIRC you were seeing a rate much higher than this (i.e. >90%). Could this difference be a power supply issue at your end?

hartytp · 2018-03-20T21:27:05Z

@cjbe

Could this difference be a power supply issue at your end?

Not likely. I'm running my board from a good 5A linear PSU and Sayma only draws something like 1.7A IIRC. I'm using a short cable to connect the PSU to the board and measured the voltage on the PCB to be something like 11.9V (again, from memory, but that was one of the first things I checked when we had issues, having been stung by that before on a different design).

Also, my setup passed Florent's stress test fine and, again from memory, the overall current draw isn't that much higher at startup with the SAWG build. So, again, I don't suspect the PSU as the issue.

A larger difference between setups is potentially the temperature. My board runs pretty cold because I've a couple of beefy CPU fans blasting it. But, I don't expect that to actually make a difference.

if I understand correctly, with the gateware you were using @enjoy-digital sees ~1% failures - IIRC you were seeing a rate much higher than this (i.e. >90%).

hmmm. You're right, I was seeing pretty frequent mem test errors in all versions I built (between 100% and 25% dependent on the gateware IIRC).

In any case, I don't think comparing mem test results is as instructive as looking at eye scans. The eye scans that both I and @marmeladapk posted look horrible compared to the one that Florent posted from my board. If Florent can reproduce something that looks like the previous eye scans with the old gateware, then it's not so relevant what the mem test says.

whitequark · 2018-03-20T21:45:36Z

Not likely. I'm running my board from a good 5A linear PSU and Sayma only draws something like 1.7A IIRC.

I've calculated the maximum rated power draw based on the values in the schematics just today. AMC alone is 2.9A and AMC+RTM are 11.5A.

hartytp · 2018-03-20T21:50:18Z

I've calculated the maximum rated power draw based on the values in the schematics just today. AMC alone is 2.9A and AMC+RTM are 11.5A.

hmmm...that doesn't sound right and I've certainly never seen anything close to that. Is that maximum value when all the supplies are shorted? Or, a maximum when the FMCs and everything else is fully loaded?

IIRC @gkasprow posted the expected current draw in another issue and it was around the 2A mark (again, I checked this when picking the PSU to work with) which is consistent with what I've seen when running the boards.

hartytp · 2018-03-20T22:31:11Z

Thinking about this more, the board where @gkasprow and @marmeladapk carefully verified the PI also showed bad eye scans, which also suggests that this isn't related to the PSU I'm using (we looked into all that carefully before M-Labs started their thorough gateware investigation....)

whitequark · 2018-03-20T22:43:53Z

Is that maximum value when all the supplies are shorted? Or, a maximum when the FMCs and everything else is fully loaded?

If I'm reading it right, the latter. It's a very conservative number, of course, I calculated it while trying to figure out why a power connector burned up in the lab.

hartytp · 2018-03-20T22:46:13Z

ack.

Well, for a very conservative estimate, that's probably consistent with what I see then. There is a tonne of stuff we're not using (clock mezzanine, ADCs etc). Plus, I think that budgeted for a lot of power consumption in the AFEs, which I don't even have atm. Good reminder though to keep an eye on this once we start plugging more things into Sayma.

enjoy-digital · 2018-03-21T08:14:47Z

With @hartytp AMC (no RTM) and last gateware/bootloader i get:

32 bits ddram: no calibration/memtest error on 1000 consecutive startups.
64 bits ddram: no calibration/memtest error on 1000 consecutive startups.
I'm now going to test continuous memtests with the 64 bits ddram gateware.

sayma_hartytp_ddram_startup_test.zip

enjoy-digital · 2018-03-21T16:18:10Z

@hartytp: your board has been running memtest (64 bits ddram) continuously today without errors (>8h).
It would be good to have results from @marmeladapk and @jbqubit to know whether all boards are now working correctly (ideally from builds done on their side, to be sure results are the same between builds; if rebuilding takes too long, I can provide the bitstreams). Do you want me to send your board back tomorrow, or should I wait for results from the other boards?

hartytp · 2018-03-21T21:51:07Z

@enjoy-digital feel free to hang on to that board a bit longer if it helps - I have to work on other things for the next week or so.

I'd still be interested to know which of the gw changes fixed this issue.

marmeladapk · 2018-03-22T11:42:26Z

@jbqubit has (or will soon get) one of our Saymas which had worse results. Our second Sayma is in use by wizath at the moment (he's working on MMC).

jbqubit · 2018-03-23T03:15:55Z

Built from master with SAWG. Loaded onto board that I've had for many months -- not the one that @marmeladapk mentioned is in the mail.

|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017-2018 M-Labs Limited

Bootloader CRC passed
Gateware ident 4.0.dev+775.g06359076.dirty
Initializing SDRAM...
DQS initial delay: 93 taps
Write leveling scan:
Module 3:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000101101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 2:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 1:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000100011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 0:
00000000000000000000000000000000000000000000000000000000000000000000000000000000001010111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
DQS initial delay: 93 taps
Write leveling: 87 86 112 104 done
Read leveling scan:
Module 3:
00000000000000000000000000000000000000000000000000000000000000000000000000100111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 2:
00000000000000000000000000000000000000000000000000000000000010111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 1:
00000000000000000000000000000000000000100111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110010010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 0:
00000000000000000000000000000000101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Read leveling: 3:76-216 2:62-210 1:37-188 0:33-175 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000006s]  INFO(runtime): ARTIQ runtime starting...
[     0.003889s]  INFO(runtime): software version 4.0.dev+722.g2edf65f5
[     0.010157s]  INFO(runtime): gateware version 4.0.dev+775.g06359076.dirty
[     0.016970s]  INFO(runtime): log level set to INFO by default
[     0.022671s]  INFO(runtime): UART log level set to INFO by default
[     0.028809s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     0.345181s]  INFO(board_artiq::serwb): done.
[     0.348330s]  INFO(board_artiq::serwb): RTM gateware version 4.0.dev+775.g06359076.dirty
[     0.356300s]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     1.356006s]  INFO(runtime): continuing boot
[     1.359026s]  INFO(board_artiq::hmc830_7043::hmc7043): HMC7043 found
[     1.365294s]  INFO(board_artiq::hmc830_7043::hmc7043): HMC7043 configuration...
[     1.383756s]  INFO(board_artiq::ad9154): AD9154-0 found
[     1.387680s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     1.661008s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #0 failed (JESD ready timeout), retrying
[     1.679555s]  INFO(board_artiq::ad9154): AD9154-0 found
[     1.683471s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     1.957005s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #1 failed (JESD ready timeout), retrying
[     1.975549s]  INFO(board_artiq::ad9154): AD9154-0 found
[     1.979465s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     2.253006s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #2 failed (JESD ready timeout), retrying
[     2.271550s]  INFO(board_artiq::ad9154): AD9154-0 found
[     2.275466s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     2.549005s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #3 failed (JESD ready timeout), retrying
[     2.567549s]  INFO(board_artiq::ad9154): AD9154-0 found
[     2.571465s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     2.845006s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #4 failed (JESD ready timeout), retrying
[     2.863549s]  INFO(board_artiq::ad9154): AD9154-0 found
[     2.867465s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     3.141006s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #5 failed (JESD ready timeout), retrying
[     3.159550s]  INFO(board_artiq::ad9154): AD9154-0 found
[     3.163465s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     3.437006s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #6 failed (JESD ready timeout), retrying
[     3.455550s]  INFO(board_artiq::ad9154): AD9154-0 found
[     3.459466s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     3.733007s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #7 failed (JESD ready timeout), retrying

jbqubit · 2018-03-23T03:21:33Z

On a subsequent load... success.

|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017-2018 M-Labs Limited

Bootloader CRC passed
Gateware ident 4.0.dev+775.g06359076.dirty
Initializing SDRAM...
DQS initial delay: 93 taps
Write leveling scan:
Module 3:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101010110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 2:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 1:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000010111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111011010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 0:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
DQS initial delay: 93 taps
Write leveling: 84 87 112 107 done
Read leveling scan:
Module 3:
00000000000000000000000000000000000000000000000000000000000000000000000001101101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111010101000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 2:
00000000000000000000000000000000000000000000000000000000000011011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111010100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 1:
00000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 0:
00000000000000000000000000000000111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101010101000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Read leveling: 3:75-216 2:59-208 1:39-186 0:32-176 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000006s]  INFO(runtime): ARTIQ runtime starting...
[     0.003889s]  INFO(runtime): software version 4.0.dev+722.g2edf65f5
[     0.010157s]  INFO(runtime): gateware version 4.0.dev+775.g06359076.dirty
[     0.016970s]  INFO(runtime): log level set to INFO by default
[     0.022671s]  INFO(runtime): UART log level set to INFO by default
[     0.028809s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     0.187859s]  INFO(board_artiq::serwb): done.
[     0.191008s]  INFO(board_artiq::serwb): RTM gateware version 4.0.dev+775.g06359076.dirty
[     0.198978s]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     1.198006s]  INFO(runtime): continuing boot
[     1.201025s]  INFO(board_artiq::hmc830_7043::hmc7043): HMC7043 found
[     1.207293s]  INFO(board_artiq::hmc830_7043::hmc7043): HMC7043 configuration...
[     1.225755s]  INFO(board_artiq::ad9154): AD9154-0 found
[     1.229680s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     1.312620s]  INFO(board_artiq::ad9154): AD9154-0 PRBS test
[     2.327627s]  INFO(board_artiq::ad9154): AD9154-0 found
[     2.331544s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     2.419600s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #0 failed (bad SYNC), retrying
[     2.437273s]  INFO(board_artiq::ad9154): AD9154-0 found
[     2.441189s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     2.715006s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #1 failed (JESD ready timeout), retrying
[     2.733550s]  INFO(board_artiq::ad9154): AD9154-0 found
[     2.737466s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     2.830708s]  INFO(board_artiq::ad9154): AD9154-1 found
[     2.834630s]  INFO(board_artiq::ad9154): AD9154-1 configuration...
[     2.917573s]  INFO(board_artiq::ad9154): AD9154-1 PRBS test

enjoy-digital · 2018-03-23T09:46:41Z

@jbqubit: thanks for testing.

hartytp · 2018-03-26T15:26:32Z

Is there anything else to do here, or can we close this (we can re-open if the issue resurfaces)? AFAICT, the eye scans now look good on all boards that we have access to.

Did you plan to track down which commit/change fixed this problem for future reference? Or, is it not worth the hassle?

jbqubit · 2018-03-26T17:42:05Z

Adding a unit test based on the eye diagram seems like a good measure, especially in light of recent experience that modifying other parts of the UltraScale gateware (e.g. SAWG) can impact SDRAM timing.
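
As a rough illustration of what such a check could look like, here is a minimal Python sketch that parses the bootloader's leveling summary line and flags modules whose passing window is too narrow. It assumes each `module:start-end` pair gives a window in delay taps; the function names and the `min_width` threshold are hypothetical, and none of this is taken from the MiSoC or ARTIQ code.

```python
import re


def longest_ones_window(scan):
    """Return (start, length) of the longest run of '1' in a scan bitstring."""
    best_start, best_len = 0, 0
    for m in re.finditer(r"1+", scan):
        length = m.end() - m.start()
        if length > best_len:
            best_start, best_len = m.start(), length
    return best_start, best_len


def parse_leveling_summary(line):
    """Parse e.g. 'Read leveling: 3:75-216 ... done' into {module: (start, end)}.

    Assumption: each 'module:start-end' pair is a passing window in delay taps.
    """
    return {int(mod): (int(start), int(end))
            for mod, start, end in re.findall(r"(\d+):(\d+)-(\d+)", line)}


def narrow_modules(windows, min_width):
    """Return the sorted list of modules whose window is narrower than min_width."""
    return [mod for mod, (start, end) in sorted(windows.items())
            if end - start < min_width]


# Summary line from the passing boot log in this thread.
good = parse_leveling_summary(
    "Read leveling: 3:75-216 2:59-208 1:39-186 0:32-176 done")
# Summary line from the first failing boot log at the top of this issue.
bad = parse_leveling_summary(
    "Read delays: 7:00-116 6:00-138 5:27-172 4:38-54 3:57-76 "
    "2:67-83 1:98-116 0:109-125 done")

# min_width=100 is an arbitrary illustrative threshold, not a recommended value.
print(narrow_modules(good, min_width=100))
print(narrow_modules(bad, min_width=100))
```

With a threshold like this, the passing log reports no narrow modules, while the failing log flags modules 0 to 4, whose windows are under 20 taps wide. `longest_ones_window` could be applied in the same way to the per-module scan bitstrings if the summary line is not available.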

sbourdeauducq · 2018-03-27T06:37:15Z

> Seems like adding a unit test based on the eye-diagram would be a good measure.

The bootloader memory test is now extended, which should catch more subtle problems.

> Did you plan to track down which commit/change fixed this problem for future reference? Or, is it not worth the hassle?

Let's do this once we are absolutely certain that it works correctly on all boards with the current code. Otherwise, if we change the code there, we're adding more loose screws.

sbourdeauducq assigned enjoy-digital Jan 28, 2018

jbqubit mentioned this issue Jan 29, 2018

document vivado version #910

Closed

jbqubit mentioned this issue Jan 29, 2018

running ARTIQ Sayma hardware at WUT sinara-hw/sinara#468

Closed

jbqubit mentioned this issue Jan 29, 2018

Sayma v1.0 outstanding hardware bugs sinara-hw/sinara#494

Closed

13 tasks

jbqubit mentioned this issue Jan 31, 2018

Sayma v1.0 outstanding M-Labs items sinara-hw/sinara#501

Closed

6 tasks

sbourdeauducq added this to the 4.0 milestone Feb 3, 2018

sbourdeauducq mentioned this issue Feb 6, 2018

RGMII Ethernet + MiSoC core does not work on Sayma #854

Closed

sbourdeauducq closed this as completed in 2d4a134 Feb 7, 2018

sbourdeauducq mentioned this issue Feb 7, 2018

bitstream loading for Sayma RTM FPGA #813

Closed

sbourdeauducq closed this as completed Mar 27, 2018

hartytp mentioned this issue Apr 9, 2018

Sayma mem test failures with RTM connected #981

Closed
