Sayma MiSoC memory test failed #908

Closed
jbqubit opened this issue Jan 26, 2018 · 343 comments

@jbqubit
Contributor

jbqubit commented Jan 26, 2018

Building .bit from source using

commit 440e19b8f9c8ebfce80402a519796cee7fdd6b06

I see...

$ flterm /dev/ttyUSB2

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 32 59 47 69 59 69 40 48 done
Read delays: 7:00-116 6:00-138 5:27-172 4:38-54 3:57-76 2:67-83 1:98-116 0:109-125 done
SDRAM initialized
Memory test failed (384482/1114624 words incorrect)
Halting.

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 34 56 45 65 55 67 40 47 done
Read delays: 7:00-112 6:01-127 5:15-33 4:26-44 3:65-84 2:78-94 1:90-123 0:103-123 done
SDRAM initialized
Memory test failed (412138/1114624 words incorrect)
Halting.

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 33 55 45 64 55 72 40 47 done
Read delays: 7:00-115 6:00-136 5:10-26 4:36-53 3:61-79 2:76-97 1:88-108 0:96-113 done
SDRAM initialized
Memory test failed (499333/1114624 words incorrect)
Halting.

|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 34 57 48 69 59 73 41 50 done
Read delays: 7:00-117 6:00-134 5:18-40 4:31-47 3:70-86 2:76-96 1:103-120 0:93-109 done
SDRAM initialized
Memory test failed (473026/1114624 words incorrect)
Halting.
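
For scale, the failure counts in the logs above can be turned into fractions. The following is a minimal sketch that only assumes the log format shown in this comment; the helper name `failure_fractions` is made up for illustration and is not part of MiSoC or ARTIQ.

```python
import re

# Matches the bootloader's failure line, e.g.
# "Memory test failed (384482/1114624 words incorrect)"
FAIL_RE = re.compile(r"Memory test failed \((\d+)/(\d+) words incorrect\)")

def failure_fractions(log_text):
    """Return the fraction of incorrect words for each failed memory test."""
    return [int(bad) / int(total) for bad, total in FAIL_RE.findall(log_text)]

# The four failure lines reported in this comment.
log = """
Memory test failed (384482/1114624 words incorrect)
Memory test failed (412138/1114624 words incorrect)
Memory test failed (499333/1114624 words incorrect)
Memory test failed (473026/1114624 words incorrect)
"""

fractions = failure_fractions(log)
for frac in fractions:
    print(f"{frac:.1%} of words incorrect")
```

In all four boots, between roughly a third and a half of the tested words read back incorrectly, which points to a gross calibration problem rather than occasional bit errors.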

@whitequark
Contributor

Possibly caused by 7429ee4?

@cjbe
Contributor

cjbe commented Jan 26, 2018

The typical valid read region seems to be ~170 LSB on my board, so I don't think that commit (increasing the initial step from 8 LSB to 16 LSB) caused this.

This also looks different from the problem that commit solved for me, where the size of the read window was always the size of the initial step. Here the gaps vary from 16 to 20.
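
The window sizes mentioned here can be read straight off a `Read delays:` line. Below is a small sketch that assumes each `module:start-end` pair is the passing delay range for that module, which is how the numbers are being discussed in this thread; `read_windows` is a hypothetical helper, not a MiSoC function.

```python
import re

def read_windows(line):
    """Parse a 'Read delays:' line into {module: (start, end, width)}."""
    windows = {}
    for module, start, end in re.findall(r"(\d+):(\d+)-(\d+)", line):
        s, e = int(start), int(end)
        windows[int(module)] = (s, e, e - s)
    return windows

# First "Read delays" line from the original report.
line = ("Read delays: 7:00-116 6:00-138 5:27-172 4:38-54 "
        "3:57-76 2:67-83 1:98-116 0:109-125 done")

windows = read_windows(line)
for module, (s, e, width) in sorted(windows.items()):
    print(f"module {module}: {s}-{e} (width {width})")
```

For this line, modules 0 to 4 come out with widths of 16 to 19, while modules 5 to 7 report windows wider than 100, so the narrow windows are the ones that look suspicious next to a typical valid region of about 170.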

@sbourdeauducq
Member

sbourdeauducq commented Jan 27, 2018

What Vivado version?
We use 2017.4.

@jbqubit
Contributor Author

jbqubit commented Jan 29, 2018

I'm using 2016.2. Will upgrade and try again.

@marmeladapk
Contributor

I'm using 2017.4 and I also got this issue, though with a build from 25.01. I'm currently building against 0edc34a and will update when it finishes.

@marmeladapk
Contributor

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 58 89 78 88 56 85 47 48 done
Read delays: 7:00-19 6:07-23 5:53-74 4:60-76 3:111-132 2:117-133 1:125-141 0:133-151 done
SDRAM initialized
Memory test failed (522120/1114624 words incorrect)
Halting.

@sbourdeauducq
Member

sbourdeauducq commented Jan 30, 2018

Here is everything I built from the current master (with RTM bridge, RTIO and other things disabled to save compilation and RTM yak-shaving time):
http://dl.free.fr/lAFdh3oQV
With those binaries, I verified that SDRAM works fine on both Florent's board and Sayma-1.
Can you try those binaries on your boards?
@marmeladapk You can use the Ethernet TX clock phase adjustment script I posted in the RGMII issue on those binaries.
@marmeladapk If the problem persists, can you use Sayma-2 that I shipped to you to debug Ethernet, since I didn't have SDRAM problems on that one?

@enjoy-digital
Contributor

Thanks @sbourdeauducq. I'll look at that. The read leveling procedure is probably still not robust enough.

@marmeladapk
Contributor

@sbourdeauducq I loaded it to check whether the memory test passes:

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 95 130 118 133 96 125 87 88 done
Read delays: 7:37-249 6:61-267 5:110-316 4:131-336 3:167-369 2:173-377 1:195-40e
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000005s]  INFO(runtime): ARTIQ runtime starting...
[     0.003864s]  INFO(runtime): software version 4.0.dev+516.g0edc34a9
[     0.010126s]  INFO(runtime): gateware version 4.0.dev+516.g0edc34a9.dirty
[     0.016910s]  INFO(runtime): log level set to INFO by default
[     0.022630s]  INFO(runtime): UART log level set to INFO by default
[     0.028790s]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     1.028006s]  INFO(runtime): continuing boot
[     1.030975s]  WARN(runtime): using default MAC address 02-00-00-00-76-01; ct
[     1.039568s]  INFO(runtime): using default IP address 192.168.1.60
[     1.054501s]  INFO(runtime::session): accepting network sessions
[     1.059438s]  INFO(runtime::session): running startup kernel
[     1.064959s]  INFO(runtime::session): no startup kernel found
[     1.070665s]  INFO(runtime::session): no connection, starting idle kernel
[     1.077527s]  INFO(runtime::session): no idle kernel found
[     1.084122s]  INFO(runtime::mgmt): management interface active
[     6.274350s]  WARN(runtime): ethernet mac: rx preamble errors: 2
[     7.357698s]  WARN(runtime): ethernet mac: rx preamble errors: 3
[    19.820658s]  WARN(runtime): ethernet mac: rx preamble errors: 4
[    20.128752s]  WARN(runtime): ethernet mac: rx preamble errors: 5
[    20.888642s]  WARN(runtime): ethernet mac: rx preamble errors: 6

So it works. I'll try the script you mentioned later.

@sbourdeauducq
Member

sbourdeauducq commented Jan 30, 2018

Good. You do however seem to get a large number of Ethernet RX corrupted packets (preamble errors). Is the PHY correctly set in RGMII mode? Does this happen for every packet? You can change the RX phase as well by using this script command instead: set_property CLKOUT0_PHASE <phase> [get_cells crg_ethrx_mmcm]

@marmeladapk
Contributor

@sbourdeauducq Should I change it in xdc in artiq/artiq_sayma/gateware/top.xdc and rebuild? Will latest artiq pass memory tests?

@sbourdeauducq
Member

No. Please follow the instructions in my comment: #854 (comment) - you just save the script as edit_pll.tcl and run the mentioned vivado command.
There is no bitstream rebuilding and it is a rather quick process. Nothing in the design other than the PLL phase will be changed, the routes etc. will be exactly as before, so yes memory test should be unaffected.

@sbourdeauducq
Member

I also see the problem with the default build (including SAWG) on ARTIQ 4c22d64, migen e554f072905ceeb27c9c179c8c7b785acd1676bc, misoc cb8e314c7515eade46f5bcde4e48903d7ec92490

Initializing SDRAM...
Write leveling: 43 66 49 68 35 56 34 25 done
Read delays: 7:00-121 6:00-141 5:39-55 4:44-60 3:67-85 2:76-92 1:105-121 0:113-129 done
SDRAM initialized
Memory test failed (356593/1114624 words incorrect)

When disabling the SAWG (--without-sawg), the system boots correctly.
@enjoy-digital Can you move forward with JESD SC1 by disabling SAWG (which you want to do anyway to reduce compilation time)? I cannot reproduce the "no output on UART" bug.

sbourdeauducq added this to the 4.0 milestone Feb 3, 2018

@enjoy-digital
Contributor

@sbourdeauducq: Yes, I'll continue on Monday.

@sbourdeauducq
Member

sbourdeauducq commented Feb 6, 2018

@hartytp Are you looking into this?
This definitely worked when I did the SAWG test and posted the scope screenshot. So it should be possible to isolate what code change exactly caused this problem, maybe with the help of tools like git-bisect.
But I suspect this is due to the non-determinism of Vivado compilation, or to plain Vivado bugs. In the first case, this is normally solvable by adding appropriate timing constraints. In the second case, considering how Xilinx technical support has been degrading over the past years, the first option is basically to apply somewhat random non-functional changes to the code, as Xilinx engineers certainly do, and hope that ça tombe en marche (that it stumbles into working), or try various Vivado synthesis options. (Xilinx's answer to the bug invasion is pretty much the usual)
@whitequark's addition of RTM loading gateware is a good suspect for the triggering of this kind of Vivado misbehavior.
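
As a rough illustration of what the bisection suggested above involves, here is a minimal sketch of the binary search that git-bisect automates. The commit names and the `build_and_memtest` function are placeholders, not real ARTIQ commits or tooling, and the search relies on the assumption that every commit after the first bad one also fails. If the failure really comes from build-to-build non-determinism in Vivado, that assumption does not hold and the result can be misleading.

```python
def first_bad(commits, is_bad):
    """Binary-search an ordered commit list for the first failing commit.

    Assumes commits[0] is good, commits[-1] is bad, and that once a commit
    is bad every later one is bad too (the same assumption git-bisect makes).
    """
    lo, hi = 0, len(commits) - 1   # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]

# Hypothetical history: placeholder names, not real ARTIQ commits.
history = [f"c{i}" for i in range(16)]
broken_since = 11
steps = []

def build_and_memtest(commit):
    steps.append(commit)            # each call stands in for a multi-hour build
    return int(commit[1:]) >= broken_since

culprit = first_bad(history, build_and_memtest)
print(culprit, "found after", len(steps), "builds")
```

With 16 candidate commits the search needs 4 builds, so the cost grows with the logarithm of the history length; at a few hours per build that is still a significant time investment.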

@hartytp
Collaborator

hartytp commented Feb 6, 2018

> @hartytp Are you looking into this?

I wasn't planning to, no. In general, I'm trying to prioritize things like the HMC830 on Sayma, which seem to be (at least in part) hardware issues. In contrast, the mem test thing is just firmware/gateware, isn't it? As such, it seemed like the standard yak shaving required to get a new board up and running, and not something particular to Sayma. So, I figured that you guys were probably best placed to look into it.

I have a busy week lined up this week, but I might have some time to look into it.

Side note: we've had Sayma for quite a while now, but the ARTIQ tool chain still feels quite hacked and fragile. It would be great to get to the point where Artiq flash can do the RTM as well, the package includes the correct version of JESD204B, etc.

@hartytp
Collaborator

hartytp commented Feb 6, 2018

Anyway to be clear, in case I do find time to look into this, your plan is basically to dig through the git history, building various versions of Sayma gateware/firmware with SAWG (at a few hours per build) until we find the point where it stopped working? IIRC, that's a bit complicated by the fact that the tools to build Sayma have changed a bit over time, so it's not always the same instructions to build/flash it, and by the fact that the package doesn't include the right version of JESD204B (also misoc/migen?), so one needs to track the history of several projects to make sure that each build uses the correct version of each. Doesn't sound like fun.

@sbourdeauducq
Member

sbourdeauducq commented Feb 6, 2018

> Doesn't sound like fun.

Yep, standard fare.
Anyway, the first thing I'd try is removing the RTM loading gateware.
Another thing that makes the SDRAM work is removing a lot of peripherals using the patch I posted elsewhere, so there would not be such versioning issues. Just the long Vivado compilation times.

@hartytp
Collaborator

hartytp commented Feb 6, 2018

Well, as I said, this seems like standard yak shaving for getting a board up and running, rather than a particular hardware/design issue with Sayma. So, do you mind taking a look at it first? It's likely to be quicker for you since you've probably kept a closer eye on the changes that have been made to ARTIQ over the past weeks.

@sbourdeauducq
Member

That's what I thought - the patch below works around the problem.

diff --git a/artiq/gateware/targets/sayma_amc.py b/artiq/gateware/targets/sayma_amc.py
index c45f8d37a..f6c5b95f6 100755
--- a/artiq/gateware/targets/sayma_amc.py
+++ b/artiq/gateware/targets/sayma_amc.py
@@ -160,9 +160,9 @@ class Standalone(MiniSoC, AMPSoC):
         ]
 
         # RTM bitstream upload
-        rtm_fpga_cfg = platform.request("rtm_fpga_cfg")
-        self.submodules.rtm_fpga_cfg = SlaveFPGA(rtm_fpga_cfg)
-        self.csr_devices.append("rtm_fpga_cfg")
+        #rtm_fpga_cfg = platform.request("rtm_fpga_cfg")
+        #self.submodules.rtm_fpga_cfg = SlaveFPGA(rtm_fpga_cfg)
+        #self.csr_devices.append("rtm_fpga_cfg")
 
         # AMC/RTM serwb
         serwb_pll = serwb.phy.SERWBPLL(125e6, 625e6, vco_div=2)

@whitequark What about using GPIO and bit-banging instead? Hopefully the Vivado trash will behave then.

@marmeladapk
Contributor

@sbourdeauducq With the latest commit (2d4a134), when I build with python3 -m artiq.gateware.targets.sayma_amc --without-sawg I still get a memory test failure.

@hartytp
Collaborator

hartytp commented Mar 12, 2018

My best guess here is that there is something wrong like a missing/incorrect timing constraint, an output drive not being set optimally, etc. AFAICT, Vivado is struggling to meet timing with the SAWG in place, so I wouldn't be surprised if it's doing something funny here, so it's important to have everything optimally set up and fully constrained. Also, if there is anything we can do to make timing closure easier to achieve (e.g. elsewhere in the design), that might well help...

Could one of you (maybe @jordens for a fresh pair of eyes if he can find the time) double check that you can't see anything else suspicious in the way this is all set up?

@hartytp
Collaborator

hartytp commented Mar 12, 2018

As always, if you can think of anything you want me to try/look at then let me know.

@sbourdeauducq
Member

Does it pass on your board (and everything looks good) when SAWG is disabled?

@hartytp
Collaborator

hartytp commented Mar 12, 2018

Did last time I checked. Can rebuild and recheck if that helps.

@hartytp
Collaborator

hartytp commented Mar 12, 2018

@sbourdeauducq Without SAWG https://hastebin.com/qagicibita.sql

No mem test errors.

No errors in the middle of the working region AFAICS.

@hartytp
Collaborator

hartytp commented Mar 12, 2018

Looking again at the "bad" eyes, with errors in the working region, AFAICS, the problems are all with the read leveling rather than the write leveling. I wonder if there is something wrong with the IDELAY code. Could we fuzz this by looping an OSERDES to an ISERDES and sweeping the IDELAY for different ODELAYs?

@jordens
Member

jordens commented Mar 19, 2018

Could everyone who currently has access to a board with this issue try the current artiq/misoc/migen with sawg (and otherwise presumed and/or known worst case conditions) and report the console messages? @jbqubit @enjoy-digital @marmeladapk

@jordens
Member

jordens commented Mar 20, 2018

From @enjoy-digital with @hartytp's board and SAWG. That's just fine.

Gateware ident 4.0.dev+742.g276b0c7f
Initializing SDRAM...
DQS initial delay: 95 taps
Write leveling scan:
Module 3:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 2:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111011110110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 1:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 0:
000000000000000000000000000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
DQS initial delay: 95 taps
Write leveling: 78 88 102 104 done
Read leveling scan:
Module 3:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000011011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 2:
00000000000000000000000000000000000000000000000000000000000000001011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 1:
00000000000000000000000000000000000000000000000001011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110111100100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 0:
00000000000000000000000000000000000000000000000010111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111011100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Read leveling: 3: 160 (146 wide), 2: 150 (162 wide), 1: 132 (157 wide), 0: 126 (151 wide), done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000007s]  INFO(runtime): ARTIQ runtime starting...
[     0.003890s]  INFO(runtime): software version 4.0.dev+742.g276b0c7f
[     0.010157s]  INFO(runtime): gateware version 4.0.dev+742.g276b0c7f
[     0.016425s]  INFO(runtime): log level set to INFO by default
[     0.022137s]  INFO(runtime): UART log level set to INFO by default
[     0.028289s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
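
One way to read the scan lines above is to look for the widest contiguous run of `1` (passing) positions and take its midpoint. The sketch below does that under two assumptions that are not confirmed in this thread: that each character corresponds to one delay step, and that the reported `center (N wide)` values are derived in a comparable way. The function name and the sample string are made up for illustration.

```python
def widest_eye(scan):
    """Return (start, width, center) of the longest run of '1' in a scan string."""
    best_start, best_width = 0, 0
    run_start = None
    for i, bit in enumerate(scan + "0"):      # sentinel closes a trailing run
        if bit == "1":
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_width:
                best_start, best_width = run_start, i - run_start
            run_start = None
    return best_start, best_width, best_start + best_width // 2

# Shortened, made-up scan with glitchy edges like the ones in the log above.
scan = "0000" + "1101" + "1" * 20 + "0010" + "0000"
eye = widest_eye(scan)
print(eye)  # (7, 21, 17)
```

Taking only the longest run means isolated passing or failing positions at the edges of the eye, which are visible in several of the scans, do not shift the chosen delay much.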

@enjoy-digital
Contributor

@jordens: I'm doing more tests with @hartytp's board:

  • Flashed the board with the gateware/bootloader that was failing for @hartytp. I'm able to reproduce memtest errors (4 errors in 500 startups).
  • Flashed the board with the latest gateware/bootloader (32-bit DRAM) and am running startup tests (no memtest errors in 600 startups so far; I'll stop at 1000 and will have the log if you are interested).

I also regenerated the design with SAWG and the 64-bit DRAM; I'm going to test it when the startup test with the 32-bit DRAM is done.

@jordens
Member

jordens commented Mar 20, 2018

I'd just do the memtests in a loop. That's much faster and it isolates problems. If those work, then there is little left to check on that board (other than the eye location algorithm, but I don't see how that could be improved).
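
To make the idea of looping the memory test concrete, here is a generic sketch of a write-then-read-back test run repeatedly. It is not the MiSoC bootloader's test, whose implementation is not shown in this thread; `memtest` and `FlakyMemory` are hypothetical stand-ins that only illustrate how a "words incorrect" count arises and how repeated runs expose intermittent errors.

```python
import random

def memtest(memory, seed):
    """Write a seeded pseudo-random pattern, read it back, count bad words."""
    rng = random.Random(seed)
    pattern = [rng.getrandbits(32) for _ in range(len(memory))]
    for addr, word in enumerate(pattern):
        memory[addr] = word
    return sum(1 for addr, word in enumerate(pattern) if memory[addr] != word)

class FlakyMemory(list):
    """Stand-in for SDRAM that flips one bit on some reads (hypothetical)."""
    def __init__(self, size, error_rate, seed=0):
        super().__init__([0] * size)
        self._rng = random.Random(seed)
        self._error_rate = error_rate

    def __getitem__(self, addr):
        word = super().__getitem__(addr)
        if self._rng.random() < self._error_rate:
            word ^= 1               # corrupt the least significant bit
        return word

# Healthy memory: repeated runs never report an incorrect word.
good = [0] * 1024
failed_runs = sum(1 for run in range(100) if memtest(good, seed=run) > 0)
print("failed runs on healthy memory:", failed_runs)

# Memory that corrupts every read: every word is reported incorrect.
print("bad words:", memtest(FlakyMemory(64, error_rate=1.0), seed=1))
```

Running only the test in a loop skips the reboot and recalibration between attempts, so it exercises the data path at one fixed calibration result, while the startup tests also exercise the leveling itself.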

@hartytp
Collaborator

hartytp commented Mar 20, 2018

Great work all! That eye scan looks really good. Given the build-build variations we've seen, it's hard to be 100% sure that this is fixed properly, but that does look extremely encouraging. Will be interested to hear from the other people with Sayma about how this looks on their boards.

I'd be really interested to know which of the recent gateware changes did the trick/what the problem actually was.

@cjbe
Contributor

cjbe commented Mar 20, 2018

@hartytp if I understand correctly, with the gateware you were using @enjoy-digital sees ~1% failures - IIRC you were seeing a rate much higher than this (i.e. >90%). Could this difference be a power supply issue at your end?

@hartytp
Collaborator

hartytp commented Mar 20, 2018

@cjbe

> Could this difference be a power supply issue at your end?

Not likely. I'm running my board from a good 5A linear PSU and Sayma only draws something like 1.7A IIRC. I'm using a short cable to connect the PSU to the board and measured the voltage on the PCB to be something like 11.9V (again, from memory, but that was one of the first things I checked when we had issues, having been stung by that before on a different design).

Also, my setup passed Florent's stress test fine and, again from memory, the overall current draw isn't that much higher at startup with the SAWG build. So, again, I don't suspect the PSU as the issue.

A larger difference between setups is potentially the temperature. My board runs pretty cold because I've a couple of beefy CPU fans blasting it. But, I don't expect that to actually make a difference.

> if I understand correctly, with the gateware you were using @enjoy-digital sees ~1% failures - IIRC you were seeing a rate much higher than this (i.e. >90%).

hmmm. You're right, I was seeing pretty frequent mem test errors in all versions I built (between 100% and 25% dependent on the gateware IIRC).

In any case, I don't think comparing mem test results is as instructive as looking at eye scans. The eye scans that both I and @marmeladapk posted look horrible compared to the one that Florent posted from my board. If Florent can reproduce something that looks like the previous eye scans with the old gateware, then it's not so relevant what the mem test says.

@whitequark
Contributor

> Not likely. I'm running my board from a good 5A linear PSU and Sayma only draws something like 1.7A IIRC.

I've calculated the maximum rated power draw based on the values in the schematics just today. AMC alone is 2.9A and AMC+RTM are 11.5A.

@hartytp
Collaborator

hartytp commented Mar 20, 2018

> I've calculated the maximum rated power draw based on the values in the schematics just today. AMC alone is 2.9A and AMC+RTM are 11.5A.

hmmm...that doesn't sound right and I've certainly never seen anything close to that. Is that maximum value when all the supplies are shorted? Or, a maximum when the FMCs and everything else is fully loaded?

IIRC @gkasprow posted the expected current draw in another issue and it was around the 2A mark (again, I checked this when picking the PSU to work with) which is consistent with what I've seen when running the boards.

@hartytp
Collaborator

hartytp commented Mar 20, 2018

Thinking about this more, the board where @gkasprow and @marmeladapk carefully verified the PI also showed bad eye scans, which also suggests that this isn't related to the PSU I'm using (we looked into all that carefully before M-Labs started their thorough gateware investigation....)

@whitequark
Contributor

> Is that maximum value when all the supplies are shorted? Or, a maximum when the FMCs and everything else is fully loaded?

If I'm reading it right, the latter. It's a very conservative number, of course, I calculated it while trying to figure out why a power connector burned up in the lab.

@hartytp
Collaborator

hartytp commented Mar 20, 2018

ack.

Well, for a very conservative estimate, that's probably consistent with what I see then. There is a tonne of stuff we're not using (clock mezzanine, ADCs etc). Plus, I think that budgeted for a lot of power consumption in the AFEs, which I don't even have atm. Good reminder though to keep an eye on this once we start plugging more things into Sayma.

@enjoy-digital
Contributor

With @hartytp's AMC (no RTM) and the latest gateware/bootloader I get:

  • 32-bit ddram: no calibration/memtest errors in 1000 consecutive startups.
  • 64-bit ddram: no calibration/memtest errors in 1000 consecutive startups.

I'm now going to test continuous memtests with the 64-bit ddram gateware.

sayma_hartytp_ddram_startup_test.zip
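
For context on how much 1000 clean startups can tell us, the sketch below computes the largest per-startup failure probability that is still compatible with seeing zero failures, assuming startups are independent trials (an assumption, since board temperature or supply conditions could correlate them). It also prints the failure rate seen with the older gateware (4 errors in 500 startups) for comparison. The function name is made up for illustration.

```python
def zero_failure_upper_bound(n_trials, confidence=0.95):
    """Largest per-trial failure probability still consistent, at the given
    confidence, with observing zero failures in n_trials independent trials.

    Solves (1 - p) ** n_trials = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)

bound = zero_failure_upper_bound(1000)
old_rate = 4 / 500

print(f"upper bound after 1000 clean startups: {bound:.2%}")
print(f"observed rate with the old gateware:   {old_rate:.1%}")
```

Under that assumption, 1000 clean startups bound the failure probability at about 0.3% with 95% confidence, which is below the 0.8% observed with the older gateware but does not by itself rule out a rare residual failure.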

@enjoy-digital
Contributor

@hartytp: your board has been running memtest (64-bit ddram) continuously today without errors (>8h).
It would be good to have results from @marmeladapk and @jbqubit to know whether all boards are now working correctly (ideally building things on their side too, to be sure results are the same between builds; if rebuilding takes too long, I can provide the bitstreams). Do you want me to send your board back tomorrow, or should I wait for results from the other boards?

@hartytp
Collaborator

hartytp commented Mar 21, 2018

@enjoy-digital feel free to hang on to that board a bit longer if it helps - I have to work on other things for the next week or so.

I'd still be interested to know which of the gateware changes fixed this issue.

@marmeladapk
Contributor

@jbqubit has (or will soon get) one of our Saymas which had worse results. Our second Sayma is in use by wizath at the moment (he's working on MMC).

@jbqubit
Contributor Author

jbqubit commented Mar 23, 2018

Built from master with SAWG. Loaded onto board that I've had for many months -- not the one that @marmeladapk mentioned is in the mail.

|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017-2018 M-Labs Limited

Bootloader CRC passed
Gateware ident 4.0.dev+775.g06359076.dirty
Initializing SDRAM...
DQS initial delay: 93 taps
Write leveling scan:
Module 3:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000101101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 2:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 1:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000100011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 0:
00000000000000000000000000000000000000000000000000000000000000000000000000000000001010111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
DQS initial delay: 93 taps
Write leveling: 87 86 112 104 done
Read leveling scan:
Module 3:
00000000000000000000000000000000000000000000000000000000000000000000000000100111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 2:
00000000000000000000000000000000000000000000000000000000000010111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 1:
00000000000000000000000000000000000000100111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110010010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 0:
00000000000000000000000000000000101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Read leveling: 3:76-216 2:62-210 1:37-188 0:33-175 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000006s]  INFO(runtime): ARTIQ runtime starting...
[     0.003889s]  INFO(runtime): software version 4.0.dev+722.g2edf65f5
[     0.010157s]  INFO(runtime): gateware version 4.0.dev+775.g06359076.dirty
[     0.016970s]  INFO(runtime): log level set to INFO by default
[     0.022671s]  INFO(runtime): UART log level set to INFO by default
[     0.028809s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     0.345181s]  INFO(board_artiq::serwb): done.
[     0.348330s]  INFO(board_artiq::serwb): RTM gateware version 4.0.dev+775.g06359076.dirty
[     0.356300s]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     1.356006s]  INFO(runtime): continuing boot
[     1.359026s]  INFO(board_artiq::hmc830_7043::hmc7043): HMC7043 found
[     1.365294s]  INFO(board_artiq::hmc830_7043::hmc7043): HMC7043 configuration...
[     1.383756s]  INFO(board_artiq::ad9154): AD9154-0 found
[     1.387680s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     1.661008s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #0 failed (JESD ready timeout), retrying
[     1.679555s]  INFO(board_artiq::ad9154): AD9154-0 found
[     1.683471s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     1.957005s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #1 failed (JESD ready timeout), retrying
[     1.975549s]  INFO(board_artiq::ad9154): AD9154-0 found
[     1.979465s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     2.253006s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #2 failed (JESD ready timeout), retrying
[     2.271550s]  INFO(board_artiq::ad9154): AD9154-0 found
[     2.275466s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     2.549005s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #3 failed (JESD ready timeout), retrying
[     2.567549s]  INFO(board_artiq::ad9154): AD9154-0 found
[     2.571465s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     2.845006s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #4 failed (JESD ready timeout), retrying
[     2.863549s]  INFO(board_artiq::ad9154): AD9154-0 found
[     2.867465s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     3.141006s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #5 failed (JESD ready timeout), retrying
[     3.159550s]  INFO(board_artiq::ad9154): AD9154-0 found
[     3.163465s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     3.437006s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #6 failed (JESD ready timeout), retrying
[     3.455550s]  INFO(board_artiq::ad9154): AD9154-0 found
[     3.459466s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     3.733007s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #7 failed (JESD ready timeout), retrying

@jbqubit
Contributor Author

jbqubit commented Mar 23, 2018

On a subsequent load: success.

|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017-2018 M-Labs Limited

Bootloader CRC passed
Gateware ident 4.0.dev+775.g06359076.dirty
Initializing SDRAM...
DQS initial delay: 93 taps
Write leveling scan:
Module 3:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101010110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 2:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 1:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000010111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111011010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 0:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
DQS initial delay: 93 taps
Write leveling: 84 87 112 107 done
Read leveling scan:
Module 3:
00000000000000000000000000000000000000000000000000000000000000000000000001101101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111010101000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 2:
00000000000000000000000000000000000000000000000000000000000011011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111010100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 1:
00000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Module 0:
00000000000000000000000000000000111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101010101000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Read leveling: 3:75-216 2:59-208 1:39-186 0:32-176 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000006s]  INFO(runtime): ARTIQ runtime starting...
[     0.003889s]  INFO(runtime): software version 4.0.dev+722.g2edf65f5
[     0.010157s]  INFO(runtime): gateware version 4.0.dev+775.g06359076.dirty
[     0.016970s]  INFO(runtime): log level set to INFO by default
[     0.022671s]  INFO(runtime): UART log level set to INFO by default
[     0.028809s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     0.187859s]  INFO(board_artiq::serwb): done.
[     0.191008s]  INFO(board_artiq::serwb): RTM gateware version 4.0.dev+775.g06359076.dirty
[     0.198978s]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     1.198006s]  INFO(runtime): continuing boot
[     1.201025s]  INFO(board_artiq::hmc830_7043::hmc7043): HMC7043 found
[     1.207293s]  INFO(board_artiq::hmc830_7043::hmc7043): HMC7043 configuration...
[     1.225755s]  INFO(board_artiq::ad9154): AD9154-0 found
[     1.229680s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     1.312620s]  INFO(board_artiq::ad9154): AD9154-0 PRBS test
[     2.327627s]  INFO(board_artiq::ad9154): AD9154-0 found
[     2.331544s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     2.419600s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #0 failed (bad SYNC), retrying
[     2.437273s]  INFO(board_artiq::ad9154): AD9154-0 found
[     2.441189s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     2.715006s]  WARN(board_artiq::ad9154): AD9154-0 config attempt #1 failed (JESD ready timeout), retrying
[     2.733550s]  INFO(board_artiq::ad9154): AD9154-0 found
[     2.737466s]  INFO(board_artiq::ad9154): AD9154-0 configuration...
[     2.830708s]  INFO(board_artiq::ad9154): AD9154-1 found
[     2.834630s]  INFO(board_artiq::ad9154): AD9154-1 configuration...
[     2.917573s]  INFO(board_artiq::ad9154): AD9154-1 PRBS test

@enjoy-digital
Contributor

@jbqubit: thanks for testing.

@hartytp
Collaborator

hartytp commented Mar 26, 2018

Is there anything else to do here, or can we close this (we can re-open if the issue resurfaces)? AFAICT, the eye scans now look good on all boards that we have access to.

Did you plan to track down which commit/change fixed this problem for future reference? Or, is it not worth the hassle?

@jbqubit
Contributor Author

jbqubit commented Mar 26, 2018

Seems like adding a unit test based on the eye-diagram would be a good measure, especially in light of recent experience that modifying other parts of the UltraScale gateware (e.g. SAWG) can impact SDRAM timing.
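
A rough sketch of what such a check could look like, assuming the per-module scan line printed by the bootloader is captured as a string of `0`/`1` characters (one per delay tap). The helper names and the `min_taps` threshold are hypothetical, and taking the longest run of passing taps is a simplification that may not match how the bootloader itself picks the window it reports.

```python
from typing import Optional, Tuple


def widest_window(scan: str) -> Optional[Tuple[int, int]]:
    """Return (first, last) tap indices, inclusive, of the longest run of '1's.

    Any character other than '1' is treated as a failing tap.
    Returns None if no tap passed at all.
    """
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    # A trailing "0" acts as a sentinel so a run reaching the end is closed.
    for i, ch in enumerate(scan + "0"):
        if ch == "1":
            if start is None:
                start = i
        elif start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best


def eye_is_wide_enough(scan: str, min_taps: int) -> bool:
    """True if the widest passing window spans at least min_taps taps.

    min_taps is a hypothetical threshold; a real value would have to be
    chosen from measurements on known-good boards.
    """
    window = widest_window(scan)
    return window is not None and (window[1] - window[0] + 1) >= min_taps
```

A test harness could feed each captured `Module N:` line through `eye_is_wide_enough` and fail the build when any module's window shrinks below the chosen threshold.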

@sbourdeauducq
Member

Seems like adding a unit test based on the eye-diagram would be a good measure.

The bootloader memory test is now extended, which should catch more subtle problems.

Did you plan to track down which commit/change fixed this problem for future reference? Or, is it not worth the hassle?

Let's do this once we are absolutely certain that it works correctly on all boards with the current code. Otherwise, if we change the code there, we're adding more loose screws.
