serwb intermittently fails to initialize #856

Closed
sbourdeauducq opened this issue Nov 27, 2017 · 46 comments

@sbourdeauducq
Member

This occurs on the Sayma1 board we have on the HK server.

@sbourdeauducq
Member Author

sbourdeauducq commented Dec 13, 2017

Sometimes the initialization fails in a loop and this is resolved by reloading the RTM FPGA:

[     7.715943s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     7.890916s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     8.065888s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     8.240862s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     8.415835s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     8.590808s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     8.765780s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     8.940753s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     9.115726s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     9.290699s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     9.465672s]  WARN(board::serwb): AMC/RTM serwb bridge initialization failed, retrying.

...RTM FPGA reloaded by JTAG...

[     9.616264s]  INFO(board::serwb): done.
[     9.618792s]  INFO(runtime): press 'e' to erase startup and idle kernels...
[    10.618006s]  INFO(runtime): continuing boot
[    10.620961s]  INFO(board::hmc830_7043::hmc830): HMC830 found
[    10.626594s]  INFO(board::hmc830_7043::hmc830): HMC830 configuration...
[    10.633283s]  INFO(board::hmc830_7043::hmc830): waiting for lock...

@enjoy-digital
Contributor

Thanks. I'll look at that today.

@sbourdeauducq
Member Author

The following change to MiSoC breaks serwb 100%:

diff --git a/misoc/targets/sayma_amc.py b/misoc/targets/sayma_amc.py
index 5737da95..aeb37cab 100755
--- a/misoc/targets/sayma_amc.py
+++ b/misoc/targets/sayma_amc.py
@@ -123,7 +123,7 @@ class MiniSoC(BaseSoC):
             self.config["RGMII_CLOCK_REROUTED"] = None
             si5324_clkin = self.platform.request("si5324_clkin")
             si5324_clkout = self.platform.request("si5324_clkout_fabric")
-            self.specials += DifferentialOutput(eth_clocks.rx, si5324_clkin.p, si5324_clkin.n)
+            self.specials += DifferentialOutput(ClockSignal(), si5324_clkin.p, si5324_clkin.n)
             eth_clocks.rx = Signal()
             self.specials += DifferentialInput(si5324_clkout.p, si5324_clkout.n, eth_clocks.rx)
         self.submodules.ethphy = LiteEthPHY(eth_clocks,

migen 775572ea7, misoc f509de0cb, artiq 2b01aa2

@enjoy-digital
Contributor

Hmm ok, at least this can help me understand what is going on.

@sbourdeauducq
Member Author

Also the RTM design doesn't meet timing...

@enjoy-digital
Contributor

I'm looking at that.

@enjoy-digital
Contributor

enjoy-digital commented Dec 21, 2017

@sbourdeauducq: I should have fixed timing on the RTM design (a false path was missing). However, I'm not able to reproduce the issue easily.

I tried adding self.specials += DifferentialOutput(ClockSignal(), si5324_clkin.p, si5324_clkin.n) to my design but serwb is still working. Can you keep debug enabled on serwb while we still have the issue? It could help me understand what is going on.

For the case where the RTM is reloaded by JTAG, how was it loaded initially? From flash? I'm just trying to understand, because the RTM should automatically be reset by the AMC when retrying. Are you sure the RTM was correctly loaded?

@sbourdeauducq
Member Author

Try on the HKG boards with SSH?
RTM is always loaded with JTAG, there is currently no other way.

@jbqubit
Contributor

jbqubit commented Jan 12, 2018

What's the status of this?

@sbourdeauducq
Member Author

Try it on your board. Could be another hw problem. Tom and Florent are not experiencing it.

@jbqubit
Contributor

jbqubit commented Jan 12, 2018

Built .bit this afternoon from master.

Sayma_AMC TS190717-7

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 42 69 56 78 73 88 54 62 done
Read delays: 7:00-160 6:08-181 5:57-229 4:70-242 3:101-260 2:109-273 1:127-286 0:134-294 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000004s]  INFO(runtime): ARTIQ runtime starting...
[     0.003857s]  INFO(runtime): software version 4.0.dev+404.gac3c3871
[     0.010118s]  INFO(runtime): gateware version 4.0.dev+404.gac3c3871
[     0.016370s]  INFO(runtime): log level set to INFO by default
[     0.022096s]  INFO(runtime): UART log level set to INFO by default
[     0.028257s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     0.711094s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     1.391311s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     2.071214s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     2.770865s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     3.452872s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     4.138800s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     4.823377s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     5.504940s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.

Sayma_AMC TS190717-2
This AMC has different behavior. It hangs...

$ flterm /dev/ttyUSB1

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 56 78 61 86 73 97 58 68 done
Read delays: 7:02-175 6:19-195 5:67-241 4:81-252 3:103-279 2:108-287 1:139-312 0:147-320 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000004s]  INFO(runtime): ARTIQ runtime starting...
[     0.003857s]  INFO(runtime): software version 4.0.dev+404.gac3c3871
[     0.010118s]  INFO(runtime): gateware version 4.0.dev+404.gac3c3871
[     0.016384s]  INFO(runtime): log level set to INFO by default
[     0.022103s]  INFO(runtime): UART log level set to INFO by default
[     0.028257s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...

Hangs here.

@hartytp
Collaborator

hartytp commented Jan 12, 2018

@jbqubit hmm...interesting.

  • Can you try with the current sayma amc standalone and sayma RTM conda packages, please?
  • How did you flash the amc + load the RTM?
  • are all power supply lights on both boards on?

@jbqubit
Contributor

jbqubit commented Jan 12, 2018

For the logs just reported, I built the .bit files for the RTM and the standalone AMC, and flashed using: ~/github/m-labs/sinara$ artiq_flash --srcbuild ./misoc_standalone_sayma_amc -t sayma

are all power supply lights on both boards on?

Yes.

Can you try with the current sayma amc standalone and sayma RTM conda packages, please?

OK. I installed artiq-sayma_amc-standalone and artiq-sayma_rtm from conda and flashed after fixing some typos in artiq_flash (cf. #890).

$ flterm /dev/ttyUSB1

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 54 77 58 85 72 96 57 64 done
Read delays: 7:04-170 6:19-196 5:62-240 4:80-249 3:102-278 2:107-288 1:134-311 0:146-312 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000004s]  INFO(runtime): ARTIQ runtime starting...
[     0.003859s]  INFO(runtime): software version 4.0.dev+400.g6d58c439
[     0.010120s]  INFO(runtime): gateware version 4.0.dev+400.g6d58c439
[     0.016385s]  INFO(runtime): log level set to INFO by default
[     0.022105s]  INFO(runtime): UART log level set to INFO by default
[     0.028259s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 54 77 59 85 72 96 56 65 done
Read delays: 7:05-176 6:17-192 5:65-242 4:78-251 3:102-273 2:106-284 1:132-140 0:145-314 done
SDRAM initialized
Memory test failed (43751/1114624 words incorrect)
Halting.

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 55 77 60 86 72 96 57 63 done
Read delays: 7:02-171 6:16-194 5:66-241 4:79-249 3:102-277 2:107-283 1:135-311 0:149-318 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000004s]  INFO(runtime): ARTIQ runtime starting...
[     0.003859s]  INFO(runtime): software version 4.0.dev+400.g6d58c439
[     0.010120s]  INFO(runtime): gateware version 4.0.dev+400.g6d58c439
[     0.016385s]  INFO(runtime): log level set to INFO by default
[     0.022105s]  INFO(runtime): UART log level set to INFO by default
[     0.028259s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...

@sbourdeauducq
Member Author

You need to load the rtm manually.

@sbourdeauducq
Member Author

@jbqubit
Contributor

jbqubit commented Jan 19, 2018

Ok. Here's what I'm now doing.

python -m artiq.gateware.targets.sayma_rtm 
python -m artiq.gateware.targets.sayma_amc_standalone 
find . -name "*.bit"
find /home/britton/anaconda3 -name "xilinx-xcu.cfg" 
 
openocd -s /home/britton/anaconda3/envs/artiq-dev/share/openocd/scripts -f ~/sayma_new.cfg -c "pld load 0 ./artiq_sayma_rtm/top.bit; exit" 
 
openocd -s /home/britton/anaconda3/envs/artiq-dev/share/openocd/scripts -f ~/sayma_new.cfg -c "pld load 1 ./misoc_standalone_sayma_amc/gateware/top.bit; exit" 
(artiq-dev2) britton@britton1:~/artiq-dev2/artiq$ openocd -s /home/britton/anaconda3/envs/artiq-dev/share/openocd/scripts -f ~/sayma_new.cfg -c "pld load 0 ./artiq_sayma_rtm/top.bit; exit"  
Open On-Chip Debugger 0.10.0 (2017-02-03-06:53)
Licensed under GNU GPL v2
For bug reports, read
	http://openocd.org/doc/doxygen/bugs.html
none separate
adapter speed: 5000 kHz
Info : clock speed 5000 kHz
Error: JTAG scan chain interrogation failed: all ones
Error: Check JTAG interface, timings, target power, etc.
Error: Trying to use configured scan chain anyway...
Error: xc7.tap: IR capture error; saw 0x3f not 0x01
Warn : Bypassing JTAG setup events due to errors
Warn : gdb services need one or more targets defined
loaded file ./artiq_sayma_rtm/top.bit to pld device 0 in 3s 588074us
(artiq-dev2) britton@britton1:~/artiq-dev2/artiq$ 
(artiq-dev2) britton@britton1:~/artiq-dev2/artiq$ 
(artiq-dev2) britton@britton1:~/artiq-dev2/artiq$ openocd -s /home/britton/anaconda3/envs/artiq-dev/share/openocd/scripts -f ~/sayma_new.cfg -c "pld load 1 ./misoc_standalone_sayma_amc/gateware/top.bit; exit"  
Open On-Chip Debugger 0.10.0 (2017-02-03-06:53)
Licensed under GNU GPL v2
For bug reports, read
	http://openocd.org/doc/doxygen/bugs.html
none separate
adapter speed: 5000 kHz
Info : clock speed 5000 kHz
Error: JTAG scan chain interrogation failed: all ones
Error: Check JTAG interface, timings, target power, etc.
Error: Trying to use configured scan chain anyway...
Error: xc7.tap: IR capture error; saw 0x3f not 0x01
Warn : Bypassing JTAG setup events due to errors
Warn : gdb services need one or more targets defined
loaded file ./misoc_standalone_sayma_amc/gateware/top.bit to pld device 1 in 12s 874773us

@sbourdeauducq
Member Author

You have the 1.8V and/or JTAG bugs. Power-cycle the boards and replug the USB connectors until there are no errors.

@jbqubit
Contributor

jbqubit commented Jan 19, 2018

The 1.8V is fine. I'm watching it on a scope.

I've applied the JTAG white-wire fix sinara-hw/sinara#463

artiq_flash --srcbuild ./misoc_standalone_sayma_amc -t sayma was working just fine on this board a couple days ago. But looks like @whitequark made some changes (f77aa9b#diff-23ef9b8c7f366ae0fd8efc4411bdc7a8) to artiq_flash. And now artiq_flash doesn't work for want of bscan_spi_xcku040-sayma.bit. @whitequark should I expect to be able to use artiq_flash now for Sayma?

@whitequark
Contributor

@whitequark should I expect to be able to use artiq_flash now for Sayma?

Sure, I use it for Sayma.

And now artiq_flash doesn't work for want of bscan_spi_xcku040-sayma.bit.

What exactly is the error message here?

@jbqubit
Contributor

jbqubit commented Jan 19, 2018

$ artiq_flash --srcbuild ./misoc_standalone_sayma_amc -t sayma_rtm
proxy gateware bitstream bscan_spi_xcku040-sayma.bit not found
$ find /home/britton/anaconda3/envs/artiq-dev2 -name "*xcku040-sayma.bit"
(artiq-dev2) britton@britton1:~/artiq-dev2/artiq
$ 

@jordens
Member

jordens commented Jan 19, 2018

It's an old openocd. @whitequark

@sbourdeauducq
Member Author

sbourdeauducq commented Jan 19, 2018

Considering:

  • the correlation between the boards with the 1.8V bug and the boards with this serwb issue.
  • the high level of noise in the 1.8V supply (that eventually leads to the regulator shutting down the rail).
  • the fact that the serwb bank on AMC is powered from that noisy 1.8V.
  • the fact that this bug is on the AMC side (putting my problematic RTM on Florent's working AMC is reliable).

...it is possible that this is simply another consequence of the 1.8V bug.

@jordens
Member

jordens commented Jan 19, 2018

Yes. The "Open On-Chip Debugger 0.10.0 (2017-02-03-06:53)" Joe installed won't help even with those issues resolved.

@jbqubit
Contributor

jbqubit commented Jan 19, 2018

I've been successfully flashing this board for several weeks now using artiq_flash. See here. Pending #898 I'll try again.

@enjoy-digital
Contributor

Reducing the serwb line rate from 1.25Gbps to 625Mbps seems to make it reliable on at least one board that has the 1.8V issue (it's difficult to say whether the two are related). Let's use 625Mbps for now.
Note: on sayma1 (which had the 1.8V issue), when restarting the AMC with artiq_flash, the RTM is no longer alive and needs to be reloaded by JTAG (this is not the case with the board I brought with me). This may be another issue.

@gkasprow
Collaborator

@enjoy-digital once you restart AMC you may toggle config pins so RTM gets unconfigured...

@jbqubit
Contributor

jbqubit commented Jan 25, 2018

@enjoy-digital It's indeed odd that 1.25 Gbps worked some time ago but now doesn't. Is serwb now working reliably at 625 Mbps?

@enjoy-digital
Contributor

@jbqubit: I don't think serwb at 1.25Gbps behaves differently than before. It just seems unreliable on some of the boards at 1.25Gbps, and seems reliable on all boards we tested at 625Mbps.

Note that there are 2 problems here:

  • the fact that reconfiguring the AMC can unconfigure the RTM (this will be solved once the AMC loads the RTM).
  • the fact that 1.25Gbps does not seem reliable with all boards. If 625Mbps is reliable, let's use it for now.
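
For reference, halving the line rate doubles the unit interval (the duration of one bit on the link). The arithmetic is sketched below; the helper name is made up for illustration, and nothing in this thread establishes that the extra timing margin is the actual reason 625Mbps behaves better.

```python
def unit_interval_ns(line_rate_bps):
    """Return the duration of one bit, in nanoseconds, for a given line rate."""
    return 1e9 / line_rate_bps


if __name__ == "__main__":
    for rate in (1.25e9, 625e6):
        print("{:.0f} Mbps -> UI = {:.2f} ns".format(rate / 1e6, unit_interval_ns(rate)))
```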

@jbqubit
Contributor

jbqubit commented Jan 26, 2018

OK. So problem 1 relates to #813. Agreed that 625 Mbps is fine for getting started. :)

@sbourdeauducq
Member Author

@enjoy-digital Sometimes when the RTM FPGA is not loaded, it prints:

[     1.454088s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.

and sometimes it just waits for it to be loaded. Why is that?

@jordens
Member

jordens commented Feb 27, 2018

Serwb also appears to hang randomly on sayma-3, mostly when waiting for HMC830 lock. This happens with current master as well as spi2. And @hartytp also sees this (with a slightly older master).

[     0.000005s]  INFO(runtime): ARTIQ runtime starting...
[     0.003865s]  INFO(runtime): software version 4.0.dev+624.gb466a569
[     0.010130s]  INFO(runtime): gateware version 4.0.dev+630.g54b51493
[     0.016396s]  INFO(runtime): log level set to INFO by default
[     0.022113s]  INFO(runtime): UART log level set to INFO by default
[     0.028265s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     0.746275s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     1.419144s]  INFO(board_artiq::serwb): done.
[     1.422262s]  INFO(board_artiq::serwb): RTM gateware version 4.0.dev+624.gb466a569
[     1.429735s]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     2.429006s]  INFO(runtime): continuing boot
[     2.431966s]  INFO(board_artiq::hmc830_7043::hmc830): HMC830 found
[     2.438116s]  INFO(board_artiq::hmc830_7043::hmc830): HMC830 configuration...
[     2.445347s]  INFO(board_artiq::hmc830_7043::hmc830): waiting for lock...

@jbqubit
Contributor

jbqubit commented Feb 27, 2018

My board as well. No flashing errors.

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 31 53 45 65 36 52 24 35 done
Read delays: 7:04-98 6:16-115 5:67-157 4:74-169 3:99-196 2:104-192 1:123-217 0:135-226 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000005s]  INFO(runtime): ARTIQ runtime starting...
[     0.003864s]  INFO(runtime): software version 4.0.dev+618.g820c8342
[     0.010129s]  INFO(runtime): gateware version 4.0.dev+618.g820c8342
[     0.016396s]  INFO(runtime): log level set to INFO by default
[     0.022115s]  INFO(runtime): UART log level set to INFO by default
[     0.028265s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     0.736673s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     1.520363s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     2.324061s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[     3.338301s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
$ artiq_flash --target sayma --variant standalone --dir /home/britton/artiq-dev/artiq/artiq_sayma_with_drtio
Design: top;COMPRESS=TRUE;UserID=0XFFFFFFFF;Version=2017.4.1
Part name: xcku040-ffva1156-1-c
Date: 2018/02/22
Time: 18:10:23
Bitstream payload length: 0xc46a7c
Open On-Chip Debugger 0.10.0-00013-gbb7beda (2018-02-13-15:56)
Licensed under GNU GPL v2
For bug reports, read
	http://openocd.org/doc/doxygen/bugs.html
none separate
adapter speed: 5000 kHz
Info : clock speed 5000 kHz
Info : JTAG tap: xc7.tap tap/device found: 0x0362e093 (mfg: 0x049 (Xilinx), part: 0x362e, ver: 0x0)
Info : JTAG tap: xcu.tap tap/device found: 0x13822093 (mfg: 0x049 (Xilinx), part: 0x3822, ver: 0x1)
Info : gdb server disabled
RTM FPGA XADC:
TEMP 42.50 C
VCCINT 1.002 V
VCCAUX 1.796 V
VCCBRAM 1.002 V
VPVN 0.000 V
VREFP 0.000 V
VREFN 0.000 V
VCCPINT 0.000 V
VCCPAUX 0.000 V
VCCODDR 0.000 V
AMC FPGA XADC:
TEMP 40.70 C
VCCINT 0.890 V
VCCAUX 1.785 V
VCCBRAM 0.959 V
VPVN 0.000 V
VREFP 0.000 V
VREFN 0.000 V
VCCPINT 0.000 V
VCCPAUX 0.000 V
VCCODDR 0.000 V
loaded file /home/britton/anaconda3/envs/artiq-dev/share/bscan-spi-bitstreams/bscan_spi_xcku040-sayma.bit to pld device 1 in 3s 849801us
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
flash 'jtagspi' found at 0x00000000
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
Info : sector 0 took 715 ms
Info : sector 1 took 729 ms
...
Info : sector 194 took 713 ms
Info : sector 195 took 708 ms
Info : sector 196 took 725 ms
erased sectors 0 through 196 on flash bank 0 in 141.517853s
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
wrote 12872316 bytes from file /tmp/tmpul9tr71q to flash bank 0 at offset 0x00000000 in 95.553131s (131.556 KiB/s)
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
read 12872316 bytes from file /tmp/tmpul9tr71q and flash bank 0 at offset 0x00000000 in 21.167046s (593.877 KiB/s)
contents match
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
flash 'jtagspi' found at 0x00000000
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
Info : sector 0 took 733 ms
Info : sector 1 took 744 ms
erased sectors 0 through 1 on flash bank 1 in 1.477441s
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
wrote 74980 bytes from file /home/britton/artiq-dev/artiq/artiq_sayma_with_drtio/standalone/software/bootloader/bootloader.bin to flash bank 1 at offset 0x00000000 in 0.562994s (130.059 KiB/s)
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
read 74980 bytes from file /home/britton/artiq-dev/artiq/artiq_sayma_with_drtio/standalone/software/bootloader/bootloader.bin and flash bank 1 at offset 0x00000000 in 0.124591s (587.704 KiB/s)
contents match
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
flash 'jtagspi' found at 0x00000000
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
Info : sector 5 took 734 ms
Info : sector 6 took 738 ms
Info : sector 7 took 742 ms
Info : sector 8 took 741 ms
Info : sector 9 took 751 ms
Info : sector 10 took 740 ms
Info : sector 11 took 739 ms
Info : sector 12 took 742 ms
Info : sector 13 took 743 ms
erased sectors 5 through 13 on flash bank 1 in 6.669954s
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
wrote 588232 bytes from file /home/britton/artiq-dev/artiq/artiq_sayma_with_drtio/standalone/software/runtime/runtime.fbi to flash bank 1 at offset 0x00050000 in 4.515156s (127.226 KiB/s)
Info : Found flash device 'micron n25q256 3v' (ID 0x0019ba20)
read 588232 bytes from file /home/britton/artiq-dev/artiq/artiq_sayma_with_drtio/standalone/software/runtime/runtime.fbi and flash bank 1 at offset 0x00050000 in 0.975071s (589.132 KiB/s)
contents match
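
Since the 1.8V rail is a suspect in this issue, it can be handy to extract the XADC readings that artiq_flash prints, for example to compare VCCAUX across boards. A minimal sketch that assumes only the text layout shown in the output above; the function name is hypothetical. These readings are snapshots, so they would not reveal supply noise by themselves.

```python
def parse_xadc(output):
    """Parse '<name> FPGA XADC:' sections into {fpga: {quantity: value}}."""
    readings = {}
    current = None  # name of the FPGA section being read, e.g. "RTM"
    for raw in output.splitlines():
        line = raw.strip()
        if line.endswith("FPGA XADC:"):
            current = line.split()[0]
            readings[current] = {}
            continue
        parts = line.split()
        # Inside a section, lines look like "VCCAUX 1.796 V" or "TEMP 42.50 C".
        if current is None or len(parts) != 3 or parts[2] not in ("V", "C"):
            current = None  # anything else ends the section
            continue
        try:
            readings[current][parts[0]] = float(parts[1])
        except ValueError:
            current = None
    return readings


SAMPLE = """\
Info : gdb server disabled
RTM FPGA XADC:
TEMP 42.50 C
VCCINT 1.002 V
VCCAUX 1.796 V
AMC FPGA XADC:
TEMP 40.70 C
VCCINT 0.890 V
VCCAUX 1.785 V
loaded file bscan_spi_xcku040-sayma.bit to pld device 1 in 3s 849801us
"""

if __name__ == "__main__":
    for fpga, values in parse_xadc(SAMPLE).items():
        print(fpga, "VCCAUX =", values.get("VCCAUX"), "V")
```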

@jordens
Member

jordens commented Feb 27, 2018

Did you even load the RTM gateware?

@jbqubit
Contributor

jbqubit commented Feb 28, 2018

Yes, I'm flashing the RTM. I neglected to paste it; it is now included below. Booting fails with "Memory test failed" or "serwb bridge initialization failed". Once, it advanced far enough that the failure was "HMC830 lock timeout".

$ openocd -s /home/britton/anaconda3/envs/artiq-dev/share/openocd/scripts -f ~/sayma_flash.cfg -c "pld load 0 /home/britton/artiq-dev/artiq/artiq_sayma/rtm_gateware/rtm.bit; exit"  
Open On-Chip Debugger 0.10.0-00013-gbb7beda (2018-02-13-15:56)
Licensed under GNU GPL v2
For bug reports, read
	http://openocd.org/doc/doxygen/bugs.html
none separate
adapter speed: 5000 kHz
Info : clock speed 5000 kHz
Info : JTAG tap: xc7.tap tap/device found: 0x0362e093 (mfg: 0x049 (Xilinx), part: 0x362e, ver: 0x0)
Info : JTAG tap: xcu.tap tap/device found: 0x13822093 (mfg: 0x049 (Xilinx), part: 0x3822, ver: 0x1)
Warn : gdb services need one or more targets defined
loaded file /home/britton/artiq-dev/artiq/artiq_sayma/rtm_gateware/rtm.bit to pld device 0 in 1s 52267us

$ cat sayma_flash.cfg


interface ftdi
#ftdi_device_desc "Quad RS232-HS"
ftdi_vid_pid 0x0403 0x6011
# if there are multiple Sayma:
#ftdi_location 5:2
ftdi_channel 0
# EN_USB_JTAG on ADBUS7: out, high
# nTRST on ADBUS4: out, high, but R46 is DNP

ftdi_layout_init 0x0098 0x008b
reset_config none
adapter_khz 5000
transport select jtag
source [find cpld/xilinx-xc7.cfg]
set CHIP XCKU040
source [find cpld/xilinx-xcu.cfg]
init

@hartytp
Collaborator

hartytp commented Feb 28, 2018

Did you even load the RTM gateware?

@jordens Starting the informal etiquette manual: AFAICT, the word "even" here serves no purpose other than to make a helpful comment come across as somewhat rude/condescending.

@jboulder AFAICT you need to be a little careful over the timing of loading (not flashing if we want to be pedantic -- if we could flash it, life would be much easier) the RTM FPGA. At some point during startup the AMC restarts the RTM FPGA, and loading must be done after that point. I find that if you get the timing right, this all works reliably.

@sbourdeauducq
Member Author

At some point during startup the AMC restarts the RTM FPGA

It doesn't. I don't know why it looks like it does, there is probably another bug somewhere.

@hartytp
Collaborator

hartytp commented Feb 28, 2018

hmmmm...well it certainly behaves a lot as if it does.

@jordens
Member

jordens commented Feb 28, 2018

@hartytp Reading undue rudeness into that question is thin-skinned IMHO. Especially in the light of past experience with negligent and careless treatment of advice and instructions and the frustration associated with it. You yourself are confirming that by suggesting that Joe personally may not have been careful. The "rudeness" seems to be already healed by the purely technical rephrasing "Is the RTM gateware even loaded?". Would you consider that condescending?
But yes. Until the RTM is loaded automatically people need to be extra-careful when testing this.

@hartytp
Collaborator

hartytp commented Feb 28, 2018

@hartytp Reading undue rudeness into that question is thin-skinned IMHO. Especially in the light of past experience with negligent and careless treatment of advice and instructions and the frustration associated with it. You yourself are confirming that by suggesting that Joe personally may not have been careful. The "rudeness" seems to be already healed by the purely technical rephrasing "Is the RTM gateware even loaded?". Would you consider that condescending?

I don't understand your argument here. You seem to be acknowledging that your comment was phrased in a way that was deliberately rude, but arguing that this is appropriate given the history. Or, is your point that you could easily have been ruder, so we should be thankful for the relatively restrained level of rudeness you chose to adopt for your post?

In either case, the word "even" here adds nothing to your point on a technical level, but to any native English speaker it implies an element of rudeness. Given the current tensions, it shouldn't be a surprise if that rubs people the wrong way.

@enjoy-digital
Contributor

@hartytp
Collaborator

hartytp commented Feb 28, 2018

Thanks for confirming that (I knew it does, because I've experienced it many times when working with Sayma).

@sbourdeauducq
Member Author

serwb is resetting the RTM FPGA at startup:

Yes, I know, but that's not touching the bitstream.

@jordens
Member

jordens commented Feb 28, 2018

@hartytp My points are, that it wasn't meant rude, that I don't consider it rude if I am asked that question by you, that "even" is not an insult (just search through your own usage of it on artiq or sinara), that I wouldn't recommend considering it rude for the etiquette rules you are writing, and that I claim on the basis of the general level of rudeness by Joe that even if you as a third person would consider it rude, Joe is not in a position to complain about it.

@jbqubit
Contributor

jbqubit commented Feb 28, 2018

The reason for my post that prompted your rude comment is that booting didn't hang at the usual point when waiting for RTM FPGA. What I expected:

[     0.028265s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...

What I saw:

[     0.028265s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     0.736673s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
...

This precluded interaction with the RTM FPGA which is why I didn't post about it.

Today I see similar behavior but after a longer delay.

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 26 52 39 60 45 60 29 38 done
Read delays: 7:00-129 6:01-155 5:47-186 4:55-196 3:97-226 2:104-239 1:116-259 0:129-267 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000004s]  INFO(runtime): ARTIQ runtime starting...
[     0.003864s]  INFO(runtime): software version 4.0.dev+636.gf97163cd
[     0.010129s]  INFO(runtime): gateware version 4.0.dev+636.gf97163cd
[     0.016395s]  INFO(runtime): log level set to INFO by default
[     0.022114s]  INFO(runtime): UART log level set to INFO by default
[     0.028264s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[   403.212222s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   403.980793s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   404.777906s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   405.481089s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   406.229492s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   406.972165s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   407.699517s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   408.394776s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   409.114399s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   409.830605s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   410.587286s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   411.789428s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   412.683285s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   413.375680s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   414.073249s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   414.898720s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   415.631904s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   416.400393s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   417.255687s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   418.054276s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
[   418.800799s]  WARN(board_artiq::serwb): AMC/RTM serwb bridge initialization failed, retrying.
panic at /home/britton/artiq-dev/artiq/artiq/firmware/runtime/main.rs:268: exception 7 at PC 0x408cc3c8, EA 0x40143cf0
backtrace for software version 4.0.dev+636.gf97163cd:

@jordens I'm trying to communicate what I see to assist M-Labs in debugging this Issue. Your frequent use of language that implies that I am lazy and careless does little to encourage this type of constructive feedback.


Since this issue is about something not working, it didn't occur to me to post an example of success. Perhaps doing so will help others know what to expect. Following the instructions on the mailing list, I wait for [ 0.028264s] INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready... and then load the RTM FPGA. After that I see the following.

 __  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|

MiSoC Bootloader
Copyright (c) 2017 M-Labs Limited

Bootloader CRC passed
Initializing SDRAM...
Write leveling: 25 50 38 62 47 62 32 40 done
Read delays: 7:00-128 6:01-153 5:48-187 4:56-196 3:94-225 2:101-236 1:115-258 0:130-266 done
SDRAM initialized
Memory test passed

Booting from flash...
Starting firmware.
[     0.000004s]  INFO(runtime): ARTIQ runtime starting...
[     0.003864s]  INFO(runtime): software version 4.0.dev+636.gf97163cd
[     0.010129s]  INFO(runtime): gateware version 4.0.dev+636.gf97163cd
[     0.016395s]  INFO(runtime): log level set to INFO by default
[     0.022114s]  INFO(runtime): UART log level set to INFO by default
[     0.028264s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     8.592829s]  INFO(board_artiq::serwb): done.
[     8.595959s]  INFO(board_artiq::serwb): RTM gateware version 4.0.dev+636.gf97163cd
[     8.603428s]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     9.603005s]  INFO(runtime): continuing boot
[     9.605972s]  INFO(board_artiq::hmc830_7043::hmc830): HMC830 found
[     9.612120s]  INFO(board_artiq::hmc830_7043::hmc830): HMC830 configuration...
[     9.619353s]  INFO(board_artiq::hmc830_7043::hmc830): waiting for lock...
[    11.620010s] ERROR(board_artiq::hmc830_7043::hmc830): HMC830 lock timeout. Register dump:
[    11.626901s] ERROR(board_artiq::hmc830_7043::hmc830): [0x00] = 0xa7975
[    11.633411s] ERROR(board_artiq::hmc830_7043::hmc830): [0x01] = 0x0002
[    11.639837s] ERROR(board_artiq::hmc830_7043::hmc830): [0x02] = 0x0002
[    11.646262s] ERROR(board_artiq::hmc830_7043::hmc830): [0x03] = 0x0030
[    11.652688s] ERROR(board_artiq::hmc830_7043::hmc830): [0x04] = 0x0000
[    11.659114s] ERROR(board_artiq::hmc830_7043::hmc830): [0x05] = 0x0000
[    11.665539s] ERROR(board_artiq::hmc830_7043::hmc830): [0x06] = 0x303ca
[    11.672052s] ERROR(board_artiq::hmc830_7043::hmc830): [0x07] = 0x014d
[    11.678477s] ERROR(board_artiq::hmc830_7043::hmc830): [0x08] = 0xc1beff
[    11.685076s] ERROR(board_artiq::hmc830_7043::hmc830): [0x09] = 0x153fff
[    11.691676s] ERROR(board_artiq::hmc830_7043::hmc830): [0x0a] = 0x2046
[    11.698101s] ERROR(board_artiq::hmc830_7043::hmc830): [0x0b] = 0x7c061
[    11.704614s] ERROR(board_artiq::hmc830_7043::hmc830): [0x0c] = 0x0000
[    11.711039s] ERROR(board_artiq::hmc830_7043::hmc830): [0x0f] = 0x0081
[    11.717465s] ERROR(board_artiq::hmc830_7043::hmc830): [0x10] = 0x0080
[    11.723890s] ERROR(board_artiq::hmc830_7043::hmc830): [0x11] = 0x7ffff
[    11.730402s] ERROR(board_artiq::hmc830_7043::hmc830): [0x12] = 0x0000
[    11.736828s] ERROR(board_artiq::hmc830_7043::hmc830): [0x13] = 0x1259
panic at src/libcore/result.rs:906: cannot initialize HMC830/7043: "HMC830 lock timeout"
backtrace for software version 4.0.dev+636.gf97163cd:
0x4002398c
0x4004504c
0x400060c0
0x40002fa4
0x400236dc
halting.

I interpret this to mean that serwb initialized successfully and that the boot is now blocked by the HMC830 lock timeout, which is an unrelated issue.
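
For anyone who wants to script the "wait for the message, then load the RTM FPGA" step, here is a minimal Python sketch. It only parses log lines in the format shown above. The names `watch_serwb` and `on_waiting` are hypothetical and not part of ARTIQ, and the callback is just a placeholder for whatever you use to load the RTM FPGA, which is not shown here.

```python
import re
from typing import Callable, Iterable, Optional

# Matches firmware log lines such as
# "[     0.028264s]  INFO(board_artiq::serwb): waiting for ..."
LOG_LINE = re.compile(
    r"^\[\s*(?P<t>\d+\.\d+)s\]\s+(?P<level>[A-Z]+)"
    r"\((?P<module>[^)]+)\):\s*(?P<msg>.*)$"
)

WAITING = "waiting for AMC/RTM serwb bridge to be ready..."
DONE = "done."


def watch_serwb(lines: Iterable[str],
                on_waiting: Callable[[], None]) -> Optional[float]:
    """Call on_waiting() once when the serwb 'waiting' message appears.

    Returns the seconds between that message and the serwb 'done.'
    message, or None if 'done.' is never seen.
    """
    t_wait = None
    for line in lines:
        m = LOG_LINE.match(line.strip())
        # Ignore non-log lines (banner, panic, backtrace) and other modules.
        if m is None or not m.group("module").endswith("serwb"):
            continue
        if m.group("msg") == WAITING and t_wait is None:
            t_wait = float(m.group("t"))
            on_waiting()  # placeholder: load the RTM FPGA here
        elif m.group("msg") == DONE and t_wait is not None:
            return float(m.group("t")) - t_wait
    return None
```

Fed the successful log above, this reports about 8.56 s between the "waiting" message and "done.". In the failing case (only "initialization failed, retrying." warnings) it returns None.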

@enjoy-digital
Contributor

serwb has been refactored (architecture and clocking). An issue has also been found with an uninitialized HMC7043 (sinara-hw/sinara#541) that could explain this problem. To prevent an uninitialized HMC7043 from introducing noise into the AMC FPGA, clock buffers are now disabled at startup (8212e46).
Closing this, since the content of this issue is probably no longer relevant.

@jbqubit
Contributor

jbqubit commented May 1, 2018

Thanks @enjoy-digital, @gkasprow !
