Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SYS/RTIO clock merge #129

Merged
merged 27 commits into from
Oct 10, 2022
Merged

SYS/RTIO clock merge #129

merged 27 commits into from
Oct 10, 2022

Conversation

Spaqin
Copy link
Contributor

@Spaqin Spaqin commented Jul 26, 2022

So for Kasli we'd like to use the RTIO clock (output of Si5324) as SYS clock, avoiding clock domain transfers, with a simple bootstrap clock before the bootloader changes the clock. It seems possible in general, but comes with its own challenges - e.g. no support for 100/150MHz clocks (unless we set the MMCM/PLL parameters to match these frequencies? change in gateware + software, essentially, different build).

Marking this PR as WIP as I've been having difficulties with it and it's nowhere near finished. There's also kc705 support in there to be done as well, and I suppose it should be eventually merged in tandem with ARTIQ part.

Few things that I encountered though that I would like an explanation for:

  • why is the div2 clock used in MMCM/PLL, rather than the 125MHz one?
  • why was sysclk running at ~114MHz instead of 125 (see clkfbout_mult_f @ 14.5 instead of 16)? this looks like an oversight?
  • Kasli QPLL took the buffered 125MHz from the reference directly (see line 209/256 of kasli.py) - I cannot use the "chosen clock" from BUFGMUX, as Vivado complains about unbuffered output, and sending the BUFGMUX output through BUFG/BUFH/BUFR resulted in unroutable design. Which should it use, is there a way to choose a clock for it too?

At the point of making this PR this method doesn't meet timing constraints and I can't seem to find a way to improve it (e.g. putting the si5324 clock through MMCM and then to BUFGMUX makes it even worse). Unless there's a different, better way to switch between the two clocks (probably is). I managed to run it at some point, but more by luck I suppose. I get garbage output on serial most of the time, possibly due to unmet timing, quite badly too.

------------------------------------------------------------------------------------------------
| Design Timing Summary
| ---------------------
------------------------------------------------------------------------------------------------

    WNS(ns)      TNS(ns)  TNS Failing Endpoints  TNS Total Endpoints      WHS(ns)      THS(ns)  THS Failing Endpoints  THS Total Endpoints     WPWS(ns)     TPWS(ns)  TPWS Failing Endpoints  TPWS Total Endpoints  
    -------      -------  ---------------------  -------------------      -------      -------  ---------------------  -------------------     --------     --------  ----------------------  --------------------  
     -3.074       -5.992                      2                36450        0.056        0.000                      0                36446        0.264        0.000                       0                 15627  


Timing constraints are not met.

Open for suggestions, changes, criticism.

@sbourdeauducq
Copy link
Member

sbourdeauducq commented Jul 26, 2022

  • why is the div2 clock used in MMCM/PLL, rather than the 125MHz one?

Because of routing restrictions of IBUFDS_GTE2. You can't have the same output to the QPLL and to the fabric.

  • why was sysclk running at ~114MHz instead of 125 (see clkfbout_mult_f @ 14.5 instead of 16)? this looks like an oversight?

Kasli 1 with the lower speed grade FPGA cannot meet timing at 125MHz. I guess we will have to introduce a slower RTIO clock to support legacy systems.

  • Kasli QPLL took the buffered 125MHz from the reference directly (see line 209/256 of kasli.py) - I cannot use the "chosen clock" from BUFGMUX, as Vivado complains about unbuffered output, and sending the BUFGMUX output through BUFG/BUFH/BUFR resulted in unroutable design.

Indeed this is unroutable - the QPLL/transceiver needs to use the dedicated clocking pins and dedicated clocking network.
And you don't want to switch that clock, it needs to be exactly 125MHz for Ethernet. And the Ethernet clock domain is asynchronous to all other domains. So just keep that direct IBUFDS_GTE2 -> QPLL connection, it is fine (and there aren't many other options anyway).

Your code cannot work reliably because the system PLL/MMCM may unlock/glitch when you switch clocks. You cannot put a BUFGMUX at the input of a MMCM without precautions, which are difficult to take here because the MMCM needs to be continuously clocking the CPU. It seems to confuse the Vivado timing analyzer as well.

I suggest you redesign it based on this scheme from Xilinx UG472:
image

The top-left IBUFG would take the Si5324 output, and the top-right BUFG would become a BUFGMUX with the main input from the MMCM (as shown) and the IBUFDS_GTE2 ODIV2 output on its other input. It is fine if the system bootstraps at 62.5MHz, just correct the UART baudrate register. Note that BUFG and BUFGMUX correspond to the same physical resource inside the FPGA (BUFG is just a software abstraction with some BUFGMUX signals hidden), so the deskew should also work. The feedback BUFG would stay as on the UG472 diagram.


self.specials += Instance("IBUFDS_GTE2",
i_CEB=0,
i_I=si5324_out.p, i_IB=si5324_out.n,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be called cdr_clk_... since it may not come from the Si5324 after WRPLL.

@@ -61,44 +99,56 @@ def __init__(self, platform):
pll_clk200 = Signal()
self.specials += [
Instance("MMCME2_BASE",
p_CLKIN1_PERIOD=16.0,
i_CLKIN1=self.clk125_div2,
# 150mhz not supported anymore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the place to mention this.

SoCSDRAM.__init__(self, platform,
clk_freq=125e6*14.5/16, cpu_reset_address=0x400000,
**kwargs)
kwargs["uart_initial_clk_freq"] = 62.5e6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why mutate kwargs?

Just do uart_initial_clk_freq=62.5e6 below?

p_CLKOUT1_DIVIDE=2, p_CLKOUT1_PHASE=0.0, o_CLKOUT1=mmcm_sys4x,
p_CLKOUT2_DIVIDE=2, p_CLKOUT2_PHASE=90.0, o_CLKOUT2=mmcm_sys4x_dqs,
),
Instance("PLLE2_BASE",
p_CLKIN1_PERIOD=16.0,
i_CLKIN1=self.clk125_div2,
p_CLKIN1_PERIOD=si5324_period*2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just keep that one on clk125?
The IDELAYCTRL 200MHz clock is asynchronous to the others.

i_I=clk125.p, i_IB=clk125.n,
o_O=self.clk125_buf,
o_ODIV2=self.clk125_div2)
i_I=bootstrap.p, i_IB=bootstrap.n,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to rename this, it could still be called clk125 to match the hw and migen pin names.

o_ODIV2=si5324_div2)

# required for qpll
self.clk125_buf = bootstrap_buf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Spaqin
Copy link
Contributor Author

Spaqin commented Aug 24, 2022

Although this looks to be what we need, I cannot get the system to boot anymore (no code is run at all, no UART output). Probably something I missed in the log.

With the BUFGMUX for cd_sys and just BUFG from MMCM for cd_sys4x, there will be a slew of warnings:

WARNING: [DRC REQP-1580] Phase alignment: Unsupported clocking topology used for ISERDESE2 ISERDESE2. This can result in corrupted data. The ISERDESE2/CLK / ISERDESE2/CLKDIV pins should be driven by the same source through the same buffer type or by a BUFIO / BUFR combination in order to have a proper phase relationship. Please refer to the Select I/O User Guide for supported clocking topologies of the chosen INTERFACE_TYPE mode.

For all ISERDESE2 and OSERDESE2 - makes sense, Vivado doesn't know the ISERDESE2 will be used after the clock hand-over, with proper relationship - but it does clutter the log.

cdr_clk_out = platform.request("si5324_clkout")

if freq == 125.0e6:
cdr_clk_period = 8.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation


platform.add_false_path_constraints(
self.clk125_buf,
self.cd_sys.clk, self.cd_sys4x.clk, self.cd_sys4x_dqs.clk, self.cd_clk200.clk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

False path between cd_sys and cd_sys4x sounds wrong.

@sbourdeauducq
Copy link
Member

Although this looks to be what we need, I cannot get the system to boot anymore

Can you blink an LED with a counter in the sys domain?

clk125 = platform.request("clk125_gtp")
platform.add_period_constraint(clk125, 8.)
self.clk125_buf = Signal()
self.clk125_div2 = Signal()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.pll_locked = CSRStatus()
self.mmcm_locked = CSRStatus()

# old "sys" clock, now bootstrap
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just say "bootstrap clock"

else:
raise NotImplementedError

self.cdr_clk_buf = Signal()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this attribute used?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see, you are moving the IBUFDS_GTE2 from ARTIQ kasli.py to here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler to use cdr_clk_clean_fabric / si5324_clkout_fabric.
BTW this (or your current code) will break DRTIO since you won't meet timing at the transceiver's TXDATA pins anymore. On DRTIO systems, the post-bootstrap system clock needs to be driven from the transceiver's TXOUTCLK pin. But we can deal with that later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need something like this (from UG482), but with both TXUSRCLK2 and TXUSRCLK on the same clock (as done currently).
image
BUFH if you can use it would be lighter on resources.


# at 62.5MHz bootstrap cd, will get around 1ms
# todo: check if can do it faster
delay_counter = Signal(16, reset=0xFFFF_FFFF)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That reset value is 32-bit.

# todo: check if can do it faster
delay_counter = Signal(16, reset=0xFFFF_FFFF)

# register to prevent glitches
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add no_retiming

i_switch.eq(clock_switch_sync.o)
]

#2-reg the done signal to prevent glitches
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use MultiReg instead

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...which you are already doing below. Here one register with no_retiming is enough.

self.submodules += fsm

fsm.act("START",
If(i_switch == 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just If(i_switch, ...

)

fsm.act("RESET_START",
NextValue(reset, 1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simply drive signals using eq() in fsm.act(). They are driven combinatorially then, with the current FSM state as input.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NextValue adds a register.

def __init__(self, sdram_controller_type="minicon", hw_rev=None,
**kwargs):
def __init__(self, sdram_controller_type="minicon", hw_rev=None, rtio_sys_merge=False,
clk_freq=125e6*14.5/16, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't clk_freq be 125e6 now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backwards compatibility - by default it should go back to old frequency matching p_CLKFBOUT_MULT_F=14.5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, OK. But changing this in ARTIQ at the same time as we update MiSoC shouldn't be difficult.
Also the timing usually passes anyway on Kasli 2 at 125MHz.

- use multireg instead of manual registers
- use combinatorial FSM where aplicable instead of reg

fsm.act("CLOCK_SWITCH",
reset.eq(1),
NextValue(o_switch, 1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just use done? It will definitely take more than a few cycles before the CPU reads the register, so it'll work.

reset = Signal()

# at 62.5MHz bootstrap cd, will get around 1ms
# todo: check if can do it faster
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't really matter, remove the todo.

@Spaqin Spaqin changed the title [WIP] SYS/RTIO clock merge SYS/RTIO clock merge Oct 5, 2022
@Spaqin Spaqin changed the title SYS/RTIO clock merge [WIP] SYS/RTIO clock merge Oct 5, 2022
@Spaqin
Copy link
Contributor Author

Spaqin commented Oct 6, 2022

Still in WIP because after moving the reboot to the firmware it seems rather unreliable - sometimes it reboots after firmware sets the clock switch bit, sometimes it doesn't and just continues onward.

@Spaqin Spaqin changed the title [WIP] SYS/RTIO clock merge SYS/RTIO clock merge Oct 6, 2022
@Spaqin
Copy link
Contributor Author

Spaqin commented Oct 6, 2022

PulseSynchronizer sometimes wouldn't synchronize the pulse - not sure why, but ~half the time it would fail. An even simpler approach seemed to have worked, though - reboots every time.

state[2].eq(o_switch),
]
# latch the clock switch (no going back)
self.sync.bootstrap += If(self.i_clk_sw, i_switch.eq(1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a register not a latch and it won't work reliably (clocked by bootstrap, input in sys).
If you do want a latch use Instance of something like FDPE (check Xilinx docs) and the PRE input, and make sure PRE does not glitch (e.g. register output with no_retiming).

@sbourdeauducq sbourdeauducq merged commit 4fb0730 into m-labs:master Oct 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants