Limited support for Distributed Memory / LUTRAM #20

Open
hansemro opened this issue Dec 20, 2023 · 27 comments

@hansemro

hansemro commented Dec 20, 2023

Issue Description

The memory_libmap pass in Yosys 0.18 and newer can synthesize LUTRAMs that nextpnr does not support, including:

  • RAMS32 (manually instantiated)
  • RAMD32 (manually instantiated)
  • RAMS64E (manually instantiated)
  • RAMD64E (manually instantiated)
  • RAM32X1S
  • RAM64X1S
  • RAM64X1S_1 (same as RAM64X1S with inverted clock)
  • RAM128X1S
  • RAM256X1S

Part of the issue stems from nextpnr not fully supporting all LUTRAMs in the Distributed RAM packer in xilinx/pack_dram.cc.

Resolving this should also address openXC7/demo-projects#6.

Tasks/Status

TODO: rewrite tasks

Development Branches

  • experimental branch with support for RAMS32, RAMS64E, RAM32X1S, RAM64X1S, RAM128X1S, RAM256X1S: https://github.com/hansemro/nextpnr-xilinx/commits/xc7-lutram-dev/
    • use with yosys 0.18 or newer to test
    • rebuild chipdb after building
    • currently broken, because this branch makes several incorrect assumptions:
      • it does not check for a negative z height on newly supported cells, which may break designs that use several of them.
      • it is missing the RAMS32/RAMD32 to LUT_OR_MEM transformations that use the DI1 and O6 ports.

References

See 018-clb-ram minitest. Build and view design checkpoint in Vivado.

https://f4pga.readthedocs.io/projects/prjxray/en/latest/architecture/dram_configuration.html

https://docs.xilinx.com/v/u/en-US/ug474_7Series_CLB

https://docs.xilinx.com/v/u/en-US/ug574-ultrascale-clb

https://docs.xilinx.com/r/en-US/ug953-vivado-7series-libraries

https://docs.xilinx.com/r/en-US/ug974-vivado-ultrascale-libraries

https://www.xilinx.com/content/dam/xilinx/support/documents/sw_manuals/xilinx14_7/7series_hdl.pdf

https://docs.amd.com/v/u/en-US/7series_hdl

https://github.com/Xilinx/XilinxUnisimLibrary/tree/master/verilog/src/unisims

@hansfbaier
Collaborator

Very good, thanks!

@hansemro
Author

Added support for RAM128X1S and RAM256X1S, though it is not sufficiently tested. I was able to build litex-ddr-kc705 after the latest changes. However, I now seem to be running into DDR memtest issues: https://gist.github.com/hansemro/5f48f4098e59f9db2e34ae25cb0b6ecd

@hansfbaier
Collaborator

@hansemro Wow, that was quick! We might want to write some basic tests here:
https://github.com/openXC7/primitive-tests
Debugging the issue inside that complex design is probably too cumbersome.

@hansemro
Author

@hansfbaier Sounds good. I'll try to write some tests soon. DDR issue could be totally unrelated to this.

@hansfbaier
Collaborator

@hansemro Yes, very likely your code works. I never got 8 modules working nicely with OpenXC7. At some point the timing just falls apart, because of congestion, I suppose. See my comment on your gist.

@hansemro
Author

Rebased experimental branch on 8120acd (current stable-backports)

@hansfbaier
Collaborator

@hansemro great, thanks

@hansemro
Author

hansemro commented Dec 20, 2023

Fixed up RAM32X1D not creating RAMD32 instances (was creating RAMD64 instances previously) and made sure RAM32X1S is handled in pack_dram (forgot this one apparently).

@hansemro
Author

hansemro commented Dec 20, 2023

Made initial tests: https://github.com/hansemro/primitive-tests/commits/lutram-tests/

  • Targets KC705 with its 200 MHz differential clock.
  • Uses basic clock division and reset propagation so that I can visibly check patterns via LED.
  • Basic 1,0,1,0,... write pattern.
  • Simple FSM: Reset -> Clear -> Write -> Read -> Finish

Notably nextpnr-xilinx hangs with RAM256X1S. Seems #10 made this observation months ago. I'll try to spend some time debugging this.

@hansemro
Author

RAM256X1S and RAM256X1D were not handled correctly since their address pins are in an array A[N:0] rather than specified individually (A0, A1, A2, ... ). I'll need to validate every transform rule anyway...
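To make the two naming styles concrete, here is a rough sketch of how an address pin name could be built for either style. The helper `address_port_name` is hypothetical and is not taken from nextpnr or from the development branch; it only illustrates the difference between individual pins and array elements.

```cpp
#include <string>

// Hypothetical helper: build the port name of address bit `bit`.
// Cells such as RAM256X1S expose the address as an array (A[7:0]), while
// cells such as RAM64X1S use individual pins (A0 ... A5).
std::string address_port_name(const std::string &base, int bit, bool is_array)
{
    if (is_array)
        return base + "[" + std::to_string(bit) + "]"; // e.g. A[7]
    return base + std::to_string(bit);                 // e.g. A5
}
```

A packer that only ever queries the second form would find no address nets on an array-style cell, which matches the symptom described above.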

@hansfbaier
Collaborator

@hansemro I pushed an MMCM-fix to stable-backports. The MMCM should work now, if you have time to try.

@hansemro
Author

hansemro commented Dec 21, 2023

@hansfbaier Thanks for the heads up. I will get to it at some point. I decided it is better for me to fix dual-port LUTRAMs before moving forward, because everything is broken (for at least xc7, not sure about ultrascale).

For example, while tracing how RAM128X1D is handled, twice as many RAMD64Es are created with z/height decremented for each one. We end up with negative z?! Not sure what the intention was but this does not seem right to me.

@hansfbaier
Collaborator

Yes definitely, that is more important. Thanks for working on this.

@hansemro
Author

I misspoke since I was testing RAM256X1D (thought I was testing RAM256X1S) which does not fit in xc7 anyhow. Still, yosys should not allow unsupported LUTRAMs to be synthesized!

@hansemro
Author

hansemro commented Dec 21, 2023

Resolved two issues:

  • Fixed address port detection for single port LUTRAM with c073116
  • Fixed MUXF tree z offset for RAM256X1S with 4792453
    • This fixes placer stalling when handling designs with RAM256X1S
    • Will need to revisit for Ultrascale since they have different offsets (more LUTs in a slice).

@hansfbaier
Collaborator

@hansemro How are things going? Have you been able to test the changes?

@hansemro
Author

hansemro commented Feb 6, 2024

@hansfbaier

> Have you been able to test the changes?

MMCM is confirmed working with multiple clock outputs on KC705 though with a BUFG on all clock outputs. fasm2frames would throw segment DB errors if I didn't have them. Test branch: https://github.com/hansemro/primitive-tests/commits/mmcm-blinky-kc705-db-error/

Interestingly, the first clock output didn't require me to place a BUFG, though it should probably have one.

> How are things going?

Things were going well until I had to handle each LUTRAM as an edge case. Initially, I was less bothered to write things down, but now I feel it is appropriate to actually spend time documenting the port/parameter transformations for all cells (including ultrascale-only cells). I intend to resume work on validation and get xc7 cells covered.

Anyhow, it turns out I made some incorrect assumptions about some things. Here are some TODOs:

  1. create_dram32_lut should be able to map LUT5 or LUT6 BEL site, but currently doesn't
  2. check whether address ports are connected to both A{6:1} and WA{8:1} (WA{9:1} for ultrascale) ports in LUT6/LUT5 BEL for single-port LUTRAM

I will elaborate in a follow-up post on how LUTRAMs are transformed to RAMS/RAMD primitive cells, and how those are in turn mapped to LUT5/LUT6 BELs.

@hansfbaier
Collaborator

hansfbaier commented Feb 6, 2024

Yes, I had similar observations.
CLKOUT 1-3 had missing pips. CLKOUT 0,5,6 worked fine out of the box:
https://github.com/openXC7/primitive-tests/blob/main/mmcm-blinky-kintex/blinky.v
But only on Kintex. On other series all CLKOUT ports were fine.

@hansfbaier
Collaborator

Thanks for the update!

@hansemro
Author

hansemro commented Feb 8, 2024

Naming Notation:

I'll define the following name notation to fold port/parameter names:

  • A comma-separated list or range enclosed by curly brackets {} maps to individual names with an entry from the list/range.
    • Curly brackets may be inserted anywhere in the name.
    • List example: ADR{3, 2, 1, 0} maps to ADR3, ADR2, ADR1, ADR0
    • Range example: {D:A}_O maps to D_O, C_O, B_O, A_O
  • Verilog-style array with a range enclosed by square brackets [].
    • The range indicates the bit width of the signal/parameter. If there are no square brackets, assume 1-bit width.
    • Square brackets can only be placed at the end of a name.
    • Example: ADR[3:0] maps to ADR[3], ADR[2], ADR[1], ADR[0]
  • Combined example:
    • ADDR{D:A}[4:0] maps to ADDRD[4:0], ADDRC[4:0], ADDRB[4:0], ADDRA[4:0]
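The curly-bracket part of this notation can be expanded mechanically. The following is a minimal sketch, assuming a single brace group per name and single-character or integer range ends; `expand_names` is a hypothetical helper and not part of nextpnr or Yosys. Square-bracket array suffixes are passed through unchanged.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: expands the first {...} group of a folded name into
// individual names. A trailing [msb:lsb] array suffix is kept as-is.
std::vector<std::string> expand_names(const std::string &folded)
{
    const std::string::size_type open = folded.find('{');
    const std::string::size_type close = folded.find('}');
    if (open == std::string::npos || close == std::string::npos || close < open)
        return {folded}; // no brace group: the name maps to itself

    // Strip whitespace so that "{3, 2, 1, 0}" and "{3,2,1,0}" behave the same.
    std::string body;
    for (char c : folded.substr(open + 1, close - open - 1))
        if (!std::isspace(static_cast<unsigned char>(c)))
            body += c;

    std::vector<std::string> entries;
    const std::string::size_type colon = body.find(':');
    if (colon != std::string::npos) {
        // Range such as {D:A} or {8:1}; both ends are included.
        const std::string first = body.substr(0, colon);
        const std::string last = body.substr(colon + 1);
        if (first.empty() || last.empty())
            return {folded};
        const bool numeric = std::isdigit(static_cast<unsigned char>(first[0])) != 0;
        const int from = numeric ? std::stoi(first) : first[0];
        const int to = numeric ? std::stoi(last) : last[0];
        const int step = (from <= to) ? 1 : -1;
        for (int v = from; v != to + step; v += step)
            entries.push_back(numeric ? std::to_string(v) : std::string(1, static_cast<char>(v)));
    } else {
        // Comma-separated list such as {3,2,1,0}.
        std::string::size_type start = 0;
        while (true) {
            const std::string::size_type comma = body.find(',', start);
            entries.push_back(body.substr(start, comma - start));
            if (comma == std::string::npos)
                break;
            start = comma + 1;
        }
    }

    std::vector<std::string> names;
    for (const std::string &entry : entries)
        names.push_back(folded.substr(0, open) + entry + folded.substr(close + 1));
    return names;
}
```

For instance, calling `expand_names("ADDR{D:A}[4:0]")` yields the four names from the combined example, each still carrying its `[4:0]` suffix.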

LUTRAM Cell Table:

Additional notes:

  • LUT_OR_MEM5/LUT_OR_MEM6 BEL cannot be instantiated manually. Port and parameter details of these BELs are purely behavioral.
  • LUT_OR_MEM6 BEL has SIN and MC31 pins which are used for shift register functionality and can be ignored for LUTRAM.

| Cell | Cell Type | US-only | *INIT* Parameter | *CLK_INVERTED Parameter | Clock Input | Write Enable Input | Write Data Input | Write Select Input | Write Address Input | Read Address Input | Read Data Output |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LUT_OR_MEM5 | BEL | false | INIT[31:0] | N/A | CLK | WE | DI1 | N/A | WA{5:1} | A{5:1} | O5 |
| LUT_OR_MEM6 | BEL | false | INIT[63:0] | N/A | CLK | WE | DI2 | N/A | US ? WA{9:1} : WA{8:1} | A{6:1} | O6 |
| RAMS32 | LUTRAM Primitive | false | INIT[31:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | ADR{4:0} | ADR{4:0} | O |
| RAMD32 | LUTRAM Primitive | false | INIT[31:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | WADR{4:0} | RADR{4:0} | O |
| RAMD32M64 | LUTRAM Primitive | true | INIT[63:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | WADR{4:0} | RADR{5:0} | O |
| RAM32X1S | LUTRAM | false | INIT[31:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{4:0} | A{4:0} | O |
| RAM32X1D | LUTRAM | false | INIT[31:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{4:0} | A{4:0}; DPRA{4:0} | SPO; DPO |
| RAM32X16DR8 | Asymmetric LUTRAM | true | N/A? | IS_WCLK_INVERTED | WCLK | WE | DI{H:A}[1:0] | N/A | ADDRH[4:0]; ADDR{G:A}[5:0] | ADDRH[4:0]; ADDR{G:A}[5:0] | DOH[1:0]; DO{G:A} |
| RAM32M | SelectRAM | false | INIT_{D:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | DI{D:A}[1:0] | N/A | ADDR{D:A}[4:0] | ADDR{D:A}[4:0] | DO{D:A}[1:0] |
| RAM32M16 | SelectRAM | true | INIT_{H:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | DI{H:A}[1:0] | N/A | ADDR{H:A}[4:0] | ADDR{H:A}[4:0] | DO{H:A}[1:0] |
| RAMS64E | LUTRAM Primitive | false | INIT[63:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | (WADR{7:6}), ADR{5:0} | ADR{5:0} | O |
| RAMS64E1 | LUTRAM Primitive | true? | INIT[63:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | (WADR{8:6}), ADR{5:0} | ADR{5:0} | O |
| RAMD64E | LUTRAM Primitive | false | INIT[63:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | WADR{7:0} | RADR{5:0} | O |
| RAM64X1S | LUTRAM | false | INIT[63:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{5:0} | A{5:0} | O |
| RAM64X1D | LUTRAM | false | INIT[63:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{5:0} | A{5:0}; DPRA{5:0} | SPO; DPO |
| RAM64X8SW | SelectRAM | true | INIT_{H:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | D | WSEL[2:0] | A[5:0] | A[5:0] | O[7:0] |
| RAM64M | SelectRAM | false | INIT_{D:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | DI{D:A} | N/A | ADDR{D:A}[5:0] | ADDR{D:A}[5:0] | DO{D:A} |
| RAM64M8 | SelectRAM | true | INIT_{H:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | DI{H:A} | N/A | ADDR{H:A}[5:0] | ADDR{H:A}[5:0] | DO{H:A} |
| RAM128X1S | LUTRAM | false | INIT[127:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{6:0} | A{6:0} | O |
| RAM128X1D | LUTRAM | false | INIT[127:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A[6:0] | A[6:0]; DPRA[6:0] | SPO; DPO |
| RAM256X1S | LUTRAM | false | INIT[255:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A[7:0] | A[7:0] | O |
| RAM256X1D | LUTRAM | true | INIT[255:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A[7:0] | A[7:0]; DPRA[7:0] | SPO; DPO |
| RAM512X1S | LUTRAM | true | INIT[511:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A[8:0] | A[8:0] | O |

@hansemro
Author

hansemro commented Feb 8, 2024

XC7 LUTRAM to LUTRAM Primitive Transformations:

LUTRAMs are broken down to primitive cell(s) that will eventually map to SLICEM LUT_OR_MEM6/LUT_OR_MEM5 BEL site(s) once placed.

Convention:

  • Title: Source Cell to Destination Cell
  • Transformations: Source Name to Destination Name (if owner is not specified)

RAM32X1S -> 1x RAMS32

Cell Rules:

  • RAMS32 (6LUT) with /SP appended to name
  • No MUXF cells
  • 1 output

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • A{4:0} to /SP's ADR{4:0}
  • D to I
  • O to O
  • WCLK to CLK
  • WE to WE

RAM32X1D -> 2x RAMD32

Cell Rules:

  • RAMD32 (D6LUT/B6LUT) with /SP appended to name
  • RAMD32 (C6LUT/A6LUT) with /DP appended to name
  • No MUXF cells
  • 2 outputs

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • A{4:0} to /SP's RADR{4:0}
  • A{4:0} to /SP's WADR{4:0}
  • DPRA{4:0} to /DP's RADR{4:0}
  • A{4:0} to /DP's WADR{4:0}
  • D to all I
  • SPO to /SP's O
  • DPO to /DP's O
  • WCLK to all CLK
  • WE to all WE

RAM32M -> 2x RAMS32 + 6x RAMD32

Cell Rules:

  • RAMS32 (D5LUT) with /RAMD appended to name
  • RAMS32 (D6LUT) with /RAMD_D1 appended to name
  • RAMD32 (C5LUT) with /RAMC appended to name
  • RAMD32 (C6LUT) with /RAMC_D1 appended to name
  • RAMD32 (B5LUT) with /RAMB appended to name
  • RAMD32 (B6LUT) with /RAMB_D1 appended to name
  • RAMD32 (A5LUT) with /RAMA appended to name
  • RAMD32 (A6LUT) with /RAMA_D1 appended to name
  • No MUXF cells
  • 8 outputs

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • WCLK to all CLK
  • WE to all WE
  • DIA[0] to /RAMA's I
  • DIA[1] to /RAMA_D1's I
  • DIB[0] to /RAMB's I
  • DIB[1] to /RAMB_D1's I
  • DIC[0] to /RAMC's I
  • DIC[1] to /RAMC_D1's I
  • DID[0] to /RAMD's I
  • DID[1] to /RAMD_D1's I
  • DOA[0] to /RAMA's O
  • DOA[1] to /RAMA_D1's O
  • DOB[0] to /RAMB's O
  • DOB[1] to /RAMB_D1's O
  • DOC[0] to /RAMC's O
  • DOC[1] to /RAMC_D1's O
  • DOD[0] to /RAMD's O
  • DOD[1] to /RAMD_D1's O
  • ADDRA[4:0] to /RAMA and /RAMA_D1's RADR{4:0}
  • ADDRB[4:0] to /RAMB and /RAMB_D1's RADR{4:0}
  • ADDRC[4:0] to /RAMC and /RAMC_D1's RADR{4:0}
  • ADDRD[4:0] to all WADR{4:0}
  • ADDRD[4:0] to /RAMD and /RAMD_D1's ADR{4:0}

RAM64X1S -> RAMS64E

Cell Rules:

  • RAMS64E (6LUT)
  • No MUXF cells
  • 1 output

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • WCLK to CLK
  • WE to WE
  • D to I
  • O to O
  • A{5:0} to ADR{5:0}
  • RAMS64E's WADR6 and WADR7 ports not connected

RAM64X1D -> 2x RAMD64E

Cell Rules:

  • RAMD64E (D6LUT/B6LUT) with /SP appended to name
  • RAMD64E (C6LUT/A6LUT) with /DP appended to name
  • No MUXF cells
  • 2 outputs

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • WCLK to all CLK
  • WE to all WE
  • A{5:0} to /SP's RADR{5:0}
  • A{5:0} to all WADR{5:0}
  • DPRA{5:0} to /DP's RADR{5:0}
  • D to all I
  • SPO to /SP's O
  • DPO to /DP's O

RAM64M -> 4x RAMD64E

Cell Rules:

  • RAMD64E (A6LUT) with /RAMA appended to name
  • RAMD64E (B6LUT) with /RAMB appended to name
  • RAMD64E (C6LUT) with /RAMC appended to name
  • RAMD64E (D6LUT) with /RAMD appended to name
  • No MUXF cells
  • 4 outputs

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • WCLK to all CLK
  • WE to all WE
  • DI{A:D} to /RAM{A:D}'s I
  • DO{A:D} to /RAM{A:D}'s O
  • ADDR{A:D}[5:0] to /RAM{A:D}'s RADR{5:0}
  • ADDRD[5:0] to all WADR{5:0}
  • all WADR6 and WADR7 RAMD64E ports are not connected

RAM128X1S -> 2x RAMS64E

Cell Rules:

  • RAMS64E (D6LUT/B6LUT) with /LOW appended to name
  • RAMS64E (C6LUT/A6LUT) with /HIGH appended to name
  • MUXF7 cell with /F7 for single output

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • WCLK to all CLK
  • WE to all WE
  • D to all I
  • A{5:0} to all ADR{5:0}
  • A6 to all WADR6
  • A6 to /F7's S
  • all WADR7 ports are not connected
  • /LOW's O to /F7's I0
  • /HIGH's O to /F7's I1
  • O to /F7's O

RAM128X1D -> 4x RAMD64E

Cell Rules:

  • RAMD64E (D6LUT) with /SP.LOW appended to name
  • RAMD64E (C6LUT) with /SP.HIGH appended to name
  • RAMD64E (B6LUT) with /DP.LOW appended to name
  • RAMD64E (A6LUT) with /DP.HIGH appended to name
  • MUXF7 with /F7.SP appended to name for single port output
  • MUXF7 with /F7.DP appended to name for dual port output
  • 2 outputs

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • WCLK to all CLK
  • WE to all WE
  • D to all I
  • A[6:0] to all WADR{6:0}
  • A[5:0] to /SP.LOW's RADR{5:0}
  • A[5:0] to /SP.HIGH's RADR{5:0}
  • A[6] to /F7.SP's S
  • DPRA[5:0] to /DP.LOW's RADR{5:0}
  • DPRA[5:0] to /DP.HIGH's RADR{5:0}
  • DPRA[6] to /F7.DP's S
  • /SP.LOW's O to /F7.SP's I0
  • /SP.HIGH's O to /F7.SP's I1
  • /DP.LOW's O to /F7.DP's I0
  • /DP.HIGH's O to /F7.DP's I1
  • SPO to /F7.SP's O
  • DPO to /F7.DP's O

RAM256X1S -> 4x RAMS64E

Cell Rules:

  • RAMS64E (D6LUT) with /RAMS64E_D appended to name
  • RAMS64E (C6LUT) with /RAMS64E_C appended to name
  • RAMS64E (B6LUT) with /RAMS64E_B appended to name
  • RAMS64E (A6LUT) with /RAMS64E_A appended to name
  • MUXF7 with /F7.A appended to name
  • MUXF7 with /F7.B appended to name
  • MUXF8 with /F8 appended to name for single output

Parameter Rules:

  • IS_WCLK_INVERTED to IS_CLK_INVERTED

Port Rules:

  • WCLK to all CLK
  • WE to all WE
  • D to all I
  • A[5:0] to all ADR{5:0}
  • A[7:6] to all WADR{7:6}
  • A[6] to /F7.A's S
  • A[6] to /F7.B's S
  • A[7] to /F8's S
  • /RAMS64E_D's O to /F7.B's I0
  • /RAMS64E_C's O to /F7.B's I1
  • /RAMS64E_B's O to /F7.A's I0
  • /RAMS64E_A's O to /F7.A's I1
  • /F7.B's O to /F8's I0
  • /F7.A's O to /F8's I1
  • O to /F8's O
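The port rules above determine which RAMS64E reaches O for a given address. The following behavioral sketch models only the MUXF7/MUXF8 tree on the read path, assuming the usual 2:1 mux behavior (I1 is selected when S is 1, I0 otherwise). The function names are hypothetical and nothing here is nextpnr code.

```cpp
#include <cstdint>
#include <string>

// 2:1 mux as used for MUXF7/MUXF8: selects i1 when s is set, otherwise i0.
template <typename T> T muxf(const T &i0, const T &i1, bool s) { return s ? i1 : i0; }

// Returns the name suffix of the RAMS64E whose output reaches O for an
// 8-bit RAM256X1S address, following the port rules listed above.
std::string ram256x1s_selected_lut(uint8_t addr)
{
    const bool a6 = ((addr >> 6) & 1) != 0; // S of /F7.A and /F7.B
    const bool a7 = ((addr >> 7) & 1) != 0; // S of /F8
    const std::string f7b = muxf<std::string>("/RAMS64E_D", "/RAMS64E_C", a6);
    const std::string f7a = muxf<std::string>("/RAMS64E_B", "/RAMS64E_A", a6);
    return muxf(f7b, f7a, a7);
}
```

Under these assumptions, A[7:6] = 00 selects /RAMS64E_D, 01 selects /RAMS64E_C, 10 selects /RAMS64E_B, and 11 selects /RAMS64E_A.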

@hansemro
Author

hansemro commented Feb 8, 2024

LUTRAM Primitive to BEL Transformations:

Additional notes:

  • Unclear to me how CLKINV bit is set for SLICE_LUTX in nextpnr/fasm.
    • Also unclear to me how CLKINV status for all BELs in the SLICE site are checked to match.
  • Some of the transformation details have yet to be implemented or verified in my development branch.

RAMD64E -> LUT_OR_MEM6 BEL

  • nextpnr-specific rules:
    • set type to id_SLICE_LUTX
    • transform parameter IS_CLK_INVERTED to IS_WCLK_INVERTED
    • set attribute X_LUT_AS_DRAM to 1
  • RADR{5:0} to A{6:1}
  • WADR{7:0} to WA{8:1}
  • I to DI1
  • O to O6
  • WE to WE
  • CLK to CLK

RAMS64E -> LUT_OR_MEM6 BEL

  • nextpnr-specific rules:
    • set type to id_SLICE_LUTX
    • transform parameter IS_CLK_INVERTED to IS_WCLK_INVERTED
    • set attribute X_LUT_AS_DRAM to 1
  • ADR{5:0} to A{6:1}
  • ADR{5:0} to WA{6:1}
  • WADR{7:6} to WA{8:7}
  • I to DI1
  • O to O6
  • WE to WE
  • CLK to CLK
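The address part of the RAMS64E rules can be written as a small lookup. This is a sketch only: `rams64e_bel_pins` is a hypothetical helper, it covers just the address ports, and it assumes the xc7 pin names listed above (WA{8:1}).

```cpp
#include <string>
#include <vector>

// Hypothetical helper: LUT_OR_MEM6 BEL pins driven by one RAMS64E address bit.
// `is_wadr` selects the WADR{7:6} ports; otherwise the port is ADR{5:0}.
// An out-of-range bit yields an empty list.
std::vector<std::string> rams64e_bel_pins(int bit, bool is_wadr)
{
    if (is_wadr) {
        if (bit < 6 || bit > 7)
            return {};
        return {"WA" + std::to_string(bit + 1)}; // WADR{7:6} to WA{8:7}
    }
    if (bit < 0 || bit > 5)
        return {};
    // ADR{5:0} feeds both the read address A{6:1} and the write address WA{6:1}.
    return {"A" + std::to_string(bit + 1), "WA" + std::to_string(bit + 1)};
}
```

The point of interest is that each ADR bit fans out to two BEL pins, which is the connection the earlier TODO about A{6:1} and WA{8:1} asks to be checked.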

RAMD32 -> LUT_OR_MEM6 BEL

  • nextpnr-specific rules:
    • set type to id_SLICE_LUTX
    • transform parameter IS_CLK_INVERTED to IS_WCLK_INVERTED
    • set attribute X_LUT_AS_DRAM to 1
  • RADR{4:0} to A{5:1}
  • WADR{4:0} to WA{5:1}
  • I to DI1, or to DI2 if DI1 is already used by LUT_OR_MEM5
  • O to O6
  • WE to WE
  • CLK to CLK

RAMD32 -> LUT_OR_MEM5 BEL

  • nextpnr-specific rules:
    • set type to id_SLICE_LUTX
    • transform parameter IS_CLK_INVERTED to IS_WCLK_INVERTED
    • set attribute X_LUT_AS_DRAM to 1
  • RADR{4:0} to A{5:1}
  • WADR{4:0} to WA{5:1}
  • I to DI1
  • O to O5
  • WE to WE
  • CLK to CLK

RAMS32 -> LUT_OR_MEM6 BEL

  • nextpnr-specific rules:
    • set type to id_SLICE_LUTX
    • transform parameter IS_CLK_INVERTED to IS_WCLK_INVERTED
    • set attribute X_LUT_AS_DRAM to 1
  • ADR{4:0} to A{5:1}
  • ADR{4:0} to WA{5:1}
  • I to DI1, or to DI2 if DI1 is already used by LUT_OR_MEM5
  • O to O6
  • WE to WE
  • CLK to CLK

RAMS32 -> LUT_OR_MEM5 BEL

  • nextpnr-specific rules:
    • set type to id_SLICE_LUTX
    • transform parameter IS_CLK_INVERTED to IS_WCLK_INVERTED
    • set attribute X_LUT_AS_DRAM to 1
  • ADR{4:0} to A{5:1}
  • ADR{4:0} to WA{5:1}
  • I to DI1
  • O to O5
  • WE to WE
  • CLK to CLK

@hansfbaier
Collaborator

Thanks for the effort! I am looking forward to what you will come up with!

@hansemro
Author

hansemro commented Feb 12, 2024

Issue: nextpnr checks INIT{A:D} instead of INIT_{A:D} parameters for RAM32M/RAM64M

While working on an INIT parameter test, I noticed that the INIT parameters for RAM32M/RAM64M were not being set with correct values in the FASM result.

        if (ci->params.count(ctx->id(stringf("INIT%c", 'A' + i))))
            dram->params[ctx->id("INIT")] = ci->params[ctx->id(stringf("INIT%c", 'A' + i))];
    } else {
        for (int j = 0; j < dbits; j++) {
            NetInfo *di = get_net_or_empty(ci, ctx->id(stringf("DI%c[%d]", 'A' + i, j)));
            NetInfo *dout = get_net_or_empty(ci, ctx->id(stringf("DO%c[%d]", 'A' + i, j)));
            disconnect_port(ctx, ci, ctx->id(stringf("DI%c[%d]", 'A' + i, j)));
            disconnect_port(ctx, ci, ctx->id(stringf("DO%c[%d]", 'A' + i, j)));
            CellInfo *dram = create_dram32_lut(stringf("%s/DPR%d_%d", ctx->nameOf(ci), i, j), base, dcs,
                                               address, di, dout, (j == 0), zoffset + i);
            if (base == nullptr)
                base = dram;
            if (ci->params.count(ctx->id(stringf("INIT%c", 'A' + i)))) {
                auto orig_init =
                        ci->params.at(ctx->id(stringf("INIT%c", 'A' + i))).extract(0, 64).as_bits();

Merely adding underscores does not immediately solve the issue, so I will need to look more into this later.
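For reference, the two parameter names differ by a single character in the format string. The sketch below builds both variants with plain `std::string` instead of nextpnr's `stringf`; `init_param_name` is a hypothetical helper used only to show the difference.

```cpp
#include <string>

// Hypothetical helper: name of the INIT parameter for port group `i`
// (0 = A, 1 = B, ...). With the underscore this gives INIT_A..INIT_D, the
// names listed for RAM32M/RAM64M; without it, the names the excerpt looks up.
std::string init_param_name(int i, bool with_underscore = true)
{
    std::string name = with_underscore ? "INIT_" : "INIT";
    name += static_cast<char>('A' + i);
    return name;
}
```

A lookup with the second form finds nothing on a cell that only carries INIT_A..INIT_D, so the `params.count(...)` checks in the excerpt would not take the INIT branch.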

WIP INIT Parameter Test: https://github.com/hansemro/primitive-tests/tree/xc7-lutram-tests/lutram-tests/init-test

@hansemro
Author

hansemro commented Feb 13, 2024

Issue: nextpnr checks INIT{A:D} instead of INIT_{A:D} parameters for RAM32M/RAM64M

This should be fixed in this branch: https://github.com/hansemro/nextpnr-xilinx/tree/fix-ram32m-ram64m-init

However, I am noticing some discrepancies compared to Vivado that I will need to verify. Notice how the upper and lower 32 bits of the ?LUT.INIT pattern are swapped.

RAM32M NextPNR fasm result:

CLBLM_L_X84Y126.SLICEM_X0.ALUT.INIT[63:0] = 64'b0000000000000000111011100100010000000000000000000101000001010000
CLBLM_L_X84Y126.SLICEM_X0.ALUT.DI1MUX.AI
CLBLM_L_X84Y126.SLICEM_X0.ALUT.SMALL
CLBLM_L_X84Y126.SLICEM_X0.ALUT.RAM
CLBLM_L_X84Y126.SLICEM_X0.AOUTMUX.O5
CLBLM_L_X84Y126.SLICEM_X0.BLUT.INIT[63:0] = 64'b0000000000000000111011100100010000000000000000001111101011111010
CLBLM_L_X84Y126.SLICEM_X0.BLUT.DI1MUX.BI
CLBLM_L_X84Y126.SLICEM_X0.BLUT.SMALL
CLBLM_L_X84Y126.SLICEM_X0.BLUT.RAM
CLBLM_L_X84Y126.SLICEM_X0.BOUTMUX.O5
CLBLM_L_X84Y126.SLICEM_X0.CLUT.INIT[63:0] = 64'b0000000000000000000100011011101100000000000000001010111110101111
CLBLM_L_X84Y126.SLICEM_X0.CLUT.DI1MUX.CI
CLBLM_L_X84Y126.SLICEM_X0.CLUT.SMALL
CLBLM_L_X84Y126.SLICEM_X0.CLUT.RAM
CLBLM_L_X84Y126.SLICEM_X0.COUTMUX.O5
CLBLM_L_X84Y126.SLICEM_X0.DLUT.INIT[63:0] = 64'b0000000000000000000100011011101100000000000000000000010100000101
CLBLM_L_X84Y126.SLICEM_X0.DLUT.SMALL
CLBLM_L_X84Y126.SLICEM_X0.DLUT.RAM
CLBLM_L_X84Y126.SLICEM_X0.DOUTMUX.O5
CLBLM_L_X84Y126.SLICEM_X0.WEMUX.CE
CLBLM_L_X84Y126.SLICEM_X0.NOCLKINV
CLBLM_L_X84Y126.SLICEL_X1.NOCLKINV

RAM32M Vivado bit2fasm result:

CLBLM_L_X48Y95.SLICEL_X1.NOCLKINV
CLBLM_L_X48Y95.SLICEL_X1.PRECYINIT.C0
CLBLM_L_X48Y95.SLICEM_X0.ALUT.DI1MUX.BDI1_BMC31
CLBLM_L_X48Y95.SLICEM_X0.ALUT.INIT[46:0] = 47'b10100000101000000000000000000001110111001000100
CLBLM_L_X48Y95.SLICEM_X0.ALUT.RAM
CLBLM_L_X48Y95.SLICEM_X0.ALUT.SMALL
CLBLM_L_X48Y95.SLICEM_X0.AOUTMUX.O5
CLBLM_L_X48Y95.SLICEM_X0.BLUT.DI1MUX.DI_CMC31
CLBLM_L_X48Y95.SLICEM_X0.BLUT.INIT[47:0] = 48'b111110101111101000000000000000001110111001000100
CLBLM_L_X48Y95.SLICEM_X0.BLUT.RAM
CLBLM_L_X48Y95.SLICEM_X0.BLUT.SMALL
CLBLM_L_X48Y95.SLICEM_X0.BOUTMUX.O5
CLBLM_L_X48Y95.SLICEM_X0.CLUT.DI1MUX.DI_DMC31
CLBLM_L_X48Y95.SLICEM_X0.CLUT.INIT[47:0] = 48'b101011111010111100000000000000000001000110111011
CLBLM_L_X48Y95.SLICEM_X0.CLUT.RAM
CLBLM_L_X48Y95.SLICEM_X0.CLUT.SMALL
CLBLM_L_X48Y95.SLICEM_X0.COUTMUX.O5
CLBLM_L_X48Y95.SLICEM_X0.DLUT.INIT[42:0] = 43'b1010000010100000000000000000001000110111011
CLBLM_L_X48Y95.SLICEM_X0.DLUT.RAM
CLBLM_L_X48Y95.SLICEM_X0.DLUT.SMALL
CLBLM_L_X48Y95.SLICEM_X0.DOUTMUX.O5
CLBLM_L_X48Y95.SLICEM_X0.NOCLKINV
CLBLM_L_X48Y95.SLICEM_X0.PRECYINIT.C0
CLBLM_L_X48Y95.SLICEM_X0.WEMUX.CE
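The swap can be checked numerically. The ALUT value from the nextpnr listing is 0x0000EE4400005050, and the Vivado ALUT value, zero-extended from 47 to 64 bits, is 0x000050500000EE44. The sketch below (with a hypothetical function name) exchanges the two 32-bit halves. It only illustrates the observation and does not establish which ordering is correct or where the swap originates.

```cpp
#include <cstdint>

// Exchange the upper and lower 32-bit halves of a 64-bit LUT INIT value.
constexpr uint64_t swap_init_halves(uint64_t init)
{
    return (init << 32) | (init >> 32);
}

// ALUT values from the two listings above (Vivado value zero-extended to 64 bits).
static_assert(swap_init_halves(0x0000EE4400005050ULL) == 0x000050500000EE44ULL,
              "nextpnr ALUT INIT with halves swapped equals the Vivado ALUT INIT");
```

The BLUT values behave the same way: 0x0000EE440000FAFA from nextpnr against 0x0000FAFA0000EE44 from Vivado.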

@hansemro
Author

hansemro commented Feb 13, 2024

Issue: nextpnr does nothing with IS_*CLK_INVERTED property for LUTRAM cells

While working on a CLKINV property test, I noticed that nextpnr does not set the CLKINV bit when the IS_*CLK_INVERTED property of a LUTRAM is set. Instead, the nextpnr fasm writer ignores the property and sets NOCLKINV for the SLICEM site.

Also note that, on XC7, LUTRAMs and FFs share the same CLKINV routing BEL. However, on Ultrascale(+), LUTRAMs have their own dedicated clock inverter provided by the LCLKINV routing BEL.
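Because XC7 LUTRAMs and FFs share one CLKINV, every clocked cell placed in a slice has to agree on clock inversion before a single CLKINV/NOCLKINV feature can be emitted. The sketch below shows one way such a check could look. The `ClkInv` enum and `slice_clkinv` function are hypothetical and do not correspond to existing nextpnr code.

```cpp
#include <vector>

// Result of combining the clock-inversion settings of all cells in a slice.
enum class ClkInv { NotInverted, Inverted, Conflict };

// Hypothetical check: takes the IS_*CLK_INVERTED values of all clocked cells
// placed in one XC7 slice. Returns the shared setting, or Conflict if the
// cells disagree. An empty slice is treated as not inverted.
ClkInv slice_clkinv(const std::vector<bool> &cell_inverted)
{
    if (cell_inverted.empty())
        return ClkInv::NotInverted;
    const bool first = cell_inverted.front();
    for (bool inverted : cell_inverted)
        if (inverted != first)
            return ClkInv::Conflict;
    return first ? ClkInv::Inverted : ClkInv::NotInverted;
}
```

On Ultrascale(+), where LUTRAMs have their own LCLKINV, the same check would presumably be applied to LUTRAM cells and FFs separately.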

WIP CLKINV property test: https://github.com/hansemro/primitive-tests/tree/xc7-lutram-tests/lutram-tests/clkinv-test

@hansfbaier
Collaborator

Great work!
