marocchino: Add initial support for new OpenRISC core #1161

Merged: 1 commit, Jan 17, 2022

stffrdhrn
Contributor

The Marocchino is a superscalar OpenRISC implementation which has
advanced features including 64-bit double FPU support.

Much of the Python module here is copied from mor1kx, as are the *.S
and *.h files.

@stffrdhrn
Contributor Author

stffrdhrn commented Jan 14, 2022

This core was created by @bandvig. I am testing booting Linux on it, but it is failing: the LiteX BIOS works fine, but liftoff to Linux fails. Maybe we should not merge until this is sorted out?

The same works fine with mor1kx.

--=============== SoC ==================--
CPU:            Marocchino @ 100MHz
BUS:            WISHBONE 32-bit @ 4GiB
CSR:            32-bit data
ROM:            128KiB
SRAM:           8KiB
L2:             8KiB
SDRAM:          262144KiB 16-bit @ 800MT/s (CL-7 CWL-5)

--========== Initialization ============--
Ethernet init...
Initializing SDRAM @0x00000000...
Switching SDRAM to software control.
Read leveling:
  m0, b00: |00000000000000000000000000000000| delays: -
  m0, b01: |00000000000000000000000000000000| delays: -
  m0, b02: |11111111111000000000000000000000| delays: 05+-05
  m0, b03: |00000000000011111111111111100000| delays: 19+-07
  m0, b04: |00000000000000000000000000000111| delays: 30+-01
  m0, b05: |00000000000000000000000000000000| delays: -
  m0, b06: |00000000000000000000000000000000| delays: -
  m0, b07: |00000000000000000000000000000000| delays: -
  best: m0, b03 delays: 19+-07
  m1, b00: |00000000000000000000000000000000| delays: -
  m1, b01: |00000000000000000000000000000000| delays: -
  m1, b02: |11111111111000000000000000000000| delays: 05+-05
  m1, b03: |00000000000001111111111111100000| delays: 20+-07
  m1, b04: |00000000000000000000000000000111| delays: 30+-01
  m1, b05: |00000000000000000000000000000000| delays: -
  m1, b06: |00000000000000000000000000000000| delays: -
  m1, b07: |00000000000000000000000000000000| delays: -
  best: m1, b03 delays: 20+-07
Switching SDRAM to hardware control.
Memtest at 0 (2.0MiB)...
  Write: 0x0-0x200000 2.0MiB
   Read: 0x0-0x200000 2.0MiB
Memtest OK
Memspeed at 0 (Sequential, 2.0MiB)...
  Write speed: 25.2MiB/s
   Read speed: 23.4MiB/s

--============== Boot ==================--
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
             Timeout
Booting from SDCard in SD-Mode...
Booting from boot.json...
boot.bin file not found.
Booting from boot.bin...
boot.bin file not found.
SDCard boot failed.
Booting from network...
Local IP: 192.168.1.50
Remote IP: 192.168.1.100
Booting from boot.json...
Copying boot.bin to 0... (7702016 bytes)
Copying litex-mor1kx.dtb to 0x7f0000...
Booting from boot.bin...
Copying boot.bin to 0... (7702016 bytes)
Executing booted program at 0x00000000

--============= Liftoff! ===============--
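
The `delays:` column in the read-leveling lines appears to follow from where the run of 1s sits in each bitmap. A minimal sketch, assuming the window is taken as [first working delay, first failing delay after it) with integer division; this formula is inferred from the log lines above, not taken from the LiteX source, and `window_delays` and `format_delays` are hypothetical helper names:

```python
def window_delays(bitmap: str):
    """Return (mid, half_range) for the first run of '1's, or None if there is none.

    Assumption (inferred from the log, not from LiteX source): the window is
    [first working delay, first failing delay after it), and both values use
    integer division.
    """
    start = bitmap.find("1")
    if start < 0:
        return None  # no working delay: the log prints "delays: -"
    end = start
    while end < len(bitmap) and bitmap[end] == "1":
        end += 1  # end stops one past the last working delay
    return (start + end) // 2, (end - start) // 2


def format_delays(bitmap: str) -> str:
    """Format the result the way the log's delays column is written."""
    result = window_delays(bitmap)
    if result is None:
        return "-"
    mid, half = result
    return f"{mid:02d}+-{half:02d}"


# Bitmap copied from the "m0, b03" line of the log; compare with its delays column.
print(format_delays("00000000000011111111111111100000"))
```

A wider window of 1s means more timing margin, which is presumably why b03 is reported as the best bitslip for both modules.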

@stffrdhrn
Contributor Author

For reference, this is what the mor1kx boot looks like. Note a few things:

  1. Memory speed is faster on mor1kx (write 37.7MiB/s vs 25.2MiB/s, read 32.1MiB/s vs 23.4MiB/s)
  2. The bootloader on Marocchino loads boot.bin twice

I will try to run Linux in the simulator to see whether the Linux boot sequence is getting stuck somewhere.


--=============== SoC ==================--
CPU:            MOR1KX @ 100MHz
BUS:            WISHBONE 32-bit @ 4GiB
CSR:            32-bit data
ROM:            128KiB
SRAM:           8KiB
L2:             8KiB
SDRAM:          262144KiB 16-bit @ 800MT/s (CL-7 CWL-5)

--========== Initialization ============--
Ethernet init...
Initializing SDRAM @0x00000000...
Switching SDRAM to software control.
Read leveling:
  m0, b00: |00000000000000000000000000000000| delays: -
  m0, b01: |00000000000000000000000000000000| delays: -
  m0, b02: |11111111111100000000000000000000| delays: 06+-06
  m0, b03: |00000000000001111111111111110000| delays: 20+-07
  m0, b04: |00000000000000000000000000000111| delays: 30+-01
  m0, b05: |00000000000000000000000000000000| delays: -
  m0, b06: |00000000000000000000000000000000| delays: -
  m0, b07: |00000000000000000000000000000000| delays: -
  best: m0, b03 delays: 20+-07
  m1, b00: |00000000000000000000000000000000| delays: -
  m1, b01: |00000000000000000000000000000000| delays: -
  m1, b02: |11111111111000000000000000000000| delays: 05+-05
  m1, b03: |00000000000001111111111111100000| delays: 20+-07
  m1, b04: |00000000000000000000000000000011| delays: 31+-01
  m1, b05: |00000000000000000000000000000000| delays: -
  m1, b06: |00000000000000000000000000000000| delays: -
  m1, b07: |00000000000000000000000000000000| delays: -
  best: m1, b03 delays: 20+-07
Switching SDRAM to hardware control.
Memtest at 0 (2.0MiB)... 
  Write: 0x0-0x200000 2.0MiB
   Read: 0x0-0x200000 2.0MiB
Memtest OK
Memspeed at 0 (Sequential, 2.0MiB)...
  Write speed: 37.7MiB/s 
   Read speed: 32.1MiB/s 



--============== Boot ==================--
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
             Timeout
Booting from SDCard in SD-Mode...
Booting from boot.json...
boot.bin file not found.
Booting from boot.bin...
boot.bin file not found.
SDCard boot failed.
Booting from network...
Local IP: 192.168.1.50
Remote IP: 192.168.1.100
Booting from boot.json...
Copying boot.bin to 0... (7702016 bytes)
Copying litex-mor1kx.dtb to 0x7f0000... (1881 bytes)
Copying dummy-rootfs.cpio.gz to 0x800000... (6 bytes)
Executing booted program at 0x00000000

--============= Liftoff! ===============--
[    0.000000] FDT at (ptrval)
...
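
The speed gap between the two cores can be quantified directly from the Memspeed lines of the two boot logs above. A small sketch that parses those lines and prints the ratio; the regular expression and the helper names `parse_speeds` and `speed_ratios` are illustrative assumptions, not part of LiteX:

```python
import re

# Memspeed lines copied from the two boot logs above.
MAROCCHINO_LOG = """\
  Write speed: 25.2MiB/s
   Read speed: 23.4MiB/s
"""
MOR1KX_LOG = """\
  Write speed: 37.7MiB/s
   Read speed: 32.1MiB/s
"""

SPEED_RE = re.compile(r"(Write|Read) speed:\s*([\d.]+)MiB/s")


def parse_speeds(log: str) -> dict:
    """Map 'write'/'read' to the speed in MiB/s found in a BIOS log excerpt."""
    return {kind.lower(): float(value) for kind, value in SPEED_RE.findall(log)}


def speed_ratios(fast: dict, slow: dict) -> dict:
    """Ratio fast/slow for every direction present in both logs."""
    return {k: fast[k] / slow[k] for k in fast if k in slow}


ratios = speed_ratios(parse_speeds(MOR1KX_LOG), parse_speeds(MAROCCHINO_LOG))
for kind, ratio in ratios.items():
    print(f"{kind}: mor1kx is {ratio:.2f}x Marocchino")
```

On these numbers mor1kx comes out at roughly 1.5x for writes and 1.4x for reads.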

@stffrdhrn
Contributor Author

For reference, cell counts for both cores:

mor1kx

+------+-------------------------------------------------------------------+-------------------------------------------+------+
|      |Instance                                                           |Module                                     |Cells |
+------+-------------------------------------------------------------------+-------------------------------------------+------+
|1     |top                                                                |                                           | 18471|
|2     |  mor1kx                                                           |mor1kx                                     |  8300|
|3     |    mor1kx_cpu                                                     |mor1kx_cpu                                 |  8300|
|4     |      \cappuccino.mor1kx_cpu                                       |mor1kx_cpu_cappuccino                      |  8299|
|5     |        mor1kx_branch_prediction                                   |mor1kx_branch_prediction                   |     1|
|6     |          \branch_predictor_simple.mor1kx_branch_predictor_simple  |mor1kx_branch_predictor_simple             |     1|
|7     |        mor1kx_ctrl_cappuccino                                     |mor1kx_ctrl_cappuccino                     |   536|
|8     |          \pic.mor1kx_pic                                          |mor1kx_pic                                 |   109|
|9     |          \tt.mor1kx_ticktimer                                     |mor1kx_ticktimer                           |   121|
|10    |        mor1kx_decode_execute_cappuccino                           |mor1kx_decode_execute_cappuccino           |   705|
|11    |        mor1kx_execute_alu                                         |mor1kx_execute_alu                         |  2533|
|12    |          \fpu_alu_ena.u_pfpu32                                    |pfpu32_top                                 |  2204|
|13    |            u_f2i_cnv                                              |pfpu32_f2i                                 |   180|
|14    |            u_f32_addsub                                           |pfpu32_addsub                              |   432|
|15    |            u_f32_muldiv                                           |pfpu32_muldiv                              |  1155|
|16    |            u_f32_rnd                                              |pfpu32_rnd                                 |   308|
|17    |            u_i2f_cnv                                              |pfpu32_i2f                                 |   128|
|18    |        mor1kx_execute_ctrl_cappuccino                             |mor1kx_execute_ctrl_cappuccino             |  1010|
|19    |        mor1kx_fetch_cappuccino                                    |mor1kx_fetch_cappuccino                    |  1205|
|20    |          \icache_gen.mor1kx_icache                                |mor1kx_icache                              |   601|
|21    |            tag_ram                                                |mor1kx_simple_dpram_sclk__parameterized0   |    59|
|22    |            \way_memories[0].way_data_ram                          |mor1kx_simple_dpram_sclk                   |   473|
|23    |          \immu_gen.mor1kx_immu                                    |mor1kx_immu                                |   202|
|24    |            \itlb[0].itlb_match_regs                               |mor1kx_true_dpram_sclk_3                   |   138|
|25    |            \itlb[0].itlb_translate_regs                           |mor1kx_true_dpram_sclk_4                   |    21|
|26    |        mor1kx_lsu_cappuccino                                      |mor1kx_lsu_cappuccino                      |  1291|
|27    |          \dcache_gen.mor1kx_dcache                                |mor1kx_dcache                              |   311|
|28    |            tag_ram                                                |mor1kx_simple_dpram_sclk__parameterized3   |    97|
|29    |            \way_memories[0].way_data_ram                          |mor1kx_simple_dpram_sclk__parameterized2   |   135|
|30    |          \dmmu_gen.mor1kx_dmmu                                    |mor1kx_dmmu                                |   185|
|31    |            \dtlb[0].dtlb_match_regs                               |mor1kx_true_dpram_sclk                     |    60|
|32    |            \dtlb[0].dtlb_translate_regs                           |mor1kx_true_dpram_sclk_2                   |   120|
|33    |          \store_buffer_gen.mor1kx_store_buffer                    |mor1kx_store_buffer                        |   204|
|34    |            fifo_ram                                               |mor1kx_simple_dpram_sclk__parameterized1   |   159|
|35    |        mor1kx_rf_cappuccino                                       |mor1kx_rf_cappuccino                       |   888|
|36    |          rfa                                                      |mor1kx_simple_dpram_sclk__parameterized4   |    33|
|37    |          rfb                                                      |mor1kx_simple_dpram_sclk__parameterized4_0 |    33|
|38    |          \rfspr_gen.rfspr                                         |mor1kx_simple_dpram_sclk__parameterized4_1 |     1|
|39    |        mor1kx_wb_mux_cappuccino                                   |mor1kx_wb_mux_cappuccino                   |   130|
+------+-------------------------------------------------------------------+-------------------------------------------+------+

Marocchino

+------+---------------------------------------+---------------------------------------------+------+
|      |Instance                               |Module                                       |Cells |
+------+---------------------------------------+---------------------------------------------+------+
|1     |top                                    |                                             | 29218|
|2     |  or1k_marocchino_top                  |or1k_marocchino_top                          | 18948|
|3     |    dbus_bridge                        |or1k_marocchino_bus_if_wb32__parameterized0  |   684|
|4     |    ibus_bridge                        |or1k_marocchino_bus_if_wb32                  |   639|
|5     |    u_cpu                              |or1k_marocchino_cpu                          | 17625|
|6     |      u_1clk_rsrvs                     |or1k_marocchino_rsrvs_1clk                   |   824|
|7     |      u_ctrl                           |or1k_marocchino_ctrl                         |   843|
|8     |      u_decode                         |or1k_marocchino_decode                       |   558|
|9     |      u_divider                        |or1k_marocchino_int_div                      |   366|
|10    |      u_exec_1clk                      |or1k_marocchino_int_1clk                     |   123|
|11    |      u_fetch                          |or1k_marocchino_fetch                        |  1715|
|12    |        u_bc_cnt_ram                   |or1k_dpram_en_w1st__parameterized2           |    44|
|13    |        u_icache                       |or1k_marocchino_icache                       |   647|
|14    |          ic_tag_ram                   |or1k_dpram_en_w1st__parameterized1_6         |    29|
|15    |          \ways_ram[0].ic_way_ram      |or1k_dpram_en_w1st__parameterized0_7         |   362|
|16    |        u_immu                         |or1k_marocchino_immu                         |   474|
|17    |          \itlb[0].itlb_match_regs     |or1k_dpram_en_w1st_4                         |    64|
|18    |          \itlb[0].itlb_trans_regs     |or1k_dpram_en_w1st_5                         |    45|
|19    |      u_fpxx_rsrvs                     |or1k_marocchino_rsrvs__parameterized0        |   879|
|20    |      u_lsu                            |or1k_marocchino_lsu                          |  1575|
|21    |        u_dcache                       |or1k_marocchino_dcache                       |   281|
|22    |          dc_tag_ram                   |or1k_dpram_en_w1st__parameterized1           |    47|
|23    |          \way_memories[0].dc_way_ram  |or1k_dpram_en_w1st__parameterized0           |   160|
|24    |        u_dmmu                         |or1k_marocchino_dmmu                         |   422|
|25    |          \dtlb[0].dtlb_match_regs     |or1k_dpram_en_w1st                           |    69|
|26    |          \dtlb[0].dtlb_trans_regs     |or1k_dpram_en_w1st_3                         |    32|
|27    |        u_store_buffer                 |or1k_marocchino_oreg_buff__parameterized0    |   270|
|28    |          u_oreg_buff_ram              |or1k_dpram_en_w1st__parameterized7           |    73|
|29    |      u_lsu_rsrvs                      |or1k_marocchino_rsrvs__parameterized1        |   302|
|30    |      u_muldiv_rsrvs                   |or1k_marocchino_rsrvs                        |   378|
|31    |      u_multiplier                     |or1k_marocchino_int_mul                      |   128|
|32    |      u_oman                           |or1k_marocchino_oman                         |  1707|
|33    |        \rat_cell_k[0].u_rat_cell      |or1k_marocchino_rat_cell                     |    12|
|34    |        \rat_cell_k[10].u_rat_cell     |or1k_marocchino_rat_cell__parameterized9     |     9|
|35    |        \rat_cell_k[11].u_rat_cell     |or1k_marocchino_rat_cell__parameterized10    |    43|
|36    |        \rat_cell_k[12].u_rat_cell     |or1k_marocchino_rat_cell__parameterized11    |    13|
|37    |        \rat_cell_k[13].u_rat_cell     |or1k_marocchino_rat_cell__parameterized12    |     9|
|38    |        \rat_cell_k[14].u_rat_cell     |or1k_marocchino_rat_cell__parameterized13    |     9|
|39    |        \rat_cell_k[15].u_rat_cell     |or1k_marocchino_rat_cell__parameterized14    |    32|
|40    |        \rat_cell_k[16].u_rat_cell     |or1k_marocchino_rat_cell__parameterized15    |    14|
|41    |        \rat_cell_k[17].u_rat_cell     |or1k_marocchino_rat_cell__parameterized16    |     9|
|42    |        \rat_cell_k[18].u_rat_cell     |or1k_marocchino_rat_cell__parameterized17    |     9|
|43    |        \rat_cell_k[19].u_rat_cell     |or1k_marocchino_rat_cell__parameterized18    |    49|
|44    |        \rat_cell_k[1].u_rat_cell      |or1k_marocchino_rat_cell__parameterized0     |     9|
|45    |        \rat_cell_k[20].u_rat_cell     |or1k_marocchino_rat_cell__parameterized19    |    17|
|46    |        \rat_cell_k[21].u_rat_cell     |or1k_marocchino_rat_cell__parameterized20    |     9|
|47    |        \rat_cell_k[22].u_rat_cell     |or1k_marocchino_rat_cell__parameterized21    |     9|
|48    |        \rat_cell_k[23].u_rat_cell     |or1k_marocchino_rat_cell__parameterized22    |    29|
|49    |        \rat_cell_k[24].u_rat_cell     |or1k_marocchino_rat_cell__parameterized23    |    11|
|50    |        \rat_cell_k[25].u_rat_cell     |or1k_marocchino_rat_cell__parameterized24    |     9|
|51    |        \rat_cell_k[26].u_rat_cell     |or1k_marocchino_rat_cell__parameterized25    |     9|
|52    |        \rat_cell_k[27].u_rat_cell     |or1k_marocchino_rat_cell__parameterized26    |    62|
|53    |        \rat_cell_k[28].u_rat_cell     |or1k_marocchino_rat_cell__parameterized27    |    13|
|54    |        \rat_cell_k[29].u_rat_cell     |or1k_marocchino_rat_cell__parameterized28    |     9|
|55    |        \rat_cell_k[2].u_rat_cell      |or1k_marocchino_rat_cell__parameterized1     |     9|
|56    |        \rat_cell_k[30].u_rat_cell     |or1k_marocchino_rat_cell__parameterized29    |     9|
|57    |        \rat_cell_k[31].u_rat_cell     |or1k_marocchino_rat_cell__parameterized30    |    29|
|58    |        \rat_cell_k[3].u_rat_cell      |or1k_marocchino_rat_cell__parameterized2     |    45|
|59    |        \rat_cell_k[4].u_rat_cell      |or1k_marocchino_rat_cell__parameterized3     |    13|
|60    |        \rat_cell_k[5].u_rat_cell      |or1k_marocchino_rat_cell__parameterized4     |     9|
|61    |        \rat_cell_k[6].u_rat_cell      |or1k_marocchino_rat_cell__parameterized5     |     9|
|62    |        \rat_cell_k[7].u_rat_cell      |or1k_marocchino_rat_cell__parameterized6     |    32|
|63    |        \rat_cell_k[8].u_rat_cell      |or1k_marocchino_rat_cell__parameterized7     |    11|
|64    |        \rat_cell_k[9].u_rat_cell      |or1k_marocchino_rat_cell__parameterized8     |     9|
|65    |        u_jb_attr_ocb                  |or1k_marocchino_ff_oreg_buff__parameterized0 |    92|
|66    |          u_ff_oreg_buff_ram           |or1k_dpram_en_w1st__parameterized5           |    41|
|67    |        u_ocb                          |or1k_marocchino_ff_oreg_buff                 |   464|
|68    |          u_ff_oreg_buff_ram           |or1k_dpram_en_w1st__parameterized4           |    77|
|69    |      u_pfpu3264                       |pfpu_marocchino_top                          |  7683|
|70    |        u_fpxx_cmp                     |pfpu_marocchino_cmp                          |    62|
|71    |        u_pfpu_addsub                  |pfpu_marocchino_addsub                       |  1488|
|72    |        u_pfpu_f2i                     |pfpu_marocchino_f2i                          |   392|
|73    |        u_pfpu_i2f                     |pfpu_marocchino_i2f                          |   568|
|74    |        u_pfpu_muldiv                  |pfpu_marocchino_muldiv                       |  3638|
|75    |          u_fp64_div                   |pfpu_marocchino_div                          |  1075|
|76    |            u_r4div_fract              |r4div_fract58                                |   878|
|77    |          u_fp64_mul                   |pfpu_marocchino_mul                          |  1599|
|78    |        u_pfpu_ocb                     |or1k_marocchino_oreg_buff                    |    52|
|79    |          u_oreg_buff_ram              |or1k_dpram_en_w1st__parameterized6           |    17|
|80    |        u_pfpu_rnd                     |pfpu_marocchino_rnd                          |  1479|
|81    |      u_pic                            |or1k_marocchino_pic                          |   195|
|82    |      u_rf                             |or1k_marocchino_rf                           |    58|
|83    |        \shadow_enabled.rfShadow       |or1k_spram_en_w1st                           |     2|
|84    |        u_ram_a1                       |or1k_dpram_en_w1st__parameterized3           |     2|
|85    |        u_ram_a2                       |or1k_dpram_en_w1st__parameterized3_0         |     2|
|86    |        u_ram_b1                       |or1k_dpram_en_w1st__parameterized3_1         |    34|
|87    |        u_ram_b2                       |or1k_dpram_en_w1st__parameterized3_2         |     2|
|88    |      u_ticktimer                      |or1k_marocchino_ticktimer                    |   279|
+------+---------------------------------------+---------------------------------------------+------+
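
To put the two tables side by side, here is a short sketch that compares the headline rows. The numbers are copied from the tables above; which row counts as "cpu" (the `mor1kx` and `or1k_marocchino_top` instances) and "fpu" (`pfpu32_top` and `pfpu_marocchino_top`) is my own choice for illustration, and `summarize` is a hypothetical helper:

```python
# Cell counts copied from the two utilization tables above.
# "cpu" is the top-level core instance, "fpu" is the FPU top module.
CELLS = {
    "mor1kx": {"top": 18471, "cpu": 8300, "fpu": 2204},
    "marocchino": {"top": 29218, "cpu": 18948, "fpu": 7683},
}


def summarize(counts: dict) -> dict:
    """Derive a few comparison figures from one core's cell counts."""
    cpu, fpu, top = counts["cpu"], counts["fpu"], counts["top"]
    return {
        "cpu_without_fpu": cpu - fpu,   # core logic excluding the FPU
        "fpu_share_of_cpu": fpu / cpu,  # fraction of core cells spent on FPU
        "rest_of_soc": top - cpu,       # everything outside the core
    }


for name, counts in CELLS.items():
    s = summarize(counts)
    print(
        f"{name}: cpu-fpu={s['cpu_without_fpu']} "
        f"fpu share={s['fpu_share_of_cpu']:.1%} "
        f"rest of SoC={s['rest_of_soc']}"
    )

cpu_ratio = CELLS["marocchino"]["cpu"] / CELLS["mor1kx"]["cpu"]
print(f"Marocchino core is {cpu_ratio:.2f}x the mor1kx core in cells")
```

The rest-of-SoC figure is nearly the same in both builds, so the difference in the top-level totals comes almost entirely from the core, with the 64-bit FPU accounting for a large part of it.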

@Dolu1990
Collaborator

@stffrdhrn On which FPGA :D ?

@stffrdhrn
Contributor Author

> @stffrdhrn On which FPGA :D ?

@Dolu1990 this is on the Digilent Arty. I think it is just the plain A7; I can't recall the exact model.

@Dolu1990
Collaborator

A few more questions XD

  • How many instructions are decoded in parallel?
  • How many are issued?
  • Any benchmark numbers?

Thanks ^^

@stffrdhrn
Contributor Author

stffrdhrn commented Jan 15, 2022

I have written a blog article about this: http://stffrdhrn.github.io/hardware/embedded/openrisc/2019/10/21/or1k_marocchino_tomasulo.html
Maybe it's not fully superscalar.

  • Only one instruction is decoded at a time
  • 5 or 10 instructions can execute in parallel
  • Only one instruction is retired at a time

I hope to get some better benchmarks when running on LiteX.

@Dolu1990
Copy link
Collaborator

Thanks :D

@bandvig

bandvig commented Jan 16, 2022

@stffrdhrn @Dolu1990
Let me make a comment.
MAROCCHINO is designed to run on a different clock (port cpu_clk) than the Wishbone bus clock (port wb_clk).
It is assumed that the CPU clock is greater than or equal to the Wishbone clock, and the two must be aligned. Thanks to the alignment requirement, a simplified version of clock domain crossing logic is implemented (let's say Quasi CDC, or Q-CDC).
At the same time, the Q-CDC could be the reason for the slower RAM access speed relative to mor1kx (which runs on the same clock as the Wishbone bus).
On my Atlys board (45-nm Xilinx Spartan-6 FPGA) I run the MAROCCHINO pipeline at 100MHz (with the Wishbone clock at 50MHz), but with the number of DCACHE and ICACHE ways reduced from the default 4 to 2.
As the Arty is equipped with a more modern FPGA, I think faster cpu_clk rates could be achieved.

Attention!
MAROCCHINO's built-in timer operates at the Wishbone clock. I did this specifically for backward compatibility with mor1kx software.
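
The clocking rules above can be written down as a quick sanity check. This is only a sketch under stated assumptions: "aligned" is modelled here as cpu_clk being an integer multiple of wb_clk, which the comment does not spell out, and `check_clocks`, `timer_ticks` and the 10 ms tick period are hypothetical names and values chosen for illustration:

```python
def check_clocks(cpu_clk_hz: int, wb_clk_hz: int) -> int:
    """Return the cpu_clk/wb_clk ratio, raising if the pair breaks the stated rules.

    Assumption: "aligned" is modelled as an integer ratio; the comment
    above does not define it precisely.
    """
    if cpu_clk_hz < wb_clk_hz:
        raise ValueError("cpu_clk must be >= wb_clk")
    if cpu_clk_hz % wb_clk_hz != 0:
        raise ValueError("cpu_clk is not an integer multiple of wb_clk")
    return cpu_clk_hz // wb_clk_hz


def timer_ticks(wb_clk_hz: int, period_s: float) -> int:
    """Timer counts per period; the timer runs on the Wishbone clock."""
    return round(wb_clk_hz * period_s)


# The Atlys setup described above: 100 MHz pipeline, 50 MHz Wishbone.
ratio = check_clocks(100_000_000, 50_000_000)
ticks = timer_ticks(50_000_000, 0.01)  # hypothetical 10 ms tick period
print(ratio, ticks)
```

The point of the second helper is that software computing timer reload values should use the Wishbone frequency, not the pipeline frequency, whenever the two differ.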

@enjoy-digital
Owner

Thanks @stffrdhrn, @bandvig, very interesting CPU. I think the initial LiteX support is good enough to be able to merge it, we could just update it as you make progress. I'm going to merge it.

@enjoy-digital merged commit 1c82b20 into enjoy-digital:master Jan 17, 2022