Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[top] top_earlgrey_* organization #295

Closed
tjaychen opened this issue Sep 30, 2019 · 29 comments
Closed

[top] top_earlgrey_* organization #295

tjaychen opened this issue Sep 30, 2019 · 29 comments
Assignees
Labels

Comments

@tjaychen
Copy link

Currently, there are 3 overall top_earlgrey_* organizations
top_earlgrey_asic
top_earlgrey_verilator
top_earlgrey_nexysvideo

Each basically follows the structure of ...
top_earlgrey_*
-> top specific pin/pad logic
-> top specific reset / clock logic
-> top_earlgrey

Instead of maintaining 3 separate top wrappers, we'd like to suggest having only one functional top in the following organization
top_earlgrey
-> top_earlgrey_core (today's top_earlgrey)
-> pinmux / padring
-> other specific top level logic

This will allow us to stop maintaining 3 different versions, and also concentrate testing.

To accomplish this, there are a few things that need to be done

  • dpi modules for top verilator need to be moved to a 'tb' level. So in essence, there will be top specific test benches where required, and effectively another wrapper
  • there needs to be another prim module that wraps the clocking logic to differentiate between fpga / dv and verilator
  • there needs to be a full fledged pinmux / pad module that can support the needs of all 3 - this is already under discussion and development

Please let us know what you think.
One alternative approach is to no separate top_earlgrey and top_earlgrey_core, but instead make the top level template flexible enough to know when to generate what. That's something we should think about as well

@asb
@imphil
@vogelpi
@eunchan
@msfschaffner
@sriyerg
@aytong
@martin-lueker
@sjgitty

@tjaychen
Copy link
Author

@gorusso
@rpenugo

@martin-lueker
Copy link
Contributor

This seems very reasonable. We have certainly made functional core designs, and then wrapped them with process/platform dependent padring files which hold all the pads, and process dependent guts. In fact this is often extended such that any process dependent cells get pulled to the top level (SRAMs, or mixed signal IPs). For example, in ASIC's SRAMs are not just process dependent, but often depend on the integrator, whereas in some FPGA's there may be different IP wrappers for a particular SRAM configuration that are dependent on the revision of the design tool.
So for these reasons I agree, we should probably move to this type of file hierarchy, and furthermore, we should think about whether we want to propagate SRAMs and mixed-signal blocks to the pad-ring level as well.
-Martin

@aytong aytong removed their assignment Oct 2, 2019
@aytong
Copy link

aytong commented Oct 2, 2019

The approach SGTM. My only comment is to ensure top-level DV TB DUT is top_earlgrey in the new structure, so connections between top_earlgrey_core to padring, etc. are covered.

@aytong aytong added Priority:P1 Priority: high Type:Question Questions labels Oct 2, 2019
@eunchan
Copy link
Contributor

eunchan commented Oct 3, 2019

I am also OK with having top_earlgrey.sv and top_earlgrey_core.sv. For now, I will create PR that moving top_earlgrey.sv to _core and revise three modules to instantiating top_earlgrey_core. Then we can porting the necessary portions from _asic and _nexysvideo module to top_earlgrey.

@imphil
Copy link
Contributor

imphil commented Oct 3, 2019

The current structure allows you to have different toplevel ports for different targets, and have one place to instantiate target/device specific modules. If you remove that, you end up with

  • Ifdefs around toplevel ports, e.g. to support single-ended vs differential clocks, etc.
  • even more wrapping code prim-like modules.

Please see https://github.com/lowRISC/stwg-base/issues/89 for the full discussion that happened when we choose the current structure.

@tjaychen
Copy link
Author

tjaychen commented Oct 3, 2019 via email

@eunchan
Copy link
Contributor

eunchan commented Oct 3, 2019 via email

@msfschaffner
Copy link
Contributor

Concerning the technology dependent modules I second @tjaychen in that I also think they should be abstracted using a prim_* wrapper approach if possible.

However, @imphil is right that we have to be careful with IPs that have a different interface across different technologies. While the version of padmux / padctrl that we are currently working on can handle a parametric number of generic IOs, we may run into issues with clocking infrastructure IP (like the FPGA clock managers, differential clock signals), or blocks that rely on other pad functionality than just CMOS-style IO (like DDR memory controllers).

Let's revisit https://github.com/lowRISC/stwg-base/issues/89 and discuss this in more detail in our next meeting (or if it takes to long, schedule a separate meeting just for that).

@imphil
Copy link
Contributor

imphil commented Oct 4, 2019

A couple comments from my side:

  • I'm rather convinced we need a target-specific toplevel (currently named top_earlgrey_TARGET) if we don't want to have ifdefs in there, as it is impossible to find a common set of top-level ports for all targets. This toplevel can be generated, or hand-written, as it is today. The goal should be to have this target-specific toplevel as lightweight as possible, and be mostly a wrapper about the target-independent one (currently named top_earlgrey).

  • I don't really care if the target-independent toplevel is called top_earlgrey or top_earlgrey_core. We then however need to find a name for the target-specific ones (or name them all top_earlgrey and put them into different directories).

  • prim_ modules are designed to have a common interface across targets.
    So everything that could be wrapped in a prim_ module shouldn't go into the target-specific toplevel, but the target-independent one.
    In the past, we didn't have sufficiently flexible pinmux IP to have this functionality in the target-independent toplevel. That's changed now, so move pinmux to the target-independent toplevel.

  • Our Verilator toplevel is just the most extreme example of "different toplevel ports are needed for different targets".

  • For a large part of the toplevel DV work, the target-independent toplevel can be used. The target-dependent toplevel is likely to have analog IP or vendor primitives in there which are harder to simulate.

@msfschaffner We do have a silicon meeting on Tuesday. Can you talk with Scott to see if we can discuss this topic there. Otherwise, please find a date/time for a dedicated short meeting on this.

@msfschaffner
Copy link
Contributor

Hey Philipp, thanks for your comments. I tend to agree with most of them, but as you say more discussion is needed such that everybody is on the same page here. I'll check with @sjgitty and @eunchan whether we have enough time in our meeting next week.

@tjaychen
Copy link
Author

tjaychen commented Oct 4, 2019 via email

@vogelpi
Copy link
Contributor

vogelpi commented Oct 7, 2019

Thanks @imphil for linking lowRISC/stwg-base#89. That is helpful to understand the current state.

I agree with @msfschaffner . We should avoid ifdefs and abstract technology-dependent stuff using a prim_* wrapper whenever possible. Further, I also think that this approach is not going to work some things that we currently have in the top_earlgrey_*.sv files. It's not just that switching to a different technology requires another clock infrastructure, even the same FPGA device but on a different board may require another clocking IP with a different interface. I don't see how we could reasonably abstract such differences with a prim_* wrapper and that's probably also why these blocks are on the very top-level.

However, I don't understand how having fewer top-levels eases testing (especially since we are not talking about reducing the number of targets but rather having a single top-level supporting all targets). I would rather say what we want is to reduce the differences between the existing top-levels. Looking for example at the Nexys and ASIC top-levels, the main differences are the clocking and the pad control. As outlined above, I don't think we can unify the clocking. But why don't we just move the pad control into top_earlgrey.sv? Especially since we are now about to add the padmux IP. Wouldn't that be totally sufficient?

@tjaychen
Copy link
Author

tjaychen commented Oct 7, 2019

@vogelpi
it was actually on my todo list to wrap the clocking with a prim_clock* , and within that we would target different implementations. Is there a reason you think this would not work? Ultimately clocking generation comes down to input clock (if there is one), configuration signals, and output clocks doesn't it? To me the main difference is that in real silicon, we likely won't have an input clock, but everything else I imagine to be quite similar (the configuration of course can be different, but memory wrappers for example have traditionally had good way of dealing with this via generic vectors)

and yes, the padmux / pinmux IP will go a long way to solving this issue. @msfschaffner / @eunchan will be discussing this tomorrow, so it will be good to get more information then. As you say, the main differences are pad and clocks, and I felt that clocks could be unified with a prim_wrap + a single ifdef for whether there is an external clock (we may even ignore that for now since I'm making guesses about the final silicon).

Lastly, I still prefer a single top because I strongly prefer DV to be able to cover a large majority of code across all platforms. Fairly recently (maybe a month ago), I noticed a connection error in the FGPA pin top that caused GDB to not fully work. If we have a top level test in DV that exercised that path for example, it would have been easily caught. I've also personally run into a lot of cases where things worked in DV and just did not in FPGA, and they were a pain to track down - so i've more or less developed a blind desire to keep things consistent between the two when possible.

There will still be deviations to be sure, but like I mentioned before, I prefer to start very restrictive, and open up later if necessary.

It may be good for others to chime in also.

@imphil
Copy link
Contributor

imphil commented Oct 7, 2019

On prim_clock: I don't see a common interface we could standardize on. We have

  • 0..2 inputs (internally generated, single-ended, differential)
  • something between no or a highly varied set of configuration parameters (just look at the different parameters passed to the Xilinx clocking primitives)
  • And the good news: always one output port

I also don't see much value in spending effort of having a primitive with a shared interface. The clock is always configured target/board specific, and needed in the top-level file. So instantiating the right module there, instead of somehow passing config information to a primitive and instantiating that, doesn't strike me as a significant win.

Regarding the FPGA pin case you mentioned: this was an example where we had different pinmux configurations between FPGA and other targets. If pinmux is able to handle this, there's no need to have this code in the FPGA-specific toplevel any more (and that's what we're aiming for). You'd still need to have DV to test this configuration, of course.

@tjaychen
Copy link
Author

tjaychen commented Oct 7, 2019

regarding output, it's possible to have more than 1 correct? At least OpenTitan as spec'd by scott is meant to have both a primary clock and a fixed clock. But you are right, this pat will always be the same.

Maybe I'm missing something, but when I look at the FPGA clocking, doesn't it just take an external clock and reset pin?
https://github.com/lowRISC/opentitan/blob/master/hw/top_earlgrey/rtl/top_earlgrey_nexysvideo.sv#L78

I'm sure there are more configurations wrapped up underneath, but I don't really think it would be necessary to expose them at a wrapper level.

Yes I do agree there could be different clock inputs, but to me that delta is small enough that I don't think it warrants a separate top. Now if the story becomes our design is big enough it needs to be split into multiple FPGAs, or that the FPGA needs a very specific set of pins that are different from the normal top, then I agree, a separate top makes sense.

In our current scenario though, I really don't think there is much of a difference, and that's one of the reasons I would like to merge them. If they do become sufficiently different it is easy to split off again, and nothing we do now is meant to prevent that choice in the future.

@msfschaffner msfschaffner added Priority:P2 Priority: medium and removed Priority:P1 Priority: high labels Oct 28, 2019
@sjgitty
Copy link
Contributor

sjgitty commented Nov 5, 2019

Boy, this one is getting long, but I didn't want to open a new issue. Cut/Pasting my
comments from #851 regarding confusing documentation around this area:

We need some more collateral on these distinctions, just not sure where. I think we need a) better terminology of our top-level netlists (issue #295); b) better definition of the difference between the autogen'd top and the platform top (or whatever we call it/them); c) documentation for the different platform tops. Perhaps we restrict the top_earlgrey/doc/index.md to just the autogen'd top and introduce a separate platform document, but things get messy.

@eunchan
Copy link
Contributor

eunchan commented Nov 13, 2019

Before this issue moves forward, Can we get an agreement of top_earlgrey_verilator.sv to be moved to dv/ directory other than rtl/? I am pretty sure the top_earlgrey_verilator module is for verification/ validation not the design.

@tjaychen
Copy link
Author

tjaychen commented Nov 13, 2019 via email

@msfschaffner
Copy link
Contributor

yes, I am also in favor of moving it into the dv directory.

@msfschaffner
Copy link
Contributor

Just adding a note that we should remember to implement the top level reorganization as discussed in December at some point.

This issue is related: #892

@tjaychen
Copy link
Author

tjaychen commented Jan 18, 2020 via email

@msfschaffner
Copy link
Contributor

I think that was this one:
Top-level Reorg (3)

@sjgitty
Copy link
Contributor

sjgitty commented Jul 18, 2020

@msfschaffner when you have a moment can you refresh this issue? Perhaps it is closable now, but we should at least refresh with the latest thinking.

@silvestrst
Copy link
Contributor

Question regarding top_earlgrey_verilator has popped up on the SW meeting. As a part of DIF we have at least one "sanity" test per DIF, which is ran on verilator via CI. At the moment Pinmux and Pad Control on verilator doesn't not have the same functionality as on FPGA (please see the related issue: #2707 ), namely padring is not instantiated, which prevents us from writing a comprehensive test at this time.

How important it is for the MS4 I don't know, maybe @gkelly could clarify this for.

@msfschaffner
Copy link
Contributor

msfschaffner commented Sep 17, 2020

Sorry @sjgitty, I completely missed this one.

The top-level organization has in the meantime evolved quite a bit, but in a different way as we've sketched out above. Below is an update on the current plan and status.

Current Plan

Below is a diagram from @tjaychen which reflects the current thinking.

OpenTitan Power _ Clock Chip Partition

First of all, the chip-level top will eventually only contain:

  • the padring, which only consists of wires and pad instances.
  • the analog sensor top (AST) block, which comprises all analog and clock generation resources. on ASIC, this will be a macro that is going to be added by the vendor. for simulation and FPGA synthesis there will be two open-source models available.

Other IPs that need bus access (like the pinmux, or the AST sensor control) will be located inside what we currently call top_earlgrey (this is named base_earlgrey in the diagram above - more on that later).

Since the pinmux and pinout requirements have become significantly more complex, the pinmux and padctrl will be merged and eventually become one templated module, where the pinout, muxing capability and pin mapping can be defined in an hjson configuration. Also, the jtag_mux (currently located in the chip-level top) will be merged into the pinmux as well, since we need to be able to select different TAPs based on strap pin and lifecycle selection signals.

As part of the pinmux templating, we may also choose to template parts of the target-specific top-levels in order to make sure that the pinmux/padring/pad connections are performed correctly. This also presents an opportunity to align and share common parts of the target-specific top-levels, and thereby solve one of the initial concerns that has been brought up further above in this thread. I.e., top_earlgrey_asic, top_earglrey_nexysvideo, etc could be templated as well and generated from one single source by passing them through topgen as well (however, we've not made a final decision on whether to do that just yet).

For the Verilator testbench, the idea is to align it with the FPGA and ASIC tops as well, although there are a couple of hurdles that need to be overcome in order to enable simulation of the AST model, which currently contains constructs that are not supported by Verilator. Also, AST generates multiple clocks with different frequencies and we need to make sure this can be handled in the Verilator case (for instance, by setting them all to the same base clock fed in from the testbench using `ifdef VERILATOR preprocessor tricks).

As indicated in the diagram above, we may rename top_earlgrey into base_earlgrey in order to make the distinction between the chip-level tops (top_earlgrey_asic etc.) and the functional top-level (base_earlgrey) clearer.

@tjaychen, @eunchan let me know if I missed something above.

Current Status

Most of the changes listed below are tasked for MS4.

top_earlgrey (#5221):

  • move JTAG mux into pinmux
  • merge padctrl and pinmux
  • define new pinmux hjson configuration
  • revise and template the pinmux design
  • rename top_earlgrey into base_earlgrey
    we chose to go for a different naming scheme where top_earlgrey still remains top_earlgrey and instead the target specific chip levels are prefixed with chip. E.g. chip_earlgrey_asic.
  • add strap sampling and JTAG qualification functionality (via life cycle) to the pinmux

top_earlgrey_asic (#5221):

top_earlgrey_<fpga target> (#5221):

top_earlgrey_verilator (#6103):

  • padring instantiated and connected
  • implement verilatable AST wrapper model
  • AST wrapper instantiated and connected
  • template top-level according to new pinmux

other:

Related Issues

Another issue that is related to the top-level organization is the actual configuration of the OpenTitan SoC.
Ideally, we should strive to have as few FPGA emulation platforms as possible, and all of them should have the same system configuration as the ASIC configuration in order to ease SW development and emulation efforts, and keep the required maintenance work within bounds.

That may however not be entirely possible, especially if we intend to keep support for the lower cost development boards with Vivado WebPack support alive (the SoC configuration planned for the ASIC implementation does not fit on the nexysvideo, nor other such low-cost boards).

Hence, we should start thinking about ways of supporting different system configurations - but that discussion merits its own thread (just wanted to bring this up here).

@silvestrst
Copy link
Contributor

@silvestrst - for notifications on new comments

@msfschaffner
Copy link
Contributor

FYI, this status tracker has been updated.

Several important updates RE toplevel organization, pinmux and templating have landed.
There are a couple of opens, mostly revolving around synthesizable AST for FPGA and Verilator, and the Verilator TB reorganization.

@tjaychen
Copy link
Author

latest status:

ast has been merged into fpga and verilator.
This means fpga / asic are mostly similar from a functional standpoint, although the we still need to sort out the pinout situation on cw310.

Verilator still has not merged into chiplevel.sv.tpl, but at least functionally it is now closer to asic, and we should be able to run ast related functions (entropy, power down/up etc) without putting in verilator top level hacks.

@tjaychen tjaychen added Priority:P3 Priority: low and removed Priority:P2 Priority: medium labels Mar 3, 2022
@tjaychen
Copy link
Author

most of the work here is done. The remaining work is tracked in #6103

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants