Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Half-duplex SPI readback options in NU mode #11

Open
wants to merge 4 commits into
base: v1.3.x
Choose a base branch
from

Conversation

pmldrmota
Copy link

Due to hardware constraints, the MISO line is unavailable in NU mode. Therefore, an optional half-duplex SPI protocol is implemented using an additional 8-bit shift register.

Tested by @dnadlinger

@jordens
Copy link
Member

jordens commented Apr 28, 2021

Triage: to merge this we need to figure out how to handle the protocol revision incompatibility.

Thanks for the pr! This looks like a good work around.

But I'm worried about the comb logic to the async inputs of the flip flops. That looks problematic.
And I don't understand the code yet.

@pmldrmota
Copy link
Author

SPI

Not sure if it helps with understanding, but these are my notes from more than a year ago.

@pmldrmota
Copy link
Author

pmldrmota commented Nov 17, 2021

How should we proceed with the proto_rev incompatibility? In our setups, we kept proto_rev = 8 , since it is fully backwards-compatible, and handle the rest on the ARTIQ side (by specifying sync delay seeds in the device_db, we actively opt in to enable CPLD readback, and the user is instructed to only specify those seeds if the CPLD gateware has been flashed. If that was not done, the automatic sync delay tuning in init() fails as no valid window can be found. The error message could mention this.).

Second, how to make this work for >v1.3? The logic here can be copy-pasted without further ado (and the readback FIFO extended to 32 bits as more macrocells are available) and we could test this on hardware in Oxford any time.
I'm hesitant regarding changes to v1.3.x, as this version has been stress-tested and working on dozens of Urukuls for over 1.5 years.

But I'm worried about the comb logic to the async inputs of the flip flops. That looks problematic.

Which flip flops are you referring to? When porting this to >v1.3, we should avoid this. It's possible I was forced into using more comb logic in order to fit the design into the small CPLD (while keeping optimisation goal timing).

@jordens
Copy link
Member

jordens commented Nov 17, 2021

Unfortunately I currently don't have the bandwidth to review the compatibility and migration aspects through the entire supply and development landscape or help with design and implementation or test, debug and maintainance of these new aspects that you want to achieve here.
Given the situation and pending incompatibility, maybe the best approach is to redesign a standard gateware (#7) and a suservo-only cpld gateware. Couple that with a separate ARTIQ coredevice driver.

@pmldrmota
Copy link
Author

It was brought to my attention that this feature is desired by many users (why else would I bother making this public?). I am proposing a backwards-compatible solution without incompatibilities, which is well tested already.
I don't think a separate suservo-only gateware is desireable. But I agree regarding a separate ARTIQ coredevice driver, which exists, and is in the process of being proposed to be upstreamed to m-labs/artiq by popular request.

@jordens
Copy link
Member

jordens commented Nov 17, 2021

We haven't had a single request or mention. And as you can clearly see here and on the other related PRs, nobody voiced their opinion or interest in this for the almost 1.5 years since this has been posted. Hence all the data that I have indicates that there is zero interest.
It's admirable that you make it public and certainly everyone happy with your code and the level of review, testing, and maintenance it receives is welcome to use it.

@dtcallcock
Copy link

dtcallcock commented Nov 17, 2021

Hence all the data that I have indicates that there is zero interest.

We are very interested and I know others are too. I think the lack of comments and opinions are because this is something that has been working well in production at Oxford for a long time and we all know that. We are just waiting for it to get upstreamed so we can use it too. If this kind of work can't get upstreamed because of @jordens bandwidth or his personal take on user interest we have a problem.

@jordens
Copy link
Member

jordens commented Nov 17, 2021

TL:DR; I'm annoyed by this attitude.

  • This is an open source project. You can fork it if you feel constrained or limited, if it's going to slow, or if you think you can do better. There is a big button on the top right for it. It was done in the genesis of this PR. Nobody will be offended and you are encouraged to do so. Don't claim you are tied to me.
  • Don't fault me for your failure to work on this so far. It's not my job. Do some good work and help with reviews or maintenance then maybe you can ask for something in return. I've seen so many plans to contribute or upstream things that then fizzle.
  • Complaining about others not doing what you want them to do is not the right way. You should not wait silently for me or others to give you things you want and complain later if you don't get it. People need to respond to proposals and/or step up and articulate what they want and be willing to discuss solutions and approaches. They need to find people willing to do what they want done. Clearly none of that happened here for the better part of 1.5 years. The fact that this lies dormant for so long without somebody addressing the pertinent issues brought up is usually a reliable indicator that this is at best a nice-to-have and not a high-priority issue and that nobody is really after this. They certainly had better things to do. I have to base my "take" on this. What else should I base it on? You are silently sitting there and waiting for somebody to step up and do a lot of work that nobody else wants to do. The unwillingness to get involved directly relates to the momentum that this gets.
  • We have to invest our time into the things that we see as worthy. That's things that we need to do for various projects or things that we consider in high future demand and good ROI or things that we simply enjoy. It's both a personal take and a business take. Anything else would be plain stupid. And it would be absurd to listen to third-hand anecdotes about the market size especially if the available first-hand data suggests otherwise.
  • More than two years ago, long before this PR, there was a proposal for improving the Urukul CPLD design (Redesign #7). That proposal would dramatically improve the usability of the DRG, RAM, profiles, OSK, speed up the interface, and streamline the API. It would also simplify development and maintenance and would provide clear rules for managing incompatible hardware/gateware/software combinations and migration paths. Some of these areas have been known and well-documented pain points for users (e.g. DRG) and we presented this proposal to those users. You have seen it as well. There has been zero uptake. It certainly shows that you are not interested. Yet I don't see your lack of interest as necessarily problematic.
  • Please consider the value of your argument "X is something has been working well in production at Y for a long time and we all know that" for a few other examples of X and Y. I hope you'll agree that there are many instances of X and Y that the statement is true for (like HFGui and NIST but certainly many others work just as well). But in a lot of these cases the success or quality of the work that was done using X at Y has next to no predictive power for the viability, correctness, quality, desirability, or success of X. It's a fallacy. Don't fall for it.

@pmldrmota
Copy link
Author

Triage: to merge this we need to figure out how to handle the protocol revision incompatibility.
Thanks for the pr! This looks like a good work around.

I proposed a way to handle the protocol revision incompatibility to address your concern. If you don't think this PR is a good work around anymore, could you explain why you changed your view so we can discuss it?

More than two years ago, long before this PR, there was a proposal for improving the Urukul CPLD design

The redesign you reference doesn't propose a solution to the lack of SPI readback, the topic of this PR. Never mind the merits of a redesign, this PR solves a problem for many users, it is well tested, was developed for free, and - even if it may only be a "nice-to-have" in your personal opinion - this PR enables phase-coherent operation of suservo, something we rely heavily on and I have been asked to share. By all means, I don't mind sharing it on a forked channel as you suggest, but it seems unnecessary to diverge on a small feature like this.

@sbourdeauducq
Copy link
Member

sbourdeauducq commented Nov 18, 2021

@jordens If you do not have the bandwidth to review something then I recommend that you simply refrain from commenting, or delegate the review to someone else. It certainly took you more time to write the rant above than it would have taken you to answer Peter's legitimate technical questions, with the very real possibility that an ensuing argument uses even more of your (and other people's) time. It is also not encouraging people to make contributions that we want to have, and squandering the goodwill that obviously went into making this PR and offering more tests. This sounds particularly counter-productive to me.

@pmldrmota

  • The comb logic to the async flip flops is what is connected to the PRE inputs of the FDCE. The current code is problematic indeed. Combinatorial circuits can produce glitches which when applied to asynchronous inputs can cause bugs. If you can, connect asynchronous inputs to flip-flop outputs directly. If you cannot do that, things like a single FPGA LUT are usually glitch-free if their inputs have appropriate timing (the CPLD architecture used here would need some research, I am not familiar with it). Here in particular the timing on cs with the comparator ==2 looks suspicious, not all cs bits arrive at the same time.
  • Your code is 100% backward-compatible (in theory at least) if you do not set EN_RB, is this correct? In this case I would suggest setting proto_rev=9 and modifying the ARTIQ driver to accept both 8 and 9. When it is 8, the new functionality would not be available in the driver. This way: a) old Urukuls with new ARTIQ still work as before b) new Urukuls with old ARTIQ print a clear error message, which is superfluous but easily solved by updating ARTIQ (we can make the release branches also accept 9, but without new functionality).
  • Yes we definitely want to support all Urukul hardware revisions and in particular v1.3 and v1.5 which are the most deployed.
  • Why did you remove the test points?
  • Could you explain the clocking changes further?

@dnadlinger
Copy link

Your code is 100% backward-compatible (in theory at least) if you do not set EN_RB, is this correct? In this case I would suggest setting proto_rev=9 and modifying the ARTIQ driver to accept both 8 and 9.

Ack – this is actually what we did internally at first as well. IIRC the reason we switched back to just keeping 8 was just because some people were using older ARTIQ versions.

I'll have to leave explaining the flip-flop logic/clocking situation to Peter – I only vaguely remember that the obvious straightforward implementation didn't fit the smaller <v1.4 CPLD, but of course we need to find a glitch-free way of doing this nevertheless.

@sbourdeauducq
Copy link
Member

sbourdeauducq commented Nov 18, 2021

I'll have to leave explaining the flip-flop logic/clocking situation to Peter – I only vaguely remember that the obvious straightforward implementation didn't fit the smaller <v1.4 CPLD

Maybe we can implement this new feature on HW v1.4+ only, and keep proto_rev=8 on the v1.3 gateware. This works with the proposed scheme for the ARTIQ driver.

(By "support all revisions" above, I meant no regressions with ARTIQ updates).

@jordens
Copy link
Member

jordens commented Nov 18, 2021

@sbourdeauducq I chose to address the comments by @pmldrmota and @dtcallcock because they are structural issues that I see as high risk and impact compared to the technical ones. I see that you'd rather refrain from engaging. It's also not my job to recruit people to delegate reviews and maintenance to. Everybody had the chance to review and comment for 1.5 years and nobody bothered here and on #7.
@pmldrmota I don't say that at all. I brought up #7 to show that arguably high value improvements do not receive any attention or any contributions on any level despite there being plenty of requests for the affected functionality. I also brought it up because I believe that first address long standing fundamental issues and then later addressing the potential changes to NU-mode is better than doing it the other way around. Since #7 would break compatibility anyway, there is much less drama and more freedom for changes to NU-mode. That's the reason why I am skeptical of your approach to compatibility. I disagree with your characterization that this is a small change. It makes a fragile somewhat overloaded design even more tricky and marginal.

@pmldrmota
Copy link
Author

pmldrmota commented Nov 18, 2021

Is the revision addressed by this PR (v1.3.x) even affected by #7?

Why did you remove the test points?

The fitter was running low on resources during synthesis.

Could you explain the clocking changes further?

  • The cd_le moved from inside the SR module into the Urukul module [nfc]. The reason for this was just that rb_state (inside Urukul) is synchronously updated with this clock, so I needed to access it.
  • The cd_le is clocked by any CS assertion instead of just CS==1 (CFG). This doesn't affect the function, as all the relevant logic is additionally conditioned on a matching CS.

Thank you @sbourdeauducq for the notes, I see now that the comb logic on the preset input of the FDPE is critical. I think the change from the one-hot sel to the multi-level sel signal was made during one of numerous attempts to reduce the footprint of the design in order to fit the 8-bit shift register.

The changes in this PR have two different purposes:

  1. Adding the half-duplex readback option. This includes the 8-bit shift register, protocol logic (e.g. rb_state), and the clock changes explained above.
  2. Reducing the number of macrocells required in order to fit this into the small CPLD of v1.3.x. This was achieved by removing the test points and removing the one-hot sel.

I should've asked this first. Is -optimize density an option for v1.3.x? If yes, we could save ourselves all of the discussion about the changes related to point 2 in the list. On v1.4+ we don't have these restrictions even with -optimize speed and we could just port the changes related to point 1 in the list if they are ok.

If -optimize speed is a requirement on v1.3.x, we can try reverting sel to a one-hot

sel = Signal(8)
self.comb += Array(sel)[cs].eq(1)

and clock cd_le from any CS (~sel[0]) and latch the attenuator at every CS as well (this saves one macrocell, and latching the attenuator to its last value repeatedly has no effect). I can have a look if there are enough macrocells, but I don't think there are.

For multi-chip sync delay calibration, we only need the four STA_SMP_ERR bits. At the moment, the STA_SMP_ERR bits are located after the RF switch status bits, which is why the readback shift register needs to be 8 bits long. It could also be 4 if we swap the location of STA_RF_SW and STA_SMP_ERR in the CFG register. This would be a breaking change though, and that's why I chose not to do it. Swapping them would make the task of fitting the design into v1.3.x a lot easier.

@pmldrmota
Copy link
Author

I can have a look if there are enough macrocells, but I don't think there are.

Just did, and I have to correct myself: The one-hot doesn't cost any macrocells. Keeping the one-hot the issue with the async input shouldn't exist anymore, is that correct?

pmldrmota and others added 3 commits November 18, 2021 23:51
Due to hardware constraints, the MISO line is unavailable in NU mode.
Therefore, an optional half-duplex SPI protocol is implemented using an
additional 8-bit shift register.
This is necessary to fit the design into XC2C128-6-TQ144
Avoids incompatibility with current ARTIQ coredevice driver.
@pmldrmota
Copy link
Author

Sorry for the force-push wave. I hope it is more readable now.
I will test this gateware as soon as possible.

@sbourdeauducq
Copy link
Member

If -optimize speed is a requirement on v1.3.x, we can try reverting sel to a one-hot

cs = Signal(8)
self.comb += Array(sel)[cs].eq(1)

This would still glitch. You need to register the one-hot signal synchronously to remove any possibility of glitch.

I should've asked this first. Is -optimize density an option for v1.3.x?

If it still meets timing and does not break Urukul synchronization (please test), I would think so.

Do you absolutely need this feature on HW 1.3 or can you use it only on those v1.4+ boards you have?

@pmldrmota
Copy link
Author

Ok, but this was the original implementation! I changed the code yesterday to remove all my changes from this part.

What would be done differently on v1.4+? Now the only difference to the orginal code are

  • the clock changes (cd_le clocked by any CS)
  • the removal of the test points on v1.3

I had hoped that the original implementation was already glitch-free.

@pmldrmota
Copy link
Author

Do you absolutely need this feature on HW 1.3 or can you use it only on those v1.4+ boards you have?

I can't answer this for everyone interested. My experiment will continue to use it on 1.3, mainly because I don't think we have enough v1.4+ to replace them all. But this PR is here for everyone who wants this feature in the future, what do they prefer?

I can do the testing on v1.4 and v1.5 as soon as/if this (in particular in the latest form of v1.3.x-rb (1d6a6b0)) or an alternative approach passes your reviews.

@pmldrmota
Copy link
Author

pmldrmota commented Nov 23, 2021

Update: I tested the fixed-up revision (1d6a6b0) on a v1.3 in NU mode.

  • The AD9910 sync delay auto-tuning is successful on all four channels, windows didn't change compared to original build
  • The I/O update delay auto-tuning is successful on all four channels, values didn't change compared to original build
  • The attenuator readback (one of the most complex half-duplex transactions) shows no errors in >1000 tries
  • Our single-qubit randomised benchmarking that is driven by a channel on this CPLD didn't degrade

Are there any more tests you would like me to do?

@sbourdeauducq
Copy link
Member

Ok, but this was the original implementation! I changed the code yesterday to remove all my changes from this part.

Hmm, that's bad. Would you be able to clean it up?

@pmldrmota
Copy link
Author

Hmm, that's bad. Would you be able to clean it up?

I don't know how.

You need to register the one-hot signal synchronously to remove any possibility of glitch.

Assuming CS is asserted one SPI clock period before the first rising edge, shouldn't this be enough time (2 sysclk cycles) for self.comb += Array(sel)[cs].eq(1) to settle until it needs to be valid? It's not guaranteed by the gateware, but I don't see a way around it without needing many FFs (which aren't available on v1.3).

@pmldrmota
Copy link
Author

I think this PR should be closed and another one opened for v1.4+, where the shift register can be made 32 bits wide, simplifying both the gateware and the ARTIQ driver significantly.

I would agree with

Maybe we can implement this new feature on HW v1.4+ only, and keep proto_rev=8 on the v1.3 gateware. This works with the proposed scheme for the ARTIQ driver.

@sbourdeauducq
Copy link
Member

I don't know how.

Do you not understand the general/fundamental issue with combinatorial glitches into an asynchronous register input, or details of the Urukul CPLD code?

@pmldrmota
Copy link
Author

I understand that it can be a problem. The author of this code @jordens must've been aware of this issue when they wrote it, so I'll leave it to them to explain. If a change is proposed, I'm happy to test it.

# clock the latch domain from selection deassertion but only after
# there was a serial clock edge with asserted select (i.e. ignore
# glitches).
Instance("FDPE", p_INIT=1,
    i_D=0, i_C=ClockSignal("sck1"), i_CE=~sel[0], i_PRE=sel[0],
    o_Q=self.cd_le.clk),

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants