-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable mbridge serial port receiver when transmitter is enabled #65
Conversation
many thx interesting ... could you figure out/understand why only disabling the isr doesn't do it? I mean, it should... shouldn't it? pl use the respective LL functions, should be e.g. LL_USART_EnableDirectionRx(). |
I don't see any reason why only disabling the isr would prevent data from entering the USART shift register or Receive Data register. Then, when the interrupt is re-enabled, I'd expect it to fire immediately and the data register would contain data received while the interrupt was disabled. Perhaps a second character from the shift register might also be received, I haven't checked. When I added my dbg output I wasn't looking for this specific bug and I only output the first unexpected character received after each transmit (The dbg output is lower baud rate so I didn't want to output much). So all I can tell you is that in parse_nextchar() in STATE_IDLE when we were expecting MBRIDGE_STX1, we often received 0xA2. Sometimes we received 0x00. Since the data received was pretty consistent, I guessed that this byte might be from MBRIDGE_CMD_TX_LINK_STATS and when I eliminated those from being transmitted, the spurious 0xA2 bytes went away and I sometimes saw other characters (I think mostly 0x00, but I can't remember). At that point, I realized what the problem was, fixed it, and verified that the state machine now received no unexpected characters. I don't know exactly what happens when the USART receive data register overflows. I'm not especially interested in investigating further. BTW, is it possible that this problem also causes the occasional mLRS configurator script crashes and errors like failed to read parameters in this script? |
well, when the isr is re-enabled, also the isr pending bit is cleared and thus the isr does NOT fire opposed to your assumption ... it should fire only when again a new byte is received ... strange
I don't see any occasional mLRS configure script crashes anymore (on OTX at least), are you refereing here to some outdated situation? or you are saying you still get them? besides this, this thought crossed my mind too, that it e.g. may prvent having to reload, or that teh script now also works better on EdgeTx (as said before, the issue should be the same with CRSF) |
I did still get them, but I'm not sure I'm running the latest script. OK. Based on your logic, it can't be left over received data which is received when transmit is done. |
FYI - I did a quick test of these changes on E28 Dual + EdgeTx and didn't notice any difference with the Lua. 31 / 50 Hz generates a fair amount of errors. |
I'm not sure I understand what you are saying: UART_RX_CALLBACK_FULL() is called inside In your point of view, should it therefroe resolve the issue if
(b) could be conveniently done by adding to uart_rx_enableisr() the calls LL_USART_ReadReg(UART_UARTx, REG_SR); LL_USART_ReceiveData8(UART_UARTx); before the IT flags are cleared and enabled. If this would solve the issue too it would be a clean solution. is it possible that all we need to do is to add a LL_USART_ClearFlag_ORE(UART_UARTx) in uart_rx_enableisr()? |
Yes, since UART_IRQHandler() is not checking RXNEIE. If you see a better way to fix it than the obvious way I've proposed, feel free to ignore this PR and fix it. But, on quick |
BTW, I haven't tested this, but I suspect there is actually no need to disable the RXNE interrupt at all if we just disable the receiver (RE) when we transmit. |
first off, I'd like to understand the underlying issue. I generally do not feel well with doing things I don't understand why. This also always bears the risk that while it may mitigate an issue in a particular case, it may actually create an unexpected issue elsewhere later on. |
You seem to be saying you don't yet fully understand the underlying issue. I didn't see the full root cause when I first submitted the PR, but after you pointed out that we clear the interrupt when we enable it, I looked at the library more carefully and now I do think I see the full root cause. I've tried to explain it, but I don't know what you are missing. I explained with references to specific functions and registers above. Please read that again. Meanwhile, I'll try again in more general terms here: There are not separate low level ISR routines for transmit and receive. There is only ONE ISR which handles both transmit and receive, the address of which is stored in the interrupt vector table. If I'm not mistaken, the interrupt vector table for our USART points to the UART_IRQHandler() function. This function is called for both receive and transmit. You tried to prevent receive by disabling the interrupt for receive, but you left the interrupt enabled for transmit. You did not actually disable receive in the USART. Thus, when we transmit on a half duplex line, a transmit complete interrupt is generated and the low level ISR sees the a character has also been received and it calls the functions which handle this as I explained previously. Thus, the root cause is a combination of not disabling the receiver, not disabling the transmit complete interrupt (which we need), and having a single function which handles the interrupt for both transmit and receive (a hardware design fact). Disabling the receiver when we transmit seems the most logical fix to me. Where you put the code to do this is a detail you can decide. Does this make clear the root cause? |
it doesn't really explain it there is a second dangling point, not sure you have noted, there are code fragments there I tried to not having to use the diode for the jrbridge, but whatever I tried so far I couldn't get it to work, but it shouldo if it is as you say. This also makes me suspect that there is a detail still missing. And as third point, I'm using this exact kind of half duplex since ages in the STorM32 project, with no such issues. Also this makes me supsect there is a detail missing. |
Ignoring what I initially concluded was being received based on my limited
dbg output, do you see the bug in the code or not?
…On Mon, Apr 3, 2023, 3:18 AM olliw42 ***@***.***> wrote:
it doesn't really explain it
in transmit the bytes are reflected to the receiver, so the reciever does
ALWAYS receive something, which means that the RXNE flag MUST be set at the
time of TC (if a transmission was happening of course). Therefore, there
ALWAYS would have to be a spurious byte. Which is not what happens, as you
also report. QED
there is a second dangling point, not sure you have noted, there are code
fragments there I tried to not having to use the diode for the jrbridge,
but whatever I tried so far I couldn't get it to work, but it shouldo if it
is as you say. This also makes me suspect that there is a detail still
missing. And as third point, I'm using this exact kind of half duplex since
ages in the STorM32 project, with no such issues. Also this makes me
supsect there is a detail missing.
—
Reply to this email directly, view it on GitHub
<#65 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABKQOEPBMQDGCWIF6IJKSNLW7KBUTANCNFSM6AAAAAAWPCMMDA>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
I don't know what I should make of this post |
Do you agree that disabling rx interrupt is not enough to prevent receiving
back bytes we transmit with your existing code?
…On Mon, Apr 3, 2023, 3:52 AM olliw42 ***@***.***> wrote:
I don't know what I should make of this post
I think I raised a valid contradiction, which should have manifested
itself from day one
if you want to call it a "bug in the code" then your approach is obviously
not the correct solution of it
we probably should change then the lib
—
Reply to this email directly, view it on GitHub
<#65 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABKQOEPASJRFNRUFKB2YTJTW7KFVRANCNFSM6AAAAAAWPCMMDA>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
LOL the existing code obviously does not prevent back bytes, if one doesn't want ignore what you initially concluded was being received based on your limited dbg output |
Olli, Please be patient with me. I think I've learned it's best to take things one step at a time in situations like this since if either of us misunderstands something basic, we can waste a lot of time in pointless debate so I wanted to confirm we are on the same page so far. So we agree that the existing code ends up processing part or probably all of every transmitted message as received data. I think this depends on details of how long it takes after the last bit is transmitted for the receiver to recognize the stop bit and transfer the shift register to the data register and set the not empty flag. I think it's possible that we might not receive the last byte transmitted. Now we need to consider the effect the received bytes may have on the state machine in parse_nextchar(). We don't transmit a MBRIDGE_STX1 byte at the beginning of the transmitted message from the module to the radio, so most received bytes will stay in STATE_IDLE. But, if the last bugus byte we receive happens to be MBRIDGE_STX1 (about 1/256 data bytes, but we could get lucky or unlucky and have a few of these), we will move to STATE_RECEIVE_MBRIDGE_STX2. This is where the first real trouble happens. Now, the first legitimate byte of the next message will be MBRIDGE_STX1, but we will discard it and move back to STATE_IDLE (either because we weren't already in idle or because of a timeout). Thus, we loose a message in the radio to module direction. I suspect might be is what was causing occasional failures to download parameters since the GCS doesn't seem very smart about retry. This is where I start to run short of understanding what else might be going wrong. I'm not sure I see a likely scenario that explains how we loose messages in the module to radio direction. But I know we do because I see lost messages in this direction also without this fix. Does the radio side do anything different if it doesn't get a message back after sending one to the module? Does this get you closer to being comfortable with what we understand and what we don't? |
fyi, I've pushed updates to the stm32ll-lib with changes to uart, some were looming anyways, but I've also added the isr checks. tested on three mLRS systems, as well as on several STorM32 NT&T systems |
thx for the effort at explaining your picture. I however maintain that it doesn't really explain. The race condition you mention cannot be relevant, since there are always two bytes transmitted at least, the RXNE flag thus must have been raised at the time of TC. In fact, according to the data sheet a second received byte should produce an ORE and not change the DR data byte, that is, the - first - spurious byte in fact should always be a 'O'. Also, you figured yourself that it wouldn't explain things in the other direction (which is the more harmfull in terms of param download). |
Olli, You were the one who was concerned about exactly how this bug was causing the symptoms I saw.
As I've been saying, when we don't agree, it's often a missed fact on one one side or the other. In this case, you seem to have missed that the two sync bytes 'O' and 'W' are not transmitted in the module to radio direction. The first spurious byte I see is indeed the first byte transmitted. I was briefly confused about this myself, but since you wrote the code I didn't expect you might also be confused, so I didn't mention this in my last few comments. I don't know why the two directions don't use the same format. Strangely these two bytes are inserted into the received byte stream by your OpenTX code on the radio side and then, removed again by the receive state machine. This code in OpenTX seems more than a little strange to me and while it's doesn't matter for this discussion I'd like to know why you did that! |
I have no idea how my statent would impart this. your experimntal findings will be the test, so lets hear your experimental findings
argh, indeed, I have confused this
this is a master slave protocol, and the slave wants to be able to sync with the master
the timing/frame detection is done in the isr, this is just a simple method of telling the parser where the frame start is. Use any method you like. |
Because you said this:
And, to be clear, I'd generally prefer to understand why as well, especially when it's my code that's misbehaving. But, in this case, it just seemed so obvious to me that receiving our own data could cause problems that I didn't want to spend time in all of this discussion.
Yes, I should have some time to test your new code today. |
I've tested your latest code and your changes do indeed fix the problem. With my debug code added, I no longer see any spurious bytes received nor any unexpected state transitions and I'm seeing no lost messages. I'm going to close this PR since it is no longer needed. Since the bug is now fixed, I don't want to waste your time or mine with much further discussion, but I do want to mention, as feedback, for your consideration, that our views on the best way to implement half duplex do seem to be quite different. In my opinion, you have effectively added undocumented side-effects when the rx interrupt is disabled (for example, discarding received data) which I think could interfere with other possible use cases (for example, polled reception or briefly disabling the rx interrupt for the purpose of avoiding a race condition). I think this makes your library more complex, harder to understand, and less general. In my opinion, it would be much more clear to simply directly disable the receiver when in transmit state. You could then just leave the interrupts enabled. In short, I think the way you have fixed the problem is less efficient and more complicated than it needs to be. But I'm very happy I was able to isolate the problem I was observing with lost messages and I'm even happier it is fixed now. Thanks! And thanks for mLRS and all the time you spend to make it available to everyone! |
nonsense. The changes I did are all documented - very loudly in fact - in the datasheets. Second, I did these changes not mainly to somehow resolve the issue here, but since they make sense from the general perspective of what one should do when enabling/disabling isrs and only one isr specifically. It is to the contray odd to enable/disable the direction if one wants to enable/disable the isr. good it also solves the issue. So, you were correct that it was the error path in the isr which caused these spurious bytes. Given I use that code since more than ten years I'm a bit surprised I haven't noticed that before, but better late than never. So many thx for that. |
Who says we want or need to disable the ISR? The need is to switch directions, to enable transmit and disable receive on the UART or reverse. There is no need to disable the interrupt if we disable receive. If the above really doesn't make sense to you, we must be thinking about this in very different ways. |
This eliminates the spurious received characters I was seeing in my debug output on my R9M
With this change I'm no longer seeing lost messages when connecting QGC via bluetooth routed from mbridge with openTX.