Settings for logical switches sometimes change by themselves #1834
This is now the third report of this or a similar incident that I have seen on forums. Possible causes:
@wolkstein please provide more details, such as:
Hi,
First: it is always the same change. LS1 gets an additional "And Switch" set to !=SA↑. Every time it happened, the change was exactly the same.
Second: yes, I have two scripts running, one telemetry script and one model script, and I use two models. FLY-AMANITA is my first and older setup, which uses these files (the bug definitely happens with this model).
FLY-bMANITA is my second and newer setup, which uses these files (I am unsure whether it happens with this model).
Last: YES!! I often lose the connection. When I download my APM log files, I see that my APM often switches to RTL, which is defined as the transmitter failsafe action. This happens even though I am really close to the model (4-10 meters).
Greetings
Someone reported a similar issue on a forum: there too, LS1 was being changed, and not only the AND switch but the other values as well. He seemed to have matched it to an "SD card error" occurring at the same time.
In the Lua script, there are some string concatenation operations (the .. operator), which mean a lot of work for the garbage collector. There are also some local declarations inside functions. Perhaps it's a problem similar to the one we had previously with the predimrc.lua script. You should try removing the concatenation operations, for example by replacing this kind of code:
with this one:
For the local declarations, you can try moving them to script level, for example by replacing this kind of code:
with this one:
If testing this solves the issue, it will help the team identify the cause of the problem.
Thanks. I mostly write in C++, and Lua is totally new to me. Making all variables script-wide feels a little strange to me, though, because I often use the same variable names in different functions. /g edit:
Hi, I have reproduced the error here after ~40 minutes of uptime. The Taranis was running with the Lua scripts modified as described in the previous comment, and the model was FLY-bMANITA. After ~40 minutes, LS1 got an additional "And Switch" set to !=SA↑. This happened while the Taranis was playing a lot of audio: I have audio messages defined for the GEAR and FOTO switches, and after 40 minutes of uptime I toggled these switches very quickly, ~30-40 times, to produce a flood of audio messages. Then suddenly my APM switched from Stabilize to AltHold, because the LS for my first position had picked up the !=SA↑ AND switch.
I am not really happy with this problem; it feels like you cannot really rely on your model settings. I made a binary diff of my models.eepe files before and after the bug happened. Here is the diff output:
/g
Could you post a link to your two EEPROMs? These eepe files are in a compressed format, not a raw memory dump. It would be interesting to dump both of them and see which data were written by the memory write overflow.
Which screen are you on when the problem happens? Your custom Lua telemetry screen?
The first LS's andsw is at offset 8 in the logical switches array. This array comes right after the big array of 512 curve points (512 bytes), which is often unused. It would be interesting to know whether there are differences in that array as well...
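To make the offset argument concrete, here is a minimal C sketch. The struct layouts are hypothetical (they are not the real OpenTX 2.0 definitions); the field sizes were picked only so that andsw lands at offset 8 and the logical switches directly follow a 512-byte curve area. Under those assumptions, LS1's andsw is byte 520 of the inflated model block, just 8 bytes past the end of the curve points. diff_model_dumps() is a hypothetical helper that compares two inflated dumps and reports whether that byte changed and how many curve bytes changed too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL layout for illustration only, not the real OpenTX structs.
   Sizes are chosen so that andsw sits at byte offset 8. */
typedef struct __attribute__((packed)) {
  int8_t  v1;        /* offset 0 */
  int16_t v2;        /* offsets 1-2 */
  int16_t v3;        /* offsets 3-4 */
  uint8_t func;      /* offset 5 */
  uint8_t delay;     /* offset 6 */
  uint8_t duration;  /* offset 7 */
  int8_t  andsw;     /* offset 8 */
} LsSketch;

/* Assumed: logical switches follow a 512-byte curve-point area directly. */
typedef struct __attribute__((packed)) {
  int8_t   curvePoints[512];  /* often unused */
  LsSketch logicalSw[32];
} ModelSketch;

_Static_assert(offsetof(LsSketch, andsw) == 8, "andsw must sit at offset 8");

/* Byte offset of LS1's andsw inside the inflated model block. */
static size_t ls1_andsw_offset(void) {
  return offsetof(ModelSketch, logicalSw) + offsetof(LsSketch, andsw);
}

/* Compare two inflated model dumps of equal length. Counts differing bytes
   inside the curve area, flags whether LS1's andsw byte changed, and
   returns the total number of differing bytes. */
static size_t diff_model_dumps(const uint8_t *before, const uint8_t *after,
                               size_t len, size_t *curveDiffs,
                               int *andswChanged) {
  assert(len >= sizeof(ModelSketch));
  size_t total = 0;
  *curveDiffs = 0;
  *andswChanged = 0;
  for (size_t i = 0; i < len; i++) {
    if (before[i] == after[i]) continue;
    total++;
    if (i < offsetof(ModelSketch, logicalSw)) (*curveDiffs)++;
    if (i == ls1_andsw_offset()) *andswChanged = 1;
  }
  return total;
}
```

A change at byte 520 alone, with an untouched curve area, would hint at a narrow stray write; scattered differences in the curve area as well would fit better with an overflow running into the logical switches.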
Also, is it reproducible with the latest stable nightly build here:
My main suspect is the audio thread. There are also some reports (on forums) that audio sometimes stops working (only WAV files; the tones keep playing) until the radio is power-cycled. As Bertrand already pointed out, there are a couple of fixes related to this (SD card and audio thread) in master. @wolkstein please test with some of the latest nightly builds and report back.
Hi, I will try a nightly build. Can I install (flash) the nightly 2.0.13 builds via dfu-util? I am on Linux and have Companion version 2.0.12 (Sep 19 2014) installed; my dfu-util version is 0.7. edit:
These eepe files are compressed dumps. It would be nice to store the eepe somewhere before and after the bug so that I can compare them (after inflating them; Companion can do that with one line uncommented in opentxinterface.cpp). Yes, using Linux is the easiest way. I do that every day.
No news, good news?
:) Yesterday I installed a nightly build on my Taranis, but I haven't had time to play around with it yet. I think I will do some tests at the weekend.
We'll all keep our fingers crossed :)
Hi, I did a few things to narrow down the effect. For info: I got this Taranis (my first one) two weeks ago with this SD card inside!!
First, I switched back from the nightly build to my old 2.0.12 firmware.
Second, I put the SD card into my computer and opened the affected sound file with Audacity. The sound file was fine, with no stuttering; sample rate, sample format, and bit depth were as expected, so no problem there. I opened the file directly from the SD card, with no caching on my computer!!
Third, I ran a file system check with GParted and fsck.fat.
You can see that the clusters and sectors are not as expected. Next, I backed up my SD card and reformatted it as FAT16 with "mkfs.fat -F16 -v -I -n", then copied all files back to the card. After starting the Taranis with my old 2.0.12 firmware, "stabilize active" played without distortion. The sound was also fine after re-flashing the nightly build firmware. So in the end, no result :(
But today, before I noticed the sound problem, I did ~100 file copy, mount, and unmount actions with the Taranis connected as a USB device. Maybe that is what started the problem: "stabilize active" was stuttering the first time it was played today.
/g
Interesting. So you already know our answer... would you try to repeat those ~100 copy actions and see whether the problem comes back?
This is very similar to the audio stuttering issue in #1779. It looks like reading a fragmented file with FatFs takes more SD sector reads than reading an unfragmented one (which is expected), but exactly how many more? Maybe I will research this just for curiosity's sake. But to solve this part of the issue, maybe we should recommend that users prepare their SD card like so:
Hi, I know it's my fault because of my lousy script :) Andre's workaround for the audio stuttering problem is the same one I pointed to in the other issue, #1779, although I'm a bit more radical because I format the card. I'm building some audio files that occupy different total numbers of sectors (file lengths).
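The "exactly how many more reads?" question can be explored with a rough model, sketched below in C. It is a simplification, not FatFs code: it assumes FAT16 (2-byte FAT entries), 512-byte sectors, and a cache that holds only one FAT sector at a time, and it counts how many FAT sector reads are needed to walk a file's cluster chain. fat_sector_reads() is a hypothetical name, and the cluster numbers in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: FAT16, 2 bytes per FAT entry, 512-byte sectors. */
#define FAT16_ENTRIES_PER_SECTOR (512u / 2u)

/* Count FAT sector reads needed to walk a cluster chain, assuming a cache
   that holds a single FAT sector. The FAT entry for cluster N lives in FAT
   sector N / 256, so a read is needed whenever that sector changes. */
static unsigned fat_sector_reads(const uint32_t *chain, size_t n) {
  assert(chain != NULL || n == 0);
  unsigned reads = 0;
  uint32_t cached = UINT32_MAX;  /* nothing cached yet */
  for (size_t i = 0; i < n; i++) {
    uint32_t sector = chain[i] / FAT16_ENTRIES_PER_SECTOR;
    if (sector != cached) {
      reads++;
      cached = sector;
    }
  }
  return reads;
}
```

In this model a contiguous chain such as clusters 2 to 11 sits entirely in the first FAT sector and costs one read, while a chain hopping between clusters 10, 600, 11, and 601 costs four reads for four clusters. Real numbers also depend on how data and directory accesses share the cache, which this sketch ignores.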
So... do we close both issues and release?
If I don't get any more unexpected reboots :) Since it's raining this weekend, I was planning to do more testing of a recent build, as you asked.
I had this for the first time ever today. The only difference from anything before was that I ran the winter postal task script from on4mj. To be clear: it did not change the model that had this different script in it; it changed the old, normal model on load, after I had flown with the other script.
Hi, models under test: FLY-bMANITA and amanita. amanita is a new model I created from scratch today. All of today's tests were on the nightly build firmware from 5.11.2014. /g edit:
Do you mean that you had the problem with our nightly builds?
The same mutexID appearing twice in TCBTbl is perfectly normal. It just means that two threads are using it: one is inside the mutex, and the other is waiting for the first one to release it and will take it right after that. What is problematic is the mutex being entered twice (your mutexCheck variable). That is plain wrong and must not happen. There are two possible reasons why this happens:
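The distinction between a task waiting on a mutex and a task inside it can be expressed as a small debug check, similar in spirit to the mutexCheck variable mentioned above. This is a hypothetical sketch with invented names, not the diagnostic that was actually used. It counts only tasks that have already entered the protected section, so any value above 1 means the mutex was entered twice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical debug counters. A task blocked inside the "enter mutex"
   call is NOT counted; only tasks that actually own the mutex are. */
static int sketchTasksInside = 0;
static bool sketchDoubleEntry = false;

/* Call right after the mutex has been acquired. */
static void sketch_mutex_entered(void) {
  sketchTasksInside++;
  if (sketchTasksInside > 1)
    sketchDoubleEntry = true;  /* two owners at once: must never happen */
}

/* Call right before the mutex is released. */
static void sketch_mutex_leaving(void) {
  assert(sketchTasksInside > 0);  /* leaving without entering is a bug */
  sketchTasksInside--;
}
```

Two entries in TCBTbl for one mutexID leave this counter at 1 (one owner, one waiter); only a broken mutex lets it reach 2.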
I will do some more research on the second point. Maybe we also need to add some memory barrier statements. Now, about the bug in FatFs. Anyway, it is a simple fix in FatFs that I have already made and stashed in my workspace; I will commit it later.
I am just checking the assembler output of
First problem in CoOS: when entering a mutex, there is a check whether the scheduler is locked. If it is, the call is abandoned; otherwise it continues. Code fragment:
if(OSSchedLock != 0) /* Is OS lock? */
{
return E_OS_IN_LOCK; /* Yes,error return */
}
#if CFG_PAR_CHECKOUT_EN >0
if(mutexID >= MutexFreeID) /* Invalid 'mutexID' */
{
return E_INVALID_ID;
}
#endif
OsSchedLock(); // just does OSSchedLock++
I think checking and incrementing OSSchedLock should be a single atomic operation. As it is now, a thread can be interrupted between the OSSchedLock check and the increment. The same problem exists in CoOS flags and mailboxes, but we don't use them.
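For illustration, this is what making the check and the increment a single atomic step could look like, sketched with C11 <stdatomic.h>. The names and return codes are hypothetical stand-ins, not CoOS symbols; inside CoOS itself the equivalent would more likely be a short interrupt-disabled section or an LDREX/STREX sequence. Note that the next comment argues this particular race is harmless in practice.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins, not CoOS symbols. */
#define SKETCH_E_OK         0
#define SKETCH_E_OS_IN_LOCK 1

static atomic_uchar sketchSchedLock = 0;

/* Check-and-increment as ONE atomic step: the lock is taken only if it
   was 0, so no task switch can slip in between the test and the ++. */
static int sketch_try_sched_lock(void) {
  unsigned char expected = 0;
  if (!atomic_compare_exchange_strong(&sketchSchedLock, &expected, 1))
    return SKETCH_E_OS_IN_LOCK;  /* already locked */
  return SKETCH_E_OK;
}

static void sketch_sched_unlock(void) {
  unsigned char prev = atomic_fetch_sub(&sketchSchedLock, 1);
  assert(prev > 0);  /* unlocking an unlocked scheduler is a bug */
}
```

The compare-exchange either sees 0 and stores 1 in one indivisible operation, or fails and reports the lock as held, which is the behavior the non-atomic check-then-increment only approximates.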
Would it be a problem in that case? I didn't read everything, but I think not; it looks more like an internal sanity check. If CoOS schedules another task at that point, OSSchedLock will be incremented and decremented (possibly even many times, if another task also takes a mutex). The code after this lock (checking and changing mutexState), on the other hand, looks really problematic to me. If it is not declared volatile, I am afraid it can be moved before the lock when -O2 optimizations are enabled.
This is OK; there is still a little code reordering, but it does not affect the desired result. Same section at
This is very bad: before OSSchedLock is incremented and stored in memory, this happens:
@bsongis please review my findings.
Ha, a compiler barrier completely cures the above problem:
OsSchedLock();
asm volatile ("" : : : "memory"); // prevents compiler reordering
pCurTcb = TCBRunning;
pMutex = &MutexTbl[mutexID];
And the resulting assembly:
Yes, I confirm that compiling CoOS with -O2 is really bad.
With
So yes, I agree on
And I think that volatile is not needed for the mutex structure or any of its members, because they are protected by OSSchedLock.
Right, I introduced it to prevent them from being read or modified before the lock. It obviously works as well and prevents the reordering (I didn't check, though), but it's not needed.
Will you add memory barriers?
Yes, perhaps by changing the OsSchedLock() macro. Are you sure a memory barrier is needed for __disable_irq()? It's already ASM code.
It can't hurt.
Congrats, guys.
Congratulations on this bug fix! A difficult one to tackle. 👍
It is; it is now optimized mainly for size.
More room for some great new features!!!
I've been using -Os for ersky9x anyway. I had a look at the assembler and did see the odd instruction appearing in the middle of the sequence that increments the OsSchedLock variable. Mike.
Yes Mike, now we have:
But I have the extra asm("") in front as well. This brackets the OSSchedLock++ with two asm directives, so the compiler doesn't let other code cross either boundary. Mike.
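Mike's bracketing approach can be sketched as a pair of macros. This assumes GCC or Clang (for the asm syntax); sketchSchedLock and the SKETCH_* macros are hypothetical stand-ins for the CoOS counter and its lock macro. The empty asm with a "memory" clobber emits no instruction; it only forbids the compiler from moving memory accesses across it. It does not make the increment atomic, and it is not a hardware barrier.

```c
#include <assert.h>

/* GCC/Clang compiler barrier: emits no machine instruction, but the
   "memory" clobber forbids moving loads/stores across it. */
#define COMPILER_BARRIER() __asm__ volatile ("" : : : "memory")

/* Hypothetical stand-in for the CoOS scheduler-lock counter. */
static unsigned char sketchSchedLock = 0;

/* Increment bracketed by two barriers, so code from before or after
   cannot be scheduled into the middle of the read-modify-write.
   NOT an atomic increment and NOT a hardware (DMB) barrier. */
#define SKETCH_SCHED_LOCK()  \
  do {                       \
    COMPILER_BARRIER();      \
    sketchSchedLock++;       \
    COMPILER_BARRIER();      \
  } while (0)

#define SKETCH_SCHED_UNLOCK()       \
  do {                              \
    COMPILER_BARRIER();             \
    assert(sketchSchedLock > 0);    \
    sketchSchedLock--;              \
    COMPILER_BARRIER();             \
  } while (0)
```

The leading barrier keeps earlier accesses from sinking into the increment sequence, and the trailing one keeps later accesses (such as the mutexState checks) from being hoisted above it, which is the reordering reported in the -O2 assembly.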
In my current understanding putting
I am also concerned about the IRQ disable/enable functions. I propose (again) that we change them to the following (only adding the "memory" clobber):
__attribute__( ( always_inline ) ) static __INLINE void __enable_irq(void)
{
__ASM volatile ("cpsie i" : : : "memory");
}
__attribute__( ( always_inline ) ) static __INLINE void __disable_irq(void)
{
__ASM volatile ("cpsid i" : : : "memory");
}
Without the
I believe there are TWO different things here: what the compiler does and what the processor does. The code I get for OsSchedLock is usually like:
4083ca: 7e5a ldrb r2, [r3, #25]
Without the leading asm(""), other instructions are sometimes interleaved within this sequence. I believe this is because the compiler assumes any asm code might have an effect it doesn't "know" about, so it must not move code from before the asm to after it.
The second thing is that the processor may reorder physical memory accesses, regardless of the actual instruction order. It may be necessary to use a real DMB (Data Memory Barrier) instruction. This may only be necessary with multiple processors, rather than multiple tasks on a single processor. According to the datasheets for the STM ARM processor and the Atmel ARM processor, this instruction is available. I found this:
On the AVR compiler, I have noticed that if you have a return in the middle of a procedure, the compiler sometimes generates a set of pop instructions and a return in addition to those at the end of the procedure. terminating the procedure with:
Mike.
Wow - quite the thread. |
Hi,
I noticed a strange problem with logical switches. I have defined six logical switches to get six flight modes on my APM 2.6 using switches SG and SD. For each logical switch I defined a function (AND) with V1 and V2 depending on the switch positions.
"And Switch", Duration, and Delay are unused for all six logical switches.
Now the problem: if the Taranis has been powered on for longer than 30-45 minutes, my LS settings suddenly change by themselves, and I can no longer switch back to the first position (first flight mode, LS1). If I open the Logical Switches tab in the model settings, there is an And Switch defined for LS1, set to !=SA↑. What is going wrong? I am using OpenTX 2.0.12.
I have noticed this bug twice (Taranis uptime > 30 minutes), but I cannot reliably reproduce it. Mostly nothing happens, even when the Taranis uptime is > 45 minutes.
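The symptom follows from how an AND switch gates a logical switch. The sketch below is a hypothetical, heavily simplified model, not the OpenTX evaluation code: a logical switch is active only if its function is true and, when an AND switch is set, that condition holds as well. Once LS1 picks up a "!=SA↑" AND switch, LS1 stays off whenever SA is up, no matter what V1 and V2 are, which would match not being able to return to the first flight mode.

```c
#include <assert.h>
#include <stdbool.h>

/* HYPOTHETICAL simplified model, not the OpenTX data structures. */
typedef enum { ANDSW_NONE, ANDSW_NOT_SA_UP } AndSwitch;

typedef struct {
  bool functionResult;  /* e.g. result of the AND of V1 and V2 */
  AndSwitch andsw;      /* optional extra condition */
} LogicalSwitchSketch;

static bool and_switch_ok(AndSwitch a, bool saUp) {
  switch (a) {
    case ANDSW_NONE:      return true;   /* no extra condition */
    case ANDSW_NOT_SA_UP: return !saUp;  /* true only when SA is NOT up */
  }
  assert(!"unknown AND switch");
  return false;
}

/* Active only if the function is true AND the AND switch condition holds. */
static bool ls_active(const LogicalSwitchSketch *ls, bool saUp) {
  return ls->functionResult && and_switch_ok(ls->andsw, saUp);
}
```

With andsw set to ANDSW_NONE the switch follows its function; after the corruption, the same function result is masked whenever SA is in the up position.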
/g
wolke