Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix unbounded recursion on _find_port and busy spin in main thread #22

Merged
merged 10 commits into from
Feb 21, 2021

Conversation

b3nj1
Copy link
Contributor

@b3nj1 b3nj1 commented Feb 21, 2021

Hi,
Please consider this PR which fixes a bug and performance issue.

I can't leave my rower connected all the time. When trying to find the rower port, it was calling recursively. Eventually, it hits the stack/recursion limits. This PR fixes that by just using a loop. It also only warns every ~5 minutes.

I also noticed that the script was using 100% CPU while doing nothing. After the threads were launched, there was a busy loop. I changed this to poll the status of the threads, and to bail if any thread dies.

Thanks,

  • Benjamin

@inonoob
Copy link
Owner

inonoob commented Feb 21, 2021

Many thanks for the code update, I will check it this evening and will merge it.
Hey sorry if I sound dumb but might you explain a bit more about the issue. My python skill are Okey but programming is not my daily business and more sort of hobby as you might have seen from my code.

@inonoob
Copy link
Owner

inonoob commented Feb 21, 2021

Why did you revert the deepcopy thing. I also have a fork from someone doing the same thing but with copy only. I wasn't aware of that function. But if more people fix my code with copy there must be a reason.

@inonoob inonoob added the bug Something isn't working label Feb 21, 2021
@sjeri
Copy link

sjeri commented Feb 21, 2021

Can't speak for b3nj1, but in my fork (which you may mean in this case) I switched to use copy because python is 'pass by object reference'. What you enqueued was not an own object but a reference to the original. Which means when you enqueued it the object was in a state but may have changed already when dequeued. Or even worse, in my case where I am working on the object by adding HR data afterwards from a different thread, it would be a simultaneous write to the same object.
Try this in a shell:

>>> from collections import deque
>>> out_queue = deque(maxlen=1)
>>> obj = [0,1,2,3]
>>> out_queue.append(obj)
>>> obj.append(66)
>>> obj
[0, 1, 2, 3, 66]
>>> out_queue.pop()
[0, 1, 2, 3, 66]

By doing a obj.copy() when enqueuing, this gets fixed:

>>> from collections import deque
>>> out_queue = deque(maxlen=1)
>>> obj = [0,1,2,3]
>>> out_queue.append(obj.copy())
>>> obj.append(66)
>>> obj
[0, 1, 2, 3, 66]
>>> out_queue.pop()
[0, 1, 2, 3]

Also, in my case, I need to dequeue and store the records for FIT processing. If I would not use copy, I would end up with a list of all items being the same object (last record created in all indexes).
So in the end I had two possibilities: either copy() or fully defining a new object each time.
In your case where the data is not touched or needed to be stored for anytime later, but only read it may work without a copy I think.. I even thought of leaving the queue away but I kind of liked it ;)

@sjeri
Copy link

sjeri commented Feb 21, 2021

The same applies btw when you are trying to reset the values... Suddenly you may end up working on the reset object (and thus not resetting to e.g. 0 anymore but to the already set values (speed/power stays at last value even though stopped):

>>> worker = [0,1,2,3]
>>> reset = []
>>> obj = worker
>>> obj
[0, 1, 2, 3]
>>> obj = reset
>>> obj
[]
>>> obj.append(66)
>>> obj
[66]
>>> reset
[66]

@inonoob
Copy link
Owner

inonoob commented Feb 21, 2021

Oh many thanks for the reference part python I wasn't full aware of that !! Many thanks.
I still have the question for deep copy and copy. The copy function creates a new object of the list but the values inside are still referenced to the value of the previous list. I read that the deep copy makes a complet new cop of the list withou any reference. So Then I don't understand the difference between both :(

@sjeri
Copy link

sjeri commented Feb 21, 2021

Anytime. Happy to be of help ;)

I think in this case it doesn't matter.
Copy as you wrote just makes a copy of the top level and keeps references to objects in values. While deepcopy clones all. But in your case the WRValues is a flat dict. There are no deeper objects in values. And a new assignement with a different string is an assignement with a different object. If there would be a sub dict or list with below values, this would matter ;)

@inonoob
Copy link
Owner

inonoob commented Feb 21, 2021

Oki, what I will do is to push my code with the update for the screen. Then I will check @b3nj1 pull request. And then I will rename and organize the folder as you have done on your fork @sjeri.

@inonoob
Copy link
Owner

inonoob commented Feb 21, 2021

@b3nj1 can you give a feedback why you reverted the copy section of your code and also the install.sh update ?

@b3nj1
Copy link
Contributor Author

b3nj1 commented Feb 21, 2021

@inonoob , Let me know if I missed any of your questions:

I reverted the copy and install.sh, because I meant for this PR to only include the fixes to make the script run efficiently while waiting for the rower to be connected. The install.sh PR request should still be merged if you found it helpful.

While trying to understand what the script did, I noticed the dictionary copies, so I just started refactoring to fix those. I put that as lower priority. @sjeri's explanation covers why this is important. I'll go ahead an push that change in a new PR. BTW, where are @sjeri's changes which do the copy fixes? I couldn't find them in either PRs or the repository list.

This loop keeps the CPU busy because it actually executes code to check if True is True and branches back to check again over and over.
while True: pass

The new code uses threading's join() to wait for any thread to finish. join() is very lightweight. The reason I use the timeout on join() is so that I can check each thread in series. Otherwise, if I wait on thread[0].join(), but only thread[1] died, we wouldn't know about thread[1].

The recursion in find_port() was not necessary. Before, if find_port() failed, it would call itself (a function calling itself is recursive). Every time you call a function, the call is added to the call stack. When that call is returned, it gets popped. Think about the backtraces you've seen when code fails. The backtrace is showing what was on the function call stack (a called b which called c which called d...). That stack is a limited resource. In the case of find_port(), I found this because I had hours worth of calls and it exhausted python's stack.

Hope that helps!

  • Benjamin

@sjeri
Copy link

sjeri commented Feb 21, 2021

where are @sjeri's changes which do the copy fixes?

I did a big refactoring of the code for my usecase and in my case it is now running as a service + a bit different thread design (thread pausing etc to only have threads started if required). As it is not running as a CLI anymore but rather config based and detects users by HRM device ID, I did not want to put a pull request as it differs too much from the original code. I made the code available to @inonoob though to have access to my changes and the code for FIT file and HRM integration. I will also put my code for OLED display menu live once ready (almost done and quite flexible to change). I am just not a big fan of publishing my variant of the code openly as most of the code goes back to @inonoob and I did not want to build a concurrent project competing with his ;)
I may have a bit of time in 2-3 weeks and will check if a pull request for the original code is possible. Or maybe an alternative branch?

@inonoob
Copy link
Owner

inonoob commented Feb 21, 2021

@b3nj1 oh man I haven't seen this with the serial port search. You are completely correct. I build a fork bomb until the connection with the S4 has been established! Nice catch. And many thanks for pointing that out.
And thanks again for the tip with the while loop. As you wrote I looked a bit to read about it.
Thanks guy for the support, I learnt so much today.

I would like to do the following applie the changes from @b3nj1 first. I may add my code for the screen. Would call it v1.1.0. I also would like to finish the second ble Gatt server to have the SmartRow passthrough.

And then If it is Okey for you @sjeri take your code you try to integrate the changes from v.1.1.0 and we merge it as v2.0.0 as the changes are Hugh from you part.

We might have to think about having the HRM and also the FE ant as modules ?

@inonoob inonoob merged commit 925a6a9 into inonoob:master Feb 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants