Multicore merge problem tracker #1123
Is there any writeup of the changes? I had to modify several HAL configs and components to get the multicore branch working properly. |
#1145 This happened after update |
Not yet. It would be helpful to know what you had to change though |
From @pmcstone
Reply from @ArcEye
|
From @pmcstone
Reply from @ArcEye
|
Above 2 entries to preserve items in forum posts |
Yes I am willing to donate it. Thanks for everything Arc! |
@ArcEye will you integrate the driver? |
Just done so at #1150 |
|
https://github.com/machinekit/machinekit-docs/blob/master/docs/hal/instcomp.asciidoc#instanceparams
Same goes for all instanceparams and argc/argv for that matter, they all now have |
Another design change since the multicore merge is that array |
It is not a change to arrays, it is the convenience macros used by comp and instcomp. The convenience macros You can still refer to What you can't do is
because there is no macro defining |
But that is what I need, so don't stop pointing out things like that. Over familiarity prevents me from looking at things the same way as others on some occasions 😄 |
I have two more problems:
The other problem is related to Haltalk. I have one |
I can verify the second problem on an isolated setup - the problem seems to be applicable for all |
I created a tag before the multicore-merge: https://github.com/machinekit/machinekit/tree/before-multicore |
Can you attach a link to something I can test and I will look after lunch |
I can reproduce this one.
becomes
It is a pretty severe test, 600 loads per minute. |
The problem appears likely to be in here I have it running at present with the debug section reporting memory enabled |
@machinekoder Same results. |
Okay, I tested with Jessie and RT Preempt kernel on the BBB and still getting the same results too. However, I cannot reproduce it on my desktop machine. I will test with a second x86 machine and see if I can reproduce the problem on a device where it is easier to debug. |
I spent the whole afternoon debugging the problem and still can't find it. The problem is not reproducible on fast desktop machines and it looks like it is related to delayed responses from Haltalk on the BBB. I have no idea what has changed in the multicore branch that affects either Haltalk's interaction performance with HAL or the network latency in general. |
@ArcEye Maybe you can reproduce this: I did a new build with
|
The switch is Yes wholesale segfaults just doing
|
My forked repo is completely up to date and now produces the same error when it gets to make setuid. The segfault in dmesg is coming from flavor, which again ties in with the problems users have had recently, with segfaults and selecting default of posix even if it doesn't exist
|
Now doing a process of elimination to try and find the point beyond which problems occur |
Found the error point This works fine
But including next commit errors
|
That was actually a false point; the main problem is a couple of commits further in, with rtapi_compat.c. I have reverted the changes and tidied the halcmd_main.c buffer size back to a local 200 figure (albeit this was not the problem). PR en route. |
Help me with my GitHub understanding: It seems that #1144 should be included in the latest automatic builds, available at deb.machinekit.io. Yet trying 0.1.1495389287-1mk.travis.master.git466cbe1f~1jessie doesn't have rapidrate defined as the pull should have done. |
The changes are in the repo. I don't use python or machinetalk; hopefully @machinekoder can assist with why it is not being found. |
I can only imagine that there is something wrong with the automated build process. I agree the changes are clearly in the repo, but just as clearly comparing status_pb2.py from the deb.machinekit.io package with status_pb2.py from my own build shows the package is missing the rapidrate definition. My source tree is directly from the github. |
You will need to be precise as to which package you are using and attach whatever program you are using that produces this error. I have downloaded |
Thank you ArcEye you helped me find the error. It turns out that I have copies of status_pb2.py, one in /usr/lib and one in /usr/local/lib. One has rapidrate, the other does not, and obviously the one without must be earlier in the search path. I don't know how I ended up with two copies, but some early fumbling with the Vagrant install is certainly to blame. |
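For anyone hitting the same symptom, here is a minimal sketch of how to check for shadowed copies of a generated module. It walks the search path in order and lists every file that could satisfy the import; the first entry is the one Python actually loads. The function names and the dotted module name `machinetalk.protobuf.status_pb2` in the usage comment are assumptions for illustration, not something taken from the Machinekit tooling.

```python
import io
import os
import sys


def find_module_copies(dotted_name, search_paths=None):
    """List every source file on the search path matching dotted_name.

    Results are in search order, so the first entry is the copy that
    `import dotted_name` would pick up. (Hypothetical helper.)
    """
    if search_paths is None:
        search_paths = sys.path
    relative = os.path.join(*dotted_name.split(".")) + ".py"
    copies = []
    seen = set()
    for entry in search_paths:
        # An empty entry in sys.path means the current directory.
        directory = os.path.abspath(entry or os.getcwd())
        if directory in seen:
            continue
        seen.add(directory)
        candidate = os.path.join(directory, relative)
        if os.path.isfile(candidate):
            copies.append(candidate)
    return copies


def file_mentions(path, token):
    """Return True if the file's text contains token, e.g. 'rapidrate'."""
    with io.open(path, "r", encoding="utf-8", errors="replace") as handle:
        return token in handle.read()


# Usage (module name is an assumption, adjust to your install):
# for path in find_module_copies("machinetalk.protobuf.status_pb2"):
#     print(path, file_mentions(path, "rapidrate"))
```

If more than one path is printed and only a later one mentions `rapidrate`, an older copy earlier in the search path is shadowing the packaged one.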
The source of the offending out-of-date prototype is https://pypi.python.org/pypi/machinetalk-protobuf/1.0.6, most likely something I installed via pip from the command line in my Vagrant box when I was learning. The PyPI package appears to derive from https://github.com/machinekit/machinetalk-protobuf. I opened an issue against machinekit/machinetalk-protobuf at machinekit/machinetalk-protobuf#76 |
I still have problems with |
It all still comes down to this problem: #1060 The armhf builds were truncated to get them in within the time limit for Travis, with the result that the This fell off the radar but the underlying problem was never resolved, because it would require proper armhf builds by a different means and probably completely dropping Wheezy, if something like @zultron's Docker build was adopted. |
The comp component build issue is hopefully solved by #1230 |
multicore was merged 18 months ago, closing |
I regularly hit the |
Can you move this to the relevant Issue tracker as per email on the list https://groups.google.com/forum/#!topic/machinekit/I70IfT-wan0 Issue tracker will be https://github.com/machinekit/machinekit-hal/issues Will also need to explain what exactly you are doing and why you think that particular commit causes it. There is a known problem with repeatedly polling I think it may be due to the way memory is ordered on boundaries to enable atomic operations. |
I am at the issue tracker :) |
@l29ah I think this is a BeagleBone issue. Can you ask on the Machinekit Google group list (after searching that list first)? That might raise your chance of a satisfactory answer. |
You are at a CLOSED general Issue tracker that mentions the problem amongst many others. The problem is it will remain closed, so to air this issue you need to open a new one.
There is nothing programmatic about using a bash script to repeatedly call halcmd and then try to parse the output. I was referring to finding the A comment that github decided to hide for some reason, showed that using
The issue was never 'solved'; it just appeared to be an extreme use of halcmd which was unlikely to be encountered. I will move this into a separate issue in the new repo and see if I can find time to write a user component. Will have to look at how this is used though; I imagine the call to M109 is blocking and only returns when the bed is up to temp, thus pausing the GCode. |
The issue will occur on any computer, doubtless it appears a lot quicker on a BBB, with its limited resources and processing power. |
Transferred to machinekit/machinekit-hal#142 Please do not use this Issue any further |
This is the issue tracker to which any problems related to the merge of multicore code into the main repo should be reported.