This repository has been archived by the owner on Mar 6, 2020. It is now read-only.

Multicore merge problem tracker #1123

Closed
ArcEye opened this issue Feb 6, 2017 · 90 comments

Comments

@ArcEye

ArcEye commented Feb 6, 2017

This is the issue tracker to which any problems related to the merge of the multicore code into the main repo should be reported.

@machinekoder
Member

#1131

@machinekoder
Member

#1137

@machinekoder
Member

Is there any writeup of the changes? I had to modify several HAL configs and components to get the multicore branch working properly.

@pmcstone

pmcstone commented Feb 22, 2017

#1145: this happened after the update.

@ArcEye
Author

ArcEye commented Feb 23, 2017

> Is there any writeup of the changes? I had to modify several HAL configs and components to get the multicore branch working properly.

Not yet. It would be helpful to know what you had to change, though.

@ArcEye
Author

ArcEye commented Feb 24, 2017

From @pmcstone

This issue has been resolved by manually installing icomps. However, I have now run into another issue with a custom driver/protocol for my I/O hardware (it communicates via USB to RS485). Please see the attached files and error messages. Any help would be greatly appreciated, since I am merely a power user. Thanks

starting mklauncher... done
starting configserver... done
starting ./python/pmcsfile_service.py... done
starting machinekit... MACHINEKIT - 0.1
Machine configuration directory is '/home/pmcs/Downloads/pmcs-rt'
Machine configuration file is 'v6.ini'
Starting Machinekit...
io started
halcmd loadusr io started
done
hal/v6.hal:14: insmod failed, returned -1:
do_load_cmd: dlopen: /usr/lib/linuxcnc/rt-preempt/hal_p260c.so: undefined symbol: hal_exit
rpath=/usr/lib/linuxcnc/rt-preempt
See /var/log/linuxcnc.log for more information.
Shutting down and cleaning up Machinekit...
Traceback (most recent call last):
File "/home/pmcs/bin/estop.py", line 16, in <module>
Traceback (most recent call last):
File "/home/pmcs/bin/mtc.py", line 15, in <module>
time.sleep(2.00)
KeyboardInterrupttime.sleep(2.00)

KeyboardInterrupt
Cleanup done
Machinekit terminated with an error. You can find more information in the log:
/home/pmcs/linuxcnc_debug.txt
and
/home/pmcs/linuxcnc_print.txt
as well as in the output of the shell command 'dmesg' and in the terminal
stopping mklauncher... done
stopping configserver... done
stopping ./python/pmcsfile_service.py... done

Reply from @ArcEye

It indicates incorrect linkage in the build of the component.
Without the component code and knowing how it was built, I am unable to guess further.

If hal_exit() did not exist, Machinekit would not run; there are about 1230 binaries and libs linked against it.

Running nm -C hal_p260c | grep " U " from the directory it is in will list all the undefined symbols (marked U).
I would suspect a great deal more than just hal_exit().

hal_exit() is an inline accessor to halg_exit(); the accessor is in https://github.com/machinekit/machinekit/blob/master/src/hal/lib/hal.h#L379
so you may see references to halg_exit.

@ArcEye
Author

ArcEye commented Feb 24, 2017

From @pmcstone

Well, I think I might have broken something very badly... running the command nm -C hal_p260c | grep " U " gave this:

             U cfsetispeed@@GLIBC_2.2.5
             U cfsetospeed@@GLIBC_2.2.5
             U close@@GLIBC_2.2.5
             U hal_exit
             U hal_export_funct
             U hal_malloc
             U hal_param_bit_newf
             U hal_param_s32_newf
             U hal_pin_bit_newf
             U hal_pin_s32_newf
             U hal_ready
             U hal_xinit
             U ioctl@@GLIBC_2.2.5
             U memset@@GLIBC_2.2.5
             U open@@GLIBC_2.2.5
             U read@@GLIBC_2.2.5
             U rtapi_print_msg
             U rtapi_snprintf
             U rtapi_switch
             U strtok@@GLIBC_2.2.5
             U strtol@@GLIBC_2.2.5
             U tcdrain@@GLIBC_2.2.5
             U tcflush@@GLIBC_2.2.5
             U tcgetattr@@GLIBC_2.2.5
             U tcsetattr@@GLIBC_2.2.5
             U write@@GLIBC_2.2.5

It seems like everything is undefined.

Reply from @ArcEye

If you would like to 'donate' the driver, I can add it to the repo and it will be built properly and automatically at every rebuild.

Just tested it:

root@INTEL-i7:/usr/src/machinekit# DEBUG=5 realtime restart
root@INTEL-i7:/usr/src/machinekit# halcmd loadrt hal_p260c
:0: Realtime module 'hal_p260c' loaded
root@INTEL-i7:/usr/src/machinekit# halcmd show pin
Component Pins:
Comp Inst Type Dir Value Name Epsilon Flags linked to:
78 bit OUT FALSE hal_p260c.0.pin-01-in --l-
78 bit IN FALSE hal_p260c.0.pin-01-out --l-
78 bit OUT FALSE hal_p260c.0.pin-02-in --l-
78 bit IN FALSE hal_p260c.0.pin-02-out --l-
78 bit OUT FALSE hal_p260c.0.pin-03-in --l-
78 bit IN FALSE hal_p260c.0.pin-03-out --l-
78 bit OUT FALSE hal_p260c.0.pin-04-in --l-
78 bit IN FALSE hal_p260c.0.pin-04-out --l-
78 bit OUT FALSE hal_p260c.0.pin-05-in --l-
78 bit IN FALSE hal_p260c.0.pin-05-out --l-
78 bit OUT FALSE hal_p260c.0.pin-06-in --l-
78 bit IN FALSE hal_p260c.0.pin-06-out --l-
78 bit OUT FALSE hal_p260c.0.pin-07-in --l-
78 bit IN FALSE hal_p260c.0.pin-07-out --l-
78 bit OUT FALSE hal_p260c.0.pin-08-in --l-
78 bit IN FALSE hal_p260c.0.pin-08-out --l-
78 bit OUT FALSE hal_p260c.0.pin-09-in --l-
78 bit IN FALSE hal_p260c.0.pin-09-out --l-
78 bit OUT FALSE hal_p260c.0.pin-10-in --l-
78 bit IN FALSE hal_p260c.0.pin-10-out --l-
78 bit OUT FALSE hal_p260c.0.pin-11-in --l-
78 bit IN FALSE hal_p260c.0.pin-11-out --l-
78 bit OUT FALSE hal_p260c.0.pin-12-in --l-
78 bit IN FALSE hal_p260c.0.pin-12-out --l-
78 bit OUT FALSE hal_p260c.0.pin-13-in --l-
78 bit IN FALSE hal_p260c.0.pin-13-out --l-
78 bit OUT FALSE hal_p260c.0.pin-14-in --l-
78 bit IN FALSE hal_p260c.0.pin-14-out --l-
78 bit OUT FALSE hal_p260c.0.pin-15-in --l-
78 bit IN FALSE hal_p260c.0.pin-15-out --l-
78 bit OUT FALSE hal_p260c.0.pin-16-in --l-
78 bit IN FALSE hal_p260c.0.pin-16-out --l-
78 s32 IN 0 hal_p260c.0.rx_cnt_error --l-
78 bit OUT FALSE hal_p260c.0.rx_comm_error --l-
78 bit OUT FALSE hal_p260c.0.rx_perm_error --l-
78 s32 OUT 0 hal_p260c.refresh.time ----
78 s32 I/O 0 hal_p260c.refresh.tmax ----
78 bit OUT FALSE hal_p260c.refresh.tmax-inc ----
78 bit OUT FALSE hal_p260c.rx_comm_error --l-
78 bit OUT FALSE hal_p260c.rx_perm_error --l-
78 bit IN FALSE hal_p260c.rx_reset_error --l-
78 s32 IN 0 hal_p260c.sys_max_read --l-
78 s32 IN 0 hal_p260c.sys_max_write --l-
78 s32 IN 0 hal_p260c.sys_writecnt --l-

I suspect you were trying to use the old module without rebuilding it to account for the linkage relocations; adding it to the repo will solve this in future.

@ArcEye
Author

ArcEye commented Feb 24, 2017

The above two entries preserve items from forum posts.

@pmcstone

Yes I am willing to donate it. Thanks for everything Arc!

@machinekoder
Copy link
Member

@ArcEye will you integrate the driver?

@ArcEye
Author

ArcEye commented Feb 24, 2017

Just done so at #1150

@machinekoder
Member

localpincount is now named local_pincount in instcomps. This is an important change, especially since pincount can also be used in the components. However, using pincount results in things not working.

@ArcEye
Author

ArcEye commented Feb 28, 2017

> localpincount is now named local_pincount for instcomps. This is an important change especially since one can use pincount as well in the components. However, using pincount results in things not working.

https://github.com/machinekit/machinekit-docs/blob/master/docs/hal/instcomp.asciidoc#instanceparams

pincount does not work inside the function, because instcomp sets it to -1 at instantiation, so that any value passed to one instance is not then passed to any subsequent instances that don't specify pincount.

The same goes for all instanceparams, and for argc/argv for that matter: they all now have local_xxxx copies which can be used safely.

@machinekoder
Member

Another design change since the multicore merge is that array variable types have changed as follows:
previously one could access a variable hal_bit_t sample[TRIGGER] in the component as sample[0]; now one uses sample(0). Not a problem, but a design change that should be noted down.

@ArcEye
Author

ArcEye commented Feb 28, 2017

It is not a change to arrays; it is the convenience macros used by comp and instcomp.
The variables in question are not local ones within the function, but instance ones contained in the *ip instance structure.

The convenience macros #define pins as a dereferenced pointer of the same name, and variables as the corresponding member of the instance struct, so that users can just refer to the name.
Square brackets are changed to parentheses for operations involving these macros:
https://github.com/machinekit/machinekit/blob/master/src/hal/utils/instcomp.g#L1001

You can still refer to ip->local_variable[x] or use local_variable(x).

What you can't do is

int int_array[3] = {1,2,3};
int num = int_array(0);

because there is no macro defining int_array(x), so the compiler will expect a function.

@ArcEye
Author

ArcEye commented Feb 28, 2017

But that is what I need, so don't stop pointing out things like that.

Over-familiarity sometimes prevents me from looking at things the way others do 😄

@machinekoder
Member

I have two more problems:
When I watch halcmd using watch -n 0.1 halcmd show pin foo, rtapi seems to die after some time:

halcmd: hal_init() failed: -12
NOTE: 'rtapi' module must be loaded

The other problem is related to Haltalk. I have one U32 'out of HAL' remote component pin that is never updated in HAL. I still have to figure out what's happening here.

@machinekoder
Member

I can verify the second problem on an isolated setup - the problem seems to be applicable for all U32 pins.

@machinekoder
Member

I created a tag before the multicore-merge: https://github.com/machinekit/machinekit/tree/before-multicore
so users can check out this tag in case there are problems.

@ArcEye
Author

ArcEye commented Feb 28, 2017

> I can verify the second problem on an isolated setup - the problem seems to be applicable for all U32 pins.

Can you attach a link to something I can test? I will look at it after lunch.

@ArcEye
Author

ArcEye commented Feb 28, 2017

> When I start to watch halcmd using watch -n 0.1 halcmd show pin foo rtapi seems to die after some time:
> halcmd: hal_init() failed: -12
> NOTE: 'rtapi' module must be loaded

I can reproduce this one.
It appears to be a memory leak from launching halcmd 10 times every second.
Something is not getting freed, and eventually it runs out of memory.

Feb 28 14:23:39 INTEL-i7 rtapi:0: 4:rtapi_app:14901:user pid=14901 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 14:23:39 INTEL-i7 rtapi:0: 4:rtapi_app:14901:user pid=14901 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 14:23:39 INTEL-i7 msgd:0: ulapi:16104:user _ulapi_init(): ulapi rt-preempt unknown loaded
Feb 28 14:23:39 INTEL-i7 msgd:0: ulapi:16104:user halg_xinitfv:271 HAL: singleton component 'hal_lib16104' id=1014 initialized
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user --halcmd show pin db.funct.time
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user halg_exit:293 HAL: removing component 1016 'halcmd16104'
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=1014
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user halg_exit:293 HAL: removing component 1014 'hal_lib16104'
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user halg_exit:315 HAL: hal_errorcount()=0
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user halg_exit:316 HAL: _halerrno=0

becomes

Feb 28 14:34:15 INTEL-i7 rtapi:0: 4:rtapi_app:14901:user pid=14901 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 14:34:15 INTEL-i7 rtapi:0: 4:rtapi_app:14901:user pid=14901 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user _ulapi_init(): ulapi rt-preempt unknown loaded
Feb 28 14:34:15 INTEL-i7 msgd:0: hal heap:13547:user rtapi_malloc: out of memory (size=96 arena=522560)
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user hal_heap_addmem:58 HAL: extending arena by 512 bytes
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user hal_heap_addmem:61 HAL error: can't extend arena - below minfree: 944
Feb 28 14:34:15 INTEL-i7 msgd:0: hal heap:13547:user rtapi_malloc: out of memory (size=96 arena=522560)
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user shmalloc_desc:85 HAL error: giving up - can't allocate 96 bytes
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user halg_create_objectfv:155 HAL error: insufficient memory for COMPONENT hal_lib13547 size=96
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=13693
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user halg_exit:289 HAL error: no such component with id 13693

It is a pretty severe test: 600 loads per minute.
Not something that would have shown up under normal use.

@ArcEye
Author

ArcEye commented Feb 28, 2017

The problem appears likely to be in halg_exit(), here:
https://github.com/machinekit/machinekit/blob/master/src/hal/lib/hal_comp.c#L279

I have it running at present with the memory-reporting debug section enabled
https://github.com/machinekit/machinekit/blob/master/src/hal/lib/hal_comp.c#L317
to see what happens.

@einstine909

@machinekoder Same results.

@machinekoder
Copy link
Member

Okay, I tested with Jessie and an RT-Preempt kernel on the BBB and am still getting the same results. However, I cannot reproduce it on my desktop machine. I will test with a second x86 machine and see if I can reproduce the problem on a device where it is easier to debug.

@machinekoder
Copy link
Member

I spent the whole afternoon debugging the problem and still can't find it. The problem is not reproducible on fast desktop machines, and it looks like it is related to delayed responses from Haltalk on the BBB. I have no idea what has changed in the multicore branch that affects either Haltalk's interaction performance with HAL or the network latency in general.

@machinekoder
Copy link
Member

@ArcEye Maybe you can reproduce this: I did a new build with ./configure --with-examples and now I'm getting segfaults all over the place. The backtrace looks something like this:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f99a459a8a2 in rtapi_test_and_set_bit (nr=0, bitmap=0x8) at rtapi/rtapi_bitops.h:81
81          return (__atomic_fetch_or(bitmap + RTAPI_BIT_WORD(nr),
(gdb) bt
#0  0x00007f99a459a8a2 in rtapi_test_and_set_bit (nr=0, bitmap=0x8) at rtapi/rtapi_bitops.h:81
#1  0x00007f99a459a9a2 in rtapi_mutex_get (mutex=0x8) at rtapi/rtapi.h:553
#2  0x00007f99a459bb16 in halg_ready (use_hal_mutex=1, comp_id=38) at hal/lib/hal_comp.c:351
#3  0x0000000000403cf0 in hal_ready (comp_id=<optimized out>) at hal/lib/hal.h:413
#4  hal_setup (self=0x7ffd94777a70) at machinetalk/haltalk/haltalk_main.cc:248
#5  main (argc=1, argv=0x7ffd94778538) at machinetalk/haltalk/haltalk_main.cc:456

@ArcEye
Copy link
Author

ArcEye commented Mar 20, 2017

> I did a new build with ./configure --with-examples

The switch is --enable-examples, but I don't know if it actually does anything.
Just downloaded and building now.

Yes, wholesale segfaults just doing make setuid. Something is badly wrong, which is probably why two users are getting strange faults with recent builds, and why the website builds were failing even before the protobuf issue, with a corrupt git object:

fatal: loose object ee30f70e7ace4c41a3d901a014acfe09d80ba58c (stored in .git/objects/ee/30f70e7ace4c41a3d901a014acfe09d80ba58c) is corrupt

@ArcEye
Author

ArcEye commented Mar 20, 2017

@machinekoder

My forked repo is completely up to date and now produces the same error when it gets to make setuid.

The segfault in dmesg is coming from flavor, which again ties in with the problems users have had recently: segfaults, and posix being selected as the default flavor even when it doesn't exist.

[ 2047.893665] flavor[17698]: segfault at 0 ip 00002adc3f54ec3a sp 00007fffab59cec8 error 4 in libc-2.19.so[2adc3f4cd000+1a1000]
[ 2047.901586] flavor[17707]: segfault at 0 ip 00002ae142da6c3a sp 00007fffc3f78798 error 4 in libc-2.19.so[2ae142d25000+1a1000]
[ 2047.955207] flavor[17718]: segfault at 0 ip 00002ab5f1bc3c3a sp 00007fff83984918 error 4 in libc-2.19.so[2ab5f1b42000+1a1000]
[ 2047.964343] flavor[17727]: segfault at 0 ip 00002b0ca275dc3a sp 00007ffd04726658 error 4 in libc-2.19.so[2b0ca26dc000+1a1000]
[ 2052.895429] flavor[17901]: segfault at 0 ip 00002af88c685c3a sp 00007fff78aeeeb8 error 4 in libc-2.19.so[2af88c604000+1a1000]
[ 2052.898255] flavor[17905]: segfault at 0 ip 00002b3493e8cc3a sp 00007ffca32b4328 error 4 in libc-2.19.so[2b3493e0b000+1a1000]
[ 2096.600142] flavor[17833]: segfault at 0 ip 00007f1d205ebc3a sp 00007ffcf1b7c7c8 error 4 in libc-2.19.so[7f1d2056a000+1a1000]
[ 2096.604187] flavor[17840]: segfault at 0 ip 00007fadc0f0fc3a sp 00007ffca3edef18 error 4 in libc-2.19.so[7fadc0e8e000+1a1000]
[ 2096.608005] flavor[17847]: segfault at 0 ip 00007f6accadfc3a sp 00007ffd4c7dd9b8 error 4 in libc-2.19.so[7f6acca5e000+1a1000]
[ 2096.611567] flavor[17854]: segfault at 0 ip 00007f72be22ac3a sp 00007ffd12ecb648 error 4 in libc-2.19.so[7f72be1a9000+1a1000]
[ 2096.615511] flavor[17861]: segfault at 0 ip 00007fb3365f7c3a sp 00007ffe95992e38 error 4 in libc-2.19.so[7fb336576000+1a1000]
[ 2096.620116] flavor[17868]: segfault at 0 ip 00007f3043ffbc3a sp 00007ffc43b9e058 error 4 in libc-2.19.so[7f3043f7a000+1a1000]
[ 2096.623251] flavor[17872]: segfault at 0 ip 00007f6862a98c3a sp 00007ffcdc7e6b88 error 4 in libc-2.19.so[7f6862a17000+1a1000]
[ 2096.625898] flavor[17876]: segfault at 0 ip 00007f70f1071c3a sp 00007ffcb533a178 error 4 in libc-2.19.so[7f70f0ff0000+1a1000]
[ 2096.628564] flavor[17880]: segfault at 0 ip 00007f37e18c3c3a sp 00007fffbf400058 error 4 in libc-2.19.so[7f37e1842000+1a1000]
[ 2096.638829] flavor[17886]: segfault at 0 ip 00007fc7d1841c3a sp 00007fffe29cd2e8 error 4 in libc-2.19.so[7fc7d17c0000+1a1000]
[ 2118.277886] show_signal_msg: 30 callbacks suppressed

@ArcEye
Author

ArcEye commented Mar 20, 2017

After a git reset --hard to a commit from around 9th March, it builds and runs fine.
That was when I last updated my local working copy.

Now doing a process of elimination to try to find the point beyond which the problems occur.

@ArcEye ArcEye closed this as completed Mar 20, 2017
@ArcEye ArcEye reopened this Mar 20, 2017
@ArcEye
Author

ArcEye commented Mar 20, 2017

Found the error point

This works fine

commit b7f7d8a468fd2a38c2b446fc6a07a1a6c2261cf1
Author: Mick <arceye@mgware.co.uk>
Date:   Sat Mar 18 14:20:28 2017 +0000

    Remove stray readme.md from machinekit-multicore testing phase
    
    Signed-off-by: Mick <arceye@mgware.co.uk>

But including the next commit produces the errors:

commit 51709ebf9924b8635975dd20b8e79105e3e05748
Author: Alexander Rössler <mail@roessler.systems>
Date:   Sun Mar 19 11:27:43 2017 +0100

    halcmd_main: replace magic numbers

@ArcEye
Author

ArcEye commented Mar 20, 2017

> Found the error point

That was actually a false point; the main problem is a couple of commits further on, in rtapi_compat.c.

I have reverted the changes and tidied the halcmd_main.c buffer size back to a local 200 figure (although this was not the problem).

PR en route

@bschousek

Help me with my GitHub understanding: it seems that #1144 should be included in the latest automatic builds, available at deb.machinekit.io. Yet 0.1.1495389287-1mk.travis.master.git466cbe1f~1jessie doesn't have rapidrate defined, as the pull request should have done.

@ArcEye
Author

ArcEye commented May 23, 2017

The changes are in the repo
https://github.com/machinekit/machinekit/blob/master/src/machinetalk/proto/src/machinetalk/protobuf/status.proto#L401
https://github.com/machinekit/machinekit/blob/master/src/machinetalk/proto/src/machinetalk/protobuf/types.proto#L594
A package build just builds what is in the repo, so those definitions should be there.

I don't use Python or machinetalk; hopefully @machinekoder can assist with why it is not being found.

@bschousek

I can only imagine that there is something wrong with the automated build process. I agree the changes are clearly in the repo, but just as clearly comparing status_pb2.py from the deb.machinekit.io package with status_pb2.py from my own build shows the package is missing the rapidrate definition. My source tree is directly from the github.

@ArcEye
Copy link
Author

ArcEye commented May 24, 2017

You will need to be precise as to which package you are using and attach whatever program you are using that produces this error.

I have downloaded machinekit_0.1.1495389287-1mk.travis.master.git466cbe1f~1jessie_amd64.deb
and opened it: it does contain definitions of machinetalk.EmcStatusMotion.rapidrate, and status_pb2.py is byte-identical to the file produced by my RIP build.

@bschousek

Thank you ArcEye, you helped me find the error. It turns out that I have two copies of status_pb2.py, one in /usr/lib and one in /usr/local/lib. One has rapidrate and the other does not, and obviously the one without must be earlier in the search path. I don't know how I ended up with two copies, but some early fumbling with the Vagrant install is certainly to blame.

@bschousek
Copy link

bschousek commented May 24, 2017

The source of the offending out of date prototype is from https://pypi.python.org/pypi/machinetalk-protobuf/1.0.6, most likely something I installed via pip from the command line in my vagrant box when I was learning. The pypi package appears to derive from https://github.com/machinekit/machinetalk-protobuf.

I opened an issue against machinekit/machinetalk-protobuf: machinekit/machinetalk-protobuf#76

@machinekoder
Member

I still have problems with comp and instcomp with a package install, even though gcc-4.7 is enabled by default. The problems disappear when using a RIP install on the same machine.

@ArcEye
Author

ArcEye commented Jun 19, 2017

It all still comes down to this problem, #1060,
and related issues regarding the wrong flavor and makefile.inc not being set to build for anything other than posix.

The armhf builds were truncated to fit within the Travis time limit, with the result that the packages don't build components properly.

This fell off the radar, but the underlying problem was never resolved, because it would require proper armhf builds by a different means, and probably dropping Wheezy completely if something like @zultron's Docker build was adopted.

@ArcEye
Author

ArcEye commented Jul 3, 2017

The comp component build issue is hopefully solved by #1230

@ArcEye
Author

ArcEye commented Jul 31, 2018

multicore was merged 18 months ago, closing

@ArcEye ArcEye closed this as completed Jul 31, 2018
@l29ah

l29ah commented Aug 10, 2018

I regularly hit the halcmd show pin foo hanging problem when I poll the temperature on my 3D printer on an am3358; 3f1e265 here.

@ArcEye
Author

ArcEye commented Aug 10, 2018

Can you move this to the relevant issue tracker, as per the email on the list: https://groups.google.com/forum/#!topic/machinekit/I70IfT-wan0

The issue tracker will be https://github.com/machinekit/machinekit-hal/issues

You will also need to explain exactly what you are doing and why you think that particular commit causes it.

There is a known problem with repeatedly polling halcmd pin <foo> to get an output instead of doing it in a programmatic way.
#1123 (comment)

I think it may be due to the way memory is aligned on boundaries to enable atomic operations.
This would result in orphaned memory and if an operation is repeated a huge number of times, you run out of hal memory.

@l29ah

l29ah commented Aug 10, 2018

I am at the issue tracker :)
I'm using the included nc_files/M109 to wait for the temperature to settle. As far as I understand, this IS a programmatic way. Sometimes it never returns, and poking halcmd suggests the problem outlined in #1123 (comment).
As far as I understood, this issue is considered fixed, so I'm writing to say it is not.
The commit hash is the point in history where I observe the behaviour; I'm not saying that the commit is the cause.

@luminize

@l29ah I think this is a BeagleBone issue. Can you ask on the machinekit Google group list (after searching that list first)? That might raise your chance of a satisfactory answer.

@ArcEye
Author

ArcEye commented Aug 10, 2018

You are at a CLOSED general issue tracker that mentions the problem amongst many others.

The problem is it will remain closed, so to air this issue you need to open a new one.

> I'm using the included nc_files/M109 to wait for the temperature to settle. As far i understand, this IS a programmatic way.

There is nothing programmatic about using a bash script to repeatedly call halcmd and then try to parse the output.

I was referring to finding the hal_pin_t struct for the pin name in question and reading its _data_ptr_addr for the value.
This doesn't have the side effects of repeatedly loading the whole halcmd component.

A comment that GitHub decided to hide for some reason showed that using halpr_find_pin_by_name(), which does what I described above, ran a huge number of times without issue (10,000 times, equivalent to running M109 continuously for 2.777 hours).

I just ran a program, testmem, via halcmd loadusr -W testmem 10000;
it does 10,000 iterations of halpr_find_pin_by_name(), gets the value and prints the result.
It ran to the end without any issues.

The issue was never 'solved'; it just appeared to be an extreme use of halcmd that was unlikely to be encountered.
I was not aware there was a M code script which did exactly the same thing, but then I am not into plastic squirting 😜

I will move this into a separate issue in the new repo, and see if I can find time to write a user component
which takes a pin name and a value and outputs TRUE when the value is met or exceeded, or similar.

I will have to look at how this is used, though; I imagine the call to M109 is blocking and only returns when the bed is up to temperature, thus pausing the GCode.

@ArcEye
Author

ArcEye commented Aug 10, 2018

> @l29ah i think this is a beaglebone issue. Can you ask on the machinekit Google group list (after searching that list first)? That might raise your chance on a satisfactory answer.

The issue will occur on any computer; doubtless it appears a lot quicker on a BBB, with its limited resources and processing power.

@ArcEye
Author

ArcEye commented Aug 10, 2018

Transferred to machinekit/machinekit-hal#142

Please do not use this Issue any further
