Multicore merge problem tracker #1123

Closed
ArcEye opened this Issue Feb 6, 2017 · 90 comments

8 participants
ArcEye commented Feb 6, 2017

This is the issue tracker to which any problems related to the merge of the multicore code into the main repo should be reported.

Member

machinekoder commented Feb 12, 2017

This comment has been minimized.

Member

machinekoder commented Feb 15, 2017

This comment has been minimized.

Member

machinekoder commented Feb 22, 2017

Is there any writeup of the changes? I had to modify several HAL configs and components to get the multicore branch working properly.

pmcstone commented Feb 22, 2017

#1145: this happened after the update.

ArcEye commented Feb 23, 2017

Is there any writeup of the changes? I had to modify several HAL configs and components to get the multicore branch working properly.

Not yet. It would be helpful to know what you had to change, though.

ArcEye commented Feb 24, 2017

From @pmcstone

This issue has been resolved by manually installing icomps. However, I have now run into another issue with a custom driver/protocol for my I/O hardware (which communicates via USB to RS485). Please see the attached files and error messages. Any help would be greatly appreciated, since I am merely a power user. Thanks

starting mklauncher... done
starting configserver... done
starting ./python/pmcsfile_service.py... done
starting machinekit... MACHINEKIT - 0.1
Machine configuration directory is '/home/pmcs/Downloads/pmcs-rt'
Machine configuration file is 'v6.ini'
Starting Machinekit...
io started
halcmd loadusr io started
done
hal/v6.hal:14: insmod failed, returned -1:
do_load_cmd: dlopen: /usr/lib/linuxcnc/rt-preempt/hal_p260c.so: undefined symbol: hal_exit
rpath=/usr/lib/linuxcnc/rt-preempt
See /var/log/linuxcnc.log for more information.
Shutting down and cleaning up Machinekit...
Traceback (most recent call last):
File "/home/pmcs/bin/estop.py", line 16, in
Traceback (most recent call last):
File "/home/pmcs/bin/mtc.py", line 15, in
time.sleep(2.00)
KeyboardInterrupttime.sleep(2.00)

KeyboardInterrupt
Cleanup done
Machinekit terminated with an error. You can find more information in the log:
/home/pmcs/linuxcnc_debug.txt
and
/home/pmcs/linuxcnc_print.txt
as well as in the output of the shell command 'dmesg' and in the terminal
stopping mklauncher... done
stopping configserver... done
stopping ./python/pmcsfile_service.py... done

Reply from @ArcEye

It indicates incorrect linkage in the build of the component.
Without the component code and knowing how it was built, I am unable to guess further.

If hal_exit() did not exist, Machinekit would not run; there are about 1230 binaries and libraries linked against it.

Running nm -C hal_p260c | grep " U " from the directory it is in will list all the undefined (U) symbols.
I would suspect a great deal more than just hal_exit().

hal_exit() is an inline accessor to halg_exit(), defined in https://github.com/machinekit/machinekit/blob/master/src/hal/lib/hal.h#L379
so you may also see references to halg_exit.
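Below is a minimal sketch of why an old build can fail this way, assuming the accessor works as described above. When an exported function is replaced by a static inline wrapper in the header, newly compiled callers reference the g-prefixed function instead. A module compiled against the older header, however, still references the old name, and the library no longer exports that name. The names demo_exit() and demog_exit() and the use_lock argument are hypothetical stand-ins, not the real hal.h signatures.

```c
#include <assert.h>

static int g_exit_calls;        /* counts calls that reach the real function */

/* Hypothetical exported function, standing in for a g-prefixed function
 * such as halg_exit(). The signature, including use_lock, is invented
 * for illustration only. */
int demog_exit(int use_lock, int comp_id)
{
    assert(comp_id > 0);        /* component ids assumed positive here */
    (void)use_lock;
    g_exit_calls++;
    return 0;
}

/* Header-only accessor. Because it is static, demo_exit never becomes an
 * exported library symbol: anything compiled against this header ends up
 * referencing demog_exit. A module built against an older header, where
 * demo_exit was a real exported function, still carries "U demo_exit",
 * and that reference can no longer be resolved at load time. */
static inline int demo_exit(int comp_id)
{
    return demog_exit(1, comp_id);
}
```

In an object file built from this, nm would list demog_exit as a global (T) symbol. demo_exit would appear at most as a local (t) symbol, never one that another module could link against.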

ArcEye commented Feb 24, 2017

From @pmcstone

Well, I think I might have broken something very badly... running the command nm -C hal_p260c | grep " U " gave this:

             U cfsetispeed@@GLIBC_2.2.5
             U cfsetospeed@@GLIBC_2.2.5
             U close@@GLIBC_2.2.5
             U hal_exit
             U hal_export_funct
             U hal_malloc
             U hal_param_bit_newf
             U hal_param_s32_newf
             U hal_pin_bit_newf
             U hal_pin_s32_newf
             U hal_ready
             U hal_xinit
             U ioctl@@GLIBC_2.2.5
             U memset@@GLIBC_2.2.5
             U open@@GLIBC_2.2.5
             U read@@GLIBC_2.2.5
             U rtapi_print_msg
             U rtapi_snprintf
             U rtapi_switch
             U strtok@@GLIBC_2.2.5
             U strtol@@GLIBC_2.2.5
             U tcdrain@@GLIBC_2.2.5
             U tcflush@@GLIBC_2.2.5
             U tcgetattr@@GLIBC_2.2.5
             U tcsetattr@@GLIBC_2.2.5
             U write@@GLIBC_2.2.5

It seems like everything is undefined.
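U entries are expected in a loadable module. They are resolved when the module is loaded, against symbols exported by the loading process and its libraries, so what matters is whether each one is actually provided somewhere. The sketch below is a hypothetical helper, not part of Machinekit. It filters nm output down to undefined symbols and reports those missing from a list of exported names. That list might be collected with nm -D --defined-only on the library expected to supply them, but which library that is for a given flavor is an assumption. Versioned references such as close@@GLIBC_2.2.5 are skipped, on the assumption that glibc provides them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if sym appears in the NULL-terminated list exported. */
static int is_exported(const char *sym, const char *const exported[])
{
    for (size_t i = 0; exported[i] != NULL; i++)
        if (strcmp(sym, exported[i]) == 0)
            return 1;
    return 0;
}

/* Scans nm output (one symbol per line, e.g. "        U hal_exit") and
 * copies into missing[] every undefined symbol that is neither versioned
 * (contains '@') nor present in exported. Names are assumed to contain no
 * spaces (plain C symbols); lines of 128 characters or more are skipped.
 * Returns the number of names stored, at most max_missing. */
static size_t find_unresolved(const char *nm_output,
                              const char *const exported[],
                              char missing[][64], size_t max_missing)
{
    assert(nm_output != NULL && exported != NULL);
    size_t count = 0;
    const char *line = nm_output;

    while (*line != '\0' && count < max_missing) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        char buf[128], name[64], type;

        if (len < sizeof buf) {
            memcpy(buf, line, len);
            buf[len] = '\0';
            /* Undefined entries have no address column, so the first
             * non-blank character is the type letter 'U'. */
            if (sscanf(buf, " %c %63s", &type, name) == 2 && type == 'U'
                && strchr(name, '@') == NULL && !is_exported(name, exported))
                strcpy(missing[count++], name);
        }
        line = end ? end + 1 : line + len;
    }
    return count;
}
```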

Reply from @ArcEye

If you would like to 'donate' the driver, I can add it to the repo, and it will then be built properly and automatically at every rebuild.

I have just tested it:

root@INTEL-i7:/usr/src/machinekit# DEBUG=5 realtime restart
root@INTEL-i7:/usr/src/machinekit# halcmd loadrt hal_p260c
:0: Realtime module 'hal_p260c' loaded
root@INTEL-i7:/usr/src/machinekit# halcmd show pin
Component Pins:
Comp Inst Type Dir Value Name Epsilon Flags linked to:
78 bit OUT FALSE hal_p260c.0.pin-01-in --l-
78 bit IN FALSE hal_p260c.0.pin-01-out --l-
78 bit OUT FALSE hal_p260c.0.pin-02-in --l-
78 bit IN FALSE hal_p260c.0.pin-02-out --l-
78 bit OUT FALSE hal_p260c.0.pin-03-in --l-
78 bit IN FALSE hal_p260c.0.pin-03-out --l-
78 bit OUT FALSE hal_p260c.0.pin-04-in --l-
78 bit IN FALSE hal_p260c.0.pin-04-out --l-
78 bit OUT FALSE hal_p260c.0.pin-05-in --l-
78 bit IN FALSE hal_p260c.0.pin-05-out --l-
78 bit OUT FALSE hal_p260c.0.pin-06-in --l-
78 bit IN FALSE hal_p260c.0.pin-06-out --l-
78 bit OUT FALSE hal_p260c.0.pin-07-in --l-
78 bit IN FALSE hal_p260c.0.pin-07-out --l-
78 bit OUT FALSE hal_p260c.0.pin-08-in --l-
78 bit IN FALSE hal_p260c.0.pin-08-out --l-
78 bit OUT FALSE hal_p260c.0.pin-09-in --l-
78 bit IN FALSE hal_p260c.0.pin-09-out --l-
78 bit OUT FALSE hal_p260c.0.pin-10-in --l-
78 bit IN FALSE hal_p260c.0.pin-10-out --l-
78 bit OUT FALSE hal_p260c.0.pin-11-in --l-
78 bit IN FALSE hal_p260c.0.pin-11-out --l-
78 bit OUT FALSE hal_p260c.0.pin-12-in --l-
78 bit IN FALSE hal_p260c.0.pin-12-out --l-
78 bit OUT FALSE hal_p260c.0.pin-13-in --l-
78 bit IN FALSE hal_p260c.0.pin-13-out --l-
78 bit OUT FALSE hal_p260c.0.pin-14-in --l-
78 bit IN FALSE hal_p260c.0.pin-14-out --l-
78 bit OUT FALSE hal_p260c.0.pin-15-in --l-
78 bit IN FALSE hal_p260c.0.pin-15-out --l-
78 bit OUT FALSE hal_p260c.0.pin-16-in --l-
78 bit IN FALSE hal_p260c.0.pin-16-out --l-
78 s32 IN 0 hal_p260c.0.rx_cnt_error --l-
78 bit OUT FALSE hal_p260c.0.rx_comm_error --l-
78 bit OUT FALSE hal_p260c.0.rx_perm_error --l-
78 s32 OUT 0 hal_p260c.refresh.time ----
78 s32 I/O 0 hal_p260c.refresh.tmax ----
78 bit OUT FALSE hal_p260c.refresh.tmax-inc ----
78 bit OUT FALSE hal_p260c.rx_comm_error --l-
78 bit OUT FALSE hal_p260c.rx_perm_error --l-
78 bit IN FALSE hal_p260c.rx_reset_error --l-
78 s32 IN 0 hal_p260c.sys_max_read --l-
78 s32 IN 0 hal_p260c.sys_max_write --l-
78 s32 IN 0 hal_p260c.sys_writecnt --l-

I suspect you were trying to use the old module without rebuilding it to account for the linkage relocations;
having it in the repo will solve that in future.

ArcEye commented Feb 24, 2017

The above two entries preserve items from the forum posts.

pmcstone commented Feb 24, 2017

Yes, I am willing to donate it. Thanks for everything, Arc!

Member

machinekoder commented Feb 24, 2017

@ArcEye will you integrate the driver?

ArcEye commented Feb 24, 2017

Just done so in #1150.

Member

machinekoder commented Feb 28, 2017

localpincount is now named local_pincount for instcomps. This is an important change, especially since pincount can also be used in the components; however, using pincount results in things not working.

ArcEye commented Feb 28, 2017

localpincount is now named local_pincount for instcomps. This is an important change especially since one can use pincount as well in the components. However, using pincount results in things not working.

https://github.com/machinekit/machinekit-docs/blob/master/docs/hal/instcomp.asciidoc#instanceparams

pincount does not work inside the function, because instcomp sets it to -1 at instantiation, so that any value passed to one instance is not then passed to any subsequent instances that don't specify pincount.

The same goes for all instanceparams, and for argc/argv for that matter: they all now have local_xxxx copies, which can be used safely.
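Here is a simplified model of the behaviour described above. It assumes a single shared pincount parameter that is copied into each instance at instantiation and then reset to -1. Under that model, the function body must read the per-instance local_pincount, because by the time it runs the shared value has already been reset. The struct, the instantiate() helper and DEFAULT_PINCOUNT are hypothetical; the code instcomp actually generates differs.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_PINCOUNT 8      /* hypothetical default pin count */

/* Shared instance parameter: set before creating the next instance,
 * -1 meaning "not specified". */
static int pincount = -1;

/* Hypothetical per-instance data; only local_pincount matters here. */
struct inst_data {
    int local_pincount;
};

/* Called once per new instance: copy the passed value (or the default)
 * into the instance, then reset the shared parameter so that a later
 * instance which does not specify pincount gets the default again. */
static void instantiate(struct inst_data *ip)
{
    assert(ip != NULL);
    ip->local_pincount = (pincount == -1) ? DEFAULT_PINCOUNT : pincount;
    pincount = -1;
}
```

Setting pincount = 4 and then instantiating twice gives the first instance 4 and the second the default. The shared pincount itself reads -1 afterwards, which is why code using it "does not work".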

Member

machinekoder commented Feb 28, 2017

Another design change since the multicore merge is that array variable types have changed as follows:
previously one could access a variable hal_bit_t sample[TRIGGER] in the component as sample[0]; now one uses sample(0). Not a problem, but a design change that should be noted down.

ArcEye commented Feb 28, 2017

It is not a change to arrays; it is the convenience macros used by comp and instcomp.
The variables in question are not local ones within the function, but instance ones contained in the *ip instance structure.

The convenience macros #define pins to a dereferenced pointer of the same name, and variables to their location in the instance struct, so that users can just refer to the name.
For operations involving these macros, square brackets are replaced by parentheses:
https://github.com/machinekit/machinekit/blob/master/src/hal/utils/instcomp.g#L1001

You can still refer to ip->local_variable[x] or use local_variable(x).

What you can't do is:

int int_array[3] = {1,2,3};
int num = int_array(0);

because there is no macro defining int_array(x), so the compiler will expect a function.

ArcEye commented Feb 28, 2017

But that is what I need, so don't stop pointing out things like that.

Over-familiarity sometimes prevents me from looking at things the same way others do 😄

Member

machinekoder commented Feb 28, 2017

I have two more problems.
When I watch halcmd using watch -n 0.1 halcmd show pin foo, rtapi seems to die after some time:

halcmd: hal_init() failed: -12
NOTE: 'rtapi' module must be loaded

The other problem is related to Haltalk. I have one U32 OUT pin on a HAL remote component that is never updated in HAL. I still have to figure out what's happening here.

Member

machinekoder commented Feb 28, 2017

I can verify the second problem on an isolated setup; it seems to apply to all U32 pins.

Member

machinekoder commented Feb 28, 2017

I created a tag before the multicore merge, https://github.com/machinekit/machinekit/tree/before-multicore,
so users can check out this tag in case there are problems.

ArcEye commented Feb 28, 2017

I can verify the second problem on an isolated setup - the problem seems to be applicable for all U32 pins.

Can you attach a link to something I can test? I will look at it after lunch.

ArcEye commented Feb 28, 2017

When I start to watch halcmd using watch -n 0.1 halcmd show pin foo rtapi seems to die after some time:
halcmd: hal_init() failed: -12
NOTE: 'rtapi' module must be loaded

I can reproduce this one.
It appears to be a memory leak caused by launching halcmd 10 times every second:
something is not getting freed, and eventually it runs out of memory.

Feb 28 14:23:39 INTEL-i7 rtapi:0: 4:rtapi_app:14901:user pid=14901 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 14:23:39 INTEL-i7 rtapi:0: 4:rtapi_app:14901:user pid=14901 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 14:23:39 INTEL-i7 msgd:0: ulapi:16104:user _ulapi_init(): ulapi rt-preempt unknown loaded
Feb 28 14:23:39 INTEL-i7 msgd:0: ulapi:16104:user halg_xinitfv:271 HAL: singleton component 'hal_lib16104' id=1014 initialized
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user --halcmd show pin db.funct.time
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user halg_exit:293 HAL: removing component 1016 'halcmd16104'
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=1014
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user halg_exit:293 HAL: removing component 1014 'hal_lib16104'
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user halg_exit:315 HAL: hal_errorcount()=0
Feb 28 14:23:39 INTEL-i7 msgd:0: hal_lib:16104:user halg_exit:316 HAL: _halerrno=0

becomes

Feb 28 14:34:15 INTEL-i7 rtapi:0: 4:rtapi_app:14901:user pid=14901 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 14:34:15 INTEL-i7 rtapi:0: 4:rtapi_app:14901:user pid=14901 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user _ulapi_init(): ulapi rt-preempt unknown loaded
Feb 28 14:34:15 INTEL-i7 msgd:0: hal heap:13547:user rtapi_malloc: out of memory (size=96 arena=522560)
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user hal_heap_addmem:58 HAL: extending arena by 512 bytes
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user hal_heap_addmem:61 HAL error: can't extend arena - below minfree: 944
Feb 28 14:34:15 INTEL-i7 msgd:0: hal heap:13547:user rtapi_malloc: out of memory (size=96 arena=522560)
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user shmalloc_desc:85 HAL error: giving up - can't allocate 96 bytes
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user halg_create_objectfv:155 HAL error: insufficient memory for COMPONENT hal_lib13547 size=96
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=13693
Feb 28 14:34:15 INTEL-i7 msgd:0: ulapi:13547:user halg_exit:289 HAL error: no such component with id 13693

It is a pretty severe test (600 loads per minute),
and not something that would have shown up under normal use.
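To reproduce this without watch, a small harness can launch the same command at the same rate and report the run on which it first fails. At one launch every 100 ms, that is 600 launches per minute. This is a hypothetical test helper, not a Machinekit tool. It assumes halcmd is on PATH and that it exits with a non-zero status once hal_init() fails, as in the error above.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

extern char **environ;

/* Runs argv[0] (searched on PATH) up to max_runs times, pausing interval_ms
 * milliseconds between runs. Returns the 1-based number of the first run that
 * could not be started or did not exit with status 0, or 0 if all succeeded. */
static long run_until_failure(char *const argv[], long max_runs, long interval_ms)
{
    assert(argv != NULL && argv[0] != NULL && interval_ms >= 0);
    struct timespec gap = { interval_ms / 1000, (interval_ms % 1000) * 1000000L };

    for (long run = 1; run <= max_runs; run++) {
        pid_t pid;
        int status;

        if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
            return run;                      /* could not launch at all */
        if (waitpid(pid, &status, 0) < 0)
            return run;                      /* lost track of the child */
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return run;                      /* crashed or reported an error */

        nanosleep(&gap, NULL);               /* 100 ms gives 10 runs per second */
    }
    return 0;
}
```

For example, run_until_failure((char *[]){ "halcmd", "show", "pin", "foo", NULL }, 36000, 100) keeps the load up for roughly an hour, plus the time each halcmd invocation itself takes.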

ArcEye commented Feb 28, 2017

The problem appears likely to be in halg_exit(), here:
https://github.com/machinekit/machinekit/blob/master/src/hal/lib/hal_comp.c#L279

I have it running at present with the debug section that reports memory usage enabled, to see what happens:
https://github.com/machinekit/machinekit/blob/master/src/hal/lib/hal_comp.c#L317

@ArcEye

This comment has been minimized.

Show comment
Hide comment
@ArcEye

ArcEye Feb 28, 2017

This is where it failed with the hal_sweep enabled

Feb 28 15:20:00 INTEL-i7 rtapi:0: 4:rtapi_app:15998:user pid=15998 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 15:20:01 INTEL-i7 rtapi:0: 4:rtapi_app:15998:user pid=15998 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25051:user _ulapi_init(): ulapi rt-preempt unknown loaded
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25051:user halg_xinitfv:271 HAL: singleton component 'hal_lib25051' id=32762 initialized
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user --halcmd show pin db.funct.time
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user halg_exit:293 HAL: removing component 32764 'halcmd25051'
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=32762
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user halg_exit:293 HAL: removing component 32762 'hal_lib25051'
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user halg_exit:315 HAL: hal_errorcount()=0
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user halg_exit:316 HAL: _halerrno=0
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_heapstatus:151 HAL: HAL heap heap status
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_heapstatus:153 HAL:   arena=262144 totail_avail=260112 fragments=1 largest=260112
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_heapstatus:157 HAL:   requested=1569872 allocated=1569872 freed=1568064 waste=0%
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_heapstatus:151 HAL: global heap heap status
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_heapstatus:153 HAL:   arena=787136 totail_avail=262032 fragments=1 largest=262032
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_heapstatus:157 HAL:   requested=724926 allocated=786216 freed=261344 waste=7%
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_memory_usage:168 HAL:   strings on global heap: alloc=200446 freed=200163 balance=283
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_memory_usage:175 HAL:   RT objects: 464  alignment loss: 7  (1%)
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_memory_usage:177 HAL:   hal_malloc():   1
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user report_memory_usage:179 HAL:   unused:   261360
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25051:user halg_exit:320 HAL: hal_sweep: 2 objects freed
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25056:user _ulapi_init(): ulapi rt-preempt unknown loaded
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25056:user halg_xinitfv:271 HAL: singleton component 'hal_lib25056' id=32766 initialized
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=32766
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user halg_exit:293 HAL: removing component 32766 'hal_lib25056'
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user halg_exit:315 HAL: hal_errorcount()=0
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user halg_exit:316 HAL: _halerrno=0
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_heapstatus:151 HAL: HAL heap heap status
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_heapstatus:153 HAL:   arena=262144 totail_avail=260112 fragments=1 largest=260112
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_heapstatus:157 HAL:   requested=1570064 allocated=1570064 freed=1568256 waste=0%
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_heapstatus:151 HAL: global heap heap status
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_heapstatus:153 HAL:   arena=787136 totail_avail=262032 fragments=1 largest=262032
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_heapstatus:157 HAL:   requested=724951 allocated=786248 freed=261376 waste=7%
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_memory_usage:168 HAL:   strings on global heap: alloc=200471 freed=200188 balance=283
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_memory_usage:175 HAL:   RT objects: 464  alignment loss: 7  (1%)
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_memory_usage:177 HAL:   hal_malloc():   1
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user report_memory_usage:179 HAL:   unused:   261360
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user halg_exit:320 HAL: hal_sweep: 1 objects freed
Feb 28 15:20:01 INTEL-i7 rtapi:0: 4:rtapi_app:15998:user pid=15998 flavor=rt-preempt gcc=4.9.2 git=unknown
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user _ulapi_init(): ulapi rt-preempt unknown loaded
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user halg_ready:354 HAL error: component 32770 not found
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user halg_xinitfv:265 HAL error: hal_ready(32770) failed rc=-22
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user halg_ready:354 HAL error: component 22 not found
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user --halcmd show pin db.funct.time
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user halg_exit:289 HAL error: no such component with id 22
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user ulapi_hal_lib_cleanup:235 HAL: lib_module_id=32770
Feb 28 15:20:01 INTEL-i7 msgd:0: ulapi:25061:user halg_exit:289 HAL error: no such component with id 32770

This doesn't exactly make it clearer.
The crash immediately follows the line
Feb 28 15:20:01 INTEL-i7 msgd:0: hal_lib:25056:user halg_exit:320 HAL: hal_sweep: 1 objects freed
whereas every other print has said 2 objects freed.

@ArcEye

ArcEye Feb 28, 2017

If you want to 'try this at home'

mick@INTEL-i7:/usr/src/machinekit$ DEBUG=5 realtime restart
mick@INTEL-i7:/usr/src/machinekit$ halcmd newinst debounce db pincount=4
<commandline>:0: Realtime module 'debounce' loaded
mick@INTEL-i7:/usr/src/machinekit$ halcmd newthread servo 1000000 fp
mick@INTEL-i7:/usr/src/machinekit$ halcmd addf db servo
<commandline>:0: Function 'db' added to thread 'servo', rmb=0 wmb=0
mick@INTEL-i7:/usr/src/machinekit$ halcmd start
<commandline>:0: Realtime threads started
mick@INTEL-i7:/usr/src/machinekit$ watch -n0.1 halcmd show pin db.funct.time

ArcEye added a commit to ArcEye/machinekit that referenced this issue Feb 28, 2017

Fix example testalloc.comp
Whilst experimenting to try to find source of problem with memory leak
machinekit#1123 (comment)
discovered that this component used old struct addressing to get size,
before it was made a union, requiring extra `.tag` inserted

Signed-off-by: Mick <arceye@mgware.co.uk>
@ArcEye

ArcEye Mar 1, 2017

@machinekoder wrote:
"I created a tag before the multicore-merge: https://github.com/machinekit/machinekit/tree/before-multicore so users can check out this tag in case there are problems."

I was able to check this out and create a branch from it, although I think it should go much further back, to 75c06ff.

However, it contains things it shouldn't, and it fails to build because the conv macros are in both src/hal/i_components and src/hal/components.
Since the move to i_components was in a fairly recent commit of yours at dda1f01, that is just peculiar.

However, git reset --hard 75c06ff works fine and the result builds.

I am just running a sanity test, to ensure that this memory leak did not pre-exist the wholesale changes that @mhaberler made to memory allocation in multicore.

Result: It doesn't, so back to the current HEAD and valgrind or similar, if I can get MK to run in it.

@machinekoder

machinekoder Mar 1, 2017

Member

@ArcEye you need to run make clean first to get rid of the conv macros

@ArcEye

ArcEye Mar 1, 2017

I thought I had cleaned it, but no matter. git reset --hard 75c06ff works fine.

@ArcEye

ArcEye Mar 1, 2017

The only thing I can say for sure about this error is that it is directly related to the number of times
that halcmd is run and hal_lib is loaded.
Changing the watch interval to watch -n0.5 extends the time it takes to run out of memory by 5x.

Next I need to run a command which increments a param but does not display anything.
That should hopefully show whether the leak is in the loading itself or in the searching and display.
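A minimal Python sketch of that kind of test. The thread describes a shell loop, so this is an equivalent rather than the script actually used. It assumes halcmd is on PATH and that the db debounce instance from the steps above is loaded. Each iteration starts a fresh halcmd process, so hal_lib is loaded and unloaded every time, but nothing is read back from HAL. It also assumes that running out of HAL memory makes halcmd exit with a non-zero status. The function names and the stop-on-first-failure policy are illustrative and are not part of Machinekit.

```python
import subprocess
from typing import Callable, Sequence


def run_halcmd(args: Sequence[str]) -> int:
    """Launch one halcmd process (hal_lib load/unload) and return its exit status."""
    return subprocess.run(["halcmd", *args],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode


def setp_until_failure(name: str = "db.delay",
                       max_iter: int = 100_000,
                       runner: Callable[[Sequence[str]], int] = run_halcmd) -> int:
    """Run 'halcmd setp <name> <counter>' with an incrementing counter.

    Returns the number of successful iterations before the first non-zero
    exit status, or max_iter if no iteration failed.
    """
    for counter in range(max_iter):
        if runner(["setp", name, str(counter)]) != 0:
            return counter
        print(counter, flush=True)  # progress only; no HAL data is displayed
    return max_iter
```

On a live session, calling setp_until_failure() returns the iteration count at which halcmd first failed. The runner parameter exists only so that the loop can be exercised without HAL.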

@ArcEye

ArcEye Mar 1, 2017

Running halcmd setp db.delay $counter from within a loop which increments and prints $counter
produces 2495 iterations before running out of memory.

This corresponds exactly with how long the previous tests ran before the error,
e.g. watch -n0.1 ... lasted approx. 4 min 9 s, which is almost exactly 2495 / 600 = 4.158 minutes.

So the leak is nothing within the print_pin_info() display routine, and it looks like the same amount of memory is lost per load/unload.
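The timing check above is simple arithmetic if every halcmd run leaks the same amount. watch -n0.1 starts roughly 10 runs per second, or 600 per minute, so 2495 runs take about 2495 / 600 = 4.158 min. A 0.5 s interval stretches the same run count by 5x. The small sketch below performs that calculation. It ignores the time halcmd itself takes per run, so real runs will be somewhat slower.

```python
from typing import Tuple


def time_to_exhaustion(runs: int, interval_s: float) -> Tuple[int, float]:
    """Return (minutes, seconds) until `runs` invocations have happened,
    assuming one invocation every `interval_s` seconds and zero run time."""
    total_s = runs * interval_s
    minutes, seconds = divmod(total_s, 60.0)
    return int(minutes), seconds


RUNS_BEFORE_OOM = 2495  # iteration count reported for the setp loop

for interval in (0.1, 0.5):
    m, s = time_to_exhaustion(RUNS_BEFORE_OOM, interval)
    print(f"watch -n{interval}: about {m} min {s:.1f} s")
# watch -n0.1: about 4 min 9.5 s
# watch -n0.5: about 20 min 47.5 s
```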

@luminize

luminize Mar 13, 2017

Member

Wait a minute, I think this is some fallout which was covered before. It probably has to do with the instantiation and its arguments. I'll dig through some history. Be right back.

@ArcEye

ArcEye Mar 13, 2017

I take it getting the name from argv[1] and changing the fields cures the problem?

@machinekoder

machinekoder Mar 13, 2017

Member

@einstine909 can you share the configuration?

@luminize

luminize Mar 13, 2017

Member

@ArcEye yes, that was the problem. I've got jplan working again. Gonna test the interpolator later (deadline in sight)

@einstine909

einstine909 commented Mar 13, 2017

@machinekoder here are my config files: SyilX4-Config.zip

@machinekoder

machinekoder Mar 13, 2017

Member

I will test the configuration when I have access to my BBB.
@einstine909 Do you also experience the same problems when you don't load haltalk?

@einstine909

einstine909 Mar 13, 2017

@machinekoder: Just tested after removing haltalk from the config. I get the same problems as before.

@machinekoder

machinekoder Mar 17, 2017

Member

@einstine909 Did you find time to test with Jessie?

@einstine909

einstine909 commented Mar 18, 2017

@machinekoder Same results.

@machinekoder

machinekoder Mar 19, 2017

Member

Okay, I tested with Jessie and the RT-Preempt kernel on the BBB and I'm still getting the same results. However, I cannot reproduce it on my desktop machine. I will test with a second x86 machine and see if I can reproduce the problem on a device where it is easier to debug.

@machinekoder

machinekoder Mar 19, 2017

Member

I spent the whole afternoon debugging the problem and still can't find it. The problem is not reproducible on fast desktop machines, and it looks like it is related to delayed responses from Haltalk on the BBB. I have no idea what has changed in the multicore branch that affects either Haltalk's interaction performance with HAL or the network latency in general.

@machinekoder

machinekoder Mar 20, 2017

Member

@ArcEye Maybe you can reproduce this: I did a new build with ./configure --with-examples and now I'm getting segfaults all over the place. The backtrace looks something like this:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f99a459a8a2 in rtapi_test_and_set_bit (nr=0, bitmap=0x8) at rtapi/rtapi_bitops.h:81
81          return (__atomic_fetch_or(bitmap + RTAPI_BIT_WORD(nr),
(gdb) bt
#0  0x00007f99a459a8a2 in rtapi_test_and_set_bit (nr=0, bitmap=0x8) at rtapi/rtapi_bitops.h:81
#1  0x00007f99a459a9a2 in rtapi_mutex_get (mutex=0x8) at rtapi/rtapi.h:553
#2  0x00007f99a459bb16 in halg_ready (use_hal_mutex=1, comp_id=38) at hal/lib/hal_comp.c:351
#3  0x0000000000403cf0 in hal_ready (comp_id=<optimized out>) at hal/lib/hal.h:413
#4  hal_setup (self=0x7ffd94777a70) at machinetalk/haltalk/haltalk_main.cc:248
#5  main (argc=1, argv=0x7ffd94778538) at machinetalk/haltalk/haltalk_main.cc:456
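One clue in this backtrace is that rtapi_mutex_get() was handed mutex=0x8. An address that small is what C produces when a struct member is taken through a NULL base pointer (&p->member evaluates to 0 + offsetof(member)). The sketch below only illustrates that arithmetic with ctypes. The struct is hypothetical and is not HAL's real shared-memory layout, so it does not tell you which pointer was NULL here.

```python
import ctypes


class HypotheticalShared(ctypes.Structure):
    # Hypothetical layout for illustration only: one 8-byte field
    # followed by the mutex word, which therefore sits at offset 8.
    _fields_ = [
        ("version", ctypes.c_uint64),
        ("mutex", ctypes.c_ulong),
    ]


def member_address(base: int, struct_type, member: str) -> int:
    """Address of &base->member, computed the way a C compiler would."""
    return base + getattr(struct_type, member).offset


# With a NULL base pointer, &shared->mutex evaluates to 0x8.
print(hex(member_address(0, HypotheticalShared, "mutex")))  # 0x8
```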

@ArcEye

ArcEye Mar 20, 2017

@machinekoder wrote: "I did a new build with ./configure --with-examples"

The switch is --enable-examples, but I don't know if it actually does anything.
Just downloaded and building now.

Yes, wholesale segfaults just doing make setuid. Something is badly wrong, which is probably why two users are getting strange faults with recent builds, and why the website builds were failing even before the protobuf issue, with a corrupt git object:

fatal: loose object ee30f70e7ace4c41a3d901a014acfe09d80ba58c (stored in .git/objects/ee/30f70e7ace4c41a3d901a014acfe09d80ba58c) is corrupt

@ArcEye

ArcEye Mar 20, 2017

@machinekoder

My forked repo is completely up to date and now produces the same error when it gets to make setuid.

The segfault in dmesg is coming from flavor, which again ties in with the problems users have had recently: segfaults, and posix being selected as the default flavor even when it doesn't exist.

[ 2047.893665] flavor[17698]: segfault at 0 ip 00002adc3f54ec3a sp 00007fffab59cec8 error 4 in libc-2.19.so[2adc3f4cd000+1a1000]
[ 2047.901586] flavor[17707]: segfault at 0 ip 00002ae142da6c3a sp 00007fffc3f78798 error 4 in libc-2.19.so[2ae142d25000+1a1000]
[ 2047.955207] flavor[17718]: segfault at 0 ip 00002ab5f1bc3c3a sp 00007fff83984918 error 4 in libc-2.19.so[2ab5f1b42000+1a1000]
[ 2047.964343] flavor[17727]: segfault at 0 ip 00002b0ca275dc3a sp 00007ffd04726658 error 4 in libc-2.19.so[2b0ca26dc000+1a1000]
[ 2052.895429] flavor[17901]: segfault at 0 ip 00002af88c685c3a sp 00007fff78aeeeb8 error 4 in libc-2.19.so[2af88c604000+1a1000]
[ 2052.898255] flavor[17905]: segfault at 0 ip 00002b3493e8cc3a sp 00007ffca32b4328 error 4 in libc-2.19.so[2b3493e0b000+1a1000]
[ 2096.600142] flavor[17833]: segfault at 0 ip 00007f1d205ebc3a sp 00007ffcf1b7c7c8 error 4 in libc-2.19.so[7f1d2056a000+1a1000]
[ 2096.604187] flavor[17840]: segfault at 0 ip 00007fadc0f0fc3a sp 00007ffca3edef18 error 4 in libc-2.19.so[7fadc0e8e000+1a1000]
[ 2096.608005] flavor[17847]: segfault at 0 ip 00007f6accadfc3a sp 00007ffd4c7dd9b8 error 4 in libc-2.19.so[7f6acca5e000+1a1000]
[ 2096.611567] flavor[17854]: segfault at 0 ip 00007f72be22ac3a sp 00007ffd12ecb648 error 4 in libc-2.19.so[7f72be1a9000+1a1000]
[ 2096.615511] flavor[17861]: segfault at 0 ip 00007fb3365f7c3a sp 00007ffe95992e38 error 4 in libc-2.19.so[7fb336576000+1a1000]
[ 2096.620116] flavor[17868]: segfault at 0 ip 00007f3043ffbc3a sp 00007ffc43b9e058 error 4 in libc-2.19.so[7f3043f7a000+1a1000]
[ 2096.623251] flavor[17872]: segfault at 0 ip 00007f6862a98c3a sp 00007ffcdc7e6b88 error 4 in libc-2.19.so[7f6862a17000+1a1000]
[ 2096.625898] flavor[17876]: segfault at 0 ip 00007f70f1071c3a sp 00007ffcb533a178 error 4 in libc-2.19.so[7f70f0ff0000+1a1000]
[ 2096.628564] flavor[17880]: segfault at 0 ip 00007f37e18c3c3a sp 00007fffbf400058 error 4 in libc-2.19.so[7f37e1842000+1a1000]
[ 2096.638829] flavor[17886]: segfault at 0 ip 00007fc7d1841c3a sp 00007fffe29cd2e8 error 4 in libc-2.19.so[7fc7d17c0000+1a1000]
[ 2118.277886] show_signal_msg: 30 callbacks suppressed
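Two things can be read off these lines mechanically. First, "segfault at 0" together with "error 4" is a user-mode read through a NULL pointer. Second, subtracting the libc load base from ip gives the same offset, 0x81c3a, in every line. That means the fault is at the same instruction in libc-2.19.so each time, despite the randomized load addresses.

The sketch below performs that decoding. It assumes x86, because the error-code bits are x86-specific, and it assumes the dmesg format shown above. Resolving the offset to a libc function would need that exact libc-2.19.so build, for example via gdb.

```python
import re
from typing import Optional

# Line shape: "<comm>[<pid>]: segfault at <addr> ip <ip> sp <sp> error <code> in <lib>[<base>+<size>]"
# All numbers, including the error code, are printed in hex by the kernel.
SEGV_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+[0-9a-f]+\]"
)


def decode_x86_pf_error(code: int) -> str:
    """Decode the low three bits of an x86 page-fault error code."""
    cause = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {cause}"


def parse_segfault(line: str) -> Optional[dict]:
    """Extract the useful fields from one dmesg segfault line, or None."""
    m = SEGV_RE.search(line)
    if m is None:
        return None
    return {
        "comm": m["comm"],
        "fault_addr": int(m["addr"], 16),
        "lib": m["lib"],
        # ip minus the library's load base: equal offsets mean the same
        # instruction in the library, whatever the randomized base was.
        "lib_offset": int(m["ip"], 16) - int(m["base"], 16),
        "error": decode_x86_pf_error(int(m["err"], 16)),
    }


line = ("[ 2047.893665] flavor[17698]: segfault at 0 ip 00002adc3f54ec3a "
        "sp 00007fffab59cec8 error 4 in libc-2.19.so[2adc3f4cd000+1a1000]")
info = parse_segfault(line)
print(info["comm"], hex(info["fault_addr"]), hex(info["lib_offset"]), info["error"])
# flavor 0x0 0x81c3a user-mode read, page not present
```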

@ArcEye

ArcEye Mar 20, 2017

I did a git reset --hard to a commit from around 9th March, and that builds and runs fine.
That was when I last updated my working copy locally.

Now doing a process of elimination to try to find the point beyond which the problems occur.
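A process of elimination over a linear history is fastest as a binary search. It needs about log2(N) test builds instead of one per commit. The sketch below makes several assumptions:

- The commit list is ordered oldest to newest.
- The first entry is known good and the last is known bad.
- There is a single good-to-bad transition.

The is_good predicate stands in for "build, run make setuid, check for segfaults" and is hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_good: Callable[[T], bool]) -> T:
    """Return the first commit for which is_good() is False.

    Assumes commits[0] is good, commits[-1] is bad, and the history
    switches from good to bad exactly once.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]
```

git has this built in: git bisect start <bad> <good>, then git bisect run <script>. In that case the script's exit status marks each commit (0 for good, 125 to skip, other values up to 127 for bad). Either way, the result is only as reliable as the good/bad test.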

@ArcEye ArcEye closed this Mar 20, 2017

@ArcEye ArcEye reopened this Mar 20, 2017

@ArcEye

ArcEye Mar 20, 2017

Found the error point

This works fine

commit b7f7d8a468fd2a38c2b446fc6a07a1a6c2261cf1
Author: Mick <arceye@mgware.co.uk>
Date:   Sat Mar 18 14:20:28 2017 +0000

    Remove stray readme.md from machinekit-multicore testing phase
    
    Signed-off-by: Mick <arceye@mgware.co.uk>

But including the next commit errors:

commit 51709ebf9924b8635975dd20b8e79105e3e05748
Author: Alexander Rössler <mail@roessler.systems>
Date:   Sun Mar 19 11:27:43 2017 +0100

    halcmd_main: replace magic numbers

@ArcEye

ArcEye Mar 20, 2017

Re "Found the error point":

That was actually a false point; the main problem is a couple of commits further in, with rtapi_compat.c.

I have reverted the changes and tidied the halcmd_main.c buffer size back to a local figure of 200 (although that was not the problem).

PR en route

@bschousek

bschousek May 23, 2017

Help me with my GitHub understanding: it seems that #1144 should be included in the latest automatic builds, available at deb.machinekit.io. Yet package 0.1.1495389287-1mk.travis.master.git466cbe1f~1jessie doesn't have rapidrate defined, as the pull request should have added.

@ArcEye

ArcEye May 23, 2017

The changes are in the repo
https://github.com/machinekit/machinekit/blob/master/src/machinetalk/proto/src/machinetalk/protobuf/status.proto#L401
https://github.com/machinekit/machinekit/blob/master/src/machinetalk/proto/src/machinetalk/protobuf/types.proto#L594
A package build just builds what is in the repo, so those definitions should be there.

I don't use python or machinetalk, hopefully @machinekoder can assist with why it is not being found.

@bschousek

bschousek May 24, 2017

I can only imagine that there is something wrong with the automated build process. I agree the changes are clearly in the repo, but just as clearly, comparing status_pb2.py from the deb.machinekit.io package with status_pb2.py from my own build shows that the package is missing the rapidrate definition. My source tree comes directly from GitHub.

@ArcEye

ArcEye May 24, 2017

You will need to be precise as to which package you are using and attach whatever program you are using that produces this error.

I have downloaded machinekit_0.1.1495389287-1mk.travis.master.git466cbe1f~1jessie_amd64.deb
and opened it; it does contain definitions of machinetalk.EmcStatusMotion.rapidrate, and its status_pb2.py is byte-identical to the file produced in my RIP build.
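To repeat that check, the package can be unpacked without installing it, using dpkg-deb -x <package>.deb <dir>. The generated file can then be compared byte for byte with the one from a RIP build. The sketch below does the comparison and also looks for the field name in each file. A plain byte search is only a rough check and assumes the generated module contains the field name literally. The paths are placeholders, because the location of status_pb2.py inside the package and the RIP tree is not given here.

```python
import filecmp
from pathlib import Path


def compare_generated(pkg_file: str, rip_file: str, field: bytes = b"rapidrate") -> dict:
    """Compare two generated status_pb2.py files and check each for a field name."""
    a, b = Path(pkg_file), Path(rip_file)
    return {
        # shallow=False forces a content comparison instead of trusting os.stat()
        "identical": filecmp.cmp(a, b, shallow=False),
        "pkg_has_field": field in a.read_bytes(),
        "rip_has_field": field in b.read_bytes(),
    }
```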

@bschousek

bschousek May 24, 2017

Thank you, ArcEye, you helped me find the error. It turns out that I have two copies of status_pb2.py, one in /usr/lib and one in /usr/local/lib. One has rapidrate, the other does not, and obviously the one without must be earlier in the search path. I don't know how I ended up with two copies, but some early fumbling with the Vagrant install is certainly to blame.
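Python imports the first matching copy along sys.path, and every later copy is silently shadowed. On Debian, /usr/local/lib/pythonX.Y/dist-packages is typically listed before /usr/lib/pythonX.Y/dist-packages, which would explain a pip-installed copy winning. The sketch below lists every copy in search order. It assumes the module lives at machinetalk/protobuf/status_pb2.py, matching the source tree layout, and it only looks in plain directories (no zip or egg entries).

```python
import sys
from pathlib import Path
from typing import Iterable, List, Optional


def find_module_copies(relpath: str = "machinetalk/protobuf/status_pb2.py",
                       search_path: Optional[Iterable[str]] = None) -> List[Path]:
    """List every copy of a module file along the import search path, in order.

    The first entry is the one `import` will pick; the rest are shadowed.
    """
    found = []
    for entry in (sys.path if search_path is None else search_path):
        candidate = Path(entry or ".") / relpath  # "" in sys.path means the cwd
        if candidate.is_file():
            found.append(candidate)
    return found
```

To see which copy is actually loaded, run python -c "import machinetalk.protobuf.status_pb2 as m; print(m.__file__)". The module name here is assumed from the source layout.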

bschousek commented May 24, 2017

Thank you ArcEye, you helped me find the error. It turns out that I have two copies of status_pb2.py, one in /usr/lib and one in /usr/local/lib. One has rapidrate and the other does not, and the one without it must be earlier in the search path. I don't know how I ended up with two copies, but some early fumbling with the Vagrant install is certainly to blame.
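The shadowing described above can be reproduced in isolation. This is a self-contained sketch, not Machinekit code: it writes two throwaway stand-in files named status_pb2.py (the HAS_RAPIDRATE flag is a made-up marker), puts the outdated one first on sys.path, and shows that Python imports whichever copy it finds first.

```python
import importlib
import os
import sys
import tempfile


def write_copy(root: str, body: str) -> str:
    """Create a fresh directory under root holding a stand-in status_pb2.py."""
    directory = tempfile.mkdtemp(dir=root)
    with open(os.path.join(directory, "status_pb2.py"), "w") as f:
        f.write(body)
    return directory


def first_on_path_wins() -> tuple:
    """Return (loaded_from_old_copy, has_rapidrate) for two competing copies."""
    with tempfile.TemporaryDirectory() as root:
        # Stand-ins: an outdated copy (like the pip install) and a current one.
        old_dir = write_copy(root, "HAS_RAPIDRATE = False\n")
        new_dir = write_copy(root, "HAS_RAPIDRATE = True\n")
        saved_path = list(sys.path)
        sys.modules.pop("status_pb2", None)
        try:
            # Put the outdated copy first, as happened in the thread.
            sys.path[:0] = [old_dir, new_dir]
            importlib.invalidate_caches()
            mod = importlib.import_module("status_pb2")
            return os.path.dirname(mod.__file__) == old_dir, mod.HAS_RAPIDRATE
        finally:
            # Restore the interpreter state so the demo leaves no trace.
            sys.path[:] = saved_path
            sys.modules.pop("status_pb2", None)


if __name__ == "__main__":
    print(first_on_path_wins())  # (True, False): the old copy shadows the new one
```

On an affected machine, importing the real module and printing its `__file__` shows which copy won. On Debian, /usr/local/lib/... is normally searched before /usr/lib/..., which would explain a pip-installed copy shadowing the packaged one.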

@bschousek

bschousek commented May 24, 2017

The source of the offending out-of-date module is https://pypi.python.org/pypi/machinetalk-protobuf/1.0.6, most likely something I installed via pip from the command line in my Vagrant box when I was learning. The PyPI package appears to derive from https://github.com/machinekit/machinetalk-protobuf.

I opened an issue against machinekit/machinetalk-protobuf at machinekit/machinetalk-protobuf#76

@machinekoder

Member

machinekoder commented Jun 19, 2017

I still have problems with comp and instcomp for the package install, even though gcc-4.7 is enabled by default. The problems disappear when using a RIP install on the same machine.

@ArcEye

ArcEye commented Jun 19, 2017

It all still comes down to this problem: #1060
and related issues regarding the wrong flavor and makefile.inc not being set to build for anything other than posix.

The armhf builds were truncated to fit within the Travis time limit, with the result that the
packages don't build components properly.

This fell off the radar, but the underlying problem was never resolved, because it would require proper armhf builds by a different means, and probably dropping Wheezy completely if something like @zultron's Docker build were adopted.

@ArcEye

ArcEye commented Jul 3, 2017

The comp component build issue is hopefully solved by #1230

@ArcEye

ArcEye commented Jul 31, 2018

Multicore was merged 18 months ago; closing.

@ArcEye ArcEye closed this Jul 31, 2018

@l29ah

l29ah commented Aug 10, 2018

I regularly hit the halcmd show pin foo hanging problem when I poll the temperature on my 3D printer (AM3358); I'm at 3f1e265.

@ArcEye

ArcEye commented Aug 10, 2018

Can you move this to the relevant issue tracker, as per the email on the list: https://groups.google.com/forum/#!topic/machinekit/I70IfT-wan0

The issue tracker will be https://github.com/machinekit/machinekit-hal/issues

You will also need to explain exactly what you are doing and why you think that particular commit causes it.

There is a known problem with repeatedly polling halcmd show pin <foo> to get an output, instead of reading the value programmatically.
#1123 (comment)

I think it may be due to the way memory is aligned on boundaries to enable atomic operations.
This would result in orphaned memory, and if an operation is repeated a huge number of times, you run out of HAL memory.
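The suspected mechanism can be illustrated with a toy model. This is a hypothetical accounting sketch, NOT Machinekit's HAL allocator: it assumes that aligning an allocation inserts padding in front of it and that freeing returns the block's own bytes but never that padding. The pool size, the 13-byte "name", the 64-byte struct and the 8-byte alignment are all made-up numbers chosen only to show the effect.

```python
def align_up(offset: int, align: int) -> int:
    """Round offset up to the next multiple of align (a power of two)."""
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a positive power of two")
    return (offset + align - 1) & ~(align - 1)


class ToyPool:
    """Hypothetical accounting model of a fixed-size pool (NOT HAL's allocator).

    Toy assumption: aligning an allocation may insert padding in front of it;
    freeing returns the block's own bytes but never that padding.
    """

    def __init__(self, size: int) -> None:
        self.available = size  # bytes still usable
        self.cursor = 0        # where the next allocation would start
        self.orphaned = 0      # padding bytes lost so far

    def alloc(self, size: int, align: int = 1) -> int:
        start = align_up(self.cursor, align)
        pad = start - self.cursor
        if size + pad > self.available:
            raise MemoryError("toy pool exhausted")
        self.available -= size + pad
        self.orphaned += pad
        self.cursor = start + size
        return start

    def free(self, size: int) -> None:
        self.available += size  # the padding is deliberately not returned


def run_cycles(pool: ToyPool, name_len: int, struct_size: int, align: int,
               max_cycles: int = 1_000_000) -> int:
    """Count complete load/unload cycles before the toy pool runs out."""
    for done in range(max_cycles):
        try:
            pool.alloc(name_len)            # unaligned data, e.g. a name string
            pool.alloc(struct_size, align)  # data aligned for atomic access
        except MemoryError:
            return done
        pool.free(struct_size)
        pool.free(name_len)
    return max_cycles


if __name__ == "__main__":
    pool = ToyPool(1000)
    print(run_cycles(pool, name_len=13, struct_size=64, align=8), pool.orphaned)
```

In this model each cycle loses 3 bytes of padding, so a 1000-byte pool survives 307 cycles; with `align=1` no padding is ever needed and the pool never runs out. The point is only that a tiny per-cycle loss becomes fatal when repeated often enough, which matches the symptom described, not that HAL works exactly this way.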

@l29ah

l29ah commented Aug 10, 2018

I am at the issue tracker :)
I'm using the included nc_files/M109 to wait for the temperature to settle. As far as I understand, this IS a programmatic way. Sometimes it never returns, and poking at halcmd suggests the problem outlined in #1123 (comment).
As far as I understood, this issue is considered fixed, so I'm writing to say it is not.
The commit hash is the point in history where I observe the behaviour; I'm not saying that commit is the cause.

@luminize

Member

luminize commented Aug 10, 2018

@l29ah I think this is a BeagleBone issue. Can you ask on the machinekit Google group list (after searching that list first)? That might raise your chances of a satisfactory answer.

@ArcEye

ArcEye commented Aug 10, 2018

You are at a CLOSED general issue tracker that mentions the problem amongst many others.

It will remain closed, so to air this issue you need to open a new one.

I'm using the included nc_files/M109 to wait for the temperature to settle. As far i understand, this IS a programmatic way.

There is nothing programmatic about using a bash script to repeatedly call halcmd and then try to parse the output.

I was referring to finding the hal_pin_t struct for the pin name in question and reading its _data_ptr_addr for the value.
This doesn't have the side effects of repeatedly loading the whole halcmd component.

A comment that GitHub decided to hide for some reason showed that using halpr_find_pin_by_name(), which does what I described above, ran a huge number of times without issue (10,000 times, equivalent to running M109 for 2.777 hours continuously).
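The 2.777-hour figure is consistent with M109 polling once per second; that interval is an assumption, since the thread does not state it:

```latex
10\,000\ \text{polls} \times 1\ \tfrac{\text{s}}{\text{poll}} = 10\,000\ \text{s} = \frac{10\,000}{3600}\ \text{h} \approx 2.78\ \text{h}
```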

I just ran a program, testmem, via halcmd loadusr -W testmem 10000
that does 10,000 iterations of halpr_find_pin_by_name(), gets the value and prints the result.
It ran to the end without any issues.

The issue was never 'solved'; it just appeared to be an extreme use of halcmd that seemed unlikely to be encountered.
I was not aware there was an M-code script that did exactly the same thing, but then I am not into plastic squirting 😜

I will move this into a separate issue in the new repo and see if I can find time to write a user component
which takes a pin name and value and outputs TRUE when the value is met or exceeded, or similar.

I will have to look at how this is used, though. I imagine the call to M109 is blocking and only returns when the bed is up to temperature, thus pausing the G-code.
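The proposed component boils down to two ideas: resolve the pin name once, then poll its value until it meets or exceeds a target, blocking the caller meanwhile. The following is a minimal logic sketch, not the actual component. `PinTable` and `wait_until_at_least` are hypothetical names, and the dictionary stands in for HAL shared memory. A real component would read the pin through the HAL API instead.

```python
import time
from typing import Callable, Dict


class PinTable:
    """Hypothetical stand-in for HAL's pin table; NOT the real HAL API."""

    def __init__(self) -> None:
        self._values: Dict[str, float] = {}
        self.lookups = 0  # how many times a name was resolved

    def set(self, name: str, value: float) -> None:
        self._values[name] = value

    def find(self, name: str) -> Callable[[], float]:
        """Resolve a pin name once and return a cheap reader for its value."""
        self.lookups += 1
        if name not in self._values:
            raise KeyError(f"no such pin: {name}")
        return lambda: self._values[name]


def wait_until_at_least(read: Callable[[], float], target: float,
                        timeout_s: float, poll_s: float = 1.0,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Block until read() >= target; return False if timeout_s elapses first."""
    deadline = clock() + timeout_s
    while True:
        if read() >= target:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A caller would resolve once, e.g. `read = table.find("Therm.Temp0")`, and then call `wait_until_at_least(read, target, timeout_s=...)`. No process is started per poll, which is the difference from repeatedly running halcmd. Injecting `clock` and `sleep` keeps the loop testable, and the timeout prevents the "never returns" case from hanging forever.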

@ArcEye

ArcEye commented Aug 10, 2018

@l29ah i think this is a beaglebone issue. Can you ask on the machinekit Google group list (after searching that list first)? That might raise your chance on a satisfactory answer.

The issue will occur on any computer; doubtless it appears a lot sooner on a BBB, with its limited resources and processing power.

@ArcEye

ArcEye commented Aug 10, 2018

Transferred to machinekit/machinekit-hal#142

Please do not use this Issue any further

ArcEye added a commit to ArcEye/machinekit that referenced this issue Aug 11, 2018

Replace M109 bash script with C component
The current M109 uses a very inefficient method.
It calls `halcmd show pin Therm.Temp0` and parses the output to check whether the temperature is up to the required value.

This has an unwanted side effect with the new memory allocation routines brought in
to align memory for use in atomic operations.

Repeatedly loading and unloading the halcmd component just to read a pin value
seems to orphan small amounts of memory before the memory boundary.
Eventually the HAL memory pool is unable to supply the required memory and crashes.

The new component reads the pin value directly through the HAL API.

Issue #1123 and machinekit-hal/issues/#142 refer

Signed-off-by: Mick <arceye@mgware.co.uk>

ArcEye added a commit to ArcEye/machinekit that referenced this issue Aug 12, 2018
