Skip to content

Conversation

@zultron
Copy link
Contributor

@zultron zultron commented Jul 11, 2019

This PR is a port of machinekit/machinekit#1462 to the machinekit-hal repo. There's lengthy discussion there about this work. In summary, this work merges separate flavor builds into one, solving long-standing problems from the multiple-flavor "unified build" and greatly simplifying the code, the build system and the packaging.

Changes in this PR:

  • Separate (but identical) modules in rtlib/<flavor>/*.so are built once into rtlib/modules/*.so
    • These were actually binary-identical, and were needlessly separated into flavor-specific directories
  • Separate rtapi_app_<flavor> binaries are merged into a single rtapi_app
    • The rtapi_switch is removed and changed back to something like it was before the "unified build"
    • A new "flavor" API builds off the former flavor "hooks" mechanism, and runs all flavor-specific code and data through a struct that all flavors implement, similar to the former rtapi_switch
    • The rtapi_app now picks a flavor internally (using the same routines as the old flavor utility)
    • Flavor-specific code had been selected by #ifdef blocks is now selected at run-time, either moved into functions behind the flavor API, or by checking generic characteristics of the running flavor (e.g. RT or non-RT) exposed through the flavor API
      • The flavor rtapi_app picks is now mostly irrelevant outside of that executable; e.g. the realtime script doesn't need to know about flavors, and code outside the src/rtapi/ directory only occasionally queries those generic flavor characteristics

This work expects to convey at least the following benefits:

  • Finally resolve the problems with building out-of-tree modules
    • The comp/instcomp tools don't need to worry about what flavors to build or where to install the modules
  • Reduce build system complexity
    • The autoconf script doesn't have to template so many config files and scripts to account for flavors being built
    • Likewise, the Makefile and Submakefiles don't have to deal with flavors
    • Kernel thread support has been removed for a long time, but this PR cleans out hundreds of lines of build system code that managed the build system for that
    • Modules are now only built once; recursive make calls for thread flavor builds are now gone
  • Reduce packaging complexity
    • Although the flavor-specific e.g. machinekit-hal-rt-preempt packages are preserved, they are simply optional metapackages that pull in the needed kernel, and the base machinekit-hal package is sufficient
    • The machinekit-hal-dev package is now revived

This PR also ULAPI (non-RT) threads a separate, co-equal thread system, and removes flavor-specific configuration from the global data segment. Before, ULAPI threads used flavor-specific thread facilities (e.g. clock routines), and queried the global data segment for flavor information. That was unnecessary, though, since ULAPI by definition doesn't require RT performance. Flavor-specific code was changed to use POSIX thread facilities instead, and flavor information was removed from the global data segment. This further simplifies the code and further makes the thread flavor transparent outside of rtapi_app.

luminize and others added 30 commits November 10, 2018 14:22
hal/hostmot2: make watchdog timeout_ns a pin
scripts/linuxcnc.in: more meaningful instructions
6ca7c99806 Merge pull request machinekit#86 from machinekoder/fix-security
2c4db3edfd update npm package to fix security vulnerability
6c52a98b0a enable support for Python 3.6/3.7

git-subtree-dir: src/machinetalk/proto
git-subtree-split: 6ca7c99806401179ece164b07dc87852bfa8df9c
…el-threads

Revert "Remove hard kernel threads references"
machinekit/launcher: make sure zombies are excluded from running process
lib/python/launcher: fix  problems with superfluous whitespace
Add missing index handling to the pru encoder
zultron added 2 commits July 12, 2019 16:17
In previous tests, I somehow missed that when the Xenomai libs are
linked to an executable (or a `dlopen()`ed plugin), if they fail to
detect Xenomai on the running system, they will exit the application
with this message:

    Xenomai: /dev/rtheap is missing
    (chardev, major=10 minor=254)

This seems unavoidable, because this happens in `ld-linux.so` before a
single line of application code is executed, and the [Xenomai code][1]
appears not to provide a way to turn the check off.

This patch works around the problem by splitting `xenomai.c` into two.

`xenomai.c`:
- Still linked to `rtapi.so`
- Retains the flavor descriptor struct
- Retains the flavor checks and module init and exit hooks
- Module init now:
  - Calls `dlopen("xenomai_lodaer.so")`,
  - Extracts the `xenomai_descriptor_updater()` function, and
  - Calls that, passing the flavor descriptor to be updated

`xenomai_loader.c`:
- Linked into separate module with Xenomai libraries
- Contains all functions that depend on Xenomai libs
- Contains a `xenomai_descriptor_updater()` function that plugs those
  functions into the flavor descriptor

In this way, the Xenomai libraries are only loaded when the Xenomai
flavor is picked, and don't exit `rtapi_app` on non-Xenomai systems.

[1]:  https://gitlab.denx.de/Xenomai/xenomai/blob/eol/v2.6.x/src/skins/common/bind.c#L67-72
@zultron zultron force-pushed the mk-hal-single-module-dir-zultron-pr branch from 9aac00c to 1b768b7 Compare July 12, 2019 21:23
@cerna
Copy link
Contributor

cerna commented Jul 12, 2019

@zultron, I have been looking into your changes in 8653ff2 and trying to understand it. I have been under impression, (based little a bit on information #201 (comment) which I probably misinterpreted) that you will install one machinekit-hal package which will be the Core and then machinekit-hal-_flavor_ which will bring in the _flavor_.so API hook implementations, so you could start with Preempt_rt, then just install Xenomai kernel patch, libXenomai and machinekit-hal-xenomai_X_ and without reinstall suddently use Xenomai (As the kernel selection will identify impossibility to run Preempt_RT, but possibility to run Xenomai).

However, I must have been mistaken, because this looks like the same functionality just in not very portable package. (Portable means no reinstall or update of the main machinekit-hal package would be needed when new flavor will be added/developed, it will just try all accessible .so. And yes, I though the solution was not developed yet.)

Does that mean that I will have to implement this convoluted solution for all my flavored endeavors?

@zultron
Copy link
Contributor Author

zultron commented Jul 13, 2019

@zultron, I have been looking into your changes in 8653ff2 and trying to understand it. I have been under impression, (based little a bit on information #201 (comment) which I probably misinterpreted) that you will install one machinekit-hal package which will be the Core and then machinekit-hal-_flavor_ which will bring in the _flavor_.so API hook implementations, so you could start with Preempt_rt, then just install Xenomai kernel patch, libXenomai and machinekit-hal-xenomai_X_ and without reinstall suddently use Xenomai (As the kernel selection will identify impossibility to run Preempt_RT, but possibility to run Xenomai).

However, I must have been mistaken, because this looks like the same functionality just in not very portable package.

Does that mean that I will have to implement this convoluted solution for all my flavored endeavors?

The one machinekit-hal package will be the Core, but there aren't any required e.g. machinekit-hal-xenomai flavor packages (these packages do still exist as metapackage stubs so that e.g. the machinekit-hal-rt-preempt package can pull in the RT_PREEMPT kernel).

This is how it is right now, and it is a somewhat convoluted solution, although I don't think it's too bad. In the packaging, it's a trivial change to put the Xenomai plugin into the machinekit-hal-xenomai package if it turns out to be undesirable to have the Xenomai libs installed. The C boilerplate (e.g. load_xenomai_plugin() and xenomai_descriptor_updater()) can be pulled out into rtapi_flavor.c and rtapi_flavor_loader.c (no such thing right now). The Submakefile boilerplate can also be templated. After that it's just a matter of splitting your rtapi/flavor/new_flavor.c file into two: one file with the can_run_flavor() check (linked to rtapi.so) and one with the other routines in it (linked to rtapi_flavor_loader.c and the thread system support libs). This is actually much, much easier than adding new flavors right now in the current master branch.

Don't forget, this is still WIP, and you're welcome to jump in and improve things!

@zultron
Copy link
Contributor Author

zultron commented Jul 13, 2019

Update: I've solved all problems with amd64 tests except for one regression in Jessie only. Right now, hm2-idrom test fails because realtime stop; realtime start too fast causes conflicts. When realtime stop finishes, the msgd:0 and rtapi:0 processes and any /dev/shm/linuxcnc* segments should should be completely cleared up, but that's not happening. I should be able to solve that tomorrow.

Next step after that will be to test on armhf architectures and to test Xenomai. The above Xenomai workaround does build, but is absolutely untested at this point.

@zultron
Copy link
Contributor Author

zultron commented Jul 13, 2019

BTW, I believe that regression is actually present in the current master branch. We never see it because it only happens in Jessie amd64 (and possibly i386), but not armhf, Stretch or Buster, so it neatly side-steps the Jenkins tests.

When it does show up, it shows up nondeterministically, starting in some random tests; once it hits, every following test will fail.

In Stretch and Buster, output formats may be combined with commas:

    $ ps -p 1 -o pid=,s=
        1 S

On Jessie, this does not work:

    $ ps -p 1 -o pid=,s=
      ,s=
        1

But separating them as separate options does:

    $ ps -p 1 -o pid= -o s=
        1 S

This patch fixes e.g. `rtapi:0` exit detection in the `realtime`
script.
@zultron
Copy link
Contributor Author

zultron commented Jul 13, 2019

Simple matter of the behavior of ps -o changing after Jessie, fixed in e898d13, also submitted separately in #228. Tests now pass for amd64_8 tag as well.

@cerna
Copy link
Contributor

cerna commented Jul 14, 2019

Don't forget, this is still WIP, and you're welcome to jump in and improve things!

OK, what about compiling the [.c/.h]/more flavor API hooks into .so of name {number}-flavor-{name}.so, which will be put into some specific Machinekit directory. In rtapi_flavor.c delete any hard-coded info about flavors (generally speaking, purge them from everywhere, some tests included), and do dlopen from folder of .so with specific names ordered by number like udev rules or if user specifically want some flavor until that flavor is found.

Packaging will be machinekit-hal + machinekit-hal-_flavor_, for example machinekit-hal + machinekit-hal-preempt-rt. That way, Machinekit-HAL will not care about flavors, you can add any flavor after Machinekit-HAL is already compiled, and you cannot install machinekit-hal-xenomai without libxenomai so the problem should be moot.

Can this work?

If so I can do it. (After beating the buildsystem into submission.)

@zultron
Copy link
Contributor Author

zultron commented Jul 14, 2019

OK, what about compiling the [.c/.h]/more flavor API hooks into .so of name {number}-flavor-{name}.so, which will be put into some specific Machinekit directory. In rtapi_flavor.c delete any hard-coded info about flavors (generally speaking, purge them from everywhere, some tests included), and do dlopen from folder of .so with specific names ordered by number like udev rules or if user specifically want some flavor until that flavor is found.

Packaging will be machinekit-hal + machinekit-hal-_flavor_, for example machinekit-hal + machinekit-hal-preempt-rt. That way, Machinekit-HAL will not care about flavors, you can add any flavor after Machinekit-HAL is already compiled, and you cannot install machinekit-hal-xenomai without libxenomai so the problem should be moot.

I like this kind of idea. Isolating anything flavor-specific into separate files was one of my goals for this project, of course. I considered going that last mile to remove that small mess from rtapi_flavor.c, but at the time (still, really) I was still just trying to achieve a POC, and the mess didn't seem too horrible. I hadn't thought to make flavors pluggable, though, very nifty.

Can this work?

That could work, but of course you'll have to build two modules for each flavor: One module that does the flavor check always gets loaded; the other module that does the real work only gets loaded after that flavor is selected.

You could also teach the autoconf system to look in e.g. m4/flavor/*.m4 so that adding a new flavor in-source would be a matter of four files.

If so I can do it. (After beating the buildsystem into submission.)

Go for it, sounds like fun. For me, I'm focusing on finishing basic functionality first, since my time is limited. I'll start testing ARM and Xenomai soon (waiting on a replacement for the USB UART adapter I left back in China). In the meantime, I'm working on getting LCNC to build against MK HAL, starting with some work done by @ArcEye and of course more beating on the build system.

@cerna
Copy link
Contributor

cerna commented Jul 14, 2019

That could work, but of course you'll have to build two modules for each flavor: One module that does the flavor check always gets loaded; the other module that does the real work only gets loaded after that flavor is selected.

Wait, why? Constructor function query the system if running threads of type is possible, i.e. calling the {flavor}_can_run_flavor local to solib at dlopen and if so then register the global flavor pointer. Check if the pointer is registered and if so, be done, if not, dlclose and dlopen another candidate. Given that there is only one plugin registered at any given time when running the rtapi_app, it should work. (True, I did not consider the possibility of two rtapi_app, each having it's own flavor, like posix and Xenomai or posix and Preempt_rt, because meh, it's highly irregular and better served with SCHED_WEAK rt-like thread in Xenomai or Preempt_rt flavor. [EVL Core has it, Xenomai has it, Preempt_rt by definition has it and what will come we will see.]) Or is there any reason why this should not work?

BTW, is there any reason why posix non-realtime flavor should always run? (I have seen it in some comment.) For me it is potential security risk, if it is selected by some reason over hard RT flavor (in my use case if someone deleted the .solib) and the user does not notice.

You could also teach the autoconf system to look in e.g. m4/flavor/*.m4 so that adding a new flavor in-source would be a matter of four files.

I could, but - don't laugh - I have never written a line of .m4 code.

EDIT:
But then I would have to change all functions of type:

int flavor_module_init_hook(flavor_descriptor_ptr f)
{
    SET_FLAVOR_DESCRIPTOR_DEFAULT();
    if (f->module_init_hook)
        return f->module_init_hook();
    else
        return 0;
}

Because on any given time there would be only one flavor_descriptor_ptr so that signature would not make sense. That's pretty big change, how are you feeling about it?

@zultron
Copy link
Contributor Author

zultron commented Jul 15, 2019

[...] Constructor function query the system if running threads of type is possible, i.e. calling the {flavor}_can_run_flavor local to solib at dlopen and if so then register the global flavor pointer. Check if the pointer is registered and if so, be done, if not, dlclose and dlopen another candidate.

The problem is as soon as you dlopen() a .so file that has libxenomai.so.0 linked to it, Xenomai's own init code runs, and if it doesn't find /dev/rtheap, it calls exit(). See the original problem description above. That's the reason I split up the xenomai.c source file.

BTW, is there any reason why posix non-realtime flavor should always run? (I have seen it in some comment.) For me it is potential security risk, if it is selected by some reason over hard RT flavor (in my use case if someone deleted the .solib) and the user does not notice.

POSIX threads should always be able to run. I've also often said that non-RT threads shouldn't be allowed to run without the user explicitly requesting them, but then I never did anything about it. (That actually happened once, but we actually didn't notice for quite a while because we were getting ~7us max jitter on an isolated J1900 core!)

But then I would have to change all functions of type: [...]

Is this still true if we have to split up the modules? I'll have to think about this after some sleep.

@cerna
Copy link
Contributor

cerna commented Jul 15, 2019

The problem is as soon as you dlopen() a .so file that has libxenomai.so.0 linked to it, Xenomai's own init code runs, and if it doesn't find /dev/rtheap, it calls exit(). See the original problem description above. That's the reason I split up the xenomai.c source file.

Didn't realize that the startup sequence I talked about in machinekit/machinekit#1462 would not work in Xenomai 2.6 because of no provisions from the development team. Theoretically it could be still possible to dlopen the Xenomai library only if the flavor finds working system, but then I would need to reference all used functions.

Point is to create encapsulated flavor module. (So Machinekit-HAL only cares/knows about one solib. Calls dlopen on it and if after return there is nothing registered, that's it.) If I have to use two solibs for one flavor, I will grouch, but it will be OK.

POSIX threads should always be able to run.

Yeah, well, English lesson received.

I've also often said that non-RT threads shouldn't be allowed to run without the user explicitly requesting them, but then I never did anything about it.

This could work that way. If the user do not install the flavor solib, which he will have to do explicitly, into the specified folder, then there will be no chance of loading it.

@cerna
Copy link
Contributor

cerna commented Feb 19, 2020

I think that this pull request spent enough time on the ice and that it is time to pull it into the fold. There is starting of the materialization of rot and regression given that the Machinekit-HAL master is 44 commits ahead. So to prevent further branch splitting (for lack of better terminus technicus) I propose merging the conflict and then merging the pull into Machinekit proper.

Discussing this with @luminize, we came upon the idea that the fact that no new .debs are currently pushed to main Machinekit repository actually may come handy as there may be enough time to solve any issues. (Not that I come upon any in my testing.)

(I am somehow biased about it as my work is building atop it.)

@cerna cerna changed the title DO NOT MERGE YET - Merge flavor builds: modules, rtapi_app, ulapi Merge flavor builds: modules, rtapi_app, ulapi Feb 19, 2020
@zultron
Copy link
Contributor Author

zultron commented Feb 21, 2020

I've had to let this one drop because of time constraints. If somebody wants to take it over, I would be very happy to have multiple thread flavors simultaneously installed in the same OS image again, a feature needed for portable Docker images that I use but currently must hack around packaging file conflicts introduced at some point.

I forgot exactly what is in this branch. The scope of the original PR (achieved) was merging modules into a single directory and the bonus goal of loading flavors as modules into a single, unified rtapi_app (also achieved). The main outstanding problem I remember was Xenomai threads didn't work on BeagleBone in my very brief tests. If this branch is still at that point, I bet it would be an easy job for @cerna and others to fix that problem and test.

I don't remember if this branch contains a big stack of commits that restore LCNC EMC API compatibility by reverting a number of minor API changes made since the MK fork. If those commits are in this branch, they would break the MK EMC application build, and would need to be removed.

Single module merge

Merging @cerna's conflict resolutions for changes in upstream master since 2019-07 or so
@cerna cerna merged commit 0e2d6d4 into machinekit:master Feb 29, 2020
@zultron zultron deleted the mk-hal-single-module-dir-zultron-pr branch September 6, 2020 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.