
Fix #2778: @link doesn't play nice with glue code #3389

Merged
3 commits merged on Oct 9, 2023

Conversation

ekrich
Member

@ekrich ekrich commented Jul 14, 2023

Edit: This latest version handles all the special setup used in nativelib and handles conditional compilation of libz in the javalib. The code is now simpler as well.

This allows for conditionally compiling glue code by using a deployment descriptor like the one used to determine which library is nativelib. Using this method removes the ugly code used for filtering the optional zlib as well.

The scala-native.properties "Descriptor" adds an entry nir.link.names = z, which can also be a comma-delimited list of libs.
Once this is done, you need to allow for conditional compilation in your source files as follows, where the name is
LINK_LIB_ plus the lib name uppercased:

#ifdef LINK_LIB_Z
#include <zlib.h>
int scalanative_z_no_flush() { return Z_NO_FLUSH; }
// ...
#endif

If you have source files that are shared you can use #if defined(LINK_LIB_A) && defined(LINK_LIB_B) and all the other combinations of boolean logic to control compilation.

This cleans up the code considerably and also allows us to move the optional z.c file to the javalib. We no longer need to stringify our paths just to convert them back after "filtering". Some of the names of routines may need refinement now.

We have the potential here to allow libraries the ability to supply their own compile args. For example, clib should use C99 or some standard to compile, and posixlib could perhaps use a POSIX conformance version. Currently we use gnu11 for all C code. Our static/dynamic library feature that needs linking could be done on the target platform instead of having to set up a special project or the author having to compile for each platform. These are just a few of the things we could do, but then the "Descriptor" risks starting to look like a declarative build system.

@LeeTibbert
Contributor

Eric,

Nice to see you making progress on this. The concept looks good.

Seems like there would be less chance of a name collision if the pre-processor macro ID prefix LINK_
were something like SCALANATIVE_LINK_, SCALAN_LINK_, or SN_LINK_.

I believe that recent C standards say that the first 63 characters of a macro ID are taken as unique.
The first alternative would take 17 characters, the third would take only 8, three more than
you are currently using.

In another world, the PREFIX could be specified as an argument to @LINK. An interesting
design concept and worth ruling out: I think it would be gilding the lily and stressing
limited implementation cycles.

@ekrich
Member Author

ekrich commented Jul 17, 2023

We can change the prefix as needed, for sure. It's unlikely to get a conflict, but a longer prefix would probably be better, although a little less pretty in the code. Maybe SCALANATIVE_LINK_ to match the GC #ifdefs.

I had a discussion with Arman on Discord so I can summarize my thoughts.

  1. Even if two Scala Native libraries/artifacts use the same C lib to link, it wouldn't be too bad, except that some code could be compiled that isn't used - it shouldn't cause any errors like we have now.
  2. If we got the package and link info from the NIR and then somehow made the define match that, perhaps it could be package specific and avoid compiling for that dependency if not used. In reality, the code is compiled but essentially empty via the #ifdef.

Note: somehow immix-commix/GCRoots.c has no conditional guards. Not sure why not or how that actually works.

@LeeTibbert
Contributor

It dawned on me in the wee hours that I missed the obvious prefix: SNL_ (ScalaNativeLink), drum roll, tada!

Contributor

@WojciechMazur WojciechMazur left a comment

The idea seems interesting, thank you for making this draft.
It makes sense that the special handling would be enabled on demand when properties are defined. So for most glue code we would compile everything without any changes to the projects.

tools/src/main/scala/scala/scalanative/build/Filter.scala
@ekrich
Member Author

ekrich commented Jul 18, 2023

Notes:

# could use target triple in keys to have options for different platforms and architectures
# https://clang.llvm.org/docs/CrossCompilation.html#general-cross-compilation-options-in-clang
# triple specification <arch><sub>-<vendor>-<sys>-<env> Use * for any, can drop env

# from the pom specification
project.groupId = org.scala-native
project.artifactId = javalib

# configuration for glue code
# macro define is SCALANATIVE_LINK_Z if library @link("z") annotation is used (found in NIR)
# comma delimited libraries used
nir.link.names = z

@ekrich ekrich marked this pull request as ready for review July 18, 2023 22:50
@ekrich
Member Author

ekrich commented Jul 18, 2023

If we decide we want to go with this, we will need some documentation, perhaps experimental as it may be subject to change.

@WojciechMazur
Contributor

If we decide we want to go with this, we will need some documentation, perhaps experimental as it may be subject to change

I think we can go with this feature for now. Can you please add mentioned documentation?

@armanbilge
Member

Sorry to be commenting a bit late. Firstly, thanks for tackling the issue I reported, I really appreciate the effort and ingenuity here 👏 I want to take this opportunity to share a couple high-level thoughts.

Converting link-time reachability to C pre-processor definitions is very, very smart. I think this is an excellent idea and it makes a lot of sense. Depending on what Scala code is being linked or not, we can conditionally include/exclude C code. 💯 this is exactly what we want.

I also have a controversial opinion: I believe that the Scala Native toolchain should get out of the business of trying to add -l flags. I know this is supposed to be a convenience, but it has some caveats and I am seriously considering removing all instances of @link from the libraries I maintain.

Here are two pain points that I have encountered with it.

  1. If you have both dynamic and static versions of a library installed on your system and you use the -l flag, then clang will always link against the dynamic one. So if the Scala Native toolchain is adding the -l flag for you, then the only way to build a static binary is to change your build environment, which is extremely annoying.

  2. Sometimes there are multiple libraries that implement the same binary API. A great example is OpenSSL vs LibreSSL vs BoringSSL vs Apple CommonCrypto Framework. If the Scala Native toolchain is adding the -l flag for you then it is forcing a particular library and removing the user choice.

Essentially, the user should be in charge of these important decisions. When we use @link to add -l flags, we take away the flexibility for the user to do static linking or use an alternative but equivalent library.

Furthermore, I feel like we keep circling around this issue of creating some kind of build tool in the Scala Native toolchain, e.g. one that can dynamically change the flags based on the target triple. The thing is, in the Scala ecosystem we already have several build tools! It's already an option today for a developer to create an sbt/mill plugin that has this sort of complex logic for detecting the build environment and adding the appropriate flags. Instead of trying to design another build system, I think we should take advantage of the ones we already have.
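
As a sketch of what that plugin-side logic could look like, here is a hypothetical `build.sbt` fragment; `nativeConfig` and `withLinkingOptions` are standard sbt-scala-native/`NativeConfig` API, while the platform check and the flags themselves are purely illustrative:

```scala
// build.sbt -- hypothetical example of the build (not the toolchain)
// choosing link flags based on the environment
import scala.scalanative.build._

nativeConfig ~= { config =>
  val isMac = System.getProperty("os.name").toLowerCase.contains("mac")
  val cryptoFlags =
    if (isMac) Seq("-framework", "Security") // e.g. use an Apple framework
    else Seq("-lcrypto")                     // e.g. OpenSSL/LibreSSL via -l
  config.withLinkingOptions(config.linkingOptions ++ cryptoFlags)
}
```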

@ekrich
Member Author

ekrich commented Jul 21, 2023

Converting link-time reachability to C pre-processor definitions is very, very smart. I think this is an excellent idea and it makes a lot of sense. Depending on what Scala code is being linked or not, we can conditionally include/exclude C code. 💯 this is exactly what we want.

Thanks.

I also have a controversial opinion: I believe that the Scala Native toolchain should get out of the business of trying to add -l flags. I know this is supposed to be a convenience, but it has some caveats and I am seriously considering removing all instances of @link from the libraries I maintain.

Link flags are something the end developer can manage but we should probably have the ability to not add them automatically. Auto is super nice for the default case - the end user should just be able to install a C library and go - and not have to wrangle to get things to work.

Compile flags should probably be controlled to some extent from the project (downloaded artifact) we are compiling. So far we can use the Descriptor for that part, but it could almost become a language, especially if you have to work with different options for different triples. After the project is published we just get the code, so we need something to tell us what to do; I don't see much choice here.

Furthermore, I feel like we keep circling around this issue of creating some kind of build tool in the Scala Native toolchain, e.g. one that can dynamically change the flags based on the target triple. The thing is, in the Scala ecosystem we already have several build tools! It's already an option today for a developer to create an sbt/mill plugin that has this sort of complex logic for detecting the build environment and adding the appropriate flags. Instead of trying to design another build system, I think we should take advantage of the ones we already have.

I kept circling around this issue for a long time too trying to find something appropriate. I think there are competing interests so we are trying to thread that needle.

We are trying to make everything downstream from the compiler plugin native (the tools native initiative). The native idea might allow us access to native LLVM like in bindgen, so we could access simple macros and generate interop code on the fly for the target triple of interest. As far as I know, Swift does this. We could call make or CMake etc., but this may be too high a burden. We have worked really hard to decouple the toolchain from the build system to make integration with different build systems really easy.

Comment on lines 461 to 471
# configuration for glue code
# defines SCALANATIVE_LINK_Z if @link("z") annotation is used (found in NIR)
# libraries used, comma delimited
nir.link.names = z

3. Now in your native "glue" code add the following. The macro is named
``SCALANATIVE_LINK_`` plus the uppercased name of the library.

.. code-block:: c

#ifdef SCALANATIVE_LINK_Z
Member

I think we should decouple this feature from the @link annotation. As I mentioned, it's a really smart idea and I believe it can be useful in situations not involving linking to libraries.

For example, consider some bindings to a system/OS API that require some glue code. Maybe these APIs are only available in newer systems or otherwise optional. Or perhaps it's a non-trivial amount of glue code. io_uring is a great example of this: it's only available in newer Linux kernels and requires substantial glue (see armanbilge/fs2-io_uring#62).

This is a case where (1) conditional compilation based on linktime reachability is valuable but (2) we are actually not linking to a particular library with an -l flag.

So perhaps it would be interesting to introduce a new annotation e.g. @define("USER_CHOOSES_MACRO_NAME"). If the target of that annotation is linktime reachable, then the macro is defined.

(I'm having a deja vu that maybe a similar idea like this was previously discussed 🤔 )
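
For concreteness, usage of such an annotation might look roughly like this; the `define` class below is a stand-in written just for this sketch (the real mechanism would live in the toolchain), and the macro and binding names are made up:

```scala
import scala.annotation.StaticAnnotation

// Stand-in for the proposed annotation: in the toolchain, the named macro
// would be passed to the C compiler (-DSCALANATIVE_HAS_IO_URING) only when
// the annotated bindings are linktime-reachable.
class define(macroName: String) extends StaticAnnotation

@define("SCALANATIVE_HAS_IO_URING")
object uring {
  // extern bindings that need the io_uring glue code would go here;
  // placeholder body so the sketch compiles on its own
  def queueInit(entries: Int): Int = if (entries > 0) 0 else -1
}
```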

Member Author

You bring up some good points but it is hard to know all the ins and outs without a trial. This does put a small burden on the library developer but could really help the users as you adamantly pointed out in the original issue.

  1. It should fix most of the cases you brought up.
  2. It democratizes the feature so the internal build can use it, plus users.
  3. It allows us to move the optional z lib to the javalib where it belongs. See number 2.
  4. It simplifies the build by removing a bunch of filtering code.
  5. Downside: the optional z code does not get compiled / executed unless a scripted or other test checks it out. I hate to add one but maybe we need one?

I think we should decouple this feature from the @link annotation. As I mentioned, it's a really smart idea and I believe it can be useful in situations not involving linking to libraries.

Well, this is how it works now: if the link is in the NIR, we can add the -D so the code compiles, and the -lxx gets added to the link phase. This is actually a pretty cool side effect or feature of dead code elimination with NIR optimization.

I am not sure if you can just omit the @link but if you can then this feature would not help or affect the code at all. We can't look at the link flags because the users can put anything there.

For example, consider some bindings to a system/OS API that requires some glue code. Maybe these APIs are only available in newer systems or otherwise optional. Or perhaps it's a non-trivial amount of glue code. io_uring is a great example of this: it's only available in newer Linux kernels and requires substantial glue (see armanbilge/fs2-io_uring#62).

I think there are kernel versions and such that you can use to conditionally compile. Maybe?
https://stackoverflow.com/questions/56485820/conditional-compilation-based-on-functionality-in-linux-kernel-headers

I am not sure what other features could help in this area. I would be glad to look at a specific issue with you and see if we could do more. The tricky thing is deciding how to give flexibility without too much complexity, and to allow the simple use cases too.

I couldn't think of a general way to handle the GC code, or I would have done that too :-)

Member

Yes, I am saying we should make it possible to add the -D without the -l. There's no reason that adding macro definitions needs to be coupled to library linking. It's an independent and useful feature in its own right, which is exactly why I like it so much.

Member Author

Sure, we could add all sorts of things to the descriptor I think without adding too much complexity.

Member Author

I was thinking about this, and I think you could leave out the @link annotation in your code and add the conditional compilation. Then have the users add the -DSCALANATIVE_LINK_<LIB> flag and add the -l<lib> option if they decide to use the feature. I haven't tried this, but you could try it out even without this feature.
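
In sbt terms, that manual opt-in could be expressed with the existing `NativeConfig` API; this is an untested sketch using the real `withCompileOptions`/`withLinkingOptions` methods, with zlib as the example library:

```scala
// build.sbt -- the user manually enables both the glue code define and the
// link flag, since the library no longer carries a @link annotation
import scala.scalanative.build._

nativeConfig ~= { config =>
  config
    .withCompileOptions(config.compileOptions :+ "-DSCALANATIVE_LINK_Z")
    .withLinkingOptions(config.linkingOptions :+ "-lz")
}
```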

@armanbilge
Member

armanbilge commented Jul 22, 2023

we should probably have the ability to not add them automatically. Auto is super nice for the default case - the end user should just be able to install a C library and go

That's a good point, I agree. The pain points I raised could be addressed by adding a configuration to prevent @link from adding flags automatically.

We have worked really hard to decouple the toolchain from the build system to make integration with different build systems really easy.

Yes, and we should keep it that way. But if we keep going down this road we will end up creating a build system inside the Scala Native tools, and we should really try to avoid that 😁

That's why I think that for any advanced manipulation of compiler flags, developers should just create sbt/mill plugins. This allows them to support arbitrarily complex logic. We already have good build tools, we should lean into them. I realize it's a bit unusual to add a library dependency by enabling a plugin in your build, but I don't think we should shy away from this if it best supports Scala Native's unique needs. Native is hard.

@ekrich
Member Author

ekrich commented Jul 25, 2023

@WojciechMazur I decided to add a page just for native code and fixed some of the links in that section.

@LeeTibbert
Contributor

LeeTibbert commented Jul 27, 2023

That's why I think that for any advanced manipulation of compiler flags, developers should just create sbt/mill plugins.

I strongly concur that SN should not be setting up its own complicated (and wrong) build system.

Years ago I was trying to speed up the build by consolidating all the .o files SN creates into one or more
archive files. That would give faster symbol lookup and eliminate unused symbols. I ran aground on the
fact that at that time SN always appended (to the right of main.o) any compiler/linker flags given in the SBT
file. I requested an option to specify options (such as -L) to the left of the main.o. I think that never happened.
I believe that with those two, a lever, and a place to stand, any fancy compile/link could be done in the
user's SBT file. SN would be out of the game. True, the developer of the .sbt file would need to know what
they were doing in their own .sbt file. No spooky action under the covers by SN.

Where is the much un-loved "make" when you need it?

@armanbilge
Member

Btw, it's worth pointing out that sbt and mill plugins for configuring flags already exist. For example:

For both plugins users already have to manually specify which native libraries they are using.

The point I'm trying to make is that @link brings limited value: even though it automates adding the -l flag, it cannot automate the installation or configuration of the library. So the user must still be manually involved anyway, possibly with the help of an sbt/mill plugin that is also fully capable of adding the -l flag.

So, on the one hand, @link doesn't fully solve the problem. On the other hand, it makes it difficult to use static linking or link to a different library with identical ABI.

For me, this is a convincing argument to stop using @link in my libraries. It doesn't fully solve the problem since the user still has to do extra work, and in many cases it gets in the way when the user wants to customize.

@WojciechMazur
Contributor

WojciechMazur commented Aug 4, 2023

What @armanbilge pointed out really makes sense. However, I really wish that we would be able to maintain the easy entry level, and for that the default libs make sense. I think that a good middle ground would be giving a choice to more advanced users. I have one idea that might be tested out:

  • by default @link works as it does right now

  • we add a new entry to NativeConfig, autoLinkNativeLibraries: Boolean = true; when disabled, the user would be in charge of populating the list of required libraries. This way the build tools can populate the required libraries

  • alternative implementation to the previous point is having a libraryLinkMode: LibraryLinkMode = LibraryLinkMode.Auto:

     enum LibraryLinkMode: 
       case Auto   // automatic linking based on @link
       case Manual // discard @link results, manual linking
       case Mapped(libraryMapping: PartialFunction[String, Seq[String]]) // apply mapping for each entry of @link

    Automatic resolution of libs can still make sense in some environments, e.g. if we build in a well-known environment with exactly one known implementation of a given common ABI, or when having the libraries in a common directory passed using the -L flag to the compiler.
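
To make the Mapped case concrete, here is a small runnable sketch of the intended semantics (written with a sealed trait instead of a Scala 3 enum so it stands alone; the function name and flag values are illustrative, not toolchain API):

```scala
sealed trait LibraryLinkMode
object LibraryLinkMode {
  case object Auto   extends LibraryLinkMode // -l flags derived from @link
  case object Manual extends LibraryLinkMode // discard @link, user links manually
  final case class Mapped(remap: PartialFunction[String, Seq[String]])
      extends LibraryLinkMode                // rewrite selected @link entries
}

// How the toolchain might turn @link names into linker flags under each mode.
def linkFlags(links: Seq[String], mode: LibraryLinkMode): Seq[String] =
  mode match {
    case LibraryLinkMode.Auto          => links.map("-l" + _)
    case LibraryLinkMode.Manual        => Seq.empty
    case LibraryLinkMode.Mapped(remap) =>
      links.flatMap(l => if (remap.isDefinedAt(l)) remap(l) else Seq("-l" + l))
  }

// e.g. force a static zlib while leaving other libraries on automatic flags:
// linkFlags(Seq("z", "crypto"), LibraryLinkMode.Mapped { case "z" => Seq("-l:libz.a") })
// yields Seq("-l:libz.a", "-lcrypto")
```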

@ekrich
Member Author

ekrich commented Aug 8, 2023

This at least solves the compile issue (bug) where, if the auto lib is not used but the user puts the dependency in the classpath, the glue code would still be compiled.

I think this will still work even if we go ahead with one of the above suggestions. There may need to be a different mapping depending on what we want to do later. We could look for the included link and/or the ones added, depending on the settings, perhaps as shown above.

Right now the link lib is not associated with the backend lib. It doesn't affect anything here but would mean that if multiple libs link the same native lib then all the glue code would be compiled for all libs that are participating with the conditional compilation using this feature. This PR focuses on the compile phase (which the user can't really control directly in a meaningful way) rather than the link phase (which the user can control).

I would really like to see the ability for the library author to have more control over compile options and we certainly need to open up the link phase for things like static linking for the end user and maybe dependency plugins. So if a plugin would install and compile dependencies then the libs could be passed to the native plugin I suspect.

So far a properties file type approach is ok for the library author (for compile options) but we could decide at some point to use CMake or something else as an advanced option if needed.

@WojciechMazur
Contributor

Even though in general this change is controversial, I'll say yes to the current solution. As pointed out, it would be one less bug in the toolchain.

When it comes to giving access to more advanced settings, for example by usage of a custom CMake, I'll be strongly against it if it would be propagated in the jar and out of the end user's control, as it imposes a severe security concern.
In general the native-sources feature should only be used for compilation of glue code or parts that cannot be expressed in Scala. All the other stuff should be published as a standalone library and should not be a concern for the Scala Native toolchain.

Member

@armanbilge armanbilge left a comment

Even though in general this change is controversial, I'll say yes to the current solution. As pointed out, it would be one less bug in the toolchain.

IMHO this PR is overly complicated to address that issue. I just opened #3427, which extracts the best idea from this PR (automatically adding macro definitions to compile flags) and solves that bug. The implementation is straightforward and it is easy for the user to understand how it works. It doesn't involve any properties files or complex configuration. Finally, it is a feature independent of the @link annotation.

@ekrich
Member Author

ekrich commented Aug 16, 2023

Just to clarify, this setting nir.link.names = z is saying "If I find the link z, create a define SCALANATIVE_LINK_Z", as described in the docs and above. This is the same as, but safer than, a generic unbounded addition of a @define, as it reduces the potential for conflict between native dependencies (ones that have native glue code).
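
The derivation itself is trivial; as a one-line sketch (the helper name is made up, but the resulting flag matches the documented behavior):

```scala
// a @link name like "z" found in NIR becomes a define for the C compiler
def linkDefine(lib: String): String =
  "-DSCALANATIVE_LINK_" + lib.toUpperCase
// linkDefine("z") yields "-DSCALANATIVE_LINK_Z"
```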

IMHO this PR is overly complicated to address that issue.

The PR mentioned is more complicated by lines of code; most of the LOC here are documentation. We do add reading of the descriptor file, as well as simplification. We add one switch to determine conclusively which project has the GC code via the artifact info, which is guaranteed unique via the Maven coordinates; this is the only current use for these settings.

This fixes the issue at hand and only affects the settings for the dependency native code being compiled. It opens up very little for the developer to abuse and can be enhanced to handle the LibraryLinkMode above or other situations we encounter like alternative link targets specific to the project being compiled. It is an implementation detail after the setting by the user.

The descriptor is declarative, only allows what the tools code allows, and is not a build system. It is possible to open more settings and affect the compile, but only for the library/dependency in question and only for the glue code being compiled for each native file. Even at the extreme it is far from a build system: only the ability to set settings for the compile command of the dependency being compiled (that is recognized and allowed if needed), similar to the NativeConfig object, which is best used for global and link settings.

I have been involved in the glue code since the beginning and have been thinking about how to improve the build code and project organization by combining C code and Scala code together as the Scala Native project has progressed. I have also been involved in most of the compile phase improvements of the build, including the build by project and aggregation of files to pass to the link phase, as well as this latest -Ddefine stuff for the GC and this PR. I also believe I have a solution for the compiled-in min and max memory but was prioritizing this first, as the complaints were quite pronounced. I also spearheaded the removal of build logic from the sbt build so that our tools code could be more easily integrated in Mill and scala-cli. I think scala-cli has its own code, but perhaps they can use ours directly in the future.

My overall point is that this work has been done after research, thinking, and listening to all the Scala Native build complaints. Currently, I believe the build descriptor is the best way forward, and it is extensible to other features as currently requested.

@ekrich
Member Author

ekrich commented Aug 17, 2023

FWIW, I was hoping to extend the Descriptor idea to Scala Native library projects, so when the build finds a descriptor for a library it could do the right thing. Not sure if that is possible, but if it were, we could publish them to Maven Central and it would just work.

Contributor

@WojciechMazur WojciechMazur left a comment

In the current form it looks quite simple and opens new possibilities. I think that in the long term we should expand the descriptor to tell us how to compile the native code, e.g. pass extra compilation options which would be fixed and limited to a single jar.
I think we should merge both approaches to handling definitions: this solution would be great for fixed compilation, while the @define approach would allow for dynamic changes based on the linked code and taken execution paths, and more freedom when it comes to specifying definition names. For now they can coexist. Let's observe how they behave outside of the core library.

Member

@armanbilge armanbilge left a comment

Sorry, I am still not so sure about this PR in its current state. It is introducing the concept of Maven Coordinates into the Scala Native toolchain which should be agnostic to the dependency scheme or repository format.

I've repeated my primary concern below, but see also my concerns iterated in #3427 (comment).

docs/user/native.rst
@WojciechMazur WojciechMazur merged commit 6952d99 into scala-native:main Oct 9, 2023
80 of 81 checks passed
@@ -0,0 +1,12 @@
# output for debugging
project.orgainzation = org.scala-native
Contributor

A bit too late, but I've just spotted a typo project.orgainzation -> project.organization

@@ -1,3 +1,5 @@
#ifdef SCALANATIVE_LINK_Z
Contributor

I don't know how I missed this one also, but now we have bindings defined in a different project than their implementation. They should be defined in the same project. I've raised a question about how to deal with them in #3555

Contributor

I'm going to temporarily move them back to nativelib

Member Author

I can do a followup PR. I moved z.c to the javalib to follow the bindings with the Scala code that calls them but missed it on rebase.

Contributor

No need for that, I'll recover it in the rebase of Arman's PR.

@ekrich ekrich deleted the topic/glue-code branch October 9, 2023 16:51