New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: internal_export #7407
base: master
Are you sure you want to change the base?
Feature: internal_export #7407
Conversation
Feature `internal_export` is added as experimental. Simply, feature looks at functions tagged with "internal_export" attribute and adds them to the list of exports (while keepining "internal_export" attribute as it is). This is just to avoid code duplication. This way it will be easier to support internal_exports as a top level beam chunk if some day it gets implemented.
New predefined variable IX added - list of all functiones tagged with both "internali_export" and "export". IX is available in both modes. Two new analysis types are defined: "internal_use" and "illegal_internal_use" - ONLY AVAILABLE IN FUNCTIONS MODE (because they depend on predefined variable E). "internal_use" - all calls to functions from IX "illegal_internal_use" - all internal uses that are not allowed IX is a subset of already existing variable X, contatining only exported functions marked as internal_export. This might change in the future, if internals become top-level chunk, not only annotation for exported functions. This variable is crutial for catching unwanted calls to internal functions.
Most of the work is in single test case, because there is a lot of setup to be done, so I find it convenient. There are some comments for clarification. Transitive and circular dependencies are tested.
Described IX variable and "internal_use" and "illegal_internal_use" analyses. Described the difference between legal and illegal calls to internally exported functions.
CT Test Results 4 files 382 suites 50m 21s ⏱️ For more details on these failures, see this check. Results for commit 87f3a83. ♻️ This comment has been updated with latest results. To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally. Artifacts// Erlang/OTP Github Action Bot |
@MarkoMin, I haven't looked over the implementation, but as a fellow Erlang user, I think this design solves a genuine issue where users of a library can take dependencies on parts of the library code which the library authors want the flexibility to change in the future. I've seen several workarounds for the lack of this feature in the wild, such as splitting libraries into two applications: the "API" application and the "private logic" application, then rely on enforcing application dependencies. The approach you have here is more natural and convenient, which can only be a good thing relative to those other options, in my mind, because keeping a growing codebase well-structured is a difficult task. My gut says static checks are enough for most use cases. That isn't to say you shouldn't add runtime checks, but more that what you have outlined here is already really good :) |
The only part possibly problematic to me is the "and its dependencies". I understand the need, but it generates interesting problems. The other question I would have is why not make this the default for all not exported functions? I understand that it would be an invariant break compared to today's reality, but it is one that cannot break anything, as it is only opening existing "private" functions that were by definition not called from anywhere before. |
I'd just like to add that at work we already have a similar static analyzer check: there's a separate configuration file specifying what are the API modules - and if an other application calls an exported function of a non-API module, the static analyzer triggers a warning. An attribute inside the Erlang module is a cleaner implementation. |
This is probably not a great idea. Many tools (dialyzer and xref, for instance) rely on the knowledge that certain functions cannot be used outside of the module that's under analysis. That allows them to more accurately detect if there are type discrepancies, lack of function usage, etc. While the tools can be adjusted to account for the extended scope, I'm pretty sure they will either become either noticeably slower or more imprecise (depending on what the maintainers choose to do to account for possible calls from the outside of the module in question). |
Deps can be tricky indeed. I tested circular and transitive dependencies, but more edge-cases can exists. Do you have any concrete example? The thing is, when I did the algorithm for call checking, I prepared the stage for dynamic checks. Stuff about dependencies can be removed if:
That would simplify the solution even more, resulting in a very short description for this feature: "Internally exported functions can be statically called only from within the same application". |
@MarkoMin Shouldn't this be an EEP proposal, with this PR as the reference implementation? AFAICS, you have written up most of what is needed in the opening post already. The discussion seems to be more appropriate there and, this being a change in/near the core of the language itself, the OTP team/technical board will probably require one anyway 😅 |
I can create one if proven necessary, its not a problem and shouldn't take so long. At this point, when I already submitted a PR, I'll wait for feedback from Core team and then create an EEP if needed. |
@MarkoMin I think it is better to create the EEP so that we can flesh out details. The PR can then be mostly a conversation about design and implementation of the code. |
@kikofernandez Just submitted the EEP |
As EEP-67 has been rejected on the basis that documentation attributes can provide this functionality, using We thank you for your effort. |
I have re-open the PR because most of the implementation refers to |
Internal export
What it is
internal_export
is an attribute that gives you more control over function exporting. In some cases, functions are only exported for internal usage, for testing purposes or because you must implement behavior callbacks. Those functions are not meant to be used by the end user of your application and they shouldn't exists in public interface of a module. Using them is currently considered bad practice, but with this feature it can be integrated into testing pipelines.Best description of this problem I've found is from dr. sc. Richard A. O'Keefe: "there are many “applications” in the Erlang/OTP system which consist of a flotilla of modules. A module in such a flotilla cannot export it to another member or other members of the flotilla without exporting it to absolutely every module"
By internal usage I mean used by some module that belongs to the application. Module belongs to application if it is listed in
modules
property in application resource file. Most of Erlang programmers use LSP which by default suggest all exported functions - even undocumented ones, functions documented with%% testing only
and callbacks. The user can, on purpose or not, use those functions it won't get noticed by any tool. Also, user can in the same way use all undocumented modules - a quick script showed that over 66% of modules inkernel
,stdlib
,mnesia
,ssh
,ssl
,crypto
andinets
are undocumented and therefore considered internal.By combining
internal_export
attribute andxref
analysis, it is possible to detect static calls to functions marked asinternal_export
and decide whether they are legal or not.Motivation
Current state is that a function is either exported or not. Although very simple, I think this mechanism lacks some control over scoping those exports because of already mentioned reasons. I eventually started a thread some time ago and I'd say it received a good feedback. EEP05 was the main source of inspiration, but I changed some things in design (explained in the next chapter).
This PR is related only for static analysis, but I also plan to work on dynamic checks if the static part receives a good feedback.
Design & implementation
How much control do we want?
In EEP05, it is proposed that the user should manually specify the scope of a function export. That would allow fine-grained control over exporting a function, but would potentially bloat the source file and produce some amount of boilerplate. It would also require maintenance each time new modules are added. Ut would require developers to know more stuff about the dependencies they're using - e.g. if you want to internally export
handle_call/3
- do you export it togen_server
which defines it orgen
which calls it dynamically? Do you really need to know that information?I propose a mechanism where you only declare functions that are not meant to be used by the user - callbacks and internal functions. Those functions will be available to all modules in your application and its dependencies. Dependencies of an application are applications listed in applications, included_applications or optional_applications in the application resource file including their own dependencies (e.g. if A depends on B and B depends on C then A also depends on C).
I didn't find the use-case where you want to export a function to a module/application that is not captured by the mechanism above. Counterexamples are welcome.
Exporting to modules or to applications?
There is an open question whether to export functions to modules or to applications. Exporting to modules is currently proposed in EEP05.
If we don't allow user to declare additional scope - then it doesn't really matter (at least for static analysis), it is just the matter of implementation. It is easier to implement it by exporting to applications. If we do allow user to declare additional scope, specifying applications would be easier to maintain. Otherwise, if you want to export functions internally for your application, you'd have to enumerate all modules in the application, which would be pain. Another example is exporting a callback - if dynamic checks ever get implemented, that would require you to know exactly which module is calling your function (e.g. you'd need to export
handle_call/3
togen
which shouldn't be directly used, notgen_server
). That could also cause backward incompatibilities if some internal changes occur ingen_server
. User shouldn't rely on internal implementation.Notice that it is possible to convert current approach to module-wise by just replacing applications with value of
modules
property from application resource file.Considering dynamic call checking, tradeoff must be made. Application-wise approach would be significantly less memory hungry, but would eventually introduce more overhead to dynamic call checks because you'd need to map caller and callee modules to the applications they belong to and then decide whether the call is legal or not. At the moment I can't really estimate how big this would be, but I guess pretty high / unacceptable. On the other side, module-wise approach would be very memory hungry. It would take O(1) per MFA as suggested in EEP05, but due to the current mechanism, there would be a lot of unused entries that wouldn't be used even once, but would clog the VM.
For now, my best idea is to combine both - store information about applications, but cache the results. That would make large part overhead disappear from every-but-first dynamic check, but would introduce moderate memory consumption. Any suggestions about dynamic checks are very welcome.
Xref modifications
New predefined variable
IX
added - list of all functions tagged with bothinternal_export
andexport
. It is a subset of already existing variable X and is available in bothxref_mode
s.Two new analysis types are defined:
internal_use
andillegal_internal_use
- ONLY AVAILABLE IN FUNCTIONS MODE (because they depend on predefined variable E).internal_use
- all usages of functions from IXillegal_internal_use
- all usages of functions from IX which are not allowedSyntax
Syntax is simple:
I considered shorter name -
internal
, but find it kinda confusing because we refer to internal functions in another way. Also, original nameexport_to
was the option, but it's also confusing since we don't explicitly state to what we're exporting.Usage
To test your applications you need to add them to
xref
server. After that, you can call requestillegal_internal_use
analysis to return a list of all calls to internal functions that aren't allowedPros and cons
%%hidden
and modules with only internal exports can be excluded from documentationSummary
Even without dynamic checks, I think this is quite good feature to have because you can force developers not to rely on something they shouldn't.
If proven useful, I plan to extend this mechanism to support checks at runtime. Static calls can be checked in loader, as proposed in EEP05. Dynamic calls would be way harder to implement and it is questionable do we want to sacrifice performance and simplicity for those checks.
Every critic is very welcome