Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: internal_export #7407

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

MarkoMin
Copy link
Contributor

Internal export

What it is

internal_export is an attribute that gives you more control over function exporting. In some cases, functions are only exported for internal usage, for testing purposes or because you must implement behavior callbacks. Those functions are not meant to be used by the end user of your application and they shouldn't exists in public interface of a module. Using them is currently considered bad practice, but with this feature it can be integrated into testing pipelines.

Best description of this problem I've found is from dr. sc. Richard A. O'Keefe: "there are many “applications” in the Erlang/OTP system which consist of a flotilla of modules. A module in such a flotilla cannot export it to another member or other members of the flotilla without exporting it to absolutely every module"

By internal usage I mean used by some module that belongs to the application. Module belongs to application if it is listed in modules property in application resource file. Most of Erlang programmers use LSP which by default suggest all exported functions - even undocumented ones, functions documented with %% testing only and callbacks. The user can, on purpose or not, use those functions it won't get noticed by any tool. Also, user can in the same way use all undocumented modules - a quick script showed that over 66% of modules in kernel, stdlib, mnesia, ssh, ssl, crypto and inets are undocumented and therefore considered internal.

By combining internal_export attribute and xref analysis, it is possible to detect static calls to functions marked as internal_export and decide whether they are legal or not.

Motivation

Current state is that a function is either exported or not. Although very simple, I think this mechanism lacks some control over scoping those exports because of already mentioned reasons. I eventually started a thread some time ago and I'd say it received a good feedback. EEP05 was the main source of inspiration, but I changed some things in design (explained in the next chapter).

This PR is related only for static analysis, but I also plan to work on dynamic checks if the static part receives a good feedback.

Design & implementation

How much control do we want?

In EEP05, it is proposed that the user should manually specify the scope of a function export. That would allow fine-grained control over exporting a function, but would potentially bloat the source file and produce some amount of boilerplate. It would also require maintenance each time new modules are added. Ut would require developers to know more stuff about the dependencies they're using - e.g. if you want to internally export handle_call/3 - do you export it to gen_server which defines it or gen which calls it dynamically? Do you really need to know that information?

I propose a mechanism where you only declare functions that are not meant to be used by the user - callbacks and internal functions. Those functions will be available to all modules in your application and its dependencies. Dependencies of an application are applications listed in applications, included_applications or optional_applications in the application resource file including their own dependencies (e.g. if A depends on B and B depends on C then A also depends on C).

I didn't find the use-case where you want to export a function to a module/application that is not captured by the mechanism above. Counterexamples are welcome.

Exporting to modules or to applications?

There is an open question whether to export functions to modules or to applications. Exporting to modules is currently proposed in EEP05.

If we don't allow user to declare additional scope - then it doesn't really matter (at least for static analysis), it is just the matter of implementation. It is easier to implement it by exporting to applications. If we do allow user to declare additional scope, specifying applications would be easier to maintain. Otherwise, if you want to export functions internally for your application, you'd have to enumerate all modules in the application, which would be pain. Another example is exporting a callback - if dynamic checks ever get implemented, that would require you to know exactly which module is calling your function (e.g. you'd need to export handle_call/3 to gen which shouldn't be directly used, not gen_server). That could also cause backward incompatibilities if some internal changes occur in gen_server. User shouldn't rely on internal implementation.

Notice that it is possible to convert current approach to module-wise by just replacing applications with value of modules property from application resource file.

Considering dynamic call checking, tradeoff must be made. Application-wise approach would be significantly less memory hungry, but would eventually introduce more overhead to dynamic call checks because you'd need to map caller and callee modules to the applications they belong to and then decide whether the call is legal or not. At the moment I can't really estimate how big this would be, but I guess pretty high / unacceptable. On the other side, module-wise approach would be very memory hungry. It would take O(1) per MFA as suggested in EEP05, but due to the current mechanism, there would be a lot of unused entries that wouldn't be used even once, but would clog the VM.

For now, my best idea is to combine both - store information about applications, but cache the results. That would make large part overhead disappear from every-but-first dynamic check, but would introduce moderate memory consumption. Any suggestions about dynamic checks are very welcome.

Xref modifications

New predefined variable IX added - list of all functions tagged with both internal_export and export. It is a subset of already existing variable X and is available in both xref_modes.
Two new analysis types are defined: internal_use and illegal_internal_use - ONLY AVAILABLE IN FUNCTIONS MODE (because they depend on predefined variable E).

  • internal_use - all usages of functions from IX
  • illegal_internal_use - all usages of functions from IX which are not allowed

Syntax

Syntax is simple:

-internal_export([f/a,...])

I considered shorter name - internal, but find it kinda confusing because we refer to internal functions in another way. Also, original name export_to was the option, but it's also confusing since we don't explicitly state to what we're exporting.

Usage

To test your applications you need to add them to xref server. After that, you can call request illegal_internal_use analysis to return a list of all calls to internal functions that aren't allowed

Pros and cons

Pros Cons
Safer codebase - you can detect calls to functions that shouldn't be used Added complexity and responsibility for developers
Spawn problem - I'd like to cite dr. Richard A. O’Keefe here: "if we wish to spawn a call to a function, that function must be exported even if we have no intention whatsoever of making it available to other modules" Overhead (only for dynamic checks)
Documentation - no more need to mark functions with %%hidden and modules with only internal exports can be excluded from documentation
LSP integration - internal functions not shown when consuming dependencies
CI/CD pipeline integration
Shell integration - Autocomplete could separate exports and internal exports
Loader integration - do the analysis in code loader

Summary

Even without dynamic checks, I think this is quite good feature to have because you can force developers not to rely on something they shouldn't.
If proven useful, I plan to extend this mechanism to support checks at runtime. Static calls can be checked in loader, as proposed in EEP05. Dynamic calls would be way harder to implement and it is questionable do we want to sacrifice performance and simplicity for those checks.

Every critic is very welcome

Feature `internal_export` is added as experimental.
Simply, feature looks at functions tagged with "internal_export" attribute and adds them to the list of exports (while keepining "internal_export" attribute as it is).
This is just to avoid code duplication.
This way it will be easier to support internal_exports as a top level beam chunk if some day it gets implemented.
New predefined variable IX added - list of all functiones tagged with both "internali_export" and "export".
IX is available in both modes.
Two new analysis types are defined: "internal_use" and "illegal_internal_use" - ONLY AVAILABLE IN FUNCTIONS MODE (because they depend on predefined variable E).
    "internal_use" - all calls to functions from IX
    "illegal_internal_use" - all internal uses that are not allowed
IX is a subset of already existing variable X, contatining only exported functions marked as internal_export.
This might change in the future, if internals become top-level chunk, not only annotation for exported functions.
This variable is crutial for catching unwanted calls to internal functions.
Most of the work is in single test case, because there is a lot of setup to be done, so I find it convenient.
There are some comments for clarification.
Transitive and circular dependencies are tested.
Described IX variable and "internal_use" and "illegal_internal_use" analyses.
Described the difference between legal and illegal calls to internally exported functions.
@github-actions
Copy link
Contributor

github-actions bot commented Jun 15, 2023

CT Test Results

       4 files     382 suites   50m 21s ⏱️
2 639 tests 2 590 ✔️ 48 💤 1
7 040 runs  6 988 ✔️ 51 💤 1

For more details on these failures, see this check.

Results for commit 87f3a83.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@TD5
Copy link
Contributor

TD5 commented Jun 16, 2023

@MarkoMin, I haven't looked over the implementation, but as a fellow Erlang user, I think this design solves a genuine issue where users of a library can take dependencies on parts of the library code which the library authors want the flexibility to change in the future. I've seen several workarounds for the lack of this feature in the wild, such as splitting libraries into two applications: the "API" application and the "private logic" application, then rely on enforcing application dependencies. The approach you have here is more natural and convenient, which can only be a good thing relative to those other options, in my mind, because keeping a growing codebase well-structured is a difficult task.

My gut says static checks are enough for most use cases. That isn't to say you shouldn't add runtime checks, but more that what you have outlined here is already really good :)

@DianaOlympos
Copy link
Contributor

The only part possibly problematic to me is the "and its dependencies". I understand the need, but it generates interesting problems. The other question I would have is why not make this the default for all not exported functions? I understand that it would be an invariant break compared to today's reality, but it is one that cannot break anything, as it is only opening existing "private" functions that were by definition not called from anywhere before.

@NAR
Copy link
Contributor

NAR commented Jun 19, 2023

I'd just like to add that at work we already have a similar static analyzer check: there's a separate configuration file specifying what are the API modules - and if an other application calls an exported function of a non-API module, the static analyzer triggers a warning. An attribute inside the Erlang module is a cleaner implementation.

@elbrujohalcon
Copy link
Contributor

The other question I would have is why not make this the default for all not exported functions? I understand that it would be an invariant break compared to today's reality, but it is one that cannot break anything, as it is only opening existing "private" functions that were by definition not called from anywhere before.

This is probably not a great idea. Many tools (dialyzer and xref, for instance) rely on the knowledge that certain functions cannot be used outside of the module that's under analysis. That allows them to more accurately detect if there are type discrepancies, lack of function usage, etc. While the tools can be adjusted to account for the extended scope, I'm pretty sure they will either become either noticeably slower or more imprecise (depending on what the maintainers choose to do to account for possible calls from the outside of the module in question).

@rickard-green rickard-green added the team:VM Assigned to OTP team VM label Jun 19, 2023
@MarkoMin
Copy link
Contributor Author

The only part possibly problematic to me is the "and its dependencies". I understand the need, but it generates interesting problems.

Deps can be tricky indeed. I tested circular and transitive dependencies, but more edge-cases can exists. Do you have any concrete example?

The thing is, when I did the algorithm for call checking, I prepared the stage for dynamic checks. Stuff about dependencies can be removed if:

  1. we agree that dynamic checks shouldn't be implemented
  2. we agree that top-down static calls (e.g. imagine that your callback module needs to have some specific name in order to work correctly) is an anti-pattern which shouldn't be used.

That would simplify the solution even more, resulting in a very short description for this feature: "Internally exported functions can be statically called only from within the same application".

@Maria-12648430
Copy link
Contributor

@MarkoMin Shouldn't this be an EEP proposal, with this PR as the reference implementation? AFAICS, you have written up most of what is needed in the opening post already. The discussion seems to be more appropriate there and, this being a change in/near the core of the language itself, the OTP team/technical board will probably require one anyway 😅

@MarkoMin
Copy link
Contributor Author

I can create one if proven necessary, its not a problem and shouldn't take so long. At this point, when I already submitted a PR, I'll wait for feedback from Core team and then create an EEP if needed.

@kikofernandez
Copy link
Contributor

@MarkoMin I think it is better to create the EEP so that we can flesh out details. The PR can then be mostly a conversation about design and implementation of the code.

@MarkoMin
Copy link
Contributor Author

@kikofernandez Just submitted the EEP

@kikofernandez
Copy link
Contributor

As EEP-67 has been rejected on the basis that documentation attributes can provide this functionality, using -doc false., I will close this PR.

We thank you for your effort.

@kikofernandez
Copy link
Contributor

I have re-open the PR because most of the implementation refers to xref.
It would be great if this PR can be re-target to look at -doc false.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet