Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow local redefinition of built-in types #6335

Merged

Conversation

bjorng
Copy link
Contributor

@bjorng bjorng commented Sep 29, 2022

If a module defines a type and a new built-in type is added with the same name, Dialyzer will allow the redefinition of that particular release for one release only. That makes it difficult to write code that is supposed to work for multiple releases of OTP.

For example, the OTP 24 release introduced the nonempty_binary type. Modules that had their own definition would continue to work in OTP 24 but not in OTP 25.

Always allow the name of a built-in type to be reused locally, in the same way that a definition of a function with the same name as a BIF will take precedence over the BIF. Starting from OTP 25, the built-in types belong to the erlang module, so it will always be possible to get the built-in definition if needed by prefixing the name with erlang:.

Closes #6132

@bjorng bjorng added team:VM Assigned to OTP team VM testing currently being tested, tag is used by OTP internal CI labels Sep 29, 2022
@bjorng bjorng self-assigned this Sep 29, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Sep 29, 2022

CT Test Results

       4 files     388 suites   58m 14s ⏱️
2 782 tests 2 734 ✔️ 48 💤 0
7 016 runs  6 965 ✔️ 51 💤 0

Results for commit b08f467.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot


get_core_location([L | _As]) when is_integer(L) -> L;
get_core_location([{L, C} | _As]) when is_integer(L), is_integer(C) -> {L, C};
get_core_location([_ | As]) -> get_core_location(As);
get_core_location([]) -> undefined.

massage_forms([{type, Loc, [{Name, Type0, Args}]} | T], Defs0) ->
Type = massage_type(Type0, Defs0),
Defs = sets:add_element({Name, length(Args)}, Defs0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe equivalent but in this line the sets:add_element({Name, length(Args)}, Defs0) takes the Name as argument. However, your massage_type function checks if sets:is_element({Type, length(Args0)}, Defs) where the Type I guess is not the same as the Name. Unless Name and Type are exactly equal, I believe you are creating a set of Names with their arity but your massage_type looks for Types and their arity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my naming is inconsistent. I have pushed a fixup commit that renames Type to Name in massage_type/2.

massage_forms([{type, Loc, [{Name, Type0, Args}]} | T], Defs0) ->
Type = massage_type(Type0, Defs0),
Defs = sets:add_element({Name, length(Args)}, Defs0),
[{type, Loc, [{Name, Type, Args}]} | massage_forms(T, Defs)];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is not tail recursive but I guess that we do not care because we assume that the list is not going to be big enough to make an impact?

Copy link
Contributor

@michalmuskala michalmuskala Oct 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

body-recursive functions that just build a list generally are the same, or even more efficient than tail recursive with a reverse at the end.

See: https://ferd.ca/erlang-s-tail-recursion-is-not-a-silver-bullet.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -436,12 +436,8 @@ format_error({undefined_type, {TypeName, Arity}}) ->
io_lib:format("type ~tw~s undefined", [TypeName, gen_type_paren(Arity)]);
format_error({unused_type, {TypeName, Arity}}) ->
io_lib:format("type ~tw~s is unused", [TypeName, gen_type_paren(Arity)]);
format_error({new_builtin_type, {TypeName, Arity}}) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The removal of these two cases (new_builtin_type and builtin_type) mean that if we ever get them, the linter will crash. Is this the expected behaviour?

  2. Is there any guarantees that new_builtin_type and builtin_type cannot appear as arguments to the format_error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

format_error/1 is only supposed to be called with a term returned from call to one of the functions in erl_lint. That means that when invoked from the compiler, format_error/1 will never crash.

Not having a default clause makes sure that if we have forgotten to add a new clause to format_error/1, the bug will be found quickly.

lib/stdlib/src/erl_lint.erl Show resolved Hide resolved
TypePair = {TypeName, length(Args)},
case is_map_key(TypeName, Types) of
true ->
TypePair = {TypeName, length(Args)},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been define in line 3052 :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code look ok but it is almost repeating the same pattern in all cases.
I wonder if there is a better way to write this code without repeating the same structure 3 times.
Maybe it can be refactored to use a function that assigns {TypeName, 0} or {TypeName, length(Args) depending on whether Args is any or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The third clause (the one with the redundant definition) is wrong and redundant. It handles annotated types (e.g. A :: atom()), remote types (e.g. gb_sets:set()), and integer and atom literals. Those need no special handling, so the third clause can be removed.

The first and second clauses can indeed be combined to avoid repeating code.

I have pushed a fixup commit.

check_type_1(Types, SeenVars, St) ->
check_type_2(Types, SeenVars, St).

check_type_2({ann_type, _A, [_Var, Type]}, SeenVars, St) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better name for the function than check_type_1 and check_type_2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions for better names?

The number suffixes indicates that check_type_1 and check_type_2 are helpers for check_type/1, and that they are never meant to be called directly from other parts of erl_lint.

@michalmuskala
Copy link
Contributor

This has an unfortunate side effect of allowing "spooky action at a distance" with header files.

See for example:

% foo.hrl
-record(person, {name :: string()}).

% foo.erl
-include("foo.hrl").
-type string() :: atom().

foo() ->
    #person{name = "Name"} % surprise, invalid (!)

@bjorng
Copy link
Contributor Author

bjorng commented Oct 11, 2022

This has an unfortunate side effect of allowing "spooky action at a distance" with header files.

Yes, it does. Not sure if that is a problem in practice, or how we could avoid it.

@kikofernandez
Copy link
Contributor

to avoid the problem stated by @michalmuskala , can we consider the header files to use the default erlang module for their types? I think we do not want to allow dynamic scoping, correct?

@bjorng
Copy link
Contributor Author

bjorng commented Oct 11, 2022

to avoid the problem stated by @michalmuskala , can we consider the header files to use the default erlang module for their types?

I don't think we should make the semantics dependent on whether the definition is in a header filer or not. That could also defeat the purpose of allowing redefinitions if the types that need to be redefined are in a header file. (It's no longer considered best practice to define types in header files, but we still have such code in Erlang/OTP.)

@michalmuskala
Copy link
Contributor

I agree with @bjorng - making semantics different between header files & modules wouldn't be desirable.

TBH, I don't think there's much we can do here, it's the semantics of headers & records that are the actual problem, not really this feature. I don't think we can do much, other than warn people not to do it. You could argue the same is already possible with functions & default values:

%foo.hrl
-record(processor, {pid = self(), ...}).

%mod1.erl
-include("foo.hrl").
processor() -> #processor{}.

%mod2.erl
-include("foo.hrl").
-compile({no_auto_import, [self/0]}).

processor() -> #processor{}.

self() -> 'self'.

If a module defines a type and a new built-in type is added with the
same name, Dialyzer will allow the redefinition of that particular
release for one release only. That makes it difficult to write code
that is supposed to work for multiple releases of OTP.

For example, the OTP 24 release introduced the `nonempty_binary`
type. Modules that had their own definition would continue to work in
OTP 24 but not in OTP 25.

Always allow the name of a built-in type to be reused locally, in the
same way that a definition of a function with the same name as a BIF
will take precedence over the BIF. Starting from OTP 25, the built-in
types belong to the erlang module, so it will always be possible to
get the built-in definition if needed by prefixing the name with
`erlang:`.

Closes erlang#6132
@bjorng bjorng force-pushed the bjorn/dialyzer/local-type-redefinition/GH-6132 branch from b08f467 to 4d08fc9 Compare October 12, 2022 09:44
@bjorng bjorng merged commit 5b279e3 into erlang:master Oct 12, 2022
@bjorng bjorng deleted the bjorn/dialyzer/local-type-redefinition/GH-6132 branch October 12, 2022 09:51
facebook-github-bot pushed a commit to WhatsApp/eqwalizer that referenced this pull request May 18, 2023
Summary: - why? `bultin-in` dynamic() in OTP 26 and erlang/otp#6335

Reviewed By: alanz

Differential Revision: D45967322

fbshipit-source-id: f6e5635b4a7e3d1b19e47266e977dd4f168f0434
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM testing currently being tested, tag is used by OTP internal CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dialyzer: Allow local redefinition of types
3 participants