Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dialyzer] False positive: strange interaction with string(), list() and overloaded functions #7624

Open
albsch opened this issue Sep 4, 2023 · 4 comments
Assignees
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM

Comments

@albsch
Copy link

albsch commented Sep 4, 2023

This is a minimized function from the Gradualizer test suite:

-spec g(list(tag)) -> list(string());
       (tag) -> string().
g([]) -> [];
g([tag | _]) -> [g(tag)];
g(tag) -> "".

The first part of the specification describes the list processing, the second part the tag to string conversion. This is a valid function with a valid spec.

Dialyzer complains:

should_pass.erl:5:2: Invalid type specification for function should_pass:g/1. The success typing is 
          ('tag' | maybe_improper_list()) -> [[[[any()]]]]

The return type looks really wild, why is Dialyzer inferring a 4-times nested list?

Reproducible build with Erlang 24-26:
https://github.com/albsch/dialyzer-test

Expected behavior
Dialyzer should not complain, the type spec is valid.

@albsch albsch added the bug Issue is reported as a bug label Sep 4, 2023
@IngelaAndin IngelaAndin added team:VM Assigned to OTP team VM team:PS Assigned to OTP team PS labels Sep 8, 2023
@TD5
Copy link
Contributor

TD5 commented Sep 8, 2023

From Dialyzer 5.1's release notes (my emphasis), here's one aspect which I think muddies the water a bit:

Dialyzer's overloaded domain warning is now disabled by default, and can be enabled with the flag -Woverlapping_contract.

Dialyzer used to issue a warning for overloaded domains stating "such contracts are currently unsupported and are simply ignored".

These contracts are not "ignored" but rather, Dialyzer takes the union of the overloaded domains. This means that we lose the dependency from each corresponding input to output type. Because of this, the warning is really about not being able to establish a dependency between the input and output types of each respective overloaded function specification.

With this in mind, splitting the function to avoid an overload specification seems to satisfy Dialyzer:

-module(should_pass).

-export([g/1, h/1]).

-spec g(list(tag)) -> list(string()).
g([]) -> []; 
g([tag | _]) -> [h(tag)].                                                                                                                                                       

-spec h(tag) -> string().
h(tag) -> "". 

And, indeed, changing the spec to naively simplify the overloading to:

-spec g(list(tag) | tag) -> list(string()) | string().

or

-spec g(list(tag) | tag) -> list(string()); (list(tag) | tag) -> string().

seems to make Dialyzer happy too.

I suspect this means the issue arrises due to how "Dialyzer takes the union of the overloaded domains" when it comes to refining the inferred success type using the specification given in the source file.

The return type looks really wild, why is Dialyzer inferring a 4-times nested list?

After "simplifying" the spec/inferred type, I imagine Dialyzer's reasons about string() (defined as https://www.erlang.org/doc/man/unicode#type-chardata), which notably allows nested and improper lists, and Dialyzer limits the depth it explores recursive types, which presumably explains the arbitrary quadruply-nested lists.

Dialyzer should not complain, the type spec is valid.

I think this is a reasonable claim, but pragmatically, I think using spec overloading is a "here be dragons" scenario.

For the solution inside Dialyzer, I think we need to revisit the soundness of the handling of overloading or the warning given for certain overloaded specs.

I did also try to enable the old -Woverlapping_contract warning explicitly, but it doesn't seem to be available in the CLI: https://github.com/erlang/otp/blob/master/lib/dialyzer/src/dialyzer_cl_parse.erl#L146

@bjorng
Copy link
Contributor

bjorng commented Sep 13, 2023

The return type looks really wild, why is Dialyzer inferring a 4-times nested list?

The return type [[[[any()]]]] is weird and at first glance it might seem wrong. It is correct because any of the nested lists can be empty. (This took me a while to figure out.) Dialyzer infers this type because it sees a recursive call to g/1, and g/1 can return either [] or a non-empty list. Dialyzer does not understand that the call g(tag) only can return [].

After "simplifying" the spec/inferred type, I imagine Dialyzer's reasons about string() (defined as https://www.erlang.org/doc/man/unicode#type-chardata), which notably allows nested and improper lists, and Dialyzer limits the depth it explores recursive types, which presumably explains the arbitrary quadruply-nested lists.

The string type is not defined in unicode module; it is defined like this in the erlang module:

-type string() :: [char()].

Now for why Dialyzer complains about the spec. Using OTP 26, Dialyzer gives a little more information about what Dialyzer considers wrong:

t.erl:13:2: Invalid type specification for function t:g/1.
 The success typing is t:g
          ('tag' | maybe_improper_list()) -> [[[[any()]]]]
 But the spec is t:g
          (['tag']) -> [string()];
         ('tag') -> string()
The return types do not overlap

As long as there is at least some overlap between spec and code, Dialyzer will allow it. However, lists are treated specially in that there must be an overlap not only when the lists are empty. So for the following example:

-spec h(list()) -> string().
h([H|T]) when is_atom(H) ->
    [H|h(T)];
h([]) ->
    [].

Dialyzer will complain:

t.erl:20:2: Invalid type specification for function t:h/1.
 The success typing is t:h
          ([atom()]) -> [atom()]
 But the spec is t:h
          ([any()]) -> string()
 The return types do not overlap

because the spec and the code only overlap for the empty list. I find it reasonable that there should be warning for this spec and code.

Therefore, I think that Dialyzer is correct in complaining about g/1. Perhaps we should make the warning a little bit clearer by saying "The return types do not overlap except when the lists are empty" or something similar.

@IngelaAndin IngelaAndin removed the team:PS Assigned to OTP team PS label Sep 20, 2023
@albsch
Copy link
Author

albsch commented Sep 22, 2023

Thank you for looking into this!

I find it reasonable that there should be warning for this spec and code.

Is this then considered a warning (that ideally can be disabled via a flag, like the warning for overlapping domains), or something that Dialyzer thinks will definitely lead to a runtime crash? Since that list edge case will never lead to a runtime crash and Dialyzer complains anyway, which I thought Dialyzer would (should?) never do.

So for the following example: h/1 Dialyzer will complain:

It's OK to complain about h/1 because h/1 definitely does not return just string() values when inputting a list() value. This will lead to a runtime crash if h/1 is used anywhere where string() values are expected. Dialyzer behaves for h/1 as expected.

@bjorng
Copy link
Contributor

bjorng commented Sep 25, 2023

Is this then considered a warning (that ideally can be disabled via a flag, like the warning for overlapping domains), or something that Dialyzer thinks will definitely lead to a runtime crash? Since that list edge case will never lead to a runtime crash and Dialyzer complains anyway, which I thought Dialyzer would (should?) never do.

Dialyzer does not say that it will lead to a runtime crash. It is a warning that the combination of spec and code does not make much sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM
Projects
None yet
Development

No branches or pull requests

6 participants