Skip to content

Fix handling undefined records and record fields#147

Merged
josefs merged 3 commits into
josefs:masterfrom
gomoripeti:undefined_records
Jan 27, 2019
Merged

Fix handling undefined records and record fields#147
josefs merged 3 commits into
josefs:masterfrom
gomoripeti:undefined_records

Conversation

@gomoripeti
Copy link
Copy Markdown
Collaborator

In general Gradualizer should be run on code that compiles without
errors. (i.e. there should be no undefined records or record fields)
However if such a case is encounter crash with an error tuple (that is a
bit nicer than badkey, and much nicer than
function_clause, handle_type_error({error, {record_field_not_found, ...)

I kept one exception: undefined remote records (and fixed the error
formatting). This is a bit different as it is fetched from
gradluaizer_db. The compiler does not check entities in remote modules
(except behaviours), but it will catch this anyway when the remote
module itself is compiled. Maybe this should be an error as well.

@codecov-io
Copy link
Copy Markdown

codecov-io commented Jan 15, 2019

Codecov Report

Merging #147 into master will increase coverage by 0.19%.
The diff coverage is 96.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #147      +/-   ##
==========================================
+ Coverage   81.69%   81.88%   +0.19%     
==========================================
  Files           9        9              
  Lines        2054     2059       +5     
==========================================
+ Hits         1678     1686       +8     
+ Misses        376      373       -3
Impacted Files Coverage Δ
src/typechecker.erl 82.41% <96.55%> (+0.23%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6ee4396...93bf0ba. Read the comment docs.

Comment thread src/typechecker.erl Outdated
#{RecName := Fields} ->
Fields;
_NotFound ->
error({undefined_record, RecName})
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not throw({undef, record, RecName})? Error is for unexpected cases in the program IMO, e.g. for invalid input to a function, so a badarg error perhaps. But I think a throw with a formatted error message is more user friendly.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used error/1 and not throw/1 to emphasize that this is invalid input to this function. This is the responsibility of the compiler and gradualizer should only be given forms which don't have undefined records or fields. (I used undefined_record and undefined_field because these are the atoms the compiler also returns as errors in such cases)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but the compiler returns those errors, it doesn't raise them as errors, right? There is a difference.... The word "error" is overloaded....

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, yes, there are two things. I looked at the compiler only to choose an atom.
Independent from the compiler, I chose to raise in order to emphasize (that this is not the responsibility of Gradualizer). But I can change this.

@josefs
Copy link
Copy Markdown
Owner

josefs commented Jan 21, 2019

In general Gradualizer should be run on code that compiles without
errors. (i.e. there should be no undefined records or record fields)

I disagree. The way that programmers work in typed languages such as Haskell is, they start by running the program through the type checker until the type checker stops complaining. Only after that do they compile it.

I think that we should make as few assumptions as possible about how and when the Gradualizer is run. If we can give informative error messages in all kinds of situations then Gradualizer will be a much more usable tool.

@gomoripeti
Copy link
Copy Markdown
Collaborator Author

I thought that the type checker is part of the Haskell compiler...

Anyway, I think in practice people have powerful IDE's nowadays with on-the-fly compilation (+unit testing and what not) as they type. We can also plug into this chain on-the-fly gradualization (as long as they are fast enough the order does not matter) in the end the programmer gets real-time warnings about any of the on-the-fly steps. So it has no advantage to run Gradualizer before compilation.
Another thing is that Elixir users can only check beam files - so that part of the community will for sure run Gradualizer after compilation.
From a practical side of developing Gradualizer, making no assumptions means duplicating a lot (is it a lot?) of work that is already present in the compiler - instead of focusing on features that for sure the compiler wouldn't do.
What we could do is when the AST is read from an erl source file (not beam) instead of epp we could use compile:file(Erl, ['P', binary]) which runs not just the preprocessor but also parse_transforms (think of lager) and the linter, before returning the abstract forms?

I could argue that taking the source file instead of beam has no advantage - but it has, we can gain richer AST eg. by including column info.
Another argument against my point is if we would like to run Gradualizer as a parse_transform (don't know if that will ever make sense), that is definitely run before the linter (and would face undefined records etc.)

so I have a few arguments pro and con.

@zuiderkwast
Copy link
Copy Markdown
Collaborator

Another thing is that Elixir users can only check beam files - so that part of the community will for sure run Gradualizer after compilation.

Are you sure? I thought there's a way to only generate the AST and then type check it. We have exposed a gradualizer:type_check_forms/2 for this.

What we could do is when the AST is read from an erl source file (not beam) instead of epp we could use compile:file(Erl, ['P', binary]) which runs not just the preprocessor but also parse_transforms (think of lager) and the linter, before returning the abstract forms?

That's great! Let's do this! 👏 🍰 😄

We can even get the column information if we first parse the source code and then run the compiler checks using compile:forms(Forms, ['P', binary]).

@gomoripeti
Copy link
Copy Markdown
Collaborator Author

gomoripeti commented Jan 21, 2019

Are you sure? I thought there's a way to only generate the AST and then type check it. We have exposed a gradualizer:type_check_forms/2 for this.

I'm not familiar with the Elixir compiler in-depth, but there are all kinds of inter-dependencies between Elixir modules (require, use) and one Elixir module can contain multiple modules (resulting in multiple beam files) so it does not look straight-forward to me (I also don't see in the Elixir docs how to pass Erlang compiler options on).

We can even get the column information...

I also thought of this :)

@josefs
Copy link
Copy Markdown
Owner

josefs commented Jan 22, 2019

I thought that the type checker is part of the Haskell compiler...

Indeed. I'm talking about them separately for the sake of this discussion as the typechecker and compiler are different for Erlang.

Anyway, I think in practice people have powerful IDE's nowadays with on-the-fly compilation (+unit testing and what not) as they type. We can also plug into this chain on-the-fly gradualization (as long as they are fast enough the order does not matter) in the end the programmer gets real-time warnings about any of the on-the-fly steps. So it has no advantage to run Gradualizer before compilation.

I envision enabling on-the-fly graudalization as part of their IDEs. Especially if the Gradualizer is faster than the compiler. But also because it will provide a lot more helpful feedback than the compiler.

But I get your point about the problem of duplicating checks. I didn't mean to suggest that we duplicate all the checks that the compiler does. What I mean is that when we find ourselves in a situation where the Gradualizer finds something that it not right with the input program, it should report that in a way that is helpful to the programmer and not simply crash. A concrete example is records; if the programmer tries to use a record that is not defined it is better if Gradualizer reports this in a friendly way instead of crashing.

What we could do is when the AST is read from an erl source file (not beam) instead of epp we could use compile:file(Erl, ['P', binary]) which runs not just the preprocessor but also parse_transforms (think of lager) and the linter, before returning the abstract forms?

That's great! Let's do this! 👏 🍰 😄

That's indeed an interesting idea. Do you have any sense if we might take a performance hit?

@gomoripeti
Copy link
Copy Markdown
Collaborator Author

off-topic: the Erlang compiler is getting better and better in type inference:
erlang/otp#2091
erlang/otp#2100
(it's just good to see development around types on multiple threads around Erlang tool-chain)

@gomoripeti
Copy link
Copy Markdown
Collaborator Author

So we agree that we shouldn't check if a record or field is defined if we don't need that info (just to please the user - as I implemented in a previous commit)

I think for Erlang programmers ("let it crash") a nice crash report is helpful and readable. To make it even more helpful we would need to pass around the location info. And I think the error thrown in these cases should be distinct from type_errors (and pattern_errors).

In practice Gradualizer is mostly run on code that compiles without
errors. (i.e. there should be no undefined records or record fields)
However we don't make such assumptions and if an undefined record or
record field is encountered it is still reported in a user friendly
way. (Let's note though that Gradualizer won't make extra efforts to
detect all undefined records or record fields if they are not needed for
type checking)
CodeCov complains that there are too many uncovered changed lines.

(Undefined recods cannot be tested in should_fail modules, because
 compilation fails before eunit could be run on them.)
This is rather a known_problem. I realised that it's a bit more
complicated to implement, so for now only the crash is addressed with a
big TODO for later.

%% wildcard in pattern matching
-spec h(#rec{}) -> boolean().
h(#rec{apa = 1, _ = true}) ->
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is something the compiler allows, but I dont think it's quite useful or used. Should we allow this at all or rather throw a warning (similar to lazy-andalso) discouraging the usage?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's quite much used in some code that I've seen, so I think we should allow it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. when writing patterns for mnesia. I wonder how we could support that, but that's for the future...

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you see it in record creation or record patterns? do you see it with any other value than the underscore atom '_' or 'undefined'? it's totally fine with creating match-specs for ets/mnesia. (ets:fun2ms(fun(#rec{apa = 1, _ = undefined}) -> ... is also converted to record creation and not a pattern by the ms_transform parse trans). Given that could you give a real-world example of using wildcard in record pattern?

I'm fine with supporting this...also I realise this is the wrong PR, I just found this case when reviewing my code changes.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never seen it in a pattern. Only in expressions. I've seen it with _ = false in a record where all fields are booleans.

Copy link
Copy Markdown
Collaborator

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@josefs
Copy link
Copy Markdown
Owner

josefs commented Jan 26, 2019

So we agree that we shouldn't check if a record or field is defined if we don't need that info (just to please the user - as I implemented in a previous commit)

Yes.

I think for Erlang programmers ("let it crash") a nice crash report is helpful and readable. To make it even more helpful we would need to pass around the location info. And I think the error thrown in these cases should be distinct from type_errors (and pattern_errors).

Erlang's "let is crash" slogan is useful for writing fault tolerant distributed systems. But it need to be amended with, "and handle the crash gracefully by other means". Erlang has supervision trees and other nice mechanisms to handle that for a distributed system.
I don't see how it's helpful for the programmer if we apply the "let it crash" slogan to the Gradualizer. But I do agree that it makes sense to report a different kind of error for things like unused records, compared to type errors.

Copy link
Copy Markdown
Owner

@josefs josefs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Comment thread src/typechecker.erl
all_type(Tys, Ty, AOut, [Cs|Css], TEnv).

get_record_fields(RecName, Anno, #tenv{records = REnv}) ->
get_maybe_remote_record_fields(RecName, Anno, TEnv) ->
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think 'get_possibly_remote_record_field' would be a better name.

@josefs josefs merged commit 5095fa6 into josefs:master Jan 27, 2019
@gomoripeti gomoripeti deleted the undefined_records branch January 27, 2019 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants