Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build an explicit abstract environment for check_regularity (fix #12334) #12368

Merged
merged 7 commits into from
Jul 25, 2023

Conversation

garrigue
Copy link
Contributor

The call to check_regularity in Typecore was using the original environment, which does not contain the newly added types. From a typing point of view, this is correct (those types will be handled as abstract), but this could cause strange error messages such as in #12334.
We now build an explicit abstract environment, extending enter_type with an extra switch.
Also fixed a bud in check_regularity which reported wrong constraints in a reversed order.

Note: it might be correct to used the partially abstract temp_env built above, but this requires a thorough examination.

messages (#12334) *)
let abs_env =
List.fold_left2 (enter_type ~abstract_abbrevs:false rec_flag)
env sdecl_list ids_list in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected abstract_abbrevs:true here. Are you using true anywhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a whole new env here seems potentially expensive. Is there instead a way to catch and reinterpret the error message in check_regularity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gasche Well-spotted. I did various testing, and using true or false doesn't seem to have much impact, but it should be true here for keeping the original behavior. Fixed now

@goldfirere Actually, this is not very expensive. We are just building a dummy env here, which contains only abstract types, and whose type parameters are not constrained. Using the original environment was really a dirty hack.

1 | type 'a t = 'a foo foo
^^^^^^^^^^
Error: Constraints are not satisfied in this type.
Type "'a foo foo" should be an instance of "int foo"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be very nice to get an additional

Constraints cannot be inferred within a mutually recursive group of definitions

or something. Because 'a foo foo is actually an instance of int foo.

Not sure how hard it would be to arrange for the better message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having an exact error message is going to be difficult here. We could track the recursive use, and point it as a potential cause.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that tracking the recursive use would be very helpful, if it's not too hard.

messages (#12334) *)
let abs_env =
List.fold_left2 (enter_type ~abstract_abbrevs:false rec_flag)
env sdecl_list ids_list in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a whole new env here seems potentially expensive. Is there instead a way to catch and reinterpret the error message in check_regularity?

@garrigue
Copy link
Contributor Author

Ok, new attempt.
I reverted to using the original (defective) environment, but modifying the warning so as to report this when the path is an identifier. This assumes that this is the only occurrence of a local identifier not appearing in the environment...
Should probably have a way to give this kind of information in the trace.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the new implementation. Can you explain why you decided against the previous approach? (Is it related to performance concerns on creating an environment?)

Type "M.b" is abstract because no corresponding cmi file
was found in path.
Type "M.a" is abstract because no corresponding cmi file
was found in path.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line breaks here read like a regression caused by the use of @ instead of in the error message. Why the change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the addition of Type at the beginning makes the line potentially too long for a standard terminal.
If you think there is no use for line breaks, we can also go back to .

| _ ->
fprintf ppf
"@,@[Type %a is abstract because no corresponding cmi file@ \
was found in path.@]" (Style.as_inline_code path) p
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you are catching all errors on missing references, and implicitly assuming that non-qualified paths all come from the specific errors of constraint checking, and specializing the error message for that. This sounds fragile and hard to understand and not a good idea.

@garrigue
Copy link
Contributor Author

This implementation is an answer to @goldfirere point that it would be nice to have a warning that we used the abstract version of the types. This seems to be the easiest way to do that. Performance is not an issue here.
I agree that this way of catching everything is not right. It should be easy to refine it by propagating the fact we are reporting an error coming from this precise point (e.g., add a flag to Constraint_failed).

I'm definitely not suggesting that we merge this code in its current state, I am rather exploring the design space.

@lpw25
Copy link
Contributor

lpw25 commented Jul 12, 2023

Doing this properly probably needs to put some kind of "reason" on the Type_abstract in the dummy type definition, and reporting that as part of failed expansions. I personally don't think that is worth doing for just this corner case -- which only experienced users or developers of the type system are likely to ever actually see. However, that would also more generally avoid giving the "missing cmi" error in the wrong place, something that has definitely happened before, and I think the required work could definitely be justified by that.

@garrigue
Copy link
Contributor Author

@lpw25 That's precisely what I was trying to do yesterday, but I had no time to push then. Done now.
I think the code is reasonable, but this may still be an overkill.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I didn't do a full review but just glanced at the last commit)

@@ -2432,19 +2432,20 @@ let explain mis ppf =
| None -> ()
| Some explain -> explain ppf

let warn_on_missing_def env ppf t =
type env_kind = Normal_env | Abstract_env of string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use a variant type for the Abstract_env payload, turned into a string only in the error-message production code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a new policy ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is a policy yet, but I concur that messages at destination to humans should be emitted in one piece if possible, in order to have the full context available.

I see three major benefits for this constraint on the error message:

  • it can be read and copy edited in one place
  • it can be formatted as a whole
  • it can be translated with enough context to enable a good translation

match p with
| Path.Pident _ ->
match p, env_kind with
| Path.Pident _, Abstract_env where ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This match should have env_kind as the only scrutinee.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not a Pident, the abstract environment is not the cause.
Of course, if we start to have other cases, this may not be the case, so in some way, your above suggestion of making the payload a variant may make this code future-safe.

@lpw25
Copy link
Contributor

lpw25 commented Jul 13, 2023

@lpw25 That's precisely what I was trying to do yesterday, but I had no time to push then. Done now.

The code doesn't look like what I was suggesting. I was suggesting the following in types.ml:

type abstract_reason =
  | Missing_cmi
  | Recursive_dummy
  
and ('lbl, 'cstr) type_kind =
    Type_abstract of abstract_reason option
  | ...

@goldfirere
Copy link
Contributor

This seems like a fine approach to me, and I’m happy with the final error message. Thanks, @garrigue! I will Approve, but others may want to chime in if they see a better way.

@garrigue
Copy link
Contributor Author

@lpw25 That's precisely what I was trying to do yesterday, but I had no time to push then. Done now.

The code doesn't look like what I was suggesting. I was suggesting the following in types.ml: [...]

I see. This is indeed more elegant, but also more invasive.
This time we are lucky that looking at the path is enough to know the cause of abstraction, but your approach may be better if the problem shows up again.

@gasche
Copy link
Member

gasche commented Jul 19, 2023

Personally I still feel grumpy about the proposed approach. The problem is that even if we know that the environment we have is was intentionally built with some definitions missing, we can't know when we get a Not_found error whether this particular error comes from one of the definitions intentionally hidden from the environment, or from an actually undefined compilation unit somewhere else. So we do still need to rely on the criterion that "unqualified paths come from from our fake environment", which I find fragile and inelegant. (When I reviewed the change earlier I was under the impression that the match on p, env_kind was just a match on env_kind with the condition on p redundant, but in fact this is not the case.)

I see two ways to avoid relying on this criterion:

  • Leo's proposal to mark abstract types with information, so that lookup of "types from the current recursive nest" return an abstract type marked as a dummy instead of failing with Not_found -- but I agree that it is more invasive.
  • As a workaround, the Partial_env_check_regularity construct could come with a list/set of paths that are intentionally missing in the environment, and we could check membership into this list/set instead of relying on the Pident _ catch-all.

Both approaches avoid the potential confusion that there is today between "Not_found errors on recursive neighbors" and "other Not_found errors".

@goldfirere
Copy link
Contributor

goldfirere commented Jul 19, 2023 via email

@gasche
Copy link
Member

gasche commented Jul 20, 2023

So we discussed this during a type-system meeting and @garrigue was basically interested in trying out Leo's suggestions, with tweak to the definitions. (There is no Missing_cmi case as this one just raises Not_found.)

@garrigue garrigue force-pushed the abstract_env_for_regularity branch from 8f2a2cb to 778fe23 Compare July 21, 2023 03:25
@garrigue garrigue force-pushed the abstract_env_for_regularity branch from 778fe23 to 78bf6fa Compare July 21, 2023 06:02
@garrigue
Copy link
Contributor Author

Switched to Leo's approach. Of course this requires changes in many more files.

I kept the previous approach at https://github.com/cocti/ocaml/tree/flag_in_Constraint_failed .
Despite Gabriel's comment, I'm still convinced it is doing exactly what is intended (i.e. the reporting occurs exactly for the right paths), but I agree it is harder to review.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the new approach better, thanks!

Reading the patch, I understand @lpw25's original proposal better. When he proposed to add a Missing_cmi reason, I think the idea was that the Env.find_type* lookup functions would return Type_abstract Missing_cmi instead of raising Not_found. This is a yet more invasive change but it would be fairly nice, because it could remove a lot of Not_found handling in the codebase (sharing the logic with the Type_abstract _ case), and make it more robust in places where we don't currently catch Not_found.

This would also let us get rid of the warn_on_missing_def* machinery on printtyp: if each abstract type comes with an explanation of why it is abstract, we can print it (in non-obvious cases) when printing the explanation of the kind difference, instead of adding an extra explanation in an ad-hoc way as we do currently.

I think that these improvements could be implemented as separate / follow-up PRs, and that the current state is already an improvement. See minor comments on the code still.

| None, _ -> Abstract_def, None
| Some _, None -> Abstract_def, Some (Btype.newgenvar ())
| Some _, Some reason -> reason, None
in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is obscure. Instead of taking an optional abstract_abbrevs parameter, could you take a non-optional abstract_reason parameter? There are two calls to enter_type, one with Abstract_def and one with Abstract_rec_check_regularity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was that this is more versatile: one could eventually build abstract environments with other reasons.
But if you think this is an overgeneralization, I can go back to the original approach of using a boolean for abstract_abbrevs, and hard-coding the use of Abstract_rec_check_regularity here.
Or do you just mean that we should pass a value of type abstract_reason option at call sites ?

Copy link
Member

@gasche gasche Jul 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the code again, I understand the idea better, and I agree that my "simplification" proposed above was not such a good idea. Some documentation of the abstract_abbrevs parameter would have helped.

My understanding of the intended behavior: enter_type will turn datatype declarations into abstract types, and type synonyms into "dummy type synonym" of the form type t = <newgenvar()>. If abstract_abbrevs is provided, then all types (datatypes or synonyms) are made abstract with the provided reason.

Documenting this intention could be nice, because the code is not so clear. I also wonder what is the purpose of those 'dummy type synonyms" -- why not just turn type synonyms into abstract types as datatype declarations? (Ideally this could be explained in a comment.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some explanations.

@garrigue
Copy link
Contributor Author

Reading the patch, I understand @lpw25's original proposal better. When he proposed to add a Missing_cmi reason, I think the idea was that the Env.find_type* lookup functions would return Type_abstract Missing_cmi instead of raising Not_found.

I thought of this too, but there is a problem: when we look up a type in the environment, we do not give its arity. So the dummy type returned in case of missing cmi would be nullary, requiring extra care when reporting. I suppose this may still be better than raising Not_found, but have to see how it fares. This is indeed not for this PR, but for a potential future one.

@gasche
Copy link
Member

gasche commented Jul 25, 2023

This PR was hit by a flaky CI failure: on AppVeyor (Windows) the test memory-model/forbidden.byte fails with (no output and) exit code -1073741819.

@gasche gasche merged commit 2612a1a into ocaml:trunk Jul 25, 2023
9 of 10 checks passed
Octachron added a commit to Octachron/ocaml that referenced this pull request Oct 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants