Skip to content

#12985: tweak functor error msgs by prefering changes on the left#12988

Merged
gasche merged 6 commits intoocaml:trunkfrom
Octachron:functor_diffing_tweak
Mar 12, 2024
Merged

#12985: tweak functor error msgs by prefering changes on the left#12988
gasche merged 6 commits intoocaml:trunkfrom
Octachron:functor_diffing_tweak

Conversation

@Octachron
Copy link
Member

@Octachron Octachron commented Feb 23, 2024

This PR improves the error message for partially applied functor by making the diffing algorithm prefers paths with optimal weight that end with deletions or additions.

This one-line change is sufficient to switch the error message for

module type x = sig type x end
module type y = sig type y end
module X = struct type x end
module Y = struct type y end
module W = struct type w end
module F(A:x)(B:y)(C:x) = struct end
module M = F(W)

from

module M = F(W);;
Error: The functor application is ill-typed.
       These arguments:
         W
       do not match these parameters:
         functor (A : x) (B : y) (C : x) -> ...
       1. An argument appears to be missing with module type x
       2. An argument appears to be missing with module type y
       3. Modules do not match: W : sig type w = W.w end is not included in x

to

Error: Modules do not match: sig type w = W.w end is not included in 
       x
     The type x is required but not provided

close #12985

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the testsuite examples, my impression is that the change introduces some regressions as well as some improvements.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error: This variant or record definition does not match that of type "d"
Fields "x" and "y" have been swapped.
Field "y" has been moved from position 1 to 2.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is a regression.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a corner case because the existing weight considered that a move of one field

a   b   a
↓-------↑

(aka a delete + insertion ) was a good as an exchange of two fields

a  b
↓  ↓
b  a

(aka two change) I have fixed this case by tweaking the weights to prefer a swap.

1. Modules do not match:
Set : (module Set)
is not included in
Set.OrderedType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression is that this error message is also worse than before -- this would be clearer if A had type OrderedType, which is how I tend to read this example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is completely symmetrical? We are going from a Delete-Change to Change-Delete? If A was matching, the path that will be selected would be Delete-Match .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me as a human, it is very clear that Set cannot possibly be an OrderedType choice, while A looks like a type that may be intended to be an OrderedType (but maybe compare is mis-spelled or something?).

(The actual error may be that Set was written when say ISet was meant, which happens to be an ordered type. But then showing in details the signature of Set is also irrelevant to that sort of error, so the new message is also mostly noise in this case.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module A has type sig type a end so it's not clear to me how it can be considered more similar to OrderedType than Set without requiring human-level sensitivity (although it generated a smaller error message when picked, but I don't know why the missing type t and compare value were not listed before)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I'm not saying that whatever criterion we use should recognize one from the other, just that the new error message is (for humans) slightly worse than before.

7. Module TY matches the expected module type ty
8. Module TY matches the expected module type ty
9. Module TY matches the expected module type ty
2. Module FiveArgsExt matches the expected module type Ext
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is (slightly) better than before.

3. Module types X.T and X.T match
4. Module types X.T and X.T match
5. Module types X.T and X.T match
5. An extra argument is provided of module type X.T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly better than before.

2. The following extra argument is provided X : sig type x = int end
3. Module X matches the expected module type
2. Module X matches the expected module type
3. The following extra argument is provided X : sig type x = int end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look almost identical but I agree that the new order is slightly better.

@gasche
Copy link
Member

gasche commented Feb 23, 2024

Preferring swaps to moves is very reasonable -- as a change of its own I would be happy to approve it. On the other hand, I don't understand what this second commit does in the details, it throws away a t.types_match that may have been a good idea. (Maybe some comment on the intent between the weights would be nice.)

@gasche
Copy link
Member

gasche commented Feb 24, 2024

After thinking more about this, I have mellowed on this PR. I think that there is probably a reasonable idea behind it, it is not just clear from looking at the diff, and that the regression in the existing messages are probably taken into account reasonably well by your new, additional change.

I remain disconcerted from the fact that I have no clue what is going on when looking at the diff. Could you:

  1. Explain how the code changes realize the high-level specifications that you describe, and
  2. Ensure that there are enough comments in the source, so that code readers know what the high-level specification is (and maybe how it is realized by the code)?

@Octachron
Copy link
Member Author

What happens is that the line

    select_best_proposition [del;insert;diag]

enforces an implicit reverse lexicographic order on the minimal weight paths.

In more details, consider the case where the three candidate have the same weight, we end-up with three paths ending with ... Del, ... Insert, ... Change. The code then selects as a representative of the weight-equivalence class the first one.
This is equivalent to choosing the smallest best-weight path according to the order Del < Insert < Change for the last part of the path.

By induction, since we have applied this choice for the all preceding matrix position, this means that the selected path among the paths with minimal weight is the smallest one according to the corresponding reverse lexicographic order.

Previously, we (I?) choose the order Change < Del < Insert, with the idea that all those paths were equivalent.

However, for the sake of partial functor application, we already have in place a elision mechanism for the Insert suffix of patches. This was done because we always compare the full list of parameters to list of concrete arguments, and thus

module F: A -> B -> C - > ...
module M = F(A)

results in a Match → Insert → Insert patch where the Insert suffix is non-informative.

Unfortunately, this elision mechanism was triggering less often that it should have because of our choice of order for optimal path.

For instance, if we apply the previous functor to an argument that doesn't match any parameter

module F: A -> B -> C -> ...
module M = F(D)

the paths with optimal weights are:

  • Change → Insert → Insert
  • Insert → Change → Insert
  • Insert → Insert → Change

And with the Change< Insert < Del order, we selected the last one completely disabling the Insert suffix elision mechanism. The first change in this PR avoids this situation by enforcing that Insert<Change. Not that the we also changed the relation of Del; but that is not necessary and we could choose either

  • Insert < Change < Del
  • Insert < Del < Change
  • Del < Insert < Change (which the one used in this PR).

@gasche
Copy link
Member

gasche commented Feb 29, 2024

I understand the explanation, thanks. Two questions:

  1. Is there a symmetric situation with Change and Del, when a functor is over-applied, that would justify not using Change < Del?
  2. Could you do this differently (but I'm not suggesting that you must, I'm happy with the minimal change here if properly explained) by generating a different Insert_at_end (and symmetrically Delete_at_end if relevant) action that is only emitted when diffing against the empty sequence, and would be preferred over Insert (and Delete)?

@Octachron
Copy link
Member Author

Octachron commented Mar 1, 2024

  1. For over-applied functors, it doesn't feel accurate to consider that arguments on the right are more likely to be erroneous. For deletions, it is the asymmetry of partial applications that breaks the position-invariance of error likelihood for Deletion, I don't see a similar phenomenon for insertion.
  2. Abstract module types makes this less easy than it appear because they make the shape of the diffing matrix dynamic, and thus we cannot identify the last rows (or columns) before we reach the end corner.
    Typically, with
module F(X:sig module type T module M:T end) = X.M
module type A = sig type a end
module B = struct type b end
module Three = struct
  module M(_:A)(_:A)(_:A) = struct end
  module type T = module type of M
end
module E = F(Three)(B)

we start with a 2x1 matrix and end with a 2x4 matrix. Selecting the path representative using the reverse lexicographic order bypasses the issue.

Error: This application of the functor F is ill-typed.
       These arguments:
         Three B
       do not match these parameters:
         functor (X : ...) A -> ...
       1. Module Three matches the expected module type
       2. Modules do not match:
            B : sig type b = B.b end
          is not included in
            A
          The type a is required but not provided

@gasche
Copy link
Member

gasche commented Mar 1, 2024

Status: I understand the explanation for the first change. You haven't tried explaining the second change, and/or reflecting the explanations in the codebase.

@Octachron
Copy link
Member Author

I have added the explanation for the path preference in the diffing matrix in the code.

I have also tweaked a bit further the weights for the definition diffing, with an explanation of the constraints involved:

  • Zeroth, without loss of generality we can choose w Delete = w Insert.
  • First, we want w Change < w Insert + w Delete, otherwise Change patch element are not useful.
  • Second, if we want to prefer consecutive swaps to moves, this implies that
    w Swap < w Move w (Change^2) < w (Insert Delete)w Change < w Delete
  • Third, we prefer bracketed Delete ... Insert pairs that allow more matches to long sequence of Changes.
    In other words, for some D ∈ ℕ* and a large enough R ∈ ℕ*, we should have
    w (Delete^D Keep^R Insert^D) < w (Change^(D+R))w Change > (2 D)/(D+R) * (w Delete).
    Note that the case D=1,R=1 is incompatible with the inequation w Change<w Delete above.
    If we choose R = D + 1 for D<5, we can specialize the inequation to w Change > (10 / 11) * w Delete.
  • Fourth, the ordering for Type_only_change, Name_only_change and Name_and_type_change determine which change is considered as more likely mistake, w Type_only_change < w Name_only_change < w Name_and_type_change seems like a good ordering.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! These are more explanations than anyone ever knew we could have, so this is a definite improvement and I am happy to Approve.

@gasche
Copy link
Member

gasche commented Mar 12, 2024

check-typo would like a Changes entry

@Octachron Octachron force-pushed the functor_diffing_tweak branch from 751f934 to 117070e Compare March 12, 2024 16:05
@gasche gasche merged commit c571e17 into ocaml:trunk Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve type error explanation for partially applied functor

3 participants