New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand aliases if necessary during immediacy computation (second attempt) #10041
Conversation
Types are now expanded if necessary to check [@@Immediate] annotations, fixing various bugs.
6de963c
to
a93e689
Compare
a93e689
to
b3c414d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a first pass of review on this new PR.
In general I am not really in favor of giving up on modular signatures and doing everything by expansion after the fact, but arguably a correct expansion is better than an incorrect signature.
Have you checked for potential impact on compile times?
type_unboxed: unboxed_status; | ||
type_uid: Uid.t; | ||
} | ||
|
||
and type_kind = | ||
Type_abstract | ||
Type_abstract of {immediate: Type_immediacy.t} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer having a proper record than an inline record:
- this is destined to grow if we apply your approach to more things currently stored in signatures
- I could see functions taking this information as argument, so it deserves a name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally unenthusiastic about adding things because we might maybe want them someday. It's a very easy change to make after the fact, let's wait until someone actually needs it and let them do it then, with a specific use in mind.
typing/types.ml
Outdated
|
||
let decl_is_abstract = function | ||
| { type_kind = Type_abstract _; _ } -> true | ||
| _ -> false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call me superstitious, but I would feel more comfortable with a non-fragile pattern here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
Maybe we should emit a warning. *) | ||
(* We use expand_head_opt version of expand_head to get access | ||
to the manifest type of private abbreviations. *) | ||
let rec get_unboxed_type_representation env ty fuel = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you move this (not-so-nice) code from its own module to Ctype
? I suppose there are dependency reasons, but it's not so clear what they are. I would have a preference for having the logic (for the unboxed stuff, and for the immediacy stuff) in their own modules instead of a file of 5K lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, circular dependency reasons. I'll try to remember what they were.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it might be worth merging this functionality into try_expand_once_opt
. The original aim of that function was to "expand" a type to get another type with the same representation. The intention is to use this for representation optimisations -- the "opt" in the name stands for optimisation -- like specializing primitives. That use case would be better served by a function that went through unboxed types too.
You'll need to be a little careful making this change as IIRC there a couple of uses of type_expand_once_opt
that really only want to expand through private
expansions, they would need to use a function with the same behaviour as the current type_expand_once_opt
called e.g. type_expand_once_private
.
end | ||
| _ -> Type_immediacy.coerce Unknown ~as_:imm | ||
| exception Not_found -> Type_immediacy.coerce Unknown ~as_:imm | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this function hard to understand. To understand the | _ ->
case, one has to remember what we have done in the two cases above. I think that some of the following changes would help:
- move the
exception Not_found
case first, it is simple and clear - get rid of the
when
case by- first checking for
Type_abstract
, with a comment that explains that we check the immediacy information
first to save an expansion in some cases - then using
Type_immediacy.coerce (kind_immediacy k) ~as_:imm
in all other cases.
- first checking for
(Note: ~as_
is not super nice as a label name, is there something wrong with expected
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's an improvement.
(Note: ~as_ is not super nice as a label name, is there something wrong with expected?)
Agreed. I suppose this is a good a time to change the label name as any.
|
||
let maybe_pointer_type env typ = not (is_immediate (immediacy env typ)) | ||
Type_immediacy.coerce Always ~as_:imm | ||
| _ -> Type_immediacy.coerce Unknown ~as_:imm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: another way to write this function would be to have a get_type_immediacy
function that returns Imm of immediacy | Of_type of type_expr
, and check_type_immediacy
that depends on ~expected
and loops on the result of get_type_immediacy
. Personally I would find it clearer, and it would also reduce the number of .coerce
calls to just one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm not sure why you prefer check_*_immediate
to check_*_immediacy
; I guess that one reads like "check the immediacy of foo" and the other "check that foo is immediate", but it's not clear that "check that foo is immediate Unknown" is very intuitive, and the name variation may confuse some people.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how the get_type_immediacy approach would work in the abstract case. It both checks against the declared immediacy (for speed, and in case expansion fails due to missing CMIs), and checks against an expanded type expression. I also find the interface less clear than "check that this type is immediate".
Regarding "immediate" vs. "immediacy": to me, the switch from noun to verb lined up with the switch from inference to checking ("compute the immediacy of this type" became "check that this type is immediate"). I'm not terribly attached to this change, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to change the code to something like:
let variant_all_constant row =
List.for_all
(function
| _, (Rpresent (Some _) | Reither (false, _, _, _)) -> false
| _ -> true)
row.row_fields
let variant_immediacty row =
let row = Btype.row_repr row in
(* if all labels are devoid of arguments, not a pointer *)
if row.row_closed && variant_all_constant row then Always
else Unknown
let head_immediacy env ty =
match (repr ty).desc with
| Tconstr(p, _, _) -> begin
match Env.find_type p env with
| exception Not_found -> Unknown
| decl -> kind_immediacy decl.type_kind
end
| Tvariant row -> variant_immediacy row
| _ -> Unknown
let rec check_type_immediate env ty imm =
let imm_head = head_immediacy env ty in
match Type_immediacy.coerce imm_head ~as_:imm with
| Ok () -> Ok ()
| Error _ as err ->
match try_expand_head try_expand_once_opt env ty with
| ty' -> check_type_immediate env ty' imm
| exception Cannot_expand -> err
with the above suggested change to try_expand_once_opt
handling the unboxed stuff.
typing/typedecl.ml
Outdated
begin match Ctype.check_decl_immediate env decl immediate with | ||
| Ok () -> () | ||
| Error v -> raise(Error(sdecl.ptype_loc, Immediacy v)) | ||
end; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be useful to move this check below the "check for cyclic abbreviations" check, in case a cyclic abbreviations would break the immediacy-computation logic by running out of fuel during unboxed expansion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well spotted! I'll try to construct a test case that hits this horrible case.
begin match Ctype.check_type_immediate env ty imm with | ||
| Ok () -> () | ||
| Error v -> raise(Error(loc, Immediacy v)) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is something weird here:
- I don't know how you thought of adding this logic here. Probably from the testsuite? It does not fit the documentation of the function, so maybe it does not belong here, or maybe the description/documentation should be generalized.
- It is strange that you would check for immediacy errors in the Type_abstract case, but not in the other cases. (Is it not supported to annotate a constant-variants type with an immediacy requirement?) If this is not required because it is already done by
transl_declaration
, then could the abstract case also be checked there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely the right place for this check, but thanks for pointing out the doc comment I missed.
A type declaration has the general form type _ t = M = K
, including both a manifest M
(an optional type expression) and a kind K
(one of abstract, a record definition, a variant definition, or ..
for open types). The check_coherence
function checks that, when M is not None, it agrees with the kind, i.e:
- if K is a variant type, check that M is a variant with the same parameters and constructors
- if K is a record type, check that M is a record with the same parameters and fields
- if K is
..
, check that M is an open type with the same parameters
Previously, check_coherence
had no case for abstract types, because there was no way that an abstract type could be incoherent with any given M
. Now, it is possible to write Type_abstract {immediate=Immediate}
(i.e. an abstract type which abstracts only over the immediate types), and so there is now something worth checking in check_coherence
.
It is strange that you would check for immediacy errors in the Type_abstract case, but not in the other cases. (Is it not supported to annotate a constant-variants type with an immediacy requirement?) If this is not required because it is already done by transl_declaration, then could the abstract case also be checked there?
It is supported, and is checked as you say in transl_declaration. I don't think it would be correct to check the abstract case there, as the final environment isn't built yet.
I think the confusion in the current code is that transl_declaration actually does too much, because it tried to check the manifest as well as the kind. I've changed it to make the division of responsibility clearer:
- transl_declaration ensures that the kind matches the immediacy declaration
- check_coherence ensures that the kind and manifest are compatible
| { type_unboxed = { unboxed = true; _ }; _ } -> | ||
begin match get_unboxed_type_representation env ty with | ||
| Some ty' -> check_type_immediate env ty' imm | ||
| None -> Type_immediacy.coerce Unknown ~as_:imm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: None
can be returned in the case where the expansion in get_unboxed_type_representation
itself returns an Abstract
type (there is a comment there that explains how this can happen, but I'm not sure how much it should be trusted). The natural thing to do in this case would be to trust the immediacy, instead of defaulting to Unknown
. This is a strange corner case, so it is okay to be unnatural here (and maybe it's actually safer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that be unsound? I don't want to end up accepting:
type t = Foo of s [@@unboxed] [@@immediate]
and s = string
(I haven't tested that this is actually accepted by trusting the immediacy here, but my understanding of that comment is that this is the case it's trying to handle)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that code would still miss this case, since the immediacy of s
would be Unknown
. The case that differs would probably be more like:
type t = Foo of s [@@unboxed] [@@immediate]
and s = string [@@immediate]
Thanks!
I'm not really following, could you expand on this a bit? What modularity is being lost here?
No, but I can't imagine it's slower. This version doesn't do any immediacy computation unless there are actual |
(* Check the kind first, in case of missing cmis *) | ||
match Type_immediacy.coerce (kind_immediacy type_kind) ~as_:imm with | ||
| Ok () -> Ok () | ||
| _ -> check_type_immediate env ty imm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After making the other suggested changes, I think you should be able to share more code for this function.
I've had an initial look at this. I think the basic idea makes sense: only abstract types have non-obvious immediacy, so let's store the data on the |
Thinking out loud:
|
Also thinking out loud: what if we stored the unboxiness of a type constructor not in the |
Your suggestion indeed sounds like a good idea, assuming there is a simple/easy path to "just handle it like a normal datatype" if this is what we want (I would assume this case to also be rather common). |
I had a go at this, first patch now up in #10252. This patch just moves the boolean to the type_kind, I'm working on a second to refactor the expansion logic (and in particular, remove fuel). |
For the record, in our work with @nchataing on per-constructor unboxing (for variant types) we have a new This comment is also a heads up: we took inspiration from our general discussion of on-demand computations here, and eventually we will submit a PR of our own (for per-constructor unboxing, which is mostly orthogonal to immediacy). It might make sense to converge on this one and/or #10252 in the meantime, if you have time. |
I wish we could make progress on this general topic in the next few months -- in part because my own work on constructor unboxing relies on this approach. I wonder whether @stedolan could be convinced to bring this PR to the "let's get it reviewed for real" stage. An alternative would be for me to "adopt" it somehow, and find someone else to review it. (Maybe @Armael ?) |
Another person who might be interested in playing either reviewer or resubmitter for this PR is @antalsz. |
This is an alternative approach to #10017, fixing the same bugs. I think this approach is ultimately simpler, but the patch is larger.
The major change is to entirely remove the fixpoint computation in
typedecl_immediacy.ml
which computed thetype_immediacy
field of each named type. Instead, the immediacy of abstract types is represented as a parameter to theType_abstract
kind. Immediacy is checked when the type is defined rather than being inferred.Removing the
type_immediacy
field fixes the bugs in #10017: the bugs arise because the contents of this field becomes stale after substitutions.To make review easier, I've split the changes into two commits: the first contains purely mechanical changes, and the interesting bits are in the second.
cc @gasche @lpw25