Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand aliases if necessary during immediacy computation (second attempt) #10041

Closed
wants to merge 4 commits into from

Conversation

stedolan
Copy link
Contributor

This is an alternative approach to #10017, fixing the same bugs. I think this approach is ultimately simpler, but the patch is larger.

The major change is to entirely remove the fixpoint computation in typedecl_immediacy.ml which computed the type_immediacy field of each named type. Instead, the immediacy of abstract types is represented as a parameter to the Type_abstract kind. Immediacy is checked when the type is defined rather than being inferred.

Removing the type_immediacy field fixes the bugs in #10017: the bugs arise because the contents of this field becomes stale after substitutions.

To make review easier, I've split the changes into two commits: the first contains purely mechanical changes, and the interesting bits are in the second.

cc @gasche @lpw25

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a first pass of review on this new PR.

In general I am not really in favor of giving up on modular signatures and doing everything by expansion after the fact, but arguably a correct expansion is better than an incorrect signature.

Have you checked for potential impact on compile times?

type_unboxed: unboxed_status;
type_uid: Uid.t;
}

and type_kind =
Type_abstract
Type_abstract of {immediate: Type_immediacy.t}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer having a proper record than an inline record:

  • this is destined to grow if we apply your approach to more things currently stored in signatures
  • I could see functions taking this information as argument, so it deserves a name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally unenthusiastic about adding things because we might maybe want them someday. It's a very easy change to make after the fact, let's wait until someone actually needs it and let them do it then, with a specific use in mind.

typing/types.ml Outdated

let decl_is_abstract = function
| { type_kind = Type_abstract _; _ } -> true
| _ -> false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call me superstitious, but I would feel more comfortable with a non-fragile pattern here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

Maybe we should emit a warning. *)
(* We use expand_head_opt version of expand_head to get access
to the manifest type of private abbreviations. *)
let rec get_unboxed_type_representation env ty fuel =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you move this (not-so-nice) code from its own module to Ctype? I suppose there are dependency reasons, but it's not so clear what they are. I would have a preference for having the logic (for the unboxed stuff, and for the immediacy stuff) in their own modules instead of a file of 5K lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, circular dependency reasons. I'll try to remember what they were.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it might be worth merging this functionality into try_expand_once_opt. The original aim of that function was to "expand" a type to get another type with the same representation. The intention is to use this for representation optimisations -- the "opt" in the name stands for optimisation -- like specializing primitives. That use case would be better served by a function that went through unboxed types too.

You'll need to be a little careful making this change as IIRC there a couple of uses of type_expand_once_opt that really only want to expand through private expansions, they would need to use a function with the same behaviour as the current type_expand_once_opt called e.g. type_expand_once_private.

end
| _ -> Type_immediacy.coerce Unknown ~as_:imm
| exception Not_found -> Type_immediacy.coerce Unknown ~as_:imm
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this function hard to understand. To understand the | _ -> case, one has to remember what we have done in the two cases above. I think that some of the following changes would help:

  • move the exception Not_found case first, it is simple and clear
  • get rid of the when case by
    • first checking for Type_abstract, with a comment that explains that we check the immediacy information
      first to save an expansion in some cases
    • then using Type_immediacy.coerce (kind_immediacy k) ~as_:imm in all other cases.

(Note: ~as_ is not super nice as a label name, is there something wrong with expected?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's an improvement.

(Note: ~as_ is not super nice as a label name, is there something wrong with expected?)

Agreed. I suppose this is a good a time to change the label name as any.


let maybe_pointer_type env typ = not (is_immediate (immediacy env typ))
Type_immediacy.coerce Always ~as_:imm
| _ -> Type_immediacy.coerce Unknown ~as_:imm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: another way to write this function would be to have a get_type_immediacy function that returns Imm of immediacy | Of_type of type_expr, and check_type_immediacy that depends on ~expected and loops on the result of get_type_immediacy. Personally I would find it clearer, and it would also reduce the number of .coerce calls to just one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm not sure why you prefer check_*_immediate to check_*_immediacy; I guess that one reads like "check the immediacy of foo" and the other "check that foo is immediate", but it's not clear that "check that foo is immediate Unknown" is very intuitive, and the name variation may confuse some people.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how the get_type_immediacy approach would work in the abstract case. It both checks against the declared immediacy (for speed, and in case expansion fails due to missing CMIs), and checks against an expanded type expression. I also find the interface less clear than "check that this type is immediate".

Regarding "immediate" vs. "immediacy": to me, the switch from noun to verb lined up with the switch from inference to checking ("compute the immediacy of this type" became "check that this type is immediate"). I'm not terribly attached to this change, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to change the code to something like:

let variant_all_constant row =
  List.for_all
    (function
     | _, (Rpresent (Some _) | Reither (false, _, _, _)) -> false
     | _ -> true)
   row.row_fields

let variant_immediacty row =
  let row = Btype.row_repr row in
  (* if all labels are devoid of arguments, not a pointer *)
  if row.row_closed && variant_all_constant row then Always
  else Unknown
        
let head_immediacy env ty =
  match (repr ty).desc with
  | Tconstr(p, _, _) -> begin
      match Env.find_type p env with
      | exception Not_found -> Unknown
      | decl -> kind_immediacy decl.type_kind
    end
  | Tvariant row -> variant_immediacy row
  | _ -> Unknown
  
let rec check_type_immediate env ty imm =
  let imm_head = head_immediacy env ty in
  match Type_immediacy.coerce imm_head ~as_:imm with
  | Ok () -> Ok ()
  | Error _ as err ->
      match try_expand_head try_expand_once_opt env ty with
      | ty' -> check_type_immediate env ty' imm
      | exception Cannot_expand -> err

with the above suggested change to try_expand_once_opt handling the unboxed stuff.

begin match Ctype.check_decl_immediate env decl immediate with
| Ok () -> ()
| Error v -> raise(Error(sdecl.ptype_loc, Immediacy v))
end;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be useful to move this check below the "check for cyclic abbreviations" check, in case a cyclic abbreviations would break the immediacy-computation logic by running out of fuel during unboxed expansion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted! I'll try to construct a test case that hits this horrible case.

begin match Ctype.check_type_immediate env ty imm with
| Ok () -> ()
| Error v -> raise(Error(loc, Immediacy v))
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is something weird here:

  • I don't know how you thought of adding this logic here. Probably from the testsuite? It does not fit the documentation of the function, so maybe it does not belong here, or maybe the description/documentation should be generalized.
  • It is strange that you would check for immediacy errors in the Type_abstract case, but not in the other cases. (Is it not supported to annotate a constant-variants type with an immediacy requirement?) If this is not required because it is already done by transl_declaration, then could the abstract case also be checked there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely the right place for this check, but thanks for pointing out the doc comment I missed.

A type declaration has the general form type _ t = M = K, including both a manifest M (an optional type expression) and a kind K (one of abstract, a record definition, a variant definition, or .. for open types). The check_coherence function checks that, when M is not None, it agrees with the kind, i.e:

  • if K is a variant type, check that M is a variant with the same parameters and constructors
  • if K is a record type, check that M is a record with the same parameters and fields
  • if K is .., check that M is an open type with the same parameters

Previously, check_coherence had no case for abstract types, because there was no way that an abstract type could be incoherent with any given M. Now, it is possible to write Type_abstract {immediate=Immediate} (i.e. an abstract type which abstracts only over the immediate types), and so there is now something worth checking in check_coherence.

It is strange that you would check for immediacy errors in the Type_abstract case, but not in the other cases. (Is it not supported to annotate a constant-variants type with an immediacy requirement?) If this is not required because it is already done by transl_declaration, then could the abstract case also be checked there?

It is supported, and is checked as you say in transl_declaration. I don't think it would be correct to check the abstract case there, as the final environment isn't built yet.

I think the confusion in the current code is that transl_declaration actually does too much, because it tried to check the manifest as well as the kind. I've changed it to make the division of responsibility clearer:

  • transl_declaration ensures that the kind matches the immediacy declaration
  • check_coherence ensures that the kind and manifest are compatible

| { type_unboxed = { unboxed = true; _ }; _ } ->
begin match get_unboxed_type_representation env ty with
| Some ty' -> check_type_immediate env ty' imm
| None -> Type_immediacy.coerce Unknown ~as_:imm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: None can be returned in the case where the expansion in get_unboxed_type_representation itself returns an Abstract type (there is a comment there that explains how this can happen, but I'm not sure how much it should be trusted). The natural thing to do in this case would be to trust the immediacy, instead of defaulting to Unknown. This is a strange corner case, so it is okay to be unnatural here (and maybe it's actually safer).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that be unsound? I don't want to end up accepting:

  type t = Foo of s [@@unboxed] [@@immediate]
  and s = string

(I haven't tested that this is actually accepted by trusting the immediacy here, but my understanding of that comment is that this is the case it's trying to handle)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that code would still miss this case, since the immediacy of s would be Unknown. The case that differs would probably be more like:

type t = Foo of s [@@unboxed] [@@immediate]
and s = string [@@immediate]

@stedolan
Copy link
Contributor Author

stedolan commented Jan 4, 2021

I made a first pass of review on this new PR.

Thanks!

In general I am not really in favor of giving up on modular signatures and doing everything by expansion after the fact, but arguably a correct expansion is better than an incorrect signature.

I'm not really following, could you expand on this a bit? What modularity is being lost here?

Have you checked for potential impact on compile times?

No, but I can't imagine it's slower. This version doesn't do any immediacy computation unless there are actual [@@immediate] annotations to check.

(* Check the kind first, in case of missing cmis *)
match Type_immediacy.coerce (kind_immediacy type_kind) ~as_:imm with
| Ok () -> Ok ()
| _ -> check_type_immediate env ty imm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After making the other suggested changes, I think you should be able to share more code for this function.

@lpw25
Copy link
Contributor

lpw25 commented Feb 15, 2021

I've had an initial look at this. I think the basic idea makes sense: only abstract types have non-obvious immediacy, so let's store the data on the Type_abstract constructor. However, the premise isn't actually true thanks to unboxed types. They are really the same as abstract types from the perspective of immediacy and of layouts more generally. I think it would probably be better to keep the field on the decl type, even though it is redundant on most variant and record types, so that unboxed types will benefit from the same optimisation of only "expanding" if needed.

@gasche
Copy link
Member

gasche commented Feb 15, 2021

Thinking out loud:

  • Another way to take Leo's comment into account would be to add more data in the unboxed case (currently type_unboxed is a boolean, we could make it richer, storing information on its expansion).
  • I don't like the way following unboxed chains uses fuel, that looks very brittle to me, I wish there was a better way.

@stedolan
Copy link
Contributor Author

Thinking out loud:

  • Another way to take Leo's comment into account would be to add more data in the unboxed case (currently type_unboxed is a boolean, we could make it richer, storing information on its expansion).
  • I don't like the way following unboxed chains uses fuel, that looks very brittle to me, I wish there was a better way.

Also thinking out loud: what if we stored the unboxiness of a type constructor not in the type_unboxed field of its declaration, but as a new constructor of type_kind? I had a couple of bugs when writing this patch, where I case-split on decl.type_kind but forgot to specially handle the case where decl.type_unboxed.unboxed = true, incorrectly handling an unboxed type as though it were a normal Type_variant with one constructor. (It wouldn't surprise me if there were more bugs of this form, in my patch or elsewhere)

@gasche
Copy link
Member

gasche commented Feb 15, 2021

Your suggestion indeed sounds like a good idea, assuming there is a simple/easy path to "just handle it like a normal datatype" if this is what we want (I would assume this case to also be rather common).

@stedolan
Copy link
Contributor Author

I had a go at this, first patch now up in #10252. This patch just moves the boolean to the type_kind, I'm working on a second to refactor the expansion logic (and in particular, remove fuel).

@gasche
Copy link
Member

gasche commented May 25, 2021

Also thinking out loud: what if we stored the unboxiness of a type constructor not in the type_unboxed field of its declaration, but as a new constructor of type_kind?

For the record, in our work with @nchataing on per-constructor unboxing (for variant types) we have a new Cstr_unboxed of <some metadata> constructor in the type constructor_description.

This comment is also a heads up: we took inspiration from our general discussion of on-demand computations here, and eventually we will submit a PR of our own (for per-constructor unboxing, which is mostly orthogonal to immediacy). It might make sense to converge on this one and/or #10252 in the meantime, if you have time.

@gasche
Copy link
Member

gasche commented Jun 13, 2022

I wish we could make progress on this general topic in the next few months -- in part because my own work on constructor unboxing relies on this approach. I wonder whether @stedolan could be convinced to bring this PR to the "let's get it reviewed for real" stage. An alternative would be for me to "adopt" it somehow, and find someone else to review it. (Maybe @Armael ?)

@gasche
Copy link
Member

gasche commented Jun 13, 2022

Another person who might be interested in playing either reviewer or resubmitter for this PR is @antalsz.

@gasche
Copy link
Member

gasche commented Jan 13, 2023

This is subsumed by #11841 ( a rebase of this PR by @ccasin ), closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants