Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple identical Struct declarations #448

Closed
heathsc opened this issue Mar 30, 2021 · 11 comments
Closed

Multiple identical Struct declarations #448

heathsc opened this issue Mar 30, 2021 · 11 comments
Assignees
Milestone

Comments

@heathsc
Copy link

heathsc commented Mar 30, 2021

What is the correct behaviour for the parser if multiple identical Struct definitions with the same name are declared either from the same document or different documents? We can see this, for example, with several of the BioWDL workflows where some basic Structs are declared multiple times with identical declarations without using aliasing to disambiguate them. In these cases should the parser (a) raise an error, (b) raise an error only if the definitions are not identical (same member names & types in the same order) and issue a warning otherwise or (c) raise an error if the definitions are not identical and ignore the duplicate definition otherwise. I would favour (b) unless it is clear that this is allowed in which case (c).

@mlin
Copy link
Member

mlin commented Mar 30, 2021

First pasting for reference the spec language:

It is valid to import the same Struct into the global namespace multiple times via different paths; however, if two Structs with the same name but different members are imported into the global namespace there is a name collision resulting in an error. This means that care must be taken not to give identical names to two different Structs that might be imported into the same WDL document tree. Alternatively, Structs can be aliased at import time.

There are two slightly different scenarios to think about. First is the "diamond import" pattern (which I think we do see in BioWDL among others), where W.wdl imports the same struct from two different task libraries T1.wdl and T2.wdl, but both T1.wdl and T2.wdl originally imported it from a common struct library S.wdl. I think this pattern is common (used in BioWDL I believe) and doesn't need a warning let alone an error. Longer paths are clearly possible too, but the point is there's only one "original" struct typedef in this case.

The second scenario is where the source code of the struct typedef really appears in multiple source documents (with identical members in all respects). I do think the spec is ambiguous as to whether such struct types are conceptually identical, compatible, or incompatible. Miniwdl does consider them identical, i.e. the struct type is defined as the members, which may have one or more aliases (one being the name in the original declaration), and does not care which file they were originally declared in. I can see a warning possibly being justified in this case, but miniwdl does not generate one currently.

Also, miniwdl does not care about the order of the members.

One more subtlety to track (which I think argues for miniwdl's current approach) is that structs can contain other structs as members, possibly with aliasing, so you have to take care that struct Car { Person owner } in different files may or may not be compatible, depending on what is Person in each file.

@heathsc
Copy link
Author

heathsc commented Mar 31, 2021

Thank you for the reply. I was considering that the order of members could have an effect on struct IO, but it is true that this would not change the way the pipeline works so the order should be ignored.

Your last point raises a issue that I hadn't considered and I will have to consider the best way to deal with this. Does the struct aliasing in an import statement affect only Struct declarations in the imported file or also Struct usages within the file? It is not clear (to me!) in the spec, but I think that both delarations and usages should be aliased.

Last (for now) question on Struct aliasing: I was assuming that the aliasing only applies to the document being imported and not to daughter documents. i.e., if A.wdl imports B.wdl that imports C.wdl, any struct aliasing in the import statement in A.wdl would only apply to B.wdl and not to C.wdl. Is that correct?

On another issue, is there a set of reference documents that can be used to test the conformance of a parser?

Thanks a lot for your help.

@mlin
Copy link
Member

mlin commented Mar 31, 2021

(To be clear, I agree the spec is ambiguous on some of these points -- I can describe what miniwdl does now, but I don't claim that's normative and I don't know how other WDL frontends handle these cases.)

Does the struct aliasing in an import statement affect only Struct declarations in the imported file or also Struct usages within the file? It is not clear (to me!) in the spec, but I think that both delarations and usages should be aliased.

If I import "mapper.wdl" alias Index as BamIndex, then I would not expect Index to be available for me to use in the document (at least not this one).

I was assuming that the aliasing only applies to the document being imported and not to daughter documents. i.e., if A.wdl imports B.wdl that imports C.wdl, any struct aliasing in the import statement in A.wdl would only apply to B.wdl and not to C.wdl. Is that correct?

I was expecting a slightly different question like "any struct aliasing in the import statement in B.wdl would only apply to B.wdl and not to A.wdl`. Was that what you meant? If so, miniwdl does import the aliases defined in B.wdl into A.wdl, but I don't have a strong conviction that it's right to do so.

@heathsc
Copy link
Author

heathsc commented Mar 31, 2021

I don't think these cases should occur very often, but in the interests of reproducibility there should be a clear 'right answer', so it would be nice if future versions of the spec could clarify this.

For my questions I think some examples might make them clearer

Scenario 1: If I have the following declarations in a single document B.wdl that is imported with the statement import B.wdl alias Car as Car1

Struct Car { String Make }
Struct CarRental { Car CarType }

I would then import Car as Car1 into the global namespace, and I should make sure that when we evaluate CarRental we will be able to find Car even though the name has changed. That is what I meant by 'both declarations and usages should be aliased'

Scenario 2: We have 3 documents, A.wdl, B.wdl and C.wdl.

A.wdl
import B.wdl alias Car as Car1

B.wdl
import C.wdl

C.wdl
Struct Car { String Make }

In this case would the importing of Struct Car from C.wdl be affected by the alias clause in A.wdl so that Car gets imported into the global namespace as Car1? I imagine not, but I could imagine scenarios where this might be wanted.

Again, thank you for taking the time to answer my endless questions. I am working on implementing a frontend and I want to nail down as many of these ambiguous cases as possible.

@DavyCats
Copy link
Contributor

Maybe a bit of a side track this, but perhaps implementing an import X from Y type of syntax would aid in avoiding these types of scenarios.


Personally, I feel the way struct imports work is one of the most (if not the most) convoluted parts of WDL, due to them getting added to a "global namespace" instead of being namespaced like everything else. Another bit of relevant spec:

https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#global-scope

Global Scope

A WDL document is the top-level (or "outermost") scope. All elements defined within a document that are not nested inside
other elements are in the global scope and accessible from anywhere in the document. The elements that may be in a global
scope are:

  • Namespaces (via imports)
  • structs (including all structs defined in the document and in any imported documents)
  • tasks
  • A workflow

Regarding scenario 1, I don't think technically aliasing the usage should be necessary. The structs, when imported, get added to the document's global scope/namespace under the aliased name, but the imported document shouldn't have access to that namespace, I don't think. It has it's own namespace within in which the struct still has the original name. The second Car struct, therefore, also does not exist with the global namescape of B.wdl (since it also is not imported by B.wdl and can't be imported, since that would cause a cyclical dependency).

And for scenario 2, I never even thought about imported structs getting propagated when they are imported in an imported file... I would actually say that in scenario 2 A.wdl should be invalid, since there is no Car struct defined in B.wdl, even if it is part of it's global scope following the import. But I'm being quite pedantic here (see the exact wording in the quoted bit of spec above), I suppose.

@heathsc
Copy link
Author

heathsc commented Mar 31, 2021

Thanks for your comments - I think I am overcomplicating the matter unnecessarily. I agree that scenario 1 should work without explicitly requiring aliasing the usage, and for scenario 2 I will only apply the aliasing to the file that is imported (so B.wdl in this case and not C.wdl). I also agree that in the case where the aliased struct is not present in the imported document (as in scenario 2) this should probably get an error or at least a warning as the user's intention of aliasing can not be carried out if the struct is not present.

@jdidion
Copy link
Collaborator

jdidion commented May 26, 2021

@mlin FWIW wdlTools has exactly the same behavior for structs as you describe in miniwdl.

According to the v1.1 spec: "When a Struct is imported with an alias, it is added to the global namespace only under that alias." So, when importing a document, all of its structs are imported, either under their original name or their alias.

@jdidion
Copy link
Collaborator

jdidion commented May 26, 2021

@mlin @heathsc do you think there is any clarification that is necessary in the spec? If so, let's discuss here and one of us can make a PR.

@mlin
Copy link
Member

mlin commented May 26, 2021

I think it remains ambiguous whether two struct types with distinct declarations but identical members are compatible or not, ie

struct A {
  Int x
  String s
}

struct B {
  Int x
  String s
}

workflow w {
  input {
    A a
  }
  B b = a  # allowed???
}

This single-file case isn't very interesting, but consider if we have numerous struct types imported through a hierarchy of files, with different aliases in them, it can get very confusing to keep track of where the 'original' definitions were...giving a certain appeal to deciding just based on whether the struct members are the same.

@patmagee
Copy link
Member

I honestly think that aliasing was one of my poorer decisions. If we could retcon the struct change I would probably allow proper namespacing in the identifiers. This would completely resolve any ambiguity that we are running into with this whole "global" namespace. Also, I totally forget why that was the case... I believe we were trying to avoid dot identifiers in types for some reason?

So this is probably not a helpful comment, but I would be happy to sponsor a change for 2.0 that brings namespacing to structs. IF we could do it in a graceful non breaking way that would be amazing too

@jdidion jdidion added this to the 1.2 milestone Feb 7, 2024
@jdidion jdidion added this to WDL v1.2 Feb 7, 2024
@jdidion jdidion moved this to Todo in WDL v1.2 Feb 7, 2024
@jdidion jdidion moved this from Todo to In Review in WDL v1.2 Mar 28, 2024
@jdidion jdidion moved this from In Review to Todo in WDL v1.2 Mar 28, 2024
@jdidion jdidion self-assigned this Apr 4, 2024
@jdidion
Copy link
Collaborator

jdidion commented Apr 4, 2024

I propose to add an allowed Struct to Struct coercion, where any struct can be coerced to any other as long as the following are true:

  1. They have the same numbers of members
  2. Their members' names are identical
  3. The type of each member in the source struct is coercible to the type of the member with the same name in the target struct

Will open a PR with this change for review.

@jdidion jdidion moved this from Todo to In Review in WDL v1.2 Apr 4, 2024
@jdidion jdidion moved this from In Review to Done in WDL v1.2 May 15, 2024
@jdidion jdidion closed this as completed May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

5 participants