Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make array slot default false #189

Closed

Conversation

sneakers-the-rat
Copy link
Contributor

@sneakers-the-rat sneakers-the-rat commented Mar 5, 2024

Problem:

the way to specify an Any shaped array is to do this:

classes:
  MyClass:
    attributes:
      my_array:
        array:

which rocks. The problem is that is equivalent to this from the POV of the python dataclasses:

classes:
  MyClass:
    attributes:
      my_array:

so to make an any shaped array one has to, in practice, do something that is False-y but not None like:

classes:
  MyClass:
    attributes:
      my_array:
        array: {}

Similarly to the semantic difference between None and False in max_number_dimensions, this PR makes the default value for the array slot False, making it distinguishable to None and preserving semantics.

The alternative of keeping the default None and making any shaped arrays be False is desirable from a metamodel consistency POV, but counterintuitive - "what do you mean array: False means the most general array definition possible." Arguably, this is still consistent with metamodel intuition, because the default - as in "when one doesn't specify an array at all" - still means "No array," and specifying array: None is an affirmative signal that we want an array with None restrictions on its shape.

edit: actually now that i'm thinking about this, this also probably wouldn't work, so let's just say this issue is the "we need to figure out how to differentiate between array: None and array not being present on a class definition" PR

cc @cmungall @rly

@cmungall
Copy link
Member

cmungall commented Mar 5, 2024

Yes, it doesn't quite work for the reasons you state. We'd have to make the range of array something like a union of an array object and a boolean.

This isn't the first time we've run into things like this. From a pythonic POV it's frustrating not to be able to distinguish Nones, unset objects, or empty objects. One way to think about it is making the metamodel work for the broadest range of target frameworks - e.g. we want to allow people to store schemas in a relational database.

@sneakers-the-rat
Copy link
Contributor Author

What strategies have worked in the past for this? i'll follow your lead, maybe i should have raised this as an issue rather than a PR...

@cmungall
Copy link
Member

cmungall commented Mar 6, 2024 via email

@sneakers-the-rat
Copy link
Contributor Author

Actually wait this works perfectly: sneakers-the-rat/linkml-runtime@26d2081

what we need isn't a change in the metamodel per se, but greater access to the literal representation of the underlying schema. we already have this place in the metamodel where we have a difference in meaning between None and being unset, this just lets us tell whether that's the case.

I wrote that into the Element class there, but what we'll want to do is add one more class in between Element and YAMLRoot that is also a dataclass, but not generated by the python generator, and all it does is that __new__ method.

So this:

classes:
  TypedUnset:
    attributes:
      array:
        range: integer
        array:

  NonArrayUnset:
    attributes:
      nonarray:
        range: integer

works like this

sv = SchemaView('schema.yaml')
nonarray = nonarray = sv.get_class('NonArrayUnset').attributes['nonarray']
isarray = sv.get_class('TypedUnset').attributes['array']
>>> nonarray.array is None
True
>>> isarray.array is None
True
>>> sv.is_unset(nonarray, 'array')
True
>>> sv.is_unset(isarray, 'array')
False

ezpz

@sneakers-the-rat
Copy link
Contributor Author

sneakers-the-rat commented Mar 7, 2024

the special casing i think comes from the fact that array isn't a TypeExpression, like we could also have gone with

MyClass:
  attributes:
    my_array:
      range:
        array:
          range: integer

which is a little more awkward but doesn't involve parsing None from unset. it's not too late to do that, and it might be the right call, but i don't have strong feelings there. In that case i guess we'd have to special case that array: {range} can't itself be an array, but that's a more hidden special case

@rly
Copy link
Contributor

rly commented Mar 10, 2024

In my opinion,

classes:
  MyClass:
    attributes:
      my_array:
        array: {}

and

classes:
  MyClass:
    attributes:
      my_array:
        array:
          minimum_number_dimensions: 1

are a bit ugly but not so bad.

I think it would not be so common that someone would want to define an any shaped array with no information about it. The one use case I have seen in NWB is a "scratch" dataset where a user can store whatever array they want, and it is up to the user to keep track of what it represents.

But if we can differentiate between None and unset in code using @sneakers-the-rat 's example above, even better!

@sneakers-the-rat
Copy link
Contributor Author

I think of the metamodel and the resulting schemas as an interface - they are the public contract that the standard makes to the rest of the generators, transformers, etc. So to me the syntax should be designed with authorship in mind, and to make clear expectations of usage and behavior, and then we should do what we can behind the scenes to make that work, so I think the ability to differentiate between "set as None" and "unset" to be able to give a predictable, coherent interface is a challenge worth taking :).

The above fix is actually "truer" to the underlying YAML representation - things being interpreted as None when they are not explicitly set is a feature of the python dataclasses (that i think is correct), but the YAML representation would be able to differentiate between unset and set as None. So this is basically just restoring a missing part of the python dataclass representation in a backwards compatible way.

@sneakers-the-rat
Copy link
Contributor Author

sneakers-the-rat commented Mar 12, 2024

it turns out this change is a great deal more difficult bc the metaclasses are already assumed to be 1:1 with the JSON/YAML for the purposes of dumping/loading by the dumpers, and the jsonasobj2 package is pretty mysterious with how to set instance-private attributes, so seems like we might need to pay some tech debt on this to implement first.

edit: i got a version that works within linkml_runtime but on testing with the rest of linkml it looks like there are hundreds of places where eg. different generators iterate over vars(metamodel_class), ie. treating the metamodel classes as dictionaries rather than objects that can have private/etc. attributes. Sort of at a loss on this because a lot of it has to do with the use of jsonasobj2 which i still dont' entirely understand the purpose of. it seems like the separation of concerns between 'object' and 'serialization' has sort of been lost there which makes it pretty brittle to modify, so i'm just going to chalk this one up as a loss pending further information bc i don't have time to get to the bottom of it

@rly
Copy link
Contributor

rly commented Mar 15, 2024

From @cmungall : some representations, e.g., JSON-LD, cannot distinguish between unset and set to None/null, so it is not ideal to assign different meanings to those.

@linkml/ndarray-wg decided that array: {} or array: any non-None value represents an any-shaped, any-typed array. We can decide later whether we want to distinguish between unset and set to None/null for the purpose of testing for monotonicity.

@rly rly closed this Mar 15, 2024
@cmungall
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants