IE-0011: New data structures #11
I've been having a think more about how Maps should store their keys and realised there are several categories of kinds, with behaviours that affect more than just Map keys.
So there are two types of what I'm calling "container" values: those that behave like Lists, and those that behave like Maps. Lots of people have surely been tripped up by the fact that if you pass a list to a phrase which modifies the list, the calling phrase won't see the modifications. Would it be better to make Lists act like Maps and modify them internally, with explicit cloning when needed?
For the block values, I think it might be best to store them in maps as hashed keys. I'm not sure if the existing I7 hashing is good for this purpose or not. Either way, the maps would then need to have three lists: hashed keys, actual keys, values. We need to store the actual keys for two reasons: so that they won't be reference counted and deleted, and to allow for the possibility of hash collisions, because 32 bits is not a lot at all. We can use the Glulx search opcodes to do a first fast search for a matching hash, and then do a slower direct comparison of the contents. If the contents don't match, then use the Glulx search opcodes to look for a second matching hash. By storing hashes we don't need to compare the contents for every key, only some of them. Perhaps we might even want to change the List kind to have an optional hidden second list for hashes, so that all list searches could be sped up?
If these categories were formalised then they could be included in the kind definitions, and we could maybe reduce how much special case code there is, as the various template functions could decide what to do based on the kind category.
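As a rough illustration of the three-list idea above, here is a minimal Python sketch (Python rather than Inform or Inter, purely for readability). The class name `HashedMap`, its methods, and the `hash_fn` parameter are all hypothetical; `list.index` stands in for a Glulx search opcode, and hashes are truncated to 32 bits as an assumption about the stored hash width.

```python
from typing import Any, Callable, List


class HashedMap:
    """Toy map kept as three parallel lists: key hashes, actual keys, values."""

    def __init__(self, hash_fn: Callable[[Any], int] = hash) -> None:
        self.hash_fn = hash_fn
        self.hashes: List[int] = []
        self.keys: List[Any] = []
        self.values: List[Any] = []

    def _hash32(self, key: Any) -> int:
        # Assumption: stored hashes are 32 bits wide.
        return self.hash_fn(key) & 0xFFFFFFFF

    def _find(self, key: Any) -> int:
        """Return the index of key, or -1 if it is absent."""
        wanted = self._hash32(key)
        start = 0
        while True:
            try:
                # Fast pass: scan only the list of hashes.
                i = self.hashes.index(wanted, start)
            except ValueError:
                return -1
            # Slow pass: compare contents only where the hash matched.
            if self.keys[i] == key:
                return i
            # Hash collision: resume the fast search after this slot.
            start = i + 1

    def put(self, key: Any, value: Any) -> None:
        i = self._find(key)
        if i >= 0:
            self.values[i] = value
            return
        self.hashes.append(self._hash32(key))
        self.keys.append(key)
        self.values.append(value)

    def get(self, key: Any) -> Any:
        """Return the stored value, or None if the key is absent."""
        i = self._find(key)
        return None if i < 0 else self.values[i]
```

Passing a deliberately bad hash function such as `lambda k: 7` forces every key to collide, which exercises the fallback path: the content comparison still finds the right slot, just more slowly.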
I think more games will have a "map" object than will want to use a hash-map. Same with objects named "couple of Xs". And it's worth looking through the existing choice-based extensions to see if any of them define an "Option" kind. As for "any", this is used as a descriptor word in all game code! 9.x gets away with "list of T" as a type name, and a quick test defining "any" as a kind doesn't seem to cause ambiguities either. (In 9.x, I mean.) So maybe this isn't a problem. But we definitely want to test this with test games that have treasure maps, etc.
Now that I think about it, "an any" is on the jargon-y side; I don't like that in Inform. "Dynamic value" maybe? Similarly, "optional T" instead of just "option".
It sounds like Null is only useful in an Any. (Which is how JSON would use it.) The only other use I can think of is a "get" operation on Map which returns a value or Null. But that's better expressed as returning an Option. Is "Null" the singleton value or the kind? My Python experience says that you only ever refer to the value. That means "Null" will clash with a game object named "null" (unlike "map" or "list"). More stringent testing needed and maybe a different name. It's possible that this isn't generally useful enough to be in BasicInformKit, and should instead be defined in the JSON extension.
Yeah we'd definitely want to be careful about what to call all of these. These are the names used in the existing extension, so they compile without issue, but I didn't try using the names inside a real game with lots of other things using potentially conflicting names.
I think that won't conflict as couples must be "couple of X and X" not "couple of Xs"? Now there could be an object using the first form, but it would probably be a little less common than the second.
It is kind of jargony, but it's just so clean sounding in phrases like "map of any to text". "Dynamic value" doesn't sound right to me because Inform has basically no immutability so all values are dynamic. (Also, from the perspective of anys as block values, they aren't dynamic - if you change the value of an any variable you replace it with a new any.) I tried calling it a "box" originally, but that's equally jargony, and even in other languages doesn't always mean that the type can be anything (it doesn't in Rust for example.) I landed with "any" as that's what's used in C++, C#, Swift, TypeScript, etc. I'm not sure if it would be valid, but maybe something like "any-kind" or "any kind" could work? Or "anykind" would definitely be valid, and kind of sounds somewhat natural in "map of anykind to text".
That sort of change would be fine with me. I think I used the nominal form for consistency so that all the added kinds would have nominal names rather than being mostly nominals with one adjectival.
I've found three uses so far of a Null:
The first could perhaps be dealt with by changing the default value to Number/0. If it was decided to not include Nulls in Basic Inform, then yes they could be defined in a JSON extension instead. But it's the final one that I think shows their most use and why they should be included in Basic Inform. A null is useful for null results and null promises, the first being for a phrase which has no return value but could raise an error, the second for a long running process which again has no return value but which people may want to await. In both cases you could just return a dummy number instead, so it's really for conceptual clarity that a null would be better. Null isn't a singleton, it's a kind, but all nulls equal each other.
I think all of these kinds would work in Z-Code. It would even be possible to port the Glulx search opcodes to Z-Code, though obviously they'd run much slower.
A couple comments from Graham:
There's a few ways. Firstly maps are block values, so you can embed them in other kinds, use them as object properties, clone them, nest them, etc, whereas there's only one of each relation. The other big difference is that relations were limited to an arity of 2 (pre-v10). See this recent forum discussion. And relations really wouldn't be suitable for reading in a JSON file, which may have an unknown number of entries and may nest maps and lists to any depth.
This was my attempt at trying to make if let work in natural language. I think some of the if lets sound nicer than others; the Anys, Maps and Results are all really natural sounding IMHO:
The options are more clumsy because I've stuck with the some/none terminology that other languages use. Maybe it would be better to say set/unset?
But a downside of this is that "unset" sounds more like an uninitialised option rather than an option that is initialised to a null/none value. Maybe we can think of another pair of terms that's even better. We need if lets in order to prevent unsafe code. Languages like Rust and TypeScript would raise a compiler error if you tried the following, but Inform 7 (currently) wouldn't be able to:
By combining both of them in one phrase we ensure that the author can't accidentally access data in an invalid manner. And for maps it's also more performant to search and extract the value in one pass rather than test whether the map has a key and then extract the value for that key. The huge advantage of Results in particular is that it forces users of a function to account for the possibility of a failure, they can't just assume that the function was a success. Inform phrases that currently can cause run time errors might be better if they were modified to return results, for example:
And hey! Those are perfect examples of where null results would be useful. Null results are very common in Rust, though it's called the unit type instead of null. It's similar to a
I'd be fine with these, but I wouldn't know how to implement them currently. Would that be a use for block value casting? I don't really know how that works, and the only current use of it that I saw was snippets to text. I like explicit casting phrases (
Surely the Inform idiom would be
Oh, yeah maybe. But do phrases with parentheses like that even work?
I can't make that work nicely for Map lookups, though.
(Reads okay but naming MyMap at the end is poor.)
I expect this would require the compiler to expose its "...(called foo)" magic in some way.
One pro for "K option" over "optional K" is that it makes nesting clearer. Is "optional number result" a result containing an optional number or an optional containing a number result? Whereas "number result option" is unambiguous. Nesting results and options is something people will want/need to use. For example, I used it for Promises:
This returns an option containing a K result. If the promise is not yet resolved then it returns a none option. If the promise has been resolved then it returns a some option containing the promise's result.
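To make the nesting concrete, here is a small Python sketch of "an option containing a K result" for a promise. None of the names (`Ok`, `Err`, `Promise`, `poll`, `describe`) come from the extension; they are hypothetical, and Python's `None` stands in for the none option.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar, Union

K = TypeVar("K")


@dataclass
class Ok(Generic[K]):
    """The success arm of a result: wraps a value of kind K."""
    value: K


@dataclass
class Err:
    """The failure arm of a result: wraps an error message text."""
    message: str


class Promise(Generic[K]):
    """Hypothetical promise that is either unresolved or holds a result."""

    def __init__(self) -> None:
        self._result: Optional[Union[Ok[K], Err]] = None

    def resolve(self, value: K) -> None:
        self._result = Ok(value)

    def fail(self, message: str) -> None:
        self._result = Err(message)

    def poll(self) -> Optional[Union[Ok[K], Err]]:
        """None while unresolved; otherwise the Ok or Err result."""
        return self._result


def describe(promise: Promise) -> str:
    outcome = promise.poll()
    if outcome is None:            # outer layer: the none option
        return "pending"
    if isinstance(outcome, Ok):    # inner layer: a successful result
        return f"ok: {outcome.value}"
    return f"error: {outcome.message}"
```

The caller has to peel the two layers in order: first "is there a result yet?", then "did it succeed?". That ordering is what the "K result option" reading spells out.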
I've changed my mind a little - probably for the average Inform author, having these phrases just print an error is okay, and having to deal with results might be overly complicated. So instead of changing those phrases, there could be alternate forms that return results. Then extension authors can call those new phrases and be able to intercept errors their own way, whether that's printing the error or handling it some other way. Results would be most useful for extension authors who are able to silently handle errors.
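A Python sketch of that split might look like the following. The function names `entry_checked` and `entry`, the `Result` class, the 1-based indexing, and the fallback value of 0 are all assumptions made for illustration, not existing Inform phrases or behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    """Either a wrapped value or an error message text."""
    value: Optional[int] = None
    error: Optional[str] = None

    @property
    def is_ok(self) -> bool:
        return self.error is None


def entry_checked(items: List[int], n: int) -> Result:
    """Result-returning form: never prints, the caller decides what to do."""
    if 1 <= n <= len(items):
        return Result(value=items[n - 1])
    return Result(error=f"entry {n} is out of range 1 to {len(items)}")


def entry(items: List[int], n: int) -> int:
    """Convenience form: prints the error and falls back to a default of 0."""
    result = entry_checked(items, n)
    if result.is_ok:
        return result.value
    print(f"*** Run-time problem: {result.error}")
    return 0
```

An everyday author would call `entry` and see the printed problem, while an extension author could call `entry_checked` and handle the error silently.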
Possibly of interest is that Inform (both 9.3 and 10) already defines a
I started trying to port the extension for Inform 10, but it appears that the constructor kinds are now hardcoded in the compiler, so it's not possible to add new ones in extensions/kits.
This is a point of the Inter design that I never felt very happy with, and I think it ought to be possible to change it; I hadn't realised the implications. There's no reason constructor kinds need to be hardwired.
I hope it's not unwelcome for me to chime in over a year later…? I just wanted to comment on the chosen syntax for each of the proposed types. (Though I've ignored promises and mostly ignored closures on the basis of them being both more advanced and more experimental.)
Any
If I'm not mistaken, the word "any" already has a meaning of sorts to Inform, so I think it might be bad to use it as the name of a kind. Maybe this is actually unambiguous with the other meaning though. As to what the other meaning is:
    To something or other:
        if any man (called the slouch) is on the couch:
            say "Look at that lazy [slouch]!";
I think that makes it basically one of the "noise words" alongside things like "a", "an", "some", etc. As far as I can tell it's only allowed in conditions though, not in assertions. The only other possibility I can think of for a name though is "any value".
I'm also not happy with that "value as an any" syntax, but I don't have a better suggestion.
For these two:
    To say kind/type of (A - any):
    To decide if kind/type of (A - any) is (name of kind of value K):
I would say articles should be allowed here - "the" before "kind/type" (though maybe that already works since it's at the beginning of the phrase?) and "a/an/--" before the name of the kind.
Similarly, in these ones:
    To decide what K result is (A - any) as a/an (name of kind of value K):
    To decide what K is (A - any) as a/an (name of kind of value K) or (backup - K):
Is there some reason for the article to be required instead of optional? It might be worth supporting "some" as an alternate to a/an in all these cases, too.
Couples
I don't actually have a comment on couples, but seeing the
Maps
Is the
The
For checking keys, I think allowing "contains" as an alternative to "has" would be helpful for programmers coming from other languages, where that function is commonly called "contains".
I think it would be nicer if
The
It's slightly incongruous that getting and setting keys has a shorthand syntax (map => key [= value | or value]) but deleting and checking them does not. For checking, maybe something like "if map => key exists", and for deleting, maybe something like "map => key = none" or "del/delete map => key"?
Closures
I don't have much to say about these, but the term "closure" strikes me as a little strange here. Based on the example in the documentation, they seem more like what I'd call a "coroutine". What I'd call a "closure" does have some similar semantics, admittedly, in that it captures local variables from a stack somewhere, but from the way I understand things, it would usually be local variables from a function that has already terminated.
I was going to add an example in Inform-like pseudo-code, but I'm not even sure how to express what I'm trying to say in an Inform-like way. However, it's basically what you'd get from the
Maybe that turns out to just be a special case of what you've implemented here as closures - I'm not sure. Since I'm not even sure how I'd express it in an Inform-like way, I can't even tell if your closures offer a way to express it.
Optionals
I second not being a fan of the "K option" syntax. I'm not sure if I can think of any good alternatives, but here are a few random ideas I thought of.
I also really dislike the
Also, though it works a bit against the list of forms I mentioned above, I notice that there's perhaps a missed opportunity here in that these phrases could instead be defined as adjectives. And maybe even writable adjectives, something like the below pseudo-code, though kind variables don't actually work in adjectives as far as I'm aware (and I don't know if
    Definition: a value of kind K option is empty rather than full if I6 routine "OPTION_TY_Empty" makes it so (it does not contain a value).
Which has the neat feature that you can now write
If this was already considered and rejected, perhaps on the basis of "what does
That said, defining them as read-only adjectives still remains an option even if one cares about the ambiguity of "what does
Results
Some of the comments about optionals also apply here - they're literally the same as an optional except that the "none" state is replaced with an error message that can be arbitrary text. Especially relevant is the comment about defining adjectives rather than phrases for the basic states of the type.
I don't see why a/an can't be optional when casting to a result. Also I think it should be allowed before
Thanks for your comments @CelticMinstrel. I won't respond in detail because at this stage of the proposal the full details don't need to be worked out. However in regards to naming things:
When I tried that, all variables would get the same map. Ideally it won't be needed in the final version, but it will likely require compiler changes. Ideally we'll also fully work out and codify the details of interior vs exterior mutability, as in #11 (comment). I'd like that to be part of the type system rather than just implementation details.
Summary
Add some new data structure kinds into BasicInformKit.
Motivation
Inform has some data structures already, notably the List and the Table, but there are some others that would be very useful to authors for various purposes. These Data Structures were first prototyped in the extension Data Structures by Dannii Willis.
It is proposed that these kinds be brought into the core of Inform for these reasons:
constant-compilation-method:special be implemented. This meant that most of these new kinds needed to be constructed with long blocks, when they could possibly be more efficient as short-block-only kinds.
ANY_TY_Print_Kind_Name can be constructed by the compiler rather than manually written.
if statements with block-scoped variables:
Components affected
Impact on existing projects
This could cause issues with any projects that already use the new kind names for existing values; for example, many games might have a "map" thing or even a map kind. So it should probably be part of a semver-major update.
Proposal
The following new kinds would be added to Inform in BasicInformKit:
Anys
An any stores a value and its kind; the kind cannot be determined at compile time, but can be read at run time. These are useful for when you want to store multiple kinds of values in one list or map, or for when you don't know what kind some data might be.
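As a loose analogy in Python (not how the extension implements it), an any can be pictured as a value stored alongside a kind tag that is inspected at run time. The names `AnyValue`, `wrap`, and `cast_or` are hypothetical; `cast_or` is only meant to echo the "as a K or backup" style of phrase discussed in the conversation.

```python
from dataclasses import dataclass
from typing import Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class AnyValue:
    """A value stored together with its kind, which is read at run time."""
    kind: type
    value: object


def wrap(value: object) -> AnyValue:
    """Box a value, recording its exact kind."""
    return AnyValue(type(value), value)


def cast_or(boxed: AnyValue, kind: Type[T], backup: T) -> T:
    """Return the inner value if it has exactly this kind, else the backup."""
    if boxed.kind is kind:
        return boxed.value  # type: ignore[return-value]
    return backup


# One list holding values of several kinds.
mixed = [wrap(5), wrap("lamp"), wrap(True)]
kind_names = [boxed.kind.__name__ for boxed in mixed]
```

Because the cast compares kinds exactly, asking for the wrong kind quietly yields the backup instead of an unchecked value, which is the safety property the checked phrases aim for.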
Couples/Tuples
A couple is a 2-tuple, grouping two values of any kind. Couples are useful for when you need to return two values of different kinds from a phrase.
I couldn't work out how to implement a generic N-tuple type, but maybe it could be done if incorporated into the core. Also, there does exist a hidden TUPLE_ENTRY_TY which I don't know anything about.
Maps
Maps store key-value pairs. Each map has a set kind for its keys and another set kind for its values, but if you need to store heterogeneous keys or values you can make a map using anys.
Nulls
Nulls do not represent any value, but the positive affirmation of a lack of a value. They could be used, for example, when deserialising JSON.
Nulls could possibly be unified with the internal NIL_TY kind.
Options
An option holds an optional value: either nothing, or a value of a specific kind.
Results
A result contains either a wrapped value or an error message text.
Questions
For each of these there is the question of what the best representation is: a short and long block? Short block only? I also never looked at hashing and how that would work for these new kinds. (I'm not sure when hashing is actually used by Inform.)
How should Maps be implemented for efficient searching? Particularly string searching.
Would it be feasible to have general purpose sum types added to Inform, in which case Options and Results would not need any special handling, but could just be kinds supported by default? And Results could return errors other than texts if that was the case.
Should the unchecked phrases actually be included, or should only the safe phrases be supported? (If someone needed unsafe phrases they could perhaps be provided by a separate extension.)