Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array methods for LuaTable have unclear behaviour #11

Closed
jonas-schievink opened this issue Jun 18, 2017 · 9 comments
Closed

Array methods for LuaTable have unclear behaviour #11

jonas-schievink opened this issue Jun 18, 2017 · 9 comments

Comments

@jonas-schievink
Copy link
Contributor

I've noticed that LuaTable::for_each_array_value and LuaTable::array_values have somewhat surprising behaviour. First of all, they don't really match anything in the Lua API (the manual doesn't define what an array is, only sequences are defined). Second, they require the table to exclusively hold natural numbers as keys, something unseen in the Lua API or standard library (which just ignore the non-sequence part of a table).

Another unexpected behaviour occurs when iterating over a table with "holes", like { [1] = "bla", [2] = "qux", [123] = "oops" }, then the missing elements are still taken into account and the closure passed to for_each_array_value is called 121 times with a nil value. While this is pretty much the only usable behaviour since the key isn't passed to the closure, this isn't quite what I expected (eg. ipairs would ignore the 123 key and only iterate over 1 and 2).

My initial solution for this would've been to change the methods to act on the sequence part of the table instead (and rename them accordingly), however, the current behaviour is used by the FromLua impls of HashSet, BTreeSet and Vec, where it does make more sense (we return an error instead of ignoring the non-sequence part of the table). Not sure what's the best way forward for this API.

@kyren
Copy link
Contributor

kyren commented Jun 18, 2017

I'm not sure either, but I agree that the current behavior is wrong. Using lua_rawlen is particularly wrong, because that section of the Lua 5.3 Reference manual says that the builtin length operator on tables can return any of the borders of the table, so I guess the current behavior is effectively random. I feel like every time I read about lua tables my mental model of how it works changes, for some reason I just can't keep the semantics of Lua tables in my head accurately.

I think the least surprising most correct behavior is just to provide a method that returns all the pairs in the table (lua pairs function), and one that returns the contiguous elements starting from 1 (lua ipairs function).

Let me summarize this, and tell me if I've gotten anything wrong:

Suppose you want to return a data type like Vec<Option> to rust. This is effectively impossible, and we should just not try to support this directly at all. There is no way to ever reliably have nil values in a table, because then the table is not a sequence and so the value of the length operator is completely undefined. I know that you can make the array part of the table a specific length and make sure that the final key is non-nil, and then looping over 1 to #t will give you nil values, but this is not sane semantics and is extremely extremely brittle and surprising and not something we should do.

The parts of the API that expect plain sequences to create containers like Vec / Set could do one of two things, either use 'pairs' and ignore the keys, or use 'ipairs', and either one is sort of surprising. I'm kind of leaning towards just using 'ipairs' and being done with it, but it does mean skipping values in non-sequence tables. I don't feel very strongly over one or the other, what seems most sensible to you?

I kind of knew this was wrong when I wrote it, so I completely expected this. In Starbound, this fundamental Lua limitation gave us no end of problems. We ended up having a lot of special types specifically for JSON data that had complex metatables that kept track of nil values and distinguished between array and map, and that sort of helped, but I just don't think there's any getting around the fact that certain rust / c++ types are just problematic for Lua. I agree that we should strive to behave in the least surprising way possible.

@jonas-schievink
Copy link
Contributor Author

There is no way to ever reliably have nil values in a table, because then the table is not a sequence and so the value of the length operator is completely undefined.

I think it's a bit more subtle than this. Storing nil in a table removes the key-value pair, but it doesn't always cause the table to not be a sequence. IIUC, doing t[#t] = nil on a non-empty sequence is equivalent to Rust's Vec::pop (ie. it removes the last element of the sequence). It stays a sequence when doing that. It does not stay a sequence, however, if you set any other integer key to nil (since all integer keys are < #t, this would cause a "hole" in the sequence). Modifying any non-integer key (or any integer key < 0) does not change the is-sequence property :)

I'm kind of leaning towards just using 'ipairs' and being done with it, but it does mean skipping values in non-sequence tables. I don't feel very strongly over one or the other, what seems most sensible to you?

If we want to stay in line with Rust's existing From and Into traits (which we already aren't since ToLua and FromLua can fail, so take this with a grain of salt), ToLua and FromLua must be lossless operations. It seems to me that this is a useful property to have, so the existing implementations should return an Err if the operation wouldn't be lossless.

Thinking more about the impls for sets, there's this Lua pattern for representing sets: Use a table t where the keys are the elements of the set and their values are always true. Adding an element e becomes as simple as t[e] = true, removing it is just t[e] = nil. If someone says "set in Lua" I tend to think about this kind of table, but the FromLua impls expect a sequence containing the set members, effectively deduplicating duplicate values in the sequence (this is lossy, too!).

I'm not sure what's the best way to handle this, but maybe these impls are too confusing and should just be replaced with explicit methods on LuaTable?

The same applies to the impl for Vec<T>. I would've possibly expected there to be an impl to get a Vec<(K, V)>, but there's only one to get a Vec<T>, so maybe that functionality should be moved to a method on LuaTable instead?

We ended up having a lot of special types specifically for JSON data

Ah yeah I remember working with JSON in Lua. The lib I've used defined a specific value used to represent JSON NULL values (I think lightuserdata is useful for this kind of unique sentinel value). But that's offtopic anyways.

@kyren
Copy link
Contributor

kyren commented Jun 19, 2017

I think it's a bit more subtle than this. Storing nil in a table removes the key-value pair, but it doesn't always cause the table to not be a sequence. IIUC, doing t[#t] = nil on a non-empty sequence is equivalent to Rust's Vec::pop (ie. it removes the last element of the sequence). It stays a sequence when doing that. It does not stay a sequence, however, if you set any other integer key to nil (since all integer keys are < #t, this would cause a "hole" in the sequence). Modifying any non-integer key (or any integer key < 0) does not change the is-sequence property :)

Okay, that mostly matches my understanding, thank you. I was aware that setting the entry for the key of #t to nil did not change the sequence property, I was referring more to the general problem of having a sequence with nils inside it being nonsensical, because like you said it would have a hole and no longer be a sequence, I should have clarified that I meant an integral entry < len. But yes, lua tables never have "nil values", because setting a value to nil should be semantically equivalent to removing it, like you said. I was NOT aware that having a non-integer key or key < 0 does not mean that the table is no longer a sequence, interesting.

If we want to stay in line with Rust's existing From and Into traits (which we already aren't since ToLua and FromLua can fail, so take this with a grain of salt), ToLua and FromLua must be lossless operations. It seems to me that this is a useful property to have, so the existing implementations should return an Err if the operation wouldn't be lossless.

I'm not so sure I agree with that in general. It doesn't seem terribly important for the conversions to be lossless, there are just SO many ways for conversions to be a lossy. If you accept a Vec<i64> and lua passes you the table {1, "2", 3}, that is going to be a lossy conversion, but should it be an error?

Thinking more about the impls for sets, there's this Lua pattern for representing sets: Use a table t where the keys are the elements of the set and their values are always true. Adding an element e becomes as simple as t[e] = true, removing it is just t[e] = nil. If someone says "set in Lua" I tend to think about this kind of table, but the FromLua impls expect a sequence containing the set members, effectively deduplicating duplicate values in the sequence (this is lossy, too!).

It's possible that the FromLua implementation for HashSet and BTreeSet should just not exist, but there should instead be functions on LuaTable for collecting the keys and values of the table. Actually, there's a way to safely do iteration over LuaTable by placing the current iterated value into the registry, and I was going to add the capability to do this, but it's somewhat slower than the way it's currently done. The API is already decidedly not zero cost, so maybe the cleaner API is better overall?

The same applies to the impl for Vec. I would've possibly expected there to be an impl to get a Vec<(K, V)>, but there's only one to get a Vec, so maybe that functionality should be moved to a method on LuaTable instead?

I mean, certainly I could add conversions to / from Vec<(K, V)>, but you're right that it starts to get a bit confusing. I would be willing to remove the conversion to Set, but I'm not sure I should remove the conversion to Vec, it's just so so common when building complex APIs to want to accept a list of things, it's a bit obnoxious to omit it. I think though in that sense accepting a Vec and then making the user collect it into a Set is not so bad.

Thoughts?

Edit: Actually, I'm not sure if I can add impls for Vec<(K, V)>, there are no ToLua / FromLua impls for tuples other than (), so it's not actually overlapping, but I'm not confident enough with the impl rules to know if I can add that without it conflicting with the Vec<T> impl, I guess it would conflict right?

@kyren
Copy link
Contributor

kyren commented Jun 19, 2017

a0e83b3 partially addresses what we talked about, but definitely only partially. I think the result is definitely better than the previous situation, but we should still talk about what ToLua / FromLua conversions should be available and also the naming scheme. I'm going to leave this issue open and not push another cargo version just yet until there are a few more passes on it.

@jonas-schievink
Copy link
Contributor Author

That commit looks like a step in the right direction, nice work! The only thing I'm unsure about is that the pairs method returns pairs, but ipairs doesn't yield the "i-pairs" (only the values), but I don't think it's too bad.

If you accept a Vec and lua passes you the table {1, "2", 3}, that is going to be a lossy conversion, but should it be an error?

True, it's losing type information. However it's not losing any information contained in the value (which would be the case when you had a float in there). With floats, even Lua reports errors when you try to do integer operations on them (try 2.3 << 4). Lua seems to be quite consistent here, which is nice: "2" << 4 works, 2 << 4 works, 2.3 << 4 and "2.3" << 4 don't (and neither does "I'm a string" << 4, of course). Same goes for all other bitwise operations.

So I think your example should not be an error, but cause a coercion from "2" to the integer 2. However, if the string was "2.4" (or anything that doesn't coerce to an integer), it should cause an error.

I'm not sure I should remove the conversion to Vec, it's just so so common when building complex APIs to want to accept a list of things

That's true. a0e83b3 also made the conversion work the way I'd expect it to, so let's keep it.

The set conversions, however, I'm still unsure about. I think it's an operation that's rare enough to make the user use .ipairs().collect() (which is now really short, neat!) if they want the current behaviour, or .pairs().map(|k, _| k).collect() if they want to convert the kind of set I mentioned above. The ToLua replacement would look similar to lua.create_array_table(set) or lua.create_table(set.iter().map(|elem| (elem, true))), I'd imagine, which isn't bad, either.

Now, ToLua and FromLua are, of course, not lossless. Their intended behaviour is a bit difficult to specify exactly, but it boils down to "what Lua would do":

  • Coerce strings to numbers
  • When acting on sequences, ignore the non-sequence part of a table
  • Error on precision loss (such as having to convert 2.3 -> 2)
  • ...

(I want to clearly document how those are meant to be used)

@kyren
Copy link
Contributor

kyren commented Jun 19, 2017

That commit looks like a step in the right direction, nice work! The only thing I'm unsure about is that the pairs method returns pairs, but ipairs doesn't yield the "i-pairs" (only the values), but I don't think it's too bad.

I agree completely, but I think the problem is actually the name. ipairs in lua returns the index and value pairs, but the indexes are always contiguous from 1, and in rust the obvious way to do that is with enumerate. It's a bit silly to have the method return the index, but I can't decide which one is more sensible, keep the current behavior and change the name to be less surprising, or make ipairs just return the index pair like lua.

The set conversions, however, I'm still unsure about. I think it's an operation that's rare enough to make the user use .ipairs().collect() (which is now really short, neat!) if they want the current behaviour, or .pairs().map(|k, _| k).collect() if they want to convert the kind of set I mentioned above. The ToLua replacement would look similar to lua.create_array_table(set) or lua.create_table(set.iter().map(|elem| (elem, true))), I'd imagine, which isn't bad, either.

I agree, I'll just remove the set conversion impls, there should only be impls when there is a clear obvious single way to do the conversion.

Now, ToLua and FromLua are, of course, not lossless. Their intended behaviour is a bit difficult to specify exactly, but it boils down to "what Lua would do":

Coerce strings to numbers
When acting on sequences, ignore the non-sequence part of a table
Error on precision loss (such as having to convert 2.3 -> 2)
...
(I want to clearly document how those are meant to be used)

Your description matches what my intuition of what those traits are supposed to do exactly. I'm not sure that currently the behavior for float to int conversions is correct, but I will definitely check on that today and add tests for it.

Can you think of better names for create_empty_table / create_table / create_sequence_table and pairs / ipairs?

@kyren kyren closed this as completed Jun 19, 2017
@kyren kyren reopened this Jun 19, 2017
@jonas-schievink
Copy link
Contributor Author

Can you think of better names for create_empty_table / create_table / create_sequence_table and pairs / ipairs?

Hmm, maybe sequence_values instead of ipairs? pairs seems fine, no need to rename it.

And perhaps create_empty_table should just be create_table, and create_table becomes create_table_from? That makes the simple case the default.

@kyren
Copy link
Contributor

kyren commented Jun 20, 2017

f92e21d removes the Set impls and does the renames we were talking about. I definitely don't think it's perfect now, but it's definitely better than it was and it's at least now consistent. Do you mind if I close this issue until we think of more specific changes we'd like to make?

@jonas-schievink
Copy link
Contributor Author

Sure, looks great, thanks!

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants