
[feature] Ignore missing attributes #183

Closed
nviennot opened this issue Dec 25, 2012 · 36 comments

@nviennot
Contributor

Consider a query with a filter such as:

r.table('users').filter('active' => true, 'age' => 30)

If the users table has a document without the age attribute, the database yells:

Object {...} is missing attribute "age"

I do not wish to use an additional filter user.contains(:age) everywhere.

Can the database treat a missing attribute as null?

@coffeemug
Contributor

We had many, many discussions on this and the prevailing idea was to offer as much convenience as possible without sacrificing safety. We thought that treating missing attributes as nulls could get the user into very inconvenient situations, so we decided to be explicit instead. The current proposed improvement is in #27, which will add more operators that will make dealing with missing attributes more convenient.

However, I do think the current behavior is very annoying and the improvements in #27 may not sufficiently offset that annoyance. We've essentially built a very flexible document model, but designed the query language in a way that makes taking advantage of this model inconvenient. This part of the product seems rather unpleasant as a result. I'm moving this to post-protobuf-improvements so we can revisit how to make this better.

@nviennot
Contributor Author

Dup of #27

@coffeemug
Contributor

Reopening for the time being because I think we might need a deeper discussion here than just adding convenience commands.

@nviennot
Contributor Author

When using a filter, I would like an implicit and doc.contains(attr) for any attribute that is read. Is that even a good idea?

@coffeemug
Contributor

That's in the eye of the beholder :) We'll consider various options to make this more comfortable as part of this issue.

@pixelspark

Would certainly be nice to have a shortcut for this in ReQL!

@jdoliner
Contributor

Could we please consolidate this into one issue with #27, by adding whatever this has that #27 doesn't and closing this one? Having two copies of basically the same issue just adds mental overhead. I don't quite understand what this one has that #27 doesn't. @coffeemug, could you clarify, and I'll migrate it over?

@coffeemug
Contributor

#27 proposes specific commands to make dealing with this behavior more comfortable. I think we should reconsider the behavior of the evaluator, consider evaluating missing keys to null, and leave enforcement to constraints that we might implement later. That's distinctly different from #27, though it does make a defaults command less important.

@jdoliner
Contributor

So treating missing attributes as null is actually a bad idea. I think the most compelling reason is that comparison functions work on it, so if missing attributes defaulted to null, then:

table.filter(lambda x: x["foo"] < 5)

would return documents in which "foo" is either a number less than 5 or is missing, which is not a behavior I think anyone would expect. Note that we need comparison with null to work; otherwise grouped map/reduce will fail if there's a null group. Basically we need to choose between null being a value for users or a value we use to indicate internal errors, and we're already a couple of steps down the road of it being a user value. And I think that's the right choice, because it's part of JSON.
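
To make the surprise concrete, here is a minimal sketch in plain Python (not ReQL; the helper names are invented), assuming missing attributes default to a null that sorts below every number, as it must for grouped map/reduce to handle null groups:

NULL = None  # stand-in for JSON null

def reql_lt(a, b):
    # hypothetical total order in which null sorts below every number
    if a is NULL:
        return b is not NULL
    if b is NULL:
        return False
    return a < b

docs = [{"foo": 3}, {"foo": 10}, {}]  # the last doc has no "foo" at all

# with missing -> null, the doc that lacks "foo" entirely matches "foo < 5":
print([d for d in docs if reql_lt(d.get("foo", NULL), 5)])  # [{'foo': 3}, {}]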

So here's how I understand the desired behavior:

Thus far I've only seen people want default attributes in filter; I think in most other cases where you're using the value for something, you want an error. For example, dropping a row from a map/reduce because of a missing attribute seems liable to confuse. So, as a first approximation, the desired behavior is: filter applies a function to each row; true and false have the same meaning as before, but if a missing attribute is accessed anywhere during evaluation of this function, the row is excluded.

This sounds a lot like an exception which filter implicitly catches and treats the same as the function returning false. Of course, that means this is a somewhat heavyweight feature.

Another option would be something akin to Haskell's bottom. Basically, bottom is a special type that indicates "this computation errors." If bottom is passed into a function, the function returns bottom, so it gets propagated up, and then filter could "catch" it by treating it as false. However, this can lead to some confusing behavior if you have nested filters.
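
For illustration, a rough Python sketch of bottom propagation (all names invented; this is not RethinkDB code): bottom is a sentinel that every operation passes through, and that filter treats as false:

class Bottom:
    def __init__(self, why):
        self.why = why  # textual description of the error

def get_attr(doc, attr):
    return doc[attr] if attr in doc else Bottom('missing attribute "%s"' % attr)

def lt(a, b):
    # any operation applied to bottom returns bottom
    if isinstance(a, Bottom):
        return a
    if isinstance(b, Bottom):
        return b
    return a < b

def filter_rows(rows, pred):
    # filter "catches" bottom by treating it the same as false
    return [row for row in rows if pred(row) is True]

print(filter_rows([{"foo": 3}, {}], lambda r: lt(get_attr(r, "foo"), 5)))
# [{'foo': 3}] -- the row with no "foo" is silently excluded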

@pixelspark

Why not default to 'undefined' instead? In JS, null !== undefined (I'm not sure how that maps to other languages), and any expression where either side equals undefined will evaluate to false (except perhaps for undefined === undefined?). That means that x("foo").gt(5) will evaluate to false when the attribute 'foo' is missing. For numbers, undefined is similar to (but not equal to!) NaN.

@jdoliner
Contributor

Bottom is a lot like undefined, except that it propagates itself. I think undefined is basically the correct idea, except that it still has a few confusing behaviors, most notably under negation. For example, suppose someone wanted to check whether an attribute is within a range; they might do:

filter(lambda x: (low < x["foo"]) & (x["foo"] < high))

This returns false if "foo" is undefined, which is what we want. However, now suppose they want to filter to values outside of the range. They'll likely just invert the original predicate:

filter(lambda x: !((low < x["foo"]) & (x["foo"] < high)))

And now they're in a bit of trouble, because this returns true if "foo" is undefined, which I think is unexpected. However, with bottom this would still return bottom, which means the row wouldn't be included in the filter.

Undefined also does indeed compare equal to itself, both under == and ===, so

filter(lambda x: x["foo"] == x["bar"])

would return documents for which both "foo" and "bar" are undefined, which strikes me as unexpected.
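
To see the difference concretely, a small Python sketch (UNDEF is an invented stand-in for JS undefined): under undefined-semantics the inverted range check admits rows that lack "foo", whereas under bottom-semantics negation would propagate bottom and the row would stay excluded either way:

UNDEF = object()  # invented stand-in for JS undefined

def lt_u(a, b):
    # undefined-semantics: any comparison involving undefined is false
    return False if (a is UNDEF or b is UNDEF) else a < b

row = {}  # no "foo"
foo = row.get("foo", UNDEF)
in_range = lt_u(1, foo) and lt_u(foo, 10)
print(in_range)      # False: correctly excluded by the range filter
print(not in_range)  # True: unexpectedly included by the inverted filter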

@neumino
Member

neumino commented Dec 27, 2012

A simple solution would be to return false for any comparison as soon as something is undefined.

The drawbacks are:

  • You cannot do filter(x["foo"] == undefined) -- but in this case you should use .contains("foo")
  • The false will propagate, so filter( x["foo"].gt(1).not() ) is going to return results where foo is not defined, which might be unexpected

@pixelspark

@neumino, your first point is not really an issue if the user needs to enable the 'silent errors' explicitly. As for the second point, I agree with @jdoliner; it's best if some error value (e.g. 'bottom') propagates through the whole expression, so that x("foo").gt(1).not() returns 'bottom' if x("foo") evaluates to 'bottom'.

@coffeemug
Contributor

I think special-casing filter isn't ideal. Special-casing something like this has a bad smell; also, there are other functions where we know we want this behavior (e.g. pluck), and if more functions arise, we're going to have to add more special cases.

FYI, SQL essentially defines 'bottom' logic the way @jdoliner proposed: it treats null as "unknown value", and any operation on an unknown value returns another unknown value. Some parts of SQL treat unknown values differently, because in those cases the propagation would destroy the entire result set (which isn't always desired), so SQL got some criticism for having semantics with special cases.

In our case, here are some issues with bottom that would be nice to work out:

  • How do we propagate bottom to the user? If the user says table('foo').map(r.row('bar')).run() and 'bar' is occasionally undefined, what does the user see? (We could drop those rows, cast them to null, or extend JSON.)
  • In case of group map reduce, bottom has somewhat interesting properties. On the one hand, if the group map reduce query is trying to compute multiple values and for one of those values some attributes are missing, the query will complete, and for the parts that are always present it will compute a value, while for the parts that aren't present it will return bottom. However, for most common reductions (e.g. sum/avg/etc.) our current behavior is superior because the exception short-circuits the query. Suppose the query runs into a missing attribute early on. With bottom, it will still have to go through all of the computation before returning a useless result (bottom). Furthermore, in this case the user won't know how to fix it because instead of receiving an offending document along with the exception, they get nothing.

I think one way to get around the second problem is to implement bottom semantics, but then have some commands (such as group map reduce) check for bottom in the reduction at every step and throw an error (or, for these commands, we can let the user specify what to do in case of bottom -- e.g. keep going, throw, ignore the row). (I know this is a special case, but somehow it seems less bad to me -- perhaps because in this case we implement good shared underlying semantics and have the command rely on them in an advantageous way; e.g. if we had macros, users could define commands this way too.) Giving this option in the case of group map reduce is also a good idea because people legitimately want different behavior in different cases with this command, whereas in the case of filter they almost always want one behavior.
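
As a sketch of that suggestion (plain Python, names invented; Bottom is the same invented sentinel as in the earlier sketch): a reduction that checks for bottom at every step and throws with the offending document, rather than completing and returning a useless bottom:

class Bottom:  # same invented sentinel as in the earlier sketch
    def __init__(self, why):
        self.why = why

def reduce_strict(rows, attr, op, init):
    acc = init
    for row in rows:
        v = row.get(attr, Bottom('missing attribute "%s"' % attr))
        if isinstance(v, Bottom):
            # fail fast with the offending document, like today's exceptions
            raise ValueError("%s in %r" % (v.why, row))
        acc = op(acc, v)
    return acc

print(reduce_strict([{"age": 30}, {"age": 40}], "age", lambda a, b: a + b, 0))  # 70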

@jdoliner
Contributor

I think the special casing basically comes from a desire to have different defaults for different operations. If we want to avoid special casing, then it comes at the cost of making people explicitly say when they want to drop error rows. I'm down for this, but as you say, people basically always want one behavior with filter, so I think it's basically either special casing or an undesired default for one of them.

Bottom (Haskell notation is to use the symbol ⊥, so I'm going to save myself some typing) just gets propagated as an error. Part of an instance of ⊥ is a textual description of why the computation will error if run. If functions are both lazy and pure (which all of the functions we're talking about in these examples are), then ⊥ is actually equivalent to an exception, in that it short-circuits operations: once you get ⊥ you stop reading the stream, so the rest of it doesn't get computed. However, we should bear in mind that this short-circuiting doesn't actually save us much computation, due to sharding: we still compute the full thing for shards without errors and then throw that result out when we discover one has an error.

@coffeemug
Contributor

I think the special casing basically comes from a desire to have different defaults for different operations

I think there are two types of special casing. One is when we arbitrarily change internal semantics to get the behavior we want. An example of this is throwing an error everywhere except in pluck and filter. I think we're all in agreement that this type of special casing is really bad.

The second type is when we introduce a primitive (such as bottom) that supports all possible use cases we might want to support, and then have library functions have different behavior built on top of this primitive. This seems perfectly reasonable to me because the behavior doesn't change arbitrarily -- the same behavior, for example, is available to the user (e.g. they could check for bottom in their expressions and do different things based on it, such as throw an error, while they can't catch errors and make decisions on top of them right now).

Part of an instance of ⊥ is a textual description of why the computation will error if run.

Ah, makes sense.

once you get ⊥ you stop reading of the stream so the rest of it doesn't get computed.

Ok, that makes sense too.

So then, how do we represent bottom on the client?

@kareemk

kareemk commented Dec 28, 2012

Per my comments in #197, rethinkdb shouldn't throw an error if an attribute is missing. Given that rethinkdb is schemaless and there are no restrictions on the structure of the data that is inserted, querying should assume that missing attributes are not done in error and skip them. With a schemaless DB, these kinds of integrity constraints are by definition the responsibility of the application; otherwise the DB should support a schema so that integrity is ensured at write time, not read time.

@mlucy
Member

mlucy commented Dec 28, 2012

I think in lieu of representing bottom on the client, we should just throw an error if we would otherwise have to return bottom to the client (with bottom's error message, of course). So if somebody wrote:

tbl.filter{|row| row[:missing_attr]}

they wouldn't get an error because filter would handle bottom. But if somebody wrote:

tbl.map{|row| row[:missing_attr]}

then they would get an error, because the map would return bottom and we can't send bottom to the user. But if they wrote:

tbl.filter{|row| row[:list].map{|row2| row2[:missing_attr]}}

then they wouldn't get an error, because the map would return bottom and then the filter would handle it.
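
In other words (Python sketch, names invented): the client boundary is where bottom turns into a thrown error:

class Bottom:  # invented sentinel, as in the sketches above
    def __init__(self, why):
        self.why = why

def run(result):
    # bottom never crosses the wire; raise with its message instead
    if isinstance(result, Bottom):
        raise RuntimeError(result.why)
    return result

print(run([{"id": 1}]))  # plain JSON passes through
run(Bottom('missing attribute "missing_attr"'))  # raises RuntimeError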


As an aside, since we want bottom to short-circuit evaluation for efficiency reasons, this really seems more like throwing and catching an exception in my head. Also, we should probably call it the error type or something instead of bottom, since we're only using it for errors and it's a clearer name for people who don't know Haskell. (Also, maybe it would make sense to change r.error to return this error type?)

@jdoliner
Contributor

Throwing an error on the client was how I imagined representing bottom. Actually, this is sort of what we're doing right now; the only thing we don't have is a way to recover from bottom.

Bottom really is the functional equivalent of an exception. I don't think we want true exceptions, and here's why: we have some imperative parts of our language where bottom won't work like exceptions, for example bulk inserts. Here, if one of the items to be inserted returns bottom, it won't behave like an exception (or at least how one might reasonably expect an exception to behave) in that it won't prevent the later insertions from happening. Making it do so would be a pain and cost us some performance. Basically, I think exceptions are a bit too heavyweight a feature to fit into our language right now, and perhaps ever, particularly because they'd make us sacrifice performance in some places.
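
A sketch of the bulk-insert point (plain Python, names invented): a bottom item is recorded and skipped, rather than aborting the remaining inserts the way an exception would:

class Bottom:  # invented sentinel, as above
    def __init__(self, why):
        self.why = why

def bulk_insert(table, docs):
    inserted, errors = 0, 0
    for doc in docs:
        if isinstance(doc, Bottom):
            errors += 1  # record the error but keep going,
            continue     # unlike an exception, which would abort here
        table.append(doc)
        inserted += 1
    return {"inserted": inserted, "errors": errors}

print(bulk_insert([], [{"id": 1}, Bottom("bad doc"), {"id": 2}]))
# {'inserted': 2, 'errors': 1}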

@jdoliner
Contributor

It still seems like there are two points of view being expressed here that aren't really addressing each other directly. Here's what they are, as far as I can see:

One point of view which I believe is held by @kareemk is:

rethinkdb shouldn't throw an error if an attribute is missing.
querying should assume that missing attributes are not done in error and skip them.

I'd like a bit of clarification on what this means. In particular, skipping, as far as I understand it, means skipping the row in question. This has a concrete meaning with a filter: if evaluating the predicate creates such an error, the row is dropped. However, what does it mean in other cases? In particular, what does the following return?

r.expr({})["foo"].run()

Similarly, what if a missing attribute is encountered over the course of a map/reduce? Perhaps if it's encountered while mapping a single row it might make sense to drop that row (although I think that's dubious), but if it's encountered while reducing, the only thing to do, I think, is to error.

Another point of view which I believe is held by @coffeemug is:

we introduce a primitive (such as bottom) that supports all possible use cases we might want to support, and then have library functions have different behavior built on top of this primitive.

The question I have is what exactly this looks like in the API. First, if I understand correctly, under this scheme a vanilla filter that fails to find an attribute will throw an error (a client-side representation of bottom); is this correct? Second, how might error-recovery syntax look? Something like:

table.filter(..., drop_errors=True)

seems like the most obvious option to me, although it's a bit clunky. We could also maybe add an on_error function; then the filter semantics would be implemented as:

table.filter(f, drop_errors=True) -> table.filter(lambda x: on_error(f(x), False))

These are just off-the-top-of-my-head ideas for the primitives; others probably have better ideas.
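
For what it's worth, that on_error primitive would be tiny; a Python sketch with the same invented Bottom sentinel as above:

class Bottom:  # invented sentinel, as above
    def __init__(self, why):
        self.why = why

def on_error(value, default):
    # substitute a default wherever the computation produced bottom
    return default if isinstance(value, Bottom) else value

print(on_error(Bottom("missing attribute"), False))  # False
print(on_error(42, False))                           # 42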

I apologize if I've snipped away too much context with these quotes and attached beliefs that aren't actually held; this was just my impression of the two sides. However, I think we need to make both of these ideas more concrete and eventually pick one before we can actually proceed on this issue.

@kareemk

kareemk commented Dec 30, 2012

I'd like a bit of clarification on what this means. In particular, skipping, as far as I understand it, means skipping the row in question. This has a concrete meaning with a filter: if evaluating the predicate creates such an error, the row is dropped. However, what does it mean in other cases? In particular, what does the following return?

r.expr({})["foo"].run()

r.expr({})["foo"].run() should return undefined/nil per the JS (and Ruby) implementation of hash even though Python throws a KeyError. I'm assuming here that JS is the core language for RethinkDB.

@kareemk

kareemk commented Dec 30, 2012

Following up on my previous comment: if RethinkDB is a schemaless database, then the responsibility lies with the application developer to ensure that a comparison against nil due to referencing a missing attribute never happens, and to branch the code appropriately.

@coffeemug
Contributor

The question I have is what exactly this looks like in the API.

I think we should pick the most desired behavior for every function and use it when it makes sense. For example, filter and pluck would internally test for bottom and treat it as false (and therefore simply omit the offending rows from the result set). Other primitives, such as map, may not do that (I don't actually know what map should do; I'm just using it as an example here). Basically, I propose we look at the API function by function and pick the most sensible behavior for each. Most likely, we can leave all behavior as is and change a small set of functions (such as pluck, filter, and possibly groupMapReduce) for the first pass.

When I said I think we should introduce a primitive that supports all use cases, I was making a philosophical argument. I think it's inelegant from the PL perspective to have some functions (such as filter and pluck) behave one way, and other functions (such as map, etc.) behave in a different way, without having an underlying mechanism users can use to easily implement the same behavior. Bottom provides such a mechanism. Having full-blown exception handling provides it as well. (Personally, I like bottom much more for ReQL, but that's a separate discussion).

On a different note, as far as throwing an error if bottom propagates to the client, what if bottom is nested inside a document (e.g. { foo: 1, bar: bottom })? This is important in examples like this:

...groupBy(user, r.sum('age'), r.avg('purchases'))
=> { grouping: ..., reduction_age: 30, reduction_purchases: bottom }

I'm glossing over a few issues (groupBy doesn't support multiple reductions, and purchases is presumably an array, which means it isn't clear what the average reduction even does), but that's beside the point here. In cases like this, we want the user to get as much information as possible, so I think just throwing an error because some part of the document contains bottom isn't ideal.

@jdoliner
Contributor

jdoliner commented Jan 2, 2013

So this may be an unpopular view, but I feel the argument against special casing has usability value beyond its philosophical merits. Having different behaviors for different functions runs the risk that the "best" behavior might not seem best to everyone, so it can introduce an extra piece of information you need to know to use the language effectively. An alternative, which I suspect people will find ugly, is to have a version of each of the functions that drops the offending rows. Specifically, filter would error on missing attributes, while filterD would drop rows with missing attributes; we could also have mapD, concatMapD, etc.

The sugar could also go the other way: the default for all of these functions (which are all basically concatMap) could be to drop rows that error, while providing a pedantic version that errors if any of the rows error. I definitely think there should be a way to make the database error if an attribute is missing.

In my mind, the value of having every function share the same default outweighs the cost of having some of the functions (the minority, I'd assume, because we can pick which side gets the sugar) require an extra character to get the most commonly desired behavior. Having less to think about just seems more valuable to me.

I consider pluck to be a different thing. Basically I think that pick is just sugar for:

pick(obj, attrs) -> obj.filter(lambda k, v: k in attrs)

So dropping nonexistent attributes is the default behavior.

I was thinking that nested bottoms would just turn the whole expression into bottom. Returning nested bottom, I think, brings up a whole host of issues, because then you can't just throw; you need a way to represent it in the clients, which complicates things. It's also going to make it a lot harder to short-circuit, because we'd need to know a lot about where the result winds up to know what to do. Basically, it seems like it's going to make the implementation tougher, and giving people good ways to drop bottom-infected rows is a less complicated way to let people see partial results.
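
A sketch of the infection rule (plain Python, names invented): constructing an object with a bottom field yields bottom for the whole object:

class Bottom:  # invented sentinel, as above
    def __init__(self, why):
        self.why = why

def make_obj(**fields):
    # any bottom field turns the whole object into bottom
    for v in fields.values():
        if isinstance(v, Bottom):
            return v
    return dict(fields)

print(make_obj(foo=1, bar=2))  # {'foo': 1, 'bar': 2}
print(isinstance(make_obj(foo=1, bar=Bottom("missing")), Bottom))  # True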

@mlucy
Member

mlucy commented Jan 2, 2013

I like the idea of using bottom, but not nesting it or returning it to clients, and having filter skip rows that return bottom. The documentation for filter can say "selects all rows where the predicate returns true, ignoring any errors". I think in general functions that do a bunch of things should be resilient against one of them failing -- batched updates and inserts already keep going when they encounter an error, for example.

@coffeemug
Contributor

Ok, agreed that consistency in default behavior outweighs the confusion associated with different behaviors in different cases. Also agreed that dealing with passing nested bottom to the client is an unnecessary complication.

However, I think that doubling the number of functions will unnecessarily complicate the API. I'd much rather put the database into a "strictness state" by passing a flag on the connection, and on the .run() function. It will be a little less granular, but I think there is no evidence that such granularity is required. If/when we find out we need such granularity, I think we'll need a more general-purpose way of expressing it (for example, by passing optional run=... arguments to each command that override the arguments passed to run), but I don't think we need to deal with that case for now.
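
Hypothetically, usage could look something like this in the Python driver (the strict flag and its placement are invented here for illustration; no such option exists):

import rethinkdb as r  # assuming the Python driver

# `strict` is the hypothetical flag under discussion, not a real option
conn = r.connect("localhost", 28015, strict=True)     # connection-level default
r.table("users").filter(r.row["age"] > 30).run(conn)  # inherits strict=True
r.table("users").filter(r.row["age"] > 30).run(conn, strict=False)  # per-query override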

Also, I agree with @kareemk: I think the default behavior should be to skip over bottom. What we have now pulls the product in two different directions (i.e. a flexible schema that one can't comfortably take advantage of, because of a rather inflexible default error-handling policy). If the user actually wants the strict behavior in production, they can get it by creating connections this way (and they'll be able to get further safety once we get to implementing constraints).

@jdoliner
Contributor

jdoliner commented Jan 3, 2013

I'd much rather put the database into a "strictness state" by passing a flag on the connection

I may be misunderstanding this, but I think having a strictness state for the database is a bad idea, because we'd need to store it somewhere and distribute it to different machines, which gets into all sorts of corner cases. Plus, it has the potential to seriously screw you over, because someone using the database for an entirely different purpose could change the flag.

Strictness makes a lot more sense to me on a connection basis (which might have been what you meant; I'm not sure).

We should also probably return a summary that mentions how many rows were dropped due to errors, and maybe even which steps they were dropped on. This would play nicely with what we're doing in #194.

@jdoliner
Contributor

jdoliner commented Jan 3, 2013

Another thing worth talking about is what happens with get when it doesn't find a row. Right now it returns null. I'd argue it actually makes more sense for it to return bottom. This is nice because it's consistent. Also this plays really nicely with eq_join. Right now eq_join desugars like so:

stream.eq_join(attr, table)
# desugars to
stream.concat_map(lambda x: let({"row" : table.get(x[attr])}, branch(let_var("row") == null, [], [{"left" : x, "right" : let_var("row")}])))

Which is messy. If get returned bottom, it could just desugar to:

stream.map(lambda x: {"left" : x, "right" : table.get(x[attr])})

The elements that don't find a matching row get dropped automatically due to bottom semantics, and it's all very nice.
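
A Python sketch of why the simpler desugaring works under the nested-bottom infection rule from earlier (all names invented): a get miss yields bottom, the joined object becomes bottom, and the stream drops bottom rows:

class Bottom:  # invented sentinel, as above
    def __init__(self, why):
        self.why = why

def table_get(table, key):
    return table[key] if key in table else Bottom("no row with key %r" % key)

def make_obj(**fields):
    for v in fields.values():
        if isinstance(v, Bottom):
            return v  # nested bottom infects the whole object
    return dict(fields)

def eq_join(stream, attr, table):
    rows = (make_obj(left=x, right=table_get(table, x.get(attr))) for x in stream)
    return [row for row in rows if not isinstance(row, Bottom)]  # stream drops bottom

users = {1: {"name": "a"}}
print(eq_join([{"uid": 1}, {"uid": 2}], "uid", users))
# [{'left': {'uid': 1}, 'right': {'name': 'a'}}]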

To do this we would need to have more granular strictness settings, because if someone sets a connection to be strict, they won't expect that to change the meaning of eq_join. This is evidence that having more granular control will make some of our backend implementations nicer, and if we're going to have that level of granularity, I don't see much harm in giving people a way to set the strictness per operation as well as per query. We can at least have it in the protocol buffers; languages like Python, which handle optional arguments really well, can expose it, while languages that don't can mask it. Also, I suspect that when we have developers writing RethinkDB libraries they'll want this level of control, so it really just seems like a small cost to throw the flag in there now.

@coffeemug
Contributor

I may be misunderstanding this, but I think having a strictness state for the database is a bad idea, because we'd need to store it somewhere and distribute it to different machines, which gets into all sorts of corner cases.

Right, I think for the first implementation we should pass it on the connection, and on run(). If we ever need to deal with it on per-database level, we can deal with it then.

Another thing worth talking about is what happens with get when it doesn't find a row. Right now it returns null. I'd argue it actually makes more sense for it to return bottom.

Agreed.

To do this we would need to have more granular strictness settings.

Could you explain this in more detail? I don't think the strictness setting should affect the behavior of get. I think it would make sense for get to always return bottom, regardless of this setting (similarly to how if and add would always return bottom regardless of this setting). Then, when the setting is lax, eq_join will skip the offending rows and return only the ones that are present in the right table. If the setting is strict, we can pick a behavior for eq_join that we think is consistent with the rest of the system. It's conceivable that the user might want a different behavior in some cases, but I don't think it's worth complicating the implementation for this unless there is strong evidence we actually need it.

I don't know how much of a burden a per-op setting on protocol-buffers is, but if it's a pain, I don't think it's worth bothering with it for now.

@jdoliner
Contributor

jdoliner commented Jan 3, 2013

The strictness setting wouldn't affect the behavior of get. It would, however, affect the behavior of map, and thus of eq_join. I'm operating under the assumption that users will be surprised if eq_join fails when the table doesn't have a matching value; I think that's the typical definition, and the purpose of things like outer and inner joins is to handle this situation. Erroring will be unexpected, in my opinion. It's keeping this behavior simple that requires the granularity on the backend.

I don't know how much of a burden a per-op setting on protocol-buffers is, but if it's a pain, I don't think it's worth bothering with it for now.

It's very simple.
Edit: At least I think it is.

@kareemk

kareemk commented Jan 4, 2013

@coffeemug I really like your proposal of making this a connection-level setting; that makes perfect sense, as does your approach of keeping the API as simple as possible (from the user's perspective) until there is demand from users for other features/options.

With respect to throwing an error (returning bottom): if you attract a large swath of the mongodb crowd, which I think you have a very good chance of doing, I have a strong hunch you won't see many requests for this.

@coffeemug
Contributor

I'm operating under the assumption that users will be surprised if eq_join fails when the table doesn't have a matching value.

Ahh, I see. It would be pretty easy to desugar to a branch that tests for bottom, which IMO is "righter" in this case than introducing a per-op argument (it doesn't feel like the right abstraction in this case, but I don't have a strong argument to support this).

@jdoliner
Contributor

jdoliner commented Jan 4, 2013

Actually, having thought about it more, I think desugaring to branch is a better idea. A strong argument is that missing attrs can come from different places: in the case of eq_join, you can get one if an element of the stream is missing the attribute. This is a case where, with strictness turned on, you'd expect it to error, but if we desugar to an error-dropping map, it won't. So it does seem like a bad abstraction.

@AtnNn
Member

AtnNn commented Jan 15, 2013

I would prefer having defaults (#27) rather than having this new primitive. Also, I strongly suggest not calling it bottom.

The real meaning of bottom in type theory is very different from what you are calling bottom. It is a type that has no value. It is not a value that can be returned, and in a strict language it cannot be passed as an argument. Haskell itself, even if it is a non-strict language, does not have bottom.

A nicer way to name this new primitive with concepts from functional programming is to call it Nothing. It could also just be called undefined and have different semantics than in javascript.

@othiym23

There has been a similar discussion in this thread related to refutable destructuring assignment in EcmaScript 6 (thread starts here), and the name they've given a non-signaling empty value is nil. By "non-signaling" I mean that it's like the default behavior of NaN as it moves through a series of floating-point operations in JavaScript: once an operation has yielded a NaN, the final outcome of the set of operations is always going to end up as NaN. nil differs from null or undefined in that it's a version of the null object pattern, similar to Objective-C's nil.
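
The NaN analogy is easy to demonstrate in any IEEE-754 language, e.g. Python:

nan = float("nan")
print(nan + 1, nan * 2)     # nan nan -- NaN propagates through arithmetic
print(nan > 5, nan == nan)  # False False -- comparisons against NaN always fail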

Instead of thinking of what we're talking about as a "default value", I think it makes more sense to think of it as a hole or bubble that pops up in data sets as we perform operations on sets of incomplete, structured values, analogous to the role that NULL plays in SQL's OUTER JOIN semantics (not that SQL should be a model for anything Rethink does -- please don't introduce ternary logic into Rethink).

I think it can be sane as long as a few caveats are heeded:

  1. nil / NULL / unit (I agree with @AtnNn, calling it bottom is going to confuse anyone with a background in type theory or FP) is a byproduct of an operation, not a value
  2. it cannot be assigned to a variable (again, it's an absence, not a value) or used as a comparator
  3. the semantics of what happens as it bubbles through a chain of operations enables the kinds of existential queries that started this issue to complete in a defined, deterministic way without errors. The simplest way to do this is to say that any filter or projection operation that queries on an attribute that is not present in one or more of the objects in the collection will always omit those objects, and define it such that operators like >, !, and == will always fail when tested against this internal non-value

The point of all this rigamarole is to allow people to store objects with values that are null or undefined without overly complicating the semantics of the query system. The easiest way to do it is to define a private / internal symbol (or, as atnnn suggested, use a Nothing type that only contains the Nothing 'value') that is used inside the query engine. The nice thing about using a symbol is that you can return intermediate datasets during scatter / gather operations over the data store that contain that symbol.

@coffeemug
Contributor

This issue has been subsumed by #570. Closing.
