
NULL/default/contains/etc. proposal #570

Closed
mlucy opened this issue Apr 2, 2013 · 26 comments
@mlucy
Member

mlucy commented Apr 2, 2013

Slava, Bill, Sam and I just talked about this. Here's what I think we should do about all of these:

  • Adopt default with exception semantics. You write either query.default(val) or query.default {|exc| func_of_exc}. I think we should do this because exception semantics are easy to explain and easy to implement (it's a small edit distance from what we have now to these semantics).
  • Rename contains to has_fields. Make it polymorphic, where calling seq.has_fields(:a, :b) is equivalent to seq.filter {|x| x.has_fields(:a, :b)}. Our new consistent rule for polymorphism is that terms which return an arbitrary object (like pluck) are polymorphic with respect to map, and predicates which only return a bool are polymorphic with respect to filter. (I don't think most users will ever explicitly learn this rule, but I would bet it has the right shape for their brains.)
  • has_fields will return true if the object has all of the keys passed to it AND the value of all those keys is non-null. It turns out that for real queries this is what you want most of the time.
  • Introduce a new primitive (has_attributes, has_names, has_keys, something like that) which returns true if the object has all of the keys (without checking whether the value of those keys is NULL). We don't have to emphasize this in the documentation, but people need some way to do this.

This gives two ways to solve the problem people have where they want to write e.g.:

tbl.filter {|user| user[:age] > 10} # some users don't have `age`!
tbl.has_fields(:age).filter {|user| user[:age] > 10}
tbl.filter {|user| user[:age].default(0) > 10}
tbl.filter {|user| (user[:age] > 10).default(false)}
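To make the proposed semantics concrete, here is a small Python simulation over plain dicts. This is a sketch of the behavior described above, not the ReQL API; `has_fields` and `default` here are stand-ins, and KeyError stands in for ReQL's missing-attribute error.

```python
# Sketch of the proposed semantics over plain Python dicts -- NOT the ReQL API.

def has_fields(row, *fields):
    # True iff every field is present AND its value is non-null.
    return all(f in row and row[f] is not None for f in fields)

def default(thunk, fallback):
    # `default` fires only when evaluating the expression raises,
    # matching the "exception semantics" adopted above.
    try:
        return thunk()
    except KeyError:  # stand-in for ReQL's missing-attribute error
        return fallback

users = [{"name": "a", "age": 15}, {"name": "b"}]

# tbl.has_fields(:age).filter {|user| user[:age] > 10}
print([u for u in users if has_fields(u, "age") and u["age"] > 10])

# tbl.filter {|user| user[:age].default(0) > 10}
print([u for u in users if default(lambda: u["age"], 0) > 10])
```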
@ghost ghost assigned mlucy Apr 2, 2013
@mlucy
Member Author

mlucy commented Apr 2, 2013

Update:

  • We should go back to the behavior for pluck where r({:a => 1, :b => 2}).pluck(:a, :c) yields {:a => 1}. This is good because the primitive seems more useful that way.
  • We should introduce a term with_fields where tbl.with_fields(:a, :b) is equivalent to tbl.has_fields(:a, :b).pluck(:a, :b). This is good because it's a very common operation. (What I said to Slava is "we may end up having lots of terms for operating on streams of objects for the same reason that common lisp has lots of functions for operating on lists of lists".)
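The two updates can be sketched in Python over plain dicts (a simulation of the proposed behavior, not the driver API):

```python
# Sketch (not the real driver): with_fields(a, b) == has_fields(a, b) + pluck(a, b),
# using the updated pluck behavior that silently skips missing keys.

def pluck(row, *fields):
    # Updated pluck: keep only the requested keys that are actually present.
    return {f: row[f] for f in fields if f in row}

def with_fields(rows, *fields):
    # Keep only rows that have every field present and non-null,
    # then project each row down to those fields.
    return [pluck(r, *fields)
            for r in rows
            if all(f in r and r[f] is not None for f in fields)]

# r({:a => 1, :b => 2}).pluck(:a, :c) yields {:a => 1}
print(pluck({"a": 1, "b": 2}, "a", "c"))

rows = [{"a": 1, "b": 2, "c": 3}, {"a": 1}, {"a": None, "b": 2}]
print(with_fields(rows, "a", "b"))
```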

@coffeemug
Contributor

I really like this proposal, for all of the following reasons:

  • It's incremental. Instead of spending months debating various evaluation models and then discovering that they break in some edge cases, don't fit reality, etc., this proposal takes our existing system and converts it into the one that's actually desirable via a small, incremental change.
  • It maintains safety semantics for people who want them. For example, if we make filter drop rows on error, users will get confused when they get nothing back because of a silly error somewhere. With this proposal, people can pick between the two models extremely easily, and get a safe model by default (which seems very sensible to me).
  • If we discover additional issues somewhere, we can tune the specific commands to fix them (whereas if we discover additional issues in a new evaluation model, they will quite possibly mean redoing everything).
  • Last but very much not least, this is easy to do. We can likely have this whole problem go away in just two days of development time.

I'd really like to hear what @al3xandru and @jdoliner think about it.

@neumino
Member

neumino commented Apr 2, 2013

I have a small question. When is default executed? Is it when an error is thrown? Or also when we return null?

@mlucy
Member Author

mlucy commented Apr 2, 2013

Just when an error is thrown. So the second example wouldn't work if you're using the "NULL means missing attribute" paradigm.

@al3xandru
Contributor

  1. pluck not throwing: a must

  2. the proposed semantic of default looks ok

  3. the proposed semantics of default (and with_fields) don't address the use case of extended variability in the row structure.

    For cases where rows show big variability in their structure, the query will be littered with defaults. The only options would be for users to write multiple queries and union them, or to wrap every other function in the query in a default.

  4. For addressing the above I suggest a change of with_fields semantics that would allow specifying default values for the fields:

    r.table('stats').with_fields({ :a => 20, :b => { :n => 32, :m => 70})...
  5. If NULL values do compare to all other types we support, then having separate functions for has_fields and has_something_else doesn't make sense. I'd suggest has_fields just do the check of presence, and has_fields(not_null=true) check for presence and not NULL.

    Keeping the number of functions we expose low will make the query language feel powerful and simple. The learning curve is important. The more API functions added, the more complicated things are perceived.
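The with_fields-with-defaults idea from point 4 above can be sketched in Python (hypothetical semantics over plain dicts, with a made-up name `with_fields_defaults`; this is not a shipped API):

```python
# Sketch of the suggested variant: with_fields taking a mapping of default
# values, so a row missing a field gets the default instead of being dropped.
# Nested dicts in the spec supply defaults for nested fields.

def with_fields_defaults(rows, defaults):
    def fill(row, spec):
        out = {}
        for key, dflt in spec.items():
            if isinstance(dflt, dict):
                # Recurse into nested specs, e.g. {:b => {:n => 32, :m => 70}}.
                out[key] = fill(row.get(key, {}), dflt)
            else:
                out[key] = row.get(key, dflt)
        return out
    return [fill(r, defaults) for r in rows]

rows = [{"a": 1}, {"b": {"n": 5}}]
# Mirrors r.table('stats').with_fields({ :a => 20, :b => { :n => 32, :m => 70}})
print(with_fields_defaults(rows, {"a": 20, "b": {"n": 32, "m": 70}}))
```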

@al3xandru
Contributor

Our new consistent rule for polymorphism is that terms which return an arbitrary object (like pluck) are polymorphic with respect to map, and predicates which only return a bool are polymorphic with respect to filter.

I'm not sure about users, but I'm not sure I'm getting it either.

@jdoliner
Contributor

jdoliner commented Apr 2, 2013

It feels like this really doesn't address the issue that got us here. The original issue as I recall was that people were calling filter with a predicate that failed on some rows due to missing fields. To do this they had to write:

tbl.filter {|user| user.has_attr(:age) && user[:age] > 10}

And this was annoying to them.

If that was annoying to them are these better?

tbl.has_fields(:age).filter {|user| user[:age] > 10}
tbl.filter {|user| user[:age].default(0) > 10}
tbl.filter {|user| (user[:age] > 10).default(false)}

Maybe marginally, but I think they're still going to be pretty annoying. With this solution, a project which has very unstructured data is going to wind up writing default(false) a lot, and that's going to add up quickly. I agree that it's bad for the default behavior of filter to drop these rows, but what if we added a version of filter that does?

Tentatively I'd like to call it filterU, meaning "unstructured": the unstructured versions of functions (map, concat_map, and maybe reduce could also have unstructured versions) implicitly drop rows which fail due to missing fields. The point of adding such a function is that people who are seriously using ReQL in an unstructured way have a canonical shorthand for this type of operation that isn't going to bloat their code base with ugly code.
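The "unstructured filter" idea can be sketched in Python (`filter_u` is a hypothetical name from this discussion, not a shipped API; KeyError stands in for ReQL's missing-attribute error):

```python
# Sketch of an "unstructured" filter: rows whose predicate fails due to a
# missing field are silently dropped instead of raising an error.

def filter_u(rows, pred):
    out = []
    for row in rows:
        try:
            if pred(row):
                out.append(row)
        except KeyError:  # stand-in for ReQL's missing-attribute error
            pass          # drop the row instead of propagating the error
    return out

users = [{"age": 15}, {"name": "no-age"}, {"age": 5}]
print(filter_u(users, lambda u: u["age"] > 10))
```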

@jdoliner
Contributor

jdoliner commented Apr 2, 2013

I like the idea of having has_fields be variadic. I think having it be polymorphic is potentially confusing. My first guess for what has_fields would do on a stream is check whether there's an element of the stream that matches the one I gave it. It's a pretty big jump for me to know that has_fields on a stream means "give me only the elements of the stream which have these fields". I think this makes sense if you talk about it for a while, but it's going to be confusing to people coming in cold. This might be okay though. I think it would be a lot better if I could say:

table.filter(r.has_fields("foo", "bar"))

@jdoliner
Contributor

jdoliner commented Apr 2, 2013

I detest the idea of having has_attributes, has_names, has_keys which is mostly the same as has_fields but with a subtly different meaning. I actually don't know what it is that leads programmers to think that using synonyms to express subtle differences in semantics is a good idea. It's terribly confusing.

@coffeemug
Contributor

Ok, this is workable. We'll have to work out a few naming issues and edge cases around functions, etc. outlined above, but overall this seems satisfactory. I'm working on scheduling 1.5-sprint-2 now. Once I'm done, I'll get all the relevant people together and we'll hammer this out.

@coffeemug
Contributor

Since we should decide whether we treat null the same as non-existence, we should probably be consistent about it. E.g. if default checks for both errors and null by default, we should probably also redefine comparisons to throw on encountering null.

@mlucy
Member Author

mlucy commented Apr 3, 2013

I think we should throw on any comparison with null and introduce a new is_null term.

@jdoliner
Contributor

jdoliner commented Apr 3, 2013

So comparisons throw. What about plain attribute access? For example:

table.map(row["non_existent_key"]) # throws
table.map(row["null_key"]) # doesn't throw?

table.filter(row["non_existent_key"]) # throws
table.filter(row["null_key"]) # throws?

Also adding an is_null term to get around the fact that all comparisons with null throw is insane. People are going to expect to be able to say: table.filter(row["foo"] != null) and have it not throw. This also is really weird when you have objects with null in them. You actually now can't write a function that compares two objects with null in them for equality.

I think the only sane way to compare null is this:

null == null -> true
null != null -> false
null == non_null -> false
null != non_null -> true
All other comparisons throw.
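These rules can be sketched in Python, with None standing in for ReQL null (this models the proposal under discussion, not RethinkDB's shipped behavior):

```python
import operator

# Sketch of the proposed null-comparison rules: equality and inequality are
# defined for null; every other comparison involving null throws.
OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}

def rql_compare(op, a, b):
    if a is None or b is None:
        if op == "==":
            # null == null -> true; null == non_null -> false
            return a is None and b is None
        if op == "!=":
            # null != null -> false; null != non_null -> true
            return not (a is None and b is None)
        raise TypeError("null only supports == and != under this proposal")
    return OPS[op](a, b)

print(rql_compare("==", None, None))  # null == null
print(rql_compare("!=", None, 3))     # null != non_null
```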

@mlucy
Member Author

mlucy commented Apr 3, 2013

That would be reasonable too.

@coffeemug
Contributor

👍 for @jdoliner's suggestion. I'm cool with is_null as sugar for == null, but that's obviously superfluous.

@coffeemug
Contributor

A quick status update: things are a little crazy for me with 1.5-sprint-1 being over, and sprint-2 planning, but I'd like to get together IRL with all relevant stakeholders this week (Friday at the latest), and hammer out the final proposal. We can then implement it next week.

@al3xandru
Contributor

If I'm reading this correctly now we'll have:

  1. null responding to equality and non-equality comparisons
  2. but null throwing in all other comparisons

So we end up with one special case (a non-existing attribute) and half a special case (null). Is there any particular reason why we want to treat null as a (semi) special case?

The above proposal puts us very close to SQL behavior for NULL, but with a major difference: instead of always returning false for any comparison with null, we throw.

Introducing is_null also reminds me of SQL IS NULL and HAVING.

I think in my ideal world:

  1. null would be just a normal value with a clear definition for comparisons (basically what @jdoliner defined above, plus null.cmp(not_null) -> false and not_null.cmp(null) -> false)
  2. the only special case would be a missing field, which would be dealt with by has_fields and with_fields(default_values).

@chrisabrams

What's the status on this? I just recently ran into the same issue with properties not existing.

@mlucy
Member Author

mlucy commented May 12, 2013

Most of this (default, has_fields, with_fields) is in internal code-review at the moment. Changing the behavior of NULLs so that they throw on comparison probably isn't going to happen in the near future because there are too many things that depend on all RQL values having a well-defined ordering.

The 1.5 release is coming up very soon, so these changes probably won't make it into that, but there's a solid chance they'll be in for 1.6.

@chrisabrams

So for filtering for properties that do not exist, is there anything I can do at the moment to prevent an error?

@mlucy
Member Author

mlucy commented May 12, 2013

I might be misunderstanding your question, but you can explicitly use contains to pre-filter the list:

table.filter {|row|
  row.contains(:a, :b, :c)
}.filter {|row|
  row[:a] + row[:b] < row[:c]
}

(contains will be renamed to something like has_fields when all of this gets merged in because so many people find the name confusing)

@chrisabrams

Ahhh. Yes I am one of those people :)

@ricardobeat

Since null needs to respond to comparisons so that orderBy can work, I don't see why it should throw in other operations. Won't the need for pre-filtering result in worse performance?

When dealing with unstructured data I want null/undefined errors to be silent 99% of the time. Mongo has special $type and $exists queries to cater for the special cases when you don't want that.

@jdoliner
Contributor

How much is this issue helping us right now?

@coffeemug
Contributor

Good bye, relic of history!

@coffeemug
Contributor

P.S. it feels really good to close these, so I feel a little guilty closing the ones you put in the work to find. Feel free to reopen and close again yourself :)


7 participants