Proposal: change the behavior of secondary indexes wrt NULLs #1001

coffeemug · 2013-06-13T07:44:49Z

I was writing up an explanation of how to use secondary indexes to match regexes and I followed my intuition and wrote this:

r.table('users').indexCreate('test', r.row('first_name').match('S.*'))
r.table('users').between(null, null, {index: 'test'})

I expected the between call to return only the documents where first_name starts with S, but instead it returned everything because we include null into the secondary index. I had to rewrite the query as follows to actually work:

r.table('users').indexCreate('test', r.row('first_name').match('S.*').ne(null))
r.table('users').getAll(true, {index: 'test'})

I think we picked behavior of secondary indexes wrt NULLs and should change it to exclude NULLs. That seems a lot more natural to me.

The text was updated successfully, but these errors were encountered:

coffeemug · 2013-06-13T07:46:02Z

Note that if I just used filter, I'd write this: r.table('users').filter(r.row('first_name').match('S.*')). I'd naturally expect secondary indexes to be congruent with this behavior.

mlucy · 2013-06-13T08:19:37Z

This is more of a between thing than a secondary index thing. If we change this we'll have to pick a new way to denote an empty endpoint for between.

Also, as an aside, making a secondary index with match like that won't really do what you want because each row that matches will have its own entry in the index (rather than being true it will be some object). So even with this change, you could easily get all the rows that don't match a regex, but you couldn't get all the rows that do.

What you really want here is r.row('first_name').match('S.*').coerce_to("bool"), although that's an ugly thing to ask people to type.

coffeemug · 2013-06-13T08:28:44Z

Ah, noted on not being able to get all the rows that do :( We should probably close this proposal then.

However, I don't see what it has to do with between and why we'd have to change the denotation for empty endpoints. Could you clarify?

wmrowan · 2013-06-13T10:11:27Z

I thought we had a rule that index (only primary?) keys could not be null. This is what allows us to use null for empty endpoint in between. If secondary index keys can be null but between is supposed to work on secondary indexes then we already can't use nulls to denote empty endpoints. This is an inconsistency that we have to fix one way or the other.

Given that we decided to go the route that null generally means nonexistence then @coffeemug's original proposal makes total sense. Returning null in the secondary index function means "don't include in the index".

@mlucy why does the query have to be (...).coerce_to('boo')? Why does @coffeemug's (...).ne(null) not work?

coffeemug · 2013-06-13T11:09:03Z

(...).ne(null) does work, FYI.

On Thu, Jun 13, 2013 at 3:11 AM, wmrowan notifications@github.com wrote:

I thought we had a rule that index (only primary?) keys could not be null.
This is what allows us to use null for empty endpoint in between. If
secondary index keys can be null but between is supposed to work on
secondary indexes then we already can't use nulls to denote empty
endpoints. This is an inconsistency that we have to fix one way or the
other.

Given that we decided to go the route that null generally means
nonexistence then @coffeemug https://github.com/coffeemug's original
proposal makes total sense. Returning null in the secondary index function
means "don't include in the index".

@mlucy https://github.com/mlucy why does the query have to be
(...).coerce_to('boo')? Why does @coffeemug https://github.com/coffeemug's
(...).ne(null) not work?

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/1001#issuecomment-19382996
.

mlucy · 2013-06-13T20:58:04Z

@wmrowan -- that works fine. coerce_to("bool") is effectively equivalent to do{|x| x.ne(null).and(x.ne(false)).

@coffeemug -- we currently disallow null as the value for an index and use it as a special value in between. If you write .between(null, 1) that means "anything <= 1", not "anything between null and 1".

jdoliner · 2013-06-14T01:41:58Z

Using null as a special value in between is just generally a bad idea, we
should stop doing it and use a special value to represent unboundedness. My
objection is that if you're getting the arguments to between from another
expression such as accessing a field of another document its very easy for
a null which meant nonexistence to become a null which means unboundedness
and yield confusing results.
On Jun 13, 2013 4:58 PM, "Michael Lucy" notifications@github.com wrote:

@wmrowan https://github.com/wmrowan -- that works fine.
coerce_to("bool") is effectively equivalent to do{|x| x.ne(null).and(x.ne
(false)).

@coffeemug https://github.com/coffeemug -- we currently disallow nullas the value for an index and use it as a special value in
between. If you write .between(null, 1) that means "anything <= 1", not
"anything between null and 1".

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/1001#issuecomment-19424578
.

mlucy · 2013-06-14T22:02:46Z

I dunno, we seem to be moving away from using null as a user-value. If it represents non-existence, it actually makes a lot of sense to use it to represent a non-existent endpoint.

neumino · 2013-06-14T22:15:21Z

I'm strongly in favor of having null as a valid value and change the behavior of between.

Using null to represent unboudedness is just bad design.
It's also a pain for users. null in javascript is an object, not an undefined value.

mlucy · 2013-06-14T22:24:01Z

@neumino -- what signature would you like instead?

jdoliner · 2013-06-14T22:27:42Z

between(r.open, r.open) would work for me.
On Jun 14, 2013 6:24 PM, "Michael Lucy" notifications@github.com wrote:

@neumino https://github.com/neumino -- what signature would you like
instead?

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/1001#issuecomment-19484895
.

coffeemug · 2013-06-14T22:28:55Z

One moment ya'll -- I was just talking about changing the behavior of indexCreate to drop rows where the key of the secondary index is null (where as right now the row inserted with a null key).

The issue of between is somewhat related, but regardless of what we do with between I think we should change the behavior of index creation.

neumino · 2013-06-14T22:47:07Z

We shouldn't drop rows when the secondary index is null.

I expect

r.db("foo").table("bar").filter( r.row("field").eq( value ) )

to return exactly the same results as

r.db("foo").table("bar").getAll( value, { index: "field" })

Whatever the value of value is.

The syntax of @jdoliner would work for me.

mlucy · 2013-06-15T03:22:44Z

@jdoliner -- I really don't like our new trend toward introducing terms that are only valid inside of one specific other term. It feels really hacky and ugly (although null does too). Can you think of any way of getting rid of null that doesn't do that?

@neumino -- the advantage of Slava's proposal would be that creating an index on an attribute would have exactly the same behavior if that attribute is null or if that attribute doesn't exist in the object. I also think that if we could push people away from using null as a regular value rather than a way to represent non-existence, that that might be a good thing.

neumino · 2013-06-15T03:29:28Z

We could have .after(start_key) and .before(end_key).

For the rest, I still don't see a valid reason not to let people use null. We can of had a bad start with null with our API, but it doesn't mean that we have to keep pushing it away.

jdoliner · 2013-06-15T10:41:46Z

It seems like there's no getting around some added complexity here I find
special purpose values tasteful but you done so here's some other options.

we could have 3 versions of the function between lower_bound and
upper_bound I seriously doubt @mlucy will like that and I really don't
either but its an option.
we could introduce r.min_value and r.max_value. These would be valid in
many other places so it addresses @mlucy's concern I think we should
disallow serializing them as values to disk and maybe returning them as
data to users (since they don't really convert to very usable native things
in the language) this means they're probably more complicated to implement
but maybe it's worth it.

Since we seem robe discussing 2 issues in one thread here I'll now address
the nulls in sindexes issue. I'm all for dropping null values but if I
recall correctly @coffemug you were actually the main proponent of not
dropping them originally. This isn't to say that you've hit your opinion
quota on this issue but normally when we rethink a decision we have some
main proponent reminding us why we originally made it. I forget what it was
but I remember it was pretty convincing and I'd kind of like to have it as
context. Can you remember what it was?

On Friday, June 14, 2013, Michel wrote:

We could have .after(start_key) and .before(end_key).

For the rest, I still don't see a valid reason not to let people use null.
We can of had a bad start with null with our API, but it doesn't mean
that we have to keep pushing it away.

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/1001#issuecomment-19490568
.

coffeemug · 2013-06-17T07:03:52Z

As far as between and dropping nulls goes, the obvious solution is to use key-based arguments instead of positional ones:

table.between() # returns everything
table.between(start = 'bar') # returns everything after 'bar'
table.between(end = 'foo') # returns everything before 'foo'
table.between(start = 'bar', end = 'foo') # returns everything between 'bar' and 'foo'

This seems ok to me. However, if we end up not allowing nulls in secondary indexes, I don't see what's wrong with jut passing null to between to indicate an infinite bound like we do now.

As for nulls in sindexes, I don't think either solution had a strong proponent. As far as I recall, we wobbled between available options and picked the one currently implemented (which I agree with at the time). I now think we picked the wrong one in light of treating null as non-existence.

wmrowan · 2013-06-17T08:01:26Z

@coffeemug What you just described for between is what we originally implemented. We switched it for 1.5 because we thought the optional argument approach was clunky: table.between({leftBound: 'a', rightBound: 'b'}).

srh · 2013-06-17T08:04:16Z

Can you think of any way of getting rid of null that doesn't do that?

@mlucy What alternative do you think is best?

mlucy · 2013-06-17T20:54:41Z

Alright, here's what I think we should do:

Make null not be in the secondary index because it's confusing.
Leave the current null syntax for 1.7. It's ugly, but if we don't allow NULLs in secondary indexes then it's non-ambiguous, and I think there are more important RQL problems that we need to address first. (We still have dual-inclusive slice, for god's sake.)
Open a new issue to fix between, since it's outside the scope of this issue (we originally got onto it because I misunderstood what Slava was saying).

As for what I think we should do about between, I actually sort of like r.minval and r.maxval. They would be useful in other terms as well. For example, they would fix #955 at the same time; you could get a common prefix by writing table.between([prefix, r.minval], [prefix, r.maxval], :index => ...).

neumino · 2013-06-17T22:20:36Z

@mlucy I didn't get why you think having null in the secondary index is confusing. What's confusing?

mlucy · 2013-06-18T00:26:21Z

Mostly because it would be yet another case where null and non-existence differ. If someone creates a secondary index on a particular field (which I think is by far the most common use case, especially for novice users), then it's confusing to have different behavior when that field doesn't exist vs. when it's null. I could be convinced otherwise, though.

Actually, it turns out that we already exclude null from secondary indices; between is just incorrect because it has a code path from before we had secondary indexes where if you don't specify either endpoint it's just an identity op. We should just fix between to not have that behavior.

mlucy · 2013-06-18T00:45:26Z

This is in review 642 by @Tryneus .

mlucy · 2013-06-18T01:03:29Z

This is in next. Closing because the original issue is satisfied (or rather, it turns out we were doing that all along).

@neumino -- if you still think we should allow null as an sindex value, could you open a separate RQL proposal for it?

mlucy mentioned this issue Jun 17, 2013

RQL Proposal: Make Between not use null to denote unboundedness #1023

Closed

ghost assigned mlucy Jun 18, 2013

mlucy added a commit that referenced this issue Jun 18, 2013

#1001 fixed between bug

6740a8c

mlucy closed this as completed Jun 18, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: change the behavior of secondary indexes wrt NULLs #1001

Proposal: change the behavior of secondary indexes wrt NULLs #1001

coffeemug commented Jun 13, 2013

coffeemug commented Jun 13, 2013

mlucy commented Jun 13, 2013

coffeemug commented Jun 13, 2013

wmrowan commented Jun 13, 2013

coffeemug commented Jun 13, 2013

mlucy commented Jun 13, 2013

jdoliner commented Jun 14, 2013

mlucy commented Jun 14, 2013

neumino commented Jun 14, 2013

mlucy commented Jun 14, 2013

jdoliner commented Jun 14, 2013

coffeemug commented Jun 14, 2013

neumino commented Jun 14, 2013

mlucy commented Jun 15, 2013

neumino commented Jun 15, 2013

jdoliner commented Jun 15, 2013

coffeemug commented Jun 17, 2013

wmrowan commented Jun 17, 2013

srh commented Jun 17, 2013

mlucy commented Jun 17, 2013

neumino commented Jun 17, 2013

mlucy commented Jun 18, 2013

mlucy commented Jun 18, 2013

mlucy commented Jun 18, 2013

Proposal: change the behavior of secondary indexes wrt NULLs #1001

Proposal: change the behavior of secondary indexes wrt NULLs #1001

Comments

coffeemug commented Jun 13, 2013

coffeemug commented Jun 13, 2013

mlucy commented Jun 13, 2013

coffeemug commented Jun 13, 2013

wmrowan commented Jun 13, 2013

coffeemug commented Jun 13, 2013

mlucy commented Jun 13, 2013

jdoliner commented Jun 14, 2013

mlucy commented Jun 14, 2013

neumino commented Jun 14, 2013

mlucy commented Jun 14, 2013

jdoliner commented Jun 14, 2013

coffeemug commented Jun 14, 2013

neumino commented Jun 14, 2013

mlucy commented Jun 15, 2013

neumino commented Jun 15, 2013

jdoliner commented Jun 15, 2013

coffeemug commented Jun 17, 2013

wmrowan commented Jun 17, 2013

srh commented Jun 17, 2013

mlucy commented Jun 17, 2013

neumino commented Jun 17, 2013

mlucy commented Jun 18, 2013

mlucy commented Jun 18, 2013

mlucy commented Jun 18, 2013