Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contain for array #211

Closed
neumino opened this issue Jan 10, 2013 · 15 comments
Closed

Contain for array #211

neumino opened this issue Jan 10, 2013 · 15 comments
Labels
Milestone

Comments

@neumino
Copy link
Member

neumino commented Jan 10, 2013

It would have solve this problem: https://gist.github.com/4500622

I think we talked about that once with @coffeemug @mlucy and @wmrowan but I couldn't find an issue on github.

Tagging as post protobuf improvements

@coffeemug
Copy link
Contributor

One way to do this is as follows:

r.expr([286108290734252030]).map {|id| r.table(table).get(id)}.filter {|t| t[:entities][:urls].filter{|u| u.contains('expanded_url')}.count.gt(0)}.run

@coffeemug
Copy link
Contributor

So there are a few questions here (that I can think of):

  1. What can we do about the map hack (we really need an in/member operator, but if we just do that, it won't help people access documents via a primary key -- e.g. how do I get docs with ids 1 or 2 or 3 by Pkey)
  2. Can we make json specific functions polymorphic to work on streams (e.g. what would it mean for contains to work on a stream? -- presumably a stream of booleans, but I guess this isn't very useful)
  3. Is there anything we can do to make nested filters (like above) more convenient?

@wmrowan
Copy link
Contributor

wmrowan commented Jan 14, 2013

This is the best I could do with @charl's example:

table.filter(function(row) {
    return row.contains("entities").and(
        row('entities').contains('urls').and(
            row('entities')('urls').map(function(url) {
                url.contains('expanded_url').and(
                    function_of_expanded_url(url('expanded_url'))
                )
            }).reduce(false, function(acc, val) {
                return acc.or(val)
            })
        )
    )
}).run()

There are two things I can think of to make this simpler. The first would be to implement a r.default or r.catch function that evaluates its first argument but returns its second argument if the first throws an error. This would allow us to get rid of all our contains checks. The second would be to implement all and any on sequences to replace the map reduce combo here. Here's what the example would look like with these new functions:

table.filter(function(row) {
    return r.catch(
        row('entities')('urls').any(function(url) {
            r.catch(function_of_expanded_url(url('expanded_url')), false)
        })
    , false)
}).run()

@neumino
Copy link
Member Author

neumino commented Jan 14, 2013

r.catch would be awesome!

@coffeemug
Copy link
Contributor

We don't need r.catch because of changes proposed in #183.

@neumino
Copy link
Member Author

neumino commented Jan 14, 2013

I was thinking about using r.catch with r.error.
It's probably not its first purpose, but if we can throw/catch error, we can break a map when we have what we need.

@wmrowan
Copy link
Contributor

wmrowan commented Jan 14, 2013

Or perhaps we don't need the changes proposed in #183 because we have r.catch? To be honest I really don't understand why we need bottom as distinct from an error. The only difference I can see with bottom is that we can test for it, which is exactly what r.catch does.

@mlucy
Copy link
Member

mlucy commented Jan 15, 2013

From an ease-of-explanation perspective, bottom might be a weird functional thing to a lot of programmers, while basically everyone understands throw/catch-style exceptions.

@al3xandru
Copy link
Contributor

Here is how I'd like to write the original query:

table.filter(function(row) {
   return row('entities')('urls')('*')('expanded_url').eq(value)
})

I'd be OK with using something else instead of ('*') to indicate that the match should be done for each element of an array.


Gave it a bit more thought and instead of ('*') a more generic solution would be to have a function that can either run against all items in an array or against the item with a specified index.

Example:

row('entities')('urls').array_item('*')('expanded_url')

or

row('entities')('urls').array_item(23)('expanded_url')

@jdoliner
Copy link
Contributor

So bottom is more of an implementation detail than anything else. With a bottom type one could easily implement catch as

lambda x,y: branch(typeof(x) == bottom, y, x)

which we could have sugar for. So you'd actually get to say:

catch(x, y)

which would return x unless it errored in which case it would return y.

I don't think there's any reason users will ever have to see the word bottom. They'll see "Evaluation returned an error ..." and they'll get to say catch when they want to handle errors themselves.

To be honest I really don't understand why we need bottom as distinct from an error.

bottom is the type of error. Sorry this wasn't clear from #183.

I'm a bit dubious about having a throw function. The problem with throwing is that people have the expectation that they can use it to abort an imperative process which due to the parallelism of the system is an expectation we can't meet. For example if we had throw I feel like someone might write the code:

table.order_by("foo").update(lambda x: branch(x["foo"] == 20, throw("end"), {"foo" : x["foo"] + 1}))

Seems like a potential way to update the keys in which foo is less than 20 but it won't work. (This query has some other problems with it but hopefully it illustrates my point.

@coffeemug
Copy link
Contributor

BTW, if we had to expose it to the user, we wouldn't expose the word bottom, we'd probably expose undefined, which is much more clear.

@wmrowan
Copy link
Contributor

wmrowan commented Jan 15, 2013

Ahh, I see. I kept thinking, what's the point, we already have errors that propagate like this. The distinction is that bottom is a value propagated up the function composition chain rather than an exception that jumps up the call stack to an error handler. While the behavior is similar it is actually subtly different.

While we don't have throw we already support error which is a function that simply returns bottom. I suppose the challenge is to document r.error and r.catch in a way that does not suggest throw catch behavior. That's why I think the current r.error is a better name than r.throw any why I thought that perhaps r.default was a better name than r.catch.

Rather than saying that bottom is the type returned by r.error and one that has special semantics, why not just use the term error? Again we have to explain how errors propagate but in this sense too I think that the term error conveys the meaning we want. While exception would imply short circuiting behavior, the term error I think invokes the notion of a special return value (as in languages like C that don't support exceptions) which is what we mean to imply.

@coffeemug
Copy link
Contributor

This may or may not be a dup of #198. I'll consider this shortly.

@jdoliner
Copy link
Contributor

I feel our current contains satisfies this bug, although a lot was discussed here so I might be wrong. Any objections to closing this?

@coffeemug
Copy link
Contributor

Cleanup officer to the rescue. (I think we might still be able to do more, but this issue is obsolete at this point -- let's open a new one if this comes up)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants