
Proposal: r.return() and r.tunnel() #1986

Open
wojons opened this issue Feb 21, 2014 · 6 comments

wojons (Contributor) commented Feb 21, 2014

This is similar to #1653. The idea behind this is that I may have some very busy database nodes and don't want to put any more stress on them, but the machine issuing the query is totally fine. So I issue the query, run maybe a super-light filter on the results, and then return the results back to the calling node. It can do some crazy stuff from there; maybe there is a complex r.match().

The way this would normally work:

r.table().filter().return().filter(r.match())

Everything after the return() takes place locally, assuming the data was on another node and not this one.
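Neither r.return() nor this remote/local split exists in RethinkDB; as a sketch of the proposed semantics, simulated with plain JavaScript arrays (the documents, field names, and regex here are made up for illustration):

```javascript
// Hypothetical illustration of the proposed r.return() split -- NOT a real
// RethinkDB API. The coarse filter stands in for work done on the node
// holding the data; the regex filter stands in for the "complex r.match()"
// that would run on the node that issued the query.
const docs = [
  { id: 1, status: 'active',   name: 'alpha'  },
  { id: 2, status: 'inactive', name: 'beta'   },
  { id: 3, status: 'active',   name: 'beetle' },
];

// "Remote" stage: everything before return() runs where the data lives.
const remote = docs.filter(d => d.status === 'active');

// "Local" stage: everything after return() runs on the calling node.
const local = remote.filter(d => /^be/.test(d.name));

console.log(local); // [{ id: 3, status: 'active', name: 'beetle' }]
```

The point of the split is that only the already-filtered `remote` set crosses the network, while the expensive match costs CPU on the idle calling node instead of the busy data node.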

There can also be a few other good uses, like:

r.tunnel('lax.*').table().filter().return().filter(r.match())

Depending on how this should be done, it would pick any node in the lax datacenter, have it run the query, and have the results returned to that node; that node would process the data, and the results of that would automatically be sent back to the local machine.

The other option is that it would try to parallelize the query in lax, so if there are a few nodes there, they would each get different documents back and run the final filter. Yes, this is a pretty crazy idea, but also pretty powerful.
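r.tunnel() is likewise hypothetical; its node-selection step could be sketched as a pattern match over node names (the node names and the treatment of 'lax.*' as a prefix pattern are assumptions, not RethinkDB behavior):

```javascript
// Hypothetical sketch of r.tunnel('lax.*') picking a node in the lax
// datacenter -- not a real API. Node names are assumed; 'lax.*' is
// treated as a prefix glob.
const nodes = ['lax.db1', 'lax.db2', 'nyc.db1'];
const candidates = nodes.filter(n => /^lax\./.test(n));

// Pick any matching node (could be round-robin, random, or load-based);
// the parallel variant would instead fan the query out to all candidates.
const target = candidates[0];
console.log(target); // 'lax.db1'
```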

The return function should have a few options; maybe you want to return one step up, or all the way back to the original machine. It depends.

coffeemug (Contributor):

I believe that coerceTo('array') will currently accomplish what you want with return:

r.table().filter().coerceTo('array').filter(r.match())

coffeemug added this to the backlog milestone Feb 24, 2014
wojons (Contributor, Author) commented Feb 24, 2014

@coffeemug From what I understand from @danielmewes, using coerceTo('array') removes all parallelization, and the entire data set has to fit into memory. I'm not sure if it also breaks pipelining: if it's a large data set, it would need to be pulled into one array first, before processing continues on the parent node. It can get the same results I want from above, depending on the data set size and other factors, but it's not as flexible as what I am proposing.
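The memory and pipelining concern can be illustrated with plain JavaScript generators (this is an analogy for the behavior described in the discussion, not RethinkDB internals): a lazy stream hands documents to the next stage one at a time, while an eager coerceTo('array')-style step must materialize everything before the next stage can start.

```javascript
// Lazy pipeline: each stage pulls one document at a time, so the full
// set never has to fit in memory and downstream stages start immediately.
function* makeDocs(n) {
  for (let i = 0; i < n; i++) yield { id: i };
}

function* lazyFilter(iter, pred) {
  for (const doc of iter) if (pred(doc)) yield doc;
}

// The first matching document arrives without generating the rest.
const first = lazyFilter(makeDocs(1000000), d => d.id % 2 === 0).next().value;
console.log(first); // { id: 0 }

// Eager analogue of coerceTo('array'): this would build a million-element
// array before the filter could even begin.
// const all = [...makeDocs(1000000)].filter(d => d.id % 2 === 0);
```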

coffeemug (Contributor):

Ah, I see. I'll have to mull this over for a bit. We purposely made data flow in the cluster completely opaque; making some of it controllable is an intriguing idea.

wojons (Contributor, Author) commented Feb 24, 2014

@coffeemug I know every user may use RethinkDB differently from your original vision. I use it as a database, but also as an analytics engine/framework. As we all know, PHP is not the language for heavy data processing, which is why PHP was able to hold its own when paired with MySQL: you had MySQL do all that data processing for you. With all the new ways the web works, and terms like "webscale", MySQL has become a less popular way to scale your PHP. I personally install RethinkDB on all my application nodes in my cluster and have them connect locally. I even push processing that is not built into the PHP framework to the local database, sometimes going out to another server where the data lives, sometimes using it to sanitize a user's input with things like:

r.json().map()
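A plain-JavaScript analogue of that pattern (the real r.json().map() runs server-side in ReQL; the input string and the sanitization rule here are made-up examples):

```javascript
// Parse untrusted JSON input and map a sanitizer over each element,
// mirroring the r.json().map() pattern. Trimming whitespace and stripping
// angle brackets is just an illustrative sanitization rule.
const userInput = '[{"name":" Alice "},{"name":"Bob<script>"}]';
const sanitized = JSON.parse(userInput).map(d => ({
  name: d.name.trim().replace(/[<>]/g, ''),
}));
console.log(sanitized); // [{ name: 'Alice' }, { name: 'Bobscript' }]
```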

coffeemug (Contributor):

@wojons -- that's awesome; that's actually really helpful. RethinkDB can act as a general-purpose distributed computation engine, but it is missing a few control primitives for that. We'd have to think through how to properly add these, but the possibilities are pretty damn cool.

(Also, it's a matter of timing/marketing/etc., which can be surprisingly nuanced.)

wojons (Contributor, Author) commented Feb 24, 2014

@coffeemug Yeah, exactly; I understand it needs to be timed right and so on. "Distributed computation engine" is a really good way to explain it; it's the core of my application. Rewriting the application would not take long in any language, since most of the important stuff is in RethinkDB. I know it's an out-there feature and some things need to happen before then; until then, I can wait.
