
extension towards parallel map/reduce that maps eventually #42

Open
ghost opened this issue Dec 30, 2014 · 5 comments

Comments


ghost commented Dec 30, 2014

This is a nice dataflow reactive system.

I was wondering if this project intends to allow parallelization of a single data node in the DAG into, say, 100 DAG nodes of the same type, where each one receives a slice of the data on the bus and the results then converge.

It feels like a logical next step to use the power of CRDTs: you no longer have to conform to strict map/reduce, because the results can join over time.

bacon.js has a similar approach to Swarm (not the same), but has formalized the map/reduce without CRDTs.
https://github.com/baconjs/bacon.js

The two joined would be potent!

Can you have a look and see if you see what I see? I would like to converge the two and put a formal, serializable DSL on top, perhaps with a web IDE.
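The "results can join over time" idea can be made concrete with a CRDT such as a grow-only counter (G-Counter): each parallel node increments its own slot, and merging is an element-wise max, which is commutative and idempotent, so partial results from the 100 nodes converge in any order. A minimal sketch (this is not Swarm's API; all names here are illustrative):

```javascript
// Minimal G-Counter CRDT sketch: one slot per replica, merge = element-wise max.
// The total converges regardless of the order in which partial states arrive.
class GCounter {
  constructor(id) {
    this.id = id;      // this replica's identifier
    this.slots = {};   // replicaId -> count
  }
  increment(n = 1) {
    this.slots[this.id] = (this.slots[this.id] || 0) + n;
  }
  merge(other) {
    for (const [id, n] of Object.entries(other.slots)) {
      this.slots[id] = Math.max(this.slots[id] || 0, n);
    }
  }
  value() {
    return Object.values(this.slots).reduce((a, b) => a + b, 0);
  }
}

// Three "DAG nodes" each process their own slice, then merge in different orders.
const a = new GCounter('a'), b = new GCounter('b'), c = new GCounter('c');
a.increment(3); b.increment(5); c.increment(2);
a.merge(b); a.merge(c);   // one merge order
c.merge(b); c.merge(a);   // a different order
console.log(a.value(), c.value()); // both converge to 10
```

Because merge is a join in the lattice sense, no coordination between the nodes is needed for the final value to be correct.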

gritzko (Owner) commented Dec 30, 2014

Well, this seems too theoretical to me.
Can you illustrate it with some real-life workload scenario?

ghost (Author) commented Dec 30, 2014

Real-life workload: when you need to increase the throughput of a system, you run many instances of it in parallel. This is what Spark and Storm do, in slightly different fundamental ways. So the real-world workload is a systems requirement for faster results.

Reactive frameworks operate as a DAG, where a single node acts as a computational node. But to achieve the real-world requirement we need 100 of them (for example). In order for each of the 100 to do the computation, the data stream is sliced into 100 separate chunks. This is what Cassandra and Spark do using DStreams.

Does this help?
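The slicing step described above can be sketched as plain hash partitioning (illustrative only; this is neither Spark's nor Swarm's actual API): route each record to one of N workers by key, reduce each slice independently, then combine the partial results.

```javascript
// Hash-partition a record stream into N slices; each slice is reduced
// independently (in a real system, on separate nodes), then combined.
const N = 4; // number of parallel "DAG nodes"

// Toy hash: sum of character codes (illustrative only).
const hash = key => [...key].reduce((h, ch) => h + ch.charCodeAt(0), 0);

const records = [
  { key: 'alice', amount: 10 },
  { key: 'bob',   amount: 20 },
  { key: 'carol', amount: 30 },
  { key: 'alice', amount: 5 },
];

// 1. Slice the stream: each record goes to partition hash(key) % N.
const slices = Array.from({ length: N }, () => []);
for (const r of records) slices[hash(r.key) % N].push(r);

// 2. Each node reduces its own slice (these could run in parallel).
const partials = slices.map(slice => slice.reduce((s, r) => s + r.amount, 0));

// 3. Converge: combine the partial results.
const total = partials.reduce((a, b) => a + b, 0);
console.log(total); // 65
```

Partitioning by key (rather than round-robin) keeps all records for one key on the same node, which is what makes per-key aggregation possible without a shuffle step.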

gritzko (Owner) commented Oct 16, 2015

Regarding multi-server/load-distribution scenarios, the only thing currently planned is consistent hashing.
Using CRDTs for large scale computing is definitely out of the project's scope at this stage.
Eventually, Swarm may expand in that direction, but that is too different from what we are doing now.
There are some immediate practical challenges, like distributed counters. Those seem to be resolvable by the current means.

@gritzko gritzko closed this as completed Oct 16, 2015
gritzko (Owner) commented Oct 16, 2015

P.S. The next step in that direction is adding server-side event filters/listeners.
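A server-side filter/listener could look roughly like this (a sketch of the idea only; Swarm's eventual API may differ): each subscription registers a predicate, and the hub evaluates it on the server so only matching events are forwarded to the client.

```javascript
// Sketch of server-side event filtering: subscribers register a predicate,
// and the hub delivers only the events that match it.
class Hub {
  constructor() { this.subs = []; }
  subscribe(filter, listener) { this.subs.push({ filter, listener }); }
  emit(event) {
    for (const { filter, listener } of this.subs) {
      if (filter(event)) listener(event); // filtering happens at the hub
    }
  }
}

const hub = new Hub();
const seen = [];
hub.subscribe(e => e.type === 'counter' && e.delta > 0, e => seen.push(e));

hub.emit({ type: 'counter', delta: 2 });
hub.emit({ type: 'chat', text: 'hi' });   // filtered out
hub.emit({ type: 'counter', delta: -1 }); // filtered out
console.log(seen.length); // 1
```

The win is bandwidth: clients express interest once, and the server never ships events they would discard anyway.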

@gritzko gritzko reopened this Oct 16, 2015
ghost (Author) commented Oct 18, 2015

Looking forward to playing with it and seeing where I can push it.
This will integrate with Lovefield nicely, btw.
