Implement simple word generator-aggregator model on REEF #1

Closed
DifferentSC opened this issue Sep 21, 2015 · 10 comments

Comments

@DifferentSC
Contributor

As the first step of the MIST project, we need to implement a simple word generator-aggregator stream processing application on REEF. This issue is important for the following two reasons.

  • We can measure the number of streaming queries that can be processed in an environment without ZooKeeper.
  • We can practice development in the REEF environment (especially Tang and Wake) and get used to tools that are useful for building MIST, such as RemoteManager and NetworkService. We can also think about how to provide a simple API that hides some of the complexity of REEF.
@TaeHunKim
Contributor

As a start, I'll implement a simple word counter with one task, as mentioned in issue #4.

After that, I'll try to separate the word generator and aggregator into two or three tasks.
(For this, I need to study how to share data among tasks.)

@DifferentSC
Contributor Author

To separate the word generator and aggregator tasks, I think we should implement the word generator and the aggregator as different REEF jobs and run those jobs on separate machines.

We can use RemoteManager for the network communication. Because RemoteManager provides a low-level socket-based API, we can easily make a connection if we have the addresses of the word generator and aggregator. It handles data using Wake EventHandler, so we can also easily implement the data-receiving logic inside the EventHandler. The main aggregator task thread will sleep all the time, and the EventHandler inside it will wake up and handle the data whenever it comes in. The word generator and aggregator should use the same codec for serialization / deserialization of the data.

Below is the overall structure of the model.

[image: overall structure of the model]
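The design described above can be sketched in plain Java. This is only an illustration under stated assumptions: it uses JDK sockets rather than RemoteManager, and `WordCodec`, `WordHandler`, `CountingHandler`, and `WordPipeline` are hypothetical names made up for this sketch, not REEF or Wake types. It shows the three ideas from the comment: both sides share one codec, the receiving side hands each decoded word to a handler, and the main aggregator thread does nothing but wait.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins, NOT the REEF/Wake types: a codec shared by both
// sides and a callback invoked once per received word.
interface WordCodec {
  byte[] encode(String word);
  String decode(byte[] bytes);
}

interface WordHandler {
  void onNext(String word);
}

class Utf8WordCodec implements WordCodec {
  public byte[] encode(String word) { return word.getBytes(StandardCharsets.UTF_8); }
  public String decode(byte[] bytes) { return new String(bytes, StandardCharsets.UTF_8); }
}

// Aggregator logic: counts every word handed over by the receiving thread.
class CountingHandler implements WordHandler {
  final Map<String, Integer> counts = new ConcurrentHashMap<>();
  public void onNext(String word) { counts.merge(word, 1, Integer::sum); }
}

class WordPipeline {
  // Generator side: sends each word as a length-prefixed frame.
  static void generate(String host, int port, WordCodec codec, String... words) throws IOException {
    try (Socket socket = new Socket(host, port);
         DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
      for (String word : words) {
        byte[] payload = codec.encode(word);
        out.writeInt(payload.length);
        out.write(payload);
      }
    }
  }

  // Aggregator side: accepts one connection and dispatches decoded words
  // to the handler until the generator closes the connection.
  static Thread startAggregator(ServerSocket server, WordCodec codec, WordHandler handler) {
    Thread receiver = new Thread(() -> {
      try (Socket socket = server.accept();
           DataInputStream in = new DataInputStream(socket.getInputStream())) {
        while (true) {
          byte[] payload = new byte[in.readInt()];
          in.readFully(payload);
          handler.onNext(codec.decode(payload));
        }
      } catch (EOFException endOfStream) {
        // The generator closed the connection; nothing more to aggregate.
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    });
    receiver.start();
    return receiver;
  }

  // Runs one generator and one aggregator in this process and returns the counts.
  static Map<String, Integer> runLocally(String... words) {
    WordCodec codec = new Utf8WordCodec();
    CountingHandler handler = new CountingHandler();
    try (ServerSocket server = new ServerSocket(0)) { // port 0: the OS picks a free port
      Thread receiver = startAggregator(server, codec, handler);
      generate("localhost", server.getLocalPort(), codec, words);
      receiver.join(); // the calling thread only waits; the handler does the work
    } catch (IOException | InterruptedException e) {
      throw new IllegalStateException(e);
    }
    return handler.counts;
  }
}
```

For example, `WordPipeline.runLocally("a", "b", "a")` runs both sides over the loopback interface and returns a map of word counts. In the two-machine setup from the comment, `generate` and `startAggregator` would run in separate jobs, and the generator would need the aggregator's host and port.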

@bgchun
Contributor

bgchun commented Sep 25, 2015

@DifferentSC @TaeHunKim How about using NCS instead of RemoteManager?
@dafrista What do you think?

@DifferentSC
Contributor Author

I think NCS could be a better option, because each RM (RemoteManager) in Machine 2 would need a different socket number, and it could be troublesome to manually send data to all the different sockets in Machine 1.
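One way to picture the difference is name-based addressing: receivers register once under a logical name, and senders look them up by name instead of tracking socket numbers by hand. The sketch below assumes that this is the kind of bookkeeping NCS would take over. `NameRegistry` is a hypothetical in-memory class written for this illustration, not part of NCS or REEF, and the host name and ports in the usage note are made up.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry, NOT a REEF class: maps logical receiver names
// to socket addresses.
class NameRegistry {
  private final Map<String, InetSocketAddress> addresses = new ConcurrentHashMap<>();

  // Each receiver registers itself once under a stable name.
  void register(String name, String host, int port) {
    addresses.put(name, InetSocketAddress.createUnresolved(host, port));
  }

  // Senders look receivers up by name instead of hard-coding ports.
  InetSocketAddress lookup(String name) {
    InetSocketAddress address = addresses.get(name);
    if (address == null) {
      throw new IllegalArgumentException("No receiver registered as " + name);
    }
    return address;
  }
}
```

With such a registry, a generator would call something like `registry.lookup("aggregator-1")` and never deal with the individual port of each receiver.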

@DifferentSC
Contributor Author

I made an example of how to use NCS in REEF jobs. The code might be somewhat dirty, but I think it can be of some help in resolving this issue.

https://github.com/DifferentSC/reef-ncs-example

@bgchun
Contributor

bgchun commented Oct 1, 2015

@DifferentSC We don't write dirty code. ;-)

@TaeHunKim
Contributor

Thanks to @DifferentSC, I succeeded in implementing a word counter with one generator and one aggregator.
But I still have some questions and points to discuss about it. Maybe we can talk about them f2f when you have time.
After that, I will refactor this code, make a pull request, and try to improve it with multiple aggregators.

@TaeHunKim
Contributor

I finished the basic experiment. I'll extend the experiment in other issues.

DifferentSC added this to the v0.1 milestone on Dec 16, 2015
@taegeonum
Contributor

@TaeHunKim Is it ok to close this issue? If yes, please close it.

@TaeHunKim
Contributor

@taegeonum I forgot to close it. Thank you. I'm going to close it.
