Add LogAware interface, to expose the log #33

Closed: 1 commit proposed for merge into base: master

@justinsb
Collaborator

justinsb commented Dec 2, 2013

If the StateMachine implements LogAware, it will be passed the RaftLog instance
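
A minimal sketch of how such an opt-in interface could be wired, for readers who have not opened the diff. The `RaftLog` and `StateMachine` types here are simplified stand-ins, and `setLog`, `LogInjector`, and `AppendLogStateMachine` are hypothetical names chosen for illustration; the actual signatures in the commit may differ.

```java
import java.nio.ByteBuffer;

// Simplified stand-ins for Barge's types (signatures are assumptions).
interface RaftLog {
    long lastLogIndex();
}

interface StateMachine {
    Object applyOperation(ByteBuffer entry);
}

// Opt-in interface in the spirit of this PR; the method name is hypothetical.
interface LogAware {
    void setLog(RaftLog log);
}

// Hypothetical startup wiring: hand over the log only to state machines that ask for it.
final class LogInjector {
    static void wire(StateMachine stateMachine, RaftLog log) {
        if (stateMachine instanceof LogAware) {
            ((LogAware) stateMachine).setLog(log);
        }
    }
}

// Example state machine that opts in by implementing both interfaces.
final class AppendLogStateMachine implements StateMachine, LogAware {
    private RaftLog log;

    @Override
    public void setLog(RaftLog log) {
        this.log = log;
    }

    @Override
    public Object applyOperation(ByteBuffer entry) {
        return null; // nothing to apply in this sketch
    }

    RaftLog log() {
        return log;
    }
}
```

State machines that do not implement `LogAware` are left untouched by the wiring step, so existing implementations would keep working unchanged.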


justinsb referenced this pull request Dec 2, 2013: Client example? #32 (Closed)

justinsb Dec 2, 2013

Collaborator

My first project using Barge is to create a simple appendlog service, basically exposing Raft as a service. In order to do that, I need access to the log itself from the StateMachine.

I implemented a very simple prototype by hacking up the BargeD example code; you can see what I'm thinking here: https://github.com/justinsb/appendlog/blob/master/src/main/java/com/cloudata/appendlog/Database.java

mgodave Dec 2, 2013

Owner

I'm not completely sure why you need access to the log from the state machine. By definition, the state machine receives all committed log entries, as they are committed. Therefore, you can read and serve requests from the accumulated state. It seems like, in this case, you may just accumulate the entries as they arrive and serve them out. Exposing the underlying log (and thus the uncommitted entries) is extremely dangerous and gives the programmer free rein to violate the safety properties of Raft. I think what you are trying to do can be accomplished more succinctly by just accumulating the entries as they arrive and placing a public interface on your state machine. Let me get a better example together and post the gists here later.
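
A rough sketch of the accumulate-and-serve approach described above; this is not the promised gist. The `StateMachine` interface is a simplified stand-in for Barge's, and `AccumulatingStateMachine`, `readFrom`, and `size` are hypothetical names. Only committed entries ever reach `applyOperation`, so readers of this class never see uncommitted data.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Barge's StateMachine (signature is an assumption).
interface StateMachine {
    Object applyOperation(ByteBuffer entry);
}

// Keeps every committed entry in memory and serves reads from that copy.
final class AccumulatingStateMachine implements StateMachine {
    private final List<byte[]> committed = new ArrayList<>();

    // Called once per committed entry; returns the entry's offset in the accumulated list.
    @Override
    public synchronized Object applyOperation(ByteBuffer entry) {
        byte[] copy = new byte[entry.remaining()];
        entry.duplicate().get(copy); // duplicate() leaves the caller's buffer position alone
        committed.add(copy);
        return Long.valueOf(committed.size() - 1);
    }

    // Public read interface: up to max entries starting at offset.
    public synchronized List<byte[]> readFrom(int offset, int max) {
        int start = Math.max(0, offset);
        int end = Math.min(committed.size(), start + Math.max(0, max));
        List<byte[]> out = new ArrayList<>();
        for (int i = start; i < end; i++) {
            out.add(committed.get(i).clone());
        }
        return out;
    }

    public synchronized int size() {
        return committed.size();
    }
}
```

The cost is the one raised later in the thread: each entry is stored twice, once in the Raft log and once in the state machine.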

justinsb Dec 2, 2013

Collaborator

I'm working towards something resembling Amazon's Kinesis, where the log is itself the state. It seems inefficient to duplicate storage of the entries, when they're already in the logs... I agree that it's definitely possible to shoot yourself in the foot e.g. by reading uncommitted entries.

When we're building e.g. a key-value store on top, then we don't want to expose the log. But I think a log-as-a-service is a special case? Perhaps I haven't fully grokked the design yet...

justinsb Dec 2, 2013

Collaborator

Actually, on second thoughts / readings, I think I do see the problem you're describing... because entries can be re-written in the Raft protocol. So this "works" for today's DefaultRaftLog, but won't work well when we implement compaction etc. Let's hold off on merging!

@mgodave mgodave closed this Dec 3, 2013

mgodave Dec 3, 2013

Owner

I'm going to close for now.

justinsb Dec 3, 2013

Collaborator

I agree even more now: it's better not to expose the log, even for the AppendLog-aaS case. I've been prototyping an approach which copies from the Raft log to a set of log files (with a file format similar to Kafka): justinsb/appendlog@d3789e2

The downside is that everything is written twice (once to Raft, once to the snapshots). But offsetting that, we don't need fsyncs on the snapshots, only for the Raft log. The Raft log fsyncs can thus be amortized across concurrent updates, if we let multiple data structures share a Raft log (e.g. multiple appendlogs in a multitenant server).
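
To illustrate the amortization idea, here is a small sketch in which entries from several tenants are written to one shared file and made durable with a single fsync per batch. `SharedLog`, `appendBatch`, and `syncCount` are hypothetical names for illustration and are not taken from Barge or from the appendlog prototype; a real log would also frame each entry (length, checksum), which is omitted here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// One durable log shared by many data structures (hypothetical sketch).
final class SharedLog implements AutoCloseable {
    private final FileChannel channel;
    private long syncCount = 0;

    SharedLog(Path path) throws IOException {
        channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    // Writes every pending entry, then pays for one fsync covering the whole batch.
    synchronized void appendBatch(List<byte[]> pending) throws IOException {
        for (byte[] entry : pending) {
            ByteBuffer buf = ByteBuffer.wrap(entry);
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
        }
        channel.force(false); // flush file contents to disk; metadata is not forced
        syncCount++;
    }

    synchronized long syncCount() {
        return syncCount;
    }

    @Override
    public synchronized void close() throws IOException {
        channel.close();
    }
}
```

With three tenants each contributing one entry to a batch, the three updates share one `force` call instead of paying for three.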

I think this same argument will apply to other data structures, although the appendlog was the most obviously painful.

I'm looking forward to snapshots & log compaction - let me know if I can help on that in some way! (We'll have to be very careful with non-idempotent actions on log replay.)
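
One common way to guard a non-idempotent action such as an append against replay is to record the index of the last applied entry alongside the state and skip anything at or below it. The sketch below assumes the state machine is told each entry's Raft index, which may not match Barge's current interface; `ReplaySafeAppendLog` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Append-only state that tolerates replay of entries it has already applied.
final class ReplaySafeAppendLog {
    private final List<String> records = new ArrayList<>();
    private long lastAppliedIndex = 0; // Raft log indices start at 1

    // Applies a committed entry; returns false if it was already applied (a replay).
    synchronized boolean apply(long raftIndex, String record) {
        if (raftIndex <= lastAppliedIndex) {
            return false; // already reflected in the state, so appending again would duplicate it
        }
        records.add(record);
        lastAppliedIndex = raftIndex;
        return true;
    }

    synchronized List<String> records() {
        return new ArrayList<>(records);
    }

    synchronized long lastAppliedIndex() {
        return lastAppliedIndex;
    }
}
```

For this to survive a restart, `lastAppliedIndex` would need to be persisted together with the records (for example in the same snapshot), otherwise the guard is lost along with the in-memory state.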
