Recovery #37
@@ -1,4 +1,4 @@
-soup-log-*.json
+*-log-*.json
 *.png
 *.log
 plotting/*.png
@@ -5,6 +5,8 @@ use std::thread;
 use std::time;
 use std::collections::hash_map::Entry;
 use std::rc::Rc;
+use std::io::{BufRead, BufReader, ErrorKind};
+use std::fs::File;
 
 use std::net::SocketAddr;

@@ -18,6 +20,8 @@ use transactions;
 use persistence;
 use debug;
 use checktable;
+use serde_json;
+use itertools::Itertools;
 use slog::Logger;
 use timekeeper::{RealTime, SimpleTracker, ThreadTime, Timer, TimerSet};
 use tarpc::sync::client::{self, ClientExt};

@@ -30,6 +34,7 @@ pub struct Config {
 }
 
 const BATCH_SIZE: usize = 256;
+const RECOVERY_BATCH_SIZE: usize = 512;
 
 const NANOS_PER_SEC: u64 = 1_000_000_000;
 macro_rules! dur_to_ns {

@@ -803,6 +808,9 @@ impl Domain {
             Packet::ReplayPiece { .. } => {
                 self.handle_replay(m);
             }
+            Packet::StartRecovery { .. } => {
+                self.handle_recovery();
+            }
             consumed => {
                 match consumed {
                     // workaround #16223
@@ -1566,6 +1574,81 @@ impl Domain {
         }
     }
 
+    fn handle_recovery(&mut self) {
+        let checktable = self.transaction_state.get_checktable();
+        let node_info: Vec<_> = self.nodes
+            .iter()
+            .map(|(index, node)| {
+                let n = node.borrow();
+                (index, n.global_addr(), n.is_transactional())
+            })
+            .collect();
+
+        for (local_addr, global_addr, is_transactional) in node_info {
+            let path = self.persistence_parameters.log_path(
+                &local_addr,
+                self.index,
+                self.shard.unwrap_or(0),
+            );
+
+            let file = match File::open(&path) {
+                Ok(f) => f,
+                Err(ref e) if e.kind() == ErrorKind::NotFound => {
+                    warn!(
+                        self.log,
+                        "No log file found for node {}, starting out empty",
+                        local_addr
+                    );
+
+                    continue;
+                }
+                Err(e) => panic!("Could not open log file {:?}: {}", path, e),
+            };
+
+            BufReader::new(file)
+                .lines()
+                .filter_map(|line| {
+                    let line = line
+                        .expect(&format!("Failed to read line from log file: {:?}", path));
+                    let entries: Result<Vec<Records>, _> = serde_json::from_str(&line);
+                    entries.ok()
+                })
+                // Parsing each individual line gives us an iterator over Vec<Records>.
+                // We're interested in chunking each record, so let's flat_map twice:
+                // Iter<Vec<Records>> -> Iter<Records> -> Iter<Record>
+                .flat_map(|r| r)
+                .flat_map(|r| r)
+                // Merge individual records into batches of RECOVERY_BATCH_SIZE:
+                .chunks(RECOVERY_BATCH_SIZE)
+                .into_iter()
+                // Then create Packet objects from the data:
+                .map(|chunk| {
+                    let data: Records = chunk.collect();
+                    let link = Link::new(local_addr, local_addr);
+                    if is_transactional {
+                        let (ts, prevs) = checktable.recover(global_addr).unwrap();

Review comment: Hmm.. Shouldn't it be possible to just claim a single timestamp for the entire recovery (across all base nodes)? @fintelia may be able to shed some light.

Reply: This is a limitation of our transaction logic. Every timestamped message must originate from only a single base node. Further, every message from a base must also have its own distinct timestamp (which is why there can't be one timestamp per base either).

+                        Packet::Transaction {
+                            link,
+                            data,
+                            tracer: None,
+                            state: TransactionState::Committed(ts, global_addr, prevs),
+                        }
+                    } else {
+                        Packet::Message {
+                            link,
+                            data,
+                            tracer: None,
+                        }
+                    }
+                })
+                .for_each(|packet| self.handle(box packet));

Review comment: I'm pretty sure this could deadlock if you had an A-B-A domain assignment. But we already disallow that elsewhere, so probably not an issue.

+        }
+
+        self.control_reply_tx
+            .send(ControlReplyPacket::ack())
+            .unwrap();
+    }
+
     fn handle_replay(&mut self, m: Box<Packet>) {
         let tag = m.tag().unwrap();
         let mut finished = None;
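The handler above deliberately distinguishes a missing log file (the node simply has no durable state yet, so recovery starts from empty) from any other I/O error (which is fatal). A minimal std-only sketch of that pattern, with a hypothetical path and no Soup types:

```rust
use std::fs::File;
use std::io::ErrorKind;

// Open a node's recovery log, treating "file not found" as "no log yet":
// the caller should then recover to an empty state rather than fail.
fn open_log(path: &str) -> Option<File> {
    match File::open(path) {
        Ok(f) => Some(f),
        // Expected on a fresh node: no durable state has been written yet.
        Err(ref e) if e.kind() == ErrorKind::NotFound => None,
        // Anything else (permissions, I/O failure) is unrecoverable here.
        Err(e) => panic!("Could not open log file {:?}: {}", path, e),
    }
}

fn main() {
    // A path that should not exist; recovery would proceed with an empty log.
    assert!(open_log("/nonexistent/soup-recovery-demo.json").is_none());
    println!("missing log treated as empty");
}
```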
Review comment (on RECOVERY_BATCH_SIZE): Why not larger? We know nothing else is happening in the graph until it has recovered anyway.

Reply: @fintelia suggested 512 - I'm not completely sure what the trade-offs are here. What do you think would be a good number?
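The core of the replay pipeline in this PR is the double flat_map followed by regrouping into fixed-size batches (the real code uses itertools' chunks adaptor on parsed Records). A std-only sketch of that reshaping, with i32 standing in for Record, a manual loop in place of chunks, and a small batch size for illustration:

```rust
// Each log line parses to a Vec of record batches; flatten twice, then
// regroup the individual records into batches of RECOVERY_BATCH_SIZE.
const RECOVERY_BATCH_SIZE: usize = 3;

fn rebatch(lines: Vec<Vec<Vec<i32>>>) -> Vec<Vec<i32>> {
    let mut batches: Vec<Vec<i32>> = Vec::new();
    lines
        .into_iter()
        // Iter<Vec<Vec<i32>>> -> Iter<Vec<i32>> -> Iter<i32>
        .flat_map(|r| r)
        .flat_map(|r| r)
        .for_each(|record| {
            // Start a new batch when the current one is full (or none exists).
            if batches.last().map_or(true, |b| b.len() >= RECOVERY_BATCH_SIZE) {
                batches.push(Vec::new());
            }
            batches.last_mut().unwrap().push(record);
        });
    batches
}

fn main() {
    // Two "log lines", each holding a list of record batches.
    let lines = vec![vec![vec![1, 2], vec![3]], vec![vec![4, 5]]];
    let batches = rebatch(lines);
    // Five records regrouped into batches of at most three.
    assert_eq!(batches, vec![vec![1, 2, 3], vec![4, 5]]);
    println!("{:?}", batches);
}
```

In the real handler each resulting batch becomes one Packet (Transaction or Message), so the batch size bounds how much data a single recovery packet carries.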