-
Notifications
You must be signed in to change notification settings - Fork 26
Another piece of the reconciliation puzzle #257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This covers the next piece of reconciliation where we actually copy extent files from one downstairs to another. Upstairs: Update the repair logic to now do more than just close and then re-open extents. The upstairs will now build the complete repair steps needed to repair an extent. This includes the logic about which downstairs should receive which repair commands. Protocol: New ExtentFlush to flush a specific extent. New ExtentRepair command for repairing a specific extent. Replaced a few specific ACKs with a generic repair ACK. Downstairs: Made repair mode always on, and starting at port + REPAIR_PORT_OFFSET. New repair-api command that can dump a json file for the repair-client API Added the logic to support the new repair messages. Downstairs region: The downstairs coordination of a repair live in this file New functions to get the names for the new repair directories and a bunch of tests around them. We can now get names for specific parts of the extent, the housing directory, the copy, replace, and completed directories, as well as provide an optional file extension to the file name. During repair, we check for and remove any copy or completed directories. If there is a replay directory, then the extent open will do the replay and replace existing extent files, only if the open is read/write. New command to just flush a specific extent. Downstairs repair: This new file is the dropshot server for extent repair requests. It listens on the specified port and serves (anyone currently) the list of files for an extent, or one of the actual extent files. Repair-client: Generate a library using progenitor that is used by a downstairs to request repair extents from another downstairs. This library generation uses the json file that is output from the downstairs repair-api command. Tests: More tests for repair More tests for building repair work list. A New test that will in a loop: Create a workload that will produce downstairs out of sync Restart and repair the out of sync downstairs. Improved usage messages for create-generic-ds.sh
jmpesp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
Added a new function for validating repair files. Other small updates based on PR comments.
|
👍 re-reviewed, 🚤 |
ahl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Please forgive my chattiness, but I haven't been in this code and it's all very cool.
My only lingering question is why the repair API is on a different port rather than being part of the downstairs API? Is there a big theory statement or something that could clarify?
downstairs/src/region.rs
Outdated
| * Produce the string name of the data file for a given extent number | ||
| */ | ||
| fn extent_path<P: AsRef<Path>>(dir: P, number: u32) -> PathBuf { | ||
| pub fn extent_file_name(number: u32, extension: Option<&str>) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should extension be an Option<&str> or might it make sense to have this be some other enum to limit the set of inputs we might see here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, an enum here is probably the way to do it. I refactored this part a bunch of times and this seems like a good idea. Of course, once I did this, then a bunch of other places then became obvious to update as well.
downstairs/src/repair.rs
Outdated
| */ | ||
| #[endpoint { | ||
| method = GET, | ||
| path = "/extent/{eid}/data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I reading this right that these 4 endpoints are basically the same? What would you think of something like this:
#[derive(Deserialize, JsonSchema)]
pub enum FileType {
#[serde(rename = "data")
Data,
#[serde(rename = "db")
Database,
#[serde(rename = "db-shm")
DatabaseSharedMemory, // ??
#[serde(rename = "db-wal")
DatabaseLog,
}
// ...
struct FileSpec {
eid: u32,
file_type: FileType,
}
#[endpoint {
method = GET,
path = "/extent/{eid}/{file_type}",
}]
// ...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's simpler. I've updated it a bit and then updated a bit more now that their is an enum for ExtentExtension. I can probably polish it a bit more, but at some point I want to get this back under test.
downstairs/src/repair.rs
Outdated
| #[derive(Deserialize, Serialize, JsonSchema)] | ||
| struct ExtentFiles { | ||
| files: Vec<String>, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's fine either way, but you could (I think??) just return Vec<String> directly rather than wrapping them in a struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, simpler, done.
downstairs/src/repair.rs
Outdated
| // The db file should always exist. | ||
| full_name.set_extension("db"); | ||
| if !full_name.exists() { | ||
| return Err(HttpError::for_bad_request(None, "EBADF".to_string())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we distinguish a client error (i.e. bogus eid) vs server error (i.e. internal inconsistency)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't. And it can be difficult to know if we really received a bogus eid, or if the file that should exist for a given eid is missing. Right now the extent_file_name just transforms the name without actually checking if that extent is valid in the given region.
For the dropshot client (the downstairs requesting a file), they don't really care why there is a failure, as there is not much they can do about it and the repair will have to abort. It would be nice to know from a debug point of view though, so whomever is cleaning up the mess here will at least have some breadcrumbs as to why.
The one situation where the downstairs client could do something different is if the server either never responded and the client hit a timeout and can retry, or the server responded with some sort of "too busy, try again later" (not even sure if dropshot will do this).
downstairs/src/repair.rs
Outdated
| let mut full_name = extent_dir; | ||
|
|
||
| let extent_name = extent_file_name(eid, None); | ||
| full_name.push(extent_name.clone()); | ||
| let mut files = Vec::new(); | ||
| // The data file should always exist | ||
| if !full_name.exists() { | ||
| return Err(HttpError::for_bad_request(None, "EBADF".to_string())); | ||
| } | ||
| files.push(extent_name); | ||
| // The db file should always exist. | ||
| full_name.set_extension("db"); | ||
| if !full_name.exists() { | ||
| return Err(HttpError::for_bad_request(None, "EBADF".to_string())); | ||
| } | ||
| files.push(extent_file_name(eid, Some("db"))); | ||
|
|
||
| // The db-shm file may exist. | ||
| full_name.set_extension("db-shm"); | ||
| if full_name.exists() { | ||
| println!("Exists: {:?}", full_name.file_name().unwrap()); | ||
| files.push(extent_file_name(eid, Some("db-shm"))); | ||
| } | ||
| // The db-wal file may exist. | ||
| full_name.set_extension("db-wal"); | ||
| if full_name.exists() { | ||
| files.push(extent_file_name(eid, Some("db-wal"))); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let mut full_name = extent_dir; | |
| let extent_name = extent_file_name(eid, None); | |
| full_name.push(extent_name.clone()); | |
| let mut files = Vec::new(); | |
| // The data file should always exist | |
| if !full_name.exists() { | |
| return Err(HttpError::for_bad_request(None, "EBADF".to_string())); | |
| } | |
| files.push(extent_name); | |
| // The db file should always exist. | |
| full_name.set_extension("db"); | |
| if !full_name.exists() { | |
| return Err(HttpError::for_bad_request(None, "EBADF".to_string())); | |
| } | |
| files.push(extent_file_name(eid, Some("db"))); | |
| // The db-shm file may exist. | |
| full_name.set_extension("db-shm"); | |
| if full_name.exists() { | |
| println!("Exists: {:?}", full_name.file_name().unwrap()); | |
| files.push(extent_file_name(eid, Some("db-shm"))); | |
| } | |
| // The db-wal file may exist. | |
| full_name.set_extension("db-wal"); | |
| if full_name.exists() { | |
| files.push(extent_file_name(eid, Some("db-wal"))); | |
| } | |
| let files = &[ | |
| (extent_file_name(eid, None), true), | |
| (extent_file_name(eid, Some("db")), true), | |
| (extent_file_name(eid, Some("db-shm"), false), | |
| (extent_file_name(eid, Some("db-wal"), false), | |
| ].into_iter() | |
| .filter_map(|(file, required)| { | |
| if file.exists() { | |
| Some(Ok(file)) | |
| } else if required { | |
| Some(Err(HttpError::for_bad_request(None, "EBADF".to_string()))) | |
| } else { | |
| None | |
| } | |
| }) | |
| .collect::<Result<_>>()?; |
I don't think this is perfect, but a sketch for your consideration. It may be more confusing than simplifying, but I wanted to try it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried a bunch of ways to get this to work, but I could not come up with one using the
interator like you did here. I did get something that I think is a little better than what was there. Let me know your thoughts on what I cobbled together.
I also added some additional tests that I should have had originally.
Refactor the openapi a bit, less code duplication.
leftwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed a bunch of @ahl comments.
downstairs/src/region.rs
Outdated
| * Produce the string name of the data file for a given extent number | ||
| */ | ||
| fn extent_path<P: AsRef<Path>>(dir: P, number: u32) -> PathBuf { | ||
| pub fn extent_file_name(number: u32, extension: Option<&str>) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, an enum here is probably the way to do it. I refactored this part a bunch of times and this seems like a good idea. Of course, once I did this, then a bunch of other places then became obvious to update as well.
downstairs/src/repair.rs
Outdated
| // The db file should always exist. | ||
| full_name.set_extension("db"); | ||
| if !full_name.exists() { | ||
| return Err(HttpError::for_bad_request(None, "EBADF".to_string())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't. And it can be difficult to know if we really received a bogus eid, or if the file that should exist for a given eid is missing. Right now the extent_file_name just transforms the name without actually checking if that extent is valid in the given region.
For the dropshot client (the downstairs requesting a file), they don't really care why there is a failure, as there is not much they can do about it and the repair will have to abort. It would be nice to know from a debug point of view though, so whomever is cleaning up the mess here will at least have some breadcrumbs as to why.
The one situation where the downstairs client could do something different is if the server either never responded and the client hit a timeout and can retry, or the server responded with some sort of "too busy, try again later" (not even sure if dropshot will do this).
downstairs/src/repair.rs
Outdated
| let mut full_name = extent_dir; | ||
|
|
||
| let extent_name = extent_file_name(eid, None); | ||
| full_name.push(extent_name.clone()); | ||
| let mut files = Vec::new(); | ||
| // The data file should always exist | ||
| if !full_name.exists() { | ||
| return Err(HttpError::for_bad_request(None, "EBADF".to_string())); | ||
| } | ||
| files.push(extent_name); | ||
| // The db file should always exist. | ||
| full_name.set_extension("db"); | ||
| if !full_name.exists() { | ||
| return Err(HttpError::for_bad_request(None, "EBADF".to_string())); | ||
| } | ||
| files.push(extent_file_name(eid, Some("db"))); | ||
|
|
||
| // The db-shm file may exist. | ||
| full_name.set_extension("db-shm"); | ||
| if full_name.exists() { | ||
| println!("Exists: {:?}", full_name.file_name().unwrap()); | ||
| files.push(extent_file_name(eid, Some("db-shm"))); | ||
| } | ||
| // The db-wal file may exist. | ||
| full_name.set_extension("db-wal"); | ||
| if full_name.exists() { | ||
| files.push(extent_file_name(eid, Some("db-wal"))); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried a bunch of ways to get this to work, but I could not come up with one using the
interator like you did here. I did get something that I think is a little better than what was there. Let me know your thoughts on what I cobbled together.
I also added some additional tests that I should have had originally.
downstairs/src/repair.rs
Outdated
| */ | ||
| #[endpoint { | ||
| method = GET, | ||
| path = "/extent/{eid}/data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's simpler. I've updated it a bit and then updated a bit more now that their is an enum for ExtentExtension. I can probably polish it a bit more, but at some point I want to get this back under test.
ahl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great!
| #[endpoint { | ||
| method = GET, | ||
| path = "/extent/{eid}/db-shm", | ||
| unpublished = false, | ||
| path = "/newextent/{eid}/{fileType}", | ||
| }] | ||
| async fn get_shm( | ||
| async fn get_extent_file( | ||
| rqctx: Arc<RequestContext<FileServerContext>>, | ||
| path: Path<Eid>, | ||
| path: Path<FileSpec>, | ||
| ) -> Result<Response<Body>, HttpError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems better; agreed?
downstairs/src/region.rs
Outdated
| &self, | ||
| mut copy_dir: PathBuf, | ||
| extension: Option<&str>, | ||
| extension: Option<ExtentExtension>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively you could have the "no extension" variant in ExtentExtension or maybe just use FileType directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a "data" and changed it to be called "ExtentType" which I think fits better with what the need is for this file.
openapi/downstairs-repair.json
Outdated
| "summary": "For a given extent, return a vec of strings representing the names of", | ||
| "description": "the files that exist for that extent.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just note that the first line is presumed to be the terse summary. It's not a big deal for this kind of internal API, but you might want to change the comment to something like:
/**
* Get the list of files related to an extent.
*
* For a given extent, return a set of strings representing the names of
* the files that exist for that extent.
*/There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update that
|
Just realized I did not answer this. Let me figure out where I should put that as a comment somewhere so others can find it. I have lots of smaller comments on specifics, but nothing that ties them all together |
Added big picture comment on repair process.
leftwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have all of @ahl 's comments addressed now.
openapi/downstairs-repair.json
Outdated
| "summary": "For a given extent, return a vec of strings representing the names of", | ||
| "description": "the files that exist for that extent.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update that
downstairs/src/repair.rs
Outdated
| #[derive(Deserialize, Serialize, JsonSchema)] | ||
| struct ExtentFiles { | ||
| files: Vec<String>, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, simpler, done.
downstairs/src/region.rs
Outdated
| &self, | ||
| mut copy_dir: PathBuf, | ||
| extension: Option<&str>, | ||
| extension: Option<ExtentExtension>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a "data" and changed it to be called "ExtentType" which I think fits better with what the need is for this file.
This covers the next piece of reconciliation where we transfer extent files from
one downstairs to another using a dropshot server and generating the API
to stream the extent files.
This builds on several previous parts that did the work of identifying mismatches
between downstairs and creating a solve list.
This piece executes the solve, and results in all three downstairs being the same
on completion of the solving. This also enables a basic functionality that should
work when no outside errors are encountered (like a downstairs reset during
solve).
To get an idea of what happens when a downstairs does an extent repair, I have
copied this block comment from
downstairs/src/region.rs:Testing: 7000 or so loops of: make a downstairs different, then repair it.
Upstairs:
Update the repair logic to now do more than just close and then
re-open extents. The upstairs will now build the complete repair
steps needed to repair an extent. This includes the logic about
which downstairs should receive which repair commands.
Protocol:
New ExtentFlush to flush a specific extent.
New ExtentRepair command for repairing a specific extent.
Replaced a few specific ACKs with a generic repair ACK.
Downstairs:
Made the repair server always on, and starting at port + REPAIR_PORT_OFFSET.
New repair-api command that can dump a json file for the repair-client API
Added the logic to support the new repair messages.
Downstairs region:
The downstairs logic for execution of a repair lives in this file.
Created new functions to get the names for the new repair directories and
a bunch of tests around them. We can now get names for specific
parts of the extent, the housing directory, the copy, replace, and
completed directories, as well as provide an optional file extension
to the file name.
During repair, we check for and remove any copy or completed directories.
If there is a replay directory, then the extent open will do the
replay and replace existing extent files, only if the open is read/write.
New command to just flush a specific extent.
Downstairs repair:
This new file is the dropshot server for extent repair requests.
It listens on the specified port and serves (anyone currently) the
list of files for an extent, or one of the actual extent files.
Repair-client:
Generate a library using progenitor that is used by a downstairs to
request repair extents from another downstairs. This library generation
uses the json file that is output from the downstairs repair-api command.
Tests:
More tests for repair
More tests for building repair work list.
A New test that will in a loop:
Create a workload that will produce downstairs out of sync
Restart and repair the out of sync downstairs.
Improved usage messages for create-generic-ds.sh
What work is still outstanding with this:
The following are not part of this commit, but will be addressed in a follow up
commit and issues will be made for these. I felt this PR was big enough as is, and
the work here does allow more testing to start.
TBD: How we handle TLS and keys for ds->ds communication.