Bugfix/cello/io #105
It seems that some tests are testing that I am hunting down those tests and will add them here for review. |
Indeed. That makes sense. We should also add @jobordner to the review in case there was a reason he set it up like this. |
The file
That script is by @mabruzzo . I suspect he did that because that is what the code was doing. @mabruzzo , do you have any insight into why |
All of the VL+CT tests (including the ones using I have encountered the bug that you are talking about before. I recently found that there is an (undocumented?) parameter for the In the past I thought about trying to address this problem by modifying the I just wanted to express a few related minor concerns. Imagine we were running a simulation distributed over N processors and we used the output formatting options for our data dumps:
Under the current system, the outputs for cycle 1 are organized as follows:
Under the proposed changes, it's not so obvious to me what the metadata filename would look like; it's not obvious to me what value the "proc" variable will have while determining the filename (is it always 0?). Additionally, is there machinery in place to prevent each process from trying to write their own copies of the metadata files? (I would assume the answer is yes) These are both definitely addressable (@jobordner probably could answer these questions). My last concern is more subjective: it seems like it would take more work to "discover" the block_list files under the new system (assuming you don't know it ahead of time). However, my opinion is probably informed by the fact that I'm accustomed to the current system. So, if people think this is better, I'd be all for it. Plus, the new scalable IO might change things anyway. |
I modified how the tests are looking for the block_list files, so now the tests are passing. @mabruzzo, please review the test changes, as I believe you wrote that code. In my scenario, I was given the following block_list files:
Each processor is printing its own block_list. The content of each block_list follows the same pattern, giving the name of the matching .h5 file. For example, cycle 0, processor 1 has:
I would expect block_list issues with multiple processors using the |
Would we not want the
|
I suspect that the individual processors might take turns editing the .block_list file to record the information about which blocks they just wrote to disk. If that is indeed the case, then you would need to make sure that all of the individual processors agree about the name of the .block_list file. @jobordner will know more. |
I feel that what I have is a good fix for parameters and libconfig, but a bad fix for block_list. I am working on an "agreed" name for the block_list file that all the processors can use. |
…fig, block_list, and file_list.
Problem B
If the
Cause
Solution
An alternative solution could be to hard-code the output file names for the 4 "root" files, e.g.
Result |
My solution to Problem B touches more of the code than I wished (e.g. |
Since you're asking for different ideas, I would like to gently re-express my support for a solution similar to the following snippet. But again, my preference comes from the fact that I'm already accustomed to the current system (and that's not necessarily a good reason for adopting this solution).
// sanitized_name_dir is identical to name_dir, but any trailing `/`s have been removed
std::string sanitized_name_dir = sanitize_dir_path(name_dir);
if (sanitized_name_dir == "") {
// output block list and parameters to work directory
name_dir = ".";
// strip extension, use this for name
name_file = name_out_file.substr(0, name_out_file.rfind("."));
} else {
// output block list and parameters to subdirectory. The basename of the subdirectory
// serves as the prefix for the names of the block list and parameters files
std::size_t basename_start = sanitized_name_dir.rfind('/');
if (basename_start == std::string::npos){ // no '/' was found
name_file = sanitized_name_dir;
} else {
// skip the '/' itself so that it is not part of the file name
name_file = sanitized_name_dir.substr(basename_start + 1);
}
}
This addresses Problem A without introducing Problem B. The
std::string sanitize_dir_path(std::string name_dir){
std::string out = "";
for (std::size_t i_plus_1 = name_dir.size(); i_plus_1 > 0; --i_plus_1){
std::size_t i = i_plus_1 - 1;
if (name_dir[i] == '/'){
continue;
} else {
out = name_dir.substr(0,i+1);
break;
}
}
return out;
} |
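For illustration only, the directory handling discussed in this comment can be wrapped into one small helper that is easy to check in isolation. This is a sketch, not code from Cello: `strip_trailing_slashes` and `metadata_prefix` are hypothetical names, and the inputs `name_dir` and `name_out_file` are assumed to be plain strings. The prefix is taken from the text after the last `/`, so the separator never ends up in the file name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: remove every trailing '/' from a directory path.
// A path made only of '/' characters (or an empty path) becomes "".
std::string strip_trailing_slashes(const std::string& dir) {
  const std::size_t last = dir.find_last_not_of('/');
  if (last == std::string::npos) {
    return std::string();
  }
  return dir.substr(0, last + 1);
}

// Hypothetical helper: choose the prefix for the metadata files
// (block list, parameters) from the output directory and data file name.
std::string metadata_prefix(const std::string& name_dir,
                            const std::string& name_out_file) {
  const std::string dir = strip_trailing_slashes(name_dir);
  if (dir.empty()) {
    // No usable directory: fall back to the data file name without
    // its extension.
    return name_out_file.substr(0, name_out_file.rfind('.'));
  }
  // Otherwise use the basename of the directory, i.e. everything
  // after the last '/' (or the whole string if there is none).
  const std::size_t slash = dir.rfind('/');
  if (slash == std::string::npos) {
    return dir;
  }
  return dir.substr(slash + 1);
}
```

Under these assumptions, `metadata_prefix("out/cycle_0001/", "data.h5")` yields `cycle_0001`, while an empty `name_dir` falls back to the stem of the data file name.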
@mabruzzo , your solution would overwrite the block_list when you have outputs on more than one cycle in the same directory. For example, with the Output configuration from Problem A, you will have output for cycles 0, 1, and 2, but your solution will use the same block_list file each time, so you will only have the block_list for cycle 2. You will be overwriting your block_list for cycles 0 and 1. Problem B was not created by the solution for Problem A. It just happens to have a related cause. The current code would work with a limited number of configurations, such as:
Those configurations would hide both Problem A and Problem B. |
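The overwrite concern can be made concrete with a small sketch. Everything here is hypothetical (the function names, the `-` separator, the four-digit cycle padding, and the `.block_list` layout are assumptions for illustration, not Cello's actual naming code): a file name built only from the directory is the same for every cycle, whereas a name that includes the cycle number is distinct per cycle.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical: zero-pad a cycle number to a fixed width.
std::string pad_cycle(int cycle, std::size_t width = 4) {
  std::string digits = std::to_string(cycle);
  if (digits.size() < width) {
    digits.insert(0, width - digits.size(), '0');
  }
  return digits;
}

// Name derived from the directory alone: identical on every cycle.
std::string block_list_from_dir(const std::string& dir) {
  return dir + "/" + dir + ".block_list";
}

// Name that also encodes the cycle: one file per cycle.
std::string block_list_with_cycle(const std::string& dir,
                                  const std::string& prefix, int cycle) {
  return dir + "/" + prefix + "-" + pad_cycle(cycle) + ".block_list";
}

// Collect the distinct block_list names produced over n_cycles dumps
// written into the same directory.
std::set<std::string> names_for_cycles(const std::string& dir,
                                       const std::string& prefix,
                                       int n_cycles, bool include_cycle) {
  std::set<std::string> names;
  for (int cycle = 0; cycle < n_cycles; ++cycle) {
    names.insert(include_cycle ? block_list_with_cycle(dir, prefix, cycle)
                               : block_list_from_dir(dir));
  }
  return names;
}
```

With three cycles written to one directory, the directory-only scheme produces a single distinct name (so later cycles overwrite earlier ones), while the cycle-aware scheme produces three.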
Sorry, I guess I hadn't fully understood (I've only ever used the code in one of the two configurations that you mentioned). To support writing outputs for multiple cycles to a single non-local directory, your approach makes a lot of sense! |
I'm deferring looking at this in detail since there is a new HDF5 output method in PR #97 which will make the current IO obsolete. The new IO handles the metadata files issue by not generating metadata files. The block_list files are actually much easier to generate outside of Enzo-E using simple scripts that call e.g. h5ls or h5dump. The parameters file is already generated at startup, so I don't see the need to re-generate the same file at each data dump. |
I am closing this PR as it is likely no longer valid now that PR #97 referenced above has been merged. @jameslarrue if there is still an issue, please resubmit the PR. Thanks! |
Problem A
I have the following configuration in my Output group:
The dir causes an incorrect output file for the .parameters file (etc.):
Cause
The issue appears to be in Cello/io_OutputData.cpp (line 141 by Andrew Emerick, 2020/03/05):
In the else, the name_file is being set to name_dir, resulting in the doubling of name_dir as the output file.
Solution
My solution is to remove the else and set the name_file to the same value for all cases:
Result
With the change, the parameters (and similar files) are created successfully, in my case, at: