Add input validation #167

Merged
merged 19 commits into from
Jul 11, 2023

Collaborator

@N-Maas N-Maas commented Jun 12, 2023

Adds validation that the input is actually a well-formed hypergraph. Currently, KaHyPar likely segfaults on invalid input or even produces garbage output. The validation applies to hgr files as well as to the C interface and the Python interface.

This includes three major changes:

  • When parsing a hgr file, all (or at least most) possible syntactic errors are checked, and each produces an appropriate error message
  • Adds a new utility for checking semantic errors, which is called after parsing the hgr file or when creating a hypergraph via the corresponding interface function. All errors and warnings found (e.g. duplicate pins) are reported, along with the hyperedges/pins that should be ignored
  • Adjusts the hypergraph constructor so that it can handle ignored hyperedges/pins
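
To make the semantic checks described above more concrete, here is a minimal sketch of what such a pass could look like. It is not the code from kahypar/utils/validate.h: the `ValidationReport` type, the `validateHypergraph` function and the message texts are hypothetical, and the input layout (an index vector with a trailing sentinel plus a flat pin array) is an assumption. Out-of-range pins are treated as errors, while duplicate pins only produce a warning and are marked as ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical result type for this sketch; not part of KaHyPar's API.
struct ValidationReport {
  std::vector<std::string> errors;
  std::vector<std::string> warnings;
  // ignored_pins[i] is true if entry i of the pin array should be skipped.
  std::vector<bool> ignored_pins;
};

// Assumed layout: the pins of hyperedge e are pins[index[e]] up to
// pins[index[e + 1] - 1], and index ends with a sentinel entry.
ValidationReport validateHypergraph(std::size_t num_vertices,
                                    const std::vector<std::size_t>& index,
                                    const std::vector<std::size_t>& pins) {
  assert(!index.empty());  // the sentinel entry must always be present
  ValidationReport report;
  report.ignored_pins.assign(pins.size(), false);

  for (std::size_t e = 0; e + 1 < index.size(); ++e) {
    // A broken index range makes the pins of this hyperedge unreadable.
    if (index[e] > index[e + 1] || index[e + 1] > pins.size()) {
      report.errors.push_back("hyperedge " + std::to_string(e) +
                              ": invalid index range");
      continue;
    }
    std::unordered_set<std::size_t> seen;
    for (std::size_t i = index[e]; i < index[e + 1]; ++i) {
      if (pins[i] >= num_vertices) {
        // Error: the pin refers to a vertex that does not exist.
        report.errors.push_back("hyperedge " + std::to_string(e) + ": pin " +
                                std::to_string(pins[i]) + " is out of range");
      } else if (!seen.insert(pins[i]).second) {
        // Warning: a repeated pin is harmless if it is simply ignored.
        report.warnings.push_back("hyperedge " + std::to_string(e) +
                                  ": duplicate pin " + std::to_string(pins[i]));
        report.ignored_pins[i] = true;
      }
    }
  }
  return report;
}
```

A caller could abort when `errors` is non-empty and otherwise pass `ignored_pins` on to the code that builds the hypergraph, which mirrors the split between the validation utility and the adjusted constructor.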

The time required for the semantic checks is reported in the output. The checks can be disabled using the new flag KAHYPAR_INPUT_VALIDATION. Conversely, all warnings can be promoted to errors using KAHYPAR_INPUT_VALIDATION_PROMOTE_WARNINGS_TO_ERRORS.

@N-Maas N-Maas linked an issue Jun 12, 2023 that may be closed by this pull request
Member

@SebastianSchlag SebastianSchlag left a comment

Thanks @N-Maas. This already looks great. I haven't reviewed everything yet, but here's a first round of comments.


codecov bot commented Jun 13, 2023

Codecov Report

Merging #167 (ae56cf1) into master (e53beae) will increase coverage by 0.18%.
The diff coverage is 90.22%.

❗ Current head ae56cf1 differs from pull request most recent head 3553901. Consider uploading reports for the commit 3553901 to get more accurate results

@@            Coverage Diff             @@
##           master     #167      +/-   ##
==========================================
+ Coverage   84.42%   84.60%   +0.18%     
==========================================
  Files         204      206       +2     
  Lines       17797    18097     +300     
  Branches     9815     9945     +130     
==========================================
+ Hits        15025    15311     +286     
- Misses       2772     2786      +14     
Impacted Files Coverage Δ
kahypar/application/kahypar.cc 0.00% <0.00%> (ø)
kahypar/datastructure/fast_reset_flag_array.h 91.17% <ø> (ø)
kahypar/macros.h 100.00% <ø> (ø)
tools/calculate_degree_pin_distribution.cc 0.00% <0.00%> (ø)
tools/calculate_epsilon.cc 0.00% <0.00%> (ø)
tools/compute_neighborhood_sizes.cc 0.00% <0.00%> (ø)
tools/create_weighted_hgr.cc 0.00% <0.00%> (ø)
tools/evaluate_mondriaan_partition.cc 0.00% <0.00%> (ø)
tools/hgr_to_bipartite_graphml_converter.cc 0.00% <0.00%> (ø)
tools/hgr_to_bipartite_metis_graph_converter.cc 0.00% <0.00%> (ø)
... and 26 more

@N-Maas N-Maas mentioned this pull request Jul 4, 2023
@@ -99,8 +99,6 @@ KAHYPAR_API void kahypar_improve_hypergraph_partition(kahypar_hypergraph_t* kahy
kahypar_partition_id_t* improved_partition);


- KAHYPAR_API void kahypar_hypergraph_free(kahypar_hypergraph_t* kahypar_hypergraph);
Member

Why do we need to remove this?

Collaborator Author

This line appeared twice in the file, which caused a compiler warning for me, so I removed one of the duplicates.

index_vector.reserve(static_cast<size_t>(num_hyperedges) + /*sentinel*/ 1);
index_vector.push_back(edge_vector.size());
if (line_number_vector != nullptr) {
Member

Did you mean to check if (validate_input)?

@@ -419,7 +500,7 @@ static inline void readFixedVertexFile(Hypergraph& hypergraph, const std::string
}
file.close();
} else {
- std::cerr << "Error: File not found: " << filename << std::endl;
+ ERROR("File not found: " << filename);
Member

Would it make sense to use a CheckedIStream here as well?

Collaborator Author

CheckedIStream only represents a single line, so I'm not sure how one would use it here. I guess I could add another class, e.g. CheckedIFStream, which represents the complete file, but I don't know whether that is worth it.
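
For readers unfamiliar with the idea, the following sketch shows what a stream wrapper scoped to a single line can look like. It is only an illustration of the concept discussed here, not KaHyPar's CheckedIStream: the class name `CheckedLine`, its methods and the error text are all hypothetical. Every extraction is checked, and a failure is reported together with the line number.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper for this sketch; it wraps exactly one input line.
class CheckedLine {
 public:
  CheckedLine(const std::string& line, std::size_t line_number)
      : stream_(line), line_number_(line_number) {
    assert(line_number > 0);  // line numbers are 1-based in this sketch
  }

  // Extracts one value, or throws an error naming the offending line.
  template <typename T>
  T read() {
    T value{};
    if (!(stream_ >> value)) {
      throw std::runtime_error("line " + std::to_string(line_number_) +
                               ": expected a value");
    }
    return value;
  }

  // Returns true if nothing but whitespace remains on the line.
  bool empty() {
    stream_ >> std::ws;
    return stream_.eof();
  }

 private:
  std::istringstream stream_;
  std::size_t line_number_;
};
```

Because the wrapper owns a single line, a reader for a whole file would still need its own loop over `std::getline` that creates one such object per line, which is the gap a file-level class would fill.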

@@ -391,7 +472,7 @@ static inline void readPartitionFile(const std::string& filename, std::vector<Pa
}
file.close();
} else {
- std::cerr << "Error: File not found: " << std::endl;
+ ERROR("File not found.");
Member

Would it make sense to use a CheckedIStream here as well?

Collaborator Author

(as above)

Member

@SebastianSchlag SebastianSchlag left a comment

I only have a few minor questions left. Other than those, this looks very good! 👍
Thank you very much for your efforts. I bet this will help a lot of people in the future!

Collaborator Author

N-Maas commented Jul 6, 2023

(duplicated comment because the first is not visible in the conversation thread for me, seemingly a bug in the GitHub UI)

> Did you mean to check if (validate_input)?

Conceptually, yes. However, the way the function is currently written, it only takes the line_number_vector parameter, which is null if no additional input validation occurs after reading the file and is otherwise used as an output parameter for mapping vertices/edges to input lines. That seemed simpler to me than adding even more parameters.
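
The pattern described in this reply, a nullable output parameter that doubles as the "validation enabled" switch, can be sketched as follows. This is not the reader from kahypar/io/hypergraph_io.h: the function `readHyperedges`, its signature and the simplified line format (one hyperedge per line, `%` starting a comment line) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads one hyperedge per line into a flat pin array plus an index vector.
// If line_numbers is non-null, the 1-based input line of every hyperedge is
// recorded as well; passing nullptr means no later validation is planned.
void readHyperedges(std::istream& in,
                    std::vector<std::size_t>& index,
                    std::vector<std::size_t>& pins,
                    std::vector<std::size_t>* line_numbers = nullptr) {
  index.assign(1, 0);
  pins.clear();
  if (line_numbers != nullptr) line_numbers->clear();

  std::string line;
  std::size_t current_line = 0;
  while (std::getline(in, line)) {
    ++current_line;
    // Skipped lines are the reason hyperedge ids and line numbers diverge.
    if (line.empty() || line[0] == '%') continue;
    std::istringstream tokens(line);
    std::size_t pin;
    while (tokens >> pin) pins.push_back(pin);
    index.push_back(pins.size());  // new sentinel after this hyperedge
    if (line_numbers != nullptr) line_numbers->push_back(current_line);
  }
  assert(index.back() == pins.size());  // sentinel always closes the pin array
}
```

The trade-off is the one raised in the review: the null check reads less clearly than an explicit boolean flag, but it avoids a second parameter whose value would always have to agree with the pointer.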

@SebastianSchlag SebastianSchlag merged commit 38ef99b into kahypar:master Jul 11, 2023
3 of 4 checks passed
@SebastianSchlag
Member

Thank you very much, @N-Maas! 👍

Development

Successfully merging this pull request may close these issues.

Proposal: Input validation