-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make hpctoolkitv4 reader sparse #544
base: next
Are you sure you want to change the base?
Conversation
@@ -144,9 +147,10 @@ def from_hpctoolkit(dirname): | |||
from .readers.hpctoolkit_v4_reader import HPCToolkitV4Reader | |||
|
|||
if "experiment.xml" in os.listdir(dirname): | |||
# TODO: Make old hpctoolkit outputs sparse? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should make the old hpctoolkit outputs sparse. We can discuss this with Abhinav.
value=0, | ||
) | ||
|
||
if not self.sparse_format: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move this if statement to line 1609 because we don't need to use not_visited_nodes
in the sparse format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
hatchet/tests/hpctoolkit.py
Outdated
@@ -76,6 +80,11 @@ def test_graphframe(data_dir, calc_pi_hpct_db): | |||
elif col in ("name", "type", "file", "module", "node"): | |||
assert gf.dataframe[col].dtype == object | |||
|
|||
# In case of sparse format check to make sure we are not inserting dummy values | |||
# into the dataframe | |||
if sparse_format: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know how many rows for each node we should have in the dense format = number of ranks * number of threads. To test the sparse format, maybe we can check if there are some nodes with fewer number of rows? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, you actually caught an issue in my test (I was using old < v4 hpctoolkit data :D)
I made the hpctoolkit reader have the option to output in sparse format, and gated the option behind a keyword.
When we get the rest of the methods working, we should flip the default value to True from False.
Then, before the release we should delete the keyword.