
Make hpctoolkitv4 reader sparse #544

Open
wants to merge 9 commits into base: next

Conversation

lithomas1 (Contributor)

I made the hpctoolkit reader have the option to output in sparse format, gated behind a keyword argument.

When we get the rest of the methods working with the sparse format, we should flip the default value from False to True.

Then, before the release, we should remove the keyword.
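
For illustration, here is a minimal sketch of how the keyword-gated call might look from the user's side. The sparse_format name comes from the diff below; the exact GraphFrame.from_hpctoolkit signature shown here is an assumption, not the final API:

import hatchet as ht

# Dense output (current default): every node gets a full set of
# (rank, thread) rows, with unmeasured entries padded with zeros.
gf_dense = ht.GraphFrame.from_hpctoolkit("path/to/hpctoolkit-v4-database")

# Sparse output (opt-in for now, per this PR): only rows that actually
# appear in the HPCToolkit v4 database are kept.
gf_sparse = ht.GraphFrame.from_hpctoolkit(
    "path/to/hpctoolkit-v4-database", sparse_format=True
)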

lithomas1 added labels: status: ready for review, area: graphframe, priority: high (Apr 21, 2024)
ocnkr self-requested a review (Apr 22, 2024)
@@ -144,9 +147,10 @@ def from_hpctoolkit(dirname):
        from .readers.hpctoolkit_v4_reader import HPCToolkitV4Reader

        if "experiment.xml" in os.listdir(dirname):
            # TODO: Make old hpctoolkit outputs sparse?
Collaborator

I don't think we should make the old hpctoolkit outputs sparse. We can discuss this with Abhinav.
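
For context, a hedged sketch of a dispatch that keeps old databases dense and passes the keyword only to the v4 reader. The reader class names come from the diff above; the constructor and read() signatures are assumptions:

import os

def from_hpctoolkit(dirname, sparse_format=False):
    from .readers.hpctoolkit_reader import HPCToolkitReader
    from .readers.hpctoolkit_v4_reader import HPCToolkitV4Reader

    if "experiment.xml" in os.listdir(dirname):
        # Old (pre-v4) databases: keep reading them densely for now.
        return HPCToolkitReader(dirname).read()

    # v4 databases: honor the opt-in sparse_format keyword.
    return HPCToolkitV4Reader(dirname, sparse_format=sparse_format).read()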

value=0,
)

if not self.sparse_format:
Collaborator

We can move this if statement to line 1609 because we don't need to use not_visited_nodes in the sparse format.

lithomas1 (Contributor, Author)

Updated.
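
For context, a hedged, standalone sketch of the guard being discussed: in dense mode the reader back-fills zero rows for nodes that were never visited, while in sparse mode that work is skipped entirely. The helper name and its parameters are hypothetical; only not_visited_nodes, the zero fill, and the sparse_format flag come from the diff:

def fill_unvisited_rows(rows, not_visited_nodes, num_ranks, num_threads,
                        metric_names, sparse_format):
    # Sparse mode: keep only the rows that were actually measured.
    if sparse_format:
        return rows

    # Dense mode: every node ends up with num_ranks * num_threads rows,
    # padding unmeasured metrics with a dummy value of 0.
    for node in not_visited_nodes:
        for rank in range(num_ranks):
            for thread in range(num_threads):
                row = {"node": node, "rank": rank, "thread": thread}
                row.update({metric: 0 for metric in metric_names})
                rows.append(row)
    return rows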

@@ -76,6 +80,11 @@ def test_graphframe(data_dir, calc_pi_hpct_db):
        elif col in ("name", "type", "file", "module", "node"):
            assert gf.dataframe[col].dtype == object

    # In the sparse format, check that we are not inserting dummy values
    # into the dataframe.
    if sparse_format:
Collaborator

We know how many rows each node should have in the dense format: number of ranks * number of threads. To test the sparse format, maybe we can check whether some nodes have fewer rows than that? What do you think?

lithomas1 (Contributor, Author)

Good idea, and you actually caught an issue in my test (I was using old, pre-v4 hpctoolkit data :D)
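
Along the lines of that suggestion, a hedged sketch of what such a row-count test could look like. It assumes the dataframe is indexed by node, rank, and thread (as in other Hatchet readers) and uses a hypothetical hpct_v4_db fixture pointing at a real v4 database:

import hatchet as ht

def test_sparse_row_counts(hpct_v4_db):  # hpct_v4_db: hypothetical v4 database fixture
    gf_dense = ht.GraphFrame.from_hpctoolkit(str(hpct_v4_db))
    gf_sparse = ht.GraphFrame.from_hpctoolkit(str(hpct_v4_db), sparse_format=True)

    num_ranks = gf_dense.dataframe.index.get_level_values("rank").nunique()
    num_threads = gf_dense.dataframe.index.get_level_values("thread").nunique()

    # Dense: every node has exactly num_ranks * num_threads rows.
    dense_counts = gf_dense.dataframe.groupby(level="node").size()
    assert (dense_counts == num_ranks * num_threads).all()

    # Sparse: no node has more rows than the dense layout, and at least one
    # node has fewer, i.e. some (rank, thread) rows were genuinely absent.
    sparse_counts = gf_sparse.dataframe.groupby(level="node").size()
    assert (sparse_counts <= num_ranks * num_threads).all()
    assert (sparse_counts < num_ranks * num_threads).any()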
