Improve performance on deeply nested topics #33

Merged: 11 commits, Sep 1, 2022

riedgar-ms
Contributor

@riedgar-ms riedgar-ms commented Aug 25, 2022

Addresses a report from the FC team of poor performance when using deeply nested topics.

There are two main improvements:

  • Simplify the subtopic checker so that it doesn't use a regular expression
  • Rewrite DataFrame.iterrows() loops to use DataFrame.apply()
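
A minimal sketch of both changes, assuming (hypothetically) that topics are stored as `/`-separated path strings in a column named `topic`; the project's real column names and topic format may differ. The subtopic check becomes a plain string-prefix test, and the per-row work goes through `DataFrame.apply(axis=1)` instead of an explicit loop over `DataFrame.iterrows()`:

```python
import pandas as pd


def is_subtopic(candidate: str, topic: str) -> bool:
    # Plain prefix test instead of a regular expression.
    # The trailing "/" means "/A/B" is a subtopic of "/A", but "/AB" is not.
    return candidate.startswith(topic + "/")


def has_subtopics(df: pd.DataFrame, topic: str) -> bool:
    # Hypothetical "topic" column holding "/"-separated paths.
    if df.empty:
        return False
    # apply(axis=1) calls the function once per row and returns a
    # boolean Series, replacing a hand-written iterrows() loop.
    has_subtopics_df = df.apply(
        lambda row: is_subtopic(row["topic"], topic),
        axis=1,
    )
    return bool(has_subtopics_df.any())
```

For example, with rows `/A`, `/A/B`, `/A/B/C` and `/AB`, `has_subtopics(df, "/A")` is true while `has_subtopics(df, "/AB")` is false.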

Also adds basic tests of this functionality.

@riedgar-ms riedgar-ms changed the title [WIP] Improve performance on deeply nested topics Improve performance on deeply nested topics Aug 26, 2022
@riedgar-ms
Contributor Author

@Harsha-Nori I think this is ready to merge now.

@riedgar-ms
Contributor Author

Ping @slundberg

Contributor

@slundberg slundberg left a comment

Looks good to me. Thanks!

axis=1
)
return has_subtopics_df.any()

Contributor

If we still have performance issues, we could try vectorized AND operations here. But this seems good for now :)
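
A rough sketch of what a vector AND version might look like, again assuming a hypothetical `topic` column of `/`-separated path strings. Each condition is computed as a whole boolean Series with the pandas `.str` accessor and the results are combined with `&`, so the per-row lambda from `apply()` goes away. The second condition is only an illustrative choice:

```python
import pandas as pd


def has_subtopics_vectorized(df: pd.DataFrame, topic: str) -> bool:
    # Hypothetical "topic" column holding "/"-separated paths.
    prefix = topic + "/"
    topics = df["topic"]
    # Each condition is evaluated over the whole column in one call,
    # producing a boolean Series.
    starts_with_prefix = topics.str.startswith(prefix)
    # Require at least one character after the separator, so "/A/" on
    # its own does not count as a subtopic of "/A".
    has_remainder = topics.str.len() > len(prefix)
    # "&" is an element-wise (vector) AND of the two boolean Series.
    return bool((starts_with_prefix & has_remainder).any())
```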

Contributor Author

If we stick with a topic tree, I'd be more inclined to rewrite the internal data structure as an actual tree, and just use DataFrame for serialisation.
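
One possible shape for that, as a sketch only: a hypothetical `TopicNode` class holds the topics as a tree, so checking for subtopics walks one level per path segment instead of scanning every row. A DataFrame with a hypothetical `topic` column is produced or consumed only when serialising:

```python
from __future__ import annotations

from dataclasses import dataclass, field

import pandas as pd


def _parts(path: str) -> list[str]:
    # "/A/B/C" -> ["A", "B", "C"]; empty segments are ignored.
    return [p for p in path.split("/") if p]


@dataclass
class TopicNode:
    """Hypothetical in-memory topic tree; DataFrame is used only for I/O."""

    name: str
    children: dict[str, TopicNode] = field(default_factory=dict)

    def add(self, path: str) -> None:
        # Create any missing nodes along the path.
        node = self
        for part in _parts(path):
            node = node.children.setdefault(part, TopicNode(part))

    def find(self, path: str) -> TopicNode | None:
        # Walk down one level per path segment: O(depth), not O(rows).
        node = self
        for part in _parts(path):
            node = node.children.get(part)
            if node is None:
                return None
        return node

    def has_subtopics(self, path: str) -> bool:
        node = self.find(path)
        return node is not None and bool(node.children)

    def to_dataframe(self) -> pd.DataFrame:
        # Serialise as one row per topic path, parents before children.
        rows: list[dict[str, str]] = []

        def walk(node: TopicNode, prefix: str) -> None:
            for name, child in node.children.items():
                path = f"{prefix}/{name}"
                rows.append({"topic": path})
                walk(child, path)

        walk(self, "")
        return pd.DataFrame(rows, columns=["topic"])

    @classmethod
    def from_dataframe(cls, df: pd.DataFrame) -> TopicNode:
        # Rebuild the tree from a serialised DataFrame.
        root = cls("")
        for topic in df["topic"]:
            root.add(topic)
        return root
```

Round-tripping through `to_dataframe()` and `from_dataframe()` keeps a DataFrame format available for saving and loading, while lookups never touch it.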

@riedgar-ms riedgar-ms merged commit 2ccb0df into microsoft:main Sep 1, 2022
@riedgar-ms riedgar-ms deleted the riedgar-ms/test-tree-perf-01 branch September 1, 2022 10:28