Improve performance on deeply nested topics #33

riedgar-ms · 2022-08-25T20:40:32Z

Address report of poor performance from FC people, who were using deeply nested topics.

There are two main improvements:

- Simplify the subtopic checker so that it doesn't use a regular expression
- Rewrite DataFrame.iterrows() loops to use DataFrame.apply()

Put in basic tests of the functionality as well.
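For readers following along, here is a minimal sketch of the two changes described above. It is not the code from this PR: the function names, the `topic` column, and the convention that topics look like `/A/B/C` are all assumptions made for illustration.

```python
import re

import pandas as pd


def is_subtopic_regex(topic: str, candidate: str) -> bool:
    """Regex-based check (illustrative): candidate is topic itself or lies beneath it."""
    return re.search(r"^%s(/|$)" % re.escape(topic), candidate) is not None


def is_subtopic_plain(topic: str, candidate: str) -> bool:
    """Same check with plain string comparisons, no regex compiled per call."""
    return candidate == topic or candidate.startswith(topic + "/")


def has_subtopics_iterrows(df: pd.DataFrame, topic: str) -> bool:
    """Row-by-row loop; iterrows() builds a Series object for every row."""
    for _, row in df.iterrows():
        if row["topic"] != topic and is_subtopic_plain(topic, row["topic"]):
            return True
    return False


def has_subtopics_apply(df: pd.DataFrame, topic: str) -> bool:
    """Same logic expressed with DataFrame.apply(..., axis=1) and a final any()."""
    flags = df.apply(
        lambda row: row["topic"] != topic and is_subtopic_plain(topic, row["topic"]),
        axis=1,
    )
    return bool(flags.any())


if __name__ == "__main__":
    # Hypothetical data: "/AB" shares a prefix with "/A" but is not beneath it.
    demo = pd.DataFrame({"topic": ["/A", "/A/B", "/A/B/C", "/AB"]})
    print(has_subtopics_apply(demo, "/A"))
    print(has_subtopics_apply(demo, "/AB"))
```

The plain-string version needs the trailing `/` in the prefix so that `/AB` is not mistaken for a subtopic of `/A`.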

riedgar-ms · 2022-08-26T13:27:07Z

@Harsha-Nori I think this is ready to merge now.

riedgar-ms · 2022-08-30T13:02:28Z

Ping @slundberg

slundberg

Looks good to me. Thanks!

slundberg · 2022-08-31T23:18:02Z

adatest/_test_tree.py

+            axis=1
+        )
+        return has_subtopics_df.any()
+


If we still have performance issues, we could try vectorized AND operations here. But this seems good for now :)
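One possible reading of the vectorized suggestion above, as a sketch only: build two boolean masks with the pandas string accessor and combine them with `&`. The function name and the `topic` column are assumptions, not taken from `adatest/_test_tree.py`.

```python
import pandas as pd


def has_subtopics_vectorized(df: pd.DataFrame, topic: str) -> bool:
    """Check for strict subtopics of `topic` without a per-row Python callback."""
    topics = df["topic"]
    # Mask 1: the row's topic starts with the parent topic string.
    is_prefixed = topics.str.startswith(topic)
    # Mask 2: the character right after the prefix is the "/" separator.
    # str.slice yields "" when the string is too short, so equal-length
    # topics (the parent itself) are excluded here.
    has_separator = topics.str.slice(len(topic), len(topic) + 1) == "/"
    # Element-wise AND of the two masks, then reduce with any().
    return bool((is_prefixed & has_separator).any())
```

Whether this is actually faster than the `apply()` version would need measuring on a realistic tree.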

riedgar-ms · 2022-09-01

If we stick with a topic tree, I'd be more inclined to rewrite the internal data structure to be a tree, and just use DataFrame for serialisation.
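A rough sketch of what that alternative might look like, assuming topics are `/`-separated paths stored in a `topic` column. `TopicNode`, `to_dataframe`, and `from_dataframe` are hypothetical names, not part of adatest.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional

import pandas as pd


@dataclass
class TopicNode:
    """One node of a hypothetical in-memory topic tree."""

    name: str
    children: Dict[str, "TopicNode"] = field(default_factory=dict)

    def insert(self, topic: str) -> "TopicNode":
        """Create (if needed) every node along a path such as "/A/B/C"."""
        node = self
        for part in topic.strip("/").split("/"):
            if part:
                node = node.children.setdefault(part, TopicNode(part))
        return node

    def find(self, topic: str) -> Optional["TopicNode"]:
        """Walk down the path; return None if any component is missing."""
        node = self
        for part in topic.strip("/").split("/"):
            if part:
                node = node.children.get(part)
                if node is None:
                    return None
        return node

    def has_subtopics(self, topic: str) -> bool:
        """Cost depends on the depth of the topic, not the number of rows."""
        node = self.find(topic)
        return node is not None and bool(node.children)

    def paths(self, prefix: str = "") -> Iterator[str]:
        """Yield the full path of every descendant, parents first."""
        for name, child in self.children.items():
            path = f"{prefix}/{name}"
            yield path
            yield from child.paths(path)


def to_dataframe(root: TopicNode) -> pd.DataFrame:
    """Flatten the tree into a one-column DataFrame for serialisation."""
    return pd.DataFrame({"topic": list(root.paths())})


def from_dataframe(df: pd.DataFrame) -> TopicNode:
    """Rebuild the tree from a DataFrame with a "topic" column."""
    root = TopicNode("")
    for topic in df["topic"]:
        root.insert(topic)
    return root
```

With this layout the subtopic check becomes a dictionary walk, and the DataFrame is only touched when loading or saving.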

riedgar-ms added 3 commits August 25, 2022 16:12

Stop is_subtopic() using regular expressions

84f3c8e

Remove incorrect implementation

ba5dede

Use DataFrame.apply()

3bbe999

riedgar-ms requested review from slundberg and Harsha-Nori August 25, 2022 20:40

Remove debugging info

015c7f5

riedgar-ms changed the title ~~[WIP] Improve performance on deeply nested topics~~ Improve performance on deeply nested topics Aug 26, 2022

riedgar-ms added 4 commits August 26, 2022 08:09

Refactor a little

46d8168

Copy/paste error

3aeac57

Putting in a simple test

3fda89b

blacken

2b2207e

riedgar-ms added 3 commits August 26, 2022 09:32

Stop exposing apply

e23c5eb

Merge branch 'main' into riedgar-ms/test-tree-perf-01

afc0bc4

Merge branch 'main' into riedgar-ms/wheel-script

bea8bf0

slundberg approved these changes Aug 31, 2022


riedgar-ms merged commit 2ccb0df into microsoft:main Sep 1, 2022

riedgar-ms deleted the riedgar-ms/test-tree-perf-01 branch September 1, 2022 10:28
