Collaborator

@ashridh ashridh commented May 7, 2019

Fixes #338.

Each cleanup function now first compiles its regex into a regex object, and then performs the substitution on the text using this compiled regex.
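As a rough illustration of that pattern (the function name, regex, and replacement token below are hypothetical and not taken from the PR diff):

```python
import re


def url_cleanup(text):
    # Hypothetical cleanup function: the regex is compiled inside the
    # function body, so re.compile is invoked on every call.
    pattern = re.compile(r"http\S+")
    return pattern.sub("__URL__", text)
```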

Collaborator

@marco-c marco-c left a comment

Compiling the regexes when the function is called won't give you any benefit.
For example, in the case of bugs, you will call the function once per comment, and each call will compile the regex again before applying it.

The optimization we want to do is to compile regexes just once, and then use them many times.

Two possible ways to implement this:

  1. Make the cleanup functions classes, like in bug_features, and compile the regex in their constructor
  2. Make the compiled regexes global variables
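A minimal sketch of both options, assuming made-up names, regexes, and replacement tokens (`UrlCleanup`, `HEX_PATTERN`, `hex_cleanup`, `__URL__`, `__HEX_NUMBER__` are illustrative only and are not the project's actual code):

```python
import re


# Option 1: a callable class that compiles its regex once, in the constructor.
class UrlCleanup:
    def __init__(self):
        # Hypothetical pattern; compiled a single time per instance.
        self.pattern = re.compile(r"http\S+")

    def __call__(self, text):
        return self.pattern.sub("__URL__", text)


# Option 2: a compiled regex stored in a module-level (global) variable.
HEX_PATTERN = re.compile(r"\b0[xX][0-9a-fA-F]+\b")


def hex_cleanup(text):
    # Reuses the global compiled pattern on every call.
    return HEX_PATTERN.sub("__HEX_NUMBER__", text)


# Both regexes are compiled once, then applied to many comments.
url_cleanup = UrlCleanup()
comments = ["see http://example.com", "crash at 0xdeadbeef"]
cleaned = [hex_cleanup(url_cleanup(c)) for c in comments]
```

In either variant the compilation cost is paid once, no matter how many comments are processed afterwards.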

Collaborator Author

ashridh commented May 8, 2019

Yes, I realised this later. I think making them classes is better, as it'll keep the cleanup functions consistent with the feature extractors.

@marco-c marco-c merged commit c440db7 into mozilla:master May 9, 2019
@ashridh ashridh deleted the compile branch May 9, 2019 14:12
Development

Successfully merging this pull request may close these issues.

Feature cleanup regexes could be compiled