Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: refactor sentiment analyzer by removing dead and slow perfomance code #2804

Merged
merged 4 commits into from
Sep 16, 2021

Conversation

12mohaned
Copy link
Contributor

@12mohaned 12mohaned commented Sep 13, 2021

The pull request refactors the sentiment analyser class by generally doing the following points:

  • Removing dead code like sentiment value in for loop(which made my curious why it is defined in all_words while not used)
  • Replace '==' in checking boolean values with the variable value. For example refactor if value == False with if not value: (performance reasons)

A single if statement or even dozens won not be that bad, but hundred of if's statement, this may cause some minor performance issues. The below numbers illustrates better.

python -m timeit -s "variable=False" "if variable == True: pass"
output:

10000000 loops, best of 5: 28.7 nsec per loop

python -m timeit -s "variable=True" "if variable is True: pass"

10000000 loops, best of 5: 21 nsec per loop

python -m timeit -s "variable=True" "if variable: pass"

20000000 loops, best of 5: 13.6 nsec per loop

I believe this may affect the performance of the sentiment analyser in a minor way.

@tomaarsen
Copy link
Member

I suppose that is is indeed slightly faster than ==. For those unaware:

  • == is for value equality. For when we want to ensure that two objects have the same value.
  • is is for reference equality. For when we want to ensure that two references refer to the same object.

See this SO for more information.

With is, only the pointers need to be compared, while == has the additional step of traversing to the pointer's location and reading the value. I can see how this is (insignificantly) faster.

Generally, the use of is is not recommended (with the exception of singletons like None), as

>>> a = 2000
>>> b = 2000
>>> a == b
True
>>> a is b
False

It can also cause some really confusing bugs:

>>> a = 2000
>>> b = 2000
>>> a is b
False
>>> c = 200
>>> d = 200
>>> c is d
True

(more information on this "bug")

In conclusion:

Yes:   if greeting:
No:    if greeting == True:
Worse: if greeting is True:

(source)

So, in short, I'm in favor of this PR, with the exception of the following:

nltk/sentiment/sentiment_analyzer.py Outdated Show resolved Hide resolved
@purificant
Copy link
Member

I like this, using if foo: is more pythonic than doing if foo == True: or if foo is True: 👍

Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
@12mohaned
Copy link
Contributor Author

I did add the suggested changes.

nltk/sentiment/sentiment_analyzer.py Outdated Show resolved Hide resolved
@tomaarsen tomaarsen merged commit 53dbaa5 into nltk:develop Sep 16, 2021
@tomaarsen
Copy link
Member

@12mohaned Thank you for your changes! I can always respect a code quality PR.
@purificant Thank you for the reviews. 👍

@12mohaned 12mohaned deleted the Sentiment_analyzerRefactor branch September 17, 2021 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants