refactor: refactor sentiment analyzer by removing dead and slow perfomance code #2804

12mohaned · 2021-09-13T16:03:39Z

The pull request refactors the sentiment analyser class by generally doing the following points:

Removing dead code like sentiment value in for loop(which made my curious why it is defined in all_words while not used)
Replace '==' in checking boolean values with the variable value. For example refactor if value == False with if not value: (performance reasons)

A single if statement or even dozens won not be that bad, but hundred of if's statement, this may cause some minor performance issues. The below numbers illustrates better.

python -m timeit -s "variable=False" "if variable == True: pass"
output:

10000000 loops, best of 5: 28.7 nsec per loop

python -m timeit -s "variable=True" "if variable is True: pass"

10000000 loops, best of 5: 21 nsec per loop

python -m timeit -s "variable=True" "if variable: pass"

20000000 loops, best of 5: 13.6 nsec per loop

I believe this may affect the performance of the sentiment analyser in a minor way.

…mance code

tomaarsen · 2021-09-13T19:38:24Z

I suppose that is is indeed slightly faster than ==. For those unaware:

== is for value equality. For when we want to ensure that two objects have the same value.
is is for reference equality. For when we want to ensure that two references refer to the same object.

See this SO for more information.

With is, only the pointers need to be compared, while == has the additional step of traversing to the pointer's location and reading the value. I can see how this is (insignificantly) faster.

Generally, the use of is is not recommended (with the exception of singletons like None), as

>>> a = 2000
>>> b = 2000
>>> a == b
True
>>> a is b
False

It can also cause some really confusing bugs:

>>> a = 2000
>>> b = 2000
>>> a is b
False
>>> c = 200
>>> d = 200
>>> c is d
True

(more information on this "bug")

In conclusion:

Yes:   if greeting:
No:    if greeting == True:
Worse: if greeting is True:

(source)

So, in short, I'm in favor of this PR, with the exception of the following:

nltk/sentiment/sentiment_analyzer.py

purificant · 2021-09-13T19:59:21Z

I like this, using if foo: is more pythonic than doing if foo == True: or if foo is True: 👍

Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>

12mohaned · 2021-09-16T21:22:13Z

I did add the suggested changes.

nltk/sentiment/sentiment_analyzer.py

tomaarsen · 2021-09-16T21:42:12Z

@12mohaned Thank you for your changes! I can always respect a code quality PR.
@purificant Thank you for the reviews. 👍

mohanedmashaly21 added 3 commits September 13, 2021 18:00

refactor: refactor sentiment analyzer by removing dead and slow perfo…

4644c48

…mance code

refactor: refactor sentiment analyzer by removing dead and slow perfo…

f7ab0e2

…mance code

fix: add not to false boolean values

34e31d3

tomaarsen requested changes Sep 13, 2021

View reviewed changes

nltk/sentiment/sentiment_analyzer.py Outdated Show resolved Hide resolved

tomaarsen added the enhancement label Sep 13, 2021

Refactor: Add sentiment keyword in all_words

0ab6e40

Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>

tomaarsen approved these changes Sep 16, 2021

View reviewed changes

purificant approved these changes Sep 16, 2021

View reviewed changes

nltk/sentiment/sentiment_analyzer.py Outdated Show resolved Hide resolved

tomaarsen merged commit 53dbaa5 into nltk:develop Sep 16, 2021

12mohaned deleted the Sentiment_analyzerRefactor branch September 17, 2021 12:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: refactor sentiment analyzer by removing dead and slow perfomance code #2804

refactor: refactor sentiment analyzer by removing dead and slow perfomance code #2804

12mohaned commented Sep 13, 2021 •

edited

tomaarsen commented Sep 13, 2021

purificant commented Sep 13, 2021

12mohaned commented Sep 16, 2021

tomaarsen commented Sep 16, 2021

refactor: refactor sentiment analyzer by removing dead and slow perfomance code #2804

refactor: refactor sentiment analyzer by removing dead and slow perfomance code #2804

Conversation

12mohaned commented Sep 13, 2021 • edited

tomaarsen commented Sep 13, 2021

purificant commented Sep 13, 2021

12mohaned commented Sep 16, 2021

tomaarsen commented Sep 16, 2021

12mohaned commented Sep 13, 2021 •

edited