Use whitelists/stdlib.py as default whitelist. #50

RJ722 · 2017-06-13T19:35:03Z

Description

Append the location of whitelists/stdlib.py at the end of args passed, so that vulture consumes this whitelist in every run.

Related Issue

Closes: #38

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation in the README file.
I have updated the documentation accordingly.
I have added tests to cover my changes.
All new and existing tests passed.

RJ722 · 2017-06-13T20:37:29Z

Having a default whitelist might have side-effects:
If we whitelist some dead code for one file, say A, and another file, say B, contains the same dead code, we won't be able to report the unused code in B:

For example, in a file dead.py, I have:

import subprocess
import sys

def foo():
    pass

foo()

Only the subprocess import would be reported as unused. The sys import counts as used because whitelists/stdlib.py references:

sys.stderr
sys.stdin
sys.stdout

This might be a bad user experience! :(
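
The masking effect can be reproduced with a small sketch. This is not vulture's implementation: it is a simplified name-based check built on the ast module, and the helper names (imported_names, used_names, unused_imports) are made up for illustration. It assumes, as vulture does, that a name read in any scanned file counts as used everywhere.

```python
import ast


def imported_names(source):
    """Return the names that import statements bind in `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as x" binds "x".
                names.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                names.add(alias.asname or alias.name)
    return names


def used_names(source):
    """Return every plain name that is read anywhere in `source`."""
    return {
        node.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }


def unused_imports(target, extra_sources=()):
    """Imports of `target` never read in `target` or in `extra_sources`."""
    used = used_names(target)
    for source in extra_sources:
        used |= used_names(source)
    return imported_names(target) - used


# The dead.py example from above and the three whitelist lines.
DEAD_PY = "import subprocess\nimport sys\n\ndef foo():\n    pass\n\nfoo()\n"
WHITELIST = "sys.stderr\nsys.stdin\nsys.stdout\n"

without_whitelist = unused_imports(DEAD_PY)
with_whitelist = unused_imports(DEAD_PY, [WHITELIST])
```

Scanning dead.py alone flags both imports, while adding the whitelist as an extra source hides sys because the whitelist reads that name.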

jendrikseipp · 2017-06-13T22:52:06Z

Good catch. I changed the stdlib.py file to account for this and added a comment there.

jendrikseipp · 2017-06-13T19:59:20Z

tests/test_script.py

@@ -25,7 +25,7 @@ def test_script_with_whitelist():


 def test_script_without_whitelist():


The two functions should now be called test_script_with_implicit_whitelist and test_script_with_explicit_whitelist.

jendrikseipp · 2017-06-13T19:59:55Z

vulture.py

@@ -34,6 +34,7 @@
 import re
 import sys
 import tokenize
+from whitelists import stdlib


stdlib imports should be separated from other imports by a newline.

jendrikseipp · 2017-06-13T23:02:47Z

vulture.py

+    whitelist_path = os.path.abspath(stdlib.__file__)
+    if whitelist_path.endswith('.pyc'):
+        whitelist_path = whitelist_path[:-1]
+    args.append(whitelist_path)


This solution only works when calling vulture from the command line, not as a library. You can add the following code into scavenge

modules = self._get_modules(paths)
modules.append(_get_stdlib_whitelist_file())

and somewhere at the top of the script

def _get_stdlib_whitelist_file():
    script = os.path.abspath(__file__)
    whitelist_dir = os.path.dirname(script)
    return os.path.join(whitelist_dir, 'stdlib.py')

As a side effect, users can now disable default whitelist files by using --exclude whitelists since "whitelists" is part of the path name.

That's a win-win 🎉
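
A minimal sketch of why --exclude whitelists would also catch the bundled file, assuming exclude patterns are matched against the whole path with fnmatchcase and are wrapped in "*" on both sides. The wrapping step and the helper names prepare_patterns and is_excluded are assumptions for illustration, not code taken from vulture.

```python
from fnmatch import fnmatchcase


def prepare_patterns(raw_patterns):
    """Wrap each pattern in '*' so it matches anywhere in a path (assumed)."""
    return ["*%s*" % pattern for pattern in raw_patterns]


def is_excluded(path, patterns):
    """Return True if any prepared pattern matches the path."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


patterns = prepare_patterns(["whitelists"])

# The default whitelist lives below a "whitelists" directory, so it matches.
whitelist_excluded = is_excluded("whitelists/stdlib.py", patterns)

# Ordinary modules do not contain "whitelists" in their path.
core_excluded = is_excluded("vulture/core.py", patterns)
```

Note that fnmatchcase does not treat "/" specially, so "*" also spans directory separators.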

RJ722 · 2017-06-13T23:35:57Z

Done making changes!

jendrikseipp · 2017-06-14T08:08:18Z

vulture.py

@@ -35,6 +35,8 @@
 import sys
 import tokenize

+from whitelists import stdlib
+


This should be obsolete.

jendrikseipp · 2017-06-14T08:09:19Z

vulture.py

@@ -165,8 +167,14 @@ def _get_modules(self, paths, toplevel=True):
                sys.exit('Error: %s could not be found.' % path)
        return modules

+    def _get_stdlib_whitelist_file(self):


This should be a function, not a method, since it doesn't need access to the Vulture object.

jendrikseipp · 2017-06-14T08:10:01Z

vulture.py

@@ -165,8 +167,14 @@ def _get_modules(self, paths, toplevel=True):
                sys.exit('Error: %s could not be found.' % path)
        return modules

+    def _get_stdlib_whitelist_file(self):
+        script = os.path.abspath(stdlib.__file__)


Use __file__ instead of stdlib.__file__.

Even if we get the location of vulture.py, can we have a relative path to stdlib?

I think we need to change the way vulture is installed. Currently, only the main script is installed. Since no directory is created for vulture, we can't add the stdlib.py file anywhere. I'm not sure what the best solution is to this problem. I'll do some reading, maybe you have an idea?

https://docs.python.org/3/distutils/setupscript.html#installing-package-data
https://docs.python.org/3/library/pkgutil.html#pkgutil.get_data
https://stackoverflow.com/a/5899643

I think it's best to turn vulture into a package first, before we include the default whitelist. Can you open an issue for this? If you want to start a pull request, we need the following layout:

vulture/
    __init__.py (contains __version__)
    core.py (with the contents from old vulture.py)

RJ722 · 2017-06-23T19:36:34Z

vulture/core.py

            try:
-                module_string = read_file(module)
+                module_string += read_file(module)


This seems to be highly inefficient!
Can we use pkg_resources instead?

We can then change _get_stdlib_whitelist to:

def _get_stdlib_whitelist():
    """
    Returns absolute path of the `stdlib` whitelist.
    """
    return pkg_resources.resource_filename('vulture', 'whitelists/stdlib.py')

Then, we can just append it to modules.

This would definitely be faster but on the other hand, we need to add pkg_resources as a dependency. IMHO this would be worth it.

jendrikseipp · 2017-06-23T21:18:44Z

vulture/core.py

@@ -87,6 +88,10 @@ def read_file(filename):
        raise VultureInputException(err)


+def _get_stdlib_whitelist():
+    return pkgutil.get_data('vulture', 'whitelists/stdlib.py').decode("UTF-8")


Please use lowercase utf-8.

After you apply the change to scavenge() we don't need this function anymore.

jendrikseipp · 2017-06-24T03:41:59Z

vulture/core.py

            try:
-                module_string = read_file(module)
+                module_string += read_file(module)


I think it's better to use pkgutil.get_data. The change is more complicated than anticipated. Let me propose the following solution:

def scavenge(self, paths):
    def exclude(name):
        return any(fnmatchcase(name, pattern) for pattern in self.exclude)

    for module in self._get_modules(paths):
        if exclude(module):
            self.log('Excluded:', module)
            continue

        self.log('Scanning:', module)
        try:
            module_string = read_file(module)
        except VultureInputException as err:
            print('Error: Could not read file %s - %s' % (module, err))
            print('You might want to change the encoding to UTF-8.')
        else:
            self.scan(module_string, filename=module)

    whitelist_names = ['stdlib.py']
    for name in whitelist_names:
        path = os.path.join('whitelists', name)
        if exclude(path):
            self.log('Excluded whitelist:', path)
        else:
            module_data = pkgutil.get_data('vulture', path)
            if module_data is None:
                sys.exit('Error: Please use "python -m vulture.core".')
            module_string = module_data.decode("utf-8")
            self.scan(module_string, filename=path)

There are more changes needed afterwards: the tests have to use python -m vulture.core instead of python vulture.py.
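
For readers unfamiliar with pkgutil.get_data, here is a self-contained sketch of the lookup and of the None check. It builds a throwaway package in a temporary directory; the package name demo_pkg and the helpers make_demo_package and load_whitelist are hypothetical and only mirror the whitelists/stdlib.py layout discussed here.

```python
import importlib
import os
import pkgutil
import sys
import tempfile


def make_demo_package(root, name="demo_pkg"):
    """Create <root>/<name>/ with an __init__.py and whitelists/stdlib.py."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(os.path.join(pkg_dir, "whitelists"))
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("")
    whitelist = os.path.join(pkg_dir, "whitelists", "stdlib.py")
    with open(whitelist, "w", encoding="utf-8") as f:
        f.write("sys.stderr\nsys.stdin\nsys.stdout\n")
    return name


def load_whitelist(package, resource="whitelists/stdlib.py"):
    """Read a data file shipped inside `package` and decode it as UTF-8."""
    data = pkgutil.get_data(package, resource)
    if data is None:
        # get_data returns None when the package cannot be located or loaded.
        raise RuntimeError("package %r could not be located" % package)
    return data.decode("utf-8")


root = tempfile.mkdtemp()
package = make_demo_package(root)
sys.path.insert(0, root)          # make the demo package importable
importlib.invalidate_caches()     # the package was created at runtime

whitelist_source = load_whitelist(package)

# An unknown package yields None instead of raising an exception.
missing = pkgutil.get_data("no_such_package_for_demo", "whitelists/stdlib.py")
```

The None result is what the sys.exit() branch in the proposed scavenge() guards against: the resource is resolved relative to an importable package, so the lookup fails when the package itself cannot be found.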

jendrikseipp · 2017-06-24T03:43:59Z

tests/test_script.py

-def test_script_without_whitelist():
-    assert call_vulture(['vulture/core.py']) == 1
+def test_script_with_implicit_whitelist():
+    assert call_vulture(['vulture/core.py']) == 0


There should also be test_script_without_whitelist() which uses --exclude whitelists/stdlib.py.

RJ722 · 2017-06-24T19:34:13Z

vulture/core.py

+                else:
+                    module_data = pkgutil.get_data('vulture', path)
+                    if module_data is None:
+                        sys.exit('Error: Please use "python -m vulture.core".')


Thank you @jendrikseipp 😄

Just one thing is still a little unclear to me: why are we checking whether module_data is None? Also, wouldn't running python -m vulture.core just change the entry point?

Previously it was possible to run "python core.py". Now this is not possible anymore.

How will it impact pkgutil.get_data?

Also, if I run python vulture/core.py vulture/core.py --exclude whitelists/stdlib.py, this seems to be working great.

I don't understand what you mean, but now everything should work fine :-)

RJ722 · 2017-06-25T08:01:49Z

Do we need to add any more tests?

jendrikseipp requested changes Jun 13, 2017

RJ722 force-pushed the default_whitelist branch from fd3bc0a to 1d89db0 Compare June 13, 2017 23:28

jendrikseipp requested changes Jun 14, 2017

RJ722 mentioned this pull request Jun 14, 2017

Package vulture #52

Closed

RJ722 force-pushed the default_whitelist branch 2 times, most recently from 6eadb5c to 86e204f Compare June 22, 2017 19:47

RJ722 mentioned this pull request Jun 22, 2017

Implement a flag for toggling the whitelists off #55

Closed

jendrikseipp mentioned this pull request Jun 23, 2017

Ship vulture as a package. #54

Merged

RJ722 force-pushed the default_whitelist branch from 86e204f to 8e33837 Compare June 23, 2017 19:30

RJ722 commented Jun 23, 2017

jendrikseipp requested changes Jun 24, 2017

Use stdlib.py as a default whitelist

9902b60

RJ722 force-pushed the default_whitelist branch from 8e33837 to 9902b60 Compare June 24, 2017 19:02

RJ722 commented Jun 24, 2017

jendrikseipp approved these changes Jun 25, 2017

jendrikseipp merged commit f2c0cf5 into jendrikseipp:master Jun 25, 2017

RJ722 deleted the default_whitelist branch June 25, 2017 09:56
