Use whitelists/stdlib.py as default whitelist. #50
Having a default whitelist might have side effects. For example, in a file

    import subprocess
    import sys

    def foo():
        pass

    foo()

only the import of subprocess would be reported as unused, because the whitelist's

    sys.stderr
    sys.stdin
    sys.stdout

mark sys as used. This might be a bad user experience! :(
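The effect described above can be reproduced with a small sketch. It is not vulture's actual implementation: `unused_imports` is a hypothetical helper that only tracks plain `import` statements and bare names, and the whitelist content is assumed to be the three `sys` attribute accesses listed above.

```python
import ast


def unused_imports(*sources):
    """Return imported names that are never referenced in any source."""
    imported, used = set(), set()
    for source in sources:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    # "import a.b" binds the top-level name "a".
                    imported.add(alias.asname or alias.name.split('.')[0])
            elif isinstance(node, ast.Name):
                # "sys.stderr" contains the bare name "sys".
                used.add(node.id)
    return sorted(imported - used)


CODE = "import subprocess\nimport sys\n\ndef foo():\n    pass\n\nfoo()\n"
WHITELIST = "sys.stderr\nsys.stdin\nsys.stdout\n"

print(unused_imports(CODE))             # ['subprocess', 'sys']
print(unused_imports(CODE, WHITELIST))  # ['subprocess']
```

Scanning the whitelist together with the user's file hides the unused `import sys`, which is the surprising behaviour reported here.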
Good catch. I changed the stdlib.py file to account for this and added a comment there.
tests/test_script.py
Outdated
@@ -25,7 +25,7 @@ def test_script_with_whitelist():


def test_script_without_whitelist():
The two functions should now be called test_script_with_implicit_whitelist and test_script_with_explicit_whitelist.
vulture.py
Outdated
@@ -34,6 +34,7 @@
import re
import sys
import tokenize
from whitelists import stdlib
stdlib imports should be separated from other imports by a newline.
vulture.py
Outdated
whitelist_path = os.path.abspath(stdlib.__file__)
if whitelist_path.endswith('.pyc'):
    whitelist_path = whitelist_path[:-1]
args.append(whitelist_path)
This solution only works when calling vulture from the command line, not as a library. You can add the following code to scavenge:

    modules = self._get_modules(paths)
    modules.append(_get_stdlib_whitelist_file())

and somewhere at the top of the script:

    def _get_stdlib_whitelist_file():
        script = os.path.abspath(__file__)
        whitelist_dir = os.path.dirname(script)
        return os.path.join(whitelist_dir, 'stdlib.py')

As a side effect, users can now disable default whitelist files by using --exclude whitelists since "whitelists" is part of the path name.
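A minimal sketch of resolving a whitelist file relative to the script, assuming the whitelist lives in a `whitelists/` directory next to it (which the `--exclude whitelists` remark implies). `whitelist_path_for` and its parameters are hypothetical names, not part of vulture.

```python
import os


def whitelist_path_for(module_file, name='stdlib.py'):
    """Return the absolute path of a whitelist stored next to module_file."""
    script_dir = os.path.dirname(os.path.abspath(module_file))
    # Assumption: whitelists sit in a "whitelists" directory beside the script.
    return os.path.join(script_dir, 'whitelists', name)


print(whitelist_path_for('vulture.py'))
```

This only helps if the `whitelists/` directory is actually installed alongside the script, which is the packaging problem raised later in the thread.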
As a side effect, users can now disable default whitelist files by using --exclude whitelists since "whitelists" is part of the path name.
That's a win-win 🎉
Done making changes!
vulture.py
Outdated
@@ -35,6 +35,8 @@
import sys
import tokenize

from whitelists import stdlib
This should be obsolete.
vulture.py
Outdated
@@ -165,8 +167,14 @@ def _get_modules(self, paths, toplevel=True):
                sys.exit('Error: %s could not be found.' % path)
        return modules

    def _get_stdlib_whitelist_file(self):
This should be a function, not a method, since it doesn't need access to the Vulture object.
vulture.py
Outdated
@@ -165,8 +167,14 @@ def _get_modules(self, paths, toplevel=True):
                sys.exit('Error: %s could not be found.' % path)
        return modules

    def _get_stdlib_whitelist_file(self):
        script = os.path.abspath(stdlib.__file__)
Use __file__ instead of stdlib.__file__.
Even if we get the location of vulture.py, can we have a relative path to stdlib?
I think we need to change the way vulture is installed. Currently, only the main script is installed. Since no directory is created for vulture, we can't add the stdlib.py file anywhere. I'm not sure what the best solution is to this problem. I'll do some reading, maybe you have an idea?
https://docs.python.org/3/distutils/setupscript.html#installing-package-data
https://docs.python.org/3/library/pkgutil.html#pkgutil.get_data
https://stackoverflow.com/a/5899643
I think it's best to turn vulture into a package first, before we include the default whitelist. Can you open an issue for this? If you want to start a pull request, we need the following layout:
    vulture/
        __init__.py (contains __version__)
        core.py (with the contents from old vulture.py)
Force-pushed from 6eadb5c to 86e204f.
vulture/core.py
Outdated
try:
    module_string = read_file(module)
    module_string += read_file(module)
This seems to be highly inefficient!
Can we use pkg_resources? We can then change _get_stdlib_whitelist to:

    def _get_stdlib_whitelist():
        """
        Returns absolute path of the `stdlib` whitelist.
        """
        return pkg_resources.resource_filename('vulture', 'whitelists/stdlib.py')

Then, we can just append it to modules.
This would definitely be faster, but on the other hand we need to add pkg_resources as a dependency. IMHO this would be worth it.
I think it's better to use pkgutil.get_data. The change is more complicated than anticipated. Let me propose the following solution:

    def scavenge(self, paths):
        def exclude(name):
            return any(fnmatchcase(name, pattern) for pattern in self.exclude)

        for module in self._get_modules(paths):
            if exclude(module):
                self.log('Excluded:', module)
                continue

            self.log('Scanning:', module)
            try:
                module_string = read_file(module)
            except VultureInputException as err:
                print('Error: Could not read file %s - %s' % (module, err))
                print('You might want to change the encoding to UTF-8.')
            else:
                self.scan(module_string, filename=module)

        whitelist_names = ['stdlib.py']
        for name in whitelist_names:
            path = os.path.join('whitelists', name)
            if exclude(path):
                self.log('Excluded whitelist:', path)
            else:
                module_data = pkgutil.get_data('vulture', path)
                if module_data is None:
                    sys.exit('Error: Please use "python -m vulture.core".')
                module_string = module_data.decode("utf-8")
                self.scan(module_string, filename=path)

There are more changes needed afterwards: the tests have to use python -m vulture.core instead of python vulture.py.
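How `exclude()` treats the whitelist path depends on `fnmatchcase`, which must match the whole name rather than a substring. A small sketch, where `make_exclude` is a hypothetical stand-in for the nested `exclude()` helper and the path is written with a forward slash for illustration:

```python
from fnmatch import fnmatchcase


def make_exclude(patterns):
    """Build an exclude(name) predicate like the nested helper in scavenge()."""
    def exclude(name):
        return any(fnmatchcase(name, pattern) for pattern in patterns)
    return exclude


path = 'whitelists/stdlib.py'

print(make_exclude(['whitelists/stdlib.py'])(path))  # True: exact match
print(make_exclude(['*whitelists*'])(path))          # True: wildcards span "/"
print(make_exclude(['whitelists'])(path))            # False: no substring matching
```

With the helper as written, a bare `whitelists` pattern would exclude the default whitelist only if it is first expanded to something like `*whitelists*`; whether vulture performs that expansion is not shown in this thread.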
vulture/core.py
Outdated
@@ -87,6 +88,10 @@ def read_file(filename):
        raise VultureInputException(err)


def _get_stdlib_whitelist():
    return pkgutil.get_data('vulture', 'whitelists/stdlib.py').decode("UTF-8")
Please use lowercase utf-8.
After you apply the change to scavenge() we don't need this function anymore.
tests/test_script.py
Outdated
def test_script_without_whitelist():
    assert call_vulture(['vulture/core.py']) == 1
def test_script_with_implicit_whitelist():
    assert call_vulture(['vulture/core.py']) == 0
There should also be test_script_without_whitelist() which uses --exclude whitelists/stdlib.py.
else:
    module_data = pkgutil.get_data('vulture', path)
    if module_data is None:
        sys.exit('Error: Please use "python -m vulture.core".')
Thank you @jendrikseipp 😄
Just one thing is a little blurry at the moment: why are we checking if module_data is None? Also, wouldn't running python -m vulture.core just change the entry point?
Previously it was possible to run "python core.py". Now this is not possible anymore.
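The `None` check guards against the case where Python cannot locate the package at all, which can happen when the file is run as a plain script and the `vulture` package is not importable. A hedged sketch using a throwaway package (`demo_pkg` and `load_whitelist` are made-up names, not part of vulture):

```python
import os
import pkgutil
import sys
import tempfile


def load_whitelist(package, name):
    """Read whitelists/<name> from an importable package as text, or None."""
    # pkgutil.get_data() expects "/" as the separator on every platform.
    data = pkgutil.get_data(package, 'whitelists/' + name)
    if data is None:
        # The package could not be located, or its loader lacks get_data().
        return None
    return data.decode('utf-8')


with tempfile.TemporaryDirectory() as tmp:
    # Build demo_pkg/__init__.py and demo_pkg/whitelists/stdlib.py.
    whitelist_dir = os.path.join(tmp, 'demo_pkg', 'whitelists')
    os.makedirs(whitelist_dir)
    open(os.path.join(tmp, 'demo_pkg', '__init__.py'), 'w').close()
    with open(os.path.join(whitelist_dir, 'stdlib.py'), 'w') as f:
        f.write('sys.stderr')

    sys.path.insert(0, tmp)
    try:
        found = load_whitelist('demo_pkg', 'stdlib.py')
    finally:
        sys.path.remove(tmp)

missing = load_whitelist('no_such_package_for_demo', 'stdlib.py')

print(found)    # sys.stderr
print(missing)  # None
```

Note that the proposed `os.path.join('whitelists', name)` would produce a backslash-separated resource name on Windows, whereas `pkgutil.get_data` documents `/` as the separator.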
How will it impact pkgutil.get_data?
Also, if I run python vulture/core.py vulture/core.py --exclude whitelists/stdlib.py, this seems to be working great.
I don't understand what you mean, but now everything should work fine :-)
Do we need to add any more tests?
Description
Append the location of whitelists/stdlib.py to the end of the args passed, so that vulture consumes this whitelist in every run.
Related Issue
Closes: #38
Checklist: