Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use whitelists/stdlib.py as default whitelist. #50

Merged
merged 1 commit into from
Jun 25, 2017

Conversation

RJ722
Copy link
Contributor

@RJ722 RJ722 commented Jun 13, 2017

Description

Append the location of whitelists/stdlib.py at the end of args passed, so that vulture consumes this whitelist in every run.

Related Issue

Closes: #38

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation in the README file.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@RJ722
Copy link
Contributor Author

RJ722 commented Jun 13, 2017

Having a default whitelist might have side-effects:
If we are whitelisting some dead code for a file, say A, and another file, sayB contains the same dead code, we won't be able to report the unused code in B:

For example, in a file dead.py, I have:

import subprocess
import sys

def foo():
    pass

foo()

Only subprocess' import would be reported as unused because stdlib uses:

sys.stderr
sys.stdin
sys.stdout

This might be a bad user experience! :(

@jendrikseipp
Copy link
Owner

Good catch. I changed the stdlib.py file to account for this and added a comment there.

@@ -25,7 +25,7 @@ def test_script_with_whitelist():


def test_script_without_whitelist():
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two functions should now be called test_script_with_implicit_whitelist and test_script_with_explicit_whitelist.

vulture.py Outdated
@@ -34,6 +34,7 @@
import re
import sys
import tokenize
from whitelists import stdlib
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stdlib imports should be separated from other imports by a newline.

vulture.py Outdated
whitelist_path = os.path.abspath(stdlib.__file__)
if whitelist_path.endswith('.pyc'):
whitelist_path = whitelist_path[:-1]
args.append(whitelist_path)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution only works when calling vulture from the command line, not as a library. You can add the following code into scavenge

modules = self._get_modules(paths)
modules.append(_get_stdlib_whitelist_file())

and somewhere at the top of the script

def _get_stdlib_whitelist_file():
    script = os.path.abspath(__file__)
    whitelist_dir = os.path.dirname(script)
    return os.path.join(whitelist_dir, 'stdlib.py')

As a side effect, users can now disable default whitelist files by using --exclude whitelists since "whitelists" is part of the path name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side effect, users can now disable default whitelist files by using --exclude whitelists since "whitelists" is part of the path name.

That's a win-win 🎉

@RJ722
Copy link
Contributor Author

RJ722 commented Jun 13, 2017

Done making changes!

vulture.py Outdated
@@ -35,6 +35,8 @@
import sys
import tokenize

from whitelists import stdlib

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be obsolete.

vulture.py Outdated
@@ -165,8 +167,14 @@ def _get_modules(self, paths, toplevel=True):
sys.exit('Error: %s could not be found.' % path)
return modules

def _get_stdlib_whitelist_file(self):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a function, not a method, since it doesn't need access to the Vulture object.

vulture.py Outdated
@@ -165,8 +167,14 @@ def _get_modules(self, paths, toplevel=True):
sys.exit('Error: %s could not be found.' % path)
return modules

def _get_stdlib_whitelist_file(self):
script = os.path.abspath(stdlib.__file__)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use file instead of stdlib.file

Copy link
Contributor Author

@RJ722 RJ722 Jun 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we get the location of vulture.py, can we have a relative path to stdlib?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to change the way vulture is installed. Currently, only the main script is installed. Since no directory is created for vulture, we can't add the stdlib.py file anywhere. I'm not sure what the best solution is to this problem. I'll do some reading, maybe you have an idea?

https://docs.python.org/3/distutils/setupscript.html#installing-package-data
https://docs.python.org/3/library/pkgutil.html#pkgutil.get_data
https://stackoverflow.com/a/5899643

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to turn vulture into a package first, before we include the default whitelist. Can you open an issue for this? If you want to start a pull request, we need the following layout:

vulture/
    __init__.py (contains __version__)
    core.py (with the contents from old vulture.py)

@RJ722 RJ722 mentioned this pull request Jun 14, 2017
@RJ722 RJ722 force-pushed the default_whitelist branch 2 times, most recently from 6eadb5c to 86e204f Compare June 22, 2017 19:47
@jendrikseipp jendrikseipp mentioned this pull request Jun 23, 2017
5 tasks
vulture/core.py Outdated
try:
module_string = read_file(module)
module_string += read_file(module)
Copy link
Contributor Author

@RJ722 RJ722 Jun 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be highly inefficient!
Can we use pkg_resources

We can then change _get_stdlib_whitelist to:

def _get_stdlib_whitelist():
    """
    Returns absolute path of the `stdlib` whitelist.
    """
    return pkg_resources.resource_filename('vulture', 'whitelists/stdlib.py')

Then, we can just append it to modules.

This would definitely be faster but on the other hand, we need to add pkg_resources as a dependency. IMHO this would be worth it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to use pkgutil.get_data. The change is more complicated than anticipated. Let me propose the following solution:

    def scavenge(self, paths):
        def exclude(name):
            return any(fnmatchcase(name, pattern) for pattern in self.exclude)

        for module in self._get_modules(paths):
            if exclude(module):
                self.log('Excluded:', module)
                continue

            self.log('Scanning:', module)
            try:
                module_string = read_file(module)
            except VultureInputException as err:
                print('Error: Could not read file %s - %s' % (module, err))
                print('You might want to change the encoding to UTF-8.')
            else:
                self.scan(module_string, filename=module)

        whitelist_names = ['stdlib.py']
        for name in whitelist_names:
            path = os.path.join('whitelists', name)
            if exclude(path):
                self.log('Excluded whitelist:', path)
            else:
                module_data = pkgutil.get_data('vulture', path)
                if module_data is None:
                    sys.exit('Error: Please use "python -m vulture.core".')
                module_string = module_data.decode("utf-8")
                self.scan(module_string, filename=path)

There are more changes needed afterwards: the tests have to use python -m vulture.core instead of python vulture.py.

vulture/core.py Outdated
@@ -87,6 +88,10 @@ def read_file(filename):
raise VultureInputException(err)


def _get_stdlib_whitelist():
return pkgutil.get_data('vulture', 'whitelists/stdlib.py').decode("UTF-8")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use lowercase utf-8.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After you apply the change to scavenge() we don't need this function anymore.

vulture/core.py Outdated
try:
module_string = read_file(module)
module_string += read_file(module)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to use pkgutil.get_data. The change is more complicated than anticipated. Let me propose the following solution:

    def scavenge(self, paths):
        def exclude(name):
            return any(fnmatchcase(name, pattern) for pattern in self.exclude)

        for module in self._get_modules(paths):
            if exclude(module):
                self.log('Excluded:', module)
                continue

            self.log('Scanning:', module)
            try:
                module_string = read_file(module)
            except VultureInputException as err:
                print('Error: Could not read file %s - %s' % (module, err))
                print('You might want to change the encoding to UTF-8.')
            else:
                self.scan(module_string, filename=module)

        whitelist_names = ['stdlib.py']
        for name in whitelist_names:
            path = os.path.join('whitelists', name)
            if exclude(path):
                self.log('Excluded whitelist:', path)
            else:
                module_data = pkgutil.get_data('vulture', path)
                if module_data is None:
                    sys.exit('Error: Please use "python -m vulture.core".')
                module_string = module_data.decode("utf-8")
                self.scan(module_string, filename=path)

There are more changes needed afterwards: the tests have to use python -m vulture.core instead of python vulture.py.

def test_script_without_whitelist():
assert call_vulture(['vulture/core.py']) == 1
def test_script_with_implicit_whitelist():
assert call_vulture(['vulture/core.py']) == 0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should also be test_script_without_whitelist() which uses --exclude whitelists/stdlib.py.

else:
module_data = pkgutil.get_data('vulture', path)
if module_data is None:
sys.exit('Error: Please use "python -m vulture.core".')
Copy link
Contributor Author

@RJ722 RJ722 Jun 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank You @jendrikseipp 😄

Just one thing which is a little blurry at the moment: Why are we checking if module_data is None and also wouldn't running python -m vulture.core just change the entrypoint?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously it was possible to run "python core.py". Now this is not possible anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will it impact pkgutil.get_data?

Also, if I run python vulture/core.py vulture/core.py --exclude whitelists/stdlib.py, this seems to be working great.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you mean, but now everything should work fine :-)

@RJ722
Copy link
Contributor Author

RJ722 commented Jun 25, 2017

Do we need to add any more tests?

@jendrikseipp jendrikseipp merged commit f2c0cf5 into jendrikseipp:master Jun 25, 2017
@RJ722 RJ722 deleted the default_whitelist branch June 25, 2017 09:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants