Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove .git and .hg from recursive file search #4636

Merged
merged 2 commits into from Sep 18, 2017

Conversation

@erasche
Copy link
Member

commented Sep 18, 2017

From @bgruening. We saw extraneous opens/reads for these directories during boot. E.g.

open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_mapstatistics/e4c1d9a768db/openms_mapstatistics", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_mapstatistics/e4c1d9a768db/openms_mapstatistics/.hg", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_mapstatistics/e4c1d9a768db/openms_mapstatistics/.hg/store", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_mapstatistics/e4c1d9a768db/openms_mapstatistics/.hg/store/data", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_mapstatistics/e4c1d9a768db/openms_mapstatistics/.hg/store/data/test-data", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_mapstatistics/e4c1d9a768db/openms_mapstatistics/.hg/cache", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_mapstatistics/e4c1d9a768db/openms_mapstatistics/test-data", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("./config/tool_conf.xml", O_RDONLY) = 24
open("./config/shed_tool_conf.xml", O_RDONLY) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_rtevaluation/d5b15b21844b", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_rtevaluation/d5b15b21844b/openms_rtevaluation", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_rtevaluation/d5b15b21844b/openms_rtevaluation/.hg", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_rtevaluation/d5b15b21844b/openms_rtevaluation/.hg/store", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_rtevaluation/d5b15b21844b/openms_rtevaluation/.hg/store/data", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_rtevaluation/d5b15b21844b/openms_rtevaluation/.hg/store/data/test-data", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_rtevaluation/d5b15b21844b/openms_rtevaluation/.hg/cache", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_rtevaluation/d5b15b21844b/openms_rtevaluation/test-data", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("./config/tool_conf.xml", O_RDONLY) = 24
open("./config/shed_tool_conf.xml", O_RDONLY) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_internalcalibration/975efdac387b", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_internalcalibration/975efdac387b/openms_internalcalibration", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_internalcalibration/975efdac387b/openms_internalcalibration/.hg", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_internalcalibration/975efdac387b/openms_internalcalibration/.hg/store", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_internalcalibration/975efdac387b/openms_internalcalibration/.hg/store/data", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_internalcalibration/975efdac387b/openms_internalcalibration/.hg/store/data/test-data", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_internalcalibration/975efdac387b/openms_internalcalibration/.hg/cache", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24
open("../shed_tools/toolshed.g2.bx.psu.edu/repos/galaxyp/openms_internalcalibration/975efdac387b/openms_internalcalibration/test-data", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 24

(the duplicate re-reads of the shed tool conf were pre-#4495)

@jmchilton

This comment has been minimized.

Copy link
Member

commented Sep 18, 2017

Good call - thanks!

@@ -22,6 +22,7 @@

YAML_EXTENSIONS = [".yaml", ".yml", ".json"]
CWL_EXTENSIONS = YAML_EXTENSIONS + [".cwl"]
EXCLUDE_WALK_DIRS = ['.hg', '.git']

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Sep 18, 2017

Member

would it help to exclude .venv here ?

This comment has been minimized.

Copy link
@erasche

erasche Sep 18, 2017

Author Member

It doesn't hurt? But probably not necessary. I don't expect .venvs in the shed tool directories, but again won't hurt anything.

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Sep 18, 2017

Member

Ah, I didn't get that this applied to tool directories only (it even is in the file name ...), but yeah, I guess it doesn't hurt.

@@ -264,6 +265,8 @@ def _find_files(directory, pattern='*'):

matches = []
for root, dirnames, filenames in os.walk(directory):
# exclude some directories (like .hg) from traversing
dirnames[:] = [dir for dir in dirnames if dir not in EXCLUDE_WALK_DIRS]

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Sep 18, 2017

Member

You probably don't need the [:]

This comment has been minimized.

Copy link
@bgruening

bgruening Sep 18, 2017

Member

Mh, I thought we need it according to https://docs.python.org/2/library/os.html#os.walk:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Sep 18, 2017

Member

The [:] makes sense on the left side only if you add some numbers around the colon (e.g. l[3:5]) to modify a part of the list, here dirnames is completely replaced.

This comment has been minimized.

Copy link
@jmchilton

jmchilton Sep 18, 2017

Member

@nsoranzo I don't think you are right. One operation is pointing a variable at a new list and the other operation is replacing the contents of a list. Check out the following example:

>>> x = [1, 2, 3]
>>> y = x
>>> y = [4, 5, 6]
>>> x
[1, 2, 3]
>>> y = x
>>> y[:] = [4, 5, 6]
>>> x
[4, 5, 6]

If you just replace dirnames to point at a new list, there is no way os.walk internals would know about this. If you change the list itself though - it can use the new contents.

Update: Unless you are simply arguing that it would be more efficient to just strip the few items out of the list rather than replacing the whole list - which I would think makes sense.

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Sep 18, 2017

Member

Mmmh, I'll blame it on Monday morning, you're all right, sorry for the noise.

This comment has been minimized.

Copy link
@anuprulez

anuprulez Sep 18, 2017

Member

In the second case,

>>> y = x
>>> y[:] = [4, 5, 6]
>>> x
[4, 5, 6]

no new object is created. x and y point to the same object and that object we mutate using y[:]. But in the first case, we just have another pointer (y) to the object already pointed to by x (y = x). In line y = [4, 5, 6]
y points to another object. I think calling a method like an append or slicing, mutates a list.

@mvdbeek mvdbeek merged commit d9a8055 into galaxyproject:dev Sep 18, 2017

5 of 6 checks passed

toolshed test Build finished. 579 tests run, 0 skipped, 0 failed.
Details
api test Build finished. 292 tests run, 4 skipped, 0 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 161 tests run, 0 skipped, 0 failed.
Details
integration test Build finished. 45 tests run, 0 skipped, 0 failed.
Details
lgtm analysis: JavaScript No alert changes
Details
@mvdbeek

This comment has been minimized.

Copy link
Member

commented Sep 18, 2017

Thanks @erasche!

@martenson martenson moved this from Startup to Merged in Performance Feb 27, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.