mdfind takes multiple minutes if a NTFS bootcamp volume is mounted #48

Closed
natewalck opened this issue Sep 19, 2014 · 11 comments

Comments

@natewalck
Contributor

From rrmiddleton@gmail.com on November 26, 2010 23:30:36

Relating to:
munkicommon.getSpotlightInstalledApplications()

** What steps will reproduce the problem?
0. Need a computer with a bootcamp partition

  1. time /usr/bin/mdfind "kMDItemKind = 'Application'"
  2. in Disk Utility unmount the bootcamp volume
  3. time /usr/bin/mdfind "kMDItemKind = 'Application'"

** What is the expected output? What do you see instead?
On my MBAir:

  • 2min22s when Bootcamp mounted.
  • 0m0.072s when Bootcamp is not mounted.
    Spotlight can't do its thing on a read-only volume, but will still try to search it!
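
For repeatable measurements, the two `time` runs above could be scripted. This is only a minimal sketch and not part of munki: `time_command` is a hypothetical helper name, and it is written for Python 3 although the code in this thread targets Python 2.

```python
import subprocess
import time


def time_command(argv):
    """Run argv, discard its output, and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    proc = subprocess.run(argv, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)
    return proc.returncode, time.perf_counter() - start


# Example call on macOS, using the same query as in the steps above:
#   rc, elapsed = time_command(
#       ["/usr/bin/mdfind", "kMDItemKind = 'Application'"])
```

Running it once with the Boot Camp volume mounted and once with it unmounted should show the same difference as the figures reported above, assuming the machine exhibits the problem.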

** Possible fix:
Reimplement:
munkicommon.getSpotlightInstalledApplications()
as a loop over the '/' directory:
/usr/bin/mdfind -onlyin /Applications
/usr/bin/mdfind -onlyin /Library
/usr/bin/mdfind -onlyin /Users
/usr/bin/mdfind -onlyin /System
... etc ...
deliberately skipping /Volumes at the stage of mdfind rather than as a post-processing exclusion.

** Why a problem:

  • on manual check, multiple minutes wasted
  • on auto check, high disk I/O for a couple of minutes while the user is trying to work
  • particularly for people using managed_updates, this check is triggered every time (if using an installs item of type application); for those using only managed_installs, the check is no longer triggered once all apps are up to date.

Original issue: http://code.google.com/p/munki/issues/detail?id=48

@natewalck
Contributor Author

From rrmiddleton@gmail.com on November 26, 2010 23:31:22

Labels: -Type-Defect -Priority-Medium Type-Enhancement Priority-Low

@natewalck
Contributor Author

From rrmiddleton@gmail.com on November 26, 2010 23:34:10

I am happy to make this change after receiving input from others on the idea.

This probably won't help any speed problems people using read-only NFS automounts might be having (though it could if these were all in a top-directory location easy to exclude before running mdfind).

@natewalck
Contributor Author

From rrmiddleton@gmail.com on November 27, 2010 00:41:48

Test code below (replaces munkicommon.getSpotlightInstalledApplications).

It provides a massive speedup on my test computer with bootcamp - from over 2 minutes to under 5 seconds for the entire getAppData routine.

On another test computer without the bootcamp problem I am seeing this modified function slow things down a little more than expected (from sub 1 second to about 4 seconds to execute; the mdfind specifically on the /Applications directory is slower than the same mdfind on the entire computer).

i.e. it does carry a performance hit on computers not experiencing the problem, but that hit is only a few seconds.

import os
import subprocess

def mdfindAppsInDir(directory, applist):
    """Use Spotlight to search for items of type application in directory
    and append them to applist (the input list is modified in place).
    """
    argv = ['/usr/bin/mdfind', '-0', '-onlyin', directory,
            "kMDItemKind = 'Application'"]
    p = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (stdout, stderr) = p.communicate()

    # communicate() already waits for the process, so returncode is set here.
    if p.returncode == 0:
        for app_path in stdout.split('\0'):
            # -0 ends every path with NUL, so the last item is an empty string.
            if (app_path and not app_path.startswith('/Volumes/')
                    and not isExcludedFilesystem(app_path)):
                applist.append(app_path)

def getSpotlightInstalledApplications():
    """Get paths of currently installed applications per Spotlight.
    Return value is list of paths.
    Ignores apps installed on other volumes
    """
    skipDirs = ['Volumes', 'tmp', '.vol', '.Trashes',
                '.Spotlight-V100', '.fseventsd', 'Network', 'net',
                'cores', 'dev']
    applist = []

    # Search each top-level directory separately so /Volumes is never scanned.
    for f in os.listdir('/'):
        if f not in skipDirs:
            p = os.path.join('/', f)
            if (os.path.isdir(p) and not os.path.islink(p)
                    and not os.path.ismount(p)):
                if f.endswith('.app'):
                    applist.append(p)
                else:
                    mdfindAppsInDir(p, applist)

    return applist

@natewalck
Contributor Author

From rrmiddleton@gmail.com on November 27, 2010 00:43:05

Status: Started

@natewalck
Contributor Author

From jr...@google.com on December 06, 2010 13:48:43

Overall I'm semi OK with this. I had hoped to eventually widen the handling of /Volumes in mdfind output, removing paths only if they are network paths. Local disks should still be under consideration.

Be careful to use munkicommon.listdir() instead of os.listdir(), and while we're in here we should decode the UTF-8 output from mdfind into unicode objects.
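
The decoding step suggested here could look roughly like the following. It is a sketch only, written for Python 3 (where the decoded result is `str` rather than Python 2's `unicode`); `parse_mdfind_output` is a hypothetical name and not a munkicommon function.

```python
def parse_mdfind_output(raw):
    """Split NUL-separated `mdfind -0` output (bytes) into a list of str paths."""
    paths = []
    for chunk in raw.split(b"\0"):
        # mdfind -0 writes a NUL after every path, so the final chunk is empty.
        if chunk:
            paths.append(chunk.decode("utf-8"))
    return paths
```

Splitting on the NUL byte before decoding keeps each path's decoding independent of the others, and skipping empty chunks avoids appending an empty path at the end of the list.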

@natewalck
Contributor Author

From gregnea...@mac.com on December 06, 2010 13:58:01

There's no one perfect answer for this problem; it may be specific to each deployment.

@natewalck
Contributor Author

From rrmiddleton@gmail.com on December 06, 2010 14:05:57

munkicommon.isExcludedFilesystem suits me well enough for the filesystems to be excluded. NFS or read-only get excluded (not sure what it does to AFP / SMB mounts). I would be happy to walk into Volumes as well, and only run the mdfind function on directories within that return false for isExcludedFilesystem.

For my site walking into Volumes is unnecessary (but doesn't hurt either). Most computers have only one drive, some have more but in no standard way / not named the same thing. Managed applications will always be installed on the boot drive here.
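
To illustrate the kind of check being discussed (exclude read-only and non-local filesystems), here is a rough sketch that classifies mount points from the text printed by `/sbin/mount`. It is not how munkicommon.isExcludedFilesystem is implemented; `excluded_mount_points` is a hypothetical name, and the sketch assumes the BSD-style line format `device on /mount/point (option, option, ...)`.

```python
import re

# One line of mount output: "<device> on <mount point> (<comma-separated options>)"
MOUNT_LINE = re.compile(
    r"^(?P<device>.+?) on (?P<mount_point>.+) \((?P<options>[^()]*)\)$")


def excluded_mount_points(mount_output):
    """Return the set of mount points that are read-only or not local."""
    excluded = set()
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if not match:
            continue  # ignore lines that do not look like mount entries
        options = [opt.strip() for opt in match.group("options").split(",")]
        if "read-only" in options or "local" not in options:
            excluded.add(match.group("mount_point"))
    return excluded
```

Under these assumptions a read-only NTFS Boot Camp volume and an NFS automount would both land in the excluded set, while the boot volume and other local read/write disks would remain searchable.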

@natewalck
Contributor Author

From rrmiddleton@gmail.com on December 08, 2010 01:00:42

Any thoughts on making the spotlight call from within python itself? I presume this starts up a thread that we don't really want to start up ...
Making a NSMetadataQuery call directly allows setting of the search scope to an array of folders (mdfind allows only one to be specified at each command line call).

This code:

  • includes /Users/Shared but excludes the rest of /Users (current code seems to include /Users)
  • excludes network filesystems
  • includes local volumes, but not if they are read-only (eg: boot-camp volume)

Oddly I'm noticing that on one of my systems with bootcamp I get the multi-minute delay, while on another spotlight appears to entirely ignore the read-only NTFS volume with bootcamp on it (and thus returns quickly). Perhaps slightly different builds of 10.6.5 (one is a MBAir).

import os
from Foundation import NSMetadataQuery, NSPredicate, NSRunLoop, NSDate

# listdir and isExcludedFilesystem are the munkicommon helpers.

def findAppsInDirs(dirlist, applist):
    query = NSMetadataQuery.alloc().init()
    query.setPredicate_(NSPredicate.predicateWithFormat_(
        '(kMDItemKind = "Application")'))
    #query.setSearchScopes_(['/Applications/Utilities','/Library','/Volumes/Untitled','/Users/Shared'])
    query.setSearchScopes_(dirlist)
    query.startQuery()
    # Run the run loop in 1 second slices until the initial gathering ends.
    while query.isGathering():
        NSRunLoop.currentRunLoop().runUntilDate_(
            NSDate.dateWithTimeIntervalSinceNow_(1))
    query.stopQuery()

    for item in query.results():
        p = item.valueForAttribute_("kMDItemPath")
        if not isExcludedFilesystem(p):
            applist.append(p)

def getSpotlightInstalledApplications():
    """Get paths of currently installed applications per Spotlight.
    Return value is list of paths.
    Ignores apps installed on other volumes
    """
    skipDirs = ['Volumes', 'tmp', '.vol', '.Trashes', 'Users',
                '.Spotlight-V100', '.fseventsd', 'Network', 'net',
                'cores', 'dev']
    dirlist = []
    applist = []

    for f in listdir('/'):
        if f not in skipDirs:
            p = os.path.join('/', f)
            if (os.path.isdir(p) and not os.path.islink(p)
                    and not isExcludedFilesystem(p)):
                if f.endswith('.app'):
                    applist.append(p)
                else:
                    dirlist.append(p)

    for f in listdir('/Volumes'):
        p = os.path.join('/Volumes', f)
        if (os.path.isdir(p) and not os.path.islink(p)
                and not isExcludedFilesystem(p)):
            dirlist.append(p)

    dirlist.append('/Users/Shared')

    #print dirlist

    findAppsInDirs(dirlist, applist)

    return applist

@natewalck
Contributor Author

From gregnea...@mac.com on December 08, 2010 10:51:29

I think calling Spotlight via the API sounds great. I have two concerns:

  1. Is NSRunLoop available at all times to a running Python script? I thought that required an application object...

  2. I'm not clear what you mean when you said "Oddly I'm noticing that on one of my systems with bootcamp I get the multi-minute delay" -- do you mean you get the delay even if you call Spotlight specifically excluding the NTFS filesystem?

@natewalck
Contributor Author

From rrmiddleton@gmail.com on December 09, 2010 03:41:59

"1) Is NSRunLoop available at all times to a running Python script? I thought that required an application object..."
It is working fine. I don't understand it well enough, but as I understand it, it is a thread control mechanism, one which is also used by all the GUI stuff. It is Apple's generic wrapper concept, and it deals with threads in command-line tools too.

"2) I'm not clear what you mean when you said "Oddly I'm noticing that on one of my systems with bootcamp I get the multi-minute delay" -- do you mean you get the delay even if you call Spotlight specifically excluding the NTFS filesystem?"
No - this code works without delay on all machines. I simply had second thoughts about whether it was worth fixing as munki-head doesn't display this fault consistently on all machines. However it seems it is more likely to show the fault on the first few runs after boot. Spotlight doesn't seem to be perfectly consistent in its behaviour if not told exactly what to do (but this change tells it exactly which paths it is allowed to look in).

I'm happy with the code below now, probably can commit the changes. Note: not tested on 10.5 yet (have few 10.5 computers left, and am upgrading them to 10.6 before installing munki).

This change does change behaviour in terms of what is included / excluded.

The unchanged LaunchServices code appears to find applications anywhere on the boot drive, including in user homes; these are later excluded in updatecheck.

The changed Spotlight code:

  • finds applications on the boot drive and other local r/w volumes
  • excludes /Users (but allows /Users/Shared) at point of search.

import os
from Foundation import NSDate, NSMetadataQuery, NSPredicate, NSRunLoop

# listdir and isExcludedFilesystem are the munkicommon helpers.

def findAppsInDirs(dirlist):
    """Do a Spotlight search for items of type application within the
    list of directories provided. Returns a list of paths to applications;
    these appear to always be some form of unicode string.
    """
    applist = []
    query = NSMetadataQuery.alloc().init()
    query.setPredicate_(NSPredicate.predicateWithFormat_(
        '(kMDItemKind = "Application")'))
    query.setSearchScopes_(dirlist)
    query.startQuery()
    # Spotlight isGathering phase - this is the initial search. After the
    # isGathering phase Spotlight keeps running, returning live results from
    # filesystem changes; we are not interested in that phase.
    # Run for 0.3 seconds, then check if isGathering has completed.
    while query.isGathering():
        NSRunLoop.currentRunLoop().runUntilDate_(
            NSDate.dateWithTimeIntervalSinceNow_(0.3))
    query.stopQuery()

    for item in query.results():
        p = item.valueForAttribute_("kMDItemPath")
        if not isExcludedFilesystem(p):
            applist.append(p)
    return applist

def getSpotlightInstalledApplications():
    """Get paths of currently installed applications per Spotlight.
    Return value is list of paths.
    Ignores apps installed on non-local or read-only volumes,
    ignores apps within user accounts.
    """
    skipdirs = ['Volumes', 'tmp', '.vol', '.Trashes', 'Users',
                '.Spotlight-V100', '.fseventsd', 'Network', 'net',
                'home', 'cores', 'dev']
    dirlist = []
    applist = []

    # Top-level directories on the boot volume become search scopes.
    for f in listdir(u'/'):
        if f not in skipdirs:
            p = os.path.join(u'/', f)
            if (os.path.isdir(p) and not os.path.islink(p)
                    and not isExcludedFilesystem(p)):
                if f.endswith('.app'):
                    applist.append(p)
                else:
                    dirlist.append(p)

    # Other local read/write volumes are searched as well.
    for f in listdir(u'/Volumes'):
        p = os.path.join(u'/Volumes', f)
        if (os.path.isdir(p) and not os.path.islink(p)
                and not isExcludedFilesystem(p)):
            dirlist.append(p)

    dirlist.append(u'/Users/Shared')

    applist.extend(findAppsInDirs(dirlist))
    return applist
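
The scope-selection rules of this final version can be checked without Spotlight or PyObjC if the filesystem checks are passed in as a parameter. The following is a sketch under that assumption, written for Python 3: `build_search_scopes` and `is_searchable` are hypothetical names, and `is_searchable(path)` stands in for the combined `os.path.isdir`, `os.path.islink` and `isExcludedFilesystem` tests above.

```python
import posixpath

# Top-level names skipped at the point of search (same list as above).
SKIP_DIRS = frozenset([
    'Volumes', 'tmp', '.vol', '.Trashes', 'Users',
    '.Spotlight-V100', '.fseventsd', 'Network', 'net',
    'home', 'cores', 'dev'])


def build_search_scopes(root_entries, volume_entries, is_searchable):
    """Return (apps, scopes): .app bundles found directly in / and the
    directories that would be handed to Spotlight as search scopes."""
    apps, scopes = [], []
    for name in root_entries:
        if name in SKIP_DIRS:
            continue
        path = posixpath.join('/', name)
        if not is_searchable(path):
            continue
        if name.endswith('.app'):
            apps.append(path)      # an application sitting directly in /
        else:
            scopes.append(path)
    for name in volume_entries:
        path = posixpath.join('/Volumes', name)
        if is_searchable(path):
            scopes.append(path)
    scopes.append('/Users/Shared')  # the only part of /Users that is searched
    return apps, scopes
```

Separating the path selection from the query makes it possible to confirm, for example, that a read-only Boot Camp volume never reaches the search scopes, without needing a machine that has one.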

@natewalck
Contributor Author

From rrmiddleton@gmail.com on January 30, 2011 15:46:51

Committed in r1003. Closing issue.

Further discussions still to be had on why we care if we find an app in a non-default location, and what we should do about it.

Status: Verified
