Request for improvement, perhaps by scoring an exact suffix match higher #12

Closed
r-owen opened this issue Dec 4, 2015 · 8 comments

Comments


r-owen commented Dec 4, 2015

I'm finding that the fuzzy search tends to favor the wrong files, especially when searching for header files in a mix of those and html documentation (the html files are favored, even if the search ends in ".h").

This is using Atom 1.3.0-beta6 on MacOS 10.9.5 with fuzzy-finder setting "Use Alternate Scoring" turned on (and it definitely helps).

I have uploaded atom_test, a repository that shows the issue. It also includes some screen shots (described below).

Unpack atom_test, open it in Atom beta and try to find matchOptimisticB.h. I have attached screen shots showing what I get for "mob.h", "maob.h" and "matob.h". The latter works, but the other two have the correct hit so far down that it isn't even visible.

Then try to find matchOptimistic.cc using "mob.c". It does suggest the desired file as the first hit, which is great, but the other match is puzzling and I doubt it should be included.

I also included screen shots for Atom beta and Sublime Text 3 showing these same searches on my real project, which includes at least 100x as many files (too many to include in a demo). Sublime Text 3 does well with "mob.h" and "mob.c", but Atom beta struggles.

I hope the screen shots will give you some idea of how to improve things. I'm no expert, but I suspect the following two things might help:

  • If a suffix is provided then favor files that match it exactly over those that don't
  • Give very little weight to matches in the path unless the user includes "/" in the search term

atom_test.zip

Owner

jeancroy commented Dec 4, 2015

Hi @r-owen thanks for the report. There's a lot of information in here so I'll try to simplify.

This is how Sublime sorts the results for the query mob.h:
[screenshot]

Fuzzaldrin-plus sorts them in the reverse order.

Why?
a) We prefer a case-sensitive match, so the snake case name wins over the camelCase one.
b) We prefer a shorter full path and less directory depth.

Why not?
a) We prefer a smaller file name.
b) You propose that we recognize the extension and add a boost for a proper extension.

I can try to balance those. Does this simplified problem represent your issue, or should we add another test case?

Owner

jeancroy commented Dec 4, 2015

Here's what I think so far.

  • Applies when both the query and the candidate contain a period.
  • Match what follows the last period.
  • The bonus depends on both the length matched and the length of the extension:
    • matched*matched/ext_len
  • The length of the extension could be computed as the max of the query's and the candidate's detected extension.

This makes it so that:
.h prefers .h to .html (matched=1, ext_len = 1 vs 4)
.ht prefers .html to .htaccess (matched=2, ext_len = 4 vs 8)

This way (syncing on what follows the last dot) we still allow some predictive ability, which we would lose in the .ht case if we required a matching suffix.
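
The bonus described above can be sketched in a few lines of Python. This is only an illustration of the idea in this comment, not the code that ended up in fuzzaldrin-plus. The function names are made up for the example, and "matched" is assumed to be the length of the common prefix of the two extensions.

```python
def extension_of(text: str) -> str:
    """Return what follows the last period, or "" if there is no period."""
    _, dot, ext = text.rpartition(".")
    return ext if dot else ""


def common_prefix_len(a: str, b: str) -> int:
    """Count how many leading characters a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def extension_bonus(query: str, candidate: str) -> float:
    """Hypothetical bonus: matched * matched / ext_len."""
    q_ext = extension_of(query).lower()
    c_ext = extension_of(candidate).lower()
    # Only applies when both sides have something after a period.
    if not q_ext or not c_ext:
        return 0.0
    # Assumption: "matched" is the shared prefix of the two extensions.
    matched = common_prefix_len(q_ext, c_ext)
    # ext_len is the longer of the two detected extensions.
    ext_len = max(len(q_ext), len(c_ext))
    return matched * matched / ext_len


print(extension_bonus("mob.h", "match.h"))      # .h against .h
print(extension_bonus("mob.h", "match.html"))   # .h against .html
print(extension_bonus("x.ht", "index.html"))    # .ht against .html
print(extension_bonus("x.ht", ".htaccess"))     # .ht against .htaccess
```

With these definitions the .h query scores higher against .h than against .html, and the .ht query scores higher against .html than against .htaccess, which is the ordering asked for above.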

Ideally I'd apply this score only to candidates that I know are paths.
That would mean detecting the path separator, BUT files in the project's root folder do not have a path separator.
I'll have to see.

Author

r-owen commented Dec 4, 2015

I think your proposed change would help a lot.

I also suggest handling path searching differently, if you can (I'm not sure what you mean by "in project folder do not have the path separator", but it sounds ominous). In particular:

If the search string contains "/" then anything to the left is only matched to path (directory) names, and anything to the right is only matched to file names. Furthermore, anything to the left is preferentially matched to a single directory name (since the user would have added more "/" otherwise). For example, if the search string is "ma/mob.h" then the initial "ma" would be matched only to path components (not file names), and would prefer matches where the "m" and "a" are in the same subdirectory name (not different directories). The final "mob.h" would be matched only to file names, not directory names.

If the search string contains more than one "/" then only look for different directory names for each chunk. For example if the search string is "m/a/mob.h" then look for "m" in any directory name (at any depth except the parent dir of files), preferably at the start of a word, "a" in any sub-directory of that directory and "mob.h" only in file names.

If the search string has no "/" then severely downweight matches to path components compared to matches to file names. Thus users are expected to provide directory cues, if relevant.
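
To make the slash rules above concrete, here is a rough Python sketch. It is a filter only (match or no match), it treats the "single directory" preference as a hard requirement, and it leaves out the down-weighting for queries without "/" (such a query is simply matched against the file name here, which is stricter than what is proposed). The function names and the example paths are invented for the illustration.

```python
def is_subsequence(needle: str, haystack: str) -> bool:
    """True if the characters of needle appear in haystack in order."""
    remaining = iter(haystack.lower())
    return all(ch in remaining for ch in needle.lower())


def matches_with_slashes(query: str, path: str) -> bool:
    """Match each query chunk against one directory, the last against the file."""
    *dir_chunks, file_chunk = query.split("/")
    *dirs, filename = path.split("/")

    # The part after the last "/" may only match the file name.
    if not is_subsequence(file_chunk, filename):
        return False

    # Each earlier chunk must match a single directory name, and the
    # directories must appear in the same order as the chunks.
    depth = 0
    for chunk in dir_chunks:
        while depth < len(dirs) and not is_subsequence(chunk, dirs[depth]):
            depth += 1
        if depth == len(dirs):
            return False
        depth += 1  # the next chunk must match a deeper directory
    return True


print(matches_with_slashes("ma/mob.h", "src/match/matchOptimisticB.h"))
print(matches_with_slashes("ma/mob.h", "m/a/matchOptimisticB.h"))
print(matches_with_slashes("m/a/mob.h", "m/a/matchOptimisticB.h"))
```

In the second call "ma" is spread over two directories, so it is rejected, while "m/a/mob.h" accepts the same path.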

Owner

jeancroy commented Dec 5, 2015

Atom has a large user base with legacy habits that we have to support.
For example, email handler must be able to match against email/handler.py

Moreover, fuzzaldrin is used for anything fuzzy in Atom: paths in fuzzy-finder, but also autocomplete, snippets, and the command palette. For that reason I really prefer a context-free approach: detecting features in a string the same way one would use computer vision to detect an orange in a picture (a round-ish, orange-ish blob) rather than having any specific knowledge about the parts of an orange.

That being said, I have to process your request for file.ext in a way that would not completely mess up entries such as console.log( . And to make things worse, files in the root of your Atom project will appear without any slash in them (I receive ./file simply as file).

There's a bit of an explanation about how we do file/folder matching here.

I'll keep an eye on your recommendations; however, what would be really useful are examples of result pairs that you find in the wrong order. This project is very test driven and I try to build a corpus of what is intuitive.

I try to do what works with the fewest expectations and preferences. You assume people like to use slashes, but the vim world is full of people who would like to type accmain to match /application/config/controller/main
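
As a small illustration of why such slash-free queries have to keep working, the plain "characters in order" test below accepts both habits mentioned in this comment. It is a Python sketch, not fuzzaldrin's matcher, and ignoring the space in the query is an assumption made for the example.

```python
def is_subsequence(query: str, candidate: str) -> bool:
    """True if the query characters appear in the candidate in order."""
    # Assumption for this sketch: spaces in the query are ignored.
    wanted = query.replace(" ", "").lower()
    remaining = iter(candidate.lower())
    return all(ch in remaining for ch in wanted)


# Acronym-style query typed without any slash.
print(is_subsequence("accmain", "/application/config/controller/main"))

# Legacy habit: words separated by a space, path separated by a slash.
print(is_subsequence("email handler", "email/handler.py"))
```

A matcher that required "/" in the query before looking at directory names would reject both of these.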

Author

r-owen commented Dec 5, 2015

It is worth seeing if your first suggestion fixes the examples I have provided. If so, that probably suffices.

For a search string that explicitly contains "/", presumably that character becomes part of the search, so I think it is practical to improve path matching in the case that the user provides explicit cues, but it doesn't help when such cues are absent.

Here is another case to consider. In screen shot "2015-12-03 at 4.18.02 PM.png" the second match of "mob.c" is "build/astrometry_net/catalogs/usnob.c" where the "m" is in astrometry, and the 3rd match is similar. To my eyes these look inferior to the 4th match "build/astrometry_net/blind/matchobj.c" because the "m" is not at the beginning of a word in the 2nd and 3rd matches.
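
One way to express that preference is to count how many query characters can land on the start of a word. The Python sketch below does that with a small dynamic program over all possible alignments. The separator set, the camelCase rule and the function names are assumptions for the illustration, not how Atom or fuzzaldrin score matches.

```python
SEPARATORS = set("/\\_-. ")


def is_word_start(text: str, i: int) -> bool:
    """A word starts at index 0, after a separator, or at a camelCase hump."""
    if i == 0:
        return True
    prev, cur = text[i - 1], text[i]
    return prev in SEPARATORS or (prev.islower() and cur.isupper())


def word_start_hits(query: str, candidate: str) -> int:
    """Best number of query characters that can sit on a word start.

    Returns -1 when the query is not a subsequence of the candidate.
    """
    q = query.lower()
    lowered = candidate.lower()
    # best[i] = most word-start hits while matching the first i query
    # characters against the candidate prefix seen so far (-1: impossible).
    best = [0] + [-1] * len(q)
    for j, ch in enumerate(lowered):
        bonus = 1 if is_word_start(candidate, j) else 0
        # Walk the query backwards so each candidate character is used once.
        for i in range(len(q), 0, -1):
            if q[i - 1] == ch and best[i - 1] != -1:
                best[i] = max(best[i], best[i - 1] + bonus)
    return best[len(q)]


second = word_start_hits("mob.c", "build/astrometry_net/catalogs/usnob.c")
fourth = word_start_hits("mob.c", "build/astrometry_net/blind/matchobj.c")
print(second, fourth)
```

Under these assumptions the matchobj.c path gets more word-start hits than the usnob.c path, because its "m" can sit at the start of the file name.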

I am a bit surprised that exactly the same code is used for fuzzy file name matching and everything else you mention, since most of those cases don't have paths. But even if so, treating path separators as slightly special could still be universal, since most searches would not even have them. Or I could imagine a configurable search, where one configuration is used for path-like objects and a different configuration used for non-path-like objects. Perhaps just a list of path separators would be a valid configuration.

Owner

jeancroy commented Dec 5, 2015

Path separators are special, yes; that is the exception.

The way it works is that there is a zoom: first match against the full path, then try to match against the filename only for bonus points. If there's a / in the query, the zoomed version matches the filename as well as the last few folders (as many folders from the end of the path as there are folders in the query).
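
A rough Python sketch of that zoom, under the assumption that "folders in the query" means the number of "/" it contains. The names zoomed_view and score_with_zoom and the base_score parameter are made up for the example; the real scoring in fuzzaldrin-plus is more involved.

```python
from typing import Callable


def zoomed_view(path: str, query: str, sep: str = "/") -> str:
    """Keep the file name plus as many trailing folders as the query names."""
    folders_in_query = query.count(sep)
    parts = path.split(sep)
    # A slice longer than the list simply returns the whole path.
    return sep.join(parts[-(folders_in_query + 1):])


def score_with_zoom(path: str, query: str,
                    base_score: Callable[[str, str], float]) -> float:
    """Score the full path, then add a bonus from the zoomed-in part."""
    full = base_score(path, query)
    bonus = base_score(zoomed_view(path, query), query)
    return full + bonus


path = "build/astrometry_net/blind/matchobj.c"
print(zoomed_view(path, "mob.c"))      # file name only
print(zoomed_view(path, "bl/mob.c"))   # last folder plus file name
```

Here base_score stands in for whatever string scorer is used on a candidate and a query. The point is only that the same scorer runs twice, once on the whole path and once on its tail.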

Owner

jeancroy commented Dec 9, 2015

Hi @r-owen, fuzzaldrin now passes your use case.

Thanks for the report.

Author

r-owen commented Dec 9, 2015

Thank you very much. fuzzaldrin is a huge improvement over the current algorithm and I hope it becomes the default in the next formal release (but as long as the switch is there, I'll be happy).


2 participants