Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework strict path matching #24

Closed
natecraddock opened this issue Feb 23, 2023 · 13 comments
Closed

Rework strict path matching #24

natecraddock opened this issue Feb 23, 2023 · 13 comments
Labels
enhancement New feature or request
Milestone

Comments

@natecraddock
Copy link
Owner

natecraddock commented Feb 23, 2023

Strict path matching was implemented from a request (#11) and I misunderstood the initial request.

As implemented in version 0.7.0 I do think it is currently useful. But there is room for improvement as suggested in later replies in that issue. Current implementation: when a / is found the path matching is made strict to the right and requires contiguous paths in the query.

There are two good points raised in the issue

  1. Examples that currently fail

Given these files:

app/models/foo/bar/baz.rb
app/models/foo/bar-baz.rb
app/models/foo-bar-baz.rb
app/monsters/dungeon/foo/bar/baz.rb

I would expect that typing model/barbaz would filter down to:

app/models/foo/bar-baz.rb
app/models/foo-bar-baz.rb

Because, for a search input of model/barbaz, I'm expecting that barbaz should match only within one level of depth in the dir tree. foo/bar/baz.rb has "bar/baz" split across two levels.

  1. a model of "locking" and fuzziness

It's like each slash should "lock" a portion of the path to the left, but still leave the portion to the right for fuzzier matching until the next slash and so on...

Search:
> dung/bar/
  | 1 | 2 |  3 (fuzzy)...

Match:
app/monsters/dungeon/foo/bar/baz.rb
|        1          |   2   |  3 (fuzzy)...
                    ^       ^
                 "lock"  "lock"

I've thought about this for a while and I believe I have a good solution

Solution

  • Enable strict path matching when a query token contains a / anywhere.
  • Strict path matching is defined as: each segment of a query must match within a single path segment, but the segments are not required to be contigous
  • the already existing multi-token nature of zf enables the suggested model of "locking" and fuzziness

Here are some examples, all using our lovely test strings

app/models/foo/bar/baz.rb
app/models/foo/bar-baz.rb
app/models/foo-bar-baz.rb
app/monsters/dungeon/foo/bar/baz.rb

Example 1

query: app/mod/fo/ba/baz
       |1 |2  |3 |4 |5  |

app/models/foo/bar/baz.rb
|1 |2     |3  |4  |5     | (pass)

app/monsters/dungeon/foo/bar/baz.rb
|1 |        (fail: 'mod' not found in single segment)

Example 2

query: model/barbaz
       | 1  | 2   |

app/models/foo/bar-baz.rb
   | 1    |   | 2       | (match)

app/models/foo-bar-baz.rb
   | 1    |    |2       | (match)

app/models/foo/bar/baz.rb
   | 1    |       (fail: 'barbaz' not found in single segment)

Example 3

query: mod/ foo
       |1 | |1'|

(matches the first 3 test strings)

app/monsters/dungeon/foo/bar/baz.rb)
            (fail: 'mod' not found in single segment)

With those examples in mind, I think this covers all the possible use cases, and makes things quite useful!

To recap: A query token containing any / will be used for strict path matching. The path segments are not required to be contiguous.

Currently a query of mod/ would match both mo/d/ and /mod/, but with these changes only /mod/ would be selected.

Open questions

It's like each slash should "lock" a portion of the path to the left, but still leave the portion to the right for fuzzier matching until the next slash and so on...

I like this idea, but zf already natively supports this fuzziness by inserting a space and adding a new query token. I'm not against adding this, but I think adding a space keeps things more consistent.

@Pistos
Copy link

Pistos commented Feb 23, 2023

Based on all your examples, I think that is exactly what I want.

Thank you for giving this feature request so much thought and attention. I look forward to using it once it's implemented. I already have a use in mind, in another open source project. I'll show you when integration is done.

@natecraddock
Copy link
Owner Author

Thanks for reviewing! I have set the deadline for version 0.8.0 for April 1, but I think I can get this feature done well before then. I'll share here once I have it working. I already think I know the changes needed and it shouldn't be too complex actually.

natecraddock added a commit that referenced this issue Mar 2, 2023
Now the strict path matching no longer requires contiguous patch
segments to match, and a segment before the first / will also be used
for strict path matching.

See #24
@natecraddock
Copy link
Owner Author

natecraddock commented Mar 2, 2023

I think I have everything working! See the rework-strict-path-match branch for the latest changes. Please don't feel rushed or obligated, but I would appreciate some feedback :)

The behavior is now

  • if a query token contains a / that token will be used for strict path matching
  • the path segments of the token (separated by / characters) all must be found in path segments of the candidate string
  • the segments are no longer required to be contiguous

In other words, exactly as described above.

For @ratfactor's idea to leave the right part of the token "fuzzy", I didn't implement this for a few reasons.

  • it would have complicated the code to special-case the last path segment
  • adding a new token (with a space) easily allows for the fuzzy behavior
  • Example 1 in the description of this issue would match an additional undesired path

I'm still open to this "fuzzy" right path if we can find a way to do it. But I'm not sure if it's possible. And I think adding a space is a good enough solution.

@ratfactor
Copy link
Contributor

I did my best to break it (I even enlisted my wife, who is really good at breaking software). Here's my attempt at an increasingly pathological test:

$ cat paths.txt
oatmeal/sugar.txt
oatmeal/sugar_txt
oatmeal/sug/ar
oat/meal
oat/meal/sug/ar
oat/meal/sug/ar/sugar
oat/meal/sug/ar/sugar.txt
oat/meal/sug/ar/sugar.js
oatmeal/sugar/sugar.txt
oatmeal/sugar/snakes.txt
oatmeal/sugar/skeletons.txt
oatmeal/brown_sugar.txt
oatmeal/brown_sugar/brown.js
oatmeal/brown_sugar/sugar.js
oatmeal/brown_sugar/brown_sugar.js
oatmeal/brown_sugar/sugar_brown.js
oatmeal/granulated_sugar.txt
oatmeal/raisins/sugar.js
oatmeal/raisins/helicopter/sugar.js
oatmeal/raised/sugar.js
raised sugar helicopter
sugar raised helicopter
helicopter sugar raised
helicopter raised sugar
raised helicopter sugar
sugar helicopter raised
oat/rise/js
oat/rise/sug.js
Helicopter Sugar.js
oats raised suggestively jam slowly
oats raised heliocentric slowly grape
oats resist helicopter signs
oatmeal BrOwn SUGar
JavaScript text .js .txt
JavaScript text .js .txt sugar
oat sugar.js
oat.txt sugar.js
oat.txt /sugar.js
oat.txt /sugar.js /sugar.js
//////oat.txt /sugar.js /sugar.js
//////oat.txt
///../..//oat.txt
///../..//oat.txt/sugar.js
///../..//oat.txt/sug ar.js
///../..//oat.txt/ sug ar.js
/ //// /oat.txt/ sug ar.js
/ //// /oat.txt/ sug ar .js
oat/
oat/ meal
oat /meal /sug/
oat /meal/ sug/ar//
1234567890/oat

👍 I agree that zf's existing support for separate fuzzy tokens does what I need. There's no actual need for a "right/left" matching like I was picturing. This already does what I would expect.

As long as all portions of a path are true, the match is good regardless of the order I enter them. The results of this weird search are fine:

> oat/sugar sugar/sugar meal/sugar
oatmeal/sugar/sugar.txt
oatmeal/brown_sugar/sugar.js
oatmeal/brown_sugar/brown_sugar.js

And if I want to enforce the right/left order of the path portions, I just need to keep the path together.

With a space:

> oat/sugar brown
oatmeal/brown_sugar/brown.js
oatmeal/brown_sugar/brown_sugar.js
oatmeal/brown_sugar.txt
oatmeal/brown_sugar/sugar.js
oatmeal/brown_sugar/sugar_brown.js

No space:

> oat/sugar/brown
oatmeal/brown_sugar/brown.js
oatmeal/brown_sugar/brown_sugar.js
oatmeal/brown_sugar/sugar_brown.js

The matching of the last part, "brown," is still fuzzy, but the order is enforced. That's all I need!

A cool bonus feature that is also possibly unrelated from this path matching specifically, but something I noticed while testing: A search token consisting of some number of slashes is a way to request a minimum path depth.

Example: The three slashes here filtered out any paths that weren't at least three directories deep:

> oat/sugar ///
//////oat.txt /sugar.js /sugar.js
///../..//oat.txt/sugar.js
///../..//oat.txt/sug ar.js
/ //// /oat.txt/ sug ar.js
/ //// /oat.txt/ sug ar .js
///../..//oat.txt/ sug ar.js

I thought that was neat.

In all, it works great and I wasn't able to break it. As before, I'm really excited about this feature and I've already switched to the new version for personal use. 🎉

@Pistos
Copy link

Pistos commented Mar 5, 2023

This is good! But, and I apologize 😅, it's not quite where I'd like it to be.

Given that this path exists:

app/models/foo/bar/baz.rb

It doesn't match with this search string:

> mod/baz.rb
./app/models/foo-bar-baz.rb
./app/models/foo/bar-baz.rb

My thinking is: I know I have a model named baz, so I type mod for the models tree, and baz.rb to get at the baz model written in Ruby, which is at some unknown depth within the models/ tree.

@natecraddock
Copy link
Owner Author

@Pistos this really should work. I haven't looked at the code yet, but I think I know where this bug is coming from. I'll try to fix it soon

@ratfactor
Copy link
Contributor

How did I not run into this? I am so bad at testing software. 🤣

@Pistos
Copy link

Pistos commented Mar 5, 2023

@natecraddock Yes, I had a feeling this is a bug rather than intentional design. As a clue, my first guess would be that there's an edge case having to do with a full segment match (bar.rb).

Some more data:

> mod/ba
./app/models/foo-bar-baz.rb
./app/models/foo/bar-baz.rb
./app/models/foo/bar/baz.rb

which is highlighting the ba of /bar. So maybe you have some code taking baz character by character, and, when it processes ba, it finds /bar, and then when a z is added, it thinks ba+z doesn't match ba+r, and so it filters out that whole subtree.

@natecraddock
Copy link
Owner Author

natecraddock commented Mar 5, 2023

I was right, it was a simple bug to fix! I have fixed it (and pushed to the branch) and now ./app/models/foo/bar/baz.rb matches mod/baz.rb

Something I noticed that is related, but a separate issue I'll address later

> mod/baz.rb
./app/models/foo-bar-baz.rb
./app/models/foo/bar-baz.rb
./app/models/foo/bar/baz.rb

In this case I think we should expect mod/baz.rb to rank ./app/models/foo/bar/baz.rb the hightest because baz.rb has the greatest path segment coverage on that string. I'll adjust the ranking for this later, but at least now the line does match.

@natecraddock
Copy link
Owner Author

@ratfactor thanks for your thorough testing. I missed this bug myself, so no worries!

@Pistos
Copy link

Pistos commented Mar 5, 2023

I'll test some more later, but it seems all good so far! Good work. 👍

I have some more feature requests, but I'll make them in separate github issues.

@natecraddock
Copy link
Owner Author

In this case I think we should expect mod/baz.rb to rank ./app/models/foo/bar/baz.rb the hightest because baz.rb has the greatest path segment coverage on that string. I'll adjust the ranking for this later, but at least now the line does match.

Issue to track this: #26

I'll test some more later, but it seems all good so far!

I'll go ahead and merge the brach, but I'll leave the issue open for the next couple days

@natecraddock
Copy link
Owner Author

I added a bunch of tests inspired by the test cases in this issue. It should (hopefully) prevent any regressions in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants