New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve generator name suggestions a bit. #18395
Improve generator name suggestions a bit. #18395
Conversation
I can add a missing test and add some more improvements here and there if it ever interests anyone, but this PR gets stale, so I wonder I would just reduce it all down to just use Current
My Suggestion
|
@schneems thoughts? |
msg << "Maybe you meant #{ suggestions.map {|s| "'#{s}'"}.join(" or ") }\n" | ||
msg << "Run `rails generate --help` for more options." | ||
puts msg | ||
unless sorted_groups.group_by(&:size).min.first > namespace.length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a confusing line of code, we're saying skip the suggestion if the longest option is less than the provided option (i.e. if longest_option < namespace.length
) ? We generally prefer if's with the exception of inline logic.
I don't think it's needed, and fails in the case someone is actually looking for the longest option, and accidentally enters an extra character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this line exists to reject too short words. sorted_groups.group_by(&:size).min.first
returns the length of the shortest commands available, which is 3
in the rails master.
Unless we have another algorithm degicated for short inputs to suggest only commands starting with those chars (e.g. ap -> API
, apple
or application
), I don't see point of making a suggestion for just 1 or 2 letter inputs.
I commented on some code. We're making a ton of assumptions here on how we want it to work and how the user will expect it to work. I think we can make some improvements but i'm not sold on this change wholesale. What was your original use case for this change? You weren't typing For the line you referenced, it was purely copy/pasta. I think we should remove it, as it does produce an incorrect result. rails/railties/lib/rails/generators.rb Line 267 in 7972898
|
@schneems Thanks for reviewing. I really appreciate your extra effort. You're entirely correct that I wasn't motivated to make this PR by the real typo. Rather, was just investigating the generators, and happened to feel that the suggestions might not always include good ones. While still believing in what I want to bring, not sure this PR is so convincing considering how little feedback it has gathered. And I'm totally fine to listen to what the original contributor thinks of. So I think I will withdraw a big change, but reduce this PR to remove only the line you agreed as well as using |
I'm 👍 on removing that line (and also rails/railties/lib/rails/generators.rb Line 263 in 7972898
max )
I'm 👍 on the Why don't you rebase this commit and make those two changes, and I'm good merging this in. In a separate PR I would be willing to consider some kind of a max bounds but i'm thinking higher, like 3 or 4, though honestly i'm just pulling magic numbers from the air. I can see the benefit of saying that edit distance is so far off that we can't guess, but since we're so uncertain list out all the options. For the sake of getting things in, in a timely fashion it would be best to split that out into a separate PR (if you're still interested), you can @ mention me and i'll be a bit quicker on that review. @samphippen do you have any thoughts on the value of a max edit distance? |
@schneems before I issue an opinion can you make sure I've understood the context:
Is that correct? |
@samphippen you got it. The idea is to be a bit more helpful if you run
It doesn't make sense to say "did you mean 'migrations'" and in that case we should just print all the results. However if we're pretty comfortable saying, the edit distance is low enough, then we'll give you some suggestions. The question is, what do you think "low enough" would be? |
Most commonly, I swap the order of a single character because I type quickly but accurately. The edit distance for that error would be two. I'd suggest 4 (allowing for two characters to have swapped positions, or some more macro typo) and see how it feels. Test intentionally fat fingering some commands and see what happens? |
@schneems one time, when it was acceptable to write a really bad search implementation, we used edit distance 3 on people's names and nobody seemed to complain (here, have a totally artificial data point, along with my more formal reckons). |
@schneems it's also probably out of scope for this PR, but I'd be tempted to implement jaro-winkler, sort and select top n and see if it performs any better. |
@schneems jaro winkler (for example) prefers prefix matches over non-prefix matches, which levenshtein doesn't. |
@schneems maybe we can pair on rolling a jaro-winkler branch at railsconf and see if it's any good? |
@samphippen sounds good. @shunsukeaida let's shoot for an edit distance max of 3 on the separate PR. Let me know when you removed those bad lines from the existing edit distance calculation, and used the better |
@schneems from my experience I can say fixed magic numbers don't always work. did_you_mean gem used to use a fixed number and I found it unhelpful specially when the misspelled word is long. So I changed it to calculate a threshold based on the length of the misspelled word. This strategy works at least better than fixed numbers. We still have to come up with a nice magic number to calculate the threshold, though. |
@@ -264,7 +264,6 @@ def self.levenshtein_distance str1, str2 | |||
|
|||
return m if (0 == n) | |||
return n if (0 == m) | |||
return n if (n - m).abs > max |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 263 should also be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Thanks.
Previously, the Levenshtein distances to most commands were wrongly calculated due to a big shortcut. This might be included in the original code for the performance sake, but I'm not sure that's something we should take over accuracy in Rails. Before 'foo' to 'assets' #=> 3 'foo' to 'controller' #=> 3 'foo' to 'generator' #=> 3 'foo' to 'helper' #=> 3 'foo' to 'integration_test' #=> 3 'foo' to 'jbuilder' #=> 3 'foo' to 'job' #=> 2 'foo' to 'mailer' #=> 3 'foo' to 'migration' #=> 3 'foo' to 'model' #=> 3 'foo' to 'resource' #=> 3 'foo' to 'resource_route' #=> 3 'foo' to 'scaffold' #=> 3 'foo' to 'scaffold_controller' #=> 3 'foo' to 'task' #=> 4 'foo' to 'active_record:migration' #=> 3 'foo' to 'active_record:model' #=> 3 'foo' to 'coffee:assets' #=> 3 'foo' to 'css:assets' #=> 3 'foo' to 'css:scaffold' #=> 3 'foo' to 'erb:controller' #=> 3 'foo' to 'erb:mailer' #=> 3 'foo' to 'erb:scaffold' #=> 3 'foo' to 'js:assets' #=> 3 'foo' to 'scss:assets' #=> 3 'foo' to 'scss:scaffold' #=> 3 'foo' to 'test_unit:controller' #=> 3 'foo' to 'test_unit:generator' #=> 3 'foo' to 'test_unit:helper' #=> 3 'foo' to 'test_unit:integration' #=> 3 'foo' to 'test_unit:job' #=> 3 'foo' to 'test_unit:mailer' #=> 3 'foo' to 'test_unit:model' #=> 3 'foo' to 'test_unit:plugin' #=> 3 'foo' to 'test_unit:scaffold' #=> 3 After 'foo' to 'assets' #=> 6 'foo' to 'controller' #=> 8 'foo' to 'generator' #=> 8 'foo' to 'helper' #=> 6 'foo' to 'integration_test' #=> 15 'foo' to 'jbuilder' #=> 8 'foo' to 'job' #=> 2 'foo' to 'mailer' #=> 6 'foo' to 'migration' #=> 8 'foo' to 'model' #=> 4 'foo' to 'resource' #=> 7 'foo' to 'resource_route' #=> 12 'foo' to 'scaffold' #=> 6 'foo' to 'scaffold_controller' #=> 16 'foo' to 'task' #=> 4 'foo' to 'active_record:migration' #=> 21 'foo' to 'active_record:model' #=> 17 'foo' to 'coffee:assets' #=> 12 'foo' to 'css:assets' #=> 10 'foo' to 'css:scaffold' #=> 10 'foo' to 'erb:controller' #=> 12 'foo' to 'erb:mailer' #=> 10 'foo' to 'erb:scaffold' #=> 10 'foo' to 'js:assets' #=> 9 'foo' to 'scss:assets' #=> 11 'foo' to 'scss:scaffold' #=> 11 'foo' to 'test_unit:controller' #=> 18 'foo' to 'test_unit:generator' #=> 18 'foo' to 'test_unit:helper' #=> 16 'foo' to 'test_unit:integration' #=> 20 'foo' to 'test_unit:job' #=> 12 'foo' to 'test_unit:mailer' #=> 16 'foo' to 'test_unit:model' #=> 14 'foo' to 'test_unit:plugin' #=> 16 'foo' to 'test_unit:scaffold' #=> 16 Besides that, the conjunction "or" of the message now appears only between the last two suggestions.
@schneems I finally squashed my commits. Could you take a look at it? Also, would you still want me to submit a PR to set the max distance to I actually think we can make an isolated module for a full-fledged suggestion feature, since it can be used in more part of Rails. The first thing coming to my mind is "template is missing" messages with suggestions, but perhaps it's a long way (Rails 5.1 or later). |
…ggestions Improve generator name suggestions a bit.
I merged your fixes in. Thanks for your help here ❤️
I would actually defer to sam and yuki on this one. I like the idea of basing the magic number partially on input values. But again it's hard to test this without a large set of data of commonly misspelled generator names. I'm open to a PR like this provided it doesn't add too much complexity.
That sounds pretty cool. I'm a fan of small progressive enhancements. However I don't want to go overboard. If we stick to real world pains that we've seen and fix work on them, we should be fine. New feature pull requests don't need to wait for a specific version. If you've run into the template problem before I think this could be a good enhancement. We want to be careful that we don't cause any performance bottlenecks in production, limiting the check to development should be good enough. |
Bug was discovered and discussed in #18395.
Bug was discovered and discussed in rails#18395.
92cf133 brought this feature, which was great, yet also sometimes frustrating because it always returned three suggestions at once, and some of them are indeed a bit too far off. Examples are:
rails g foo
Do we really need the last two? (I even doubt the first one). For that matter, as we have a big shortcut (max value) to calculate the distance (which I think is okay for the perf sake), the distance to longer words are computed rather incorrectly, but they can be included in the suggestions after sorted. I guess that’s why
model
ortask
(4 char distant),assets
(6) are behindcss:assets
(10) in the above example.I don't think it was supposed to be an impeccable suggestion feature (given the introduction of #15497 (comment)), but sometimes less is more. Me thinks that likes of suggestions are appreciated typically when we can infer a solid amount of what users meant to run, and otherwise it would just end up as haphazard items.
If users try to run something like rails g xyzzy, pretty much every command available is off so it's safe to say he/she has no idea on what generator commands are. In such a case, I'd just list up generator commands, as it used to be (or run -h), instead of serving the least 'off' names.
After trying dozens of real-world typos, I think the threshold of the distance should be just
1
. If the input contains more than 2 typos, users can immediately notice it themselves, or they don’t know what to type. It can be raised depending on the length of the word, but the longest isactive_record:migration
and even that is now split as["active", "record", "migration"]
, each of them is individually tested to match.Test cases also should be updated, but I'd like to wait until #17050 is merged.