Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve generator name suggestions a bit. #18395

Merged
merged 1 commit into from Feb 23, 2015
Merged

Improve generator name suggestions a bit. #18395

merged 1 commit into from Feb 23, 2015

Conversation

s-aida
Copy link
Contributor

@s-aida s-aida commented Jan 8, 2015

92cf133 brought this feature, which was great, yet also sometimes frustrating because it always returned three suggestions at once, and some of them are indeed a bit too far off. Examples are:

rails g foo

Could not find generator 'foo'. Maybe you meant 'job' or 'css:assets' or 'generator'

Do we really need the last two? (I even doubt the first one). For that matter, as we have a big shortcut (max value) to calculate the distance (which I think is okay for the perf sake), the distance to longer words are computed rather incorrectly, but they can be included in the suggestions after sorted. I guess that’s why model or task (4 char distant), assets (6) are behind css:assets (10) in the above example.

I don't think it was supposed to be an impeccable suggestion feature (given the introduction of #15497 (comment)), but sometimes less is more. Me thinks that likes of suggestions are appreciated typically when we can infer a solid amount of what users meant to run, and otherwise it would just end up as haphazard items.

If users try to run something like rails g xyzzy, pretty much every command available is off so it's safe to say he/she has no idea on what generator commands are. In such a case, I'd just list up generator commands, as it used to be (or run -h), instead of serving the least 'off' names.

After trying dozens of real-world typos, I think the threshold of the distance should be just 1. If the input contains more than 2 typos, users can immediately notice it themselves, or they don’t know what to type. It can be raised depending on the length of the word, but the longest is active_record:migration and even that is now split as ["active", "record", "migration"], each of them is individually tested to match.

Test cases also should be updated, but I'd like to wait until #17050 is merged.

@s-aida
Copy link
Contributor Author

s-aida commented Feb 13, 2015

I can add a missing test and add some more improvements here and there if it ever interests anyone, but this PR gets stale, so I wonder I would just reduce it all down to just use to_sentence(last_word_connector: " or ") instead of .join(" or ").

Current

Maybe you meant 'job' or 'coffee:assets' or 'generator'

My Suggestion

Maybe you meant 'job', 'coffee:assets' or 'generator'

@arthurnn
Copy link
Member

@schneems thoughts?

msg << "Maybe you meant #{ suggestions.map {|s| "'#{s}'"}.join(" or ") }\n"
msg << "Run `rails generate --help` for more options."
puts msg
unless sorted_groups.group_by(&:size).min.first > namespace.length
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a confusing line of code, we're saying skip the suggestion if the longest option is less than the provided option (i.e. if longest_option < namespace.length) ? We generally prefer if's with the exception of inline logic.

I don't think it's needed, and fails in the case someone is actually looking for the longest option, and accidentally enters an extra character.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this line exists to reject too short words. sorted_groups.group_by(&:size).min.first returns the length of the shortest commands available, which is 3 in the rails master.
Unless we have another algorithm degicated for short inputs to suggest only commands starting with those chars (e.g. ap -> API, apple or application), I don't see point of making a suggestion for just 1 or 2 letter inputs.

@schneems
Copy link
Member

I commented on some code. We're making a ton of assumptions here on how we want it to work and how the user will expect it to work. I think we can make some improvements but i'm not sold on this change wholesale.

What was your original use case for this change? You weren't typing rails g foo and getting upset at the results. What were you actually trying to accomplish?

For the line you referenced, it was purely copy/pasta. I think we should remove it, as it does produce an incorrect result.

return n if (n - m).abs > max
we might also consider blaming the original code in rubygems to see when/why it was introduced and fixing there. That's a really good catch, thanks for reporting.

@s-aida
Copy link
Contributor Author

s-aida commented Feb 18, 2015

@schneems Thanks for reviewing. I really appreciate your extra effort.

You're entirely correct that I wasn't motivated to make this PR by the real typo. Rather, was just investigating the generators, and happened to feel that the suggestions might not always include good ones. While still believing in what I want to bring, not sure this PR is so convincing considering how little feedback it has gathered. And I'm totally fine to listen to what the original contributor thinks of.

So I think I will withdraw a big change, but reduce this PR to remove only the line you agreed as well as using to_sentence(last_word_connector: " or ").

@schneems
Copy link
Member

I'm 👍 on removing that line (and also

as it's the only place that references max)

I'm 👍 on the to_sentence change.

Why don't you rebase this commit and make those two changes, and I'm good merging this in. In a separate PR I would be willing to consider some kind of a max bounds but i'm thinking higher, like 3 or 4, though honestly i'm just pulling magic numbers from the air. I can see the benefit of saying that edit distance is so far off that we can't guess, but since we're so uncertain list out all the options. For the sake of getting things in, in a timely fashion it would be best to split that out into a separate PR (if you're still interested), you can @ mention me and i'll be a bit quicker on that review.

@samphippen do you have any thoughts on the value of a max edit distance?

@fables-tales
Copy link

@schneems before I issue an opinion can you make sure I've understood the context:

  • suggestions are currently sorted by edit distance
  • the top n results are taken
  • some things that make no sense sometimes show up, even though the edit distance is large, because they're still the closest things
  • you want to eliminate them by thresholding the edit distance

Is that correct?

@schneems
Copy link
Member

@samphippen you got it. The idea is to be a bit more helpful if you run

rails g i hate javascript

It doesn't make sense to say "did you mean 'migrations'" and in that case we should just print all the results. However if we're pretty comfortable saying, the edit distance is low enough, then we'll give you some suggestions. The question is, what do you think "low enough" would be?

@fables-tales
Copy link

Most commonly, I swap the order of a single character because I type quickly but accurately. The edit distance for that error would be two. I'd suggest 4 (allowing for two characters to have swapped positions, or some more macro typo) and see how it feels. Test intentionally fat fingering some commands and see what happens?

@fables-tales
Copy link

@schneems one time, when it was acceptable to write a really bad search implementation, we used edit distance 3 on people's names and nobody seemed to complain (here, have a totally artificial data point, along with my more formal reckons).

@fables-tales
Copy link

@schneems it's also probably out of scope for this PR, but I'd be tempted to implement jaro-winkler, sort and select top n and see if it performs any better.

@fables-tales
Copy link

@schneems jaro winkler (for example) prefers prefix matches over non-prefix matches, which levenshtein doesn't.

@fables-tales
Copy link

@schneems maybe we can pair on rolling a jaro-winkler branch at railsconf and see if it's any good?

@schneems
Copy link
Member

@samphippen sounds good.

@shunsukeaida let's shoot for an edit distance max of 3 on the separate PR. Let me know when you removed those bad lines from the existing edit distance calculation, and used the better to_sentence.

@yuki24
Copy link
Contributor

yuki24 commented Feb 19, 2015

@schneems from my experience I can say fixed magic numbers don't always work. did_you_mean gem used to use a fixed number and I found it unhelpful specially when the misspelled word is long. So I changed it to calculate a threshold based on the length of the misspelled word. This strategy works at least better than fixed numbers. We still have to come up with a nice magic number to calculate the threshold, though.

@s-aida s-aida changed the title Improve Generator Name Suggestions (stricter, and more partially) Improve generator name suggestions a bit. Feb 21, 2015
@@ -264,7 +264,6 @@ def self.levenshtein_distance str1, str2

return m if (0 == n)
return n if (0 == m)
return n if (n - m).abs > max
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 263 should also be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Thanks.

Previously, the Levenshtein distances to most commands were wrongly
calculated due to a big shortcut. This might be included in the original
code for the performance sake, but I'm not sure that's something we should
take over accuracy in Rails.

Before
'foo' to 'assets'                  #=> 3
'foo' to 'controller'              #=> 3
'foo' to 'generator'               #=> 3
'foo' to 'helper'                  #=> 3
'foo' to 'integration_test'        #=> 3
'foo' to 'jbuilder'                #=> 3
'foo' to 'job'                     #=> 2
'foo' to 'mailer'                  #=> 3
'foo' to 'migration'               #=> 3
'foo' to 'model'                   #=> 3
'foo' to 'resource'                #=> 3
'foo' to 'resource_route'          #=> 3
'foo' to 'scaffold'                #=> 3
'foo' to 'scaffold_controller'     #=> 3
'foo' to 'task'                    #=> 4
'foo' to 'active_record:migration' #=> 3
'foo' to 'active_record:model'     #=> 3
'foo' to 'coffee:assets'           #=> 3
'foo' to 'css:assets'              #=> 3
'foo' to 'css:scaffold'            #=> 3
'foo' to 'erb:controller'          #=> 3
'foo' to 'erb:mailer'              #=> 3
'foo' to 'erb:scaffold'            #=> 3
'foo' to 'js:assets'               #=> 3
'foo' to 'scss:assets'             #=> 3
'foo' to 'scss:scaffold'           #=> 3
'foo' to 'test_unit:controller'    #=> 3
'foo' to 'test_unit:generator'     #=> 3
'foo' to 'test_unit:helper'        #=> 3
'foo' to 'test_unit:integration'   #=> 3
'foo' to 'test_unit:job'           #=> 3
'foo' to 'test_unit:mailer'        #=> 3
'foo' to 'test_unit:model'         #=> 3
'foo' to 'test_unit:plugin'        #=> 3
'foo' to 'test_unit:scaffold'      #=> 3

After
'foo' to 'assets'                  #=> 6
'foo' to 'controller'              #=> 8
'foo' to 'generator'               #=> 8
'foo' to 'helper'                  #=> 6
'foo' to 'integration_test'        #=> 15
'foo' to 'jbuilder'                #=> 8
'foo' to 'job'                     #=> 2
'foo' to 'mailer'                  #=> 6
'foo' to 'migration'               #=> 8
'foo' to 'model'                   #=> 4
'foo' to 'resource'                #=> 7
'foo' to 'resource_route'          #=> 12
'foo' to 'scaffold'                #=> 6
'foo' to 'scaffold_controller'     #=> 16
'foo' to 'task'                    #=> 4
'foo' to 'active_record:migration' #=> 21
'foo' to 'active_record:model'     #=> 17
'foo' to 'coffee:assets'           #=> 12
'foo' to 'css:assets'              #=> 10
'foo' to 'css:scaffold'            #=> 10
'foo' to 'erb:controller'          #=> 12
'foo' to 'erb:mailer'              #=> 10
'foo' to 'erb:scaffold'            #=> 10
'foo' to 'js:assets'               #=> 9
'foo' to 'scss:assets'             #=> 11
'foo' to 'scss:scaffold'           #=> 11
'foo' to 'test_unit:controller'    #=> 18
'foo' to 'test_unit:generator'     #=> 18
'foo' to 'test_unit:helper'        #=> 16
'foo' to 'test_unit:integration'   #=> 20
'foo' to 'test_unit:job'           #=> 12
'foo' to 'test_unit:mailer'        #=> 16
'foo' to 'test_unit:model'         #=> 14
'foo' to 'test_unit:plugin'        #=> 16
'foo' to 'test_unit:scaffold'      #=> 16

Besides that, the conjunction "or" of the message now appears only between
the last two suggestions.
@s-aida
Copy link
Contributor Author

s-aida commented Feb 22, 2015

@schneems I finally squashed my commits. Could you take a look at it? Also, would you still want me to submit a PR to set the max distance to 3? I think what @yuki24 says is worth considering, in that it's hard to make something satisfying everybody anyway.

I actually think we can make an isolated module for a full-fledged suggestion feature, since it can be used in more part of Rails. The first thing coming to my mind is "template is missing" messages with suggestions, but perhaps it's a long way (Rails 5.1 or later).

schneems added a commit that referenced this pull request Feb 23, 2015
…ggestions

Improve generator name suggestions a bit.
@schneems schneems merged commit 98d896e into rails:master Feb 23, 2015
@schneems
Copy link
Member

I merged your fixes in. Thanks for your help here ❤️

Also, would you still want me to submit a PR to set the max distance

I would actually defer to sam and yuki on this one. I like the idea of basing the magic number partially on input values. But again it's hard to test this without a large set of data of commonly misspelled generator names. I'm open to a PR like this provided it doesn't add too much complexity.

"template is missing" messages with suggestions, but perhaps it's a long way (Rails 5.1 or later)

That sounds pretty cool. I'm a fan of small progressive enhancements. However I don't want to go overboard. If we stick to real world pains that we've seen and fix work on them, we should be fine. New feature pull requests don't need to wait for a specific version. If you've run into the template problem before I think this could be a good enhancement. We want to be careful that we don't cause any performance bottlenecks in production, limiting the check to development should be good enough.

schneems added a commit that referenced this pull request Feb 25, 2015
Bug was discovered and discussed in #18395.
yuki24 pushed a commit to yuki24/rails that referenced this pull request Mar 2, 2015
Bug was discovered and discussed in rails#18395.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants