-
Notifications
You must be signed in to change notification settings - Fork 11.1k
base.Ascii: Add methods indexOfIgnoreCase, containsIgnoreCase, startsWithIgnoreCase and endsWithIgnoreCase #3023
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
|
|
I signed it! |
|
CLAs look good, thanks! |
| * @since 16.0 // TODO: which number? | ||
| */ | ||
| public static int indexOfIgnoreCase(CharSequence sequence, CharSequence subSequence) { | ||
| return toLowerCase(sequence).indexOf(toLowerCase(subSequence)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear this implementation may be too slow: If the length of sequence is n, the length of subSequence is m, then the time complexity is always O(n+m) or higher, because all the chars need to be converted into a string first; if we can change a letter to lower case one by one, then the execution time may be faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I didn't initially think of this, but I'll definitely have a look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just fixed that one by implementing the toLowerCase() call in the for-loop itself
| } | ||
| // if the prefix is longer than the sequence itself or from the offset, it is impossible | ||
| // for the sequence to contain the prefix | ||
| if (prefix.length() > seq.length() || prefix.length() > seq.length() - offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just put if (prefix.length() > seq.length() - offset) unless someone maliciously passed a negative number for offset parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it like that initially but in the case of seq as empty string and prefix as "x" it will result in a negative prefix, which lead me to this implementation. Should I check this in another way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then this is totally fine.
| * between {@code 'a'} and {@code 'z'} or {@code 'A'} and {@code 'Z'} inclusive, or | ||
| * {@code -1}, if no occurence is found. | ||
| * | ||
| * @since 16.0 // TODO: which number? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can change these to @since NEXT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for letting me know! I will update them.
| if (prefix.length() > seq.length() || prefix.length() > seq.length() - offset) { | ||
| return false; | ||
| } | ||
| return indexOfIgnoreCase(seq, prefix) == offset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you expect startsWithIgnoreCase("aaa", "a", 1) to return? I'd vote for true, but this is impossible to implement without a indexOfIgnoreCase(seq, prefix, offset).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is
- logically flawed
- Too slow (You should just use a for loop to check. If the length of prefix is m, then the time complexity is O(m) maximum)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noticing! I fixed it in the latest commit.
| * | ||
| * @since 16.0 // TODO: which number? | ||
| */ | ||
| public static boolean startsWithIgnoreCase(CharSequence seq, CharSequence prefix, int offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A negative offset should probably be forbidden.
| * @since 16.0 // TODO: which number? | ||
| */ | ||
| public static boolean startsWithIgnoreCase(CharSequence seq, CharSequence prefix) { | ||
| return startsWithIgnoreCase(seq, prefix, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is much less efficient than a direct check, e.g., for startsWithIgnoreCase("aaaaaaaaaaaaaaaaa", "b").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the last one is fixed, then it is fine here.
| if (prefix.length() > seq.length() || prefix.length() > seq.length() - offset) { | ||
| return false; | ||
| } | ||
| return indexOfIgnoreCase(seq, prefix) == offset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is
- logically flawed
- Too slow (You should just use a for loop to check. If the length of prefix is m, then the time complexity is O(m) maximum)
| * @since 16.0 // TODO: which number? | ||
| */ | ||
| public static boolean startsWithIgnoreCase(CharSequence seq, CharSequence prefix) { | ||
| return startsWithIgnoreCase(seq, prefix, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the last one is fixed, then it is fine here.
| * @since NEXT | ||
| */ | ||
| public static int indexOfIgnoreCase(String string, String subString, int fromIndex) { | ||
| return indexOfIgnoreCase(string.toCharArray(), 0, string.length(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toCharArray() involves a linear time copy of array. It is slow, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I totally forgot about that. I fixed it now.
| if (subSequence.length() > length) { | ||
| return false; | ||
| } | ||
| return indexOfIgnoreCase(sequence.toString(), subSequence.toString()) > -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still should not toString
| if (prefix.length() > seq.length() || prefix.length() > seq.length() - offset) { | ||
| return false; | ||
| } | ||
| return indexOfIgnoreCase(seq.toString(), prefix.toString()) == offset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is
- Logically flawed
- Too slow (A single for loop can fix it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounded like a good idea at first but now I agree - you guys are 100% correct. Will fix that.
| for (int i = sourceOffset + fromIndex; i <= max; i++) { | ||
| /* Look for first character. */ | ||
| if (toLowerCase(source[i]) != first) { | ||
| while (++i <= max && toLowerCase(source[i]) != first) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change it to a do-while with increment of i in the loop? that way the code look cleaner.
| return fromIndex; | ||
| } | ||
|
|
||
| char first = toLowerCase(target[targetOffset]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should just use getAlphaIndex to make things efficient
| * @param targetCount count of the target string. | ||
| * @param fromIndex the index to begin searching from. | ||
| */ | ||
| private static int indexOfIgnoreCase(char[] source, int sourceOffset, int sourceCount, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can copy the whole method, changing two char array arguments to char sequence arguments so that you can deal with strings and string builders directly without toCharArray calls that duplicate all the char contents.
| */ | ||
| private static int indexOfIgnoreCase(char[] source, int sourceOffset, int sourceCount, | ||
| char[] target, int targetOffset, int targetCount, | ||
| private static int indexOfIgnoreCase(String source, int sourceOffset, int sourceCount, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Charsequense instead
| * | ||
| * @since NEXT | ||
| */ | ||
| public static int indexOfIgnoreCase(String string, String subString, int fromIndex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String ->CharSequence
| if (toLowerCase(source.charAt(i)) != first) { | ||
| do { | ||
| ++i; | ||
| } while(i <= max && toLowerCase(source.charAt(i)) != first); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you replace all toLowerCase calls to getAlphaIndex calls for efficiency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was quite some work but now I changed the implementation to use getAlphaIndex instead of toLowerCase. Please have a look again. If everything is fine, I'll squash the commits.
|
Thanks for this great pull request! |
…se and endsWithIgnoreCase to class base.Ascii
|
You're welcome 😃 thanks a lot, too, for the very thorough and good code reviews of you all, they were very constructive and helpful. |
|
Hey there, just wanted to ask if there's any news concerning this pull request? |
|
@martinfrancois I'm not on the Guava team myself, so I can't truly speak on their behalf, but as someone who's contributed to the project for a couple of years now, I think it's worth letting you know that they can be very slow addressing new issues and PRs, which I think happens because they're busy as a whole dealing with Google-internal priorities. So if they take forever to respond (which seems likely at this stage, sadly), it absolutely won't be because you did something wrong; they're just apparently very, very busy with their day-to-day jobs, and someone from the team will eventually respond to this PR. If you don't get a response as quickly as you'd like, writing a reminder message like the one you wrote here every now and then should encourage a more timely response, even if they're a bit too busy to review the PR properly. I hope this helps! :) |
|
There are various reasons that we haven't done a good job about replying to issues. Busyness is part of it, but we should do better. It looks like our current triage process misses pull requests. I'll try to get that fixed. |
|
Thanks for accepting my pull request! I wanted to ask when the changes of it are going to get synced out? I haven't seen it on the master yet. |
|
Any news on this PR? |
|
I'm sorry that we are so slow on this. Fundamentally we appreciate the request, but we have a bit of work to do on our end that we haven't been able to prioritize yet. FYI, I do see a pretty significant amount of Google code writing stuff like |
|
@kevinb9n thanks for the update on this! It has now been almost half a year, are there any internal processes it still needs to go through or did the team just not get around to merging it into the codebase yet? |
|
I want to quickly note here that when an issue or PR is marked "triaged" and assigned to someone, that doesn't necessarily mean it's accepted. Triaged just means that someone has looked at it and thinks it's well-formed and worth thinking about more, and assigned it to someone who they think should best be able to consider it. That said, Kevin has noted that this seems like something that there's a good chance we'll want to do. Another note: API additions do generally need to go through an API review process we have internally, which tends to involve writing up a doc exploring pros and cons of an API, basic statistics for how often existing code in our codebase would want to use the API(s), options for how to address the problem, etc., followed by scheduling it to be discussed with a group of reviewers. That's obviously a bit of work, so even when you've written the code, it can be hard to fit it in with everything else we're doing. That's not to say we can't do better--we can and should do better, and are working on improving our processes around bugs/PRs--but just to give some idea of why things might take so long sometimes. |
|
Thanks a lot @cgdecker! I tried to find a description of your process but was unable to. Your description made it very clear to me. Now I understand what goes into it and it explains the long process, but I do appreciate your attention to detail and sense of quality, it gives me an even greater confidence in the high standards of the guava code base. |
|
Hi everyone, |
|
François Martin has personally and elaborately explained to me this PR and it's history. Actually he was so kind to copy over the implementation into a project we are working on together. I vote for merging this PR. |
Implements the changes proposed in issue #3011
I was not sure in which version this is going to be pulled in - so I left
// TODOcomments at the@sincetags. Feel free to tell me which number to change it to 😃