base.Ascii: Add methods indexOfIgnoreCase, containsIgnoreCase, startsWithIgnoreCase and endsWithIgnoreCase #3023

Open

martinfrancois wants to merge 1 commit into google:master from martinfrancois:master

martinfrancois commented Jan 3, 2018

Implements the changes proposed in issue #3011.
I wasn't sure which version this will be pulled into, so I left // TODO comments at the @since tags. Feel free to tell me which number to change them to 😃

Collaborator

googlebot commented Jan 3, 2018

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.
If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot. The email used to register you as an authorized contributor must be the email used for the Git commit.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

googlebot added the cla: no label

Author

martinfrancois commented Jan 3, 2018

I signed it!

Collaborator

googlebot commented Jan 3, 2018

CLAs look good, thanks!

googlebot added cla: yes and removed cla: no labels

kluever requested a review from kevinb9n

January 4, 2018 00:07

liach reviewed

guava/src/com/google/common/base/Ascii.java Outdated

    
                 * @since 16.0 // TODO: which number?

                 */

                public static int indexOfIgnoreCase(CharSequence sequence, CharSequence subSequence) {

                  return toLowerCase(sequence).indexOf(toLowerCase(subSequence));

liach Jan 4, 2018

I fear this implementation may be too slow. If the length of sequence is n and the length of subSequence is m, the time complexity is always O(n+m) or higher, because every char has to be converted into a new string first. If we instead lower-case the letters one by one during the comparison, the execution time may be lower.
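To illustrate the point above, here is a minimal sketch of a search that lower-cases one char at a time instead of building two lower-cased strings first. The class name and the `lower` helper are hypothetical and not taken from the PR; the sketch allocates nothing and can stop at the first match, although its worst case is still O(n·m) comparisons.

```java
final class IgnoreCaseSearchSketch {
  private IgnoreCaseSearchSketch() {}

  /** Lower-cases a single ASCII letter; every other char is returned unchanged. */
  static char lower(char c) {
    return (c >= 'A' && c <= 'Z') ? (char) (c ^ 0x20) : c;
  }

  /** Returns the first index of sub in seq, ignoring ASCII case, or -1 if there is none. */
  static int indexOfIgnoreCase(CharSequence seq, CharSequence sub) {
    int last = seq.length() - sub.length();
    for (int start = 0; start <= last; start++) {
      int j = 0;
      // Compare char by char; no intermediate String is created.
      while (j < sub.length() && lower(seq.charAt(start + j)) == lower(sub.charAt(j))) {
        j++;
      }
      if (j == sub.length()) {
        return start;
      }
    }
    return -1;
  }
}
```

Because the parameters are CharSequence, the same method accepts a String or a StringBuilder without copying.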

Author

martinfrancois Jan 4, 2018

Good point. I didn't initially think of this, but I'll definitely have a look into it.

Author

martinfrancois Jan 4, 2018

Just fixed that by calling toLowerCase() inside the for loop itself.

guava/src/com/google/common/base/Ascii.java Outdated

    
                  }

                  // if the prefix is longer than the sequence itself or from the offset, it is impossible

                  // for the sequence to contain the prefix

                  if (prefix.length() > seq.length() || prefix.length() > seq.length() - offset) {

liach Jan 4, 2018

You can just write if (prefix.length() > seq.length() - offset), unless someone maliciously passes a negative number for the offset parameter.

Author

martinfrancois Jan 4, 2018

I had it like that initially, but in the case of seq as an empty string and prefix as "x" it will result in a negative prefix, which led me to this implementation. Should I check this in another way?

liach Jan 4, 2018

then this is totally fine.
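To make the exchange concrete, the following sketch compares the two guards on plain lengths. The class and method names are hypothetical. Assuming offset is never negative, seq.length() - offset can never exceed seq.length(), so the two guards should agree; they only differ once a negative offset is passed in.

```java
final class PrefixBoundsSketch {
  private PrefixBoundsSketch() {}

  /** The two-part guard from the quoted hunk: true when the prefix cannot fit. */
  static boolean tooLongTwoChecks(int seqLength, int prefixLength, int offset) {
    return prefixLength > seqLength || prefixLength > seqLength - offset;
  }

  /** The single comparison suggested in the review. */
  static boolean tooLongOneCheck(int seqLength, int prefixLength, int offset) {
    return prefixLength > seqLength - offset;
  }
}
```

For an empty sequence, a one-char prefix and offset 0, both guards report that the prefix does not fit. With offset -1, a sequence of length 3 and a prefix of length 4, only the two-part guard rejects the input.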

guava/src/com/google/common/base/Ascii.java Outdated

    
                 * between {@code 'a'} and {@code 'z'} or {@code 'A'} and {@code 'Z'} inclusive, or

                 * {@code -1}, if no occurrence is found.

                 *

                 * @since 16.0 // TODO: which number?

liach Jan 4, 2018

You can change these to @since NEXT

Author

martinfrancois Jan 4, 2018

Thanks for letting me know! I will update them.

Maaartinus reviewed

guava/src/com/google/common/base/Ascii.java Outdated

    
                  if (prefix.length() > seq.length() || prefix.length() > seq.length() - offset) {

                    return false;

                  }

                  return indexOfIgnoreCase(seq, prefix) == offset;

Maaartinus Jan 4, 2018

What do you expect startsWithIgnoreCase("aaa", "a", 1) to return? I'd vote for true, but this is impossible to implement without an indexOfIgnoreCase(seq, prefix, offset).

liach Jan 4, 2018

This one is:

1. Logically flawed
2. Too slow (you should just use a for loop to check; if the length of prefix is m, the time complexity is at most O(m))

Author

martinfrancois Jan 4, 2018

Thanks for noticing! I fixed it in the latest commit.
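The direct check the reviewers describe could look roughly like the sketch below. It is not the code from the PR: the class name and the `lower` helper are hypothetical, and returning false for a negative offset is an assumption that mirrors what String.startsWith(String, int) does (throwing an exception instead would be an equally reasonable choice).

```java
final class StartsWithSketch {
  private StartsWithSketch() {}

  /** Lower-cases a single ASCII letter; every other char is returned unchanged. */
  static char lower(char c) {
    return (c >= 'A' && c <= 'Z') ? (char) (c ^ 0x20) : c;
  }

  /** Checks whether prefix occurs in seq exactly at offset, ignoring ASCII case. */
  static boolean startsWithIgnoreCase(CharSequence seq, CharSequence prefix, int offset) {
    // Assumption: out-of-range offsets simply yield false.
    if (offset < 0 || prefix.length() > seq.length() - offset) {
      return false;
    }
    // At most prefix.length() comparisons, independent of seq.length().
    for (int i = 0; i < prefix.length(); i++) {
      if (lower(seq.charAt(offset + i)) != lower(prefix.charAt(i))) {
        return false;
      }
    }
    return true;
  }
}
```

With this shape, startsWithIgnoreCase("aaa", "a", 1) is true, and a mismatch such as startsWithIgnoreCase("aaaaaaaaaaaaaaaaa", "b", 0) is rejected after a single comparison.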

Maaartinus reviewed

guava/src/com/google/common/base/Ascii.java Outdated

    
                 *

                 * @since 16.0 // TODO: which number?

                 */

                public static boolean startsWithIgnoreCase(CharSequence seq, CharSequence prefix, int offset) {

Maaartinus Jan 4, 2018

A negative offset should probably be forbidden.

Maaartinus reviewed

guava/src/com/google/common/base/Ascii.java

    
                 * @since 16.0 // TODO: which number?

                 */

                public static boolean startsWithIgnoreCase(CharSequence seq, CharSequence prefix) {

                  return startsWithIgnoreCase(seq, prefix, 0);

Maaartinus Jan 4, 2018

This is much less efficient than a direct check, e.g., for startsWithIgnoreCase("aaaaaaaaaaaaaaaaa", "b").

liach Jan 4, 2018

If the last one is fixed, then it is fine here.

liach suggested changes

liach reviewed

guava/src/com/google/common/base/Ascii.java Outdated

    
                 * @since NEXT

                 */

                public static int indexOfIgnoreCase(String string, String subString, int fromIndex) {

                  return indexOfIgnoreCase(string.toCharArray(), 0, string.length(),

liach Jan 4, 2018

toCharArray() involves a linear-time copy of the array. It is slow, too.

Author

martinfrancois Jan 4, 2018

Thanks, I totally forgot about that. I fixed it now.

liach reviewed

guava/src/com/google/common/base/Ascii.java Outdated

    
                  if (subSequence.length() > length) {

                    return false;

                  }

                  return indexOfIgnoreCase(sequence.toString(), subSequence.toString()) > -1;

liach Jan 4, 2018

This still should not call toString().

liach reviewed

guava/src/com/google/common/base/Ascii.java Outdated

    
                  if (prefix.length() > seq.length() || prefix.length() > seq.length() - offset) {

                    return false;

                  }

                  return indexOfIgnoreCase(seq.toString(), prefix.toString()) == offset;

liach Jan 4, 2018

This one is:

1. Logically flawed
2. Too slow (a single for loop can fix it)

Author

martinfrancois Jan 4, 2018

Sounded like a good idea at first but now I agree - you guys are 100% correct. Will fix that.

liach reviewed

guava/src/com/google/common/base/Ascii.java Outdated

    
                  for (int i = sourceOffset + fromIndex; i <= max; i++) {

                    /* Look for first character. */

                    if (toLowerCase(source[i]) != first) {

                      while (++i <= max && toLowerCase(source[i]) != first) {

liach Jan 4, 2018

Can you change it to a do-while with the increment of i inside the loop? That way the code looks cleaner.
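The suggested loop shape can be sketched in isolation as follows. The helper name `nextCandidate` and the `lower` helper are hypothetical; the extra `i <= max` check in the `if` is added here only so the helper is safe to call on its own.

```java
final class FirstCharScanSketch {
  private FirstCharScanSketch() {}

  /** Lower-cases a single ASCII letter; every other char is returned unchanged. */
  static char lower(char c) {
    return (c >= 'A' && c <= 'Z') ? (char) (c ^ 0x20) : c;
  }

  /**
   * Returns the smallest index in [from, max] whose char equals first when
   * lower-cased, or a value greater than max if there is none.
   * The argument first is expected to be lower-case already.
   */
  static int nextCandidate(CharSequence source, char first, int from, int max) {
    int i = from;
    if (i <= max && lower(source.charAt(i)) != first) {
      // do-while form: the increment lives in the loop body, not in the condition.
      do {
        i++;
      } while (i <= max && lower(source.charAt(i)) != first);
    }
    return i;
  }
}
```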

liach reviewed

guava/src/com/google/common/base/Ascii.java Outdated

    
                    return fromIndex;

                  }

                  char first = toLowerCase(target[targetOffset]);

liach Jan 4, 2018

You should just use getAlphaIndex to make things efficient

liach suggested changes

guava/src/com/google/common/base/Ascii.java Outdated

    
                 * @param targetCount  count of the target string.

                 * @param fromIndex    the index to begin searching from.

                 */

                private static int indexOfIgnoreCase(char[] source, int sourceOffset, int sourceCount,

liach Jan 4, 2018

You can copy the whole method, changing two char array arguments to char sequence arguments so that you can deal with strings and string builders directly without toCharArray calls that duplicate all the char contents.

liach reviewed

guava/src/com/google/common/base/Ascii.java Outdated

    
                 */

                private static int indexOfIgnoreCase(char[] source, int sourceOffset, int sourceCount,

                                                     char[] target, int targetOffset, int targetCount,

                private static int indexOfIgnoreCase(String source, int sourceOffset, int sourceCount,

liach Jan 4, 2018

CharSequence instead

liach reviewed

guava/src/com/google/common/base/Ascii.java Outdated

    
                 *

                 * @since NEXT

                 */

                public static int indexOfIgnoreCase(String string, String subString, int fromIndex) {

liach Jan 4, 2018

String -> CharSequence

liach reviewed

guava/src/com/google/common/base/Ascii.java Outdated

    
                    if (toLowerCase(source.charAt(i)) != first) {

                      do {

                        ++i;

                      } while(i <= max && toLowerCase(source.charAt(i)) != first);

liach Jan 4, 2018

Can you replace all toLowerCase calls with getAlphaIndex calls for efficiency?

Author

martinfrancois Jan 4, 2018

will do!

Author

martinfrancois Jan 4, 2018

It was quite some work but now I changed the implementation to use getAlphaIndex instead of toLowerCase. Please have a look again. If everything is fine, I'll squash the commits.
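The body of getAlphaIndex is not quoted anywhere in this thread, so the sketch below is only an assumption about how such an index can be computed and used. The helper is named `alphaIndex` here to make clear it is hypothetical: it maps both cases of a letter to 0..25 and everything else to 26 or more, so two chars can be compared without lower-casing either one.

```java
final class AlphaIndexSketch {
  private AlphaIndexSketch() {}

  /**
   * Maps 'a'/'A' to 0 through 'z'/'Z' to 25. Any non-letter maps to 26 or more,
   * because the cast to char turns negative differences into large values.
   */
  static int alphaIndex(char c) {
    return (char) ((c | 0x20) - 'a');
  }

  /** Compares two chars, treating ASCII letters case-insensitively. */
  static boolean equalsIgnoreCase(char a, char b) {
    if (a == b) {
      return true;
    }
    int index = alphaIndex(a);
    // Only letters may match across cases; '[' and '{' differ by 0x20 but are not letters.
    return index < 26 && index == alphaIndex(b);
  }
}
```

The `index < 26` guard is what keeps pairs such as '[' and '{', which differ only in the 0x20 bit, from being treated as equal.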

liach approved these changes

liach commented Jan 5, 2018

Thanks for this great pull request!


Add methods indexOfIgnoreCase, containsIgnoreCase, startsWithIgnoreCase and endsWithIgnoreCase to class base.Ascii

c9dfbab

martinfrancois force-pushed the master branch from 9b64912 to c9dfbab

January 5, 2018 00:57

Author

martinfrancois commented Jan 5, 2018

You're welcome 😃 Thanks a lot to all of you, too, for the thorough code reviews; they were very constructive and helpful.
I have now squashed the commits. Maybe one of you Googlers can run the benchmark (I hope I did it right)? I don't think there is any way for me to run it myself. It would be interesting to see the performance differences. Besides that, from my side it's good to merge.

Author

martinfrancois commented Jan 15, 2018

Hey there, just wanted to ask if there's any news concerning this pull request?

Contributor

jbduncan commented Jan 15, 2018

@martinfrancois I'm not on the Guava team myself, so I can't truly speak on their behalf, but as someone who's contributed to the project for a couple of years now, I think it's worth letting you know that they can be very slow addressing new issues and PRs, which I think happens because they're busy as a whole dealing with Google-internal priorities.

So if they take forever to respond (which seems likely at this stage, sadly), it absolutely won't be because you did something wrong; they're just apparently very, very busy with their day-to-day jobs, and someone from the team will eventually respond to this PR.

If you don't get a response as quickly as you'd like, writing a reminder message like the one you wrote here every now and then should encourage a more timely response, even if they're a bit too busy to review the PR properly.

I hope this helps! :)

Member

cpovirk commented Jan 16, 2018

There are various reasons that we haven't done a good job about replying to issues. Busyness is part of it, but we should do better. It looks like our current triage process misses pull requests. I'll try to get that fixed.

Author

martinfrancois commented Jan 18, 2018

@jbduncan thanks for the very kind words and the tips 😃 I appreciate that a lot, it's good to know that it's not an unusual case.

@cpovirk also thanks for taking it seriously and trying to get pull requests integrated into your triage process 👍

ronshapiro assigned kevinb9n

ronshapiro added status=triaged package=base labels

martinfrancois mentioned this pull request

Add containsIgnoreCase to StringUtils #3011

Open

Author

martinfrancois commented Feb 10, 2018

Thanks for accepting my pull request! I wanted to ask when its changes are going to get synced out. I haven't seen them on master yet.

Author

martinfrancois commented Mar 12, 2018

Any news on this PR?

Contributor

kevinb9n commented Mar 14, 2018

I'm sorry that we are so slow on this. Fundamentally we appreciate the request, but we have a bit of work to do on our end that we haven't been able to prioritize yet.

FYI, I do see a pretty significant amount of Google code writing stuff like if (s.toLowerCase().startsWith(...)) so I do think the case for adding these methods to Guava may be a pretty good one.
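For context, the pattern mentioned above and one allocation-free alternative from the JDK can be sketched as below. The class and method names are hypothetical. Note two assumptions: the sketch passes Locale.ROOT explicitly, whereas the no-argument toLowerCase() in the quoted pattern depends on the default locale, and String.regionMatches with ignoreCase set compares using Unicode case mapping, so it is not identical to the ASCII-only methods proposed in this PR.

```java
final class LowerCaseStartsWithSketch {
  private LowerCaseStartsWithSketch() {}

  /** The common pattern: allocates a lower-cased copy of s just to test a prefix. */
  static boolean viaToLowerCase(String s, String lowerCasePrefix) {
    return s.toLowerCase(java.util.Locale.ROOT).startsWith(lowerCasePrefix);
  }

  /** JDK alternative: compares in place without creating a new String. */
  static boolean viaRegionMatches(String s, String prefix) {
    return s.regionMatches(true, 0, prefix, 0, prefix.length());
  }
}
```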

Author

martinfrancois commented Jun 28, 2018

@kevinb9n thanks for the update on this! It has now been almost half a year. Are there any internal processes it still needs to go through, or did the team just not get around to merging it into the codebase yet?
Thanks!

Member

cgdecker commented Jun 29, 2018

I want to quickly note here that when an issue or PR is marked "triaged" and assigned to someone, that doesn't necessarily mean it's accepted. Triaged just means that someone has looked at it and thinks it's well-formed and worth thinking about more, and assigned it to someone who they think should best be able to consider it. That said, Kevin has noted that this seems like something that there's a good chance we'll want to do.

Another note: API additions do generally need to go through an API review process we have internally, which tends to involve writing up a doc exploring pros and cons of an API, basic statistics for how often existing code in our codebase would want to use the API(s), options for how to address the problem, etc., followed by scheduling it to be discussed with a group of reviewers. That's obviously a bit of work, so even when you've written the code, it can be hard to fit it in with everything else we're doing. That's not to say we can't do better--we can and should do better, and are working on improving our processes around bugs/PRs--but just to give some idea of why things might take so long sometimes.

Author

martinfrancois commented Jun 29, 2018

Thanks a lot @cgdecker! I tried to find a description of your process but was unable to. Your description made it very clear to me. Now I understand what goes into it and it explains the long process, but I do appreciate your attention to detail and sense of quality, it gives me an even greater confidence in the high standards of the guava code base.

Author

martinfrancois commented Dec 18, 2018

Hi everyone,
I don't want to sound impatient, but it's been almost a year now. Is there any news on this one?
Thanks,
François

maehly commented Mar 6, 2019

François Martin has personally explained this PR and its history to me in detail. He was also kind enough to copy the implementation over into a project we are working on together. I vote for merging this PR.

Author

martinfrancois commented May 21, 2019

@kevinb9n or @cgdecker could I please get an update on the current status of your internal review process concerning this PR?

martinfrancois mentioned this pull request

remove Ascii class dlsc-software-consulting-gmbh/PreferencesFX#89

Merged

cgdecker added the P3 label

kevinb9n removed their assignment

Labels

cla: yes P3 package=base status=triaged