Skip to content

Conversation

@martinfrancois
Copy link

Implements the changes proposed in issue #3011
I was not sure in which version this is going to be pulled in - so I left // TODO comments at the @since tags. Feel free to tell me which number to change it to 😃

@googlebot
Copy link
Collaborator

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.
  • If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot. The email used to register you as an authorized contributor must be the email used for the Git commit.
  • In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

@martinfrancois
Copy link
Author

I signed it!

@googlebot
Copy link
Collaborator

CLAs look good, thanks!

@googlebot googlebot added cla: yes and removed cla: no labels Jan 3, 2018
@kluever kluever requested a review from kevinb9n January 4, 2018 00:07
* @since 16.0 // TODO: which number?
*/
public static int indexOfIgnoreCase(CharSequence sequence, CharSequence subSequence) {
return toLowerCase(sequence).indexOf(toLowerCase(subSequence));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear this implementation may be too slow: If the length of sequence is n, the length of subSequence is m, then the time complexity is always O(n+m) or higher, because all the chars need to be converted into a string first; if we can change a letter to lower case one by one, then the execution time may be faster.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I didn't initially think of this, but I'll definitely have a look into it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just fixed that one by implementing the toLowerCase() call in the for-loop itself

}
// if the prefix is longer than the sequence itself or from the offset, it is impossible
// for the sequence to contain the prefix
if (prefix.length() > seq.length() || prefix.length() > seq.length() - offset) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just put if (prefix.length() > seq.length() - offset) unless someone maliciously passed a negative number for offset parameter.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it like that initially but in the case of seq as empty string and prefix as "x" it will result in a negative prefix, which lead me to this implementation. Should I check this in another way?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then this is totally fine.

* between {@code 'a'} and {@code 'z'} or {@code 'A'} and {@code 'Z'} inclusive, or
* {@code -1}, if no occurence is found.
*
* @since 16.0 // TODO: which number?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can change these to @since NEXT

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for letting me know! I will update them.

if (prefix.length() > seq.length() || prefix.length() > seq.length() - offset) {
return false;
}
return indexOfIgnoreCase(seq, prefix) == offset;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you expect startsWithIgnoreCase("aaa", "a", 1) to return? I'd vote for true, but this is impossible to implement without a indexOfIgnoreCase(seq, prefix, offset).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is

  1. logically flawed
  2. Too slow (You should just use a for loop to check. If the length of prefix is m, then the time complexity is O(m) maximum)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing! I fixed it in the latest commit.

*
* @since 16.0 // TODO: which number?
*/
public static boolean startsWithIgnoreCase(CharSequence seq, CharSequence prefix, int offset) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A negative offset should probably be forbidden.

* @since 16.0 // TODO: which number?
*/
public static boolean startsWithIgnoreCase(CharSequence seq, CharSequence prefix) {
return startsWithIgnoreCase(seq, prefix, 0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much less efficient than a direct check, e.g., for startsWithIgnoreCase("aaaaaaaaaaaaaaaaa", "b").

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the last one is fixed, then it is fine here.

if (prefix.length() > seq.length() || prefix.length() > seq.length() - offset) {
return false;
}
return indexOfIgnoreCase(seq, prefix) == offset;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is

  1. logically flawed
  2. Too slow (You should just use a for loop to check. If the length of prefix is m, then the time complexity is O(m) maximum)

* @since 16.0 // TODO: which number?
*/
public static boolean startsWithIgnoreCase(CharSequence seq, CharSequence prefix) {
return startsWithIgnoreCase(seq, prefix, 0);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the last one is fixed, then it is fine here.

* @since NEXT
*/
public static int indexOfIgnoreCase(String string, String subString, int fromIndex) {
return indexOfIgnoreCase(string.toCharArray(), 0, string.length(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toCharArray() involves a linear time copy of array. It is slow, too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I totally forgot about that. I fixed it now.

if (subSequence.length() > length) {
return false;
}
return indexOfIgnoreCase(sequence.toString(), subSequence.toString()) > -1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still should not toString

if (prefix.length() > seq.length() || prefix.length() > seq.length() - offset) {
return false;
}
return indexOfIgnoreCase(seq.toString(), prefix.toString()) == offset;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is

  1. Logically flawed
  2. Too slow (A single for loop can fix it)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounded like a good idea at first but now I agree - you guys are 100% correct. Will fix that.

for (int i = sourceOffset + fromIndex; i <= max; i++) {
/* Look for first character. */
if (toLowerCase(source[i]) != first) {
while (++i <= max && toLowerCase(source[i]) != first) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change it to a do-while with increment of i in the loop? that way the code look cleaner.

return fromIndex;
}

char first = toLowerCase(target[targetOffset]);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should just use getAlphaIndex to make things efficient

* @param targetCount count of the target string.
* @param fromIndex the index to begin searching from.
*/
private static int indexOfIgnoreCase(char[] source, int sourceOffset, int sourceCount,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can copy the whole method, changing two char array arguments to char sequence arguments so that you can deal with strings and string builders directly without toCharArray calls that duplicate all the char contents.

*/
private static int indexOfIgnoreCase(char[] source, int sourceOffset, int sourceCount,
char[] target, int targetOffset, int targetCount,
private static int indexOfIgnoreCase(String source, int sourceOffset, int sourceCount,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Charsequense instead

*
* @since NEXT
*/
public static int indexOfIgnoreCase(String string, String subString, int fromIndex) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String ->CharSequence

if (toLowerCase(source.charAt(i)) != first) {
do {
++i;
} while(i <= max && toLowerCase(source.charAt(i)) != first);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you replace all toLowerCase calls to getAlphaIndex calls for efficiency?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was quite some work but now I changed the implementation to use getAlphaIndex instead of toLowerCase. Please have a look again. If everything is fine, I'll squash the commits.

@liach
Copy link

liach commented Jan 5, 2018

Thanks for this great pull request!

…se and endsWithIgnoreCase to class base.Ascii
@martinfrancois
Copy link
Author

martinfrancois commented Jan 5, 2018

You're welcome 😃 thanks a lot, too, for the very thorough and good code reviews of you all, they were very constructive and helpful.
I now squashed the commits. Maybe one of you Googlers can run the benchmark (I hope I did it right)? I don't think there is any possibility to run them myself. Would be interesting to see the performance differences. Besides that, from my side it's good to merge.

@martinfrancois
Copy link
Author

Hey there, just wanted to ask if there's any news concerning this pull request?

@jbduncan
Copy link
Contributor

jbduncan commented Jan 15, 2018

@martinfrancois I'm not on the Guava team myself, so I can't truly speak on their behalf, but as someone who's contributed to the project for a couple of years now, I think it's worth letting you know that they can be very slow addressing new issues and PRs, which I think happens because they're busy as a whole dealing with Google-internal priorities.

So if they take forever to respond (which seems likely at this stage, sadly), it absolutely won't be because you did something wrong; they're just apparently very, very busy with their day-to-day jobs, and someone from the team will eventually respond to this PR.

If you don't get a response as quickly as you'd like, writing a reminder message like the one you wrote here every now and then should encourage a more timely response, even if they're a bit too busy to review the PR properly.

I hope this helps! :)

@cpovirk
Copy link
Member

cpovirk commented Jan 16, 2018

There are various reasons that we haven't done a good job about replying to issues. Busyness is part of it, but we should do better. It looks like our current triage process misses pull requests. I'll try to get that fixed.

@martinfrancois
Copy link
Author

@jbduncan thanks for the very kind words and the tips 😃 I appreciate that a lot, it's good to know that it's not an ununsual case.

@cpovirk also thanks for taking it serious and trying to get pull requests integrated into your triage process 👍

@martinfrancois
Copy link
Author

Thanks for accepting my pull request! I wanted to ask when the changes of it are going to get synced out? I haven't seen it on the master yet.

@martinfrancois
Copy link
Author

Any news on this PR?

@kevinb9n
Copy link
Contributor

I'm sorry that we are so slow on this. Fundamentally we appreciate the request, but we have a bit of work to do on our end that we haven't been able to prioritize yet.

FYI, I do see a pretty significant amount of Google code writing stuff like if (s.toLowerCase().startsWith(...)) so I do think the case for adding these methods to Guava may be a pretty good one.

@martinfrancois
Copy link
Author

@kevinb9n thanks for the update on this! It has now been almost half a year, are there any internal processes it still needs to go through or did the team just not get around to merging it into the codebase yet?
Thanks!

@cgdecker
Copy link
Member

I want to quickly note here that when an issue or PR is marked "triaged" and assigned to someone, that doesn't necessarily mean it's accepted. Triaged just means that someone has looked at it and thinks it's well-formed and worth thinking about more, and assigned it to someone who they think should best be able to consider it. That said, Kevin has noted that this seems like something that there's a good chance we'll want to do.

Another note: API additions do generally need to go through an API review process we have internally, which tends to involve writing up a doc exploring pros and cons of an API, basic statistics for how often existing code in our codebase would want to use the API(s), options for how to address the problem, etc., followed by scheduling it to be discussed with a group of reviewers. That's obviously a bit of work, so even when you've written the code, it can be hard to fit it in with everything else we're doing. That's not to say we can't do better--we can and should do better, and are working on improving our processes around bugs/PRs--but just to give some idea of why things might take so long sometimes.

@martinfrancois
Copy link
Author

martinfrancois commented Jun 29, 2018

Thanks a lot @cgdecker! I tried to find a description of your process but was unable to. Your description made it very clear to me. Now I understand what goes into it and it explains the long process, but I do appreciate your attention to detail and sense of quality, it gives me an even greater confidence in the high standards of the guava code base.

@martinfrancois
Copy link
Author

martinfrancois commented Dec 18, 2018

Hi everyone,
I don't want to sound impatient, but it's been almost a year now. Are there any news on this one?
Thanks,
François

@maehly
Copy link

maehly commented Mar 6, 2019

François Martin has personally and elaborately explained to me this PR and it's history. Actually he was so kind to copy over the implementation into a project we are working on together. I vote for merging this PR.

@martinfrancois
Copy link
Author

@kevinb9n or @cgdecker could I please get an update on the current status of your internal review process concerning this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants