Use an explicit "UTF-8" character set argument when creating Strings from bytes. The platform default character set is guaranteed to be UTF-8.

PiperOrigin-RevId: 576577338
herbyderby authored and sjamesr committed Oct 25, 2023
1 parent 7339d54 commit 97df44e
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion java/com/google/re2j/Matcher.java
@@ -7,6 +7,7 @@
 package com.google.re2j;

 import com.google.re2j.MatcherInput.Encoding;
+import java.io.UnsupportedEncodingException;
 import java.util.Map;

 /**
@@ -363,7 +364,11 @@ private boolean genMatch(int startByte, int anchor) {
   String substring(int start, int end) {
     // UTF_8 is matched in binary mode. So slice the bytes.
     if (matcherInput.getEncoding() == Encoding.UTF_8) {
-      return new String(matcherInput.asBytes(), start, end - start);
+      try {
+        return new String(matcherInput.asBytes(), start, end - start, "UTF-8");
+      } catch (UnsupportedEncodingException e) {
+        throw new RuntimeException(e); // Not possible.
+      }
     }

     // This is fast for both StringBuilder and String.

1 comment on commit 97df44e

@herbyderby
Collaborator Author

Note that the description had a typo; it should have said "The platform default character set is not guaranteed to be UTF-8".
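
To illustrate why the explicit character set matters, here is a minimal sketch that decodes the same UTF-8 byte slice two ways. It is not RE2/J code: the class `Utf8SliceDemo` and its methods are hypothetical, and ISO-8859-1 merely stands in for a platform whose default character set is not UTF-8. The sketch uses the `String(byte[], int, int, Charset)` constructor with `StandardCharsets.UTF_8` (available since Java 7), which does not throw `UnsupportedEncodingException`; whether RE2/J could adopt it depends on the Java versions it supports, which this commit does not state.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf8SliceDemo {
  /** Decodes bytes[start, end) as UTF-8, regardless of the platform default. */
  static String sliceUtf8(byte[] bytes, int start, int end) {
    return new String(bytes, start, end - start, StandardCharsets.UTF_8);
  }

  /**
   * Decodes the same range with a caller-chosen charset. Passing a non-UTF-8
   * charset here simulates what the old three-argument constructor would do
   * on a platform whose default charset is not UTF-8.
   */
  static String sliceWith(byte[] bytes, int start, int end, Charset charset) {
    return new String(bytes, start, end - start, charset);
  }

  public static void main(String[] args) {
    // U+00E9 (e with acute accent) encodes as two bytes in UTF-8: 0xC3 0xA9.
    // The first four characters therefore occupy bytes [0, 5).
    byte[] input = "caf\u00e9 au lait".getBytes(StandardCharsets.UTF_8);

    // Correct: the two bytes are decoded back into a single character.
    System.out.println(sliceUtf8(input, 0, 5));

    // Wrong charset: each of the two bytes becomes its own character.
    System.out.println(sliceWith(input, 0, 5, StandardCharsets.ISO_8859_1));

    // The default this JVM would have used without an explicit charset.
    System.out.println(Charset.defaultCharset());
  }
}
```

The byte offsets are the same in both calls; only the decoding differs, which is why a byte-sliced match could come back garbled when the default character set was something other than UTF-8.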
