fix #30 #254

Open
wants to merge 8 commits into
base: master

Conversation

@demon36 demon36 commented Jul 6, 2021

fix #30, DirectMemoryIO.getString() fails for non UTF-8

@headius
Member

headius commented Jul 7, 2021

Will review, thank you!

Member

@headius headius left a comment

Minor suggestions to improve the code... at least the formatting should be fixed, and any refinement you can make for the other comments would be great!

@@ -186,8 +187,30 @@ public String getString(long offset) {


public String getString(long offset, int maxLength, Charset cs) {
final byte[] bytes = IO.getZeroTerminatedByteArray(address() + offset, maxLength);
return cs.decode(ByteBuffer.wrap(bytes)).toString();
if(cs == StandardCharsets.UTF_8){
Member

Please reformat according to more common Java formatting standards:

if (cond) {
    ...
} else {
    ...
    for (stuff) {
        ...

Otherwise we get a mix of formats and future patches will likely include unhelpful formatting updates.

@@ -186,8 +187,30 @@ public String getString(long offset) {


public String getString(long offset, int maxLength, Charset cs) {
final byte[] bytes = IO.getZeroTerminatedByteArray(address() + offset, maxLength);
return cs.decode(ByteBuffer.wrap(bytes)).toString();
if(cs == StandardCharsets.UTF_8){
Member

I think this logic could apply to any Charset that has a minimum character width of 1 byte, correct? All such encodings should use a single-byte \0 for C string termination I think?

At the very least this should include the ISO-8859 encodings, which are all single-byte (their CharsetEncoder.maxBytesPerChar will all be 1.0).

Author

Agreed, I should include the other charsets that use a single-byte terminator, but CharsetEncoder.maxBytesPerChar won't be sufficient because it does not resolve to 1.0 for UTF-8.
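The point about maxBytesPerChar can be checked with a short, self-contained sketch. The class and the `terminatorWidth` helper below are hypothetical illustrations, not the implementation in this PR. The helper assumes the terminator width can be derived by encoding one and two NUL characters and subtracting the lengths, which cancels out the byte-order mark that Java's UTF-16 encoder prepends.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TerminatorWidthSketch {
    /**
     * Hypothetical helper: width in bytes of one encoded NUL character.
     * Subtracting the two encoded lengths removes any byte-order mark
     * that the encoder writes once at the start of its output.
     */
    static int terminatorWidth(Charset cs) {
        return "\0\0".getBytes(cs).length - "\0".getBytes(cs).length;
    }

    public static void main(String[] args) {
        Charset[] charsets = {
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8,
                StandardCharsets.UTF_16LE, StandardCharsets.UTF_16
        };
        for (Charset cs : charsets) {
            // maxBytesPerChar is an upper bound per char, not the NUL width.
            System.out.println(cs.name()
                    + " maxBytesPerChar=" + cs.newEncoder().maxBytesPerChar()
                    + " terminatorWidth=" + terminatorWidth(cs));
        }
    }
}
```

Under these assumptions UTF-8 reports a terminator width of 1 even though its encoder's maxBytesPerChar is larger, which is why the two values cannot be used interchangeably.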

}else{
byte[] bytes = new byte[maxLength];
IO.getByteArray(address() + offset, bytes, 0, maxLength);
final byte[] nullCharBytes = new String("\0").getBytes(cs);
Member

This could possibly be cached for common encodings (UTF-16, UTF-32) or it might be valid to just use a width of \0 equal to the encoding's CharsetEncoder.maxBytesPerChar value.
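One way the caching suggestion might look is sketched below. The `TerminatorCacheSketch` class and its `nulBytes` method are hypothetical names, not part of jnr-ffi. The sketch assumes an encoded NUL consists only of zero bytes, which holds for the Unicode and ASCII-compatible charsets discussed here, and that callers never mutate the shared array.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TerminatorCacheSketch {
    // Hypothetical cache: one shared terminator array per charset.
    private static final Map<Charset, byte[]> NUL_BYTES = new ConcurrentHashMap<>();

    /** Returns a shared, read-only-by-convention terminator for the charset. */
    static byte[] nulBytes(Charset cs) {
        return NUL_BYTES.computeIfAbsent(cs, TerminatorCacheSketch::encodeNul);
    }

    private static byte[] encodeNul(Charset cs) {
        // Length difference drops any byte-order mark; the array stays all zeros.
        int width = "\0\0".getBytes(cs).length - "\0".getBytes(cs).length;
        return new byte[width];
    }

    public static void main(String[] args) {
        byte[] first = nulBytes(StandardCharsets.UTF_16LE);
        byte[] second = nulBytes(StandardCharsets.UTF_16LE);
        System.out.println(first.length + " bytes, shared instance: " + (first == second));
    }
}
```

The encoding work then happens once per charset instead of on every getString() call.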

@headius
Member

headius commented Aug 12, 2021

Request another review when you are ready. Thanks for keeping at it!

@demon36
Author

demon36 commented Aug 12, 2021

this part of the code is quite tricky, will keep you posted, thanks

@demon36
Author

demon36 commented Aug 12, 2021

btw, is the IRC channel still alive?

@headius
Member

headius commented Aug 16, 2021

@demon36 Ah, no. I expect the IRC channel, being on the now-defunct Freenode, is probably dead. I should set up a new channel on Matrix or Libera. For now, if you want to chat with those of us maintaining these libraries, stop by the #jruby channel on Matrix!

@demon36
Author

demon36 commented Aug 18, 2021

@headius Good to know. The PR is ready for review; please also take a look at Struct.java.

Member

@headius headius left a comment

This is looking near completion. Main issues are:

  • formatting throughout does not match Java conventions or the rest of the codebase
  • no tests provided for the behavior

Almost there!

return cs.decode(ByteBuffer.wrap(bytes)).toString();
long baseAddress = address() + offset;
int nullTermSize = StringUtil.terminatorWidth(cs);
if(nullTermSize == 1) {
Member

Formatting throughout should match the rest of the code, which is intended to match typical Java coding conventions:

  • four-space indentation for blocks of code, 8-space indentation for line continuations
  • space between keywords like if and while and their parenthesized conditionals
  • spaces around operators like + and %

It's nitpicky but if we don't maintain consistent code formatting then we end up with future commits and PRs that have lots of unrelated changes.
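As an illustration of the conventions listed above, here is a small sketch. The `roundUp` method is a hypothetical example written only to show the layout; it is not code from this PR.

```java
final class FormattingSketch {
    // Hypothetical helper, used only to demonstrate the formatting rules.
    static int roundUp(int length, int alignment) {
        // Space between the keyword and its parenthesized condition.
        if (alignment <= 0) {
            // Continuation lines are indented eight spaces.
            throw new IllegalArgumentException(
                    "alignment must be positive, got " + alignment);
        }
        int padded = length;
        // Spaces around operators such as % and +.
        while (padded % alignment != 0) {
            padded = padded + 1;
        }
        return padded;
    }

    public static void main(String[] args) {
        System.out.println(roundUp(5, 4));
    }
}
```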

Author

agree, will take care of that

@@ -64,7 +66,8 @@ public static CharSequence getCharSequence(ByteBuffer buf, Charset charset) {
final ByteBuffer buffer = buf.slice();
// Find the NUL terminator and limit to that, so the
// StringBuffer/StringBuilder does not have superfluous NUL chars
int end = indexOf(buffer, (byte) 0);
final byte[] nullCharBytes = new byte[StringUtil.terminatorWidth(charset)];
Member

Perhaps we can cache the three known lengths and not reallocate this byte[] every time?
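A minimal sketch of that idea follows, assuming the three known lengths are 1, 2, and 4 bytes (single-byte charsets and UTF-8, UTF-16, UTF-32). The class, the `terminatorBytes` lookup, and this `indexOf` overload are hypothetical and may differ from the helpers in this PR. The search steps in terminator-sized strides so that a zero byte ending one UTF-16 code unit and a zero byte starting the next are not mistaken for a terminator.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SharedTerminatorSketch {
    // Shared, never-mutated zero arrays for the three terminator widths.
    private static final byte[] NUL_1 = new byte[1];
    private static final byte[] NUL_2 = new byte[2];
    private static final byte[] NUL_4 = new byte[4];

    static byte[] terminatorBytes(int width) {
        switch (width) {
            case 1: return NUL_1;
            case 2: return NUL_2;
            case 4: return NUL_4;
            default:
                throw new IllegalArgumentException("unsupported width: " + width);
        }
    }

    /** Offset from buf.position() of the first aligned terminator, or -1. */
    static int indexOf(ByteBuffer buf, byte[] terminator) {
        int width = terminator.length;
        int start = buf.position();
        for (int i = start; i + width <= buf.limit(); i += width) {
            boolean match = true;
            for (int j = 0; j < width; j++) {
                if (buf.get(i + j) != terminator[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i - start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] raw = "hi\0junk".getBytes(StandardCharsets.UTF_16LE);
        int end = indexOf(ByteBuffer.wrap(raw), terminatorBytes(2));
        System.out.println(new String(raw, 0, end, StandardCharsets.UTF_16LE));
    }
}
```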

@headius headius added this to the 2.2.5 milestone Aug 18, 2021
@headius
Member

headius commented Aug 18, 2021

Looks like this might need a merge or rebase from master to pick up that missing import too.

@headius
Member

headius commented Aug 31, 2021

@demon36 Should we just close this in light of the work to do this all natively in jffi? I am inclined to keep this as fallback code when we do not have an updated jffi binary, but what are your thoughts going forward?

@headius headius modified the milestones: 2.2.5, 2.2.6, 2.2.7 Sep 1, 2021
@headius headius modified the milestones: 2.2.7, 2.2.8 Sep 16, 2021
@headius headius modified the milestones: 2.2.8, 2.2.9 Oct 26, 2021
@headius headius modified the milestones: 2.2.9, 2.2.10 Nov 22, 2021
@headius headius modified the milestones: 2.2.10, 2.2.11 Dec 1, 2021
Successfully merging this pull request may close these issues.

Struct.UTFString.get() fails for UTF-16
2 participants