New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differences between Socket.sockaddr_in in jruby vs ruby #4863

Open
ChuckWoodraska opened this Issue Nov 21, 2017 · 6 comments

Comments

Projects
None yet
3 participants
@ChuckWoodraska

ChuckWoodraska commented Nov 21, 2017

Environment

Provide at least:

  • JRuby version (jruby -v) and command line (flags, JRUBY_OPTS, etc)
    jruby 9.1.13.0 (2.3.3) 2017-09-06 8e1c115 Java HotSpot(TM) 64-Bit Server VM 25.144-b01 on 1.8.0_144-b01 +indy +jit [linux-x86_64]
  • Operating system and platform (e.g. uname -a)
    Linux 4.4.0-98-generic #121-Ubuntu SMP Tue Oct 10 14:24:03 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Other relevant info you may wish to add:
Its the logstash 6.0.0 version of jruby

Expected Behavior

  • Describe your expectation of how JRuby should behave, perhaps by showing how CRuby/MRI behaves.
    I expect:
    Socket.sockaddr_in(80, "127.0.0.1")
    to output: "\x02\x00\x00P\x7F\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00"

Actual Behavior

  • Describe or show the actual behavior.
    this version of jruby it outputs: "\x00\x02\x00P\xD0+f\xFA\x00\x00\x00\x00\x00\x00\x00\x00"
    which looks like it the first two bytes are switched
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 21, 2017

Member

You'd do best to not look under the covers of Sockaddr_in and friends on JRuby; we're not using native structures, so they're likely going to differ from what the platform says.

The difference in your case may be a real bug, but I find the exposure of Sockaddr_in and friends in the Ruby API to be serious code smells. It's usually difficult to write code using it that works across platform, and it's forcing an arbitrary, hidden implementation detail on users of the library.

That said, I know we should at least be consistent. Why are you looking at the Sockaddr_in bytes at all?

Member

headius commented Nov 21, 2017

You'd do best to not look under the covers of Sockaddr_in and friends on JRuby; we're not using native structures, so they're likely going to differ from what the platform says.

The difference in your case may be a real bug, but I find the exposure of Sockaddr_in and friends in the Ruby API to be serious code smells. It's usually difficult to write code using it that works across platform, and it's forcing an arbitrary, hidden implementation detail on users of the library.

That said, I know we should at least be consistent. Why are you looking at the Sockaddr_in bytes at all?

@ChuckWoodraska

This comment has been minimized.

Show comment
Hide comment
@ChuckWoodraska

ChuckWoodraska Nov 21, 2017

I guess actually what I really wanted was Socket.unpack_sockaddr_in("\x02\x00\x00P\x7F\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00")
but it expects those first two bytes to be switched. The reason I'm trying to use that is that I'm reading in Linux auditd logs and one of the fields is saddr which is in byte form. I would like to extract the port and ip so that I can search on it in elasticsearch. I was trying to transform the data in a logstash ruby filter using unpack_sockaddr_in and ran into this issue. So both sockaddr_in and unpack_sockaddr_in seem to want those first two bytes switched, but the incoming data matches the expected behavior of ruby. For right now I just flip the bytes in my filter and it works, but felt like this should behave similarly to ruby.

ChuckWoodraska commented Nov 21, 2017

I guess actually what I really wanted was Socket.unpack_sockaddr_in("\x02\x00\x00P\x7F\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00")
but it expects those first two bytes to be switched. The reason I'm trying to use that is that I'm reading in Linux auditd logs and one of the fields is saddr which is in byte form. I would like to extract the port and ip so that I can search on it in elasticsearch. I was trying to transform the data in a logstash ruby filter using unpack_sockaddr_in and ran into this issue. So both sockaddr_in and unpack_sockaddr_in seem to want those first two bytes switched, but the incoming data matches the expected behavior of ruby. For right now I just flip the bytes in my filter and it works, but felt like this should behave similarly to ruby.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 22, 2017

Member

@ChuckWoodraska Ok, that makes things a little clearer for me. So the short answer is that you won't be able to use JRuby for this right now without a platform-appropriate sockaddr_in.

The long answer is that we're probably going to need to move toward native sockets soon anyway. That will probably be easiest if we can adopt the FFI-based socket library that @YorickPeterse wrote, but it will take some work to get there since it depends on another Ruby VM's internals.

Another option, which isn't super attractive but probably less work, is to simply do a better job of outputting proper sockaddr_in for each platform. As far as I can tell, none of our output is right on macos, and it doesn't appear to be right for you either.

Member

headius commented Nov 22, 2017

@ChuckWoodraska Ok, that makes things a little clearer for me. So the short answer is that you won't be able to use JRuby for this right now without a platform-appropriate sockaddr_in.

The long answer is that we're probably going to need to move toward native sockets soon anyway. That will probably be easiest if we can adopt the FFI-based socket library that @YorickPeterse wrote, but it will take some work to get there since it depends on another Ruby VM's internals.

Another option, which isn't super attractive but probably less work, is to simply do a better job of outputting proper sockaddr_in for each platform. As far as I can tell, none of our output is right on macos, and it doesn't appear to be right for you either.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 22, 2017

Member

Ping @enebo...there's many reasons for going to native sockets, but I'm not sure of the right path. A jnr-socket library would be very useful for way more than just JRuby, but we'd be doing a lot of the work by hand...unless we can get jnr-clang working.

Member

headius commented Nov 22, 2017

Ping @enebo...there's many reasons for going to native sockets, but I'm not sure of the right path. A jnr-socket library would be very useful for way more than just JRuby, but we'd be doing a lot of the work by hand...unless we can get jnr-clang working.

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Nov 22, 2017

Member

@headius yeah navigating the proper path here is likely the path of least support but still 100% correct. I agree that jnr-something would work but it would be quite a bit of ongoing support. ffi does not really reduce that work much but yorick's library would cut out quite a bit of the initial work. The support would be large either way though? Being in Ruby would likely allow more people to PR? I lean slightly to Ruby FFI vs JNR but only because so few people want to support our JNR stuff.

Another question is whether we can really use these byte addrs at all right now? I mean we are providing them because MRI does but can how can we use them. I understand the reporter just wants to compare them so if we just did an ffi call for just Sockaddr_in what would break? I mean both ways so you can convert those bytes back to something our Java impl can use.

Member

enebo commented Nov 22, 2017

@headius yeah navigating the proper path here is likely the path of least support but still 100% correct. I agree that jnr-something would work but it would be quite a bit of ongoing support. ffi does not really reduce that work much but yorick's library would cut out quite a bit of the initial work. The support would be large either way though? Being in Ruby would likely allow more people to PR? I lean slightly to Ruby FFI vs JNR but only because so few people want to support our JNR stuff.

Another question is whether we can really use these byte addrs at all right now? I mean we are providing them because MRI does but can how can we use them. I understand the reporter just wants to compare them so if we just did an ffi call for just Sockaddr_in what would break? I mean both ways so you can convert those bytes back to something our Java impl can use.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 22, 2017

Member

@enebo The uses I've seen for these APIs mostly just want to capture some structure representing the addrinfo contents. I believe this is intended to be replaced by the proper Addrinfo class, of which we have implemented a decent portion.

The case reported here is somewhere in the middle; @ChuckWoodraska wants to be able to read in a log from auditd, which inserts some of addrinfo into the log as an encoded representation of the platform-local sockaddr_in byte array.

The bytes we present are...unusual. I'm not sure what they match now. These structures could very easily have changed from 32 to 64 bit as well. In any case, nobody seems to understand them but us, so we can't read in those log entries.

I think for uses of addrinfo and sockaddr_in and sockaddr_un and friends from within Ruby, the direction is toward using the Addrinfo abstraction rather than passing around the byte arrays. But integrating with other socket utilities on the same system may always require the appropriate bytes. 🙄

Member

headius commented Nov 22, 2017

@enebo The uses I've seen for these APIs mostly just want to capture some structure representing the addrinfo contents. I believe this is intended to be replaced by the proper Addrinfo class, of which we have implemented a decent portion.

The case reported here is somewhere in the middle; @ChuckWoodraska wants to be able to read in a log from auditd, which inserts some of addrinfo into the log as an encoded representation of the platform-local sockaddr_in byte array.

The bytes we present are...unusual. I'm not sure what they match now. These structures could very easily have changed from 32 to 64 bit as well. In any case, nobody seems to understand them but us, so we can't read in those log entries.

I think for uses of addrinfo and sockaddr_in and sockaddr_un and friends from within Ruby, the direction is toward using the Addrinfo abstraction rather than passing around the byte arrays. But integrating with other socket utilities on the same system may always require the appropriate bytes. 🙄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment