Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nil packet triggers an exception in Rex::Post::Meterpreter (packet.rb:323) #10177

Closed
2 of 4 tasks
asoto-r7 opened this issue Jun 15, 2018 · 23 comments · Fixed by #10699
Closed
2 of 4 tasks

nil packet triggers an exception in Rex::Post::Meterpreter (packet.rb:323) #10177

asoto-r7 opened this issue Jun 15, 2018 · 23 comments · Fixed by #10699
Assignees
Labels

Comments

@asoto-r7
Copy link
Contributor

asoto-r7 commented Jun 15, 2018

Steps to reproduce

There are two cases in which a closing thread causes Meterpreter to send an empty packet, presumably because the module fails during an exception and the thread is forcibly shutdown.

Case # 1: auxiliary/server/socks5 is unable to connect to the requested destination.

Case # 2: auxiliary/scanner/mysql/mysql_version fails while scanning a MySQL server.

Case # 3: #10160 - SMB fails to connect, likely due to a lack of SMB2 support.

In all cases, the traceback doesn't provide an explanation as to why the module failed.

Expected behavior

An empty packet should not be sent, the thread should close without a traceback, and an informative error message should be provided to the user.

Current behavior

#<Thread:0x00007faf68edec10@/usr/share/metasploit-framework/lib/msf/core/thread_manager.rb:93 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
        17: from /usr/share/metasploit-framework/lib/msf/core/thread_manager.rb:100:in `block in spawn'
        16: from /usr/share/metasploit-framework/lib/rex/thread_factory.rb:22:in `block in spawn'
        15: from /usr/share/metasploit-framework/vendor/bundle/ruby/2.5.0/gems/rex-core-0.1.13/lib/rex/io/socket_abstraction.rb:121:in `block in monitor_rsock'
        14: from /usr/share/metasploit-framework/vendor/bundle/ruby/2.5.0/gems/rex-core-0.1.13/lib/rex/io/socket_abstraction.rb:121:in `loop'
        13: from /usr/share/metasploit-framework/vendor/bundle/ruby/2.5.0/gems/rex-core-0.1.13/lib/rex/io/socket_abstraction.rb:188:in `block (2 levels) in monitor_rsock'
        12: from /usr/share/metasploit-framework/lib/rex/post/meterpreter/extensions/stdapi/net/socket_subsystem/tcp_client_channel.rb:90:in `close_write'
        11: from /usr/share/metasploit-framework/lib/rex/post/meterpreter/extensions/stdapi/net/socket_subsystem/tcp_client_channel.rb:106:in `shutdown'
        10: from /usr/share/metasploit-framework/lib/rex/post/meterpreter/packet_dispatcher.rb:220:in `send_request'
         9: from /usr/share/metasploit-framework/lib/rex/post/meterpreter/packet_dispatcher.rb:248:in `send_packet_wait_response'
         8: from /usr/share/metasploit-framework/lib/rex/post/meterpreter/packet_dispatcher.rb:181:in `send_packet'
         7: from /usr/share/metasploit-framework/lib/rex/post/meterpreter/packet.rb:823:in `to_r'
         6: from /usr/share/metasploit-framework/lib/rex/post/meterpreter/packet.rb:823:in `call'
         5: from /usr/share/metasploit-framework/lib/rex/post/meterpreter/packet.rb:595:in `to_r'
         4: from /usr/share/metasploit-framework/lib/rex/post/meterpreter/packet.rb:453:in `each'
         3: from /usr/share/metasploit-framework/lib/rex/post/meterpreter/packet.rb:453:in `each'
         2: from /usr/share/metasploit-framework/lib/rex/post/meterpreter/packet.rb:596:in `block in to_r'
         1: from /usr/share/metasploit-framework/lib/rex/post/meterpreter/packet.rb:323:in `to_r'
/usr/share/metasploit-framework/lib/rex/post/meterpreter/packet.rb:323:in `pack': no implicit conversion of nil into Integer (TypeError)```

System stuff

Metasploit version

This behavior was observed with mysql_version on 4.16.48-dev, and with socks5 on the latest master branch.

I installed Metasploit with:

OS

This behavior was observed under Kali 2017.3 and Mac OS X 10.13.4.

@asoto-r7
Copy link
Contributor Author

asoto-r7 commented Jul 5, 2018

Gathering potentially-related comments on potentially-related issues:

#9356 (route add and closed connections with connect)

@sempervictus:

The problem would manifest as a nil read on a supposed tlv after many packets.

IIRC, this became much more visible (used to be super rare) when I did the UDP pivot stuff. I changed the stream abstraction code down in rex, and in msf while doing this. [...]

I'm wondering if the buffer was initialized with an empty string, which got read and sent to the other side, did that send op work off a nil or an EOL?

#10160 (route add and auxiliary/scanner/smb/smb_login failed)

This may have been due to a lack of SMB2 support in RubySMB, which may have forced the target to RST the TCP connection, triggering the same state as the previous issue.

#10600 (Intermittent/frequent errors when using several modules)

@rwhitcroft:

The same errors occur when using a different payload (win/x64/meterp/rtcp), and different modules - aux/scanner/smb/smb_ms17_010 and aux/scanner/portscan/tcp were the only ones I tried, both exhibit same errors.

@rwhitcroft
Copy link
Contributor

rwhitcroft commented Sep 11, 2018

Is there a workaround to stop the bleeding here? My screen is all backtrace. Downgrade ruby or rex-socket? Anything?

Just to clarify what I'm experiencing - anything over a session route has a 30-40% chance of causing an error/backtrace.

@OJ
Copy link
Contributor

OJ commented Sep 12, 2018

So the line of code that is bailing is in this function:

  def to_r
    # Forcibly convert to ASCII-8BIT encoding
    raw = value.to_s.unpack("C*").pack("C*")

    if (self.type & TLV_META_TYPE_STRING == TLV_META_TYPE_STRING)
      raw += "\x00"
    elsif (self.type & TLV_META_TYPE_UINT == TLV_META_TYPE_UINT)
      raw = [value].pack("N")                                       <----------- this one here.
    elsif (self.type & TLV_META_TYPE_QWORD == TLV_META_TYPE_QWORD)
      raw = [ self.htonq( value.to_i ) ].pack("Q<")
    elsif (self.type & TLV_META_TYPE_BOOL == TLV_META_TYPE_BOOL)
      if (value == true)
        raw = [1].pack("c")
      else
        raw = [0].pack("c")
      end
    end
   . . . . . . < snip > ....
    [raw.length + HEADER_SIZE, self.type].pack("NN") + raw
  end

Two things that come to mind are:

  1. Why are we referencing value and not self.value ? It seems like we should probably be doing the latter.
  2. Should we just return nil (or an empty string) if value is nil?

This version of to_r is part of the Tlv class, so it might be fair to say that if the given value is nil we just don't do anything.

@rwhitcroft
Copy link
Contributor

I actually tried returning nil (and fixing up caller) and it seemed to work at first, but not really. Returning nil maybe doesn't send the packet but framework still waits for a response as if it did?

@sempervictus
Copy link
Contributor

Looks like its expecting a uint... So I don't think nil/"" will work. Iirc I tried walking the callers at some point, but its cross thread so I lost the chain.
IMO, this is likely coming from c meterpreter as I've never seen it with mettle (even mingw).

@sempervictus
Copy link
Contributor

@OJ: if the framework end is at that line, its partially through a tlv parse, right? Meaning the sending end encoded a structure which, on deserialization, appears to have a type of uint and yet a value of nil... Alternatively, preceding steps in the framework pipeline in the recv stack are somehow truncating the packet/tlv. Right?

@OJ
Copy link
Contributor

OJ commented Sep 12, 2018

It's trying a pack a uint that's nil. Yes. But if we return '' when the value is nil then we basically avoid the problem (and is an edge case that we should cater for anyway). It might not solve the issue, but I think it's worth doing anyway.

@rwhitcroft
Copy link
Contributor

rwhitcroft commented Sep 25, 2018

Not to sound like an ungrateful jerk, but I'm failing to understand why this isn't a bigger issue. Routes are essentially unusable at the moment unless you're willing to put up with fullscreen backtraces every 2 seconds.

Is using routes just not that popular anymore? I can't figure out why a hundred other people aren't posting identical issue reports. If anyone needs a reminder of just how awful this is, run smb_login with a userlist over a session route.

At this point I'd be thrilled to be shown I'm wrong or I'm using routes "the wrong way", but I don't see it. What the hell is going on here?

@busterb
Copy link
Member

busterb commented Sep 25, 2018

Has anyone tried to git bisect it? I haven't tried reproducing it yet, but can take a look.

@busterb busterb self-assigned this Sep 25, 2018
@busterb
Copy link
Member

busterb commented Sep 25, 2018

Grabbing since I can now reproduce it. Took a little while due to various firewalls and other nannies on this machine causing things to fail too gracefully at the OS layer.

@busterb
Copy link
Member

busterb commented Sep 25, 2018

On python meterpreter, we see this series of events running smb_login. The failure happens in close, which is at the end of the TCP connection, so it produces more noise than problem during the connection.

[] core_channel_open dispatching to handler: channel_open_stdapi_net_tcp_client
[
] added channel id: 1 type: MeterpreterSocketTCPClient
[] running method core_channel_write
[
] running method core_channel_write
[] running method core_channel_write
[
] running method core_channel_write
[] running method core_channel_close
[-] method core_channel_close resulted in error: #1
[
] running method core_channel_close

@busterb
Copy link
Member

busterb commented Sep 25, 2018

The first commit where this bug occurs is in the transition to Ruby 2.5. Likely reasons why the hordes have not descended: our packages are still based on earlier versions of Ruby, you can still use any version of Ruby you want to 2.3.

$ git bisect good
d67ae503bf547f33564b2547d924e8b925ac3c6c is the first bad commit
commit d67ae503bf547f33564b2547d924e8b925ac3c6c
Author: Brent Cook <busterb@gmail.com>
Date:   Mon Mar 26 02:52:52 2018 -0500

    Update base to Ruby 2.5

I'm going to try 2.4.4 on master and see what happens next.

@busterb
Copy link
Member

busterb commented Sep 25, 2018

Works fine with Ruby 2.4.4:

[*] Route added
rhosts => dell

[*] 192.168.1.34:445     - 192.168.86.1:445 - Starting SMB login bruteforce
[-] 192.168.1.34:445     - 192.168.86.1:445 - Failed: '.\blah:blah',
[*] dell:445              - Scanned 1 of 1 hosts (100% complete)
[*] Auxiliary module execution completed

What does it mean? We're either depending on undefined behavior, there's an exception being swallowed in 2.4 that's different in 2.5, or 2.5 has a regression. It'd be nice to revert for now, but Kali Linux and other security distributions moved on to 2.5 globally for all Ruby tools around the time of the commit above, which is what dragged us along for the ride. Since they do their own packaging, and always use system packages, it's hard to revert, but it could be possible.

Still, we should probably figure out what causes this to happen in Ruby 2.5 only.

@busterb
Copy link
Member

busterb commented Sep 25, 2018

For reference, the backtrace in master at the time of posting is:

Traceback (most recent call last):
	17: from /Users/bcook/projects/metasploit-framework/lib/msf/core/thread_manager.rb:100:in `block in spawn'
	16: from /Users/bcook/projects/metasploit-framework/lib/rex/thread_factory.rb:22:in `block in spawn'
	15: from /Users/bcook/.rvm/gems/ruby-2.5.1@metasploit-framework/gems/rex-core-0.1.13/lib/rex/io/socket_abstraction.rb:121:in `block in monitor_rsock'
	14: from /Users/bcook/.rvm/gems/ruby-2.5.1@metasploit-framework/gems/rex-core-0.1.13/lib/rex/io/socket_abstraction.rb:121:in `loop'
	13: from /Users/bcook/.rvm/gems/ruby-2.5.1@metasploit-framework/gems/rex-core-0.1.13/lib/rex/io/socket_abstraction.rb:188:in `block (2 levels) in monitor_rsock'
	12: from /Users/bcook/projects/metasploit-framework/lib/rex/post/meterpreter/extensions/stdapi/net/socket_subsystem/tcp_client_channel.rb:90:in `close_write'
	11: from /Users/bcook/projects/metasploit-framework/lib/rex/post/meterpreter/extensions/stdapi/net/socket_subsystem/tcp_client_channel.rb:106:in `shutdown'
	10: from /Users/bcook/projects/metasploit-framework/lib/rex/post/meterpreter/packet_dispatcher.rb:172:in `send_request'
	 9: from /Users/bcook/projects/metasploit-framework/lib/rex/post/meterpreter/packet_dispatcher.rb:200:in `send_packet_wait_response'
	 8: from /Users/bcook/projects/metasploit-framework/lib/rex/post/meterpreter/packet_dispatcher.rb:133:in `send_packet'
	 7: from /Users/bcook/projects/metasploit-framework/lib/rex/post/meterpreter/packet.rb:823:in `to_r'
	 6: from /Users/bcook/projects/metasploit-framework/lib/rex/post/meterpreter/packet.rb:823:in `call'
	 5: from /Users/bcook/projects/metasploit-framework/lib/rex/post/meterpreter/packet.rb:595:in `to_r'
	 4: from /Users/bcook/projects/metasploit-framework/lib/rex/post/meterpreter/packet.rb:453:in `each'
	 3: from /Users/bcook/projects/metasploit-framework/lib/rex/post/meterpreter/packet.rb:453:in `each'
	 2: from /Users/bcook/projects/metasploit-framework/lib/rex/post/meterpreter/packet.rb:596:in `block in to_r'
	 1: from /Users/bcook/projects/metasploit-framework/lib/rex/post/meterpreter/packet.rb:323:in `to_r'
/Users/bcook/projects/metasploit-framework/lib/rex/post/meterpreter/packet.rb:323:in `pack': no implicit conversion of nil into Integer (TypeError)

@busterb
Copy link
Member

busterb commented Sep 25, 2018

It appears that the channel ID in the close message is Nil or sometimes '1' when I run this test with a clean setup. In the '1' case, it is still an error, because meterpreter says there is no such channel ID:

[*] running method stdapi_net_socket_tcp_shutdown
[-] method stdapi_net_socket_tcp_shutdown resulted in an error
Traceback (most recent call last):
  File "<string>", line 1363, in create_response
  File "", line 1734, in stdapi_net_socket_tcp_shutdown
KeyError: 1
[*] running method core_channel_close
[-] method core_channel_close resulted in error: #1
[*] 192.168.86.34:445     - 192.168.86.34:445 - Starting SMB login bruteforce
[-] 192.168.86.34:445     - 192.168.86.34:445 - Failed: '.\blah:blah',
how: 1, cid: nil
#<Thread:0x00007fd84eb446b8@/Users/bcook/projects/metasploit-framework/lib/msf/core/thread_manager.rb:93 run> terminated with exception (report_on_exception is true):

@busterb
Copy link
Member

busterb commented Sep 25, 2018

Gross, look what I found lurking in the channel code, a finalizer!

  def self.finalize(client, cid)
    proc { self._close(client,cid) }
  end

This does a close not when the user asks, but whenever the garbage collector happens to run. That's quite Ruby-version dependent in behavior. This already caused a bunch of crashes and problems in the registry code, let's see if it affects this too.

@busterb
Copy link
Member

busterb commented Sep 25, 2018

When the TCP channel gets closed here, it appears to happen at least 3 separate times. There's a lot of non-threadsafe code here has TOCTOU problems. This pattern:

unless cid.nil?
   send_close_message(cid)
   cid = nil
end

Pretty much ensures that eventually Ruby would be fast enough that 2 threads would be in that inner condition at the same time, and someone would send a packet with a nil cid.

@busterb
Copy link
Member

busterb commented Sep 25, 2018

Another issue is at least python meterpreter, if the remote side closes a TCP connection, will delete the channel from its internal table via handle_dead_resource_channel before metasploit can tell it to also close the channel with stdapi_net_socket_tcp_shutdown, which also leads to an error.

@busterb
Copy link
Member

busterb commented Sep 25, 2018

And, after tweaking python meterpreter to accept closing channels that already self-closed, and metasploit to only try closing once, got it to work with Ruby 2.5.1 as well!

[*] 192.168.86.34:445     - 192.168.86.34:445 - Starting SMB login bruteforce
connected on cid 1
how: 1, cid: 1
[-] 192.168.86.34:445     - 192.168.86.34:445 - Failed: '.\blah:blah',
[*] dell:445              - Scanned 1 of 1 hosts (100% complete)
[*] Auxiliary module execution completed
[*] Killing all sessions...
[*] 127.0.0.1 - Meterpreter session 1 closed.
[*] running method core_channel_open
[*] core_channel_open dispatching to handler: channel_open_stdapi_net_tcp_client
[*] added channel id: 1 type: MeterpreterSocketTCPClient
[*] running method core_channel_write
[*] running method core_channel_write
[*] running method core_channel_write
[*] running method core_channel_write
[*] running method stdapi_net_socket_tcp_shutdown

@sempervictus
Copy link
Contributor

Now that is a ghost in the machine. I hadn't even considered the VM in use since I only run MRI (till @brixen gets that jit fixed), though I do feel a bit stupid for missing the GC call. Wonder how many other cases there are :-).

Thank you very much @busterb

busterb added a commit to busterb/metasploit-framework that referenced this issue Sep 25, 2018
Don't send a close message for a nil channel ID, and if we do send a close
message, only do it once. I could have added a mutex somewher in _close(), but
because it's a class method, it's a little awkward and would require all of the
callers to instead have voluntary lock. As an alternative, I just made the
finalizer close the channel instead.

Fixes rapid7#10177
@busterb
Copy link
Member

busterb commented Sep 27, 2018

@rwhitcroft any comments on the referenced patch? Thanks!

@rwhitcroft
Copy link
Contributor

No sir, just thanks!

@busterb
Copy link
Member

busterb commented Oct 2, 2018

Sure thing. I have a handful of other patches that I'll get up that fix other finalizer-related bugs as well, since this antipattern got repeated on a number of Meterpreter objects.

@busterb busterb mentioned this issue Jan 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants