Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Hostname.get_fqdn on AWS Lambda #697

Closed
metaskills opened this issue Jun 5, 2021 · 14 comments · Fixed by #1015
Closed

Fix Hostname.get_fqdn on AWS Lambda #697

metaskills opened this issue Jun 5, 2021 · 14 comments · Fixed by #1015
Assignees
Labels
bug community To tag external issues and PRs submitted by the community

Comments

@metaskills
Copy link

Hi! My name is Ken Collins and I work on Ruby/Rails integration with AWS Lambda. More details here. https://lamby.custominktech.com. I've done posts before on how to use APM with Rails and Lambda and today I noticed a new issue when shutting down an instance when working on a simple task runner project. There is no hostname on AWS Lambda.

** [NewRelic][2021-06-04 19:24:15 -0400 169.254.216.237 (27)] INFO : Starting Agent shutdown
/var/runtime/lib/runtime.rb: No such file or directory - hostname

My assumption was that Hostname.get_fqdn is the reason for this. If I use NEW_RELIC_AGENT_ENABLED=false the error disappears. The get_fqdn method is written in such a that I should see an "Unable to determine fqdn #{e}" message but I do not. I attempted to override the get_fqdn method but still saw the "No such file or directory - hostname" message.

I've gone thru the code here, but can not find where the error happens. It does not cause a major issue but thought I'd ask is there anywhere any ideas you may have on where this message is coming from?

@tannalynn
Copy link
Contributor

Ahh yeah I see what you're saying about the self.get_fqdn method, we're running the hostname command there. I'm surprised overriding the get_fqdn method didn't stop that from happening, I wouldn't have expected that to bypass that command (and so error message) for sure. What did the override that you added look like?

As for the log message in that method, "Unable to determine fqdn #{e}", that's logged at the debug level. The default log level is info, so unless you change that in the config, the debug messages won't be logged. If you do have debug level enabled already, it should be showing a log message for that, so if it's not that would be suspicious.

Looking through the code, I don't really see any other areas that are using this command, but we do call Socket.gethostname in the self.get method in the same file as get_fgdn. I don't know for sure what that's doing under the covers, so maybe it's possible it could be coming from there? Not sure about that, just the only other thing I can find at this point that seems related.

@tannalynn tannalynn added the community To tag external issues and PRs submitted by the community label Jun 11, 2021
@metaskills
Copy link
Author

What did the override that you added look like?

Something like this so I could ensure the const was loaded prior to patching.

NewRelic::Agent::Hostname.module_eval do
  def self.get_fqdn
    # ...
  end
end

You can verify the technique like so. No idea why it did not work.

module HostTest
  def self.get_fqdn
    'your.host.com'
  end
end
HostTest.module_eval do
  def self.get_fqdn
    'my.host.com'
  end
end
HostTest.get_fqdn # => "my.host.com"

Any thoughts on where to look next?

@tannalynn
Copy link
Contributor

tannalynn commented Jun 16, 2021

Hmm, yeah that does seem like it should work correctly. Surprising that even with bypassing that method and the hostname command in it, you're still seeing that No such file or directory - hostname message. We do call Socket.gethostname, so I'm curious what happens for you if you try just calling that. I'm thinking maybe that error might be coming from inside there? I'm not sure though, to be honest. Very odd.

If that doesn't give that same error message output, then I'm not really sure. I'm not finding any other instances of us using that command in the agent, and we don't call that method in very many places. We do call NewRelic::Agent::Hostname.get in a few more places. That's one place that uses Socket.gethostname, which is why I'm thinking maybe that. Right now, I'm not sure beyond that, but I'd be very interested to see if those methods seem to be related and what happens with those for you.

@angelatan2 angelatan2 added the feedback Pending feedback/more information label Jun 22, 2021
@angelatan2
Copy link
Contributor

angelatan2 commented Jun 26, 2021

Hello @metaskills, did you get a chance to review the comments from tannalynn? Please let us know if you were able to try the suggestion of using Socket.gethostname.

@angelatan2
Copy link
Contributor

Thank you for submitting the issue. However, at this time we are unable to make progress without further feedback. Please reopen the issue if this is still a problem for you.

Ruby Engineering Board automation moved this from Triage to Done/Pending Release Jul 1, 2021
@metaskills
Copy link
Author

Thanks @angelatan2, I understand. I'm unsure where to take this as well. Just really appreciate y'all taking a look and offering some thoughts.

I'm changing the log level in my guide here (https://dev.to/aws-heroes/using-new-relic-apm-with-rails-on-aws-lambda-51gi) to be fatal which should avoid seeing it no matter what. I'll also be doing a post this week on moving that sync api call to flush data to a new Lambda Extension that allows this to be outside the request/response time. https://github.com/customink/lambda_punch#-usage

This will help us keep using New Relic APM in Rails & Lambda too. Feedback always welcome!

@brcarp
Copy link

brcarp commented Feb 15, 2022

I dug into this issue a bit and I thought I'd share some of my learnings with you, @angelatan2 and @metaskills:

  1. Socket.gethostname returns a numerical IP address in the absence of an alphanumeric hostname, as I'd expect from having glanced at its source
  2. As a test I monkeypatched lib/new_relic/agent/hostname.rb to use an environment value and to log with ::NewRelic::Agent.logger.info "Hostname reported as #{fqdn}" and that caused the No such file or directory error to go away (or I should say one of them, since I think we have another gem producing the same behavior).

So I do think that confirms that NewRelic::Agent::Hostname.get_fqdn is the source of this error message on a system that does not have a hostname executable or alias. This may be a low priority, but here are two strategies for addressing this that occurred to me:

  • Change instances of %x(hostname) to %x(which hostname >/dev/null && hostname) which will avoid the No such file or directory error and return the empty string when hostname can't be found
  • Catch the error and relay it more nicely through NewRelic::Agent.logger

@metaskills I'm not quite sure why module_eval didn't work, but I suspect it could be because there's some precompilation/pre-build taking place for the container and that calling module_eval at runtime may be too late. Unfortunately I'm coming up against the limits of my Ruby knowledge in trying to understand that aspect.

@metaskills
Copy link
Author

Thanks Brian. Super happy you found the right way to over ride and test this. @angelatan2 Worth re-opening?

@angelatan2
Copy link
Contributor

angelatan2 commented Feb 15, 2022

@brcarp Thanks for digging into this issue and sharing the steps to reproduce the problem. We really appreciate it.
@metaskills Yes, reopening to take a second look at this problem.

@angelatan2 angelatan2 reopened this Feb 15, 2022
@angelatan2 angelatan2 moved this from Done/Pending Release to Triage in Ruby Engineering Board Feb 15, 2022
@angelatan2 angelatan2 added bug and removed declined feedback Pending feedback/more information labels Feb 15, 2022
@angelatan2 angelatan2 added the Hero Inbox To review further during Support Hero Duty label Feb 28, 2022
@fallwith
Copy link
Contributor

fallwith commented Mar 8, 2022

I took a look at this one.

  • The NewRelic::Agent::Hostname class offers the get method which uses Socket.gethostname and the get_fqdn method which uses %x(hostname).
  • The "fqdn" (fully qualified domain name) value returned by get_fqdn is intended to be more detailed than the value returned by get
  • When ran from a Linux AMI based AWS EC2 instance for example, get will return ip-1-2-3-4 and get_fqdn will return ip-1-2-3-4.<aws region>.compute.internal
  • AWS Lambda does not provide a hostname binary in the running function's PATH
  • Calling NewRelic::Agent::Hostname.get_fqdn from AWS Lambda returns nil
  • The "No such file or directory - hostname" error raised by get_fdqn's use of %x(hostname) will hit STDERR / logs, but is rescued by get_fqdn.
  • The nil result from NewRelic::Agent::Hostname.get_fqdn will be handled by NewRelic::Agent::UtilizationData#append_full_hostname and will cause it to not append a value.
  • In addition to performing %x(hostname), the agent also performs %x(uname). The uname call works fine under Lambda.
  • Socket.gethostname leverages an internal Ruby C function
  • NewRelic::Agent::Hostname.get (which uses Socket.gethostname) works fine under Lambda and yields an IP address

I see two possible agent changes that we can consider introducing:

  • The "No such file or directory - hostname" error is handled, but doesn't look great when showing up in STDERR / logs. We can attempt to confirm a hostname binary exists prior to trying to execute it.
  • A failure to obtain an "fqdn" value simply means that the value won't be appended with the utilization data. For AWS Lambda and other systems that may not be able to perform %x(hostname), we can consider appending the IP address instead.

@metaskills
Copy link
Author

I like those two options. Thoughts @brcarp?

@brcarp
Copy link

brcarp commented Mar 8, 2022

I like those two options. Thoughts @brcarp?

I think these ideas compliment each other and are both reasonable. Attempting to verify hostname might requir care; I suggested %x(which hostname >/dev/null && hostname) which works on this platform, but separately I observed the other day that which wasn't found in a GitHub runner so even that solution gives me some pause.

Catching the error is perhaps more robust, for a slightly higher level of effort. I'm not attached to any particular path forward; I'm content to defer to the maintainers on appropriate remediation.

@brcarp
Copy link

brcarp commented Mar 8, 2022

I suggested %x(which hostname >/dev/null && hostname) which works on this platform, but separately I observed the other day that which wasn't found in a GitHub runner so even that solution gives me some pause.

And now I'm realizing that this is the result of not giving it more than a second or two's thought. Here's an equivalent solution which avoids the use of which and should be expected to work in just about any Bourne-derived shell:

  %x(hostname &> /dev/null && hostname)

To make it even more robust and avoid shell syntax altogether, Ruby's Open3.popen2 (or Open3.popen2e) allow for checking the exit status (or ignoring it entirely) and shouldn't independently write to STDERR like backticks will. For example:

  require 'popen3'
  begin
    Open3.popen2('hostname') { |stdin, stdout, wait_thr| puts stdout.read.chomp }
  rescue
    nil
  end

@fallwith
Copy link
Contributor

fallwith commented Mar 9, 2022

Thank you @metaskills and @brcarp! I'll keep you both posted on the maintainers' progress on these two options. I dig the use of Open3, good idea.

fallwith added a commit that referenced this issue Mar 9, 2022
* Do not shell out directly with `%x()` to any executable. Instead, use
the new Open3 based `NewRelic::Helper.run_command` method.
* The new `NewRelic::Helper.run_command` should prevent any unwanted
error messages pertaining to binary execution from appearing in any
output or log unless the log level is set to 'debug'.
* When executing an external binary, make sure it exists first
* If the 'hostname' binary does not exist, have
`NewRelic::Agent::Hostname.get_fqdn` fall back to `get`, which uses
`Socket`.
* When using 'uname' for JRuby to determine the platform, use 'unknown'
if the 'uname' call fails
* Introduce new `NewRelic::CommandExcutableNotFoundError` and
`NewRelic::CommandRunFailedError` error classes to help with logic flow.
* Bring in 'minitest-stub-const' as a development dependency for
stubbing constants

resolves #697

Thank you to @metaskills for submitting #697 and pointing out that
'hostname' will not be available to AWS Lambda Ruby functions.

Thank you to @brcarp for suggesting the use of Open3 as an improvement
over `%x()` to handle output.

Thank you to both @metaskills and @brcarp for their continued input and
maintainer collaboration on issue #697
@angelatan2 angelatan2 moved this from Triage to In progress in Ruby Engineering Board Mar 9, 2022
@angelatan2 angelatan2 removed the Hero Inbox To review further during Support Hero Duty label Mar 9, 2022
Ruby Engineering Board automation moved this from In progress to Done/Pending Release Mar 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug community To tag external issues and PRs submitted by the community
Projects
Archived in project
Ruby Engineering Board
  
Code Complete/Done
Development

Successfully merging a pull request may close this issue.

5 participants