Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix HttpStatusAgent #1776

Merged
merged 8 commits into from
Dec 31, 2016
Merged

Fix HttpStatusAgent #1776

merged 8 commits into from
Dec 31, 2016

Conversation

knu
Copy link
Member

@knu knu commented Nov 4, 2016

I found some bugs in HttpStatusAgent and flaws in the spec while investigating build failure in #1772. I tried to guess what the original intentions were because the specs seemed to contradict with the implementation when it comes to actual behavior, but I'm not certain if I interpreted the existing code and specs correctly.

Please take a look into individual commits and check if they make sense.

/cc: @strugee

@darrencauthon
Copy link
Contributor

I like the idea of testing through webmock, as it's a tighter fit to replicate the situation. I did not understand how to use the tool like that, so I unit tested the code instead. I think testing it through webmock is a superior method - especially because it's very tedious to have to watch the agent work, debug it, and then fake bits like "interpolated" just to verify what I saw during test runs.

However, what's not a superior process is to make changes to the agent without running it through actual test runs, and demonstrating the fix. As annoying as the unit tests may be, they were built to replicate the actual execution state that I saw through debugging, and I also tested every feature through Huginn itself to verify the results. And every pull request included screenshots of the feature running.

So on that note, this pull request says to "fix some bugs" but I think it's mostly about the final_url? I've never used this feature. I have the previous code running in my own Huginn, so I fired it up and I think I can demonstrate a bug:

screenshot_110416_070005_am

screenshot_110416_070704_am

@darrencauthon
Copy link
Contributor

Hmm... it seems to me that final_url should report back the "final url..." which would either be the page to which the user was-or-would-have-been redirected.

I haven't made a raw request to http://google.com; perhaps it does redirect to http://www.google.com? Maybe this is even working in that circumstance? I'll look it up later.

@knu
Copy link
Member Author

knu commented Nov 4, 2016

Let me describe what I called bugs more precisely, including how to reproduce them:

  1. The value of redirected in a created event is always true when the disable_redirect_follow option is set to false.

    To reproduce this, create an HttpStatusAgent with "Url" set to any URL that does not redirect, "Disable redirect follow" set to "false", and (dry) run it. You'll see it create a new event with redirected set to true even if there's no redirect and the url and final_url look identical.

    This occurs because when disable_redirect_follow is false, final_url is set to measured_result.result.to_hash[:url] which is a URI object, and the value of redirected is the result of url != final_url, which is always false because of the type mismatch.

  2. When an HttpStatusAgent receives an event with url, it still accesses the url in the options ignoring the value in the event. The same goes for headers_to_save. I'm not sure if this is intended, but reading the describe "receive" section in the spec file, the specs seem to expect agent to use the values in received events because agent's options are not changed from its default values ({ "url": "http://google.com", "headers_to_save": "Server" }).

    To reproduce this, create an HttpStatusAgent with "Url" set to any URL, and (dry) run it, sending an event with the key url.

@cantino
Copy link
Member

cantino commented Nov 6, 2016

Both cases seem like bugs to me. Do you think we should change those behaviors @darrencauthon?

@@ -66,7 +66,8 @@ def check
def receive(incoming_events)
incoming_events.each do |event|
interpolate_with(event) do
check_this_url interpolated[:url], header_array(interpolated[:headers_to_save])
check_this_url event.payload[:url] || interpolated[:url],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe event.payload[:url].presence?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but I'm not sure. If an incoming event had a url key, I'd presume the flow would have been built on the assumption that the HttpStatusAgent would check a received URL, not it would use a received event just as a trigger. So, falling back to using the value in the options might not be what the user intended.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need event.payload[:url]? Even if it seems intended in the specs, favoring the "raw" value could change the behavior of existing agents, say having {{ url | replace 'something', 'else' }} as the url option.

@strugee
Copy link
Contributor

strugee commented Nov 10, 2016

Sorry for the delayed response.

@knu frankly I'm new at Ruby so it doesn't surprise me that my code has bugs :(

The intention, IIRC, was that final_url would always be the last URL that was requested, regardless of redirects. So in the case where redirects were disabled, the fact that final_url was the same as url is the intended behavior.

As for redirected, I believe I manually tested it to make sure it worked but I could be wrong and in any case, I don't remember exactly how it was supposed to work. So I'd just pick an intelligent design and do that (which it looks like you have)

Re: bug number 2, that certainly wasn't the intention and should be fixed. Nice catch :)

@darrencauthon the http://www.google.com/ result is indeed correct - I had the same thought when implementing the changes in question. You can verify this on the command line:

% http HEAD http://google.com
HTTP/1.1 301 Moved Permanently
Cache-Control: public, max-age=2592000
Content-Length: 219
Content-Type: text/html; charset=UTF-8
Date: Thu, 10 Nov 2016 23:30:45 GMT
Expires: Sat, 10 Dec 2016 23:30:45 GMT
Location: http://www.google.com/
Server: gws
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block

I have no idea why it behaves this way. My best guess would be that they never changed it because the HSTS preload list means that no one (in a browser) ever hits http://google.com anymore.

@knu
Copy link
Member Author

knu commented Nov 18, 2016

@darrencauthon final_url is set to the value of url when disable_redirect_follow is true, so there's no wonder in that result. Disabling redirect follow, final_url will always equal to url, and redirected will always be false regardless of the response.

If that's not intended, we can add specs and fix the implementation.

@darrencauthon
Copy link
Contributor

That makes sense... 😄

@knu
Copy link
Member Author

knu commented Nov 18, 2016

Can we merge this? #1772 has been blocked by this problem.

@@ -66,7 +66,8 @@ def check
def receive(incoming_events)
incoming_events.each do |event|
interpolate_with(event) do
check_this_url interpolated[:url], header_array(interpolated[:headers_to_save])
check_this_url event.payload[:url] || interpolated[:url],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need event.payload[:url]? Even if it seems intended in the specs, favoring the "raw" value could change the behavior of existing agents, say having {{ url | replace 'something', 'else' }} as the url option.

@@ -66,7 +66,8 @@ def check
def receive(incoming_events)
incoming_events.each do |event|
interpolate_with(event) do
check_this_url interpolated[:url], header_array(interpolated[:headers_to_save])
check_this_url event.payload[:url] || interpolated[:url],
header_array(event.payload[:headers_to_save] || interpolated[:headers_to_save])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question applies here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that, but this Agent is relatively young and unmatured, and there are contradictions between the written specs and the implementation. We have the core contributors/users here, so I take this as a good opportunity to hear from them, know the original intentions and get it right rather than worrying too much about compatibility.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that your intension is to implement the original specs, but I don't understand what the benefit is to support event.payload[:url]. Nobody could have been using it and liquid is always more flexible. Why should we potentially break existing use cases to support an intension that never worked?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't have any strong opinion. I'm (currently) not a user of this Agent and I stepped forward just because its specs somehow depended on the internals I slightly changed in my other PR.

I see you are interested in this more than I do, so if we get no feedback soon I'll just follow your point and get this done, and I'll revisit later when I start using this. 😉

payload.merge!({ 'final_url' => final_url, 'redirected' => (url != final_url), 'response_received' => true, 'status' => current_status })
# Deal with headers
if local_headers.present?
header_results = measured_result.result.headers.select {|header, value| local_headers.include?(header)}
header_results = measured_result.result.headers.slice(*local_headers)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't headers a normal ruby Hash? As far as I know slice is case-sensitive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

local_headers is a Ruby array, so the original code was case-sensitive anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And actually the headers is a Faraday::Utils::Headers object which is a case-insensitive hash. The original code spoiled the case-insensitiveness by calling include? on a normal Ruby array.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I checked the faraday docs which only say headers returns anObject which confused me. Do we need a spec to ensure it is kept case insensitive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a spec for that!

This key was introduced in #1590 but it is likely that it has never
worked as expected.
That seems to be what was actually intended, judging from the specs.

See [my comment](#1521 (comment)) in #1521.
- Raplace hand-made mocking for web requests with Webmock

- Stop overriding internal methods of Agent like `interpolated`, because
  that made the specs not reflect actual behavior
@strugee
Copy link
Contributor

strugee commented Nov 28, 2016

A thousand 👍s to @knu for cleaning up my code :(

Thanks!

There's no need to, because Agent options can interpolate values in an
event payload.
@knu knu merged commit c0509ba into master Dec 31, 2016
@knu knu deleted the fix_http_status_agent branch December 31, 2016 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants