Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Angular merge #12

Merged
merged 29 commits into from
Jul 6, 2015
Merged

Angular merge #12

merged 29 commits into from
Jul 6, 2015

Conversation

LightGuard
Copy link
Collaborator

This looks fairly large, but what we're doing here is adding in AngularJS to the report for templating and filtering.

Along the way I switched to an actual error class instead of the OpenStruct and also generating a JSON file to add to the report.

LightGuard and others added 18 commits May 12, 2015 16:09
OpenStructs are horrible for performance. I'm using a regular class now.
Adding myself to the gemspec.
Cleaning up a few issues from RubyMine.
Changing method names so they don't override Ruby built-in methods.
Here are some basic updates, nothing special, but I ran it and perf is looking much better!
Not sure if this is actually going to be working for us or not.
300 level response codes create warnings, not errors now.
We still need to check the links in the site against the sitemap.
Conflicts:
	blinkr.gemspec
	lib/blinkr/report.html.slim
I have Angular working as far as generating the page, but filtering does not work yet.
Everything works, I want to do a bit more on the filtering though.
We're now checking site local links against the sitemap and issuing
warnings if they are not in the sitemap, basically meaning they haven't
been checked.

RubyMine keeps messing up the errors template, so I have to keep fixing
it.
browser.hydra.run if @config.browser == 'typhoeus'
puts "Loaded #{page_count} pages using #{browser.name}."
puts 'Analyzing pages'
analyze context, TyphoeusWrapper.new(@config, context)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you creating a new Typhoeus Hydra here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was running with Typhoeus instead of PhantomJS things were blowing up because it wasn't actually created, IIRC. Also in the analyses portion I don't think it makes sense to do things like the links extension with PhantomJS. I think eventually it'll make more sense to break out the additional functionality that the PhantomJS wrapper has (JavaScript, and something else, don't recall what right now) into it's own extension.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was running with Typhoeus instead of PhantomJS things were blowing up because it wasn't actually created, IIRC.

I don't really understand what that means, in the context of this bit of code, but it doesn't answer the question I was asking - namely why are you creating a new TyphoeusWrapper (and hence Hydra)?

Also in the analyses portion I don't think it makes sense to do things like the links extension with PhantomJS. I think eventually it'll make more sense to break out the additional functionality that the PhantomJS wrapper has (JavaScript, and something else, don't recall what right now) into it's own extension.

I also can't really understand what you are talking about here. PhantomJS is used to fetch a page and render it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was running with Typhoeus instead of PhantomJS things were blowing up because > > it wasn't actually created, IIRC.

I don't really understand what that means, in the context of this bit of code, but it doesn't answer the question I was asking - namely why are you creating a new TyphoeusWrapper (and hence Hydra)?

I think this would be better done in the links extension (and any other extension that needs to make additional calls). Currently the link extension is the only one that makes additional calls. Before there wasn't actually a typhoeus variable so it went in nil into the analyze method and was causing things to blow up.

Also in the analyses portion I don't think it makes sense to do things like the links extension with PhantomJS. I think eventually it'll make more sense to break out the additional functionality that the PhantomJS wrapper has (JavaScript, and something else, don't recall what right now) into it's own extension.

I also can't really understand what you are talking about here. PhantomJS is used to fetch a page and render it.

Currently PhantomJS fetches and renders the page, it then saves off any errors (resource and javascript) it finds while it renders the page (this info is in the snap.js file). This is SIGNIFICANTLY longer than simply calling a GET / HEAD because there are other requests that are then made while rendering the additional links.

I feel like it would be better to use Typhoeus or Manticore (if done on JRuby, which I've seen has fewer errors actually and more configuration than Typhoeus) instead of the browser the initial fetch was using.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the link extension is the only one that makes additional calls. Before there wasn't actually a typhoeus variable so it went in nil into the analyze method and was causing things to blow up.

Gotcha, sorry. Ok, so I think it's important we use the same typhoeus instance all the same. So instead of creating a new one here, create one at the beginning and assign it to browser if needed, and then reuse here...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently PhantomJS fetches and renders the page, it then saves off any errors (resource and javascript) it finds while it renders the page (this info is in the snap.js file). This is SIGNIFICANTLY longer than simply calling a GET / HEAD because there are other requests that are then made while rendering the additional links.

This won't work. You need to render the page, not just load it. Think about a page like the upstream projects page, the buzz page etc. where most of the links only appear once the JS has run.

I feel like it would be better to use Typhoeus or Manticore (if done on JRuby, which I've seen has fewer errors actually and more configuration than Typhoeus) instead of the browser the initial fetch was using.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm talking about is in the Links extension. You don't need to render the child pages (pages linked to) from the parent page.

Here's the current sample run (hybrid phantomjs to start, then link extension with typhoeus)

/home/jporter/projects/ruby/blinkr/bin/blinkr -c /home/jporter/projects/developer.redhat.com/_config/blinkr.yaml -u http://developers.redhat.com -v
Started at 2015-06-09T14:00:23-06:00
Loaded custom pipeline from ../_ext/blinkr/blinkr.rb
Pipeline: #<Blinkr::Extensions::Links:0x000000027d5a80>, #<Blinkr::Extensions::JavaScript:0x000000027d5a30>, #<Blinkr::Extensions::Resources:0x000000027d5990>, #<Blinkr::Extensions::ImgAlt:0x000000027d5968>, #<Blinkr::Extensions::ATitle:0x000000027d5940>, #<Blinkr::Extensions::InlineCss:0x000000027d58f0>, #<Blinkr::Extensions::EmptyAHref:0x000000027d58c8>, #<Blinkr::Extensions::Meta:0x000000027d5878>, #<JBoss::Developer::Blinkr::BadText:0x000000027d5788>, #<JBoss::Developer::Blinkr::Searchisko:0x000000027d5710>
Loading sitemap from /home/jporter/projects/developer.redhat.com/_tmp/sitemap.xml
Fetching 1 pages from sitemap
Loaded page http://developers.redhat.com/faq/
Loaded 1 pages using phantomjs.
Analyzing pages
----------------------
 47 links to check 
----------------------
Loading http://developerblog.redhat.com via typhoeus (attempt 2 of 10)
Loaded http://www.redhat.com/en/about/privacy-policy via typhoeus 
Processed 1 of 17
Loading http://developerblog.redhat.com via typhoeus (attempt 3 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 4 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 5 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 6 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 7 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 8 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 9 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 10 of 10)
Loading http://developerblog.redhat.com via typhoeus failed
Loaded http://developerblog.redhat.com via typhoeus 
Processed 2 of 17
Loaded http://www.redhat.com/contact/sales.html via typhoeus 
Processed 3 of 17
Loaded http://www.redhat.com/en/services/training via typhoeus 
Processed 4 of 17
Loaded http://www.redhat.com/about/subscription/ via typhoeus 
Processed 5 of 17
Loaded http://www.redhat.com/en/about/all-policies-guidelines via typhoeus 
Processed 6 of 17
Loaded http://www.redhat.com/resourcelibrary/articles/articles-red-hat-enterprise-linux-renewal-guide via typhoeus 
Processed 7 of 17
Loaded http://www.redhat.com/en/about/terms-use via typhoeus 
Processed 8 of 17
Loaded https://www.redhat.com via typhoeus 
Processed 9 of 17
Loaded https://engage.redhat.com/forms/consulting-contact-sales via typhoeus 
Processed 10 of 17
Loaded http://docs.jivesoftware.com/jive/6.0/community_user/index.jsp via typhoeus 
Processed 11 of 17
Loaded https://access.redhat.com/security/team/contact/ via typhoeus 
Processed 12 of 17
Loaded https://confluence.atlassian.com/display/JIRA/JIRA+Documentation via typhoeus 
Processed 13 of 17
Loaded https://issues.jboss.org/secure/CreateIssue!default.jspa via typhoeus 
Processed 14 of 17
Loaded http://issues.jboss.org via typhoeus 
Processed 15 of 17
Loaded https://community.jboss.org/emailPasswordToken%21input.jspa via typhoeus 
Processed 16 of 17
Loaded http://community.jboss.org via typhoeus 
Processed 17 of 17
Wrote report to _tmp/blinkr/index.html
Total time: 19.347403277 Seconds

Here's the same run with using just phantomjs:

/home/jporter/projects/ruby/blinkr/bin/blinkr -c /home/jporter/projects/developer.redhat.com/_config/blinkr.yaml -u http://developers.redhat.com -v
Started at 2015-06-09T14:05:05-06:00
Loaded custom pipeline from ../_ext/blinkr/blinkr.rb
Pipeline: #<Blinkr::Extensions::Links:0x000000037f1770>, #<Blinkr::Extensions::JavaScript:0x000000037f16a8>, #<Blinkr::Extensions::Resources:0x000000037f1658>, #<Blinkr::Extensions::ImgAlt:0x000000037f15b8>, #<Blinkr::Extensions::ATitle:0x000000037f1590>, #<Blinkr::Extensions::InlineCss:0x000000037f1568>, #<Blinkr::Extensions::EmptyAHref:0x000000037f14f0>, #<Blinkr::Extensions::Meta:0x000000037f1478>, #<JBoss::Developer::Blinkr::BadText:0x000000037f13d8>, #<JBoss::Developer::Blinkr::Searchisko:0x000000037f1298>
Loading sitemap from /home/jporter/projects/developer.redhat.com/_tmp/sitemap.xml
Fetching 1 pages from sitemap
Loaded page http://developers.redhat.com/faq/
Loaded 1 pages using phantomjs.
Analyzing pages
----------------------
 47 links to check 
----------------------
Loaded https://issues.jboss.org/secure/CreateIssue!default.jspa via phantomjs 
Processed 1 of 17
Loaded http://developerblog.redhat.com via phantomjs 
Processed 2 of 17
Loaded http://www.redhat.com/about/subscription/ via phantomjs 
Processed 3 of 17
Loaded http://www.redhat.com/contact/sales.html via phantomjs 
Processed 4 of 17
Loaded http://www.redhat.com/resourcelibrary/articles/articles-red-hat-enterprise-linux-renewal-guide via phantomjs 
Processed 5 of 17
Loaded http://community.jboss.org via phantomjs 
Processed 6 of 17
Loaded http://docs.jivesoftware.com/jive/6.0/community_user/index.jsp via phantomjs 
Processed 7 of 17
Loaded http://issues.jboss.org via phantomjs 
Processed 8 of 17
Loaded https://confluence.atlassian.com/display/JIRA/JIRA+Documentation via phantomjs 
Processed 9 of 17
Loaded https://community.jboss.org/emailPasswordToken%21input.jspa via phantomjs 
Processed 10 of 17
Loaded https://www.redhat.com via phantomjs 
Processed 11 of 17
Loaded https://access.redhat.com/security/team/contact/ via phantomjs 
Processed 12 of 17
Loaded https://engage.redhat.com/forms/consulting-contact-sales via phantomjs 
Processed 13 of 17
Loaded http://www.redhat.com/en/services/training via phantomjs 
Processed 14 of 17
Loaded http://www.redhat.com/en/about/privacy-policy via phantomjs 
Processed 15 of 17
Loaded http://www.redhat.com/en/about/terms-use via phantomjs 
Processed 16 of 17
Loaded http://www.redhat.com/en/about/all-policies-guidelines via phantomjs 
Processed 17 of 17
Wrote report to _tmp/blinkr/index.html
Total time: 2 Minutes 5.853693367999995 Seconds

I'm pretty sure what we want is the first one, yes?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, you need to use phantomjs to load the pages on the site, then typhoeus to load any links from that page, to check they don’t result in error pages. That is what should be happening today, except for the fact that you introduced a bug ;-)

c67917c c67917c

On 9 Jun 2015, at 21:10, Jason Porter notifications@github.com wrote:

In lib/blinkr/engine.rb #12 (comment):

@@ -51,25 +58,24 @@ def run
puts "#{response.code} #{response.status_message} Unable to load page #{url} #{'(' + response.return_message + ')' unless response.return_message.nil?}"
end
end

  •  typhoeus.hydra.run if @config.browser == 'typhoeus'
    
  •  analyze context, typhoeus
    
  •  puts "Loaded #{page_count} pages using #{browser.name}. Performed #{typhoeus.count} requests using typhoeus."
    
  •  context.pages.reject! { |url, page| page.errors.empty? }
    
  •  unless @config.export.nil?
    
  •    File.open(@config.export, 'w') do |file| 
    
  •      file.write(context.to_json)
    
  •    end
    
  •  end
    
  •  puts 'Executing Typhoeus::Hydra.run, this could take awhile' if @config.browser == 'typhoeus'
    
  •  browser.hydra.run if @config.browser == 'typhoeus'
    
  •  puts "Loaded #{page_count} pages using #{browser.name}."
    
  •  puts 'Analyzing pages'
    
  •  analyze context, TyphoeusWrapper.new(@config, context)
    
    What I'm talking about is in the Links extension. You don't need to render the child pages (pages linked to) from the parent page.

Here's the current sample run (hybrid phantomjs to start, then link extension with typhoeus)

/home/jporter/projects/ruby/blinkr/bin/blinkr -c /home/jporter/projects/developer.redhat.com/_config/blinkr.yaml -u http://developers.redhat.com -v
Started at 2015-06-09T14:00:23-06:00
Loaded custom pipeline from ../_ext/blinkr/blinkr.rb
Pipeline: #Blinkr::Extensions::Links:0x000000027d5a80, #Blinkr::Extensions::JavaScript:0x000000027d5a30, #Blinkr::Extensions::Resources:0x000000027d5990, #Blinkr::Extensions::ImgAlt:0x000000027d5968, #Blinkr::Extensions::ATitle:0x000000027d5940, #Blinkr::Extensions::InlineCss:0x000000027d58f0, #Blinkr::Extensions::EmptyAHref:0x000000027d58c8, #Blinkr::Extensions::Meta:0x000000027d5878, #JBoss::Developer::Blinkr::BadText:0x000000027d5788, #JBoss::Developer::Blinkr::Searchisko:0x000000027d5710
Loading sitemap from /home/jporter/projects/developer.redhat.com/_tmp/sitemap.xml
Fetching 1 pages from sitemap
Loaded page http://developers.redhat.com/faq/
Loaded 1 pages using phantomjs.

Analyzing pages

47 links to check

Loading http://developerblog.redhat.com via typhoeus (attempt 2 of 10)
Loaded http://www.redhat.com/en/about/privacy-policy via typhoeus
Processed 1 of 17
Loading http://developerblog.redhat.com via typhoeus (attempt 3 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 4 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 5 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 6 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 7 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 8 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 9 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 10 of 10)
Loading http://developerblog.redhat.com via typhoeus failed
Loaded http://developerblog.redhat.com via typhoeus
Processed 2 of 17
Loaded http://www.redhat.com/contact/sales.html via typhoeus
Processed 3 of 17
Loaded http://www.redhat.com/en/services/training via typhoeus
Processed 4 of 17
Loaded http://www.redhat.com/about/subscription/ via typhoeus
Processed 5 of 17
Loaded http://www.redhat.com/en/about/all-policies-guidelines via typhoeus
Processed 6 of 17
Loaded http://www.redhat.com/resourcelibrary/articles/articles-red-hat-enterprise-linux-renewal-guide via typhoeus
Processed 7 of 17
Loaded http://www.redhat.com/en/about/terms-use via typhoeus
Processed 8 of 17
Loaded https://www.redhat.com via typhoeus
Processed 9 of 17
Loaded https://engage.redhat.com/forms/consulting-contact-sales via typhoeus
Processed 10 of 17
Loaded http://docs.jivesoftware.com/jive/6.0/community_user/index.jsp via typhoeus
Processed 11 of 17
Loaded https://access.redhat.com/security/team/contact/ via typhoeus
Processed 12 of 17
Loaded https://confluence.atlassian.com/display/JIRA/JIRA+Documentation via typhoeus
Processed 13 of 17
Loaded https://issues.jboss.org/secure/CreateIssue!default.jspa via typhoeus
Processed 14 of 17
Loaded http://issues.jboss.org via typhoeus
Processed 15 of 17
Loaded https://community.jboss.org/emailPasswordToken%21input.jspa via typhoeus
Processed 16 of 17
Loaded http://community.jboss.org via typhoeus
Processed 17 of 17
Wrote report to _tmp/blinkr/index.html
Total time: 19.347403277 Seconds
Here's the same run with using just phantomjs:

/home/jporter/projects/ruby/blinkr/bin/blinkr -c /home/jporter/projects/developer.redhat.com/_config/blinkr.yaml -u http://developers.redhat.com -v
Started at 2015-06-09T14:05:05-06:00
Loaded custom pipeline from ../_ext/blinkr/blinkr.rb
Pipeline: #Blinkr::Extensions::Links:0x000000037f1770, #Blinkr::Extensions::JavaScript:0x000000037f16a8, #Blinkr::Extensions::Resources:0x000000037f1658, #Blinkr::Extensions::ImgAlt:0x000000037f15b8, #Blinkr::Extensions::ATitle:0x000000037f1590, #Blinkr::Extensions::InlineCss:0x000000037f1568, #Blinkr::Extensions::EmptyAHref:0x000000037f14f0, #Blinkr::Extensions::Meta:0x000000037f1478, #JBoss::Developer::Blinkr::BadText:0x000000037f13d8, #JBoss::Developer::Blinkr::Searchisko:0x000000037f1298
Loading sitemap from /home/jporter/projects/developer.redhat.com/_tmp/sitemap.xml
Fetching 1 pages from sitemap
Loaded page http://developers.redhat.com/faq/
Loaded 1 pages using phantomjs.

Analyzing pages

47 links to check

Loaded https://issues.jboss.org/secure/CreateIssue!default.jspa via phantomjs
Processed 1 of 17
Loaded http://developerblog.redhat.com via phantomjs
Processed 2 of 17
Loaded http://www.redhat.com/about/subscription/ via phantomjs
Processed 3 of 17
Loaded http://www.redhat.com/contact/sales.html via phantomjs
Processed 4 of 17
Loaded http://www.redhat.com/resourcelibrary/articles/articles-red-hat-enterprise-linux-renewal-guide via phantomjs
Processed 5 of 17
Loaded http://community.jboss.org via phantomjs
Processed 6 of 17
Loaded http://docs.jivesoftware.com/jive/6.0/community_user/index.jsp via phantomjs
Processed 7 of 17
Loaded http://issues.jboss.org via phantomjs
Processed 8 of 17
Loaded https://confluence.atlassian.com/display/JIRA/JIRA+Documentation via phantomjs
Processed 9 of 17
Loaded https://community.jboss.org/emailPasswordToken%21input.jspa via phantomjs
Processed 10 of 17
Loaded https://www.redhat.com via phantomjs
Processed 11 of 17
Loaded https://access.redhat.com/security/team/contact/ via phantomjs
Processed 12 of 17
Loaded https://engage.redhat.com/forms/consulting-contact-sales via phantomjs
Processed 13 of 17
Loaded http://www.redhat.com/en/services/training via phantomjs
Processed 14 of 17
Loaded http://www.redhat.com/en/about/privacy-policy via phantomjs
Processed 15 of 17
Loaded http://www.redhat.com/en/about/terms-use via phantomjs
Processed 16 of 17
Loaded http://www.redhat.com/en/about/all-policies-guidelines via phantomjs
Processed 17 of 17
Wrote report to _tmp/blinkr/index.html
Total time: 2 Minutes 5.853693367999995 Seconds
I'm pretty sure what we want is the first one, yes?


Reply to this email directly or view it on GitHub https://github.com/pmuir/blinkr/pull/12/files#r32057702.

link.query = nil
unless context.pages.keys.include?(link.to_s) || context.pages.keys.include?((link.to_s + '/'))
locations.each do |location|
location[:page].errors << Blinkr::Error.new({:severity => :warning,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the right way to do this. It's an interesting idea for a new plugin, but the links plugin should check that links work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are for site internal links. The thought here is that all web pages that are site internal should be in the sitemap. If they're not, that's a problem. If they are in the site map then we've already checked it because we rendered it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is definitely a situation where there will be pages on the same domain that aren't in the sitemap. This happened e.g. on jboss.org

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this patch those are warnings, we could certainly make them errors, or we could the idea all together, your call as lead of the project :)

@LightGuard
Copy link
Collaborator Author

What else are we missing on this to merge? Put back the link stuff?

@pmuir
Copy link
Owner

pmuir commented Jun 23, 2015

An answer here could be to do a combo. First check the sitemap. If it's not there, check the URL using typhoeus. WDYT?

@LightGuard
Copy link
Collaborator Author

Yeah, I could do that.

@LightGuard
Copy link
Collaborator Author

Let me make sure everything is looking good then I'll commit the change.

For internal links we check against site map, create a warning if it
isn't there and then also check it with typhoeus.
@LightGuard
Copy link
Collaborator Author

Pushed, take a look @pmuir

@pmuir
Copy link
Owner

pmuir commented Jul 6, 2015

It looks good, but calling that var non_internal_links is a bit confusing, perhaps?

pmuir added a commit that referenced this pull request Jul 6, 2015
@pmuir pmuir merged commit d8ff459 into pmuir:master Jul 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants