-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Angular merge #12
Angular merge #12
Conversation
OpenStructs are horrible for performance. I'm using a regular class now.
Adding myself to the gemspec. Cleaning up a few issues from RubyMine. Changing method names so they don't override Ruby built-in methods.
Here are some basic updates, nothing special, but I ran it and perf is looking much better!
Not sure if this is actually going to be working for us or not.
Conflicts: blinkr.gemspec lib/blinkr/report.html.slim
I have Angular working as far as generating the page, but filtering does not work yet.
Everything works, I want to do a bit more on the filtering though.
We're now checking site local links against the sitemap and issuing warnings if they are not in the sitemap, basically meaning they haven't been checked. RubyMine keeps messing up the errors template, so I have to keep fixing it.
browser.hydra.run if @config.browser == 'typhoeus' | ||
puts "Loaded #{page_count} pages using #{browser.name}." | ||
puts 'Analyzing pages' | ||
analyze context, TyphoeusWrapper.new(@config, context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you creating a new Typhoeus Hydra here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I was running with Typhoeus instead of PhantomJS things were blowing up because it wasn't actually created, IIRC. Also in the analyses portion I don't think it makes sense to do things like the links extension with PhantomJS. I think eventually it'll make more sense to break out the additional functionality that the PhantomJS wrapper has (JavaScript, and something else, don't recall what right now) into it's own extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I was running with Typhoeus instead of PhantomJS things were blowing up because it wasn't actually created, IIRC.
I don't really understand what that means, in the context of this bit of code, but it doesn't answer the question I was asking - namely why are you creating a new TyphoeusWrapper (and hence Hydra)?
Also in the analyses portion I don't think it makes sense to do things like the links extension with PhantomJS. I think eventually it'll make more sense to break out the additional functionality that the PhantomJS wrapper has (JavaScript, and something else, don't recall what right now) into it's own extension.
I also can't really understand what you are talking about here. PhantomJS is used to fetch a page and render it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I was running with Typhoeus instead of PhantomJS things were blowing up because > > it wasn't actually created, IIRC.
I don't really understand what that means, in the context of this bit of code, but it doesn't answer the question I was asking - namely why are you creating a new TyphoeusWrapper (and hence Hydra)?
I think this would be better done in the links extension (and any other extension that needs to make additional calls). Currently the link extension is the only one that makes additional calls. Before there wasn't actually a typhoeus
variable so it went in nil into the analyze
method and was causing things to blow up.
Also in the analyses portion I don't think it makes sense to do things like the links extension with PhantomJS. I think eventually it'll make more sense to break out the additional functionality that the PhantomJS wrapper has (JavaScript, and something else, don't recall what right now) into it's own extension.
I also can't really understand what you are talking about here. PhantomJS is used to fetch a page and render it.
Currently PhantomJS fetches and renders the page, it then saves off any errors (resource and javascript) it finds while it renders the page (this info is in the snap.js file). This is SIGNIFICANTLY longer than simply calling a GET / HEAD because there are other requests that are then made while rendering the additional links.
I feel like it would be better to use Typhoeus or Manticore (if done on JRuby, which I've seen has fewer errors actually and more configuration than Typhoeus) instead of the browser the initial fetch was using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the link extension is the only one that makes additional calls. Before there wasn't actually a typhoeus variable so it went in nil into the analyze method and was causing things to blow up.
Gotcha, sorry. Ok, so I think it's important we use the same typhoeus instance all the same. So instead of creating a new one here, create one at the beginning and assign it to browser if needed, and then reuse here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently PhantomJS fetches and renders the page, it then saves off any errors (resource and javascript) it finds while it renders the page (this info is in the snap.js file). This is SIGNIFICANTLY longer than simply calling a GET / HEAD because there are other requests that are then made while rendering the additional links.
This won't work. You need to render the page, not just load it. Think about a page like the upstream projects page, the buzz page etc. where most of the links only appear once the JS has run.
I feel like it would be better to use Typhoeus or Manticore (if done on JRuby, which I've seen has fewer errors actually and more configuration than Typhoeus) instead of the browser the initial fetch was using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm talking about is in the Links extension. You don't need to render the child pages (pages linked to) from the parent page.
Here's the current sample run (hybrid phantomjs to start, then link extension with typhoeus)
/home/jporter/projects/ruby/blinkr/bin/blinkr -c /home/jporter/projects/developer.redhat.com/_config/blinkr.yaml -u http://developers.redhat.com -v
Started at 2015-06-09T14:00:23-06:00
Loaded custom pipeline from ../_ext/blinkr/blinkr.rb
Pipeline: #<Blinkr::Extensions::Links:0x000000027d5a80>, #<Blinkr::Extensions::JavaScript:0x000000027d5a30>, #<Blinkr::Extensions::Resources:0x000000027d5990>, #<Blinkr::Extensions::ImgAlt:0x000000027d5968>, #<Blinkr::Extensions::ATitle:0x000000027d5940>, #<Blinkr::Extensions::InlineCss:0x000000027d58f0>, #<Blinkr::Extensions::EmptyAHref:0x000000027d58c8>, #<Blinkr::Extensions::Meta:0x000000027d5878>, #<JBoss::Developer::Blinkr::BadText:0x000000027d5788>, #<JBoss::Developer::Blinkr::Searchisko:0x000000027d5710>
Loading sitemap from /home/jporter/projects/developer.redhat.com/_tmp/sitemap.xml
Fetching 1 pages from sitemap
Loaded page http://developers.redhat.com/faq/
Loaded 1 pages using phantomjs.
Analyzing pages
----------------------
47 links to check
----------------------
Loading http://developerblog.redhat.com via typhoeus (attempt 2 of 10)
Loaded http://www.redhat.com/en/about/privacy-policy via typhoeus
Processed 1 of 17
Loading http://developerblog.redhat.com via typhoeus (attempt 3 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 4 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 5 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 6 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 7 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 8 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 9 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 10 of 10)
Loading http://developerblog.redhat.com via typhoeus failed
Loaded http://developerblog.redhat.com via typhoeus
Processed 2 of 17
Loaded http://www.redhat.com/contact/sales.html via typhoeus
Processed 3 of 17
Loaded http://www.redhat.com/en/services/training via typhoeus
Processed 4 of 17
Loaded http://www.redhat.com/about/subscription/ via typhoeus
Processed 5 of 17
Loaded http://www.redhat.com/en/about/all-policies-guidelines via typhoeus
Processed 6 of 17
Loaded http://www.redhat.com/resourcelibrary/articles/articles-red-hat-enterprise-linux-renewal-guide via typhoeus
Processed 7 of 17
Loaded http://www.redhat.com/en/about/terms-use via typhoeus
Processed 8 of 17
Loaded https://www.redhat.com via typhoeus
Processed 9 of 17
Loaded https://engage.redhat.com/forms/consulting-contact-sales via typhoeus
Processed 10 of 17
Loaded http://docs.jivesoftware.com/jive/6.0/community_user/index.jsp via typhoeus
Processed 11 of 17
Loaded https://access.redhat.com/security/team/contact/ via typhoeus
Processed 12 of 17
Loaded https://confluence.atlassian.com/display/JIRA/JIRA+Documentation via typhoeus
Processed 13 of 17
Loaded https://issues.jboss.org/secure/CreateIssue!default.jspa via typhoeus
Processed 14 of 17
Loaded http://issues.jboss.org via typhoeus
Processed 15 of 17
Loaded https://community.jboss.org/emailPasswordToken%21input.jspa via typhoeus
Processed 16 of 17
Loaded http://community.jboss.org via typhoeus
Processed 17 of 17
Wrote report to _tmp/blinkr/index.html
Total time: 19.347403277 Seconds
Here's the same run with using just phantomjs:
/home/jporter/projects/ruby/blinkr/bin/blinkr -c /home/jporter/projects/developer.redhat.com/_config/blinkr.yaml -u http://developers.redhat.com -v
Started at 2015-06-09T14:05:05-06:00
Loaded custom pipeline from ../_ext/blinkr/blinkr.rb
Pipeline: #<Blinkr::Extensions::Links:0x000000037f1770>, #<Blinkr::Extensions::JavaScript:0x000000037f16a8>, #<Blinkr::Extensions::Resources:0x000000037f1658>, #<Blinkr::Extensions::ImgAlt:0x000000037f15b8>, #<Blinkr::Extensions::ATitle:0x000000037f1590>, #<Blinkr::Extensions::InlineCss:0x000000037f1568>, #<Blinkr::Extensions::EmptyAHref:0x000000037f14f0>, #<Blinkr::Extensions::Meta:0x000000037f1478>, #<JBoss::Developer::Blinkr::BadText:0x000000037f13d8>, #<JBoss::Developer::Blinkr::Searchisko:0x000000037f1298>
Loading sitemap from /home/jporter/projects/developer.redhat.com/_tmp/sitemap.xml
Fetching 1 pages from sitemap
Loaded page http://developers.redhat.com/faq/
Loaded 1 pages using phantomjs.
Analyzing pages
----------------------
47 links to check
----------------------
Loaded https://issues.jboss.org/secure/CreateIssue!default.jspa via phantomjs
Processed 1 of 17
Loaded http://developerblog.redhat.com via phantomjs
Processed 2 of 17
Loaded http://www.redhat.com/about/subscription/ via phantomjs
Processed 3 of 17
Loaded http://www.redhat.com/contact/sales.html via phantomjs
Processed 4 of 17
Loaded http://www.redhat.com/resourcelibrary/articles/articles-red-hat-enterprise-linux-renewal-guide via phantomjs
Processed 5 of 17
Loaded http://community.jboss.org via phantomjs
Processed 6 of 17
Loaded http://docs.jivesoftware.com/jive/6.0/community_user/index.jsp via phantomjs
Processed 7 of 17
Loaded http://issues.jboss.org via phantomjs
Processed 8 of 17
Loaded https://confluence.atlassian.com/display/JIRA/JIRA+Documentation via phantomjs
Processed 9 of 17
Loaded https://community.jboss.org/emailPasswordToken%21input.jspa via phantomjs
Processed 10 of 17
Loaded https://www.redhat.com via phantomjs
Processed 11 of 17
Loaded https://access.redhat.com/security/team/contact/ via phantomjs
Processed 12 of 17
Loaded https://engage.redhat.com/forms/consulting-contact-sales via phantomjs
Processed 13 of 17
Loaded http://www.redhat.com/en/services/training via phantomjs
Processed 14 of 17
Loaded http://www.redhat.com/en/about/privacy-policy via phantomjs
Processed 15 of 17
Loaded http://www.redhat.com/en/about/terms-use via phantomjs
Processed 16 of 17
Loaded http://www.redhat.com/en/about/all-policies-guidelines via phantomjs
Processed 17 of 17
Wrote report to _tmp/blinkr/index.html
Total time: 2 Minutes 5.853693367999995 Seconds
I'm pretty sure what we want is the first one, yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, you need to use phantomjs to load the pages on the site, then typhoeus to load any links from that page, to check they don’t result in error pages. That is what should be happening today, except for the fact that you introduced a bug ;-)
On 9 Jun 2015, at 21:10, Jason Porter notifications@github.com wrote:
In lib/blinkr/engine.rb #12 (comment):
@@ -51,25 +58,24 @@ def run
puts "#{response.code} #{response.status_message} Unable to load page #{url} #{'(' + response.return_message + ')' unless response.return_message.nil?}"
end
end
typhoeus.hydra.run if @config.browser == 'typhoeus'
analyze context, typhoeus
puts "Loaded #{page_count} pages using #{browser.name}. Performed #{typhoeus.count} requests using typhoeus."
context.pages.reject! { |url, page| page.errors.empty? }
unless @config.export.nil?
File.open(@config.export, 'w') do |file|
file.write(context.to_json)
end
end
puts 'Executing Typhoeus::Hydra.run, this could take awhile' if @config.browser == 'typhoeus'
browser.hydra.run if @config.browser == 'typhoeus'
puts "Loaded #{page_count} pages using #{browser.name}."
puts 'Analyzing pages'
What I'm talking about is in the Links extension. You don't need to render the child pages (pages linked to) from the parent page.analyze context, TyphoeusWrapper.new(@config, context)
Here's the current sample run (hybrid phantomjs to start, then link extension with typhoeus)
/home/jporter/projects/ruby/blinkr/bin/blinkr -c /home/jporter/projects/developer.redhat.com/_config/blinkr.yaml -u http://developers.redhat.com -v
Started at 2015-06-09T14:00:23-06:00
Loaded custom pipeline from ../_ext/blinkr/blinkr.rb
Pipeline: #Blinkr::Extensions::Links:0x000000027d5a80, #Blinkr::Extensions::JavaScript:0x000000027d5a30, #Blinkr::Extensions::Resources:0x000000027d5990, #Blinkr::Extensions::ImgAlt:0x000000027d5968, #Blinkr::Extensions::ATitle:0x000000027d5940, #Blinkr::Extensions::InlineCss:0x000000027d58f0, #Blinkr::Extensions::EmptyAHref:0x000000027d58c8, #Blinkr::Extensions::Meta:0x000000027d5878, #JBoss::Developer::Blinkr::BadText:0x000000027d5788, #JBoss::Developer::Blinkr::Searchisko:0x000000027d5710
Loading sitemap from /home/jporter/projects/developer.redhat.com/_tmp/sitemap.xml
Fetching 1 pages from sitemap
Loaded page http://developers.redhat.com/faq/
Loaded 1 pages using phantomjs.Analyzing pages
47 links to check
Loading http://developerblog.redhat.com via typhoeus (attempt 2 of 10)
Loaded http://www.redhat.com/en/about/privacy-policy via typhoeus
Processed 1 of 17
Loading http://developerblog.redhat.com via typhoeus (attempt 3 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 4 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 5 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 6 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 7 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 8 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 9 of 10)
Loading http://developerblog.redhat.com via typhoeus (attempt 10 of 10)
Loading http://developerblog.redhat.com via typhoeus failed
Loaded http://developerblog.redhat.com via typhoeus
Processed 2 of 17
Loaded http://www.redhat.com/contact/sales.html via typhoeus
Processed 3 of 17
Loaded http://www.redhat.com/en/services/training via typhoeus
Processed 4 of 17
Loaded http://www.redhat.com/about/subscription/ via typhoeus
Processed 5 of 17
Loaded http://www.redhat.com/en/about/all-policies-guidelines via typhoeus
Processed 6 of 17
Loaded http://www.redhat.com/resourcelibrary/articles/articles-red-hat-enterprise-linux-renewal-guide via typhoeus
Processed 7 of 17
Loaded http://www.redhat.com/en/about/terms-use via typhoeus
Processed 8 of 17
Loaded https://www.redhat.com via typhoeus
Processed 9 of 17
Loaded https://engage.redhat.com/forms/consulting-contact-sales via typhoeus
Processed 10 of 17
Loaded http://docs.jivesoftware.com/jive/6.0/community_user/index.jsp via typhoeus
Processed 11 of 17
Loaded https://access.redhat.com/security/team/contact/ via typhoeus
Processed 12 of 17
Loaded https://confluence.atlassian.com/display/JIRA/JIRA+Documentation via typhoeus
Processed 13 of 17
Loaded https://issues.jboss.org/secure/CreateIssue!default.jspa via typhoeus
Processed 14 of 17
Loaded http://issues.jboss.org via typhoeus
Processed 15 of 17
Loaded https://community.jboss.org/emailPasswordToken%21input.jspa via typhoeus
Processed 16 of 17
Loaded http://community.jboss.org via typhoeus
Processed 17 of 17
Wrote report to _tmp/blinkr/index.html
Total time: 19.347403277 Seconds
Here's the same run with using just phantomjs:/home/jporter/projects/ruby/blinkr/bin/blinkr -c /home/jporter/projects/developer.redhat.com/_config/blinkr.yaml -u http://developers.redhat.com -v
Started at 2015-06-09T14:05:05-06:00
Loaded custom pipeline from ../_ext/blinkr/blinkr.rb
Pipeline: #Blinkr::Extensions::Links:0x000000037f1770, #Blinkr::Extensions::JavaScript:0x000000037f16a8, #Blinkr::Extensions::Resources:0x000000037f1658, #Blinkr::Extensions::ImgAlt:0x000000037f15b8, #Blinkr::Extensions::ATitle:0x000000037f1590, #Blinkr::Extensions::InlineCss:0x000000037f1568, #Blinkr::Extensions::EmptyAHref:0x000000037f14f0, #Blinkr::Extensions::Meta:0x000000037f1478, #JBoss::Developer::Blinkr::BadText:0x000000037f13d8, #JBoss::Developer::Blinkr::Searchisko:0x000000037f1298
Loading sitemap from /home/jporter/projects/developer.redhat.com/_tmp/sitemap.xml
Fetching 1 pages from sitemap
Loaded page http://developers.redhat.com/faq/
Loaded 1 pages using phantomjs.Analyzing pages
47 links to check
Loaded https://issues.jboss.org/secure/CreateIssue!default.jspa via phantomjs
Processed 1 of 17
Loaded http://developerblog.redhat.com via phantomjs
Processed 2 of 17
Loaded http://www.redhat.com/about/subscription/ via phantomjs
Processed 3 of 17
Loaded http://www.redhat.com/contact/sales.html via phantomjs
Processed 4 of 17
Loaded http://www.redhat.com/resourcelibrary/articles/articles-red-hat-enterprise-linux-renewal-guide via phantomjs
Processed 5 of 17
Loaded http://community.jboss.org via phantomjs
Processed 6 of 17
Loaded http://docs.jivesoftware.com/jive/6.0/community_user/index.jsp via phantomjs
Processed 7 of 17
Loaded http://issues.jboss.org via phantomjs
Processed 8 of 17
Loaded https://confluence.atlassian.com/display/JIRA/JIRA+Documentation via phantomjs
Processed 9 of 17
Loaded https://community.jboss.org/emailPasswordToken%21input.jspa via phantomjs
Processed 10 of 17
Loaded https://www.redhat.com via phantomjs
Processed 11 of 17
Loaded https://access.redhat.com/security/team/contact/ via phantomjs
Processed 12 of 17
Loaded https://engage.redhat.com/forms/consulting-contact-sales via phantomjs
Processed 13 of 17
Loaded http://www.redhat.com/en/services/training via phantomjs
Processed 14 of 17
Loaded http://www.redhat.com/en/about/privacy-policy via phantomjs
Processed 15 of 17
Loaded http://www.redhat.com/en/about/terms-use via phantomjs
Processed 16 of 17
Loaded http://www.redhat.com/en/about/all-policies-guidelines via phantomjs
Processed 17 of 17
Wrote report to _tmp/blinkr/index.html
Total time: 2 Minutes 5.853693367999995 Seconds
I'm pretty sure what we want is the first one, yes?—
Reply to this email directly or view it on GitHub https://github.com/pmuir/blinkr/pull/12/files#r32057702.
Is is mostly for jRuby usage
link.query = nil | ||
unless context.pages.keys.include?(link.to_s) || context.pages.keys.include?((link.to_s + '/')) | ||
locations.each do |location| | ||
location[:page].errors << Blinkr::Error.new({:severity => :warning, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is the right way to do this. It's an interesting idea for a new plugin, but the links plugin should check that links work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are for site internal links. The thought here is that all web pages that are site internal should be in the sitemap. If they're not, that's a problem. If they are in the site map then we've already checked it because we rendered it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is definitely a situation where there will be pages on the same domain that aren't in the sitemap. This happened e.g. on jboss.org
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this patch those are warnings, we could certainly make them errors, or we could the idea all together, your call as lead of the project :)
What else are we missing on this to merge? Put back the link stuff? |
An answer here could be to do a combo. First check the sitemap. If it's not there, check the URL using typhoeus. WDYT? |
Yeah, I could do that. |
Let me make sure everything is looking good then I'll commit the change. |
For internal links we check against site map, create a warning if it isn't there and then also check it with typhoeus.
Pushed, take a look @pmuir |
It looks good, but calling that var non_internal_links is a bit confusing, perhaps? |
This looks fairly large, but what we're doing here is adding in AngularJS to the report for templating and filtering.
Along the way I switched to an actual error class instead of the OpenStruct and also generating a JSON file to add to the report.