Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect even more links from even more sources #1422

Merged
merged 1 commit into from May 18, 2019

Conversation

da2x
Copy link
Contributor

@da2x da2x commented Apr 27, 2019

Detailed description

  • Support area, base, blockquote, object, and source elements.
  • Support srcset attribute on img and source elements (for responsive images).
  • Support poster attribute on the video element.
  • Support ping attribute on a and area elements.
  • Support HTML+RDFa global about and resource attributes.

To do

  • A test including an image srcset attribute (I still can’t run the tests locally)

@denisdefreyne
Copy link
Member

For the tests: I did a bit of digging and found

I think with therubyracer commented-out in the Gemfile, the tests might start to work. (Some other tests might fail, but at least it’s a start.)

@da2x
Copy link
Contributor Author

da2x commented Apr 28, 2019

So the mixed content filter depends on resource_uris_in_file only checking document embeds. I didn’t quite understand the difference between it and hrefs_in_file when looking at the code. OK, so all elements but a and area (unless they point to a document under the base_uri or the base element’s URI) can affect mixed content and the GLOBAL_ATTRs don’t either.

I’m inclined to say this is out of scope for this PR as the mixed content checker didn’t really do its job too well prior to this change; and this change will let the internal and external link checkers do a way better job.

@denisdefreyne
Copy link
Member

@da2x I’m rather confused by the original code myself now. :( (I assure you that this does not happen very often.)

LinkCollector would also better be named URLCollector I believe, since it collects URLs (for things to check). It might be pedantic, but URLs ≠ links, and I think the distinction is important in this case.

@denisdefreyne
Copy link
Member

More potential pedantry: tags ≠ elements, even though the LinkCollector refers to elements as tags.

@da2x
Copy link
Contributor Author

da2x commented Apr 28, 2019

It’s actually a URI collector and not a URL collector to be even more pedantic as it picks up on all URI schemes.

Anyhow, I’m unsure how to proceed with this. How about dropping the mixed content checker altogether? The current version wouldn’t pick up on a bunch of stuff yet I can’t find any bugs relating to it. To me, this suggests that no one is using it. Update: I can’t find any mentions of it outside the docs and this repo anywhere on the web either.

The mixed content checker could also be changed to check all links to the site conf base_uri domain under the http scheme. This would also pick up on internal links that could downgrade security. This wouldn’t pick up on external embeds, however.

@denisdefreyne
Copy link
Member

denisdefreyne commented Apr 29, 2019

I’d like to keep the mixed_content check; it’s mentioned on the web site (https://nanoc.ws/doc/testing/ near the bottom).

Would it make sense to duplicate LinkCollector into one class for collecting URIs for the link checker, and another one for mixed content? The mixed-content one might still not be working great, but at least you’d have the freedom to structure URICollector the way you want and let it do what you need.

@da2x
Copy link
Contributor Author

da2x commented Apr 29, 2019

I’ll change things so resource_uris_in_file (for :mixed_content) returns embeds and hrefs_in_file returns everything (for :ilinks and :elinks). I’ll then fix al three checkers to use the right one. As a bonus this should speed up each check as each check will only need one pass through all the output files.

@da2x da2x force-pushed the patch-4 branch 3 times, most recently from 878c447 to e543679 Compare April 29, 2019 15:33
@da2x
Copy link
Contributor Author

da2x commented Apr 29, 2019

Current status: Everything should be working including the mixed content checker (except downgrade links but those were already not tested). The internal_link checker now handles deeply-nested relativized links and more variations of same-directory file links.

I still can’t run the tests locally even when commenting out therubyracer. I’m on Fedora so gcc hardening is applied, however. I don’t want to blindly writing more tests, however.

@ddfreyne, what is the failing test? I can’t tell from the output. The same tests failed in #1424.

I believe this should be at a mergeable state now.

Update: I got tests running but with a really high random failure rates from tens of tests. Trying to work out those issues now.

@da2x
Copy link
Contributor Author

da2x commented Apr 30, 2019

So there was a number of issues higher than zero. I’ve addressed those now and added tests for new link collection sources, and changed the ilinks tests to work with URIs from the link collector.

I don’t know whats with test_nanoc_cruby25 and the CCI output doesn’t give me much to work with. test_nanoc_cruby24 times out after ten minutes but there are no details on what it got hung up on. Both also failed in the #1424 hotfix with the same CCI output.

@denisdefreyne
Copy link
Member

Yeah, the tests on CI started to become flaky and I have not investigated in detail yet. I plan on doing that soon!

@da2x
Copy link
Contributor Author

da2x commented May 1, 2019

OK, one more go at this. test_nanoc_cruby24 and test_nanoc_cruby25 still not giving me any usable information.

@denisdefreyne
Copy link
Member

Whoa, test_nanoc_cruby24 and test_nanoc_cruby25 don’t seem to do anything… but not consistently. I’ve rescheduled test runs because it seems that sometimes, they do work. I haven’t been able to figure out why, and this is the first time I notice it.

@denisdefreyne
Copy link
Member

Very odd — it seems that this particular branch fails consistently, but only on CI. I can’t make it fail locally.

I created a branch to test out this weird behavior (#1433), but that branch also does not fail, and I can’t make it fail even when re-running over and over.

@da2x
Copy link
Contributor Author

da2x commented May 4, 2019

Lets run this branch with trace then.

TODO: Don’t merge without reverting 05c5d19

@da2x
Copy link
Contributor Author

da2x commented May 4, 2019

OK, so that gave us no new information.

@da2x
Copy link
Contributor Author

da2x commented May 4, 2019

I rebased this branch again to no avail. I managed to setup a Ruby 2.5 environment locally and the tests ran just fine.

@ddfreyne Any idea why the #1424 branch failed in the same way? There is really no reason why that branch should have fail. Could you try restarting the tests in that branch for comparison?

Could it be that CCI just doesn’t like branches created by me? 😉

@denisdefreyne
Copy link
Member

I’m also stumped. I’ve never seen this behavior — I cannot figure out why rake would silently exit.

@denisdefreyne
Copy link
Member

I SSH’d into the test container (which you can do with CircleCI), changed the test configuration to be verbose, and got this:

$ bundle exec rake test --verbose --trace
** Invoke test (first_time)
** Invoke spec (first_time)
** Execute spec
/usr/local/bin/ruby -I/home/circleci/project/vendor/bundle/gems/rspec-core-3.8.0/lib:/home/circleci/project/vendor/bundle/gems/rspec-support-3.8.0/lib /home/circleci/project/vendor/bundle/gems/rspec-core-3.8.0/exe/rspec --pattern spec/\*\*\{,/\*/\*\*\}/\*_spec.rb

  1) Nanoc::Checking::Checks::ExternalLinks found has no issues
     Failure/Error: base_uri = URI::File.build([nil, filename])

     NameError:
       uninitialized constant URI::File
     # ./lib/nanoc/extra/link_collector.rb:88:in `uris_in_file'
     # ./lib/nanoc/extra/link_collector.rb:64:in `hrefs_in_file'
     # ./lib/nanoc/extra/link_collector.rb:43:in `block in filenames_per_href'
     # ./lib/nanoc/extra/link_collector.rb:78:in `block in grouped_filenames'
     # ./lib/nanoc/extra/link_collector.rb:77:in `each'
     # ./lib/nanoc/extra/link_collector.rb:77:in `grouped_filenames'
     # ./lib/nanoc/extra/link_collector.rb:43:in `filenames_per_href'
     # ./lib/nanoc/checking/checks/external_links.rb:14:in `run'
     # ./spec/nanoc/checking/checks/external_links_spec.rb:41:in `block (3 levels) in <top (required)>'
     # /home/circleci/project/common/spec/spec_helper_foot.rb:40:in `block (2 levels) in <top (required)>'
     # /home/circleci/project/common/spec/spec_helper_foot_core.rb:31:in `block (4 levels) in <top (required)>'
     # /home/circleci/project/common/spec/spec_helper_foot_core.rb:6:in `__nanoc_core_chdir'
     # /home/circleci/project/common/spec/spec_helper_foot_core.rb:31:in `block (3 levels) in <top (required)>'
     # /home/circleci/project/common/spec/spec_helper_foot_core.rb:30:in `block (2 levels) in <top (required)>'

… and a handful of similar failures.


FYI, I changed the :spec task in Rakefile to say

RSpec::Core::RakeTask.new(:spec) do |t|
  t.verbose = true
end

@denisdefreyne
Copy link
Member

uninitialized constant URI::File does not make sense to me, because Nanoc does require uri, and then URI::File becomes available…

@denisdefreyne
Copy link
Member

@da2x Ohhhh, URI::File does not exist in Ruby 2.5.x — it’s new in Ruby 2.6. That would explain…

@denisdefreyne
Copy link
Member

The “test summary” is supposed to list all the failed tests, but that is clearly not quite working the way I expected…

@da2x
Copy link
Contributor Author

da2x commented May 5, 2019

Thanks, trying URI::Generic instead. Seems to handle Windows paths as well so should be fine.

What about test_nanoc_cruby24?

@denisdefreyne
Copy link
Member

Can you apply the changes from #1434 to your branch? That should get you actual test results!

@da2x da2x force-pushed the patch-4 branch 2 times, most recently from f975bf5 to c2915d2 Compare May 5, 2019 16:47
* Support area, base, blockquote, object, and source elements.
* Support srcset attribute on img and source elements (for responsive images).
* Support ping attribute on a and area elements.
* Support poster attribute on the video element.
* Support cite attribute on the blockquote element.
* Support HTML+RDFa global about and resource attributes.
@da2x
Copy link
Contributor Author

da2x commented May 5, 2019

All green!

So there is a small behavioral difference in Ruby2.4 compared to 2.5 and 2.6 that I’ll just leave in. URI::Generic isn’t aware of the file URI scheme and blindly merges the protocol-level slashes. So it returns file:/X instead of file:///X and file:/c:/X instead of file://c:/X on Windows. Everything else works as expected, notably the improved/web-platform-not-filesystem-like handling of deeply nested relative paths, but you’ll need to be aware of the difference when handling raw output from the LinkCollector. I’ve changed one test to account for that difference.

@denisdefreyne denisdefreyne merged commit ab03501 into nanoc:master May 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants