Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect even more links from even more sources #1422

Merged
merged 1 commit into from May 18, 2019

Conversation

@da2x
Copy link
Contributor

commented Apr 27, 2019

Detailed description

  • Support area, base, blockquote, object, and source elements.
  • Support srcset attribute on img and source elements (for responsive images).
  • Support poster attribute on the video element.
  • Support ping attribute on a and area elements.
  • Support HTML+RDFa global about and resource attributes.

To do

  • A test including an image srcset attribute (I still can’t run the tests locally)
@ddfreyne

This comment has been minimized.

Copy link
Member

commented Apr 27, 2019

For the tests: I did a bit of digging and found

I think with therubyracer commented-out in the Gemfile, the tests might start to work. (Some other tests might fail, but at least it’s a start.)

@da2x

This comment has been minimized.

Copy link
Contributor Author

commented Apr 28, 2019

So the mixed content filter depends on resource_uris_in_file only checking document embeds. I didn’t quite understand the difference between it and hrefs_in_file when looking at the code. OK, so all elements but a and area (unless they point to a document under the base_uri or the base element’s URI) can affect mixed content and the GLOBAL_ATTRs don’t either.

I’m inclined to say this is out of scope for this PR as the mixed content checker didn’t really do its job too well prior to this change; and this change will let the internal and external link checkers do a way better job.

@ddfreyne

This comment has been minimized.

Copy link
Member

commented Apr 28, 2019

@da2x I’m rather confused by the original code myself now. :( (I assure you that this does not happen very often.)

LinkCollector would also better be named URLCollector I believe, since it collects URLs (for things to check). It might be pedantic, but URLs ≠ links, and I think the distinction is important in this case.

@ddfreyne

This comment has been minimized.

Copy link
Member

commented Apr 28, 2019

More potential pedantry: tags ≠ elements, even though the LinkCollector refers to elements as tags.

@da2x

This comment has been minimized.

Copy link
Contributor Author

commented Apr 28, 2019

It’s actually a URI collector and not a URL collector to be even more pedantic as it picks up on all URI schemes.

Anyhow, I’m unsure how to proceed with this. How about dropping the mixed content checker altogether? The current version wouldn’t pick up on a bunch of stuff yet I can’t find any bugs relating to it. To me, this suggests that no one is using it. Update: I can’t find any mentions of it outside the docs and this repo anywhere on the web either.

The mixed content checker could also be changed to check all links to the site conf base_uri domain under the http scheme. This would also pick up on internal links that could downgrade security. This wouldn’t pick up on external embeds, however.

@ddfreyne

This comment has been minimized.

Copy link
Member

commented Apr 29, 2019

I’d like to keep the mixed_content check; it’s mentioned on the web site (https://nanoc.ws/doc/testing/ near the bottom).

Would it make sense to duplicate LinkCollector into one class for collecting URIs for the link checker, and another one for mixed content? The mixed-content one might still not be working great, but at least you’d have the freedom to structure URICollector the way you want and let it do what you need.

@da2x da2x force-pushed the da2x:patch-4 branch from e46b524 to 98f1d75 Apr 29, 2019

@da2x

This comment has been minimized.

Copy link
Contributor Author

commented Apr 29, 2019

I’ll change things so resource_uris_in_file (for :mixed_content) returns embeds and hrefs_in_file returns everything (for :ilinks and :elinks). I’ll then fix al three checkers to use the right one. As a bonus this should speed up each check as each check will only need one pass through all the output files.

@da2x da2x force-pushed the da2x:patch-4 branch 3 times, most recently from 878c447 to e543679 Apr 29, 2019

@da2x

This comment has been minimized.

Copy link
Contributor Author

commented Apr 29, 2019

Current status: Everything should be working including the mixed content checker (except downgrade links but those were already not tested). The internal_link checker now handles deeply-nested relativized links and more variations of same-directory file links.

I still can’t run the tests locally even when commenting out therubyracer. I’m on Fedora so gcc hardening is applied, however. I don’t want to blindly writing more tests, however.

@ddfreyne, what is the failing test? I can’t tell from the output. The same tests failed in #1424.

I believe this should be at a mergeable state now.

Update: I got tests running but with a really high random failure rates from tens of tests. Trying to work out those issues now.

@da2x

This comment has been minimized.

Copy link
Contributor Author

commented Apr 30, 2019

So there was a number of issues higher than zero. I’ve addressed those now and added tests for new link collection sources, and changed the ilinks tests to work with URIs from the link collector.

I don’t know whats with test_nanoc_cruby25 and the CCI output doesn’t give me much to work with. test_nanoc_cruby24 times out after ten minutes but there are no details on what it got hung up on. Both also failed in the #1424 hotfix with the same CCI output.

@ddfreyne

This comment has been minimized.

Copy link
Member

commented Apr 30, 2019

Yeah, the tests on CI started to become flaky and I have not investigated in detail yet. I plan on doing that soon!

@da2x

This comment has been minimized.

Copy link
Contributor Author

commented May 1, 2019

OK, one more go at this. test_nanoc_cruby24 and test_nanoc_cruby25 still not giving me any usable information.

@ddfreyne

This comment has been minimized.

Copy link
Member

commented May 4, 2019

Whoa, test_nanoc_cruby24 and test_nanoc_cruby25 don’t seem to do anything… but not consistently. I’ve rescheduled test runs because it seems that sometimes, they do work. I haven’t been able to figure out why, and this is the first time I notice it.

@ddfreyne

This comment has been minimized.

Copy link
Member

commented May 4, 2019

Very odd — it seems that this particular branch fails consistently, but only on CI. I can’t make it fail locally.

I created a branch to test out this weird behavior (#1433), but that branch also does not fail, and I can’t make it fail even when re-running over and over.

@da2x

This comment has been minimized.

Copy link
Contributor Author

commented May 4, 2019

Lets run this branch with trace then.

TODO: Don’t merge without reverting 05c5d19

@da2x

This comment has been minimized.

Copy link
Contributor Author

commented May 4, 2019

OK, so that gave us no new information.

@da2x da2x force-pushed the da2x:patch-4 branch from 05c5d19 to eb444ef May 4, 2019

@da2x

This comment has been minimized.

Copy link
Contributor Author

commented May 4, 2019

I rebased this branch again to no avail. I managed to setup a Ruby 2.5 environment locally and the tests ran just fine.

@ddfreyne Any idea why the #1424 branch failed in the same way? There is really no reason why that branch should have fail. Could you try restarting the tests in that branch for comparison?

Could it be that CCI just doesn’t like branches created by me? 😉

@ddfreyne

This comment has been minimized.

Copy link
Member

commented May 4, 2019

I’m also stumped. I’ve never seen this behavior — I cannot figure out why rake would silently exit.

@ddfreyne

This comment has been minimized.

Copy link
Member

commented May 4, 2019

I SSH’d into the test container (which you can do with CircleCI), changed the test configuration to be verbose, and got this:

$ bundle exec rake test --verbose --trace
** Invoke test (first_time)
** Invoke spec (first_time)
** Execute spec
/usr/local/bin/ruby -I/home/circleci/project/vendor/bundle/gems/rspec-core-3.8.0/lib:/home/circleci/project/vendor/bundle/gems/rspec-support-3.8.0/lib /home/circleci/project/vendor/bundle/gems/rspec-core-3.8.0/exe/rspec --pattern spec/\*\*\{,/\*/\*\*\}/\*_spec.rb

  1) Nanoc::Checking::Checks::ExternalLinks found has no issues
     Failure/Error: base_uri = URI::File.build([nil, filename])

     NameError:
       uninitialized constant URI::File
     # ./lib/nanoc/extra/link_collector.rb:88:in `uris_in_file'
     # ./lib/nanoc/extra/link_collector.rb:64:in `hrefs_in_file'
     # ./lib/nanoc/extra/link_collector.rb:43:in `block in filenames_per_href'
     # ./lib/nanoc/extra/link_collector.rb:78:in `block in grouped_filenames'
     # ./lib/nanoc/extra/link_collector.rb:77:in `each'
     # ./lib/nanoc/extra/link_collector.rb:77:in `grouped_filenames'
     # ./lib/nanoc/extra/link_collector.rb:43:in `filenames_per_href'
     # ./lib/nanoc/checking/checks/external_links.rb:14:in `run'
     # ./spec/nanoc/checking/checks/external_links_spec.rb:41:in `block (3 levels) in <top (required)>'
     # /home/circleci/project/common/spec/spec_helper_foot.rb:40:in `block (2 levels) in <top (required)>'
     # /home/circleci/project/common/spec/spec_helper_foot_core.rb:31:in `block (4 levels) in <top (required)>'
     # /home/circleci/project/common/spec/spec_helper_foot_core.rb:6:in `__nanoc_core_chdir'
     # /home/circleci/project/common/spec/spec_helper_foot_core.rb:31:in `block (3 levels) in <top (required)>'
     # /home/circleci/project/common/spec/spec_helper_foot_core.rb:30:in `block (2 levels) in <top (required)>'

… and a handful of similar failures.


FYI, I changed the :spec task in Rakefile to say

RSpec::Core::RakeTask.new(:spec) do |t|
  t.verbose = true
end
@ddfreyne

This comment has been minimized.

Copy link
Member

commented May 4, 2019

uninitialized constant URI::File does not make sense to me, because Nanoc does require uri, and then URI::File becomes available…

@ddfreyne

This comment has been minimized.

Copy link
Member

commented May 4, 2019

@da2x Ohhhh, URI::File does not exist in Ruby 2.5.x — it’s new in Ruby 2.6. That would explain…

@ddfreyne

This comment has been minimized.

Copy link
Member

commented May 4, 2019

The “test summary” is supposed to list all the failed tests, but that is clearly not quite working the way I expected…

@da2x

This comment has been minimized.

Copy link
Contributor Author

commented May 5, 2019

Thanks, trying URI::Generic instead. Seems to handle Windows paths as well so should be fine.

What about test_nanoc_cruby24?

@ddfreyne

This comment has been minimized.

Copy link
Member

commented May 5, 2019

Can you apply the changes from #1434 to your branch? That should get you actual test results!

@da2x da2x force-pushed the da2x:patch-4 branch 2 times, most recently from f975bf5 to c2915d2 May 5, 2019

Switch LinkCollector to work with URIs and collect more of them
* Support area, base, blockquote, object, and source elements.
* Support srcset attribute on img and source elements (for responsive images).
* Support ping attribute on a and area elements.
* Support poster attribute on the video element.
* Support cite attribute on the blockquote element.
* Support HTML+RDFa global about and resource attributes.

@da2x da2x force-pushed the da2x:patch-4 branch from c2915d2 to d98ab46 May 5, 2019

@da2x

This comment has been minimized.

Copy link
Contributor Author

commented May 5, 2019

All green!

So there is a small behavioral difference in Ruby2.4 compared to 2.5 and 2.6 that I’ll just leave in. URI::Generic isn’t aware of the file URI scheme and blindly merges the protocol-level slashes. So it returns file:/X instead of file:///X and file:/c:/X instead of file://c:/X on Windows. Everything else works as expected, notably the improved/web-platform-not-filesystem-like handling of deeply nested relative paths, but you’ll need to be aware of the difference when handling raw output from the LinkCollector. I’ve changed one test to account for that difference.

@ddfreyne ddfreyne merged commit ab03501 into nanoc:master May 18, 2019

21 checks passed

ci/circleci: check_style_cruby26 Your tests passed on CircleCI!
Details
ci/circleci: setup_cruby24 Your tests passed on CircleCI!
Details
ci/circleci: setup_cruby25 Your tests passed on CircleCI!
Details
ci/circleci: setup_cruby26 Your tests passed on CircleCI!
Details
ci/circleci: test_guard_nanoc_cruby24 Your tests passed on CircleCI!
Details
ci/circleci: test_guard_nanoc_cruby25 Your tests passed on CircleCI!
Details
ci/circleci: test_guard_nanoc_cruby26 Your tests passed on CircleCI!
Details
ci/circleci: test_nanoc_core_cruby24 Your tests passed on CircleCI!
Details
ci/circleci: test_nanoc_core_cruby25 Your tests passed on CircleCI!
Details
ci/circleci: test_nanoc_core_cruby26 Your tests passed on CircleCI!
Details
ci/circleci: test_nanoc_cruby24 Your tests passed on CircleCI!
Details
ci/circleci: test_nanoc_cruby25 Your tests passed on CircleCI!
Details
ci/circleci: test_nanoc_cruby26 Your tests passed on CircleCI!
Details
ci/circleci: test_nanoc_external_cruby24 Your tests passed on CircleCI!
Details
ci/circleci: test_nanoc_external_cruby25 Your tests passed on CircleCI!
Details
ci/circleci: test_nanoc_external_cruby26 Your tests passed on CircleCI!
Details
ci/circleci: test_nanoc_live_cruby24 Your tests passed on CircleCI!
Details
ci/circleci: test_nanoc_live_cruby25 Your tests passed on CircleCI!
Details
ci/circleci: test_nanoc_live_cruby26 Your tests passed on CircleCI!
Details
codecov/patch Coverage not affected when comparing 29795b7...d98ab46
Details
codecov/project 97.75% (+<.01%) compared to 29795b7
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.