Skip patterns ending with # do not seem to work. #4

Closed
chalin opened this issue Dec 16, 2016 · 2 comments

chalin (Collaborator) commented Dec 16, 2016

I've been testing the skip pattern

/angular/guide/server-communication#

against site-webdev.

Here is part of the debug output:

Crawl will start on the following URLs: [http://localhost:4001/]
Crawl will check pages only on URLs satisfying: {http://localhost:4001/**}
Crawl will skip links that match patterns: UrlSkipper</angular/api/.*apiFilter, data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg', /angular/api/, /angular/guide/router(\.html)?($|#), /angular/guide/change-log.html$, /angular/cookbook/, /angular/guide/appmodule.html$, /angular/guide/server-communication#, /angular/api/static-assets/fonts, /angular/api/(docs|examples)/, /angular/api/.*/index/>
Crawl will check the following servers (and their robots.txt) first: {localhost:4001}
...

http://localhost:4001/angular/guide/server-communication
- (533:18) 'RxJS Obs..' => http://localhost:4001/angular/guide/server-communication#rxjs (HTTP 200 but missing anchor)
- (535:18) 'Enabling..' => http://localhost:4001/angular/guide/server-communication#enable-rxjs-operators (HTTP 200 but missing anchor)
...

Stats:
   14465 links
     331 destination URLs
     347 URLs ignored
      12 warnings
       0 errors

The crawl should skip links such as .../server-communication#rxjs, but it reports them as warnings instead.

filiph (Owner) commented Dec 16, 2016

OK, note to self: we currently strip the fragment (#anchor) from each new destination URL so that linkcheck requests every physical URL only once. (Otherwise it would treat /path#anchor1 and /path#anchor2 as two different URLs that both need checking.)
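
To illustrate the decoupling idea, here is a minimal Python sketch (linkcheck itself is not written in Python, and this is not its actual code). The function name `group_by_destination` and the dictionary layout are hypothetical; the host is borrowed from the debug output above.

```python
from collections import defaultdict
from urllib.parse import urldefrag


def group_by_destination(links):
    """Map each fragment-free URL to the set of anchors requested on it.

    Hypothetical helper: each key is a physical URL that would be
    fetched once, however many distinct #anchors point into it.
    """
    destinations = defaultdict(set)
    for link in links:
        url, fragment = urldefrag(link)  # split "page#anchor" into its parts
        anchors = destinations[url]      # registers the page on first sight
        if fragment:
            anchors.add(fragment)
    return dict(destinations)


links = [
    "http://localhost:4001/path#anchor1",
    "http://localhost:4001/path#anchor2",
]
grouped = group_by_destination(links)
```

Both links collapse onto the single key `http://localhost:4001/path`, so only one request would be needed, with both anchors checked against the same response.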

But skipping by fragment should still work, so the skip check needs to run earlier, before the fragment is decoupled from the URL. We still need to make sure the Destination gets created.
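
A rough Python sketch of that ordering, assuming skip patterns are regular expressions searched anywhere in the URL (as entries like `/angular/guide/router(\.html)?($|#)` in the debug output suggest). `resolve_link` and the `destinations` dictionary are hypothetical names for illustration, not linkcheck's API.

```python
import re
from urllib.parse import urldefrag


def resolve_link(link, skip_patterns, destinations):
    """Return (destination_url, skipped) for a single link.

    The skip check runs on the full link, fragment included. Only
    afterwards is the fragment split off. The destination entry is
    created either way, so a skipped link still has one.
    """
    skipped = any(p.search(link) for p in skip_patterns)
    url, fragment = urldefrag(link)
    anchors = destinations.setdefault(url, set())
    if fragment and not skipped:
        anchors.add(fragment)  # only non-skipped anchors get verified
    return url, skipped


patterns = [re.compile(r"/angular/guide/server-communication#")]
destinations = {}
link = "http://localhost:4001/angular/guide/server-communication#rxjs"

url, skipped = resolve_link(link, patterns, destinations)
# Matching against the fragment-free URL misses a pattern ending in '#'.
matches_after_stripping = any(p.search(url) for p in patterns)
```

Here `skipped` is true because the pattern sees the `#`, while `matches_after_stripping` is false, which is the behaviour reported in this issue.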

filiph self-assigned this Dec 16, 2016
filiph added the bug label Dec 16, 2016

filiph (Owner) commented Dec 17, 2016

Fixed in 1.0.1 by d213f2c. Tried this on site-webdev and it seems to work well now. Please do report any irregularities.
