Check all links #44

Closed
crandmck opened this issue Sep 8, 2016 · 13 comments

@crandmck
Contributor

crandmck commented Sep 8, 2016

We need to run a comprehensive link-check on the site to identify broken links.

This was referenced Sep 8, 2016
@doublemarked
Contributor

Hey Rand - I've given the links a pass. There are two types of problems: 404s, and links to valid pages that carry an invalid fragment identifier. I've put everything in the following spreadsheet:

https://docs.google.com/spreadsheets/d/1cXtr617d_FiK2dDernkWkrJ8tFZRXw3ptqWpyQe1zTw/edit#gid=1949899933

Give it a look and let me know if you have any questions. There are a lot of dupes in the first sheet from broken links in the header/footer (feeds.xml, Command-line-reference.html). I believe there are only ~50 unique 404s. I've only processed the LB2 docs.
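
To separate the repeated header/footer hits from the unique 404s, the crawler output can be grouped by broken URL. The following is only a sketch: it assumes the output has been reduced to (source page, broken URL) pairs, and `summarize_broken_links` and its `sitewide_ratio` threshold are hypothetical names, not part of the link checker used here.

```python
from collections import defaultdict


def summarize_broken_links(rows, total_pages, sitewide_ratio=0.9):
    """Group (source_page, broken_url) pairs by broken URL.

    Returns (unique, sitewide): the sorted distinct broken URLs, and the
    subset that appears on at least `sitewide_ratio` of all pages, which
    usually points at a shared template (header, footer, nav sidebar).
    """
    sources = defaultdict(set)
    for source_page, broken_url in rows:
        sources[broken_url].add(source_page)

    unique = sorted(sources)
    sitewide = [url for url in unique
                if len(sources[url]) >= sitewide_ratio * total_pages]
    return unique, sitewide
```

A URL that shows up in `sitewide` is most likely fixed once in a template or config file rather than page by page.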

@crandmck
Contributor Author

crandmck commented Sep 8, 2016

Awesome! I'll get on it ASAP.

@bajtos bajtos added the #tob label Sep 9, 2016
@crandmck
Contributor Author

Hi @doublemarked, I ran thru that spreadsheet and fixed all the links--well, I think so anyway. Many of the breakages were due to a couple of broken links in the nav sidebar and in include templates, so fixing those knocked out a bunch.

If you would like, please run the link check again, and we'll see what's left! :-)

Thanks for your help!

@doublemarked
Contributor

@crandmck Updated the spreadsheet: https://docs.google.com/spreadsheets/d/1cXtr617d_FiK2dDernkWkrJ8tFZRXw3ptqWpyQe1zTw/edit#gid=487594605

From the looks of it, just a handful of 404s remain but a lot of identifiers need to be fixed.

@doublemarked
Contributor

doublemarked commented Sep 16, 2016

Let me share with you how I'm generating these lists. I'm still happy to rerun things too, just let me know.

For the broken links I'm using the following tool: http://peacockmedia.software/mac/integrity-plus/

For the bad identifiers I hacked together the following (execute from within _site/doc/en/lb2):

  1. Find all identifiers and map them into a list of valid URLs w/ identifiers:
     for i in *.html; do export FILE=$i; perl -ne '/id="([^"]+?)"/ && print "$ENV{FILE}#$1\n"' $i; done > valid.txt
  2. Find all links that reference other docs w/ identifiers and filter out the valid ones:
     for i in *.html; do export FILE=$i; perl -ne '/href="\/doc\/en\/lb2\/([^"#]+?)#([^"]+?)"/ && !`fgrep -l "$1#$2" valid.txt` && print "$ENV{FILE}\t$1#$2\n"' $i; done > invalid.txt
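
For anyone who would rather not chain shell and Perl, the same two steps can be sketched in Python. This is an illustration, not the script that produced the spreadsheet: `find_bad_fragments` is a hypothetical name, and it differs from the one-liners in two ways (it checks every match on a line rather than the first, and it requires an exact `page.html#id` match rather than a substring match).

```python
import re
from pathlib import Path

ID_RE = re.compile(r'id="([^"]+)"')
# Same link shape as the one-liner: /doc/en/lb2/<page>#<fragment>
HREF_RE = re.compile(r'href="/doc/en/lb2/([^"#]+)#([^"]+)"')


def find_bad_fragments(site_dir):
    """Return sorted (source_page, "target#fragment") pairs with no matching id."""
    pages = {p.name: p.read_text(encoding="utf-8", errors="replace")
             for p in Path(site_dir).glob("*.html")}

    # Step 1: every "page.html#id" that actually exists in the built site.
    valid = {f"{name}#{ident}"
             for name, html in pages.items()
             for ident in ID_RE.findall(html)}

    # Step 2: every cross-page link whose "page.html#fragment" is not in that set.
    bad = set()
    for name, html in pages.items():
        for target, fragment in HREF_RE.findall(html):
            link = f"{target}#{fragment}"
            if link not in valid:
                bad.add((name, link))
    return sorted(bad)
```

Run from within _site/doc/en/lb2, `find_bad_fragments(".")` returns the same kind of page/link pairs that end up in invalid.txt.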

@crandmck
Contributor Author

Awesome, thanks @doublemarked. I'll try to get to this over the weekend.

@crandmck
Contributor Author

Thanks again @doublemarked, I fixed all the links identified in the Google doc. And I fixed a few others I found myself. ...

Thanks also for sharing how you did this--it could be useful in the future. I didn't have a chance to run it myself, so if you want to do so again, that would be great. At this point, I hope that most of the broken links have been fixed!

@doublemarked
Contributor

@crandmck getting closer, but a few things remain. Spreadsheet updated w/ round 3 data!

@crandmck
Contributor Author

OK, done! Third time's the charm, I hope... (fingers crossed)

@doublemarked
Contributor

The 404s from the crawler are unchanged. However, it looks like all the bad identifiers are fixed :)

@crandmck
Contributor Author

crandmck commented Sep 21, 2016

Ha ha, I missed that whole tab!

So, the broken link to feed.xml was on every page. Fixed that with an edit to _config.yml.

I fixed all the other links, except for:

Anyway, I'm going to go ahead and close this issue, because I think we're pretty close to being done. If there are still some broken links, please either reopen this issue or open a new one.

THANK YOU @doublemarked for your help!!

@crandmck crandmck removed the #tob label Sep 21, 2016
@doublemarked
Contributor

@crandmck ok all sounds good!

Just one response regarding,

> A number of pages at the bottom of the sheet (which I marked in bold), that are not in the repo, e.g. http://loopback.io/doc/en/lb2/server/views/projects.ejs This may be some kind of artifact of the link checker?

Does not appear to be an artifact of the link checker. These are bad relative links to things that are probably supposed to be off-site or something. For example the projects.ejs one - it's caused by this link from readmes/loopback-example-access-control.md (included by Tutorial-access-control.md):

Sets the [`POST /projects` route to to render `projects.ejs` when credentials are valid](server/views/projects.ejs) and [renders `index.ejs`](https://github.com/strongloop/loopback-example-access-control/blob/master/server/views/index.ejs) when credentials are invalid

@crandmck
Contributor Author

> These are bad relative links to things that are probably supposed to be off-site or something.

Aha! Thanks for clarification. This makes more sense.

This is caused by relative links in included READMEs (2nd bullet in my previous comment). So, we'll address those in each README. Basically: a relative link in a README to another file in the repo works fine in GitHub, but not when the document is used anywhere else (e.g. npm or loopback.io).
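
One way to make such a README portable is to rewrite its relative links as absolute GitHub URLs before it is included elsewhere. The sketch below is only an illustration of that idea, not what was done in the READMEs: `absolutize_links` is a hypothetical helper, and the base URL (including the `blob/master` part) is an assumption taken from the `index.ejs` link quoted earlier in this thread.

```python
import re
from urllib.parse import urljoin

# Matches inline Markdown links of the form [text](target).
# Reference-style links and targets containing spaces are not handled.
LINK_RE = re.compile(r'\[([^\]]*)\]\(([^)\s]+)\)')
SCHEME_RE = re.compile(r'^[a-zA-Z][a-zA-Z0-9+.-]*:')


def absolutize_links(markdown, base_url):
    """Rewrite relative link targets in `markdown` against `base_url`."""
    def fix(match):
        text, target = match.group(1), match.group(2)
        # Leave absolute URLs, site-rooted paths, and in-page anchors alone.
        if SCHEME_RE.match(target) or target.startswith(("/", "#")):
            return match.group(0)
        return f"[{text}]({urljoin(base_url, target)})"

    return LINK_RE.sub(fix, markdown)


# Assumed base: the repo's default branch as seen in the index.ejs link.
BASE = "https://github.com/strongloop/loopback-example-access-control/blob/master/"
print(absolutize_links("[`projects.ejs`](server/views/projects.ejs)", BASE))
```

The base URL must end with a slash, otherwise `urljoin` replaces its last path segment instead of appending to it.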
