No way to tell where a broken link is linked from #10

Open
alexwlchan opened this issue Feb 17, 2017 · 2 comments

Comments

@alexwlchan
Contributor

alexwlchan commented Feb 17, 2017

Right now, I can use http-crawler to find links that return non-2xx responses. A link can fail for two reasons:

  1. The page should exist, and it’s broken (in which case I should fix it)
  2. The page doesn’t exist, and there’s a page with an incorrect link (in which case I should change it)

In the latter case, it’s hard to find the source of the broken link from http-crawler’s current output. It would be useful if it could tell me how it found a given link, so I can check the page that’s providing the link.
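
To make the request concrete, here is a minimal sketch of the bookkeeping it implies: whenever a page is parsed, remember which page each link was found on. This uses only the standard library and is not http-crawler's code; `LinkExtractor`, `record_links`, and `referrers` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def record_links(page_url, html, referrers):
    """Record page_url as a referrer of every link found in html."""
    parser = LinkExtractor()
    parser.feed(html)
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment so that
        # /missing and /missing#top count as the same target.
        target, _fragment = urldefrag(urljoin(page_url, href))
        referrers[target].add(page_url)


# Maps each target URL to the set of pages that link to it.
referrers = defaultdict(set)
record_links(
    "https://example.org/",
    '<a href="/about">About</a> <a href="/missing">Gone</a>',
    referrers,
)
record_links(
    "https://example.org/about",
    '<a href="/missing#top">Gone</a>',
    referrers,
)

print(sorted(referrers["https://example.org/missing"]))
# → ['https://example.org/', 'https://example.org/about']
```

With a mapping like this, a 404 for `/missing` can be traced back to the two pages that link to it.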

(Edited, I rushed the first draft of this issue.)

@inglesp
Owner

inglesp commented Feb 26, 2017

Hey @alexwlchan. I agree this would be useful. Do you have any thoughts about what a good API for this might be?

@alexwlchan
Contributor Author

alexwlchan commented Feb 26, 2017

> Do you have any thoughts about what a good API for this might be?

I’m not sure.

In my fork (commit 0004f24), I decided to just dump a JSON representation of the entire "how seen" tree to disk. That works in a pinch, but it’s not very elegant.
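
For illustration only, a JSON dump along those lines might look roughly like the sketch below. It is not the code from the fork: it assumes the "how seen" data is a flat mapping from each URL to the set of pages that link to it, and `how_seen_to_json` and `dump_how_seen` are hypothetical names.

```python
import json


def how_seen_to_json(how_seen):
    """Serialise a {target_url: set_of_source_urls} mapping to JSON text."""
    # Sets are not JSON-serialisable, so convert each one to a sorted list.
    serialisable = {
        target: sorted(sources) for target, sources in how_seen.items()
    }
    return json.dumps(serialisable, indent=2, sort_keys=True)


def dump_how_seen(how_seen, path):
    """Write the mapping to disk so it can be inspected after the crawl."""
    with open(path, "w") as f:
        f.write(how_seen_to_json(how_seen))


how_seen = {
    "https://example.org/missing": {
        "https://example.org/about",
        "https://example.org/",
    },
}
print(how_seen_to_json(how_seen))
```

The file can then be searched for a failing URL by hand, which is where the "not very elegant" part comes in.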

Alternatively, you could subclass Response, then add an extra field how_seen. That’s not ideal either – we may find another link to a page after checking whether it’s live, and there’s no way to go back in time and update that field.
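
One way around the timing problem, sketched here under assumptions rather than proposed as the API, is to keep the referrers in a table owned by the crawl and only join them with the status codes when reporting. `CrawlLog` and its methods are hypothetical and not part of http-crawler.

```python
from collections import defaultdict


class CrawlLog:
    """Hypothetical record of which links were seen and what they returned."""

    def __init__(self):
        self.how_seen = defaultdict(set)  # target URL -> pages linking to it
        self.status = {}                  # URL -> HTTP status code

    def saw_link(self, source, target):
        self.how_seen[target].add(source)

    def checked(self, url, status_code):
        self.status[url] = status_code

    def broken_links(self):
        """Join statuses with referrers at report time, not at check time."""
        return {
            url: sorted(self.how_seen.get(url, ()))
            for url, code in self.status.items()
            if not 200 <= code < 300
        }


log = CrawlLog()
log.saw_link("https://example.org/", "https://example.org/missing")
log.checked("https://example.org/missing", 404)
# A second referrer turns up after the URL has already been checked.
log.saw_link("https://example.org/about", "https://example.org/missing")

print(log.broken_links())
# → {'https://example.org/missing': ['https://example.org/', 'https://example.org/about']}
```

Because the lookup happens when the report is built, referrers found after a URL was checked still show up. The cost is that a caller consuming responses one at a time would only see the referrers known so far.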
