Fix broken links in the documentation #25024

Closed
5 tasks done
lesteve opened this issue Nov 24, 2022 · 14 comments
Labels
Easy (well-defined and straightforward way to resolve), good first issue (easy with clear instructions to resolve), help wanted

Comments

@lesteve
Member

lesteve commented Nov 24, 2022

A follow-up of #23631.

If you want to work on this, please:

  • do one Pull Request per link
  • add a comment in this issue saying which link you want to tackle so that different people can work on this issue in parallel
  • mention this issue (#25024) in your Pull Request description so that progress on this issue can more easily be tracked

Possible solutions for a broken link include:

  • find a replacement for the broken link. In case of links to articles, being able to link to a resource where the article is openly accessible (rather than behind a paywall) would be nice.
  • add the link to the linkcheck_ignore variable (the list that begins with linkcheck_ignore = [ in the Sphinx configuration). This is the only thing to do, for example, when:
    • the link is broken with no replacement (for example, in testimonials, some companies were acquired and their website no longer exists)
    • the link works fine in a browser but is flagged as broken by the make linkcheck tool. This may happen because some websites try to prevent bots from scraping their content
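As a rough illustration of the linkcheck_ignore option, here is a minimal sketch of what such an entry could look like. Sphinx treats each entry as a regular expression matched against the URI. The threadpoolctl URL is taken from this thread; the is_ignored helper is hypothetical and only mimics the matching so the pattern can be tried outside of Sphinx.

```python
import re

# Hypothetical excerpt of a Sphinx configuration file.
# Each entry is a regular expression matched against the start of a URI.
linkcheck_ignore = [
    # Valid in a browser, but the anchor is not found by make linkcheck.
    r"https://github\.com/joblib/threadpoolctl/"
    r"#setting-the-maximum-size-of-thread-pools",
]


def is_ignored(uri, patterns=linkcheck_ignore):
    """Illustrative helper (not part of Sphinx): True if any pattern matches."""
    return any(re.match(pattern, uri) for pattern in patterns)
```

Escaping the dots keeps the pattern from matching unrelated hosts; anything not listed is still checked.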

Something that may be useful in the complicated cases is to search on the Internet Archive for the broken link. You may be able to look at the old content and it may help you to find an appropriate link replacement.
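The Internet Archive lookup can also be scripted. The sketch below assumes the Wayback Machine availability endpoint (archive.org/wayback/available) and its usual JSON layout with an archived_snapshots.closest entry; the function names are made up for illustration, and the response shape should be double-checked against the Internet Archive documentation before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
WAYBACK_API = "https://archive.org/wayback/available"


def availability_query(url):
    """Build the query URL asking for the closest snapshot of `url`."""
    return f"{WAYBACK_API}?{urlencode({'url': url})}"


def closest_snapshot(payload):
    """Extract the snapshot URL from a decoded JSON response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(url, timeout=10):
    """Query the API over the network and return a snapshot URL, or None."""
    with urlopen(availability_query(url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling find_snapshot on one of the broken links above would return an archived copy if one exists, which can help identify what the page used to contain.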

List of broken links from a make linkcheck local run:

  • https://devguide.python.org/triaging/#becoming-a-member-of-the-python-triage-team governance.rst
    Anchor 'becoming-a-member-of-the-python-triage-team' not found
    
  • https://pymc-devs.github.io/pymc/ related_projects.rst
    404 Client Error: Not Found for url: https://pymc-devs.github.io/pymc/
    
  • https://tminka.github.io/papers/logreg/minka-logreg.pdf/ modules/linear_model.rst
    404 Client Error: Not Found for url: https://tminka.github.io/papers/logreg/minka-logreg.pdf/
    
  • [ ] https://pkgs.alpinelinux.org/packages?name=py3-scikit-learn install.rst
    HTTPSConnectionPool(host='pkgs.alpinelinux.org', port=443): Read timed out. (read timeout=10)
    
  • https://www1.icsi.berkeley.edu/~stellayu/publication/doc/2003kwayICCV.pdf modules/clustering.rst
    404 Client Error: Not Found for url: https://www1.icsi.berkeley.edu/~stellayu/publication/doc/2003kwayICCV.pdf
    
  • [ ] https://www.iro.umontreal.ca/~pift6266/A06/refs/backprop_old.pdf modules/neural_networks_supervised.rst
    HTTPSConnectionPool(host='www.iro.umontreal.ca', port=443): Max retries exceeded with url: /~pift6266/A06/refs/backprop_old.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fda35c47790>, 'Connection to www.iro.umontreal.ca timed out. (connect timeout=10)'))
    
  • https://github.com/joblib/threadpoolctl/#setting-the-maximum-size-of-thread-pools computing/parallelism.rst
    Anchor 'setting-the-maximum-size-of-thread-pools' not found
    
@lesteve lesteve added the Easy, good first issue, and help wanted labels Nov 24, 2022
@jasonjg
Contributor

jasonjg commented Nov 25, 2022

Working on:

https://developers.google.com/open-source/

@lesteve
Member Author

lesteve commented Nov 25, 2022

> Working on:
>
> developers.google.com/open-source

@jasonjg I have no idea why, but after rerunning make linkcheck the developers.google.com link is no longer flagged as broken. I have updated the issue description.

I will merge your PR #25036 in any case, since I find it slightly better to update the link here.

@shrankhla20
Contributor

shrankhla20 added a commit to shrankhla20/scikit-learn that referenced this issue Nov 25, 2022
@jasonjg
Contributor

jasonjg commented Nov 25, 2022

> @jasonjg I have no idea why, but after rerunning make linkcheck the developers.google.com link is no longer flagged as broken. I have updated the issue description.

I am not sure either. However, status code 301 was being returned for developers.google.com/open-source, which redirected to opensource.google
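One way to observe such a redirect directly is to send a request without following it and read the status code and Location header. This is a minimal sketch using only the standard library; head_status is a hypothetical helper name, and some servers answer HEAD requests differently from GET, so the result is only indicative.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url, timeout=10):
    """Send a HEAD request without following redirects.

    Returns a (status_code, location_header_or_None) tuple.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        connection.request("HEAD", path)
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()
```

A 301 status with a Location header pointing at a new host suggests the link still resolves but could be updated to the redirect target.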

@shrankhla20
Contributor

shrankhla20 added a commit to shrankhla20/scikit-learn that referenced this issue Nov 25, 2022
Fixed broken link for multiclass spectral clustering yu-shi reference 
scikit-learn#25024 issue
shrankhla20 added a commit to shrankhla20/scikit-learn that referenced this issue Nov 25, 2022
@ka00ri
Contributor

ka00ri commented Nov 26, 2022

Working on Multiclass spectral clustering, 2003 in line 206 of _spectral.py

@gu1show
Contributor

gu1show commented Nov 28, 2022

Working on: https://pymc-devs.github.io/pymc/.
There is no link to the project in related_projects.rst.

UPD:
I didn't find other links in their files.

@lesteve
Member Author

lesteve commented Nov 29, 2022

> Working on: pymc-devs.github.io/pymc.
> There is no link to the project in related_projects.rst.

This has already been fixed in #25027. I have updated the description and ticked the associated box.

@gu1show
Contributor

gu1show commented Nov 29, 2022

OK, but what about the last two links? They are not in the files you listed.

@lesteve
Member Author

lesteve commented Nov 29, 2022

The last two links are in doc/install.rst and doc/modules/neural_networks_supervised.rst

git grep is quite useful in this kind of case. For example, if I am looking for the second link, which has backprop_old in it:

❯ git grep backprop_old 
doc/modules/neural_networks_supervised.rst:      <https://www.iro.umontreal.ca/~pift6266/A06/refs/backprop_old.pdf>`_
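For situations where git is not at hand, a similar search can be approximated in Python. This is only a sketch: find_in_docs is a hypothetical helper, the default root and suffixes are assumptions, and unlike git grep it walks every file on disk rather than only tracked files.

```python
from pathlib import Path


def find_in_docs(needle, root="doc", suffixes=(".rst", ".py")):
    """Return (path, line_number, line) tuples for lines containing `needle`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), line_number, line.strip()))
    return hits


# Usage, run from the repository root:
# for path, line_number, line in find_in_docs("backprop_old"):
#     print(f"{path}:{line_number}: {line}")
```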

@gu1show
Contributor

gu1show commented Nov 29, 2022

The last two links work correctly.

@lesteve
Member Author

lesteve commented Nov 30, 2022

Indeed, I am not sure why they were flagged as broken by make linkcheck. I updated the issue description to cross them out.

I added another one, https://github.com/joblib/threadpoolctl/#setting-the-maximum-size-of-thread-pools, which is a valid link and needs to be added to linkcheck_ignore as explained in the issue description. You are welcome to work on it if you want!

@lesteve
Member Author

lesteve commented Dec 1, 2022

I reran make linkcheck and there are no broken links anymore, thanks a lot to everyone who worked on this issue!

@lesteve lesteve closed this as completed Dec 1, 2022