
Broken links in notebooks. #3702

Open
kmcnaught opened this issue Mar 15, 2017 · 16 comments

Comments

@kmcnaught

A few notebooks have web links that are not parsed correctly. Links missing the http[s]:// scheme are opened as paths relative to the notebook server. Email addresses need a 'mailto:' prefix if they are intended to be clickable links.

The following is a list of notebook files and 'bad' links.

doc/ipython-notebooks/clustering/GMM.ipynb
github.com/karlnapf
herrstrathmann.de

doc/ipython-notebooks/clustering/.ipynb_checkpoints/GMM-checkpoint.ipynb
github.com/karlnapf
herrstrathmann.de

ipython-notebooks/evaluation/xval_modelselection.ipynb
github.com/karlnapf
herrstrathmann.de

ipython-notebooks/statistical_testing/mmd_two_sample_testing.ipynb
github.com/karlnapf
herrstrathmann.de
soumyajitde.cse@gmail.com
github.com/lambday
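A minimal sketch for spotting links like the ones listed above, assuming the link targets have already been pulled out of Markdown `[text](target)` links. The hypothetical helpers `classify_link` and `fix_link` flag two cases: targets with no scheme whose first path segment looks like a hostname (a browser resolves these relative to the notebook server), and bare email addresses that lack `mailto:`. The hostname test is a rough heuristic, so a relative file link such as `GMM.ipynb` would be flagged too. Prepending `https://` also assumes the site serves HTTPS.

```python
import re
from urllib.parse import urlparse

# Rough heuristics (assumptions, not an exhaustive grammar):
# a bare email address, and a hostname-like first path segment.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")
HOST_RE = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}$")


def classify_link(target):
    """Return 'missing-mailto', 'has-scheme', 'missing-scheme' or 'relative'."""
    if EMAIL_RE.match(target):
        return "missing-mailto"
    parsed = urlparse(target)
    if parsed.scheme:
        return "has-scheme"
    # e.g. "github.com/karlnapf" parses as a relative path with no scheme
    first_segment = parsed.path.split("/", 1)[0]
    if HOST_RE.match(first_segment):
        return "missing-scheme"
    return "relative"


def fix_link(target):
    """Prefix 'mailto:' or 'https://' where the heuristics say it is missing."""
    kind = classify_link(target)
    if kind == "missing-mailto":
        return "mailto:" + target
    if kind == "missing-scheme":
        return "https://" + target
    return target
```

For example, `fix_link("github.com/karlnapf")` gives `"https://github.com/karlnapf"`. Any flagged link is still worth a quick manual look before it is committed.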

@karlnapf
Member

Thanks for reporting.
Nice entrance task for GSoC students.

@ghost

ghost commented Mar 15, 2017

@karlnapf I would like to work on this. I'm planning to apply for the real-world applications project. This doesn't count as the entrance task for any other project, right?

@karlnapf
Member

The entrance tasks are not really project-related, especially not easy ones like this.

@ghost

ghost commented Mar 25, 2017

@karlnapf I think this can be closed now.

@karlnapf
Member

How about you run a link checker first, and then we close? :)

@ghost

ghost commented Mar 26, 2017

I'm not sure I follow.

@karlnapf
Member

I meant: you could run an automated tool that verifies the links in all the notebooks in the repository. If broken ones are found, you send another patch, otherwise we close this issue.

@ghost

ghost commented Mar 26, 2017

Sounds interesting. I'll get right to it.

@rahul13ramesh

Here is a list of broken links that are not yet fixed:

https://gist.github.com/Red-devilz/67dee8c8afc2502202b16466ff6da225

@karlnapf
Member

Thanks for that! Really useful to have that list!

@bhavukkalra
Contributor

Were all the links verified?
Could you please provide the list of broken links mentioned above? The gist doesn't seem to open (it might be a broken link itself):
https://gist.github.com/Red-devilz/67dee8c8afc2502202b16466ff6da225

@vigsterkr
Member

vigsterkr commented Mar 10, 2020

@bhavukkalra the easiest way is to open the notebooks and try the links by hand... of course, there's a smarter way to do it: run a regex (for http://....) over the notebooks, extract the links, and try to fetch each one with curl, wget or any other command-line tool. If the status code is not 200, it's a broken link.
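The regex-extraction step described above could look roughly like this in Python. It assumes the notebooks use the nbformat 4 JSON layout (a top-level `cells` list, where each cell's `source` is either a string or a list of lines); `extract_links` and `extract_links_from_file` are hypothetical names. Excluding parentheses from the match keeps Markdown `[text](url)` targets clean, but it also truncates URLs that legitimately contain parentheses.

```python
import json
import re

# Absolute http(s) URLs; the match stops at whitespace, quotes, angle
# brackets and brackets/parentheses that usually close a Markdown/HTML link.
URL_RE = re.compile(r"https?://[^\s\"'<>()\[\]]+")


def cell_text(cell):
    """Join a cell's source, which nbformat stores as a string or list of lines."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def extract_links(notebook):
    """Return the unique http(s) URLs found in all cells, in first-seen order."""
    links = []
    for cell in notebook.get("cells", []):
        # Trailing sentence punctuation is almost never part of the URL.
        links.extend(u.rstrip(".,;:") for u in URL_RE.findall(cell_text(cell)))
    return list(dict.fromkeys(links))


def extract_links_from_file(path):
    """Load an .ipynb file (plain JSON) and extract its links."""
    with open(path, encoding="utf-8") as f:
        return extract_links(json.load(f))
```

Scanning code cells as well as Markdown cells catches URLs in comments and strings too. Cell outputs are not scanned in this sketch.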

@vigsterkr
Member

And in fact, if you write that shell script, could you please share it in this issue? Then we can actually integrate that check into our CI ;)

@bhavukkalra
Contributor

Sure.
So the plan is: a script that extracts the links from the notebooks into a file, then runs curl on each of them and prints every link along with whether it is broken.
Could you please confirm?
Also, can I do this with a Python script instead of a shell script, or is a shell script required?

@vigsterkr
Member

@bhavukkalra yes... but there's no need to generate a file. Just parse the notebooks, get the links, test them, and print the ones that are broken. And of course you can use Python for this, whichever is easiest for you.
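A sketch of that whole check in Python, using only the standard library. It walks a directory for `.ipynb` files, skipping `.ipynb_checkpoints` copies (which duplicate the real notebooks, as in the list at the top of this issue). It then extracts the http(s) URLs and prints every one that does not answer with HTTP 200. The default root `doc/ipython-notebooks`, the User-Agent string and all function names are assumptions. `main` returns 1 when anything is broken, so a CI job can fail on it.

```python
import http.client
import json
import re
import urllib.error
import urllib.request
from pathlib import Path

# Absolute http(s) URLs; stops at whitespace, quotes and brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>()\[\]]+")


def links_in(notebook):
    """Collect unique http(s) URLs from all cell sources of a notebook dict."""
    found = []
    for cell in notebook.get("cells", []):
        src = cell.get("source", "")
        text = "".join(src) if isinstance(src, list) else src
        found += [u.rstrip(".,;:") for u in URL_RE.findall(text)]
    return list(dict.fromkeys(found))


def check_link(url, timeout=10.0):
    """Return None if the URL answers with HTTP 200, else a short reason."""
    try:
        req = urllib.request.Request(url, headers={"User-Agent": "nb-link-check"})
        # urlopen follows redirects and raises HTTPError for 4xx/5xx answers
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return None if resp.status == 200 else f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        return f"HTTP {err.code}"
    except (OSError, ValueError, http.client.HTTPException) as err:
        # URLError (DNS failure, refused connection), timeouts, malformed URLs
        return f"error: {err}"


def find_notebooks(root):
    """All .ipynb files under root, skipping Jupyter checkpoint copies."""
    return sorted(p for p in Path(root).rglob("*.ipynb")
                  if ".ipynb_checkpoints" not in p.parts)


def main(root="doc/ipython-notebooks"):
    """Print every broken link; return 1 if any were found (for CI), else 0."""
    broken = 0
    for path in find_notebooks(root):
        with open(path, encoding="utf-8") as f:
            notebook = json.load(f)
        for url in links_in(notebook):
            reason = check_link(url)
            if reason is not None:
                broken += 1
                print(f"{path}: {url} ({reason})")
    return 1 if broken else 0
```

From a CI script, this could be run as `sys.exit(main())` after importing `sys`. Some sites answer automated clients with 403 or 429, so those codes may need to be reported as warnings rather than failures.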

@bhavukkalra
Contributor

I was able to extract the links from an IPython notebook file.
However, the curl command seems to succeed on made-up (broken) links as well, so it isn't giving reliable results,
for example for the link https://www.shoguntoolbox.org/api/latest/classshogun_1_1DenseFatures.htm
Should we use an external library for this, for example
https://pypi.org/project/LinkChecker/
or do we want to avoid external libraries and write this from scratch?
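There are two possible explanations for curl "succeeding" on a made-up URL, though neither is confirmed in this thread. First, curl exits with status 0 even for an HTTP 404 unless `--fail` is passed, in which case it exits with 22. Second, some servers answer a missing page with a redirect to another page, or with a 200 "soft 404". Printing the final status code with `curl -s -o /dev/null -L -w "%{http_code}" URL` addresses the first case. The hypothetical `verdict` helper below also flags the redirect variant of the second case by comparing the requested host and path with the final ones. A server that returns 200 for a missing page without redirecting cannot be detected this way. This sketch sticks to the standard library. LinkChecker remains an option if more features are wanted.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def fetch(url, timeout=10.0):
    """Return (status_code, final_url). urlopen follows redirects; network
    errors other than HTTP error responses propagate to the caller."""
    req = urllib.request.Request(url, headers={"User-Agent": "nb-link-check"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.geturl()
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers arrive as exceptions carrying the status code
        return err.code, url


def verdict(requested, status, final_url):
    """Classify a fetch result as ok, broken, or a suspicious redirect."""
    if status != 200:
        return f"broken (HTTP {status})"
    a, b = urlsplit(requested), urlsplit(final_url)
    # A redirect to a different host or path often signals a soft 404,
    # e.g. a missing documentation page sending visitors to a front page.
    if (a.netloc, a.path.rstrip("/")) != (b.netloc, b.path.rstrip("/")):
        return f"suspicious (redirected to {final_url})"
    return "ok"
```

A typical call would be `verdict(url, *fetch(url))`. Because the scheme is not compared, a plain http-to-https redirect on the same host and path still counts as ok.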


5 participants