Some URLs are reported invalid but are in fact OK #581

Closed
camilleroux opened this issue Sep 21, 2020 · 3 comments
Comments

@camilleroux

Here are some URLs that work in Firefox but are reported as invalid:

 https://doi.org/10.1007/s10072-020-04447-w is an invalid URL
 https://www.researchgate.net/profile/Dan_Ward9/publication/340539075_Sampling_Bias_Explaining_Wide_Variations_in_COVID-19_Case_Fatality_Rates/links/5ea67986299bf11256128426/Sampling-Bias-Explaining-Wide-Variations-in-COVID-19-Case-Fatality-Rates.pdf  is an invalid URL
 https://www.sciencedirect.com/science/article/pii/S1477893920303124 is an invalid URL
 https://doi.org/10.1038/s41586-020-2349-y is an invalid URL
 https://www.sciencedirect.com/science/article/pii/S0001879119301265?casa_token=h2tOm3iqTmYAAAAA:oNK0sZvDVXxE2O1Sr46BiC2zGLnhQ1kQt1DIhX0c-jq6yZsh7fw78cOK0t2zVkqcguscB4BdZ-c is an invalid URL
 https://kjhrm.sljol.info/articles/10.4038/kjhrm.v12i1.41/galley/76/download/ is an invalid URL
 https://doi.org/10.3201/eid2607.200885 is an invalid URL
 http://faculty.washington.edu/jdb/articles/Brown%20(2012,%20PSPB,%20motives).pdf is an invalid URL
 https://www.sciencedirect.com/science/article/pii/S0006291X20314169 is an invalid URL
 https://www.medrxiv.org/content/10.1101/2020.04.07.20045617v1 is an invalid URL
 https://obgyn.onlinelibrary.wiley.com/doi/full/10.1002/ijgo.13182 is an invalid URL
 https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death  is an invalid URL
 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7170399/  is an invalid URL
 https://doi.org/10.1016/S1473-3099(20)30287-5 is an invalid URL
 https://www.medrxiv.org/content/10.1101/2020.03.05.20031815v1.full.pdf is an invalid URL
@mazzystr

curl shows that https://doi.org/10.1007/s10072-020-04447-w is a redirect (HTTP 302 to link.springer.com):

$ curl -vvvv  https://doi.org/10.1007/s10072-020-04447-w | more
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 172.67.72.147...
* TCP_NODELAY set
* Connected to doi.org (172.67.72.147) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, [no content] (0):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [19 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2187 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [80 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=CA; L=San Francisco; O=Cloudflare, Inc.; CN=doi.org
*  start date: Aug  8 00:00:00 2020 GMT
*  expire date: Aug  8 12:00:00 2021 GMT
*  subjectAltName: host "doi.org" matched cert's "doi.org"
*  issuer: C=US; O=Cloudflare, Inc.; CN=Cloudflare Inc ECC CA-3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* TLSv1.3 (OUT), TLS app data, [no content] (0):
} [1 bytes data]
* TLSv1.3 (OUT), TLS app data, [no content] (0):
} [1 bytes data]
* TLSv1.3 (OUT), TLS app data, [no content] (0):
} [1 bytes data]
* Using Stream ID: 1 (easy handle 0x560c68bed740)
} [5 bytes data]
* TLSv1.3 (OUT), TLS app data, [no content] (0):
} [1 bytes data]
> GET /10.1007/s10072-020-04447-w HTTP/2
> Host: doi.org
> User-Agent: curl/7.61.1
> Accept: */*
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, [no content] (0):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [230 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [230 bytes data]
* TLSv1.3 (IN), TLS app data, [no content] (0):
{ [1 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 256)!
} [5 bytes data]
* TLSv1.3 (OUT), TLS app data, [no content] (0):
} [1 bytes data]
* TLSv1.3 (IN), TLS app data, [no content] (0):
{ [1 bytes data]
* TLSv1.3 (IN), TLS app data, [no content] (0):
{ [1 bytes data]
< HTTP/2 302 <-------------------------------------------------------------------------------
< date: Wed, 30 Sep 2020 22:21:34 GMT
< content-type: text/html;charset=utf-8
< content-length: 187
< set-cookie: __cfduid=d02d2de43e78bed33e2f9a78579db12081601504494; expires=Fri, 30-Oct-20 22:21:34 GMT; path=/; domain=.doi.org; HttpOnly; SameSite=Lax; Secure
< vary: Accept
< location: http://link.springer.com/10.1007/s10072-020-04447-w
< expires: Wed, 30 Sep 2020 22:38:59 GMT
< cf-cache-status: DYNAMIC
< cf-request-id: 0582b45aaa0000093576b17200000001
< expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< server: cloudflare
< cf-ray: 5db156711a4a0935-SEA
<
{ [187 bytes data]
* TLSv1.3 (IN), TLS app data, [no content] (0):
{ [1 bytes data]
100   187  100   187    0     0    979      0 --:--:-- --:--:-- --:--:--   979
* Connection #0 to host doi.org left intact
<html><head><title>Handle Redirect</title></head>
<body><a href="http://link.springer.com/10.1007/s10072-020-04447-w">http://link.springer.com/10.1007/s10072-020-04447-w</a></body></html>
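The `< ` lines in a verbose curl log are the response headers, so the redirect can also be pulled out mechanically instead of reading the whole log. This is a minimal sketch, assuming a POSIX shell with awk; `redirect_summary` is a hypothetical helper name, not part of curl or html-proofer.

```shell
# Hypothetical helper: read `curl -v` output on stdin and print, for each
# response, its status code and Location header ("-" if there is none).
redirect_summary() {
  awk '
    { sub(/\r$/, "") }                  # drop CR from CRLF header lines
    /^< HTTP\// {                       # a new response starts
      if (status != "") print status, loc
      status = $3; loc = "-"
    }
    tolower($0) ~ /^< location:/ { loc = $3 }
    END { if (status != "") print status, loc }
  '
}
```

For example, `curl -sv -o /dev/null https://doi.org/10.1007/s10072-020-04447-w 2>&1 | redirect_summary` should print `302 http://link.springer.com/10.1007/s10072-020-04447-w` for a response like the one captured above (verbose output goes to stderr, hence `2>&1`). Adding `-L` makes curl follow the redirect, and the helper then prints one line per hop.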

@tomtastic

Similar issue with https://www.namesilo.com/, which is behind Cloudflare and wants cookies enabled; querying it without them generates:

  *  External link https://www.namesilo.com/ failed: 403 No error
htmlproofer 3.14.1 | Error:  HTML-Proofer found 1 failure!
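A link checker cannot tell a Cloudflare cookie/JavaScript challenge from a genuinely broken link by status code alone. As a rough sketch, one could report such 403s as "unverifiable" rather than failed. This assumes, for illustration only, that the blocked response carries a `server: cloudflare` header (as the doi.org response in the curl log does); `classify_response` is a hypothetical helper, not an html-proofer feature.

```shell
# Hypothetical helper: summarize one response from `curl -v` output on stdin.
# Assumes a Cloudflare-fronted site identifies itself with "server: cloudflare".
classify_response() {
  awk '
    { sub(/\r$/, "") }                          # drop CR from CRLF header lines
    /^< HTTP\// { status = $3 }                 # status line, e.g. "< HTTP/2 403"
    tolower($0) ~ /^< server: *cloudflare/ { cf = 1 }
    END {
      if (status == "403" && cf) print "unverifiable (Cloudflare 403)"
      else if (status ~ /^2/)    print "ok"
      else if (status ~ /^3/)    print "redirect"
      else                       print "failed " status
    }
  '
}
```

This only changes how the result is labeled; it does not get past the challenge, which still requires a real browser session.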

@gjtorikian
Owner

I tried some of these and they worked fine. I did get a 403 for some others, and if that's because of missing cookies, as @tomtastic reports, then there's nothing that can be done. This project isn't a full-on browser, so there is no way to query a URL if the host server has decided to refuse non-browser clients.

But yeah, more information (full build logs, the environment it's being run under) would help debug it further.
