New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: static-checks: Try multiple user agents #8592
CI: static-checks: Try multiple user agents #8592
Conversation
When I run this locally, all Kata doc URLs are reported as valid apart from https://developer.amd.com/sev/ which returns HTTP 301 ("Moved Permanently"). That should be fine since we request following redirects and in fact Interestingly, I really hope we don't need to cycle between |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jodh-intel thanks for fixing this! I confirmed it works locally with the exception of https://developer.amd.com/sev/.
I am going to port 2 commits to the tests repo, the curl and multiple user agents one (d20822c and 404162c).
Note: We will not be maintaining 2 separate static checks scripts beyond these changes because it will be running from GHA after #8595. However, I am porting them in favor of getting #6826 merged and in case testing takes longer than expected.
d20822c
to
f2ee761
Compare
/test |
Only attempt to build the markdown checker if it doesn't already exist. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Break up a long line as little to make it easier to read. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Check that the `check_url()` parameters have a value. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Declare and then define a couple of variables separately. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Split the call to `curl` in the URL checker out into a new `run_url_check_cmd()` function to make `check_url()` slightly clearer. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Remove some extraneous whitespace. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Removed the Azure Portal URL (https://portal.azure.com) since this causes problems with our static checks script: that URL returns HTTP 403 ("Forbidden") when queried using command-line tools like `curl(1)`, which is used by the static check script. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Make the URL checker cycle through a list of user agent values until we hit one the remote server is happy with. This is required since, unfortunately, we really, really want to check these URLs, but some sites block clients based on their `User-Agent` (UA) request header value. And of course, each site is different and can change its behaviour at any time. Our strategy therefore is to try various UA's until we find one the server accepts: - No explicit UA (use `curl`'s default) - Explicitly no UA. - A blank UA. - Partial UA values for various CLI tools. - Partial UA values for various console web browsers. - Partial UA for Emacs's built-in browser. - The existing UA which is used as a "last ditch" attempt where the UA implies multiple platforms and browser. > **Notes:** > > - The "partial UA" values specify specify the UA "product" but not the > UA "product version": we specify `foo` and not `foo/1.2.3`). We do > this since most sites tested appear to not care about the version. > This is as expected given that the version is strictly optional (see `[*]`). > > - We now log all errors and display an error summary if none of the UAs > worked, in addition to the simple list of the URLs we believe to be > invalid. This should make future debugging simpler. `[*]` - https://www.rfc-editor.org/rfc/rfc9110#section-10.1.5 Fixes: kata-containers#8553. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
f2ee761
to
5d085a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
branch updated (and simplified) - ptal! ;)
Simpler indeed. :-) Thanks @jodh-intel !
Make the URL checker cycle through a list of user agent values until we
hit one the remote server is happy with.
This is required since, unfortunately, we really, really want to check
these URLs, but some sites block clients based on their
User-Agent
(UA) request header value. And of course, each site is different and can
change its behaviour at any time.
Our strategy therefore is to try various UA's until we find one the
server accepts:
curl
's default)[*]
- https://www.rfc-editor.org/rfc/rfc9110#section-10.1.5Fixes: #8553.
Signed-off-by: James O. D. Hunt james.o.hunt@intel.com