Test Elasticsearch's links #2631
The Elasticsearch links are in a funny, Elasticsearch-specific spot in a JSON file. This digs them out of the JSON file in the most Perl way I could think of. But it's compatible with the link checker. And it checks the links!
@gtback does it look right now? Well, as right as something without a lot of maintenance can be.
gtback
left a comment
The code looks good to me and the tests are passing, so IMO this is fine to merge. I suggested adding a few clarifying comments so people in the future (maybe even us!) don't have to scrutinize Perl code more than necessary.
It's likely possible to write some integration tests for this, like we have for Kibana link checking, but if you'd rather just get this merged I won't object 😉 .
    my $extractor = sub {
        my $contents = shift;
        return sub {
            while ( $contents =~ m!"([^"\#]+)(?:\#([^"]+))?"!g ) {
Parsing filenames out of JSON with regular expressions is fun :-), but not any worse than what we're already doing with TypeScript for Kibana.
If I had to summarize what this does, I would say it:
- pulls out any quoted elements in the JSON, splitting off any URL fragment (the part after the "#", if it exists)
- treats any path that contains "html" as a path to a file that should get checked.
Is that about right? Is it worth adding a comment with a bit of this detail?
This is probably robust enough. Looking at the current file, the keys are all UPPER_SNAKE_CASE, so they won't match the lowercase "html", correct?
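To make the summary above concrete, here is a minimal Python sketch of the same idea. It is not the Perl code from the diff: the pattern is carried over into Python's `re` syntax, and the `extract_links` helper, the sample keys, and the sample file names are all made up for illustration. It assumes the "contains html" filter works as described in the two bullets.

```python
import re

# The pattern from the diff, in Python's `re` syntax:
# a quoted string, with an optional "#fragment" captured separately.
LINK_RE = re.compile(r'"([^"#]+)(?:#([^"]+))?"')

def extract_links(contents):
    """Yield (path, fragment) for each quoted string whose path contains "html"."""
    for match in LINK_RE.finditer(contents):
        path, fragment = match.group(1), match.group(2)
        # Keys are quoted strings too, but UPPER_SNAKE_CASE keys
        # never contain the lowercase "html", so they are skipped here.
        if "html" in path:
            yield path, fragment

# Hypothetical sample; the real file's keys and values may differ.
SAMPLE = '''{
  "SOME_LINK_KEY": "some-page.html#some-anchor",
  "OTHER_LINK_KEY": "other-page.html"
}'''

links = list(extract_links(SAMPLE))
```

With this sample, `links` holds the two values and none of the keys, and the fragment is `None` when the value has no "#". The sketch inherits the fragility discussed in this thread: a key that happened to contain "html", or an empty string that knocks the quote pairing out of step, would produce wrong results.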
Yeah, a comment is good. I could pull in a JSON parsing library easily enough, but then I'd have to emit this as one line per URL to check and use a URL parser to read it, because the link checker really wants that. I think. It does feel dirty. But you got it right, including the SHOUTING_SNAKE_CASE defense.
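For comparison, the parser-based alternative mentioned here might look roughly like the following Python sketch. It assumes the links are plain string values somewhere in the JSON document; the `iter_strings` and `links_one_per_line` names and the tab-separated output format are invented for this example and are not what the link checker actually consumes.

```python
import json
from urllib.parse import urlsplit

def iter_strings(node):
    """Recursively yield every string value in parsed JSON (keys are skipped)."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)

def links_one_per_line(contents):
    """Return one "path<TAB>fragment" line per string value that looks like a link."""
    lines = []
    for value in iter_strings(json.loads(contents)):
        parts = urlsplit(value)  # separates the path from the "#fragment"
        if "html" in parts.path:
            lines.append(f"{parts.path}\t{parts.fragment}")
    return lines
```

Because only values are visited, this version does not depend on the keys being upper case. The cost is the one described above: an extra step that flattens the JSON into one line per URL before anything can be checked.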
Couple of questions:
- Would a different format in the ES source make this more robust? I used JSON mainly because JSON is easy in ES, but we could do something else for sure. YAML permits unquoted strings for instance. Or, have we run any Gradle stuff at this point? If so we could have a Gradle task that extracts just the bits you need into a more pleasant format.
- Does this mean we do not validate the fragment ID (the bit after the #)?
One line per link in a file at the top level would be amazing. But we can deal. I really could have imported a JSON parser, but it would have made more work in Perl, which I'm not familiar with.
We are certainly supposed to be testing the fragment. We should be selecting it out here, but I admit to having fought with the silly regex for longer than I'd like.
We are certainly supposed to be testing the fragment.
Ok that's good enough for me :) I admit I gave up trying to parse the regex myself.
Hmm I thought I'd just try this in elastic/elasticsearch#94794 elastic/elasticsearch#94815 and it looks like the docs build passes even though I used an invalid fragment. It does check the bit before the # tho.
Edit: opened a dedicated test PR at elastic/elasticsearch#94815
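For reference, validating a fragment amounts to confirming that the target page contains an element with a matching `id`. The Python sketch below illustrates that check only; it is not how the docs link checker works, and `IdCollector` and `fragment_exists` are hypothetical names introduced for the example.

```python
from html.parser import HTMLParser

class IdCollector(HTMLParser):
    """Collect the value of every id attribute seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)

def fragment_exists(html_text, fragment):
    """Return True if the fragment is empty or matches an id in the page."""
    if not fragment:
        return True  # a link without a fragment has nothing more to check
    collector = IdCollector()
    collector.feed(html_text)
    return fragment in collector.ids
```

A link such as `some-page.html#some-anchor` would then pass only if `some-page.html` exists and `fragment_exists` returns `True` for `some-anchor`. That second condition is the part that appears to be skipped for the Elasticsearch links.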
Yeah. I felt bad not writing them. but not that bad....
@elasticmachine test this please