Skip to content

Conversation

@nik9000
Copy link
Member

@nik9000 nik9000 commented Mar 22, 2023

The Elasticsearch links are in a funny, elasticsearch specific spot in a json file. This digs them out of the json file in the most perl way I could think of. But it's compatible with the link checker. And checks the links!

The Elasticsearch links are in a funny, elasticsearch specific spot in a
json file. This digs them out of the json file in the most perl way I
could think of. But it's compatible with the link checker. And checks
the links!
@nik9000 nik9000 requested a review from gtback March 22, 2023 21:20
@nik9000
Copy link
Member Author

nik9000 commented Mar 23, 2023

@gtback does it look right now? Well, as right as something without a lot of maintenance can be.

Copy link
Member

@gtback gtback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me and the tests are passing, so IMO this is fine to merge. I suggested adding a few clarifying comments so people in the future (maybe even us!) don't have to scrutinize Perl code more than necessary.

It's likely possible to write some integration tests for this, like we have for Kibana link checking, but if you'd rather just get this merged I won't object 😉 .

my $extractor = sub {
my $contents = shift;
return sub {
while ( $contents =~ m!"([^"\#]+)(?:\#([^"]+))?"!g ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing filenames out of JSON with Regular expressions is fun :-), but not any worse than what we're already doing with TypeScript for Kibana.

If I had to summarize what this did, I would say it:

  • pulls out any quoted elements in the JSON, splitting off any URL fragment (part after the "#" if it exists)
  • If the path contains "html" consider it a path to a file that should get checked.

Is that about right? Is it worth adding a comment with a bit of this detail?

This is probably robust enough. Looking at the current file the keys are all UPPER_SNAKE_CASE so won't match the lower case "html", correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. a comment is good. I could pull in a json parsing library easy enough. but then I'd have to emit this into a line per url to check and use a url parser to see it because the link checker really wants that. I think. It does feel dirty. But you got it right. Including the SHOUTING_SNAKE_CASE defense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of questions:

  • Would a different format in the ES source make this more robust? I used JSON mainly because JSON is easy in ES, but we could do something else for sure. YAML permits unquoted strings for instance. Or, have we run any Gradle stuff at this point? If so we could have a Gradle task that extracts just the bits you need into a more pleasant format.
  • Does this mean we do not validate the fragment ID (the bit after the #)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One line per link in a file at the top level would amazing. But we can deal. I really could have imported a json parser, but it would have made more work in perl which I'm not familiar with.

We are certainly supposed to be testing the fragment. We should be selecting it out here, but I admit to having fought with the silly regex for longer than I'd like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are certainly supposed to be testing the fragment.

Ok that's good enough for me :) I admit I gave up trying to parse the regex myself.

Copy link
Contributor

@DaveCTurner DaveCTurner Mar 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I thought I'd just try this in elastic/elasticsearch#94794 elastic/elasticsearch#94815 and it looks like the docs build passes even though I used an invalid fragment. It does check the bit before the # tho.

Edit: opened a dedicated test PR at elastic/elasticsearch#94815

@nik9000
Copy link
Member Author

nik9000 commented Mar 23, 2023

It's likely possible to write some integration tests for this, like we have for Kibana link checking, but if you'd rather just get this merged I won't object wink .

Yeah. I felt bad not writing them. but not that bad....

@nik9000
Copy link
Member Author

nik9000 commented Mar 24, 2023

@elasticmachine test this please

@nik9000 nik9000 merged commit 698f9dd into master Mar 27, 2023
nik9000 added a commit that referenced this pull request Mar 27, 2023
nik9000 added a commit that referenced this pull request Mar 27, 2023
nik9000 added a commit that referenced this pull request Mar 27, 2023
@gtback gtback deleted the test_es_links branch March 28, 2023 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants