Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GetPage, ref and relref improvements #4796

Closed
wants to merge 6 commits into from
Closed

GetPage, ref and relref improvements #4796

wants to merge 6 commits into from

Conversation

vassudanagunta
Copy link
Contributor

@vassudanagunta vassudanagunta commented May 30, 2018

Below are a number of issues related to how one looks up or links to pages in Hugo.

  • Problems:
    • #4727: GetPage, rel and relref accept ambiguous refs without warning.
    • #4147 and #4130: ref and relref need to support all kinds of pages.
      Changes introduced in v0.33 make this a must as the use of refs rather than hardcoding is necessary for dynamic handling of changed URLs.
    • #4726: GetPage neither respects nor needs its kind arg. #4652 points out it isn't needed.
  • Nice-to-haves:
    • #4728: Option to support unix-style path semantics.

There was some discussion of this on the Hugo forum.

These issues beg to be resolved together rather than piecemeal

If you start working on the above issues, you quickly discover how much the solutions overlap or have dependencies with each other.

This is an opportunity to provide authors an improvement in how they author links and dependencies between content:

  • simple and consistent: Page.GetPage, Page.Ref and Page.RelRef, ref and relref should all take the same single arg with the same semantics.
  • natural: unix path semantics are well understood. The web is built on it. It is the most natural way to reference hierarchically organized content, but also works equally well for flat content with the use of relative paths. In fact, within a section, everything looks flat with relative paths, no matter how deeply nested the section is.
  • reliable: Authors manually check links before publishing. Once it is published, they will not check again. Authors need confidence that link targets don't change out from under them. Relative links are perfectly succinct while being unambiguous. But if ambiguous and "mostly stable" shorthand or even wiki-links are to be supported, authors should be warned when a link is ambiguous or when a change is made that causes a link to become ambiguous. The ambiguity detection in this PR provides that.
  • self-contained sections: Links between elements within a section of self-contained content should not have any coupling or dependencies on anything outside the section, including the location of the publish context of the section itself. Relative links provide that. Complex websites rely heavily on this principle, as does complex code (encapsulation).

Summary of Changes

As always i've broken down the changes into semantically discrete commits that are human parsable and can be reasoned about individually, and submitted them individually so tests are run against each.

Please see the comments on the individual commits and the linked issues for details. But here is a quick summary of the outward changes:

  • Site.GetPage, ref and relref now work across all pages. The kind is ignored.

  • There is a new Page.GetPage that, like ref and relref , takes a single ref arg. All three operate on a Page context and interpret the ref arg on that context in exactly the same way. Site.GetPage is supported for backward compatibility, but Page.GetPage should be used instead.

  • In addition to all of the previous types of refs you could pass to these methods, you can now pass absolute and relative refs with unix path semantics. e.g.:

    • /section/subsection/page.md
    • sibling-page.md
    • .. (parent section)
    • ../page-in-parent-section.md

    Absolute or relative refs are a way, and in some cases the only way, to avoid ambiguity. For example, if there are multiple content files named sibling-page.md, because of the continued support for shortcut refs you have to use a unix absolute path, or if you need a relative path, force the ref to be interpreted as such and not as a ambiguous shortcut ref with ./sibling-page.md.

    Site.GetPage has no page context so does not support relative refs; Use Page.GetPage.

  • If called with an ambiguous ref, an error with a helpful message will be generated, e.g.:

    The reference "help.md" in "/tutorial/_index.md" resolves to more than one page.
    Use either an absolute path (begins with "/") or relative path to the content
    directory target.
    

I ran benchmarks at three points (see commit comments). Performance differences swung both ways and are probably within the margin of benchmark error.

TODO

All the existing tests pass. That's pretty solid coverage. But I will add some additional tests once there is buy-off of this PR.

@bep
Copy link
Member

bep commented May 30, 2018

I have one important comment: Site.getPage needs to work as before, so adjusting million tests to some new format is counter-productive (it is much better that they stay as is for now to confirm the existing behaviour).

@vassudanagunta
Copy link
Contributor Author

vassudanagunta commented May 30, 2018

@bep I get your point and that's how I started to approach it, but then it didn't make sense. The only two callers outside of the tests are Site.GetPage and Site.refLink, both of which I need to accept a possible error return value, and both of which will not bother passing a kind arg. It doesn't make sense for the the prod code calling one version of getPage and the tests calling another.

Nearly all of the test changes are pure semantics preserving call signature changes: not passing in kind and accepting second error return value (ignoring it). It's pretty easy to scan the diff and see that.

As I'll explain in the full PR desc: all the commits will be discrete and easy to reason about.

Let me get all the commits in and then judge, please. I spent a lot of time on this to make it easier for you, literally 10 or more times the amount of time I spent on the implementation in my fork.

@bep
Copy link
Member

bep commented May 30, 2018

OK, I will wait, but I will repeat myself (because I think it's important):

This PR is (I think) about adding .Page.GetPage. It is not about changing the signature of .Site.getPage or .Site.GetPage. So to me, it would make sense to (at least now) keep the tests for .Site.getPage as-is (with kind argument).

As it is now you will need to write new tests for the tests you rewrite. Which does not sound optimal.

But up to you.

@vassudanagunta
Copy link
Contributor Author

@bep, The new Page.GetPage may have been your focus, but it wasn't mine. That method is rather a small natural consequence of the things I think are important: a simple, consistent, natural, reliable way to author links and dependencies between content, in a way that allows sections of content to be self-contained.

Please read the These issues beg to be resolved together rather than piecemeal section that I just added to the PR description above. If you agree with that section, then we can proceed with a code review where I address your implementation concerns. Please let me know.

@bep
Copy link
Member

bep commented Jun 1, 2018

All the existing tests pass. That's pretty solid coverage.

I see a lot of test adjustments related to this, which in my head isn't the same as having the existing tests pass.

My number one concern about this is breaking sites in the wild. Are there known situations where the "old" will not work? I.e. you would have to adjust your content/templates to get it right.

@bep
Copy link
Member

bep commented Jun 1, 2018

Note that the implementation itself looks solid (I have only quickly read through it), but it would be much easier to get an opinion on the side effects if you kept the old tests as is. I.e. created a new .Site.getNewPage or something.

@vassudanagunta
Copy link
Contributor Author

vassudanagunta commented Jun 2, 2018

@bep,

My number one concern about this is breaking sites in the wild. Are there known situations where the "old" will not work? I.e. you would have to adjust your content/templates to get it right.

The only known cases where anyone will have to make adjustments are:

  1. They use an ambigous ref. But the error message will make that clear and easy to fix:

    The reference "help.md" in "/tutorial/_index.md" resolves to more than one page. Use either an absolute path (begins with "/") or relative path to the content directory target.
    
  2. They rely on calls such as Site.GetPage "page" "name of section" to return nothing. That would be very strange and I think unsupported reliance. The ambiguity detection covers the case where a section and page have the same name.


I see a lot of test adjustments related to this, which in my head isn't the same as having the existing tests pass.

  1. All but two of the "adjustments" are trivial (see #4 and #5 below). The adjusted tests are not testing getPage, GetPage or the shortcodes, but are testing site builds, section nesting and other stuff.

    The likelihood of all those nuanced tests calling an updated getPage that returns the wrong page and still passing is virtually nil, so I'm certain those tests passing mean exactly what they did before. Biting the bullet and having these tests call the new getPage leaves the codebase cleaner, and the tests more transparent.


it would be much easier to get an opinion on the side effects if you kept the old tests as is. I.e. created a new .Site.getNewPage or something.

Yes, that's what I would have done by default, to make the diffs as minimal as possible (for your sake, not mine, since I know exactly what I did at each step). But...

  1. The old Site.getPage would call Site.getPageNew as follows (simplified for clarity):

    getPage (kind string, sections ...string) {
        if kind == "home" {sections = "/"}
        p, _ := getPageNew(nil, path.Join(sections))
        return p
    }

    This logic is exactly the logic in the existing/old getPage to covert the args to an index key. Nearly all of the changed lines in the tests are a trivial and exact inlining of this logic so they can call the new getPage directly, e.g. no kind arg, "home" is "/", pass a single path arg, pass nil for page context, ignore any returned error, e.g.:

    -   p := s.getPage(KindHome)
    +   p, _ := s.getPage(nil, "/")
    
    -   p := s.getPage(KindTaxonomy, "tags", "tag1")
    +   p, _ := s.getPage(nil, "tags/tag1")
    
  2. Besides these trivial adjustments, there are only a few necessary changes:

    • There are exactly two existing tests that had change getPage calls because they triggered ambiguity detection. This is as expected. I'll point this two out with code review.
    • I added some test cases topage_collections_test and site_test to specifically test the new functionality. These are the ONLY test files with significant changes. page_collections_test is also where there are existing tests that call Site.GetPage with the "legacy" args. I will add more, for additional confidence of backward compatibility.

I believe doing it this way leaves the code in better shape, at the cost of you spending 10 minutes (I think that's a fair estimate if you use a smart diff) to scan the test diffs to confirm they are trivial per #4 above.

How about I first add the test coverage I think should be added, then use the review tool to point explain the non-trivial test changes, and then, if you decide you still want me to do as you suggest, I'll do it with no more pushback? It will consume more of my time, but probably less than further debate :)

@vassudanagunta
Copy link
Contributor Author

vassudanagunta commented Jun 5, 2018

@bep https://github.com/upend/IF_MS_BUYS_GITHUB_IMMA_OUT and it's popularity has been taking up my free time. Would you mind reading my last comment and making a decision. I'll do what you decide if I have the time. I'd like to do it before I leave GitHub. At which point you will have gotten rid of me 😉

@bep
Copy link
Member

bep commented Jun 5, 2018

@vassudanagunta I'm on vacation where there is sun, so this issue is not on the top of my list ... And please stop posting that GitHub vs MS propaganda in the Hugo forums.

@vassudanagunta
Copy link
Contributor Author

vassudanagunta commented Jun 5, 2018

@bep, why do you have to be so unkind? I posted it once, and said I'd understand if it was deemed off-topic, and didn't protest or repost when it was shut down. In another interaction, rather than respecting a difference of opinion, you dismiss mine as being "dramatic" and essentially accuse me of being dishonest. And above, you dismiss my beliefs as "propaganda". And I have more experiences like this with you. Why do I bother contributing? Do you realize there is nothing in it for me? I don't need the hassle or the unkindness.

@bep
Copy link
Member

bep commented Jun 5, 2018

@vassudanagunta I asked you to stay on topic. I even said, "can you please". I see "politeness" and nothing "unkind" in that. I'm going to stop here.

@bep bep changed the title WIP: GetPage, ref and relref renovations WIP: GetPage, ref and relref improvements Jun 11, 2018
@bep
Copy link
Member

bep commented Jun 19, 2018

@vassudanagunta what is the status of this and the other WIP pull request you have open?

@vassudanagunta
Copy link
Contributor Author

vassudanagunta commented Jun 19, 2018

@bep I spent a lot of time explaining this PR in its description and then in a reply to your concerns. Likewise on the description of the other one. What more do you want me to say?

I did neither PR or any of the others I've submitted for my own benefit. If they are not valuable please just say so. I don't want to waste any more time. It's rather discouraging. It is so much cheaper and more efficient to just use my own fork.

@bep
Copy link
Member

bep commented Jun 19, 2018

No, just wanted to chime in and see if you are still active on GitHub,

@vassudanagunta
Copy link
Contributor Author

I'm minimizing my activity here. If there is a project that I'd like to help, and I have no other option, I'll do it through here. My own projects will move elsewhere.

@stp-ip
Copy link

stp-ip commented Jun 25, 2018

Thanks for all the effort. This would be a great addition. I just wanted to jump in and give my appreciation for the efforts and looking forward to see this integrated.

@bep
Copy link
Member

bep commented Jun 25, 2018

I will get back to this once I get some other major work out of the way ... should not be too long. This in general looks very good.

Unifies the core internal page index for all page kinds. This enables
the `ref` and `relref` shortcodes to support all pages kinds (#4147),
supports a new `GetPage` method with simplified signature (#4726 and
the way for the implementation of unix path semantics.

Fixes #4147

BenchmarkSiteBuilding/TOML,num_langs=1,num_root_sections=5,num_pages=500,num_tags=1,tags_per_page=5,shortcodes,render-4      132178709     131702904     -0.36%
BenchmarkSiteBuilding/TOML,num_langs=1,num_root_sections=5,num_pages=1000,num_tags=1,tags_per_page=5,shortcodes,render-4     246968331     248262520     +0.52%
BenchmarkSiteBuilding/TOML,num_langs=3,num_root_sections=5,num_pages=500,num_tags=1,tags_per_page=5,shortcodes,render-4      149885185     149436896     -0.30%
BenchmarkSiteBuilding/TOML,num_langs=3,num_root_sections=5,num_pages=1000,num_tags=1,tags_per_page=5,shortcodes,render-4     284289590     276022981     -2.91%
Lays the groundwork for ambiguity detection and unix path semantics by
changing the method signature for the core internal page lookups method
used by public API such as the `.GetPage` method and the `ref` and
`relref` shortcodes in advance of the more involved follow on commits.
The change has no functional impact.
Detects ambigous page lookup or link attempts made with ambiguous
references and produces an error with a helpful message when they
occur.

NOTE: Two preexisting tests fail because of ambiguity with no
unambiguous ref available. This will be addressed in the next step
of the overhaul.

Fixes #4727

BenchmarkSiteBuilding/TOML,num_langs=1,num_root_sections=5,num_pages=500,num_tags=1,tags_per_page=5,shortcodes,render-4      130331810     132914349     +1.98%
BenchmarkSiteBuilding/TOML,num_langs=1,num_root_sections=5,num_pages=1000,num_tags=1,tags_per_page=5,shortcodes,render-4     244420821     243607037     -0.33%
BenchmarkSiteBuilding/TOML,num_langs=3,num_root_sections=5,num_pages=500,num_tags=1,tags_per_page=5,shortcodes,render-4      149811675     148979860     -0.56%
BenchmarkSiteBuilding/TOML,num_langs=3,num_root_sections=5,num_pages=1000,num_tags=1,tags_per_page=5,shortcodes,render-4     272842157     280753633     +2.90%
Gives every page, including virtual ones, a unique, absolute ref.
Absolute refs are the absolute path rooted in the content directory,
with unix path form and semantics.

Used to unambiguously lookup (e.g. .GetPage) or links (e.g. ref/relref)
to any page when an absolute (as opposed to relative) reference makes
sense.

First half of fix for #4728.
Enables page lookups (e.g. .GetPage) or links (e.g. ref/relref) using
paths relative to the current page. Relative paths have unix path form
and semantics.

Used to unambiguously lookup (e.g. .GetPage) or links (e.g. ref/relref)
to any page when a relative (as opposed to absolute) reference makes
sense.

Fixes #4728
Add page relative .GetPage method to template API, enabling relative
page lookups from templates.

Has simple method signature and semantics consistent with ref/relref.
It does away with the `kind` param of `.Site.GetPage` and only takes a
simple ref string. This method should be used going forward instead of
`.Site.GetPage`, which may be deprecated sometime in the future.

Fixes #4726
Fixes #4652

BenchmarkSiteBuilding/TOML,num_langs=1,num_root_sections=5,num_pages=500,num_tags=1,tags_per_page=5,shortcodes,render-4      130232053     128963144     -0.97%
BenchmarkSiteBuilding/TOML,num_langs=1,num_root_sections=5,num_pages=1000,num_tags=1,tags_per_page=5,shortcodes,render-4     248140348     246210344     -0.78%
BenchmarkSiteBuilding/TOML,num_langs=3,num_root_sections=5,num_pages=500,num_tags=1,tags_per_page=5,shortcodes,render-4      149372926     151167558     +1.20%
BenchmarkSiteBuilding/TOML,num_langs=3,num_root_sections=5,num_pages=1000,num_tags=1,tags_per_page=5,shortcodes,render-4     273565538     278713990     +1.88%
@vassudanagunta
Copy link
Contributor Author

git rebase; push -f 2-month old commits

@vassudanagunta vassudanagunta changed the title WIP: GetPage, ref and relref improvements GetPage, ref and relref improvements Jul 16, 2018
@bep
Copy link
Member

bep commented Jul 16, 2018

git rebase; push -f 2-month old commits

Not sure I get it. Is it a problem to push 2 months old commits?

@bep bep mentioned this pull request Jul 17, 2018
@bep
Copy link
Member

bep commented Jul 17, 2018

Will finish this in #4955

@github-actions
Copy link

github-actions bot commented Feb 3, 2022

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants