Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix filename clashes between resources and functions. #12453

Merged
merged 1 commit into from Mar 29, 2023

Conversation

dixler
Copy link
Contributor

@dixler dixler commented Mar 17, 2023

Description

Docs paths are lowercased so functions and resources may have name clashes. This PR adds behavior to docs generation to prefix docs with mod, res, fn if there is a name conflict between documentation types in that order.

Fixes #10495

Checklist

  • I have added tests that prove my fix is effective or that my feature works
  • I have run make changelog and committed the changelog/pending/<file> documenting my change
  • Yes, there are changes in this PR that warrants bumping the Pulumi Service API version

@pulumi-bot
Copy link
Contributor

pulumi-bot commented Mar 17, 2023

Changelog

[uncommitted] (2023-03-29)

Bug Fixes

  • [docs] Fix filename clashes between resources and functions on case-insensitive filesystems in docsgen.
    #12453

@dixler dixler force-pushed the dixler/10495/case-insensitive-docsgen branch 2 times, most recently from c3c4e24 to c3119f5 Compare March 17, 2023 19:24
@dixler dixler requested review from Zaid-Ajaj and a team March 17, 2023 19:34
@dixler dixler marked this pull request as ready for review March 17, 2023 19:34
@dixler dixler requested review from Frassle and justinvp March 17, 2023 19:53
@cnunciato cnunciato self-requested a review March 21, 2023 17:35
@cnunciato
Copy link
Member

We should be careful here as it looks like these changes would result in API docs pages living at different URLs, which would result in a whole lot of 404s without a solid redirect strategy.

@dixler dixler force-pushed the dixler/10495/case-insensitive-docsgen branch from c3119f5 to 0768ffb Compare March 22, 2023 17:32
Copy link
Contributor

@abhinav abhinav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't match what we discussed in Slack, but precedence-based conflict resolution is also reasonable as long as the conflicts can only be across those boundaries.

However, unless I'm misreading this, we could also see conflicts if two functions both have similar names with different casing: GetName and getName. This will use "fn-getname" for one and "getname" for the other, and I don't know if we have a guarantee that which function gets the prefix is always the same.

Comment on lines 1763 to 1768
var namePrefixed string
for _, prefix := range prefixes {
namePrefixed = prefix + name
if _, exists := seen[namePrefixed]; exists {
// Unset namePrefixed because it isn't valid.
namePrefixed = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Set the variable and then unset it like that is a bit off in terms of flow.
How about:

var found bool
for _, prefix := range prefixes {
  candidate := prefix + name
  if _, exists := seen[candidate]; exists {
    continue
  }
  name = candidate
  found = true
  seen[name] = struct{}
  break
}

if !found {
  return
}

The target is set only if the candidate is valid.

(My example doesn't have a namePrefixed because we don't use the original "name" variable again after this point, and the prefixed version effectively replaces it.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar names with different casing: GetName and getName.

I feel like this pattern of naming doesn't exist in practice (and shouldn't exist in practice) because it's not idiomatic in any supported language. I would expect if someone created a provider with these names in the schema, we should encourage them to get a little more creative with their names. I would expect we could issue guidance around naming conventions in the schema to mitigate this concern. Offering guidance feels like something we should do to maintain the ecosystem's health and ensure quality in our providers.

Just my 2c, feel free to ignore me :)

Copy link
Contributor

@abhinav abhinav Mar 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this pattern of naming doesn't exist in practice (and shouldn't exist in practice) because it's not idiomatic in any supported language

You'd think so! It's definitely not idiomatic, but it's not unsupported—in that it's not impossible to do. There's nothing stopping one from doing this.
Story time: A service schema system I maintained in the past made the same assumption. It normalized getName to GetName because it was generating Go code. Some users started reporting broken code specifically because they had two fields with different casing for the same name. The reasons of why they did that was not important, but a workaround was necessary. (We had the ability for fields to be annotated. We basically added the ability for you to set an annotation on conflicting fields to specify an alternative Go-level name.)

In general, if we're taking user input from one set of possibilities and normalizing it into a smaller set of possibilities, we have to account for conflicts.

Comment on lines 1773 to 1777
if name != "" && namePrefixed == "" {
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause a silent failure if by some chance or bug, we cannot get a unique name for something.
We should return an error here, and have the addFile callers fail if an error is returned.

Copy link
Contributor Author

@dixler dixler Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a hard failure on this could change the behavior for downstream users of this such as the registry. I'm not sure if we should modify the behavior in this way.

Copy link
Contributor

@abhinav abhinav Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't fail here, at least a warning is merited.
Silently skipping files will just make these conflicts harder to debug for users.
A bug report that includes the warning in the log output will be easier for us too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's in codegen, I think it's okay if we fail in the registry. If that happens, the Docs team will open a fix or a P1, and we can resolve that before they upgrade.

A warning might also suffice here; does anyone review the logs during registrygen? I think the answer is no. If it were a manual task, then we'd guarantee eyes on the warning.

Again, just my 2c. Not my hill to die on. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, a failure would do too. A warning is slightly extra steps, but it's still more than silence.
Silently ignoring the failure is the only case I'd push back against.

@dixler
Copy link
Contributor Author

dixler commented Mar 22, 2023

However, unless I'm misreading this, we could also see conflicts if two functions both have similar names with different casing: GetName and getName. This will use "fn-getname" for one and "getname" for the other, and I don't know if we have a guarantee that which function gets the prefix is always the same.

From my understanding, this indeterminate behavior already existed, but this PR disambiguates only across module, resource, and function docs.

I think that the number of potential URLs being changed in the registry by this PR could be high and I'm not sure if we want to stack that change in behavior on top of it for the SEO reasons that @cnunciato mentions.

@abhinav
Copy link
Contributor

abhinav commented Mar 22, 2023

I think that the number of potential URLs being changed in the registry by this PR could be high and I'm not sure if we want to stack that change in behavior on top of it [..]

I'm not suggesting we rename all entities.
In Slack, the rough direction we were discussing was:
For all files that conflict, rename and disambiguate them, but only them.
This PR diverges from that slightly in that the first of the files in a conflict group will not be renamed, but the ones that follow will be renamed—provided that they're a different kind (function, module, etc.).
I don't have a strong preference for the originally discussed behavior ("rename all conflicting files") versus what this PR implements right now.

I'm suggesting we increase the scope of this change from the conflict from the original bug report to conflicts in these file names in general.
I suspect it's not much more added complexity to change the conflict resolution code to append a 1, 2, 3 suffix for same-kind conflicts. For example:

seen := make(map[string]struct{})
nameFor := func(name, kind string) (result string) {
    // Always put the final result into the map.
    defer func() { seen[result] = struct{}{} }()

    if _, ok := seen[ok]; !ok {
        return name // no conflict
    }

    name := kind + "-" + name // pkg-foo
    if _, ok := seen[ok]; !ok {
        return name
    }

    // Still conflicts.
    // Might have "FooBar" and "Foobar" in the same package.
    i := 2
    for {
        candidate := name + strconv.Itoa(i)
        if _, ok := seen[candidate]; !ok {
            return candidate
        }
        i++
    }
}

This will guarantee a unique lower case name for all cases—no possibility of error (which also makes the other error discussion moot).
If there's no conflict, the original name is preferred.
If there's a cross-kind conflict, pkg-foo, fn-foo is preferred.
If there's an in-kind conflict, the first will use fn-foo, and the next fn-foo1, fn-foo2, and so on.

The last piece for this conflict resolution is this: deterministic ordering.
You already implemented cross-kind determinism: module, resource, function.
I'm suggesting we also need to handle in-kind determinism.
To do that, we just need a guarantee that the iteration order of mod.resources and mod.functions is fixed.
If we don't have that guarantee from our input contract, then we can enforce it by sorting them before iterating.
e.g.:

resources := mod.resources
sort.Slice(resources, func(i, j int) bool {
  return resourceName(resources[i]) < resourceName(resources[j])
})
for _, r := range resources {
  // ...

This will guarantee that we always generate files for resources in a fixed order,
so we always encounter conflicting lower case names in the same order,
and so we always resolve them to the same unique names.

@dixler
Copy link
Contributor Author

dixler commented Mar 22, 2023

I suspect it's not much more added complexity to change the conflict resolution code to append a 1, 2, 3 suffix for same-kind conflicts. For example:

I like this change, but it suffers from the potential for a provider upgrade inserting a new conflict and changing the ordering.

foobar -> foobar-1 -> foobar-1
fooBar -> foobar-2 -> foobar-2
          // New resource was added in a provider upgrade and changes th
          Foobar     ->  foobar-3 // foobar-3 URL now goes from FooBar to Foobar.
FooBar -> foobar 3 -> foobar-4

I think it's worth adding the log message and postponing this until we're aware of a user running into this issue for same-kind conflicts because this does make decisions around API and registry docs URL naming

@Frassle
Copy link
Member

Frassle commented Mar 22, 2023

I think it's worth adding the log message and postponing this until we're aware of a user running into this issue for same-kind conflicts because this does make decisions around API and registry docs URL naming

I'm happy with postponing that part of the change. By rights schema's shouldn't even allow functions or types to differ just on case (and mine and Daniels idea for new naming system would enforce that).

@dixler dixler force-pushed the dixler/10495/case-insensitive-docsgen branch from 0768ffb to 68e3774 Compare March 22, 2023 20:32
@dixler dixler requested a review from abhinav March 22, 2023 20:54
@@ -0,0 +1,43 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add a conflicting module to this as well to check that also works as expected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look good?

pkg/codegen/docs/gen.go Outdated Show resolved Hide resolved
@dixler dixler changed the title Fix filename clashes between resources and functions on case-insensitive filesystems. Fix filename clashes between resources and functions. Mar 23, 2023
@abhinav
Copy link
Contributor

abhinav commented Mar 23, 2023

I think it's worth adding the log message and postponing this until we're aware of a user running into this issue for same-kind conflicts because this does make decisions around API and registry docs URL naming

That sounds like a plan.
We can defer trying to solve all conflicts until folks actually run into it.

By rights schema's shouldn't even allow functions or types to differ just on case (and mine and Daniels idea for new naming system would enforce that).

👍

Copy link
Contributor Author

@dixler dixler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming is hard so I'd appreciate name suggestions.

I think the approach is good and I just need to:

  • correctness/writing tests
  • tidying up the names

This is a 2000+ line file with globals and a lot of stateful stuff.

@dixler dixler force-pushed the dixler/10495/case-insensitive-docsgen branch 2 times, most recently from 752c40d to 4d85a15 Compare March 27, 2023 06:33
pkg/codegen/docs/gen.go Show resolved Hide resolved
pkg/codegen/docs/gen.go Outdated Show resolved Hide resolved
pkg/codegen/docs/gen.go Outdated Show resolved Hide resolved
pkg/codegen/docs/gen.go Show resolved Hide resolved
pkg/codegen/docs/gen.go Outdated Show resolved Hide resolved
pkg/codegen/docs/gen.go Outdated Show resolved Hide resolved
pkg/codegen/docs/gen.go Outdated Show resolved Hide resolved
pkg/codegen/docs/gen.go Outdated Show resolved Hide resolved
pkg/codegen/docs/gen.go Outdated Show resolved Hide resolved
pkg/codegen/docs/gen.go Outdated Show resolved Hide resolved
Copy link
Contributor

@abhinav abhinav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underlying logic is a bit messy, but this change looks fine.

pkg/codegen/docs/gen.go Outdated Show resolved Hide resolved
@dixler
Copy link
Contributor Author

dixler commented Mar 28, 2023

Ok. I've added a comment to better label what interface{} is expecting. I've also added a resource with a conflicting module to my docs-collision test. Going to squash and merge. If it looks good to you @Frassle, feel free to bors merge it.

@dixler dixler force-pushed the dixler/10495/case-insensitive-docsgen branch from f58ec54 to dcec674 Compare March 28, 2023 02:18
@dixler dixler requested a review from Frassle March 28, 2023 02:18
@cnunciato
Copy link
Member

cnunciato commented Mar 28, 2023

Before we merge, would y'all object to running a diff of the filesystem before and after a docs build with these changes? Just want to be sure we err on the side of caution here; I’d like it to be clear what actually changes as a result of this change, and ideally get redirects in place if it seems warranted. I'm happy to take a swing at that now if you think this is close to being done.

pkg/codegen/docs/gen.go Show resolved Hide resolved
pkg/codegen/docs/gen.go Outdated Show resolved Hide resolved
@dixler dixler force-pushed the dixler/10495/case-insensitive-docsgen branch from 6d1d5c3 to f82aa62 Compare March 29, 2023 15:10
@cnunciato
Copy link
Member

If it helps, here's a draft PR (feel free to push to it if you like) that runs a full build based a commit hash from this repo: pulumi/docs#8787

@dixler dixler force-pushed the dixler/10495/case-insensitive-docsgen branch from f82aa62 to e8d68a7 Compare March 29, 2023 19:56
name.

Docs URLs are case-insensitive so modules, functions and resources may have
name collisions. If a filename is taken by another class of document, it
will be prefixed with "mod", "res", "fn" and docs will point to this new
unique link.

The priority is as follows:
1. module
2. resource
3. function
@dixler
Copy link
Contributor Author

dixler commented Mar 29, 2023

I've talked with @cnunciato about potential docs changes and we don't see anything different. I don't see any significant outstanding work on this and I have an approval so I'm going to merge this PR.

Feel free to bors cancel the merge if any concerns arise. 🙂

@dixler
Copy link
Contributor Author

dixler commented Mar 29, 2023

bors merge

@bors
Copy link
Contributor

bors bot commented Mar 29, 2023

Build succeeded:

@bors bors bot merged commit 7e8df99 into master Mar 29, 2023
49 of 54 checks passed
@bors bors bot deleted the dixler/10495/case-insensitive-docsgen branch March 29, 2023 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants