Tweak diagnostics with invalid UTF-8 so they can pass over the wire #237
Hey @apparentlymart 👋 On the surface, this change seems reasonable for inclusion. I'm going to label this as a bug fix, since we should certainly provide a better experience than raising a relatively meaningless error to practitioners in this situation.
Would you mind adding a
```release-note:bug tfprotov5: Prevented diagnostic "string field contains invalid UTF-8" errors by rewriting with valid replacement characters ``` ```release-note:bug tfprotov6: Prevented diagnostic "string field contains invalid UTF-8" errors by rewriting with valid replacement characters ```
It may also be worth noting that more recent versions of terraform-plugin-go (v0.10.0), terraform-plugin-sdk (v2.18.0), terraform-plugin-framework (v0.10.0), and terraform-provider-azurerm (v3.14.0) should also have additional diagnostics logging that occurs before the response is sent: #203
Maybe those newer logs can provide some additional insight while we go through the motions to getting this through the various releases to get it in front of practitioners.
Personally, I'm also wondering if this is a good time as any to introduce native Go fuzz testing for a few code paths in this Go module so we can try to tease out any additional oddities or panics. I'll create a placeholder issue since I thought I had already done that in the past.
Oh yes... sorry I forgot about the changelog automation in this repository. I've added a changelog file to my commit.
It does look like those additional log lines would help unmask whatever is going on in hashicorp/terraform-provider-azurerm#17563, by printing the same diagnostic information in the log stream before trying to send it over the wire.
My intent with this PR was to try to get at least a partial error message to the user in the UI even if the provider's message formatting is buggy, and so I think this change is still valuable in that sense. Now that we have the logging you mentioned I could also see the argument that it might be better to let this fail and have the user report a bug -- otherwise the encoding errors are unlikely to ever get reported and fixed -- but I think I personally would err on the side of trying to give the end-user as much information as possible to debug with, even if that does mean that some mostly-cosmetic encoding bugs in providers will go unfixed.
If we were to rule that it's better to raise an error for that case then I would still suggest swapping for a more specific error which explicitly declares it as a provider bug, so that the end-user knows to just report it and not waste time trying to debug it locally.
Ultimately I'll leave this call up to you! I'm happy to take whichever path you think best.
A correct provider should only ever return valid UTF-8 strings as the diagnostic Summary or Detail, but since diagnostics tend to be describing unexpected situations and are often derived from errors in downstream libraries it's possible that a provider might incorrectly return incorrect garbage as part of a diagnostic message. The protobuf serializer rejects non-UTF8 strings with a generic message that is unhelpful to end-users: string field contains invalid UTF-8 Here we make the compromise that it's better to make a best effort to return a diagnostic that is probably only partially invalid so that the end user has a chance of still getting some clue about what problem occurred. The new helper functions here achieve that by replacing any invalid bytes with a correctly-encoded version of the Unicode Replacement Character, which will then allow the string to pass over the wire protocol successfully and hopefully end up as an obviously-invalid character in the CLI output or web UI that's rendering the diagnostics. This does introduce some slight additional overhead when returning responses, but it should be immaterial for any response that doesn't include any diagnostics, relatively minor for responses that include valid diagnostics, and only markedly expensive for a diagnostic string with invalid bytes that will therefore need to be re-encoded on a rune-by-rune basis.
Good call on those extra test cases! I needed to add them to both copies of the logic (across both protocol versions) so I did it locally in my editor rather than by accepting your suggestions, and so I decided to add new test cases rather than replacing the existing ones since it was easier to do than it would be in the suggestions UI.
Note that I don't have write access to this repository, so if this looks good I'll have to ask you to merge it for me. 😀
Somehow when I was first working on this I didn't notice that there's a stdlib function
I'm not going to open a follow-up PR to change that now because it seems like unnecessary churn just to get almost-the-same functionality a different way, but I did want to note it here in case someone looks back at this in future and wonders why we have a hand-written "to valid UTF8" function. 😀
(The subtle difference between what I wrote and what the stdlib does is that my function turns each invalid byte into a separate unicode replacement character, whereas stdlib collects many consecutive invalid bytes and replaces them all together with a single unicode replacement character. Both of those are consistent with the goal of this PR, but if we did swap this out we'd probably need to tweak a couple tests to account for this difference.)