Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inherit encoding from parent process in worker daemons on Linux #25319

Merged
merged 5 commits into from
Jun 7, 2023

Conversation

lptr
Copy link
Member

@lptr lptr commented Jun 7, 2023

The LANG, LANGUAGE and LC_* (including LC_ALL) environment variables are now passed to worker API daemon processes on Unix. This fixes a problem where on Linux worker daemons would try to access the file system with the default ASCII encoding.

We are passing as few parameters as possible, though, to avoid making worker processes unnecessarily incompatible.

There is also a @SupportsNonAsciiPaths annotation for integration tests that should work when they are executed with file paths containing non-ASCII characters. These tests are executed in the teșt files (notice the U+0219 character) instead of the usual test files folder.

(The intention is to run most integration tests with such a non-ASCII path to shake out more bugs, and annotate the ones that cannot parse non-ASCII paths with @DoesNotSupportNonAsciiPaths. But this is a next step.)

Fixes #25198
Fixes #13843

@lptr lptr added this to the 8.3 RC1 milestone Jun 7, 2023
@lptr lptr self-assigned this Jun 7, 2023
@lptr lptr force-pushed the lptr/workers/pass-encoding-to-worker-daemons branch from ed2a48b to a5868cc Compare June 7, 2023 08:15
@lptr lptr force-pushed the lptr/workers/pass-encoding-to-worker-daemons branch from a5868cc to 8bcc651 Compare June 7, 2023 08:16
@gradle gradle deleted a comment from lptr Jun 7, 2023
@gradle gradle deleted a comment from lptr Jun 7, 2023
@lptr lptr marked this pull request as ready for review June 7, 2023 08:43
@lptr lptr requested a review from a team as a code owner June 7, 2023 08:43
@lptr lptr changed the title Fix worker daemon encoding Inherit encoding from parent process in worker daemons Jun 7, 2023
if (!OperatingSystem.current().isUnix()) {
return ImmutableMap.of();
}
Map<String, Object> environment = forkOptions.getEnvironment();
Copy link
Member

@asodja asodja Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that forkOptions.getEnvironment() returns immutable map, so removal of entries could throw an exception?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we think this is meaningful, we can try to improve things in a separate PR.

This PR might be backported as far as 8.1, so I intentionally kept it as small as I could.

Copy link
Member

@asodja asodja Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if user passes immutable map to forOptions, then this will blow up. Or user can't set a map that is accessed here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you mean. I thought you meant we should make it immutable. I think you're right, let me fix this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, no, let's leave it as is. This code was already calling forkOptions.setEnvironment() in the constructor, which would have blown up with an ImmutableJavaForkOptions. So we aren't making things worse here.

This should be improved, but let's do that in a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the current semantic of that method: it returns a Map but potentially by modifying the original Map. Which means in some cases the assignment is not needed. That feels like a code smell to me. I would rather have a clear new Map being built always.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright then. Should be fixed now.

@lptr lptr force-pushed the lptr/workers/pass-encoding-to-worker-daemons branch from 0bbb3f9 to fb2690c Compare June 7, 2023 08:52
@gradle gradle deleted a comment from lptr Jun 7, 2023
@lptr lptr changed the title Inherit encoding from parent process in worker daemons Inherit encoding from parent process in worker daemons on Linux Jun 7, 2023
@lptr
Copy link
Member Author

lptr commented Jun 7, 2023

@bot-gradle cancel

@gradle gradle deleted a comment from lptr Jun 7, 2023
@gradle gradle deleted a comment from lptr Jun 7, 2023
@bot-gradle
Copy link
Collaborator

I've cancelled the pre-tested commit build.

@lptr lptr requested a review from eskatos June 7, 2023 09:50
@lptr
Copy link
Member Author

lptr commented Jun 7, 2023

@bot-gradle test and merge please.

@bot-gradle
Copy link
Collaborator

Your PR is queued. See the queue page for details.

@bot-gradle
Copy link
Collaborator

I've triggered a build for you. Click here to see all failures if there's any.

@bot-gradle bot-gradle merged commit 597189f into master Jun 7, 2023
20 checks passed
@bot-gradle bot-gradle deleted the lptr/workers/pass-encoding-to-worker-daemons branch June 7, 2023 11:16
bot-gradle added a commit that referenced this pull request Jun 15, 2023
… Linux and macOS

The overarching goal here is to shake out more encoding problems throughout Gradle by running all our integration tests in a way that the tested code has to deal with non-ASCII characters in file paths. This PR takes a step towards that goal by forcing all our non-Windows integration tests to use such a path.

To keep the scope manageable, this PR does not force non-ASCII paths for Windows. That needs to be enabled in a followup PR where we can deal with Windows-specific encoding problems.

The idea is similar to how we add a space to the `build/tmp/test files` directory's name where all the test output is typically located; this time we replace the `s` in `test files` with an `ŝ` (see [U+015D](https://www.compart.com/en/unicode/U+015D)). Importantly this character is not part of ASCII nor any ISO-8859-X codepage, and cannot be represented by a single byte. (See [Wikipedia](https://en.wikipedia.org/wiki/ŝ)).

There is also an escape hatch for tests that for some reason can't support Unicode paths; these need to be tagged with `@DoesNotSupportNonAsciiPaths`. We have such offenders today:

  - Checkstyle fails because of this bug: checkstyle/checkstyle#13012
  - Java 6 wrapper tests fail because Java 6 barfs on non-ASCII characters in the path

Most of the problems that had to be fixed for Unixes come from the fact that `URI.toString()` does not encode non-ASCII characters, and some tools can't parse string representations of URIs with non-ASCII characters in them.

So some of the `URI`s are now converted to strings using `toASCIIString()` instead. This is the canonical form of a URI, and is the intended way to go when the string form is passed to places where we can't ensure that it will be read with the right encoding (see https://www.w3.org/Addressing/URL/3_URI_Choices.html)

There are followups:
- #25316
- #25322

This PR is a followup to the daemon encoding fix in:
- #25319

Co-authored-by: Lóránt Pintér <lorant@gradle.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants