Fallback to UTF-8 when reading generated source file #9052
An idea for how to fix #9022.
Just fall back to
The problem in #9022 is that the system charset is
However, I think the
What do you think?
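Here is a minimal sketch of the fallback idea. It assumes the file is read line by line with `java.nio.file.Files` and that an undecodable file surfaces as a `CharacterCodingException`. The class and method names are hypothetical; this is not the actual routes compiler code.

```java
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper illustrating "read with the system charset, fall back to UTF-8".
// Not Play's actual implementation.
final class FallbackCharsetReader {

    // Reads all lines with 'primary'; if the bytes are not valid in that charset,
    // retries with UTF-8.
    static List<String> readLines(Path file, Charset primary) throws IOException {
        try {
            // Files.readAllLines decodes strictly: malformed or unmappable input
            // throws MalformedInputException / UnmappableCharacterException
            // (both CharacterCodingException) instead of substituting characters.
            return Files.readAllLines(file, primary);
        } catch (CharacterCodingException e) {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        }
    }

    // Convenience overload using the JVM's default (system) charset.
    static List<String> readLines(Path file) throws IOException {
        return readLines(file, Charset.defaultCharset());
    }
}
```

If the primary charset happens to decode the bytes without error, the UTF-8 retry never runs, so this only helps when the wrong charset actually reports a decoding failure.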
referenced this pull request
Feb 20, 2019
I would prefer to consistently use the same charset when generating/writing and reading the file. Do you have time to experiment with the second option described? I didn't have time to investigate, but I have the impression this is exactly what commons-io's
@marcospereira Yes, I also think this was
However, I think for this issue the problem is not which charset is used to write the file, but which charset is used to read the file. Because, like I explained, the
So the real problem is that we don't know a source file's encoding at the time we read it. My guess is that in this specific case, a source file was generated/written as UTF-8 (but not by the routes compiler), meaning it contained chars that
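To make this failure mode concrete, here is a small check assuming strict decoding, where the decoder reports errors instead of replacing bad input. Bytes written as UTF-8 decode cleanly as UTF-8 but not as US-ASCII, which stands in here for a narrower system charset. `DecodeCheck` is a hypothetical name used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Play.
final class DecodeCheck {

    // Returns true if 'bytes' are valid text in 'cs' under strict decoding.
    static boolean decodesCleanly(byte[] bytes, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A line containing a non-ASCII character, written as UTF-8.
        byte[] utf8 = "GET /caf\u00e9 controllers.Cafe.index".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodesCleanly(utf8, StandardCharsets.UTF_8));      // true
        System.out.println(decodesCleanly(utf8, StandardCharsets.US_ASCII));   // false
        System.out.println(decodesCleanly(utf8, StandardCharsets.ISO_8859_1)); // true, but the text is garbled
    }
}
```

Note the last case: ISO-8859-1 maps every byte to some character, so it never reports an error. A read with that charset "succeeds" but yields garbled text, and a fallback that is triggered only by decoding errors would not kick in.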
Do you understand what I want to say?
One more thing:
With this pull request, this is still the case. We just fall back to UTF-8 to handle files that are not generated by the routes compiler but by some other component, and hope that UTF-8 can handle that file.
Hey Matthias, yes, I got your explanation.
You are right that in most cases we will use the same charset (the system default) to write and read the file, but then there is this fallback. My point, which I now see I did not explain properly, is: why not always use UTF-8 instead of using the system default and falling back to UTF-8 if that fails?
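A minimal sketch of this alternative, assuming generated sources are written and read through `java.nio.file.Files`: pin both sides to UTF-8 instead of the JVM's default charset. `Utf8GeneratedSources` is a hypothetical name, not an existing Play class.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: use one fixed charset (UTF-8) for both writing and reading
// generated sources, so the result does not depend on the system default charset.
final class Utf8GeneratedSources {

    // Writes each line followed by the platform line separator, encoded as UTF-8.
    static void write(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    // Reads the file back with the same charset that was used to write it.
    static List<String> read(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }
}
```

This makes files written by the routes compiler itself round-trip regardless of the system charset. It does not, on its own, help with source files written by some other tool in a different encoding, which is the case discussed above.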
@marcospereira I had another look at the issue and pushed a fix that, I am pretty sure, solves the problem. Please read the Java comment I added to the fix; it should explain everything.
Still, here is some more explanation.
Reading/writing the file with
What's now different in
However, that means that in the past the routes compiler read files with "wrong" charsets without failing and ended up with garbled strings, in which it was looking for the "
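The difference between failing and silently garbling can be sketched as follows. This assumes the old code path decoded leniently, for example via `new String(byte[], Charset)`, which always substitutes the charset's replacement string for bad input; the thread does not confirm which API was used. `MARKER` is a made-up string standing in for whatever the compiler searched for, and the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; MARKER is invented and is not the string the
// routes compiler actually looks for.
final class LenientMarkerSearch {

    static final String MARKER = "// @HYPOTHETICAL-MARKER";

    // new String(byte[], Charset) never throws on bad input: it always replaces
    // malformed or unmappable sequences with the charset's replacement string.
    static String lenientDecode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String original = MARKER + "\nval greeting = \"gr\u00fc\u00df dich\"\n";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the "wrong" charset throws no exception...
        String garbled = lenientDecode(utf8, StandardCharsets.US_ASCII);

        System.out.println(garbled.equals(original));   // false: non-ASCII chars are garbled
        System.out.println(garbled.contains("\uFFFD")); // true: replacement characters appear
        System.out.println(garbled.contains(MARKER));   // true: the ASCII-only marker survives
    }
}
```

Because ASCII bytes decode identically in UTF-8 and US-ASCII, a search for an ASCII-only string still succeeds on the garbled text, which would explain why the wrong charset went unnoticed as long as decoding was lenient.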
So both the
Coming to your statement:
Actually that is wrong,
Of course a solution would be to set up our own
And now to the