Fix unicode exceptions #156
Thank you for these changes. I think that overriding the open() function is an elegant way to handle the Python 2 vs 3 differences.
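For readers unfamiliar with the idea, here is a minimal sketch of what such an override could look like. It is not the code from this PR: the helper name `open_text` and its defaults are hypothetical, and the only assumption is that `io.open` accepts `encoding` and `errors` on both Python 2 and Python 3.

```python
import io


def open_text(path, mode="r", encoding="utf-8", errors="strict"):
    """Hypothetical helper: open a text file the same way on Python 2 and 3."""
    # io.open accepts encoding/errors on both major versions;
    # on Python 3 it is the same object as the built-in open().
    return io.open(path, mode, encoding=encoding, errors=errors)
```

Routing every file access through one helper like this keeps the choice of encoding and error handler in a single place.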
However, this solution only hides errors and doesn't actually support other encodings. Resilience against encoding errors is good, but I am deferring this PR until a general strategy for dealing with source encodings can be developed. See also my comment on #148 for more context.
I think these are two parts that can be considered almost independently. One is overriding the default source encoding (inferred by Python from the locale), and the other is how to treat errors. This PR is solely about how to handle encoding errors. Note that even if it were possible to specify a source encoding, it would not be sufficient for our use case. We have quite a bit of legacy and 3rd party code that is all compiled together. Because of that, there are unfortunately multiple different encodings in the code base.
This PR proposes to use the
I think this is a good trade-off. This way gcovr can read incorrectly encoded source files, process the correctly encoded parts, and write back the result with the same byte representation. The only other options are to either replace the offending bytes or to bail out. IMHO neither of them looks attractive.
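To make the three behaviours concrete, here is a small sketch using Python 3's standard codec error handlers. The sample bytes are made up for illustration, and the choice of `surrogateescape` for the byte-preserving case is my assumption: it is one standard-library handler with the round-trip property described above, not necessarily the exact mechanism used in this PR.

```python
# Illustrative bytes: mostly ASCII, with one stray Latin-1 byte (0xE9)
# that is not valid UTF-8 in this position.
raw = b"int caf\xe9 = 1; // ok\n"

# Option 1: bail out on the first offending byte.
try:
    raw.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Option 2: replace offending bytes with U+FFFD; the original byte is lost.
replaced = raw.decode("utf-8", errors="replace")
lossy = replaced.encode("utf-8")

# Option 3 (assumed handler): map each undecodable byte to a lone surrogate,
# then map it back to the same byte when encoding.
escaped = raw.decode("utf-8", errors="surrogateescape")
round_tripped = escaped.encode("utf-8", errors="surrogateescape")

print(strict_failed, lossy == raw, round_tripped == raw)
```

Only the third variant reproduces the input bytes exactly, while the correctly encoded parts of the text remain ordinary strings that can be processed as usual.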
So I think it is still valuable to have this kind of error handling even if the source encoding can be properly specified.
An alternative approach has been implemented in #256, so I'm closing this PR. Still, many thanks for putting this prototype forward as it helped the discussion! If you feel that the currently implemented solution does not support a specific use case, please open a new issue.