Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent handling of Encodings #782

Merged
merged 26 commits into from
Jul 2, 2021
Merged

Consistent handling of Encodings #782

merged 26 commits into from
Jul 2, 2021

Conversation

AshesITR
Copy link
Collaborator

@AshesITR AshesITR commented Mar 16, 2021

Implements #752

fixes #541

Added TODO comments to all places that need changing to not miss anything.
Feel free to add more TODO comments if I missed some place.

  • add TODO markers in all effected places
  • add encoding to settings
  • respect encoding in all places where necessary
  • update documentation
  • write tests for all encoding-specific behaviour with at least one non-UTF-8 and non-system encoding
  • add NEWS bullet

@AshesITR AshesITR added the feature a feature request or enhancement label Mar 16, 2021
@AshesITR AshesITR added this to the 3.0.0 milestone Mar 16, 2021
@AshesITR AshesITR linked an issue Mar 16, 2021 that may be closed by this pull request
@AshesITR
Copy link
Collaborator Author

The test file is GPL 2 licensed. I'll replace the file by a self-made one to avoid license violations of spatialprobit.

handle invalid encoding in parse_exclusions() and get_source_file()
suppress linting of file if invalid encoding detected

also had to fix deprecated warnings caused by R 4.1.0 and testthat::expect_equal() with functions
@AshesITR AshesITR marked this pull request as ready for review July 1, 2021 09:23
@AshesITR AshesITR changed the title [WIP] Consistent handling of Encodings Consistent handling of Encodings Jul 1, 2021
@@ -77,6 +87,36 @@ get_source_expressions <- function(filename, lines = NULL) {
# an error that does not use R_ParseErrorMsg
if (is.na(message_info$line)) {

if (grepl("invalid multibyte character in parser at line", e$message, fixed = TRUE)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to increase robustness is what we've done in data.table:

https://github.com/Rdatatable/data.table/blob/ed72e398df76a0fcfd134a4ad92356690e4210ea/inst/tests/tests.Rraw#L106-L140

Basically, keep a database (maybe populated at onAttach()) of error messages as generated by R itself.

That way we don't need to change anything if R updates its own message.

Just a thought, no need to do so in this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea. That would also allow us to remove the set_lang() construct currently in use to force english error messages.

R/lint.R Outdated Show resolved Hide resolved
R/settings.R Outdated Show resolved Hide resolved
The file is read in using the \code{encoding} setting.
This setting found by taking the first valid result from the following locations
\enumerate{
\item The \code{encoding} key from the usual lintr configuration settings.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link to ?default_settings here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?default_settings needs a little love I think. Currently it contains not much more than "An object of class list of length 12".
Maybe that would be a good place in general to document settings?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I picked ?default_settings kind of at random, more that "from the usual lintr configuration settings" won't be helpful if this is the first help page a user sees -- some link to help about what we mean there is needed.

If we don't have such a page yet then yes, let's use ?default_settings :)

Copy link
Collaborator

@MichaelChirico MichaelChirico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty weak when it comes to encodings, but the API & testing look basically good to me. Good work!

@MichaelChirico MichaelChirico merged commit 8cd6add into master Jul 2, 2021
@MichaelChirico MichaelChirico deleted the feature/encoding branch July 2, 2021 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consistent handling of Encoding Issue using nchar for non-UTF-8 source files
2 participants