9469 accented letters and other utf8 characters in Stata ingest #9582
What this PR does / why we need it:
This is a 5-line fix for the Stata ingest plugin that I made some weeks ago while looking into a report from a user (#9469). I'm making a (draft) PR from the branch so that it's not forgotten.
The problem is straightforward: accented characters are garbled in the variable metadata labels (both the variable-level labels and the category value labels). This affects the metadata only; the values in the tab-delimited files are saved properly.
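For reviewers unfamiliar with this kind of bug, here is a minimal Python sketch of how such garbling typically arises. It assumes the label bytes are UTF-8 and get decoded with a single-byte charset somewhere along the way; that cause is an assumption for illustration, not something stated in the report, and the code is not the fix in this PR. The label text is made up.

```python
# Hypothetical label, not taken from the test file.
label = "Catégorie préférée"

# Assumption: the label is stored in the file as UTF-8 bytes.
raw = label.encode("utf-8")

# Decoding with a single-byte charset turns each byte into its own character,
# so every accented letter becomes two characters.
garbled = raw.decode("iso-8859-1")

# Decoding as UTF-8 rejoins the multi-byte sequences.
correct = raw.decode("utf-8")

print(garbled)
print(correct)
```

If this is indeed the mechanism, the damage is reversible (`garbled.encode("iso-8859-1").decode("utf-8")` gives back the original), but the cleaner fix is to decode correctly at ingest time.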
Which issue(s) this PR closes:
Closes #9469
Special notes for your reviewer:
Suggestions on how to test this:
A Stata file from the remote dataset in the original user report can be used for testing: https://data.aussda.at/file.xhtml?fileId=472&version=3.0 (the file has both types of labels with accented characters in them).
The test is to ingest the file and look at the variable labels as exported in the DDI.
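The check can also be scripted rather than done by eye. The sketch below scans an exported DDI document for label elements whose text looks like UTF-8 that was decoded as Latin-1. The element name `labl`, the namespace, and the inline sample are assumptions based on the DDI Codebook format; the helper names `looks_garbled` and `garbled_labels` are hypothetical, and the heuristic only catches this one kind of garbling.

```python
import xml.etree.ElementTree as ET


def looks_garbled(text):
    """Heuristic: True if text reads like UTF-8 bytes decoded as Latin-1."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Latin-1, or not valid UTF-8 after re-encoding:
        # the text was most likely decoded correctly in the first place.
        return False
    return repaired != text


def garbled_labels(ddi_xml):
    """Return the text of every <labl> element that looks garbled."""
    root = ET.fromstring(ddi_xml)
    found = []
    for elem in root.iter():
        # Drop any "{namespace}" prefix so the check is namespace-agnostic.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "labl" and elem.text and looks_garbled(elem.text):
            found.append(elem.text)
    return found


# Made-up sample: one garbled variable label, one correct category label.
sample = """<codeBook xmlns="ddi:codebook:2_5"><dataDscr>
<var name="v1"><labl level="variable">CatÃ©gorie</labl>
<catgry><catValu>1</catValu><labl level="category">Très bien</labl></catgry>
</var></dataDscr></codeBook>"""

print(garbled_labels(sample))
```

Run against the export from before the fix, the list should contain the affected labels; against the export from after the fix, it should be empty.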
before:
after:
etc.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: