9469 accented letters and other utf8 characters in Stata ingest #9582

landreev · 2023-05-09T18:17:30Z

What this PR does / why we need it:

This is a 5-line fix for the Stata ingest plugin that I made some weeks ago while looking into a report from a user (#9469). Making a (draft) PR from the branch so that it's not forgotten.

The problem is straightforward: accented characters are garbled in the variable metadata labels (both the variable-level labels and the category value labels). This affects only the metadata; the values in the tab-delimited files are saved properly.
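To illustrate the kind of garbling described above, here is a minimal sketch of how UTF-8 label bytes can turn into replacement characters when decoded with a single-byte charset. This is only one way such garbling can arise and is not necessarily the exact cause in the plugin; the class and method names are hypothetical and are not taken from the Dataverse code.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: names here are hypothetical and do not
// come from the Dataverse Stata ingest plugin.
class LabelDecodingSketch {

    // Decoding as US-ASCII: every byte >= 0x80 becomes U+FFFD,
    // so a two-byte UTF-8 sequence yields two replacement characters.
    static String decodeAsAscii(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // Decoding as UTF-8: multi-byte sequences are reassembled into
    // the original accented characters.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The a-umlaut is written as a Unicode escape so the source file
        // compiles the same way regardless of its own encoding.
        String label = "laufende Nummer des Gespr\u00e4chsturns";
        byte[] raw = label.getBytes(StandardCharsets.UTF_8);

        // Garbled: contains the replacement character.
        System.out.println(decodeAsAscii(raw).contains("\uFFFD"));
        // Intact: round-trips to the original label.
        System.out.println(decodeAsUtf8(raw).equals(label));
    }
}
```

Both lines print `true`: the first decoding loses the accented letter, the second preserves it.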

Which issue(s) this PR closes:

Closes #9469

Special notes for your reviewer:

Suggestions on how to test this:

A Stata file from the dataset in the original user report can be used for testing: https://data.aussda.at/file.xhtml?fileId=472&version=3.0 (the file has both types of labels with accented characters in them).

The test is to ingest the file and look at the variable labels as exported in the DDI.
before:

<labl level="variable">laufende Nummer des Gespr��chsturns</labl>

after:

<labl level="variable">laufende Nummer des Gesprächsturns</labl>

etc.
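For anyone who wants to automate that check, the following is a rough sketch, assuming the exported DDI is available as a string and that labels appear in `labl` elements as in the snippets above. It flags any label containing the Unicode replacement character (U+FFFD). The class and method names are hypothetical and are not part of Dataverse.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Illustrative sketch only: a hypothetical helper, not Dataverse code.
class DdiLabelCheckSketch {

    // Returns the text of every <labl> element that contains U+FFFD,
    // the character a decoder substitutes for bytes it cannot map.
    static List<String> garbledLabels(String ddiXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            ddiXml.getBytes(StandardCharsets.UTF_8)));
            NodeList labels = doc.getElementsByTagName("labl");
            List<String> garbled = new ArrayList<>();
            for (int i = 0; i < labels.getLength(); i++) {
                String text = labels.item(i).getTextContent();
                if (text.indexOf('\uFFFD') >= 0) {
                    garbled.add(text);
                }
            }
            return garbled;
        } catch (Exception e) {
            // Wrap parser errors so callers need no checked-exception handling.
            throw new IllegalArgumentException("Could not parse DDI XML", e);
        }
    }

    public static void main(String[] args) {
        String before = "<var><labl level=\"variable\">"
                + "laufende Nummer des Gespr\uFFFD\uFFFDchsturns</labl></var>";
        String after = "<var><labl level=\"variable\">"
                + "laufende Nummer des Gespr\u00e4chsturns</labl></var>";
        System.out.println(garbledLabels(before).size()); // one garbled label
        System.out.println(garbledLabels(after).size());  // none
    }
}
```

An empty result for the "after" export would suggest the labels came through intact, though it only detects replacement characters and not other forms of mis-decoding.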

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

github-actions · 2023-05-09T18:23:44Z

📦 Pushed preview application image as

ghcr.io/gdcc/dataverse:9469-stata-labels-utf8

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

qqmyers

LGTM

landreev added 3 commits March 31, 2023 17:08

ok, it's not going to be literally a "2 line fix" - I decided to play…

a4306a7

… it safer. But still trivial enough for a fast-track treatment. (#9469)

so, yeah, here's the second part of the fix, in addition to the 2 lin…

615f67f

…es in the dta reader. (#9469)

Merge branch 'develop' into 9469-stata-labels-utf8

0b664d3

qqmyers approved these changes May 10, 2023

landreev marked this pull request as ready for review May 10, 2023 14:52

qqmyers added the Size: 3 A percentage of a sprint. 2.1 hours. label May 10, 2023

qqmyers added this to Ready for Review ⏩ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) via automation May 10, 2023

qqmyers added this to the 5.14 milestone May 10, 2023

scolapasta moved this from Ready for Review ⏩ to Ready for QA ⏩ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) May 10, 2023

kcondon moved this from Ready for QA ⏩ to QA ✅ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) May 11, 2023

kcondon self-assigned this May 11, 2023

kcondon merged commit c21b223 into develop May 12, 2023
19 of 20 checks passed

IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from QA ✅ to Done 🚀 May 12, 2023

kcondon deleted the 9469-stata-labels-utf8 branch May 12, 2023 16:21
