9469 accented letters and other utf8 characters in Stata ingest #9582

Merged
merged 3 commits into from
May 12, 2023

Contributor

@landreev landreev commented May 9, 2023

What this PR does / why we need it:

This is a 5-line fix for the Stata ingest plugin that I made some weeks ago while looking into a report from a user (#9469). Making a (draft) PR from the branch so that it's not forgotten.

The problem is straightforward: accented characters are garbled in the variable metadata labels (both the variable-level labels and the category value labels). This affects only the metadata; the values in the tab-delimited files are saved properly.
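
For readers unfamiliar with this class of bug, here is a minimal sketch of how such garbling can arise. It is written in Python for illustration and is not the plugin's actual code. It assumes the label bytes are UTF-8 and were decoded with an ASCII-only charset; that assumption is not taken from the diff, but it is consistent with the two replacement characters seen for a single "ä". The `decode_label` helper is hypothetical.

```python
# Illustrative sketch only: the "ascii" charset is an assumption,
# not something confirmed by the PR diff.
label = "laufende Nummer des Gesprächsturns"
raw = label.encode("utf-8")  # "ä" becomes the two bytes 0xC3 0xA4


def decode_label(data: bytes, charset: str) -> str:
    """Decode label bytes, substituting U+FFFD for each undecodable byte."""
    return data.decode(charset, errors="replace")


garbled = decode_label(raw, "ascii")   # each non-ASCII byte becomes U+FFFD
correct = decode_label(raw, "utf-8")   # explicit UTF-8 restores the "ä"

print(garbled)
print(correct)
```

Naming the charset explicitly when turning bytes into a string avoids depending on whatever default the runtime happens to use.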

Which issue(s) this PR closes:

Closes #9469

Special notes for your reviewer:

Suggestions on how to test this:

A Stata file from the remote dataset from the original user report can be used for testing: https://data.aussda.at/file.xhtml?fileId=472&version=3.0 The file has both types of labels with accented characters in them.

The test is to ingest the file and look at the variable labels as exported in the DDI.
before:

<labl level="variable">laufende Nummer des Gespr��chsturns</labl>

after:

<labl level="variable">laufende Nummer des Gesprächsturns</labl>

etc.
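
The manual check above could also be scripted. The following is a rough sketch, assuming the DDI export has been saved as text and that garbling shows up as the Unicode replacement character U+FFFD inside `labl` elements. The function name `garbled_labels` and the sample XML are hypothetical, not part of Dataverse.

```python
import xml.etree.ElementTree as ET

REPLACEMENT_CHAR = "\ufffd"  # what a failed decode typically leaves behind


def garbled_labels(ddi_xml: str) -> list:
    """Return the text of every <labl> element containing U+FFFD."""
    root = ET.fromstring(ddi_xml)
    bad = []
    for elem in root.iter():
        # Drop any "{namespace}" prefix so the check works with or without one.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "labl" and elem.text and REPLACEMENT_CHAR in elem.text:
            bad.append(elem.text)
    return bad


# Hypothetical, heavily trimmed stand-in for a real DDI export.
sample = (
    "<codeBook>"
    '<labl level="variable">laufende Nummer des Gespr\ufffd\ufffdchsturns</labl>'
    '<labl level="variable">laufende Nummer des Gesprächsturns</labl>'
    "</codeBook>"
)
print(garbled_labels(sample))
```

An empty list after ingesting the test file would suggest the labels were decoded correctly, though it would not catch other kinds of mis-decoding (for example, Latin-1 mojibake).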

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:


github-actions bot commented May 9, 2023

📦 Pushed preview application image as

ghcr.io/gdcc/dataverse:9469-stata-labels-utf8

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

Member

@qqmyers qqmyers left a comment

LGTM

@landreev landreev marked this pull request as ready for review May 10, 2023 14:52
@qqmyers qqmyers added the Size: 3 A percentage of a sprint. 2.1 hours. label May 10, 2023
@qqmyers qqmyers added this to Ready for Review ⏩ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) via automation May 10, 2023
@qqmyers qqmyers added this to the 5.14 milestone May 10, 2023
@scolapasta scolapasta moved this from Ready for Review ⏩ to Ready for QA ⏩ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) May 10, 2023
@kcondon kcondon self-assigned this May 11, 2023
@kcondon kcondon merged commit c21b223 into develop May 12, 2023
19 of 20 checks passed
IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from QA ✅ to Done 🚀 May 12, 2023
@kcondon kcondon deleted the 9469-stata-labels-utf8 branch May 12, 2023 16:21
Development

Successfully merging this pull request may close these issues.

Ingest of Stata file with diacritics characters produces not correct encoded variable metadata
3 participants