Encoding issue using create_dataset() #55

Open
sindribaldur opened this issue Oct 26, 2020 · 9 comments
Labels: info:beginner (Good for newcomers), pkg:api (api related activities), prio:asap (Fix as soon as possible), status:confirmed (Is a valid issue and will be moved forward soon), type:bug (Something isn't working)

Comments

@sindribaldur

sindribaldur commented Oct 26, 2020

Using a Python script with create_dataset(), I created a new dataset on demo.dataverse.org (and on one more Dataverse server).

from pyDataverse.api import Api

api = Api(base_url=dvserver, api_token=dvtoken)
api.create_dataset("1", dsmd)

Here dsmd is the content of dataset-finch1.json (linked in the documentation) as a string; for my last test I used a slightly modified version of it.

dsmd = """{
  "datasetVersion": {
    "metadataBlocks": {
      "citation": {
        "fields": [
          {
            "value": "Dörwin's Fænches",
     .
     .
     .
"""

Everything seems to work fine, but non-ASCII characters are not displayed (they are replaced with a placeholder character), neither when I open the dataverse in the browser nor when I download it back with get_dataset().

I'm on Windows 10 with Python 3.6.4 and pyDataverse 0.2.1. I tried to run it as a script from the command line and in Spyder with the same result.

skasberger self-assigned this on Nov 2, 2020
skasberger added the labels info:beginner, pkg:api, prio:medium, status:confirmed, type:bug on Nov 2, 2020
@skasberger
Member

skasberger commented Nov 2, 2020

Thanks for your issue!

Can you please try this again with the latest commit from the develop branch? (see here).

Here are the develop branch and the docs for it. The next release, which contains a lot of changes, will be released by the end of the year, and hopefully those changes solve the problem. If not, I will try to reproduce and fix it.

And please test at which point the encoding problems start to appear: is the imported JSON string already wrong, or does it happen after the create_dataset() call?
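One way to narrow this down before the upload is to inspect the metadata string in Python itself. The sketch below is only an illustration: it uses a shortened, hypothetical stand-in for dsmd (the real dataset-finch1.json is longer and nested differently) and standard-library calls only. It checks that the non-ASCII characters survive JSON parsing and a UTF-8 round trip.

```python
import json

# Hypothetical, shortened stand-in for the dsmd string from the report.
dsmd = '{"fields": [{"typeName": "title", "value": "Dörwin\'s Fænches"}]}'

# Parse the string; json.loads keeps non-ASCII characters unchanged.
parsed = json.loads(dsmd)
title = parsed["fields"][0]["value"]

# Collect every non-ASCII character present in the raw string.
non_ascii = sorted({ch for ch in dsmd if ord(ch) > 127})

# A UTF-8 round trip must give back the identical string.
round_trip = dsmd.encode("utf-8").decode("utf-8")

# ascii() shows escapes, so the terminal's own encoding cannot hide a problem.
print(ascii(title))
print(ascii(non_ascii))
print(round_trip == dsmd)
```

If all three checks look right, the string is intact inside Python and the damage has to happen later, when the request is sent or when the server reads it.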

@poikilotherm
Member

Beware this might be related to IQSS/dataverse#6675 and some other issues (older & newer) I linked to from there.

I dug a bit and the plot thickens around encoding issues in Jersey. Yet it would be good to have verification that it's not the lib that's causing problems.

@sindribaldur
Author

sindribaldur commented Nov 12, 2020

I cloned this repository and tried rerunning the script by importing src/pyDataverse/api.py and it didn't fix the issue. The json string also seems to be ok within Python.

The high number of related open issues over at IQSS/dataverse suggests that this has to do with Dataverse itself, but when I create a dataset through the user interface I can enter the characters that get lost via the API upload.

@sindribaldur
Author

I'm now creating datasets directly with curl and the instructions from here and have not come across any encoding issues.
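For comparison with the curl approach, here is a rough Python sketch that builds (but does not send) an equivalent request with an explicitly UTF-8 encoded body. Everything in it is an assumption for illustration: the token, the dataverse alias, and the shortened dsmd string are placeholders, and the endpoint path and X-Dataverse-key header follow my reading of the Dataverse native API guide, not anything confirmed in this thread. It is not a claim about how pyDataverse sends the request.

```python
import urllib.request

# Placeholders (assumptions): server, token, and parent dataverse alias.
base_url = "https://demo.dataverse.org"
api_token = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
dataverse_alias = "1"

# Hypothetical, shortened stand-in for the real metadata string.
dsmd = '{"title": "Dörwin\'s Fænches"}'

# Encode explicitly so the bytes on the wire are UTF-8,
# instead of relying on a default chosen by the HTTP library.
body = dsmd.encode("utf-8")

request = urllib.request.Request(
    url=f"{base_url}/api/dataverses/{dataverse_alias}/datasets",
    data=body,
    headers={
        "X-Dataverse-key": api_token,
        "Content-Type": "application/json; charset=utf-8",
    },
    method="POST",
)

# Sending is left out on purpose; it would be:
# urllib.request.urlopen(request)
print(request.get_method(), request.full_url)
```

Passing bytes rather than a str removes one place where an implicit encoding could be applied.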

@skasberger
Member

Please send me the dataset json file (contact information here: stefankasberger.at).

skasberger added this to the v0.3.0 milestone on Dec 21, 2020
@skasberger
Member

skasberger commented Jan 11, 2021

@sindribaldur I have tested your script with the latest develop version and a local Dataverse Docker instance (4.18.1), and I got the same problem as you. The dataset is created, but with many special characters missing (as you can see from the screenshot). The uploaded string is formatted the same as the string inside your script. When I then request the dataset with pyDataverse (get_dataset()), the returned metadata has the same question-mark replacement characters as seen on the frontend. So the problem seems to occur after the create_dataset() call.
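A symptom like this (string correct in Python, replacement characters after upload) is what one would see if the request body were sent in a single-byte encoding such as Latin-1 and then read as UTF-8 on the receiving side. Whether that is what happens here is not established in this thread; the offline sketch below only illustrates the mechanism with the title from the report.

```python
title = "Dörwin's Fænches"

# Same text, two different byte sequences.
utf8_bytes = title.encode("utf-8")
latin1_bytes = title.encode("latin-1")

# Reading Latin-1 bytes as UTF-8: the bytes for "ö" and "æ" are not
# valid UTF-8, so each becomes U+FFFD (the replacement character).
garbled = latin1_bytes.decode("utf-8", errors="replace")

# Reading UTF-8 bytes as UTF-8 gives the original text back.
intact = utf8_bytes.decode("utf-8")

# ascii() prints escapes, so the result does not depend on the terminal.
print(ascii(garbled))
print(ascii(intact))
```

The ASCII characters survive in both cases, which matches a title that is readable except for its special characters.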

@pdurbin Do you know of this problem? It seems to be on the Dataverse side.

And @sindribaldur: I found one error in the JSON string in the testy.py file you sent me. At "Þvæla Tilraun`, the apostrophe at the end is missing. Just in case. :)

Screenshot_2021-01-12 �v�la Tilraun

skasberger changed the milestone from v0.3.0 to v0.4.0 on Jan 11, 2021
@pdurbin
Member

pdurbin commented Jan 21, 2021

@skasberger hi, plenty of encoding problems have surfaced over the years but I'm not aware of this being a problem on the Dataverse side. It sounds like @sindribaldur got it working with curl. @sindribaldur would you be interested in trying it from https://github.com/IQSS/dataverse-client-javascript or https://github.com/IQSS/dataverse-client-r ? Those are the other two libraries that are quite active.

@sindribaldur
Author

@skasberger Thanks for getting back - I hope the example helps. @pdurbin Thank you, I got everything that I needed working directly with curl and guess I will continue using it like that if needed in the future. I had tried one of these packages before and also hit a wall.

skasberger added the prio:asap label and removed the prio:medium label on Feb 2, 2021
skasberger mentioned this issue on Mar 14, 2021
35 tasks
@pdurbin
Member

pdurbin commented Feb 14, 2024

As discussed during the 2024-02-14 meeting of the pyDataverse working group, we are closing old milestones in favor of a new project board at https://github.com/orgs/gdcc/projects/1 and removing issues (like this one) from those old milestones. Please feel free to join the working group! You can find us at https://py.gdcc.io and https://dataverse.zulipchat.com/#narrow/stream/377090-python

pdurbin removed this from the v0.4.0 milestone on Feb 14, 2024

4 participants