Forms with unicode file names fail upload #196

Closed
yanokwa opened this issue Jan 17, 2021 · 5 comments · Fixed by getodk/central-frontend#427
Comments

@yanokwa
Member

yanokwa commented Jan 17, 2021

Test file: tést.xlsx.zip

Central reports "Something went wrong: there was no request."

Here's what I know so far...

  • Conversion with latest xlsform.getodk.org works, but Enketo preview fails with "Bad Request. Server URL is not a valid URL."
  • Conversion with curl to latest pyxform-http works
    • curl --request POST --data-binary @/path/to/tést.xlsx http://127.0.0.1:5000/api/v1/convert
  • Conversion with xls2xform works
  • Uploading the XML to Central works and so does the Enketo preview
  • Nothing obvious in Central logs

My gut says this is some interaction with Central and pyxform-http.

@yanokwa changed the title from "Forms with unicode fail names fail upload" to "Forms with unicode file names fail upload" on Jan 17, 2021
@matthew-white
Member

When I look at the browser console, I see the error message

Failed to execute 'setRequestHeader' on 'XMLHttpRequest': String contains non ISO-8859-1 code point.

Reading about it, I think the issue has to do with the X-XlsForm-FormId-Fallback header containing a non-ASCII character.
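
To illustrate what the browser error is checking, here is a minimal Python sketch that tests whether a string stays within the ISO-8859-1 range (code points up to U+00FF). The helper name `fits_iso_8859_1` is hypothetical, and the thread does not say which Unicode form the "é" in the test file name uses; the composed and decomposed variants below are assumptions shown only for comparison.

```python
import unicodedata


def fits_iso_8859_1(value: str) -> bool:
    """Return True if every code point is within U+0000..U+00FF."""
    return all(ord(ch) <= 0xFF for ch in value)


# Two ways the same visible name can be stored (assumed, not taken from the thread):
# NFC keeps "é" as the single code point U+00E9,
# NFD splits it into "e" plus the combining acute accent U+0301.
composed = unicodedata.normalize("NFC", "t\u00e9st.xlsx")
decomposed = unicodedata.normalize("NFD", "t\u00e9st.xlsx")

print(fits_iso_8859_1(composed))    # True
print(fits_iso_8859_1(decomposed))  # False
```

Either way, a value that is pure ASCII passes the check, which is why encoding the header value sidesteps the problem.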

Is a form ID allowed to contain a non-ASCII character? If so, then I think we should encode the filename before specifying it in the header. Probably percent-encoding would be easiest? Frontend would need to encode the header, and pyxform-http would need to decode it, but I'm not sure that Backend would need to change.
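
As a rough sketch of the percent-encoding idea, the snippet below uses Python's `urllib.parse` to stand in for what Frontend would do in the browser (where `encodeURIComponent` would be the likely equivalent). The helper name `encode_header_value` and the sample form ID are hypothetical; only the header name comes from this thread.

```python
from urllib.parse import quote, unquote

HEADER_NAME = "X-XlsForm-FormId-Fallback"


def encode_header_value(value: str) -> str:
    """Percent-encode a value as UTF-8 so the result is ASCII-only."""
    # safe="" also encodes "/" so the value cannot be mistaken for a path.
    return quote(value, safe="")


form_id = "t\u00e9st"  # hypothetical form ID derived from "tést.xlsx"
encoded = encode_header_value(form_id)
headers = {HEADER_NAME: encoded}

print(headers)           # {'X-XlsForm-FormId-Fallback': 't%C3%A9st'}
print(unquote(encoded))  # tést
```

Because the encoded value contains only ASCII characters, it should be acceptable to `setRequestHeader`, and decoding restores the original form ID.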

We generally encode form IDs throughout Central, and Enketo links in Central use tokens separate from the form ID, so hopefully this would be a one-off change.

@lognaturel
Member

> Is a form ID allowed to contain a non-ASCII character?

Yes, it is.

Your analysis seems right to me and percent-encoding between frontend and pyxform-http seems like the right approach.

@matthew-white
Member

Sounds good! @yanokwa, are you able to add percent-decoding to pyxform-http? If so, I'll plan to add percent-encoding to Frontend.
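
For the decoding half, something along these lines might be enough on the pyxform-http side. This is a sketch under assumptions, not the actual pyxform-http code: the function name `form_id_fallback` is hypothetical, and a plain mapping stands in for whatever request object the server really uses.

```python
from typing import Mapping, Optional
from urllib.parse import unquote


def form_id_fallback(headers: Mapping[str, str]) -> Optional[str]:
    """Return the percent-decoded fallback form ID, or None if the header is absent."""
    # A plain dict lookup is case-sensitive; real web frameworks usually
    # expose case-insensitive header access instead.
    raw = headers.get("X-XlsForm-FormId-Fallback")
    if raw is None:
        return None
    # unquote() decodes %XX sequences as UTF-8 and leaves other text untouched,
    # so values without any "%" pass through unchanged.
    return unquote(raw)


print(form_id_fallback({"X-XlsForm-FormId-Fallback": "t%C3%A9st"}))  # tést
print(form_id_fallback({"X-XlsForm-FormId-Fallback": "simple_form"}))  # simple_form
print(form_id_fallback({}))  # None
```

One caveat: a client that sends an unencoded value containing a literal `%` would be decoded incorrectly, so both sides need to agree that the header is always encoded.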

@matthew-white
Member

As an update, as part of v1.2, we will have a new header, X-Action-Notes, that will need to be percent-encoded. Given that, I do think it makes sense for Frontend to percent-encode X-XlsForm-FormId-Fallback, as long as it's not hard for pyxform-http to decode it.

@lognaturel
Member

Related: getodk/collect#4554
