Strings #23

Open
gnumonik opened this issue Mar 21, 2024 · 0 comments

Context

We need to figure out how to handle string encoding across Purus and PIR/PLC.

Motivation

From Language.PureScript.PSString

-- |
-- Strings in PureScript are sequences of UTF-16 code units, which do not
-- necessarily represent UTF-16 encoded text. For example, it is permissible
-- for a string to contain *lone surrogates,* i.e. characters in the range
-- U+D800 to U+DFFF which do not appear as a part of a surrogate pair.
--
-- The Show instance for PSString produces a string literal which would
-- represent the same data were it inserted into a PureScript source file.
--
-- Because JSON parsers vary wildly in terms of how they deal with lone
-- surrogates in JSON strings, the ToJSON instance for PSString produces JSON
-- strings where that would be safe (i.e. when there are no lone surrogates),
-- and arrays of UTF-16 code units (integers) otherwise.
--

It seems likely that, with the current framework, a string may not display as intended (e.g. as part of an error message), and that this may break compatibility when Purus is integrated with other tools or languages.
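To make the failure mode concrete, here is a minimal sketch of a strict decoder over raw UTF-16 code units. It assumes a PSString's payload can be viewed as a list of `Word16` values; `CodeUnits`, `decodeStrict` and the surrogate helpers are hypothetical names for illustration, not part of the PureScript or Purus codebase. Valid surrogate pairs decode to a single character, while a lone surrogate is reported by index instead of being passed through silently.

```haskell
import Data.Char (chr)
import Data.Word (Word16)

-- Hypothetical stand-in for the payload of a PSString: raw UTF-16 code units.
type CodeUnits = [Word16]

-- High (leading) surrogates occupy U+D800..U+DBFF.
isHighSurrogate :: Word16 -> Bool
isHighSurrogate w = w >= 0xD800 && w <= 0xDBFF

-- Low (trailing) surrogates occupy U+DC00..U+DFFF.
isLowSurrogate :: Word16 -> Bool
isLowSurrogate w = w >= 0xDC00 && w <= 0xDFFF

-- Strict decoding (an assumption about what "compatible" could mean here):
-- Right with the decoded text when every surrogate is correctly paired,
-- Left with the index of the first lone surrogate otherwise.
decodeStrict :: CodeUnits -> Either Int String
decodeStrict = go 0
  where
    go _ [] = Right []
    -- A high surrogate followed by a low surrogate forms one code point.
    go i (hi : lo : rest)
      | isHighSurrogate hi && isLowSurrogate lo =
          (combine hi lo :) <$> go (i + 2) rest
    -- Any other surrogate is a lone surrogate; everything else maps directly.
    go i (w : rest)
      | isHighSurrogate w || isLowSurrogate w = Left i
      | otherwise = (chr (fromIntegral w) :) <$> go (i + 1) rest

    combine hi lo =
      chr (0x10000 + (fromIntegral hi - 0xD800) * 0x400 + (fromIntegral lo - 0xDC00))
```

Under this sketch, `decodeStrict [0x0061, 0xD800, 0x0062]` is rejected at index 1, which is exactly the kind of string that has no faithful representation as Unicode text in other tools.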

I haven't looked at the Parser in any detail, but I think that these PSStrings are parsed early on (so we can't just e.g. intervene whenever it translates from Text or whatever, b/c afaict it doesn't ever do that).

This is not just a minor annoyance - if a user writes a validator in Purus, and that validator is parameterized by a string argument or examines a string in a datum/redeemer, that validator may not work with offchain tools that do not understand the PSString encoding.

The Ask

We figure out how to handle this. @klntsky I presume you have some insights from CTL?

Acceptance Criteria

How can we decide if the task is complete?

The representation of Strings used by the compiler is compatible with the representations used by other tools.
