ByteArray# literals #135
I like this a lot.
One proposed alternative is to use double hash syntax ``"foo"##`` to represent
UTF8 ``ByteArray#``. That variant is very limited in power compared to proposed.

In fact, we might introduce similar syntax for the number literals, e.g. ``120#i8``, if we get `Int8#` primitive type.
I agree that the syntax that you propose above is much preferred to the "keep-adding-more-hashes" approach. I would really like to see this syntax adopted for numeric literals at some point as well.
that'd be nice, agreed
Therefore, code with current primitive strings won't break.

*Unresolved:* Should there be a (``-Wall``) warning in this case, asking user to be explicit?
I'm not sure I see the advantage. Implicit is fine given this syntax has a long history here and avoids the nasty compatibility issues that warning generally brings.
.. code-block:: haskell

    "primitive"# -- Addr# in utf8
    "string -- String or IsString a => a
No quotes?
Thanks for spotting. I corrected others as well.
.. code-block:: haskell

    "hello#
No quotes?
These literals are ``[Word8]`` literals, *primitive string literal must contain only characters <= '\xFF'*.

Ordinary strings, like ``"hello``, ``"Юникод"``, ``"\NUL"`` are then desugared as
No quotes after "hello".
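For reference, here is a minimal sketch of how the existing ``Addr#`` literals quoted in this hunk behave today. It assumes GHC with ``MagicHash`` and uses only ``GHC.Exts`` from base; the names ``hello`` and ``firstChar`` are illustrative and do not come from the proposal.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Char (C#), indexCharOffAddr#, unpackCString#)

-- "hello"# has type Addr#: a pointer to NUL-terminated bytes in the binary.
-- Per the proposal text, such a literal may only contain characters <= '\xFF'.
-- unpackCString# walks those bytes and builds an ordinary String.
hello :: String
hello = unpackCString# "hello"#

-- Individual bytes can also be read directly off the address.
firstChar :: Char
firstChar = C# (indexCharOffAddr# "hello"# 0#)

main :: IO ()
main = do
  putStrLn hello
  print firstChar
```

Note that the literal is used inline rather than bound at top level, since top-level bindings of unlifted types such as ``Addr#`` are not allowed in source Haskell.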
.. highlight:: haskell

This proposal is `discussed at this pull request <https://github.com/ghc-proposals/ghc-proposals/pull/134>`_.
134 should be 135.
Question:
How will escaping be handled? Or is that already addressed via the proposal
implicitly?
…On Tue, May 15, 2018 at 12:37 PM Nikolai Kuklin ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In proposals/0000-bytearray-literals.rst
<#135 (comment)>
:
> +This is a proposal to introduce ``ByteArray#`` and ``(# Word#, Addr# #)``
+literals, and slightly change ``Addr#`` literals. In short you'll be able
+to write
+
+.. code-block:: haskell
+
+ "Literals"#b -- ByteArray#
+ "\xef\xbb\xbf"#abytes -- Addr#
+ "Юникод"#ucp1251 -- (# Int#, Addr# #)
+
+additionally to current
+
+.. code-block:: haskell
+
+ "primitive"# -- Addr# in utf8
+ "string -- String or IsString a => a
No quotes?
------------------------------
In proposals/0000-bytearray-literals.rst
<#135 (comment)>
:
> +
+::
+
+ perl -e 'print "a\x00b\x00c\x00\x00\xd8"'|iconv -f utf16le -t utf8|hexdump -C
+
+You may append two bytes to the input, try to make correct surrogate pair!
+
+
+Primitive string without modifier
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The current primitive string
+
+.. code-block:: haskell
+
+ "hello#
No quotes?
------------------------------
In proposals/0000-bytearray-literals.rst
<#135 (comment)>
:
> +``ByteArray#`` Yes Yes No No
+====================== =========== ======= ======== ===========
+
+
+Recap: String desugaring currently
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Currently, it's possible to create primitive ``Addr#`` string literals:
+
+.. code-block:: haskell
+
+ "hello"# -- :: Addr#
+
+These literals are ``[Word8]`` literals, *primitive string literal must contain only characters <= '\xFF'*.
+
+Ordinary strings, like ``"hello``, ``"Юникод"``, ``"\NUL"`` are then desugared as
No quotes after "hello".
------------------------------
In proposals/0000-bytearray-literals.rst
<#135 (comment)>
:
> @@ -0,0 +1,345 @@
+.. proposal-number:: Leave blank. This will be filled in when the proposal is
+ accepted.
+
+.. trac-ticket:: Leave blank. This will eventually be filled with the Trac
+ ticket number which will track the progress of the
+ implementation of the feature.
+
+.. implemented:: Leave blank. This will be filled in with the first GHC version which
+ implements the described feature.
+
+.. highlight:: haskell
+
+This proposal is `discussed at this pull request <#134>`_.
134 should be 135.
@cartazio what do you mean by escaping?
The proposal helpfully lists a bunch of motivations with Trac tickets. I like that it resolves a bunch of related issues. But could you add a section going through these tickets one by one and explaining how the tickets are thereby resolved? Trac #5218 speaks of
How will that desugar? Will good things happen?
The motivation has the example
but
What does "algebraic" mean in the table in the motivation section?
Under "string syntax desugaring", the proposal says we'll use
Why not use
Would these
The proposal doesn't really motivate the addition of the
There is some good feedback. @phadej, are you going to incorporate it into the proposal?
I will. A bit busy with other stuff right now.
Any updates? A lot of people shot themselves in their feet due to |
data ShortByteString = SBS ByteArray#

instance FromString ShortByteString where
I think you are talking about the IsString
class.
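For readers unfamiliar with the class, a minimal sketch of an ``IsString`` instance for a byte-oriented type follows. ``Bytes`` is a hypothetical stand-in for ``ShortByteString``, backed by a plain list rather than ``ByteArray#``, and the truncating conversion is an assumption chosen to show how code points above ``'\xFF'`` can be silently mangled; it is not the proposal's design.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (ord)
import Data.String (IsString (..))
import Data.Word (Word8)

-- Hypothetical byte container; a real ShortByteString wraps ByteArray#.
newtype Bytes = Bytes [Word8]
  deriving (Eq, Show)

-- fromString is called on every overloaded string literal of type Bytes.
-- fromIntegral keeps only the low 8 bits of each code point.
instance IsString Bytes where
  fromString = Bytes . map (fromIntegral . ord)

main :: IO ()
main = do
  print ("hello" :: Bytes)  -- prints: Bytes [104,101,108,108,111]
  print ("\1070" :: Bytes)  -- U+042E truncates to 0x2E; prints: Bytes [46]
```

The second literal compiles without any warning, which is the kind of silent data loss that a checked, compile-time byte literal would rule out.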
@fumieval I might look into updating the proposal during the holidays.
It seems to me that the proposed literals are just TH expression quasi-quoters with some very thin syntactic sugar: "abc123"#enc <=> [enc|abc123|]. So I think it would be better to reuse and to extend quasi-quoters instead:
Currently GHC (via TH) supports "e", "t", "d" and "p" quasi-quoters. Add some common ones, for instance:
As usual we can export names from ghc-prim for built-in quasi-quoters in order to require an "import" to make those quasi-quoters in scope instead of polluting the global namespace.
Built-in or plugged-in quasi-quoters don't need template-haskell (which is not always available).
Redefine
and use the "raw" operations. |
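To make the comparison with quasi-quoters concrete, here is a rough sketch of what such a quoter looks like with today's ``template-haskell`` package. The quoter name ``bytes`` and the helper ``expansion`` are hypothetical; this is ordinary Template Haskell, not the built-in mechanism suggested in the comment.

```haskell
import Data.Char (ord)
import Language.Haskell.TH (Exp, Q, litE, runQ, stringPrimL)
import Language.Haskell.TH.Quote (QuasiQuoter (..))

-- Hypothetical quoter: [bytes|abc123|] would expand to an Addr# literal
-- holding the (truncated) code points of the quoted text.
bytes :: QuasiQuoter
bytes = QuasiQuoter
  { quoteExp  = \s -> litE (stringPrimL (map (fromIntegral . ord) s))
  , quotePat  = unsupported "a pattern"
  , quoteType = unsupported "a type"
  , quoteDec  = unsupported "a declaration"
  }
  where
    unsupported :: String -> String -> Q a
    unsupported ctx _ = fail ("bytes: cannot be used as " ++ ctx)

-- The stage restriction forbids using [bytes|...|] in the defining module,
-- so this runs the expression quoter directly to inspect what it generates.
expansion :: IO Exp
expansion = runQ (quoteExp bytes "abc123")

main :: IO ()
main = expansion >>= print
```

In real use the quoter would live in its own module and be invoked with ``QuasiQuotes`` enabled, which is where the dependency on Template Haskell comes from.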
@hsyl20 AFAIK QuasiQuotes cannot be used with stage1 compiler, i.e. you cannot use them in Also this proposal is not only about syntax. |
@phadej yes I know. See point (2): "Built-in or plugged-in quasi-quoters don't need template-haskell (which is not always available)." The idea is to extract quasi-quoters from TH to put them into GHC in order to generalize them and to reuse their syntax:
That is worth its own proposal. I won't even try to sneak it in through this one.
I won't be able to finish this proposal in the foreseeable future.
I see this has become dormant... Any chance it will be revived? |
This is a proposal to introduce ``ByteArray#`` and ``(# Word#, Addr# #)`` literals, and slightly change ``Addr#`` literals. In short you'll be able to write

Rendered