New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for radix, Unicode #247

Closed
wants to merge 7 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@brpocock
Contributor

brpocock commented Aug 12, 2016

This is a merge of the Radix & Unicode support from radishes branch, designed to be more amenable to merging with master. The tests are bundled separately.

Bundled set of changes to support reading and printing things in arbitrary numeric radix 2-36 and more.

  • Fix edge case in read-whole-file triggered by UTF-8 (bytes vs. chars) leaving #\null bytes after prelude
  • Checks output of !compile-file for presence of control characters like #\null (built-in regression-type check)
  • Defines a somewhat-useful assert.
  • Moves some character-related code from boot.lisp and char.lisp into new file early-char.lisp, removing duplication between them.
  • Provides a usable approximation of equalp
  • Binds (and, in many cases, actually uses the value of):
    • *print-escape*
    • *print-readably*
    • *print-circle*
    • *print-radix*
    • *print-base*
    • *read-base*
  • Makes a change to graphic-char-p to filter out both control codes as well as high & low surrogates and invalid code-points
  • char-name returns U+xxxx for characters without a name — ie, char-code ≥ #x100
  • name-char accepts U+xxxx notation (also u+)
  • js-escape-string properly handles escaping control characters C0 and C1, as
    • #\return → \r
    • #\tab → \t
    • #\page → \f
    • #\backspace → \b
    • #\null → \0
    • code-char 1…26 → \cA … \cZ
    • code-char 27…31 → \x1B … \x1F
    • code-char 127…159 → \u00XX
  • Some uses of #\' literals changed in source to #\apostrophe because (at least my Gnu) Emacs misread them and threw off indentation/highlighting. Sadly this is an Emacs bug, not a problem with the JSCL sources.
  • Implemented logior (needed in this branch) and logand, logxor (for completeness)
  • safe-js-name used to convert Lisp symbols into similar-looking Javascript symbols. This, in turn, is used by gvarname and genlit and some other places to produce more æsthetic “object code.” These names are (much) more verbose, like v123var_ELEMENT_COUNT_123. I assume two things: More similar names will make debugging easier; and if you care how long the file is, you will run CL-UGLIFY or some Javascript minifier over the output, which will rename everything for you.
  • Many places in the JS will throw new Error(…) rather than throw "string" to enable backtraces.
  • Some JS errors are more verbose, to aid debugging.
  • digit-char-p handles digits in base 2-36, and accepts any Unicode decimal digit or ASCII or fullwidth Latin letters.
  • digit-char works for base 2-36
  • Prelude ƒ char_to_codepoint does not break on malformed strings passed in, but may signal an error; also safe_char_upcase. (Errors in the Unicode reading could throw garbage strings at these functions, which were otherwise quite hard to debug.)
  • *standard-output* will echo to console.error earlier in the startup process so bootstrapping assertion failures can be reported.
  • potential-number-p honors *read-base*
  • write-integer honors *print-base*
  • writing character literals uses #\Space (arguably what CLHS prefers, but SBCL likes #\ instead), or all other C0/C1 controls appear as #\ . #'char-name
  • princ handles printing (aesthetically) characters and symbols (particularly keywords) correctly.
  • format:
    • decimal, quoted-char, or v arguments, : or @ modifiers captured.
    • ~a and ~_mincol_a with space-padding (only); ~a works more correctly due to princ fixes.
    • ~b, ~o, ~x printing
    • ~_n_r for 2 ≤ n ≤ 36.
    • ~_mincol_d, ~_mincol_x with space-padding.
    • ~@b, ~@d, ~@o, ~@x
    • ~:b, ~:d, ~:o, ~:x recognized but not working 100%
    • ~:r, ~@:r, ~@r print “placeholders” like #<Roman numeral 25>
    • ~% (as before)
    • ~& prints #\newline; ~1& prints #\newline; ~n& 2 ≤ n prints (1- n) newlines.
    • ~w should work, to the limits of write
    • ~c, ~:c, ~@c
    • ~e, ~f, ~g, ~$ are recognized but treated as ~s
    • ~( ~{ ~[ ~< ~> ~] ~} ~) issue warnings.
    • ~t, ~_n_t prints (1 resp. n) spaces
    • ~p, ~@p, ~:p, ~:@p
    • ~~ (as before)
    • ~:#\newline (but ~#\newline acts like ~:#\newline also)
    • ~| (but not: ~n|)
    • ~* (as before), but also ~:* (but not: ~@*)
  • Minimal with-input-from-string
  • %peek-char supports look-ahead
  • Reader accepts:
    • #\u+x or #\ + (any name-char)
    • #\b #\o #\x literals
    • #nr literals are parsed
    • #n( arrays are spotted, but signal an error if encountered.
    • *read-base* is honored
  • parse-integer accepts :start :end :radix
  • ls-readread
  • substitute-if[-not] implemented
  • substitute accepts :start :end :count, recognizes (but ignores with warning) :test-not :from-end
  • every handles walking more-seqs in parallel, eg (every #'= '(1 2 3) '(1 2 3))
  • concatenate for 'string or 'list
  • integer-to-string honors *print-base* or accepts a radix. Also takes an optional plusp to print ~@d &c.

The related test cases are in the parallel PR.

@davazp

This comment has been minimized.

Show comment
Hide comment
@davazp

davazp Aug 16, 2016

Member

Is this branch ready now? I see some white space issues but I can spend some time this afternoon to clean it up and merge it.

Member

davazp commented Aug 16, 2016

Is this branch ready now? I see some white space issues but I can spend some time this afternoon to clean it up and merge it.

@brpocock

This comment has been minimized.

Show comment
Hide comment
@brpocock

brpocock Aug 16, 2016

Contributor

I believe so. I know it's rather extensive changes all over, but became a
bit hard to disentangle any further.

On Tue, Aug 16, 2016, 02:43 David Vázquez Púa notifications@github.com
wrote:

Is this branch ready now? I see some white space issues but I can spend
some time this afternoon to clean it up and merge it.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#247 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB-HO2-UcA-p_buFZRTSMQGVnf4RylFPks5qgVv1gaJpZM4JjNvB
.

Contributor

brpocock commented Aug 16, 2016

I believe so. I know it's rather extensive changes all over, but became a
bit hard to disentangle any further.

On Tue, Aug 16, 2016, 02:43 David Vázquez Púa notifications@github.com
wrote:

Is this branch ready now? I see some white space issues but I can spend
some time this afternoon to clean it up and merge it.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#247 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB-HO2-UcA-p_buFZRTSMQGVnf4RylFPks5qgVv1gaJpZM4JjNvB
.

@brpocock

This comment has been minimized.

Show comment
Hide comment
@brpocock

brpocock Aug 16, 2016

Contributor

PS. I noticed that the code-base as a whole isn't using typical Emacs/Slime indentation, is there some combination of settings that I could use to mimic the existing style?

Contributor

brpocock commented Aug 16, 2016

PS. I noticed that the code-base as a whole isn't using typical Emacs/Slime indentation, is there some combination of settings that I could use to mimic the existing style?

@davazp

This comment has been minimized.

Show comment
Hide comment
@davazp

davazp Aug 16, 2016

Member

It should be emacs-slime. I suspect that when it doesn't it's because slime wasn't running.

It's okay to fix it although it would be better in specific commits for it.

Member

davazp commented Aug 16, 2016

It should be emacs-slime. I suspect that when it doesn't it's because slime wasn't running.

It's okay to fix it although it would be better in specific commits for it.

@davazp

This comment has been minimized.

Show comment
Hide comment
@davazp

davazp Aug 16, 2016

Member

I have questions about some of the changes, but the PR is a bit of a mess still.

I will clean it up a bit and create pull requests so we can comment them.

Member

davazp commented Aug 16, 2016

I have questions about some of the changes, but the PR is a bit of a mess still.

I will clean it up a bit and create pull requests so we can comment them.

@brpocock

This comment has been minimized.

Show comment
Hide comment
@brpocock

brpocock Aug 16, 2016

Contributor

If you've time to split it up, thanks, by all means. I'm afraid there's
quite a bit of “co-dependency” between the various bits and I shan't have
time soon to parcel it out any finer than it is, but if you have questions
on bits of it, you can inject commentary on the “diff” view here on GitHub
to discuss particular sections as well.

On Tue, Aug 16, 2016 at 6:43 PM David Vázquez Púa notifications@github.com
wrote:

I have questions about some of the changes, but the PR is a bit of a mess
still.

I will clean it up a bit and create pull requests so we can comment them.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#247 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB-HOyBe4xUC--s344I86AH5e6xt_O6Fks5qgj0bgaJpZM4JjNvB
.

Contributor

brpocock commented Aug 16, 2016

If you've time to split it up, thanks, by all means. I'm afraid there's
quite a bit of “co-dependency” between the various bits and I shan't have
time soon to parcel it out any finer than it is, but if you have questions
on bits of it, you can inject commentary on the “diff” view here on GitHub
to discuss particular sections as well.

On Tue, Aug 16, 2016 at 6:43 PM David Vázquez Púa notifications@github.com
wrote:

I have questions about some of the changes, but the PR is a bit of a mess
still.

I will clean it up a bit and create pull requests so we can comment them.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#247 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB-HOyBe4xUC--s344I86AH5e6xt_O6Fks5qgj0bgaJpZM4JjNvB
.

@@ -97,19 +96,31 @@ accumulated, in the order."
`(let ((it ,condition))
(when it ,@body)))
(defun integer-to-string (x)
(defun integer-to-string (x &optional (radix (or *print-base* 10)) plusp)

This comment has been minimized.

@davazp

davazp Aug 16, 2016

Member

integer-to-string is loaded pretty early. I will remove the case for *print-radix* and supports just the sign and the base.

The rest is responsibility for the printer. It's also easier then because we have better streams support.
.
Also, note that the radix prexis should be printed before the sign, like in #x-FF

@davazp

davazp Aug 16, 2016

Member

integer-to-string is loaded pretty early. I will remove the case for *print-radix* and supports just the sign and the base.

The rest is responsibility for the printer. It's also easier then because we have better streams support.
.
Also, note that the radix prexis should be printed before the sign, like in #x-FF

This comment has been minimized.

@brpocock

brpocock Aug 16, 2016

Contributor

I think the heart of this (with the order-of-sign fix) can/should port up into a print-number function perhaps, then, to preserve the *print-radix*/*print-base* logic between print & format?

I didn't even realize the #x-ff case, totally missed that. I should probably add a test case for that.

@brpocock

brpocock Aug 16, 2016

Contributor

I think the heart of this (with the order-of-sign fix) can/should port up into a print-number function perhaps, then, to preserve the *print-radix*/*print-base* logic between print & format?

I didn't even realize the #x-ff case, totally missed that. I should probably add a test case for that.

This comment has been minimized.

@davazp

davazp Aug 16, 2016

Member

Please, wait for the test until I rebase it :)

@davazp

davazp Aug 16, 2016

Member

Please, wait for the test until I rebase it :)

(defun %peek-char (stream)
(and (< (cdr stream) (length (car stream)))
(char (car stream) (cdr stream))))
(defun %peek-char (stream &optional (look-ahead 0))

This comment has been minimized.

@davazp

davazp Aug 16, 2016

Member

We shouldn't use look-ahead. We can't support that for arbitrary streams.

The current in-memory string representation is just a basic case for convenience, but the reader shouldn't assume that it's working on it.

I will not include the reader changes yet.

@davazp

davazp Aug 16, 2016

Member

We shouldn't use look-ahead. We can't support that for arbitrary streams.

The current in-memory string representation is just a basic case for convenience, but the reader shouldn't assume that it's working on it.

I will not include the reader changes yet.

This comment has been minimized.

@brpocock

brpocock Aug 16, 2016

Contributor

I seem to recall this needed for guessing whether to dispatch #\U+xx or #\name-char. That bit can probably be rewritten to to use its own buffer to fetch the first couple of characters itself. I don't think I resorted to this elsewhere.

For near-future use-case, I'd expect the next sources of streams would rather be the results of an XHR call, arriving as a “whole” buffer at a time, so this seemed like a “safe enough” operation for that case.

Admittedly there could be some streams in future that can't support a read-ahead at all, but I don't off-hand know of one that can accept a one-char-ahead peek and not accept a 2-char-ahead peek.

@brpocock

brpocock Aug 16, 2016

Contributor

I seem to recall this needed for guessing whether to dispatch #\U+xx or #\name-char. That bit can probably be rewritten to to use its own buffer to fetch the first couple of characters itself. I don't think I resorted to this elsewhere.

For near-future use-case, I'd expect the next sources of streams would rather be the results of an XHR call, arriving as a “whole” buffer at a time, so this seemed like a “safe enough” operation for that case.

Admittedly there could be some streams in future that can't support a read-ahead at all, but I don't off-hand know of one that can accept a one-char-ahead peek and not accept a 2-char-ahead peek.

Show outdated Hide outdated src/print.lisp
@@ -58,6 +58,7 @@
;;; compiled in the host
(defvar *source*
'(("boot" :target)
("early-char" :target)

This comment has been minimized.

@davazp

davazp Aug 16, 2016

Member

why early char?

are we actually calling any of this functions before char.lisp is loaded?

Note that the body of the macros are executed in the host, therefore the host version of the functions are used. Only :load-toplevel would force us to use early-char.lisp.

For example, if you use it in the value of a defvar, but it is not very common.

It is okay for early functions to call later functions, as long as the the early function is not called until later.

Anyway, I can also just try to put it together with char.lisp.

@davazp

davazp Aug 16, 2016

Member

why early char?

are we actually calling any of this functions before char.lisp is loaded?

Note that the body of the macros are executed in the host, therefore the host version of the functions are used. Only :load-toplevel would force us to use early-char.lisp.

For example, if you use it in the value of a defvar, but it is not very common.

It is okay for early functions to call later functions, as long as the the early function is not called until later.

Anyway, I can also just try to put it together with char.lisp.

This comment has been minimized.

@brpocock

brpocock Aug 16, 2016

Contributor

I wasn't 100% sure why some were in boot, and then redefined in char. My guess was that, the ones in boot were needed earlier-on, so rather than having two different implementations, I consolidated them here. Perhaps I was too cautious.

@brpocock

brpocock Aug 16, 2016

Contributor

I wasn't 100% sure why some were in boot, and then redefined in char. My guess was that, the ones in boot were needed earlier-on, so rather than having two different implementations, I consolidated them here. Perhaps I was too cautious.

@@ -281,6 +281,25 @@
(list (first v) (third v))))
varlist)))))))
(defmacro declare (&rest declarations)

This comment has been minimized.

@davazp

davazp Aug 16, 2016

Member

Declare isn't a macro actually.

We process declare in many forms (like lambdas). If it is missing somewhere, you can use parse-body (defined in compiler.lisp) to extract declarations and optional docstring from a body.

@davazp

davazp Aug 16, 2016

Member

Declare isn't a macro actually.

We process declare in many forms (like lambdas). If it is missing somewhere, you can use parse-body (defined in compiler.lisp) to extract declarations and optional docstring from a body.

This comment has been minimized.

@brpocock

brpocock Aug 16, 2016

Contributor

This was to bypass errors in which DECLARE was being assumed as a function during bootstrap. Agreed, it should be a special form that lambda/macro processors pull out and handle.

@brpocock

brpocock Aug 16, 2016

Contributor

This was to bypass errors in which DECLARE was being assumed as a function during bootstrap. Agreed, it should be a special form that lambda/macro processors pull out and handle.

This comment has been minimized.

@brpocock

brpocock Aug 16, 2016

Contributor

If I recall correctly, I added some (declare (ignore …)) to silence SBCL warnings in the host pass, which threw out some error about undefined function declare during the target pass.

@brpocock

brpocock Aug 16, 2016

Contributor

If I recall correctly, I added some (declare (ignore …)) to silence SBCL warnings in the host pass, which threw out some error about undefined function declare during the target pass.

@brpocock brpocock referenced this pull request Aug 18, 2016

Merged

Fix integer printing #28

brpocock added some commits Aug 12, 2016

Tests for Unicode, numeric radix
Note that these will usually crash the master branch.
Rebase from upstream master
Also, using the SAFE-JS-FUN-NAME in preference to new JSIZE function
since they serve approximately the same purpose, but the SAFE-JS-*
functions are used throughout this branch already.
Merge tests into the branch that they test.
nicode-rebase

In hindsight, it doesn't really make  sense to split the unit tests from
the code they're testing.

# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
merge conflicts fix
Terribly stupid git merge markers made it in.
@brpocock

This comment has been minimized.

Show comment
Hide comment
@brpocock

brpocock Nov 1, 2016

Contributor

Rebased on current master and brought in the tests, which had previously (perhaps very foolishly) been in a separate PR.

Contributor

brpocock commented Nov 1, 2016

Rebased on current master and brought in the tests, which had previously (perhaps very foolishly) been in a separate PR.

@davazp

This comment has been minimized.

Show comment
Hide comment
@davazp

davazp Nov 1, 2016

Member

@brpocock is this PR ready to be reviewed and merged too?

Member

davazp commented Nov 1, 2016

@brpocock is this PR ready to be reviewed and merged too?

@brpocock

This comment has been minimized.

Show comment
Hide comment
@brpocock

brpocock Nov 1, 2016

Contributor

It's good to go, although the way it snaked through things — I know you'd
mentioned wanting to cut it into more bite-sized pieces, and I haven't had
a chance to really revisit it.

On Tue, Nov 1, 2016, 15:58 David Vázquez Púa notifications@github.com
wrote:

@brpocock https://github.com/brpocock is this PR ready to be reviewed
and merged too?


You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub
#247 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB-HO1d2QL-gVmVE4ZMdWYT-aaDdxOVWks5q55nMgaJpZM4JjNvB
.

Contributor

brpocock commented Nov 1, 2016

It's good to go, although the way it snaked through things — I know you'd
mentioned wanting to cut it into more bite-sized pieces, and I haven't had
a chance to really revisit it.

On Tue, Nov 1, 2016, 15:58 David Vázquez Púa notifications@github.com
wrote:

@brpocock https://github.com/brpocock is this PR ready to be reviewed
and merged too?


You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub
#247 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB-HO1d2QL-gVmVE4ZMdWYT-aaDdxOVWks5q55nMgaJpZM4JjNvB
.

@davazp

This comment has been minimized.

Show comment
Hide comment
@davazp

davazp Oct 4, 2017

Member

I obviously didn't have the energy to split the pull request in more handy incremental steps, so I'm closing this so it doesn't stay in the list forever.

If somebody has the time and energy, please go ahead and open a new PR and we'll revisit it.

Member

davazp commented Oct 4, 2017

I obviously didn't have the energy to split the pull request in more handy incremental steps, so I'm closing this so it doesn't stay in the list forever.

If somebody has the time and energy, please go ahead and open a new PR and we'll revisit it.

@davazp davazp closed this Oct 4, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment