Why does CL-UNICODE::*CODE-POINTS-TO-UNICODE1-NAMES* not contain entries for ALL code points? #33

JVDptt · 2022-06-17T14:54:01Z

This code should print the entire (code point -> canonical name)
mapping, right?

 (with-hash-table-iterator (next-entry cl-unicode::*code-points-to-unicode1-names*)
   (loop
     ;; The iterator returns three values: a "more entries" flag, the key, and the value.
     (multiple-value-bind (more code-point name) (next-entry)
       (unless more (return))
       (format t "~A ~A~%" code-point name))))

But that print-out misses some characters, such as \u2248 and \u2249. Where are they?
Here is an excerpt from the print-out obtained by running the above code:
...
8682 WHITE UP ARROW FROM BAR
8788 COLON EQUAL
8789 EQUAL COLON
8804 LESS THAN OR EQUAL TO
...
Why isn't it printing \u2248 (8776): '≈' or \u2249 (8777): '≉'?
Yet unicode-name resolves them OK:
CL-USER> (CL-UNICODE:unicode-name #\u2248)
-> "ALMOST EQUAL TO"
CL-USER> (CL-UNICODE:unicode-name #\u2249)
-> "NOT ALMOST EQUAL TO"

And they are in Unicode v1:
CL-USER> (CL-UNICODE:age #\u2248)
-> (1 1)

So that character is in Unicode v1, so it should have a Unicode 1 name, and hence
be in the hash table? What am I missing? Why doesn't the print-out produced
by the above code include #\ALMOST_EQUAL_TO?

Just wondering what the rules for inclusion in that table are,
and whether there is a more complete way of printing ALL recognized
code points and names?

Is cl-unicode somehow checking my locale and deciding which version
of Unicode names to include in the table, and omitting some because of version issues?

It is very easy to print out a Unicode table with e.g. bash, not so
easy to browse it by symbol name / meaning :-)

Thanks for cl-unicode!
Best Regards,
Jason


gefjon · 2022-06-17T15:08:00Z

*code-points-to-unicode1-names* is an internal variable, and shouldn't be treated as part of CL-UNICODE's interface.

That map contains only Unicode v1.0 code points, and as age is telling you, the characters you're asking about were introduced in Unicode v1.1.

If you want to print all the Unicode characters known to CL-UNICODE, you can do:

(defun print-all-unicode-chars (&optional (stream *standard-output*))
  (loop :for i :below cl-unicode:+code-point-limit+
        :for name := (cl-unicode:unicode-name i)
        :when name
          :do (format stream "~&~d ~a ~a~%" i (cl-unicode:age i) name)))
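
Since the question also mentions browsing by name, here is a minimal sketch of a variant that filters on a substring of the name. `print-unicode-chars-matching` is a hypothetical helper, not part of CL-UNICODE; it assumes only `cl-unicode:unicode-name` and `cl-unicode:+code-point-limit+`, as used in the function above, and that CL-UNICODE is already loaded.

```lisp
;; Hypothetical helper (not part of CL-UNICODE): print every code point
;; whose name contains SUBSTRING, compared case-insensitively.
(defun print-unicode-chars-matching (substring &optional (stream *standard-output*))
  (loop :for i :below cl-unicode:+code-point-limit+
        :for name := (cl-unicode:unicode-name i)
        ;; SEARCH returns a position (true) when SUBSTRING occurs in NAME.
        :when (and name (search substring name :test #'char-equal))
          ;; Decimal code point, U+XXXX hex form, then the name.
          :do (format stream "~&~d U+~4,'0x ~a~%" i i name)))
```

For example, `(print-unicode-chars-matching "almost equal")` should list the two characters asked about, along with any other code points whose names contain that phrase. It walks the whole code point range on every call, so it is meant for interactive exploration rather than repeated lookups.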

EDIT: markdown formatting

JVDptt · 2022-06-17T20:23:37Z

Many thanks, Phoebe - yes, that clarifies many things. All the best, Jason

