Why does CL-UNICODE::*CODE-POINTS-TO-UNICODE1-NAMES* not contain entries for ALL code points? #33

Open
JVDptt opened this issue Jun 17, 2022 · 2 comments

JVDptt commented Jun 17, 2022

This code should print the entire (code point -> canonical name)
mapping, right?

 (with-hash-table-iterator (it CL-UNICODE::*CODE-POINTS-TO-UNICODE1-NAMES*)
   (loop
     ;; The iterator returns three values: a "more entries" flag, the key, and the value.
     (multiple-value-bind (more code-point name) (it)
       (unless more (return))
       (format t "~A ~A~%" code-point name))))

But that print-out misses some characters, such as \u2248 and \u2249. Where are they?
Here is an excerpt from the print-out obtained by running the above code:
...
8682 WHITE UP ARROW FROM BAR
8788 COLON EQUAL
8789 EQUAL COLON
8804 LESS THAN OR EQUAL TO
...
Why isn't it printing \u2248 (8776): '≈' or \u2249 (8777): '≉'?
Yet unicode-name resolves them OK:
CL-USER> (CL-UNICODE:unicode-name #\u2248)
-> "ALMOST EQUAL TO"
CL-USER> (CL-UNICODE:unicode-name #\u2249)
-> "NOT ALMOST EQUAL TO"

And age reports them as Unicode v1 characters:
CL-USER> (CL-UNICODE:age #\u2248)
-> (1 1)

So that character is in Unicode v1, which means it should have a Unicode 1 name
and hence be in the hash table? What am I missing? Why doesn't the print-out
produced by the above code include #\ALMOST_EQUAL_TO?

Just wondering what the rules for inclusion in that table are,
and whether there is a more complete way of printing ALL recognized
code points and names?

Is cl-unicode somehow checking my locale to decide which version of the
Unicode names to include in the table, and omitting some because of version issues?

It is very easy to print out a Unicode table with e.g. bash, but not so
easy to browse it by symbol name / meaning :-)

Thanks for cl-unicode!
Best Regards,
Jason

gefjon commented Jun 17, 2022

*code-points-to-unicode1-names* is an internal variable, and shouldn't be treated as part of CL-UNICODE's interface.

That map contains only Unicode v1.0 code points, and as age is telling you, the characters you're asking about were introduced in Unicode v1.1.
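To see what the table holds for a particular character without touching the internal variable, the exported cl-unicode:unicode1-name can be compared against cl-unicode:unicode-name. The snippet below is only a sketch: the helper name name-pair is made up for illustration, and loading through Quicklisp is an assumption about your setup. For context, the Unicode Character Database fills in the Unicode 1 name field only where the old name differs significantly from the current one, so a NIL here may simply mean the name never changed rather than that the character is missing.

```lisp
;; Assumption: Quicklisp is installed and can load cl-unicode.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload :cl-unicode))

;; NAME-PAIR is a hypothetical helper, not part of cl-unicode.
(defun name-pair (code-point)
  "Return the current name and the Unicode 1 name (or NIL) of CODE-POINT as a list."
  (list (cl-unicode:unicode-name code-point)
        (cl-unicode:unicode1-name code-point)))

;; Compare a character missing from the print-out (8776) with one that is listed (8788).
(dolist (code-point '(8776 8788))
  (destructuring-bind (name old-name) (name-pair code-point)
    (format t "~&~D current: ~A, Unicode 1: ~A~%" code-point name old-name)))
```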

If you want to print all the Unicode characters known to CL-UNICODE, you can do:

(defun print-all-unicode-chars (&optional (stream *standard-output*))
  (loop :for i :below cl-unicode:+code-point-limit+
        :for name := (cl-unicode:unicode-name i)
        :when name
          :do (format stream "~&~d ~a ~a~%" i (cl-unicode:age i) name)))
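For browsing by name or meaning rather than dumping everything, the same loop can be turned into a search. This is a minimal sketch, not something cl-unicode provides: find-code-points-by-name is a hypothetical helper, it uses only cl-unicode:unicode-name and cl-unicode:+code-point-limit+ from the function above, and it assumes Quicklisp is available to load the library.

```lisp
;; Assumption: Quicklisp is installed and can load cl-unicode.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload :cl-unicode))

;; FIND-CODE-POINTS-BY-NAME is a hypothetical helper, not part of cl-unicode.
(defun find-code-points-by-name (fragment)
  "Return (CODE-POINT . NAME) pairs whose name contains FRAGMENT, ignoring case."
  (loop :for code-point :below cl-unicode:+code-point-limit+
        :for name := (cl-unicode:unicode-name code-point)
        ;; CHAR-EQUAL makes the substring match case-insensitive.
        :when (and name (search fragment name :test #'char-equal))
          :collect (cons code-point name)))

;; Print every character whose name mentions "almost equal".
(dolist (entry (find-code-points-by-name "almost equal"))
  (format t "~&U+~4,'0X ~A~%" (car entry) (cdr entry)))
```

It walks the whole code point range on every call, so it is fine for interactive use but would want caching if called often.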

EDIT: markdown formatting

JVDptt commented Jun 17, 2022 via email
