Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sage.libs.ecl: Fix unicode handling #30106

Closed
mkoeppe opened this issue Jul 10, 2020 · 39 comments
Closed

sage.libs.ecl: Fix unicode handling #30106

mkoeppe opened this issue Jul 10, 2020 · 39 comments

Comments

@mkoeppe
Copy link
Member

mkoeppe commented Jul 10, 2020

As a follow-up to #29278, #29280: If we use Unicode variable names in SR, declaring a domain gives an error:

sage: SR.var('π', domain='real')
RuntimeError: ECL says: THROW: The catch MACSYMA-QUIT is undefined.
SystemError: <built-in method var of sage.symbolic.ring.SymbolicRing object at 0x334506908> returned a result with an error set

This comes from our ECL interface:

sage: from sage.libs.ecl import *
sage: u_symbol = EclObject('🔥')
sage: u_symbol
<repr(<sage.libs.ecl.EclObject at 0x337e7b3c8>) failed: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 2: invalid start byte>
sage: u_symbol.python()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 2: invalid start byte

Also note:

sage: b_symbol = EclObject(bytes([166]))
sage: b_symbol
<repr(<sage.libs.ecl.EclObject at 0x337e7b058>) failed: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa6 in position 0: invalid start byte>

Depends on #22191

CC: @mwageringel @nbruin @dimpase @spaghettisalat

Component: symbolics

Author: Matthias Koeppe

Branch/Commit: 59dd62b

Reviewer: Markus Wageringel

Issue created by migration from https://trac.sagemath.org/ticket/30106

@mkoeppe

This comment has been minimized.

@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 11, 2020

comment:3

Not sure if there is a function in the ECL C API that constructs a Unicode Lisp string.
So I would just be going through CODE-CHAR as in

sage: ecl_eval('''(type-of (map 'string #'code-char '(128293))))''')
<ECL: (SIMPLE-ARRAY CHARACTER (1))>

@mkoeppe

This comment has been minimized.

@mkoeppe mkoeppe added this to the sage-9.2 milestone Jul 11, 2020
@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 11, 2020

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2020

Branch pushed to git repo; I updated commit sha1. New commits:

b6ebb52src/sage/cpython/*string*: Update documentation
6ce8f9epython_to_ecl: Handle unicode strings

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2020

Commit: 6ce8f9e

@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 11, 2020

comment:7

Now

sage: SR.var('π', domain='real')
π

but more work is needed.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2020

Changed commit from 6ce8f9e to c983523

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2020

Branch pushed to git repo; I updated commit sha1. New commits:

c983523EclObject.str: Handle unicode

@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 11, 2020

comment:9
sage: from sage.libs.ecl import *
sage: u_symbol = EclObject('🔥')
sage: u_symbol
<ECL: 🔥>

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2020

Branch pushed to git repo; I updated commit sha1. New commits:

f237935ecl_to_python: Handle unicode strings

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2020

Changed commit from c983523 to f237935

@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 11, 2020

comment:11
sage: u_string = EclObject('"🐙"')
sage: u_string
<ECL: "🐙">
sage: _.python()
'"🐙"'

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2020

Branch pushed to git repo; I updated commit sha1. New commits:

6f740ebecl_eval: Handle unicode strings

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2020

Changed commit from f237935 to 6f740eb

@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 11, 2020

comment:13
sage: clock = ecl_eval('''(defun clock (h) (string (elt "🕛🕐🕑🕒🕓🕔🕕🕖🕗🕘🕙🕚" (mod h 12))))''')
sage: clock(3).python()
'"🕒"'

@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 11, 2020

Author: Matthias Koeppe

@mwageringel
Copy link

comment:15

Thank you for solving this.

Should our Maxima interface support unicode with this branch now, or do you know what else is needed to make this work? The following used to give an error, but results in a crash now:

sage: var('ξ')._maxima_()

Condition of type: SIMPLE-ERROR
Cannot coerce string Cannot coerce string $_SAGE_VAR_ξ to a base-string to a base-string
No restarts available.
...
Excessive debugger depth! Probable infinite recursion!
Quitting.

Also, please add a doctest for the new functionality.

@dimpase
Copy link
Member

dimpase commented Jul 11, 2020

comment:16

Are you on ecl 20.4 ?

@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 11, 2020

comment:17

Replying to @mwageringel:

The following used to give an error, but results in a crash now:

sage: var('ξ')._maxima_()

Condition of type: SIMPLE-ERROR
Cannot coerce string Cannot coerce string $_SAGE_VAR_ξ to a base-string to a base-string
No restarts available.
...
Excessive debugger depth! Probable infinite recursion!
Quitting.

Yes, I can confirm this here. I'll look into this.

Also, please add a doctest for the new functionality.

Sure, will do.

@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 11, 2020

comment:18

Replying to @dimpase:

Are you on ecl 20.4 ?

I'm still on 16.1.2.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2020

Changed commit from 6f740eb to ee97a6d

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2020

Branch pushed to git repo; I updated commit sha1. New commits:

ee97a6dFix infinite recursion when error messages are not base-strings

@dimpase
Copy link
Member

dimpase commented Jul 11, 2020

comment:20

I suggest we work on ecl 20.4, for uniformity

@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 11, 2020

comment:21

The above error is due to a bug in ECL 16.1.2:

> (princ-to-string 'ξ)
"Ξ"
> (setf local-table (copy-readtable nil))
> (setf (readtable-case local-table) :invert)
> :INVERT
> (let ((*readtable* local-table) (*print-case* :upcase)) (princ-to-string 'ξ))
Cannot coerce string Ξ to a base-string

I will try with the ECL upgrade now.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2020

Changed commit from ee97a6d to 2d87ee3

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2020

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

0076bc3backport Maxima fix for bug #3629
1d074e8doctest fixes
266d8c1backport ECL PR #210
12447bcreject old makeinfo
a3e0ecaadd upstream fix from MR 215
89b006badd the patch from upstream MR 214
0b77737add upstream MR 216 (to fix cygwin fork)
8ca1c0eMerge tag '9.2.beta2' into t/22191/public/packages/ecl20
f82c716Commit 75877dd8 from upstream
2d87ee3Merge commit 'f82c716fdf9c6e91a07166d36b6329a15ecfb41d' of git://trac.sagemath.org/sage into t/30106/sage_libs_ecl__fix_unicode_handling

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2020

Branch pushed to git repo; I updated commit sha1. New commits:

828a727Add doctests

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2020

Changed commit from 2d87ee3 to 828a727

@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 11, 2020

comment:24

Replying to @mkoeppe:

The above error is due to a bug in ECL 16.1.2:

> (princ-to-string 'ξ)
"Ξ"
> (setf local-table (copy-readtable nil))
> (setf (readtable-case local-table) :invert)
> :INVERT
> (let ((*readtable* local-table) (*print-case* :upcase)) (princ-to-string 'ξ))
Cannot coerce string Ξ to a base-string

I will try with the ECL upgrade now.

This bug is still present in ECL 20.4.24. The above code is from Maxima's PRINT-INVERT-CASE function in commac.lisp.

@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 11, 2020

Dependencies: #22191

@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 11, 2020

comment:27

Note that after commit ee97a6d, the test case from comment 17 now gives a proper error message, no longer a crash:

sage: var('ξ')._maxima_()
....: 
<repr(<sage.interfaces.maxima_lib.MaximaLibElement at 0x32c1d4828>) failed: RuntimeError: ECL says: Cannot coerce string $_SAGE_VAR_ξ to a base-string>

Fixing this ECL bug or working around it is outside of the scope of this ticket.

@mwageringel
Copy link

comment:28

Ok, thanks for the fix. There is one more place where ecl_string_to_python should be used – in print_objects:

sage: from sage.libs.ecl import *
sage: u_symbol = EclObject('🔥')
sage: print_objects()  # crashes

Moreover, the docstring of ecl_eval should be changed into a raw string, as the examples contain backslashes.

Once that is fixed, you can set a positive review on my behalf.

@mwageringel
Copy link

Reviewer: Markus Wageringel

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 12, 2020

Branch pushed to git repo; I updated commit sha1. New commits:

99b894aprint_objects: Handle unicode
59dd62becl_eval: Make docstring raw

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 12, 2020

Changed commit from 828a727 to 59dd62b

@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 12, 2020

comment:30

Thank you!

@mkoeppe
Copy link
Member Author

mkoeppe commented Jul 12, 2020

comment:31

Follow-up ticket to keep track of the ECL issue: #30122

@vbraun
Copy link
Member

vbraun commented Jul 25, 2020

Changed branch from u/mkoeppe/sage_libs_ecl__fix_unicode_handling to 59dd62b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants