UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 #21

PeterStindberg · 2023-01-08T09:01:09Z

Hi there,

On one script I tried to optimize, I get errors like these (one per attempt; the numbers seem to change arbitrarily, though 2912 shows up often):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 4295: ordinal not in range(128)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 200: ordinal not in range(128)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2912: ordinal not in range(128)

I tested my build process with a different script and it ran fine, so I assume there's something in my current script that the optimizer stumbles over.

My current build process:

1. Edit the source in VSCode
2. Paste the file into SL and compile
3. Copy the Firestorm preprocessed code
4. Paste the preprocessed code into a file named preprocessed.lsl
5. Run PyOptimizer with the -O +ShrinkNames option

VSCode build task (tasks.json):

{
    // See https://go.microsoft.com/fwlink/?LinkId=733558
    // for the documentation about the tasks.json format
    "version": "2.0.0",
    "tasks": [
        {
            "label": "optimize",
            "type": "shell",
            "command": "/usr/bin/python ${userHome}/GitHub/LSL-PyOptimizer/main.py ${workspaceFolder}/Optimized/preprocessed.lsl -O +ShrinkNames -o ${workspaceFolder}/Optimized/optimized.lsl",
            "problemMatcher": [],
            "group": {
                "kind": "build",
                "isDefault": true
            }
        }
    ]
}


Sei-Lisa · 2023-01-08T12:29:42Z

Hi, could you please add option -y to the optimizer command line and paste the whole traceback? Also, please specify:

- Your Python version (/usr/bin/python --version)
- Your operating system
- The output of 'locale'
- The encoding of your script

And if you can, please try running run-tests.py and let me know if there are any errors.
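The details requested above can also be collected with a few lines of Python. This is a minimal sketch: `environment_report` is a hypothetical helper, not part of LSL-PyOptimizer, and it only uses the standard `sys`, `platform` and `locale` modules. The `default_encoding` entry is worth a look: under Python 2 it normally reports `ascii`, the codec named in the error, even when the locale is UTF-8.

```python
import locale
import platform
import sys


def environment_report():
    """Collect details relevant to encoding bugs (hypothetical helper)."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # Codec used for implicit str/unicode conversions: normally 'ascii'
        # on Python 2, always 'utf-8' on Python 3.
        "default_encoding": sys.getdefaultencoding(),
        # Encoding derived from the locale settings (LANG / LC_CTYPE).
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in sorted(environment_report().items()):
        print("%s: %s" % (name, value))
```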

PeterStindberg · 2023-01-08T13:17:33Z

Sure thing

macOS Mojave 10.14.6
Python 2.7.16

Traceback (most recent call last):
  File "/GitHub/LSL-PyOptimizer/main.py", line 782, in <module>
    ret = main(sys.argv)
  File "/GitHub/LSL-PyOptimizer/main.py", line 745, in main
    script = script_header + script_timestamp + outs.output(ts, options)
  File "/GitHub/LSL-PyOptimizer/lslopt/lsloutput.py", line 556, in output
    ret += self.OutCode(node)
  File "/GitHub/LSL-PyOptimizer/lslopt/lsloutput.py", line 504, in OutCode
    ret += self.OutCode(stmt)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2896: ordinal not in range(128)
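The last frame of the traceback is a string concatenation, `ret += self.OutCode(stmt)`. In Python 2, concatenating a byte string (`str`) with a `unicode` string makes the interpreter implicitly decode the byte string with the default `ascii` codec, and any byte of 0x80 or above raises exactly this `UnicodeDecodeError`. The sketch below simulates that coercion in Python 3 with a hypothetical `py2_concat` helper; whether this is precisely what happens inside `lsloutput.py` is an assumption. It also suggests why the reported position differs between runs (2896, 2912, 2917): it is the offset of the offending byte within the operand being decoded, so it moves whenever the text before that byte changes.

```python
def py2_concat(left, right):
    """Mimic Python 2's implicit coercion when mixing str (bytes) and unicode.

    Hypothetical helper: Python 2 decodes the byte-string operand with the
    default 'ascii' codec, so any byte >= 0x80 raises UnicodeDecodeError.
    """
    if isinstance(left, bytes) and isinstance(right, str):
        left = left.decode("ascii")
    elif isinstance(left, str) and isinstance(right, bytes):
        right = right.decode("ascii")
    return left + right


# Output accumulated so far, already encoded as UTF-8 bytes; it contains U+276E.
ret = u'msg = "*\u276e [";\n'.encode("utf-8")
# Next statement, produced as a text (unicode) string.
stmt = u"encode_and_send(msg, thisAvKey);\n"

try:
    ret = py2_concat(ret, stmt)
except UnicodeDecodeError as err:
    # err.start is the offset of the bad byte inside the decoded operand.
    print(err)  # 'ascii' codec can't decode byte 0xe2 in position 8: ordinal not in range(128)
```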

LANG="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_CTYPE="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_ALL=

What do you mean by "encoding of your script"?

Sei-Lisa · 2023-01-08T14:14:47Z

Thanks for the info. These encoding problems are starting to be a real PITA, so much so that I'm considering going full Python 3.x instead of polyglot 2.x/3.x. The problem is happening in the output module; I wasn't expecting that to be a source of problems.

By encoding I mean https://en.wikipedia.org/wiki/Character_encoding. The more sophisticated editors usually let you specify which encoding to use when saving the file, or let you choose it while editing. I don't know VSCode, so I can't tell you where to find it, but a Google search suggests that the encoding is shown in the bottom bar. If it says "UTF-8" there, then it's fine. Your system is configured as UTF-8 according to the output of locale, so it's likely that the editor is using it too.

Most likely the problem has to do with a non-ASCII character. Are you using non-ASCII characters anywhere in your program? Do you think you could remove lines from your program until you find a minimal script that reproduces the problem, and post it?
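For context on the 2.x/3.x remark: a common way to avoid this class of error in code that must run on both versions is to decode bytes to text once when reading, keep everything as text internally, and encode once when writing. A minimal sketch using the standard `io.open`, which behaves the same on Python 2 and 3; `read_script` and `write_script` are hypothetical names, and this illustrates the general pattern, not how LSL-PyOptimizer is structured or how this issue was fixed.

```python
import io


def read_script(path, encoding="utf-8"):
    """Decode once at the input boundary; callers only ever see text."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


def write_script(path, text, encoding="utf-8"):
    """Encode once at the output boundary; `text` must be a text string."""
    with io.open(path, "w", encoding=encoding) as f:
        f.write(text)
```

As long as every piece concatenated in between is text, no implicit ASCII decoding of non-ASCII bytes can take place.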

PeterStindberg · 2023-01-08T15:41:54Z

Yes, it's UTF-8.

And indeed I am using a few non-ASCII characters. There are passages like this:

list replace = ["&lt;","<","&gt;",">","&rt;",">","&quote;","\"","&quot;","\"","&amp;","&","&cent;","¢","&pound;","£","&yen;","¥","&euro;","€","&copy;","©","&reg;","®","&apos;","'"];

and passages like this:

msg = "*❮ [" + legacy_name + "](https://my.secondlife.com/" + legacy_name_dots +")*";

I can try to comb through them and see what happens.
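Both bytes in the error messages are UTF-8 lead bytes. Characters from U+0080 to U+00BF, such as ¢, £, ¥, © and ®, are encoded as two bytes starting with 0xC2, while characters from U+2000 to U+2FFF, such as € and ❮ (U+276E), are encoded as three bytes starting with 0xE2. A small sketch to check this, using only built-ins (`utf8_hex` is a hypothetical helper):

```python
def utf8_hex(ch):
    """Return the UTF-8 encoding of `ch` as space-separated hex bytes."""
    return " ".join("%02X" % b for b in bytearray(ch.encode("utf-8")))


# cent, pound, copyright, euro, and the U+276E ornament used in the script
for ch in [u"\u00a2", u"\u00a3", u"\u00a9", u"\u20ac", u"\u276e"]:
    print("U+%04X -> %s" % (ord(ch), utf8_hex(ch)))

# Prints:
# U+00A2 -> C2 A2
# U+00A3 -> C2 A3
# U+00A9 -> C2 A9
# U+20AC -> E2 82 AC
# U+276E -> E2 9D AE
```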

PeterStindberg · 2023-01-08T16:13:52Z

Unfortunately, that didn't solve the problem. I replaced every non-ASCII character with llChar(xxxx) and ran the code through https://pages.cs.wisc.edu/~markm/ascii.html to find any leftover non-ASCII characters. The code is clean, but the error message is the same:

Traceback (most recent call last):
  File "/GitHub/LSL-PyOptimizer/main.py", line 782, in <module>
    ret = main(sys.argv)
  File "/GitHub/LSL-PyOptimizer/main.py", line 745, in main
    script = script_header + script_timestamp + outs.output(ts, options)
  File "/GitHub/LSL-PyOptimizer/lslopt/lsloutput.py", line 556, in output
    ret += self.OutCode(node)
  File "/GitHub/LSL-PyOptimizer/lslopt/lsloutput.py", line 504, in OutCode
    ret += self.OutCode(stmt)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2917: ordinal not in range(128)
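Instead of an online checker, a script can be scanned locally. A minimal sketch, assuming the file is saved as UTF-8 (`find_non_ascii` and `scan_file` are hypothetical helpers). It only inspects the source text, so characters produced later, while the script is compiled or optimized, are invisible to it.

```python
import io


def find_non_ascii(text):
    """List (line, column, code point) for each non-ASCII character.

    Lines and columns are 1-based, as editors usually show them. Splitting on
    newline characters only (not splitlines()) keeps Unicode line separators
    such as U+2028 visible as findings instead of treating them as breaks.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), 1):
        for col, ch in enumerate(line, 1):
            if ord(ch) > 0x7F:
                hits.append((lineno, col, "U+%04X" % ord(ch)))
    return hits


def scan_file(path):
    """Decode `path` as UTF-8 and report its non-ASCII characters."""
    with io.open(path, "r", encoding="utf-8") as f:
        return find_non_ascii(f.read())
```

For example, `scan_file("preprocessed.lsl")` would list every non-ASCII character left in the preprocessed file.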

Sei-Lisa · 2023-01-08T17:25:55Z

llChar is not going to help: it is evaluated before the output phase and replaced with a string containing the very character that causes the problem, so it doesn't change anything. What I'm asking is whether you can create a minimal test case, something you're comfortable sharing that helps me reproduce the issue. You can trim parts of your script and replace others until you get something shareable that still causes the problem, which I can then look into. It would also help if you could run the run-tests.py program provided with the optimizer and report any problems you find.
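To illustrate why llChar makes no difference: once the call has been evaluated, the output phase only sees the resulting string constant, which is identical to the one written with the literal character. In this sketch (Python 3), `fold_llchar` is a hypothetical stand-in for that evaluation step, not the optimizer's own code.

```python
def fold_llchar(code_point):
    """Hypothetical stand-in for evaluating llChar(code_point) at compile time."""
    return chr(code_point)


# String constant written with the literal character.
literal = '"*\u276e ["'
# The same constant after llChar(0x276E) has been evaluated and folded in.
folded = '"*' + fold_llchar(0x276E) + ' ["'

# The output phase receives the same text, so it emits the same UTF-8 bytes.
assert folded == literal
print(folded.encode("utf-8") == literal.encode("utf-8"))  # True
```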

PeterStindberg · 2023-01-08T19:10:26Z

oh, okay - well, since we assume it's the non-ASCII characters, a test case might be easy to put together

PeterStindberg · 2023-01-08T20:30:16Z

OK, this is the most stripped-down version that still fails:

list visitor_list_new;
list visitor_list_pos_old;
key owner;
key mygroup;
float xRight;
float xLeft;
float yNear;
float yFar;
float zLow;
float zHigh;

encode_and_send(string msg, key thisAvKey)
{
    integer msg_length;
}

key getAvatarGroup (key inAvatar)
{
    key result = NULL_KEY;
    return (result);
}

default
{
    state_entry()
    {
        mygroup = llList2Key(llGetObjectDetails(llGetKey(), [OBJECT_GROUP]), 0);
    }

    timer()
    {
        string msg;
        string legacy_name;
        string legacy_name_dots;
        integer numberOfKeys;
        integer i;
        key thisAvKey;
        vector agentpos;

        for (i = 0; i < numberOfKeys; ++i) {
            thisAvKey = llList2Key(visitor_list_new,i);

            if (TRUE) {

                agentpos = llList2Vector(llGetObjectDetails(thisAvKey, [OBJECT_POS]),0);

                // and break it down to x-y-z
                float avatarx = agentpos.x;
                float avatary = agentpos.y;
                float avatarz = agentpos.z;

                if ((avatarx <= xRight && avatarx >= xLeft && avatary <= yFar && avatary >= yNear && avatarz <= zHigh && avatarz >= zLow) && (getAvatarGroup(thisAvKey) == mygroup)) {

                } 
            }      
        }

        if (TRUE) {
            for (i = 0; i < numberOfKeys; i = i + 2) {
                if (TRUE) {
                    thisAvKey = llList2Key(visitor_list_pos_old,i);
                    legacy_name = llKey2Name(thisAvKey);
                    if (legacy_name != "") {

                        if (TRUE) {
                            msg = "*" + llChar(0x276E) + " [" + legacy_name + "](https://my.secondlife.com/" + legacy_name_dots +")*";
                            encode_and_send(msg, thisAvKey);
                        }
                    } 
                }
            }
        }
    } 
}

Sei-Lisa · 2023-01-09T10:21:45Z

That test case has been very valuable because it allowed me to reproduce the issue. I've managed to reduce it further to this minimal script:
https://github.com/Sei-Lisa/LSL-PyOptimizer/blob/master/unit_tests/regression.suite/issue-21.lsl

Fixed in master. Please test whether it works with your complete script, and report any further problems you run into. Thanks a lot for your help!

PeterStindberg · 2023-01-09T16:35:02Z

Yep, the new version works without any error message, and produces the expected output. Thank you very much!

For laymen: What was the issue?

PeterStindberg · 2023-01-09T17:01:01Z

nvm, found the commit and the explanation

Sei-Lisa closed this as completed in e3c1634 Jan 9, 2023
