UnicodeEncodeError when writing to log #1

Open
mrihtar opened this issue Jun 20, 2021 · 7 comments
Labels
bug Something isn't working question Further information is requested

Comments

@mrihtar

mrihtar commented Jun 20, 2021

Suggested solution: pass an explicit encoding parameter when opening the log file (main.py, line 258):

    with output_dir.joinpath(log_name).open("w", encoding='utf-8') as log_file_ptr:
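
A minimal sketch of the same idea outside pydecipher, assuming a stand-in for the log text (the real `log_stream` contents are not shown in this thread). `write_log` is a hypothetical helper, not part of pydecipher; it only illustrates that an explicit `encoding="utf-8"` makes the write independent of the platform's default code page.

```python
import io
import tempfile
from pathlib import Path


def write_log(output_dir: Path, log_name: str, text: str) -> Path:
    """Write text to output_dir/log_name as UTF-8, regardless of the locale."""
    log_path = output_dir.joinpath(log_name)
    with log_path.open("w", encoding="utf-8") as log_file_ptr:
        log_file_ptr.write(text)
    return log_path


# Stand-in for the in-memory log buffer; '\u0230' is a character outside cp1252.
log_stream = io.StringIO()
log_stream.write("decompiled name: \u0230\n")

with tempfile.TemporaryDirectory() as tmp:
    written = write_log(Path(tmp), "log.txt", log_stream.getvalue())
    # Read back with the same encoding to confirm the round trip.
    round_trip = written.read_text(encoding="utf-8")
```

Without the `encoding` argument, `open()` falls back to the locale's preferred encoding, which on many Windows installations is cp1252.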

@neil-orans

Hey @mrihtar - Can you share the binary that caused the UnicodeEncodeError? Or the full traceback?

@dperret dperret added bug Something isn't working question Further information is requested labels Jul 12, 2021
@dperret
Collaborator

dperret commented Jul 12, 2021

The suggested fix looks straightforward enough, but it would be good to reproduce the problem before trying to fix it. @mrihtar, as @neil-orans suggested, can you share a binary that causes the UnicodeEncodeError, or even the hash of one that we can download from VirusTotal (more than one would be even better)? That would let us reproduce the problem and verify that the update actually fixes it.

@mrihtar
Author

mrihtar commented Jul 14, 2021

I've uploaded the binary to VirusTotal, hash 6831bcc0f71ad753a0d829a95fbdc55ea392830cd9a999052362bc6d42fc3d67
At the moment I don't have other binaries. The binary was built with pyinstaller 4.4 (on python 3.9.4) and unpacked with pydecipher 1.0.0, which fails like this without the suggested fix:

Traceback (most recent call last):
  File "C:\Python39\Lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python39\Lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Python39\Scripts\pydecipher.exe\__main__.py", line 7, in <module>
  File "C:\Python39\lib\site-packages\pydecipher\main.py", line 259, in run
    log_file_ptr.write(log_stream.getvalue())
  File "C:\Python39\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0230' in position 1238: character maps to <undefined>
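
The failure in the traceback can be reproduced on any platform by encoding the reported character with the cp1252 codec directly. This is only a sketch of the underlying codec behaviour, not pydecipher code; the variable names are illustrative.

```python
text = "\u0230"  # the character reported in the traceback above

try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError as exc:
    # cp1252 has no mapping for U+0230, so the charmap codec raises.
    cp1252_ok = False
    error_message = str(exc)

# UTF-8 can represent every Unicode code point, so this always succeeds.
utf8_bytes = text.encode("utf-8")
```

A text-mode file opened without an explicit encoding on a cp1252 system performs the same `encode` step on every `write()`, which is where the traceback ends.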

@dperret
Collaborator

dperret commented Jul 24, 2021

Thanks for sharing that file. I am not getting that error when testing within docker containers built from the python:3.9.2 or python:3.9.6 docker images. I'll try to set up a Windows test environment to see if I can reproduce the error there. Are you using Windows 10 with python 3.9.4? Also, what versions of python-xdis and uncompyle6 are you using?

Based on the testing I've done so far, even without the UnicodeEncodeError, unfreezing this file might also require some updates to uncompyle6 and/or xdis; see rocky/python-uncompyle6#355, rocky/python-uncompyle6#353, and rocky/python-uncompyle6#331.

@mrihtar
Author

mrihtar commented Jul 25, 2021

I am using Windows 10 21H1, python 3.9.6 x64 (just updated), xdis 5.0.11 and uncompyle6 3.7.4. I also have the environment variable PYTHONIOENCODING=utf-8 set. With this setup I still get the error, but not with my fix applied.
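
A likely explanation for why the environment variable does not help: `PYTHONIOENCODING` only overrides the encoding of stdin, stdout and stderr, while `open()` in text mode without `encoding=` uses the locale's preferred encoding. The sketch below prints both values so they can be compared on the affected machine; it is a diagnostic aid under that assumption, not part of pydecipher.

```python
import locale
import sys

# Encoding used by print() and other writes to stdout; PYTHONIOENCODING affects this.
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Encoding used by open() in text mode when no encoding= is passed.
default_file_encoding = locale.getpreferredencoding(False)

# True when UTF-8 mode is on (PYTHONUTF8=1 or -X utf8), which does change open()'s default.
utf8_mode = bool(sys.flags.utf8_mode)

print(f"stdout: {stdout_encoding}")
print(f"open() default: {default_file_encoding}")
print(f"UTF-8 mode: {utf8_mode}")
```

If the second line reports cp1252 while the first reports utf-8, the log file is still being written with cp1252 unless the `open()` call names an encoding explicitly.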

Regarding your second remark: for the final decompile (from .pyc) I am using pycdc (manually).

@theShroo

theShroo commented Sep 13, 2024

I am decompiling a python package written in 3.7, running pydecipher in 3.12, and it's working except for this error:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 984-985: character maps to <undefined>

Where do I find the line that needs an explicit encoding to prevent the error? I know that the author of the code is enthusiastic about including Unicode characters, and I often run into issues when I hit them with a print statement without encoding to UTF-8 first.

I have found the source of the error in the bytecode.py file, but I have no idea how to validate the content to remove non-encodable characters:

[image: screenshot, not preserved]
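
One way to approach the "remove non-encodable chars" question is to let the codec's error handler do the work instead of validating the string by hand. The sketch below uses a hypothetical helper, `make_encodable`, which is not part of pydecipher; it replaces characters the target codec cannot represent with backslash escapes.

```python
def make_encodable(text: str, encoding: str = "cp1252") -> str:
    """Return text with characters the codec cannot represent replaced by backslash escapes."""
    # errors="backslashreplace" substitutes e.g. \u0230 for unmappable characters.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


sample = "name: \u0230 ok"
safe = make_encodable(sample)
```

`open()` accepts the same `errors=` argument, so a file opened with `errors="backslashreplace"` applies this substitution on write. Opening the file with `encoding="utf-8"` avoids the substitution altogether and keeps the original characters.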

@theShroo

I have identified a solution that seems to have worked:
configuring the encoding on the file object allows the file to accept the string "as is".

[image: screenshot, not preserved]
