Closed
Labels
stdlib, topic-unicode, type-bug
Description
Bug report
Bug description:
I am running pipreqsnb .
which uses the IncrementalDecoder class
from this library, and it fails with this error:
  File "C:\Users\[USERNAME]\anaconda3\envs\pdfparser\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 124872: character maps to <undefined>
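For context, the same error can be reproduced without pipreqsnb. The sketch below assumes the file being read is UTF-8 text; the sample character is a hypothetical stand-in for whatever is at position 124872. Byte 0x81 has no mapping in cp1252, but it is a valid continuation byte in UTF-8.

```python
# Hypothetical sample: "Ł" (U+0141) encodes to the bytes C5 81 in UTF-8.
data = "Ł".encode("utf-8")


def try_cp1252(raw: bytes) -> str:
    """Decode with cp1252, as the traceback does, and report any failure."""
    try:
        return raw.decode("cp1252")
    except UnicodeDecodeError as exc:
        return f"failed: {exc.reason} at position {exc.start}"


print(try_cp1252(data))      # cp1252 has no character for byte 0x81
print(data.decode("utf-8"))  # the same bytes decode cleanly as UTF-8
```

If this matches the situation, the problem is the encoding chosen by the caller (the Windows locale default, cp1252) and not the codec itself.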
Byte 0x81 has no mapping in cp1252, which is why the decoder raises. To resolve this, you need to decode with an encoding that matches the file's contents, such as utf-8, and also specify how decoding errors are handled. Here's how you can modify your IncrementalDecoder class to handle this:
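import codecs
from encodings.cp1252 import decoding_table

class IncrementalDecoder(codecs.IncrementalDecoder):
    def __init__(self, errors='ignore'):
        super().__init__(errors=errors)
        self.encoding = 'utf-8'

    def decode(self, input, final=False):
        try:
            # Attempt to decode using utf-8 (codec functions take positional arguments only)
            return codecs.getdecoder(self.encoding)(input, self.errors)[0]
        except UnicodeDecodeError:
            # If decoding fails, fall back to the cp1252 charmap with error handling.
            # Note: with errors='ignore' the utf-8 attempt never raises, so this
            # branch only runs when errors='strict' is passed.
            return codecs.charmap_decode(input, self.errors, decoding_table)[0]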
I am not really sure about this, though. Hopefully it solves my issue.
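As an alternative to subclassing, the standard library already provides an incremental UTF-8 decoder through codecs.getincrementaldecoder. The sketch below assumes the input really is UTF-8; the helper name decode_chunks and the sample bytes are hypothetical and not part of pipreqsnb or the stdlib. Unlike the class above, it buffers a multi-byte character that is split across two chunks.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8", errors="replace"):
    """Decode an iterable of byte chunks with a stateful incremental decoder."""
    decoder = codecs.getincrementaldecoder(encoding)(errors=errors)
    parts = [decoder.decode(chunk) for chunk in chunks]
    # Flush the decoder so a truncated trailing sequence is handled per `errors`.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# The bytes for "é" (C3 A9) and "Ł" (C5 81) are split across chunk boundaries.
text = decode_chunks([b"caf\xc3", b"\xa9 \xc5", b"\x81"])
print(text)

# A stray 0x81 that is not valid UTF-8 becomes U+FFFD instead of raising.
print(decode_chunks([b"a\x81b"]))
```

When the caller controls how the file is opened, passing encoding="utf-8" (and, if needed, an errors policy) to open() has the same effect without touching the codec.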
CPython versions tested on:
3.13
Operating systems tested on:
Windows