Skip to content

kozlek/encoding_tools

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Encoding Tools

pipeline status coverage report PyPI - Python Version

This module aims to provide a wrapper to deal with encoding in Python.

Features

Encode str to bytes

from encoding_tools import TheSoCalledGreatEncoder

encoder = TheSoCalledGreatEncoder()
encoder.load_str('hellò')
encoder.encode('latin-1')

encoded_string = encoder.encoded_data

Yes, this is a much complicated than a simple 'hellò'.encode('latin-1'), but it deals with encoding errors. By default, it will fallback to ASCII if an error is encountered.

from encoding_tools import TheSoCalledGreatEncoder

encoder = TheSoCalledGreatEncoder()
encoder.load_str('cœur')  # œ is not supported by latin-1
encoder.encode('latin-1')

encoded_string = encoder.encoded_data  # equals to b'coeur' 

If you want to force ASCII conversion, you can do it by specifying force_ascii=True when calling .encode().

Decode bytes to str

from encoding_tools import TheSoCalledGreatEncoder, GuessEncodingFailedException

encoder = TheSoCalledGreatEncoder()
encoder.load_bytes(b'hell\xf2')
try:
    encoder.decode()
except GuessEncodingFailedException as e:
    # Deal with it
    raise ValueError('Wrong input...') from e

decoded_string = encoder.decoded_data  # equals to 'hellò'
encoding = encoder.encoding  # equals to 'ISO-8859-1'

The decoder will guess encoding for you using the great Chardet library. You can as well provide the encoding if you know when you load the data: .load_bytes(b'hell\xf2', encoding='latin-1')

To change data encoding, proceed this way:

from encoding_tools import TheSoCalledGreatEncoder, GuessEncodingFailedException

encoder = TheSoCalledGreatEncoder()
encoder.load_bytes(b'hell\xf2')  # latin-1
try:
    encoder.decode()
except GuessEncodingFailedException as e:
    # Deal with it
    raise ValueError('Wrong input...') from e
    
encoder.encode('utf-8')

encoded_string = encoder.encoded_data  # equals to b'hell\xc3\xb2'
encoding = encoder.encoding  # equals to 'utf-8'

Roadmap

  • Deals with decoding errors
  • Support more encoding (test suite)

About

A package to deal with encoding.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages