Fix line endings from CRLF to LF #67

Merged: 1 commit, Jul 23, 2021
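
The change itself is mechanical. Below is a minimal Python sketch of the same CRLF-to-LF normalization. It assumes each file is small enough to rewrite in memory. The helper names `crlf_to_lf` and `normalize_file` are hypothetical and are not part of charset_normalizer.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace every Windows line ending (CRLF) with a Unix one (LF)."""
    # Lone "\r" characters are left untouched; only the "\r\n" pair is rewritten
    return data.replace(b"\r\n", b"\n")


def normalize_file(path: Path) -> bool:
    """Hypothetical helper: rewrite `path` in place with LF endings; return True if it changed."""
    original = path.read_bytes()  # work on bytes so the text encoding is never touched
    converted = crlf_to_lf(original)
    if converted != original:
        path.write_bytes(converted)
    return converted != original
```

Operating on raw bytes is deliberate. The files here contain non-ASCII text, and no decode/encode round trip is needed just to swap line endings.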
484 changes: 242 additions & 242 deletions README.md


162 changes: 81 additions & 81 deletions docs/advanced_search.rst
@@ -1,81 +1,81 @@
Advanced Search
===============

The Charset Normalizer functions ``from_bytes``, ``from_fp`` and ``from_path`` accept several
optional parameters that can be tweaked, as follows::

    from charset_normalizer import from_bytes

    my_byte_str = '我没有埋怨,磋砣的只是一些时间。'.encode('gb18030')

    results = from_bytes(
        my_byte_str,
        steps=10,  # Number of steps (blocks) to extract from my_byte_str
        chunk_size=512,  # Size, in bytes, of each extracted block
        threshold=0.2,  # Maximum amount of chaos allowed on the first pass
        cp_isolation=None,  # Finite list of encodings to use when searching for a match
        cp_exclusion=None,  # Finite list of encodings to avoid when searching for a match
        preemptive_behaviour=True,  # Look into my_byte_str (ASCII mode) for a pre-defined encoding
        explain=False  # Print on screen what is happening when searching for a match
    )
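
The following stdlib-only sketch makes ``steps``, ``chunk_size``, ``cp_isolation`` and ``cp_exclusion`` more concrete. It is a hypothetical model, not charset_normalizer's actual implementation. ``sample_blocks`` assumes blocks are taken at evenly spaced offsets. ``candidate_encodings`` assumes encoding names are compared after normalization with ``codecs.lookup``.

```python
import codecs
from typing import Iterable, List, Optional


def sample_blocks(payload: bytes, steps: int = 10, chunk_size: int = 512) -> List[bytes]:
    """Hypothetical: pick `steps` blocks of `chunk_size` bytes at evenly spaced offsets."""
    if len(payload) <= steps * chunk_size:
        # Too short to sample: inspect the whole payload as a single block
        return [payload]
    stride = len(payload) // steps
    return [payload[i * stride:i * stride + chunk_size] for i in range(steps)]


def candidate_encodings(
    known: Iterable[str],
    cp_isolation: Optional[Iterable[str]] = None,
    cp_exclusion: Optional[Iterable[str]] = None,
) -> List[str]:
    """Hypothetical: restrict the encodings to try, as cp_isolation/cp_exclusion describe."""

    def canonical(name: str) -> str:
        # codecs.lookup maps aliases such as "UTF-8" or "Big5" to one canonical name
        return codecs.lookup(name).name

    candidates = [canonical(name) for name in known]
    if cp_isolation:
        # Keep only the encodings explicitly allowed
        allowed = {canonical(name) for name in cp_isolation}
        candidates = [name for name in candidates if name in allowed]
    if cp_exclusion:
        # Drop the encodings explicitly banned
        banned = {canonical(name) for name in cp_exclusion}
        candidates = [name for name in candidates if name not in banned]
    return candidates
```

In this model, raising ``steps`` or ``chunk_size`` means more bytes are inspected, which costs more time. Narrowing the candidate list with ``cp_isolation`` reduces the number of encodings to try.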


Using CharsetMatches
--------------------

Here, ``results`` is a ``CharsetMatches`` object. It behaves like a list but does not implement all list methods.
It is sorted from the start, so calling ``best()`` is enough to extract the most probable result.

.. autoclass:: charset_normalizer.CharsetMatches
    :members:

List behaviour
--------------

As mentioned earlier, a ``CharsetMatches`` object behaves like a list.

::

    # Calling len() on results also works
    if len(results) == 0:
        print('No match for your sequence')

    # Iterate over results like a list
    for match in results:
        print(match.encoding, 'can properly decode your sequence using', match.alphabets, 'and language', match.language)

    # Use an index to access results
    if len(results) > 0:
        print(str(results[0]))
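
A hypothetical read-only container shows what "behaves like a list but does not implement all list methods" means in practice. ``ToyMatch`` and ``ToyMatches`` are illustrations, not the real ``CharsetMatches`` class, and the ``chaos`` score is an assumed ranking key. The container sorts its entries once. After that it supports ``len()``, iteration and indexing, but offers no ``append`` or ``sort``.

```python
from typing import Iterator, List, NamedTuple, Sequence


class ToyMatch(NamedTuple):
    encoding: str
    chaos: float  # hypothetical score: lower means a more plausible decoding


class ToyMatches:
    """Hypothetical read-only, pre-sorted sequence of ToyMatch entries."""

    def __init__(self, matches: Sequence[ToyMatch]) -> None:
        # Sorted once at construction, most plausible first
        self._matches: List[ToyMatch] = sorted(matches, key=lambda m: m.chaos)

    def __len__(self) -> int:
        return len(self._matches)

    def __iter__(self) -> Iterator[ToyMatch]:
        return iter(self._matches)

    def __getitem__(self, index: int) -> ToyMatch:
        return self._matches[index]
```

Implementing only ``__len__``, ``__iter__`` and ``__getitem__`` is enough for the ``len()``, ``for`` and ``results[0]`` usages shown above. Mutating list methods are intentionally absent.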

Using best()
------------

As said above, a ``CharsetMatches`` object behaves like a list, and it is sorted by default when returned by
``from_bytes``, ``from_fp`` or ``from_path``.

``best()`` returns the most probable result, i.e. the first entry of the list (index 0).
It returns a ``CharsetMatch`` object, or ``None`` if there are no results.

::

    result = results.best()

Calling first()
---------------

Does exactly the same thing as calling ``best()``.
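
The ``best()``/``first()`` contract can be sketched in a few lines. ``ToyResults`` is hypothetical and assumes its entries are already ranked best-first. Its ``best()`` returns index 0, or ``None`` when there is nothing to return, and ``first()`` simply delegates to it.

```python
from typing import Generic, List, Optional, Sequence, TypeVar

T = TypeVar("T")


class ToyResults(Generic[T]):
    """Hypothetical container holding results already ranked best-first."""

    def __init__(self, ranked: Sequence[T]) -> None:
        self._ranked: List[T] = list(ranked)

    def best(self) -> Optional[T]:
        # The most probable result is the first entry (index 0), if any
        return self._ranked[0] if self._ranked else None

    def first(self) -> Optional[T]:
        # Redundant alias: always the same result as best()
        return self.best()
```

Because ``best()`` can return ``None``, callers should check for it before using the result.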

Class aliases
-------------

``CharsetMatches`` is also known as ``CharsetDetector``, ``CharsetDoctor`` and ``CharsetNormalizerMatches``.
Use whichever name you prefer; ``CharsetDetector`` and ``CharsetDoctor`` are handy if you like shorter class names.

Verbose output
--------------

If you want to understand why charset_normalizer did not pick a specific encoding, set ``explain`` to ``True``
when calling ``from_bytes``, ``from_fp`` or ``from_path``.
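
A flag like ``explain`` commonly works by switching logging on for the duration of a single call. The sketch below shows that general pattern with the standard ``logging`` module. ``toy_detect``, the ``"toy_detector"`` logger name and its messages are all hypothetical and do not reproduce charset_normalizer's actual output.

```python
import logging
from typing import Iterable, Optional, TextIO

logger = logging.getLogger("toy_detector")  # hypothetical logger name


def toy_detect(
    payload: bytes,
    candidates: Iterable[str],
    explain: bool = False,
    stream: Optional[TextIO] = None,
) -> Optional[str]:
    """Hypothetical detector: return the first codec that decodes payload strictly."""
    handler = None
    previous_level = logger.level
    if explain:
        # Temporarily attach a handler so every decision is printed (stderr by default)
        handler = logging.StreamHandler(stream)
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
    try:
        for name in candidates:
            try:
                payload.decode(name)
            except UnicodeDecodeError as exc:
                logger.debug("%s rejected: %s", name, exc.reason)
                continue
            logger.debug("%s accepted", name)
            return name
        return None
    finally:
        # Restore the logger so later calls stay quiet
        if handler is not None:
            logger.removeHandler(handler)
            logger.setLevel(previous_level)
```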
40 changes: 20 additions & 20 deletions docs/miscellaneous.rst
@@ -1,20 +1,20 @@
==============
Miscellaneous
==============

Convert to str
--------------

Any ``CharsetMatch`` object can be converted to a usable ``str``.

::

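    from charset_normalizer import CharsetNormalizerMatches as CnM  # legacy alias of CharsetMatches

    my_byte_str = '我没有埋怨,磋砣的只是一些时间。'.encode('gb18030')

    # Assign the return value so we can fully exploit the result
    result = CnM.from_bytes(
        my_byte_str
    ).best()

    # This should print '我没有埋怨,磋砣的只是一些时间。'
    print(str(result))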
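
Conceptually, the ``str`` conversion amounts to a strict decode of the original bytes with the detected encoding. The stdlib-only sketch below assumes detection reported ``gb18030``; the ``detected`` variable stands in for ``result.encoding``. It also shows why choosing the right codec matters: a permissive codec such as ``latin-1`` decodes without error but produces garbled text.

```python
my_byte_str = '我没有埋怨,磋砣的只是一些时间。'.encode('gb18030')

detected = "gb18030"  # assumption: stands in for result.encoding

# Strict decoding with the right codec restores the original text
text = my_byte_str.decode(detected)

# latin-1 maps every byte to one character, so it "succeeds" but garbles the text
garbled = my_byte_str.decode("latin-1")
```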