Fix a potential undefined behaviour
hongquan committed Aug 10, 2020
1 parent 94209ae commit 058b113
Showing 4 changed files with 72 additions and 56 deletions.
11 changes: 11 additions & 0 deletions README.rst
@@ -2,6 +2,10 @@
 ViStickedWord
 =============

+.. image:: https://badgen.net/pypi/v/vistickedword
+   :target: https://pypi.org/project/vistickedword
+
+
 A library to split a string of many Vietnamese words stuck together into single words. For example, it splits "khuckhuyu" into "khuc" and "khuyu".
 This library is not supposed to split Vietnamese by semantics, so it won't distinguish between single and compound words. It will not, for example, split "bacsitrongbenhvien" into "bac si" + "trong" + "benh vien".
 If you want such a feature, please use underthesea_.
@@ -39,5 +43,12 @@ Usage
 # Returns ('ngoan', 'ngoeo')
+Credit
+------
+
+Developed by `Nguyễn Hồng Quân <author_>`_.
+
+
 .. _underthesea: https://github.com/undertheseanlp/underthesea
 .. _Unidecode: https://pypi.org/project/Unidecode/
+.. _author: https://quan.hoabinh.vn
111 changes: 57 additions & 54 deletions poetry.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "vistickedword"
-version = "0.9.3"
+version = "0.9.4"
 description = "Library to split sticked Vietnamese words"
 authors = ["Nguyễn Hồng Quân <ng.hong.quan@gmail.com>"]
 license = "MIT"
4 changes: 3 additions & 1 deletion vistickedword.py
@@ -96,7 +96,7 @@
 }


-def vlen(o: Optional[str]):
+def vlen(o: Optional[str]) -> int:
     try:
         return len(o)
     except TypeError:
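
Note on the vlen hunk: the visible lines show a length helper that tolerates a non-string argument, and the new `-> int` annotation suggests that the hidden `except` branch also returns an integer. The sketch below illustrates that pattern only. The name `safe_len` and the fallback value `0` are assumptions, since the diff cuts off before the body of the `except` branch.

```python
from typing import Optional


def safe_len(o: Optional[str]) -> int:
    """Hypothetical stand-in for vlen: length of o, or a fallback for None."""
    try:
        return len(o)
    except TypeError:
        # Assumption: fall back to 0. The real branch is not shown in the diff.
        return 0
```

With this sketch, `safe_len("khuc")` gives the ordinary string length and `safe_len(None)` takes the fallback branch, because `len(None)` raises `TypeError`.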
@@ -232,6 +232,8 @@ def scan_for_word(i: int, vowel_match: Match, vowel_occurences: Sequence[Match],
     success = negotiate_expand_consonant(word_pos, word_positions, original_word_sequence)
     if not success:
         continue
+    else:
+        break
 except IllegalCombination:
     logger.debug("Illegal combination. Test next possible final consonant.")
     continue
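
Note on the scan_for_word hunk: the added `else: break` makes the enclosing loop stop as soon as one attempt succeeds, instead of going on to later candidates. The sketch below shows that control flow in isolation. It is not the library's code: `first_accepted`, `demo_negotiate`, the candidate strings and the local `IllegalCombination` class are all hypothetical stand-ins, and the real loop around `negotiate_expand_consonant` is not visible in the diff.

```python
from typing import Callable, List, Optional, Sequence


class IllegalCombination(Exception):
    """Local stand-in for the exception named in the diff."""


def first_accepted(candidates: Sequence[str],
                   negotiate: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that negotiate() accepts, or None."""
    chosen = None
    for candidate in candidates:
        try:
            success = negotiate(candidate)
            if not success:
                continue  # rejected: try the next candidate
            else:
                chosen = candidate
                break  # accepted: stop so later candidates are not tried
        except IllegalCombination:
            continue  # invalid candidate: try the next one
    return chosen


tried: List[str] = []


def demo_negotiate(candidate: str) -> bool:
    """Hypothetical check: records each call, rejects 'c', raises on 'nh'."""
    tried.append(candidate)
    if candidate == "nh":
        raise IllegalCombination(candidate)
    return candidate in {"ng", "n"}


result = first_accepted(["nh", "c", "ng", "n"], demo_negotiate)
print(result, tried)
```

In this run the last candidate, "n", is never passed to `demo_negotiate`, because the loop exits at the first success.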

0 comments on commit 058b113