uniset

Pre-generated sets of Unicode code points

uniset is a module containing frozensets of Unicode code points (characters).

API

Categories

The module includes a set for all Unicode categories and subcategories except the main category "C" (other) and its subcategories "Co" (private use) and "Cn" (not assigned).

Example:

import uniset

# The letter "A" is in category "L" (letters)
assert "A" in uniset.L
# The letter "A" is also in category "Lu" (uppercase letters)
assert "A" in uniset.Lu
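
To make the relationship between a main category and its subcategories concrete, here is a minimal sketch that builds comparable sets with the standard library's unicodedata module. This is an illustration only: build_category_set is a hypothetical helper, not part of uniset, and it is not necessarily how uniset's sets are generated. The resulting contents depend on the Unicode version bundled with your Python.

```python
import sys
import unicodedata


def build_category_set(prefix: str) -> frozenset[str]:
    """Collect every character whose Unicode category starts with `prefix`.

    Hypothetical helper for illustration; not part of uniset.
    """
    return frozenset(
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)).startswith(prefix)
    )


# A main category ("L") is a superset of each of its subcategories ("Lu").
L = build_category_set("L")
Lu = build_category_set("Lu")

assert "A" in L and "A" in Lu
assert "a" in L and "a" not in Lu
assert Lu < L  # proper subset
```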

Whitespace

uniset.WHITESPACE contains Unicode whitespace characters, defined as the union of ASCII whitespace characters and the Unicode category "Zs" (space separators).

import uniset

assert " " in uniset.WHITESPACE
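
The union described above can be sketched with the standard library. This sketch assumes that "ASCII whitespace" means string.whitespace, which may not match uniset's exact definition; the names ZS and WHITESPACE_SKETCH are made up for the illustration.

```python
import string
import sys
import unicodedata

# Every character in the Unicode category "Zs" (space separators).
ZS = frozenset(
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zs"
)

# Assumption: "ASCII whitespace" is taken to be string.whitespace.
WHITESPACE_SKETCH = frozenset(string.whitespace) | ZS

assert " " in WHITESPACE_SKETCH       # in both ASCII whitespace and "Zs"
assert "\t" in WHITESPACE_SKETCH      # ASCII whitespace only (category "Cc")
assert "\u00a0" in WHITESPACE_SKETCH  # no-break space, category "Zs"
```

The tab character shows why the union is needed: it is ASCII whitespace, but its Unicode category is "Cc", not "Zs".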

Punctuation

uniset.PUNCTUATION contains Unicode punctuation characters, defined as the union of ASCII punctuation characters and the Unicode category "P" (punctuation).

import uniset

assert "." in uniset.PUNCTUATION
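
As a rough illustration of that union, the sketch below combines string.punctuation with every character whose category starts with "P". Treating string.punctuation as the "ASCII punctuation characters" is an assumption, and P and PUNCTUATION_SKETCH are hypothetical names, not uniset attributes.

```python
import string
import sys
import unicodedata

# Every character in the main Unicode category "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po).
P = frozenset(
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)).startswith("P")
)

# Assumption: "ASCII punctuation" is taken to be string.punctuation.
PUNCTUATION_SKETCH = frozenset(string.punctuation) | P

assert "." in PUNCTUATION_SKETCH
assert "\u00bf" in PUNCTUATION_SKETCH  # inverted question mark, category "Po"
# "$" is ASCII punctuation, but its Unicode category is "Sc" (currency symbol).
assert "$" in PUNCTUATION_SKETCH and "$" not in P
```

The "$" case is why the ASCII set is not redundant: several ASCII punctuation characters are classified as symbols ("S") rather than punctuation ("P") in Unicode.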

Alternatives

The unicategories package also provides access to Unicode categories. Its implementation is based on "range groups" and iterators, and should be faster and more memory efficient than uniset for inclusion checks.

If you need the frozenset API (unions, intersections, etc.) or sets beyond the Unicode categories (whitespace, punctuation), use uniset. Otherwise, unicategories is the better option.
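
To give a feel for the trade-off, here is a minimal sketch of a range-based membership check built with the standard library. It is not the unicategories implementation or API; build_ranges and in_ranges are hypothetical names, and the approach (sorted inclusive ranges searched with bisect) is just one way to store a category compactly.

```python
import bisect
import sys
import unicodedata


def build_ranges(category: str) -> list[tuple[int, int]]:
    """Return sorted, inclusive (start, end) code point ranges for a category."""
    ranges = []
    start = None
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) == category:
            if start is None:
                start = cp  # open a new run
        elif start is not None:
            ranges.append((start, cp - 1))  # close the current run
            start = None
    if start is not None:
        ranges.append((start, sys.maxunicode))
    return ranges


LU_RANGES = build_ranges("Lu")
LU_STARTS = [start for start, _ in LU_RANGES]


def in_ranges(char: str) -> bool:
    """Binary-search the range whose start is closest below the code point."""
    i = bisect.bisect_right(LU_STARTS, ord(char)) - 1
    return i >= 0 and ord(char) <= LU_RANGES[i][1]


assert in_ranges("A") and in_ranges("Z")
assert not in_ranges("a") and not in_ranges("[")
```

A structure like this stores one pair per contiguous run (for example, "A" to "Z" is a single range) instead of one entry per character, but it does not offer set operations such as unions and intersections the way a frozenset does.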

hukkin/uniset
