unicodeblocks – Character blocks defined in Unicode
This module supplements the unicodedata standard library module with the ability
to look up and work with Unicode blocks.
Version of module.
The version of Unicode database used in this module.
unicodeblocks.Block(name, start, end)
Normalized name of block.
The first codepoint mapped by block. Inclusive.
The last codepoint mapped by block. Inclusive.
Checks whether a character is in this block.
Count of codepoints mapped by Block.
Checks whether other.start and other.end are both lower than self.start and self.end, respectively.
Checks whether other.start and other.end are both greater than self.start and self.end, respectively.
Checks whether other.start equals self.start and other.end equals self.end.
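The containment and length semantics described above can be illustrated with a small stand-in class. This is only a sketch under the assumption that membership is a simple inclusive range test on the codepoint; BlockSketch is a hypothetical name and is not the module's actual Block implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockSketch:
    """Hypothetical stand-in for illustration, not unicodeblocks.Block."""
    name: str
    start: int  # first codepoint, inclusive
    end: int    # last codepoint, inclusive

    def __contains__(self, char: str) -> bool:
        # A character belongs to the block if its codepoint lies in [start, end].
        return self.start <= ord(char) <= self.end

    def __len__(self) -> int:
        # Both bounds are inclusive, hence the + 1.
        return self.end - self.start + 1


hiragana = BlockSketch("Hiragana", 0x3040, 0x309F)
print("か" in hiragana)
print(len(hiragana))
```

Because both bounds are inclusive, the length counts every codepoint in the range, whether or not it is assigned.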
Returns the Block which maps the codepoint of chr, or
None if no block maps the codepoint.
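One plausible way to perform such a lookup is a binary search over block ranges sorted by their first codepoint. The sketch below assumes this approach purely for illustration; the module itself may work differently. BLOCKS, STARTS, and blockof_sketch are hypothetical names, and the table is a three-entry excerpt rather than the full Unicode block list.

```python
import bisect

# Hypothetical excerpt of a block table: (start, end, name), sorted by start.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x3040, 0x309F, "Hiragana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]
STARTS = [start for start, _, _ in BLOCKS]


def blockof_sketch(char):
    """Return the (start, end, name) range covering char, or None."""
    codepoint = ord(char)
    # Find the rightmost range whose start is <= codepoint.
    index = bisect.bisect_right(STARTS, codepoint) - 1
    if index < 0:
        return None
    start, end, name = BLOCKS[index]
    # The candidate range matches only if the codepoint is not past its end.
    return BLOCKS[index] if codepoint <= end else None


print(blockof_sketch("か"))
print(blockof_sketch("é"))  # not covered by this excerpt
```

With this excerpt, a character such as 'é' falls in the gap after Basic Latin and yields None, mirroring the "no block maps the codepoint" case.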
A dictionary-like collection of all blocks defined by Unicode.
Returns a list of names of blocks in dictionary. Use this instead of .keys() if you want names presentable to user.
Some use cases
Find block a character belongs to
>>> unicodeblocks.blockof('-')
Block('Basic Latin', 0x0, 0x7f)
>>> unicodeblocks.blockof('か')
Block('Hiragana', 0x3040, 0x309f)
>>> unicodeblocks.blockof('日')
Block('CJK Unified Ideographs', 0x4e00, 0x9fff)
Number of codepoints defined in Unicode
>>> import itertools
>>> len(list(itertools.chain(*unicodeblocks.blocks.values())))
256336
The module doesn't check whether codepoints within a block are assigned.
For example, see
U+038D (\u038D), which lies in a block but is unassigned. If you care about that, you should
try to obtain their name with
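As a minimal sketch using only the standard library unicodedata module, the check could look like the following. The helper names is_assigned and name_or_none are hypothetical. Note that a name lookup alone is not a perfect test, because some assigned characters (control characters, for instance) have no name, so the sketch also shows a general category check, where 'Cn' marks unassigned codepoints.

```python
import unicodedata


def is_assigned(char):
    """True if char's general category is not 'Cn' (unassigned)."""
    return unicodedata.category(char) != "Cn"


def name_or_none(char):
    """Return the character's Unicode name, or None if it has none."""
    # unicodedata.name returns the given default instead of raising ValueError.
    return unicodedata.name(char, None)


print(is_assigned("\u038d"))   # unassigned codepoint inside a block
print(name_or_none("\u038c"))  # its assigned neighbour
```

The results depend on the Unicode database version bundled with the running Python interpreter, which may differ from the version used by this module.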