Norm exceptions

Panos Louridas edited this page Jul 27, 2018 · 2 revisions

From the spaCy documentation:

spaCy usually tries to normalise words with different spellings to a single, common spelling. This has no effect on any other token attributes, or tokenization in general, but it ensures that equivalent tokens receive similar representations. This can improve the model's predictions on words that weren't common in the training data, but are equivalent to other words – for example, "realize" and "realise", or "thx" and "thanks".

Greek language norm exceptions

The list of Greek norm exceptions was built by parsing a Greek dictionary.

Dictionaries usually use a symbol to point from a word to another word of which it is a slight variation (i.e., a norm exception and its norm). In the dictionary we parsed, this symbol was "->".
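The extraction step can be sketched roughly as follows. This is a minimal illustration, not the script that was actually used: the function name `parse_norm_exceptions`, the one-entry-per-line layout, and the reading of "->" as "variant -> norm" are all assumptions, and the sample entries are made up for the example rather than taken from the parsed dictionary.

```python
def parse_norm_exceptions(lines, arrow="->"):
    """Collect {variant: norm} pairs from dictionary entries.

    Assumes (hypothetically) one entry per line, written as
    "variant -> norm". Lines without the arrow are ignored.
    """
    exceptions = {}
    for line in lines:
        if arrow not in line:
            continue  # ordinary entry, not a cross-reference
        variant, norm = line.split(arrow, 1)
        variant, norm = variant.strip(), norm.strip()
        # Skip malformed entries and trivial self-references.
        if variant and norm and variant != norm:
            exceptions[variant] = norm
    return exceptions


# Illustrative entries only; the real dictionary format may differ.
sample = [
    "αβγό -> αυγό",
    "αυγό",
    "τραίνο -> τρένο",
]
norm_exceptions = parse_norm_exceptions(sample)
```

Keeping the result as a plain dictionary makes it straightforward to write out as the two-column list described next.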

The full list can be found here. In the list, the first column contains the exceptions and the second column contains the corresponding norms.
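As a rough sketch of how such a two-column list can be consumed, the snippet below loads it into a lookup table and falls back to the word itself when no exception exists. The whitespace-separated layout and the names `load_norm_list` and `normalise` are assumptions made for illustration; this is not how spaCy wires the list in internally.

```python
import io


def load_norm_list(handle):
    """Read "exception norm" pairs, one pair per line.

    Assumes the two columns are separated by whitespace; lines that do
    not have exactly two columns are skipped.
    """
    table = {}
    for line in handle:
        parts = line.split()
        if len(parts) != 2:
            continue
        exception, norm = parts
        table[exception] = norm
    return table


def normalise(word, table):
    """Return the norm for a word, or the word itself if it has none."""
    return table.get(word, word)


# An in-memory stand-in for the real list file (illustrative entries).
sample_file = io.StringIO("αβγό\tαυγό\nτραίνο\tτρένο\n")
table = load_norm_list(sample_file)
```

Here `normalise("αβγό", table)` yields the second-column form, while a word that is not in the first column is returned unchanged.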

To extend the list, see the Contributing page of this wiki.