
Optimize Greek language support #2658

Merged 1 commit into explosion:master on Aug 14, 2018

Conversation

giannisdaras (Contributor)

Description

This pull request further improves Greek language support by introducing several additional changes.

Types of change

Greek language support is enhanced by the following changes:

  1. Added a syntax_iterators.py file for noun chunk detection (see the sketch after this list).
  2. Added many more rules to the lemmatizer.
  3. Added more exceptions to the lemmatizer and actually put them to use (before this PR they were not imported in the language's __init__.py, so the lemmatizer exceptions went unused).
  4. Added a Greek Lemmatizer based on the rule-based technique of the default Lemmatizer, adapted to language-specific characteristics (see the simplified sketch below the list).
  5. Ran PEP8 and Flake8 checks on all the scripts.
  6. Norm exceptions: removed duplicate keys from the dictionary.
  7. Removed unused imports for cleaner code.
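
For anyone unfamiliar with syntax_iterators.py, here is a minimal sketch of the general shape such a file takes in spaCy v2. The dependency labels and the chunking logic below are illustrative assumptions, not the exact Greek rules added in this PR:

    # syntax_iterators.py -- illustrative sketch only
    from spacy.symbols import NOUN, PROPN, PRON

    def noun_chunks(obj):
        # Works on a Doc or a Span; yields (start, end, label) triples
        doc = obj.doc
        # Dependency relations that may head a noun chunk (illustrative set)
        labels = ["nsubj", "obj", "dobj", "iobj", "appos", "ROOT"]
        np_deps = [doc.vocab.strings.add(label) for label in labels]
        np_label = doc.vocab.strings.add("NP")
        seen = set()
        for word in obj:
            if word.pos not in (NOUN, PROPN, PRON):
                continue
            if word.i in seen:
                continue
            if word.dep in np_deps:
                # Expand from the token's left edge up to the token itself
                seen.update(range(word.left_edge.i, word.i + 1))
                yield word.left_edge.i, word.i + 1, np_label

    SYNTAX_ITERATORS = {"noun_chunks": noun_chunks}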

All in all, I hope this PR significantly improves the quality of Greek language support.
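
For context on point 4, the rule-and-exception lookup in spaCy's default rule-based lemmatizer works roughly as follows. This is a simplified sketch with placeholder data, not the actual Greek tables from this PR:

    # Simplified sketch of rule-based lemmatization (placeholder data)
    def lemmatize(string, index, exceptions, rules):
        # Irregular forms listed in the exceptions table win outright
        if string in exceptions:
            return list(exceptions[string])
        forms = []
        for old, new in rules:
            if string.endswith(old):
                candidate = string[: len(string) - len(old)] + new
                # Keep only candidates attested in the index of known lemmas
                if candidate and candidate in index:
                    forms.append(candidate)
        if not forms:
            forms.append(string)
        return forms

    # Toy example: "γράφεις" (you write) -> "γράφω" (I write)
    index = {"γράφω"}
    exceptions = {"είναι": ["είμαι"]}
    rules = [("εις", "ω")]
    print(lemmatize("γράφεις", index, exceptions, rules))  # ['γράφω']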

Checklist

  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@ines added the labels enhancement (Feature requests and improvements) and lang / el (Greek language data and models) on Aug 13, 2018
@honnibal (Member)

Thanks! Looks great!

@honnibal merged commit fe94e69 into explosion:master on Aug 14, 2018

steremma commented May 8, 2019

Awesome work @Eleni170, thanks for the contribution! I know this is an old issue, but just in case you are still active: is sentence segmentation supported? I am having some trouble getting it to work:

>>> import spacy
>>> sp = spacy.load("el", disable=['tagger', 'ner', 'textcat'])
>>> text = "Αυτή είναι η πρώτη πρόταση. Εδώ θα έπρεπε να σπάσει. Δεν έσπασε όμως!! Περίεργο, έτσι δεν είναι;"
>>> for sentence in sp(text).sents:
...     print(sentence)

out: Αυτή είναι η πρώτη πρόταση. Εδώ θα έπρεπε να σπάσει. Δεν έσπασε όμως!! Περίεργο, έτσι δεν είναι;
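
A possible workaround sketch, assuming spaCy v2.1+ and that the parser is not assigning sentence boundaries: add the rule-based Sentencizer so boundaries come from punctuation instead of the parse. Depending on the version, the sentencizer's default punctuation set may not include the Greek question mark (;), in which case punct_chars would need to be set explicitly:

>>> import spacy
>>> nlp = spacy.load("el", disable=['tagger', 'ner', 'textcat'])
>>> sentencizer = nlp.create_pipe("sentencizer")  # rule-based splitting on punctuation
>>> nlp.add_pipe(sentencizer, first=True)
>>> text = "Αυτή είναι η πρώτη πρόταση. Εδώ θα έπρεπε να σπάσει. Δεν έσπασε όμως!! Περίεργο, έτσι δεν είναι;"
>>> for sentence in nlp(text).sents:
...     print(sentence)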
