Optimize Greek language support #2658

merged 1 commit into explosion:master on Aug 14, 2018


@giannisdaras giannisdaras commented Aug 11, 2018


This pull request further improves Greek language support with the changes listed below.

Types of change

Greek language support is enhanced by the following changes:

  1. Added a file for noun chunk detection.
  2. Added many more rules to the lemmatizer.
  3. Added more lemmatizer exceptions and put them to use (before this PR they were not included in the init file, so the lemmatizer exceptions were unused).
  4. Added a Greek language Lemmatizer, based on the rule-based technique of the default Lemmatizer and tuned to language-specific characteristics.
  5. PEP8 and Flake8 checks for all the scripts.
  6. Norm exceptions: removed duplicate keys in the dictionary.
  7. Removed unused imports for cleaner code.
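
To illustrate how lemmatizer exceptions and suffix rules (items 2 to 4) can interact, here is a minimal, simplified sketch. It assumes exceptions are checked before rules and that the first matching suffix rule wins. The `EXCEPTIONS` and `RULES` tables and the `lemmatize` function are hypothetical and hold made-up sample entries; they are not the data or the code added in this PR, and spaCy's own Lemmatizer does more than this.

```python
# Hypothetical sample data for illustration only; not taken from this PR.
EXCEPTIONS = {
    "noun": {"άνδρες": "άνδρας"},  # irregular form -> lemma
}
RULES = {
    "noun": [("ες", "α"), ("οι", "ος")],  # (old suffix, new suffix)
}


def lemmatize(word, pos):
    """Return a lemma for `word`, consulting exceptions before suffix rules."""
    word = word.lower()
    # 1. Exceptions take priority over any rule.
    exceptions = EXCEPTIONS.get(pos, {})
    if word in exceptions:
        return exceptions[word]
    # 2. Apply the first suffix rule that matches.
    for old, new in RULES.get(pos, []):
        if word.endswith(old):
            return word[: len(word) - len(old)] + new
    # 3. No exception or rule applies: return the word unchanged.
    return word


for token in ["γυναίκες", "άνδρες", "δρόμοι"]:
    print(token, "->", lemmatize(token, "noun"))
```

In this sketch, an exception table that is never loaded would silently fall through to the suffix rules, which is why wiring the exceptions into the init file (item 3) matters.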

All in all, I hope this PR will significantly improve the quality of Greek language support.


  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@honnibal honnibal commented Aug 14, 2018

Thanks! Looks great!

@honnibal honnibal merged commit fe94e69 into explosion:master Aug 14, 2018
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

@steremma steremma commented May 8, 2019

Awesome work @Eleni170, thanks for the contribution! I know this is an old issue, but just in case you are still active: is sentence segmentation supported? I am having some trouble getting it to work:

>>> import spacy
>>> sp = spacy.load("el", disable=['tagger', 'ner', 'textcat'])
>>> text = "Αυτή είναι η πρώτη πρόταση. Εδώ θα έπρεπε να σπάσει. Δεν έσπασε όμως!! Περίεργο, έτσι δεν είναι;"
>>> for sentence in sp(text).sents:
...     print(sentence)

out: Αυτή είναι η πρώτη πρόταση. Εδώ θα έπρεπε να σπάσει. Δεν έσπασε όμως!! Περίεργο, έτσι δεν είναι;
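
For comparison, here is a minimal sketch of purely punctuation-based splitting on the same text, using only the Python standard library. It does not use spaCy and makes no claim about how spaCy segments Greek. `split_sentences` is a hypothetical helper, and treating `.`, `!` and the Greek question mark (`;`, or U+037E) as the only sentence terminators is an assumption that will mis-split abbreviations and similar cases.

```python
import re

# Assumption: a sentence ends at '.', '!' or the Greek question mark
# (written either as ';' or as U+037E), followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!;\u037e])\s+")


def split_sentences(text):
    """Split `text` at whitespace that follows a sentence terminator."""
    return [part for part in _BOUNDARY.split(text.strip()) if part]


text = (
    "Αυτή είναι η πρώτη πρόταση. Εδώ θα έπρεπε να σπάσει. "
    "Δεν έσπασε όμως!! Περίεργο, έτσι δεν είναι;"
)
for sentence in split_sentences(text):
    print(sentence)
```

A rule of this kind only looks at punctuation, so it is independent of any statistical model and can serve as a baseline when checking what a pipeline returns.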