Add support for ISO-639 language codes #106

Merged: 5 commits merged on Mar 30, 2018

2 participants
@c-w (Contributor) commented on Mar 26, 2018

Currently, the language of the text to summarize has to be specified as a language name like "german" or "french". However, many tools, such as Apache Tika, output ISO-639 language codes, which makes it difficult to integrate sumy with the wider natural language processing ecosystem.

This commit ensures that sumy understands languages passed as ISO-639 codes, in both the two-letter format (e.g. "de" or "fr") and the three-letter format (e.g. "ger" or "fra").

Resolves #96
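To illustrate the kind of normalization this enables, here is a minimal, self-contained sketch. It assumes a hypothetical helper `normalize_language` and a small hard-coded table covering only German, French and English; it is not the code merged in this PR. ISO 639-2 defines both bibliographic (B) codes such as "ger" and "fre" and terminology (T) codes such as "deu" and "fra", so the sketch accepts both. Inputs that are not known codes are lowercased and passed through unchanged, so callers that already use names like "german" keep working.

```python
# Hypothetical, deliberately tiny subset of ISO-639 for illustration only.
_ISO_639_TO_NAME = {
    # ISO 639-1 (two-letter codes)
    "de": "german",
    "fr": "french",
    "en": "english",
    # ISO 639-2 (three-letter codes), bibliographic (B) and terminology (T) forms
    "ger": "german",
    "deu": "german",
    "fre": "french",
    "fra": "french",
    "eng": "english",
}


def normalize_language(language: str) -> str:
    """Map an ISO-639 code to a lowercase language name.

    Values that are not in the table (e.g. names like "German") are
    lowercased and returned unchanged.
    """
    key = language.strip().lower()
    return _ISO_639_TO_NAME.get(key, key)


if __name__ == "__main__":
    for value in ("de", "fra", "ger", "German", "czech"):
        print(value, "->", normalize_language(value))
```

A complete implementation would draw on a full ISO-639 table rather than a hand-written subset, for example one provided by the third-party `pycountry` package.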

@miso-belica (Owner) left a comment

Thanks for the code. I really appreciate it. But please take a look at my comments.

Resolved review comment on setup.py (outdated)
Resolved review comment on sumy/utils.py
Resolved review comment on sumy/utils.py (outdated)
Resolved review comment on sumy/__main__.py (outdated)
@c-w (Contributor, Author) commented on Mar 30, 2018

Thanks for the review. Addressed all the comments.

@miso-belica (Owner) left a comment

Thanks a lot. Good work 👍

@miso-belica merged commit 6ac1616 into miso-belica:dev on Mar 30, 2018

1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)

@c-w deleted the c-w:support-iso639-language-codes branch on Mar 30, 2018

@c-w (Contributor, Author) commented on Mar 30, 2018

Thanks for the merge!
