Add support for ISO-639 language codes #106

Merged
merged 5 commits into miso-belica:dev from support-iso639-language-codes on Mar 30, 2018

Contributor

@c-w c-w commented Mar 26, 2018

Currently, the language of the text to summarize has to be specified as a language name like "german" or "french". However, many tools such as Apache Tika output ISO-639 language codes, which makes it difficult to integrate sumy with the wider natural language processing ecosystem.

This commit ensures that sumy can understand language codes passed as ISO-639, in both two-letter format (e.g. "de" or "fr") and three-letter format (e.g. "ger" or "fra").
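
To illustrate the idea, here is a minimal sketch of how a code-to-name normalization could look. It is not the code from this pull request: the `normalize_language` function and the hand-written `ISO_639_TO_NAME` table are hypothetical and cover only a few languages for the sake of the example.

```python
# Hypothetical lookup table (an assumption for illustration only).
# It lists two-letter ISO 639-1 codes and three-letter ISO 639-2 codes,
# including both the bibliographic and terminologic variants where they differ.
ISO_639_TO_NAME = {
    "de": "german", "ger": "german", "deu": "german",
    "fr": "french", "fre": "french", "fra": "french",
    "en": "english", "eng": "english",
}


def normalize_language(language):
    """Return a lowercase language name for a language name or ISO-639 code."""
    key = language.strip().lower()
    # Values that are not known codes pass through unchanged,
    # so existing names such as "german" keep working.
    return ISO_639_TO_NAME.get(key, key)
```

With this sketch, `normalize_language("fra")` and `normalize_language("fr")` both give `"french"`, while an existing name such as `"german"` is returned as-is.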

Resolves #96

Owner

@miso-belica miso-belica left a comment

Thanks for the code. I really appreciate it. But please take a look at my comments.

setup.py (outdated)
sumy/utils.py
sumy/utils.py (outdated)
sumy/__main__.py (outdated)
Contributor Author

@c-w c-w commented Mar 30, 2018

Thanks for the review. Addressed all the comments.

Owner

@miso-belica miso-belica left a comment

Thanks a lot. Good work 👍

@miso-belica miso-belica merged commit 6ac1616 into miso-belica:dev Mar 30, 2018
1 check passed
@c-w c-w deleted the support-iso639-language-codes branch Mar 30, 2018
Contributor Author

@c-w c-w commented Mar 30, 2018

Thanks for the merge!

2 participants