Add support for ISO-639 language codes #106
Currently, the language of the text to summarize has to be specified as a language name such as "german" or "french". However, many tools, such as Apache Tika, output ISO-639 language codes instead, which makes it difficult to integrate sumy with the wider natural language processing ecosystem.
This commit lets sumy accept ISO-639 language codes in both the two-letter format (e.g. "de" or "fr") and the three-letter format (e.g. "ger" or "fra").
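As a rough illustration of the idea, a code-to-name lookup might look like the sketch below. The helper name `normalize_language` and the table `_ISO_639_TO_NAME` are hypothetical and are not taken from this commit or from sumy's API. The table covers only a few languages and assumes that both three-letter variants of a language (e.g. "ger" and "deu") should be accepted.

```python
# Hypothetical mapping from ISO-639 codes to the language names sumy
# already expects. Illustrative only; not sumy's actual data.
_ISO_639_TO_NAME = {
    "de": "german", "ger": "german", "deu": "german",
    "fr": "french", "fre": "french", "fra": "french",
    "en": "english", "eng": "english",
}


def normalize_language(language):
    """Return the language name for an ISO-639 code.

    Input that is not a known code is returned lowercased and
    stripped, so full names such as "german" pass through.
    """
    key = language.strip().lower()
    return _ISO_639_TO_NAME.get(key, key)


if __name__ == "__main__":
    for value in ("de", "fra", "German"):
        print(value, "->", normalize_language(value))
```

Because unknown input falls through unchanged, callers that already pass "german" or "french" would keep working under this approach.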