Malay language support #12602

Merged
adrianeboyd merged 7 commits into explosion:master on May 17, 2023

Conversation

khursani8
Contributor

@khursani8 khursani8 commented May 7, 2023

Description

Add Malay language

Types of change

new feature

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@adrianeboyd adrianeboyd added the lang / ms Malay language data and models label May 8, 2023
@adrianeboyd
Contributor

Thanks for the PR! An initial question about the tokenizer exceptions list: would it make sense for Indonesian and Malay to use the exact same list of exceptions? This looks like it's just a copy of the Indonesian list, so Malay could import this list from Indonesian rather than copying it if there aren't any differences.

@khursani8
Contributor Author

Hi, the exceptions list is currently not perfect: some of the Indonesian exceptions can be used for Malay, and some might need to be removed in the future. I think it is better to keep the list separate so that it is easier to maintain later.
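
The trade-off discussed here (reusing the Indonesian table versus keeping a separate Malay copy) can be sketched roughly as follows. This is a minimal illustration using plain dictionaries, not the actual spaCy language data: the names `ID_EXCEPTIONS`, `derive_exceptions`, `MS_SHARED`, and `MS_SEPARATE` are hypothetical, and the abbreviations are placeholders rather than a vetted list for either language.

```python
# Hypothetical stand-in for an Indonesian exceptions table.
# Plain string keys are used so the sketch needs no imports;
# the entries are illustrative placeholders, not real language data.
ID_EXCEPTIONS = {
    "Prof.": [{"orth": "Prof."}],
    "Dr.": [{"orth": "Dr."}],
    "Tn.": [{"orth": "Tn."}],
}


def derive_exceptions(base, remove=(), add=None):
    """Return a new table: `base` minus `remove`, plus `add`.

    The base table is copied, so later edits to the derived table
    never leak back into the language it was derived from.
    """
    derived = {
        key: [dict(tok) for tok in toks]
        for key, toks in base.items()
        if key not in remove
    }
    for key, toks in (add or {}).items():
        derived[key] = [dict(tok) for tok in toks]
    return derived


# Option 1: reuse the Indonesian table unchanged
# (the "import rather than copy" approach).
MS_SHARED = ID_EXCEPTIONS

# Option 2: keep a separate Malay table that can diverge over time.
MS_SEPARATE = derive_exceptions(
    ID_EXCEPTIONS,
    remove={"Tn."},                      # assumed not wanted for Malay
    add={"Sdn.": [{"orth": "Sdn."}]},    # assumed Malay-only addition
)
```

With the first option, any change to the Indonesian table silently changes Malay as well; with the second, each language can be adjusted independently, at the cost of maintaining two lists.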

@adrianeboyd
Contributor

Thanks for the info! It's fine to maintain a separate list.

Are you planning on updating the exceptions list for Malay for this PR? It's a little unexpected for users to get default exceptions that weren't customized for this language.

@khursani8
Contributor Author

OK, I will update the list within this week.

@adrianeboyd
Contributor

adrianeboyd commented May 17, 2023

Thanks, this looks like a good initial version for all the defaults!

@adrianeboyd adrianeboyd merged commit 873c16a into explosion:master May 17, 2023
14 checks passed
Labels
lang / ms Malay language data and models

2 participants