[TASK] Another language support for GoLLIE (more specifically Vietnamese) #9
Comments
Hi @NoAtmosphere0! I believe the easiest way to achieve this would be by fine-tuning one of the GoLLIE checkpoints with a Vietnamese dataset. Both Wikiann and Polyglot NER seem like the best candidates since they use the same labels as CoNLL03. To fine-tune your model with either of these datasets, you should:
A significant concern here is the proficiency of LLaMA2/CodeLLaMA in Vietnamese. The model might not perform well in that language, and unfortunately, there is a limited selection of multilingual LLMs available.
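As a rough illustration of the data preparation this suggestion implies, the sketch below converts IOB2-tagged tokens into labelled entity spans, which is the kind of span-level annotation an information extraction model is trained on. The tag list `TAGS` and the helper `iob2_to_spans` are assumptions made for this example and are not part of the GoLLIE codebase; check the tag inventory against the dataset card before relying on it.

```python
from typing import List, Optional, Tuple

# Assumed IOB2 tag inventory over PER/ORG/LOC (hypothetical ordering;
# verify it against the label names shipped with the dataset you use).
TAGS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]


def iob2_to_spans(tokens: List[str], tag_ids: List[int]) -> List[Tuple[str, str]]:
    """Group IOB2-tagged tokens into (label, text) entity spans."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the span being built, if there is one.
        nonlocal current_tokens, current_label
        if current_tokens:
            spans.append((current_label, " ".join(current_tokens)))
        current_tokens, current_label = [], None

    for token, tag_id in zip(tokens, tag_ids):
        tag = TAGS[tag_id]
        if tag == "O":
            flush()
        elif tag.startswith("B-") or tag[2:] != current_label:
            # A B- tag, or an I- tag whose label does not continue the
            # current span, starts a new span.
            flush()
            current_tokens, current_label = [token], tag[2:]
        else:
            # An I- tag continuing the current span.
            current_tokens.append(token)

    flush()
    return spans


example_tokens = ["Nguyễn", "Du", "sinh", "ở", "Hà", "Tĩnh"]
example_tags = [1, 2, 0, 0, 5, 6]
print(iob2_to_spans(example_tokens, example_tags))
```

The resulting spans would still need to be mapped onto whatever input format the fine-tuning scripts expect; that step depends on the repository and is not shown here.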
Hi @ikergarcia1996! Thank you for your prompt response and helpful instructions. We will follow the steps you outlined to train GoLLIE, keeping in mind your concerns about the proficiency of LLaMA2/CodeLLaMA in Vietnamese. We will leave this issue open to keep you updated on our progress and to ask any further questions. Thanks again for your support!
@NoAtmosphere0 Did you make any progress on that?
Hi GoLLIE research team, I am part of a group of Vietnamese university students who will present your paper at an upcoming seminar in our "Introduction to Natural Language Processing" course. Our task is to summarize and explain the contents of your paper to our fellow students and lecturers.
To make it easier to understand for our classmates, we are interested in training GoLLIE using Vietnamese datasets. If it's possible, we would greatly appreciate it if you could provide us with some instructions on how to proceed with this. We sincerely enjoyed reading your paper and believe that it would greatly benefit our presentation.
Here are some datasets for the named-entity-recognition subtask that I found on Hugging Face:
We would be extremely grateful for any guidance or assistance with this effort. Please feel free to reach out if you have any questions or require more information from us. We are more than willing to cooperate to make this collaboration successful.