Spanish clinical flair embeddings #2292

Closed
matirojasg opened this issue Jun 3, 2021 · 8 comments
Labels
question Further information is requested wontfix This will not be worked on

Comments

@matirojasg

Hi, in the research group where I work (http://pln.cmm.uchile.cl/grav/en), we have trained Flair embeddings for Spanish clinical text.

We fine-tuned existing LMs (es-forward/es-backward, provided by @iamyihwa) for more than a week on a Chilean clinical dataset (created by the same group) of around 50 million words. The perplexity values are good, and text sampled from the models reads close to natural language.
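For readers who want to try something similar, the fine-tuning described above can be sketched roughly as follows. This is a minimal sketch based on Flair's documented language-model fine-tuning workflow, not the group's actual training script: the corpus path, output directory, and all hyperparameter values are assumptions, and `missing_corpus_parts` is a hypothetical helper that only checks the folder layout `TextCorpus` expects (a `train/` folder with split files, plus `valid.txt` and `test.txt`).

```python
from pathlib import Path


def missing_corpus_parts(corpus_dir):
    """Hypothetical helper: list the parts of the expected corpus layout that are absent."""
    root = Path(corpus_dir)
    missing = []
    train = root / "train"
    # The training data is expected as one or more split files inside train/.
    if not train.is_dir() or not any(p.is_file() for p in train.iterdir()):
        missing.append("train/")
    for name in ("valid.txt", "test.txt"):
        if not (root / name).is_file():
            missing.append(name)
    return missing


def fine_tune_flair_lm(corpus_dir, output_dir, base_model="es-forward"):
    """Continue training an existing Flair character LM on a domain corpus."""
    # Imported here so the sketch can be read and loaded without Flair installed.
    from flair.embeddings import FlairEmbeddings
    from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

    problems = missing_corpus_parts(corpus_dir)
    if problems:
        raise FileNotFoundError(f"corpus folder is missing: {problems}")

    # Reuse the pretrained model, its character dictionary and its direction.
    language_model = FlairEmbeddings(base_model).lm
    corpus = TextCorpus(
        corpus_dir,
        language_model.dictionary,
        language_model.is_forward_lm,
        character_level=True,
    )

    trainer = LanguageModelTrainer(language_model, corpus)
    # ASSUMPTION: illustrative hyperparameters, not the values used in this thread.
    trainer.train(
        output_dir,
        sequence_length=100,
        mini_batch_size=100,
        learning_rate=20,
        patience=10,
        checkpoint=True,
    )
```

The backward model would be fine-tuned the same way by passing `base_model="es-backward"`; the direction is read from the loaded model, so no other change should be needed.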

We would like to know the steps to follow to upload these models to the site. Do we have to test it on the NER task for some medical dataset in Spanish and show you the results?

To our knowledge, there is no flair embedding model in Spanish in the clinical context :)

If you want to know more about what this corpus is about, you can see the paper published last year: https://www.aclweb.org/anthology/2020.clinicalnlp-1.32/

@matirojasg matirojasg added the question Further information is requested label Jun 3, 2021
@alanakbik
Collaborator

Hello @matirojasg, thanks for offering to add the models. People would surely find this useful!

The standard way would be to put the model on a server and open a pull request that adds auto-downloading support to the FlairEmbeddings class. If you like, I can put your models on our faculty server and also do the pull request (this is mostly how we've been doing it). Alternatively, you can do the PR and/or put the models on your own server. Either works for me!
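To illustrate what "auto-downloading" amounts to, here is a minimal sketch of the idea: a short model name is mapped to a file on a server, and anything else is treated as a local path. This is not Flair's actual implementation. The base URL is the server directory given later in this thread; the short names, file names, and cache directory are assumptions made for illustration.

```python
from pathlib import Path

# Server directory named later in this thread as the hosting location.
BASE_URL = "https://flair.informatik.hu-berlin.de/resources/embeddings/flair/"

# ASSUMPTION: these short names and file names are hypothetical placeholders.
KNOWN_MODELS = {
    "es-clinical-forward": "es-clinical-forward.pt",
    "es-clinical-backward": "es-clinical-backward.pt",
}


def resolve_model(name_or_path, cache_dir="~/.flair/embeddings"):
    """Return (download_url, local_path) for a short name or a local file.

    For a known short name, download_url points at the hosted file and
    local_path is where a cached copy would live. For anything else,
    download_url is None and the argument is treated as a local path.
    """
    if name_or_path in KNOWN_MODELS:
        file_name = KNOWN_MODELS[name_or_path]
        return BASE_URL + file_name, Path(cache_dir).expanduser() / file_name
    return None, Path(name_or_path)
```

A pull request of the kind described would essentially add new entries to such a mapping, so that users can pass a short name instead of downloading the files by hand.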

@matirojasg
Author

Thank you for the quick response.

"If you want, I can put your models on our faculty server and also do the pull request (that's how we've been doing it most of the time)." I prefer this option.

Should I share the files with you via Drive? Which files exactly do you need?

@alanakbik
Collaborator

Hello @matirojasg, yes: if you send me an email with a link to a Drive folder containing the models, I can put them on our server. Thanks again!

@matirojasg
Author

Here is the link to the Spanish clinical models. Let me know if any file is missing or if you can't access the Drive folder.

https://drive.google.com/drive/folders/1M1b5FzZqEebTF7B2l58GQvciF4SXP5dT?usp=sharing

Thanks!

@alanakbik
Collaborator

Hi @matirojasg, I put them on our server: https://flair.informatik.hu-berlin.de/resources/embeddings/flair/

Would you like to do the PR for integration into Flair, or should I?

@matirojasg
Author

Could you do it, please? Thank you :)

alanakbik added a commit that referenced this issue Jul 1, 2021
GH-2292: add support for Spanish clinical Flair embeddings
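
Once support is merged, using the models would look roughly like the sketch below. It assumes the models are registered under the short names `es-clinical-forward` and `es-clinical-backward`; the thread itself does not state the identifiers, so check the Flair documentation for the exact names. The function name is hypothetical.

```python
def embed_with_clinical_flair(
    text,
    forward_name="es-clinical-forward",    # ASSUMPTION: registered short name
    backward_name="es-clinical-backward",  # ASSUMPTION: registered short name
):
    """Embed one sentence with stacked forward and backward Flair embeddings."""
    # Imported here so the sketch can be loaded without Flair installed.
    from flair.data import Sentence
    from flair.embeddings import FlairEmbeddings, StackedEmbeddings

    # Forward and backward character LMs are usually combined by stacking.
    stacked = StackedEmbeddings(
        [FlairEmbeddings(forward_name), FlairEmbeddings(backward_name)]
    )

    sentence = Sentence(text)
    stacked.embed(sentence)

    # One (token text, embedding tensor) pair per token.
    return [(token.text, token.embedding) for token in sentence]
```

Calling `embed_with_clinical_flair("El paciente presenta dolor abdominal.")` would download the models on first use and return one embedding per token.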
@codemaster-22

codemaster-22 commented Jul 11, 2021

Hi @matirojasg, could you suggest a corpus size for fine-tuning the 'news-forward' language model on English tweets? I am currently planning to use around 50 million words, as you mentioned. Would that be enough?

@stale

stale bot commented Nov 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Nov 9, 2021
@stale stale bot closed this as completed Nov 16, 2021
3 participants