
Inquiring about details of the Entity Resolution Model #735

Closed
agr505 opened this issue Oct 19, 2022 · 2 comments
agr505 commented Oct 19, 2022

Hi,

Here at Alpine Health, we are investigating the use of sparknlp's pretrained healthcare entity resolution models in our solution. I've been going through the Clinical_Entity_Resolvers, Improved_Entity_Resolution_with_SentenceChunkEmbeddings, and Finetuning_Sentence_Entity_Resolver_Model notebooks. I have been very impressed by how well-documented everything is. I just wanted to confirm some things.

Is there a paper or documentation that describes the model architecture of SentenceEntityResolverApproach or SentenceEntityResolverModel and what is the difference between these two? The most similar paper I found was https://www.aaai.org/AAAI21Papers/AAAI-7273.LiB.pdf
Any papers/documentation that is relevant would be very helpful

Without yet knowing the details of the SentenceEntityResolverModel architecture, I wanted to confirm that when I fine-tune the resolver model by dropping concept codes, as in the tutorial, I am also dropping those concepts' sentence embeddings from the model, correct? And how do I count how many concept codes are stored in the pretrained model?

We want to fine-tune the pretrained model for our use case and only use a subset of UMLS or SNOMED concepts for our Social Determinants of Health (SDOH) ontology; the current Spark NLP pretrained UMLS/SNOMED resolver models have not been very accurate on our target concepts so far. By decreasing the number of concepts to roughly 80, will the reduction in the concept space significantly improve the model's accuracy? We also want to fine-tune the clinical NER model on these "chunks" as a new entity type, which will then be passed to the resolver model. If the chunks/concepts we have labeled are more encompassing than they should be (e.g., "She denied alcohol or illicit drug use"), will this be an issue when fine-tuning the NER model on this new entity type?
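To make the concept-space question concrete, here is a toy plain-Python sketch (hypothetical codes and embedding values, not the actual Spark NLP internals) of a nearest-embedding resolver. Shrinking the candidate concept space removes distractor concepts that an entity embedding could otherwise be pulled toward:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy concept space: code -> sentence embedding (all values hypothetical)
concepts = {
    "SDOH:alcohol_use": [0.9, 0.1, 0.0],
    "SDOH:drug_use":    [0.8, 0.2, 0.1],
    "OTHER:finding_x":  [0.1, 0.9, 0.3],  # unrelated concept acting as a distractor
}

def resolve(embedding, concept_space):
    """Return the code whose stored embedding is closest to the query."""
    return max(concept_space, key=lambda code: cosine(embedding, concept_space[code]))

query = [0.85, 0.15, 0.05]          # embedding of an extracted entity chunk
print(resolve(query, concepts))     # nearest concept in the full space

# Restricting to the ~80 SDOH codes of interest means only those
# embeddings can ever be returned:
sdoh_only = {k: v for k, v in concepts.items() if k.startswith("SDOH")}
print(resolve(query, sdoh_only))
```

This is only a sketch of the retrieval idea; the pretrained resolvers use their own distance functions and index structures.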

Thank you very much, and please let me know if you want to set up a call to further discuss how Alpine Health and John Snow Labs can work together as we build out our solution.

Aaron Reich
Director of Machine Learning, Alpine Health Systems Inc
8634507132 | aaron@alpinehealth.io

@muhammetsnts (Contributor) commented:

Hi @agr505 ,

  1. The SentenceEntityResolverApproach annotator is used for model training, while SentenceEntityResolverModel is used for loading a pretrained resolver model. When you train a model with SentenceEntityResolverApproach and save it, you can load it back with the SentenceEntityResolverModel annotator. In Spark NLP, all annotators with the "Model" suffix are used for loading pretrained models.
  2. It is similar to a tree architecture but not the same; there are some other special algorithms behind it. We don't have a paper on Sentence Entity Resolution yet.
  3. These models are trained on an augmented version of the formal datasets, so the training data may contain more than one row with the same code but different concept names and embeddings. When you drop a code from the model, all rows in the model that have that code are dropped. So yes, the embeddings, codes, concept names, etc. will all be dropped.
  4. You can train your own resolver model and update it from time to time with new rows; this would be better in your case. But if the pretrained model's performance is sufficient for you, dropping irrelevant codes from the model may help increase accuracy, since the models return the closest embeddings in the space.
  5. The most important stage in resolving terms is entity extraction. A specific NER model will help you extract the appropriate entities for the concept, and this directly affects accuracy.
  6. You can use assertion status models to check the negation status of the entities.
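As a plain-Python illustration of point 3 (hypothetical codes and embedding values, not the resolver's actual internal storage): because the augmented training data can put one code on several rows, dropping a code removes every row that carries it, i.e., the concept names and embeddings stored alongside that code go too.

```python
# Toy stand-in for a resolver's stored rows: (code, concept_name, embedding).
# Augmentation means one code can appear on several rows.
rows = [
    ("CODE_A", "alcohol use",         [0.9, 0.1]),
    ("CODE_A", "alcohol consumption", [0.8, 0.2]),  # same code, augmented row
    ("CODE_B", "drug abuse",          [0.1, 0.9]),
]

def count_codes(rows):
    """Number of distinct concept codes stored in the model."""
    return len({code for code, _, _ in rows})

def drop_code(rows, code_to_drop):
    """Dropping a code removes every row carrying it:
    codes, concept names, and embeddings all go together."""
    return [r for r in rows if r[0] != code_to_drop]

print(count_codes(rows))                    # distinct codes across all rows
remaining = drop_code(rows, "CODE_A")
print(count_codes(remaining), len(remaining))
```

Counting distinct values in a label column like this is also how you would count the concept codes in a resolver's training data frame.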

@muhammetsnts commented:

Also, we will have some SDOH models soon, @agr505. I hope these answers help you.
