Pass in custom trained spacy model #851
Comments
Hi @vajjasaikiran, It has the
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.predefined_recognizers import SpacyRecognizer

# Define the new entities supported by the custom model
spacy_entities = ["PERS", "LOC", "ORG", "TIME", "DATE", "MONEY", "PERCENT", "MISC__AFF", "MISC__ENT"]

# Translate the model's entity types to Presidio's (if needed; in this example we map them 1:1)
spacy_label_groups = [({ent}, {ent}) for ent in spacy_entities]

spacy_recognizer = SpacyRecognizer(supported_language="en",
                                   supported_entities=spacy_entities,
                                   check_label_groups=spacy_label_groups)

# Create Presidio Analyzer Engine
analyzer = AnalyzerEngine()

# List existing (predefined) recognizers
print([rec.name for rec in analyzer.registry.recognizers])

# Remove the previous SpacyRecognizer
analyzer.registry.recognizers = [rec for rec in analyzer.registry.recognizers if rec.name != "SpacyRecognizer"]

# Add the new custom SpacyRecognizer
analyzer.registry.add_recognizer(spacy_recognizer)

# Run Analyzer Engine
res = analyzer.analyze(text="text with custom entities", language="en")
Hope this helps! |
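The label-group mapping in the reply above can be illustrated without Presidio installed. This is a minimal sketch in plain Python, assuming the matching rule is "the requested entity is in the first set and the model's label is in the second set of some group" (the same rule as the `__check_label` helper shown later in this thread). The helper names `build_identity_label_groups` and `label_matches`, and the `PERSON` to `PERS`/`PER` mapping, are hypothetical and used for illustration only.

```python
from typing import Iterable, List, Set, Tuple

# Each group pairs a set of Presidio entity names with a set of model labels.
LabelGroups = List[Tuple[Set[str], Set[str]]]


def build_identity_label_groups(entities: Iterable[str]) -> LabelGroups:
    """Map every model label to a Presidio entity of the same name (1:1)."""
    return [({ent}, {ent}) for ent in entities]


def label_matches(entity: str, label: str, groups: LabelGroups) -> bool:
    """Return True if some group allows `label` to count as `entity`."""
    return any(entity in egrp and label in lgrp for egrp, lgrp in groups)


groups = build_identity_label_groups(["PERS", "LOC", "ORG"])

# Hypothetical non-identity group: expose the model's "PERS"/"PER" labels
# under the entity name "PERSON".
groups.append(({"PERSON"}, {"PERS", "PER"}))

print(label_matches("ORG", "ORG", groups))      # identity mapping
print(label_matches("PERSON", "PERS", groups))  # translated mapping
print(label_matches("LOC", "ORG", groups))      # no group links these two
```

A non-identity group is useful when the custom model's label names differ from the entity names the rest of the pipeline expects.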
Hi @omri374 . Thank you for the response. I can see that we are adding new custom entities, label groups to the Analyzer Engine. But where are we passing the custom model weight file into the Analyzer engine? |
Hi @vajjasaikiran, This is a good point. This is the flow at a high level:
So the actual model weights are being used when the This example shows this in more detail. It takes tokens out of the presidio/presidio-analyzer/presidio_analyzer/predefined_recognizers/spacy_recognizer.py Line 95 in e52cf5f
|
Hi @omri374, I understood the flow. One thing is still not clear to me: how can I pass my new model weights to the NLPEngine? By default, Presidio loads the en_core_web_lg model during initialisation. I want to pass my custom-trained model to the engine. You can think of this as a double-NER pipeline: Predict(spaCy entities) + Predict(custom entities), then combine them and return the result. My custom model can predict 5 entities which the default spaCy model does not. I want the NLPEngine to predict (spaCy entities + custom entities). Could you please share sample code where I can pass my custom model object so that it is simply added to Presidio's default pipeline? If there is currently no way to do this, could you help me work around it using the available classes? |
Hi @vajjasaikiran, so if I understand correctly the ambition is to have both In this case I would suggest creating a new recognizer which loads the custom model. Here's an example implementation. It uses the same logic in the
from typing import Tuple, Set

from presidio_analyzer import AnalyzerEngine, RecognizerResult
from presidio_analyzer.predefined_recognizers import SpacyRecognizer
import spacy


class CustomSpacyRecognizer(SpacyRecognizer):
    def __init__(self, path_to_model: str):
        """
        SpacyRecognizer with a new/custom model,
        to run in parallel with the model in NlpEngine.
        :param path_to_model: Path to the custom model's location
        """
        self.path_to_model = path_to_model
        self.model = None  # Model will be loaded on .load()
        entities = ["ORG"]  # TODO change to the custom model's entities
        spacy_label_groups = [({ent}, {ent}) for ent in entities]
        super().__init__(
            supported_language='en',
            supported_entities=entities,
            ner_strength=0.85,
            check_label_groups=spacy_label_groups
        )

    def load(self):
        self.model = spacy.load(self.path_to_model)

    def analyze(self, text, entities, nlp_artifacts=None):
        """
        Analyze using a spaCy model. Similar to SpacyRecognizer.analyze,
        except it has an actual call to a spaCy model loaded as part of this recognizer.
        """
        results = []
        doc = self.model(text)
        ner_entities = doc.ents

        for entity in entities:
            if entity not in self.supported_entities:
                continue
            for ent in ner_entities:
                if not self.__check_label(entity, ent.label_, self.check_label_groups):
                    continue
                textual_explanation = f"Identified as {ent.label_} by the spaCy model: {self.path_to_model}"
                explanation = self.build_spacy_explanation(
                    self.ner_strength, textual_explanation
                )
                spacy_result = RecognizerResult(
                    entity_type=entity,
                    start=ent.start_char,
                    end=ent.end_char,
                    score=self.ner_strength,
                    analysis_explanation=explanation,
                    recognition_metadata={
                        RecognizerResult.RECOGNIZER_NAME_KEY: self.name
                    },
                )
                results.append(spacy_result)

        return results

    @staticmethod
    def __check_label(
        entity: str, label: str, check_label_groups: Tuple[Set, Set]
    ) -> bool:
        return any(
            [entity in egrp and label in lgrp for egrp, lgrp in check_label_groups]
        )

Adding the new recognizer (in this example only detects the ORG entity):
custom_spacy = CustomSpacyRecognizer(path_to_model="en_core_web_sm")
analyzer = AnalyzerEngine()
analyzer.registry.add_recognizer(custom_spacy)
results = analyzer.analyze(text="David Smith works at IBM", language="en", return_decision_process=True)
Results (with the decision process to see that the same entity was detected twice, once by the default spaCy model and second by the custom model, in this case
[res.__dict__ for res in results]
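Since the same span can be reported by both the default model and the custom one, the "combine them" step asked about earlier may need to drop overlapping detections. The sketch below shows one possible way to do this in plain Python. It is not Presidio's own merging logic: the `Span` class is a hypothetical stand-in for `RecognizerResult`, and the rule (keep the highest-scoring span, with ties going to the earlier result list) is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a recognizer result."""
    entity_type: str
    start: int
    end: int
    score: float
    source: str


def merge_results(*result_lists: List[Span]) -> List[Span]:
    """Combine several result lists, dropping spans that overlap a better one."""
    # Highest score first; sorted() is stable, so ties keep input order.
    candidates = sorted(
        (span for results in result_lists for span in results),
        key=lambda s: (-s.score, s.start),
    )
    kept: List[Span] = []
    for cand in candidates:
        overlaps = any(cand.start < k.end and k.start < cand.end for k in kept)
        if not overlaps:
            kept.append(cand)
    return sorted(kept, key=lambda s: s.start)


# Character offsets for "David Smith works at IBM"
default_results = [
    Span("PERSON", 0, 11, 0.85, "default"),
    Span("ORG", 21, 24, 0.85, "default"),
]
custom_results = [Span("ORG", 21, 24, 0.85, "custom")]

merged = merge_results(default_results, custom_results)
for span in merged:
    print(span)
```

With equal scores the span from the first list wins, so passing the custom model's results first would give them priority instead.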
|
Hi @omri374 Thank you so much for your quick responses. I tried this and it is working as expected. |
I have installed my custom trained NER model as a Python package. How can I use it with the final pieces of code provided (the accepted solution)? |
@efka84 is it a spaCy model? If yes, you can pass a model loaded by spaCy into Presidio. |
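For a spaCy model installed as a Python package, `spacy.load` accepts the package name as well as a path, so the package name can be given as `path_to_model` in the `CustomSpacyRecognizer` shown earlier in the thread. Another option, sketched below, is to make the custom model the NLP engine's model. This assumes a Presidio version that provides `NlpEngineProvider` in `presidio_analyzer.nlp_engine`. The package name `my_custom_ner_model` and the helper names `build_nlp_configuration` and `create_analyzer` are hypothetical.

```python
from typing import Any, Dict


def build_nlp_configuration(model_name: str, lang_code: str = "en") -> Dict[str, Any]:
    """Build an NLP engine configuration that points at a spaCy model."""
    return {
        "nlp_engine_name": "spacy",
        "models": [{"lang_code": lang_code, "model_name": model_name}],
    }


def create_analyzer(model_name: str, lang_code: str = "en"):
    """Create an AnalyzerEngine backed by the given spaCy model.

    Assumes Presidio and the model package are installed. Imports are done
    lazily so the configuration helper can be used on its own.
    """
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    provider = NlpEngineProvider(
        nlp_configuration=build_nlp_configuration(model_name, lang_code)
    )
    nlp_engine = provider.create_engine()
    return AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=[lang_code])


# "my_custom_ner_model" is a hypothetical package name.
config = build_nlp_configuration("my_custom_ner_model")
print(config)
```

Replacing the engine's model means only the labels that model predicts are available, and custom labels would still need to be exposed through a recognizer's supported entities and label groups, as in the first reply.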
We have trained a custom spaCy model with entities that spaCy does not currently have. We plan to use that model as the default spaCy NLP engine.
I tried the code mentioned in #822, but I am not getting the required entities.
I tried the code below.
Expected entities: [PERSON, ORG, CUSTOM]
Predicted entities: [PERSON, ORG]
Can somebody explain whether there is a hack or workaround to achieve this?