imvladikon/weak_annotators

Weak Annotators (NER) [WIP]

Experiments with weak annotators for named entity recognition (NER), using different models and methods, including LLMs. Requires some GPU memory :) (roughly 8-12 GB for 7B models).

Installation

pip install weak-annotators

or from source:

pip install git+https://github.com/imvladikon/weak_annotators.git

Usage

  1. Using the UniversalNER extractor:

from weak_annotators import UniversalNerExtractor

text = """
The patient was prescribed 100 mg of aspirin daily for 3 days.
""".strip()
labels = ["DRUG", "DISEASE", "SYMPTOM", "DURATION"]
extractor = UniversalNerExtractor(labels=labels)
print(extractor(text))
# [Span(start=37, end=44, text='aspirin', label='DRUG'), Span(start=55, end=61, text='3 days', label='DURATION')]

By default it returns a list of Span objects; pass return_dict=True to get a list of dictionaries instead:

print(extractor(text, return_dict=True))
# [{'start': 37, 'end': 44, 'text': 'aspirin', 'label': 'DRUG'}, {'start': 55, 'end': 61, 'text': '3 days', 'label': 'DURATION'}]
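
Because weak annotators can produce imperfect offsets, it may be worth checking that each returned span actually matches the source text. The following is a minimal sketch, not part of the library: it works on the dictionary output shown above, and the helper name misaligned is hypothetical.

```python
# Span dictionaries copied from the return_dict=True output shown above.
text = "The patient was prescribed 100 mg of aspirin daily for 3 days."
spans = [
    {"start": 37, "end": 44, "text": "aspirin", "label": "DRUG"},
    {"start": 55, "end": 61, "text": "3 days", "label": "DURATION"},
]


def misaligned(text, spans):
    """Return the spans whose offsets do not reproduce their own 'text' field.

    Hypothetical helper for illustration; it is not provided by weak_annotators.
    """
    return [s for s in spans if text[s["start"]:s["end"]] != s["text"]]


# An empty list means every span's offsets agree with the source text.
print(misaligned(text, spans))
# []
```

The offsets are treated as half-open character ranges (start inclusive, end exclusive), which is consistent with the example output above.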

  2. Using the MedAlpaca LLM:

It requires label descriptions:

from weak_annotators import MedAlpacaExtractor

labels = ["DRUG", "DISEASE", "SYMPTOM", "DURATION"]
labels_descriptions = {
    "DRUG": "Drug or medication",
    "DISEASE": "Any disease, syndrome, or medical condition",
    "SYMPTOM": "Any symptom or sign of a disease or medical condition",
    "DURATION": "Any period of time",
}
extractor = MedAlpacaExtractor(labels=labels, labels_description=labels_descriptions)

text = """
The patient was prescribed 100 mg of aspirin daily for 3 days.
""".strip()

annotations = extractor(text)
print(annotations)

Optionally, you can pass a custom prompt_template to MedAlpacaExtractor:

prompt_template = "Extract entities of type {} from the following text:"
extractor = MedAlpacaExtractor(labels=labels, labels_description=labels_descriptions, prompt_template=prompt_template)
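
To see what such a template might look like once filled in, here is a rough sketch. How MedAlpacaExtractor actually fills the {} placeholder is not documented here; the sketch assumes plain str.format with the label name, and the helper build_prompts is hypothetical.

```python
prompt_template = "Extract entities of type {} from the following text:"
labels = ["DRUG", "DISEASE", "SYMPTOM", "DURATION"]


def build_prompts(template, labels):
    """Fill the single positional placeholder once per label.

    Assumption: the library substitutes the label name via str.format.
    This helper only illustrates the template and is not part of weak_annotators.
    """
    return {label: template.format(label) for label in labels}


prompts = build_prompts(prompt_template, labels)
print(prompts["DRUG"])
# Extract entities of type DRUG from the following text:
```

If the library substitutes the label description rather than the label name, the same template still works, since it has only one positional placeholder.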

  3. Using Flair (TARS extractor):

from weak_annotators import FlairExtractor

text = """
The patient was prescribed 100 mg of aspirin daily for 3 days.
""".strip()
labels = ["DRUG", "DISEASE", "SYMPTOM", "DURATION"]
extractor = FlairExtractor(labels=labels)
print(extractor(text))
