## Intro to Hugging Face Transformers

In [2]:
# Installation (https://huggingface.co/docs/transformers/installation): conda install -c huggingface transformers
# Other available pipelines include feature-extraction, ner (named entity recognition), question-answering, translation, and zero-shot-classification:
# https://huggingface.co/docs/transformers/main_classes/pipelines
import transformers
from transformers import pipeline
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
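
The warning above says that relying on the default checkpoint is not recommended in production. A minimal sketch of pinning the model follows, using the model name and revision reported in the log. The helper name `build_sentiment_classifier` is hypothetical (it is not part of `transformers`), and the import is deferred so that defining the function does not load or download anything.

```python
# Model name and revision are copied from the warning printed above.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "af0f99b"


def build_sentiment_classifier():
    """Return a sentiment pipeline pinned to an explicit model and revision.

    Hypothetical helper for illustration. The first call downloads the
    checkpoint from the Hugging Face Hub if it is not already cached.
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        model=MODEL_NAME,
        revision=MODEL_REVISION,
    )
```

Calling `classifier = build_sentiment_classifier()` should then behave like the default pipeline, but without the "No model was supplied" warning.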


### Sentiment Analysis

In [21]:
classifier("I have been so excited for this Clinical NLP Course.")

[{'label': 'POSITIVE', 'score': 0.9968151450157166}]

In [24]:
classifier("I have been so excited for this Clinical NLP Course, but the teacher turned out to be boring.")

[{'label': 'NEGATIVE', 'score': 0.9996137022972107}]

### Zero-shot classification

In [3]:
classifier = pipeline("zero-shot-classification")
classifier(
    "I have been so excited for this Clinical NLP Course, but the teacher turned out to be so boring.",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'I have been so excited for this Clinical NLP Course, but the teacher turned out to be so boring.',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.7822906374931335, 0.18534605205059052, 0.032363295555114746]}
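
The zero-shot output pairs each candidate label with a score, sorted from highest to lowest. A small sketch of turning that dictionary into a list of accepted labels is shown here; the helper `top_labels` and the 0.5 cutoff are assumptions for illustration, not part of the pipeline API.

```python
def top_labels(result, min_score=0.5):
    """Return (label, score) pairs whose score is at least min_score.

    `result` is a dict shaped like the zero-shot pipeline output above,
    with parallel "labels" and "scores" lists.
    """
    return [
        (label, score)
        for label, score in zip(result["labels"], result["scores"])
        if score >= min_score
    ]


# Output copied from the cell above.
result = {
    "sequence": "I have been so excited for this Clinical NLP Course, "
                "but the teacher turned out to be so boring.",
    "labels": ["education", "business", "politics"],
    "scores": [0.7822906374931335, 0.18534605205059052, 0.032363295555114746],
}

accepted = top_labels(result)
print(accepted)  # only "education" clears the 0.5 cutoff
```

Lowering `min_score` (for example to 0.1) would also keep "business"; the right cutoff depends on the task.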

### Text generation

In [4]:
generator = pipeline("text-generation")
generator("In MED277 NLP Course we will teach you how to")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In MED277 NLP Course we will teach you how to develop new language skills during your CSL training.\n\nHere are some of the key things you need to know:\n\nLearn to recognize and remember correct words\n\nDo not try'}]

In [5]:
# You can pick from a number of models here: https://huggingface.co/models?pipeline_tag=text-generation
generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In MED277 NLP Course we will teach you how to",
    max_length=100,
    num_return_sequences=1,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In MED277 NLP Course we will teach you how to make a real life experience with Med277 or see your own ideas and opinions on the topic.'}]

### Named entity recognition

In [7]:
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("d4data/biomedical-ner-all")
model = AutoModelForTokenClassification.from_pretrained("d4data/biomedical-ner-all")

pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")  # pass device=0 to run on a GPU
clinical_note = "A 28-year-old previously healthy man presented with tachycardia, fever, and mental confusion. The symptoms started after a cut to his leg while gardening."

pipe(clinical_note)

[{'entity_group': 'Age',
  'score': 0.99761283,
  'word': '28 - year - old',
  'start': 2,
  'end': 13},
 {'entity_group': 'History',
  'score': 0.9990705,
  'word': 'previously healthy',
  'start': 14,
  'end': 32},
 {'entity_group': 'Sex',
  'score': 0.99968266,
  'word': 'man',
  'start': 33,
  'end': 36},
 {'entity_group': 'Clinical_event',
  'score': 0.9992685,
  'word': 'presented',
  'start': 37,
  'end': 46},
 {'entity_group': 'Sign_symptom',
  'score': 0.9997489,
  'word': 'ta',
  'start': 52,
  'end': 54},
 {'entity_group': 'Sign_symptom',
  'score': 0.99960655,
  'word': '##chy',
  'start': 54,
  'end': 57},
 {'entity_group': 'Sign_symptom',
  'score': 0.9215581,
  'word': 'cut',
  'start': 123,
  'end': 126},
 {'entity_group': 'Biological_structure',
  'score': 0.9997769,
  'word': 'leg',
  'start': 134,
  'end': 137}]
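
Two things stand out in this output: the `word` field holds the tokenizer's view of the text (`'28 - year - old'`), and one symptom is split into the wordpieces `'ta'` and `'##chy'`. The sketch below merges spans that share an `entity_group` and touch each other, then reads the surface text back from the note using the `start`/`end` character offsets. The helper `merge_adjacent_entities` is hypothetical, and keeping the lowest score of the merged pieces is an arbitrary choice.

```python
def merge_adjacent_entities(entities, text):
    """Merge touching spans with the same entity_group and recover their text.

    Two spans are merged when the second starts exactly where the first ends.
    The "word" field is replaced by the slice of `text` given by the offsets.
    """
    merged = []
    for ent in entities:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["entity_group"] == ent["entity_group"]
            and prev["end"] == ent["start"]
        ):
            # Extend the previous span; keep the lower score (an assumption).
            prev["end"] = ent["end"]
            prev["score"] = min(prev["score"], ent["score"])
        else:
            merged.append(dict(ent))  # copy so the input is not mutated
        merged[-1]["word"] = text[merged[-1]["start"]:merged[-1]["end"]]
    return merged


clinical_note = (
    "A 28-year-old previously healthy man presented with tachycardia, fever, "
    "and mental confusion. The symptoms started after a cut to his leg while gardening."
)

# A subset of the pipeline output shown above.
entities = [
    {"entity_group": "Age", "score": 0.99761283, "word": "28 - year - old", "start": 2, "end": 13},
    {"entity_group": "Sign_symptom", "score": 0.9997489, "word": "ta", "start": 52, "end": 54},
    {"entity_group": "Sign_symptom", "score": 0.99960655, "word": "##chy", "start": 54, "end": 57},
    {"entity_group": "Biological_structure", "score": 0.9997769, "word": "leg", "start": 134, "end": 137},
]

merged = merge_adjacent_entities(entities, clinical_note)
for ent in merged:
    print(ent["entity_group"], repr(ent["word"]), ent["start"], ent["end"])
```

Note that merging only joins what the model tagged: in the output above the tagged pieces cover characters 52 to 57, so the merged span covers only the first part of "tachycardia".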

### Question answering

In [8]:
question_answerer = pipeline("question-answering")
question_answerer(
    question=["When did this patient get his cut?","What is the location of the cut?"],
    context=clinical_note,
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.7490040063858032,
  'start': 138,
  'end': 153,
  'answer': 'while gardening'},
 {'score': 0.3615308701992035, 'start': 134, 'end': 137, 'answer': 'leg'}]
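
Each answer comes with a score and character offsets into the context. A minimal sketch of pairing the questions with their answers, checking that the offsets point at the answer text, and flagging low-confidence answers is given below. The helper `pair_answers` and the 0.5 threshold are assumptions for illustration.

```python
def pair_answers(questions, answers, context, min_score=0.5):
    """Pair each question with its answer and add two simple checks.

    span_matches: whether context[start:end] equals the returned answer.
    confident: whether the score reaches min_score (an arbitrary cutoff).
    """
    rows = []
    for question, ans in zip(questions, answers):
        span = context[ans["start"]:ans["end"]]
        rows.append(
            {
                "question": question,
                "answer": ans["answer"],
                "span_matches": span == ans["answer"],
                "confident": ans["score"] >= min_score,
            }
        )
    return rows


clinical_note = (
    "A 28-year-old previously healthy man presented with tachycardia, fever, "
    "and mental confusion. The symptoms started after a cut to his leg while gardening."
)
questions = ["When did this patient get his cut?", "What is the location of the cut?"]

# Output copied from the cell above.
answers = [
    {"score": 0.7490040063858032, "start": 138, "end": 153, "answer": "while gardening"},
    {"score": 0.3615308701992035, "start": 134, "end": 137, "answer": "leg"},
]

rows = pair_answers(questions, answers, clinical_note)
for row in rows:
    print(row)
```

Here the second answer ("leg") is correct but falls below the 0.5 cutoff, which shows why a fixed threshold should be treated as a rough filter and not as a measure of correctness.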

In [None]:
# For additional tutorials and concepts, see: https://huggingface.co/learn/nlp-course/chapter1/1?fw=pt