<a href="https://colab.research.google.com/github/peterhanlon/notebooks/blob/main/ECI_Presentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Examples from the ECI presentation on Natural Language Processing.

In [1]:
!pip install transformers
!pip install torch
!pip install keybert

Collecting transformers
  Downloading transformers-4.18.0-py3-none-any.whl (4.0 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.5.1-py3-none-any.whl (77 kB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Collecting sacremoses
  Downloading sacremoses-0.0.53.tar.gz (880 kB)
Building wheels for collected packages: sacremoses
  Building wheel for sacremoses (setup.py) ... done
  Created wheel fo

Sentiment Analysis - Huggingface

In [4]:
from transformers import pipeline
sentiment_pipeline = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
data = ["The product is amazing, I really love it", "I was really frustrated they didn't get back to me"]
sentiment_pipeline(data)

[{'label': 'POSITIVE', 'score': 0.9998874664306641},
 {'label': 'NEGATIVE', 'score': 0.9976345300674438}]
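
The pipeline returns one dictionary per input sentence, each with a `label` and a confidence `score`. As a minimal sketch of how that output might be post-processed, the snippet below folds each pair into a single signed value. The `results` list is copied from the output above so the snippet runs without downloading the model, and `signed_score` is a hypothetical helper, not part of the transformers API.

```python
# Output copied from the cell above; in the notebook you could use
# results = sentiment_pipeline(data) instead.
results = [
    {"label": "POSITIVE", "score": 0.9998874664306641},
    {"label": "NEGATIVE", "score": 0.9976345300674438},
]


def signed_score(result):
    """Hypothetical helper: map a label/score pair to a value in [-1, 1]."""
    sign = 1.0 if result["label"] == "POSITIVE" else -1.0
    return sign * result["score"]


signed = [signed_score(r) for r in results]
print(signed)
```

A signed value like this can be easier to average or plot than separate label and score fields.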

Zero Shot Classification - Huggingface

In [None]:
from transformers import pipeline
classification_pipeline = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
candidate_labels = ["renewable", "politics", "emissions", "temperature", "emergency", "advertisement"]
sentence = ["The smoke from the car exhaust was unbearable"]
classification_pipeline(sentence, candidate_labels)

[{'labels': ['emissions',
   'temperature',
   'emergency',
   'advertisement',
   'renewable',
   'politics'],
  'scores': [0.9738226532936096,
   0.009774100966751575,
   0.009313610382378101,
   0.0025348025374114513,
   0.0024800188839435577,
   0.0020748234819620848],
  'sequence': 'The smoke from the car exhaust was unbearable'}]
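
The result pairs each candidate label with a score, in parallel `labels` and `scores` lists. The sketch below shows one way to reduce that to a single decision, assuming you want to reject low-confidence predictions. The `result` dictionary is copied from the output above; `top_label` and its `min_score` threshold of 0.5 are illustrative assumptions, not part of the transformers API.

```python
# First element of the output above, copied so the snippet runs without the model.
result = {
    "labels": ["emissions", "temperature", "emergency",
               "advertisement", "renewable", "politics"],
    "scores": [0.9738226532936096, 0.009774100966751575, 0.009313610382378101,
               0.0025348025374114513, 0.0024800188839435577, 0.0020748234819620848],
    "sequence": "The smoke from the car exhaust was unbearable",
}


def top_label(result, min_score=0.5):
    """Hypothetical helper: best label, or None if its score is below min_score."""
    label, score = max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])
    return label if score >= min_score else None


print(top_label(result))
```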

Few Shot Classification - Huggingface

NOTE: This model seems to be too large for the free Colab tier. It should work on a PC with plenty of memory.

In [None]:
from transformers import pipeline
examples='''
sentence: "My car needs a service"
intent: repair
###
sentence: "My car is dirty"
intent: valet
###
sentence: "I want to sell my car"
intent: sale
###
sentence: "My car's engine is making a funny noise"
intent:'''
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-2.7B')
generator(examples, do_sample=True, max_new_tokens=3, temperature=0.1, end_sequence="###")
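
The prompt above follows a fixed pattern: labelled examples separated by `###`, then an unlabelled sentence ending in `intent:` for the model to complete. A minimal sketch of building that prompt from data and pulling the intent out of the generated text might look like the following. `build_prompt` and `parse_intent` are hypothetical helpers, and the generated text in the last lines is made up for illustration, not real model output.

```python
SEPARATOR = "###"


def build_prompt(examples, query):
    """Hypothetical helper: format (sentence, intent) pairs plus a query as a few-shot prompt."""
    blocks = [f'sentence: "{sentence}"\nintent: {intent}' for sentence, intent in examples]
    blocks.append(f'sentence: "{query}"\nintent:')
    return f"\n{SEPARATOR}\n".join(blocks)


def parse_intent(generated_text, prompt):
    """Hypothetical helper: keep only the completion, up to the next separator."""
    completion = generated_text[len(prompt):]
    return completion.split(SEPARATOR)[0].strip()


examples = [
    ("My car needs a service", "repair"),
    ("My car is dirty", "valet"),
    ("I want to sell my car", "sale"),
]
prompt = build_prompt(examples, "My car's engine is making a funny noise")
print(prompt)

# Made-up completion for illustration only; a text-generation pipeline
# returns the prompt followed by the newly generated tokens by default.
fake_generated = prompt + " repair\n###"
print(parse_intent(fake_generated, prompt))
```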

Named Entity Extraction - Huggingface

In [None]:
from transformers import pipeline
ner_pipeline = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")
sentence = ["Pete wanted to go to London to present NLP stuff for ECI"]
ner_pipeline(sentence)

Question Answering

In [None]:
from transformers import pipeline
qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
document = "A reusable launch system (RLS, or reusable launch vehicle RLV) is a launch system which is capable of launching a payload into space more than once."
question = ["What is an RLS?"]
qa_pipeline(question=question, context=document)

Key Phrase Extraction - KeyBERT

In [2]:
from keybert import KeyBERT

kw_model = KeyBERT()
document='''
My electricity isn't working, and I've not had any power for five hours, can you send someone to fix it please.
'''
kw_model.extract_keywords(document, keyphrase_ngram_range=(1, 3), stop_words='english')

[('electricity isn working', 0.6718),
 ('power hours send', 0.5076),
 ('hours send fix', 0.4926),
 ('electricity', 0.4564),
 ('ve power hours', 0.4516)]
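
Each result is a `(phrase, score)` tuple, and fragments such as "isn" and "ve" are most likely left over from contractions like "isn't" and "I've" being split during tokenisation. As a minimal sketch, the lower-scoring phrases could be dropped with a simple threshold. The `keywords` list is copied from the output above; `filter_keywords` and the 0.5 cut-off are illustrative assumptions, not part of the KeyBERT API.

```python
# Output copied from the cell above so the snippet runs without the model.
keywords = [
    ("electricity isn working", 0.6718),
    ("power hours send", 0.5076),
    ("hours send fix", 0.4926),
    ("electricity", 0.4564),
    ("ve power hours", 0.4516),
]


def filter_keywords(keywords, min_score=0.5):
    """Hypothetical helper: keep phrases whose score is at least min_score."""
    return [phrase for phrase, score in keywords if score >= min_score]


print(filter_keywords(keywords))
# ['electricity isn working', 'power hours send']
```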