
# T-NER: Model Inference Example
This notebook demonstrates inference with fine-tuned models: flagging urgent messages, extracting locations with T-NER, and summarizing the text.
For more information, see [T-NER](https://github.com/asahi417/tner).

### Setup

In [None]:
# avoid version conflicts on the Colab notebook
%pip install pip -U
%pip install sentencepiece
%pip install sortedcontainers==2.1.0
%pip install transformers
# PyTorch is published on PyPI as "torch", not "pytorch"
%pip install torch

In [None]:
# main package
%pip install tner

### Identify messages/news that denote urgency during a disaster/mishap


In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("ReynaQuita/twitter_disaster_bert_large")

model = AutoModelForSequenceClassification.from_pretrained("ReynaQuita/twitter_disaster_bert_large")

In [None]:
test_sentences = ["I think there is an earthquake at Clark St.",
                  "I sense an earthquake near Dempster Street, might need help.",
                  "It is very windy around Lakefill, can't walk back home.",
                  "Is the earthquake over ",
                  "Earthquake at Ridge, HELP!",
                  "I am enjoying my day. how about you?"]

test_para = ' '.join(test_sentences)
test_para

"I think there is an earthquake at Clark St. I sense an earthquake near Dempster Street, might need help. It is very windy around Lakefill, can't walk back home. Is the earthquake over  Earthquake at Ridge, HELP! I am enjoying my day. how about you?"

In [None]:
disaster_sentences = []
for sentence in test_sentences:
  inputs = tokenizer(sentence, return_tensors="pt")
  # inference only: no labels or loss are needed, so skip gradient tracking
  with torch.no_grad():
    logits = model(**inputs).logits[0]
  prob = torch.softmax(logits, dim=-1)
  predicted = int(torch.argmax(prob))
  # class 0 is assumed to mean "not a disaster"; keep every other prediction
  if predicted != 0:
    disaster_sentences.append(sentence)

In [None]:
disaster_sentences

['I think there is an earthquake at Clark St.',
 'I sense an earthquake near Dempster Street, might need help.',
 "It is very windy around Lakefill, can't walk back home.",
 'Is the earthquake over ',
 'Earthquake at Ridge, HELP!']
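
The loop above keeps a sentence when the argmax over the softmax probabilities is non-zero. As a minimal sketch of that decision step, runnable without downloading the model, the snippet below applies softmax to a pair of made-up logits in plain Python. The helpers `softmax` and `predict_label`, the logit values, and the label order (index 1 meaning "disaster") are illustrative assumptions, not output from `twitter_disaster_bert_large`.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Return (predicted class index, probability of that class)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

# Hypothetical logits for one sentence, assumed order: [not-disaster, disaster]
example_logits = [-1.2, 2.3]
label, confidence = predict_label(example_logits)
print(label, round(confidence, 3))
```

Because softmax preserves the ordering of its inputs, taking the argmax of the raw logits gives the same label; the probabilities are only needed if a confidence threshold is wanted.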

### Identify locations embedded in the messages and find a route

In [None]:
from tner import TransformersNER
from pprint import pprint

In [None]:
trainer = TransformersNER('asahi417/tner-xlm-roberta-base-all-english')

2022-02-04 23:11:42 INFO     *** initialize network ***


In [None]:

prediction = trainer.predict(disaster_sentences)
pprint(prediction)

[{'entity': [{'mention': 'Clark St.',
              'position': [34, 43],
              'probability': 0.7876514991124471,
              'type': 'facility'}],
  'sentence': 'I think there is an earthquake at Clark St.'},
 {'entity': [{'mention': 'Dempster Street',
              'position': [27, 42],
              'probability': 0.8846414685249329,
              'type': 'facility'}],
  'sentence': 'I sense an earthquake near Dempster Street, might need help.'},
 {'entity': [{'mention': 'Lakefill',
              'position': [24, 32],
              'probability': 0.9420061409473419,
              'type': 'location'}],
  'sentence': "It is very windy around Lakefill, can't walk back home."},
 {'entity': [], 'sentence': 'Is the earthquake over '},
 {'entity': [{'mention': 'Ridge',
              'position': [14, 19],
              'probability': 0.663341760635376,
              'type': 'location'}],
  'sentence': 'Earthquake at Ridge, HELP!'}]


In [None]:
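data = ''
place_types = {'location', 'facility', 'geopolitical area'}
for d in prediction:
  for entity in d['entity']:
    mention = entity['mention']
    entity_type = entity['type']  # avoid shadowing the built-in `type`
    if entity_type in place_types:
      # prepend, so the most recently seen place comes first in the route
      data = mention + '/' + data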
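
The loop above builds the route by prepending, so places end up in reverse order of appearance. A minimal sketch of an alternative that collects mentions into a list in their original order is shown below. `extract_places` and its `min_probability` cutoff are hypothetical helpers introduced for illustration, not part of T-NER, and `sample_prediction` is an abbreviated copy of the prediction structure printed earlier.

```python
PLACE_TYPES = {"location", "facility", "geopolitical area"}

def extract_places(prediction, min_probability=0.0):
    """Collect place mentions in the order they appear in `prediction`."""
    places = []
    for item in prediction:
        for entity in item["entity"]:
            is_place = entity["type"] in PLACE_TYPES
            if is_place and entity["probability"] >= min_probability:
                places.append(entity["mention"])
    return places

# Abbreviated copy of the prediction output (three of the five sentences)
sample_prediction = [
    {"entity": [{"mention": "Clark St.", "position": [34, 43],
                 "probability": 0.7876514991124471, "type": "facility"}],
     "sentence": "I think there is an earthquake at Clark St."},
    {"entity": [], "sentence": "Is the earthquake over "},
    {"entity": [{"mention": "Ridge", "position": [14, 19],
                 "probability": 0.663341760635376, "type": "location"}],
     "sentence": "Earthquake at Ridge, HELP!"},
]

print(extract_places(sample_prediction))
print(extract_places(sample_prediction, min_probability=0.7))
```

Keeping the places in a list, rather than a pre-joined string, leaves the choice of ordering and separator to the step that builds the URL.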


In [None]:
link = 'http://google.com/maps/dir/'
location = link + data
location

'http://google.com/maps/dir/Ridge/Lakefill/Dempster Street/Clark St./'
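
The resulting link contains raw spaces ("Dempster Street", "Clark St."), which are not valid in a URL path. A minimal sketch of percent-encoding each place with the standard library's `urllib.parse.quote` follows; `build_route_url` is a hypothetical helper, and the base URL is the one used in the cell above.

```python
from urllib.parse import quote

BASE_URL = "http://google.com/maps/dir/"

def build_route_url(places):
    """Join percent-encoded place names into a directions URL."""
    # safe="" also escapes "/" inside a place name, so a name cannot
    # accidentally split into two path segments
    return BASE_URL + "/".join(quote(place, safe="") for place in places)

url = build_route_url(["Ridge", "Lakefill", "Dempster Street", "Clark St."])
print(url)
```

Whether the maps service resolves each name to the intended place is a separate question; this only makes the string a well-formed URL.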

### Summarize the news/messages

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization")



No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


In [None]:
print(summarizer(test_para, max_length=130, min_length=30, do_sample=False))

Your max_length is set to 130, but you input_length is only 62. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=31)


[{'summary_text': " I sense an earthquake near Dempster Street, might need help. It is very windy around Lakefill, can't walk back home. I think there is an earthquake at Clark St. I am enjoying my day. how about you?"}]
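
The summary still contains "I am enjoying my day. how about you?" because `test_para` joins every test sentence, including the one the classifier did not flag. Assuming the goal is to summarize only the urgent messages, a minimal sketch of building the input paragraph from the flagged sentences is shown below. `build_paragraph` is a hypothetical helper, and `flagged` is a shortened stand-in for `disaster_sentences`.

```python
def build_paragraph(sentences):
    """Join sentences with single spaces, dropping stray whitespace and empties."""
    cleaned = [s.strip() for s in sentences]
    return " ".join(s for s in cleaned if s)

# Shortened stand-in for the `disaster_sentences` list
flagged = [
    "I think there is an earthquake at Clark St.",
    "Is the earthquake over ",
    "Earthquake at Ridge, HELP!",
]
paragraph = build_paragraph(flagged)
print(paragraph)
```

In the notebook, `build_paragraph(disaster_sentences)` could then be passed to `summarizer` in place of `test_para`.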
