<a href="https://colab.research.google.com/github/jzfrank/h4g-idmc-articleClassifier/blob/main/idmc_article_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [4]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.22.2-py3-none-any.whl (4.9 MB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Collecting huggingface-hub<1.0,>=0.9.0
  Downloading huggingface_hub-0.10.0-py3-none-any.whl (163 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.10.0 tokenizers-0.12.1 transformers-4.22.2


In [5]:
import torch

In [10]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

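Defining the classes: a binary is-disaster classifier, a disaster-type classifier, and a wrapper that chains the two.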

In [6]:
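class IsDisasterClassifier:
  """Binary classifier: does a piece of text describe a disaster?"""

  def __init__(self):
    # The checkpoint provides TensorFlow weights; from_tf=True loads them into a PyTorch model.
    self.tokenizer = AutoTokenizer.from_pretrained("sacculifer/dimbat_disaster_distilbert")
    self.model = AutoModelForSequenceClassification.from_pretrained("sacculifer/dimbat_disaster_distilbert", from_tf=True)

  def isDisaster(self, text):
    # truncation=True cuts texts longer than the model's maximum input length instead of failing.
    inputs = self.tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # inference only, so no gradients are needed
      logits = self.model(**inputs).logits
    predicted_class_id = logits.argmax().item()
    # Class 1 = disaster, class 0 = not a disaster.
    return {1: True, 0: False}[predicted_class_id]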
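`isDisaster` keeps only the argmax of the two logits, so it hides how confident the model is. A minimal sketch of turning the raw logits into probabilities with a numerically stable softmax and applying a decision threshold is shown below. It is pure Python so it runs without downloading the model; the function names, the default threshold of 0.5, and the assumption that index 1 means "disaster" (taken from the mapping above) are illustrative choices, not part of the original notebook.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def disaster_probability(logits: Sequence[float]) -> float:
    """Probability of class 1, assumed here to be the 'disaster' class."""
    return softmax(logits)[1]


def is_disaster(logits: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical thresholded decision.

    With threshold=0.5 and a strict '>', this reproduces the argmax decision
    (ties go to class 0, as argmax returns the first maximal index).
    """
    return disaster_probability(logits) > threshold


# Made-up logits in the order [not disaster, disaster].
print(round(disaster_probability([0.0, 0.0]), 3))  # 0.5
print(is_disaster([-1.2, 2.3]))                    # True
print(is_disaster([-1.2, 2.3], threshold=0.99))    # False
```

Raising the threshold trades recall for precision: fewer headlines are flagged as disasters, but the ones that are flagged are ones the model is more certain about.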

In [7]:
class DisasterTypeClassifier:
  """Multi-class classifier: which of ten disaster types does a text describe?"""

  def __init__(self):
    # As above, the checkpoint's TensorFlow weights are loaded into PyTorch via from_tf=True.
    self.tokenizer = AutoTokenizer.from_pretrained("sacculifer/dimbat_disaster_type_distilbert")
    self.model = AutoModelForSequenceClassification.from_pretrained("sacculifer/dimbat_disaster_type_distilbert", from_tf=True)

  def disasterType(self, text):
    inputs = self.tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # inference only
      logits = self.model(**inputs).logits
    predicted_class_id = logits.argmax().item()
    # Map the predicted class id to its human-readable label.
    return {
        0: "haze",
        1: "disease",
        2: "earthquake",
        3: "flood",
        4: "hurricane & tornado",
        5: "wildfire",
        6: "industrial accident",
        7: "societal crime",
        8: "transportation accident",
        9: "meteor crash",
    }[
        predicted_class_id
    ]
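`disasterType` returns a single label even when two types score almost equally (for example, a hurricane that causes flooding). A minimal sketch that ranks all ten labels by softmax probability and returns the top `k` is below. It reuses the id-to-label mapping from the cell above; the name `top_k_types` and the logits in the example are hypothetical, made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Same id-to-label mapping as DisasterTypeClassifier.disasterType above.
ID2LABEL: Dict[int, str] = {
    0: "haze",
    1: "disease",
    2: "earthquake",
    3: "flood",
    4: "hurricane & tornado",
    5: "wildfire",
    6: "industrial accident",
    7: "societal crime",
    8: "transportation accident",
    9: "meteor crash",
}


def top_k_types(logits: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most likely (label, probability) pairs, highest first."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: exps[i], reverse=True)
    return [(ID2LABEL[i], exps[i] / total) for i in ranked[:k]]


# Made-up logits where "flood" and "hurricane & tornado" are close.
logits = [0.1, 0.0, 0.2, 2.5, 2.1, 0.0, 0.3, 0.0, 0.4, -1.0]
for label, p in top_k_types(logits, k=2):
    print(f"{label}: {p:.2f}")
```

In the notebook, the logits would come from `self.model(**inputs).logits[0].tolist()` inside `disasterType`.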


In [8]:
class ArticleClassifier:
  """Two-stage pipeline: first decide whether a text is about a disaster, then which type."""

  def __init__(self):
    self.isDisasterClassifier = IsDisasterClassifier()
    self.disasterTypeClassifier = DisasterTypeClassifier()

  def isDisaster(self, text: str) -> bool:
    return self.isDisasterClassifier.isDisaster(text)

  def disasterType(self, text: str) -> str:
    # Only texts accepted by the binary model are passed on to the type model.
    if not self.isDisaster(text):
      return "not a disaster"
    return self.disasterTypeClassifier.disasterType(text)
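Because `ArticleClassifier.disasterType` calls `isDisaster` internally, code that wants both answers (like the loop over the examples) runs the is-disaster model twice per text. A minimal sketch of a single-pass alternative is below. To stay runnable without downloading the models, it takes the two stages as plain callables; the names `classify_article`, `ArticleResult`, and the stub classifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ArticleResult:
    is_disaster: bool
    disaster_type: str


def classify_article(
    text: str,
    is_disaster_fn: Callable[[str], bool],
    disaster_type_fn: Callable[[str], str],
) -> ArticleResult:
    """Run the binary gate once; call the type model only for disaster texts."""
    if not is_disaster_fn(text):
        return ArticleResult(False, "not a disaster")
    return ArticleResult(True, disaster_type_fn(text))


# Stubs standing in for the two DistilBERT models (illustration only).
calls = []


def fake_is_disaster(text: str) -> bool:
    calls.append(text)  # record each gate call
    return "fire" in text.lower()


def fake_type(text: str) -> str:
    return "wildfire"


print(classify_article("Brush fire near the valley", fake_is_disaster, fake_type))
print(classify_article("Court halts ban", fake_is_disaster, fake_type))
print(len(calls))  # 2: the gate ran once per text
```

With the real models, the call would be `classify_article(example, ac.isDisasterClassifier.isDisaster, ac.disasterTypeClassifier.disasterType)`.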

In [11]:
ac = ArticleClassifier()

All TF 2.0 model weights were used when initializing DistilBertForSequenceClassification.

All the weights of DistilBertForSequenceClassification were initialized from the TF 2.0 model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use DistilBertForSequenceClassification for predictions without further training.


All TF 2.0 model weights were used when initializing DistilBertForSequenceClassification.

All the weights of DistilBertForSequenceClassification were initialized from the TF 2.0 model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use DistilBertForSequenceClassification for predictions without further training.


In [12]:
examples = [
    "NBC: Evacuations Lifted in 1,100-Acre Brush Fire in Santa Clarita Valley",
    "KRQE News: Dog Head Fire: Information for evacuees",
    "France 24: Hurricane Fiona batters Turks and Caicos after devastating Puerto Rico - 21/09/2022",
    "Russia-Ukraine War Explosion Damages Crimea Bridge, Imperiling Russian Supply Route",
    "Arizona court halts enforcement of near-total abortion ban",
    "Wow, Google Really, Really Wants to Be Cooler Than Apple",
    "The Hack4Good coordinator is in charge of facilitating the communication between the H4G Organization Committee to address any organizational issue which might arise."
]
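The examples above are headlines, which fit easily in the model's input. Full article bodies can exceed the 512-token context of standard DistilBERT checkpoints, so either the text is truncated (and the rest of the article ignored) or it has to be split. A minimal sketch of the splitting approach is below: overlapping word windows, each classified separately, with a majority vote. The helper names, window sizes, and the stub classifier are hypothetical; splitting on words only approximates token counts, and 300 words is an assumed safe margin for typical English text.

```python
from collections import Counter
from typing import Callable, List


def word_windows(text: str, size: int = 300, stride: int = 250) -> List[str]:
    """Split text into windows of `size` words, starting every `stride` words."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    words = text.split()
    if not words:
        return [""]
    windows = []
    for start in range(0, len(words), stride):
        windows.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window reached the end of the text
            break
    return windows


def majority_label(text: str, classify: Callable[[str], str], **kwargs) -> str:
    """Classify every window; return the most common label (ties: first seen)."""
    votes = Counter(classify(w) for w in word_windows(text, **kwargs))
    return votes.most_common(1)[0][0]


# Stand-in for a long article and a stub classifier (illustration only).
article = " ".join(f"w{i}" for i in range(700))


def stub_classify(window: str) -> str:
    return "haze" if window.startswith("w500") else "flood"


print(len(word_windows(article)))            # 3
print(majority_label(article, stub_classify))  # flood
```

With the notebook's models, `classify` could be `ac.disasterType`.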

In [13]:
for example in examples:
  isDisaster = ac.isDisaster(example)
  disasterType = ac.disasterType(example)
  print(f"{example} \n isDisaster? {isDisaster}\n disasterType? {disasterType} \n\n")

NBC: Evacuations Lifted in 1,100-Acre Brush Fire in Santa Clarita Valley 
 isDisaster? True
 disasterType? wildfire 


KRQE News: Dog Head Fire: Information for evacuees 
 isDisaster? True
 disasterType? wildfire 


France 24: Hurricane Fiona batters Turks and Caicos after devastating Puerto Rico - 21/09/2022 
 isDisaster? True
 disasterType? hurricane & tornado 


Russia-Ukraine War Explosion Damages Crimea Bridge, Imperiling Russian Supply Route 
 isDisaster? True
 disasterType? industrial accident 


Arizona court halts enforcement of near-total abortion ban 
 isDisaster? False
 disasterType? not a disaster 


Wow, Google Really, Really Wants to Be Cooler Than Apple 
 isDisaster? False
 disasterType? not a disaster 


The Hack4Good coordinator is in charge of facilitating the communication between the H4G Organization Committee to address any organizational issue which might arise. 
 isDisaster? False
 disasterType? not a disaster 


