# Complaint Type Prediction with AutoNLP by HuggingFace

Note: This notebook uses the **class-balanced** data created from the NYC 311 dataset in the notebook **Ezoic-Challenge-Two-Data-Preprocessing-2**.
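The preprocessing notebook is not reproduced here, so the exact balancing method is not shown. One common way to class-balance a dataset is random undersampling, which cuts every class down to the size of the smallest one. The sketch below assumes that technique and uses hypothetical field names (`text`, `label`) and toy rows; it is not necessarily what the preprocessing notebook does.

```python
import random
from collections import Counter, defaultdict


def undersample(records, label_key="label", seed=0):
    """Return a class-balanced copy of `records` by random undersampling.

    `records` is a list of dicts and `label_key` is a hypothetical field name.
    """
    # Group rows by their class label
    by_label = defaultdict(list)
    for row in records:
        by_label[row[label_key]].append(row)

    # Every class is cut down to the size of the smallest class
    target = min(len(group) for group in by_label.values())

    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))

    # Shuffle so the classes are interleaved rather than grouped
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 5 NOISE complaints and 2 TREE complaints
rows = [{"text": f"noise {i}", "label": "NOISE"} for i in range(5)] + [
    {"text": f"tree {i}", "label": "TREE"} for i in range(2)
]
balanced = undersample(rows)
print(sorted(Counter(r["label"] for r in balanced).items()))
# → [('NOISE', 2), ('TREE', 2)]
```

Undersampling discards data from the larger classes; oversampling the smaller classes is the usual alternative when data is scarce.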

I closely follow HuggingFace and the revolution they have brought to NLP with Transformers.  
I recently read about their new product, AutoNLP, which automatically trains a set of models on a user-supplied dataset.  
To try it out, I submitted my filtered, cleaned, and class-balanced data to AutoNLP.  
In this notebook I describe the best of the trained models.  

In [2]:
# !pip install transformers

Logging in to my HuggingFace account from the CLI

In [6]:
# Paste your own access token from https://huggingface.co/settings/token at the prompt
!huggingface-cli login

Token: 
Login successful
Your token has been saved to /root/.huggingface/token
[1m[31mAuthenticated through git-credential store but this isn't the helper defined on you

I created an `Inference` class that loads the model from the HuggingFace Hub and runs the prediction.

In [7]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_ID = "luciferhardy/autonlp-balaned-311-433911355"

class Inference:
  def __init__(self, model_id=MODEL_ID):
    # use_auth_token=True reuses the token saved by `huggingface-cli login`
    self.model = AutoModelForSequenceClassification.from_pretrained(model_id, use_auth_token=True)
    self.tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=True)

  def __predict_complaint_type(self, text):
    # Tokenize the description into PyTorch tensors, truncating overly long inputs
    inputs = self.tokenizer(text, return_tensors="pt", truncation=True)

    # Inference only: disable gradient tracking to save memory
    with torch.no_grad():
      outputs = self.model(**inputs)

    # argmax over the raw logits picks the highest-scoring class;
    # no softmax is needed because softmax preserves the ordering
    predicted_label_code = outputs.logits.argmax(dim=-1).item()

    # Map the label code to its complaint-type name
    predicted_label = self.model.config.id2label[predicted_label_code]

    return predicted_label

  def predict(self, text):
    prediction = self.__predict_complaint_type(text)
    print(f"Complaint Description: {text}")
    print(f"Complaint Type Prediction: {prediction}")
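The class returns only the single best label. To also see how confident the model is, and which complaint types come next, the logits can be turned into probabilities with a softmax. The sketch below does this in plain Python; the logits and the three-entry label map are hypothetical stand-ins for `outputs[0][0].tolist()` and `self.model.config.id2label`.

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Convert raw classifier logits into the k most likely (label, probability) pairs."""
    # Subtract the max logit before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank class indices by probability, highest first
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical logits and label map, for illustration only
id2label = {0: "NOISE", 1: "TREE", 2: "SANITATION"}
print([label for label, _ in top_k_labels([2.0, 0.5, 1.0], id2label, k=2)])
# → ['NOISE', 'SANITATION']
```

Because softmax is monotonic, the first label returned always matches the argmax used in `Inference`; the probabilities add information about how close the runner-up classes are.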

More information about the model is available at https://huggingface.co/luciferhardy/autonlp-balaned-311-433911355

# Model Metrics 

**Accuracy**: 0.83  
**Loss**: 0.65  

| Metric | Macro | Micro | Weighted |
|---|---|---|---|
| F1 | 0.82 | 0.83 | 0.82 |
| Precision | 0.82 | 0.83 | 0.82 |
| Recall | 0.82 | 0.83 | 0.83 |

The auto-trained model reaches an accuracy of **83%**.
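The micro-averaged scores (0.83) equal the accuracy. In single-label multi-class classification, every wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so micro precision, micro recall and micro F1 all collapse to accuracy. Macro averaging weights each class equally, and weighted averaging weights each class by its support. The sketch below computes all three F1 variants from scratch on hypothetical labels to show the difference.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro, micro, weighted) F1 for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    tp_total = fp_total = fn_total = 0

    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        tp_total, fp_total, fn_total = tp_total + tp, fp_total + fp, fn_total + fn

        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Macro: unweighted mean over classes
    macro = sum(per_class.values()) / len(labels)
    # Weighted: mean over classes weighted by how often each class is the true label
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    # Micro: pool TP/FP/FN over all classes before computing F1
    if tp_total == 0:
        micro = 0.0
    else:
        micro_p = tp_total / (tp_total + fp_total)
        micro_r = tp_total / (tp_total + fn_total)
        micro = 2 * micro_p * micro_r / (micro_p + micro_r)
    return macro, micro, weighted


# Hypothetical labels: 3 of 4 predictions are correct (accuracy 0.75)
y_true = ["NOISE", "NOISE", "TREE", "SANITATION"]
y_pred = ["NOISE", "TREE", "TREE", "SANITATION"]
print([round(s, 4) for s in f1_scores(y_true, y_pred)])
# → [0.7778, 0.75, 0.75]
```

Here micro F1 matches the 0.75 accuracy exactly, while macro F1 differs because each class contributes equally regardless of its size.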

# Sample Predictions

In [8]:
inference = Inference()


In [9]:
# Sample 1
text = "Loud noise in this area"
inference.predict(text)

Complaint Description: Loud noise in this area
Complaint Type Prediction: NOISE


In [10]:
# Sample 2
text = "tree is blocking the way"
inference.predict(text)

Complaint Description: tree is blocking the way
Complaint Type Prediction: TREE


In [11]:
# Sample 3
text = "water pipe has broken"
inference.predict(text)

Complaint Description: water pipe has broken
Complaint Type Prediction: WATER CONSERVATION


In [12]:
# Sample 4
text = "broken electical wires"
inference.predict(text)

Complaint Description: broken electical wires
Complaint Type Prediction: ELECTRICAL


In [13]:
# Sample 5
text = "very dirty street, needs cleaning"
inference.predict(text)

Complaint Description: very dirty street, needs cleaning
Complaint Type Prediction: SANITATION


In [14]:
# Sample 6
text = "vehicle is damaged"
inference.predict(text)

Complaint Description: vehicle is damaged
Complaint Type Prediction: CONSUMER COMPLAINT


In [15]:
# Sample 7
text = "vehicle parked wrongly"
inference.predict(text)

Complaint Description: vehicle parked wrongly
Complaint Type Prediction: CONSUMER COMPLAINT


In [16]:
# Sample 8
text = "illegal building work on this street"
inference.predict(text)

Complaint Description: illegal building work on this street
Complaint Type Prediction: WATER CONSERVATION


In [17]:
# Sample 9
text = "school sign board broken"
inference.predict(text)

Complaint Description: school sign board broken
Complaint Type Prediction: STREET CONDITION


In [18]:
# Sample 10
text = "unhygienic restaurant"
inference.predict(text)

Complaint Description: unhygienic restaurant
Complaint Type Prediction: VIOLATION OF PARK RULES
