# **Transformer Architecture**

The transformer is an architecture based solely on attention mechanisms. It uses no recurrent networks, yet it produces results superior in quality to recurrent Seq2Seq models and addresses their long-term dependency problem. Because it does not process tokens one at a time, the transformer is also parallelizable, which makes training considerably faster.
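
To make the idea of attention concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the transformer. It is an illustration only and not the code used by the libraries in this notebook: it assumes a single head, no learned projection matrices and no masking, and the input is random toy data. All positions are handled by a couple of matrix products instead of a step-by-step loop, which is what makes the computation easy to parallelize.

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for single-head attention without masking."""
    d_k = queries.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    # so the scores do not grow with the embedding size.
    scores = queries @ keys.T / np.sqrt(d_k)
    # Row-wise softmax: each position's weights over all positions sum to 1.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors.
    return weights @ values, weights

# Toy input: 4 token positions with 8-dimensional embeddings (random placeholders).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
print(outputs.shape)         # (4, 8)
print(weights.sum(axis=-1))  # every row sums to 1
```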

## **Text Classification With Transformers**

In this hands-on session, you will be introduced to the [Simple Transformers](https://github.com/ThilinaRajapakse/simpletransformers) library. The library is built on top of the popular [Hugging Face transformers library](https://github.com/huggingface/transformers) and consists of implementations of various transformer-based models and algorithms.

The library makes it effortless to implement various language modeling tasks such as Sequence Classification, Token Classification (NER), and Question Answering. 

## **Introduction To Simple Transformers**

The Simple Transformers library was built to make working with transformer models as simple as possible, and it largely succeeds: transformers can now be used with just a few lines of code. Credit goes to the article "Simple Transformers: Multi-Class Text Classification with BERT, RoBERTa, XLNet, XLM, and DistilBERT" and to the Hugging Face transformers library.

## **Installing Simple Transformers**

Type and execute the following command to install the simple transformers library.

In [None]:
!python -m pip install pip --upgrade --user -q --no-warn-script-location
!python -m pip install numpy pandas seaborn matplotlib scipy statsmodels scikit-learn tensorflow keras nltk gensim simpletransformers --user -q --no-warn-script-location

import IPython
IPython.Application.instance().kernel.do_shutdown(True)

## **Creating A Classifier Model**

In [None]:
from simpletransformers.classification import ClassificationModel

#Create a ClassificationModel
model = ClassificationModel(model_type, model_name, num_labels=number_of_labels, use_cuda=boolean)

> * model_type: The model architecture, one of `'bert'`, `'xlnet'`, `'xlm'`, `'roberta'`, `'distilbert'`.
> * model_name: The name of a pretrained model, for example `'roberta-base'`, which is used later in this notebook.
> * number_of_labels: The number of unique labels or classes in the problem.
> * use_cuda: When set to `True`, the model runs on a GPU through CUDA.

The ClassificationModel also has a dict, `args`, whose entries control the values of the hyperparameters. The default argument list is given below:

In [None]:
self.args = {
    "output_dir": "outputs/",
    "cache_dir": "cache_dir/",
    "fp16": True,
    "fp16_opt_level": "O1",
    "max_seq_length": 128,
    "train_batch_size": 8,
    "gradient_accumulation_steps": 1,
    "eval_batch_size": 8,
    "num_train_epochs": 1,
    "weight_decay": 0,
    "learning_rate": 4e-5,
    "adam_epsilon": 1e-8,
    "warmup_ratio": 0.06,
    "warmup_steps": 0,
    "max_grad_norm": 1.0,
    "logging_steps": 50,
    "save_steps": 2000,
    "overwrite_output_dir": False,
    "reprocess_input_data": False,
    "evaluate_during_training": False,
    "process_count": cpu_count() - 2 if cpu_count() > 2 else 1,
    "n_gpu": 1,
}
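
As a rough illustration of how custom values relate to these defaults, the sketch below merges a small override dict into a copy of some of the defaults and derives the number of training and warmup steps. This is plain Python written for this notebook, not the library's own code; the override values, the training set size of 1,000 examples and the step arithmetic are assumptions made for the example.

```python
import math

# A few of the defaults from the default argument list above.
default_args = {
    "num_train_epochs": 1,
    "train_batch_size": 8,
    "gradient_accumulation_steps": 1,
    "learning_rate": 4e-5,
    "warmup_ratio": 0.06,
}

# Hypothetical overrides chosen only for illustration.
custom_args = {"num_train_epochs": 3, "train_batch_size": 16}

# Overrides win over defaults; untouched keys keep their default values.
effective_args = {**default_args, **custom_args}

# Rough step arithmetic for a hypothetical training set of 1,000 examples.
num_examples = 1000
batches_per_epoch = math.ceil(num_examples / effective_args["train_batch_size"])
total_steps = (
    batches_per_epoch
    // effective_args["gradient_accumulation_steps"]
    * effective_args["num_train_epochs"]
)
warmup_steps = math.ceil(total_steps * effective_args["warmup_ratio"])

print(effective_args["learning_rate"])  # 4e-05 (default kept)
print(total_steps, warmup_steps)        # 189 12
```

With the library installed, a dict like `custom_args` would typically be passed through the `args` parameter when the `ClassificationModel` is created.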

## **Training The Model**

The `train_model` method trains the model. It accepts a pandas DataFrame containing the texts and their labels.

In [None]:
model.train_model(training_dataframe)

The method also saves checkpoints of the model to the directory given by `output_dir` in the `args` dict.
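
According to the library's documentation, the DataFrame should have a `text` column and a `labels` column (or, when no header is present, the text in the first column and the labels in the second). The sketch below builds such a DataFrame from made-up rows; the sentences and the name `training_dataframe` are placeholders for illustration, not data from this notebook.

```python
import pandas as pd

# Made-up example rows: [text, integer label]. Labels start at 0.
rows = [
    ["The parliament passed the new bill today", 0],
    ["The new smartphone ships with a faster chip", 1],
    ["The film opened to packed theatres", 2],
    ["Quarterly profits beat market expectations", 3],
]
training_dataframe = pd.DataFrame(rows, columns=["text", "labels"])

print(training_dataframe.columns.tolist())     # ['text', 'labels']
print(training_dataframe["labels"].nunique())  # 4
```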

## **Evaluating The Classifier**

The eval_model method evaluates the model on a validation set and returns the metrics, the outputs of the model as well as the wrong predictions.

In [None]:
result, model_outputs, wrong_predictions = model.eval_model(validation_dataframe)

## **Predicting**

The `predict` method returns the predicted labels together with the raw outputs, which contain one value per class for each input.

`predictions, raw_outputs = model.predict(['input sentence'])`
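
The relationship between the raw outputs and the predicted labels can be sketched with NumPy. The numbers below are invented for illustration (two inputs, four classes), and the sketch assumes the raw outputs are unnormalized class scores, so the predicted label is the index of the largest score in each row.

```python
import numpy as np

# Hypothetical raw outputs for two input sentences and four classes.
raw_outputs = np.array([
    [2.0, 0.5, -1.0, 0.1],
    [-0.3, 3.1, 0.2, 0.4],
])

# Softmax turns each row of scores into probabilities that sum to 1.
shifted = raw_outputs - raw_outputs.max(axis=1, keepdims=True)
probabilities = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# The predicted label is the index of the largest score in each row.
predictions = raw_outputs.argmax(axis=1)
print(predictions)  # [0 1]
```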

## **Multi-Class Classification With Simple Transformers**


---
In this hands-on session, you will be introduced to the Simple Transformers library. The library is built on top of the popular Hugging Face transformers library and consists of implementations of various transformer-based models and algorithms.

Simple Transformers currently supports tasks such as Sequence Classification, Token Classification (NER), and Question Answering, and makes them effortless to implement.

So, without further ado, let's get our hands dirty!

## About The Dataset - [Predict The News Category Hackathon](https://machinehack.com/hackathons/predict_the_news_category_hackathon/data)

Ever since the first printed newspaper, every news item that makes it onto a page has had a specific section allotted to it. Although almost everything about newspapers has changed, from the ink to the type of paper used, this categorization of news has been carried over through the generations and even into the digital versions of newspapers. Newspaper articles are not limited to a few topics or subjects; they cover a wide range of interests, from politics to sports to movies and so on. For a long time this sectioning was done manually, but technology can now do it without much effort. In this hackathon, Data Science and Machine Learning enthusiasts like you will use Natural Language Processing to predict, from the story, which genre or category a piece of news falls into.

* Size of training set: 7,628 records
* Size of test set: 2,748 records

FEATURES:

* STORY:  A part of the main content of the article to be published as a piece of news.
* SECTION: The genre/category the STORY falls in.

Each story falls into one of four distinct sections. The sections are labelled as follows:

* Politics: 0
* Technology: 1
* Entertainment: 2
* Business: 3


## Importing Modules

In [None]:
try:
  %tensorflow_version 2.x  #gpu
except Exception:
  pass
import tensorflow as tf

In [None]:
import os
import re
import pandas as pd

## Loading & Splitting The Data

In [None]:
train = pd.read_excel("Data_Train.xlsx")

#Reducing the training sample for fast execution (fixed seed so the sample is reproducible)
train = train.sample(frac=0.2, random_state=120)

#splitting the training set in to training and validation sets
from sklearn.model_selection import train_test_split
train, val =  train_test_split(train, test_size = 0.2, random_state = 120)

In [None]:
train.head()

In [None]:
train.shape

In [None]:
val.shape

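## Installing & Importing Simple Transformers

Simple Transformers was installed in the installation step earlier in this notebook; the import appears in the next cell.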

## Creating A Classification Model

In [None]:
from simpletransformers.classification import ClassificationModel

#Create a ClassificationModel
model = ClassificationModel('roberta', 'roberta-base', num_labels=4, use_cuda = False)


## Training The Classifier

In [None]:
model.train_model(train)

## Evaluating The Classifier

In [None]:
scores1, model_outputs, wrong_predictions = model.eval_model(val)

In [None]:
scores1

In [None]:
#Evaluating With F1 Score & Accuracy

from sklearn.metrics import f1_score, accuracy_score
def f1_multiclass(labels, preds):
    return f1_score(labels, preds, average='micro')

In [None]:
scores2, model_outputs, wrong_predictions = model.eval_model(val, f1=f1_multiclass, acc=accuracy_score)

In [None]:
scores2
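
Because every story carries exactly one label, micro-averaged F1 coincides with accuracy, so the `f1` and `acc` entries in `scores2` should be identical. The pure-Python sketch below shows why on made-up labels and predictions; `micro_f1` and `accuracy` are illustrative helpers written for this example, not functions from scikit-learn or Simple Transformers.

```python
def micro_f1(labels, preds):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    classes = set(labels) | set(preds)
    tp = sum(1 for y, p in zip(labels, preds) if y == p)
    # Every wrong prediction is a false positive for the predicted class
    # and a false negative for the true class.
    fp = sum(1 for c in classes for y, p in zip(labels, preds) if p == c and y != c)
    fn = sum(1 for c in classes for y, p in zip(labels, preds) if y == c and p != c)
    return 2 * tp / (2 * tp + fp + fn)

def accuracy(labels, preds):
    """Fraction of predictions that match the true label."""
    return sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)

# Made-up labels and predictions for eight examples.
labels = [0, 1, 2, 3, 1, 0, 2, 3]
preds = [0, 1, 2, 1, 1, 3, 2, 3]
print(micro_f1(labels, preds), accuracy(labels, preds))  # 0.75 0.75
```

If differences between classes matter, passing `average='macro'` to `f1_score` is one alternative, since it averages the per-class F1 scores instead of pooling the counts.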

## Predicting
---

Classes & Labels

* Politics: 0
* Technology: 1
* Entertainment: 2
* Business: 3


In [None]:
predictions, raw_output = model.predict(['India is led by Prime Minister Modi'])

In [None]:
predictions

In [None]:
raw_output

In [None]:
predictions2, _ = model.predict(['my phone is soo dumb and slow'])

In [None]:
predictions2
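
The numeric predictions are easier to read once they are translated back into section names. A small helper for this could look as follows; the mapping is taken from the Classes & Labels list above, while `to_section_names` and the example ids are hypothetical and stand in for the values returned by `model.predict`.

```python
# Label ids as defined in the "Classes & Labels" list above.
SECTION_NAMES = {0: "Politics", 1: "Technology", 2: "Entertainment", 3: "Business"}

def to_section_names(predicted_ids):
    """Translate numeric predictions into section names."""
    return [SECTION_NAMES[int(label_id)] for label_id in predicted_ids]

# Hypothetical predictions; real values would come from model.predict(...).
example_predictions = [0, 1]
print(to_section_names(example_predictions))  # ['Politics', 'Technology']
```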


