# ***Sentiment Analysis using XLNet in PyTorch***


- **XLNet** is a state-of-the-art neural network for a wide range of NLP tasks.
- Researchers from Carnegie Mellon University and Google released XLNet, a new pre-trained language model.
- The previous state-of-the-art language model was BERT, which achieves a GLUE (General Language Understanding Evaluation) score of 80.5. GLUE is a benchmark for training, evaluating, and analyzing natural language understanding systems; its human baseline score is 87.1.
- XLNet outperforms BERT on 20 tasks (often by a large margin) and pushes the GLUE score to 88.4, above the human baseline. For details, read the [XLNet paper](https://arxiv.org/abs/1906.08237).


#### XLNet is pre-trained and open-sourced by Google
- The aim of this article is to use Hugging Face's PyTorch implementation of XLNet ([GitHub Repo](https://github.com/huggingface/transformers)).
    - **We will focus on the code in this article.** To understand XLNet and how it works, read the paper.
- We will perform binary sentiment analysis on an Amazon review dataset ([Dataset](https://archive.ics.uci.edu/ml/machine-learning-databases/00331/)).
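- Install PyTorch-Transformers with the following command (the package has since been renamed `transformers`; this article uses the older `pytorch_transformers` import name):
<br> <code>pip install pytorch-transformers</code>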
 

#### If you are on Google Colab, run
<code>!pip install pytorch-transformers</code>
#### In the notebook
- You can upload the dataset to Google Drive and access it in Colab with
     - <code>from google.colab import drive <br> drive.mount('/content/drive')</code>

```python
!pip install pytorch-transformers  # Colab shell command; run pip in a terminal elsewhere
import pandas as pd
import torch
from google.colab import drive

drive.mount('/content/drive')
PATH = "--Path to Folder -- /amazon_cells_labelled.txt"
fd = pd.read_csv(PATH, sep='\t')
# Run on the GPU when one is available, otherwise on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

```python
fd.head()
```

|   | So there is no way for me to plug it in here in the US unless I go by a converter. | 0 |
|---|---|---|
| 0 | Good case, Excellent value. | 1 |
| 1 | Great for the jawbone. | 1 |
| 2 | Tied to charger for conversations lasting more... | 0 |
| 3 | The mic is great. | 1 |
| 4 | I have to jiggle the plug to get it to line up... | 0 |


#### As you can see, our dataset contains sentences labelled 1 for positive and 0 for negative
We rename the columns to 'sentence' and 'value'.

```python
fd.columns = ['sentence', 'value']
```

```python
fd.head()
```

|   | sentence | value |
|---|---|---|
| 0 | Good case, Excellent value. | 1 |
| 1 | Great for the jawbone. | 1 |
| 2 | Tied to charger for conversations lasting more... | 0 |
| 3 | The mic is great. | 1 |
| 4 | I have to jiggle the plug to get it to line up... | 0 |
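Because the file has no header line, `pd.read_csv(PATH, sep='\t')` promotes the first review to column names. That is why "So there is no way for me to plug it in here in the US unless I go by a converter." appears as a header in the first `fd.head()` output above, and why that review is missing from the data. A minimal sketch of a loader that keeps every line, assuming each line is a sentence, a tab, and a 0/1 label, with no tabs inside the sentence. `load_labelled_sentences` is a hypothetical helper name, not part of the original notebook.

```python
import csv
import io

import pandas as pd


def load_labelled_sentences(path_or_buffer):
    """Read a '<sentence><TAB><label>' file that has no header row.

    Assumption: sentences contain no tab characters. QUOTE_NONE stops pandas
    from treating quote characters inside a review as field delimiters.
    """
    return pd.read_csv(
        path_or_buffer,
        sep="\t",
        header=None,                  # the first line is data, not column names
        names=["sentence", "value"],  # name the columns while reading
        quoting=csv.QUOTE_NONE,
    )


# Usage with an in-memory sample (two rows copied from the output above)
sample = io.StringIO("Good case, Excellent value.\t1\nGreat for the jawbone.\t1\n")
df = load_labelled_sentences(sample)
print(df)
```

Loaded this way, the frame already has the `sentence` and `value` columns, so the renaming step is no longer needed.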


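- XLNet's special tokens are `<sep>` and `<cls>`, and for a single sentence they go at the end (`sentence <sep> <cls>`); BERT instead starts with `[CLS]`.
- The code below appends the BERT-style strings `[SEP] [CLS]`. As the tokenizer output further down shows, XLNet's tokenizer does not recognize them as special tokens and splits them into ordinary pieces (`'['`, `'s'`, `'ep'`, `']'`, ...), so the model sees them as plain text.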

```python
# Append the tags to the end of every sentence
sentences = [sentence + "[SEP] [CLS]" for sentence in fd['sentence']]
```

```python
sentences[0]  # check that the tags were added
```

```
'Good case, Excellent value.[SEP] [CLS]'
```
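The tags appended above are BERT's. XLNet uses `<sep>` and `<cls>` and, for a single sentence, expects them after the text (`... <sep> <cls>`). A minimal sketch of appending them at the id level, so they cannot be split into ordinary pieces. It assumes the ids `<cls>` = 3 and `<sep>` = 4 from the original XLNet release's SentencePiece vocabulary; check them against your tokenizer (the `transformers` tokenizer exposes `sep_token_id` and `cls_token_id`, and its `encode` adds these tokens by default). `add_xlnet_special_tokens` is a hypothetical helper, not a library function.

```python
# Assumed special-token ids of the xlnet-base-cased SentencePiece vocabulary
# (the values used by the original XLNet code); verify them for your tokenizer.
XLNET_CLS_ID = 3  # "<cls>"
XLNET_SEP_ID = 4  # "<sep>"


def add_xlnet_special_tokens(token_ids, sep_id=XLNET_SEP_ID, cls_id=XLNET_CLS_ID):
    """Append XLNet's single-sentence suffix: tokens <sep> <cls>.

    Unlike BERT, XLNet places the classification token at the END of the
    sequence, which is the position its classification head reads.
    """
    return list(token_ids) + [sep_id, cls_id]


# Ids of the plain text "Good case, Excellent value." (as printed further down)
print(add_xlnet_special_tokens([195, 363, 19, 2712, 991, 9]))
# [195, 363, 19, 2712, 991, 9, 4, 3]
```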

#### *We import all the dependencies* 

```python
from pytorch_transformers import XLNetTokenizer, XLNetForSequenceClassification
from pytorch_transformers import AdamW
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from keras.preprocessing.sequence import pad_sequences
import torch
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
```

```
Using TensorFlow backend.
```


### Inputs

1. The XLNet tokenizer converts our text into tokens that correspond to XLNet's vocabulary.
2. Each input then becomes a sequence of integers that maps every token to its index in the XLNet vocabulary.
    - We use the XLNet tokenizer to convert the tokens to these index numbers.


```python
# do_lower_case=True lowercases the text, which is why the pieces below are lowercase
tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased', do_lower_case=True)
tokenized_text = [tokenizer.tokenize(sent) for sent in sentences]
```

```python
tokenized_text[0]
```

```
['▁good',
 '▁case',
 ',',
 '▁excellent',
 '▁value',
 '.',
 '[',
 's',
 'ep',
 ']',
 '▁[',
 'cl',
 's',
 ']']
```

```python
input_ids = [tokenizer.convert_tokens_to_ids(x) for x in tokenized_text]
```

```python
print(input_ids[0])
labels = fd['value'].values  # NumPy array of 0/1 labels
print(labels[0])
```

```
[195, 363, 19, 2712, 991, 9, 10849, 23, 3882, 3158, 4145, 11974, 23, 3158]
1
```
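`convert_tokens_to_ids` is essentially a lookup of each SentencePiece piece in the model's vocabulary. A toy sketch using only the six pieces and ids of the first review shown above (`▁good` is 195, `.` is 9, and so on). The real vocabulary has 32,000 entries (the size of the model's word embedding). Falling back to id 0 for unknown pieces assumes that `<unk>` has id 0, as in XLNet's vocabulary. `TOY_VOCAB` and `tokens_to_ids` are illustrative names only.

```python
# Toy vocabulary: the six pieces of "Good case, Excellent value." and the ids
# printed for them above. The real xlnet-base-cased vocabulary has 32,000 pieces.
TOY_VOCAB = {"▁good": 195, "▁case": 363, ",": 19, "▁excellent": 2712, "▁value": 991, ".": 9}
UNK_ID = 0  # assumption: "<unk>" has id 0, as in XLNet's SentencePiece vocabulary


def tokens_to_ids(tokens, vocab=TOY_VOCAB, unk_id=UNK_ID):
    """Map each piece to its vocabulary index, using unk_id for unknown pieces."""
    return [vocab.get(token, unk_id) for token in tokens]


print(tokens_to_ids(["▁good", "▁case", ",", "▁excellent", "▁value", "."]))
# [195, 363, 19, 2712, 991, 9]
print(tokens_to_ids(["▁great"]))  # not in the toy vocabulary: [0]
```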


### We find the maximum sentence length so that we can pad the shorter sentences to it

```python
# Length of the longest tokenized sentence; every sequence is padded to it
MAX_LEN = max(len(ids) for ids in input_ids)
print(MAX_LEN)
```

```
54
```


#### We pad our sentences

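```python
# Pad each id list with 0s after its last token ("post") up to MAX_LEN;
# truncating="post" would cut longer lists at the end (none are longer here)
input_ids2 = pad_sequences(input_ids, maxlen=MAX_LEN, dtype="long", truncating="post", padding="post")
```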
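`pad_sequences(..., padding="post")` appends zeros after the last token, but the model is never told which positions are padding, and id 0 is a real vocabulary entry (`<unk>`) rather than XLNet's `<pad>`. A pure-Python sketch that pads with an explicit pad id and builds the matching attention mask (1 = real token, 0 = padding), which XLNet models accept through their `attention_mask` argument. Two assumptions: the pad id 5 for `<pad>` in xlnet-base-cased, and left padding. Left padding is what the `transformers` XLNet tokenizer defaults to, because XLNet's sequence-classification head (with `summary_type` `"last"`) reads the hidden state at the final position. `pad_and_mask` is a hypothetical helper.

```python
XLNET_PAD_ID = 5  # assumption: id of "<pad>" in the xlnet-base-cased vocabulary


def pad_and_mask(sequences, max_len, pad_id=XLNET_PAD_ID, side="left"):
    """Pad or truncate each id list to max_len and build its attention mask.

    Mask values: 1 for real tokens, 0 for padding.
    side="left" pads in front (so a trailing "<sep> <cls>" stays in the last
    position) and, if a list is too long, keeps its last max_len ids.
    side="right" pads at the end like padding="post" and keeps the first ids.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    padded, masks = [], []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > max_len:
            seq = seq[len(seq) - max_len:] if side == "left" else seq[:max_len]
        n_pad = max_len - len(seq)
        if side == "left":
            padded.append([pad_id] * n_pad + seq)
            masks.append([0] * n_pad + [1] * len(seq))
        else:
            padded.append(seq + [pad_id] * n_pad)
            masks.append([1] * len(seq) + [0] * n_pad)
    return padded, masks


# Two id lists ending in "<sep> <cls>" (ids 4 and 3, also an assumption)
ids, mask = pad_and_mask([[195, 363, 4, 3], [2712, 991, 9, 4, 3]], max_len=5)
print(ids)   # [[5, 195, 363, 4, 3], [2712, 991, 9, 4, 3]]
print(mask)  # [[0, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
```

In PyTorch, the mask would be converted with `torch.tensor(mask, dtype=torch.float)`, batched alongside the ids, and passed as `model(input_ids, attention_mask=...)`.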

```python
# Random 85/15 train/test split; pass random_state=... to make it reproducible
xtrain, xtest, ytrain, ytest = train_test_split(input_ids2, labels, test_size=0.15)
```
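The sizes in the logs below (849 training examples per epoch, 150 test examples) follow from how `train_test_split` rounds a float `test_size`: the test set gets `ceil(test_size * n_samples)` rows and, with `train_size` left unset, the training set gets the rest. A small sketch reproducing that arithmetic for the 999 labelled rows loaded here (849 + 150). `split_sizes` is our name, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, test_size):
    """Train/test sizes as train_test_split computes them for a float
    test_size when train_size is left as None."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


print(split_sizes(999, 0.15))  # (849, 150)
```

With `batch_size = 3`, the 849 training examples also mean exactly 283 optimizer steps per epoch.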

```python
print(len(input_ids2[0]))
```

```
54
```


```python
# Convert the NumPy arrays to PyTorch tensors
Xtrain = torch.tensor(xtrain)
Ytrain = torch.tensor(ytrain)
Xtest = torch.tensor(xtest)
Ytest = torch.tensor(ytest)
```

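```python
batch_size = 3

train_data = TensorDataset(Xtrain, Ytrain)
test_data = TensorDataset(Xtest, Ytest)
# Reshuffle the training set every epoch; keep the test set in a fixed order
loader = DataLoader(train_data, sampler=RandomSampler(train_data), batch_size=batch_size)
test_loader = DataLoader(test_data, sampler=SequentialSampler(test_data), batch_size=batch_size)
```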

```python
model = XLNetForSequenceClassification.from_pretrained("xlnet-base-cased", num_labels=2)
model.to(device)  # works on CPU too, unlike model.cuda()
```

```
XLNetForSequenceClassification(
  (transformer): XLNetModel(
    (word_embedding): Embedding(32000, 768)
    (layer): ModuleList(
      (0): XLNetLayer(
        (rel_attn): XLNetRelativeAttention(
          (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (ff): XLNetFeedForward(
          (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (layer_1): Linear(in_features=768, out_features=3072, bias=True)
          (layer_2): Linear(in_features=3072, out_features=768, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (1): XLNetLayer(
        (rel_attn): XLNetRelativeAttention(
          (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (ff): XLNetFeedForward(
          (layer_norm): LayerNorm((768,), eps=1e
```

- We use the AdamW optimizer imported earlier.
- For the loss function we use cross-entropy loss.

```python
import torch.nn as nn

optimizer = AdamW(model.parameters(), lr=2e-5)  # optimize all model parameters
criterion = nn.CrossEntropyLoss()
```
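`nn.CrossEntropyLoss` combines a softmax over the class logits with the negative log-likelihood. For one example with logits z and true class y, the loss is -log(exp(z_y) / sum_k exp(z_k)), and by default it is averaged over the batch. A pure-Python sketch of that formula, for illustration only (`cross_entropy` is our name; it mirrors the math, not PyTorch's implementation):

```python
import math


def cross_entropy(logits, target):
    """Mean cross-entropy over a batch: -log softmax(logits[i])[target[i]].

    logits: list of per-example lists of class scores; target: list of class ids.
    Subtracting the row maximum (log-sum-exp trick) keeps exp() from overflowing.
    """
    total = 0.0
    for z, y in zip(logits, target):
        m = max(z)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        total += log_sum_exp - z[y]
    return total / len(logits)


print(cross_entropy([[0.0, 0.0]], [1]))  # equal logits give log 2, about 0.6931
```

The loss approaches 0 as the true class's logit dominates the others, and grows without bound when a wrong class dominates.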

```python
def flat_accuracy(preds, labels):
  """Percentage of predictions that equal the corresponding labels."""
  correct = sum(1 for p, y in zip(preds, labels) if p == y)
  return correct / len(labels) * 100
```


### Here our training begins

```python
no_train = 0  # running count of training examples seen across all epochs
epochs = 3
for epoch in range(epochs):
  model.train()
  loss1 = []         # per-batch losses of this epoch
  train_preds = []   # predicted labels of this epoch
  train_labels = []  # true labels of this epoch
  for inputs, labels1 in loader:
    inputs = inputs.to(device)   # .to() returns a new tensor, so reassign it
    labels1 = labels1.to(device)
    optimizer.zero_grad()
    outputs = model(inputs)      # without labels, outputs[0] holds the logits
    loss = criterion(outputs[0], labels1)
    # The predicted label is the index of the larger of the two logits
    train_preds.extend(torch.argmax(outputs[0], axis=1).tolist())
    train_labels.extend(labels1.tolist())
    loss.backward()
    optimizer.step()
    loss1.append(loss.item())
    no_train += inputs.size(0)
  # "Step" is the epoch index; the loss shown is that of the epoch's last batch
  print("Current Loss is : {} Step is : {} number of Example : {} Accuracy : {}".format(loss.item(), epoch, no_train, flat_accuracy(train_preds, train_labels)))
```


```
Current Loss is : 0.057762544602155685 Step is : 0 number of Example : 849 Accuracy : 68.31566548881037
Current Loss is : 0.006311813835054636 Step is : 1 number of Example : 1698 Accuracy : 93.05064782096584
Current Loss is : 0.00749961519613862 Step is : 2 number of Example : 2547 Accuracy : 92.8150765606596
```


- `torch.argmax()` returns the index of the largest value.
- `axis=1` (an alias for `dim=1`) searches along each row, so for every example it picks the class whose logit is largest.
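For a batch of logits shaped (batch, 2), taking the argmax along dimension 1 picks the class with the larger logit for each example. The same operation in pure Python for a list of rows (`row_argmax` is illustrative, not a PyTorch function):

```python
def row_argmax(rows):
    """Index of the largest value in each row, like argmax along dim 1 of a
    2-D tensor. In this sketch, ties resolve to the first maximum."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


# Two examples with two class logits each: column 0 = negative, column 1 = positive
print(row_argmax([[-1.2, 2.3], [0.8, -0.4]]))  # [1, 0]
```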

```python
model.eval()  # testing our model
acc = []  # predicted labels
lab = []  # true labels
t = 0
with torch.no_grad():  # gradients are not needed for evaluation
  for inp, lab1 in test_loader:
    inp = inp.to(device)
    t += lab1.size(0)
    outp1 = model(inp)
    acc.extend(torch.argmax(outp1[0], axis=1).tolist())
    lab.extend(lab1.tolist())
print("Total Examples : {} Accuracy {}".format(t, flat_accuracy(acc, lab)))
```


```
Total Examples : 150 Accuracy 93.33333333333333
```
