#### What are you trying to do in this notebook?
Detecting contradiction and entailment in multilingual text using TPUs.
The goal is to create an NLI (Natural Language Inference) model that assigns labels of 0, 1, or 2 (corresponding to entailment, neutral, and contradiction) to pairs of premises and hypotheses. To make things more interesting, the train and test sets are multilingual.

#### Why are you trying it?
In this competition, I want to classify pairs of sentences (each consisting of a premise and a hypothesis) into three categories: entailment, contradiction, or neutral.

Here are three example hypotheses for a single premise:

Hypothesis 1:
Just by the look on his face when he came through the door I just knew that he was let down.

**We know that this is true based on the information in the premise. So, this pair is related by *entailment*.**

Hypothesis 2:
He was trying not to make us feel guilty but we knew we had caused him trouble.

**This very well might be true, but we can’t conclude this based on the information in the premise. So, this relationship is *neutral*.**

Hypothesis 3:
He was so excited and bursting with joy that he practically knocked the door off its frame.

**We know this isn’t true, because it is the complete opposite of what the premise says. So, this pair is related by *contradiction*.**
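
The label scheme described above can be written down as a small lookup table. This is only an illustrative sketch: `LABEL_NAMES`, `label_name`, and `example_labels` are names made up here, not part of the competition data or of this notebook's code.

```python
# Numeric labels as described in this notebook: 0, 1, 2.
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}


def label_name(label_id: int) -> str:
    """Return the NLI class name for a numeric label (hypothetical helper)."""
    return LABEL_NAMES[label_id]


# Labels of the three example hypotheses above, keyed by hypothesis number.
example_labels = {1: 0, 2: 1, 3: 2}

for number, label_id in example_labels.items():
    print(f"Hypothesis {number}: label {label_id} ({label_name(label_id)})")
```

Keeping the mapping in one place makes it easier to read predictions back as class names when inspecting results.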

### Special thanks to Tensorflow Datasets (TFDS) for providing this and many other useful datasets! 


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from transformers import TFAutoModel,AutoTokenizer
import tensorflow as tf

In [None]:
checkpoint = 'joeddav/xlm-roberta-large-xnli'

tokenizer=AutoTokenizer.from_pretrained(checkpoint)

In [None]:
train=pd.read_csv('../input/contradictory-my-dear-watson/train.csv')
test=pd.read_csv('../input/contradictory-my-dear-watson/test.csv')
print(train.shape)
print(test.shape)

In [None]:
MAX_LEN=100

In [None]:
# Tokenize premise/hypothesis pairs, padding or truncating each pair to MAX_LEN tokens
encode_kwargs = dict(padding='max_length', max_length=MAX_LEN, truncation=True, return_attention_mask=True)
train_encoded = tokenizer.batch_encode_plus(train[['premise', 'hypothesis']].values.tolist(), **encode_kwargs)
test_encoded = tokenizer.batch_encode_plus(test[['premise', 'hypothesis']].values.tolist(), **encode_kwargs)

train_ids=tf.convert_to_tensor(train_encoded['input_ids'],dtype=tf.int32)
train_mask=tf.convert_to_tensor(train_encoded['attention_mask'],dtype=tf.int32)
train_input={'input_ids':train_ids,'input_mask':train_mask}

test_ids=tf.convert_to_tensor(test_encoded['input_ids'],dtype=tf.int32)
test_mask=tf.convert_to_tensor(test_encoded['attention_mask'],dtype=tf.int32)
test_input={'input_ids':test_ids,'input_mask':test_mask}

In [None]:
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
except ValueError:
    strategy = tf.distribute.get_strategy() # for CPU and single GPU

print('Number of replicas:', strategy.num_replicas_in_sync)


In [None]:
EPOCHS=20
BATCH_SIZE = 16 * strategy.num_replicas_in_sync
VAL_SPLIT = 0.2

In [None]:
with strategy.scope():
    input_ids = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name='input_ids')
    input_mask = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name='input_mask')

    # TFAutoModel loads the encoder without a classification head,
    # so element [0] is the per-token hidden states, not class logits
    pretrained_model = TFAutoModel.from_pretrained(checkpoint)
    hidden_states = pretrained_model([input_ids, input_mask])[0]

    # Average over the token axis, then classify into the 3 NLI labels
    output = tf.keras.layers.GlobalAveragePooling1D()(hidden_states)
    output = tf.keras.layers.Dense(3, activation='softmax')(output)
    model = tf.keras.Model(inputs=[input_ids, input_mask], outputs=output)

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.summary()

In [None]:
early_stop = tf.keras.callbacks.EarlyStopping(patience=2, restore_best_weights=True)
model.fit(train_input, train.label, validation_split = VAL_SPLIT, epochs=EPOCHS, batch_size=BATCH_SIZE, callbacks=[early_stop], verbose=1)

In [None]:
predictions = np.argmax(model.predict(test_input), axis=1)

submission = test.id.copy().to_frame()
submission['prediction'] = predictions
submission.head()

In [None]:
submission.to_csv("submission.csv", index = False)
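
For reference, the prediction step above turns each row of softmax probabilities into a class id by taking the index of the largest value. Here is a minimal sketch of that step. The array `probs` is invented for illustration and is not output from the model.

```python
import numpy as np

# Made-up softmax outputs for three sentence pairs (each row sums to 1).
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
])

# argmax over the class axis gives one label id per row.
predicted = np.argmax(probs, axis=1)
print(predicted.tolist())
```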

#### Did it work?
Kaggle provides all users with TPU quota at no cost, which we can use to explore this competition. The most common approaches to NLI problems include using embeddings and transformers like BERT. For this competition, Kaggle provides a starter notebook for trying the problem with Tensor Processing Units (TPUs). Yes, it works, thanks to the TPUs: they are powerful hardware accelerators specialized in deep learning tasks, including natural language processing.
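
One concrete effect of running on a TPU in this notebook is that the batch size is multiplied by the number of replicas (`BATCH_SIZE = 16 * strategy.num_replicas_in_sync`). The sketch below shows that arithmetic in plain Python. The replica count of 8 and the helper name `global_batch_size` are assumptions for illustration, since the actual count depends on the accelerator the session gets.

```python
def global_batch_size(per_replica: int, num_replicas: int) -> int:
    """Total examples per training step across all replicas (hypothetical helper)."""
    return per_replica * num_replicas


# 16 per replica, as in this notebook; 8 replicas is an assumed TPU setup.
print(global_batch_size(16, 8))  # 128

# On CPU or a single GPU the default strategy reports one replica.
print(global_batch_size(16, 1))  # 16
```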

#### What did you not understand about this process?
Everything needed is provided on the competition data page, and I had no problems while working on it. If anything I do in this notebook is unclear, please leave a comment.

#### What else do you think you can try as part of this approach?
Everything is in place for now. If I feel I need to add something, I will definitely do so.





### PLEASE UPVOTE if you like this notebook. It will keep me motivated to update my notebook. :)