# Prequential RoBERTa DPR Pool

The paper [WatClaimCheck: A new Dataset for Claim Entailment and Inference](https://aclanthology.org/2022.acl-long.92.pdf) describes a Dense Passage Retrieval (DPR) model that uses pooling.

**How do they pool?**
1. For each claim, the sentences from every associated premise article are pooled and ranked by similarity score.
2. The evidence sentences are concatenated in descending order of similarity score.
3. The claim text and the evidence sentences are concatenated.
4. The resulting text is truncated to the maximum sequence length of the transformer model used for claim veracity inference.
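
The four steps above can be sketched in plain Python. This is only an illustrative sketch, not the paper's implementation: `split_sentences`, `overlap_score`, and `pool_evidence` are hypothetical helpers, the word-overlap score is a toy stand-in for the DPR similarity score, and truncation counts whitespace-separated words rather than RoBERTa tokens.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter (assumption: a stand-in for a real sentence tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_score(claim: str, sentence: str) -> float:
    """Toy similarity: Jaccard overlap of lowercase word sets (NOT the DPR score)."""
    a = set(re.findall(r"\w+", claim.lower()))
    b = set(re.findall(r"\w+", sentence.lower()))
    return len(a & b) / len(a | b) if (a | b) else 0.0


def pool_evidence(
    claim: str,
    premise_articles: List[str],
    score_fn: Callable[[str, str], float] = overlap_score,
    max_tokens: int = 512,
) -> str:
    # Step 1: pool the sentences from every premise article of the claim.
    sentences = [s for article in premise_articles for s in split_sentences(article)]
    # Step 2: order the sentences by descending similarity score.
    ranked = sorted(sentences, key=lambda s: score_fn(claim, s), reverse=True)
    # Step 3: concatenate the claim text with the ranked evidence sentences.
    text = " ".join([claim] + ranked)
    # Step 4: truncate to the maximum length (whitespace words here, as an
    # approximation of the model's token-level truncation).
    return " ".join(text.split()[:max_tokens])


claim = "The sky is green"
articles = ["Cats sleep a lot. The sky is blue.", "Grass is green."]
pooled = pool_evidence(claim, articles)
print(pooled)
```

In the notebook itself, the truncation in step 4 would more naturally be delegated to the tokenizer, and the similarity score would come from the retrieval model rather than from word overlap.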

In [14]:
# Third-party libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
from transformers import RobertaConfig, RobertaTokenizer, TFRobertaForSequenceClassification

# Local helper module
from helper import download_dataset

In [15]:
# Set Global Variables
DATASET_FP = "./WatClaimCheck_dataset" # CHANGE TO MATCH LOCAL

In [20]:
# Retrieve dataset
train_pd_df, valid_pd_df, test_pd_df = download_dataset(DATASET_FP)

In [18]:
# Set model properties
max_seq_len = 512                  # maximum input length in tokens
model_checkpoint = "roberta-base"  # pretrained checkpoint to fine-tune
learning_rate = 1e-5

In [19]:
# Get model and set number of labels
model_config = RobertaConfig.from_pretrained(model_checkpoint)
model_config.num_labels = 3
model = TFRobertaForSequenceClassification.from_pretrained(model_checkpoint, config=model_config)

# Get Tokenizer
tokenizer = RobertaTokenizer.from_pretrained(model_checkpoint)

# Get optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFRobertaForSequenceClassification: ['roberta.embeddings.position_ids']
- This IS expected if you are initializing TFRobertaForSequenceClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFRobertaForSequenceClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
Some weights or buffers of the TF 2.0 model TFRobertaForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predicti