<a href="https://colab.research.google.com/github/rachelyayra/AIAssignment/blob/main/SEMEVAL_TABULAR_QA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overview of Code Implementation

This notebook implements a question-answering system over tabular data that pairs a TAPAS encoder with a T5 decoder. The goal is to train a model that answers natural-language questions using the contents of a structured table as context.

## Key Components:

1. **Data Preparation**:
   - The data comes from the `cardiffnlp/databench` dataset: question-answer pairs, each linked to the table it refers to.
   - The first 1,000 QA pairs are selected for the training set, while the subsequent pairs (roughly 300) are held out for validation.

2. **Model Initialization**:
   - A pretrained TAPAS model (`google/tapas-base`) encodes the table together with the question, and a pretrained T5 model (`t5-base`) generates the answer text.

3. **Training Loop**:
   - The training process includes:
     - Forward pass through the model with input data.
     - Loss computation using the CrossEntropyLoss function.
     - Backpropagation, followed by an AdamW update of the trainable parameters.
     - Recording the loss of each batch to monitor training progress.

4. **Validation**:
   - A validation phase runs after each epoch to measure how well the model generalizes to data it was not trained on.

5. **Error Handling**:
   - The training and validation loops catch `ValueError` and skip the affected batch instead of aborting the run.

6. **Batch Loss Tracking**:
   - The notebook maintains a list of batch losses, which can be analyzed after training to understand the model's learning behavior over time.
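
The batch-loss list from item 6 can be summarized once training finishes. The following is a minimal sketch, assuming the losses are plain Python floats stored in training order; the helper names `moving_average` and `epoch_means` and the example values are illustrative and do not appear in the notebook.

```python
def moving_average(losses, window=3):
    """Return the mean of each full sliding window over `losses`."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(losses[i:i + window]) / window
        for i in range(len(losses) - window + 1)
    ]


def epoch_means(losses, batches_per_epoch):
    """Average the batch losses of each epoch (the last epoch may be shorter)."""
    chunks = (
        losses[i:i + batches_per_epoch]
        for i in range(0, len(losses), batches_per_epoch)
    )
    return [sum(chunk) / len(chunk) for chunk in chunks]


# Hypothetical loss values, used only to exercise the helpers.
example_losses = [4.0, 3.0, 2.0, 2.0, 1.0, 1.0]
smoothed = moving_average(example_losses, window=3)
per_epoch = epoch_means(example_losses, batches_per_epoch=3)
```

A smoothed curve makes a downward trend easier to see than raw batch losses, which tend to be noisy at small batch sizes.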


# HOUSE KEEPING

In [None]:
!pip install datasets


In [None]:
import pandas as pd
import torch
import torch.nn as nn
from datasets import load_dataset
from torch.optim import AdamW
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, T5ForConditionalGeneration, T5Tokenizer, TapasModel

# DATA

In [None]:
# Load the QA pairs
semeval_dev_qa = load_dataset("cardiffnlp/databench", name="semeval", split="dev")


In [None]:
# Explore the data
print(semeval_dev_qa)


# Show which table (dataset ID) the first QA pair refers to
print(semeval_dev_qa['dataset'][0])

Dataset({
    features: ['question', 'answer', 'type', 'columns_used', 'column_types', 'sample_answer', 'dataset'],
    num_rows: 320
})
050_ING


In [None]:
for i in semeval_dev_qa:
  print(f"Question: {i['question']}")
  print(f"Answer: {i['answer']}")
  print(f"Sample Answer: {i['sample_answer']}")
  print(f"Columns: {i['columns_used']}")
  print(f"Columns: {i['column_types']}")

Question: Is the most favorited author mainly communicating in Spanish?
Answer: True
Sample Answer: True
Columns: ['favorites', 'lang']
Columns: ['category', 'category']
Question: Does the author with the longest name post mainly original content?
Answer: True
Sample Answer: False
Columns: ['author_name', 'type']
Columns: ['category', 'category']
Question: Is there an author who received no retweets for any of their posts?
Answer: True
Sample Answer: True
Columns: ['author_name', 'retweets']
Columns: ['category', 'number[uint8]']
Question: Are there any posts that do not contain any links?
Answer: True
Sample Answer: True
Columns: ['links']
Columns: ['list[url]']
Question: How many unique authors are in the dataset?
Answer: 3765
Sample Answer: 20
Columns: ['author_name']
Columns: ['category']
Question: What is the length of the longest post (based on the number of words)?
Answer: 61
Sample Answer: 49
Columns: ['text']
Columns: ['text']
Question: What is the total number of retweets rec

# TRAINING

In [None]:
# Load the TAPAS base model (no QA head) to use as the table encoder

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tapas_model = TapasModel.from_pretrained("google/tapas-base")
tapas_encoder = tapas_model.to(device)
tokenizer = AutoTokenizer.from_pretrained("google/tapas-base")
# Load the T5 model (decoder) and tokenizer
t5_model = T5ForConditionalGeneration.from_pretrained("t5-base").to(device)
t5_tokenizer = T5Tokenizer.from_pretrained("t5-base")

# Set the T5 model to training mode (as the last expression, this also prints the model)
t5_model.train()


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


T5ForConditionalGeneration(
  (shared): Embedding(32128, 768)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseActDense(
              (wi): Linear(in_features=768, out_features=3072, bias=False)
              (wo): Linear(in_features=3072, out_features=768, bias=False)
              (dropout): Dro

In [None]:
for param in tapas_encoder.parameters():
    param.requires_grad = False
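
Freezing works by clearing `requires_grad` on each parameter, so only the remaining parameters receive gradient updates. The following is a minimal sketch for checking what is still trainable. It uses two small stand-in layers rather than the real TAPAS and T5 weights, and `count_parameters` is a hypothetical helper, not part of the notebook.

```python
import torch.nn as nn


def count_parameters(module):
    """Return (trainable, frozen) parameter counts for a module."""
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in module.parameters() if not p.requires_grad)
    return trainable, frozen


# Stand-ins for illustration only: a frozen "encoder" and a trainable "decoder".
toy_encoder = nn.Linear(4, 4)  # 4*4 weights + 4 biases = 20 parameters
toy_decoder = nn.Linear(4, 2)  # 4*2 weights + 2 biases = 10 parameters

for param in toy_encoder.parameters():
    param.requires_grad = False

toy_model = nn.Sequential(toy_encoder, toy_decoder)
trainable, frozen = count_parameters(toy_model)
```

Running the same helper on the combined TAPAS and T5 model would show whether the freeze took effect before the optimizer is created.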

In [None]:
from transformers.modeling_outputs import BaseModelOutput


class TAPAST5Model(nn.Module):
    """TAPAS encoder feeding a T5 decoder, with a projection if hidden sizes differ."""

    def __init__(self, tapas_encoder, t5_model):
        super().__init__()

        # Reuse the TAPAS encoder and T5 model that were loaded above
        self.tapas_encoder = tapas_encoder
        self.t5_decoder = t5_model

        # Get hidden sizes of TAPAS and T5
        self.tapas_hidden_size = self.tapas_encoder.config.hidden_size
        self.t5_hidden_size = self.t5_decoder.config.d_model

        # Linear layer to map TAPAS hidden size to T5 hidden size (if they differ)
        if self.tapas_hidden_size != self.t5_hidden_size:
            self.mapping_layer = nn.Linear(self.tapas_hidden_size, self.t5_hidden_size)
        else:
            self.mapping_layer = None  # No need for mapping if dimensions are already the same

    def encode(self, input_ids, attention_mask):
        # TAPAS encoder forward pass
        encoder_outputs = self.tapas_encoder(input_ids=input_ids, attention_mask=attention_mask)
        hidden_states = encoder_outputs.last_hidden_state  # Shape: (batch_size, seq_len, tapas_hidden_size)

        # If TAPAS and T5 have different hidden sizes, pass through the linear mapping layer
        if self.mapping_layer is not None:
            hidden_states = self.mapping_layer(hidden_states)  # Shape: (batch_size, seq_len, t5_hidden_size)
        return hidden_states

    def forward(self, input_ids, attention_mask, decoder_input_ids, target_ids):
        hidden_states = self.encode(input_ids, attention_mask)

        # T5 forward pass: the TAPAS hidden states stand in for the T5 encoder output
        outputs = self.t5_decoder(
            attention_mask=attention_mask,  # Mask padded encoder positions in cross-attention
            decoder_input_ids=decoder_input_ids,  # Shifted targets (teacher forcing)
            encoder_outputs=(hidden_states,),
            labels=target_ids,
        )

        return outputs

    def generate_answer(self, input_ids, attention_mask):
        # Get (and, if needed, project) the encoder outputs
        hidden_states = self.encode(input_ids, attention_mask)

        # Generate the answer with T5, decoding only the first item in the batch
        generated_ids = self.t5_decoder.generate(
            encoder_outputs=BaseModelOutput(last_hidden_state=hidden_states),
            attention_mask=attention_mask,
            max_length=10,
        )

        generated_answer = t5_tokenizer.decode(generated_ids[0], skip_special_tokens=True)
        return generated_answer


# Build the combined model from the objects loaded above
model = TAPAST5Model(tapas_encoder, t5_model).to(device)

In [None]:
import pandas as pd


def convert_table_to_inputs(table_data, queries):
    """Tokenize a table and its question(s) with the TAPAS tokenizer."""
    # TAPAS expects a DataFrame whose cells are all strings
    table = pd.DataFrame(table_data.values.tolist(), columns=table_data.columns)
    table = table.map(str)  # DataFrame.map requires pandas >= 2.1 (applymap on older versions)
    # Tokenize the inputs using the Tapas tokenizer
    inputs = tokenizer(
        table=table,
        queries=queries,
        padding="max_length",
        max_length = 512,
        truncation=True,
        return_tensors="pt"
    )

    inputs = {key: value.to(device) for key, value in inputs.items()}

    return inputs, table

In [None]:
all_qa = load_dataset("cardiffnlp/databench", name="qa", split="train").to_pandas()

# Define a custom dataset
class TabularQADataset(Dataset):
    def __init__(self, qa_data):
        self.qa_data = qa_data

    def __len__(self):
        return len(self.qa_data)

    def __getitem__(self, idx):

        question = self.qa_data.loc[idx, 'question']
        dataset_id = self.qa_data.loc[idx, 'dataset']

        # Load the corresponding table
        df = pd.read_parquet(f"hf://datasets/cardiffnlp/databench/data/{dataset_id}/sample.parquet").head(250)

        # Tokenize the table (first 250 rows) together with the question
        inputs, table = convert_table_to_inputs(df, str(question))

        # Prepare the answer; a missing answer becomes the literal string 'None'
        answer = self.qa_data.loc[idx, 'answer']
        if answer is None:
            answer = 'None'
        print(f"Answer: {answer}")

        # Tokenize the answer using the T5 tokenizer
        target = t5_tokenizer(str(answer), padding='max_length', truncation=True, return_tensors="pt", max_length=32)
        target = {key: value.to(device) for key, value in target.items()}

        # Decoder inputs are the targets shifted right, starting with the pad token
        start_token = torch.tensor([[t5_tokenizer.pad_token_id]], device=device)
        decoder_input_ids = torch.cat([start_token, target['input_ids'][:, :-1]], dim=1)

        return (
            inputs['input_ids'].squeeze(0),
            inputs['attention_mask'].squeeze(0),
            decoder_input_ids.squeeze(0),
            target['input_ids'].squeeze(0).long(),       # Labels for the target sequence
            target['attention_mask'].squeeze(0).long(),  # 1 for real target tokens, 0 for padding
        )


# Create datasets and dataloaders: the first 1,000 QA pairs for training, the rest for validation
dataset = TabularQADataset(all_qa[:1000].reset_index(drop=True))
dataloader = DataLoader(dataset, batch_size=16, shuffle=True)  # Adjust batch size as needed

val_dataset = TabularQADataset(all_qa[1000:].reset_index(drop=True))
val_dataloader = DataLoader(val_dataset, batch_size=32, shuffle=False)
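
The `decoder_input_ids` built in `__getitem__` are the target IDs shifted one position to the right, with the pad token in front, since T5 uses its pad token as the decoder start token. A minimal sketch on plain Python lists shows the effect; the token IDs are made up and `shift_right` is an illustrative helper, not a function from the notebook.

```python
def shift_right(target_ids, start_token_id=0):
    """Prepend the start token and drop the last target token."""
    return [start_token_id] + list(target_ids[:-1])


# Made-up IDs: 0 is the pad/start token and 1 the end-of-sequence token.
target_ids = [37, 512, 1, 0, 0]
decoder_input_ids = shift_right(target_ids)
```

At each position the decoder sees the previous target token and is trained to predict the current one, which is what teacher forcing means here.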


In [None]:
import torch
import torch.nn as nn
from torch.optim import AdamW

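# Assumes `model`, `dataloader`, and `val_dataloader` are defined in the cells above.
optimizer = AdamW(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-5)
# Ignore padded target positions so they do not dominate the loss
loss_fn = nn.CrossEntropyLoss(ignore_index=t5_tokenizer.pad_token_id)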

# Training configuration and metric history
num_epochs = 4
train_losses = []
val_losses = []
val_accuracies = []

# Training loop
for epoch in range(num_epochs):
    total_loss = 0.0  # Accumulate the total loss for the epoch
    num_batches = 0  # Keep track of the number of batches

    # Training phase
    model.train()  # Set model to training mode
    for batch_num, (input_ids, attention_mask, decoder_inputs, target_ids, at) in enumerate(dataloader):
        # Stop after 10 batches per epoch
        if batch_num == 10:
            break
        try:
            print(f"Batch {batch_num}: at:", input_ids.shape)
            print(f"Batch {batch_num}: Attention mask shape:", attention_mask.shape)

            # Clear gradients
            optimizer.zero_grad()

            # Forward pass
            output = model(input_ids, attention_mask, decoder_inputs, target_ids)

            # Compute loss
            logits = output.logits
            loss = loss_fn(logits.view(-1, logits.size(-1)), target_ids.view(-1))  # Flatten to (tokens, vocab) vs (tokens,)

            # Backpropagation
            loss.backward()
            optimizer.step()

            # Accumulate loss
            total_loss += loss.item()
            train_losses.append(loss.item())  # Save the loss for this batch
            num_batches += 1

            print(f"Epoch [{epoch + 1}/{num_epochs}], Batch [{batch_num}], Loss: {loss.item():.4f}")

        except ValueError as e:
            print(f"ValueError encountered in Batch {batch_num}: {e}. Skipping this batch.")
            continue  # Skip the current batch and continue to the next one

    if num_batches > 0:
        avg_loss = total_loss / num_batches
        print(f"Epoch [{epoch + 1}/{num_epochs}] Average Training Loss: {avg_loss:.4f}")

    # Validation phase
    model.eval()  # Set model to evaluation mode
    total_val_loss = 0.0
    val_num_batches = 0
    correct_predictions = 0
    total_tokens = 0  # Non-padding target tokens seen so far

    with torch.no_grad():  # No gradients needed for validation
        for val_batch_num, (input_ids, attention_mask, decoder_inputs, target_ids, at) in enumerate(val_dataloader):
            try:
                # Forward pass
                output = model(input_ids, attention_mask, decoder_inputs, target_ids)

                # Compute validation loss
                logits = output.logits
                val_loss = loss_fn(logits.view(-1, logits.size(-1)), target_ids.view(-1))

                # Accumulate validation loss
                total_val_loss += val_loss.item()
                val_num_batches += 1

                # Token-level accuracy over non-padding positions only;
                # `at` is the target attention mask (1 = real token, 0 = padding)
                predictions = torch.argmax(logits, dim=-1)
                non_pad = at.bool()
                correct_predictions += ((predictions == target_ids) & non_pad).sum().item()
                total_tokens += non_pad.sum().item()

                print(f"Validation Batch [{val_batch_num}], Loss: {val_loss.item():.4f}")

            except ValueError as e:
                print(f"ValueError encountered in Validation Batch {val_batch_num}: {e}. Skipping this batch.")
                continue  # Skip the current validation batch

    if val_num_batches > 0:
        avg_val_loss = total_val_loss / val_num_batches
        val_accuracy = correct_predictions / max(total_tokens, 1)  # Divide by tokens actually scored
        val_losses.append(avg_val_loss)  # Save validation loss
        val_accuracies.append(val_accuracy)  # Save validation accuracy
        print(f"Epoch [{epoch + 1}/{num_epochs}] Average Validation Loss: {avg_val_loss:.4f}, Accuracy: {val_accuracy:.4f}")

# After training, you can print or analyze the collected losses
print("All training batch losses:", train_losses)
print("All validation losses:", val_losses)
print("All validation accuracies:", val_accuracies)


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Private', 'children']
Answer: [0.9794365922809228, 0.9723660656030668, 0.954299437125917, 0.9362989453985364, 0.9307917067583288]
Answer: 0.373214039767641
Answer: Madrid
Answer: Central
Answer: ['Super Deluxe', 'King']
Answer: Western Europe
Answer: ['m', 'f']
Answer: 264
Answer: [(37.784560141211806, -122.40733704162238), (37.7751608100771, -122.40363551943442), (37.78640961281089, -122.40803623744476), (37.7839325760642, -122.4125952775858), (37.77871942789032, -122.4147412230519)]
Answer: True
Answer: 3765
Answer: False
Answer: ['Basic', 'Deluxe']
Answer: 9432
Answer: ['Male', 'Female', 'Female', 'Male']
Batch 0: at: torch.Size([16, 512])
Batch 0: Attention mask shape: torch.Size([16, 512])
Epoch [1/4], Batch [0], Loss: 13.7597


Answer: False
Answer: [501, 501, 501, 501, 501, 501]
Answer: [6.0, 13.0, 10.0, 16.0, 12.0]
Answer: 0.560663998
Answer: ['Friday', 'Sunday', 'Saturday', 'Thursday', 'Wednesday']
Answer: [0.0, 0.0, 0.0, 0.0]
Answer: 5
Answer: True
Answer: [10, 10, 10, 10]
Answer: [2053, 2, 1936, 6227, 45]
Answer: 14
Answer: 75
Answer: 46.591644204851754
Answer: Fashion accessories
Answer: 0.08465
Answer: [6037, 36061, 48201, 6059, 6071, 6085]
Batch 1: at: torch.Size([16, 512])
Batch 1: Attention mask shape: torch.Size([16, 512])
Epoch [1/4], Batch [1], Loss: 14.0462


Answer: Poison
Answer: English
Answer: [1, 1, 1]
Answer: [117, 111]
Answer: ['Month-to-month', 'Two year']
Answer: 549900
Answer: ['GP', 'MS']
Answer: True
Answer: Electronic check
Answer: [0, 0, 0, 0]
Answer: True
Answer: [93, 92, 91]
Answer: [271.74, 254.63, 254.6, 252.72]
Answer: ['sales', 'technical', 'support', 'IT']
Answer: Entire apartment
Answer: ['B2', 'S1', 'B1', 'P2', 'P3']
Batch 2: at: torch.Size([16, 512])
Batch 2: Attention mask shape: torch.Size([16, 512])
Epoch [1/4], Batch [2], Loss: 12.9231


Answer: [233.0, 233.0, 233.0]
Answer: ['sam the sham and the pharaohs', 'ssgt barry sadler', 'the beach boys', 'the beatles', 'the beatles']
Answer: [5.0, 3.0, 4.0, 0.0]
Answer: ['No', 'Yes', 'No phone service']
Answer: [40580, 39899, 38430, 28524, 24452]
Answer: False
Answer: True
Answer: True
Answer: 0.2748
Answer: [5255, 4619, 10311, 6237]
Answer: ['Less than $40K', '$40K - $60K', '$80K - $120K']
Answer: True
Answer: False
Answer: 940
Answer: ['Macau', 'Andorra', 'Moldova', 'Liechtenstein']
Answer: ['65+', '55-64', '45-54', '35-44', '18-24', '25-34']
Batch 3: at: torch.Size([16, 512])
Batch 3: Attention mask shape: torch.Size([16, 512])
Epoch [1/4], Batch [3], Loss: 12.2053


Answer: [' CA', ' MA', ' NY', ' VA', ' IL']
Answer: [48.0, 83.0, 83.0]
Answer: Graduation
Answer: 29370.243704368546
Answer: [20, 20, 20, 20, 20, 20]
Answer: photographer]
Answer: beef
Answer: Cluster 1
Answer: [15628852, 15634748, 15634996]
Answer: True
Answer: 96
Answer: True
Answer: [32.0, 31.0, 29.0]
Answer: [65051954, 57074270, 56932551, 49730580]
Answer: 524.0
Answer: housell
Batch 4: at: torch.Size([16, 512])
Batch 4: Attention mask shape: torch.Size([16, 512])
Epoch [1/4], Batch [4], Loss: 13.4990


Answer: 5
Answer: Flat
Answer: ['Biotech & Pharmaceuticals', 'Insurance Carriers', 'Computer Hardware & Software', 'IT Services']
Answer: Wise
Answer: [1, 1, 1]
Answer: ['Highly Skilled', 'Skilled', 'Highly Skilled']
Answer: Unknown
Answer: [8, 10, 12, 9, 7, 13]
Answer: True
Answer: Diésel
Answer: [AK, VI]
Answer: ['low', 'low']
Answer: Castellano
Answer: honey
Answer: ['candy', 'black tea', 'bacon', 'champagne', 'red wine', 'red apple']
Answer: News
Batch 5: at: torch.Size([16, 512])
Batch 5: Attention mask shape: torch.Size([16, 512])
Epoch [1/4], Batch [5], Loss: 12.9101


Answer: ['Correctorada', 'El Joker', 'Xenia Viladas', 'DrJaus \xa0🇪🇸']
Answer: [72, 72, 72, 72, 72, 72]
Answer: False
Answer: [19.83, 19.74, 19.68]
Answer: 234.7
Answer: False
Answer: 149.1
Answer: ['Book Publisher', 'Bureau Chief', 'Publisher']
Answer: 117386
Answer: 2023-01-31
Answer: european/caucasian-american
Answer: [35.0, 36.0, 34.0]
Answer: [8.28, 7.57, 7.4]
Answer: True
Answer: [0.1, 0.11, 0.09, 0.37]
Answer: True
Batch 6: at: torch.Size([16, 512])
Batch 6: Attention mask shape: torch.Size([16, 512])
Epoch [1/4], Batch [6], Loss: 12.7965


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 10


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [13.0, 26.0, 7.75, 10.5]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Marathon (Mix Cut) - Simon O'Shine Mix
Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Director of Athletics', 'Recruiting Coordinator', 'Athletic Coordinator', 'Director of Personnel', 'Skills Trainer']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Art Competitions


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 40580


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 75.25


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Entire rental unit', 'Private room in rental unit']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: None


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [1934-02-09 00:00:00, 1917-12-30 00:00:00, 1943-02-15 00:00:00]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 19


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 4047


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['English', 'English, Spanish', 'English, Russian', 'English, Hungarian']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [102, 119, 112]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False
Batch 7: at: torch.Size([16, 512])
Batch 7: Attention mask shape: torch.Size([16, 512])
Epoch [1/4], Batch [7], Loss: 12.2664


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Customer Service Issue


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 1316


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [53.0, 55.0, 11.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [18484, 17744, 17634, 17498, 17437, 17350]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [The Smiler, Colossus (Thorpe Park)]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [50000.0, 100000.0, 10000.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [1.0, 1.0, 1.0, 0.6593, 1.0, 0.6940000000000001]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False
Answer: 17


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [171000, 129000, 111000, 107000, 106000, 91400]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [Marathon (Mix Cut) - Simon O'Shine Mix, Applause; Martha Tilton Returns to Stage - Live]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Program Manager


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [3543402, 3543402, 3543402]
Answer: 0


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Data Scientist', 'Software Engineer', 'Other', 'Data Analyst', 'Currently not employed']
Answer: 0
Batch 8: at: torch.Size([16, 512])
Batch 8: Attention mask shape: torch.Size([16, 512])
Epoch [1/4], Batch [8], Loss: 11.9251


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 3922


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 6


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [20855.0, 21288.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: B2


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: original


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Governor


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 18


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [-167.08526, -167.08526, -167.08526, -167.08526]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [1, 2]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 2


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Less than $40K


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 214.0
Batch 9: at: torch.Size([16, 512])
Batch 9: Attention mask shape: torch.Size([16, 512])
Epoch [1/4], Batch [9], Loss: 12.9822


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['January', 'December']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [870, 822]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Retail trade


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Married


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [55.0, 55.0, 55.0, 42.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Water', 'Normal', 'Grass', 'Bug']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [0.7557249985959847, 0.7413189187628788, 0.7034528053640179]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['The party ideas are close to my own', 'The party is the most competent', 'I prefer not to say']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [5, 1, 4, 2]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 10344


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 2


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False
Epoch [1/4] Average Training Loss: 12.9314


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [80.0, 80.0, 80.0, 80.0, 79.55]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Mortgage Banker


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Credit card


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Info-Unavailable', 'Result - Runway excursion', 'Result - Damaged on the ground']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [0.891806662, 0.861874342, 0.861330688]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Castellano


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Madrid Capital', 'Torrejón de Ardoz', 'Alcalá de Henares', 'Móstoles']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Data Scientist', 'Software Engineer', 'Other', 'Data Analyst', 'Currently not employed']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ST


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [20, 20, 20, 20, 20, 20]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 5081.805590062112


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['marvin gaye', 'wilson pickett', 'neil diamond', 'jerry butler', 'the beatles']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Electronic check


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [5, 4, 3, 2]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 13056


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [6037, 13121, 48201]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Aquatic foods', 'Vegetables', 'Fruits']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Colombia', 'Andorra']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [The Smiler, Colossus (Thorpe Park)]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: the way you love me


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 58852


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['reply', 'original']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 504


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Shuckle


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Wise


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [1992, 1988, 2000, 1996]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 0-50


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [1, 1]
Validation Batch [0], Loss: 11.3876


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['no code data science', 'no code data analytics', 'no code data science', 'no code data science']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [9.0, 7.0, 8.0, 6.0, 16.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 0


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 2010


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['low', 'low']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 940


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Customer Service Issue


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['other', 'services', 'teacher', 'health', 'at_home']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 15


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [15, 9, 21, 11]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: en


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Aleutians West', 'Nome']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 799


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 1243


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Junkers Ju-52/3m


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Salaried', 'Small Business', 'Large Business']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 1380


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 159


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 774


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['30-50', '18-30', '0-18', '50+']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 52


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [0, 1, 2, 3, 4, 5]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Tops


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Macau', 'Andorra', 'Moldova', 'Liechtenstein']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['es', 'es', 'es', 'es', 'es']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Westminster', 'Tower Hamlets', 'Hackney']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False
Validation Batch [1], Loss: 11.4734


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [4225998.36, 4153877.05, 4021902.63, 3903390.45, 2192967.2]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [524.0, 521.0, 517.0, 469.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [209, 210, 208, 207, 211, 205]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 10


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [3543402, 3543402, 3543402]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [15628852, 15634748, 15634996]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [570306133677760513, 570301031407624196, 570300817074462722, 570300767074181121]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: no code data science


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 14999


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False
Answer: 0


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['https://www.obviously.ai/', 'https://venturebeat.com/2021/10/12/no-code-ai-startup-obviously-ai-raises-4-7m/', 'https://hbr.org/2021/11/how-no-code-platforms-could-disrupt-the-it-industry']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['WELLS FARGO BANK NATL ASSOC', 'BANK OF AMERICA NATL ASSOC', 'U.S. BANK NATIONAL ASSOCIATION', 'JPMORGAN CHASE BANK NATL ASSOC', 'PNC BANK, NATIONAL ASSOCIATION']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Supergarage


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Credit card', 'Credit card', 'Ewallet']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 236


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: United States of America


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 18424


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [3, 10, 14, 16, 18]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: News


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Radio/TV


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 32.2


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['G7', 'foreignoffice', 'UN', 'Conservatives', 'COP26']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [60, 78, 78]
Answer: 5


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True
Answer: 187
Validation Batch [2], Loss: 12.0345


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [0.0, 0.0, 0.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Hut


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Entire apartment', 'Private room in apartment', 'Entire condominium', 'Entire house']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 2473


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: None


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 131
Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['TEDxPuget Sound', 'TEDxHouston', 'TEDxFiDiWomen', 'TEDxUW']
Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Salaried', 'Small Business', 'Large Business', 'Free Lancer']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Western Europe', 'Southeast Asia', 'Sub-Saharan Africa']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [0.0, 0.0, 0.0007, 0.0007, 0.0007, 0.002]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Westminster


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['https://www.obviously.ai/', 'https://www.obviously.ai/', 'https://venturebeat.com/2021/10/12/no-code-ai-startup-obviously-ai-raises-4-7m/']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [21, 4, 11]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 96.07926963408374


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 198.7995642701525


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 73


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [5.0, 5.0, 5.0, 5.0, 5.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Brown


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Adjustment


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [' CA', ' MA', ' NY', ' VA', ' IL']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Rural', 'Urban']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 14.745974597459746


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Soy', 'Green vegetables']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Piso', 'Chalet', 'Apartamento', 'Chalet adosado', 'Chalet unifamiliar']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 234.7
Validation Batch [3], Loss: 11.8578


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [2016.0, 2017.0, 2019.0, 2020.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['US Airways', 'American', 'United']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [1998, 2009, 2010, 2007, 2002]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Ricardo Blas Jr.


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['North Slope', 'Northwest Arctic', 'Yukon-Koyukuk', 'Nome', 'Fairbanks North Star']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 5


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: en


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [3, 2, 4, 1, 5]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Madrid', 'Barcelona', 'Valencia']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [86, 85, 84, 84]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Right


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [17000000.0, 13600000.0, 13250000.0, 13000000.0, 12000000.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [35.0, 36.0, 34.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 10344


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [-23714.217, -23706.5, -23698.271, -23697.166]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Graduate', 'High School', 'Unknown']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 149.1


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: News


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 5


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 24


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Fashion accessories
Validation Batch [4], Loss: 11.9944


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 65+


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [8, 10, 12, 9, 7, 13]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 13


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Other', 'Healthcare', 'Office worker or other professional', 'Industrial (e.g. construction, manufacturing, maintenance and repair)']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Retail trade', 'Other services (except public administration)', 'Manufacturing']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: N26


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 6015


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: f


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 25.4603341382503


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Python', 'SQL', 'R', 'Javascript', 'C++', 'Java']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Summer


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Western Europe']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 2023-01-31


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Sunday


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['female', 'male']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [69.42718361, 69.42718361, 69.42718361]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['unknown', 'Havana-José Martí International Airport (HAV)', 'Miami International Airport, FL (MIA)', 'Rio de Janeiro-Galeão International Airport, RJ (GIG)', 'Beirut International Airport (BEY)']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: -15.0


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 683


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['ST', 'LVH', 'Normal']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Some college, no degree


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 80.94


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Male', 'Female', 'Female', 'Male']
Validation Batch [5], Loss: 12.2043


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [0.0, 0.0, 0.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 2536


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 0.7826336180787501


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: BROOKLYN


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 1979-04-10 00:00:00


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 3131144855


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 0.64


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 0


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['uptown funk', 'thinking out loud', 'see you again', 'trap queen']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Entire rental unit', 'Private room in rental unit']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [1, 2, 5, 4]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 8.28


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [5504.0, 3957.0, 2974.0, 2927.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [24.0, 22.0, 27.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Married
Answer: 17
Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [33629705, 46718634, 51900343, 53128216, 34575561, 46015340]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 2301


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Very thin


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Tuesday', 'Monday']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 3574


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: South Atlantic


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [106.0, 104.0, 104.0, 104.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 906


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [9432, 1503, 1501]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: european/caucasian-american


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ASY


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Die drei ???', 'Benjamin Blümchen', 'TKKG Retro-Archiv', 'Bibi Blocksberg', 'Lata Mangeshkar']
Validation Batch [6], Loss: 12.5982


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['NYPD', 'HPD', 'DOT', 'DSNY']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['U.S. Senator', 'Congressman']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['bacon', 'peanuts', 'cheese', 'popcorn']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Existing Credits Paid Back Duly Till Now


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [65, 62, 60, 60, 59]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Book Publisher', 'Bureau Chief', 'Publisher']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: member, portion, body, end


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Skilled', 'Skilled', 'Skilled', 'Unskilled - Resident']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 549900


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [569972097453137920, 568092537786748928, 568028183267639297, 568993773277069312, 569227372223811584]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: photographer]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [50, 50, 50]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Medium', 'Unknown']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['ASY', 'NAP', 'ATA']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [5.0, 3.0, 6.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 72


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer:  CA


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 2


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [0.5085, 0.6045]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['MENOS 19 AÑOS', 'MÁS DE 59', 'DE 20 A 29']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['English', 'English, Spanish', 'English, Russian', 'English, Hungarian']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [118.75, 118.65, 118.6, 118.6]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: TA


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [95150.0, 90130.0, 64430.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['The party ideas are close to my own', 'The party is the most competent', 'I prefer not to say']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Universidad


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [0.910357833, 0.852532268, 0.846880972]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['.Net Architect', 'Android Developer', 'Principal Engineer', 'Game Engineer']
Validation Batch [7], Loss: 11.6391


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Ciudadanos


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['TA', 'ATA', 'NAP', 'ASY']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [200.91135265700484, 202.49742647058824, 200.75818752803949]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 524.0


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [44, 32, 31]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['habit, plant, foliage, flowers', 'soybean, plant, cultivar, soybean cultivar']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [8.28, 7.57, 7.4]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [214.0, 198.0, 190.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [9479476.0, 9479477.0, 9479478.0, 9479479.0, 9479480.0, 9479481.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Monster Tunes Yearmix 2011 - Mixed by Mark Eteson


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [90.0, 12.275, 9.35, 10.5167]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 1934-02-09 00:00:00


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['8536270', '2261367', '41552433', '23124338']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [0.55, 0.5, 0.54, 0.51, 0.57, 0.49]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False
Answer: 1


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Damaged beyond repair', 'Substantial']
Answer: 0.12


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [Heartbreak Anniversary, Good Days, Paradise (feat. Dermot Kennedy)]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Red', 'Other', 'White', 'Blue']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [75.25, 74.4, 73.0, 73.0, 73.0, 73.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 3274


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [42.0, 42.0, 42.0, 42.0, 39.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: USA


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: US Airways


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 1.3407668971005122


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Wood
Validation Batch [8], Loss: 11.9183


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Director of Athletics', 'Recruiting Coordinator', 'Athletic Coordinator', 'Director of Personnel', 'Skills Trainer']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Adventure, Family, Fantasy


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 123


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Dresses


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Adventure, Family, Fantasy', 'Drama, Mystery', 'Drama, Romance, War']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['New York, New York, United States', 'US', 'Brooklyn, New York, United States', 'Queens, New York, United States', 'Bronx, New York, United States']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Less than $40K', '$40K - $60K', '$80K - $120K', '$60K - $80K']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Policy Officer


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Entire apartment


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [1947-12-27 00:00:00, 1947-12-28 00:00:00]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [2.0, 3.0, 4.0, 1.0, 5.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: New York, New York, United States


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 201.05


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 1002


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 164636.4123068934


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 2016-12-22T00:00:00Z


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [161, 237, 236]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['FC Bayern München', 'Real Madrid', 'FC Barcelona', 'Paris Saint-Germain', 'Juventus', 'Manchester City']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Airplane - Pressurization, Airplane - Pressurization - Bulkhead failure, Airplane - Pressurization - Explosive decompression, Maintenance - (repair of) previous damage, Result - Loss of control


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [-167.08526, -167.08526, -167.08526, -167.08526]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Los Angeles', 'New York', 'San Francisco']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 94


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [0, 12, 10, 11]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 20776


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 16.19912


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [455, 455, 508, 455]
Validation Batch [9], Loss: 11.3715


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 27


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: En route (ENR)


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['management', 'RandD', 'product_mng', 'marketing', 'support']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [560000, 370000]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 1


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 2


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True
Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Sports and travel', 'Fashion accessories']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['July', 'August', 'June']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [5000000, 4000006, 3500000]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 7.222675085250001


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [0.0353641396193574, 0.0355792960526332]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Eggs', 'Baby foods', 'Unclassified', 'Herbs and spices']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True
Answer: 15.9


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Eastern Time (US & Canada)', 'Central Time (US & Canada)', 'Pacific Time (US & Canada)', 'Quito', 'Atlantic Time (Canada)']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 3220


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [5.0, 3.0, 4.0, 0.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [39, 30, 45, 33]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['USA', 'UK', 'Canada', 'UK, USA', 'Australia']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [234.7, 217.8, 202.5, 202.1]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: honey


  text = normalize_for_match(row[col_index].text)


Answer: Mortgage Banker


  cell = row[col_index]
  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [149.1, 128.0, 120.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False
Validation Batch [10], Loss: 12.1427


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Diésel


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 0


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 6


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 2.74


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [19.83, 19.74, 19.68]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: A laptop


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Retail trade


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 29370.243704368546


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True
Answer: ['7.9', '10.3', '5.6', '12.6', '11.3']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [18484, 17744, 17634, 17498, 17437, 17350]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Madrid


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [65051954, 57074270, 56932551, 49730580]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 0.7493706045


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 47


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 0.0353641396193574


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Lavincompae', '#NI UNA MENOS \xa0♐\xa0✊\xa0🚺', 'SFC The World']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: UKR


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: low


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Spring', 'Winter', 'Autumn', 'Summer']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Alex Gendler', 'Iseult Gillespie', 'Emma Bryce']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['News', 'News', 'News', 'News', 'News']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 105500000


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 0.08465


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [Marathon (Mix Cut) - Simon O'Shine Mix, Applause; Martha Tilton Returns to Stage - Live]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['News', 'News', 'News']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['candy', 'black tea', 'bacon', 'champagne', 'red wine', 'red apple']
Validation Batch [11], Loss: 11.9090


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Graduation


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [0.1652381569664056, 0.2005428064324122, 0.2215546116855247, 0.2506791678499942]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Together


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: low


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [171000, 129000, 111000, 107000, 106000, 91400]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Private', 'children']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 5504.0


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [500000.0, 360000.0, 300000.0, 300000.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 56332.81720430108


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['CA', 'TX', 'NY', 'FL', 'GA']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 85


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 73


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Crystal Palace


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [50000.0, 100000.0, 10000.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 1315


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [158, 116, 114, 94, 80, 72]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['U.S. Senator', 'Congressman']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Studio Apartment in East Williamsburg', 'Spacious Artist Loft Williamsburg', 'Cute 1 BR in the Lower East Side']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 117386


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 0.0353641396193574


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 2148005.5737827714
Answer: ['6', '6']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Blanco', 'Gris / Plata', 'Negro', 'Azul']
Validation Batch [12], Loss: 12.1830


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Skilled', 'Skilled', 'Unskilled - Resident', 'Highly Skilled', 'Skilled']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False
Answer: [1, 1, 1, 1, 1]
Answer: 0


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Richard Dawkins


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [254.0, 232.5, 225.0, 205.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: sam the sham and the pharaohs


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: TEDxPuget Sound


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Eastern Time (US & Canada)


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Private', 'Self-employed', 'Govt_job']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 2240


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Drama', 'Crime, Drama']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['USAAF', 'USAF', 'RAF', 'US Navy']
Answer: [0, 0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 52247.25135379061


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 1947-12-27 00:00:00


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['bacon', 'peanuts', 'chocolate bar', 'popcorn', 'cookie']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['girl youll be a woman soon', 'papa dont preach', 'breathe']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['65+', '55-64', '45-54', '35-44', '18-24', '25-34']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [15.7, 16.7, 16.7]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 870


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [54, 54, 54, 54, 54]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 0


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 199496.34


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 7043


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [18.0, 18.0, 18.0, 18.0, 18.0, 18.0]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['India', 'USA', 'Western Europe', 'China - Japan - Korea']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Open or Active
Validation Batch [13], Loss: 11.4707


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 4


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ING España


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [72, 72, 72, 72, 72, 72]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Entire rental unit


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: drugstores


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [4, 4, 4, 4, 4]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 906


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 4


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Alton Towers


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Tops', 'Dresses', 'Bottoms', 'Intimate', 'Jackets', 'Trend']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['Virgin America', 'Delta']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Countdown


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [0.263732493, 0.246944219, 0.214965805]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Intamin


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: honey


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True
Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [1744, 1781, 1781]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Los Angeles


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 1965


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['No', 'Yes', 'No phone service']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: ['affection', 'achievement', 'enjoy_the_moment', 'bonding']


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: bacon


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 340


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 372


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [102, 119, 112]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True
Validation Batch [14], Loss: 11.8997


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: 23110


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [1740, 1500, 1228]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: Entire home/apt


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: False


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: [2, 1, 6, 0, 7, 8]


  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Answer: True


  text = normalize_for_match(row[col_index].text)


KeyboardInterrupt: 
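
The `Validation Batch [n], Loss: ...` lines are buried among the printed answers in the output above, which makes the loss trend hard to read. A minimal sketch, assuming the cell output has been captured as a plain string, that extracts those lines and averages them. The helper name `parse_validation_losses` and the sample excerpt are illustrative and not part of the original notebook:

```python
import re
from statistics import mean

# Matches lines such as "Validation Batch [6], Loss: 12.5982"
LOSS_LINE = re.compile(r"Validation Batch \[(\d+)\], Loss: ([0-9.]+)")


def parse_validation_losses(log_text):
    """Return {batch_index: loss} for every validation line found in the log."""
    return {int(m.group(1)): float(m.group(2)) for m in LOSS_LINE.finditer(log_text)}


# Excerpt copied from the output above; a real run would pass the full captured log
log_excerpt = """
Answer: ['Die drei ???', 'Benjamin Blümchen']
Validation Batch [6], Loss: 12.5982
Answer: ['.Net Architect', 'Android Developer']
Validation Batch [7], Loss: 11.6391
Answer: Wood
Validation Batch [8], Loss: 11.9183
"""

losses = parse_validation_losses(log_excerpt)
print(f"batches: {sorted(losses)}, mean loss: {mean(losses.values()):.4f}")
```

Collecting the losses in a list inside the validation loop would avoid parsing text altogether; the regex route is only useful when the printed log is all that remains of an interrupted run.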

In [None]:
from transformers import AutoTokenizer, TapasForQuestionAnswering
import pandas as pd

import torch

# Switch the previously defined custom model to evaluation mode (disables dropout)
model.eval()

# Prepare the input data
table_data = pd.read_parquet(f"hf://datasets/cardiffnlp/databench/data/{semeval_dev_qa['dataset'][1]}/sample.parquet").head(250)
print(table_data)

queries = ['Is the most favorited author mainly communicating in Spanish?']

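# convert_table_to_inputs returns the tokenized encoding (input_ids, attention_mask, ...) and the table
inputs, table = convert_table_to_inputs(table_data, queries)

with torch.no_grad():  # No gradients are needed at inference time
    generated_ids = model.generate_answer(inputs['input_ids'], inputs['attention_mask'])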


# Output result
print("Question:", queries)
print("Generated Answer:", generated_ids)


                     id            author_id               author_name  \
0   1166109737643139072           1007329988                Lau_Arbona   
1   1166096471650963456   868749503461040130             Jimmy McNulty   
2   1166078033691840513           2882602743                  la chula   
3   1166068194299252736           1348393310                  Mr_Gosky   
4   1166066536093814786            867101041                 Angeloclv   
5   1166056669689896960            133189676                       Eva   
6   1166055519024553989            332723994              𝕻𝖆𝖇𝖑𝖔 𝕬𝖗𝖗𝖔𝖞𝖔   
7   1166023559250157570           1072994982                   xAnubis   
8   1165996371750653952            295455234         Unnamed player  💬   
9   1165995541303582720             14524588              Sergio Gómez   
10  1165976106161385474  1094963080948260864                     Astur   
11  1165972146729947137  1144999258023747584                  Don Vito   
12  1165963665725673472            139

  text = normalize_for_match(row[col_index].text)
  cell = row[col_index]


Question: ['Is the most favorited author mainly communicating in Spanish?']
Generated Answer: 


## The model fails to generate an answer
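
One possible explanation, which the output above does not confirm, is that the decoder emits only special tokens (padding or end-of-sequence), so nothing is left once those are stripped during decoding. A minimal sketch for checking this on raw token IDs. The helper `summarize_generated_ids` is hypothetical, and the default special IDs `0` (pad) and `1` (EOS) are an assumption based on the standard T5 vocabulary; they should be checked against `t5_tokenizer.pad_token_id` and `t5_tokenizer.eos_token_id`:

```python
def summarize_generated_ids(token_ids, special_ids=frozenset({0, 1})):
    """Count how many generated tokens are special versus real content."""
    content = [t for t in token_ids if t not in special_ids]
    return {
        "total": len(token_ids),
        "special": len(token_ids) - len(content),
        "content": len(content),
        "only_special": len(content) == 0,
    }


# Hypothetical sequences: one that decodes to an empty string, one that does not
empty_case = summarize_generated_ids([0, 1, 0, 0])
normal_case = summarize_generated_ids([0, 10998, 1])
print(empty_case)
print(normal_case)
```

If `only_special` is true for the failing question, the problem lies in what the decoder receives from the encoder rather than in the decoding step.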

In [None]:
def generate_predictions(qa_dataset, max_rows=100):
    """Generate one answer per question using the full tables (all.parquet)."""
    predictions = []

    for idx, row in enumerate(qa_dataset):
        # Stop once max_rows questions have been processed
        if idx >= max_rows:
            break
        print(f"Row Number: {idx}")
        question = row['question']
        table_key = row['dataset']  # Name of the table this question refers to

        # Only the first 250 rows of the table are passed to the model
        df = pd.read_parquet(f"hf://datasets/cardiffnlp/databench/data/{table_key}/all.parquet").head(250)
        queries = [question]

        inputs, table = convert_table_to_inputs(df, queries)

        with torch.no_grad():  # No gradients are needed at inference time
            # Encode the question and table with TAPAS
            encoder_outputs = tapas_encoder(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])

            # Decode an answer with T5 from the TAPAS encoder states
            generated_ids = t5_model.generate(
                encoder_outputs=encoder_outputs,
                max_length=32,  # Upper bound on the answer length in tokens
                num_beams=5,  # Beam search; set to 1 for greedy decoding
                early_stopping=True
            )

        # Decode the generated token IDs back to text
        predicted_text = t5_tokenizer.decode(generated_ids[0], skip_special_tokens=True)
        print(f"Predicted Text: {predicted_text}")

        predictions.append(predicted_text)

    return predictions


In [None]:
def generate_predictions_sample(qa_dataset, max_rows=100):
    """Generate one answer per question using the reduced tables (sample.parquet)."""
    predictions = []

    for idx, row in enumerate(qa_dataset):
        # Stop once max_rows questions have been processed
        if idx >= max_rows:
            break
        print(f"Row Number: {idx}")
        question = row['question']
        table_key = row['dataset']  # Name of the table this question refers to

        # Only the first 250 rows of the table are passed to the model
        df = pd.read_parquet(f"hf://datasets/cardiffnlp/databench/data/{table_key}/sample.parquet").head(250)
        queries = [question]

        inputs, table = convert_table_to_inputs(df, queries)

        with torch.no_grad():  # No gradients are needed at inference time
            # Encode the question and table with TAPAS
            encoder_outputs = tapas_encoder(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])

            # Decode an answer with T5 from the TAPAS encoder states
            generated_ids = t5_model.generate(
                encoder_outputs=encoder_outputs,
                max_length=32,  # Upper bound on the answer length in tokens
                num_beams=5,  # Beam search; set to 1 for greedy decoding
                early_stopping=True
            )

        # Decode the generated token IDs back to text
        predicted_text = t5_tokenizer.decode(generated_ids[0], skip_special_tokens=True)
        print(f"Predicted Text: {predicted_text}")

        predictions.append(predicted_text)

    return predictions

### SAVE FILES

In [None]:
predictions = generate_predictions(semeval_dev_qa)

In [None]:
predictions_lite = generate_predictions_sample(semeval_dev_qa)

In [None]:
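# UTF-8 is set explicitly because answers can contain non-ASCII text (e.g. 'Diésel')
with open('predictions_lite.txt', 'w', encoding='utf-8') as f:
    for prediction in predictions_lite:
        f.write(f"{prediction}\n")  # Write each prediction on a new line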

In [None]:
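# UTF-8 is set explicitly because answers can contain non-ASCII text (e.g. 'Diésel')
with open('predictions.txt', 'w', encoding='utf-8') as f:
    for prediction in predictions:
        f.write(f"{prediction}\n")  # Write each prediction on a new line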
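
Writing one prediction per line only stays aligned with the question order if no prediction contains a line break of its own. A minimal sketch of a writer that collapses internal whitespace and keeps empty predictions as empty lines. The helper `write_predictions`, the demo data, and the temporary file name are illustrative assumptions, not part of the original notebook:

```python
import tempfile
from pathlib import Path


def write_predictions(predictions, path):
    """Write one prediction per line, collapsing internal whitespace and newlines."""
    lines = [" ".join(str(p).split()) for p in predictions]
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)


# Hypothetical predictions, including an empty string and a multi-line answer
demo_predictions = ["True", "", "Diésel", "first line\nsecond line"]

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "predictions_demo.txt"
    count = write_predictions(demo_predictions, demo_path)
    # Drop the trailing empty string produced by the final newline
    written_lines = demo_path.read_text(encoding="utf-8").split("\n")[:-1]

print(count, written_lines)
```

Keeping the empty line for an empty prediction means line `i` of the file always corresponds to question `i`, which matters when the file is scored against a list of gold answers.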
