<a href="https://colab.research.google.com/github/iliaadam/QA_system_SEC_filling_dataset/blob/main/QA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

I have already created the dataset, so you can skip the cells that follow and start running from the TEST MODEL section.

This function extracts content from an SEC filing page within the 'Item 1' section,
    removes extraneous information, and returns the cleaned content.

    Args:
        content (str): The raw content of an SEC filing page.
        printData (bool, optional): Whether to print the cleaned content. Defaults to False.

    Returns:
        str: The cleaned content, stripped of unnecessary lines and symbols.

    The function performs the following steps:
    1. Extracts content between "Item 1." and the next "Item X." section.
    2. Removes lines with excessive spaces between characters.
    3. Discards lines with text in uppercase only.
    4. Eliminates lines with only symbols.
    5. Retains at most one empty line between paragraphs.

    If content is found, the cleaned content is returned. If no content is found,
    an empty string is returned.

In [None]:
import re
def preprocessContent(content, printData=False):
    # Use a regular expression to extract content between "Item 1.  " and the next "Item X.  " heading
    item_content = re.search(r'Item 1\.\s{2,}(.*?)Item (?!1)\d+\.\s{2,}', content, re.DOTALL)

    if item_content:
        content_between_items = item_content.group(1).strip()

        # Remove lines with more than 3 spaces between characters
        cleaned_content = '\n'.join(line for line in content_between_items.split('\n') if not re.search(r'\s{4,}', line))


        # Remove extra empty lines
        lines = cleaned_content.split('\n')
        cleaned_lines = []
        empty_line_count = 0
        for line in lines:
            if line.strip():
                # Remove lines with text in uppercase only
                if not line.isupper():
                    # Remove lines with only symbols
                    if not re.match(r'^[\W_]+$', line):
                        cleaned_lines.append(line)
                empty_line_count = 0
            else:
                empty_line_count += 1
                # Keep at most one empty line
                if empty_line_count <= 1:
                    cleaned_lines.append(line)

        final_content = '\n'.join(cleaned_lines)

        if printData:
            print(final_content)

        return final_content

    else:
        if printData:
            print("NO CONTENT FOUND")

        return ""


'''
def preprocessContent(content, printData=False):
  # Use regular expressions to extract content between "Item X.  " sections
        items = re.findall(r'Item \d+\.\s{2,}', content) #Item X. plus 2 or more spaces
        item_contents = []

        for i in range(len(items)):
            if i < len(items) - 1:
                start = content.find(items[i]) + len(items[i])
                end = content.find(items[i + 1])
            else:
                start = content.find(items[i]) + len(items[i])
                end = len(content)

            #item_contents.append(content[start:end].strip())
            item_content = content[start:end].strip()

            # Check if the item content contains only "None."
            if not re.search(r'\nNone\.\n', item_content):
                item_contents.append(item_content)


        if printData:
          for item_content in item_contents:
              print("Content: ")
              print(item_content)
              print("=" * 50)  # Add a separator for clarity
'''
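To make the extraction step concrete, here is a minimal sketch that applies the same `Item 1.` pattern used in `preprocessContent` to two made-up filing snippets (the sample strings are invented for illustration, not taken from a real filing). It also shows a side effect worth knowing about: because of the `(?!1)` lookahead, a heading such as `Item 10.` does not end the match, so its text stays inside the extracted section.

```python
import re

# Same pattern as in preprocessContent: "Item 1." followed by 2+ whitespace
# characters, then everything up to the next "Item N." whose number does not
# start with the digit 1.
ITEM_1_PATTERN = re.compile(r'Item 1\.\s{2,}(.*?)Item (?!1)\d+\.\s{2,}', re.DOTALL)

# Invented sample: Item 1 is followed directly by Item 2.
sample = (
    "Item 1.  Business\n"
    "The Company makes widgets.\n"
    "Item 2.  Properties\n"
    "The Company owns a plant.\n"
)
item_1_text = ITEM_1_PATTERN.search(sample).group(1).strip()
print(item_1_text)

# Invented sample: an "Item 10." heading appears before "Item 2.".
# The lookahead (?!1) rejects "Item 10.", so the match runs on to "Item 2.".
sample_with_item_10 = (
    "Item 1.  A\n"
    "Item 10.  B\n"
    "Item 2.  C\n"
)
spill_text = ITEM_1_PATTERN.search(sample_with_item_10).group(1).strip()
print(repr(spill_text))
```

If filings in the dataset can have `Item 10.` through `Item 19.` appear before `Item 2.`, the lookahead may need to be tightened; whether that happens in practice depends on the filings.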




This code is a script for collecting and cleaning financial report documents. It does the following:

1. Reads a CSV file containing URLs to financial reports.
2. Makes HTTP requests to these URLs while setting a specific user agent.
3. Downloads the content of the reports and cleans it using a preprocessContent function.
4. Saves the cleaned content to text files, provided the content is substantial (at least 500 bytes once encoded).
5. Packs these text files into a zip archive called "documents.zip."
6. Tracks the progress and terminates after processing 50 valid documents.

In essence, this code automates the retrieval, cleaning, and archiving of financial report data from the specified URLs.

In [None]:
import requests
import pandas as pd
import zipfile
from io import BytesIO

# Load the CSV file
df = pd.read_csv("1994.QTR2.csv")

# Create a zip file to store the text files
with zipfile.ZipFile("documents.zip", "w") as zipf:
    valid_contents = 0  # Keep track of the number of valid contents

    for i, row in df.iterrows():
        url = row["report_url"]

        # Set your user agent to a descriptive string as recommended by the SEC
        headers = {"User-Agent": "Content-Scraping"}

        try:
            response = requests.get(url, headers=headers)

            if response.status_code == 200:
                # Content of the text file
                content = response.text
                clean_content = preprocessContent(content, False)

                # Keep only substantial content (at least 500 bytes once encoded)
                if len(clean_content.encode()) >= 500:
                    # Save cleaned content to a text file
                    filename = f"document{i + 1}.txt"
                    with open(filename, "w") as f:
                        f.write(clean_content)

                    # Add the text file to the zip archive
                    zipf.write(filename)
                    valid_contents += 1
                    print(valid_contents)
                    if valid_contents == 50:
                        break

            else:
                print(f"Failed to fetch content from URL {url}. Status code:", response.status_code)

        except requests.exceptions.RequestException as e:
            print(f"Request Exception for URL {url}:", e)



(Output: the running count of valid documents, printed one per line from 1 to 50.)
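The filtering and archiving steps in the cell above can also be tried without any network access. The sketch below is one possible variant, not the notebook's method: it writes each text straight into an in-memory zip with `ZipFile.writestr`, so no intermediate `.txt` files are left on disk. The function name `pack_documents`, its parameters, and the sample strings are hypothetical.

```python
import io
import zipfile


def pack_documents(texts, min_bytes=500, limit=50):
    """Return (zip_bytes, kept): an in-memory zip of the texts that pass the size filter."""
    buffer = io.BytesIO()
    kept = 0
    with zipfile.ZipFile(buffer, "w") as zipf:
        for i, text in enumerate(texts):
            # Same size check as the cell above: encoded byte length, not character count.
            if len(text.encode()) < min_bytes:
                continue
            # writestr stores the text directly in the archive.
            zipf.writestr(f"document{i + 1}.txt", text)
            kept += 1
            # Stop once enough valid documents have been collected.
            if kept == limit:
                break
    return buffer.getvalue(), kept


# Hypothetical stand-ins for cleaned filings: one too short, two long enough.
samples = ["too short", "a" * 500, "b" * 600]
zip_bytes, kept = pack_documents(samples)

with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zipf:
    names = zipf.namelist()

print(kept, names)
```

As in the original cell, file names follow the row index, so skipped documents leave gaps in the numbering.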


UPDATE THE ANSWER STARTING VARIABLE



In [None]:
import json

def update_answer_start(data):
    for item in data:
        if not item["qas"][0]["is_impossible"]:
            context = item["context"].lower()
            answer_text = item["qas"][0]["answers"][0]["text"].lower()
            answer_start = context.find(answer_text)
            if answer_start >= 0:
                item["qas"][0]["answers"][0]["answer_start"] = answer_start

    return data

# Load your JSON data
with open("train.json", "r") as file:
    json_data = json.load(file)

# Update answer_start values
updated_data = update_answer_start(json_data)

# Save the updated JSON data
with open("train_fixed.json", "w") as file:
    json.dump(updated_data, file, indent=4)

# Load your JSON data
with open("test.json", "r") as file:
    json_data = json.load(file)

# Update answer_start values
updated_data = update_answer_start(json_data)

# Save the updated JSON data
with open("test_fixed.json", "w") as file:
    json.dump(updated_data, file, indent=4)
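Since `update_answer_start` leaves `answer_start` untouched when the answer text is not found in the context, it can help to check the stored offsets afterwards. The following is a minimal sketch under the assumption that the data follows the SQuAD-style layout used above (`context`, `qas`, `answers`, `text`, `answer_start`, `is_impossible`); the helper name `find_span_mismatches` and the sample records are hypothetical.

```python
def find_span_mismatches(data):
    """Return (item_index, qa_index, answer_text) for answers whose offset does not match the context."""
    mismatches = []
    for item_index, item in enumerate(data):
        context = item["context"]
        for qa_index, qa in enumerate(item["qas"]):
            # Unanswerable questions have no span to verify.
            if qa.get("is_impossible"):
                continue
            for answer in qa["answers"]:
                start = answer["answer_start"]
                text = answer["text"]
                span = context[start:start + len(text)]
                # Compare case-insensitively, mirroring the lowercase search above.
                if span.lower() != text.lower():
                    mismatches.append((item_index, qa_index, text))
    return mismatches


# Invented records: the first offset is correct, the second is not.
sample = [
    {"context": "The Company sold $90 million of notes.",
     "qas": [{"is_impossible": False,
              "answers": [{"text": "$90 million", "answer_start": 17}]}]},
    {"context": "The Company sold $90 million of notes.",
     "qas": [{"is_impossible": False,
              "answers": [{"text": "$90 million", "answer_start": 0}]}]},
]
mismatches = find_span_mismatches(sample)
print(mismatches)
```

Unlike `update_answer_start`, this sketch walks every question and every answer rather than only `qas[0]["answers"][0]`. Any record it reports would need its answer text or offset corrected by hand.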


TEST MODEL

In [1]:
!pip install simpletransformers

Collecting simpletransformers
  Downloading simpletransformers-0.64.3-py3-none-any.whl (250 kB)
Collecting transformers>=4.31.0 (from simpletransformers)
  Downloading transformers-4.34.0-py3-none-any.whl (7.7 MB)
Collecting datasets (from simpletransformers)
  Downloading datasets-2.14.5-py3-none-any.whl (519 kB)
Collecting seqeval (from simpletransformers)
  Downloading seqeval-1.2.2.tar.gz (43 kB)
  Preparing metadata (setup.py) ... done
Collecting tokenizers (from simpletransformers)

In [2]:
import json
with open(r"train_fixed.json", "r") as read_file:
    train = json.load(read_file)

In [None]:
train

In [3]:
with open(r"test_fixed.json", "r") as read_file:
    test = json.load(read_file)

In [4]:
import logging
from simpletransformers.question_answering import QuestionAnsweringModel, QuestionAnsweringArgs

import nltk
nltk.download('punkt')
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk import word_tokenize
from sklearn.model_selection import train_test_split

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [5]:
def createModel(model_name, model_type, epochs=50):
  ### Advanced Methodology
  train_args = {
      "reprocess_input_data": True,
      "overwrite_output_dir": True,
      "use_cached_eval_features": True,
      "output_dir": f"outputs/{model_type}",
      "best_model_dir": f"outputs/{model_type}/best_model",
      "evaluate_during_training": True,
      "max_seq_length": 128,
      "num_train_epochs": epochs,
      "evaluate_during_training_steps": 1000,
      "wandb_project": "Question Answer Application",
      "wandb_kwargs": {"name": model_name},
      "save_model_every_epoch": False,
      "save_eval_checkpoints": False,
      "n_best_size":3,
      # "use_early_stopping": True,
      # "early_stopping_metric": "mcc",
      # "n_gpu": 2,
      # "manual_seed": 4,
      # "use_multiprocessing": False,
      "train_batch_size": 16,
      "eval_batch_size": 8,
      #"dropout": 0.4
  }
  ### Remove output folder
  !rm -rf outputs

  model = QuestionAnsweringModel(
    model_type,model_name, args=train_args, use_cuda=False
  )

  return model

def evaluation(model, validation_data):
  # Evaluate the model
  result, texts = model.eval_model(validation_data)

  print("model's result: ", result)
  print("----------------------------------------------------")
  print("model's predictions : ")
  print(texts)

  # Calculate the overall BLEU score

  # Collect ground-truth and predicted answers from 'correct_text', 'similar_text', and 'incorrect_text'
  truths = []
  predictions = []

  correct_text = texts['correct_text']
  similar_text = texts['similar_text']
  incorrect_text = texts['incorrect_text']

  # Create a SmoothingFunction
  smoother = SmoothingFunction()

  for qid, qa in correct_text.items():
    truths.append(qa)
    predictions.append(qa)

  for qid, qa in similar_text.items():
    truths.append(qa['truth'])
    predictions.append(qa['predicted'])

  for qid, qa in incorrect_text.items():
    truths.append(qa['truth'])
    predictions.append(qa['predicted'])

  # Calculate BLEU scores with smoothing
  bleu_scores = []
  for i in range(len(truths)):
    if truths[i] == predictions[i]:
      bleu_scores.append(100.0)
    else:
      truth = word_tokenize(truths[i])  # Tokenize the ground truth
      prediction = word_tokenize(predictions[i])  # Tokenize the prediction
      print(truths[i], " : ",predictions[i])
      score = sentence_bleu([truth], prediction, weights=(1,0,0,0), smoothing_function=smoother.method1)
      bleu_scores.append(score * 100)

  # Calculate the average BLEU score
  average_bleu_score = sum(bleu_scores) / len(bleu_scores)

  print("Average BLEU Score:", average_bleu_score)
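With `weights=(1,0,0,0)`, `sentence_bleu` scores unigram overlap only. The sketch below is a plain-Python approximation of that idea (clipped unigram precision multiplied by the brevity penalty), not NLTK's implementation: it omits smoothing and splits on whitespace instead of calling `word_tokenize`, so its numbers can differ slightly from those of `evaluation`. The function name `unigram_bleu` is hypothetical; the example pair is one of the "similar" predictions printed in the evaluation output further down.

```python
import math
from collections import Counter


def unigram_bleu(reference_tokens, hypothesis_tokens):
    """Clipped unigram precision times the brevity penalty (no smoothing)."""
    if not hypothesis_tokens:
        return 0.0
    ref_counts = Counter(reference_tokens)
    hyp_counts = Counter(hypothesis_tokens)
    # Each predicted token is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[token]) for token, count in hyp_counts.items())
    precision = clipped / len(hypothesis_tokens)
    # Brevity penalty: predictions no longer than the reference are penalised.
    if len(hypothesis_tokens) > len(reference_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference_tokens) / len(hypothesis_tokens))
    return brevity_penalty * precision


truth = "office infection control"
predicted = "Company markets the VITAL DEFENSE line of office infection control products."

# 3 of the 11 whitespace-separated predicted tokens appear in the truth.
score = unigram_bleu(truth.split(), predicted.split())
print(round(score * 100, 2))
```

The example illustrates why an over-long prediction that contains the full answer still scores low: every extra token lowers the precision.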


Testing the BERT model

In [6]:
model_type="bert"
model_name= "bert-base-cased"
model = createModel(model_name, model_type)
model.train_model(train, eval_data=test)


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Could not find answer: 'Earnings from continuing operations before Earnings before cumulative effect of Cumulative effect of accounting changes Earnings per common share: Primary - Earnings from continuing operations Fully - Earnings from continuing operations' vs. 'Earnings from continuing operations, Earnings before cumulative effect of, Cumulative effect of accounting changes, Earnings per common share: Primary - Earnings from continuing operations, Fully - Earnings from continuing operations'
convert squad examples to features: 100%|██████████| 64/64 [00:00<00:00, 64.89it/s]
add example index and unique id: 100%|██████████| 64/64 [00:00<00:00, 261123.98it/s]



convert squad examples to features: 100%|██████████| 10/10 [00:00<00:00, 119.22it/s]

add example index and unique id: 100%|██████████| 10/10 [00:00<00:00, 9404.27it/s]



(200,
 {'global_step': [4, 8, 12, 16, 20, 24, 28, 32, 36, 40,
                  44, 48, 52, 56, 60, 64, 68, 72, 76, 80,
                  84, 88, 92, 96, 100, 104, 108, 112, 116, 120,
                  124, 128, 132, 136, 140, 144, 148, 152, 156, 160,
                  164, 168, 172, 176, 180, 184, 188, 192, 196, 200],
  'correct': [0, 0, 0, 2, 2, 3, 4, 4, 3, 3,
              3, 2, 2, 2, 2, 2, 2, 2, 2, 1,
              2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
              2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
              2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
  'similar': [7, 9, 9, 6, 6, 6, 5, 5, 5, 6,
              6, 5, 7, 6, 8, 6, 6, 7, 7, 9,
              7, 7, 8, 5, 6, 7, 6, 6, 6, 5,
              4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
              4, 4, 4, 4, 4, 4, 4,

In [7]:
evaluation(model, test)


model's result:  {'correct': 2, 'similar': 4, 'incorrect': 4, 'eval_loss': -5.836533308029175}
----------------------------------------------------
model's predictions : 
{'correct_text': {'00002': 'knitwear items of various kinds for work and casual wear such as sweatshirts, jogging suits, hooded jackets, headwear, and T-shirts', '000010': '$90 million'}, 'similar_text': {'00001': {'truth': 'dental products, including consumer oral hygiene and professional dental products; consumer products, including proprietary over-the-counter products and household products; and ethical pharmaceuticals.', 'predicted': '', 'question': 'What are the three categories of products developed by Block Drug Company, Inc.?'}, '00005': {'truth': 'office infection control', 'predicted': 'Company markets the VITAL DEFENSE line of office infection control products.', 'question': 'What is the focus of the VITAL DEFENSE line of products?'}, '00006': {'truth': '', 'predicted': 'The Company continues to expand its

In [7]:
model_type="bert"
model_name= "bert-base-cased"

# Split the data into training and validation sets
train_data, validation_data = train_test_split(train, test_size=0.2, random_state=42)
model = createModel(model_name, model_type,25)
model.train_model(train_data, eval_data=validation_data)

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
convert squad examples to features: 100%|██████████| 51/51 [00:00<00:00, 272.89it/s]
add example index and unique id: 100%|██████████| 51/51 [00:00<00:00, 33020.92it/s]



wandb run history:
correct,▁▅▆█▆██▆███▆
eval_loss,███▇▆▅▄▄▃▂▂▁
global_step,▁▂▂▃▄▄▅▅▆▇▇█
incorrect,█▄▄▂▅▅▅▄▁▂▄▅
similar,▆█▆▆▃▁▁▆█▆▃▃
train_loss,██▇▆▅▅▄▃▃▂▁▁

wandb run summary:
correct,3.0
eval_loss,-4.15215
global_step,36.0
incorrect,5.0
similar,8.0
train_loss,0.38502




convert squad examples to features: 100%|██████████| 13/13 [00:00<00:00, 219.16it/s]

add example index and unique id: 100%|██████████| 13/13 [00:00<00:00, 9613.18it/s]



(100,
 {'global_step': [4, 8, 12, 16, 20, 24, 28, 32, 36, 40,
                  44, 48, 52, 56, 60, 64, 68, 72, 76, 80,
                  84, 88, 92, 96, 100],
  'correct': [1, 2, 1, 1, 1, 2, 3, 2, 2, 2,
              2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
              2, 2, 2, 2, 2],
  'similar': [8, 10, 9, 7, 10, 10, 9, 7, 8, 6,
              6, 6, 7, 6, 6, 7, 7, 7, 6, 6,
              6, 7, 7, 7, 7],
  'incorrect': [4, 1, 3, 5, 2, 1, 1, 4, 3, 5,
                5, 5, 4, 5, 5, 4, 4, 4, 5, 5,
                5, 4, 4, 4, 4],
  'train_loss': [4.789034843444824,
                 3.8798141479492188,
                 3.5680837631225586,
                 2.63498592376709,
                 2.282599687576294,
                 1.1071667671203613,
                 1.4720226526260376,
                 0.7070402503013611,
                 0.5271852612495422,
                 0.45778408646583557,
                 0.202438086271286,
                 0.14995595812797546,
                 0.0676170065999031,
   

In [8]:
evaluation(model, test)

convert squad examples to features: 100%|██████████| 10/10 [00:00<00:00, 137.66it/s]
add example index and unique id: 100%|██████████| 10/10 [00:00<00:00, 24230.53it/s]



model's result:  {'correct': 4, 'similar': 4, 'incorrect': 2, 'eval_loss': -5.184555768966675}
----------------------------------------------------
model's predictions : 
{'correct_text': {'00002': 'knitwear items of various kinds for work and casual wear such as sweatshirts, jogging suits, hooded jackets, headwear, and T-shirts', '00006': '', '00007': '', '000010': '$90 million'}, 'similar_text': {'00001': {'truth': 'dental products, including consumer oral hygiene and professional dental products; consumer products, including proprietary over-the-counter products and household products; and ethical pharmaceuticals.', 'predicted': '', 'question': 'What are the three categories of products developed by Block Drug Company, Inc.?'}, '00004': {'truth': 'NYTOL Sleep-Aid Tablets, TEGRIN Medicated Shampoos and BC Headache Powder.', 'predicted': '', 'question': "What are the three well-known consumer brand names included in the Company's personal care products line?"}, '00005': {'truth': 'off

In [8]:
model_type="electra"
model_name= "google/electra-base-discriminator"
model = createModel(model_name, model_type)
model.train_model(train, eval_data=test)


Some weights of ElectraForQuestionAnswering were not initialized from the model checkpoint at google/electra-base-discriminator and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Could not find answer: 'Earnings from continuing operations before Earnings before cumulative effect of Cumulative effect of accounting changes Earnings per common share: Primary - Earnings from continuing operations Fully - Earnings from continuing operations' vs. 'Earnings from continuing operations, Earnings before cumulative effect of, Cumulative effect of accounting changes, Earnings per common share: Primary - Earnings from continuing operations, Fully - Earnings from continuing operations'
convert squad examples to features: 100%|██████████| 64/64 [00:00<00:00, 231.66it/s]
add example index and unique id: 100%|██████████| 64/64 [00:00<00:00, 130689.12it/s]



wandb run history:
Training loss,█▁▁▁
correct,▁▁▁▅▆██▆▆▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅
eval_loss,████▇▆▅▅▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
global_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
incorrect,▆▃▃▅▃▃▃▅▃▆▃▅▅▅▃▃▃▃▁▆▃▅▅▅████████████████
lr,█▆▃▁
similar,▅██▄▄▂▂▂▄▂▅▄▄▄▅▅▅▅▇▂▅▄▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train_loss,██▇▆▄▃▃▂▂▁▁▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

wandb run summary:
Training loss,0.00078
correct,2.0
eval_loss,-5.83653
global_step,200.0
incorrect,4.0
lr,0.0
similar,4.0
train_loss,0.00078




convert squad examples to features: 100%|██████████| 10/10 [00:00<00:00, 200.78it/s]

add example index and unique id: 100%|██████████| 10/10 [00:00<00:00, 27235.74it/s]


Running Evaluation:   0%|          | 0/2 [00:00<?, ?it/s]

(200,
 {'global_step': [4, 8, 12, 16, 20, 24, 28, 32, 36, 40,
                  44, 48, 52, 56, 60, 64, 68, 72, 76, 80,
                  84, 88, 92, 96, 100, 104, 108, 112, 116, 120,
                  124, 128, 132, 136, 140, 144, 148, 152, 156, 160,
                  164, 168, 172, 176, 180, 184, 188, 192, 196, 200],
  'correct': [1, 1, 3, 0, 0, 1, 1, 2, 2, 2,
              3, 2, 3, 2, 2, 2, 3, 3, 3, 3,
              4, 3, 3, 3, 3, 3, 3, 4, 4, 4,
              3, 3, 3, 3, 3, 1, 3, 3, 3, 3,
              3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
  'similar': [4, 5, 5, 9, 8, 9, 9, 7, 7, 7,
              6, 7, 6, 7, 7, 7, 6, 6, 6, 6,
              5, 6, 6, 5, 5, 5, 5, 3, 4, 3,
              5, 5, 5, 5, 4, 6, 4, 5, 7, 5,
              5, 4, 4, 4, 4, 4, 4,

In [9]:
evaluation(model, test)

Running Evaluation:   0%|          | 0/2 [00:00<?, ?it/s]

model's result:  {'correct': 3, 'similar': 4, 'incorrect': 3, 'eval_loss': -4.372351408004761}
----------------------------------------------------
model's predictions : 
{'correct_text': {'00006': '', '00007': '', '000010': '$90 million'}, 'similar_text': {'00002': {'truth': 'knitwear items of various kinds for work and casual wear such as sweatshirts, jogging suits, hooded jackets, headwear, and T-shirts', 'predicted': 'knitwear items of various kinds for work and casual wear such as sweatshirts, jogging suits, hooded jackets, headwear, and T-shirts.', 'question': 'What are the principal products of Tultex Corporation?'}, '00003': {'truth': 'why it should not refund to its customers $2,300,000', 'predicted': '$2,300,000', 'question': 'What the PSC requires the Company to show?'}, '00004': {'truth': 'NYTOL Sleep-Aid Tablets, TEGRIN Medicated Shampoos and BC Headache Powder.', 'predicted': "Included in the Company's personal care products line are three well-known consumer brand names:
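
The three counts in the "model's result" line are easier to compare across runs as fractions of the evaluated examples. The sketch below is one hedged way to do that. `summarize_result` is a hypothetical helper, not part of the notebook, and the reading of `similar` as a partial match is only inferred from the predictions printed above (for example, a predicted '$2,300,000' against a longer ground-truth span).

```python
def summarize_result(result):
    """Convert evaluation counts into fractions of all evaluated examples.

    Hypothetical helper: it only assumes the dict has the integer keys
    'correct', 'similar' and 'incorrect', as in the printed result above.
    """
    keys = ("correct", "similar", "incorrect")
    total = sum(result[k] for k in keys)
    if total == 0:
        # Nothing was evaluated, so avoid dividing by zero.
        return {k: 0.0 for k in keys}
    return {k: result[k] / total for k in keys}


# Counts copied from the "model's result" line above.
result = {"correct": 3, "similar": 4, "incorrect": 3,
          "eval_loss": -4.372351408004761}
rates = summarize_result(result)
print(rates)
```

With only 10 test examples, each answer shifts a rate by 10 percentage points, so small differences between runs should be read with caution.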

In [6]:
model_type="electra"
model_name= "google/electra-base-discriminator"
# Split the data into training and validation sets
train_data, validation_data = train_test_split(train, test_size=0.2, random_state=42)
model = createModel(model_name, model_type, 25)
model.train_model(train_data, eval_data=validation_data)

Downloading (…)lve/main/config.json:   0%|          | 0.00/666 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/440M [00:00<?, ?B/s]

Some weights of ElectraForQuestionAnswering were not initialized from the model checkpoint at google/electra-base-discriminator and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Downloading (…)okenizer_config.json:   0%|          | 0.00/27.0 [00:00<?, ?B/s]

Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

convert squad examples to features: 100%|██████████| 51/51 [00:00<00:00, 78.50it/s]
add example index and unique id: 100%|██████████| 51/51 [00:00<00:00, 198800.65it/s]


Epoch:   0%|          | 0/25 [00:00<?, ?it/s]

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ··········

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Running Epoch 0 of 25:   0%|          | 0/4 [00:00<?, ?it/s]


convert squad examples to features: 100%|██████████| 13/13 [00:00<00:00, 48.75it/s]

add example index and unique id: 100%|██████████| 13/13 [00:00<00:00, 24528.09it/s]


Running Evaluation:   0%|          | 0/2 [00:00<?, ?it/s]

(100,
 {'global_step': [4, 8, 12, 16, 20, 24, 28, 32, 36, 40,
                  44, 48, 52, 56, 60, 64, 68, 72, 76, 80,
                  84, 88, 92, 96, 100],
  'correct': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
              1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
              1, 1, 1, 1, 1],
  'similar': [4, 10, 11, 9, 9, 9, 9, 9, 9, 9,
              9, 8, 11, 10, 8, 8, 8, 8, 8, 8,
              8, 7, 7, 7, 7],
  'incorrect': [8, 2, 1, 3, 3, 3, 3, 3, 3, 3,
                3, 4, 1, 2, 4, 4, 4, 4, 4, 4,
                4, 5, 5, 5, 5],
  'train_loss': [4.796840190887451,
   4.363395690917969,
   3.8628439903259277,
   3.6087985038757324,
   3.0655975341796875,
   2.6099085807800293,
   3.0518746376037598,
   1.791245460510254,
   2.0983541011810303,
   2.069085121154785,
   1.6603196859359741,
   1.043884515762329,
   1.4096081256866455,
   1
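
The value returned by `train_model` above is a tuple of the final global step and a dict of per-evaluation lists, so the validation counts can be scanned for the evaluation with the fewest incorrect answers. The sketch below is a minimal illustration under that assumption. `best_step` is a hypothetical helper that is not in the notebook, and the lists are copied from the complete `global_step`, `correct`, `similar` and `incorrect` entries of this 25-epoch run.

```python
# Per-evaluation history copied from the output above (25 epochs, one
# evaluation every 4 steps on the 13 validation examples).
history = {
    "global_step": list(range(4, 101, 4)),
    "correct": [1] * 25,
    "similar": [4, 10, 11, 9, 9, 9, 9, 9, 9, 9,
                9, 8, 11, 10, 8, 8, 8, 8, 8, 8,
                8, 7, 7, 7, 7],
    "incorrect": [8, 2, 1, 3, 3, 3, 3, 3, 3, 3,
                  3, 4, 1, 2, 4, 4, 4, 4, 4, 4,
                  4, 5, 5, 5, 5],
}


def best_step(history):
    """Return (global_step, incorrect) for the evaluation with the fewest
    incorrect answers. Ties go to the earliest evaluation.

    Hypothetical helper, shown only to illustrate reading the history dict.
    """
    incorrect = history["incorrect"]
    idx = min(range(len(incorrect)), key=incorrect.__getitem__)
    return history["global_step"][idx], incorrect[idx]


step, incorrect = best_step(history)
print(step, incorrect)
```

On this run the `incorrect` count reaches its minimum early and rises again in later epochs while `correct` stays at 1, which suggests that fewer epochs or early stopping may be worth trying on a dataset this small.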

In [7]:
evaluation(model, test)

convert squad examples to features: 100%|██████████| 10/10 [00:00<00:00, 235.54it/s]
add example index and unique id: 100%|██████████| 10/10 [00:00<00:00, 8559.80it/s]


Running Evaluation:   0%|          | 0/2 [00:00<?, ?it/s]

model's result:  {'correct': 4, 'similar': 6, 'incorrect': 0, 'eval_loss': -3.3036656379699707}
----------------------------------------------------
model's predictions : 
{'correct_text': {'00006': '', '00007': '', '00008': '$3.6 million', '000010': '$90 million'}, 'similar_text': {'00001': {'truth': 'dental products, including consumer oral hygiene and professional dental products; consumer products, including proprietary over-the-counter products and household products; and ethical pharmaceuticals.', 'predicted': '', 'question': 'What are the three categories of products developed by Block Drug Company, Inc.?'}, '00002': {'truth': 'knitwear items of various kinds for work and casual wear such as sweatshirts, jogging suits, hooded jackets, headwear, and T-shirts', 'predicted': 'knitwear items of various kinds for work and casual wear such as sweatshirts, jogging suits, hooded jackets, headwear, and T-shirts.', 'question': 'What are the principal products of Tultex Corporation?'}, '00