<a href="https://colab.research.google.com/github/isikus/qualification-project/blob/master/notebooks/4.%20Model%20reevaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2020 The T5 Authors

Licensed under the Apache License, Version 2.0 (the "License");

In [0]:
# Copyright 2019 The T5 Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

# Model training
In this notebook we reevaluate the most successful model from our research using the last checkpoint. This notebook is based on [this](github/google-research/text-to-text-transfer-transformer/blob/master/notebooks/t5-trivia.ipynb) example notebook from T5 authors.

**Please note the following:**
1. A connection to a Google Cloud Storage bucket is required to train the model.

## Imports and necessary dependencies

In [0]:
!pip install transformers

Collecting transformers
[?25l  Downloading https://files.pythonhosted.org/packages/48/35/ad2c5b1b8f99feaaf9d7cdadaeef261f098c6e1a6a2935d4d07662a6b780/transformers-2.11.0-py3-none-any.whl (674kB)
[K     |████████████████████████████████| 675kB 3.5MB/s 
Collecting sacremoses
[?25l  Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)
[K     |████████████████████████████████| 890kB 16.2MB/s 
Collecting tokenizers==0.7.0
[?25l  Downloading https://files.pythonhosted.org/packages/14/e5/a26eb4716523808bb0a799fcfdceb6ebf77a18169d9591b2f46a9adb87d9/tokenizers-0.7.0-cp36-cp36m-manylinux1_x86_64.whl (3.8MB)
[K     |████████████████████████████████| 3.8MB 40.7MB/s 
[?25hCollecting sentencepiece
[?25l  Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
[K     |███

In [0]:
import os
import re

import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize

import pandas as pd

from transformers import T5Tokenizer

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [0]:
from contextlib import contextmanager, redirect_stderr, redirect_stdout
from os import devnull

@contextmanager
def suppress_stdout_stderr():
    """A context manager that redirects stdout and stderr to devnull"""
    with open(devnull, 'w') as fnull:
        with redirect_stderr(fnull) as err, redirect_stdout(fnull) as out:
            yield (err, out)

In [0]:
import tarfile
from tqdm.auto import tqdm

tqdm.pandas()

In [0]:
tokenizer = T5Tokenizer.from_pretrained('t5-3b')

HBox(children=(FloatProgress(value=0.0, description='Downloading', max=791656.0, style=ProgressStyle(descripti…




## Set Up

<h3><a href="https://cloud.google.com/tpu/"><img valign="middle" src="https://raw.githubusercontent.com/GoogleCloudPlatform/tensorflow-without-a-phd/master/tensorflow-rl-pong/images/tpu-hexagon.png" width="50"></a>  &nbsp;&nbsp;Evaluate on TPU</h3>




   1. Create a Cloud Storage bucket for your data and model checkpoints at http://console.cloud.google.com/storage, and fill in the `BASE_DIR` parameter in the following form. There is a [free tier](https://cloud.google.com/free/) if you do not yet have an account.
 
   1. On the main menu, click Runtime and select **Change runtime type**. Set "TPU" as the hardware accelerator.
   1. Run the following cell and follow instructions to:
    *  Set up a Colab TPU running environment
    *   Verify that you are connected to a TPU device
    *   Upload your credentials to TPU to access your GCS bucket


In [0]:
print("Installing dependencies...")
%tensorflow_version 2.x
!pip install -q t5

import functools
import os
import time
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

import tensorflow.compat.v1 as tf
import tensorflow_datasets as tfds

import t5

BASE_DIR = "gs://ml-bucket-isikus/t5-base-model" #@param { type: "string" }
if not BASE_DIR or BASE_DIR == "gs://":
  raise ValueError("You must enter a BASE_DIR.")
DATA_DIR = os.path.join(BASE_DIR, "data")
MODELS_DIR = os.path.join(BASE_DIR, "models")
ON_CLOUD = True


if ON_CLOUD:
  print("Setting up GCS access...")
  import tensorflow_gcs_config
  from google.colab import auth
  # Set credentials for GCS reading/writing from Colab and TPU.
  TPU_TOPOLOGY = "2x2"
  try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    TPU_ADDRESS = tpu.get_master()
    print('Running on TPU:', TPU_ADDRESS)
  except ValueError:
    raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')
  auth.authenticate_user()
  tf.config.experimental_connect_to_host(TPU_ADDRESS)
  tensorflow_gcs_config.configure_gcs_from_colab_auth()

tf.disable_v2_behavior()

# Improve logging.
from contextlib import contextmanager
import logging as py_logging

if ON_CLOUD:
  tf.get_logger().propagate = False
  py_logging.root.setLevel('INFO')

@contextmanager
def tf_verbosity_level(level):
  og_level = tf.logging.get_verbosity()
  tf.logging.set_verbosity(level)
  yield
  tf.logging.set_verbosity(og_level)

In [0]:
from google.colab import auth
auth.authenticate_user()
project_id = 'better-record'  # @param {"type": "string"}
bucket_name = 'ml-bucket-isikus' # @param {"type": "string"}
model_dir = 't5-base-model' # @param {"type": "string"}
!gcloud config set project {project_id}
!gsutil ls

### Try to add reproducibility

In [0]:
import random
import numpy as np

def set_seed(seed):
  random.seed(seed)
  np.random.seed(seed)
  tf.compat.v1.set_random_seed(seed)

set_seed(42)

## Transfer the checkpoint to your Google Cloud Storage

By default it is assumed that in this document you reevaluate the example produced by out training. Please change the `run` section and optinally delete the following cells if you wish to test your retrained model instead.

In [0]:
!wget https://storage.googleapis.com/ml-bucket-isikus/t5-base-model/models/3B-3b-fuse/model.ckpt-1025200.meta
!wget https://storage.googleapis.com/ml-bucket-isikus/t5-base-model/models/3B-3b-fuse/model.ckpt-1025200.index
!wget https://storage.googleapis.com/ml-bucket-isikus/t5-base-model/models/3B-3b-fuse/model.ckpt-1025200.data-00000-of-00002
!wget https://storage.googleapis.com/ml-bucket-isikus/t5-base-model/models/3B-3b-fuse/model.ckpt-1025200.data-00001-of-00002

In [0]:
!gsutil -m cp -r model.ckpt-1025200.meta gs://{bucket_name}/{model_dir}/models/3B-3b-fuse/model.ckpt-1025200.meta
!gsutil -m cp -r model.ckpt-1025200.index gs://{bucket_name}/{model_dir}/models/3B-3b-fuse/model.ckpt-1025200.index
!gsutil -m cp -r model.ckpt-1025200.data-00000-of-00002 gs://{bucket_name}/{model_dir}/models/3B-3b-fuse/model.ckpt-1025200.data-00000-of-00002
!gsutil -m cp -r model.ckpt-1025200.data-00001-of-00002 gs://{bucket_name}/{model_dir}/models/3B-3b-fuse/model.ckpt-1025200.data-00001-of-00002

## Get the evaluation data

Here we download the test files for a number of competitions and then prepare them to a model-supported TSV format.

In [0]:
!wget https://www.cl.cam.ac.uk/research/nl/bea2019st/data/ABCN.bea19.test.orig 

--2020-06-03 06:45:42--  https://www.cl.cam.ac.uk/research/nl/bea2019st/data/ABCN.bea19.test.orig
Resolving www.cl.cam.ac.uk (www.cl.cam.ac.uk)... 128.232.0.20, 2a05:b400:110::80:14
Connecting to www.cl.cam.ac.uk (www.cl.cam.ac.uk)|128.232.0.20|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 437326 (427K) [text/plain]
Saving to: ‘ABCN.bea19.test.orig’


2020-06-03 06:45:43 (997 KB/s) - ‘ABCN.bea19.test.orig’ saved [437326/437326]



In [0]:
with open("ABCN.bea19.test.orig", "r", encoding="utf-8") as testdf:
  bea = [sent.split(" ") for sent in testdf.read().split("\n")][:-1]
  lenbea = len(bea)

In [0]:
!pip install mosestokenizer

Collecting mosestokenizer
  Downloading https://files.pythonhosted.org/packages/4b/b3/c0af235b16c4f44a2828ef017f7947d1262b2646e440f85c6a2ff26a8c6f/mosestokenizer-1.1.0.tar.gz
Collecting openfile
  Downloading https://files.pythonhosted.org/packages/93/e6/805db6867faacb488b44ba8e0829ef4de151dd0499f3c5da5f4ad11698a7/openfile-0.0.7-py3-none-any.whl
Collecting uctools
  Downloading https://files.pythonhosted.org/packages/63/6e/15f479cb4d1168f07d875be369ffc08fa0f900419f71a379aeb2882a775d/uctools-1.2.1.tar.gz
Collecting toolwrapper
  Downloading https://files.pythonhosted.org/packages/41/7b/34bf8fb69426d8a18bcc61081e9d126f4fcd41c3c832072bef39af1602cd/toolwrapper-2.1.0.tar.gz
Building wheels for collected packages: mosestokenizer, uctools, toolwrapper
  Building wheel for mosestokenizer (setup.py) ... [?25l[?25hdone
  Created wheel for mosestokenizer: filename=mosestokenizer-1.1.0-cp36-none-any.whl size=49120 sha256=865219d729041bb3d8a63bdd856c87c136ea46ef719c2ecb6cc54ff8ed148979
  Stored i

In [0]:
from mosestokenizer import *

with MosesDetokenizer('en') as detokenize:
  general_list = [detokenize(sent) for sent in bea]

In [0]:
!git clone https://github.com/keisks/jfleg
%cd jfleg

Cloning into 'jfleg'...
remote: Enumerating objects: 170, done.[K
remote: Total 170 (delta 0), reused 0 (delta 0), pack-reused 170[K
Receiving objects: 100% (170/170), 777.12 KiB | 5.98 MiB/s, done.
Resolving deltas: 100% (73/73), done.
/content/jfleg


In [0]:
with open("./test/test.src", "r", encoding="utf-8") as testdf:
  jfleg = [sent.split(" ") for sent in testdf.read().split("\n")][:-1]
  lenfleg = len(jfleg)

In [0]:
with MosesDetokenizer('en') as detokenize:
  general_list += [detokenize(sent) for sent in jfleg]

In [0]:
%cd ../

/content


In [0]:
!wget https://www.comp.nus.edu.sg/~nlp/conll14st/conll14st-test-data.tar.gz
!tar -xzf conll14st-test-data.tar.gz

--2020-06-03 06:45:56--  https://www.comp.nus.edu.sg/~nlp/conll14st/conll14st-test-data.tar.gz
Resolving www.comp.nus.edu.sg (www.comp.nus.edu.sg)... 45.60.31.225
Connecting to www.comp.nus.edu.sg (www.comp.nus.edu.sg)|45.60.31.225|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 643482 (628K) [application/x-gzip]
Saving to: ‘conll14st-test-data.tar.gz’


2020-06-03 06:45:59 (281 KB/s) - ‘conll14st-test-data.tar.gz’ saved [643482/643482]



In [0]:
with open("./conll14st-test-data/noalt/official-2014.combined.m2", "r", encoding="utf-8") as testdf:
  conll14 = [sent[2:].split(" ") for sent in testdf.read().split("\n") if sent.startswith("S")]
  lenconll = len(conll14)

In [0]:
with MosesDetokenizer('en') as detokenize:
  general_list += [detokenize(sent) for sent in conll14]

In [0]:
outwrite = ""

for entry in general_list:
  outwrite += entry + "\t" + entry + "\n"

outwrite = outwrite[:-1]

In [0]:
assert len(general_list) == len(outwrite.split("\n"))

In [0]:
with open("testeval.tsv", "w", encoding="utf-8") as outfile:
  outfile.write(outwrite)

In [0]:
!gsutil -m cp -r testeval.tsv gs://{bucket_name}/{model_dir}/data/testeval.tsv

Copying file://testeval.tsv [Content-Type=text/tab-separated-values]...
-
Operation completed over 1 objects/1.3 MiB.                                      


## Create new Tasks and Mixture

In [0]:
import gzip
import json

tsv_path = {
    "train": os.path.join(DATA_DIR, "testeval.tsv"),
    "validation": os.path.join(DATA_DIR, "correct-val.tsv")
}

In [0]:
def corr_dataset_fn(split, shuffle_files=False):
  # We only have one file for each split.
  del shuffle_files

  # Load lines from the text file as examples.
  ds = tf.data.TextLineDataset(tsv_path[split])
  # Split each "<orig_text>\t<corr_text>" example into (orig_text, corr_text) tuple.
  ds = ds.map(
      functools.partial(tf.io.decode_csv, record_defaults=["", ""],
                        field_delim="\t", use_quote_delim=False),
      num_parallel_calls=tf.data.experimental.AUTOTUNE)
  # Map each tuple to a {"orig_text": ... "corr_text": ...} dict.
  ds = ds.map(lambda *ex: dict(zip(["orig_text", "corr_text"], ex)))
  return ds

print("A few raw validation examples...")
for ex in tfds.as_numpy(corr_dataset_fn("validation").take(5)):
  print(ex)

In [0]:
def correction_preprocessor(ds):
  def normalize_text(text):
    """Remove quotes from a TensorFlow string."""
    text = tf.strings.regex_replace(text,"'(.*)'", r"\1")
    return text

  def to_inputs_and_targets(ex):
    """Map {"orig_text": ..., "corr_text": ...}->{"inputs": ..., "targets": ...}."""
    return {
        "inputs":
             tf.strings.join(
                 ["correction: ", normalize_text(ex["orig_text"])]),
        "targets": normalize_text(ex["corr_text"])
    }
  return ds.map(to_inputs_and_targets, 
                num_parallel_calls=tf.data.experimental.AUTOTUNE)

Finally, we put everything together to create a `Task`.

In [0]:
t5.data.TaskRegistry.add(
    "correct_3b",
    # Supply a function which returns a tf.data.Dataset.
    dataset_fn=corr_dataset_fn,
    splits=["train", "validation"],
    # Supply a function which preprocesses text from the tf.data.Dataset.
    text_preprocessor=[correction_preprocessor],
    # Use the same vocabulary that we used for pre-training.
    sentencepiece_model_path=t5.data.DEFAULT_SPM_PATH,
    # Lowercase targets before computing metrics.
    # not needed here as we prepare eval data
    # postprocess_fn=t5.data.postprocessors.lower_text, 
    # We'll use accuracy as our evaluation metric.
    metric_fns=[t5.evaluation.metrics.accuracy,
                t5.evaluation.metrics.bleu,
                t5.evaluation.metrics.rouge]
)

In [0]:
t5.data.MixtureRegistry.remove("correct_3b_all")
t5.data.MixtureRegistry.add(
    "correct_3b_all",
    ["correct_3b"],
     default_rate=1.0
)

## Define the Model
Please provide the name of the model here. If the name of the trained model was `3b-retrain`, then change the `run` string to this. To use the `3b-fuse` pretrained example, leave the `run` string as it is.

In [0]:
run = "3b-fuse"  # @param {"type": "string"}

In [0]:
MODEL_SIZE = "3B" #@param["small", "base", "large", "3B", "11B"]
# Public GCS path for T5 pre-trained model checkpoints
BASE_PRETRAINED_DIR = "gs://t5-data/pretrained_models"
PRETRAINED_DIR = os.path.join(BASE_PRETRAINED_DIR, MODEL_SIZE)
if run not in [None, ""]:
    MODEL_DIR = os.path.join(MODELS_DIR, MODEL_SIZE+"-"+run)
else:
    MODEL_DIR = os.path.join(MODELS_DIR, MODEL_SIZE)

if ON_CLOUD and MODEL_SIZE == "3B":
  tf.logging.warn(
      "The `3B` model is too large to use with the 5GB GCS free tier. "
      "Make sure you have at least 25GB on GCS before continuing."
  )
elif ON_CLOUD and MODEL_SIZE == "11B":
  raise ValueError(
      "The `11B` parameter is too large to fine-tune on the `v2-8` TPU "
      "provided by Colab. Please comment out this Error if you're running "
      "on a larger TPU."
  )

# Set parallelism and batch size to fit on v2-8 TPU (if possible).
# Limit number of checkpoints to fit within 5GB (if possible).
model_parallelism, train_batch_size, keep_checkpoint_max = {
    "small": (1, 128, 16),
    "base": (2, 64, 8),
    "large": (8, 32, 4),
    "3B": (8, 8, 1),
    "11B": (8, 8, 1)}[MODEL_SIZE]

tf.io.gfile.makedirs(MODEL_DIR)
# The models from our paper are based on the Mesh Tensorflow Transformer.
model = t5.models.MtfModel(
    model_dir=MODEL_DIR,
    tpu=TPU_ADDRESS,
    tpu_topology=TPU_TOPOLOGY,
    model_parallelism=model_parallelism,
    batch_size=train_batch_size,
    sequence_length={"inputs": 512, "targets": 512},
    learning_rate_schedule=0.0025,
    save_checkpoints_steps=5000,
    keep_checkpoint_max=keep_checkpoint_max if ON_CLOUD else None,
    iterations_per_loop=100,
)

## Evaluate

We now evaluate on the test inputs. It is a bit unorthodox decision as inferring would be preferrable, but the model is too big to efficiently produce results on CPU, so only TPU evaluation could suit us, that is why we use TPU evaluation for the task.

In [0]:
%%time

# Use a larger batch size for evaluation, which requires less memory.
model.batch_size = train_batch_size * 4
model.eval(
    mixture_or_task_name="correct_3b_all",
    checkpoint_steps=-1  # use latest checkpoint
)

## Get the scores

Now we use the evaluation results to get our scores. Please note that the score for BEA-2019 is calculated via Codalab system, so you have to upload the produced .zip file manually [here](https://competitions.codalab.org/competitions/20229#participate) (in order to do so, please uncomment the corresponding cells).

#### Imports and dependencies

In [0]:
from copy import copy

In [0]:
!pip install spacy==1.9.0
!python -m spacy download -d en_core_web_sm-1.2.0
!python -m spacy link en_core_web_sm en

Collecting spacy==1.9.0
[?25l  Downloading https://files.pythonhosted.org/packages/63/ce/afee53c365617e5f3e58825d71421bce14949a15f7150742d2a7b8859c53/spacy-1.9.0.tar.gz (3.4MB)
[K     |████████████████████████████████| 3.4MB 3.2MB/s 
Collecting murmurhash<0.27,>=0.26
  Downloading https://files.pythonhosted.org/packages/ff/53/1f428861e59c2382e22b8839d03cc315e1a7633a827497b3d389b8d8772d/murmurhash-0.26.4.tar.gz
Collecting cymem<1.32,>=1.30
  Downloading https://files.pythonhosted.org/packages/a5/0f/d29aa68c55db37844c77e7e96143bd96651fd0f4453c9f6ee043ac846b77/cymem-1.31.2-cp36-cp36m-manylinux1_x86_64.whl
Collecting preshed<2.0.0,>=1.0.0
[?25l  Downloading https://files.pythonhosted.org/packages/12/88/57a818051f3d71e800bfb7ba4df56d3ea5793482ef11f1d2109b726f3bac/preshed-1.0.1-cp36-cp36m-manylinux1_x86_64.whl (80kB)
[K     |████████████████████████████████| 81kB 8.3MB/s 
[?25hCollecting thinc<6.6.0,>=6.5.0
[?25l  Downloading https://files.pythonhosted.org/packages/f7/9b/78fab962e0c8b5


    Downloading en_core_web_sm-1.2.0/en_core_web_sm-1.2.0.tar.gz

Collecting https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-1.2.0/en_core_web_sm-1.2.0.tar.gz
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-1.2.0/en_core_web_sm-1.2.0.tar.gz (52.2MB)
[K    100% |████████████████████████████████| 52.2MB 55.4MB/s 
Installing collected packages: en-core-web-sm
  Found existing installation: en-core-web-sm 2.2.5
    Uninstalling en-core-web-sm-2.2.5:
      Successfully uninstalled en-core-web-sm-2.2.5
  Running setup.py install for en-core-web-sm ... [?25ldone
[?25hSuccessfully installed en-core-web-sm-1.2.0

[93m    Linking successful[0m

    /usr/local/lib/python3.6/dist-packages/en_core_web_sm/en_core_web_sm-1.2.0
    --> /usr/local/lib/python3.6/dist-packages/spacy/data/en

    You can now load the model via spacy.load('en').



In [0]:
import spacy

nlp = spacy.load("en", disable=["tagger", "parser", 'ner', 'textcat', 'lemmatizer'])
tokenize = lambda snt: [str(x) for x in nlp(snt)]

#### Download and process the results

In [0]:
os.environ["BUCKET_NAME"] = bucket_name
os.environ["MODEL_DIR"] = model_dir
os.environ["RUN"] = run

In [0]:
!gsutil -m cp -r gs://$BUCKET_NAME/$MODEL_DIR/models/3B-$RUN/validation_eval/correct_3b_1025200_predictions evalres.bin

Copying gs://ml-bucket-isikus/t5-base-model/models/3B-3b-fuse/validation_eval/correct_3b_1025200_predictions...
- [1/1 files][  6.6 MiB/  6.6 MiB] 100% Done                                    
Operation completed over 1 objects/6.6 MiB.                                      


In [0]:
with open("evalres.bin", "rb") as inf:
  lines = inf.read().decode().split("\n")[:-1]

In [0]:
corr_test_nospc = copy(lines[:lenbea])
lenbea == len(corr_test_nospc)

True

In [0]:
corr_jfleg_nospc = copy(lines[lenbea:lenbea + lenfleg])
lenfleg == len(corr_jfleg_nospc)

True

In [0]:
corr_conll14_nospc = copy(lines[lenbea + lenfleg + 4:])
lenconll == len(corr_conll14_nospc)

False

#### Results for BEA-2019
 Our evaluaton results can be found [here](https://competitions.codalab.org/my/competition/submission/626818/detailed_results/).

In [0]:
outstr = ""

for line in corr_test_nospc:
  line = line.replace("\n", " ")
  outstr += " ".join(tokenize(line)) + "\n"

with open("ABCN.bea19.test.corr", "w", encoding="utf-8") as outtest:
  outtest.write(outstr)

In [44]:
!zip bea-test.zip ABCN.bea19.test.corr

  adding: ABCN.bea19.test.corr (deflated 66%)


In [0]:
# from google.colab import files
# files.download("bea-test.zip")

In [0]:
# !cp bea-test.zip /content/gdrive/My\ Drive

#### Results for JFLEG

In [47]:
%cd jfleg

/content/jfleg


In [0]:
outstr = ""

for line in corr_jfleg_nospc:
  line = line.replace("\n", " ")
  outstr += " ".join(tokenize(line)) + "\n"

with open("test.nospc.res", "w", encoding="utf-8") as outtest:
  outtest.write(outstr)

In [0]:
!python ./eval/gleu.py -r ./test/test.ref[0-3] -s ./test/test.src --hyp test.nospc.res

Running GLEU...
test.nospc.res
[['0.532931', '0.007896', '(0.517,0.548)']]


In [50]:
%cd ../

/content


#### Results for CoNLL-14

In [0]:
outstr = ""

for line in corr_conll14_nospc:
  line = line.replace("\n", " ")
  outstr += " ".join(tokenize(line)) + "\n"

with open("conll14_nospc.txt", "w", encoding="utf-8") as outtest:
  outtest.write(outstr)

In [0]:
!wget https://www.comp.nus.edu.sg/~nlp/conll14st/conll14st-test-data.tar.gz
!tar -xzf conll14st-test-data.tar.gz

--2020-06-02 00:11:21--  https://www.comp.nus.edu.sg/~nlp/conll14st/conll14st-test-data.tar.gz
Resolving www.comp.nus.edu.sg (www.comp.nus.edu.sg)... 45.60.31.225
Connecting to www.comp.nus.edu.sg (www.comp.nus.edu.sg)|45.60.31.225|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 643482 (628K) [application/x-gzip]
Saving to: ‘conll14st-test-data.tar.gz’


2020-06-02 00:11:24 (392 KB/s) - ‘conll14st-test-data.tar.gz’ saved [643482/643482]



In [0]:
!wget https://www.comp.nus.edu.sg/~nlp/sw/m2scorer.tar.gz
!tar -xzf m2scorer.tar.gz

--2020-06-02 00:11:47--  https://www.comp.nus.edu.sg/~nlp/sw/m2scorer.tar.gz
Resolving www.comp.nus.edu.sg (www.comp.nus.edu.sg)... 45.60.31.225
Connecting to www.comp.nus.edu.sg (www.comp.nus.edu.sg)|45.60.31.225|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 22836 (22K) [application/x-gzip]
Saving to: ‘m2scorer.tar.gz’


2020-06-02 00:11:48 (82.9 KB/s) - ‘m2scorer.tar.gz’ saved [22836/22836]



In [0]:
!python2 ./m2scorer/scripts/m2scorer.py ./fuse_1025200_conll14_nospc.txt ./conll14st-test-data/noalt/official-2014.combined.m2

Precision   : 0.5836
Recall      : 0.2350
F_0.5       : 0.4501
