Incorrect pretraining data format for Factual Adapter #2

theblackcat102 · 2021-01-05T00:42:34Z

I followed the code here and generated all three TSV files under DisExtract/data/books/ALL18_2019jan02_[valid, train, test].tsv. However, their format does not match the JSON format required to run pretraining for the Factual Adapter.

After running python producer.py, each generated TSV file has the following format:

[Sentence 1]\t[Sentence 2]\t[Marker]
...
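Before converting anything, it can help to confirm that every line really has this shape. The following is a minimal Python sketch that assumes exactly three tab-separated columns per line and no quoting. `read_pairs` and `Example` are hypothetical helpers, not part of the repository.

```python
import csv
import io
from collections import Counter
from typing import Iterator, NamedTuple, TextIO


class Example(NamedTuple):
    # Hypothetical container for one TSV row: [Sentence 1]\t[Sentence 2]\t[Marker]
    sentence1: str
    sentence2: str
    marker: str


def read_pairs(stream: TextIO) -> Iterator[Example]:
    """Yield one Example per non-blank line of a 3-column TSV stream.

    Assumes no quoting, so literal quote characters inside sentences are kept.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        yield Example(*row)


if __name__ == "__main__":
    sample = "We stayed in.\tIt rained.\tbecause\nI was tired.\tI went to bed.\tso\n"
    examples = list(read_pairs(io.StringIO(sample)))
    # Count how often each discourse marker occurs in the sample.
    print(Counter(ex.marker for ex in examples))
```

A real file can be checked by passing `open(path, encoding="utf-8", newline="")` instead of the in-memory sample. Any line with a different column count raises immediately.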

The required JSON format is as follows:

{ "sent" : "Sentence 1", "tokens": "sentence 2", "pairs" : [ ... ] }
...

Is there a conversion script that converts the generated TSV files to this JSON format?
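Until there is an official answer, a rough stopgap could look like the sketch below. It is a hypothetical script (the name `tsv_to_jsonl.py` is only illustrative) and assumes the target file holds one JSON object per line. The mapping in `to_record` is also an assumption. It copies Sentence 1 into `sent` and Sentence 2 into `tokens`, following the example above. It leaves `pairs` empty because its expected contents are unknown. It keeps the marker under an extra `marker` key so no information is lost. Adjust it once the real schema is confirmed.

```python
import csv
import io
import json
import sys
from typing import Dict, TextIO


def to_record(sentence1: str, sentence2: str, marker: str) -> Dict[str, object]:
    # HYPOTHETICAL mapping: key names come from the format quoted in this issue,
    # but how the TSV columns should populate them (especially "pairs") is unknown.
    return {"sent": sentence1, "tokens": sentence2, "pairs": [], "marker": marker}


def convert(src: TextIO, dst: TextIO) -> int:
    """Convert 3-column TSV rows to JSON Lines; return the number of records written."""
    count = 0
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        dst.write(json.dumps(to_record(*row), ensure_ascii=False) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Usage: python tsv_to_jsonl.py ALL18_2019jan02_train.tsv train.json
        with open(sys.argv[1], encoding="utf-8", newline="") as src, \
                open(sys.argv[2], "w", encoding="utf-8") as dst:
            print(f"wrote {convert(src, dst)} records")
    else:
        # No paths given: convert a tiny in-memory sample instead.
        out = io.StringIO()
        convert(io.StringIO("We stayed in.\tIt rained.\tbecause\n"), out)
        print(out.getvalue(), end="")
```

The script runs once per split (valid, train, test). If the pretraining code instead expects a single JSON array, the output would need to be wrapped accordingly.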


ttliu-kiwi · 2022-12-14T14:17:33Z

+1

theblackcat102 changed the title ~~In consistency in pretraining data format for Factual Adapter~~ Incorrect pretraining data format for Factual Adapter Jan 5, 2021
