Incorrect pretraining data format for Factual Adapter #2

Open
theblackcat102 opened this issue Jan 5, 2021 · 1 comment


theblackcat102 commented Jan 5, 2021

I followed the code here and generated all three TSV files under DisExtract/data/books/ALL18_2019jan02_[valid, train, test].tsv. However, the TSV format does not match the JSON format required to run pretraining for the Factual Adapter.

Each TSV file generated by running python producer.py has the following format:

[Sentence 1]\t[Sentence 2]\t[Marker]
...
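
For reference, here is a minimal sketch of reading this three-column layout in Python. It assumes one sentence pair per line, no header row, and no quoting. The helper name `read_marker_rows` and the sample line are made up for illustration and are not part of DisExtract. It only parses the TSV; it does not attempt the JSON conversion.

```python
import csv
import io


def read_marker_rows(handle):
    """Yield (sentence_1, sentence_2, marker) tuples from a tab-separated stream."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        yield tuple(row)


# Hypothetical sample line, used only to show the expected shape.
sample = io.StringIO("I stayed home\tit was raining\tbecause\n")
rows = list(read_marker_rows(sample))
```

For a real file, pass an open file handle (opened with `newline=""` and `encoding="utf-8"`) instead of the `io.StringIO` sample.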

The required JSON file format is as follows:

{ "sent" : "Sentence 1", "tokens": "sentence 2", "pairs" : [ ... ] }
...

Is there a conversion script that converts the generated TSV format to JSON?

theblackcat102 changed the title from "In consistency in pretraining data format for Factual Adapter" to "Incorrect pretraining data format for Factual Adapter" on Jan 5, 2021
@ttliu-kiwi

+1
