Fix QA task preprocessing #19

RKorzeniowski · 2020-11-08T08:58:50Z

Hi,
very cool lib. Just wanted to say that the `pre_process_squad` function does not work correctly when following the docs. There are two problems when the nlp package is used like this: `nlp.load_dataset('squad_v2')`.

- Column names differ, to be exact "answers" and "answer_text".
- Answers are given in dict(list(str)) format, but the tokenization that sets the start and end token targets works as if it were dict(str). This ends up setting all targets to (0, 0).
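To illustrate the second point, here is a minimal sketch of unpacking the dict(list(str)) answer format before computing targets. It is not the library's code or the eventual fix: `flatten_answer` and the sample records are hypothetical, and it assumes the `answers` column holds parallel `text` and `answer_start` lists that are empty for unanswerable questions.

```python
def flatten_answer(example):
    """Return (answer_text, start_char, end_char) for the first listed answer.

    Hypothetical helper: assumes example["answers"] looks like
    {"text": [...], "answer_start": [...]}. Empty lists (an unanswerable
    question) map to ("", 0, 0).
    """
    answers = example["answers"]
    texts, starts = answers["text"], answers["answer_start"]
    if not texts:
        return "", 0, 0
    text, start = texts[0], int(starts[0])
    return text, start, start + len(text)


# Made-up records that mimic the nested layout described above.
CONTEXT = "Blurr integrates fastai with Hugging Face transformers."
ANSWER = "Hugging Face transformers"

answerable = {
    "context": CONTEXT,
    "question": "What does Blurr integrate fastai with?",
    "answers": {"text": [ANSWER], "answer_start": [CONTEXT.index(ANSWER)]},
}
unanswerable = {
    "context": CONTEXT,
    "question": "Who wrote Blurr?",
    "answers": {"text": [], "answer_start": []},
}

text, start, end = flatten_answer(answerable)
# The character span should recover the answer text from the context.
recovered = CONTEXT[start:end]
empty_result = flatten_answer(unanswerable)
```

Treating the `answers` value as a plain string instead of indexing into the lists is one way every target could collapse to (0, 0), since the answer text would never be found in the context.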

I had to fix that for my use case, so if you want I can make a PR with the fixes. Let me know if there is anything I should do first, like running tests.


ohmeow · 2020-11-08T23:09:35Z

Yah, if you want to make a PR, go for it. The project is built on nbdev, so the process for developing and submitting PRs is the same as for libraries like fastai. See https://docs.fast.ai/dev-setup. In particular, make sure you run `nbdev_install_git_hooks` right after you git clone the library. If you want to add some tests, that would be great too. Check out the nbdev docs for how to do that and how to work on any project based on it: https://nbdev.fast.ai/. Thanks, and lmk if you have any questions.

-wg

ohmeow · 2020-12-26T20:41:51Z

I think this is fixed now so I'm closing it out. If you're still seeing issues, feel free to reopen.

RKorzeniowski changed the title ~~Fix QA~~ Fix QA task preprocessing Nov 8, 2020

ohmeow closed this as completed Dec 26, 2020

ocm248 mentioned this issue Jul 9, 2021

QA task preprocessing #44

Closed
