Rationalise Scripts and Run Final Experiments #15

j6mes · 2017-12-05T11:40:26Z

To run

- MLP: Train on FNC, Evaluate on FNC, Evaluate on FEVER 3 way
- MLP: Train on FEVER with sampled negative pages, Test
- MLP: Train on FEVER with IR negative pages, Test
- DR: Final score for recall/precision/MRR
- DR: Score using Oracle RTE component
- RTE: Pre-trained model, evaluate on FEVER
- RTE: Train on FEVER bodies, evaluate on FEVER

Extra:

- BiDAF: Precision/Recall of pretrained model
- BiDAF: FEVER Accuracy using pretrained model on DRQA Pages
- RTE: Train on BiDAF retrieved model: evaluate P/R of BiDAF. Evaluate FEVER score
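
For reference, the DR scores named in the list (recall, precision, MRR) could be computed roughly as below. This is only a sketch, not the project's evaluation script: the function name `retrieval_scores` and the input format (claim id mapped to a ranked list of page ids, and claim id mapped to a set of gold page ids) are assumptions made for illustration, and scores are macro-averaged over claims that have at least one gold page.

```python
from typing import Dict, List, Set


def retrieval_scores(
    ranked: Dict[str, List[str]],
    gold: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Macro-averaged precision, recall and MRR over claims.

    ranked: claim id -> retrieved page ids, best first (assumed format).
    gold:   claim id -> set of gold evidence page ids (assumed format).
    """
    precisions, recalls, reciprocal_ranks = [], [], []
    for claim_id, relevant in gold.items():
        if not relevant:
            # Claims without gold pages are skipped (an assumption).
            continue
        retrieved = ranked.get(claim_id, [])
        hits = {page for page in retrieved if page in relevant}
        precisions.append(len(hits) / len(retrieved) if retrieved else 0.0)
        recalls.append(len(hits) / len(relevant))
        # Reciprocal rank of the first relevant page, 0 if none was retrieved.
        rr = 0.0
        for rank, page in enumerate(retrieved, start=1):
            if page in relevant:
                rr = 1.0 / rank
                break
        reciprocal_ranks.append(rr)
    n = len(precisions)
    if n == 0:
        return {"precision": 0.0, "recall": 0.0, "mrr": 0.0}
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "mrr": sum(reciprocal_ranks) / n,
    }
```

Micro-averaging, or counting claims without gold pages, would give different numbers, so whichever convention is used should be reported alongside the scores.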

andreasvlachos · 2017-12-05T19:50:18Z

Just checking: we are not planning on learning DR, right? That's fine, but it would be good to ensure that the DR component is good enough for the entailment part. I.e., given an oracle RTE part, what is the accuracy with the DR we have? It should be better than a random baseline, right? A related question: is there some kind of threshold to restrict the documents we get from DR, or do we take the top one only? (Taking the top one is probably a good start, assuming it gives us decent accuracy with an oracle RTE.)
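
One way to read the oracle-RTE check is as an upper bound on accuracy: a claim counts as correct whenever the retrieved pages contain the evidence a perfect RTE component would need. The sketch below makes that concrete under loud assumptions: the name `oracle_rte_accuracy` and the input format are hypothetical, retrieving any one gold page is treated as sufficient, and claims with no gold pages are counted as always correct.

```python
from typing import Dict, List, Set


def oracle_rte_accuracy(
    ranked: Dict[str, List[str]],
    gold: Dict[str, Set[str]],
    k: int = 1,
) -> float:
    """Accuracy of a perfect RTE component restricted to the top-k DR pages.

    ranked: claim id -> retrieved page ids, best first (assumed format).
    gold:   claim id -> set of gold evidence page ids (assumed format).
    """
    total = 0
    correct = 0
    for claim_id, relevant in gold.items():
        total += 1
        top_k = set(ranked.get(claim_id, [])[:k])
        # Assumption: a claim with no gold pages needs no evidence, and
        # retrieving any single gold page is enough for the oracle.
        if not relevant or top_k & relevant:
            correct += 1
    return correct / total if total else 0.0
```

Calling it with `k=1` and then `k=5` would show how much the bound moves between taking only the top document and taking more.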


j6mes · 2017-12-06T12:14:20Z

The DR has no parameters, so there's nothing to learn.
Taking the top 5 articles at the moment. Will also try taking all articles above a threshold.

The only metric I've computed so far is recall, but testing with an oracle RTE is a good idea and easy for me to do too.
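
The two selection strategies mentioned here (top 5 articles versus all articles above a threshold) can be sketched as follows. The function names and the `(page, score)` input format are illustrative assumptions, not the actual DR code, and the threshold value would need to be tuned on held-out data.

```python
from typing import List, Tuple


def select_top_k(scored: List[Tuple[str, float]], k: int = 5) -> List[str]:
    """Return the k highest-scoring pages, best first."""
    ordered = sorted(scored, key=lambda item: item[1], reverse=True)
    return [page for page, _ in ordered[:k]]


def select_above_threshold(
    scored: List[Tuple[str, float]], threshold: float
) -> List[str]:
    """Return every page whose score is at least `threshold`, best first."""
    ordered = sorted(scored, key=lambda item: item[1], reverse=True)
    return [page for page, score in ordered if score >= threshold]
```

Top-k always returns a fixed number of pages, while the threshold variant may return none for some claims, which affects recall and precision differently.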

j6mes self-assigned this Dec 5, 2017

j6mes added Baseline Models P1 labels Dec 5, 2017

j6mes closed this as completed Dec 12, 2017
