integration test for the text classification #216

dcfidalgo · 2020-05-22T17:08:48Z

This PR adds an integration test using the TextClassification head.

On my machine it takes <1 min and the numbers are reproducible.
It covers only a small part of the functionality, but this test would have caught the embedding bug, for example.
The idea is to extend the test over time to cover more functionality; it could also serve as a blueprint for other integration tests.

dcfidalgo · 2020-05-22T17:16:54Z

Just realized that this is useless for the CI, since I get the data from our S3 ... Maybe we can briefly discuss where to put this data, since it is the same as for our tutorials.

tests/integration/test_text_classification.py

frascuchon · 2020-05-24T15:51:28Z

> Just realized that this is useless for the CI, since I get the data from our S3 ... Maybe we can briefly discuss where to put this data, since it is the same as for our tutorials.

I prefer to keep a local copy of data used in tests. Some metrics could change if we update the original data.
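
A minimal sketch of what a local copy could look like in practice, assuming the data is a small CSV file with `text` and `label` columns committed next to the tests. The file name, the columns and both helper names are hypothetical; the temporary file only stands in for the committed one so the snippet runs on its own.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def load_local_records(path: Path) -> List[Dict[str, str]]:
    """Read a CSV file stored alongside the tests into a list of row dicts."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def write_sample(directory: Path) -> Path:
    """Stand-in for a data file that would normally be committed to the repo."""
    path = directory / "sample.csv"
    path.write_text(
        "text,label\ngood service,positive\nbad service,negative\n",
        encoding="utf-8",
    )
    return path


# In a real test the path would point at the committed file, for example
# Path(__file__).parent / "resources" / "sample.csv" (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    records = load_local_records(write_sample(Path(tmp)))
```

Because the file lives in the repository, the test no longer depends on network access, and the asserted metrics cannot drift when the upstream data changes.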

dvsrepo · 2020-05-24T15:56:57Z

We can keep a very small local sample of the data and assert that the model overfits.
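
The overfitting check can be sketched as follows. This is only an illustration of the idea: a tiny bag-of-words perceptron stands in for the real TextClassification pipeline, and the four samples, the function names and the epoch count are all assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Sample = Tuple[str, int]

# Hypothetical tiny sample; a real test would load a few committed records.
TINY_SAMPLE: List[Sample] = [
    ("great product", 1),
    ("love it", 1),
    ("terrible product", 0),
    ("hate it", 0),
]


def predict(weights: DefaultDict[str, float], text: str) -> int:
    """Return 1 if the summed token weights plus the bias are positive."""
    score = sum(weights[token] for token in text.split()) + weights["<bias>"]
    return 1 if score > 0 else 0


def train_perceptron(samples: List[Sample], epochs: int = 20) -> DefaultDict[str, float]:
    """Train a stand-in model with the classic perceptron update rule."""
    weights: DefaultDict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for text, label in samples:
            error = label - predict(weights, text)
            if error:
                # Move the weights of the seen tokens (and the bias) toward the label.
                for token in text.split() + ["<bias>"]:
                    weights[token] += error
    return weights


def accuracy(weights: DefaultDict[str, float], samples: List[Sample]) -> float:
    """Fraction of samples whose prediction matches the label."""
    hits = sum(predict(weights, text) == label for text, label in samples)
    return hits / len(samples)


model = train_perceptron(TINY_SAMPLE)
train_accuracy = accuracy(model, TINY_SAMPLE)
```

The test then asserts that `train_accuracy` reaches 1.0. A model that cannot memorize a handful of records is almost certainly broken, which makes this a cheap smoke test.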

dcfidalgo · 2020-05-25T12:27:22Z

> We can keep a very small local sample of the data and assert that the model overfits.

@dvsrepo The idea was to be able to detect even subtle changes that might go undetected when completely overfitting a small dataset. The question may be how to balance dataset size against the ability to catch subtle changes.
@frascuchon What is the maximum size of the dataset I should include as a local copy in the test?
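
One way to catch subtle changes is to compare the metrics of a seeded run against stored reference values with a tight tolerance. The sketch below assumes the metrics arrive as a plain dict; the function name, the tolerance and the numbers are hypothetical and not taken from this PR.

```python
import math
from typing import Dict, Tuple


def find_metric_drift(
    actual: Dict[str, float],
    expected: Dict[str, float],
    abs_tol: float = 1e-4,
) -> Dict[str, Tuple[float, float]]:
    """Return {metric: (actual, expected)} for every metric outside the tolerance."""
    drift = {}
    for name, reference in expected.items():
        value = actual.get(name, float("nan"))  # a missing metric counts as drift
        if not math.isclose(value, reference, rel_tol=0.0, abs_tol=abs_tol):
            drift[name] = (value, reference)
    return drift


# Hypothetical reference values recorded from a known-good run.
REFERENCE = {"accuracy": 0.91, "loss": 0.2531}

unchanged = find_metric_drift({"accuracy": 0.91, "loss": 0.2531}, REFERENCE)
regressed = find_metric_drift({"accuracy": 0.90, "loss": 0.2531}, REFERENCE)
```

An integration test would assert that the returned dict is empty. Unlike a pure overfitting check, this fails on a one-point accuracy drop, at the cost of having to update the reference values after every intentional change.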

frascuchon · 2020-05-25T20:49:36Z

> @frascuchon What is the maximum size of the dataset I should include as a local copy in the test?

Your filtered dataset should be enough, since training on it takes very little time. If I remember correctly, it was limited to 2000 records in the code, wasn't it?
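
Capping the dataset at a fixed number of records can be done deterministically, so that every run sees the same subset. In this sketch the 2000 limit comes from the comment above, while the function name and the seed are assumptions.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample(records: Sequence[T], max_records: int = 2000, seed: int = 0) -> List[T]:
    """Return at most max_records items, chosen reproducibly via a fixed seed."""
    if len(records) <= max_records:
        return list(records)
    # A dedicated Random instance avoids touching the global random state.
    return random.Random(seed).sample(records, max_records)


all_records = list(range(10_000))  # stand-in for the full filtered dataset
first_draw = subsample(all_records)
second_draw = subsample(all_records)
```

Both draws contain the same 2000 records, which keeps the asserted numbers stable between runs.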

dcfidalgo · 2020-05-27T22:25:03Z

OK, I think this can go in.
@frascuchon I added a few asserts regarding the vocab; I am not sure if you had more thorough checks in mind. We could add them here or in a follow-up PR.

> It would be nice to create the PipelineConfiguration programmatically and then save it as a YAML file.

@frascuchon If you are OK with it, I will add this as a dedicated test (def test_pipeline_configuration) in another PR, so as not to delay this one further ...
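
For illustration, the kind of vocab asserts mentioned above might look like the sketch below. It uses a plain dict as a stand-in vocabulary, not the vocabulary class of the actual pipeline; `build_vocab`, the special tokens and the sample tokens are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_vocab(token_lists: Iterable[List[str]], min_count: int = 1) -> Dict[str, int]:
    """Map each sufficiently frequent token to an index, after two special tokens."""
    counts = Counter(token for tokens in token_lists for token in tokens)
    vocab = {"<pad>": 0, "<unk>": 1}
    # Sorting keeps the index assignment independent of input order.
    for token in sorted(t for t, c in counts.items() if c >= min_count):
        vocab[token] = len(vocab)
    return vocab


TOKENS = [
    ["the", "service", "was", "good"],
    ["the", "service", "was", "bad"],
]

vocab = build_vocab(TOKENS)
frequent_only = build_vocab(TOKENS, min_count=2)
```

Typical checks are the vocabulary size, the fixed indices of the special tokens, and whether rare tokens are dropped when a minimum count is set.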

dcfidalgo requested review from frascuchon and dvsrepo May 22, 2020 17:08

dcfidalgo changed the title ~~integration test for the text classification~~ WIP: integration test for the text classification May 22, 2020

dcfidalgo marked this pull request as draft May 22, 2020 17:11

dcfidalgo force-pushed the feat/integration_text_classifier_test branch from c8fee58 to 8464fae Compare May 22, 2020 17:33

frascuchon reviewed May 24, 2020

Add first integration test for the text classification head

6ac93df

dcfidalgo force-pushed the feat/integration_text_classifier_test branch from 6b0d54e to 6ac93df Compare May 27, 2020 18:13

add vocab asserts

9812bbb

dcfidalgo changed the title ~~WIP: integration test for the text classification~~ integration test for the text classification May 27, 2020

dcfidalgo marked this pull request as ready for review May 27, 2020 18:31

dcfidalgo changed the title ~~integration test for the text classification~~ WIP: integration test for the text classification May 27, 2020

dcfidalgo marked this pull request as draft May 27, 2020 18:32

dvsrepo approved these changes May 27, 2020

dcfidalgo changed the title ~~WIP: integration test for the text classification~~ integration test for the text classification May 27, 2020

dcfidalgo marked this pull request as ready for review May 27, 2020 22:20

frascuchon merged commit 66b9bb7 into master May 28, 2020

frascuchon deleted the feat/integration_text_classifier_test branch May 28, 2020 06:11
