integration test for the text classification #216
Conversation
Just realized that this is useless for the CI, since I get the data from our S3 ... maybe we can discuss briefly where to put this data, since it is the same as for our tutorials.
c8fee58 to 8464fae
I prefer to keep a local copy of the data used in tests. Some metrics could change if we update the original data.
We can keep a very small local sample of the data and assert that the model overfits.
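The "tiny local sample + assert overfit" suggestion could be sketched roughly like this. This is purely hypothetical: `TinyTextClassifier` is a toy stand-in for the real text classification head, and the sample records are invented; only the testing pattern itself (check the model into a state of perfect training accuracy on a few committed records) is what the comment proposes.

```python
# Hypothetical sketch of the "overfit a tiny local sample" test idea.
# TinyTextClassifier is a stand-in, NOT the project's real pipeline API.


class TinyTextClassifier:
    """Toy classifier that memorizes label counts per token."""

    def __init__(self):
        self.token_labels = {}

    def train(self, records, epochs=10):
        for _ in range(epochs):
            for text, label in records:
                for token in text.split():
                    counts = self.token_labels.setdefault(token, {})
                    counts[label] = counts.get(label, 0) + 1

    def predict(self, text):
        votes = {}
        for token in text.split():
            for label, count in self.token_labels.get(token, {}).items():
                votes[label] = votes.get(label, 0) + count
        return max(votes, key=votes.get) if votes else None


def test_model_overfits_tiny_sample():
    # A handful of records committed to the repo instead of pulled from S3.
    sample = [
        ("great movie loved it", "pos"),
        ("terrible plot boring", "neg"),
        ("wonderful acting", "pos"),
        ("awful waste of time", "neg"),
    ]
    model = TinyTextClassifier()
    model.train(sample)
    # Overfitting check: training accuracy on the tiny sample must be 1.0.
    accuracy = sum(model.predict(t) == l for t, l in sample) / len(sample)
    assert accuracy == 1.0
```

The point of the pattern is that the data lives in the repo, so CI needs no S3 access, and a model that fails to memorize four records signals a real breakage.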
@dvsrepo The idea was to be able to detect even subtle changes that may go undetected when completely overfitting a small data set. Maybe the question is to find a good balance between data set size and being able to catch subtle changes.
Your filtered dataset could be enough, since it doesn't take much training time. If I remember correctly, it was set in the code to 2000 records, wasn't it?
6b0d54e to 6ac93df
Ok, I think this can go in.
@frascuchon If you are ok with it, I will add this test in a specific
This PR adds an integration test using the
TextClassification
head. On my machine it takes <1 min and the numbers are reproducible.
It covers only a small part of the functionality, but with this test we would have caught the embedding bug, for example.
The idea is that over time we extend the test to cover more functionality, and maybe it can serve as a blueprint for other integration tests.
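The "reproducible numbers" property described above could look roughly like the sketch below. This is a hypothetical illustration, not the PR's actual test: `train_and_evaluate` is an invented stand-in for training the TextClassification head, and the pinned accuracy value is fabricated for the toy function. The technique it shows is pinning an exact metric value under a fixed seed, so that even a subtle regression shifts the number and fails the test.

```python
# Hypothetical sketch: pin exact metrics from a seeded run so subtle
# regressions fail the test. train_and_evaluate is a stand-in, NOT the
# project's real training API.
import random


def train_and_evaluate(seed: int = 42) -> dict:
    """Stand-in for a real training run on the sample; deterministic given the seed."""
    rng = random.Random(seed)
    # Pretend this accuracy comes out of a real evaluation step.
    return {"accuracy": round(0.90 + 0.05 * rng.random(), 4)}


def test_metrics_are_reproducible():
    metrics = train_and_evaluate(seed=42)
    # Two runs with the same seed must agree exactly.
    assert metrics == train_and_evaluate(seed=42)
    # Pinning the exact number (not just a threshold) is what would let such
    # a test catch subtle regressions like the embedding bug mentioned above.
    assert metrics["accuracy"] == 0.932
```

For a real pipeline the same pattern applies: seed all RNG sources, run the short training, and assert the exact metric values recorded from a known-good run.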