Skip to content
This repository has been archived by the owner on Jul 15, 2022. It is now read-only.

integration test for the text classification #216

Merged
merged 2 commits into from May 28, 2020

Conversation

dcfidalgo
Copy link
Contributor

This PR adds an integration test using the TextClassification head.

On my machine it takes <1 min and the numbers are reproducible.
It covers only a small part of the functionality, but with this test we would have caught the embedding bug for example.
The idea is that with time we extend the test to cover more functionalities, and maybe it can serve as blue print for other integration tests.

@dcfidalgo dcfidalgo changed the title integration test for the text classification WIP: integration test for the text classification May 22, 2020
@dcfidalgo dcfidalgo marked this pull request as draft May 22, 2020 17:11
@dcfidalgo
Copy link
Contributor Author

Just realized that it this is useless for the CI, since i get the data from our S3 ... maybe we can discuss briefly where to put this data, since it is the same as for our tutorials.

@dcfidalgo dcfidalgo force-pushed the feat/integration_text_classifier_test branch from c8fee58 to 8464fae Compare May 22, 2020 17:33
@frascuchon
Copy link
Member

frascuchon commented May 24, 2020

Just realized that it this is useless for the CI, since i get the data from our S3 ... maybe we can discuss briefly where to put this data, since it is the same as for our tutorials.

I prefer to keep a local copy of data used in tests. Some metrics could change if we update the original data.

@dvsrepo
Copy link
Member

dvsrepo commented May 24, 2020 via email

@dcfidalgo
Copy link
Contributor Author

dcfidalgo commented May 25, 2020

We can keep a very small local sample of the data and assert that the model
overfits.

@dvsrepo The idea was to be able to detect even subtle changes that maybe go undetected when completely overfitting a small data set. Maybe the question is to find a good balance between data set size and being able to catch subtle changes.
@frascuchon What is the maximum size of the data set i should include as local copy in the test?

@frascuchon
Copy link
Member

frascuchon commented May 25, 2020

@frascuchon What is the maximum size of the data set i should include as local copy in the test?

Your filtered dataset could be enough since take no quite training time. If I remember was set in code to 2000 records, isn't it ?

@dcfidalgo dcfidalgo force-pushed the feat/integration_text_classifier_test branch from 6b0d54e to 6ac93df Compare May 27, 2020 18:13
@dcfidalgo dcfidalgo changed the title WIP: integration test for the text classification integration test for the text classification May 27, 2020
@dcfidalgo dcfidalgo marked this pull request as ready for review May 27, 2020 18:31
@dcfidalgo dcfidalgo changed the title integration test for the text classification WIP: integration test for the text classification May 27, 2020
@dcfidalgo dcfidalgo marked this pull request as draft May 27, 2020 18:32
@dcfidalgo dcfidalgo changed the title WIP: integration test for the text classification integration test for the text classification May 27, 2020
@dcfidalgo dcfidalgo marked this pull request as ready for review May 27, 2020 22:20
@dcfidalgo
Copy link
Contributor Author

dcfidalgo commented May 27, 2020

Ok, i think this can go in.
@frascuchon i added a few asserts regarding the vocab, not sure if you had some more thorough checks in mind. We could add them here or in a follow-up PR.

It would be nice to programatically create the pipeline configuration PipelineConfiguration and the save as a yaml file.

@frascuchon If you are ok with it, i will add this test in a specific def test_pipeline_configuration in another PR, in order not to delay further this one ...

@frascuchon frascuchon merged commit 66b9bb7 into master May 28, 2020
@frascuchon frascuchon deleted the feat/integration_text_classifier_test branch May 28, 2020 06:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants