Enable doc classification recipe to work with all text datasets
Summary:
## Summary
- Updated the datamodule to work with any torchtext dataset
- No longer checking whether the dataset is an instance of the SST2Dataset
- Updated the `DocClassificationDataModuleConf` dataclass to take user-provided `columns` and `label_column` fields, since different datasets have different column orderings
- Updated tests to use patching for testing with mocked datasets, similar to what is done in OSS for the [AmazonReviewPolarity dataset test](pytorch/text#1532)
- Removed the dataset test from torchrecipe, since the torchtext repo unit tests provide adequate coverage

## Followup Items
- [ ] Update the instantiation call for datasets to work with the functional API as opposed to the class API once the SST2 dataset has been migrated out of experimental ([reference GH issue](pytorch/text#1494))

Reviewed By: abhinavarora, mthrok, parmeet

Differential Revision: D33775443

fbshipit-source-id: 1e6545949808ec5bd0e13cf3f9e7aaea08d68a59
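As a rough illustration of why `columns` and `label_column` are configurable, and of how a test can patch in a mocked dataset, here is a minimal standard-library sketch. It is not the code from this commit: `ExampleDataModuleConf`, `load_dataset`, `rows_to_dicts`, `read_split`, and `read_split_with_fake_loader` are all hypothetical names, and the real datamodule and its tests may be structured quite differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, List, Tuple
from unittest.mock import Mock, patch


@dataclass
class ExampleDataModuleConf:
    """Hypothetical stand-in for DocClassificationDataModuleConf (not the real class)."""

    # Order of the fields in each row the dataset yields.
    columns: List[str] = field(default_factory=lambda: ["text", "label"])
    # Which of those columns holds the target label.
    label_column: str = "label"


def load_dataset(split: str) -> Iterable[Tuple[Any, ...]]:
    """Placeholder loader; a real recipe would instantiate a torchtext dataset here."""
    raise RuntimeError("no real dataset is available in this sketch")


def rows_to_dicts(
    rows: Iterable[Tuple[Any, ...]], conf: ExampleDataModuleConf
) -> Iterator[Dict[str, Any]]:
    """Name each positional field so downstream code does not depend on column order."""
    if conf.label_column not in conf.columns:
        raise ValueError(f"{conf.label_column!r} is not one of {conf.columns}")
    for row in rows:
        yield dict(zip(conf.columns, row))


def read_split(split: str, conf: ExampleDataModuleConf) -> List[Dict[str, Any]]:
    """Load one split through the module-level loader and normalise its rows."""
    return list(rows_to_dicts(load_dataset(split), conf))


def read_split_with_fake_loader(
    fake_rows: List[Tuple[Any, ...]], conf: ExampleDataModuleConf, split: str = "train"
) -> List[Dict[str, Any]]:
    """Swap the loader for a mock, as a patched test might, then read the split."""
    fake_loader = Mock(return_value=fake_rows)
    with patch.dict(globals(), {"load_dataset": fake_loader}):
        result = read_split(split, conf)
    fake_loader.assert_called_once_with(split)
    return result


# Two mocked datasets with opposite column orderings.
text_first = ExampleDataModuleConf(columns=["text", "label"], label_column="label")
label_first = ExampleDataModuleConf(columns=["label", "text"], label_column="label")

a = read_split_with_fake_loader([("a fine film", 1)], text_first)
b = read_split_with_fake_loader([(1, "a fine film")], label_first)
```

In this sketch both orderings normalise to the same named fields, so nothing downstream needs to know which dataset produced the rows. Patching the loader keeps the test free of downloads and cache directories.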
1 parent dd50d86
commit bef8c4e
Showing 9 changed files with 82 additions and 70 deletions.
1 change: 0 additions & 1 deletion
torchrecipes/text/doc_classification/conf/datamodule/dataset/sst2_dataset.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,2 @@ | ||
_target_: torchtext.experimental.datasets.sst2.SST2 | ||
root: ~/.torchtext/cache | ||
validate_hash: True |
4 changes: 4 additions & 0 deletions
torchrecipes/text/doc_classification/conf/datamodule/doc_classification_datamodule.yaml
46 changes: 0 additions & 46 deletions
torchrecipes/text/doc_classification/tests/test_doc_classification_dataset.py
This file was deleted.