This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

pytext multi-label support (#729) #731

Closed

Conversation

haowu666

Summary:
Pull Request resolved: #729

Add multi-label support to the PyText training workflow, including:

  • LabelListTensorizer to read label lists

  • MultiLabelSoftMarginLoss with n-hot encoding to compute the loss for the multi-label task
    (taking care to mask the padded -1 entries in the n-hot encoding)

  • MultiLabelOutputLayer with predictions of all potential labels for each example

  • LabelListPrediction (a NamedTuple) for each example, including:

    • label_scores: List[float]
    • predicted_label: List[int]
    • expected_label: List[int]
  • MultiLabelClassificationMetricReporter

    • compute_multi_label_classification_metrics with both predicted and expected labels in lists
    • compute_multi_label_soft_metrics with both predicted and expected labels in lists
  • Handle both label and label-list inputs in channel.py

  • Through the input arguments, users can choose:

    • LabelTensorizer / LabelListTensorizer
    • BinaryClassificationOutputLayer/ MultiLabelOutputLayer / MulticlassOutputLayer
    • loss including BinaryCrossEntropyLoss, MultiLabelSoftMarginLoss, etc.
    • a metric reporter matching the chosen loss: ClassificationMetricReporter / MultiLabelClassificationMetricReporter
  • Define @register_adapter(from_version=12) v11_to_v12 in config_adapter.py to make ClassificationMetricReporter extensible

  • Keep RECALL_AT_PRECISION_THREHOLDS as it was; users can change the values to suit their data set (for example, by adding 0.7 to the list of thresholds)
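The n-hot encoding with -1 padding described above can be sketched in plain Python. This is an illustrative sketch, not PyText's actual implementation: the helper names are assumptions, and the loss function simply restates the standard MultiLabelSoftMarginLoss definition without using torch.

```python
import math

def n_hot_encode(padded_labels, num_labels):
    """Turn padded label-index lists (pad value -1) into n-hot vectors."""
    targets = []
    for row in padded_labels:
        vec = [0.0] * num_labels
        for idx in row:
            if idx >= 0:          # skip the -1 padding entries
                vec[idx] = 1.0
        targets.append(vec)
    return targets

def multilabel_soft_margin_loss(logits, targets):
    """Mean over examples and classes of the per-label BCE-with-logits term,
    following the standard MultiLabelSoftMarginLoss definition."""
    total = 0.0
    for x_row, y_row in zip(logits, targets):
        per_class = 0.0
        for x, y in zip(x_row, y_row):
            p = 1.0 / (1.0 + math.exp(-x))  # sigmoid
            per_class += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
        total += -per_class / len(x_row)
    return total / len(logits)

# Two examples, label lists padded to length 3 with -1:
padded = [[0, 2, -1], [1, -1, -1]]
targets = n_hot_encode(padded, num_labels=4)
# targets == [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

# All-zero logits give sigmoid(0) = 0.5 everywhere, so the loss is log(2):
loss = multilabel_soft_margin_loss([[0.0, 0.0, 0.0, 0.0]], targets[:1])
```

Masking the -1 entries before scattering into the n-hot vector is the key step: without it, the pad value would index the last class and corrupt the targets.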

It has been tested on single-label and multi-label examples for the DocNN and BERT models.
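A minimal sketch of how the LabelListPrediction tuple above might be filled in by a multi-label output layer, assuming a simple per-label score threshold (the `predict_labels` helper and the 0.5 default threshold are assumptions for illustration, not PyText's actual API):

```python
from typing import List, NamedTuple

class LabelListPrediction(NamedTuple):
    """Per-example multi-label prediction, mirroring the fields in the PR."""
    label_scores: List[float]
    predicted_label: List[int]
    expected_label: List[int]

def predict_labels(scores: List[float],
                   expected: List[int],
                   threshold: float = 0.5) -> LabelListPrediction:
    """Emit every label index whose score clears the threshold."""
    predicted = [i for i, s in enumerate(scores) if s >= threshold]
    return LabelListPrediction(scores, predicted, expected)

pred = predict_labels([0.9, 0.1, 0.7], expected=[0, 2])
# pred.predicted_label == [0, 2]
```

Unlike a multiclass argmax, thresholding lets each example carry zero, one, or many predicted labels, which is what the multi-label metric reporters then compare against the expected label lists.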

Differential Revision: D15777482

@facebook-github-bot added the CLA Signed label on Jun 26, 2019
fbshipit-source-id: 7eff3b27eff076d6c36a0feaef223573608b0d5d
@facebook-github-bot
Contributor

This pull request has been merged in 1a44019.
