Adding RankMatchFailure metric #184
Conversation
- Convert y_true to one-hot for the aux loss with categorical cross entropy (see the sketch below)
- Add basic cross entropy (still not fully working)
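For context on the first commit above, here is a minimal, self-contained sketch of converting integer labels to one-hot before applying categorical cross entropy. The shapes, variable names, and the specific tf.one_hot / tf.keras.losses.CategoricalCrossentropy calls are illustrative assumptions, not code from this PR:

import tensorflow as tf

# Hypothetical shapes: y_true_idx holds the index of the relevant record per
# query, y_pred holds a softmax score distribution over the records.
list_size = 4
y_true_idx = tf.constant([2, 0])                      # [batch_size]
y_pred = tf.constant([[0.1, 0.2, 0.6, 0.1],
                      [0.7, 0.1, 0.1, 0.1]])          # [batch_size, list_size]

# Convert the integer labels to one-hot so they line up with the per-record
# distribution expected by categorical cross entropy.
y_true_onehot = tf.one_hot(y_true_idx, depth=list_size)

aux_loss = tf.keras.losses.CategoricalCrossentropy()(y_true_onehot, y_pred)
print(float(aux_loss.numpy()))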
@mohazahran, I added the tests too.
dtype : str, optional
    data type of the metric result.
rank : Tensor object
    2D tensor representing ranks/rankitions of records in a query
A typo in 'rankitions'.
@@ -36,18 +36,24 @@ def _loss_fn(y_true, y_pred):
    mask : [batch_size, num_classes]
Only auto-formatting.
metrics: List[Union[str, Type[Metric]]],
feature_config: FeatureConfig,
metadata_features: Dict,
for_aux_output: bool = False,
This needs to be added to the Parameters list in the docstring.
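A possible wording for that docstring entry, assuming the numpydoc style seen in the other hunks of this PR; the description text is only a suggestion, not the project's actual documentation:

def get_metrics_impl(
    metrics,
    feature_config,
    metadata_features,
    for_aux_output: bool = False,
):
    """
    Parameters
    ----------
    ...
    for_aux_output : bool, optional
        Whether the metrics are being instantiated for the auxiliary output.
        Combination metrics are skipped when this is True.
    """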
from typing import Optional, Dict
class CombinationMetric: |
Can you remind me why this is needed?
This is used to distinguish single-label metrics from multi-label metrics. Multi-label metrics should be computed only for one of the outputs.
@@ -41,6 +43,9 @@ def get_metrics_impl(
    metrics_impl: List[Union[Metric, str]] = list()

    for metric in metrics:
        if isinstance(metric, ranking_metrics_impl.CombinationMetric) and for_aux_output:
Here, it's not allowing RankMatchFailure to be a metric for the aux output, but it does allow the other metrics (such as MRR) to be metrics for the aux output, right?
It seems to me that MRR and the other non-NMF metrics shouldn't be computed for the aux output because they don't convey any information. In fact, it's useless to track the value of train_aux_ranking_score_new_MRR, as it's being measured against the title scores, which are not clicks (correct me if I'm wrong here). However, the NMF metric should be part of either the primary output or the aux output; if it's allowed for both, then both should give the same metric value, right? For example, train_ranking_score_new_RankMatchFailure should be equal to train_aux_ranking_score_new_RankMatchFailure.
To keep the implementation straightforward, combination metrics are defined only for the "main output". The other metrics may or may not be relevant to the aux output (depending on the actual aux label). That's the reasoning behind the current design. Let me know if you feel another approach is more appropriate.
However, the NMF metric should be part of either the primary output or the aux output; if it's allowed for both, then both should give the same metric value, right? For example, train_ranking_score_new_RankMatchFailure should be equal to train_aux_ranking_score_new_RankMatchFailure.
This is accurate, but it's an implementation nightmare at this point. Open to suggestions here.
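To make the design described above concrete, here is a self-contained sketch of the marker-class approach: CombinationMetric only tags the multi-label metrics, and the metric builder skips them when assembling metrics for the aux output. The stub classes, the helper name get_metrics_for_output, and the issubclass check are illustrative assumptions; the actual ml4ir implementation works on its keras Metric classes, as shown in the hunk above.

from typing import List, Type


class CombinationMetric:
    """Marker for metrics that combine the primary and auxiliary labels."""


class MRR:
    """Single-label metric stub: depends only on the output it is attached to."""


class RankMatchFailure(CombinationMetric):
    """Multi-label metric stub: needs both the primary and the aux label."""


def get_metrics_for_output(metrics: List[Type], for_aux_output: bool = False) -> List[Type]:
    """Skip combination metrics when building the metric list for the aux output."""
    selected = []
    for metric in metrics:
        if for_aux_output and issubclass(metric, CombinationMetric):
            # Combination metrics are defined only for the main output.
            continue
        selected.append(metric)
    return selected


print(get_metrics_for_output([MRR, RankMatchFailure], for_aux_output=True))   # only MRR remains
print(get_metrics_for_output([MRR, RankMatchFailure], for_aux_output=False))  # both metrics remain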
In that case, it seems to me that no metrics other than the loss should be tracked for the aux output, and all the other metrics (loss, MRR, ACR and RankNMF) should be tracked for the main output. In other words, the only changes from training a model without aux_labels are that:
- we add RankNMF to the set of metrics tracked by the main output whenever an aux_label is given
- we track only the loss for the aux output
Let me know what you think.
This makes sense for the particular aux feature we're talking about. We might want to measure MRR on a different aux output, no?
Generally it's possible, but that aux source would have to be clicks, no?
If we want to account for all the different cases of possible aux targets/outputs, then perhaps we need to make the set of metrics to be tracked a user input. What do you think?
Yes, that's what I was thinking too. Will track this in a follow-up.
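Sketching that follow-up idea, user-selectable per-output metrics might look roughly like the configuration below; every key and value here is hypothetical, and nothing of the sort exists in the current config schema:

# Hypothetical sketch only: per-output metric selection exposed as user input.
per_output_metrics = {
    "ranking_score": ["MRR", "ACR", "RankMatchFailure"],  # main output
    "aux_ranking_score": [],                              # aux output: track only the loss
}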
primary_training_loss = float(ml4ir_results.loc[ml4ir_results[0] == 'train_ranking_score_loss'][1])
assert np.isclose(primary_training_loss, 1.1877643, atol=0.0001)
aux_training_loss = float(ml4ir_results.loc[ml4ir_results[0] == 'train_aux_ranking_score_loss'][1])
assert np.isclose(aux_training_loss, 2.3386843, atol=0.0001)
This expected value "2.3386843" is not correct and it's failing the test case. I checked; this value was changed from my PR. This is the value from my PR:
assert np.isclose(aux_training_loss, 1.2242277, atol=0.0001)
Thanks for pointing this out. I might have messed up a merge conflict resolution.
Thanks, Arvind, for this PR!
I recommend a follow-up story to better select the metrics for each output (primary and aux).