This repository has been archived by the owner on Aug 18, 2020. It is now read-only.

One must numericalize labels in FastAI2 metrics, unlike SKLearn metrics #487

Closed

algal opened this issue Aug 7, 2020 · 2 comments
@algal

algal commented Aug 7, 2020

Please confirm you have the latest versions of fastai, fastcore, fastscript, and nbdev prior to reporting a bug:

Problem is visible in fastai2 version 0.0.23 with fastcore 0.1.25 and sklearn 0.22.2.post1.

Describe the bug

When a dataset uses string labels, and a fastai2 metric function (such as Precision) takes a labels argument that it passes on to the underlying scikit-learn function (such as precision_score), you must pass in the numericalized labels rather than the string labels themselves for the fastai2 metric function to work.

This differs from the behavior of the underlying scikit-learn functions, which accept string labels for datasets that use string labels, and numerical labels for datasets that use numerical labels.

The bug is either:

  1. at minimum, that fastai2 should document this requirement more clearly, rather than giving the impression that the fastai2 metric functions are transparent wrappers of the underlying sklearn functions;
  2. or, that the fastai2 wrappers of the sklearn functions should automatically perform this mapping from string labels to numerical labels when necessary, so that every fastai2 wrapper function accepts string label values directly, as the wrapped sklearn functions do.

I don't know which solution is more consistent with the project's design goals. I'd vote for number 2.
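To make the distinction concrete, here is a minimal sketch in plain Python of the numericalization step the issue describes. The names (vocab, label_to_idx) are illustrative, not fastai2 API; a fastai2 vocab behaves roughly like an ordered list of unique class names whose indices are the numericalized labels:

```python
# Hypothetical sketch: a vocab is an ordered list of the unique class
# names; the index of each name is its numericalized label.
vocab = sorted({"cat", "dog", "fish"})  # ['cat', 'dog', 'fish']
label_to_idx = {name: i for i, name in enumerate(vocab)}

# scikit-learn's precision_score accepts labels=['cat', 'dog'] directly,
# but the fastai2 metric wrapper needs the numeric indices instead:
string_labels = ["cat", "dog"]
numeric_labels = [label_to_idx[name] for name in string_labels]
print(numeric_labels)  # [0, 1]
```

Under solution 2 above, the wrapper would perform this lookup itself whenever it receives strings.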

To Reproduce
Steps to reproduce the behavior:

Load and run this Colab notebook: https://gist.github.com/algal/c99461631787887bcefeead067fe1232

It shows the use of string values for labels in sklearn (where it works), in fastai2 (where it does not), and the fix of numericalizing the labels (which works for fastai2).

Expected behavior
I expected to be able to pass the raw string labels to the labels argument of the fastai2 metrics, given that this works for the underlying sklearn metric functions.

Error with full stack trace

Full stack trace is visible in the gist linked above.

Additional context

Forum discussion: https://forums.fast.ai/t/passing-parameters-in-custom-metrics-in-fastai2/76113/4

@github-actions

github-actions bot commented Aug 7, 2020

👋 @algal! Thank you for opening your first issue in this repo. We are so happy that you have decided to contribute and value your contribution. Please read these materials before proceeding: Contributing Guide and Code of Conduct.

@jph00
Member

jph00 commented Aug 7, 2020

I'm not sure this is directly fixable in the way you're looking for, since metrics in PyTorch need to be passed tensors, which can't be strings. And metrics are independent of Learners, at least for now. So making this all work transparently would require an API change - which isn't out of the question, but won't be added prior to the release of v2.

I've added documentation of this issue now, and also added a little helper method to make things easier (dl.vocab.map_objs(labels) will give you the label indices to pass to the metric labels param). See 6acdeea
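For readers without the commit at hand, here is a hypothetical pure-Python mimic of what the dl.vocab.map_objs(labels) helper does, assuming the vocab acts like an ordered sequence of unique class names (this is a sketch, not fastai2 itself):

```python
def map_objs(vocab, labels):
    """Map string labels to their vocab indices (assumption: vocab is
    an ordered sequence of unique class names, as in a fastai2 vocab)."""
    return [vocab.index(lbl) for lbl in labels]

vocab = ["negative", "neutral", "positive"]
# The resulting indices are what the metric's labels param expects:
print(map_objs(vocab, ["positive", "negative"]))  # [2, 0]
```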

@jph00 jph00 closed this as completed Aug 7, 2020