
Add more metrics #56

Open
maciejkula opened this issue Sep 6, 2017 · 3 comments

Comments

maciejkula (Owner)

  • precision@k
  • recall@k
  • AUC
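
A minimal sketch of the first two metrics, assuming a dense score vector per user and the indices of that user's held-out items (the `precision_recall_at_k` helper and its inputs are hypothetical, not Spotlight's actual API):

```python
import numpy as np

def precision_recall_at_k(scores, targets, k=10):
    """Precision@k and recall@k for a single user.

    scores:  predicted score for every item (1-D array).
    targets: indices of the user's held-out (relevant) items.
    """
    top_k = np.argsort(-scores)[:k]        # k highest-scoring items
    hits = np.isin(top_k, targets).sum()   # relevant items among the top k
    return hits / k, hits / len(targets)

# Example: six items, items 1 and 4 are relevant.
scores = np.array([0.1, 0.9, 0.2, 0.3, 0.8, 0.05])
print(precision_recall_at_k(scores, targets=[1, 4], k=3))  # (0.667, 1.0)
```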
maciejkula (Owner, Author)

As per #55

maciejkula added this to To Do in Spotlight on Sep 6, 2017
mokarakaya commented Jan 23, 2018

Hello @maciejkula,

First, thank you very much for implementing Spotlight. I am planning to use Spotlight in my further studies.

I'd like to contribute by implementing AUC. Here is my plan for implementation, with some questions:

  • We can create a precision-recall curve (axes: precision and recall) or an ROC curve (axes: true positive rate vs. false positive rate) (see Ref. 1). I think the precision-recall curve is fine. What is your opinion?

  • We will use different values of k (the number of recommended items) to produce the different points on the curve.

  • The new evaluation metric will return a single result, since the AUC reduces the whole curve to one number. I see that results (e.g. precision and recall) in Spotlight are generally arrays rather than single values. Do you think the AUC metric should return an array or a single result?

  • We can calculate the area under the curve using the trapezoidal rule or Simpson's rule. By default, the metric will use the trapezoidal rule; Simpson's rule will be optional. (A rough sketch of the whole plan follows at the end of this comment.)

Trapezoidal rule: https://docs.scipy.org/doc/numpy/reference/generated/numpy.trapz.html
Simpson's rule: https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.simps.html#scipy.integrate.simps

Do you think this plan is OK to implement? Please let me know your comments.

Ref. 1: Recommender Systems Handbook, 2nd edition, §8.3.2.2, "Measuring Usage Prediction".

P.S.: the curve needs to reach the x = 1 and y = 1 values, since this metric is generally used to compare multiple algorithms.
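
A rough sketch of the plan above (an editorial illustration, not a proposed patch): each k contributes one (recall, precision) point, the curve is anchored so it spans the full recall range as the P.S. requires, and the area is integrated with `numpy.trapz`; `scipy.integrate.simps` could be swapped in on the same points for the optional Simpson's rule.

```python
import numpy as np

def pr_auc(scores, targets):
    """Area under the precision-recall curve traced out by varying k."""
    ranking = np.argsort(-scores)
    precisions, recalls = [], []
    for k in range(1, len(scores) + 1):
        hits = np.isin(ranking[:k], targets).sum()
        precisions.append(hits / k)
        recalls.append(hits / len(targets))
    # Anchor the curve at recall = 0 so every model's curve spans the
    # same recall range and the resulting AUCs are comparable; recall
    # reaches 1 automatically once k covers the whole catalogue.
    recalls = [0.0] + recalls
    precisions = [precisions[0]] + precisions
    return np.trapz(precisions, x=recalls)
```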

nikisix (Contributor) commented Jul 10, 2018

@mokarakaya,
You should contribute your idea! Here are my thoughts, having helped out on Spotlight's evaluation metrics in the past:

  1. Yes, precision-recall is fine.
  2. Of course.
  3. I asked the same question, and @maciejkula's response was as you guessed: an array. Personally, though, I would not be opposed to a single result if integrating tons of per-user AUCs turned out to be slow and could be sped up by aggregating first (see the sketch after this list).
  4. Sure.
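
On point 3, a quick illustration of that trade-off (hypothetical data; `recall_grid` assumes all users' curves are sampled at the same recall values): because trapezoidal integration is linear, averaging per-user AUCs and integrating the averaged curve give the same number, so returning an array costs nothing in correctness, only a little speed.

```python
import numpy as np

rng = np.random.default_rng(0)
recall_grid = np.linspace(0, 1, 50)          # shared x-axis for every user
precisions = rng.uniform(size=(10_000, 50))  # one precision curve per user

# Array result: one AUC per user (mirrors Spotlight's other metrics).
per_user_auc = np.trapz(precisions, x=recall_grid, axis=1)

# Single result: integrate once over the averaged curve.
aggregate_auc = np.trapz(precisions.mean(axis=0), x=recall_grid)

# Trapezoidal integration is linear, so the two agree exactly.
assert np.isclose(per_user_auc.mean(), aggregate_auc)
```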

Also, have you considered a confusion matrix, or at least an F1 score? (Sketch below.)
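
F1@k, for what it's worth, would fall out of the existing precision@k and recall@k arrays almost for free (a sketch, with the zero-denominator case defined as 0):

```python
import numpy as np

def f1_at_k(precision, recall):
    """F1@k: the elementwise harmonic mean of precision@k and recall@k."""
    precision, recall = np.asarray(precision), np.asarray(recall)
    with np.errstate(divide="ignore", invalid="ignore"):
        f1 = 2 * precision * recall / (precision + recall)
    return np.nan_to_num(f1)  # precision = recall = 0  =>  F1 = 0
```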

Good luck!
