Adding measurements directory for DMT and other data measurement work… #35

meg-huggingface · 2022-05-10T05:14:10Z

… to utilize.

sashavor · 2022-05-10T13:06:12Z

Tagging @lvwerra since we haven't really discussed what the inputs and outputs of measurements should be (as opposed to metrics) -- maybe worth discussing, so once we start adding more, they all have a similar format?

lvwerra · 2022-05-10T16:34:18Z

Hi @meg-huggingface, thanks a lot for adding the data measurements! This is also connected to #18. A few general remarks about the PR:

Measurement API

Similar to what is done in #34, I think it would be great to establish a uniform interface for all measurements. Following the metrics API, we could establish the following:

Metrics

from evaluate import load_metric

metric = load_metric("accuracy")
metric.compute(references=references, predictions=predictions)
>>> {"accuracy": 0.7}

Measurement

from evaluate import load_measure

measure = load_measure("npmi")
measure.compute(data=data)
>>> {"npmi": 0.88}

A uniform interface will help users familiar with the metrics part of evaluate/datasets easily adapt to measurements.
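To make the proposed shape concrete, here is a minimal, self-contained sketch of a measurement that follows the same compute-and-return-a-dict convention as metrics. Everything in it is hypothetical: `ToyMeasurement`, the `avg_word_length` name, and this local `load_measure` are illustrations only and do not exist in evaluate.

```python
class ToyMeasurement:
    """Hypothetical stand-in for a measurement; not part of evaluate."""

    name = "avg_word_length"

    def compute(self, data):
        # `data` is assumed to be a list of strings. The result is a dict keyed
        # by the measurement name, mirroring the {"accuracy": 0.7} shape of metrics.
        words = [word for text in data for word in text.split()]
        if not words:
            return {self.name: 0.0}
        return {self.name: sum(len(word) for word in words) / len(words)}


def load_measure(name):
    # Assumed loader name taken from the proposal; here it only looks up a
    # local registry instead of fetching anything from the hub.
    registry = {ToyMeasurement.name: ToyMeasurement}
    return registry[name]()


measure = load_measure("avg_word_length")
result = measure.compute(data=["a bb", "ccc"])
```

The only difference from the metric call is the keyword argument: a measurement takes `data` alone rather than `references` and `predictions`.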

Classes

It would be simpler to make the measurements uniform if they were built on top of a class similar to the one used for metrics (see here). The easiest way to achieve the above would be to create a Measurement class that inherits from what's currently called Metric. We probably want to refactor this a bit such that:

EvaluationModule #or whatever we want to call this
 |-- Metric
 |-- Comparison
 |-- Measurement

Where EvaluationModule is essentially Metric and then Metric as well as the other subclasses inherit from it.
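One way the hierarchy could look in code is sketched below. This is only an illustration under assumptions: the `_compute` hook, the constructor signature, and the `Accuracy` and `NumExamples` examples are hypothetical and not taken from the existing Metric class.

```python
from abc import ABC, abstractmethod


class EvaluationModule(ABC):
    """Shared base holding what is common to all module types (assumed design)."""

    def __init__(self, name):
        self.name = name

    def compute(self, **kwargs):
        # Single public entry point; subclasses only implement _compute.
        return self._compute(**kwargs)

    @abstractmethod
    def _compute(self, **kwargs):
        ...


# The three categories inherit everything and differ only in their inputs.
class Metric(EvaluationModule):
    """Placeholder category class."""


class Comparison(EvaluationModule):
    """Placeholder category class."""


class Measurement(EvaluationModule):
    """Placeholder category class."""


class Accuracy(Metric):
    # Hypothetical metric: fraction of predictions equal to their reference.
    def _compute(self, references, predictions):
        correct = sum(r == p for r, p in zip(references, predictions))
        return {self.name: correct / len(references)}


class NumExamples(Measurement):
    # Hypothetical measurement: takes only `data`, no predictions.
    def _compute(self, data):
        return {self.name: len(data)}


accuracy = Accuracy("accuracy").compute(references=[0, 1, 1, 0], predictions=[0, 1, 0, 0])
size = NumExamples("num_examples").compute(data=["a", "b", "c"])
```

With this layout, loading, caching, or hub logic would live once in the base class, and each category would only fix its expected inputs.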

Template

Once we have the class structure, we could add a template for measurements in templates/ and expand the evaluate-cli create command so that it can create a new measurement and push it to the hub.

Miscellaneous

README: for the README template of measurements we could probably build on @sashavor's template for metrics: templates/{{ cookiecutter.metric_slug }}/README.md
Dependencies: I see that there is e.g. a streamlit dependency. Is this necessary?
Logging: it would be good to use the evaluate logger for consistency.
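As a rough illustration of the logging point, a measurement module could route its messages through a named logger instead of printing. The sketch below uses the standard library `logging` module as a stand-in for the evaluate logger; the logger name and the `count_tokens` function are hypothetical.

```python
import logging

# Stand-in for the library logger; the name is an assumption for illustration.
logger = logging.getLogger("evaluate.measurements.toy")


def count_tokens(data):
    # Log through the shared logger so verbosity is controlled in one place.
    logger.info("Counting tokens in %d examples", len(data))
    return {"num_tokens": sum(len(text.split()) for text in data)}


tokens = count_tokens(["a b", "c"])
```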

cc @lhoestq @douwekiela

lvwerra · 2022-05-23T09:12:58Z

Ok with you to close this PR and create smaller PRs like #44 for each measurement with the same structure? Also what do you think @sashavor?

sashavor · 2022-05-24T13:15:43Z

Yes! Perfect :)

Adding measurements directory for DMT and other data measurement work…

6d334da

… to utilize.

meg-huggingface requested a review from sashavor May 10, 2022 05:15

meg-huggingface added 2 commits May 9, 2022 22:23

Porting nPMI measure to be in a general Hugging Face library.

80ff3cf

Adding Zipf measurement

dd5e51e

lvwerra mentioned this pull request May 11, 2022

Refactor for loading multiple evaluation categories #38

Closed

lvwerra closed this May 25, 2022
