Remove metrics #6983

albertvillanova · 2024-06-19T09:08:55Z

Remove all metrics, as part of the 3.0 release.

Note that they have been deprecated since version 2.5.0.
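
For anyone checking whether an installed version still ships the metric API, here is a minimal sketch. The helper name `legacy_metric_api_present` is hypothetical, and the list of names is taken from the commit titles of this PR; that each of them was exported at the package top level is an assumption.

```python
import importlib
from typing import Optional

# Names deleted by this PR according to its commit titles.
# Assumption: each one used to be reachable as an attribute of the package.
REMOVED_NAMES = ("load_metric", "list_metrics", "inspect_metric", "Metric", "MetricInfo")


def legacy_metric_api_present(module_name: str = "datasets") -> Optional[bool]:
    """Return True if the module still exposes any removed metric name.

    Returns None when the module cannot be imported at all.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return any(hasattr(module, name) for name in REMOVED_NAMES)
```

Calling `legacy_metric_api_present()` against a release that includes this PR should return `False`. The 2.5.0 deprecation notice pointed users to the separate `evaluate` package (`evaluate.load`) as the replacement.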

HuggingFaceDocBuilderDev · 2024-06-19T09:58:40Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

lhoestq

Awesome! You can also remove all the dependencies from https://github.com/huggingface/datasets/blob/main/additional-tests-requirements.txt (except torchdata).

github-actions · 2024-06-28T06:57:37Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.005566 / 0.011353 (-0.005787)	0.003977 / 0.011008 (-0.007031)	0.063250 / 0.038508 (0.024742)	0.030907 / 0.023109 (0.007798)	0.244989 / 0.275898 (-0.030909)	0.272139 / 0.323480 (-0.051341)	0.004332 / 0.007986 (-0.003653)	0.002960 / 0.004328 (-0.001368)	0.050147 / 0.004250 (0.045896)	0.044740 / 0.037052 (0.007688)	0.256947 / 0.258489 (-0.001542)	0.290372 / 0.293841 (-0.003469)	0.030444 / 0.128546 (-0.098102)	0.012675 / 0.075646 (-0.062971)	0.203852 / 0.419271 (-0.215420)	0.036977 / 0.043533 (-0.006556)	0.244401 / 0.255139 (-0.010738)	0.270020 / 0.283200 (-0.013179)	0.018177 / 0.141683 (-0.123506)	1.122189 / 1.452155 (-0.329966)	1.176688 / 1.492716 (-0.316028)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.100721 / 0.018006 (0.082715)	0.311824 / 0.000490 (0.311335)	0.000222 / 0.000200 (0.000022)	0.000043 / 0.000054 (-0.000012)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.020039 / 0.037411 (-0.017373)	0.062084 / 0.014526 (0.047558)	0.074317 / 0.176557 (-0.102240)	0.123935 / 0.737135 (-0.613200)	0.076186 / 0.296338 (-0.220153)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.284827 / 0.215209 (0.069618)	2.782727 / 2.077655 (0.705072)	1.417624 / 1.504120 (-0.086496)	1.294476 / 1.541195 (-0.246718)	1.332658 / 1.468490 (-0.135832)	0.724820 / 4.584777 (-3.859957)	2.384546 / 3.745712 (-1.361166)	2.866759 / 5.269862 (-2.403103)	1.930756 / 4.565676 (-2.634921)	0.083090 / 0.424275 (-0.341185)	0.005566 / 0.007607 (-0.002041)	0.340117 / 0.226044 (0.114072)	3.342417 / 2.268929 (1.073488)	1.807842 / 55.444624 (-53.636782)	1.511647 / 6.876477 (-5.364830)	1.653893 / 2.142072 (-0.488179)	0.803983 / 4.805227 (-4.001244)	0.136205 / 6.500664 (-6.364459)	0.042815 / 0.075469 (-0.032654)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	0.962346 / 1.841788 (-0.879442)	11.792239 / 8.074308 (3.717931)	9.236256 / 10.191392 (-0.955136)	0.143200 / 0.680424 (-0.537224)	0.015050 / 0.534201 (-0.519151)	0.304623 / 0.579283 (-0.274660)	0.266417 / 0.434364 (-0.167947)	0.341213 / 0.540337 (-0.199124)	0.454258 / 1.386936 (-0.932678)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.005917 / 0.011353 (-0.005436)	0.004005 / 0.011008 (-0.007003)	0.049781 / 0.038508 (0.011273)	0.033310 / 0.023109 (0.010200)	0.271881 / 0.275898 (-0.004017)	0.296855 / 0.323480 (-0.026625)	0.004479 / 0.007986 (-0.003507)	0.002818 / 0.004328 (-0.001510)	0.048213 / 0.004250 (0.043962)	0.043480 / 0.037052 (0.006428)	0.285963 / 0.258489 (0.027473)	0.317304 / 0.293841 (0.023463)	0.031619 / 0.128546 (-0.096928)	0.012312 / 0.075646 (-0.063335)	0.059904 / 0.419271 (-0.359368)	0.033152 / 0.043533 (-0.010381)	0.274198 / 0.255139 (0.019059)	0.290469 / 0.283200 (0.007269)	0.019424 / 0.141683 (-0.122258)	1.133669 / 1.452155 (-0.318485)	1.194427 / 1.492716 (-0.298290)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.101561 / 0.018006 (0.083555)	0.312617 / 0.000490 (0.312127)	0.000216 / 0.000200 (0.000016)	0.000045 / 0.000054 (-0.000009)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.023705 / 0.037411 (-0.013706)	0.076781 / 0.014526 (0.062255)	0.089922 / 0.176557 (-0.086634)	0.129182 / 0.737135 (-0.607953)	0.092022 / 0.296338 (-0.204317)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.300977 / 0.215209 (0.085768)	2.909088 / 2.077655 (0.831433)	1.592821 / 1.504120 (0.088701)	1.466627 / 1.541195 (-0.074568)	1.497558 / 1.468490 (0.029068)	0.720986 / 4.584777 (-3.863791)	0.958039 / 3.745712 (-2.787673)	3.023413 / 5.269862 (-2.246448)	1.933245 / 4.565676 (-2.632432)	0.080500 / 0.424275 (-0.343775)	0.005243 / 0.007607 (-0.002364)	0.361259 / 0.226044 (0.135215)	3.447317 / 2.268929 (1.178389)	1.938234 / 55.444624 (-53.506390)	1.671563 / 6.876477 (-5.204913)	1.674647 / 2.142072 (-0.467425)	0.790606 / 4.805227 (-4.014621)	0.133312 / 6.500664 (-6.367352)	0.041241 / 0.075469 (-0.034228)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	0.996167 / 1.841788 (-0.845621)	12.460877 / 8.074308 (4.386569)	10.608415 / 10.191392 (0.417023)	0.134076 / 0.680424 (-0.546348)	0.016166 / 0.534201 (-0.518035)	0.301218 / 0.579283 (-0.278065)	0.128979 / 0.434364 (-0.305385)	0.336453 / 0.540337 (-0.203884)	0.435561 / 1.386936 (-0.951375)
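
Each benchmark cell above has the form `new / old (diff)`, where `diff` is `new - old`. A minimal sketch for parsing a cell and checking that relation follows; the helper names `parse_cell` and `is_consistent` are hypothetical and not part of the benchmark tooling. The tolerance is an assumption: the printed values are rounded to six decimals, so the last digit of `diff` can be off by one.

```python
import re

# Matches cells such as "0.100721 / 0.018006 (0.082715)".
CELL_RE = re.compile(r"^\s*(-?\d+\.\d+)\s*/\s*(-?\d+\.\d+)\s*\((-?\d+\.\d+)\)\s*$")


def parse_cell(cell: str) -> tuple:
    """Split one benchmark cell into (new, old, diff) as floats."""
    match = CELL_RE.match(cell)
    if match is None:
        raise ValueError(f"not a 'new / old (diff)' cell: {cell!r}")
    new, old, diff = (float(group) for group in match.groups())
    return new, old, diff


def is_consistent(cell: str, tol: float = 2e-6) -> bool:
    """Check that the printed diff equals new - old up to rounding."""
    new, old, diff = parse_cell(cell)
    return abs((new - old) - diff) <= tol
```

For example, `parse_cell("0.100721 / 0.018006 (0.082715)")` yields the three numbers of the `get_batch_of_1024_random_rows` cell, and a negative `diff` means the new run was lower than the old one.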

albertvillanova added 26 commits June 19, 2024 09:59

Remove metrics subpackage

2d40534

Update Makefile

4b6d798

Delete tests of metric module factories

f9ad81b

Delete metric tests

9f7f950

Delete metric warning tests

e564c5f

Delete inspect_metric tests

c22f49f

Delete inspect_metric and list_metrics

2c4ff3b

Delete load_metric

4cd9614

Update import_main_class

4b4095c

Delete Metric

9b30cb0

Delete MetricInfo

6aa3d6c

Update CI

75e4172

Delete metrics-tests extras require and update CI

348bb29

Update .gitignore

be4e6a5

Update docs

8dee635

Delete config.HF_METRICS_CACHE

084a828

Update setup keywords

c620ee1

Update increase_load_count

2285a97

Update hf_github_url

3e76736

Update cache docs

ab7f94e

Delete metric card template

4e4c66c

Delete metric_loading_script_dir test fixture

17d5161

Update comments and docstrings

e7bf4d3

Delete config METRIC_INFO_FILENAME

70fa519

Update main classes docs

01dcd88

Delete MetricModule

d9bde71

albertvillanova added 2 commits June 19, 2024 11:58

Update docstring

e076faf

Merge remote-tracking branch 'upstream/main' into remove-metrics

b6ec0c4

albertvillanova marked this pull request as ready for review June 20, 2024 06:27

albertvillanova added this to the 3.0 milestone Jun 20, 2024

albertvillanova mentioned this pull request Jun 27, 2024

Remove deprecated code #6996

Merged

3 tasks

Merge branch 'main' into remove-metrics

7c7a876

albertvillanova requested a review from a team June 27, 2024 13:39

lhoestq approved these changes Jun 27, 2024

albertvillanova added 2 commits June 28, 2024 06:55

Delete metrics additional tests requirements

9ac3038

Merge branch 'main' into remove-metrics

425b164

albertvillanova merged commit 70e7355 into main Jun 28, 2024
13 checks passed

albertvillanova deleted the remove-metrics branch June 28, 2024 06:51
