Replace metadata utils with `huggingface_hub`'s RepoCard API #5949

mariosasko · 2023-06-13T13:03:19Z

Use `huggingface_hub`'s RepoCard API instead of `DatasetMetadata` for modifying the card's YAML, and deprecate `datasets.utils.metadata` and `datasets.utils.readme`.

After removing these modules, we can also delete `datasets.utils.resources`, since the moon landing repo now stores its own version of these resources for the metadata UI.

PS: this change requires bumping the minimum `huggingface_hub` version to 0.13.0 (Transformers already requires 0.14.0, so this should be OK).
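
For illustration, here is a minimal sketch of editing a card's YAML through the RepoCard API, using `DatasetCard` (the dataset-specific subclass of `RepoCard` in `huggingface_hub`). The helper name `set_card_languages` and the sample README are assumptions made up for this example; they are not code from this PR.

```python
from typing import List

from huggingface_hub import DatasetCard


def set_card_languages(readme_text: str, languages: List[str]) -> str:
    """Return the README text with the `language` field of its YAML header set.

    Hypothetical helper for illustration only; it is not part of
    `datasets` or `huggingface_hub`.
    """
    # Parsing splits the README into `card.data` (the YAML metadata)
    # and `card.text` (the Markdown body that follows the YAML block).
    card = DatasetCard(readme_text)
    card.data.language = languages
    # str(card) re-serializes the YAML block and appends the body.
    return str(card)


# Made-up README content with a YAML header, used only as sample input.
readme = "---\nlicense: mit\n---\n# My dataset\n\nSome description.\n"
updated = set_card_languages(readme, ["en", "fr"])
print(updated)
```

The same card object can also be loaded from and written back to a file or a Hub repository (`DatasetCard.load`, `card.save`, `card.push_to_hub`), which is roughly the workflow the deprecated modules used to cover.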

github-actions · 2023-06-13T13:37:50Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006635 / 0.011353 (-0.004718)	0.004439 / 0.011008 (-0.006570)	0.107831 / 0.038508 (0.069323)	0.035664 / 0.023109 (0.012555)	0.393733 / 0.275898 (0.117835)	0.418336 / 0.323480 (0.094856)	0.005739 / 0.007986 (-0.002247)	0.005737 / 0.004328 (0.001408)	0.079820 / 0.004250 (0.075569)	0.045402 / 0.037052 (0.008349)	0.396108 / 0.258489 (0.137619)	0.422951 / 0.293841 (0.129110)	0.030506 / 0.128546 (-0.098040)	0.009785 / 0.075646 (-0.065861)	0.375302 / 0.419271 (-0.043969)	0.054355 / 0.043533 (0.010823)	0.399652 / 0.255139 (0.144513)	0.410825 / 0.283200 (0.127625)	0.109238 / 0.141683 (-0.032445)	1.687532 / 1.452155 (0.235378)	1.736829 / 1.492716 (0.244113)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.226514 / 0.018006 (0.208508)	0.487010 / 0.000490 (0.486520)	0.006436 / 0.000200 (0.006236)	0.000102 / 0.000054 (0.000048)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.029097 / 0.037411 (-0.008315)	0.122979 / 0.014526 (0.108453)	0.129454 / 0.176557 (-0.047103)	0.194006 / 0.737135 (-0.543129)	0.137968 / 0.296338 (-0.158370)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.466425 / 0.215209 (0.251216)	4.627307 / 2.077655 (2.549652)	2.108840 / 1.504120 (0.604720)	1.882547 / 1.541195 (0.341353)	1.891077 / 1.468490 (0.422587)	0.590646 / 4.584777 (-3.994131)	4.176918 / 3.745712 (0.431205)	2.071475 / 5.269862 (-3.198386)	1.173815 / 4.565676 (-3.391862)	0.075330 / 0.424275 (-0.348945)	0.012944 / 0.007607 (0.005337)	0.587080 / 0.226044 (0.361036)	5.827053 / 2.268929 (3.558125)	2.694258 / 55.444624 (-52.750366)	2.276997 / 6.876477 (-4.599480)	2.329678 / 2.142072 (0.187605)	0.721860 / 4.805227 (-4.083367)	0.159238 / 6.500664 (-6.341426)	0.073013 / 0.075469 (-0.002456)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.345396 / 1.841788 (-0.496391)	16.619283 / 8.074308 (8.544975)	14.754754 / 10.191392 (4.563362)	0.180784 / 0.680424 (-0.499639)	0.020376 / 0.534201 (-0.513825)	0.451010 / 0.579283 (-0.128273)	0.481524 / 0.434364 (0.047160)	0.564777 / 0.540337 (0.024440)	0.683232 / 1.386936 (-0.703704)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.007243 / 0.011353 (-0.004110)	0.005262 / 0.011008 (-0.005746)	0.084090 / 0.038508 (0.045581)	0.037429 / 0.023109 (0.014320)	0.404038 / 0.275898 (0.128140)	0.445040 / 0.323480 (0.121560)	0.006220 / 0.007986 (-0.001766)	0.004256 / 0.004328 (-0.000072)	0.083794 / 0.004250 (0.079544)	0.052655 / 0.037052 (0.015603)	0.414083 / 0.258489 (0.155594)	0.458190 / 0.293841 (0.164349)	0.032719 / 0.128546 (-0.095828)	0.010063 / 0.075646 (-0.065583)	0.092281 / 0.419271 (-0.326990)	0.053888 / 0.043533 (0.010355)	0.407813 / 0.255139 (0.152674)	0.431692 / 0.283200 (0.148493)	0.119799 / 0.141683 (-0.021884)	1.709853 / 1.452155 (0.257698)	1.771592 / 1.492716 (0.278876)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.246540 / 0.018006 (0.228534)	0.483199 / 0.000490 (0.482709)	0.002514 / 0.000200 (0.002315)	0.000096 / 0.000054 (0.000042)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.031576 / 0.037411 (-0.005835)	0.130020 / 0.014526 (0.115495)	0.140285 / 0.176557 (-0.036272)	0.196164 / 0.737135 (-0.540972)	0.143924 / 0.296338 (-0.152414)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.488549 / 0.215209 (0.273340)	4.888055 / 2.077655 (2.810400)	2.389163 / 1.504120 (0.885043)	2.184626 / 1.541195 (0.643431)	2.260227 / 1.468490 (0.791737)	0.601331 / 4.584777 (-3.983446)	4.386159 / 3.745712 (0.640447)	3.345814 / 5.269862 (-1.924048)	1.734360 / 4.565676 (-2.831317)	0.073199 / 0.424275 (-0.351076)	0.012397 / 0.007607 (0.004790)	0.601411 / 0.226044 (0.375366)	6.135000 / 2.268929 (3.866072)	2.930169 / 55.444624 (-52.514456)	2.532631 / 6.876477 (-4.343845)	2.619351 / 2.142072 (0.477279)	0.740954 / 4.805227 (-4.064274)	0.162936 / 6.500664 (-6.337728)	0.073885 / 0.075469 (-0.001585)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.502493 / 1.841788 (-0.339294)	17.026756 / 8.074308 (8.952448)	15.880958 / 10.191392 (5.689566)	0.167261 / 0.680424 (-0.513163)	0.020347 / 0.534201 (-0.513854)	0.452902 / 0.579283 (-0.126381)	0.481614 / 0.434364 (0.047250)	0.539893 / 0.540337 (-0.000445)	0.653401 / 1.386936 (-0.733535)

github-actions · 2023-06-13T13:53:24Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.008268 / 0.011353 (-0.003084)	0.005538 / 0.011008 (-0.005470)	0.126136 / 0.038508 (0.087628)	0.046100 / 0.023109 (0.022991)	0.366882 / 0.275898 (0.090984)	0.408912 / 0.323480 (0.085432)	0.007090 / 0.007986 (-0.000895)	0.004820 / 0.004328 (0.000491)	0.091432 / 0.004250 (0.087181)	0.058390 / 0.037052 (0.021338)	0.368787 / 0.258489 (0.110298)	0.419429 / 0.293841 (0.125588)	0.034958 / 0.128546 (-0.093588)	0.010526 / 0.075646 (-0.065120)	0.463063 / 0.419271 (0.043791)	0.070544 / 0.043533 (0.027011)	0.366182 / 0.255139 (0.111043)	0.390851 / 0.283200 (0.107652)	0.128377 / 0.141683 (-0.013306)	1.819385 / 1.452155 (0.367231)	1.928834 / 1.492716 (0.436117)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.228413 / 0.018006 (0.210407)	0.485511 / 0.000490 (0.485021)	0.005395 / 0.000200 (0.005195)	0.000119 / 0.000054 (0.000064)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.035209 / 0.037411 (-0.002203)	0.144492 / 0.014526 (0.129967)	0.150467 / 0.176557 (-0.026089)	0.223861 / 0.737135 (-0.513274)	0.156363 / 0.296338 (-0.139975)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.517751 / 0.215209 (0.302542)	5.150438 / 2.077655 (3.072783)	2.483601 / 1.504120 (0.979481)	2.279786 / 1.541195 (0.738592)	2.374510 / 1.468490 (0.906020)	0.637547 / 4.584777 (-3.947230)	4.845393 / 3.745712 (1.099681)	2.241554 / 5.269862 (-3.028307)	1.290105 / 4.565676 (-3.275572)	0.079791 / 0.424275 (-0.344484)	0.014915 / 0.007607 (0.007308)	0.640468 / 0.226044 (0.414423)	6.394810 / 2.268929 (4.125881)	3.012748 / 55.444624 (-52.431876)	2.625565 / 6.876477 (-4.250912)	2.792435 / 2.142072 (0.650363)	0.782284 / 4.805227 (-4.022944)	0.171628 / 6.500664 (-6.329036)	0.081714 / 0.075469 (0.006245)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.592411 / 1.841788 (-0.249377)	18.999604 / 8.074308 (10.925295)	18.469946 / 10.191392 (8.278554)	0.200878 / 0.680424 (-0.479546)	0.021595 / 0.534201 (-0.512606)	0.519247 / 0.579283 (-0.060036)	0.534940 / 0.434364 (0.100576)	0.656325 / 0.540337 (0.115987)	0.789658 / 1.386936 (-0.597278)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.008093 / 0.011353 (-0.003260)	0.005524 / 0.011008 (-0.005484)	0.092339 / 0.038508 (0.053831)	0.045619 / 0.023109 (0.022510)	0.449376 / 0.275898 (0.173478)	0.478587 / 0.323480 (0.155107)	0.006978 / 0.007986 (-0.001007)	0.004622 / 0.004328 (0.000294)	0.090618 / 0.004250 (0.086368)	0.059321 / 0.037052 (0.022269)	0.450989 / 0.258489 (0.192500)	0.491652 / 0.293841 (0.197811)	0.033308 / 0.128546 (-0.095238)	0.010677 / 0.075646 (-0.064969)	0.099836 / 0.419271 (-0.319435)	0.055937 / 0.043533 (0.012404)	0.440560 / 0.255139 (0.185421)	0.475305 / 0.283200 (0.192105)	0.130829 / 0.141683 (-0.010854)	1.857943 / 1.452155 (0.405789)	1.989534 / 1.492716 (0.496818)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.244715 / 0.018006 (0.226709)	0.482866 / 0.000490 (0.482377)	0.001100 / 0.000200 (0.000900)	0.000095 / 0.000054 (0.000041)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.036288 / 0.037411 (-0.001124)	0.147903 / 0.014526 (0.133377)	0.154141 / 0.176557 (-0.022416)	0.221863 / 0.737135 (-0.515272)	0.162319 / 0.296338 (-0.134019)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.536972 / 0.215209 (0.321763)	5.382866 / 2.077655 (3.305211)	2.719575 / 1.504120 (1.215456)	2.516596 / 1.541195 (0.975401)	2.699602 / 1.468490 (1.231112)	0.639886 / 4.584777 (-3.944891)	5.109746 / 3.745712 (1.364034)	2.260206 / 5.269862 (-3.009656)	1.305506 / 4.565676 (-3.260170)	0.080262 / 0.424275 (-0.344013)	0.014801 / 0.007607 (0.007194)	0.661228 / 0.226044 (0.435184)	6.596485 / 2.268929 (4.327557)	3.226114 / 55.444624 (-52.218510)	2.859776 / 6.876477 (-4.016701)	3.059355 / 2.142072 (0.917282)	0.793413 / 4.805227 (-4.011814)	0.176521 / 6.500664 (-6.324143)	0.084062 / 0.075469 (0.008593)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.642085 / 1.841788 (-0.199703)	20.355459 / 8.074308 (12.281151)	17.979620 / 10.191392 (7.788228)	0.229329 / 0.680424 (-0.451094)	0.025681 / 0.534201 (-0.508520)	0.534142 / 0.579283 (-0.045141)	0.623439 / 0.434364 (0.189075)	0.621938 / 0.540337 (0.081601)	0.759038 / 1.386936 (-0.627898)

github-actions · 2023-06-13T14:42:17Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.007703 / 0.011353 (-0.003649)	0.005362 / 0.011008 (-0.005646)	0.113111 / 0.038508 (0.074602)	0.038891 / 0.023109 (0.015782)	0.348938 / 0.275898 (0.073040)	0.398079 / 0.323480 (0.074599)	0.006707 / 0.007986 (-0.001278)	0.004489 / 0.004328 (0.000160)	0.087194 / 0.004250 (0.082943)	0.054268 / 0.037052 (0.017216)	0.359949 / 0.258489 (0.101460)	0.402959 / 0.293841 (0.109118)	0.032508 / 0.128546 (-0.096038)	0.010224 / 0.075646 (-0.065422)	0.387007 / 0.419271 (-0.032264)	0.058971 / 0.043533 (0.015439)	0.345085 / 0.255139 (0.089946)	0.384306 / 0.283200 (0.101107)	0.122253 / 0.141683 (-0.019430)	1.706353 / 1.452155 (0.254199)	1.840780 / 1.492716 (0.348063)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.254374 / 0.018006 (0.236368)	0.497387 / 0.000490 (0.496897)	0.012294 / 0.000200 (0.012094)	0.000108 / 0.000054 (0.000054)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.030902 / 0.037411 (-0.006509)	0.132098 / 0.014526 (0.117573)	0.140311 / 0.176557 (-0.036245)	0.205887 / 0.737135 (-0.531249)	0.143992 / 0.296338 (-0.152347)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.467367 / 0.215209 (0.252158)	4.669936 / 2.077655 (2.592281)	2.155358 / 1.504120 (0.651238)	1.984132 / 1.541195 (0.442937)	2.102352 / 1.468490 (0.633861)	0.607014 / 4.584777 (-3.977763)	4.396479 / 3.745712 (0.650767)	4.666056 / 5.269862 (-0.603806)	2.176649 / 4.565676 (-2.389028)	0.072657 / 0.424275 (-0.351619)	0.012367 / 0.007607 (0.004759)	0.569706 / 0.226044 (0.343661)	5.749083 / 2.268929 (3.480154)	2.640824 / 55.444624 (-52.803801)	2.310253 / 6.876477 (-4.566224)	2.486748 / 2.142072 (0.344676)	0.737891 / 4.805227 (-4.067336)	0.163507 / 6.500664 (-6.337157)	0.075776 / 0.075469 (0.000307)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.362710 / 1.841788 (-0.479078)	17.010705 / 8.074308 (8.936396)	15.084231 / 10.191392 (4.892839)	0.218274 / 0.680424 (-0.462150)	0.019555 / 0.534201 (-0.514646)	0.456013 / 0.579283 (-0.123270)	0.502772 / 0.434364 (0.068408)	0.581480 / 0.540337 (0.041142)	0.686952 / 1.386936 (-0.699984)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.007976 / 0.011353 (-0.003377)	0.005141 / 0.011008 (-0.005868)	0.086629 / 0.038508 (0.048121)	0.039553 / 0.023109 (0.016444)	0.433028 / 0.275898 (0.157130)	0.463444 / 0.323480 (0.139964)	0.006967 / 0.007986 (-0.001018)	0.005814 / 0.004328 (0.001485)	0.086266 / 0.004250 (0.082015)	0.055384 / 0.037052 (0.018332)	0.428733 / 0.258489 (0.170243)	0.475670 / 0.293841 (0.181829)	0.032872 / 0.128546 (-0.095674)	0.010664 / 0.075646 (-0.064983)	0.094357 / 0.419271 (-0.324915)	0.058386 / 0.043533 (0.014854)	0.431114 / 0.255139 (0.175975)	0.441728 / 0.283200 (0.158528)	0.131942 / 0.141683 (-0.009740)	1.782214 / 1.452155 (0.330060)	1.843185 / 1.492716 (0.350469)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.247047 / 0.018006 (0.229041)	0.488931 / 0.000490 (0.488441)	0.002657 / 0.000200 (0.002457)	0.000106 / 0.000054 (0.000052)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.033893 / 0.037411 (-0.003518)	0.131021 / 0.014526 (0.116495)	0.142892 / 0.176557 (-0.033665)	0.200955 / 0.737135 (-0.536180)	0.151329 / 0.296338 (-0.145010)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.521138 / 0.215209 (0.305929)	5.085207 / 2.077655 (3.007552)	2.652901 / 1.504120 (1.148781)	2.401545 / 1.541195 (0.860350)	2.553461 / 1.468490 (1.084971)	0.615347 / 4.584777 (-3.969430)	4.448038 / 3.745712 (0.702326)	2.049997 / 5.269862 (-3.219865)	1.190602 / 4.565676 (-3.375075)	0.073356 / 0.424275 (-0.350919)	0.013685 / 0.007607 (0.006078)	0.626705 / 0.226044 (0.400660)	6.391941 / 2.268929 (4.123012)	3.218864 / 55.444624 (-52.225760)	2.858808 / 6.876477 (-4.017669)	3.005808 / 2.142072 (0.863736)	0.740725 / 4.805227 (-4.064502)	0.161904 / 6.500664 (-6.338760)	0.073727 / 0.075469 (-0.001742)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.488623 / 1.841788 (-0.353164)	17.584367 / 8.074308 (9.510059)	16.281818 / 10.191392 (6.090426)	0.164482 / 0.680424 (-0.515942)	0.020197 / 0.534201 (-0.514003)	0.456750 / 0.579283 (-0.122533)	0.501156 / 0.434364 (0.066792)	0.549779 / 0.540337 (0.009442)	0.650156 / 1.386936 (-0.736780)

lhoestq

LGTM ! :)

src/datasets/arrow_dataset.py

src/datasets/dataset_dict.py

…-metadata-utils

github-actions · 2023-06-27T15:39:05Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.008337 / 0.011353 (-0.003016)	0.005911 / 0.011008 (-0.005097)	0.129037 / 0.038508 (0.090529)	0.046071 / 0.023109 (0.022962)	0.418657 / 0.275898 (0.142759)	0.490340 / 0.323480 (0.166860)	0.006387 / 0.007986 (-0.001598)	0.004724 / 0.004328 (0.000396)	0.097953 / 0.004250 (0.093702)	0.069025 / 0.037052 (0.031972)	0.431178 / 0.258489 (0.172689)	0.458363 / 0.293841 (0.164522)	0.049341 / 0.128546 (-0.079205)	0.014637 / 0.075646 (-0.061009)	0.439800 / 0.419271 (0.020529)	0.069905 / 0.043533 (0.026373)	0.406775 / 0.255139 (0.151636)	0.441989 / 0.283200 (0.158790)	0.046009 / 0.141683 (-0.095674)	1.847630 / 1.452155 (0.395475)	1.904067 / 1.492716 (0.411351)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.288305 / 0.018006 (0.270299)	0.594547 / 0.000490 (0.594058)	0.005600 / 0.000200 (0.005400)	0.000106 / 0.000054 (0.000052)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.033847 / 0.037411 (-0.003564)	0.125139 / 0.014526 (0.110613)	0.147982 / 0.176557 (-0.028574)	0.208396 / 0.737135 (-0.528739)	0.144005 / 0.296338 (-0.152334)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.669175 / 0.215209 (0.453966)	6.605289 / 2.077655 (4.527634)	2.720468 / 1.504120 (1.216348)	2.341355 / 1.541195 (0.800160)	2.402069 / 1.468490 (0.933578)	0.939303 / 4.584777 (-3.645474)	5.718545 / 3.745712 (1.972833)	2.856235 / 5.269862 (-2.413627)	1.821555 / 4.565676 (-2.744121)	0.105473 / 0.424275 (-0.318802)	0.014490 / 0.007607 (0.006883)	0.774349 / 0.226044 (0.548305)	8.065048 / 2.268929 (5.796120)	3.508482 / 55.444624 (-51.936143)	2.822881 / 6.876477 (-4.053596)	2.962947 / 2.142072 (0.820875)	1.138944 / 4.805227 (-3.666284)	0.248414 / 6.500664 (-6.252250)	0.095665 / 0.075469 (0.020196)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.688231 / 1.841788 (-0.153557)	18.673305 / 8.074308 (10.598997)	22.768663 / 10.191392 (12.577271)	0.211238 / 0.680424 (-0.469186)	0.031380 / 0.534201 (-0.502821)	0.517175 / 0.579283 (-0.062108)	0.626437 / 0.434364 (0.192073)	0.624225 / 0.540337 (0.083888)	0.743746 / 1.386936 (-0.643191)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.008888 / 0.011353 (-0.002464)	0.005491 / 0.011008 (-0.005517)	0.105013 / 0.038508 (0.066505)	0.049456 / 0.023109 (0.026347)	0.528989 / 0.275898 (0.253091)	0.651871 / 0.323480 (0.328391)	0.006683 / 0.007986 (-0.001302)	0.004365 / 0.004328 (0.000037)	0.098161 / 0.004250 (0.093911)	0.075615 / 0.037052 (0.038563)	0.543746 / 0.258489 (0.285257)	0.650855 / 0.293841 (0.357014)	0.050220 / 0.128546 (-0.078327)	0.014471 / 0.075646 (-0.061175)	0.115903 / 0.419271 (-0.303368)	0.065925 / 0.043533 (0.022392)	0.527797 / 0.255139 (0.272658)	0.543834 / 0.283200 (0.260634)	0.043005 / 0.141683 (-0.098678)	1.842846 / 1.452155 (0.390691)	1.970615 / 1.492716 (0.477899)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.287350 / 0.018006 (0.269343)	0.591139 / 0.000490 (0.590649)	0.006423 / 0.000200 (0.006223)	0.000107 / 0.000054 (0.000052)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.034594 / 0.037411 (-0.002818)	0.137155 / 0.014526 (0.122629)	0.154662 / 0.176557 (-0.021894)	0.217834 / 0.737135 (-0.519301)	0.159642 / 0.296338 (-0.136696)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.664288 / 0.215209 (0.449079)	6.926912 / 2.077655 (4.849257)	3.028957 / 1.504120 (1.524837)	2.625178 / 1.541195 (1.083983)	2.725316 / 1.468490 (1.256826)	1.015715 / 4.584777 (-3.569062)	5.834694 / 3.745712 (2.088982)	5.105269 / 5.269862 (-0.164593)	2.316194 / 4.565676 (-2.249483)	0.113802 / 0.424275 (-0.310473)	0.014079 / 0.007607 (0.006472)	0.893727 / 0.226044 (0.667683)	8.577701 / 2.268929 (6.308772)	3.706907 / 55.444624 (-51.737717)	3.087530 / 6.876477 (-3.788947)	3.295004 / 2.142072 (1.152931)	1.204172 / 4.805227 (-3.601055)	0.248720 / 6.500664 (-6.251944)	0.107208 / 0.075469 (0.031739)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.800058 / 1.841788 (-0.041730)	19.253646 / 8.074308 (11.179338)	22.590804 / 10.191392 (12.399412)	0.270687 / 0.680424 (-0.409737)	0.028678 / 0.534201 (-0.505522)	0.534670 / 0.579283 (-0.044613)	0.642881 / 0.434364 (0.208518)	0.615521 / 0.540337 (0.075184)	0.723733 / 1.386936 (-0.663203)

github-actions · 2023-06-27T15:40:20Z

Show benchmarks

PyArrow==8.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.017236 / 0.011353 (0.005883)	0.005341 / 0.011008 (-0.005667)	0.131471 / 0.038508 (0.092963)	0.048868 / 0.023109 (0.025758)	0.448942 / 0.275898 (0.173044)	0.498721 / 0.323480 (0.175241)	0.006825 / 0.007986 (-0.001161)	0.004587 / 0.004328 (0.000259)	0.104142 / 0.004250 (0.099891)	0.075521 / 0.037052 (0.038469)	0.439538 / 0.258489 (0.181049)	0.498720 / 0.293841 (0.204879)	0.051352 / 0.128546 (-0.077194)	0.015070 / 0.075646 (-0.060576)	0.441752 / 0.419271 (0.022480)	0.089166 / 0.043533 (0.045633)	0.428909 / 0.255139 (0.173770)	0.446648 / 0.283200 (0.163448)	0.042371 / 0.141683 (-0.099312)	1.993948 / 1.452155 (0.541793)	2.065756 / 1.492716 (0.573039)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.257279 / 0.018006 (0.239273)	0.575453 / 0.000490 (0.574964)	0.004120 / 0.000200 (0.003920)	0.000114 / 0.000054 (0.000060)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.034012 / 0.037411 (-0.003399)	0.141737 / 0.014526 (0.127211)	0.145241 / 0.176557 (-0.031316)	0.226196 / 0.737135 (-0.510939)	0.149526 / 0.296338 (-0.146813)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.665762 / 0.215209 (0.450553)	6.683737 / 2.077655 (4.606083)	2.869485 / 1.504120 (1.365365)	2.462808 / 1.541195 (0.921613)	2.526808 / 1.468490 (1.058318)	0.957518 / 4.584777 (-3.627259)	5.926261 / 3.745712 (2.180548)	5.027822 / 5.269862 (-0.242040)	2.643185 / 4.565676 (-1.922491)	0.117014 / 0.424275 (-0.307261)	0.015142 / 0.007607 (0.007535)	0.835694 / 0.226044 (0.609650)	8.427356 / 2.268929 (6.158427)	3.649597 / 55.444624 (-51.795027)	2.989607 / 6.876477 (-3.886870)	3.043160 / 2.142072 (0.901088)	1.158872 / 4.805227 (-3.646355)	0.240456 / 6.500664 (-6.260208)	0.089196 / 0.075469 (0.013726)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.689361 / 1.841788 (-0.152427)	18.842158 / 8.074308 (10.767850)	22.604249 / 10.191392 (12.412857)	0.248487 / 0.680424 (-0.431936)	0.029668 / 0.534201 (-0.504533)	0.536283 / 0.579283 (-0.043001)	0.663253 / 0.434364 (0.228890)	0.622973 / 0.540337 (0.082635)	0.735297 / 1.386936 (-0.651639)

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.009296 / 0.011353 (-0.002057)	0.005955 / 0.011008 (-0.005053)	0.105723 / 0.038508 (0.067215)	0.051184 / 0.023109 (0.028074)	0.527095 / 0.275898 (0.251197)	0.631697 / 0.323480 (0.308217)	0.006577 / 0.007986 (-0.001408)	0.004452 / 0.004328 (0.000124)	0.105921 / 0.004250 (0.101670)	0.071951 / 0.037052 (0.034899)	0.572518 / 0.258489 (0.314029)	0.623957 / 0.293841 (0.330116)	0.050861 / 0.128546 (-0.077686)	0.014897 / 0.075646 (-0.060749)	0.122013 / 0.419271 (-0.297258)	0.067194 / 0.043533 (0.023661)	0.530352 / 0.255139 (0.275213)	0.563912 / 0.283200 (0.280712)	0.034756 / 0.141683 (-0.106927)	1.961580 / 1.452155 (0.509425)	2.052412 / 1.492716 (0.559696)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.304996 / 0.018006 (0.286990)	0.584899 / 0.000490 (0.584409)	0.010444 / 0.000200 (0.010244)	0.000134 / 0.000054 (0.000080)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.032540 / 0.037411 (-0.004871)	0.137349 / 0.014526 (0.122823)	0.146233 / 0.176557 (-0.030323)	0.206978 / 0.737135 (-0.530157)	0.154380 / 0.296338 (-0.141959)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.705438 / 0.215209 (0.490229)	7.042159 / 2.077655 (4.964504)	3.285501 / 1.504120 (1.781381)	2.904710 / 1.541195 (1.363515)	2.952838 / 1.468490 (1.484348)	0.987784 / 4.584777 (-3.596993)	5.949550 / 3.745712 (2.203838)	2.927148 / 5.269862 (-2.342714)	1.870054 / 4.565676 (-2.695622)	0.119548 / 0.424275 (-0.304727)	0.014565 / 0.007607 (0.006958)	0.858311 / 0.226044 (0.632266)	8.721679 / 2.268929 (6.452750)	4.100825 / 55.444624 (-51.343800)	3.358093 / 6.876477 (-3.518383)	3.499637 / 2.142072 (1.357564)	1.208932 / 4.805227 (-3.596295)	0.232961 / 6.500664 (-6.267703)	0.089727 / 0.075469 (0.014258)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.780143 / 1.841788 (-0.061645)	19.074991 / 8.074308 (11.000683)	21.218487 / 10.191392 (11.027095)	0.258690 / 0.680424 (-0.421734)	0.029514 / 0.534201 (-0.504687)	0.541764 / 0.579283 (-0.037519)	0.640603 / 0.434364 (0.206239)	0.635336 / 0.540337 (0.094999)	0.756309 / 1.386936 (-0.630627)

github-actions · 2023-06-27T16:10:08Z

Show benchmarks

PyArrow==8.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.009619 / 0.011353 (-0.001734)	0.005683 / 0.011008 (-0.005325)	0.136971 / 0.038508 (0.098463)	0.051607 / 0.023109 (0.028497)	0.439716 / 0.275898 (0.163818)	0.486193 / 0.323480 (0.162713)	0.006304 / 0.007986 (-0.001681)	0.004489 / 0.004328 (0.000160)	0.103837 / 0.004250 (0.099587)	0.082954 / 0.037052 (0.045901)	0.447286 / 0.258489 (0.188797)	0.495434 / 0.293841 (0.201593)	0.049244 / 0.128546 (-0.079302)	0.015176 / 0.075646 (-0.060470)	0.444406 / 0.419271 (0.025134)	0.074766 / 0.043533 (0.031233)	0.438585 / 0.255139 (0.183446)	0.438232 / 0.283200 (0.155032)	0.043372 / 0.141683 (-0.098311)	2.057286 / 1.452155 (0.605131)	2.049540 / 1.492716 (0.556824)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.298038 / 0.018006 (0.280031)	0.630771 / 0.000490 (0.630281)	0.008287 / 0.000200 (0.008087)	0.000123 / 0.000054 (0.000068)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.033637 / 0.037411 (-0.003775)	0.128327 / 0.014526 (0.113801)	0.150672 / 0.176557 (-0.025885)	0.228521 / 0.737135 (-0.508614)	0.142733 / 0.296338 (-0.153606)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.629072 / 0.215209 (0.413863)	6.612047 / 2.077655 (4.534392)	2.715594 / 1.504120 (1.211474)	2.327823 / 1.541195 (0.786628)	2.417508 / 1.468490 (0.949018)	0.959134 / 4.584777 (-3.625643)	5.669921 / 3.745712 (1.924209)	2.977920 / 5.269862 (-2.291941)	1.814564 / 4.565676 (-2.751112)	0.120233 / 0.424275 (-0.304042)	0.015859 / 0.007607 (0.008252)	0.822618 / 0.226044 (0.596574)	8.440306 / 2.268929 (6.171377)	3.721611 / 55.444624 (-51.723013)	2.954867 / 6.876477 (-3.921610)	3.135364 / 2.142072 (0.993292)	1.226475 / 4.805227 (-3.578752)	0.246658 / 6.500664 (-6.254006)	0.093920 / 0.075469 (0.018451)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.665631 / 1.841788 (-0.176157)	19.136369 / 8.074308 (11.062061)	23.659564 / 10.191392 (13.468172)	0.273430 / 0.680424 (-0.406994)	0.028180 / 0.534201 (-0.506021)	0.559588 / 0.579283 (-0.019695)	0.649203 / 0.434364 (0.214840)	0.647113 / 0.540337 (0.106776)	0.737978 / 1.386936 (-0.648958)

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.009104 / 0.011353 (-0.002249)	0.006838 / 0.011008 (-0.004171)	0.104516 / 0.038508 (0.066008)	0.047986 / 0.023109 (0.024877)	0.521849 / 0.275898 (0.245951)	0.586281 / 0.323480 (0.262801)	0.006225 / 0.007986 (-0.001760)	0.005713 / 0.004328 (0.001384)	0.111507 / 0.004250 (0.107257)	0.072320 / 0.037052 (0.035267)	0.551061 / 0.258489 (0.292572)	0.628034 / 0.293841 (0.334193)	0.055417 / 0.128546 (-0.073129)	0.019613 / 0.075646 (-0.056034)	0.123958 / 0.419271 (-0.295314)	0.066132 / 0.043533 (0.022600)	0.504461 / 0.255139 (0.249322)	0.560428 / 0.283200 (0.277229)	0.036098 / 0.141683 (-0.105585)	1.927398 / 1.452155 (0.475243)	2.015952 / 1.492716 (0.523235)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.313065 / 0.018006 (0.295059)	0.609174 / 0.000490 (0.608684)	0.008755 / 0.000200 (0.008555)	0.000120 / 0.000054 (0.000066)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.040042 / 0.037411 (0.002630)	0.136053 / 0.014526 (0.121527)	0.143406 / 0.176557 (-0.033150)	0.213080 / 0.737135 (-0.524055)	0.154730 / 0.296338 (-0.141609)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.692706 / 0.215209 (0.477497)	6.952968 / 2.077655 (4.875314)	3.232023 / 1.504120 (1.727903)	2.835450 / 1.541195 (1.294256)	2.933821 / 1.468490 (1.465331)	0.984712 / 4.584777 (-3.600065)	6.127651 / 3.745712 (2.381939)	2.956781 / 5.269862 (-2.313081)	1.879928 / 4.565676 (-2.685748)	0.111069 / 0.424275 (-0.313206)	0.014598 / 0.007607 (0.006991)	0.871486 / 0.226044 (0.645442)	8.588500 / 2.268929 (6.319572)	3.910740 / 55.444624 (-51.533885)	3.115781 / 6.876477 (-3.760695)	3.222367 / 2.142072 (1.080294)	1.229680 / 4.805227 (-3.575547)	0.232092 / 6.500664 (-6.268572)	0.097717 / 0.075469 (0.022248)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.774193 / 1.841788 (-0.067595)	19.863087 / 8.074308 (11.788779)	24.058856 / 10.191392 (13.867464)	0.214917 / 0.680424 (-0.465507)	0.028771 / 0.534201 (-0.505430)	0.544548 / 0.579283 (-0.034735)	0.655882 / 0.434364 (0.221518)	0.629110 / 0.540337 (0.088773)	0.749246 / 1.386936 (-0.637690)

github-actions · 2023-06-27T16:47:50Z

Show benchmarks

PyArrow==8.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.007075 / 0.011353 (-0.004278)	0.005195 / 0.011008 (-0.005813)	0.113043 / 0.038508 (0.074535)	0.038442 / 0.023109 (0.015333)	0.336310 / 0.275898 (0.060412)	0.381888 / 0.323480 (0.058409)	0.005990 / 0.007986 (-0.001996)	0.003893 / 0.004328 (-0.000435)	0.093123 / 0.004250 (0.088872)	0.058449 / 0.037052 (0.021397)	0.359463 / 0.258489 (0.100974)	0.427485 / 0.293841 (0.133644)	0.041454 / 0.128546 (-0.087092)	0.013016 / 0.075646 (-0.062630)	0.372849 / 0.419271 (-0.046422)	0.059386 / 0.043533 (0.015853)	0.381398 / 0.255139 (0.126259)	0.367603 / 0.283200 (0.084403)	0.033907 / 0.141683 (-0.107775)	1.628903 / 1.452155 (0.176749)	1.764131 / 1.492716 (0.271415)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.298329 / 0.018006 (0.280322)	0.593030 / 0.000490 (0.592540)	0.007653 / 0.000200 (0.007453)	0.000091 / 0.000054 (0.000036)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.025445 / 0.037411 (-0.011966)	0.112062 / 0.014526 (0.097536)	0.119863 / 0.176557 (-0.056693)	0.178389 / 0.737135 (-0.558746)	0.129934 / 0.296338 (-0.166404)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.532834 / 0.215209 (0.317625)	5.250908 / 2.077655 (3.173253)	2.086920 / 1.504120 (0.582800)	1.799745 / 1.541195 (0.258550)	1.909648 / 1.468490 (0.441158)	0.825382 / 4.584777 (-3.759395)	5.268304 / 3.745712 (1.522592)	2.533347 / 5.269862 (-2.736515)	1.730187 / 4.565676 (-2.835490)	0.099824 / 0.424275 (-0.324451)	0.012969 / 0.007607 (0.005362)	0.732234 / 0.226044 (0.506189)	6.989066 / 2.268929 (4.720138)	2.873486 / 55.444624 (-52.571138)	2.274351 / 6.876477 (-4.602125)	2.311060 / 2.142072 (0.168987)	1.125366 / 4.805227 (-3.679861)	0.214522 / 6.500664 (-6.286142)	0.077579 / 0.075469 (0.002110)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.670950 / 1.841788 (-0.170838)	18.131528 / 8.074308 (10.057220)	21.277823 / 10.191392 (11.086431)	0.238807 / 0.680424 (-0.441617)	0.032251 / 0.534201 (-0.501950)	0.503859 / 0.579283 (-0.075424)	0.604825 / 0.434364 (0.170461)	0.555623 / 0.540337 (0.015286)	0.647301 / 1.386936 (-0.739635)

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.010857 / 0.011353 (-0.000496)	0.005581 / 0.011008 (-0.005427)	0.094346 / 0.038508 (0.055838)	0.053084 / 0.023109 (0.029975)	0.457586 / 0.275898 (0.181688)	0.545475 / 0.323480 (0.221995)	0.006761 / 0.007986 (-0.001225)	0.005094 / 0.004328 (0.000765)	0.095509 / 0.004250 (0.091258)	0.077182 / 0.037052 (0.040130)	0.498717 / 0.258489 (0.240228)	0.542433 / 0.293841 (0.248592)	0.051547 / 0.128546 (-0.076999)	0.014633 / 0.075646 (-0.061014)	0.106843 / 0.419271 (-0.312428)	0.068459 / 0.043533 (0.024926)	0.435793 / 0.255139 (0.180654)	0.475484 / 0.283200 (0.192285)	0.039495 / 0.141683 (-0.102188)	1.684906 / 1.452155 (0.232751)	1.798693 / 1.492716 (0.305976)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.279853 / 0.018006 (0.261847)	0.601016 / 0.000490 (0.600526)	0.002055 / 0.000200 (0.001855)	0.000219 / 0.000054 (0.000165)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.030935 / 0.037411 (-0.006477)	0.121197 / 0.014526 (0.106671)	0.143360 / 0.176557 (-0.033197)	0.200862 / 0.737135 (-0.536274)	0.138656 / 0.296338 (-0.157683)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.613904 / 0.215209 (0.398695)	6.155422 / 2.077655 (4.077767)	2.777238 / 1.504120 (1.273118)	2.473045 / 1.541195 (0.931851)	2.604470 / 1.468490 (1.135980)	0.898871 / 4.584777 (-3.685906)	5.739666 / 3.745712 (1.993954)	4.719822 / 5.269862 (-0.550040)	2.727354 / 4.565676 (-1.838322)	0.108232 / 0.424275 (-0.316043)	0.013632 / 0.007607 (0.006025)	0.771802 / 0.226044 (0.545757)	7.987466 / 2.268929 (5.718537)	3.609856 / 55.444624 (-51.834768)	2.974421 / 6.876477 (-3.902056)	2.956567 / 2.142072 (0.814495)	1.093792 / 4.805227 (-3.711435)	0.213369 / 6.500664 (-6.287295)	0.084486 / 0.075469 (0.009017)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.693855 / 1.841788 (-0.147933)	18.055027 / 8.074308 (9.980719)	21.397964 / 10.191392 (11.206571)	0.240549 / 0.680424 (-0.439875)	0.031212 / 0.534201 (-0.502989)	0.513657 / 0.579283 (-0.065626)	0.651348 / 0.434364 (0.216985)	0.603740 / 0.540337 (0.063402)	0.752287 / 1.386936 (-0.634649)
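
Each cell in the benchmark tables above has the form `new / old (diff)`, where the diff appears to be `new - old`, matching the printed values to within rounding in the last digit. As a minimal sketch for reading these rows programmatically, assuming the tab-separated layout shown here (the helper name `parse_row` is hypothetical and not part of the benchmark tooling):

```python
import re

# One cell looks like "0.030935 / 0.037411 (-0.006477)".
CELL = re.compile(r"([-\d.]+) / ([-\d.]+) \(([-\d.]+)\)")


def parse_row(header: str, row: str) -> dict:
    """Map each metric name to a (new, old, diff) tuple of floats.

    Assumes both lines are tab-separated and that the first column holds
    the row label ("metric" / "new / old (diff)").
    """
    names = header.split("\t")[1:]
    cells = row.split("\t")[1:]
    parsed = {}
    for name, cell in zip(names, cells):
        match = CELL.fullmatch(cell.strip())
        if match is None:
            raise ValueError(f"unexpected cell format: {cell!r}")
        new, old, diff = (float(g) for g in match.groups())
        parsed[name] = (new, old, diff)
    return parsed


# Rows copied from the benchmark_indices_mapping.json table (PyArrow==latest).
header = "metric\tselect\tshard\tshuffle\tsort\ttrain_test_split"
row = (
    "new / old (diff)\t0.030935 / 0.037411 (-0.006477)\t"
    "0.121197 / 0.014526 (0.106671)\t0.143360 / 0.176557 (-0.033197)\t"
    "0.200862 / 0.737135 (-0.536274)\t0.138656 / 0.296338 (-0.157683)"
)
parsed = parse_row(header, row)

# A negative diff means the new run was faster than the stored baseline.
faster = sorted(name for name, (_, _, diff) in parsed.items() if diff < 0)
```

Here a negative diff marks a metric where the new run took less time than the baseline. The printed diff can differ from `new - old` by about 1e-6, presumably because it was computed before rounding.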

mariosasko added 3 commits June 12, 2023 17:40

Deprecate metadata/readme utils (3dc8f0c)

Use RepoCard API instead of DatasetMetadata (734e7d9)

Bump hfh (6a57812)

Bump hfh in CI (6a98ff4)

Nit (2b6cc63)

mariosasko requested a review from lhoestq June 19, 2023 13:07

lhoestq approved these changes Jun 27, 2023

src/datasets/arrow_dataset.py (outdated, resolved)

src/datasets/dataset_dict.py (outdated, resolved)

mariosasko added 3 commits June 27, 2023 17:15

Rename dataset_metadata to dataset_card_data (07bdf9c)

Merge branch 'main' of github.com:huggingface/datasets into deprecate-metadata-utils (2591cd4)

Fix (1b525c1)

One more fix (f4a5ea6)

mariosasko merged commit 6f3f38d into main Jun 27, 2023
13 checks passed

mariosasko deleted the deprecate-metadata-utils branch June 27, 2023 16:38
