Add course banner (#2506)
sgugger committed Jun 15, 2021
1 parent 4cee629 commit fd1ff89
Showing 2 changed files with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -34,6 +34,10 @@

[🔎 **Find a dataset in the Hub**](https://huggingface.co/datasets) [🌟 **Add a new dataset to the Hub**](https://github.com/huggingface/datasets/blob/master/ADD_NEW_DATASET.md)

<h3 align="center">
<a href="https://hf.co/course"><img src="https://raw.githubusercontent.com/huggingface/datasets/master/docs/source/imgs/course_banner.png"></a>
</h3>

`🤗Datasets` also provides access to +15 evaluation metrics and is designed to let the community easily add and share new datasets and evaluation metrics.

`🤗Datasets` has many additional interesting features:
Binary file added docs/source/imgs/course_banner.png

1 comment on commit fd1ff89

@github-actions

Show benchmarks

PyArrow==1.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.020600 / 0.011353 (0.009247) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.013815 / 0.011008 (0.002806) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.043594 / 0.038508 (0.005086) |
| read_batch_unformated after write_array2d | 0.130313 / 0.023109 (0.107203) |
| read_batch_unformated after write_flattened_sequence | 0.317361 / 0.275898 (0.041463) |
| read_batch_unformated after write_nested_sequence | 0.348145 / 0.323480 (0.024665) |
| read_col_formatted_as_numpy after write_array2d | 0.010640 / 0.007986 (0.002655) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005050 / 0.004328 (0.000722) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.010570 / 0.004250 (0.006319) |
| read_col_unformated after write_array2d | 0.052653 / 0.037052 (0.015601) |
| read_col_unformated after write_flattened_sequence | 0.315705 / 0.258489 (0.057216) |
| read_col_unformated after write_nested_sequence | 0.348086 / 0.293841 (0.054245) |
| read_formatted_as_numpy after write_array2d | 0.134357 / 0.128546 (0.005811) |
| read_formatted_as_numpy after write_flattened_sequence | 0.101267 / 0.075646 (0.025621) |
| read_formatted_as_numpy after write_nested_sequence | 0.367242 / 0.419271 (-0.052030) |
| read_unformated after write_array2d | 0.305279 / 0.043533 (0.261746) |
| read_unformated after write_flattened_sequence | 0.309456 / 0.255139 (0.054317) |
| read_unformated after write_nested_sequence | 0.341789 / 0.283200 (0.058590) |
| write_array2d | 1.526240 / 0.141683 (1.384557) |
| write_flattened_sequence | 1.542416 / 1.452155 (0.090262) |
| write_nested_sequence | 1.635751 / 1.492716 (0.143035) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.014145 / 0.018006 (-0.003861) |
| get_batch_of_1024_rows | 0.523001 / 0.000490 (0.522512) |
| get_first_row | 0.002874 / 0.000200 (0.002674) |
| get_last_row | 0.000201 / 0.000054 (0.000146) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.039364 / 0.037411 (0.001952) |
| shard | 0.023607 / 0.014526 (0.009082) |
| shuffle | 0.027448 / 0.176557 (-0.149109) |
| sort | 0.043385 / 0.737135 (-0.693750) |
| train_test_split | 0.028188 / 0.296338 (-0.268150) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.366953 / 0.215209 (0.151744) |
| read 50000 | 3.654711 / 2.077655 (1.577056) |
| read_batch 50000 10 | 1.874264 / 1.504120 (0.370144) |
| read_batch 50000 100 | 1.720019 / 1.541195 (0.178824) |
| read_batch 50000 1000 | 1.789258 / 1.468490 (0.320768) |
| read_formatted numpy 5000 | 5.334234 / 4.584777 (0.749457) |
| read_formatted pandas 5000 | 4.858089 / 3.745712 (1.112377) |
| read_formatted tensorflow 5000 | 7.180136 / 5.269862 (1.910274) |
| read_formatted torch 5000 | 6.364526 / 4.565676 (1.798850) |
| read_formatted_batch numpy 5000 10 | 0.539284 / 0.424275 (0.115009) |
| read_formatted_batch numpy 5000 1000 | 0.009600 / 0.007607 (0.001992) |
| shuffled read 5000 | 0.469785 / 0.226044 (0.243741) |
| shuffled read 50000 | 4.780352 / 2.268929 (2.511424) |
| shuffled read_batch 50000 10 | 2.313619 / 55.444624 (-53.131006) |
| shuffled read_batch 50000 100 | 1.914870 / 6.876477 (-4.961607) |
| shuffled read_batch 50000 1000 | 2.007248 / 2.142072 (-0.134824) |
| shuffled read_formatted numpy 5000 | 5.515004 / 4.805227 (0.709777) |
| shuffled read_formatted_batch numpy 5000 10 | 4.945386 / 6.500664 (-1.555278) |
| shuffled read_formatted_batch numpy 5000 1000 | 8.794366 / 0.075469 (8.718897) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 11.142235 / 1.841788 (9.300447) |
| map fast-tokenizer batched | 13.409735 / 8.074308 (5.335427) |
| map identity | 25.929131 / 10.191392 (15.737739) |
| map identity batched | 0.743724 / 0.680424 (0.063300) |
| map no-op batched | 0.527258 / 0.534201 (-0.006943) |
| map no-op batched numpy | 0.665889 / 0.579283 (0.086606) |
| map no-op batched pandas | 0.491152 / 0.434364 (0.056788) |
| map no-op batched pytorch | 0.583351 / 0.540337 (0.043014) |
| map no-op batched tensorflow | 1.329381 / 1.386936 (-0.057555) |

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.017631 / 0.011353 (0.006278) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.013485 / 0.011008 (0.002477) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.042357 / 0.038508 (0.003849) |
| read_batch_unformated after write_array2d | 0.033346 / 0.023109 (0.010237) |
| read_batch_unformated after write_flattened_sequence | 0.292560 / 0.275898 (0.016662) |
| read_batch_unformated after write_nested_sequence | 0.325230 / 0.323480 (0.001750) |
| read_col_formatted_as_numpy after write_array2d | 0.009444 / 0.007986 (0.001458) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005073 / 0.004328 (0.000745) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.010593 / 0.004250 (0.006342) |
| read_col_unformated after write_array2d | 0.043931 / 0.037052 (0.006879) |
| read_col_unformated after write_flattened_sequence | 0.291214 / 0.258489 (0.032724) |
| read_col_unformated after write_nested_sequence | 0.333809 / 0.293841 (0.039968) |
| read_formatted_as_numpy after write_array2d | 0.112224 / 0.128546 (-0.016322) |
| read_formatted_as_numpy after write_flattened_sequence | 0.098002 / 0.075646 (0.022356) |
| read_formatted_as_numpy after write_nested_sequence | 0.348803 / 0.419271 (-0.070468) |
| read_unformated after write_array2d | 0.348414 / 0.043533 (0.304881) |
| read_unformated after write_flattened_sequence | 0.288444 / 0.255139 (0.033305) |
| read_unformated after write_nested_sequence | 0.276472 / 0.283200 (-0.006728) |
| write_array2d | 1.356135 / 0.141683 (1.214452) |
| write_flattened_sequence | 1.575869 / 1.452155 (0.123714) |
| write_nested_sequence | 1.461333 / 1.492716 (-0.031383) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.054461 / 0.018006 (0.036455) |
| get_batch_of_1024_rows | 0.525184 / 0.000490 (0.524694) |
| get_first_row | 0.037002 / 0.000200 (0.036802) |
| get_last_row | 0.000291 / 0.000054 (0.000237) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.032164 / 0.037411 (-0.005247) |
| shard | 0.020971 / 0.014526 (0.006445) |
| shuffle | 0.022187 / 0.176557 (-0.154370) |
| sort | 0.038513 / 0.737135 (-0.698622) |
| train_test_split | 0.023359 / 0.296338 (-0.272979) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.345558 / 0.215209 (0.130349) |
| read 50000 | 3.447854 / 2.077655 (1.370199) |
| read_batch 50000 10 | 1.683330 / 1.504120 (0.179210) |
| read_batch 50000 100 | 1.511848 / 1.541195 (-0.029346) |
| read_batch 50000 1000 | 1.591854 / 1.468490 (0.123363) |
| read_formatted numpy 5000 | 5.220326 / 4.584777 (0.635549) |
| read_formatted pandas 5000 | 4.625000 / 3.745712 (0.879288) |
| read_formatted tensorflow 5000 | 6.797323 / 5.269862 (1.527462) |
| read_formatted torch 5000 | 5.261775 / 4.565676 (0.696098) |
| read_formatted_batch numpy 5000 10 | 0.467236 / 0.424275 (0.042961) |
| read_formatted_batch numpy 5000 1000 | 0.008427 / 0.007607 (0.000820) |
| shuffled read 5000 | 0.400014 / 0.226044 (0.173970) |
| shuffled read 50000 | 3.976132 / 2.268929 (1.707204) |
| shuffled read_batch 50000 10 | 1.894172 / 55.444624 (-53.550452) |
| shuffled read_batch 50000 100 | 1.616258 / 6.876477 (-5.260219) |
| shuffled read_batch 50000 1000 | 1.650921 / 2.142072 (-0.491151) |
| shuffled read_formatted numpy 5000 | 5.300950 / 4.805227 (0.495722) |
| shuffled read_formatted_batch numpy 5000 10 | 4.590880 / 6.500664 (-1.909784) |
| shuffled read_formatted_batch numpy 5000 1000 | 9.549721 / 0.075469 (9.474252) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 11.149815 / 1.841788 (9.308027) |
| map fast-tokenizer batched | 12.233428 / 8.074308 (4.159120) |
| map identity | 22.421912 / 10.191392 (12.230520) |
| map identity batched | 0.726083 / 0.680424 (0.045659) |
| map no-op batched | 0.446665 / 0.534201 (-0.087536) |
| map no-op batched numpy | 0.545466 / 0.579283 (-0.033818) |
| map no-op batched pandas | 0.468461 / 0.434364 (0.034097) |
| map no-op batched pytorch | 0.552571 / 0.540337 (0.012233) |
| map no-op batched tensorflow | 1.146446 / 1.386936 (-0.240490) |
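
Every result above is written as `new / old (diff)`. The parenthesised value appears to be `new - old`, presumably computed before the figures were rounded to six decimals, so it can differ from the printed operands by about one unit in the last digit. A minimal Python sketch, assuming that reading of the format, parses such entries and checks the arithmetic; the helper names `parse_cells` and `check_diffs` are hypothetical and not part of the benchmark tooling:

```python
import re

# One "new / old (diff)" entry, e.g. "0.014145 / 0.018006 (-0.003861)".
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cells(text):
    """Return every (new, old, diff) triple found in the text as floats."""
    return [tuple(float(x) for x in m.groups()) for m in CELL.finditer(text)]


def check_diffs(text, tol=2e-6):
    """True if each reported diff equals new - old within rounding tolerance.

    The tolerance is an assumption: three values rounded to six decimals
    can disagree by up to roughly 1.5e-6.
    """
    return all(abs((new - old) - diff) <= tol
               for new, old, diff in parse_cells(text))


# benchmark_getitem_100B.json values reported under PyArrow==1.0.0.
row = ("0.014145 / 0.018006 (-0.003861) 0.523001 / 0.000490 (0.522512) "
       "0.002874 / 0.000200 (0.002674) 0.000201 / 0.000054 (0.000146)")

for new, old, diff in parse_cells(row):
    print(f"new={new:.6f} old={old:.6f} reported diff={diff:.6f}")
print("diffs consistent:", check_diffs(row))
```

A positive diff means the new timing is larger than the old one. Because the regular expression scans free text, the same helpers work whether the entries sit on one line or one per table row.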
