Release: 1.0.1
lhoestq committed Sep 11, 2020
1 parent 537de11 commit 7c9d2b5
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -24,7 +24,7 @@
 # The short X.Y version
 version = u''
 # The full version, including alpha/beta/rc tags
-release = '1.0.0'
+release = '1.0.1'


 # -- General configuration ---------------------------------------------------
2 changes: 1 addition & 1 deletion setup.py
@@ -128,7 +128,7 @@

 setup(
     name='datasets',
-    version="1.0.0",
+    version="1.0.1",
     description=DOCLINES[0],
     long_description='\n'.join(DOCLINES[2:]),
     author='HuggingFace Inc.',
2 changes: 1 addition & 1 deletion src/datasets/__init__.py
@@ -18,7 +18,7 @@
 # pylint: enable=line-too-long
 # pylint: disable=g-import-not-at-top,g-bad-import-order,wrong-import-position

-__version__ = "1.0.0"
+__version__ = "1.0.1"

 import pyarrow
 from pyarrow import total_allocated_bytes
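
The three hunks above bump the same version string in three places. A minimal sketch of how one might check that such copies stay in sync is given below; the `extract_version` helper and the `snippets` mapping are hypothetical illustrations and are not part of the repository, and the lines are copied from the post-commit side of the diff rather than read from disk.

```python
import re

# Hypothetical pattern: matches `release = '...'`, `version="..."` or
# `__version__ = "..."` followed by an X.Y.Z version string.
VERSION_RE = re.compile(
    r"""(?:release|version|__version__)\s*=\s*['"](\d+\.\d+\.\d+)['"]"""
)


def extract_version(text):
    """Return the first X.Y.Z version assigned in `text`, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


# Post-commit lines copied from the three changed files.
snippets = {
    "docs/source/conf.py": "release = '1.0.1'",
    "setup.py": 'version="1.0.1",',
    "src/datasets/__init__.py": '__version__ = "1.0.1"',
}

versions = {path: extract_version(line) for path, line in snippets.items()}

# All three files should report the same version after the bump.
assert len(set(versions.values())) == 1, f"version mismatch: {versions}"
```

In a real checkout the same check could read each file's contents instead of the hard-coded lines, which is left out here to keep the sketch self-contained.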

2 comments on commit 7c9d2b5

@github-actions

Show benchmarks

PyArrow==0.17.1

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.017524 / 0.011353 (0.006171) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.014906 / 0.011008 (0.003897) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.048144 / 0.038508 (0.009636) |
| read_batch_unformated after write_array2d | 0.033248 / 0.023109 (0.010139) |
| read_batch_unformated after write_flattened_sequence | 0.207761 / 0.275898 (-0.068137) |
| read_batch_unformated after write_nested_sequence | 0.233792 / 0.323480 (-0.089688) |
| read_col_formatted_as_numpy after write_array2d | 0.010391 / 0.007986 (0.002405) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004417 / 0.004328 (0.000089) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.007209 / 0.004250 (0.002958) |
| read_col_unformated after write_array2d | 0.048454 / 0.037052 (0.011401) |
| read_col_unformated after write_flattened_sequence | 0.206914 / 0.258489 (-0.051575) |
| read_col_unformated after write_nested_sequence | 0.235856 / 0.293841 (-0.057985) |
| read_formatted_as_numpy after write_array2d | 0.150854 / 0.128546 (0.022307) |
| read_formatted_as_numpy after write_flattened_sequence | 0.109273 / 0.075646 (0.033627) |
| read_formatted_as_numpy after write_nested_sequence | 0.474241 / 0.419271 (0.054969) |
| read_unformated after write_array2d | 0.494148 / 0.043533 (0.450615) |
| read_unformated after write_flattened_sequence | 0.210729 / 0.255139 (-0.044410) |
| read_unformated after write_nested_sequence | 0.232321 / 0.283200 (-0.050878) |
| write_array2d | 0.087238 / 0.141683 (-0.054444) |
| write_flattened_sequence | 1.798743 / 1.452155 (0.346588) |
| write_nested_sequence | 1.857549 / 1.492716 (0.364832) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.040743 / 0.037411 (0.003332) |
| shard | 0.020057 / 0.014526 (0.005531) |
| shuffle | 0.080024 / 0.176557 (-0.096533) |
| sort | 0.091787 / 0.737135 (-0.645348) |
| train_test_split | 0.028848 / 0.296338 (-0.267491) |
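
Each benchmark cell has the form `new / old (diff)`, and the numbers are consistent with `diff = new - old`, so a negative diff means the new measurement is smaller than the old one. The sketch below parses a few cells copied from the `benchmark_indices_mapping.json` row above and checks that relationship. The `parse_cells` helper and its regular expression are hypothetical and are not part of the benchmark tooling; the tolerance is an assumption made because the reported values appear to be rounded to six decimals (some cells elsewhere in the comments differ from the plain subtraction by 0.000001).

```python
import re

# Matches one "new / old (diff)" cell, e.g. "0.040743 / 0.037411 (0.003332)".
CELL_RE = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cells(row):
    """Return a list of (new, old, diff) float triples, one per cell in `row`."""
    return [tuple(float(x) for x in m.groups()) for m in CELL_RE.finditer(row)]


# First three cells (select, shard, shuffle) of the row above.
row = (
    "0.040743 / 0.037411 (0.003332) "
    "0.020057 / 0.014526 (0.005531) "
    "0.080024 / 0.176557 (-0.096533)"
)

cells = parse_cells(row)
for new, old, diff in cells:
    # Allow for rounding of the printed values to six decimals.
    assert abs((new - old) - diff) < 2e-6
```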

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.197777 / 0.215209 (-0.017432) |
| read 50000 | 1.960749 / 2.077655 (-0.116906) |
| read_batch 50000 10 | 1.209420 / 1.504120 (-0.294700) |
| read_batch 50000 100 | 1.145505 / 1.541195 (-0.395690) |
| read_batch 50000 1000 | 1.206787 / 1.468490 (-0.261703) |
| read_formatted numpy 5000 | 5.678803 / 4.584777 (1.094026) |
| read_formatted pandas 5000 | 4.770096 / 3.745712 (1.024384) |
| read_formatted tensorflow 5000 | 7.099757 / 5.269862 (1.829896) |
| read_formatted torch 5000 | 5.943824 / 4.565676 (1.378147) |
| read_formatted_batch numpy 5000 10 | 0.579212 / 0.424275 (0.154937) |
| read_formatted_batch numpy 5000 1000 | 0.011018 / 0.007607 (0.003411) |
| shuffled read 5000 | 0.228887 / 0.226044 (0.002842) |
| shuffled read 50000 | 2.310537 / 2.268929 (0.041608) |
| shuffled read_batch 50000 10 | 1.701750 / 55.444624 (-53.742874) |
| shuffled read_batch 50000 100 | 1.579074 / 6.876477 (-5.297403) |
| shuffled read_batch 50000 1000 | 1.604054 / 2.142072 (-0.538018) |
| shuffled read_formatted numpy 5000 | 6.259134 / 4.805227 (1.453907) |
| shuffled read_formatted_batch numpy 5000 10 | 5.531951 / 6.500664 (-0.968713) |
| shuffled read_formatted_batch numpy 5000 1000 | 9.077466 / 0.075469 (9.001997) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 87.894399 / 1.841788 (86.052611) |
| map fast-tokenizer batched | 14.583336 / 8.074308 (6.509028) |
| map identity | 12.908224 / 10.191392 (2.716831) |
| map identity batched | 0.769353 / 0.680424 (0.088930) |
| map no-op batched | 0.284844 / 0.534201 (-0.249357) |
| map no-op batched numpy | 0.755784 / 0.579283 (0.176501) |
| map no-op batched pandas | 0.530526 / 0.434364 (0.096162) |
| map no-op batched pytorch | 0.724259 / 0.540337 (0.183922) |
| map no-op batched tensorflow | 1.552518 / 1.386936 (0.165582) |

PyArrow==1.0
Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.017417 / 0.011353 (0.006064) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.014671 / 0.011008 (0.003663) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.051827 / 0.038508 (0.013319) |
| read_batch_unformated after write_array2d | 0.034549 / 0.023109 (0.011440) |
| read_batch_unformated after write_flattened_sequence | 0.349270 / 0.275898 (0.073372) |
| read_batch_unformated after write_nested_sequence | 0.399157 / 0.323480 (0.075677) |
| read_col_formatted_as_numpy after write_array2d | 0.009349 / 0.007986 (0.001363) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.006687 / 0.004328 (0.002359) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.009243 / 0.004250 (0.004992) |
| read_col_unformated after write_array2d | 0.047348 / 0.037052 (0.010296) |
| read_col_unformated after write_flattened_sequence | 0.353922 / 0.258489 (0.095433) |
| read_col_unformated after write_nested_sequence | 0.395749 / 0.293841 (0.101908) |
| read_formatted_as_numpy after write_array2d | 0.140002 / 0.128546 (0.011456) |
| read_formatted_as_numpy after write_flattened_sequence | 0.108984 / 0.075646 (0.033338) |
| read_formatted_as_numpy after write_nested_sequence | 0.470931 / 0.419271 (0.051660) |
| read_unformated after write_array2d | 0.409738 / 0.043533 (0.366205) |
| read_unformated after write_flattened_sequence | 0.341225 / 0.255139 (0.086086) |
| read_unformated after write_nested_sequence | 0.364671 / 0.283200 (0.081472) |
| write_array2d | 0.098224 / 0.141683 (-0.043459) |
| write_flattened_sequence | 1.778603 / 1.452155 (0.326448) |
| write_nested_sequence | 1.833615 / 1.492716 (0.340898) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.045053 / 0.037411 (0.007642) |
| shard | 0.021157 / 0.014526 (0.006631) |
| shuffle | 0.042997 / 0.176557 (-0.133559) |
| sort | 0.089937 / 0.737135 (-0.647198) |
| train_test_split | 0.050528 / 0.296338 (-0.245811) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.262003 / 0.215209 (0.046794) |
| read 50000 | 2.529962 / 2.077655 (0.452307) |
| read_batch 50000 10 | 1.910566 / 1.504120 (0.406446) |
| read_batch 50000 100 | 1.845317 / 1.541195 (0.304122) |
| read_batch 50000 1000 | 1.915949 / 1.468490 (0.447459) |
| read_formatted numpy 5000 | 5.893121 / 4.584777 (1.308344) |
| read_formatted pandas 5000 | 4.869569 / 3.745712 (1.123857) |
| read_formatted tensorflow 5000 | 6.956391 / 5.269862 (1.686530) |
| read_formatted torch 5000 | 5.972397 / 4.565676 (1.406721) |
| read_formatted_batch numpy 5000 10 | 0.576637 / 0.424275 (0.152362) |
| read_formatted_batch numpy 5000 1000 | 0.011047 / 0.007607 (0.003439) |
| shuffled read 5000 | 0.282898 / 0.226044 (0.056853) |
| shuffled read 50000 | 2.912559 / 2.268929 (0.643631) |
| shuffled read_batch 50000 10 | 22.452970 / 55.444624 (-32.991654) |
| shuffled read_batch 50000 100 | 3.978750 / 6.876477 (-2.897727) |
| shuffled read_batch 50000 1000 | 2.145153 / 2.142072 (0.003081) |
| shuffled read_formatted numpy 5000 | 6.256698 / 4.805227 (1.451471) |
| shuffled read_formatted_batch numpy 5000 10 | 2.877672 / 6.500664 (-3.622992) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.038134 / 0.075469 (-0.037335) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 88.727407 / 1.841788 (86.885620) |
| map fast-tokenizer batched | 15.248330 / 8.074308 (7.174022) |
| map identity | 13.006195 / 10.191392 (2.814803) |
| map identity batched | 0.886603 / 0.680424 (0.206180) |
| map no-op batched | 0.626198 / 0.534201 (0.091998) |
| map no-op batched numpy | 0.769838 / 0.579283 (0.190555) |
| map no-op batched pandas | 0.535416 / 0.434364 (0.101052) |
| map no-op batched pytorch | 0.721490 / 0.540337 (0.181153) |
| map no-op batched tensorflow | 1.621136 / 1.386936 (0.234200) |

@github-actions

Show benchmarks

PyArrow==0.17.1

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.020320 / 0.011353 (0.008967) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.016385 / 0.011008 (0.005377) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.046928 / 0.038508 (0.008420) |
| read_batch_unformated after write_array2d | 0.035288 / 0.023109 (0.012179) |
| read_batch_unformated after write_flattened_sequence | 0.210816 / 0.275898 (-0.065082) |
| read_batch_unformated after write_nested_sequence | 0.231834 / 0.323480 (-0.091646) |
| read_col_formatted_as_numpy after write_array2d | 0.012314 / 0.007986 (0.004328) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005391 / 0.004328 (0.001063) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006772 / 0.004250 (0.002522) |
| read_col_unformated after write_array2d | 0.043256 / 0.037052 (0.006204) |
| read_col_unformated after write_flattened_sequence | 0.214563 / 0.258489 (-0.043926) |
| read_col_unformated after write_nested_sequence | 0.229413 / 0.293841 (-0.064427) |
| read_formatted_as_numpy after write_array2d | 0.166489 / 0.128546 (0.037943) |
| read_formatted_as_numpy after write_flattened_sequence | 0.129593 / 0.075646 (0.053947) |
| read_formatted_as_numpy after write_nested_sequence | 0.452626 / 0.419271 (0.033355) |
| read_unformated after write_array2d | 0.540712 / 0.043533 (0.497179) |
| read_unformated after write_flattened_sequence | 0.212177 / 0.255139 (-0.042962) |
| read_unformated after write_nested_sequence | 0.223544 / 0.283200 (-0.059655) |
| write_array2d | 0.089595 / 0.141683 (-0.052088) |
| write_flattened_sequence | 1.832778 / 1.452155 (0.380623) |
| write_nested_sequence | 1.865998 / 1.492716 (0.373282) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.039508 / 0.037411 (0.002097) |
| shard | 0.019868 / 0.014526 (0.005342) |
| shuffle | 0.023267 / 0.176557 (-0.153289) |
| sort | 0.090640 / 0.737135 (-0.646495) |
| train_test_split | 0.029645 / 0.296338 (-0.266693) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.217846 / 0.215209 (0.002637) |
| read 50000 | 2.163425 / 2.077655 (0.085770) |
| read_batch 50000 10 | 1.241875 / 1.504120 (-0.262245) |
| read_batch 50000 100 | 1.156025 / 1.541195 (-0.385170) |
| read_batch 50000 1000 | 1.233065 / 1.468490 (-0.235425) |
| read_formatted numpy 5000 | 6.801158 / 4.584777 (2.216381) |
| read_formatted pandas 5000 | 5.651866 / 3.745712 (1.906154) |
| read_formatted tensorflow 5000 | 8.430519 / 5.269862 (3.160657) |
| read_formatted torch 5000 | 7.131857 / 4.565676 (2.566180) |
| read_formatted_batch numpy 5000 10 | 0.709563 / 0.424275 (0.285287) |
| read_formatted_batch numpy 5000 1000 | 0.011551 / 0.007607 (0.003944) |
| shuffled read 5000 | 0.228385 / 0.226044 (0.002341) |
| shuffled read 50000 | 2.461282 / 2.268929 (0.192353) |
| shuffled read_batch 50000 10 | 1.745377 / 55.444624 (-53.699247) |
| shuffled read_batch 50000 100 | 1.608081 / 6.876477 (-5.268395) |
| shuffled read_batch 50000 1000 | 1.664353 / 2.142072 (-0.477719) |
| shuffled read_formatted numpy 5000 | 7.388348 / 4.805227 (2.583121) |
| shuffled read_formatted_batch numpy 5000 10 | 6.159960 / 6.500664 (-0.340704) |
| shuffled read_formatted_batch numpy 5000 1000 | 7.858515 / 0.075469 (7.783046) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 124.938721 / 1.841788 (123.096933) |
| map fast-tokenizer batched | 14.097457 / 8.074308 (6.023149) |
| map identity | 15.039903 / 10.191392 (4.848511) |
| map identity batched | 0.518424 / 0.680424 (-0.162000) |
| map no-op batched | 0.300904 / 0.534201 (-0.233297) |
| map no-op batched numpy | 0.834503 / 0.579283 (0.255220) |
| map no-op batched pandas | 0.657290 / 0.434364 (0.222926) |
| map no-op batched pytorch | 0.832955 / 0.540337 (0.292617) |
| map no-op batched tensorflow | 1.692240 / 1.386936 (0.305304) |

PyArrow==1.0
Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.021807 / 0.011353 (0.010455) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.017734 / 0.011008 (0.006726) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.045506 / 0.038508 (0.006998) |
| read_batch_unformated after write_array2d | 0.035886 / 0.023109 (0.012776) |
| read_batch_unformated after write_flattened_sequence | 0.340726 / 0.275898 (0.064827) |
| read_batch_unformated after write_nested_sequence | 0.370450 / 0.323480 (0.046970) |
| read_col_formatted_as_numpy after write_array2d | 0.011563 / 0.007986 (0.003577) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.006359 / 0.004328 (0.002031) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.008074 / 0.004250 (0.003823) |
| read_col_unformated after write_array2d | 0.051232 / 0.037052 (0.014179) |
| read_col_unformated after write_flattened_sequence | 0.327694 / 0.258489 (0.069205) |
| read_col_unformated after write_nested_sequence | 0.363641 / 0.293841 (0.069800) |
| read_formatted_as_numpy after write_array2d | 0.176542 / 0.128546 (0.047995) |
| read_formatted_as_numpy after write_flattened_sequence | 0.132411 / 0.075646 (0.056765) |
| read_formatted_as_numpy after write_nested_sequence | 0.461104 / 0.419271 (0.041833) |
| read_unformated after write_array2d | 0.424982 / 0.043533 (0.381449) |
| read_unformated after write_flattened_sequence | 0.338866 / 0.255139 (0.083727) |
| read_unformated after write_nested_sequence | 0.339020 / 0.283200 (0.055820) |
| write_array2d | 0.092057 / 0.141683 (-0.049626) |
| write_flattened_sequence | 1.829759 / 1.452155 (0.377604) |
| write_nested_sequence | 1.937484 / 1.492716 (0.444767) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.041102 / 0.037411 (0.003691) |
| shard | 0.021929 / 0.014526 (0.007403) |
| shuffle | 0.028188 / 0.176557 (-0.148368) |
| sort | 0.094774 / 0.737135 (-0.642361) |
| train_test_split | 0.030485 / 0.296338 (-0.265853) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.259924 / 0.215209 (0.044715) |
| read 50000 | 2.753805 / 2.077655 (0.676151) |
| read_batch 50000 10 | 1.949279 / 1.504120 (0.445159) |
| read_batch 50000 100 | 1.751607 / 1.541195 (0.210412) |
| read_batch 50000 1000 | 1.834902 / 1.468490 (0.366412) |
| read_formatted numpy 5000 | 6.916289 / 4.584777 (2.331512) |
| read_formatted pandas 5000 | 5.722064 / 3.745712 (1.976352) |
| read_formatted tensorflow 5000 | 8.139766 / 5.269862 (2.869905) |
| read_formatted torch 5000 | 7.020143 / 4.565676 (2.454467) |
| read_formatted_batch numpy 5000 10 | 0.747854 / 0.424275 (0.323578) |
| read_formatted_batch numpy 5000 1000 | 0.011374 / 0.007607 (0.003767) |
| shuffled read 5000 | 0.306029 / 0.226044 (0.079985) |
| shuffled read 50000 | 3.091757 / 2.268929 (0.822828) |
| shuffled read_batch 50000 10 | 16.310392 / 55.444624 (-39.134233) |
| shuffled read_batch 50000 100 | 3.393416 / 6.876477 (-3.483060) |
| shuffled read_batch 50000 1000 | 2.128837 / 2.142072 (-0.013235) |
| shuffled read_formatted numpy 5000 | 7.265690 / 4.805227 (2.460463) |
| shuffled read_formatted_batch numpy 5000 10 | 2.221492 / 6.500664 (-4.279172) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.030780 / 0.075469 (-0.044689) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 120.788024 / 1.841788 (118.946236) |
| map fast-tokenizer batched | 13.892661 / 8.074308 (5.818353) |
| map identity | 16.158838 / 10.191392 (5.967446) |
| map identity batched | 0.803574 / 0.680424 (0.123150) |
| map no-op batched | 0.639952 / 0.534201 (0.105751) |
| map no-op batched numpy | 0.880895 / 0.579283 (0.301612) |
| map no-op batched pandas | 0.654536 / 0.434364 (0.220172) |
| map no-op batched pytorch | 0.866551 / 0.540337 (0.326214) |
| map no-op batched tensorflow | 1.656871 / 1.386936 (0.269935) |
