Commit

fix flaky test
lhoestq committed Dec 6, 2021
1 parent 73ed661 commit 811469a
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions tests/test_arrow_dataset.py
@@ -2818,11 +2818,12 @@ def test_dummy_dataset_serialize_s3(s3, dataset):
 def test_build_local_temp_path(uri_or_path):
     extracted_path = extract_path_from_uri(uri_or_path)
     local_temp_path = Dataset._build_local_temp_path(extracted_path)
+    path_relative_to_tmp_dir = local_temp_path.as_posix().split("tmp", 1)[1].split("/", 1)[1]
 
     assert (
         "tmp" in local_temp_path.as_posix()
-        and "hdfs" not in local_temp_path.as_posix()
-        and "s3" not in local_temp_path.as_posix()
+        and "hdfs" not in path_relative_to_tmp_dir
+        and "s3" not in path_relative_to_tmp_dir
         and not local_temp_path.as_posix().startswith(extracted_path)
         and local_temp_path.as_posix().endswith(extracted_path)
     ), f"Local temp path: {local_temp_path.as_posix()}"
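The commit message does not say why the test was flaky. One plausible reading, assumed here and not confirmed by the commit, is that the temporary directory has a randomly generated name, so the full path can contain "s3" or "hdfs" by chance. The sketch below isolates the split expression from the diff in a helper (the helper name and the example paths are hypothetical) and shows how it behaves on such a path.

```python
from pathlib import PurePosixPath


def path_relative_to_tmp_dir(local_temp_path: PurePosixPath) -> str:
    """Apply the same split expression as the changed test.

    Split on the first "tmp", then drop the rest of that path component
    (everything up to the next "/").
    """
    return local_temp_path.as_posix().split("tmp", 1)[1].split("/", 1)[1]


# Hypothetical path: a parent directory happens to contain "s3",
# and the temp directory itself is the first component containing "tmp".
unlucky = PurePosixPath("/var/folders/ab/s3xyz/T/tmpk9ab_x/my/dataset")

# The old check looked at the whole path, so it would trip on this one.
assert "s3" in unlucky.as_posix()

# The new check only looks at what follows the temp directory.
assert path_relative_to_tmp_dir(unlucky) == "my/dataset"
assert "s3" not in path_relative_to_tmp_dir(unlucky)

# Caveat: if an earlier component already contains "tmp" (e.g. "/tmp/..."),
# the split lands there and the random directory name is kept.
nested = PurePosixPath("/tmp/tmpk9s3ab_x/my/dataset")
assert path_relative_to_tmp_dir(nested) == "tmpk9s3ab_x/my/dataset"
```

Because the expression splits on the first occurrence of "tmp", how much of the random prefix it removes depends on where the platform places its temporary directory.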

1 comment on commit 811469a

@github-actions

PyArrow==3.0.0

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.062912 / 0.011353 (0.051559) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.003613 / 0.011008 (-0.007395) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.030337 / 0.038508 (-0.008171) |
| read_batch_unformated after write_array2d | 0.031547 / 0.023109 (0.008437) |
| read_batch_unformated after write_flattened_sequence | 0.306951 / 0.275898 (0.031053) |
| read_batch_unformated after write_nested_sequence | 0.338944 / 0.323480 (0.015464) |
| read_col_formatted_as_numpy after write_array2d | 0.068836 / 0.007986 (0.060850) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003719 / 0.004328 (-0.000609) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.009181 / 0.004250 (0.004931) |
| read_col_unformated after write_array2d | 0.034440 / 0.037052 (-0.002612) |
| read_col_unformated after write_flattened_sequence | 0.315535 / 0.258489 (0.057046) |
| read_col_unformated after write_nested_sequence | 0.341270 / 0.293841 (0.047429) |
| read_formatted_as_numpy after write_array2d | 0.076013 / 0.128546 (-0.052533) |
| read_formatted_as_numpy after write_flattened_sequence | 0.008571 / 0.075646 (-0.067075) |
| read_formatted_as_numpy after write_nested_sequence | 0.242025 / 0.419271 (-0.177247) |
| read_unformated after write_array2d | 0.041013 / 0.043533 (-0.002520) |
| read_unformated after write_flattened_sequence | 0.297285 / 0.255139 (0.042146) |
| read_unformated after write_nested_sequence | 0.344340 / 0.283200 (0.061140) |
| write_array2d | 0.068695 / 0.141683 (-0.072988) |
| write_flattened_sequence | 1.709301 / 1.452155 (0.257147) |
| write_nested_sequence | 1.779958 / 1.492716 (0.287242) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.229565 / 0.018006 (0.211559) |
| get_batch_of_1024_rows | 0.389985 / 0.000490 (0.389495) |
| get_first_row | 0.002867 / 0.000200 (0.002667) |
| get_last_row | 0.000074 / 0.000054 (0.000020) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.032867 / 0.037411 (-0.004545) |
| shard | 0.021656 / 0.014526 (0.007130) |
| shuffle | 0.026246 / 0.176557 (-0.150310) |
| sort | 0.191604 / 0.737135 (-0.545532) |
| train_test_split | 0.026607 / 0.296338 (-0.269731) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.442135 / 0.215209 (0.226925) |
| read 50000 | 4.433167 / 2.077655 (2.355512) |
| read_batch 50000 10 | 1.898676 / 1.504120 (0.394556) |
| read_batch 50000 100 | 1.679012 / 1.541195 (0.137817) |
| read_batch 50000 1000 | 1.717934 / 1.468490 (0.249443) |
| read_formatted numpy 5000 | 0.436522 / 4.584777 (-4.148255) |
| read_formatted pandas 5000 | 4.299276 / 3.745712 (0.553564) |
| read_formatted tensorflow 5000 | 1.877680 / 5.269862 (-3.392181) |
| read_formatted torch 5000 | 0.806399 / 4.565676 (-3.759278) |
| read_formatted_batch numpy 5000 10 | 0.052095 / 0.424275 (-0.372180) |
| read_formatted_batch numpy 5000 1000 | 0.010620 / 0.007607 (0.003013) |
| shuffled read 5000 | 0.552816 / 0.226044 (0.326772) |
| shuffled read 50000 | 5.535037 / 2.268929 (3.266109) |
| shuffled read_batch 50000 10 | 2.352794 / 55.444624 (-53.091831) |
| shuffled read_batch 50000 100 | 1.975657 / 6.876477 (-4.900819) |
| shuffled read_batch 50000 1000 | 2.024485 / 2.142072 (-0.117588) |
| shuffled read_formatted numpy 5000 | 0.554379 / 4.805227 (-4.250848) |
| shuffled read_formatted_batch numpy 5000 10 | 0.113670 / 6.500664 (-6.386994) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.058400 / 0.075469 (-0.017069) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.601399 / 1.841788 (-0.240389) |
| map fast-tokenizer batched | 11.771175 / 8.074308 (3.696867) |
| map identity | 28.680543 / 10.191392 (18.489151) |
| map identity batched | 0.854885 / 0.680424 (0.174461) |
| map no-op batched | 0.550057 / 0.534201 (0.015856) |
| map no-op batched numpy | 0.357867 / 0.579283 (-0.221416) |
| map no-op batched pandas | 0.480666 / 0.434364 (0.046303) |
| map no-op batched pytorch | 0.236383 / 0.540337 (-0.303955) |
| map no-op batched tensorflow | 0.251858 / 1.386936 (-1.135078) |

PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.060812 / 0.011353 (0.049459) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.003401 / 0.011008 (-0.007607) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.027422 / 0.038508 (-0.011086) |
| read_batch_unformated after write_array2d | 0.028817 / 0.023109 (0.005708) |
| read_batch_unformated after write_flattened_sequence | 0.293475 / 0.275898 (0.017577) |
| read_batch_unformated after write_nested_sequence | 0.331411 / 0.323480 (0.007932) |
| read_col_formatted_as_numpy after write_array2d | 0.072004 / 0.007986 (0.064019) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003607 / 0.004328 (-0.000722) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006482 / 0.004250 (0.002231) |
| read_col_unformated after write_array2d | 0.035495 / 0.037052 (-0.001557) |
| read_col_unformated after write_flattened_sequence | 0.296089 / 0.258489 (0.037600) |
| read_col_unformated after write_nested_sequence | 0.343960 / 0.293841 (0.050119) |
| read_formatted_as_numpy after write_array2d | 0.074767 / 0.128546 (-0.053779) |
| read_formatted_as_numpy after write_flattened_sequence | 0.008697 / 0.075646 (-0.066949) |
| read_formatted_as_numpy after write_nested_sequence | 0.237454 / 0.419271 (-0.181817) |
| read_unformated after write_array2d | 0.039532 / 0.043533 (-0.004001) |
| read_unformated after write_flattened_sequence | 0.299725 / 0.255139 (0.044586) |
| read_unformated after write_nested_sequence | 0.325169 / 0.283200 (0.041970) |
| write_array2d | 0.066757 / 0.141683 (-0.074926) |
| write_flattened_sequence | 1.761910 / 1.452155 (0.309756) |
| write_nested_sequence | 1.766555 / 1.492716 (0.273839) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.293659 / 0.018006 (0.275653) |
| get_batch_of_1024_rows | 0.384773 / 0.000490 (0.384283) |
| get_first_row | 0.033439 / 0.000200 (0.033239) |
| get_last_row | 0.000439 / 0.000054 (0.000385) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.029790 / 0.037411 (-0.007621) |
| shard | 0.019930 / 0.014526 (0.005404) |
| shuffle | 0.024418 / 0.176557 (-0.152139) |
| sort | 0.188149 / 0.737135 (-0.548986) |
| train_test_split | 0.026165 / 0.296338 (-0.270173) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.440486 / 0.215209 (0.225276) |
| read 50000 | 4.378379 / 2.077655 (2.300725) |
| read_batch 50000 10 | 1.866348 / 1.504120 (0.362228) |
| read_batch 50000 100 | 1.648643 / 1.541195 (0.107449) |
| read_batch 50000 1000 | 1.709261 / 1.468490 (0.240770) |
| read_formatted numpy 5000 | 0.438776 / 4.584777 (-4.146001) |
| read_formatted pandas 5000 | 4.191390 / 3.745712 (0.445677) |
| read_formatted tensorflow 5000 | 3.614909 / 5.269862 (-1.654953) |
| read_formatted torch 5000 | 0.801913 / 4.565676 (-3.763764) |
| read_formatted_batch numpy 5000 10 | 0.052584 / 0.424275 (-0.371691) |
| read_formatted_batch numpy 5000 1000 | 0.011082 / 0.007607 (0.003475) |
| shuffled read 5000 | 0.550697 / 0.226044 (0.324652) |
| shuffled read 50000 | 5.526946 / 2.268929 (3.258018) |
| shuffled read_batch 50000 10 | 2.314940 / 55.444624 (-53.129685) |
| shuffled read_batch 50000 100 | 1.942731 / 6.876477 (-4.933745) |
| shuffled read_batch 50000 1000 | 1.993812 / 2.142072 (-0.148260) |
| shuffled read_formatted numpy 5000 | 0.551346 / 4.805227 (-4.253881) |
| shuffled read_formatted_batch numpy 5000 10 | 0.114751 / 6.500664 (-6.385913) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.058288 / 0.075469 (-0.017181) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.615081 / 1.841788 (-0.226707) |
| map fast-tokenizer batched | 11.505331 / 8.074308 (3.431023) |
| map identity | 29.232889 / 10.191392 (19.041497) |
| map identity batched | 0.762121 / 0.680424 (0.081697) |
| map no-op batched | 0.565329 / 0.534201 (0.031128) |
| map no-op batched numpy | 0.341697 / 0.579283 (-0.237586) |
| map no-op batched pandas | 0.469790 / 0.434364 (0.035426) |
| map no-op batched pytorch | 0.223205 / 0.540337 (-0.317133) |
| map no-op batched tensorflow | 0.231180 / 1.386936 (-1.155757) |
