fix flaky test (#3388)
lhoestq committed Dec 6, 2021
1 parent 127746c commit 7044abc
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions tests/test_arrow_dataset.py
@@ -2818,11 +2818,12 @@ def test_dummy_dataset_serialize_s3(s3, dataset):
 def test_build_local_temp_path(uri_or_path):
     extracted_path = extract_path_from_uri(uri_or_path)
     local_temp_path = Dataset._build_local_temp_path(extracted_path)
+    path_relative_to_tmp_dir = local_temp_path.as_posix().split("tmp", 1)[1].split("/", 1)[1]

     assert (
         "tmp" in local_temp_path.as_posix()
-        and "hdfs" not in local_temp_path.as_posix()
-        and "s3" not in local_temp_path.as_posix()
+        and "hdfs" not in path_relative_to_tmp_dir
+        and "s3" not in path_relative_to_tmp_dir
         and not local_temp_path.as_posix().startswith(extracted_path)
         and local_temp_path.as_posix().endswith(extracted_path)
     ), f"Local temp path: {local_temp_path.as_posix()}"
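
A minimal sketch of why the assertion changed: the temporary directory name is random, so substrings such as "s3" or "hdfs" can presumably appear in it by chance and trip a check made against the full path. The snippet applies the same `split` expression to a hypothetical path. The path and the helper name `relative_to_tmp_dir` are illustrative, not from the repository, and the example assumes the first "tmp" in the path is the prefix of the random directory name.

```python
from pathlib import PurePosixPath


def relative_to_tmp_dir(local_temp_path: PurePosixPath) -> str:
    """Apply the expression from the test: cut at the first "tmp", then at the next "/"."""
    return local_temp_path.as_posix().split("tmp", 1)[1].split("/", 1)[1]


# Hypothetical temp path whose random directory name happens to contain "s3".
sample = PurePosixPath("/var/folders/T/tmpk_s3x9/relative/path")
relative = relative_to_tmp_dir(sample)

# The old check looked at the whole path and would fail here.
old_check_passes = "s3" not in sample.as_posix()
# The new check looks only at the part after the temp directory.
new_check_passes = "s3" not in relative

print(relative, old_check_passes, new_check_passes)
```

Under that assumption, only the part of the path that comes from `extracted_path` is searched for scheme names.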

1 comment on commit 7044abc

@github-actions

PyArrow==3.0.0

Benchmark: benchmark_array_xd.json

metric | new / old (diff)
read_batch_formatted_as_numpy after write_array2d | 0.070699 / 0.011353 (0.059346)
read_batch_formatted_as_numpy after write_flattened_sequence | 0.004193 / 0.011008 (-0.006815)
read_batch_formatted_as_numpy after write_nested_sequence | 0.031715 / 0.038508 (-0.006793)
read_batch_unformated after write_array2d | 0.035752 / 0.023109 (0.012643)
read_batch_unformated after write_flattened_sequence | 0.294863 / 0.275898 (0.018965)
read_batch_unformated after write_nested_sequence | 0.337742 / 0.323480 (0.014262)
read_col_formatted_as_numpy after write_array2d | 0.084470 / 0.007986 (0.076485)
read_col_formatted_as_numpy after write_flattened_sequence | 0.004458 / 0.004328 (0.000130)
read_col_formatted_as_numpy after write_nested_sequence | 0.009437 / 0.004250 (0.005187)
read_col_unformated after write_array2d | 0.042981 / 0.037052 (0.005929)
read_col_unformated after write_flattened_sequence | 0.312178 / 0.258489 (0.053689)
read_col_unformated after write_nested_sequence | 0.327885 / 0.293841 (0.034044)
read_formatted_as_numpy after write_array2d | 0.085835 / 0.128546 (-0.042711)
read_formatted_as_numpy after write_flattened_sequence | 0.008983 / 0.075646 (-0.066664)
read_formatted_as_numpy after write_nested_sequence | 0.254496 / 0.419271 (-0.164775)
read_unformated after write_array2d | 0.047083 / 0.043533 (0.003551)
read_unformated after write_flattened_sequence | 0.299892 / 0.255139 (0.044753)
read_unformated after write_nested_sequence | 0.325850 / 0.283200 (0.042650)
write_array2d | 0.089218 / 0.141683 (-0.052464)
write_flattened_sequence | 1.761106 / 1.452155 (0.308952)
write_nested_sequence | 1.805954 / 1.492716 (0.313238)

Benchmark: benchmark_getitem_100B.json

metric | new / old (diff)
get_batch_of_1024_random_rows | 0.304031 / 0.018006 (0.286025)
get_batch_of_1024_rows | 0.537692 / 0.000490 (0.537202)
get_first_row | 0.005011 / 0.000200 (0.004812)
get_last_row | 0.000105 / 0.000054 (0.000050)

Benchmark: benchmark_indices_mapping.json

metric | new / old (diff)
select | 0.035792 / 0.037411 (-0.001619)
shard | 0.022816 / 0.014526 (0.008290)
shuffle | 0.026836 / 0.176557 (-0.149720)
sort | 0.195998 / 0.737135 (-0.541137)
train_test_split | 0.027922 / 0.296338 (-0.268417)

Benchmark: benchmark_iterating.json

metric | new / old (diff)
read 5000 | 0.412201 / 0.215209 (0.196992)
read 50000 | 4.121954 / 2.077655 (2.044300)
read_batch 50000 10 | 1.765394 / 1.504120 (0.261274)
read_batch 50000 100 | 1.563060 / 1.541195 (0.021865)
read_batch 50000 1000 | 1.641955 / 1.468490 (0.173465)
read_formatted numpy 5000 | 0.415973 / 4.584777 (-4.168804)
read_formatted pandas 5000 | 4.608737 / 3.745712 (0.863025)
read_formatted tensorflow 5000 | 2.157667 / 5.269862 (-3.112195)
read_formatted torch 5000 | 0.874131 / 4.565676 (-3.691545)
read_formatted_batch numpy 5000 10 | 0.049380 / 0.424275 (-0.374895)
read_formatted_batch numpy 5000 1000 | 0.011044 / 0.007607 (0.003437)
shuffled read 5000 | 0.522776 / 0.226044 (0.296731)
shuffled read 50000 | 5.207844 / 2.268929 (2.938916)
shuffled read_batch 50000 10 | 2.262288 / 55.444624 (-53.182336)
shuffled read_batch 50000 100 | 1.881652 / 6.876477 (-4.994825)
shuffled read_batch 50000 1000 | 2.011469 / 2.142072 (-0.130604)
shuffled read_formatted numpy 5000 | 0.524869 / 4.805227 (-4.280359)
shuffled read_formatted_batch numpy 5000 10 | 0.113064 / 6.500664 (-6.387600)
shuffled read_formatted_batch numpy 5000 1000 | 0.056980 / 0.075469 (-0.018489)

Benchmark: benchmark_map_filter.json

metric | new / old (diff)
filter | 1.527436 / 1.841788 (-0.314352)
map fast-tokenizer batched | 12.168665 / 8.074308 (4.094357)
map identity | 26.702812 / 10.191392 (16.511420)
map identity batched | 0.814171 / 0.680424 (0.133747)
map no-op batched | 0.519133 / 0.534201 (-0.015068)
map no-op batched numpy | 0.372972 / 0.579283 (-0.206311)
map no-op batched pandas | 0.501903 / 0.434364 (0.067539)
map no-op batched pytorch | 0.262499 / 0.540337 (-0.277838)
map no-op batched tensorflow | 0.277645 / 1.386936 (-1.109291)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric | new / old (diff)
read_batch_formatted_as_numpy after write_array2d | 0.068532 / 0.011353 (0.057179)
read_batch_formatted_as_numpy after write_flattened_sequence | 0.003917 / 0.011008 (-0.007091)
read_batch_formatted_as_numpy after write_nested_sequence | 0.029450 / 0.038508 (-0.009058)
read_batch_unformated after write_array2d | 0.033211 / 0.023109 (0.010102)
read_batch_unformated after write_flattened_sequence | 0.307644 / 0.275898 (0.031746)
read_batch_unformated after write_nested_sequence | 0.331848 / 0.323480 (0.008368)
read_col_formatted_as_numpy after write_array2d | 0.085020 / 0.007986 (0.077035)
read_col_formatted_as_numpy after write_flattened_sequence | 0.004239 / 0.004328 (-0.000089)
read_col_formatted_as_numpy after write_nested_sequence | 0.007312 / 0.004250 (0.003061)
read_col_unformated after write_array2d | 0.043183 / 0.037052 (0.006131)
read_col_unformated after write_flattened_sequence | 0.322696 / 0.258489 (0.064207)
read_col_unformated after write_nested_sequence | 0.340425 / 0.293841 (0.046584)
read_formatted_as_numpy after write_array2d | 0.084017 / 0.128546 (-0.044529)
read_formatted_as_numpy after write_flattened_sequence | 0.008731 / 0.075646 (-0.066916)
read_formatted_as_numpy after write_nested_sequence | 0.251657 / 0.419271 (-0.167614)
read_unformated after write_array2d | 0.044978 / 0.043533 (0.001445)
read_unformated after write_flattened_sequence | 0.320003 / 0.255139 (0.064864)
read_unformated after write_nested_sequence | 0.333715 / 0.283200 (0.050516)
write_array2d | 0.079514 / 0.141683 (-0.062169)
write_flattened_sequence | 1.709808 / 1.452155 (0.257653)
write_nested_sequence | 1.729313 / 1.492716 (0.236596)

Benchmark: benchmark_getitem_100B.json

metric | new / old (diff)
get_batch_of_1024_random_rows | 0.339593 / 0.018006 (0.321586)
get_batch_of_1024_rows | 0.527659 / 0.000490 (0.527169)
get_first_row | 0.001053 / 0.000200 (0.000853)
get_last_row | 0.000085 / 0.000054 (0.000031)

Benchmark: benchmark_indices_mapping.json

metric | new / old (diff)
select | 0.033144 / 0.037411 (-0.004268)
shard | 0.021736 / 0.014526 (0.007211)
shuffle | 0.028254 / 0.176557 (-0.148303)
sort | 0.197171 / 0.737135 (-0.539965)
train_test_split | 0.028358 / 0.296338 (-0.267981)

Benchmark: benchmark_iterating.json

metric | new / old (diff)
read 5000 | 0.421011 / 0.215209 (0.205802)
read 50000 | 4.179572 / 2.077655 (2.101917)
read_batch 50000 10 | 1.801029 / 1.504120 (0.296909)
read_batch 50000 100 | 1.590766 / 1.541195 (0.049571)
read_batch 50000 1000 | 1.649338 / 1.468490 (0.180848)
read_formatted numpy 5000 | 0.414645 / 4.584777 (-4.170132)
read_formatted pandas 5000 | 4.695258 / 3.745712 (0.949545)
read_formatted tensorflow 5000 | 3.350324 / 5.269862 (-1.919537)
read_formatted torch 5000 | 0.887645 / 4.565676 (-3.678031)
read_formatted_batch numpy 5000 10 | 0.050123 / 0.424275 (-0.374152)
read_formatted_batch numpy 5000 1000 | 0.011121 / 0.007607 (0.003514)
shuffled read 5000 | 0.530582 / 0.226044 (0.304537)
shuffled read 50000 | 5.338311 / 2.268929 (3.069383)
shuffled read_batch 50000 10 | 2.278648 / 55.444624 (-53.165976)
shuffled read_batch 50000 100 | 1.894590 / 6.876477 (-4.981887)
shuffled read_batch 50000 1000 | 1.990421 / 2.142072 (-0.151652)
shuffled read_formatted numpy 5000 | 0.532955 / 4.805227 (-4.272272)
shuffled read_formatted_batch numpy 5000 10 | 0.115689 / 6.500664 (-6.384975)
shuffled read_formatted_batch numpy 5000 1000 | 0.056927 / 0.075469 (-0.018542)

Benchmark: benchmark_map_filter.json

metric | new / old (diff)
filter | 1.547316 / 1.841788 (-0.294472)
map fast-tokenizer batched | 12.060450 / 8.074308 (3.986142)
map identity | 26.891271 / 10.191392 (16.699879)
map identity batched | 0.792661 / 0.680424 (0.112237)
map no-op batched | 0.560662 / 0.534201 (0.026461)
map no-op batched numpy | 0.367206 / 0.579283 (-0.212077)
map no-op batched pandas | 0.498945 / 0.434364 (0.064581)
map no-op batched pytorch | 0.251794 / 0.540337 (-0.288544)
map no-op batched tensorflow | 0.261860 / 1.386936 (-1.125076)
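
Each benchmark value above has the form `new / old (diff)`, and the printed numbers are consistent with diff = new - old, up to rounding in the last digit. A small sketch of parsing one such value and checking that relation; the helper name `parse_cell` is hypothetical and not part of the benchmark tooling:

```python
import re
from typing import Tuple

# Matches values such as "0.412201 / 0.215209 (0.196992)".
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cell(cell: str) -> Tuple[float, float, float]:
    """Split a "new / old (diff)" value into its three numbers."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"not a 'new / old (diff)' value: {cell!r}")
    new, old, diff = (float(group) for group in match.groups())
    return new, old, diff


# "read 5000" from benchmark_iterating.json, PyArrow==3.0.0
new, old, diff = parse_cell("0.412201 / 0.215209 (0.196992)")

# The reported diff matches new - old within rounding of the last digit.
consistent = abs((new - old) - diff) < 5e-6
print(new, old, diff, consistent)
```

A positive diff means the new value is larger than the old one for that metric.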
