Re-enable import sorting disabled by flake8:noqa directive when using ruff linter #6946

albertvillanova · 2024-06-03T06:24:47Z

Re-enable import sorting, which was wrongly disabled by the flake8: noqa directive after we switched to the ruff linter in the datasets-2.10.0 PR:

Lint code with ruff #5519

Note that after the linter switch, we wrongly replaced flake8: noqa with ruff: noqa in the datasets-2.17.0 PR:

Migrate from setup.cfg to pyproject.toml #6619

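That replacement was wrong because it left import sorting disabled. Under ruff, a file-level flake8: noqa directive is treated like ruff: noqa and suppresses every rule for the file, including import sorting. We kept the isort: skip directives, but they had no effect: import sorting was disabled first by flake8: noqa and afterwards by ruff: noqa. See, for example, the __init__.py file after the linter switch: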

We kept the flake8: noqa directive

datasets/src/datasets/__init__.py

Line 1 in 06ae3f6

# flake8: noqa

We also kept the isort: skip directives, even though they were disabled:

datasets/src/datasets/__init__.py

Lines 82 to 84 in 06ae3f6

from datasets import arrow_dataset as _arrow_dataset  # isort:skip
from datasets import utils as _utils  # isort:skip
from datasets.utils import download_manager as _deprecated_download_manager  # isort:skip

Fix #6942.
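
The interaction described above can be checked mechanically. The following is a minimal sketch, not part of this PR and not how ruff is implemented: `audit_directives` is a hypothetical helper that looks for a file-level noqa comment without rule codes and for the isort: skip comments that such a directive makes moot.

```python
import re

# Hypothetical helper for illustration only; ruff does not work this way internally.
# A file-level "# flake8: noqa" or "# ruff: noqa" comment without rule codes
# suppresses every rule for the file, including import sorting.
FILE_LEVEL_NOQA = re.compile(r"^\s*#\s*(flake8|ruff)\s*:\s*noqa\s*$", re.IGNORECASE)
ISORT_SKIP = re.compile(r"#\s*isort\s*:\s*skip\b", re.IGNORECASE)


def audit_directives(source: str) -> dict:
    """Report blanket noqa comments and the isort:skip comments they make moot."""
    lines = source.splitlines()
    blanket = [n for n, line in enumerate(lines, start=1) if FILE_LEVEL_NOQA.match(line)]
    skips = [n for n, line in enumerate(lines, start=1) if ISORT_SKIP.search(line)]
    return {
        "blanket_noqa_lines": blanket,
        "isort_skip_lines": skips,
        # With a blanket noqa present, import sorting is never checked,
        # so the isort:skip comments have nothing left to opt out of.
        "skips_are_moot": bool(blanket) and bool(skips),
    }


EXAMPLE = """\
# flake8: noqa
from datasets import arrow_dataset as _arrow_dataset  # isort:skip
from datasets import utils as _utils  # isort:skip
"""

report = audit_directives(EXAMPLE)
print(report)
```

Running the same check over each `__init__.py` in a repository would list the files where a blanket directive is silently hiding import-sorting violations.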

HuggingFaceDocBuilderDev · 2024-06-03T06:27:34Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions · 2024-06-04T10:00:07Z

PyArrow==8.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.004847 / 0.011353 (-0.006506)	0.003199 / 0.011008 (-0.007810)	0.060677 / 0.038508 (0.022169)	0.030544 / 0.023109 (0.007435)	0.240870 / 0.275898 (-0.035028)	0.261320 / 0.323480 (-0.062160)	0.002816 / 0.007986 (-0.005170)	0.002483 / 0.004328 (-0.001845)	0.048527 / 0.004250 (0.044277)	0.045496 / 0.037052 (0.008444)	0.251296 / 0.258489 (-0.007193)	0.285746 / 0.293841 (-0.008095)	0.025076 / 0.128546 (-0.103470)	0.009417 / 0.075646 (-0.066229)	0.191361 / 0.419271 (-0.227911)	0.033778 / 0.043533 (-0.009755)	0.235581 / 0.255139 (-0.019558)	0.261069 / 0.283200 (-0.022131)	0.018255 / 0.141683 (-0.123428)	1.098437 / 1.452155 (-0.353718)	1.127124 / 1.492716 (-0.365592)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.004479 / 0.018006 (-0.013527)	0.283706 / 0.000490 (0.283216)	0.000214 / 0.000200 (0.000014)	0.000043 / 0.000054 (-0.000011)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.018364 / 0.037411 (-0.019048)	0.058398 / 0.014526 (0.043872)	0.073056 / 0.176557 (-0.103501)	0.117147 / 0.737135 (-0.619989)	0.073683 / 0.296338 (-0.222656)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.265121 / 0.215209 (0.049912)	2.636981 / 2.077655 (0.559327)	1.380192 / 1.504120 (-0.123928)	1.270779 / 1.541195 (-0.270416)	1.295729 / 1.468490 (-0.172762)	0.523768 / 4.584777 (-4.061009)	2.295720 / 3.745712 (-1.449992)	2.519211 / 5.269862 (-2.750650)	1.618712 / 4.565676 (-2.946965)	0.058321 / 0.424275 (-0.365954)	0.004492 / 0.007607 (-0.003115)	0.316101 / 0.226044 (0.090057)	3.169913 / 2.268929 (0.900984)	1.793412 / 55.444624 (-53.651213)	1.473784 / 6.876477 (-5.402693)	1.565325 / 2.142072 (-0.576748)	0.592734 / 4.805227 (-4.212493)	0.109333 / 6.500664 (-6.391331)	0.039063 / 0.075469 (-0.036406)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	0.935504 / 1.841788 (-0.906284)	10.865520 / 8.074308 (2.791212)	9.219337 / 10.191392 (-0.972055)	0.135284 / 0.680424 (-0.545140)	0.013664 / 0.534201 (-0.520537)	0.271601 / 0.579283 (-0.307682)	0.260456 / 0.434364 (-0.173908)	0.302931 / 0.540337 (-0.237406)	0.414643 / 1.386936 (-0.972293)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.004801 / 0.011353 (-0.006552)	0.003092 / 0.011008 (-0.007917)	0.046471 / 0.038508 (0.007963)	0.031337 / 0.023109 (0.008228)	0.258920 / 0.275898 (-0.016978)	0.269842 / 0.323480 (-0.053638)	0.003976 / 0.007986 (-0.004009)	0.002661 / 0.004328 (-0.001668)	0.045676 / 0.004250 (0.041426)	0.038199 / 0.037052 (0.001146)	0.277382 / 0.258489 (0.018893)	0.289351 / 0.293841 (-0.004490)	0.028452 / 0.128546 (-0.100094)	0.009737 / 0.075646 (-0.065910)	0.055201 / 0.419271 (-0.364071)	0.032686 / 0.043533 (-0.010847)	0.259617 / 0.255139 (0.004478)	0.277163 / 0.283200 (-0.006037)	0.017825 / 0.141683 (-0.123858)	1.102797 / 1.452155 (-0.349357)	1.105018 / 1.492716 (-0.387699)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.094844 / 0.018006 (0.076838)	0.290519 / 0.000490 (0.290029)	0.000211 / 0.000200 (0.000012)	0.000050 / 0.000054 (-0.000004)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.021917 / 0.037411 (-0.015494)	0.075278 / 0.014526 (0.060753)	0.085971 / 0.176557 (-0.090586)	0.127072 / 0.737135 (-0.610063)	0.088244 / 0.296338 (-0.208095)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.276704 / 0.215209 (0.061495)	2.736960 / 2.077655 (0.659305)	1.519634 / 1.504120 (0.015514)	1.403026 / 1.541195 (-0.138168)	1.418465 / 1.468490 (-0.050025)	0.552425 / 4.584777 (-4.032352)	0.955244 / 3.745712 (-2.790468)	2.556563 / 5.269862 (-2.713298)	1.705095 / 4.565676 (-2.860582)	0.061212 / 0.424275 (-0.363063)	0.004707 / 0.007607 (-0.002900)	0.326284 / 0.226044 (0.100239)	3.253911 / 2.268929 (0.984983)	1.868649 / 55.444624 (-53.575976)	1.598697 / 6.876477 (-5.277780)	1.682617 / 2.142072 (-0.459455)	0.606379 / 4.805227 (-4.198848)	0.114126 / 6.500664 (-6.386538)	0.038869 / 0.075469 (-0.036601)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	0.966354 / 1.841788 (-0.875433)	11.575918 / 8.074308 (3.501609)	9.816597 / 10.191392 (-0.374795)	0.141492 / 0.680424 (-0.538932)	0.015375 / 0.534201 (-0.518826)	0.276027 / 0.579283 (-0.303256)	0.118979 / 0.434364 (-0.315385)	0.313467 / 0.540337 (-0.226870)	0.403539 / 1.386936 (-0.983397)

albertvillanova added 5 commits June 3, 2024 08:00

Replace wrong ruff:noqa directive with per-file-ignores setting

1fa740d

Remove unnecessary noqa F401

e92916b

Replace isort:skip with isort:split

1a1b97e

Fix import sorting

15f8d5d

Replace inline noqa:F401 with per-file

f7efc01
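
The difference between a blanket noqa directive and a per-file-ignores setting, which these commits rely on, can be sketched as a small model. Everything here is an assumption for illustration: `PER_FILE_IGNORES` and `is_reported` are hypothetical names, the pattern matching is simplified to the file name, and the actual settings in this PR's pyproject.toml may differ. F401 is the unused-import rule named in the commits, and I001 is ruff's unsorted-imports rule.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical settings for illustration; the real configuration may differ.
PER_FILE_IGNORES = {"__init__.py": {"F401"}}


def is_reported(path: str, rule: str, has_blanket_noqa: bool) -> bool:
    """Simplified model of whether a violation of `rule` in `path` is reported."""
    if has_blanket_noqa:
        # A file-level noqa without rule codes silences every rule.
        return False
    name = PurePosixPath(path).name
    for pattern, ignored in PER_FILE_IGNORES.items():
        # A per-file ignore silences only the listed rule codes.
        if fnmatch(name, pattern) and rule in ignored:
            return False
    return True


path = "src/datasets/__init__.py"
rules = ("F401", "I001")
before = {rule: is_reported(path, rule, has_blanket_noqa=True) for rule in rules}
after = {rule: is_reported(path, rule, has_blanket_noqa=False) for rule in rules}
print("with blanket noqa:", before)
print("with per-file ignores:", after)
```

In this model, the blanket directive hides both rules, whereas the per-file setting keeps F401 quiet in `__init__.py` and lets I001 (import sorting) be reported again.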

albertvillanova merged commit 1b59c75 into main Jun 4, 2024
12 checks passed

albertvillanova deleted the fix-6942 branch June 4, 2024 09:54
