Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fuzzy queries cause excessive network requests. #11934

Closed
ehuss opened this issue Apr 5, 2023 · 10 comments
Closed

Fuzzy queries cause excessive network requests. #11934

ehuss opened this issue Apr 5, 2023 · 10 comments
Labels
A-sparse-registry Area: http sparse registries C-bug Category: bug

Comments

@ehuss
Copy link
Contributor

ehuss commented Apr 5, 2023

When cargo can't find a crate name in a registry, it will also query every permutation of - and _ replacements (up to 1024 times) to see if there is a different match. When this code was written, it was assumed it would just be a simple query to git, but now that the sparse index does a network round trip for each one, it probably isn't a good assumption that doing 1024 is OK.

I think at a minimum it should lower the limit significantly.

Another option is to only do up to 3 queries, the original, all underscore, and all dashes.

@ehuss ehuss added C-bug Category: bug A-sparse-registry Area: http sparse registries labels Apr 5, 2023
@epage
Copy link
Contributor

epage commented Apr 5, 2023

Another option is to only do up to 3 queries, the original, all underscore, and all dashes.

I think this is personally reasonable. I was surprised when I saw what lengths it went through.

@ehuss
Copy link
Contributor Author

ehuss commented Apr 5, 2023

Here's a list of names on crates.io that use a mix of dash and underscore. There are 623 entries. That's not as many as I feared.

Click to expand list of 623 entries
a-star_traitbased
a42-__-
advancedresearch-agent_safety_layers
advancedresearch-asi_core0
advancedresearch-error_predictive_learning
advancedresearch-graph_builder
advancedresearch-higher_order_core
advancedresearch-higher_order_point
advancedresearch-max_tree
advancedresearch-nano_ecs
advancedresearch-path_iter
advancedresearch-rigid_body
advancedresearch-tree_mem_sort
advancedresearch-utility_programming
aflak_imgui-glium-renderer
aflak_imgui-sys
air-interpreter-wasm_branch_release
air-interpreter-wasm_ci_fix_version_increment
air-interpreter-wasm_crate_release
air-interpreter-wasm_fix
air-interpreter-wasm_fold_iterable
air-interpreter-wasm_get_rid_current_peer_id
air-interpreter-wasm_jvalue_flatten
air-interpreter-wasm_logger-i32
air-interpreter-wasm_new_it_types
air-interpreter-wasm_new_stepper_outcome
air-interpreter-wasm_prerelease
air-interpreter-wasm_security_tetraplets_reborn
air-interpreter-wasm_update_example
air-interpreter-wasm_update_images
air-interpreter-wasm_wasm_log
air-interpreter-wasm_wit
allegro_acodec-sys
allegro_audio-sys
allegro_color-sys
allegro_dialog-sys
allegro_font-sys
allegro_image-sys
allegro_memfile-sys
allegro_primitives-sys
allegro_ttf-sys
android_liblog-sys
android_log-sys
android_looper-sys
android_sensor-sys
aorist_extendr-api
aorist_extendr-engine
aorist_extendr-macros
asmkit-x86_64
asteracea_proc-macro-definitions
async_ach-cell
async_ach-mpmc
async_ach-notify
async_ach-pubsub
async_ach-ring
async_ach-spsc
async_ach-waker
async_ach-watch
async_fn-proc_macros
aux_ulib-sys
aws-codebuild-status_aws
aws-codebuild-status_derive
aws-codebuild-status_server
aws-codebuild-status_terminal
aws-codebuild-status_web
azul-wr_malloc_size_of
bachue-auto_impl
bech32-no_std
bird_tool_utils-man
black_marlin-compiler
black_marlin-macros
blake2-rfc_bellman_edition
blip_buf-sys
blizzard-engine_derive
bls_sigs_ref-rs
bootloader-x86_64-bios-boot-sector
bootloader-x86_64-bios-common
bootloader-x86_64-bios-stage-2
bootloader-x86_64-bios-stage-3
bootloader-x86_64-bios-stage-4
bootloader-x86_64-common
bootloader-x86_64-uefi
bsd_auth-sys
byte-strings-proc_macros
cargo-arch_test
cargo-generate-license_impl-license
cargo-wasi-exe-x86_64-apple-darwin
cargo-wasi-exe-x86_64-pc-windows-msvc
cargo-wasi-exe-x86_64-unknown-linux-musl
cargo_crates-io_docs-rs_test
cargo_crates-io_docs-rs_test2
clap_derive-v3
clean-node_modules
cloudflare-dns-updater_rs
const_env--value
const_env_impl--value
core_rustc-serialize
crate-_-name
crates_io-duplicates
create-rust-app_cli
crevice_notan-derive
crokey-proc_macros
crossbeam-skiplist_piedb
cursive_buffered_backend-git-branchless
custom-try-proc_macros
d-_-b
d3d10_1-sys
dash-and_underscore
defile-proc_macros
deno-x86_64-apple-darwin
deno-x86_64-pc-windows-msvc
deno-x86_64-unknown-linux-gnu
displaydoc-lite-proc_macros
django-query_derive
do-not-use-this-hello_cratesio
download_rs-silent
dscfg-cached_file_storage
dusk-bls12_381
dusk-bls12_381-sign
dyn_safe-proc_macros
egui-latest_wgpu_backend
encoding-next_index_tests
entity_data-macros
erlang_nif-sys
eskom_se_push-api
event_store-core
evercrypt_tiny-sys
evr_vista-sys
exonum_librocksdb-sys
exonum_libsodium-sys
ext-trait-proc_macros
fast-rustc-ap-rustc_ast
fast-rustc-ap-rustc_ast_pretty
fast-rustc-ap-rustc_attr
fast-rustc-ap-rustc_data_structures
fast-rustc-ap-rustc_errors
fast-rustc-ap-rustc_feature
fast-rustc-ap-rustc_fs_util
fast-rustc-ap-rustc_index
fast-rustc-ap-rustc_lexer
fast-rustc-ap-rustc_macros
fast-rustc-ap-rustc_parse
fast-rustc-ap-rustc_session
fast-rustc-ap-rustc_span
fast-rustc-ap-rustc_target
fawkes-crypto-bellman_ce
fawkes-crypto-pairing_ce
fawkes-crypto_derive
ff-uint_derive
ff_derive-zeroize
fix-hidden-lifetime-bug-proc_macros
flexible-locks_derive
foley-cargoexample-guessing_game
fruit-salad_proc-macro-definitions
fuel-pest_derive
fuel-pest_generator
fuel-pest_grammars
fuel-pest_meta
fullcodec-bls12_381
function_name-proc-macro
future-parking_lot
gdnative-visual_script
geng-debug_overlay
get_if_addrs-sys
ggez-assets_manager
ghosts-proc_macros
gluon_c-api
gluon_language-server
gmt_dos-actors
gmt_dos-clients_arrow
gmt_dos-clients_crseo
gmt_dos-clients_domeseeing
gmt_dos-clients_io
gmt_dos-clients_mount
gmt_m1-ctrl
gmt_m1-ctrl_actuators
gmt_m1-ctrl_center-actuators
gmt_m1-ctrl_hardpoints-dynamics
gmt_m1-ctrl_outer-actuators
gmt_mount-ctrl
gmt_mount-ctrl_sampling1000-damping002
gmt_mount-ctrl_sampling1000-damping002_ze00
gmt_mount-ctrl_sampling1000-damping002_ze30
gmt_mount-ctrl_sampling1000-damping002_ze30-500hz
gmt_mount-ctrl_sampling1000-damping002_ze60
gmt_mount-ctrl_sampling8000-damping0005
gmt_mount-ctrl_sampling8000-damping002
gmt_mount-ctrl_sampling8000-damping002_ze00
gmt_mount-ctrl_sampling8000-damping002_ze30
gmt_mount-ctrl_sampling8000-damping002_ze60
google-accessapproval1_beta1
google-accessapproval1_beta1-cli
google-accesscontextmanager1_beta
google-accesscontextmanager1_beta-cli
google-adexchangebuyer2_v2_beta1
google-adexchangebuyer2_v2_beta1-cli
google-admin1_directory
google-admin1_directory-cli
google-admin1_reports
google-admin1_reports-cli
google-admin2_email_migration
google-alertcenter1_beta1
google-alertcenter1_beta1-cli
google-analyticsadmin1_alpha
google-analyticsadmin1_alpha-cli
google-analyticsdata1_beta
google-analyticsdata1_beta-cli
google-appengine1_beta4
google-appengine1_beta4-cli
google-appengine1_beta5
google-appengine1_beta5-cli
google-area120tables1_alpha1
google-area120tables1_alpha1-cli
google-artifactregistry1_beta1
google-artifactregistry1_beta1-cli
google-autoscaler1_beta2
google-autoscaler1_beta2-cli
google-bigqueryconnection1_beta1
google-bigqueryconnection1_beta1-cli
google-billingbudgets1_beta1
google-billingbudgets1_beta1-cli
google-binaryauthorization1_beta1
google-binaryauthorization1_beta1-cli
google-cloudasset1_beta1
google-cloudasset1_beta1-cli
google-clouderrorreporting1_beta1
google-clouderrorreporting1_beta1-cli
google-cloudkms1_beta1
google-cloudkms1_beta1-cli
google-cloudmonitoring2_beta2
google-cloudmonitoring2_beta2-cli
google-cloudprivatecatalog1_beta1
google-cloudprivatecatalog1_beta1-cli
google-cloudprivatecatalogproducer1_beta1
google-cloudprivatecatalogproducer1_beta1-cli
google-cloudresourcemanager1_beta1
google-cloudresourcemanager1_beta1-cli
google-cloudscheduler1_beta1
google-cloudscheduler1_beta1-cli
google-cloudsupport2_beta
google-cloudsupport2_beta-cli
google-cloudtasks2_beta2
google-cloudtasks2_beta2-cli
google-cloudtasks2_beta3
google-cloudtasks2_beta3-cli
google-clouduseraccountsvm_beta
google-clouduseraccountsvm_beta-cli
google-commentanalyzer1_alpha1
google-commentanalyzer1_alpha1-cli
google-container1_beta1
google-containeranalysis1_beta1
google-containeranalysis1_beta1-cli
google-content2_sandbox
google-content2_sandbox-cli
google-datacatalog1_beta1
google-datacatalog1_beta1-cli
google-dataflow1_b4
google-datafusion1_beta1
google-datafusion1_beta1-cli
google-datalabeling1_beta1
google-datalabeling1_beta1-cli
google-datastore1_beta2
google-datastore1_beta3
google-datastore1_beta3-cli
google-deploymentmanager2_beta1
google-deploymentmanager2_beta2
google-deploymentmanager2_beta2-cli
google-dialogflow2_beta1
google-dialogflow2_beta1-cli
google-dlp2_beta1
google-dlp2_beta1-cli
google-dns1_beta1
google-documentai1_beta2
google-documentai1_beta2-cli
google-domains1_beta1
google-domains1_beta1-cli
google-factchecktools1_alpha1
google-factchecktools1_alpha1-cli
google-fcmdata1_beta1
google-fcmdata1_beta1-cli
google-file1_beta1
google-file1_beta1-cli
google-firebase1_beta1
google-firebase1_beta1-cli
google-firebaseappcheck1_beta
google-firebaseappcheck1_beta-cli
google-firebasedatabase1_beta
google-firebasedatabase1_beta-cli
google-firebasehosting1_beta1
google-firebasehosting1_beta1-cli
google-firebasestorage1_beta
google-firebasestorage1_beta-cli
google-firestore1_beta1
google-firestore1_beta1-cli
google-freebase1_sandbox
google-gamesconfiguration1_configuration
google-gamesconfiguration1_configuration-cli
google-gamesmanagement1_management
google-gamesmanagement1_management-cli
google-gan1_beta1
google-gan1_beta1-cli
google-genomics1_beta2
google-gmailpostmastertools1_beta1
google-gmailpostmastertools1_beta1-cli
google-healthcare1_beta1
google-healthcare1_beta1-cli
google-iap1_beta1
google-iap1_beta1-cli
google-ideahub1_beta
google-ideahub1_beta-cli
google-language1_beta1
google-language1_beta1-cli
google-lifesciences2_beta
google-lifesciences2_beta-cli
google-logging1_beta3
google-logging2_beta1
google-logging2_beta1-cli
google-manager1_beta2
google-manager1_beta2-cli
google-memcache1_beta2
google-memcache1_beta2-cli
google-metastore1_beta
google-metastore1_beta-cli
google-ml1_beta1
google-ml1_beta1-cli
google-networkconnectivity1_alpha1
google-networkconnectivity1_alpha1-cli
google-oauth2_v2
google-oslogin1_beta
google-oslogin1_beta-cli
google-privateca1_beta1
google-privateca1_beta1-cli
google-prod_tt_sasportal1_alpha1
google-prod_tt_sasportal1_alpha1-cli
google-proximitybeacon1_beta1
google-proximitybeacon1_beta1-cli
google-pubsub1_beta2
google-pubsub1_beta2-cli
google-recommendationengine1_beta1
google-recommendationengine1_beta1-cli
google-recommender1_beta1
google-recommender1_beta1-cli
google-replicapool1_beta2
google-replicapool1_beta2-cli
google-replicapoolupdater1_beta1
google-replicapoolupdater1_beta1-cli
google-reseller1_sandbox
google-reseller1_sandbox-cli
google-resourceviews1_beta2
google-resourceviews1_beta2-cli
google-runtimeconfig1_beta1
google-runtimeconfig1_beta1-cli
google-sasportal1_alpha1
google-sasportal1_alpha1-cli
google-secretmanager1_beta1
google-secretmanager1_beta1-cli
google-servicedirectory1_beta1
google-servicedirectory1_beta1-cli
google-spectrum1_explorer
google-spectrum1_explorer-cli
google-speech1_beta1
google-speech1_beta1-cli
google-sql1_beta4
google-sql1_beta4-cli
google-sqladmin1_beta4
google-sqladmin1_beta4-cli
google-taskqueue1_beta2
google-taskqueue1_beta2-cli
google-tpu1_alpha1
google-tpu1_alpha1-cli
google-transcoder1_beta1
google-transcoder1_beta1-cli
google-videointelligence1_beta1
google-videointelligence1_beta1-cli
gothack-future-parking_lot
gps-gunnlaug_18
graphannis-malloc_size_of
graphannis-malloc_size_of_derive
gt-bijective_connection_graph
guessing_game-79888c11-a5a6-4548-a26a-c85ea39d0499
guessing_game-panzerstadt
guessing_game-xfossdotcom
hkalbasi-rustc-ap-rustc_abi
hkalbasi-rustc-ap-rustc_data_structures
hkalbasi-rustc-ap-rustc_graphviz
hkalbasi-rustc-ap-rustc_index
hkalbasi-rustc-ap-rustc_macros
hkalbasi-rustc-ap-rustc_serialize
hvcg_governance_openapi_catholic-polity
ignite-rs_derive
inside-vm_arch_support
ip_network_table-deps-treebitmap
istherepie-hello_world
iter-identify_first_last
jelly-mem_access
jh-x86_64
jit_rs-sys
jpark011-guessing_game
jsonpath_lib-fl
koibumi-unifont_jp
konektor_db-sys
lattice_qcd_rs-procedural_macro
lazy-regex-proc_macros
lending-iterator-proc_macros
lens-rs_derive
lens-rs_generator
lettre-openssl111_email
lex-map_editor_std
lib_remotebuild-rs
libaeron_driver-sys
libpg_query-sys
libpg_query2-sys
libzfs_core-sys
lim1988_cargo-demo
limn-text_layout
linux_input-sys
liquidcrystal_i2c-rs
login_cap-sys
lpc177x_8x-hal
lukas_zech_software_hello-world
lz4_flex-util
m2-ctrl_asm
m2-ctrl_fsm
macro_rules_attribute-proc_macro
matio-rs_derive
mbnapi_uuid-sys
mdbook-pdf-headless_chrome
merge_derive-hashmap
metrics-exporter-async_std
metrics-exporter-http-async_std
mf_vista-sys
mfplat_vista-sys
mincore_downlevel-sys
mini_paste-proc_macro
minimp3_ex-sys
minimult_cortex-m
mock-it_codegen
mount-ctrl_sampling1000-damping002
mount-ctrl_sampling8000-damping0005
mproxy-socket_dispatch
msiz_rustc-ap-arena
msiz_rustc-ap-graphviz
msiz_rustc-ap-rustc_data_structures
msiz_rustc-ap-rustc_errors
msiz_rustc-ap-rustc_lexer
msiz_rustc-ap-rustc_macros
msiz_rustc-ap-rustc_target
msiz_rustc-ap-serialize
msiz_rustc-ap-syntax
msiz_rustc-ap-syntax_pos
msv1_0-sys
my_crate_72a9eb54-5002-4540-a6e6-f2620f51b550
mycodee-project_manager
nacos-api_macro
nameless-clap_derive
nameless-clap_generate
nameless-clap_up
navinfo-add_two
next-gen-proc_macros
next-gen_proc-macro
nix-ptsname_r-shim
nnye-user_server-gateway-business
noise_search_deps_librocksdb-sys
nougat-proc_macros
nt-list_macros
ntstc_libcmt-sys
ntstc_msvcrt-sys
object_store-fork
optimisticpeach-opengles_graphics
paho_mqtt_c-sys
paste_rs-cli
patch-test-crate_121213131213131
persian-rug_derive
piston-ai_behavior
piston-button_controller
piston-button_tracker
piston-dyon_interactive
piston-fake_dpi
piston-gfx_texture
piston-graphics_api_version
piston-history_tree
piston-shaders_graphics2d
piston-split_controller
piston-timer_controller
piston2d-deform_grid
piston2d-drag_controller
piston2d-gfx_graphics
piston2d-glium_graphics
piston2d-graphics_tree
piston2d-opengl_graphics
piston2d-scroll_controller
piston2d-touch_visualizer
piston2d-wgpu_graphics
piston3d-gfx_voxel
pistoncore-event_loop
pistoncore-glfw_window
pistoncore-glutin_window
pistoncore-sdl2_window
pistoncore-winit_window
plonk-bls12_381
pocket_prover-derive
pocket_prover-set
protoc-bin-vendored-linux-aarch_64
protoc-bin-vendored-linux-ppcle_64
protoc-bin-vendored-linux-x86_32
protoc-bin-vendored-linux-x86_64
protoc-bin-vendored-macos-x86_64
ptr_eq-macros
qemu-x86_64
qpid_proton-sys
quine-mc_cluskey
r2d2-influx_db_client
ra_ap_la-arena
ra_ap_lsp-server
ra_ap_proc-macro-srv-cli
ra_ap_rust-analyzer
ra_ap_vfs-notify
rest-client_codegen
retryable-proc_macros
rhizome_proc-macro-definitions
rime_api-sys
rpi_ws281x-c
rpi_ws281x-sys
rs-data-formats_derive
rs-odbc_derive
rs-tiled_json
rsudo-rs_support
rumblebars-rustlex_codegen
rust-riemann_health
rust_sodium-sys
rust_sodium_holochain_fork-sys
rustc-ap-proc_macro
rustc-ap-rustc_arena
rustc-ap-rustc_ast
rustc-ap-rustc_ast_passes
rustc-ap-rustc_ast_pretty
rustc-ap-rustc_attr
rustc-ap-rustc_cratesio_shim
rustc-ap-rustc_data_structures
rustc-ap-rustc_error_codes
rustc-ap-rustc_errors
rustc-ap-rustc_expand
rustc-ap-rustc_feature
rustc-ap-rustc_fs_util
rustc-ap-rustc_graphviz
rustc-ap-rustc_index
rustc-ap-rustc_lexer
rustc-ap-rustc_lint_defs
rustc-ap-rustc_macros
rustc-ap-rustc_parse
rustc-ap-rustc_serialize
rustc-ap-rustc_session
rustc-ap-rustc_span
rustc-ap-rustc_target
rustc-ap-syntax_pos
rustc_codegen_spirv-types
rustdt-json_rpc
rustfmt-config_proc_macro
rv32m1_ri5cy-hal
rv32m1_ri5cy-pac
safer_ffi-proc_macro
sapling-crypto_ce
screenruster-saver-laughing_man
sea-strum_macros
secret-keeper-test_util
serde_sane-rs
server-sent_events
simple-tlv_derive
simple_predicates-rs
simple_tables-core
simple_tables-derive
solana_libra_crypto-derive
solana_libra_grpcio-client
soroban-wasmi_core
sqlx-core_wasi
sqlx-macros_wasi
sqlx-rt_wasi
stm32g473-hal_oppe
stream-tls-client_hello-detector
struct_deser-derive
sudo_plugin-sys
sugiura-hiromichi_dot
sugiura-hiromichi_gc
sugiura-hiromichi_mylibrary
sugiura-hiromichi_tp
tamasfe-schemars_derive
tdameritrade_rust-async
thiserror_core2-impl
tibco_ems-sys
tls-client_hello-parser
to-syn-value_derive
tokio-postgres_wasi
tokio-stream_wasi
tokio-tungstenite_wasi
tokio-util_wasi
tokio_modbus-winets
toml-query_derive
toy-arms_derive
twain_32-sys
unic-ucd-name_aliases
update-sync_derive
user_doc-doc_data
user_doc-doc_proc_macro
user_doc-tests
va_list-helper
va_list-rs
va_list-test
vds_uuid-sys
vss_uuid-sys
web-view_suppress
webview-sys_suppress
winapi-x86_64-pc-windows-gnu
with_builtin_macros-proc_macros
with_locals-proc_macros
wlc-sys-with_elogind
wlc-with_elogind
ws2_32-sys
wsbapp_uuid-sys
x86_64-linux-nolibc
x86_64-xsave
xinput9_1_0-sys
xmltree-parse_with_config
zenoh-link-unixsock_stream

@arlosi
Copy link
Contributor

arlosi commented Apr 5, 2023

I did some more analysis on the names on crates.io. The currently limit of 1024 is definitely excessive, as nothing uses more than 8 dashes/underscores.

We'd need to set the limit to at least 16 in order to have fewer misses than the suggested algorithm of (original, underscores, dashes).

Request limit Potentially missed crates
1 68282
2 21214
4 4437
8 897
16 177
32 37
64 12
128 2
256 0

bors added a commit that referenced this issue Apr 6, 2023
Stop using UncanonicalizedIter for QueryKind::Exact

`QueryKind::Exact` causes unnecessary HTTP requests when querying for crates that don't exist. Even though the query is `Exact`, when fetching `Summaries`, Cargo still uses `UncanonicalizedIter` and requests additional possible crate names if the first one isn't found.

This change moves the use of `UncanonicalizedIter` further up the call stack such that we have the `QueryKind` available, and only do the additional queries for `QueryKind::Fuzzy`.

The impact is most noticeable when publishing a new crate that contains `-` or `_`. Since Cargo waits for publish by querying the registry, if the crate name is `my-example-test-crate`, Cargo currently makes 8 HTTP requests each second while waiting for the crate to be available. With this fix, Cargo makes only 1 request per second.

cc #11934
@est31
Copy link
Member

est31 commented Apr 7, 2023

@est31
Copy link
Member

est31 commented Apr 7, 2023

What about doing the requests in parallel? This is also how cargo update works, and a cargo update to a normal-sized repository will create requests to a couple hundred crates, so number wise it's not that excessive (cat Cargo.lock | rg "^name =" | sort | uniq | wc -l gives me 279 for the cargo repo for example). The largest number of _'s is 8 which gives 256 many requests. The problem might be the high ratio of 404s returned though, maybe some cloud providers don't like that.

If one wants to do "escalation" then one can first make a request for all names with 1 bit flipped, then wait for the answer, then if nothing is found make another request for all names with 2 bits flipped, etc, with increasing numbers of bits flipped. This is the binomial coefficients game so it decreases again after you reach the half, so maybe at that point one can make one last request with all the bit flippings exceeding n/2.

@arlosi
Copy link
Contributor

arlosi commented Apr 7, 2023

What about doing the requests in parallel?

Cargo does issue the requests in parallel. The motivation for investigating this was the high number of 404s that infra was seeing in logs.

@est31
Copy link
Member

est31 commented Apr 7, 2023

@arlosi I see. Is there some public discussion about the 404 issue?

I suppose if only 623 entries are affected then it's okay to cut some corners and emit suggestion-free diagnostics in those cases, in all other cases the - and _ only queries will show up the right result.

Ideally one would have used the migration to http indices to migrate to using the canonicalized name (as suggested in 2020), but I guess now it's too late for that.

@arlosi
Copy link
Contributor

arlosi commented Apr 8, 2023

Is there some public discussion about the 404 issue?

Yes, on the crates-io zulip.

@Eh2406
Copy link
Contributor

Eh2406 commented Apr 10, 2023

Sparse registries have the same format as a checkout of the git index. You can transition by checking out the git index and running a static file server in the resulting folder. That transition path was important enough to me to justify not fixing known problems with the index format. Feel free to blame me for that decision, especially as the decision was made knowingly.

Another decision that complicates things is that alternative registries are allowed to have multiple packages whose names only differ by -/_. Thanks to package renaming, you can even use both packages in the same project.

Thinking about it more, the currently stable implementation tries all kinds of files and returns the first one it finds. (All kinds of files include ridiculously incorrect things like a_/b-/a-b_foo as fixed in #11936.) It then reads the available packages from that first file. Rows whose name matches the value requested by the dependency get tried by the resolver, and if there are none the names that do not match are included as a suggestion. If I am correct, a registry could have multiple crates that differ only in -/_ with them all listed in the same file in the index. It would work correctly, ignoring the inefficient implementation, even if that one file wast at something hilariously invalid.

Assuming all of this is correct, I suggest that:

  1. Cargo be changed to only try the name as given, followed by the name canonicalized to _.
  2. The registry documentation should be updated to recommend that:
  • If the registry mainly supports cargoes after the change from 1., keeping one index file (canonicalized to _) with rows for all packages whose name canonical eyes to the same thing.
  • If the registry needs to support cargoes from before the un-canonical loop was added, then include a file per package based on its name.
  • Explain that in between "the un-canonical loop was added" and "change from 1." Cargo will get the correct answer if the new system is used but may not do so efficiently.
  1. Update the crates.io index to have both a-/b_/a-b_foo and a_/b_/a_b_foo files in both the git and sparse index.
  2. (If the crates.io git and sparse indexes are allowed to differ, and we have Sparse registry indexes should be viewable atomically #10928) consider removing a-/b_/a-b_foo from the sparse and removing a_/b_/a_b_foo from the git.

bors added a commit that referenced this issue May 5, 2023
do not try an exponential number of package names

re #11934, and as discussed in the cargo team meeting, this changes the strategy to "the original, all underscore, and all dashes".

I was excessively proud of the `hyphen_combination_num` based implementation when I came up with it. But it's always been a hack. I'm glad to be the one to remove it.
@weihanglo
Copy link
Member

I am going to close this since #11937 and #12083 significantly reduces network requests. If there is something still missing after #12083, feel free to reopen :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sparse-registry Area: http sparse registries C-bug Category: bug
Projects
None yet
Development

No branches or pull requests

6 participants